Add documentation
This commit is contained in:
parent
83b1961cd8
commit
cf14fd110f
@ -32,6 +32,9 @@ follow the[Semantic Versioning 2.0](http://semver.org/) guidelines:
|
||||
- Add new files or modify copies of the old files to add new functions or
|
||||
modify existing functions (remember to rename a function if the signature
|
||||
changes)
|
||||
- Add or modify the corresponding documentation files in the `doc` folder.
|
||||
Since we expect to have highly technical functions here, an extense
|
||||
background explanation would be of great help to users of this extension.
|
||||
- Create tests for the new functions/behaviour
|
||||
|
||||
* Generate the **upgrade and downgrade files** for the extension
|
||||
|
71
pg/doc/02_moran.md
Normal file
71
pg/doc/02_moran.md
Normal file
@ -0,0 +1,71 @@
|
||||
### Moran's I
|
||||
|
||||
#### What is Moran's I and why is it significant for CartoDB?
|
||||
|
||||
Moran's I is a geostatistical calculation which gives a measure of the global
|
||||
clustering and presence of outliers within the geographies in a map. Here global
|
||||
means over all of the geographies in a dataset. Imagine mapping the incidence
|
||||
rates of cancer in neighborhoods of a city. If there were areas covering several
|
||||
neighborhoods with abnormally low rates of cancer, those areas are positively
|
||||
spatially correlated with one another and would be considered a cluster. If
|
||||
there was a single neighborhood with a high rate but with all neighbors on
|
||||
average having a low rate, it would be considered a spatial outlier.
|
||||
|
||||
While Moran's I gives a global snapshot, there are local indicators for
|
||||
clustering called Local Indicators of Spatial Autocorrelation. Clustering is a
|
||||
process related to autocorrelation -- i.e., a process that compares a
|
||||
geography's attribute to the attribute in neighbor geographies.
|
||||
|
||||
For the example of cancer rates in neighborhoods, since these neighborhoods have
|
||||
a high value for rate of cancer, and all of their neighbors do as well, they are
|
||||
designated as "High High" or simply **HH**. For areas with multiple neighborhoods
|
||||
with low rates of cancer, they are designated as "Low Low" or **LL**. HH and LL
|
||||
naturally fit into the concept of clustering and are in the correlated
|
||||
variables.
|
||||
|
||||
"Anticorrelated" geogs are in **LH** and **HL** regions -- that is, regions
|
||||
where a geog has a high value and it's neighbors, on average, have a low value
|
||||
(or vice versa). An example of this is a "gated community" or placement of a
|
||||
city housing project in a rich region. These deliberate developments have
|
||||
opposite median income as compared to the neighbors around them. They have a
|
||||
high (or low) value while their neighbors have a low (or high) value. They exist
|
||||
typically as islands, and in rare circumstances can extend as chains dividing
|
||||
**LL** or **HH**.
|
||||
|
||||
Strong policies such as rent stabilization (probably) tend to prevent the
|
||||
clustering of high rent areas as they integrate middle class incomes. Luxury
|
||||
apartment buildings, which are a kind of gated community, probably tend to skew
|
||||
an area's median income upwards while housing projects have the opposite effect.
|
||||
What are the nuggets in the analysis?
|
||||
|
||||
Two functions are available to compute Moran I statistics:
|
||||
|
||||
* `cdb_moran_local` computes Moran I measures, quad classification and
|
||||
significance values from numerial values associated to geometry entities
|
||||
in an input table. The geometries should be contiguous polygons When
|
||||
then `queen` `w_type` is used.
|
||||
* `cdb_moran_local_rate` computes the same statistics using a ratio between
|
||||
numerator and denominator columns of a table.
|
||||
|
||||
The parameters for `cdb_moran_local` are:
|
||||
|
||||
* `table` name of the table that contains the data values
|
||||
* `attr` name of the column
|
||||
* `signficance` significance threshold for the quads values
|
||||
* `num_ngbrs` number of neighbors to consider (default: 5)
|
||||
* `permutations` number of random permutations for calculation of
|
||||
pseudo-p values (default: 99)
|
||||
* `geom_column` number of the geometry column (default: "the_geom")
|
||||
* `id_col` PK column of the table (default: "cartodb_id")
|
||||
* `w_type` Weight types: can be "knn" for k-nearest neighbor weights
|
||||
or "queen" for contiguity based weights.
|
||||
|
||||
The function returns a table with the following columns:
|
||||
|
||||
* `moran` Moran's value
|
||||
* `quads` quad classification ('HH', 'LL', 'HL', 'LH' or 'Not significant')
|
||||
* `significance` significance value
|
||||
* `ids` id of the corresponding record in the input table
|
||||
|
||||
Function `cdb_moran_local_rate` only differs in that the `attr` input
|
||||
parameter is substituted by `numerator` and `denominator`.
|
29
pg/doc/03_overlap_sum.md
Normal file
29
pg/doc/03_overlap_sum.md
Normal file
@ -0,0 +1,29 @@
|
||||
### Aereal Weighting
|
||||
|
||||
Aereal weighting is a simple interpolation technique to assign a value
|
||||
to a polygon given a set of polygons with one value assigned to each one.
|
||||
|
||||
The value is assigned by averaging the values of intersecting areas
|
||||
weighted by the intersection area.
|
||||
|
||||
Its accuracy depends on the values assigned to reference areas being
|
||||
homogeneous over each area.
|
||||
|
||||
The `cdb_overlap_function` takes three required parameters:
|
||||
|
||||
* `geometry` a Polygon geometry which defines the area where a value will be
|
||||
estimated.
|
||||
* `table_name`: name of the values table that provides the source values;
|
||||
this table must have a geometric column `the_geom` containing the polygons
|
||||
to which values are assigned.
|
||||
* `column_name`: name of the column that contains the values in the values
|
||||
table (should be a numeric column)
|
||||
|
||||
There's also an additional optional parameter to define the schema to which
|
||||
the values table belongs. This is necessary only if it is not in the
|
||||
`search_path`. Note that `table_name` should never include the schema in it.
|
||||
|
||||
* `schema_name` name of the schema that contains the values table
|
||||
|
||||
This function returns a numeric value resulting from the aggregation
|
||||
of the polygons in
|
Loading…
Reference in New Issue
Block a user