Add documentation

Javier Goizueta 2016-02-18 18:49:48 +01:00
parent 83b1961cd8
commit cf14fd110f
3 changed files with 103 additions and 0 deletions


@@ -32,6 +32,9 @@ follow the [Semantic Versioning 2.0](http://semver.org/) guidelines:
- Add new files or modify copies of the old files to add new functions or
modify existing functions (remember to rename a function if the signature
changes)
- Add or modify the corresponding documentation files in the `doc` folder.
Since we expect to have highly technical functions here, an extensive
background explanation would be of great help to users of this extension.
- Create tests for the new functions/behaviour
* Generate the **upgrade and downgrade files** for the extension

pg/doc/02_moran.md

@@ -0,0 +1,71 @@
### Moran's I
#### What is Moran's I and why is it significant for CartoDB?

Moran's I is a measure of spatial autocorrelation. It quantifies the global
clustering of values, and the presence of outliers, within the geographies in a
map. Here "global" means over all of the geographies in a dataset. Imagine
mapping the incidence rates of cancer in the neighborhoods of a city. If there
were areas covering several neighborhoods with abnormally low rates of cancer,
those neighborhoods would be positively spatially correlated with one another
and would be considered a cluster. If there were a single neighborhood with a
high rate whose neighbors, on average, had a low rate, it would be considered a
spatial outlier.
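
The following Python sketch illustrates the global statistic. It computes the textbook Moran's I formula from a dense weights matrix, where `weights[i][j]` is nonzero when geography `j` is a neighbor of geography `i`. It is not the extension's code. The helper names `global_morans_i` and `line_adjacency` are hypothetical, and the toy data (six neighborhoods in a row) is made up for the example.

```python
# Illustrative sketch only; function names are hypothetical, not the extension's API.


def global_morans_i(values, weights):
    """Global Moran's I:
    I = (n / W) * sum_ij w_ij * d_i * d_j / sum_i d_i**2,
    where d_i = x_i - mean(x) and W is the sum of all weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variance_term = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_term)


def line_adjacency(n):
    """Hypothetical toy geography: n areas in a row, each touching the next."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    w = line_adjacency(6)
    # Low values on one side, high values on the other: a clustered pattern.
    print(global_morans_i([1, 1, 1, 9, 9, 9], w))  # 0.6
    # Alternating values: every neighbor differs, a dispersed pattern.
    print(global_morans_i([1, 9, 1, 9, 1, 9], w))  # about -1.0
```

A positive value indicates clustering, with similar values next to each other. A negative value indicates dispersion. With no spatial pattern, the expected value is -1/(n - 1).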

While Moran's I gives a global snapshot, there are also local indicators of
clustering, called Local Indicators of Spatial Autocorrelation (LISA).
Clustering is related to spatial autocorrelation, which compares the attribute
value of each geography with the values in its neighboring geographies.

In the cancer-rate example, a neighborhood with a high rate of cancer whose
neighbors also have high rates is designated "High High", or simply **HH**.
Likewise, neighborhoods with low rates whose neighbors also have low rates are
designated "Low Low", or **LL**. HH and LL naturally fit the concept of
clustering because they correspond to positive spatial autocorrelation.

"Anticorrelated" geographies fall in **LH** and **HL** regions. These are
regions where a geography has a high value and its neighbors, on average, have
a low value (or vice versa). Examples are a gated community and a city housing
project placed in a rich region. These deliberate developments have a median
income opposite to that of the neighborhoods around them: they have a high (or
low) value while their neighbors have a low (or high) value. They typically
exist as islands, and in rare circumstances can extend as chains dividing
**LL** or **HH** regions.
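
The quad labels can be derived from two signs: the sign of a geography's deviation from the mean, and the sign of its spatial lag (the weighted average of its neighbors' deviations). The minimal Python sketch below uses Anselin's local Moran statistic with row-standardized weights. It is an assumption that `cdb_moran_local` classifies quads the same way, and its scaling of the local values may differ. The sketch also omits the significance test. The function names and the toy data (a single high-income "island" in a row of low-income areas) are hypothetical.

```python
# Illustrative sketch only; function names are hypothetical, not the extension's API.


def row_standardize(weights):
    """Scale each row so that a geography's neighbor weights sum to 1."""
    result = []
    for row in weights:
        total = sum(row)
        result.append([w / total if total else 0.0 for w in row])
    return result


def local_morans_i(values, weights):
    """Return a list of (local_i, quad) pairs, one per geography.

    local_i = (d_i / m2) * sum_j w_ij * d_j, with d_i = x_i - mean and
    m2 = sum_i d_i**2 / n (Anselin's local Moran, row-standardized weights).
    """
    n = len(values)
    w = row_standardize(weights)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    m2 = sum(d * d for d in dev) / n
    results = []
    for i in range(n):
        lag = sum(w[i][j] * dev[j] for j in range(n))  # neighbors' average deviation
        local_i = dev[i] * lag / m2
        # First letter: the geography itself; second letter: its neighbors.
        # Values exactly at the mean are treated as "L" in this sketch.
        quad = ("H" if dev[i] > 0 else "L") + ("H" if lag > 0 else "L")
        results.append((local_i, quad))
    return results


if __name__ == "__main__":
    n = 7
    adjacency = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
    # A "gated community": one high-income area in a row of low-income areas.
    incomes = [2, 2, 2, 10, 2, 2, 2]
    for value, (local_i, quad) in zip(incomes, local_morans_i(incomes, adjacency)):
        print(value, quad, round(local_i, 3))
```

In this toy row, the island is labeled HL and its two immediate neighbors are labeled LH. Negative local values mark these anticorrelated geographies, while positive values mark the HH and LL clusters.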

Strong policies such as rent stabilization probably tend to prevent the
clustering of high-rent areas, because they integrate middle-class incomes.
Luxury apartment buildings, which are a kind of gated community, probably skew
an area's median income upwards, while housing projects have the opposite
effect.

#### What are the nuggets in the analysis?

Two functions are available to compute Moran's I statistics:

* `cdb_moran_local` computes local Moran's I values, quad classifications and
significance values from the numeric values associated with the geometries
in an input table. When the `queen` `w_type` is used, the geometries should
be contiguous polygons.
* `cdb_moran_local_rate` computes the same statistics using the ratio between
a numerator column and a denominator column of the table.

The parameters for `cdb_moran_local` are:

* `table` name of the table that contains the data values
* `attr` name of the column containing the numeric values to analyze
* `significance` significance threshold for the quad classification
* `num_ngbrs` number of neighbors to consider with `knn` weights (default: 5)
* `permutations` number of random permutations used to calculate
pseudo-p values (default: 99)
* `geom_column` name of the geometry column (default: "the_geom")
* `id_col` PK column of the table (default: "cartodb_id")
* `w_type` weight type: "knn" for k-nearest-neighbor weights
or "queen" for queen contiguity weights
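
The two `w_type` options correspond to two common ways of defining neighbors. The Python sketch below is illustrative only. It builds binary k-nearest-neighbor weights from point coordinates (for example, polygon centroids; whether the extension uses centroids is an assumption). It also builds queen contiguity weights for a regular grid of square cells, where two cells are neighbors if they share an edge or a corner. The helpers `knn_weights` and `grid_queen_weights` are hypothetical. Contiguity between real polygons would require a geometry library.

```python
import math

# Illustrative sketch only; function names are hypothetical, not the extension's API.


def knn_weights(points, k):
    """Binary k-nearest-neighbor weights from (x, y) points.

    Row i has a 1.0 for each of the k points closest to point i.
    The result is generally not symmetric.
    """
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(points):
        others = [j for j in range(n) if j != i]
        # Order the other points by Euclidean distance to point i.
        others.sort(key=lambda j: math.hypot(points[j][0] - xi, points[j][1] - yi))
        for j in others[:k]:
            weights[i][j] = 1.0
    return weights


def grid_queen_weights(rows, cols):
    """Queen contiguity for a rows x cols grid of square cells (row-major ids):
    cells are neighbors when they share an edge or a corner."""
    n = rows * cols
    weights = [[0.0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    # Skip the cell itself and anything outside the grid.
                    if (dr or dc) and 0 <= rr < rows and 0 <= cc < cols:
                        weights[r * cols + c][rr * cols + cc] = 1.0
    return weights


if __name__ == "__main__":
    w = knn_weights([(0, 0), (1, 0), (3, 0), (7, 0)], k=1)
    print([row.index(1.0) for row in w])  # [1, 0, 1, 2]
    q = grid_queen_weights(3, 3)
    print(sum(q[4]), sum(q[0]))  # 8.0 3.0
```

With `knn`, every geography gets exactly `k` neighbors, even in sparse areas. Queen contiguity follows shared boundaries, so it is only meaningful for polygons that actually touch.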

The function returns a table with the following columns:

* `moran` local Moran's I value
* `quads` quad classification ('HH', 'LL', 'HL', 'LH' or 'Not significant')
* `significance` significance value
* `ids` id of the corresponding record in the input table

Function `cdb_moran_local_rate` differs only in that the `attr` input
parameter is replaced by `numerator` and `denominator`.
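
The `permutations` parameter refers to a randomization test. The values of the other geographies are reshuffled many times, and the local statistic is recomputed for each shuffle. The pseudo p-value is then (M + 1) / (R + 1), where M is the number of shuffles whose statistic is at least as extreme as the observed one and R is the number of permutations. The Python sketch below shows this conditional permutation approach for a single geography under several assumptions:

* weights are row-standardized;
* "extreme" is measured in the direction of the observed value;
* a quad is reported as 'Not significant' when its p-value exceeds the threshold.

These details, and the names `local_pseudo_p` and `label_quad`, are hypothetical and may not match what `cdb_moran_local` does internally.

```python
import random

# Illustrative sketch only; function names are hypothetical, not the extension's API.


def local_pseudo_p(values, weights, i, permutations=99, seed=None):
    """Conditional permutation pseudo p-value for geography i.

    Assumes i has at least one neighbor. The value at i stays fixed; the
    other values are shuffled over the remaining positions and the local
    statistic is recomputed each time.
    """
    rng = random.Random(seed)
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    row_total = sum(weights[i])
    w = [wij / row_total for wij in weights[i]]  # row-standardized weights of i
    positions = [j for j in range(n) if j != i]

    def statistic(devs_at_positions):
        # dev_i * spatial lag. The 1/m2 factor of local Moran's I is identical
        # for every permutation, so dropping it does not change the comparison.
        lag = sum(w[j] * d for j, d in zip(positions, devs_at_positions))
        return dev[i] * lag

    others = [dev[j] for j in positions]
    observed = statistic(others)
    extreme = 0
    for _ in range(permutations):
        s = statistic(rng.sample(others, len(others)))
        if (observed >= 0 and s >= observed) or (observed < 0 and s <= observed):
            extreme += 1
    return (extreme + 1) / (permutations + 1)


def label_quad(quad, p_value, significance=0.05):
    """Keep the quad label only when the pseudo p-value passes the threshold."""
    return quad if p_value <= significance else "Not significant"


if __name__ == "__main__":
    n = 10
    adjacency = [[1.0 if abs(a - b) == 1 else 0.0 for b in range(n)] for a in range(n)]
    values = [1, 1, 1, 1, 1, 1, 1, 9, 9, 9]
    # Geography 8 is high and both of its neighbors are high.
    p = local_pseudo_p(values, adjacency, i=8, permutations=999, seed=0)
    print(p, label_quad("HH", p))
```

The smallest attainable pseudo p-value is 1 / (permutations + 1). With the default of 99 permutations, no geography can score below 0.01.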

pg/doc/03_overlap_sum.md

@@ -0,0 +1,29 @@
### Areal Weighting

Areal weighting is a simple interpolation technique that estimates a value
for a polygon from a set of reference polygons, each of which has one value
assigned to it. The estimate is the average of the values of the intersecting
reference polygons, weighted by the area of each intersection.
Its accuracy depends on how homogeneous each value is over its reference area.
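
The following minimal Python sketch makes the area-weighted average concrete. To stay self-contained, it uses axis-aligned rectangles instead of arbitrary polygons. A real implementation would compute polygon intersections in the database (for example, with PostGIS). The `Rect` type and the `areal_weighted_average` function are hypothetical and not part of the extension. The extension's SQL function may aggregate the weighted values differently.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Illustrative sketch only; names are hypothetical, not the extension's API.


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle standing in for a polygon."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersection_area(a: Rect, b: Rect) -> float:
    width = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    height = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return width * height if width > 0 and height > 0 else 0.0


def areal_weighted_average(
    target: Rect, sources: Sequence[Tuple[Rect, float]]
) -> Optional[float]:
    """Average of source values weighted by their overlap with `target`.

    Returns None when the target overlaps no source polygon.
    """
    weighted_sum = 0.0
    total_area = 0.0
    for polygon, value in sources:
        area = intersection_area(target, polygon)
        weighted_sum += area * value
        total_area += area
    return weighted_sum / total_area if total_area > 0 else None


if __name__ == "__main__":
    sources = [(Rect(0, 0, 1, 1), 10.0), (Rect(1, 0, 2, 1), 20.0)]
    # Half of the target lies over each source, so the estimate is 15.
    print(areal_weighted_average(Rect(0.5, 0, 1.5, 1), sources))  # 15.0
    # Two thirds of the target lie over the first source: (1 * 10 + 0.5 * 20) / 1.5
    print(areal_weighted_average(Rect(0, 0, 1.5, 1), sources))
```

If a source value is not spread evenly over its polygon, the estimate can be off. This is the homogeneity assumption that limits the technique's accuracy.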
The `cdb_overlap_function` takes three required parameters:
* `geometry`: a Polygon geometry that defines the area where a value will be
estimated.
* `table_name`: name of the values table that provides the source values;
this table must have a geometric column `the_geom` containing the polygons
to which values are assigned.
* `column_name`: name of the column that contains the values in the values
table (should be a numeric column)
There is also an optional parameter that defines the schema of the values
table. It is needed only if that schema is not in the `search_path`. Note that
`table_name` should never include the schema.

* `schema_name`: name of the schema that contains the values table
This function returns a numeric value resulting from the aggregation
of the polygons in