diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 9cfb951..f642d45 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -45,8 +45,8 @@ source envs/dev/bin/activate
 
 Update extension in a working database with:
 
-* `ALTER EXTENSION crankshaft VERSION TO 'current';`
-  `ALTER EXTENSION crankshaft VERSION TO 'dev';`
+* `ALTER EXTENSION crankshaft UPDATE TO 'current';`
+  `ALTER EXTENSION crankshaft UPDATE TO 'dev';`
 
 Note: we keep the current development version install as 'dev' always;
 we update through the 'current' alias to allow changing the extension
@@ -58,7 +58,10 @@ should be dropped manually before the update.
 If the extension has not previously been installed in a database,
 it can be installed directly with:
 
-* `CREATE EXTENSION crankshaft WITH VERSION 'dev';`
+* `CREATE EXTENSION IF NOT EXISTS plpythonu;`
+  `CREATE EXTENSION IF NOT EXISTS postgis;`
+  `CREATE EXTENSION IF NOT EXISTS cartodb;`
+  `CREATE EXTENSION crankshaft WITH VERSION 'dev';`
 
 Note: the development extension uses the development python virtual
 environment automatically.
diff --git a/doc/02_moran.md b/doc/02_moran.md
index 85384cb..c91eb3f 100644
--- a/doc/02_moran.md
+++ b/doc/02_moran.md
@@ -1,4 +1,102 @@
-### Moran's I
+## Name
+
+CDB_AreasOfInterest -- returns a table with a cluster/outlier classification, the significance of a classification, an autocorrelation statistic (Local Moran's I), and the geometry id for each geometry in the original dataset.
+
+## Synopsis
+
+```sql
+table(numeric moran_val, text quadrant, numeric significance, int ids, numeric column_values) CDB_AreasOfInterest(text query, text column_name)
+
+table(numeric moran_val, text quadrant, numeric significance, int ids, numeric column_values) CDB_AreasOfInterest(text query, text column_name, int permutations, text geom_column, text id_column, text weight_type, int num_ngbrs)
+```
+
+## Description
+
+CDB_AreasOfInterest is a table-returning function that classifies the geometries in a table by an attribute and gives a significance for that classification. This information can be used to find "Areas of Interest" by using the correlation of a geometry's attribute with that of its neighbors. Areas can be clusters, outliers, or neither (depending on the significance threshold used).
+
+Inputs:
+
+* `query` (required): an arbitrary query against tables you have access to (e.g., in your account, shared in your organization, or through the Data Observatory). The query must return the following columns: an id of type `INT` (e.g., `cartodb_id`), a geometry (e.g., `the_geom`), and the numeric attribute named in `column_name`.
+* `column_name` (required): the column to run the area of interest analysis on. The data must be numeric (e.g., `float`, `int`, etc.).
+* `permutations` (optional): the number of permutations used to calculate the significance of a classification. Defaults to 99, which is sufficient in most situations.
+* `geom_column` (optional): the name of the geometry column. Data must be of type `geometry`.
+* `id_column` (optional): the name of the id column (e.g., `cartodb_id`). Data must be of type `int` or `bigint` and must be unique.
+* `weight_type` (optional): the type of weight used for determining what defines a neighborhood. Options are `knn` or `queen`.
+* `num_ngbrs` (optional): the number of neighbors in a neighborhood around a geometry. Only used if `weight_type` is `knn`.
+
+Outputs:
+
+* `moran_val`: underlying correlation statistic used in analysis
+* `quadrant`: human-readable interpretation of classification
+* `significance`: significance of classification (closer to 0 is more significant)
+* `ids`: id of original geometry (used for joining against original table if desired -- see examples)
+* `column_values`: original column values from `column_name`
+
+Availability: crankshaft v0.0.1 and above
+
+## Examples
+
+```sql
+SELECT
+    t.the_geom_webmercator,
+    t.cartodb_id,
+    aoi.significance,
+    aoi.quadrant As aoi_quadrant
+FROM
+    observatory.acs2013 As t
+JOIN
+    crankshaft.CDB_AreasOfInterest('SELECT * FROM observatory.acs2013', 'gini_index') As aoi
+    ON t.cartodb_id = aoi.ids
+```
+
+## API Usage
+
+Example
+
+```text
+http://eschbacher.cartodb.com/api/v2/sql?q=SELECT * FROM crankshaft.CDB_AreasOfInterest('SELECT * FROM observatory.acs2013','gini_index')
+```
+
+Result
+```json
+{
+  time: 0.120,
+  total_rows: 100,
+  rows: [{
+    moran_val: 0.7213,
+    quadrant: 'High area',
+    significance: 0.03,
+    ids: 1,
+    column_values: 0.22
+  },
+  {
+    moran_val: -0.7213,
+    quadrant: 'Low outlier',
+    significance: 0.13,
+    ids: 2,
+    column_values: 0.03
+  },
+  ...
+  ]
+}
+```
+
+## See Also
+
+crankshaft's areas of interest functions:
+
+* [CDB_AreasOfInterest_Global]()
+* [CDB_AreasOfInterest_Rate_Local]()
+* [CDB_AreasOfInterest_Rate_Global]()
+
+
+PostGIS clustering functions:
+
+* [ST_ClusterIntersecting](http://postgis.net/docs/manual-2.2/ST_ClusterIntersecting.html)
+* [ST_ClusterWithin](http://postgis.net/docs/manual-2.2/ST_ClusterWithin.html)
+
+
+-- removing below, working into above
 
 #### What is Moran's I and why is it significant for CartoDB?
 
diff --git a/doc/docs_template.md b/doc/docs_template.md
new file mode 100644
index 0000000..9d5b550
--- /dev/null
+++ b/doc/docs_template.md
@@ -0,0 +1,24 @@
+
+## Name
+
+## Synopsis
+
+## Description
+
+Availability: v...
+
+## Examples
+
+```SQL
+-- example of the function in use
+SELECT cdb_awesome_function(the_geom, 'total_pop')
+FROM table_name
+```
+
+## API Usage
+
+_asdf_
+
+## See Also
+
+_Other function pages_
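
Usage note on the new `doc/02_moran.md` page: the Examples section only exercises the two-argument form. The sketch below shows how the seven-argument form from the Synopsis might be called and joined back to the source table. It assumes the signature and output columns exactly as documented in this patch, reuses the `observatory.acs2013` table and `gini_index` column from the page's own example, and picks `5` neighbors and a `0.05` significance cutoff purely for illustration. It has not been run against a crankshaft install.

```sql
-- Sketch only: argument order follows the Synopsis in doc/02_moran.md.
SELECT
    t.cartodb_id,
    t.the_geom_webmercator,
    aoi.quadrant,
    aoi.significance
FROM
    observatory.acs2013 AS t
JOIN
    crankshaft.CDB_AreasOfInterest(
        'SELECT * FROM observatory.acs2013',  -- query
        'gini_index',                         -- column_name
        99,                                   -- permutations (documented default)
        'the_geom',                           -- geom_column
        'cartodb_id',                         -- id_column
        'knn',                                -- weight_type
        5                                     -- num_ngbrs (illustrative value)
    ) AS aoi
    ON t.cartodb_id = aoi.ids                 -- ids is documented as the join key
WHERE
    aoi.significance <= 0.05;                 -- illustrative cutoff, not from the docs
```

Joining on `ids` follows the Outputs description, which says that column exists for joining against the original table.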