Merge branch 'develop' into update-moran-docs

Andy Eschbacher 2018-01-09 11:24:36 -05:00
commit d89e07328e
11 changed files with 146 additions and 64 deletions

@ -1,5 +1,14 @@
## Areas of Interest Functions ## Areas of Interest Functions
A family of analyses to uncover groupings of areas with consistently high or low values (clusters) and smaller areas with values unlike those around them (outliers). A cluster is labeled by an 'HH' (high value compared to the entire dataset in an area with other high values), or its opposite 'LL'. An outlier is labeled by an 'LH' (low value surrounded by high values) or an 'HL' (the opposite). Each cluster and outlier classification has an associated p-value, a measure of how significant the pattern of highs and lows is compared to a random distribution.
These functions have two forms: local and global. The local versions classify every input geometry, while the global function gives a rating of the overall clustering characteristics of the dataset. Both forms accept an optional denominator (see the rate versions), which is useful when, for example, working with count data that must be normalized.
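As a rough illustration of the four labels described above, the sketch below assigns a quadrant from two hypothetical inputs: `z`, a value's deviation from the dataset mean, and `lag_z`, the average deviation of its neighbors. The inline rows and both column names are assumptions made for this sketch only. The actual functions also compute a significance for each classification, which is not shown here.

```sql
-- Hypothetical rows: z is the deviation from the dataset mean,
-- lag_z is the average deviation of the neighboring areas.
SELECT
    t.id,
    CASE
        WHEN t.z >= 0 AND t.lag_z >= 0 THEN 'HH'  -- high value among high neighbors
        WHEN t.z < 0 AND t.lag_z < 0 THEN 'LL'    -- low value among low neighbors
        WHEN t.z >= 0 THEN 'HL'                   -- high value among low neighbors
        ELSE 'LH'                                 -- low value among high neighbors
    END AS quads
FROM (
    VALUES
        (1, 1.2, 0.8),
        (2, -0.9, -1.1),
        (3, 1.5, -0.7),
        (4, -1.3, 0.6)
) AS t(id, z, lag_z);
```

With these made-up rows, ids 1 through 4 are labeled 'HH', 'LL', 'HL', and 'LH' respectively.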
### Notes
* Rows with null values will be omitted from this analysis. To ensure they are added to the analysis, fill the null-valued cells with an appropriate value such as the mean of a column, the mean of the most recent two time steps, or use a `LEFT JOIN` to get null outputs from the analysis.
* The input query can only reference tables (datasets) in the user's database account. Common table expressions (CTEs) do not work as an input unless they are specified within the `subquery` argument.
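The two workarounds in the first note might look like the following sketch. It reuses the `commute_data` table and the `num_cyclists_per_total_population` column from the examples in this document. Filling with the column mean is only one possible choice, and passing the first query as the `subquery` text is an assumption rather than documented behavior.

```sql
-- Option 1: fill nulls with the column mean before running the analysis.
-- AVG ignores nulls, so the mean is taken over the non-null rows only.
SELECT
    cartodb_id,
    the_geom,
    COALESCE(
        num_cyclists_per_total_population,
        AVG(num_cyclists_per_total_population) OVER ()
    ) AS num_cyclists_per_total_population
FROM commute_data;

-- Option 2: keep every input row and accept null outputs for the rows
-- that the analysis omitted.
SELECT
    c.cartodb_id,
    aoi.quads,
    aoi.significance
FROM commute_data AS c
LEFT JOIN cdb_crankshaft.CDB_AreasOfInterestLocal(
    'SELECT * FROM commute_data',
    'num_cyclists_per_total_population') AS aoi
ON c.cartodb_id = aoi.rowid;
```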
### CDB_AreasOfInterestLocal(subquery text, column_name text) ### CDB_AreasOfInterestLocal(subquery text, column_name text)
This function classifies your data as being part of a cluster, as an outlier, or not part of a pattern based on the significance of a classification. The classification happens through an autocorrelation statistic called Local Moran's I. This function classifies your data as being part of a cluster, as an outlier, or not part of a pattern based on the significance of a classification. The classification happens through an autocorrelation statistic called Local Moran's I.
@ -29,6 +38,7 @@ A table with the following columns.
| vals | NUMERIC | Values from `'column_name'`. | | vals | NUMERIC | Values from `'column_name'`. |
#### Example Usage #### Example Usage
```sql ```sql
@ -37,8 +47,10 @@ SELECT
aoi.quads, aoi.quads,
aoi.significance, aoi.significance,
c.num_cyclists_per_total_population c.num_cyclists_per_total_population
FROM CDB_AreasOfInterestLocal('SELECT * FROM commute_data' FROM
'num_cyclists_per_total_population') As aoi cdb_crankshaft.CDB_AreasOfInterestLocal(
'SELECT * FROM commute_data',
'num_cyclists_per_total_population') As aoi
JOIN commute_data As c JOIN commute_data As c
ON c.cartodb_id = aoi.rowid; ON c.cartodb_id = aoi.rowid;
``` ```
@ -71,8 +83,12 @@ A table with the following columns.
#### Examples #### Examples
```sql ```sql
SELECT * SELECT
FROM CDB_AreasOfInterestGlobal('SELECT * FROM commute_data', 'num_cyclists_per_total_population') *
FROM
cdb_crankshaft.CDB_AreasOfInterestGlobal(
'SELECT * FROM commute_data',
'num_cyclists_per_total_population')
``` ```
### CDB_AreasOfInterestLocalRate(subquery text, numerator_column text, denominator_column text) ### CDB_AreasOfInterestLocalRate(subquery text, numerator_column text, denominator_column text)
@ -113,9 +129,11 @@ SELECT
aoi.quads, aoi.quads,
aoi.significance, aoi.significance,
c.cyclists_per_total_population c.cyclists_per_total_population
FROM CDB_AreasOfInterestLocalRate('SELECT * FROM commute_data' FROM
'num_cyclists', cdb_crankshaft.CDB_AreasOfInterestLocalRate(
'total_population') As aoi 'SELECT * FROM commute_data',
'num_cyclists',
'total_population') As aoi
JOIN commute_data As c JOIN commute_data As c
ON c.cartodb_id = aoi.rowid; ON c.cartodb_id = aoi.rowid;
``` ```
@ -149,10 +167,13 @@ A table with the following columns.
#### Examples #### Examples
```sql ```sql
SELECT * SELECT
FROM CDB_AreasOfInterestGlobalRate('SELECT * FROM commute_data', *
'num_cyclists', FROM
'total_population') cdb_crankshaft.CDB_AreasOfInterestGlobalRate(
'SELECT * FROM commute_data',
'num_cyclists',
'total_population')
``` ```
## Hotspot, Coldspot, and Outlier Functions ## Hotspot, Coldspot, and Outlier Functions

@ -8,7 +8,7 @@ This function takes time series data associated with geometries and outputs like
| Name | Type | Description | | Name | Type | Description |
|------|------|-------------| |------|------|-------------|
| subquery | TEXT | SQL query that exposes the data to be analyzed (e.g., `SELECT * FROM real_estate_history`). This query must have the geometry column name `the_geom` and id column name `cartodb_id` unless otherwise specified in the input arguments | | subquery | TEXT | SQL query that exposes the data to be analyzed (e.g., `SELECT * FROM real_estate_history`). This query must have the geometry column name `the_geom` and id column name `cartodb_id` unless otherwise specified in the input arguments. Tables in queries must exist in user's database (i.e., no CTEs at present) |
| column_names | TEXT Array | Names of the columns that form the history of measurements for the geometries (e.g., `Array['y2011', 'y2012', 'y2013', 'y2014', 'y2015', 'y2016']`). | | column_names | TEXT Array | Names of the columns that form the history of measurements for the geometries (e.g., `Array['y2011', 'y2012', 'y2013', 'y2014', 'y2015', 'y2016']`). |
| num_classes (optional) | INT | Number of quantile classes to separate data into. | | num_classes (optional) | INT | Number of quantile classes to separate data into. |
| weight type (optional) | TEXT | Type of weight to use when finding neighbors. Currently available options are 'knn' (default) and 'queen'. Read more about weight types in [PySAL's weights documentation](https://pysal.readthedocs.io/en/v1.11.0/users/tutorials/weights.html). | | weight type (optional) | TEXT | Type of weight to use when finding neighbors. Currently available options are 'knn' (default) and 'queen'. Read more about weight types in [PySAL's weights documentation](https://pysal.readthedocs.io/en/v1.11.0/users/tutorials/weights.html). |
@ -30,18 +30,29 @@ A table with the following columns.
| rowid | NUMERIC | id of the row that corresponds to the `id_col` (by default `cartodb_id` of the input rows) | | rowid | NUMERIC | id of the row that corresponds to the `id_col` (by default `cartodb_id` of the input rows) |
#### Notes
* Rows with null values will be omitted from this analysis. To ensure they are added to the analysis, fill the null-valued cells with an appropriate value such as the mean of a column, the mean of the most recent two time steps, etc.
* The input query can only reference tables (datasets) in the user's database account. Common table expressions (CTEs) do not work as an input unless they are specified in the `subquery` parameter.
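One way to apply the "mean of the most recent two time steps" suggestion is sketched below. It assumes the `real_estate_history` table and the `y2011` to `y2016` columns used as examples in the arguments table, and it fills only the last time step; other columns would need the same treatment.

```sql
-- Fill a missing 2016 value with the mean of the two preceding years.
-- If either preceding year is also null, the result stays null.
SELECT
    cartodb_id,
    the_geom,
    y2011, y2012, y2013, y2014, y2015,
    COALESCE(y2016, (y2014 + y2015) / 2.0) AS y2016
FROM real_estate_history;
```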
#### Example Usage #### Example Usage
```sql ```sql
SELECT SELECT
c.cartodb_id, c.cartodb_id,
c.the_geom, c.the_geom,
c.the_geom_webmercator,
m.trend, m.trend,
m.trend_up, m.trend_up,
m.trend_down, m.trend_down,
m.volatility m.volatility
FROM CDB_SpatialMarkovTrend('SELECT * FROM nyc_real_estate' FROM
Array['m03y2009','m03y2010','m03y2011','m03y2012','m03y2013','m03y2014','m03y2015','m03y2016']) As m cdb_crankshaft.CDB_SpatialMarkovTrend(
'SELECT * FROM nyc_real_estate',
Array['m03y2009', 'm03y2010', 'm03y2011',
'm03y2012', 'm03y2013', 'm03y2014',
'm03y2015','m03y2016']) As m
JOIN nyc_real_estate As c JOIN nyc_real_estate As c
ON c.cartodb_id = m.rowid; ON c.cartodb_id = m.rowid;
``` ```

@ -54,9 +54,9 @@ with t as (
SELECT SELECT
array_agg(cartodb_id::bigint) as id, array_agg(cartodb_id::bigint) as id,
array_agg(the_geom) as g, array_agg(the_geom) as g,
array_agg(coalesce(gla,0)::numeric) as w array_agg(coalesce(gla, 0)::numeric) as w
FROM FROM
abel.centros_comerciales_de_madrid centros_comerciales_de_madrid
WHERE not no_cc WHERE not no_cc
), ),
s as ( s as (
@ -67,12 +67,15 @@ SELECT
FROM FROM
sscc_madrid sscc_madrid
) )
select SELECT
g.the_geom, g.the_geom,
trunc(g.h,2) as h, trunc(g.h, 2) as h,
round(g.hpop) as hpop, round(g.hpop) as hpop,
trunc(g.dist/1000,2) as dist_km trunc(g.dist/1000, 2) as dist_km
FROM t, s, CDB_Gravity1(t.id, t.g, t.w, s.id, s.g, s.p, newmall_ID, 100000, 5000) g FROM
t,
s,
cdb_crankshaft.CDB_Gravity(t.id, t.g, t.w, s.id, s.g, s.p, newmall_ID, 100000, 5000) as g
``` ```

@ -44,11 +44,18 @@ Default values:
#### Example Usage #### Example Usage
```sql ```sql
with a as ( WITH a as (
select SELECT
array_agg(the_geom) as geomin, array_agg(the_geom) as geomin,
array_agg(temp::numeric) as colin array_agg(temp::numeric) as colin
from table_4804232032 FROM table_4804232032
) )
SELECT CDB_SpatialInterpolation(geomin, colin, CDB_latlng(41.38, 2.15),1) FROM a; SELECT
cdb_crankshaft.CDB_SpatialInterpolation(
geomin,
colin,
CDB_latlng(41.38, 2.15),
1)
FROM
a
``` ```

@ -27,12 +27,20 @@ PostGIS will include this in future versions ([doc for dev branch](http://postgis
```sql ```sql
WITH a AS ( WITH a AS (
SELECT SELECT
ARRAY[ST_GeomFromText('POINT(2.1744 41.403)', 4326),ST_GeomFromText('POINT(2.1228 41.380)', 4326),ST_GeomFromText('POINT(2.1511 41.374)', 4326),ST_GeomFromText('POINT(2.1528 41.413)', 4326),ST_GeomFromText('POINT(2.165 41.391)', 4326),ST_GeomFromText('POINT(2.1498 41.371)', 4326),ST_GeomFromText('POINT(2.1533 41.368)', 4326),ST_GeomFromText('POINT(2.131386 41.41399)', 4326)] AS geomin ARRAY[
ST_GeomFromText('POINT(2.1744 41.403)', 4326),
ST_GeomFromText('POINT(2.1228 41.380)', 4326),
ST_GeomFromText('POINT(2.1511 41.374)', 4326),
ST_GeomFromText('POINT(2.1528 41.413)', 4326),
ST_GeomFromText('POINT(2.165 41.391)', 4326),
ST_GeomFromText('POINT(2.1498 41.371)', 4326),
ST_GeomFromText('POINT(2.1533 41.368)', 4326),
ST_GeomFromText('POINT(2.131386 41.41399)', 4326)
] AS geomin
) )
SELECT SELECT
st_transform( ST_TRANSFORM(
(st_dump(CDB_voronoi(geomin, 0.2, 1e-9) (ST_Dump(cdb_crankshaft.CDB_Voronoi(geomin, 0.2, 1e-9))).geom,
)).geom 3857) as the_geom_webmercator
, 3857) as the_geom_webmercator
FROM a; FROM a;
``` ```

@ -1,6 +1,6 @@
## K-Means Functions ## K-Means Functions
### CDB_KMeans(subquery text, no_clusters INTEGER) ### CDB_KMeans(subquery text, no_clusters integer)
This function attempts to find n clusters within the input data. It will return a table of CartoDB ids and This function attempts to find n clusters within the input data. It will return a table of CartoDB ids and
the number of the cluster each point in the input was assigned to. the number of the cluster each point in the input was assigned to.
@ -29,8 +29,10 @@ A table with the following columns.
SELECT SELECT
customers.*, customers.*,
km.cluster_no km.cluster_no
FROM cdb_crankshaft.CDB_Kmeans('SELECT * from customers' , 6) km, customers FROM
WHERE customers.cartodb_id = km.cartodb_id cdb_crankshaft.CDB_Kmeans('SELECT * from customers' , 6) km, customers
WHERE
customers.cartodb_id = km.cartodb_id
``` ```
### CDB_WeightedMean(subquery text, weight_column text, category_column text) ### CDB_WeightedMean(subquery text, weight_column text, category_column text)
@ -57,6 +59,12 @@ A table with the following columns.
### Example Usage ### Example Usage
```sql ```sql
SELECT ST_TRANSFORM(the_geom, 3857) as the_geom_webmercator, class SELECT
FROM cdb_weighted_mean('SELECT *, customer_value FROM customers','customer_value','cluster_no') ST_Transform(m.the_geom, 3857) AS the_geom_webmercator,
m.class
FROM
cdb_crankshaft.cdb_WeightedMean(
'SELECT * FROM customers',
'customer_value',
'cluster_no') AS m
``` ```

@ -23,11 +23,17 @@ Function to find the [PIA](https://en.wikipedia.org/wiki/Pole_of_inaccessibility
#### Example Usage #### Example Usage
```sql ```sql
with a as( WITH a as (
select st_geomfromtext('POLYGON((-432540.453078056 4949775.20452642,-432329.947920966 4951361.232584,-431245.028163694 4952223.31516671,-429131.071033529 4951768.00415574,-424622.07505895 4952843.13503987,-423688.327170174 4953499.20752423,-424086.294349759 4954968.38274191,-423068.388925945 4954378.63345336,-423387.653225542 4953355.67417084,-420594.869840519 4953781.00230592,-416026.095299382 4951484.06849063,-412483.018546414 4951024.5410983,-410490.399661215 4954502.24032205,-408186.197521284 4956398.91417441,-407627.262358013 4959300.94633864,-406948.770061627 4959874.85407739,-404949.583326472 4959047.74518163,-402570.908447199 4953743.46829807,-400971.358683991 4952193.11680804,-403533.488084088 4949649.89857885,-406335.177028373 4950193.19571096,-407790.456731515 4952391.46015616,-412060.672398345 4950381.2389307,-410716.93482498 4949156.7509561,-408464.162289794 4943912.8940387,-409350.599394983 4942819.84896006,-408087.791091424 4942451.6711778,-407274.045613725 4940572.4807777,-404446.196589102 4939976.71501489,-402422.964843936 4940450.3670813,-401010.654464241 4939054.8061663,-397647.247369412 4940679.80737878,-395658.413346901 4940528.84765185,-395536.852462953 4938829.79565997,-394268.923462818 4938003.7277717,-393388.720249116 4934757.80596815,-392393.301362444 4934326.71675815,-392573.527618037 4932323.40974412,-393464.640141837 4931903.10653605,-393085.597275686 4931094.7353605,-398426.261165985 4929156.87541607,-398261.174361137 4926238.00816416,-394045.059966834 4925765.18668498,-392982.960705174 4926391.81893628,-393090.272694301 4927176.84692181,-391648.240010564 4924626.06386961,-391889.914625075 4923086.14787613,-394345.177314013 4923235.086036,-395550.878718795 4917812.79243978,-399009.463978251 4912927.7157945,-398948.794855767 4911941.91010796,-398092.636652078 4911806.57392519,-401991.601817112 4911722.9204501,-406225.972607907 4914505.47286319,-411104.994569885 4912569.26941163,-412925.513522316 4913030.3608866,-414630.148884835 
4914436.69169949,-414207.691417276 4919205.78028405,-418306.141109809 4917994.9580478,-424184.700779621 4918938.12432889,-426816.961458921 4923664.37379373,-420956.324227126 4923381.98014807,-420186.661267781 4924286.48693378,-420943.411166194 4926812.76394433,-419779.45457046 4928527.43466337,-419768.767899344 4930681.94459216,-421911.668097113 4930432.40620397,-423482.386112205 4933451.28047252,-427272.814773717 4934151.56473242,-427144.908678797 4939731.77191996,-428982.125554848 4940522.84445172,-428986.133056516 4942437.17281266,-431237.792396792 4947309.68284815,-432476.889648814 4947791.74800037,-432540.453078056 4949775.20452642))', 3857) as g SELECT
ST_GeomFromText(
'POLYGON((-432540.453078056 4949775.20452642,-432329.947920966 4951361.232584,-431245.028163694 4952223.31516671,-429131.071033529 4951768.00415574,-424622.07505895 4952843.13503987,-423688.327170174 4953499.20752423,-424086.294349759 4954968.38274191,-423068.388925945 4954378.63345336,-423387.653225542 4953355.67417084,-420594.869840519 4953781.00230592,-416026.095299382 4951484.06849063,-412483.018546414 4951024.5410983,-410490.399661215 4954502.24032205,-408186.197521284 4956398.91417441,-407627.262358013 4959300.94633864,-406948.770061627 4959874.85407739,-404949.583326472 4959047.74518163,-402570.908447199 4953743.46829807,-400971.358683991 4952193.11680804,-403533.488084088 4949649.89857885,-406335.177028373 4950193.19571096,-407790.456731515 4952391.46015616,-412060.672398345 4950381.2389307,-410716.93482498 4949156.7509561,-408464.162289794 4943912.8940387,-409350.599394983 4942819.84896006,-408087.791091424 4942451.6711778,-407274.045613725 4940572.4807777,-404446.196589102 4939976.71501489,-402422.964843936 4940450.3670813,-401010.654464241 4939054.8061663,-397647.247369412 4940679.80737878,-395658.413346901 4940528.84765185,-395536.852462953 4938829.79565997,-394268.923462818 4938003.7277717,-393388.720249116 4934757.80596815,-392393.301362444 4934326.71675815,-392573.527618037 4932323.40974412,-393464.640141837 4931903.10653605,-393085.597275686 4931094.7353605,-398426.261165985 4929156.87541607,-398261.174361137 4926238.00816416,-394045.059966834 4925765.18668498,-392982.960705174 4926391.81893628,-393090.272694301 4927176.84692181,-391648.240010564 4924626.06386961,-391889.914625075 4923086.14787613,-394345.177314013 4923235.086036,-395550.878718795 4917812.79243978,-399009.463978251 4912927.7157945,-398948.794855767 4911941.91010796,-398092.636652078 4911806.57392519,-401991.601817112 4911722.9204501,-406225.972607907 4914505.47286319,-411104.994569885 4912569.26941163,-412925.513522316 4913030.3608866,-414630.148884835 
4914436.69169949,-414207.691417276 4919205.78028405,-418306.141109809 4917994.9580478,-424184.700779621 4918938.12432889,-426816.961458921 4923664.37379373,-420956.324227126 4923381.98014807,-420186.661267781 4924286.48693378,-420943.411166194 4926812.76394433,-419779.45457046 4928527.43466337,-419768.767899344 4930681.94459216,-421911.668097113 4930432.40620397,-423482.386112205 4933451.28047252,-427272.814773717 4934151.56473242,-427144.908678797 4939731.77191996,-428982.125554848 4940522.84445172,-428986.133056516 4942437.17281266,-431237.792396792 4947309.68284815,-432476.889648814 4947791.74800037,-432540.453078056 4949775.20452642))',
3857) as g
), ),
b as ( b as (
select ST_Transform(g, 4326) as g from a SELECT ST_Transform(g, 4326) as g
FROM a
) )
SELECT st_astext(CDB_PIA(g)) from b; SELECT
ST_AsText(cdb_crankshaft.CDB_PIA(g))
FROM b
``` ```

@ -24,12 +24,22 @@ Returns a table object
#### Example Usage #### Example Usage
```sql ```sql
with data as ( WITH data as (
select SELECT
ARRAY[7.0,8.0,1.0,2.0,3.0,5.0,6.0,4.0] as colin, ARRAY[7.0,8.0,1.0,2.0,3.0,5.0,6.0,4.0] as colin,
ARRAY[ST_GeomFromText('POINT(2.1744 41.4036)'),ST_GeomFromText('POINT(2.1228 41.3809)'),ST_GeomFromText('POINT(2.1511 41.3742)'),ST_GeomFromText('POINT(2.1528 41.4136)'),ST_GeomFromText('POINT(2.165 41.3917)'),ST_GeomFromText('POINT(2.1498 41.3713)'),ST_GeomFromText('POINT(2.1533 41.3683)'),ST_GeomFromText('POINT(2.131386 41.413998)')] as geomin ARRAY[
ST_GeomFromText('POINT(2.1744 41.4036)'),
ST_GeomFromText('POINT(2.1228 41.3809)'),
ST_GeomFromText('POINT(2.1511 41.3742)'),
ST_GeomFromText('POINT(2.1528 41.4136)'),
ST_GeomFromText('POINT(2.165 41.3917)'),
ST_GeomFromText('POINT(2.1498 41.3713)'),
ST_GeomFromText('POINT(2.1533 41.3683)'),
ST_GeomFromText('POINT(2.131386 41.413998)')
] as geomin
) )
select CDB_Densify(geomin, colin, 2) from data; SELECT cdb_crankshaft.CDB_Densify(geomin, colin, 2)
FROM data
``` ```

@ -26,11 +26,19 @@ Returns a table object
#### Example Usage #### Example Usage
```sql ```sql
with data as ( WITH data as (
select SELECT
ARRAY[7.0,8.0,1.0,2.0,3.0,5.0,6.0,4.0] as colin, ARRAY[7.0,8.0,1.0,2.0,3.0,5.0,6.0,4.0] as colin,
ARRAY[ST_GeomFromText('POINT(2.1744 41.4036)'),ST_GeomFromText('POINT(2.1228 41.3809)'),ST_GeomFromText('POINT(2.1511 41.3742)'),ST_GeomFromText('POINT(2.1528 41.4136)'),ST_GeomFromText('POINT(2.165 41.3917)'),ST_GeomFromText('POINT(2.1498 41.3713)'),ST_GeomFromText('POINT(2.1533 41.3683)'),ST_GeomFromText('POINT(2.131386 41.413998)')] as geomin ARRAY[ST_GeomFromText('POINT(2.1744 41.4036)'),
ST_GeomFromText('POINT(2.1228 41.3809)'),
ST_GeomFromText('POINT(2.1511 41.3742)'),
ST_GeomFromText('POINT(2.1528 41.4136)'),
ST_GeomFromText('POINT(2.165 41.3917)'),
ST_GeomFromText('POINT(2.1498 41.3713)'),
ST_GeomFromText('POINT(2.1533 41.3683)'),
ST_GeomFromText('POINT(2.131386 41.413998)')] as geomin
) )
select CDB_TINmap(geomin, colin, 2) from data; SELECT cdb_crankshaft.CDB_TINmap(geomin, colin, 2)
FROM data
``` ```

@ -43,7 +43,7 @@ With a table `website_visits` and a column of the number of website visits in un
```sql ```sql
SELECT SELECT
id, id,
CDB_StaticOutlier(visits_10k, 11.0) As outlier, cdb_crankshaft.CDB_StaticOutlier(visits_10k, 11.0) As outlier,
visits_10k visits_10k
FROM website_visits FROM website_visits
``` ```
@ -93,7 +93,7 @@ WITH cte As (
unnest(Array[1,3,5,1,32,3,57,2]) As visits_10k unnest(Array[1,3,5,1,32,3,57,2]) As visits_10k
) )
SELECT SELECT
(CDB_PercentOutlier(array_agg(visits_10k), 2.0, array_agg(id))).* (cdb_crankshaft.CDB_PercentOutlier(array_agg(visits_10k), 2.0, array_agg(id))).*
FROM cte; FROM cte;
``` ```
@ -144,7 +144,7 @@ WITH cte As (
unnest(Array[1,3,5,1,32,3,57,2]) As visits_10k unnest(Array[1,3,5,1,32,3,57,2]) As visits_10k
) )
SELECT SELECT
(CDB_StdDevOutlier(array_agg(visits_10k), 2.0, array_agg(id))).* (cdb_crankshaft.CDB_StdDevOutlier(array_agg(visits_10k), 2.0, array_agg(id))).*
FROM cte; FROM cte;
``` ```