Merge branch 'develop' into update-moran-docs

This commit is contained in:
Andy Eschbacher 2018-01-09 11:24:36 -05:00
commit d89e07328e
11 changed files with 146 additions and 64 deletions

View File

@ -1,5 +1,14 @@
## Areas of Interest Functions
A family of analyses to uncover groupings of areas with consistently high or low values (clusters) and smaller areas with values unlike those around them (outliers). A cluster is labeled by an 'HH' (high value compared to the entire dataset in an area with other high values), or its opposite 'LL'. An outlier is labeled by an 'LH' (low value surrounded by high values) or an 'HL' (the opposite). Each cluster and outlier classification has an associated p-value, a measure of how significant the pattern of highs and lows is compared to a random distribution.
These functions have two forms: local and global. The local versions classify every input geometry while the global function gives a rating of the overall clustering characteristics of the dataset. Both forms accept an optional denomiator (see the rate versions) if, for example, working with count data and a denominator is needed.
### Notes
* Rows with null values will be omitted from this analysis. To ensure they are added to the analysis, fill the null-valued cells with an appropriate value such as the mean of a column, the mean of the most recent two time steps, or use a `LEFT JOIN` to get null outputs from the analysis.
* Input query can only accept tables (datasets) in the users database account. Common table expressions (CTEs) do not work as an input unless specified within the `subquery` argument.
### CDB_AreasOfInterestLocal(subquery text, column_name text)
This function classifies your data as being part of a cluster, as an outlier, or not part of a pattern based the significance of a classification. The classification happens through an autocorrelation statistic called Local Moran's I.
@ -29,6 +38,7 @@ A table with the following columns.
| vals | NUMERIC | Values from `'column_name'`. |
#### Example Usage
```sql
@ -37,8 +47,10 @@ SELECT
aoi.quads,
aoi.significance,
c.num_cyclists_per_total_population
FROM CDB_AreasOfInterestLocal('SELECT * FROM commute_data'
'num_cyclists_per_total_population') As aoi
FROM
cdb_crankshaft.CDB_AreasOfInterestLocal(
'SELECT * FROM commute_data'
'num_cyclists_per_total_population') As aoi
JOIN commute_data As c
ON c.cartodb_id = aoi.rowid;
```
@ -71,8 +83,12 @@ A table with the following columns.
#### Examples
```sql
SELECT *
FROM CDB_AreasOfInterestGlobal('SELECT * FROM commute_data', 'num_cyclists_per_total_population')
SELECT
*
FROM
cdb_crankshaft.CDB_AreasOfInterestGlobal(
'SELECT * FROM commute_data',
'num_cyclists_per_total_population')
```
### CDB_AreasOfInterestLocalRate(subquery text, numerator_column text, denominator_column text)
@ -113,9 +129,11 @@ SELECT
aoi.quads,
aoi.significance,
c.cyclists_per_total_population
FROM CDB_AreasOfInterestLocalRate('SELECT * FROM commute_data'
'num_cyclists',
'total_population') As aoi
FROM
cdb_crankshaft.CDB_AreasOfInterestLocalRate(
'SELECT * FROM commute_data'
'num_cyclists',
'total_population') As aoi
JOIN commute_data As c
ON c.cartodb_id = aoi.rowid;
```
@ -149,10 +167,13 @@ A table with the following columns.
#### Examples
```sql
SELECT *
FROM CDB_AreasOfInterestGlobalRate('SELECT * FROM commute_data',
'num_cyclists',
'total_population')
SELECT
*
FROM
cdb_crankshaft.CDB_AreasOfInterestGlobalRate(
'SELECT * FROM commute_data',
'num_cyclists',
'total_population')
```
## Hotspot, Coldspot, and Outlier Functions

View File

@ -8,7 +8,7 @@ This function takes time series data associated with geometries and outputs like
| Name | Type | Description |
|------|------|-------------|
| subquery | TEXT | SQL query that exposes the data to be analyzed (e.g., `SELECT * FROM real_estate_history`). This query must have the geometry column name `the_geom` and id column name `cartodb_id` unless otherwise specified in the input arguments |
| subquery | TEXT | SQL query that exposes the data to be analyzed (e.g., `SELECT * FROM real_estate_history`). This query must have the geometry column name `the_geom` and id column name `cartodb_id` unless otherwise specified in the input arguments. Tables in queries must exist in user's database (i.e., no CTEs at present) |
| column_names | TEXT Array | Names of column that form the history of measurements for the geometries (e.g., `Array['y2011', 'y2012', 'y2013', 'y2014', 'y2015', 'y2016']`). |
| num_classes (optional) | INT | Number of quantile classes to separate data into. |
| weight type (optional) | TEXT | Type of weight to use when finding neighbors. Currently available options are 'knn' (default) and 'queen'. Read more about weight types in [PySAL's weights documentation](https://pysal.readthedocs.io/en/v1.11.0/users/tutorials/weights.html). |
@ -30,18 +30,29 @@ A table with the following columns.
| rowid | NUMERIC | id of the row that corresponds to the `id_col` (by default `cartodb_id` of the input rows) |
#### Notes
* Rows will null values will be omitted from this analysis. To ensure they are added to the analysis, fill the null-valued cells with an appropriate value such as the mean of a column, the mean of the most recent two time steps, etc.
* Input query can only accept tables (datasets) in the users database account. Common table expressions (CTEs) do not work as an input unless specified in the `subquery` parameter.
#### Example Usage
```sql
SELECT
c.cartodb_id,
c.the_geom,
c.the_geom_webmercator,
m.trend,
m.trend_up,
m.trend_down,
m.volatility
FROM CDB_SpatialMarkovTrend('SELECT * FROM nyc_real_estate'
Array['m03y2009','m03y2010','m03y2011','m03y2012','m03y2013','m03y2014','m03y2015','m03y2016']) As m
FROM
cdb_crankshaft.CDB_SpatialMarkovTrend(
'SELECT * FROM nyc_real_estate'
Array['m03y2009', 'm03y2010', 'm03y2011',
'm03y2012', 'm03y2013', 'm03y2014',
'm03y2015','m03y2016']) As m
JOIN nyc_real_estate As c
ON c.cartodb_id = m.rowid;
```

View File

@ -54,9 +54,9 @@ with t as (
SELECT
array_agg(cartodb_id::bigint) as id,
array_agg(the_geom) as g,
array_agg(coalesce(gla,0)::numeric) as w
array_agg(coalesce(gla, 0)::numeric) as w
FROM
abel.centros_comerciales_de_madrid
centros_comerciales_de_madrid
WHERE not no_cc
),
s as (
@ -67,12 +67,15 @@ SELECT
FROM
sscc_madrid
)
select
SELECT
g.the_geom,
trunc(g.h,2) as h,
trunc(g.h, 2) as h,
round(g.hpop) as hpop,
trunc(g.dist/1000,2) as dist_km
FROM t, s, CDB_Gravity1(t.id, t.g, t.w, s.id, s.g, s.p, newmall_ID, 100000, 5000) g
trunc(g.dist/1000, 2) as dist_km
FROM
t,
s,
cdb_crankshaft.CDB_Gravity(t.id, t.g, t.w, s.id, s.g, s.p, newmall_ID, 100000, 5000) as g
```

View File

@ -44,11 +44,18 @@ Default values:
#### Example Usage
```sql
with a as (
select
WITH a as (
SELECT
array_agg(the_geom) as geomin,
array_agg(temp::numeric) as colin
from table_4804232032
FROM table_4804232032
)
SELECT CDB_SpatialInterpolation(geomin, colin, CDB_latlng(41.38, 2.15),1) FROM a;
SELECT
cdb_crankshaft.CDB_SpatialInterpolation(
geomin,
colin,
CDB_latlng(41.38, 2.15),
1)
FROM
a
```

View File

@ -27,12 +27,20 @@ PostGIS wil include this in future versions ([doc for dev branch](http://postgis
```sql
WITH a AS (
SELECT
ARRAY[ST_GeomFromText('POINT(2.1744 41.403)', 4326),ST_GeomFromText('POINT(2.1228 41.380)', 4326),ST_GeomFromText('POINT(2.1511 41.374)', 4326),ST_GeomFromText('POINT(2.1528 41.413)', 4326),ST_GeomFromText('POINT(2.165 41.391)', 4326),ST_GeomFromText('POINT(2.1498 41.371)', 4326),ST_GeomFromText('POINT(2.1533 41.368)', 4326),ST_GeomFromText('POINT(2.131386 41.41399)', 4326)] AS geomin
ARRAY[
ST_GeomFromText('POINT(2.1744 41.403)', 4326),
ST_GeomFromText('POINT(2.1228 41.380)', 4326),
ST_GeomFromText('POINT(2.1511 41.374)', 4326),
ST_GeomFromText('POINT(2.1528 41.413)', 4326),
ST_GeomFromText('POINT(2.165 41.391)', 4326),
ST_GeomFromText('POINT(2.1498 41.371)', 4326),
ST_GeomFromText('POINT(2.1533 41.368)', 4326),
ST_GeomFromText('POINT(2.131386 41.41399)', 4326)
] AS geomin
)
SELECT
st_transform(
(st_dump(CDB_voronoi(geomin, 0.2, 1e-9)
)).geom
, 3857) as the_geom_webmercator
ST_TRANSFORM(
(ST_Dump(cdb_crankshaft.CDB_Voronoi(geomin, 0.2, 1e-9))).geom,
3857) as the_geom_webmercator
FROM a;
```

View File

@ -1,8 +1,8 @@
## K-Means Functions
### CDB_KMeans(subquery text, no_clusters INTEGER)
### CDB_KMeans(subquery text, no_clusters integer)
This function attempts to find n clusters within the input data. It will return a table to CartoDB ids and
This function attempts to find n clusters within the input data. It will return a table to CartoDB ids and
the number of the cluster each point in the input was assigend to.
@ -26,18 +26,20 @@ A table with the following columns.
#### Example Usage
```sql
SELECT
customers.*,
km.cluster_no
FROM cdb_crankshaft.CDB_Kmeans('SELECT * from customers' , 6) km, customers_3
WHERE customers.cartodb_id = km.cartodb_id
SELECT
customers.*,
km.cluster_no
FROM
cdb_crankshaft.CDB_Kmeans('SELECT * from customers' , 6) km, customers_3
WHERE
customers.cartodb_id = km.cartodb_id
```
### CDB_WeightedMean(subquery text, weight_column text, category_column text)
Function that computes the weighted centroid of a number of clusters by some weight column.
### Arguments
### Arguments
| Name | Type | Description |
|------|------|-------------|
@ -45,18 +47,24 @@ Function that computes the weighted centroid of a number of clusters by some wei
| weight\_column | TEXT | The name of the column to use as a weight |
| category\_column | TEXT | The name of the column to use as a category |
### Returns
### Returns
A table with the following columns.
| Column Name | Type | Description |
|-------------|------|-------------|
| the\_geom | GEOMETRY | A point for the weighted cluster center |
| class | INTEGER | The cluster class |
| class | INTEGER | The cluster class |
### Example Usage
### Example Usage
```sql
SELECT ST_TRANSFORM(the_geom, 3857) as the_geom_webmercator, class
FROM cdb_weighted_mean('SELECT *, customer_value FROM customers','customer_value','cluster_no')
```sql
SELECT
ST_Transform(m.the_geom, 3857) AS the_geom_webmercator,
m.class
FROM
cdb_crankshaft.cdb_WeightedMean(
'SELECT * FROM customers',
'customer_value',
'cluster_no') AS m
```

View File

@ -3,7 +3,7 @@
### CDB_CreateAndPredictSegment(query TEXT, variable_name TEXT, target_query TEXT)
This function trains a [Gradient Boosting](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html) model to attempt to predict the target data and then generates predictions for new data.
This function trains a [Gradient Boosting](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html) model to attempt to predict the target data and then generates predictions for new data.
#### Arguments
@ -34,12 +34,12 @@ A table with the following columns.
SELECT * from cdb_crankshaft.CDB_CreateAndPredictSegment(
'SELECT agg, median_rent::numeric, male_pop::numeric, female_pop::numeric FROM late_night_agg',
'agg',
'SELECT row_number() OVER () As cartodb_id, median_rent, male_pop, female_pop FROM ml_learning_ny');
'SELECT row_number() OVER () As cartodb_id, median_rent, male_pop, female_pop FROM ml_learning_ny');
```
### CDB_CreateAndPredictSegment(target numeric[], train_features numeric[], prediction_features numeric[], prediction_ids numeric[])
This function trains a [Gradient Boosting](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html) model to attempt to predict the target data and then generates predictions for new data.
This function trains a [Gradient Boosting](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html) model to attempt to predict the target data and then generates predictions for new data.
#### Arguments
@ -76,7 +76,7 @@ WITH training As (
FROM late_night_agg),
target AS (
SELECT cdb_crankshaft.CDB_PyAgg(Array[median_rent, male_pop, female_pop]::Numeric[]) As features,
array_agg(cartodb_id) As cartodb_ids FROM late_night_agg)
array_agg(cartodb_id) As cartodb_ids FROM late_night_agg)
SELECT cdb_crankshaft.CDB_CreateAndPredictSegment(training.target, training.features, target.features, target.cartodb_ids)
FROM training, target;

View File

@ -23,11 +23,17 @@ Function to find the [PIA](https://en.wikipedia.org/wiki/Pole_of_inaccessibility
#### Example Usage
```sql
with a as(
select st_geomfromtext('POLYGON((-432540.453078056 4949775.20452642,-432329.947920966 4951361.232584,-431245.028163694 4952223.31516671,-429131.071033529 4951768.00415574,-424622.07505895 4952843.13503987,-423688.327170174 4953499.20752423,-424086.294349759 4954968.38274191,-423068.388925945 4954378.63345336,-423387.653225542 4953355.67417084,-420594.869840519 4953781.00230592,-416026.095299382 4951484.06849063,-412483.018546414 4951024.5410983,-410490.399661215 4954502.24032205,-408186.197521284 4956398.91417441,-407627.262358013 4959300.94633864,-406948.770061627 4959874.85407739,-404949.583326472 4959047.74518163,-402570.908447199 4953743.46829807,-400971.358683991 4952193.11680804,-403533.488084088 4949649.89857885,-406335.177028373 4950193.19571096,-407790.456731515 4952391.46015616,-412060.672398345 4950381.2389307,-410716.93482498 4949156.7509561,-408464.162289794 4943912.8940387,-409350.599394983 4942819.84896006,-408087.791091424 4942451.6711778,-407274.045613725 4940572.4807777,-404446.196589102 4939976.71501489,-402422.964843936 4940450.3670813,-401010.654464241 4939054.8061663,-397647.247369412 4940679.80737878,-395658.413346901 4940528.84765185,-395536.852462953 4938829.79565997,-394268.923462818 4938003.7277717,-393388.720249116 4934757.80596815,-392393.301362444 4934326.71675815,-392573.527618037 4932323.40974412,-393464.640141837 4931903.10653605,-393085.597275686 4931094.7353605,-398426.261165985 4929156.87541607,-398261.174361137 4926238.00816416,-394045.059966834 4925765.18668498,-392982.960705174 4926391.81893628,-393090.272694301 4927176.84692181,-391648.240010564 4924626.06386961,-391889.914625075 4923086.14787613,-394345.177314013 4923235.086036,-395550.878718795 4917812.79243978,-399009.463978251 4912927.7157945,-398948.794855767 4911941.91010796,-398092.636652078 4911806.57392519,-401991.601817112 4911722.9204501,-406225.972607907 4914505.47286319,-411104.994569885 4912569.26941163,-412925.513522316 4913030.3608866,-414630.148884835 4914436.69169949,-414207.691417276 4919205.78028405,-418306.141109809 4917994.9580478,-424184.700779621 4918938.12432889,-426816.961458921 4923664.37379373,-420956.324227126 4923381.98014807,-420186.661267781 4924286.48693378,-420943.411166194 4926812.76394433,-419779.45457046 4928527.43466337,-419768.767899344 4930681.94459216,-421911.668097113 4930432.40620397,-423482.386112205 4933451.28047252,-427272.814773717 4934151.56473242,-427144.908678797 4939731.77191996,-428982.125554848 4940522.84445172,-428986.133056516 4942437.17281266,-431237.792396792 4947309.68284815,-432476.889648814 4947791.74800037,-432540.453078056 4949775.20452642))', 3857) as g
WITH a as (
SELECT
ST_GeomFromText(
'POLYGON((-432540.453078056 4949775.20452642,-432329.947920966 4951361.232584,-431245.028163694 4952223.31516671,-429131.071033529 4951768.00415574,-424622.07505895 4952843.13503987,-423688.327170174 4953499.20752423,-424086.294349759 4954968.38274191,-423068.388925945 4954378.63345336,-423387.653225542 4953355.67417084,-420594.869840519 4953781.00230592,-416026.095299382 4951484.06849063,-412483.018546414 4951024.5410983,-410490.399661215 4954502.24032205,-408186.197521284 4956398.91417441,-407627.262358013 4959300.94633864,-406948.770061627 4959874.85407739,-404949.583326472 4959047.74518163,-402570.908447199 4953743.46829807,-400971.358683991 4952193.11680804,-403533.488084088 4949649.89857885,-406335.177028373 4950193.19571096,-407790.456731515 4952391.46015616,-412060.672398345 4950381.2389307,-410716.93482498 4949156.7509561,-408464.162289794 4943912.8940387,-409350.599394983 4942819.84896006,-408087.791091424 4942451.6711778,-407274.045613725 4940572.4807777,-404446.196589102 4939976.71501489,-402422.964843936 4940450.3670813,-401010.654464241 4939054.8061663,-397647.247369412 4940679.80737878,-395658.413346901 4940528.84765185,-395536.852462953 4938829.79565997,-394268.923462818 4938003.7277717,-393388.720249116 4934757.80596815,-392393.301362444 4934326.71675815,-392573.527618037 4932323.40974412,-393464.640141837 4931903.10653605,-393085.597275686 4931094.7353605,-398426.261165985 4929156.87541607,-398261.174361137 4926238.00816416,-394045.059966834 4925765.18668498,-392982.960705174 4926391.81893628,-393090.272694301 4927176.84692181,-391648.240010564 4924626.06386961,-391889.914625075 4923086.14787613,-394345.177314013 4923235.086036,-395550.878718795 4917812.79243978,-399009.463978251 4912927.7157945,-398948.794855767 4911941.91010796,-398092.636652078 4911806.57392519,-401991.601817112 4911722.9204501,-406225.972607907 4914505.47286319,-411104.994569885 4912569.26941163,-412925.513522316 4913030.3608866,-414630.148884835 4914436.69169949,-414207.691417276 4919205.78028405,-418306.141109809 4917994.9580478,-424184.700779621 4918938.12432889,-426816.961458921 4923664.37379373,-420956.324227126 4923381.98014807,-420186.661267781 4924286.48693378,-420943.411166194 4926812.76394433,-419779.45457046 4928527.43466337,-419768.767899344 4930681.94459216,-421911.668097113 4930432.40620397,-423482.386112205 4933451.28047252,-427272.814773717 4934151.56473242,-427144.908678797 4939731.77191996,-428982.125554848 4940522.84445172,-428986.133056516 4942437.17281266,-431237.792396792 4947309.68284815,-432476.889648814 4947791.74800037,-432540.453078056 4949775.20452642))',
3857) as g
),
b as (
select ST_Transform(g, 4326) as g from a
SELECT ST_Transform(g, 4326) as g
FROM a
)
SELECT st_astext(CDB_PIA(g)) from b;
SELECT
ST_AsText(cdb_crankshaft.CDB_PIA(g))
FROM b
```

View File

@ -24,12 +24,22 @@ Returns a table object
#### Example Usage
```sql
with data as (
select
ARRAY[7.0,8.0,1.0,2.0,3.0,5.0,6.0,4.0] as colin,
ARRAY[ST_GeomFromText('POINT(2.1744 41.4036)'),ST_GeomFromText('POINT(2.1228 41.3809)'),ST_GeomFromText('POINT(2.1511 41.3742)'),ST_GeomFromText('POINT(2.1528 41.4136)'),ST_GeomFromText('POINT(2.165 41.3917)'),ST_GeomFromText('POINT(2.1498 41.3713)'),ST_GeomFromText('POINT(2.1533 41.3683)'),ST_GeomFromText('POINT(2.131386 41.413998)')] as geomin
WITH data as (
SELECT
ARRAY[7.0,8.0,1.0,2.0,3.0,5.0,6.0,4.0] as colin,
ARRAY[
ST_GeomFromText('POINT(2.1744 41.4036)'),
ST_GeomFromText('POINT(2.1228 41.3809)'),
ST_GeomFromText('POINT(2.1511 41.3742)'),
ST_GeomFromText('POINT(2.1528 41.4136)'),
ST_GeomFromText('POINT(2.165 41.3917)'),
ST_GeomFromText('POINT(2.1498 41.3713)'),
ST_GeomFromText('POINT(2.1533 41.3683)'),
ST_GeomFromText('POINT(2.131386 41.413998)')
] as geomin
)
select CDB_Densify(geomin, colin, 2) from data;
SELECT cdb_crankshaft.CDB_Densify(geomin, colin, 2)
FROM data
```

View File

@ -26,11 +26,19 @@ Returns a table object
#### Example Usage
```sql
with data as (
select
ARRAY[7.0,8.0,1.0,2.0,3.0,5.0,6.0,4.0] as colin,
ARRAY[ST_GeomFromText('POINT(2.1744 41.4036)'),ST_GeomFromText('POINT(2.1228 41.3809)'),ST_GeomFromText('POINT(2.1511 41.3742)'),ST_GeomFromText('POINT(2.1528 41.4136)'),ST_GeomFromText('POINT(2.165 41.3917)'),ST_GeomFromText('POINT(2.1498 41.3713)'),ST_GeomFromText('POINT(2.1533 41.3683)'),ST_GeomFromText('POINT(2.131386 41.413998)')] as geomin
WITH data as (
SELECT
ARRAY[7.0,8.0,1.0,2.0,3.0,5.0,6.0,4.0] as colin,
ARRAY[ST_GeomFromText('POINT(2.1744 41.4036)'),
ST_GeomFromText('POINT(2.1228 41.3809)'),
ST_GeomFromText('POINT(2.1511 41.3742)'),
ST_GeomFromText('POINT(2.1528 41.4136)'),
ST_GeomFromText('POINT(2.165 41.3917)'),
ST_GeomFromText('POINT(2.1498 41.3713)'),
ST_GeomFromText('POINT(2.1533 41.3683)'),
ST_GeomFromText('POINT(2.131386 41.413998)')] as geomin
)
select CDB_TINmap(geomin, colin, 2) from data;
SELECT cdb_crankshaft.CDB_TINmap(geomin, colin, 2)
FROM data
```

View File

@ -43,7 +43,7 @@ With a table `website_visits` and a column of the number of website visits in un
```sql
SELECT
id,
CDB_StaticOutlier(visits_10k, 11.0) As outlier,
cdb_crankshaft.CDB_StaticOutlier(visits_10k, 11.0) As outlier,
visits_10k
FROM website_visits
```
@ -93,7 +93,7 @@ WITH cte As (
unnest(Array[1,3,5,1,32,3,57,2]) As visits_10k
)
SELECT
(CDB_PercentOutlier(array_agg(visits_10k), 2.0, array_agg(id))).*
(cdb_crankshaft.CDB_PercentOutlier(array_agg(visits_10k), 2.0, array_agg(id))).*
FROM cte;
```
@ -144,7 +144,7 @@ WITH cte As (
unnest(Array[1,3,5,1,32,3,57,2]) As visits_10k
)
SELECT
(CDB_StdDevOutlier(array_agg(visits_10k), 2.0, array_agg(id))).*
(cdb_crankshaft.CDB_StdDevOutlier(array_agg(visits_10k), 2.0, array_agg(id))).*
FROM cte;
```