From 4b409cc9f49ea142e9d1be6cbe0ed13df2cd44db Mon Sep 17 00:00:00 2001
From: John Krauss
Date: Wed, 1 Feb 2017 09:12:18 -0500
Subject: [PATCH 1/8] first-pass docs for obs_getdata and obs_getmeta

---
 doc/measures_functions.md | 220 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 220 insertions(+)

diff --git a/doc/measures_functions.md b/doc/measures_functions.md
index 8f2f026..b01b2b0 100644
--- a/doc/measures_functions.md
+++ b/doc/measures_functions.md
@@ -195,3 +195,223 @@ Add the Category to an empty column text column based on point locations in your
 UPDATE tablename
 SET segmentation = OBS_GetCategory(the_geom, 'us.census.spielman_singleton_segments.X55')
 ```
+
+## OBS_GetMeta(extent geometry, metadata json)
+
+The ```OBS_GetMeta(extent, metadata)``` function returns a completed Data
+Observatory metadata JSON Object for use in ```OBS_GetData(geomvals,
+metadata)``` or ```OBS_GetData(ids, metadata)```. It is not possible to pass
+metadata to those functions if it is not processed by ```OBS_GetMeta(extent,
+metadata)``` first.
+
+#### Arguments
+
+Name | Description
+---- | -----------
+extent | A geometry of the extent of the input geometries
+metadata | A JSON array composed of metadata input objects. Each indicates one desired measured for an output column, and optionally additional parameters about that column
+
+The schema of the metadata input objects are as follows:
+
+Metadata Input Key | Description
+--- | -----------
+numer_id | The identifier for the desired measurement. If left blank, but a `geom_id` is specified, the column will return a geometry instead of a measurement.
+geom_id | Identifier for a desired geographic boundary level to use when calculating measures. Will be automatically assigned if undefined. If defined but `numer_id` is blank, then the column will return a geometry instead of a measurement.
+normalization | The desired normalization. One of 'area', 'prenormalized', or 'denominated'. 'Area' will normalize the measure per square kilometer, 'prenormalized' will return the original value, and 'denominated' will normalize by a denominator. Ignored if this metadata object specifies a geometry.
+denom_id | Identifier for a desired normalization column in case `normalization` is 'denominated'. Will be automatically assigned if necessary. Ignored if this metadata object specifies a geometry.
+numer_timespan | The desired timespan for the measurement. Defaults to most recent timespan available if left unspecified.
+geom_timespan | The desired timespan for the geometry. Defaults to most recent timespan available if left unspecified.
+
+#### Returns
+
+A JSON array composed of metadata output objects.
+
+Key | Description
+--- | -----------
+meta | A JSON array with completed metadata for the requested data, including all keys below
+
+The schema of the metadata output objects are as follows. You should pass this
+array as-is to ```OBS_GetData```. If you modify any values the function will
+fail.
+
+Metadata Output Key | Description
+--- | -----------
+numer_id | Identifier for desired measurement.
+numer_timespan | Timespan that will be used of the desired measurement.
+numer_name | Human-readable name of desired measure
+numer_type | PostgreSQL/PostGIS type of desired measure
+numer_colname | Internal identifier for column name
+numer_tablename | Internal identifier for table
+numer_geomref_colname | Internal identifier for geomref column name
+denom_id | Identifier for desired normalization.
+denom_timespan | Timespan that will be used of the desired normalization.
+denom_name | Human-readable name of desired measure's normalization +denom_type | PostgreSQL/PostGIS type of desired measure's normalization +denom_colname | Internal identifier for normalization column name +denom_tablename | Internal identifier for normalization table +denom_geomref_colname | Internal identifier for normalization geomref column name +geom_id | Identifier for desired boundary geometry. +geom_timespan | Timespan that will be used of the desired boundary geometry. +geom_name | Human-readable name of desired boundary geometry's +geom_type | PostgreSQL/PostGIS type of desired boundary geometry +geom_colname | Internal identifier for boundary geometry column name +geom_tablename | Internal identifier for boundary geometry table +geom_geomref_colname | Internal identifier for boundary geometry ref column name + +#### Examples + +Obtain metadata that can augment with one additional column of US population +data, using a boundary relevant for the geometry provided and latest timespan. + +```SQL +SELECT OBS_GetMeta(ST_Extent(the_geom), + '[{"numer_id": "us.census.acs.B01003001"}]') +``` + +Obtain metadata that can augment with one additional column of US population +data, using census tract boundaries. + +```SQL +SELECT OBS_GetMeta(ST_Extent(the_geom), + '[{"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.census_tract"}]') +``` + +Obtain metadata that can augment with two additional columns, one for total +population and one for male population. + +```SQL +SELECT OBS_GetMeta(ST_Extent(the_geom), + '[{"numer_id": "us.census.acs.B01003001"}, {"numer_id": "us.census.acs.B01001002"}]') +``` + +## OBS_GetData(geomvals array[geomval], metadata json) + +The ```OBS_GetData(geomvals, metadata)``` function returns a measure and/or +geometry corresponding to the `metadata` JSON array for each every Geometry of +the `geomval` element in the `geomvals` array. The metadata argument must be +obtained from ```OBS_GetMeta(extent, metadata)```. 
+ +#### Arguments + +Name | Description +---- | ----------- +geomvals | An array of `geomval` elements, which are obtained by casting together a `Geometry` and a `Numeric`. This should be obtained by using `ARRAY_AGG((the_geom, cartodb_id)::geomval)` from the CARTO table one wishes to obtain data for. +metadata | A JSON array composed of metadata output objects from `OBS_GetMeta(extent, metadata)`. The schema of the elements of the `metadata` JSON array corresponds to that of the output of ```OBS_GetMeta(extent, metadata)```, and this argument must be obtained from that function in order for the call to be valid. + +#### Returns + +A TABLE with the following schema, where each element of the input `geomvals` +array corresponds to one row: + +Column | Type | Description +------ | ---- | ----------- +id | Numeric | ID corresponding to the `val` component of an element of the input `geomvals` array +data | JSON | A JSON array with elements corresponding to the input `metadata` JSON array + +Each `data` object has the following keys: + +Key | Description +--- | ----------- +value | The value of the measurement or geometry for the geometry corresponding to this row and measurement corresponding to this position in the `metadata` JSON array + +To determine the appropriate cast for `value`, one can use the `numer_type` +or `geom_type` key corresponding to that value in the input `metadata` JSON +array. 
+ +#### Examples + +Obtain population densities for every geometry in a table, keyed by cartodb_id: + +```SQL +WITH meta AS ( + SELECT OBS_GetMeta(ST_Extent(the_geom), + '[{"numer_id": "us.census.acs.B01003001"}]') meta) +SELECT id AS cartodb_id, (data->1->>'value') AS pop_density +FROM OBS_GetData((SELECT ARRAY_AGG((the_geom, cartodb_id)::geomval) FROM tablename), + (SELECT meta FROM meta)) +``` + +Update a table with population densities + +```SQL +WITH meta AS ( + SELECT OBS_GetMeta(ST_Extent(the_geom), + '[{"numer_id": "us.census.acs.B01003001"}]') meta), +data AS ( + SELECT id AS cartodb_id, (data->1->>'value') AS pop_density + FROM OBS_GetData((SELECT ARRAY_AGG((the_geom, cartodb_id)::geomval) FROM tablename), + (SELECT meta FROM meta))) +UPDATE tablename +SET pop_density = data.pop_density +FROM data +WHERE cartodb_id = data.id +``` + +## OBS_GetData(ids array[text], metadata json) + +The ```OBS_GetData(ids, metadata)``` function returns a measure and/or +geometry corresponding to the `metadata` JSON array for each every id of +the `ids` array. The metadata argument must be obtained from +```OBS_GetMeta(extent, metadata)```. When obtaining metadata, one must include +the `geom_id` corresponding to the boundary the `ids` refer to. + +#### Arguments + +Name | Description +---- | ----------- +ids | An array of `TEXT` elements. This should be obtained by using `ARRAY_AGG(col_of_geom_refs)` from the CARTO table one wishes to obtain data for. +metadata | A JSON array composed of metadata output objects from `OBS_GetMeta(extent, metadata)`. The schema of the elements of the `metadata` JSON array corresponds to that of the output of ```OBS_GetMeta(extent, metadata)```, and this argument must be obtained from that function in order for the call to be valid. + +For this function to work, the `metadata` argument must include a `geom_id` +that corresponds to the IDS found in `col_of_geom_refs`. 
+ +#### Returns + +A TABLE with the following schema, where each element of the input `ids` array +corresponds to one row: + +Column | Type | Description +------ | ---- | ----------- +id | Text | ID corresponding to an element of the input `ids` array +data | JSON | A JSON array with elements corresponding to the input `metadata` JSON array + +Each `data` object has the following keys: + +Key | Description +--- | ----------- +value | The value of the measurement or geometry for the geometry corresponding to this row and measurement corresponding to this position in the `metadata` JSON array + +To determine the appropriate cast for `value`, one can use the `numer_type` +or `geom_type` key corresponding to that value in the input `metadata` JSON +array. + +#### Examples + +Obtain population densities for every row of a table with FIPS code county IDs +(USA). + +```SQL +WITH meta AS ( + SELECT OBS_GetMeta(ST_Extent(the_geom), + '[{"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.county"}]') meta) +SELECT id AS fips, (data->1->>'value') AS pop_density +FROM OBS_GetData((SELECT ARRAY_AGG((fips) FROM tablename), + (SELECT meta FROM meta)) +``` + +Update a table with population densities for every FIPS code county ID (USA). 
+
+```SQL
+WITH meta AS (
+  SELECT OBS_GetMeta(ST_Extent(the_geom),
+    '[{"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.county"}]') meta),
+data as (
+  SELECT id AS fips, (data->1->>'value') AS pop_density
+  FROM OBS_GetData((SELECT ARRAY_AGG((fips) FROM tablename),
+                   (SELECT meta FROM meta)))
+UPDATE tablename
+SET pop_density = data.pop_density
+FROM data
+WHERE fips = data.id
+```
+
From 01b70dd06e8edf111eed61c62f1b84a021df67a6 Mon Sep 17 00:00:00 2001
From: Michelle Ho
Date: Mon, 6 Feb 2017 14:58:07 -0500
Subject: [PATCH 2/8] proof-reading changes

---
 doc/measures_functions.md | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/doc/measures_functions.md b/doc/measures_functions.md
index b01b2b0..17485ea 100644
--- a/doc/measures_functions.md
+++ b/doc/measures_functions.md
@@ -8,15 +8,15 @@ You can [access](https://carto.com/docs/carto-engine/data/accessing) measures th
 
 ## OBS_GetUSCensusMeasure(point geometry, measure_name text)
 
-The ```OBS_GetUSCensusMeasure(point, measure_name)``` function returns a measure based on a subset of the US Census variables at a point location. The ```OBS_GetUSCensusMeasure``` function is limited to only a subset of all measures that are available in the Data Observatory, to access the full list, use measure IDs with the ```OBS_GetMeasure``` function below.
+The ```OBS_GetUSCensusMeasure(point, measure_name)``` function returns a measure based on a subset of the US Census variables at a point location. The ```OBS_GetUSCensusMeasure``` function is limited to only a subset of all measures that are available in the Data Observatory. To access the full list, use measure IDs with the ```OBS_GetMeasure``` function below.
 
 #### Arguments
 
 Name |Description
 --- | ---
 point | a WGS84 point geometry (the_geom)
-measure_name | a human readable name of a US Census variable. The list of measure_names is [available in the Glossary](https://carto.com/docs/carto-engine/data/glossary/#obsgetuscensusmeasure-names-table).
-normalize | for measures that are are **sums** (e.g. population) the default normalization is 'area' and response comes back as a rate per square kilometer. Other options are 'denominator', which will use the denominator specified in the [Data Catalog](https://cartodb.github.io/bigmetadata/index.html) (optional)
+measure_name | a human-readable name of a US Census variable. The list of measure_names is [available in the Glossary](https://carto.com/docs/carto-engine/data/glossary/#obsgetuscensusmeasure-names-table).
+normalize | for measures that are **sums** (e.g. population) the default normalization is 'area' and response comes back as a rate per square kilometer. Other options are 'denominator', which will use the denominator specified in the [Data Catalog](https://cartodb.github.io/bigmetadata/index.html) (optional)
 boundary_id | source of geometries to pull measure from (e.g., 'us.census.tiger.census_tract')
 time_span | time span of interest (e.g., 2010 - 2014)
 
@@ -39,7 +39,7 @@ SET total_population = OBS_GetUSCensusMeasure(the_geom, 'Total Population')
 
 ## OBS_GetUSCensusMeasure(polygon geometry, measure_name text)
 
-The ```OBS_GetUSCensusMeasure(point, measure_name)``` function returns a measure based on a subset of the US Census variables within a given polygon. The ```OBS_GetUSCensusMeasure``` function is limited to only a subset of all measures that are available in the Data Observatory, to access the full list, use the ```OBS_GetUSCensusMeasure``` function below.
+The ```OBS_GetUSCensusMeasure(point, measure_name)``` function returns a measure based on a subset of the US Census variables within a given polygon. The ```OBS_GetUSCensusMeasure``` function is limited to only a subset of all measures that are available in the Data Observatory. To access the full list, use the ```OBS_GetUSCensusMeasure``` function below.
 
 #### Arguments
 
@@ -78,7 +78,7 @@ Name |Description
 --- | ---
 point | a WGS84 point geometry (the_geom)
 measure_id | a measure identifier from the Data Observatory ([see available measures](https://cartodb.github.io/bigmetadata/observatory.pdf)). It is important to note that these are different than 'measure_name' used in the Census based functions above.
-normalize | for measures that are are **sums** (e.g. population) the default normalization is 'area' and response comes back as a rate per square kilometer. The other option is 'denominator', which will use the denominator specified in the [Data Catalog](https://cartodb.github.io/bigmetadata/index.html). (optional)
+normalize | for measures that are **sums** (e.g. population) the default normalization is 'area' and response comes back as a rate per square kilometer. The other option is 'denominator', which will use the denominator specified in the [Data Catalog](https://cartodb.github.io/bigmetadata/index.html). (optional)
 boundary_id | source of geometries to pull measure from (e.g., 'us.census.tiger.census_tract')
 time_span | time span of interest (e.g., 2010 - 2014)
 
@@ -109,7 +109,7 @@ Name |Description
 --- | ---
 polygon_geometry | a WGS84 polygon geometry (the_geom)
 measure_id | a measure identifier from the Data Observatory ([see available measures](https://cartodb.github.io/bigmetadata/observatory.pdf))
-normalize | for measures that are are **sums** (e.g. population) the default normalization is 'none' and response comes back as a raw value. Other options are 'denominator', which will use the denominator specified in the [Data Catalog](https://cartodb.github.io/bigmetadata/index.html) (optional)
+normalize | for measures that are **sums** (e.g. population) the default normalization is 'none' and response comes back as a raw value. Other options are 'denominator', which will use the denominator specified in the [Data Catalog](https://cartodb.github.io/bigmetadata/index.html) (optional)
 boundary_id | source of geometries to pull measure from (e.g., 'us.census.tiger.census_tract')
 time_span | time span of interest (e.g., 2010 - 2014)
 
@@ -236,23 +236,23 @@ fail.
 
 Metadata Output Key | Description
 --- | -----------
-numer_id | Identifier for desired measurement.
-numer_timespan | Timespan that will be used of the desired measurement.
+numer_id | Identifier for desired measurement
+numer_timespan | Timespan that will be used of the desired measurement
 numer_name | Human-readable name of desired measure
 numer_type | PostgreSQL/PostGIS type of desired measure
 numer_colname | Internal identifier for column name
 numer_tablename | Internal identifier for table
 numer_geomref_colname | Internal identifier for geomref column name
-denom_id | Identifier for desired normalization.
-denom_timespan | Timespan that will be used of the desired normalization.
+denom_id | Identifier for desired normalization
+denom_timespan | Timespan that will be used of the desired normalization
 denom_name | Human-readable name of desired measure's normalization
 denom_type | PostgreSQL/PostGIS type of desired measure's normalization
 denom_colname | Internal identifier for normalization column name
 denom_tablename | Internal identifier for normalization table
 denom_geomref_colname | Internal identifier for normalization geomref column name
-geom_id | Identifier for desired boundary geometry.
-geom_timespan | Timespan that will be used of the desired boundary geometry.
-geom_name | Human-readable name of desired boundary geometry's
+geom_id | Identifier for desired boundary geometry
+geom_timespan | Timespan that will be used of the desired boundary geometry
+geom_name | Human-readable name of desired boundary geometry
 geom_type | PostgreSQL/PostGIS type of desired boundary geometry
 geom_colname | Internal identifier for boundary geometry column name
 geom_tablename | Internal identifier for boundary geometry table
@@ -331,7 +331,7 @@ FROM OBS_GetData((SELECT ARRAY_AGG((the_geom, cartodb_id)::geomval) FROM tablena
                  (SELECT meta FROM meta))
 ```
 
-Update a table with population densities
+Update a table with population densities:
 
 ```SQL
 WITH meta AS (
@@ -353,7 +353,7 @@ The ```OBS_GetData(ids, metadata)``` function returns a measure and/or
 geometry corresponding to the `metadata` JSON array for each every id of
 the `ids` array. The metadata argument must be obtained from
 ```OBS_GetMeta(extent, metadata)```. When obtaining metadata, one must include
-the `geom_id` corresponding to the boundary the `ids` refer to.
+the `geom_id` corresponding to the boundary that the `ids` refer to.
 
 #### Arguments
 
@@ -363,7 +363,7 @@ ids | An array of `TEXT` elements. This should be obtained by using `ARRAY_AGG(
 metadata | A JSON array composed of metadata output objects from `OBS_GetMeta(extent, metadata)`. The schema of the elements of the `metadata` JSON array corresponds to that of the output of ```OBS_GetMeta(extent, metadata)```, and this argument must be obtained from that function in order for the call to be valid.
 
 For this function to work, the `metadata` argument must include a `geom_id`
-that corresponds to the IDS found in `col_of_geom_refs`.
+that corresponds to the ids found in `col_of_geom_refs`.
 
 #### Returns
 
From 60ab773549b296d13ab48678a33ddf8342fa0ed4 Mon Sep 17 00:00:00 2001
From: Michelle Ho
Date: Mon, 6 Feb 2017 15:57:49 -0500
Subject: [PATCH 3/8] change point to polygon in GetUSCensusMeasure

---
 doc/measures_functions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/measures_functions.md b/doc/measures_functions.md
index 17485ea..2b6de90 100644
--- a/doc/measures_functions.md
+++ b/doc/measures_functions.md
@@ -39,7 +39,7 @@ SET total_population = OBS_GetUSCensusMeasure(the_geom, 'Total Population')
 
 ## OBS_GetUSCensusMeasure(polygon geometry, measure_name text)
 
-The ```OBS_GetUSCensusMeasure(point, measure_name)``` function returns a measure based on a subset of the US Census variables within a given polygon. The ```OBS_GetUSCensusMeasure``` function is limited to only a subset of all measures that are available in the Data Observatory. To access the full list, use the ```OBS_GetUSCensusMeasure``` function below.
+The ```OBS_GetUSCensusMeasure(polygon, measure_name)``` function returns a measure based on a subset of the US Census variables within a given polygon. The ```OBS_GetUSCensusMeasure``` function is limited to only a subset of all measures that are available in the Data Observatory. To access the full list, use the ```OBS_GetUSCensusMeasure``` function below.
 
 #### Arguments
 
From d15b74a5941a8f878877c0234d2f59e8d074beac Mon Sep 17 00:00:00 2001
From: Michelle Ho
Date: Mon, 6 Feb 2017 16:18:56 -0500
Subject: [PATCH 4/8] Change ```OBS_GetUSCensusMeasure```

---
 doc/measures_functions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/measures_functions.md b/doc/measures_functions.md
index 2b6de90..6a97ba4 100644
--- a/doc/measures_functions.md
+++ b/doc/measures_functions.md
@@ -39,7 +39,7 @@ SET total_population = OBS_GetUSCensusMeasure(the_geom, 'Total Population')
 
 ## OBS_GetUSCensusMeasure(polygon geometry, measure_name text)
 
-The ```OBS_GetUSCensusMeasure(polygon, measure_name)``` function returns a measure based on a subset of the US Census variables within a given polygon. The ```OBS_GetUSCensusMeasure``` function is limited to only a subset of all measures that are available in the Data Observatory. To access the full list, use the ```OBS_GetUSCensusMeasure``` function below.
+The ```OBS_GetUSCensusMeasure(polygon, measure_name)``` function returns a measure based on a subset of the US Census variables within a given polygon. The ```OBS_GetUSCensusMeasure``` function is limited to only a subset of all measures that are available in the Data Observatory. To access the full list, use the ```OBS_GetMeasure``` function below.
 
 #### Arguments
 
From 72ced1a7a7bf09d1b0dd45c135bad629061832b1 Mon Sep 17 00:00:00 2001
From: Michelle Ho
Date: Mon, 6 Feb 2017 16:27:43 -0500
Subject: [PATCH 5/8] Change 'raise' to 'raises'

Changes semantic meaning-- user does not raise the error, CARTO raises the error
---
 doc/measures_functions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/measures_functions.md b/doc/measures_functions.md
index 6a97ba4..8276f16 100644
--- a/doc/measures_functions.md
+++ b/doc/measures_functions.md
@@ -132,7 +132,7 @@ SET household_count = OBS_GetMeasure(the_geom, 'us.census.acs.B11001001')
 
 #### Errors
 
-* If an unrecognized normalization type is input, raise an error: `'Only valid inputs for "normalize" are "area" (default) and "denominator".`
+* If an unrecognized normalization type is input, raises error: `'Only valid inputs for "normalize" are "area" (default) and "denominator".`
 
 ## OBS_GetMeasureById(geom_ref text, measure_id text, boundary_id text)
 
From 8120081d68b78aba873f611d6908b7d92fbdad65 Mon Sep 17 00:00:00 2001
From: Michelle Ho
Date: Mon, 6 Feb 2017 16:37:27 -0500
Subject: [PATCH 6/8] Typo fix

Typo fix of "measured" to "measure"
---
 doc/measures_functions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/measures_functions.md b/doc/measures_functions.md
index 8276f16..46d65de 100644
--- a/doc/measures_functions.md
+++ b/doc/measures_functions.md
@@ -209,7 +209,7 @@ metadata)``` first.
 Name | Description
 ---- | -----------
 extent | A geometry of the extent of the input geometries
-metadata | A JSON array composed of metadata input objects. Each indicates one desired measured for an output column, and optionally additional parameters about that column
+metadata | A JSON array composed of metadata input objects. Each indicates one desired measure for an output column, and optionally additional parameters about that column
 
 The schema of the metadata input objects are as follows:
 
From af671931d42e37550cb8ec712828c807b96d6927 Mon Sep 17 00:00:00 2001
From: John Krauss
Date: Thu, 23 Feb 2017 20:12:27 +0000
Subject: [PATCH 7/8] integrate michelles comments

---
 doc/measures_functions.md | 106 ++++++++++++++++++++++++++++++--------
 1 file changed, 84 insertions(+), 22 deletions(-)

diff --git a/doc/measures_functions.md b/doc/measures_functions.md
index 46d65de..d3db385 100644
--- a/doc/measures_functions.md
+++ b/doc/measures_functions.md
@@ -196,7 +196,7 @@ UPDATE tablename
 SET segmentation = OBS_GetCategory(the_geom, 'us.census.spielman_singleton_segments.X55')
 ```
 
-## OBS_GetMeta(extent geometry, metadata json)
+## OBS_GetMeta(extent geometry, metadata json, max_timespan_rank, max_boundary_score_rank, num_target_geoms)
 
 The ```OBS_GetMeta(extent, metadata)``` function returns a completed Data
 Observatory metadata JSON Object for use in ```OBS_GetData(geomvals,
@@ -204,12 +204,18 @@ metadata)``` or ```OBS_GetData(ids, metadata)```. It is not possible to pass
 metadata to those functions if it is not processed by ```OBS_GetMeta(extent,
 metadata)``` first.
 
+`OBS_GetMeta` makes it possible to automatically select appropriate timespans
+and boundaries for the measurement you want.
+
 #### Arguments
 
 Name | Description
 ---- | -----------
 extent | A geometry of the extent of the input geometries
 metadata | A JSON array composed of metadata input objects. Each indicates one desired measure for an output column, and optionally additional parameters about that column
+max_timespan_rank | How many historical time periods to include. Defaults to 1
+max_boundary_score_rank | How many alternative boundary levels to include. Defaults to 1
+num_target_geoms | Target number of geometries. Boundaries with close to this many objects within `extent` will be ranked highest.
 
 The schema of the metadata input objects are as follows:
 
@@ -220,7 +226,7 @@ geom_id | Identifier for a desired geographic boundary level to use when calcula
 normalization | The desired normalization. One of 'area', 'prenormalized', or 'denominated'. 'Area' will normalize the measure per square kilometer, 'prenormalized' will return the original value, and 'denominated' will normalize by a denominator. Ignored if this metadata object specifies a geometry.
 denom_id | Identifier for a desired normalization column in case `normalization` is 'denominated'. Will be automatically assigned if necessary. Ignored if this metadata object specifies a geometry.
 numer_timespan | The desired timespan for the measurement. Defaults to most recent timespan available if left unspecified.
-geom_timespan | The desired timespan for the geometry. Defaults to most recent timespan available if left unspecified.
+geom_timespan | The desired timespan for the geometry. Defaults to timespan matching numer_timespan if left unspecified.
 
 #### Returns
 
@@ -257,31 +263,51 @@ geom_type | PostgreSQL/PostGIS type of desired boundary geometry
 geom_colname | Internal identifier for boundary geometry column name
 geom_tablename | Internal identifier for boundary geometry table
 geom_geomref_colname | Internal identifier for boundary geometry ref column name
+timespan_rank | Ranking of this measurement by time, most recent is 1, second most recent 2, etc.
+score | The score of this measurement's boundary compared to the `extent` and `num_target_geoms` passed in. Between 0 and 100.
+score_rank | The ranking of this measurement's boundary, highest ranked is 1, second is 2, etc.
+numer_aggregate | The aggregate type of the numerator, either `sum`, `average`, `median`, or blank
+denom_aggregate | The aggregate type of the denominator, either `sum`, `average`, `median`, or blank
+normalization | The sort of normalization that will be used for this measure, either `area`, `predenominated`, or `denominated`
 
 #### Examples
 
 Obtain metadata that can augment with one additional column of US population
 data, using a boundary relevant for the geometry provided and latest timespan.
+Limit to only the most recent column most relevant to the extent & density of
+input geometries in `tablename`.
 
 ```SQL
-SELECT OBS_GetMeta(ST_Extent(the_geom),
-  '[{"numer_id": "us.census.acs.B01003001"}]')
+SELECT OBS_GetMeta(
+  ST_SetSRID(ST_Extent(the_geom), 4326),
+  '[{"numer_id": "us.census.acs.B01003001"}]',
+  1, 1,
+  COUNT(*)
+) FROM tablename
 ```
 
 Obtain metadata that can augment with one additional column of US population
 data, using census tract boundaries.
 
 ```SQL
-SELECT OBS_GetMeta(ST_Extent(the_geom),
-  '[{"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.census_tract"}]')
+SELECT OBS_GetMeta(
+  ST_SetSRID(ST_Extent(the_geom), 4326),
+  '[{"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.census_tract"}]',
+  1, 1,
+  COUNT(*)
+) FROM tablename
 ```
 
 Obtain metadata that can augment with two additional columns, one for total
 population and one for male population.
 
 ```SQL
-SELECT OBS_GetMeta(ST_Extent(the_geom),
-  '[{"numer_id": "us.census.acs.B01003001"}, {"numer_id": "us.census.acs.B01001002"}]')
+SELECT OBS_GetMeta(
+  ST_SetSRID(ST_Extent(the_geom), 4326),
+  '[{"numer_id": "us.census.acs.B01003001"}, {"numer_id": "us.census.acs.B01001002"}]',
+  1, 1,
+  COUNT(*)
+) FROM tablename
 ```
 
 ## OBS_GetData(geomvals array[geomval], metadata json)
@@ -324,21 +350,28 @@ Obtain population densities for every geometry in a table, keyed by cartodb_id:
 
 ```SQL
 WITH meta AS (
-  SELECT OBS_GetMeta(ST_Extent(the_geom),
-    '[{"numer_id": "us.census.acs.B01003001"}]') meta)
-SELECT id AS cartodb_id, (data->1->>'value') AS pop_density
+  SELECT OBS_GetMeta(
+    ST_SetSRID(ST_Extent(the_geom), 4326),
+    '[{"numer_id": "us.census.acs.B01003001"}]',
+    1, 1, COUNT(*)
+) meta FROM tablename)
+SELECT id AS cartodb_id, (data->0->>'value')::Numeric AS pop_density
 FROM OBS_GetData((SELECT ARRAY_AGG((the_geom, cartodb_id)::geomval) FROM tablename),
                  (SELECT meta FROM meta))
 ```
 
-Update a table with population densities:
+Update a table with a blank numeric column called `pop_density` with population
+densities:
 
 ```SQL
 WITH meta AS (
-  SELECT OBS_GetMeta(ST_Extent(the_geom),
-    '[{"numer_id": "us.census.acs.B01003001"}]') meta),
+  SELECT OBS_GetMeta(
+    ST_SetSRID(ST_Extent(the_geom), 4326),
+    '[{"numer_id": "us.census.acs.B01003001"}]',
+    1, 1, COUNT(*)
+) meta FROM tablename),
 data AS (
-  SELECT id AS cartodb_id, (data->1->>'value') AS pop_density
+  SELECT id AS cartodb_id, (data->0->>'value')::Numeric AS pop_density
   FROM OBS_GetData((SELECT ARRAY_AGG((the_geom, cartodb_id)::geomval) FROM tablename),
                    (SELECT meta FROM meta)))
 UPDATE tablename
@@ -347,6 +380,30 @@ FROM data
 WHERE cartodb_id = data.id
 ```
 
+Update a table with two measurements at once, population density and household
+density. The table should already have a Numeric column `pop_density` and
+`household_density`.
+
+```
+WITH meta AS (
+  SELECT OBS_GetMeta(
+    ST_SetSRID(ST_Extent(the_geom),4326),
+    '[{"numer_id": "us.census.acs.B01003001"},{"numer_id": "us.census.acs.B11001001"}]',
+    1, 1, COUNT(*)
+) meta from tablename),
+data AS (
+  SELECT id,
+         data->0->>'value' AS pop_density,
+         data->1->>'value' AS household_density
+  FROM OBS_GetData((SELECT ARRAY_AGG((the_geom, cartodb_id)::geomval) FROM tablename),
+                   (SELECT meta FROM meta)))
+UPDATE tablename
+SET pop_density = data.pop_density,
+    household_density = data.household_density
+FROM data
+WHERE cartodb_id = data.id
+```
+
 ## OBS_GetData(ids array[text], metadata json)
 
 The ```OBS_GetData(ids, metadata)``` function returns a measure and/or
@@ -392,21 +449,27 @@ Obtain population densities for every row of a table with FIPS code county IDs
 
 ```SQL
 WITH meta AS (
-  SELECT OBS_GetMeta(ST_Extent(the_geom),
-    '[{"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.county"}]') meta)
-SELECT id AS fips, (data->1->>'value') AS pop_density
+  SELECT OBS_GetMeta(
+    ST_SetSRID(ST_Extent(the_geom), 4326),
+    '[{"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.county"}]'
+) meta FROM tablename)
+SELECT id AS fips, (data->0->>'value')::Numeric AS pop_density
 FROM OBS_GetData((SELECT ARRAY_AGG((fips) FROM tablename),
                  (SELECT meta FROM meta))
 ```
 
 Update a table with population densities for every FIPS code county ID (USA).
+This table has a blank column called `pop_density` and fips codes stored in a
+column `fips`.
```SQL WITH meta AS ( - SELECT OBS_GetMeta(ST_Extent(the_geom), - '[{"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.county"}]') meta), + SELECT OBS_GetMeta( + ST_SetSRID(ST_Extent(the_geom), 4326), + '[{"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.county"}]' +) meta FROM tablename), data as ( - SELECT id AS fips, (data->1->>'value') AS pop_density + SELECT id, (data->0->>'value')::Numeric AS pop_density -FROM OBS_GetData((SELECT ARRAY_AGG((fips) FROM tablename), +FROM OBS_GetData((SELECT ARRAY_AGG(fips) FROM tablename), (SELECT meta FROM meta))) UPDATE tablename @@ -414,4 +477,3 @@ SET pop_density = data.pop_density FROM data WHERE fips = data.id ``` - From 63ae7c13928b9a2cf672aa044dcb61ea98947733 Mon Sep 17 00:00:00 2001 From: John Krauss Date: Tue, 28 Feb 2017 21:33:06 +0000 Subject: [PATCH 8/8] add obs_getavailableX metadata API docs --- doc/discovery_functions.md | 303 +++++++++++++++++++++++++++++++++++++ 1 file changed, 303 insertions(+) diff --git a/doc/discovery_functions.md b/doc/discovery_functions.md index 36425e4..dab630b 100644 --- a/doc/discovery_functions.md +++ b/doc/discovery_functions.md @@ -56,3 +56,306 @@ time_span | the timespan attached the boundary. this does not mean that the boun ```SQL SELECT * FROM OBS_GetAvailableBoundaries(CDB_LatLng(40.7, -73.9)) ``` + +## OBS_GetAvailableNumerators(bounds, filter_tags, denom_id, geom_id, timespan) + +Return available numerators within a boundary and with the specified +`filter_tags`. + +#### Arguments + +Name | Type | Description +--- | --- | --- +bounds | Geometry(Geometry, 4326) | a geometry that some of the numerator's data must intersect +filter_tags | Text[] | a list of filters. Only numerators to which all of these filters apply are returned. Pass `NULL` to ignore (optional) +denom_id | Text | the ID of a denominator to check whether the numerator is valid against.
Does not reduce the number of rows returned, but changes the values of `valid_denom` (optional) +geom_id | Text | the ID of a geometry to check whether the numerator is valid against. Does not reduce the number of rows returned, but changes the values of `valid_geom` (optional) +timespan | Text | the ID of a timespan to check whether the numerator is valid against. Does not reduce the number of rows returned, but changes the values of `valid_timespan` (optional) + +#### Returns + +A table containing the following columns: + +Key | Type | Description +--- | ---- | ----------- +numer_id | Text | The ID of the numerator +numer_name | Text | A human-readable name for the numerator +numer_description | Text | Description of the numerator. May be NULL +numer_weight | Numeric | Numeric "weight" of the numerator. Ignored. +numer_license | Text | ID of the license for the numerator +numer_source | Text | ID of the source for the numerator +numer_type | Text | Postgres type of the numerator +numer_aggregate | Text | Aggregate type of the numerator. If `'SUM'`, the numerator can be normalized by area +numer_extra | JSONB | Extra information about the numerator column. Ignored. +numer_tags | Text[] | Array of all tags applying to this numerator +valid_denom | Boolean | True if the `denom_id` argument is a valid denominator for this numerator, False otherwise +valid_geom | Boolean | True if the `geom_id` argument is a valid geometry for this numerator, False otherwise +valid_timespan | Boolean | True if the `timespan` argument is a valid timespan for this numerator, False otherwise + +#### Examples + +Obtain all numerators that are available within a small rectangle. + +```SQL +SELECT * FROM cdb_observatory.OBS_GetAvailableNumerators( + ST_MakeEnvelope(-74, 40, -73, 41, 4326)); +``` + +Obtain all numerators that are available within a small rectangle and are for +the United States only.
+ +```SQL +SELECT * FROM cdb_observatory.OBS_GetAvailableNumerators( + ST_MakeEnvelope(-74, 40, -73, 41, 4326), '{section/tags.united_states}'); +``` + +Obtain all numerators that are available within a small rectangle and are +employment-related for the United States only. + +```SQL +SELECT * FROM cdb_observatory.OBS_GetAvailableNumerators( + ST_MakeEnvelope(-74, 40, -73, 41, 4326), '{section/tags.united_states, subsection/tags.employment}'); +``` + +Obtain all numerators that are available within a small rectangle and are +related to both employment and age & gender for the United States only. + +```SQL +SELECT * FROM cdb_observatory.OBS_GetAvailableNumerators( + ST_MakeEnvelope(-74, 40, -73, 41, 4326), '{section/tags.united_states, subsection/tags.employment, subsection/tags.age_gender}'); +``` + +Obtain all numerators that work with total US population (`us.census.acs.B01003001`) +as a denominator. + +```SQL +SELECT * FROM cdb_observatory.OBS_GetAvailableNumerators( + ST_MakeEnvelope(-74, 40, -73, 41, 4326), NULL, 'us.census.acs.B01003001') +WHERE valid_denom IS True; +``` + +Obtain all numerators that work with US states (`us.census.tiger.state`) +as a geometry. + +```SQL +SELECT * FROM cdb_observatory.OBS_GetAvailableNumerators( + ST_MakeEnvelope(-74, 40, -73, 41, 4326), NULL, NULL, 'us.census.tiger.state') +WHERE valid_geom IS True; +``` + +Obtain all numerators available in the timespan `2011 - 2015`. + +```SQL +SELECT * FROM cdb_observatory.OBS_GetAvailableNumerators( + ST_MakeEnvelope(-74, 40, -73, 41, 4326), NULL, NULL, NULL, '2011 - 2015') +WHERE valid_timespan IS True; +``` + +## OBS_GetAvailableDenominators(bounds, filter_tags, numer_id, geom_id, timespan) + +Return available denominators within a boundary and with the specified +`filter_tags`. + +#### Arguments + +Name | Type | Description +--- | --- | --- +bounds | Geometry(Geometry, 4326) | a geometry that some of the denominator's data must intersect +filter_tags | Text[] | a list of filters.
Only denominators to which all of these filters apply are returned. Pass `NULL` to ignore (optional) +numer_id | Text | the ID of a numerator to check whether the denominator is valid against. Does not reduce the number of rows returned, but changes the values of `valid_numer` (optional) +geom_id | Text | the ID of a geometry to check whether the denominator is valid against. Does not reduce the number of rows returned, but changes the values of `valid_geom` (optional) +timespan | Text | the ID of a timespan to check whether the denominator is valid against. Does not reduce the number of rows returned, but changes the values of `valid_timespan` (optional) + +#### Returns + +A table containing the following columns: + +Key | Type | Description +--- | ---- | ----------- +denom_id | Text | The ID of the denominator +denom_name | Text | A human-readable name for the denominator +denom_description | Text | Description of the denominator. May be NULL +denom_weight | Numeric | Numeric "weight" of the denominator. Ignored. +denom_license | Text | ID of the license for the denominator +denom_source | Text | ID of the source for the denominator +denom_type | Text | Postgres type of the denominator +denom_aggregate | Text | Aggregate type of the denominator. If `'SUM'`, the denominator can be normalized by area +denom_extra | JSONB | Extra information about the denominator column. Ignored. +denom_tags | Text[] | Array of all tags applying to this denominator +valid_numer | Boolean | True if the `numer_id` argument is a valid numerator for this denominator, False otherwise +valid_geom | Boolean | True if the `geom_id` argument is a valid geometry for this denominator, False otherwise +valid_timespan | Boolean | True if the `timespan` argument is a valid timespan for this denominator, False otherwise + +#### Examples + +Obtain all denominators that are available within a small rectangle.
+ +```SQL +SELECT * FROM cdb_observatory.OBS_GetAvailableDenominators( + ST_MakeEnvelope(-74, 40, -73, 41, 4326)); +``` + +Obtain all denominators that are available within a small rectangle and are for +the United States only. + +```SQL +SELECT * FROM cdb_observatory.OBS_GetAvailableDenominators( + ST_MakeEnvelope(-74, 40, -73, 41, 4326), '{section/tags.united_states}'); +``` + +Obtain all denominators that work with male population (`us.census.acs.B01001002`). + +```SQL +SELECT * FROM cdb_observatory.OBS_GetAvailableDenominators( + ST_MakeEnvelope(-74, 40, -73, 41, 4326), NULL, 'us.census.acs.B01001002') +WHERE valid_numer IS True; +``` + +Obtain all denominators that work with US states (`us.census.tiger.state`) +as a geometry. + +```SQL +SELECT * FROM cdb_observatory.OBS_GetAvailableDenominators( + ST_MakeEnvelope(-74, 40, -73, 41, 4326), NULL, NULL, 'us.census.tiger.state') +WHERE valid_geom IS True; +``` + +Obtain all denominators available in the timespan `2011 - 2015`. + +```SQL +SELECT * FROM cdb_observatory.OBS_GetAvailableDenominators( + ST_MakeEnvelope(-74, 40, -73, 41, 4326), NULL, NULL, NULL, '2011 - 2015') +WHERE valid_timespan IS True; +``` + +## OBS_GetAvailableGeometries(bounds, filter_tags, numer_id, denom_id, timespan) + +Return available geometries within a boundary and with the specified +`filter_tags`. + +#### Arguments + +Name | Type | Description +--- | --- | --- +bounds | Geometry(Geometry, 4326) | a geometry that the returned geometries must intersect +filter_tags | Text[] | a list of filters. Only geometries to which all of these filters apply are returned. Pass `NULL` to ignore (optional) +numer_id | Text | the ID of a numerator to check whether the geometry is valid against. Does not reduce the number of rows returned, but changes the values of `valid_numer` (optional) +denom_id | Text | the ID of a denominator to check whether the geometry is valid against.
Does not reduce the number of rows returned, but changes the values of `valid_denom` (optional) +timespan | Text | the ID of a timespan to check whether the geometry is valid against. Does not reduce the number of rows returned, but changes the values of `valid_timespan` (optional) + +#### Returns + +A table containing the following columns: + +Key | Type | Description +--- | ---- | ----------- +geom_id | Text | The ID of the geometry +geom_name | Text | A human-readable name for the geometry +geom_description | Text | Description of the geometry. May be NULL +geom_weight | Numeric | Numeric "weight" of the geometry. Ignored. +geom_aggregate | Text | Aggregate type of the geometry. Ignored. +geom_license | Text | ID of the license for the geometry +geom_source | Text | ID of the source for the geometry +geom_type | Text | Postgres type of the geometry +geom_extra | JSONB | Extra information about the geometry column. Ignored. +geom_tags | Text[] | Array of all tags applying to this geometry +valid_numer | Boolean | True if the `numer_id` argument is a valid numerator for this geometry, False otherwise +valid_denom | Boolean | True if the `denom_id` argument is a valid denominator for this geometry, False otherwise +valid_timespan | Boolean | True if the `timespan` argument is a valid timespan for this geometry, False otherwise +score | Numeric | Score between 0 and 100 for this geometry; higher scores mean this geometry is a better choice for the passed extent +numtiles | Numeric | Number of raster tiles read to compute the score, numgeoms, and percentfill estimates +numgeoms | Numeric | Approximate number of these geometries that fit inside the passed extent +percentfill | Numeric | Approximate percentage of the passed extent filled by these geometries +estnumgeoms | Numeric | Ignored +meanmediansize | Numeric | Ignored + +#### Examples + +Obtain all geometries that are available within a small rectangle.
+ +```SQL +SELECT * FROM cdb_observatory.OBS_GetAvailableGeometries( + ST_MakeEnvelope(-74, 40, -73, 41, 4326)); +``` + +Obtain all geometries that are available within a small rectangle and are for +the United States only. + +```SQL +SELECT * FROM cdb_observatory.OBS_GetAvailableGeometries( + ST_MakeEnvelope(-74, 40, -73, 41, 4326), '{section/tags.united_states}'); +``` + +Obtain all geometries that work with total population (`us.census.acs.B01003001`). + +```SQL +SELECT * FROM cdb_observatory.OBS_GetAvailableGeometries( + ST_MakeEnvelope(-74, 40, -73, 41, 4326), NULL, 'us.census.acs.B01003001') +WHERE valid_numer IS True; +``` + +Obtain all geometries with timespan `2015`. + +```SQL +SELECT * FROM cdb_observatory.OBS_GetAvailableGeometries( + ST_MakeEnvelope(-74, 40, -73, 41, 4326), NULL, NULL, NULL, '2015') +WHERE valid_timespan IS True; +``` + +## OBS_GetAvailableTimespans(bounds, filter_tags, numer_id, denom_id, geom_id) + +Return available timespans within a boundary and with the specified +`filter_tags`. + +#### Arguments + +Name | Type | Description +--- | --- | --- +bounds | Geometry(Geometry, 4326) | a geometry that some of the timespan's data must intersect +filter_tags | Text[] | a list of filters. Ignored. +numer_id | Text | the ID of a numerator to check whether the timespan is valid against. Does not reduce the number of rows returned, but changes the values of `valid_numer` (optional) +denom_id | Text | the ID of a denominator to check whether the timespan is valid against. Does not reduce the number of rows returned, but changes the values of `valid_denom` (optional) +geom_id | Text | the ID of a geometry to check whether the timespan is valid against.
Does not reduce the number of rows returned, but changes the values of `valid_geom` (optional) + +#### Returns + +A table containing the following columns: + +Key | Type | Description +--- | ---- | ----------- +timespan_id | Text | The ID of the timespan +timespan_name | Text | A human-readable name for the timespan +timespan_description | Text | Ignored +timespan_weight | Numeric | Ignored +timespan_license | Text | Ignored +timespan_source | Text | Ignored +timespan_aggregate | Text | Ignored +valid_numer | Boolean | True if the `numer_id` argument is a valid numerator for this timespan, False otherwise +valid_denom | Boolean | True if the `denom_id` argument is a valid denominator for this timespan, False otherwise +valid_geom | Boolean | True if the `geom_id` argument is a valid geometry for this timespan, False otherwise + +#### Examples + +Obtain all timespans that are available within a small rectangle. + +```SQL +SELECT * FROM cdb_observatory.OBS_GetAvailableTimespans( + ST_MakeEnvelope(-74, 40, -73, 41, 4326)); +``` + +Obtain all timespans for total population (`us.census.acs.B01003001`). + +```SQL +SELECT * FROM cdb_observatory.OBS_GetAvailableTimespans( + ST_MakeEnvelope(-74, 40, -73, 41, 4326), NULL, 'us.census.acs.B01003001') +WHERE valid_numer IS True; +``` + +Obtain all timespans that work with US states (`us.census.tiger.state`) +as a geometry. + +```SQL +SELECT * FROM cdb_observatory.OBS_GetAvailableTimespans( + ST_MakeEnvelope(-74, 40, -73, 41, 4326), NULL, NULL, NULL, 'us.census.tiger.state') +WHERE valid_geom IS True; +```