From 1f7f8015ad3dbb896278b857226e1f7ec7c4ec76 Mon Sep 17 00:00:00 2001
From: Mario de Frutos
Date: Thu, 10 Aug 2017 13:29:42 +0200
Subject: [PATCH 1/2] OBS_MetadataValidation doc

---
 doc/measures_functions.md | 60 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 57 insertions(+), 3 deletions(-)

diff --git a/doc/measures_functions.md b/doc/measures_functions.md
index 88794e4..dfa0261 100644
--- a/doc/measures_functions.md
+++ b/doc/measures_functions.md
@@ -108,7 +108,7 @@ The ```OBS_GetMeasure(polygon, measure_id)``` function returns any Data Observat
 Name |Description
 --- | ---
 polygon_geometry | a WGS84 polygon geometry (the_geom)
-measure_id | a measure identifier from the Data Observatory ([see available measures](https://cartodb.github.io/bigmetadata/observatory.pdf))
+measure_id | a measure identifier from the Data Observatory ([see available measures](https://cartodb.github.io/bigmetadata/observatory.pdf))
 normalize | for measures that are **sums** (e.g. population) the default normalization is 'none' and response comes back as a raw value. Other options are 'denominator', which will use the denominator specified in the [Data Catalog](https://cartodb.github.io/bigmetadata/index.html) (optional)
 boundary_id | source of geometries to pull measure from (e.g., 'us.census.tiger.census_tract')
 time_span | time span of interest (e.g., 2010 - 2014)
@@ -143,7 +143,7 @@ The ```OBS_GetMeasureById(geom_ref, measure_id, boundary_id)``` function returns
 Name |Description
 --- | ---
 geom_ref | a geometry reference (e.g., a US Census geoid)
-measure_id | a measure identifier from the Data Observatory ([see available measures](https://cartodb.github.io/bigmetadata/observatory.pdf))
+measure_id | a measure identifier from the Data Observatory ([see available measures](https://cartodb.github.io/bigmetadata/observatory.pdf))
 boundary_id | source of geometries to pull measure from (e.g., 'us.census.tiger.census_tract')
 time_span (optional) | time span of interest (e.g., 2010 - 2014). If `NULL` is passed, the measure from the most recent data will be used.

@@ -215,7 +215,7 @@ extent | A geometry of the extent of the input geometries
 metadata | A JSON array composed of metadata input objects. Each indicates one desired measure for an output column, and optionally additional parameters about that column
 num_timespan_options | How many historical time periods to include. Defaults to 1
 num_score_options | How many alternative boundary levels to include. Defaults to 1
-target_geoms | Target number of geometries. Boundaries with close to this many objects within `extent` will be ranked highest.
+target_geoms | Target number of geometries. Boundaries with close to this many objects within `extent` will be ranked highest.

 The schema of the metadata input objects are as follows:

@@ -321,6 +321,60 @@ SELECT OBS_GetMeta(
 ) FROM tablename
 ```

+## OBS_MetadataValidation(extent geometry, geometry_type text, metadata json, target_geoms)
+
+The ```OBS_MetadataValidation``` function peforms a validation check over
+the known issues using the extent, type of geometry and metadata we're going
+to use in the ```OBS_GetMeta``` function.
+
+#### Arguments
+
+Name | Description
+---- | -----------
+extent | A geometry of the extent of the input geometries
+geometry_type | The geometry type of the source data.
+metadata | A JSON array composed of metadata input objects. Each indicates one desired measure for an output column, and optionally additional parameters about that column
+target_geoms | Target number of geometries. Boundaries with close to this many objects within `extent` will be ranked highest.
+
+The schema of the metadata input objects are as follows:
+
+Metadata Input Key | Description
+--- | -----------
+numer_id | The identifier for the desired measurement. If left blank, but a `geom_id` is specified, the column will return a geometry instead of a measurement.
+geom_id | Identifier for a desired geographic boundary level to use when calculating measures. Will be automatically assigned if undefined. If defined but `numer_id` is blank, then the column will return a geometry instead of a measurement.
+normalization | The desired normalization. One of 'area', 'prenormalized', or 'denominated'. 'Area' will normalize the measure per square kilometer, 'prenormalized' will return the original value, and 'denominated' will normalize by a denominator. Ignored if this metadata object specifies a geometry.
+denom_id | Identifier for a desired normalization column in case `normalization` is 'denominated'. Will be automatically assigned if necessary. Ignored if this metadata object specifies a geometry.
+numer_timespan | The desired timespan for the measurement. Defaults to most recent timespan available if left unspecified.
+geom_timespan | The desired timespan for the geometry. Defaults to timespan matching numer_timespan if left unspecified.
+target_area | Instead of aiming to have `target_geoms` in the area of the geometry passed as `extent`, fill this area. Unit is square degrees WGS84. Set this to `0` if you want to use the smallest source geometry for this element of metadata, for example if you're passing in points.
+target_geoms | Override global `target_geoms` for this element of metadata
+max_timespan_rank | Only include timespans of this recency (for example, `1` is only the most recent timespan). No limit by default
+max_score_rank | Only include boundaries of this relevance (for example, `1` is the most relevant boundary). Is `1` by default
+
+#### Returns
+
+Key | Description
+--- | -----------
+valid | A boolean field that represents if the validation was succesful or not
+errors | A text array with all the possible errors.
+
+#### Examples
+
+Validate metadata with two additional column of US census
+data, using a boundary relevant for the geometry provided and latest timespan.
+Limit to only the most recent column most relevant to the extent & density of
+input geometries in `tablename`.
+
+```SQL
+SELECT OBS_MetadataValidation(
+  ST_SetSRID(ST_Extent(the_geom), 4326),
+  ST_GeometryType(the_geom),
+  '[{"numer_id": "us.census.acs.B01003001"}, {"numer_id": "us.census.acs.B01001002"}]',
+  COUNT(*)::INTEGER
+) FROM tablename
+GROUP BY ST_GeometryType(the_geom)
+```
+
 ## OBS_GetData(geomvals array[geomval], metadata json)

 The ```OBS_GetData(geomvals, metadata)``` function returns a measure and/or

From 7e550cf909a19507c7534ffb44058a2f67cd82c5 Mon Sep 17 00:00:00 2001
From: csobier
Date: Fri, 11 Aug 2017 08:00:00 -0400
Subject: [PATCH 2/2] applied quick copyedit to new docs code added

---
 doc/measures_functions.md | 37 ++++++++++++++++---------------------
 1 file changed, 16 insertions(+), 21 deletions(-)

diff --git a/doc/measures_functions.md b/doc/measures_functions.md
index dfa0261..6e6c2b7 100644
--- a/doc/measures_functions.md
+++ b/doc/measures_functions.md
@@ -323,47 +323,42 @@ SELECT OBS_GetMeta(

 ## OBS_MetadataValidation(extent geometry, geometry_type text, metadata json, target_geoms)

-The ```OBS_MetadataValidation``` function peforms a validation check over
-the known issues using the extent, type of geometry and metadata we're going
-to use in the ```OBS_GetMeta``` function.
+The ```OBS_MetadataValidation``` function performs a validation check over the known issues using the extent, type of geometry, and metadata that is being used in the ```OBS_GetMeta``` function.

 #### Arguments

 Name | Description
 ---- | -----------
 extent | A geometry of the extent of the input geometries
-geometry_type | The geometry type of the source data.
-metadata | A JSON array composed of metadata input objects. Each indicates one desired measure for an output column, and optionally additional parameters about that column
-target_geoms | Target number of geometries. Boundaries with close to this many objects within `extent` will be ranked highest.
+geometry_type | The geometry type of the source data
+metadata | A JSON array composed of metadata input objects. Each indicates one desired measure for an output column, and optional additional parameters about that column
+target_geoms | Target number of geometries. Boundaries with close to this many objects within `extent` will be ranked highest

 The schema of the metadata input objects are as follows:

 Metadata Input Key | Description
 --- | -----------
-numer_id | The identifier for the desired measurement. If left blank, but a `geom_id` is specified, the column will return a geometry instead of a measurement.
-geom_id | Identifier for a desired geographic boundary level to use when calculating measures. Will be automatically assigned if undefined. If defined but `numer_id` is blank, then the column will return a geometry instead of a measurement.
-normalization | The desired normalization. One of 'area', 'prenormalized', or 'denominated'. 'Area' will normalize the measure per square kilometer, 'prenormalized' will return the original value, and 'denominated' will normalize by a denominator. Ignored if this metadata object specifies a geometry.
-denom_id | Identifier for a desired normalization column in case `normalization` is 'denominated'. Will be automatically assigned if necessary. Ignored if this metadata object specifies a geometry.
-numer_timespan | The desired timespan for the measurement. Defaults to most recent timespan available if left unspecified.
-geom_timespan | The desired timespan for the geometry. Defaults to timespan matching numer_timespan if left unspecified.
-target_area | Instead of aiming to have `target_geoms` in the area of the geometry passed as `extent`, fill this area. Unit is square degrees WGS84. Set this to `0` if you want to use the smallest source geometry for this element of metadata, for example if you're passing in points.
+numer_id | The identifier for the desired measurement. If left blank, a `geom_id` is specified and the column returns a geometry, instead of a measurement
+geom_id | Identifier for a desired geographic boundary level used to calculate measures. If undefined, this is automatically assigned. If defined, `numer_id` is blank and the column returns a geometry, instead of a measurement
+normalization | The desired normalization. One of 'area', 'prenormalized', or 'denominated'. 'Area' will normalize the measure per square kilometer, 'prenormalized' will return the original value, and 'denominated' will normalize by a denominator. If the metadata object specifies a geometry, this is ignored
+denom_id | When `normalization` is 'denominated', this is the identifier for a desired normalization column. This is automatically assigned. If the metadata object specifies a geometry, this is ignored
+numer_timespan | The desired timespan for the measurement. If left unspecified, it defaults to the most recent timespan available
+geom_timespan | The desired timespan for the geometry. If left unspecified, it defaults to the timespan matching `numer_timespan`
+target_area | Instead of aiming to have `target_geoms` in the area of the geometry passed as `extent`, fill this area. Unit is square degrees WGS84. Set this to `0` if you want to use the smallest source geometry for this element of metadata. For example, if you are passing in points
 target_geoms | Override global `target_geoms` for this element of metadata
-max_timespan_rank | Only include timespans of this recency (for example, `1` is only the most recent timespan). No limit by default
-max_score_rank | Only include boundaries of this relevance (for example, `1` is the most relevant boundary). Is `1` by default
+max_timespan_rank | Only include timespans of this recency (For example, `1` is only the most recent timespan). There is no limit by default
+max_score_rank | Only include boundaries of this relevance (for example, `1` is the most relevant boundary). The default is `1`

 #### Returns

 Key | Description
 --- | -----------
-valid | A boolean field that represents if the validation was succesful or not
-errors | A text array with all the possible errors.
+valid | A boolean field that represents if the validation was successful or not
+errors | A text array with all possible errors

 #### Examples

-Validate metadata with two additional column of US census
-data, using a boundary relevant for the geometry provided and latest timespan.
-Limit to only the most recent column most relevant to the extent & density of
-input geometries in `tablename`.
+Validate metadata with two additional columns of US census data; using a boundary relevant for the geometry provided and the latest timespan. Limited to the most recent column, and the most relevant, based on the extent and density of input geometries in `tablename`.

 ```SQL
 SELECT OBS_MetadataValidation(