Administrative regions geocoder - Level 1
============
# Function
Accepts a list of terms. Each term is searched against the ```name_``` column in ```admin1_synonyms_qs```. The ```name_``` column is populated automatically with a cleaned version of the raw value in ```name```. The synonym table returns the proper global_id, chosen according to the rank values in the Ranks table below. The global_id is then matched against its single row in the **adm1** table to return the correct polygon(s).
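
The lookup described above can be sketched in a few lines of Python. This is only an illustration of the flow (clean the term, pick the best-ranked synonym, fetch the polygon by global_id), not the production SQL. The cleaning rule, the helper names and every row of data are assumptions made up for the example.

```python
import re

# Toy stand-ins for the synonym table and the adm1 table.
# Every row below is invented for illustration only.
SYNONYMS = [
    {"name_": "california", "rank": 0, "global_id": 101},
    {"name_": "ca", "rank": 7, "global_id": 101},
    {"name_": "ca", "rank": 3, "global_id": 202},
]
ADM1 = {101: "POLYGON(...) for id 101", 202: "POLYGON(...) for id 202"}


def clean(term):
    """Assumed cleaning rule: lowercase, then drop punctuation and spaces."""
    return re.sub(r"[\W_]+", "", term.lower())


def lookup_global_id(term):
    """Return the global_id of the best-ranked synonym (0 is highest)."""
    key = clean(term)
    matches = [row for row in SYNONYMS if row["name_"] == key]
    if not matches:
        return None
    return min(matches, key=lambda row: row["rank"])["global_id"]


def geocode(terms):
    """Map each input term to its polygon, or None when nothing matches."""
    return {term: ADM1.get(lookup_global_id(term)) for term in terms}
```

With the toy rows above, `geocode(["California", "C.A."])` resolves the first term through its rank 0 synonym and the second through the rank 3 row, because a lower rank number wins when several synonyms share the same cleaned name.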
# Creation steps
1. Upload fresh Quattro Shapes admin1 data to `qs_adm1` table.
2. Upload fresh Quattro Shapes admin1 **region** data to `qs_adm1_region` table.
3. Upload fresh Natural Earth admin1 states provinces data to the `ne_admin1_v3` table.
4. If fresh, add all `sql/indexes.sql` and `sql/triggers.sql`.
5. Run the `sql/build_data_table.sql` script.
6. Run the `sql/build_admin1_synonyms.sql` script.
7. If needed, load or replace the function with `sql/geocoder.sql`.
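
As a rough aid, the script-running part of the list above could be turned into an ordered set of `psql` invocations. The sketch below only builds the command strings and does not execute anything. The helper name, the `fresh` and `reload_function` flags and the database name are hypothetical, and the upload steps are assumed to have been done beforehand.

```python
import shlex

# (script, condition) pairs in the order given by the creation steps.
# "fresh" scripts run only on a fresh install, "reload" only when the
# geocoder function needs to be loaded or replaced.
BUILD_SCRIPTS = [
    ("sql/indexes.sql", "fresh"),
    ("sql/triggers.sql", "fresh"),
    ("sql/build_data_table.sql", "always"),
    ("sql/build_admin1_synonyms.sql", "always"),
    ("sql/geocoder.sql", "reload"),
]


def psql_commands(dbname, fresh=False, reload_function=False):
    """Return the psql command lines to run, in order, for this build."""
    enabled = {"always"}
    if fresh:
        enabled.add("fresh")
    if reload_function:
        enabled.add("reload")
    return [
        # ON_ERROR_STOP makes psql abort a script at the first failing statement.
        shlex.join(["psql", "-d", dbname, "-v", "ON_ERROR_STOP=1", "-f", script])
        for script, condition in BUILD_SCRIPTS
        if condition in enabled
    ]
```

For example, `psql_commands("geocoder_db", fresh=True, reload_function=True)` returns five commands, while the default call returns only the two build scripts.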
# Tables
Some tables are involved in building this geocoder but are not used by the geocoder functions: `qs_adm1`, `ne_admin1_v3` and `qs_adm1_region`. They are obtained directly from the source.
### admin1_synonyms
#### Table structure
````
Table "public.admin1_synonyms"
Column | Type | Modifiers | Storage | Stats target | Description
----------------------+--------------------------+-----------------------------------------------------------------------+----------+--------------+-------------
cartodb_id | integer | not null default nextval('untitled_table_1_cartodb_id_seq'::regclass) | plain | |
name | text | | extended | |
rank | double precision | | plain | |
created_at | timestamp with time zone | not null default now() | plain | |
updated_at | timestamp with time zone | not null default now() | plain | |
the_geom | geometry(Geometry,4326) | | main | |
the_geom_webmercator | geometry(Geometry,3857) | | main | |
adm0_a3 | text | | extended | |
name_ | text | | extended | |
global_id | integer | | plain | |
````
#### Current indexes
````
Indexes:
"untitled_table_1_pkey" PRIMARY KEY, btree (cartodb_id)
"idx_admin1_synonyms_name" btree (name)
"idx_admin1_synonyms_name_" btree (name_)
"idx_admin1_synonyms_name_adm0" btree (name_, adm0_a3)
"idx_admin1_synonyms_rank" btree (rank)
"untitled_table_1_the_geom_idx" gist (the_geom)
"untitled_table_1_the_geom_webmercator_idx" gist (the_geom_webmercator)
````
### adm1
#### Table structure
````
Table "public.adm1"
Column | Type | Modifiers | Storage | Stats target | Description
----------------------+--------------------------+------------------------------------------------------------------------+----------+--------------+-------------
cartodb_id | integer | not null default nextval('untitled_table_2_cartodb_id_seq1'::regclass) | plain | |
name | text | | extended | |
description | text | | extended | |
created_at | timestamp with time zone | not null default now() | plain | |
updated_at | timestamp with time zone | not null default now() | plain | |
the_geom | geometry(Geometry,4326) | | main | |
the_geom_webmercator | geometry(Geometry,3857) | | main | |
global_id | integer | | plain | |
````
#### Current indexes
````
Indexes:
"untitled_table_2_pkey1" PRIMARY KEY, btree (cartodb_id)
"untitled_table_2_the_geom_idx1" gist (the_geom)
"untitled_table_2_the_geom_webmercator_idx1" gist (the_geom_webmercator)
````
### admin1_decoder - planned deprecation
#### Table structure
`````
Table "public.admin1_decoder"
Column | Type | Modifiers | Storage | Stats target | Description
----------------------+--------------------------+---------------------------------------------------------------------+----------+--------------+-------------
name | text | | extended | |
admin1 | text | | extended | |
iso2 | text | | extended | |
geoname_id | integer | | plain | |
cartodb_id | integer | not null default nextval('admin1_decoder_cartodb_id_seq'::regclass) | plain | |
created_at | timestamp with time zone | not null default now() | plain | |
updated_at | timestamp with time zone | not null default now() | plain | |
the_geom | geometry(Geometry,4326) | | main | |
the_geom_webmercator | geometry(Geometry,3857) | | main | |
synonyms | text[] | | extended | |
iso3 | text | | extended | |
users | double precision | | plain | |
`````
#### Current indexes
`````
Indexes:
"admin1_decoder_pkey" PRIMARY KEY, btree (cartodb_id)
"admin1_decoder_the_geom_idx" gist (the_geom)
"idx_admin1_decoder_admin1" btree (admin1)
"idx_admin1_decoder_geoname_id" btree (geoname_id)
"idx_admin1_decoder_iso2" btree (iso2)
"idx_admin1_decoder_iso3" btree (iso3)
"idx_admin1_decoder_name" btree (name)
"the_geom_webmercator_3b4ba2fe_9d91_11e3_bb72_7054d21a95e5" gist (the_geom_webmercator)
`````
# Functions
## test_geocode_admin1_polygons
````
Schema | Name | Result data type | Argument data types | Type
--------+------------------------------+--------------------------------+--------------------------------+--------
public | test_geocode_admin1_polygons | SETOF geocode_admin_country_v1 | names text[], country text[] | normal
public | test_geocode_admin1_polygons | SETOF geocode_admin_v1 | name text[] | normal
public | test_geocode_admin1_polygons | SETOF geocode_admin_country_v1 | name text[], inputcountry text | normal
````
## geocode_admin1_polygons
````
Schema | Name | Result data type | Argument data types | Type
--------+-------------------------+--------------------------------+--------------------------------+--------
public | geocode_admin1_polygons | SETOF geocode_admin_country_v1 | names text[], country text[] | normal
public | geocode_admin1_polygons | SETOF geocode_admin_v1 | name text[] | normal
public | geocode_admin1_polygons | SETOF geocode_admin_v1 | name text[], inputcountry text | normal
````
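
The signatures listed above suggest three ways of calling `geocode_admin1_polygons`. A minimal sketch of a client-side helper that picks the matching overload follows. It only returns the SQL text and parameters, using `%s` placeholders in the style of DB-API drivers such as psycopg2. The helper name is hypothetical, and whether the country argument expects a name or an ISO code is not stated here, so the example values are assumptions.

```python
def geocode_admin1_query(names, country=None):
    """Return (sql, params) for one of the documented overloads.

    names   -- list of admin1 names, passed as text[]
    country -- None, a single country string, or a list of countries
    """
    if country is None:
        # geocode_admin1_polygons(name text[])
        return "SELECT * FROM geocode_admin1_polygons(%s::text[])", (list(names),)
    if isinstance(country, str):
        # geocode_admin1_polygons(name text[], inputcountry text)
        return (
            "SELECT * FROM geocode_admin1_polygons(%s::text[], %s::text)",
            (list(names), country),
        )
    # geocode_admin1_polygons(names text[], country text[])
    return (
        "SELECT * FROM geocode_admin1_polygons(%s::text[], %s::text[])",
        (list(names), list(country)),
    )
```

The explicit casts are there so that PostgreSQL can resolve the overload unambiguously. The columns of the returned `geocode_admin_v1` and `geocode_admin_country_v1` types are not described in this document, so the sketch selects `*`.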
# Data Sources
(see the wiki page: [Geocoder Data Sources #admin1-states-provinces](https://github.com/CartoDB/data-services/wiki/Geocoder-Datasources#admin1-statesprovinces))
- Quattro Shapes admin1 and admin1 region polygons are used for geometry. Users dislike Natural Earth's small admin1 units in countries like Spain, Italy and France, so these smaller units have been replaced with their parent regions.
- Natural Earth admin1 alternate name spellings will be used as synonyms when the Quattro Shapes `qs_source` = 'Natural Earth'.
# Admin1 Geometry Table
The table is currently named `adm1` and is built from a combination of two Quattro Shapes tables: `qs_adm1` and `qs_adm1_region`. All countries that contain regional admin1 provinces use geometry from `qs_adm1_region` rather than `qs_adm1`. This improvement is based on tickets/issues submitted by users when geocoding admin1 states/provinces.
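
The selection rule can be illustrated with a small sketch: any country that appears in the region data contributes its region rows, and all other countries fall back to their `qs_adm1` rows. The function name, the use of an `adm0_a3` key on the source rows and the sample rows in the usage note are assumptions for the example. The real build is done by `sql/build_data_table.sql`.

```python
def pick_geometry_rows(qs_adm1_rows, qs_adm1_region_rows):
    """Combine the two sources, preferring region rows per country."""
    # Countries that have regional admin1 units.
    region_countries = {row["adm0_a3"] for row in qs_adm1_region_rows}
    # Keep qs_adm1 rows only for countries without region data.
    fallback = [row for row in qs_adm1_rows if row["adm0_a3"] not in region_countries]
    return list(qs_adm1_region_rows) + fallback
```

Given invented rows such as a province and a region for one country plus a state for another, the province is dropped in favour of the region while the state is kept unchanged.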
# Admin1 Synonyms Table
The table contains the following columns to be populated:
1. **adm0_a3**: ISO code for the region. Used to get the unique geometry for the region that the synonym refers to.
2. **name**: The synonym you want to include for a specific region (identified by adm0_a3).
3. **rank**: Rank of the synonym being matched to. 0 is highest.
4. **global_id**: Unique identifier created in `build_data_table.sql`.
# Ranks
| rank number | origin data | origin column | description |
|-------------|-----------------------------|---------------|----------------------|
| 0 | Quattro Shapes | qs_a1 | default name for qs_adm1 |
| 0 | Quattro Shapes | qs_a1r | default name for qs_adm1_region |
| 1 | Quattro Shapes | qs_a1_lc | admin code |
| 2 | Natural Earth | name_alt | alternate spelling |
| 3 | Natural Earth | abbrev | abbreviation |
| 4 | Natural Earth | postal | postal code |
| 5 | Natural Earth | gn_name | formal english name |
| 6 | Natural Earth | woe_label | woe label name |
| 7 | TIGER | stusps | Abbreviation (USA only) |
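
To make the rank table concrete, here is a minimal sketch of how one source record could be expanded into ranked synonym rows. The column-to-rank mapping is taken from the table above. The function name, the dictionary-shaped record and the sample values in the usage note are assumptions, and the actual rows are produced by `sql/build_admin1_synonyms.sql`.

```python
# (rank, origin column) pairs, copied from the rank table.
RANKED_COLUMNS = [
    (0, "qs_a1"),
    (0, "qs_a1r"),
    (1, "qs_a1_lc"),
    (2, "name_alt"),
    (3, "abbrev"),
    (4, "postal"),
    (5, "gn_name"),
    (6, "woe_label"),
    (7, "stusps"),
]


def synonym_rows(record, global_id, adm0_a3):
    """Build one synonym row per non-empty origin column, in rank order."""
    rows = []
    for rank, column in RANKED_COLUMNS:
        value = record.get(column)
        if value:  # skip missing or empty source values
            rows.append(
                {"adm0_a3": adm0_a3, "name": value, "rank": rank, "global_id": global_id}
            )
    return rows
```

A record carrying only `qs_a1`, `postal` and `stusps` values would yield three rows with ranks 0, 4 and 7, all pointing at the same `global_id`.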
# Known issues
* The `admin1_decoder` table, which is meant to be deprecated, is still used by other geocoders, such as namedplaces.
# History
* [24/06/2015]:
  * Updated README: added information on tables, functions and indexes, plus a known issues section
  * Reviewed functions and [uploaded the ones in production](https://github.com/CartoDB/data-services/pull/151)