2014-08-05 06:29:28 +08:00
Admin0 Geocoder
===============
### Function
Accepts a list of terms. Terms are searched against the `name_` column in `admin0_synonyms`. The `name_` column is automatically cleaned and populated from the raw values in the `name` column. The synonym table returns the matching ISO code, chosen using the rank values in the table below. The ISO code is then matched against the single corresponding row in `ne_admin0_v3` to return the polygon.
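The lookup described above can be sketched with Python's built-in `sqlite3` module standing in for the PostgreSQL database. This is a minimal illustration, not the code in `sql/geocoder.sql`. Three things are assumptions: the `clean_term` rule (lowercase, keep only letters and digits), the `the_geom` column holding a placeholder WKT string, and the rule that the lowest `rank` wins when a term matches several synonyms.

```python
from __future__ import annotations

import re
import sqlite3


def clean_term(term: str) -> str:
    """Hypothetical cleaning rule: lowercase, then drop everything except letters and digits."""
    return re.sub(r"[\W_]+", "", term.lower())


def build_demo_db() -> sqlite3.Connection:
    """Create tiny in-memory stand-ins for admin0_synonyms and ne_admin0_v3."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE admin0_synonyms (name TEXT, name_ TEXT, rank INTEGER, adm0_a3 TEXT)")
    # The geometry is a placeholder WKT string, not real Natural Earth data.
    conn.execute("CREATE TABLE ne_admin0_v3 (adm0_a3 TEXT PRIMARY KEY, the_geom TEXT)")
    synonyms = [("Spain", 0, "ESP"), ("España", 2, "ESP"), ("ESP", 3, "ESP")]
    conn.executemany(
        "INSERT INTO admin0_synonyms VALUES (?, ?, ?, ?)",
        [(name, clean_term(name), rank, iso) for name, rank, iso in synonyms],
    )
    conn.execute("INSERT INTO ne_admin0_v3 VALUES ('ESP', 'POLYGON((0 0, 1 0, 1 1, 0 0))')")
    return conn


def geocode(conn: sqlite3.Connection, terms: list[str]) -> dict[str, str | None]:
    """Map each term to a polygon via its best-ranked synonym (lowest rank assumed best)."""
    result = {}
    for term in terms:
        row = conn.execute(
            """
            SELECT g.the_geom
            FROM admin0_synonyms s
            JOIN ne_admin0_v3 g ON g.adm0_a3 = s.adm0_a3
            WHERE s.name_ = ?
            ORDER BY s.rank
            LIMIT 1
            """,
            (clean_term(term),),
        ).fetchone()
        # Unmatched terms map to None instead of raising.
        result[term] = row[0] if row else None
    return result
```

With this sketch, `geocode(build_demo_db(), ["  Spain ", "esp", "Atlantis"])` resolves the first two terms to the Spain placeholder polygon and returns `None` for `Atlantis`.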
### Creation steps
1. Upload fresh NaturalEarth data to `ne_admin0_v3`.
2. Delete all rows in the `admin0_synonyms` table.
3. If starting fresh, run both `sql/indexes.sql` and `sql/triggers.sql`.
4. Upload the `data/wikipedia_countries_native_names.csv` table if it doesn't already exist.
5. Run `sql/subdivide_polygons.sql`.
6. Run `sql/build_synonym_table.sql`.
7. If needed, load or replace the function with `sql/geocoder.sql`.
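The SQL-file steps above can be scripted. The sketch below only assembles the ordered list of `psql` commands; it assumes each file is run with `psql -d <dbname> -v ON_ERROR_STOP=1 -f <file>`, where `dbname` is a hypothetical argument, and it leaves out the manual steps (uploading the NaturalEarth data and the CSV, and clearing `admin0_synonyms`). The `build_plan` and `psql_commands` helpers are illustrative, not part of this repository.

```python
from __future__ import annotations


def build_plan(fresh: bool, reload_function: bool) -> list[str]:
    """Return the SQL files to run, in the order given by the creation steps."""
    files = []
    if fresh:
        # Step 3: only needed when starting fresh.
        files += ["sql/indexes.sql", "sql/triggers.sql"]
    # Steps 5 and 6 always run, in this order.
    files += ["sql/subdivide_polygons.sql", "sql/build_synonym_table.sql"]
    if reload_function:
        # Step 7: load or replace the geocoder function.
        files.append("sql/geocoder.sql")
    return files


def psql_commands(dbname: str, files: list[str]) -> list[list[str]]:
    """Turn each file into a psql invocation that stops at the first error."""
    return [["psql", "-d", dbname, "-v", "ON_ERROR_STOP=1", "-f", path] for path in files]
```

Nothing is executed by these helpers; to run the plan, pass each command to `subprocess.run(cmd, check=True)` so that a failing step stops the build.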
### Data Sources
(see the wiki page: [Geocoder Data Sources #admin0-countries](https://github.com/CartoDB/data-services/wiki/Geocoder-Datasources#admin0-countries))
- Natural Earth data: `ne_10m_admin_0_countries` (version 3.0), currently stored in Geocoding.CartoDB as `ne_admin0_v3`.
- Native-language spellings were gathered from http://en.wikipedia.org/wiki/List_of_countries_and_dependencies_and_their_capitals_in_native_languages and stored in `data/wikipedia_countries_native_names.csv`.
### Preparation details
Users dislike the NaturalEarth aggregation of French regions into the mainland France polygon, so we have done a minimal amount of subdivision. It is applied by executing `sql/subdivide_polygons.sql`.
## Admin0_synonyms
To add new admin0 entries manually, use the [admin0_synonym_additions](https://geocoding.cartodb.com/tables/admin0_synonym_additions) table, which was created for this purpose.
The table contains the following columns to be populated:
1. **adm0_a3**: ISO code for the region, used to fetch the region's unique geometry for the synonym.
2. **name**: The synonym you want to add for the region identified by `adm0_a3`.
3. **notes**: Extra information, such as the source of the data. Use the form `'data source: X'`.
4. **rank**: Use `10` for manually curated additions.
The following query can be used:
````sql
-- Replace the <...> placeholders with real values
INSERT INTO admin0_synonym_additions (adm0_a3, name, notes, rank) VALUES ('<iso3_code>', '<synonym>', 'data source: <source>', 10);
````
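When adding entries from a script, bind the values as query parameters instead of pasting them into the SQL string, so a name such as `Cote d'Ivoire` cannot break the statement. The sketch below uses Python's `sqlite3` (`?` placeholders) as a stand-in for the PostgreSQL table; a PostgreSQL driver uses its own placeholder syntax. The `add_synonym` helper and its checks (three uppercase letters for `adm0_a3`, notes in the `data source: X` form, rank fixed at 10) are illustrative assumptions drawn from the column descriptions above.

```python
import re
import sqlite3

MANUAL_RANK = 10  # rank used for manually curated additions


def add_synonym(conn: sqlite3.Connection, adm0_a3: str, name: str, source: str) -> None:
    """Insert one manually curated synonym after basic sanity checks."""
    if not re.fullmatch(r"[A-Z]{3}", adm0_a3):
        raise ValueError(f"adm0_a3 must be three uppercase letters, got {adm0_a3!r}")
    if not name.strip():
        raise ValueError("name must not be empty")
    notes = f"data source: {source}"
    # The driver binds the parameters, so quotes inside `name` are handled safely.
    conn.execute(
        "INSERT INTO admin0_synonym_additions (adm0_a3, name, notes, rank) VALUES (?, ?, ?, ?)",
        (adm0_a3, name, notes, MANUAL_RANK),
    )
```

The checks are deliberately shallow; they catch obvious typos before a row reaches the table but do not confirm that `adm0_a3` exists in `ne_admin0_v3`.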
**Note:** If you have a complete dataset of synonyms to include, add it as part of the build script. Single synonym entries can instead be added to the `admin0_synonym_additions` table manually or with the query above.
### Ranks
| rank number | origin data | origin column | description |
|-------------|-----------------------------|---------------|----------------------|
| 0 | natural earth 10m countries | name | literal name |
| 1 | natural earth 10m countries | name_alt | alternate name |
| 2 | wikipedia country native names | country_endonym | local variation |
| 3 | natural earth 10m countries | adm0_a3 | 3-letter country code |
| 4 | natural earth 10m countries | iso_a2 | 2-letter country code |
| 5 | natural earth 10m countries | formal_en | formal English name |
| 6 | natural earth 10m countries | brk_name | ? |
| 7 | natural earth 10m countries | formal_fr | formal French name |
| 8 | natural earth 10m countries | abbrev | abbreviation |
| 9 | natural earth 10m countries | subunit | complete literal name |
| 10 | admin0_synonym_additions | n/a | manually curated additions |
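
The rank table can be read as a mapping from source column to rank. The sketch below applies the Natural Earth part of that mapping (ranks 0, 1 and 3 to 9) to a single country record to produce candidate synonym rows. It is an illustration, not a transcription of `sql/build_synonym_table.sql`; the sample record is made up, and skipping empty values and keeping only the lowest rank for a repeated value are assumptions.

```python
from __future__ import annotations

# Natural Earth column -> rank, from the table above (ranks 2 and 10 come from other tables).
NE_RANKS = {
    "name": 0,
    "name_alt": 1,
    "adm0_a3": 3,
    "iso_a2": 4,
    "formal_en": 5,
    "brk_name": 6,
    "formal_fr": 7,
    "abbrev": 8,
    "subunit": 9,
}


def synonym_rows(record: dict[str, str | None]) -> list[tuple[str, int, str]]:
    """Return (name, rank, adm0_a3) rows for one Natural Earth country record."""
    best: dict[str, int] = {}
    for column, rank in sorted(NE_RANKS.items(), key=lambda item: item[1]):
        value = (record.get(column) or "").strip()
        # Skip empty values; a repeated value keeps its first (lowest) rank.
        if value and value not in best:
            best[value] = rank
    return [(value, rank, record["adm0_a3"]) for value, rank in best.items()]
```

For a made-up record where `name`, `brk_name` and `subunit` are all `Spain`, only the rank 0 row for `Spain` survives, which is the behaviour you would want if the lowest rank is treated as the strongest match.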

__notes:__
- The column `adm0_a3` is used as a unique identifier.
- The ranks are somewhat arbitrarily ordered and should be revised based on how our users actually use the geocoder (for example, will users more commonly geocode an `adm0_a3` code or an abbreviation?).
- I also forgot to assign a `rank` of `2` to a synonym.
# Admin0 Synonym Service
If you need to look up the ISO codes for a list of countries without returning any geometries, you can use the `admin0_synonym_lookup` function defined in `sql/synonym_service.sql`. For example:
```sql
-- Look up ISO codes for a country name and a 3-letter code, without geometries
SELECT (admin0_synonym_lookup(ARRAY['United States', 'ESP'])).*;
```
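When this query is assembled as a string (for example, to paste into the CartoDB SQL editor), each term must become a properly quoted SQL string literal, and a name like `Cote d'Ivoire` contains an apostrophe. The `build_lookup_sql` helper below is a hypothetical sketch that doubles single quotes, the standard SQL escape; it assumes `standard_conforming_strings` is on, which is the PostgreSQL default since 9.1. When calling the function from code, prefer your database driver's parameter binding instead.

```python
from __future__ import annotations


def build_lookup_sql(terms: list[str]) -> str:
    """Build a SELECT over admin0_synonym_lookup with each term as a quoted SQL literal."""
    if not terms:
        # PostgreSQL rejects ARRAY[] without an explicit type cast.
        raise ValueError("terms must not be empty")
    # Standard SQL escaping: a single quote inside a literal is written twice.
    literals = ", ".join("'" + term.replace("'", "''") + "'" for term in terms)
    return f"SELECT (admin0_synonym_lookup(ARRAY[{literals}])).*;"
```

For example, `build_lookup_sql(["Cote d'Ivoire", "ESP"])` returns `SELECT (admin0_synonym_lookup(ARRAY['Cote d''Ivoire', 'ESP'])).*;`.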