data-services/geocoder/postal-codes/README.md
Greg Allen 786ead424b update french postalcode table name
the french postal code table name has changed to `codes_postaux_region`
updated documentation for clarity
😎
2015-12-02 13:27:37 -05:00

11 KiB

Postal code geocoder

This section is divided in geocoding postal codes as points or as polygons. Each option has its own sources and its own functions which are described below.

Postal code geocoder: Polygons

Function

By following the next steps a table is populated with zipcodes from Australia, Canada, USA and France (identified by iso3) related with their spatial location in terms of polygons.

Usage example

SELECT (geocode_postalcode_polygons(Array['11211'],Array['USA'])).*

Creation steps

  1. Import the four files attached in the section "Datasources" for Australia (doc table), Canada (gfsa000a11a_e table), USA (tl_2013_us_zcta510 table) and France (codes_postaux table, renamed from codes_postaux_region).

  2. Run sql/build_data_table.sql. Notice that table postal_code_polygons should exist in advance with columns: the_geom, adm0_a3 and postal_code.

Tables

postal_code_polygons

Table structure

                                                              Table "public.postal_code_polygons"
        Column        |           Type           |                               Modifiers                               | Storage  | Stats target | Description
----------------------+--------------------------+-----------------------------------------------------------------------+----------+--------------+-------------
 cartodb_id           | integer                  | not null default nextval('untitled_table_2_cartodb_id_seq'::regclass) | plain    |              |
 postal_code          | text                     |                                                                       | extended |              |
 adm0_a3              | text                     |                                                                       | extended |              |
 created_at           | timestamp with time zone | not null default now()                                                | plain    |              |
 updated_at           | timestamp with time zone | not null default now()                                                | plain    |              |
 the_geom             | geometry(Geometry,4326)  |                                                                       | main     |              |
 the_geom_webmercator | geometry(Geometry,3857)  |                                                                       | main     |              |

Current indexes

Indexes:
    "untitled_table_2_pkey" PRIMARY KEY, btree (cartodb_id)
    "idx_postal_code_polygons_a3_code" UNIQUE, btree (adm0_a3, postal_code)
    "untitled_table_2_the_geom_idx" gist (the_geom)
    "untitled_table_2_the_geom_webmercator_idx" gist (the_geom_webmercator)

geocode_postalcode_polygons

 Schema |            Name             |          Result data type           |          Argument data types          |  Type  
--------+-----------------------------+-------------------------------------+---------------------------------------+--------
 public | geocode_postalcode_polygons | SETOF geocode_postalint_country_v1  | code integer[], inputcountries text[] | normal
 public | geocode_postalcode_polygons | SETOF geocode_namedplace_v1         | code text[]                           | normal
 public | geocode_postalcode_polygons | SETOF geocode_namedplace_country_v1 | code text[], inputcountries text[]    | normal
 public | geocode_postalcode_polygons | SETOF geocode_namedplace_v1         | code text[], inputcountry text        | normal

Response data types

  • geocode_namedplace_country_v1: CREATE TYPE geocode_namedplace_country_v1 AS (q TEXT, c TEXT, geom GEOMETRY, success BOOLEAN);
  • geocode_namedplace_v1: CREATE TYPE geocode_namedplace_v1 AS (q TEXT, geom GEOMETRY, success BOOLEAN);
  • geocode_postalint_country_v1: CREATE TYPE geocode_postalint_country_v1 AS (q INT, c TEXT, geom GEOMETRY, success BOOLEAN);

Data Sources

Postal code geocoder: Points

Function

By following the next steps a table is populated with zipcodes of different countries (identified by iso3) related with their spatial location in terms of points.

This dataset includes data for the following countries:

CH, ES, GU, ZA, MX, SJ, NL, RU, AX, TH, AR, MY, RE, LK, GB, IS, GL, JE, DK, IN,
SI, GP, MQ, BR, SM, BG, NZ, MP, CZ, DO, MD, PK, TR, VI, BD, GG, LT, PM, MC, US,
IT, LU, SK, LI, PR, IM, NO, PT, PL, FI, JP, CA, DE, HU, PH, SE, VA, YT, MK, FR,
MH, RO, FO, GF, AD, HR, DZ, GT, AU, AS, BE, AT

Usage example

SELECT (geocode_postalcode_points(Array['03204'],Array['ESP'])).*

Creation steps

  1. Download the allCountries.zip file from GeoNames. Import and rename the table as tmp_zipcode_points. You can follow the manual process explained below instead.

The columns that are loaded are the following ones: field_1: corresponding to ISO2 field_10: corresponds to latitude field_11: corresponds to longitude field_2: corresponds to ZIP code

  1. Georeference the table using field11 as longitude and field10 as latitude in order to construct the_geom.

  2. Add column iso3 (text) and run sql/build_zipcode_points_table.sql.

Alternative manual process for importing and preprocessing

  1. Open the allCountries.txt file with Excel an add a new row on top. Delete columns C-I and L.

  2. In the first row, add the following columns: iso2, zipcode, lat, long.

  3. Import the file ignoring step 2.

Tables

postal_code_points

Table structure

                                                                Table "public.postal_code_points"
        Column        |           Type           |                               Modifiers                                | Storage  | Stats target | Description
----------------------+--------------------------+------------------------------------------------------------------------+----------+--------------+-------------
 cartodb_id           | integer                  | not null default nextval('untitled_table_2_cartodb_id_seq2'::regclass) | plain    |              |
 adm0_a3              | text                     |                                                                        | extended |              |
 postal_code          | text                     |                                                                        | extended |              |
 created_at           | timestamp with time zone | not null default now()                                                 | plain    |              |
 updated_at           | timestamp with time zone | not null default now()                                                 | plain    |              |
 the_geom             | geometry(Geometry,4326)  |                                                                        | main     |              |
 the_geom_webmercator | geometry(Geometry,3857)  |                                                                        | main     |              |

Current indexes

    "untitled_table_2_pkey2" PRIMARY KEY, btree (cartodb_id)
    "untitled_table_2_the_geom_idx2" gist (the_geom)
    "untitled_table_2_the_geom_webmercator_idx2" gist (the_geom_webmercator)

geocode_postalcode_points

 Schema |           Name            |          Result data type          |          Argument data types          |  Type  
--------+---------------------------+------------------------------------+---------------------------------------+--------
 public | geocode_postalcode_points | SETOF geocode_postalint_country_v1 | code integer[], inputcountries text[] | normal
 public | geocode_postalcode_points | SETOF geocode_namedplace_v1        | code text[]                           | normal
 public | geocode_postalcode_points | SETOF geocode_place_country_iso_v1 | code text[], inputcountries text[]    | normal
 public | geocode_postalcode_points | SETOF geocode_namedplace_v1        | code text[], inputcountry text        | normal

Data Sources

Geocoder coverage map

Map

Known issues:

Known deficiencies of the service

  • For the USA polygon zipcode service, Zipcode Tabulation Areas (ZCTA) are being used which don't correspond to actual zipcode regions.

  • Regarding the point geocoder service, being offered from GeoNames data: we've detected that the accuracy for a big section of zipcodes is not as good as intended, as GeoNames interpolates zipcode-populated place information. As an example, in the case of Madrid, Spain, all the zipcodes belonging to the city are geocoded in the centroid of the city itself. This issue can be spotted easily by comparing interesecting zipcode points: SELECT the_geom, the_geom_webmercator, COUNT(*), adm0_a3 FROM postal_code_points GROUP BY the_geom, the_geom_webmercator, adm0_a3 HAVING COUNT(*) > 1 order by count(*)

    In this case, we conclude that most affected countries are Portugal, Mexico, Spain, Netherlands, Czech Republic or Slovakia, meanwhile Brazil doesn't show intersecting values.

    The visual result of intersecting zipcodes is demonstrated in the following figure: Duplicates

Historic:

  • [01/12/2015]:
    • Removed geocoder function. Check /extensions instead.
  • [30/10/2015]:
  • [19/10/2015]:
    • Updates readme with usage examples and setup scripts
  • [08/10/2015]:
    • Adds response data types
  • [15/07/2015]:
    • Adds basic tests for postal codes polygons
  • [02/07/2015]:
    • Adds known deficiencies and coverage map
  • [24/06/2015]:
    • Updated readme.md: added information about tables, function, indexes and the known issues section.
    • Review and upload functions