cartodb/doc/developer-center/import-api/guides/08-importing-geospatial-data.md
2020-06-15 10:58:47 +08:00


## Importing Geospatial Data
This section explains how importing a dataset creates columns in CARTO, and which naming conventions you should use. It describes how CARTO guesses content during the import process, lists the supported geospatial formats for uploading data, and explains how to upload multilayer datasets and batches of files.
### Dataset Basics
When a file is imported, it is transformed into a dataset that can be processed by CARTO. The system automatically creates the following columns:
* **cartodb_id**
    * This column is used as the primary key of the table
    * Its values must be integers, non-null, and unique
* **the_geom**
    * This column stores the main geometric features of a dataset in the EPSG 4326 projection
* **the_geom_webmercator**
    * This column stores the geometries transformed into the EPSG 3857 projection, and is used for rendering purposes
* **_feature_count**
    * This column is automatically created when overview representations of data are created (for datasets containing more than 500,000 points)

When a dataset is exported from CARTO, it includes the `cartodb_id` and `the_geom` columns, which are reused if the dataset is imported again. This ensures that re-importing an exported dataset preserves its original content and row order.

If you generate these columns yourself, they must follow the [CARTO table requirements](https://github.com/CartoDB/cartodb-postgresql/blob/master/doc/CartoDB-user-table.rst) for the import to succeed. Importing a dataset that does not meet the requirements (such as a dataset with duplicated integers in its `cartodb_id` column) results in an import failure.
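
If you supply your own `cartodb_id` column, a quick local check can catch the most common cause of failure before uploading. The following Python sketch tests only the three rules listed above (integer, non-null, unique) on a CSV file; the function name is hypothetical, and the check does not cover the rest of the CARTO table requirements.

```python
import csv
import io


def cartodb_id_problems(csv_text):
    """Return problems found in a CSV's cartodb_id column (empty list if none).

    Hypothetical pre-import check: only the integer, non-null and unique
    rules are tested, not every CARTO table requirement.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if not reader.fieldnames or "cartodb_id" not in reader.fieldnames:
        return []  # no user-supplied cartodb_id: CARTO creates the column itself
    problems = []
    seen = set()
    for row_number, row in enumerate(reader, start=1):
        value = (row.get("cartodb_id") or "").strip()
        if value == "":
            problems.append(f"row {row_number}: cartodb_id is null")
            continue
        try:
            identifier = int(value)
        except ValueError:
            problems.append(f"row {row_number}: cartodb_id {value!r} is not an integer")
            continue
        if identifier in seen:
            problems.append(f"row {row_number}: duplicated cartodb_id {identifier}")
        seen.add(identifier)
    return problems
```

An empty list means the column passed these three checks; it does not guarantee that the import succeeds.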
#### Naming
Apply the following naming conventions for datasets in CARTO, and avoid using the reserved words as part of your file names.
* Table names must begin with a letter (a-z). Otherwise, "table_" is prepended to the name
* Column names must begin with a letter (a-z) or an underscore (_)
* Column and table names can have a maximum of 63 characters. Names are trimmed if they exceed this length
##### Reserved Words
Certain words, mainly the PostgreSQL reserved words, are reserved in the system and cannot be used to name columns or datasets. Any name that conflicts with a reserved word is automatically prefixed with an underscore (_).
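
The naming rules above can be sketched in Python as follows. The function names and the short reserved-word set are illustrative assumptions (PostgreSQL reserves many more words), the sketch assumes lowercase input, and CARTO's exact normalization steps may differ. Because the rules do not say what happens to a column name that starts with another character, the sketch simply rejects it.

```python
import re

MAX_IDENTIFIER_LENGTH = 63  # longer names are trimmed

# Illustrative subset only; PostgreSQL reserves many more words.
RESERVED_WORDS = {"all", "from", "order", "select", "table", "user", "where"}


def normalize_table_name(name):
    """Apply the table naming rules; assumes the name is already lowercase."""
    if not re.match(r"[a-z]", name):
        name = "table_" + name  # table names must begin with a letter
    return name[:MAX_IDENTIFIER_LENGTH]


def normalize_column_name(name):
    """Apply the column naming and reserved-word rules to a lowercase name."""
    if not re.match(r"[a-z_]", name):
        # Behavior for this case is not documented, so the sketch refuses it
        raise ValueError(f"column name {name!r} must begin with a-z or _")
    if name in RESERVED_WORDS:
        name = "_" + name  # reserved words get an underscore prefix
    return name[:MAX_IDENTIFIER_LENGTH]
```

For example, `normalize_table_name("2020_sales")` yields `table_2020_sales`, and `normalize_column_name("order")` yields `_order`.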
### Import Guessing
CARTO includes guessing functionality during the import process, which is useful when a file lacks some of the information needed for the upload. The following guessing options are available:
* **Fields guessing**: for files whose format does not include type information (usually CSV files), field guessing options can be enabled. There are two guessing options for these types of files:
    * **Type guessing**: determines the type of imported columns from the text contents of the CSV file. If enabled, it generates numeric and boolean columns when appropriate; otherwise, it uses regular string columns
    * **Quoted fields guessing**: when enabled, double-quoted fields are also used for type guessing; when disabled, double-quoted fields are excluded from type guessing
* **Content guessing**: files that contain country, city, or IP address information can be automatically geocoded by the system if the content guessing option is enabled. This automatic geocoding only occurs if a column does not contain a large proportion of repeated or null values. Content guessing does not require the target columns to be named in a special way (such as "country" or "city"); CARTO inspects the available columns and identifies which of them can be geocoded.

**Tip:** For information about how to configure each guessing option for your import process, see the upload file parameters in the Standard Tables section.
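
To make the difference between the two field guessing options concrete, here is a simplified Python sketch of type guessing for a single column. It takes cells as `(text, was_quoted)` pairs, because Python's `csv` module does not report whether a field was quoted. The boolean spellings, the numeric pattern, the fallback to strings for mixed columns, and the defaults of the hypothetical `type_guessing` and `quoted_fields_guessing` arguments are assumptions, not CARTO's documented rules.

```python
import re

# Assumptions for illustration; CARTO's accepted spellings may differ.
BOOLEAN_SPELLINGS = {"true", "false"}
NUMERIC_PATTERN = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def guess_column_type(cells, type_guessing=True, quoted_fields_guessing=False):
    """Guess "boolean", "numeric" or "string" for one column.

    cells: list of (raw_text, was_quoted) pairs for the column's values.
    """
    if not type_guessing:
        return "string"  # without type guessing, columns stay as strings
    kinds = set()
    for text, quoted in cells:
        if text == "":
            continue  # empty cells do not influence the guess
        if quoted and not quoted_fields_guessing:
            kinds.add("string")  # quoted cells are kept as strings, not type-guessed
        elif text.lower() in BOOLEAN_SPELLINGS:
            kinds.add("boolean")
        elif NUMERIC_PATTERN.fullmatch(text):
            kinds.add("numeric")
        else:
            kinds.add("string")
    # A single consistent kind wins; mixed or all-empty columns become strings
    return kinds.pop() if len(kinds) == 1 else "string"
```

With quoted fields guessing disabled, a column whose values are all written as `"1"`, `"2"` stays a string column; enabling it lets the same column be guessed as numeric.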
### Supported Geospatial Data Formats
CARTO supports several geospatial data formats to upload vector data. The important details of each format, as well as some guidelines to upload your files to CARTO, are defined in this section.
#### Shapefile
The Shapefile format is a multi-file format: it consists of a set of files that share the same name and directory and are differentiated by their extension.
A Shapefile must include at least a .shp file, a .shx file, a .prj file and a .dbf file, which contain the geometry data, the index, the projection information and the attributes, respectively. Other auxiliary files are optional and contain extra information about the Shapefile. Shapefiles must be imported as a single compressed file, in the .zip or .gz format.
**Note:** The Shapefile format has certain limitations that can affect the way that your datasets are exported/imported into CARTO:
* Column names cannot exceed 10 characters. Exporting a dataset with longer column names in this format trims the names
* Date columns only support the date, not the time. Exporting and importing a date column as a Shapefile removes all time information and keeps just the date. If you need to work with date and time data, it is recommended to export/import the information as a string and convert it to a date after importing
* Although the projection of the file should be correctly determined and adjusted from the .prj file, it is recommended to upload Shapefiles in the EPSG 4326 projection
* For improved compatibility, save your Shapefile with UTF-8 encoding before importing it
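
Because a missing mandatory file is an easy mistake when zipping a Shapefile by hand, the following Python sketch lists, for each `.shp` found in a `.zip`, which of the four required extensions are missing. It only inspects file names (not contents, projection, or encoding), and the function name is hypothetical.

```python
import os
import zipfile

REQUIRED_EXTENSIONS = {".shp", ".shx", ".prj", ".dbf"}


def missing_shapefile_parts(zip_source):
    """Map each Shapefile base name in a .zip to its missing mandatory extensions.

    zip_source can be a path or a binary file-like object.
    """
    parts = {}
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            base, ext = os.path.splitext(member)
            parts.setdefault(base, set()).add(ext.lower())
    # Only base names that have a .shp file are treated as Shapefiles
    shapefiles = {base: exts for base, exts in parts.items() if ".shp" in exts}
    if not shapefiles:
        raise ValueError("no .shp file found in the archive")
    return {
        base: sorted(REQUIRED_EXTENSIONS - exts)
        for base, exts in shapefiles.items()
        if REQUIRED_EXTENSIONS - exts
    }
```

An empty result means every Shapefile in the archive includes the .shp, .shx, .prj and .dbf files.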
#### Keyhole Markup Language (KML)
KML is an XML-based format that adds geographic meaning to XML by defining features such as points, polygons, or lines in the EPSG 4326 projection.
KML uses common XML types such as string, boolean, double, or int, so your column types will be respected when your dataset is imported or exported from CARTO.
Each feature is defined as a Placemark element, which usually contains a name, a description, and the geometry itself. If more data columns are required, these fields must be defined inside an ExtendedData element of the Placemark.
In terms of geometric elements, the Point, Polygon, LineString, MultiGeometry and Geometry elements are supported. Different geometry types in the same layer are not supported.
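
As an illustration of the Placemark structure described above, the following Python sketch builds a small KML document with the standard library. It stores extra columns as `Data` elements inside `ExtendedData`; KML 2.2 also allows typed `SchemaData`, and which variant CARTO maps to columns best is not covered here. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # serialize KML elements without a prefix


def _el(parent, tag, text=None, **attrs):
    """Create a namespaced KML child element, optionally with text."""
    element = ET.SubElement(parent, f"{{{KML_NS}}}{tag}", attrs)
    if text is not None:
        element.text = text
    return element


def placemark_kml(name, description, lon, lat, extra):
    """Build a one-Placemark KML document; `extra` becomes ExtendedData fields."""
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = _el(kml, "Document")
    placemark = _el(document, "Placemark")
    _el(placemark, "name", name)
    _el(placemark, "description", description)
    extended = _el(placemark, "ExtendedData")
    for key, value in extra.items():
        data = _el(extended, "Data", name=key)  # one Data element per extra column
        _el(data, "value", str(value))
    point = _el(placemark, "Point")
    _el(point, "coordinates", f"{lon},{lat}")  # KML order is longitude,latitude
    return ET.tostring(kml, encoding="unicode")
```

For example, `placemark_kml("Null Island", "Origin", 0, 0, {"score": 100})` returns a single Point placemark with one extra `score` field.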
#### KMZ
A Keyhole Markup Language Zipped (KMZ) file is a compressed file that contains a KML file and zero or more supporting files (images, icons, overlays or other elements referenced in the KML file). See the [Keyhole Markup Language (KML)](#keyhole-markup-language-kml) section for more information.
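
A KMZ file is an ordinary ZIP archive, so it can be created with Python's `zipfile` module. In this sketch the main document is written first, as `doc.kml` at the archive root, following the common KMZ convention; the `write_kmz` helper and the `files/` folder for supporting files are illustrative assumptions.

```python
import zipfile


def write_kmz(kmz_target, kml_text, assets=None):
    """Package a KML document and optional supporting files into a KMZ archive.

    kmz_target: a path or binary file-like object.
    assets: maps archive paths (e.g. "files/icon.png") to their bytes.
    """
    with zipfile.ZipFile(kmz_target, "w", compression=zipfile.ZIP_DEFLATED) as kmz:
        # By convention the main KML is the first entry, named doc.kml, at the root
        kmz.writestr("doc.kml", kml_text)
        for arcname, data in (assets or {}).items():
            kmz.writestr(arcname, data)
```

Paths referenced from the KML (for example, an icon `href`) should match the archive paths used for the supporting files.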
#### GeoJSON
The GeoJSON format is an extension of JavaScript Object Notation (JSON) that encodes geographic features and their metadata. This format supports data types such as string, double or boolean. Dates exported as GeoJSON are stored as strings and are recognized as strings when the data is imported.
With respect to geometries, Points, (Multi)Polygons and (Multi)Lines are supported. GeometryCollection objects are not supported and raise an import error. The supported geometries can be imported inside FeatureCollection and Feature objects.
Importing different geometry types in a FeatureCollection element is not supported.
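
The following Python sketch checks a GeoJSON document against the rules above before upload: it flags GeometryCollection objects and mixed geometry types. It treats a single type and its Multi variant (for example, Polygon and MultiPolygon) as the same geometry type, which is an assumption about how CARTO compares them; the function name is hypothetical.

```python
import json


def geojson_import_problems(text):
    """Return reasons a GeoJSON Feature/FeatureCollection may fail to import."""
    data = json.loads(text)
    if data.get("type") == "FeatureCollection":
        features = data.get("features", [])
    elif data.get("type") == "Feature":
        features = [data]
    else:
        # This sketch only handles the two container types named above
        return [f"unexpected top-level type {data.get('type')!r}"]
    problems = []
    families = set()
    for index, feature in enumerate(features):
        geometry = feature.get("geometry") or {}
        geom_type = geometry.get("type")
        if geom_type is None:
            continue  # features without a geometry are not checked
        if geom_type == "GeometryCollection":
            problems.append(f"feature {index}: GeometryCollection is not supported")
            continue
        # Assumption: Polygon and MultiPolygon (etc.) count as one geometry type
        families.add(geom_type[5:] if geom_type.startswith("Multi") else geom_type)
    if len(families) > 1:
        problems.append(f"mixed geometry types: {sorted(families)}")
    return problems
```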
#### CSV
Comma-Separated Values (CSV) and Tab-Separated Values (TSV) files can be imported to CARTO. For a successful import, follow these formatting guidelines:
* The first line of the file must contain the column names
* Every other line must have the same number of columns as the header
* To ensure correct parsing, it is recommended that string values are double-quoted
* If a value itself contains double quotes, it must be double-quoted and each internal quote must be escaped by doubling it (`""`)
* Lines must be terminated with CR/LF or LF line terminators. CR-only line terminators are not supported
###### Example: Quoted strings in a CSV
```
name,description,score
"John Doe","Awesome, the best player ever",100
```
###### Example: Escaped quotes in a CSV
```
name,geojson
"Null Island","{""type"": ""Point"", ""coordinates"": [0,0]}"
```
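
If you generate CSV files with Python, the standard `csv` module can apply the guidelines above. In this sketch, `QUOTE_NONNUMERIC` double-quotes every string value (including the header), leaves numbers unquoted, escapes embedded quotes by doubling them, and ends lines with LF. The helper name is hypothetical.

```python
import csv
import io


def to_carto_csv(header, rows):
    """Serialize a header and rows as CSV text following the guidelines above."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        quoting=csv.QUOTE_NONNUMERIC,  # quote strings, leave numbers bare
        doublequote=True,              # escape " as "" inside quoted values
        lineterminator="\n",           # LF line endings (CR-only is not supported)
    )
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, a description containing `Said "hi"` is written as `"Said ""hi"""`, matching the escaped-quotes example.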
##### CSV Format Guessing
As the CSV format does not specify column types, CARTO applies guessing functionality to assign a type to each column. This enables you to generate numeric columns, or to geocode your dataset directly on import.
There are two guessing options specific to CSV files: type guessing and quoted fields guessing. See the [Import Guessing](#import-guessing) section for details.
#### Spreadsheets (Excel or OpenDocument)
Excel files and other spreadsheets (such as OpenDocument spreadsheets or Google Drive spreadsheets) are supported by CARTO. Uploaded spreadsheets must follow these guidelines:
* The first row must contain the names of the columns
* Merged cells are not supported
* Graphs, charts, and other embedded elements are not supported

For multi-sheet spreadsheets, only the first sheet is imported. Google Drive spreadsheets are limited to a maximum size of 10 MB.
#### GPX
GPX (GPS Exchange Format) files are XML documents that contain waypoints, tracks and/or routes. When importing a GPX file, CARTO generates separate datasets for track points, tracks and waypoints. Each dataset name combines the GPX name with a suffix for its type: `_track_points`, `_tracks`, and `_waypoints`, respectively.
#### OSM
CARTO supports importing OpenStreetMap dumps (.osm files). These files are XML documents with an `osm` root element that can contain nodes, ways, and relations representing points, lines or polygons. CARTO automatically separates OSM dumps into different tables, depending on the geometry, so importing a single OSM file can result in more than one dataset.
#### MapInfo
The MapInfo file format is a geospatial vector data format developed by MapInfo that stores each dataset across multiple files. MapInfo files (.DAT, .ID, .MAP, .TAB) must be imported together as a single compressed file, in the .zip or .gz format.
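
Since a MapInfo table spans several files, they must be zipped together. The following Python sketch adds a `.TAB` file and its sibling `.DAT`, `.ID` and `.MAP` files (matched by base name, ignoring extension case) to one `.zip` archive. The function name is hypothetical, and the sketch does not check that all four files exist; it returns the names it archived so you can verify them.

```python
import os
import zipfile

MAPINFO_EXTENSIONS = {".tab", ".dat", ".id", ".map"}


def zip_mapinfo(tab_path, zip_path):
    """Zip a MapInfo table with its sibling files; return the archived names."""
    folder, tab_name = os.path.split(os.path.abspath(tab_path))
    base = os.path.splitext(tab_name)[0]
    added = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for entry in sorted(os.listdir(folder)):
            stem, ext = os.path.splitext(entry)
            if stem == base and ext.lower() in MAPINFO_EXTENSIONS:
                # Store files at the archive root, without the local folder path
                archive.write(os.path.join(folder, entry), arcname=entry)
                added.append(entry)
    return added
```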
#### CARTO
CARTO files are map visualization files generated by CARTO. A .carto file includes the dataset and the visualization definition, which contains any SQL queries, CartoCSS, basemaps, attributions, metadata, and styling that were applied to a map. This is useful for downloading complete CARTO visualizations that you can share or import.
#### GeoPackage
GeoPackage (GPKG) files are an [open standard format](https://www.geopackage.org/) for spatial data. The format supports multiple layers and all the geometry types used by CARTO: Points, (Multi)Polygons and (Multi)Lines. GeoPackage files can be imported as an uncompressed .GPKG file or as a compressed .ZIP file. Each layer (up to 50) in the GeoPackage file is imported as a separate CARTO table.
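
Because a GeoPackage is an SQLite database, you can list its layers (for example, to compare them against the 50-layer limit) with Python's built-in `sqlite3` module. This sketch reads the `gpkg_contents` table defined by the GeoPackage standard and keeps rows whose `data_type` is `features` (vector layers); whether CARTO also counts other content types is not covered here, and the function name is hypothetical.

```python
import sqlite3
from contextlib import closing


def geopackage_feature_layers(gpkg_path):
    """Return the names of the vector (feature) layers listed in a GeoPackage."""
    with closing(sqlite3.connect(gpkg_path)) as connection:
        rows = connection.execute(
            "SELECT table_name FROM gpkg_contents "
            "WHERE data_type = 'features' ORDER BY table_name"
        ).fetchall()
    return [name for (name,) in rows]
```

If `len(geopackage_feature_layers(path))` is greater than 50, some layers may not be imported.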
#### File Geodatabase
File Geodatabase (GDB) is a proprietary [Esri format](http://desktop.arcgis.com/en/arcmap/10.3/manage-data/administer-file-gdbs/file-geodatabases.htm) for spatial data. A GDB is a directory with a `.gdb` extension that contains the data files, so it is downloaded and uploaded as a zip file containing that directory, with either a `.zip` or a `.gdb.zip` extension. Each layer (up to 50) in the GDB file is imported as a separate CARTO table.
**Note:** The "[personal geodatabase](http://desktop.arcgis.com/en/arcmap/latest/manage-data/administer-file-gdbs/personal-geodatabases.htm)" (having a `.mdb` extension) format used by ArcGIS 8 and ArcGIS 9 is not supported by CARTO.
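
The following Python sketch zips a `.gdb` directory with `shutil.make_archive` so that the archive contains the directory itself (entries start with `<name>.gdb/`). With the default output name, `roads.gdb` becomes `roads.gdb.zip`; the function name and the `output_base` parameter are hypothetical.

```python
import os
import shutil


def zip_file_geodatabase(gdb_dir, output_base=None):
    """Zip a .gdb directory (including the directory entry) and return the archive path.

    output_base is the archive path without ".zip"; it defaults to the .gdb path,
    which yields a ".gdb.zip" file next to the directory.
    """
    gdb_dir = os.path.abspath(gdb_dir)
    parent, gdb_name = os.path.split(gdb_dir)
    if not os.path.isdir(gdb_dir) or not gdb_name.lower().endswith(".gdb"):
        raise ValueError(f"{gdb_dir} is not a .gdb directory")
    base_name = output_base or gdb_dir
    # root_dir/base_dir make the archive entries start with "<name>.gdb/"
    return shutil.make_archive(base_name, "zip", root_dir=parent, base_dir=gdb_name)
```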
### Multilayer Uploads
Several of the formats supported by CARTO can, by definition, store different layers or geometry types. Importing a file that contains more than one layer results in a separate dataset for each layer.

If the `create_vis` option is enabled in the import process, the imported layers are added to the created map. The number of layers that can be included in a map depends on the maximum number of layers per map in the user's account configuration.

The maximum number of datasets created from a multilayer file is 10 (50 for GeoPackage and File Geodatabase files). If the file contains more layers than this limit, the extra layers are omitted.
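
As a rough illustration of enabling `create_vis`, the following Python sketch builds an upload URL. The endpoint (`https://{username}.carto.com/api/v1/imports/`) and passing `api_key` and `create_vis` as query parameters are assumptions based on the CARTO Import API; confirm them against the API reference before use. The sketch only builds the URL and sends nothing, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def carto_import_url(username, api_key, create_vis=False):
    """Build an Import API upload URL (assumed endpoint; nothing is sent)."""
    params = {"api_key": api_key}
    if create_vis:
        # Ask the import to also create a map containing the imported layers
        params["create_vis"] = "true"
    return f"https://{username}.carto.com/api/v1/imports/?{urlencode(params)}"
```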
#### Shapefile
The different layers included in a Shapefile are imported as independent datasets.
#### KML Files
KML files generate a separate dataset for each Folder they contain.
#### GPX Files
GPX files that contain more than one type of element (waypoints, tracks, and/or routes) are imported as a separate dataset per type.
#### OSM Files
OSM files generate a separate dataset for each type of geometry that their nodes, ways, or relations represent (points, polygons or lines).
#### GeoPackage
GPKG files generate a different dataset for each layer in the file, up to 50.
#### File Geodatabase
GDB files generate a different dataset for each layer in the file, up to 50.
### Multiple File Uploads
You can perform a batch file upload by sending the files to the server in a single compressed file. As with multilayer uploads, if the import process is configured to generate a map after import, the different datasets are added as layers to the new map. The number of layers that can be included in a map depends on the maximum number of layers allotted to the user's account.

The maximum number of files that can be imported from a single compressed file is 10. If the compressed file contains more than 10 files, only the first 10 files are imported and the rest are omitted.
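
The following Python sketch bundles several files into one archive for a batch upload and refuses to build an archive with more than 10 files, rather than relying on the server to drop the extra ones. It counts every path it is given; how CARTO counts the parts of multi-file formats such as Shapefiles toward this limit is not covered here. The function name is hypothetical.

```python
import os
import zipfile

MAX_FILES_PER_IMPORT = 10  # limit stated in the section above


def zip_batch(paths, zip_path):
    """Bundle data files into one archive for a batch upload; return the archived names."""
    if len(paths) > MAX_FILES_PER_IMPORT:
        raise ValueError(
            f"{len(paths)} files given; only the first {MAX_FILES_PER_IMPORT} would be imported"
        )
    names = [os.path.basename(path) for path in paths]
    if len(set(names)) != len(names):
        raise ValueError("file names must be unique inside the archive")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path, name in zip(paths, names):
            archive.write(path, arcname=name)  # store each file at the archive root
    return names
```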