To be able to represent a large amount of data (say, hundred of thousands to millions of points) in a tile. This can be useful both for raster tiles (where the aggregation reduces the number of features to be rendered) and vector tiles (the tile contais less features).
Aggregation is available only for point geometries. During aggregation the points are grouped using a grid; all the points laying in the same cell of the grid are summarized in a single aggregated result point.
- The position of the aggregated point is controlled by the `placement` parameter.
- The aggregated rows always contain at least a column, named `_cdb_feature_count`, which contains the number of the original points that the aggregated point represents.
### Special default aggregation
When no placement or columns are specified a special default aggregation is performed.
This special mode performs only spatial aggregation (using a grid defined by the requested tile and the resolution, parameter, as all the other cases), and returns a _random_ record from each group (grid cell) with all its columns and an additional `_cdb_feature_count` with the number of features in the group.
The rationale behind having this special aggregation with all the original columns is to provide a mostly transparent way to handle large datasets without having to provide special map configurations for those cases (i.e. preserving the logic used to produce the maps with smaller datasets). [Overviews have been used so far with this intent](https://carto.com/docs/tips-and-tricks/back-end-data-performance/), but they are inflexible.
When either a explicit placement or columns are requested we no longer use the special, query; we use one determined by the placement (which will default to "centroid"), and it will have as columns only the aggregated columns specified, in addition to `_cdb_feature_count`, which is always present.
The aggregation parameters for a layer are defined inside an `aggregation` option of the layer:
```json
{
"layers": [
{
"options": {
"sql": "SELECT * FROM data",
"aggregation": {"...": "..."}
}
}
]
}
```
### `placement`
Determines the kind of aggregated geometry generated:
#### `point-sample`
This is the default placement. It will place the aggregated point at a random sample of the grouped points,
like the default aggregation does. No other attribute is sampled, though, the point will contain the aggregated attributes determined by the `columns` parameter.
#### `point-grid`
Generates points at the center of the aggregation grid cells (squares).
#### `centroid`
Generates points with the averaged coordinated of the grouped points (i.e. the points inside each grid cell).
### `columns`
The aggregated attributes defined by `columns` are computed by a applying an _aggregate function_ to all the points in each group.
Valid aggregate functions are `sum`, `avg` (average), `min` (minimum), `max` (maximum) and `mode` (the most frequent value in the group).
The values to be aggregated are defined by the _aggregated column_ of the source data. The column keys define the name of the resulting column in the aggregated dataset.
For example here we define three aggregate attributes named `total`, `max_price` and `price` which are all computed with the same column, `price`,
of the original dataset applying three different aggregate functions.
> Note that you can use the original column names as names of the result, but all the result column names must be unique. In particular, the names `cartodb_id`, `the_geom`, `the_geom_webmercator` and `_cdb_feature_count` cannot be used for aggregated columns, as they correspond to columns always present in the result.
Defines the cell-size of the spatial aggregation grid. This is equivalent to the [CartoCSS `-torque-resolution`](https://carto.com/docs/carto-engine/cartocss/properties-for-torque/#-torque-resolution-float) property of Torque maps.
The aggregation cells are `resolution`×`resolution` pixels in size, where pixels here are defined to be 1/256 of the (linear) size of a tile.
The default value is 1, so that aggregation coincides with raster pixels. A value of 2 would make each cell to be 4 (2×2) pixels, and a value of
0.5 would yield 4 cells per pixel. In teneral values less than 1 produce sub-pixel precision.
> Note that is independent of the number of pixels for raster tile or the coordinate resolution (mvt_extent) of vector tiles.
### `threshold`
This is the minimum number of (estimated) rows in the dataset (query results) for aggregation to be applied. If the number of rows estimate is less than the threshold aggregation will be disabled for the layer; the instantiation response will reflect that and tiles will be generated without aggregation.
For a given column, multiple conditions can be passed in an array; the conditions will logically ORed (any of the conditions have to be verifid for the value to be selected):
*`"myvalue": [ { "equal": 10 }, { "less_than": 0 }]` will select values of the column `myvalue` which are equal to 10 **or** less than 0.
In addition, the filters applied to different columns are logically combined with AND (all the conditions have to be satisfied for an element to be selected); for example with the following `filters` parameter we'll select aggregated records which have a `total_value` > 100 **and** a category equal to "a".
Note that the filtered columns have to be defined with the `columns` parameter, except for `_cdb_feature_count`, which is always implicitly defined and can be filtered too.