crankshaft/doc/21_gwr.md
2017-01-09 10:09:25 -05:00

Regression

Predictive geographically weighted regression (GWR)

Predictive GWR estimates the dependent variable at locations where it has not been observed. It first runs the GWR model estimation analysis on known values of the dependent and independent variables sampled from around the prediction location(s), building a geographically weighted, spatially varying regression model. It then applies this model to the known values of the independent variables at the prediction locations to predict the dependent variable where it is otherwise unknown.

For predictive GWR to work, a dataset needs known independent variables, some known dependent variables, and some unknown dependent variables (the values to be predicted). The dataset also needs geometry data (e.g., points, lines, or polygons).
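
The two steps described above can be illustrated outside of SQL with a minimal Python sketch. It is not crankshaft's implementation: it assumes a single independent variable, a Gaussian kernel, and a bandwidth given as a distance, and the names `gaussian_weight`, `local_fit`, and `predict` are hypothetical helpers introduced only for this illustration.

```python
import math

def gaussian_weight(dist, bandwidth):
    """Assumed Gaussian kernel: weight decays smoothly with distance."""
    return math.exp(-0.5 * (dist / bandwidth) ** 2)

def local_fit(target, samples, bandwidth):
    """Step 1: weighted least squares of y on x (with intercept) around `target`.

    samples: list of (px, py, x, y) rows whose dependent value y is known.
    Returns (intercept, slope) of the local model at `target`.
    """
    tx, ty = target
    w = [gaussian_weight(math.hypot(px - tx, py - ty), bandwidth)
         for px, py, _, _ in samples]
    sw = sum(w)
    mean_x = sum(wi * s[2] for wi, s in zip(w, samples)) / sw
    mean_y = sum(wi * s[3] for wi, s in zip(w, samples)) / sw
    sxy = sum(wi * (s[2] - mean_x) * (s[3] - mean_y) for wi, s in zip(w, samples))
    sxx = sum(wi * (s[2] - mean_x) ** 2 for wi, s in zip(w, samples))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predict(target, x_at_target, samples, bandwidth):
    """Step 2: apply the local model to the known x at the prediction location."""
    intercept, slope = local_fit(target, samples, bandwidth)
    return intercept + slope * x_at_target

# Made-up observations (px, py, x, y) that follow y = 2x + 1 exactly.
samples = [(0.0, 0.0, 1.0, 3.0), (1.0, 0.0, 2.0, 5.0),
           (0.0, 1.0, 3.0, 7.0), (1.0, 1.0, 4.0, 9.0)]
print(predict((0.2, 0.1), 2.5, samples, 1.0))
```

Because the made-up data are exactly linear, the local model recovers the same line at any location, so the prediction for x = 2.5 matches 2 * 2.5 + 1 up to floating-point rounding. With real data the fitted line would change from one prediction location to the next.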

Arguments

| Name | Type | Description |
|------|------|-------------|
| subquery | TEXT | SQL query that exposes the data to be analyzed (e.g., SELECT * FROM regression_inputs). This query must have the geometry column name (see the optional geom_col for default), the id column name (see id_col), and the dependent (dep_var) and independent (ind_vars) column names. |
| dep_var | TEXT | Name of the dependent variable in the regression model. |
| ind_vars | TEXT[] | Text array of independent variable column names used in the model to describe the dependent variable. |
| bw (optional) | NUMERIC | Value of bandwidth, either a distance or a number of nearest neighbors depending on fixed. If NULL (default), an optimal bandwidth is selected. |
| fixed (optional) | BOOLEAN | True for a distance-based kernel function; False (default) for an adaptive (nearest-neighbor) kernel function. |
| kernel (optional) | TEXT | Type of kernel function used to weight observations. One of gaussian, bisquare (default), or exponential. |
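
To make the interplay of bw, fixed, and kernel more concrete, here is a minimal Python sketch. The kernel formulas are the forms commonly used in the GWR literature and are an assumption here; crankshaft's exact implementation may differ in details such as how the edge of the bandwidth is handled. `kernel_weight` and `local_weights` are hypothetical names.

```python
import math

def kernel_weight(dist, bandwidth, kernel="bisquare"):
    """Weight of one observation at distance `dist` from the regression location."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if kernel == "gaussian":
        return math.exp(-0.5 * (dist / bandwidth) ** 2)
    if kernel == "exponential":
        return math.exp(-dist / bandwidth)
    if kernel == "bisquare":
        # Compact support: observations at or beyond the bandwidth get zero weight.
        if dist >= bandwidth:
            return 0.0
        return (1.0 - (dist / bandwidth) ** 2) ** 2
    raise ValueError(f"unknown kernel: {kernel}")

def local_weights(dists, bw, fixed=False, kernel="bisquare"):
    """Weights of all observations for one regression location.

    fixed=True:  `bw` is used directly as a distance.
    fixed=False: `bw` is a number of nearest neighbors; the distance to the
                 bw-th nearest observation becomes the local bandwidth
                 (an assumption about how "adaptive" is interpreted).
    """
    if fixed:
        bandwidth = bw
    else:
        bandwidth = sorted(dists)[int(bw) - 1]
    return [kernel_weight(d, bandwidth, kernel) for d in dists]

# Distances from one regression location to four observations (made up).
dists = [0.0, 1.0, 2.0, 4.0]
print(local_weights(dists, 2.0, fixed=True))   # bandwidth is the distance 2.0
print(local_weights(dists, 3, fixed=False))    # bandwidth is the 3rd-nearest distance
```

In this sketch an adaptive bandwidth grows in sparse areas and shrinks in dense ones, since it is tied to a neighbor count rather than a distance. That is the usual motivation for choosing an adaptive kernel when observations are unevenly spaced.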

Returns

| Column Name | Type | Description |
|-------------|------|-------------|
| coeffs | JSON | JSON object with parameter estimates for each of the independent variables. The keys of the JSON object are the independent variable names, with values corresponding to the parameter estimate. |
| stand_errs | JSON | Standard errors for each of the independent variables. The keys of the JSON object are the independent variable names, with values corresponding to the respective standard errors. |
| t_vals | JSON | T-values for each of the independent variables. The keys of the JSON object are the independent variable names, with values corresponding to the respective t-value. |
| predicted | NUMERIC | Predicted value of the dependent variable (y). |
| residuals | NUMERIC | Residuals of the response. |
| r_squared | NUMERIC | R-squared for the parameter fit. |
| bandwidth | NUMERIC | Bandwidth value, consisting of either a distance or N nearest neighbors. |
| rowid | INTEGER | Row id of the original row. |
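
The three JSON columns share the same keys, so they can be read side by side. The Python sketch below shows the conventional relationship between them, a t-value being the coefficient divided by its standard error. The row values are made up, and whether crankshaft computes t_vals exactly this way, or includes extra keys such as an intercept, is an assumption.

```python
import json

# Hypothetical values for one returned row, as they might arrive in a client.
coeffs = json.loads('{"pctpov": -0.8, "pctrural": 0.1}')
stand_errs = json.loads('{"pctpov": 0.2, "pctrural": 0.25}')

def t_values(coeffs, stand_errs):
    """Conventional t-value per variable: estimate divided by its standard error."""
    return {name: coeffs[name] / stand_errs[name] for name in coeffs}

for name, t in t_values(coeffs, stand_errs).items():
    print(name, round(t, 3))
```

A large absolute t-value at a given row suggests the local coefficient is well separated from zero there, while a small one suggests the local estimate is weakly determined.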

Example Usage

-- Join the GWR output back to the input table by row id
SELECT
  g.cartodb_id,
  g.the_geom,
  g.the_geom_webmercator,
  (gwr.coeffs->>'pctblack')::numeric AS coeff_pctblack,
  (gwr.coeffs->>'pctrural')::numeric AS coeff_pctrural,
  (gwr.coeffs->>'pcteld')::numeric AS coeff_pcteld,
  (gwr.coeffs->>'pctpov')::numeric AS coeff_pctpov,
  gwr.residuals
FROM cdb_crankshaft.CDB_GWR_Predict(
  'SELECT * FROM g_utm'::text,                        -- subquery
  'pctbach'::text,                                    -- dep_var
  ARRAY['pctblack', 'pctrural', 'pcteld', 'pctpov']   -- ind_vars
) AS gwr
JOIN g_utm AS g
  ON g.cartodb_id = gwr.rowid;

Note: See PostgreSQL syntax for parsing JSON objects.

Geographically weighted regression model estimation

This analysis generates the model coefficients for a geographically weighted, spatially-varying regression. The model coefficients, along with their respective statistics, allow one to make inferences or describe a dependent variable based on a set of independent variables. Similar to traditional linear regression, GWR takes a linear combination of independent variables and a known dependent variable to estimate an optimal set of coefficients. The model coefficients are spatially varying (controlled by the bandwidth and fixed parameters), so that the model output is allowed to vary from geometry to geometry. This allows GWR to capture non-stationarity -- that is, how local processes vary over space. In contrast, coefficients obtained from estimating a traditional linear regression model assume that processes are constant over space.
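
The contrast between spatially varying and constant coefficients can be seen in a small Python sketch. It uses synthetic one-dimensional data and a Gaussian kernel, both assumptions made for illustration, and the helper names `weighted_fit` and `local_slope` are hypothetical. It is not how crankshaft estimates the model internally.

```python
import math

def weighted_fit(xs, ys, ws):
    """Weighted least squares of y on x with an intercept; returns (intercept, slope)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return my - slope * mx, slope

# Synthetic data: two observations at each of five positions along a line.
# The true slope grows from west (position 0) to east (position 4): y = (1 + pos) * x.
positions = [p for p in range(5) for _ in range(2)]
xs = [0.0, 1.0] * 5
ys = [(1 + p) * x for p, x in zip(positions, xs)]

def local_slope(target, bandwidth):
    """Slope of the regression centred on `target`, weighting nearby observations more."""
    ws = [math.exp(-0.5 * ((p - target) / bandwidth) ** 2) for p in positions]
    return weighted_fit(xs, ys, ws)[1]

# A traditional (global) regression weights every observation equally.
global_slope = weighted_fit(xs, ys, [1.0] * len(xs))[1]

print("global slope:", global_slope)
print("local slope, west:", local_slope(0, 0.5))
print("local slope, east:", local_slope(4, 0.5))
print("local slope, huge bandwidth:", local_slope(2, 1e6))
```

With a small bandwidth the local slopes track the process near each location, whereas the global fit averages the west and east behaviour into one number. As the bandwidth grows, all weights approach one and the local estimate converges to the global one, which is why the bandwidth controls how much spatial variation the model can express.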

Arguments

| Name | Type | Description |
|------|------|-------------|
| subquery | TEXT | SQL query that exposes the data to be analyzed (e.g., SELECT * FROM regression_inputs). This query must have the geometry column name (see the optional geom_col for default), the id column name (see id_col), and the dependent (dep_var) and independent (ind_vars) column names. |
| dep_var | TEXT | Name of the dependent variable in the regression model. |
| ind_vars | TEXT[] | Text array of independent variable column names used in the model to describe the dependent variable. |
| bw (optional) | NUMERIC | Value of bandwidth, either a distance or a number of nearest neighbors depending on fixed. If NULL (default), an optimal bandwidth is selected. |
| fixed (optional) | BOOLEAN | True for a distance-based kernel function; False (default) for an adaptive (nearest-neighbor) kernel function. |
| kernel (optional) | TEXT | Type of kernel function used to weight observations. One of gaussian, bisquare (default), or exponential. |

Returns

| Column Name | Type | Description |
|-------------|------|-------------|
| coeffs | JSON | JSON object with parameter estimates for each of the independent variables. The keys of the JSON object are the independent variable names, with values corresponding to the parameter estimate. |
| stand_errs | JSON | Standard errors for each of the independent variables. The keys of the JSON object are the independent variable names, with values corresponding to the respective standard errors. |
| t_vals | JSON | T-values for each of the independent variables. The keys of the JSON object are the independent variable names, with values corresponding to the respective t-value. |
| predicted | NUMERIC | Predicted value of the dependent variable (y). |
| residuals | NUMERIC | Residuals of the response. |
| r_squared | NUMERIC | R-squared for the parameter fit. |
| bandwidth | NUMERIC | Bandwidth value, consisting of either a distance or N nearest neighbors. |
| rowid | INTEGER | Row id of the original row. |

Example Usage

-- Join the GWR output back to the input table by row id
SELECT
  g.cartodb_id,
  g.the_geom,
  g.the_geom_webmercator,
  (gwr.coeffs->>'pctblack')::numeric AS coeff_pctblack,
  (gwr.coeffs->>'pctrural')::numeric AS coeff_pctrural,
  (gwr.coeffs->>'pcteld')::numeric AS coeff_pcteld,
  (gwr.coeffs->>'pctpov')::numeric AS coeff_pctpov,
  gwr.residuals
FROM cdb_crankshaft.CDB_GWR(
  'SELECT * FROM g_utm'::text,                        -- subquery
  'pctbach'::text,                                    -- dep_var
  ARRAY['pctblack', 'pctrural', 'pcteld', 'pctpov']   -- ind_vars
) AS gwr
JOIN g_utm AS g
  ON g.cartodb_id = gwr.rowid;

Note: See PostgreSQL syntax for parsing JSON objects.

Advanced reading

GWR for prediction

GWR in application