doc edits

2024-11-01 10:20:48 +08:00 · 2016-06-24 17:00:08 -04:00 · 2016-06-24 17:00:08 -04:00 · de7d56dc29
commit de7d56dc29
parent 642935e44a
1 changed files with 31 additions and 33 deletions
--- a/doc/12_segmentation.md
+++ b/doc/12_segmentation.md
@ -1,7 +1,7 @@

 ## Segmentation Functions

-### CDB_CreateAndPredictSegment (query TEXT,variable_name TEXT,target_query TEXT)
+### CDB_CreateAndPredictSegment (query TEXT, variable_name TEXT, target_query TEXT)

 This function trains a [Gradient Boosting](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html) model to attempt to predict the target data and then generates predictions for new data.  

@ -9,14 +9,14 @@ This function trains a [Gradient Boosting](http://scikit-learn.org/stable/module

 | Name | Type | Description |
 |------|------|-------------|
-| query | TEXT | The input query to train the algorithum, should have both the variable of interest and the features that will be used to predict it|
-| variablei\_name| TEXT | Specify the variable in the query to predict, all other columns are assumed to be features |
-| target\_table | TEXT | The query which returns the cartodb\_id and features for the rows your would like to predict the target variable for |
-| n\_estimators(optional) |  INTEGER DEFAULT 1200| Number of estimators to be used | 
-| max\_depth (optional) | INTEGER DEFAULT 3 | Max tree depth |
-| subsample (optional)  |  DOUBLE PRECISION DEFAULT 0.5 | Subsample paramter for GradientBooster|
-| learning\_rate(optional) |  DOUBLE PRECISION DEFAULT 0.01| Learning rate for the GradientBooster|
-| min\_samples\_leaf(optional) |  INTEGER DEFAULT 1 | Minimum samples to use per leaf|
+| query | TEXT | The input query to train the algorithm, which should have both the variable of interest and the features that will be used to predict it |
+| variable\_name| TEXT | Specify the variable in the query to predict, all other columns are assumed to be features |
+| target\_table | TEXT | The query which returns the `cartodb_id` and features for the rows your would like to predict the target variable for |
+| n\_estimators (optional) | INTEGER DEFAULT 1200 | Number of estimators to be used. Values should be between 1 and x. |
+| max\_depth (optional) | INTEGER DEFAULT 3 | Max tree depth. Values should be between 1 and n. |
+| subsample (optional)  | DOUBLE PRECISION DEFAULT 0.5 | Subsample parameter for GradientBooster. Values should be within the range 0 to 1. |
+| learning\_rate (optional) | DOUBLE PRECISION DEFAULT 0.01 | Learning rate for the GradientBooster. Values should be between 0 and 1 (??) |
+| min\_samples\_leaf (optional) | INTEGER DEFAULT 1 | Minimum samples to use per leaf. Values should range from x to y |

 #### Returns

@ -24,21 +24,20 @@ A table with the following columns.

 | Column Name | Type | Description |
 |-------------|------|-------------|
-| cartodb\_id | INTEGER | The CartoDB id of the row in the target\_query|
+| cartodb\_id | INTEGER | The CartoDB id of the row in the target\_query |
 | prediction | NUMERIC | The predicted value of the variable of interest |
 | accuracy | NUMERIC | The mean squared accuracy of the model. |

 #### Example Usage

 ```sql
-SELECT * from cdb_crankshaft.CDB_CreateAndPredictSegment( 
-'SELECT agg, median_rent::numeric, male_pop::numeric, female_pop::numeric from late_night_agg',
+SELECT * from cdb_crankshaft.CDB_CreateAndPredictSegment(
+'SELECT agg, median_rent::numeric, male_pop::numeric, female_pop::numeric FROM late_night_agg',
 'agg',
-'select ROW_NUMBER ( ) over ()  as cartodb_id, median_rent, male_pop, female_pop from ml_learning_ny');                               
+'SELECT row_number() OVER () As cartodb_id, median_rent, male_pop, female_pop FROM ml_learning_ny');                               
 ```

-
-### CDB_CreateAndPredictSegment (target NUMERIC[],train_features NUMERIC[], prediction_features Numeric[], prediction_ids NUMERIC[])
+### CDB_CreateAndPredictSegment(target numeric[], train_features numeric[], prediction_features numeric[], prediction_ids numeric[])

 This function trains a [Gradient Boosting](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html) model to attempt to predict the target data and then generates predictions for new data.  

@ -47,11 +46,10 @@ This function trains a [Gradient Boosting](http://scikit-learn.org/stable/module

 | Name | Type | Description |
 |------|------|-------------|
-| target | NUMERIC[] | An array of target values of the variable you want to predict|
-| train\_features| NUMERIC[] | 1D array of length nfeatures\* n\_rows + 1 with the first entry in the array being the number of features in each row. These are the features the model will be trained on. CDB\_Crankshaft.CDB_pyAgg(Array[freature1, feature2, feature3]::Numeric[]) can be used to construct this. |
-| prediction\_features | NUMERIC[]] | 1D array of length nfeatures\* n\_rows\_ + 1 with the first entry in the array being the number of features in each row. These are the features that will be used to predict the target variable  CDB\_Crankshaft.CDB\_pyAgg(Array[freature1, feature2, feature3]::Numeric[]) can be used to construct this.  |
-
-| prediction\_ids | NUMERIC[]] | 1D array of length n\_rows with the ids that can use used to rejoin the data with inputs |
+| target | numeric[] | An array of target values of the variable you want to predict|
+| train\_features| numeric[] | 1D array of length n features \* n\_rows + 1 with the first entry in the array being the number of features in each row. These are the features the model will be trained on. CDB\_Crankshaft.CDB_pyAgg(Array[feature1, feature2, feature3]::numeric[]) can be used to construct this. |
+| prediction\_features | numeric[] | 1D array of length nfeatures\* n\_rows\_ + 1 with the first entry in the array being the number of features in each row. These are the features that will be used to predict the target variable  CDB\_Crankshaft.CDB\_pyAgg(Array[feature1, feature2, feature3]::numeric[]) can be used to construct this.  |
+| prediction\_ids | numeric[] | 1D array of length n\_rows with the ids that can use used to re-join the data with inputs |

 #### Returns

@ -62,23 +60,23 @@ A table with the following columns.
 | cartodb\_id | INTEGER | The CartoDB id of the row in the target\_query|
 | prediction | NUMERIC | The predicted value of the variable of interest |
 | accuracy | NUMERIC | The mean squared accuracy of the model. |
-| n\_estimators(optional) |  INTEGER DEFAULT 1200| Number of estimators to be used | 
+| n\_estimators (optional) | INTEGER DEFAULT 1200 | Number of estimators to be used |
 | max\_depth (optional) | INTEGER DEFAULT 3 | Max tree depth |
-| subsample (optional)  |  DOUBLE PRECISION DEFAULT 0.5 | Subsample paramter for GradientBooster|
-| learning\_rate(optional) |  DOUBLE PRECISION DEFAULT 0.01| Learning rate for the GradientBooster|
-| min\_samples\_leaf(optional) |  INTEGER DEFAULT 1 | Minimum samples to use per leaf|
+| subsample (optional)  | DOUBLE PRECISION DEFAULT 0.5 | Subsample parameter for GradientBooster|
+| learning\_rate (optional) | DOUBLE PRECISION DEFAULT 0.01 | Learning rate for the GradientBooster |
+| min\_samples\_leaf (optional) | INTEGER DEFAULT 1 | Minimum samples to use per leaf |

 #### Example Usage

 ```sql
-WITH training AS ( 
-    SELECT array_agg(agg) AS target, 
-          cdb_crankshaft.CDB_PyAgg(Array[median_rent, male_pop, female_pop]::Numeric[]) AS features
-    FROM  late_night_agg), 
+WITH training As (
+    SELECT array_agg(agg) As target,
+           cdb_crankshaft.CDB_PyAgg(Array[median_rent, male_pop, female_pop]::Numeric[]) As features
+    FROM late_night_agg),
 target AS (
-    SELECT cdb_crankshaft.CDB_PyAgg(Array[median_rent, male_pop, female_pop]::Numeric[]) AS features,
-     array_agg(cartodb_id ) AS cartodb_ids FROM late_night_agg)  
+    SELECT cdb_crankshaft.CDB_PyAgg(Array[median_rent, male_pop, female_pop]::Numeric[]) As features,
+     array_agg(cartodb_id) As cartodb_ids FROM late_night_agg)  

-SELECT cdb_crankshaft.CDB_CreateAndPredictSegment2(training.target, training.features, target.features, targetcartodb_ids) 
-FROM  training, target;                                                                       
-`````
+SELECT cdb_crankshaft.CDB_CreateAndPredictSegment2(training.target, training.features, target.features, target.cartodb_ids)
+FROM training, target;
+```