Merge branch 'develop' into moran-query-ordering-fix

2016-09-23 13:25:36 -04:00 · 2016-09-23 13:25:36 -04:00 · 07e4062237
commit 07e4062237
parent 1148aa417a c7bb9698a9
114 changed files with 18496 additions and 82 deletions
--- a/.brackets.json
+++ b/.brackets.json
@ -0,0 +1,3 @@
 {
    "sbruchmann.staticpreview.basepath": "/home/carto/Projects/crankshaft/"
 }
--- a/NEWS.md
+++ b/NEWS.md
@ -1,3 +1,20 @@
 0.4.2 (2016-09-22)
 ------------------
 * Bugfix for cdb_areasofinterestglobal: import correct modules
 0.4.1 (2016-09-21)
 ------------------
 * Let the user set the resolution in CDB_Contour function
 * Add Nearest Neighbors method to CDB_SpatialInterpolation
 * Improve error reporting for moran and markov functions
 0.4.0 (2016-08-30)
 ------------------
 * Add CDB_Contour
 * Add CDB_PIA
 * Add CDB_Densify
 * Add CDB_TINmap
 0.3.1 (2016-08-18)
 ------------------
 * Fix Voronoi projection issue
--- a/RELEASE.md
+++ b/RELEASE.md
@ -15,11 +15,12 @@ shall be performed by the designated *Release Manager*.
 1. Generate an upgrade path from the previous to the next release by copying the generated release file. E.g:
  ```shell
-  cp release/cranckshaft--X.Y.Z.sql release/cranckshaft--A.B.C--X.Y.Z.sql
+  cp release/crankshaft--X.Y.Z.sql release/crankshaft--A.B.C--X.Y.Z.sql
  ```
  NOTE: you can rely on this thanks to the compatibility checks. 
  TODO: automate this step [#94](https://github.com/CartoDB/crankshaft/issues/94)
 2. Update the [NEWS.md](https://github.com/CartoDB/crankshaft/blob/master/NEWS.md) file
 1. Commit and push the generated files.
 1. Tag the release:
@ -29,7 +30,6 @@ shall be performed by the designated *Release Manager*.
  ```
 1. Deploy and test in staging
 1. Deploy and test in production
 2. Update the [NEWS.md](https://github.com/CartoDB/crankshaft/blob/master/NEWS.md) file
 1. Merge back into develop
--- a/doc/02_moran.md
+++ b/doc/02_moran.md
@ -37,7 +37,7 @@ SELECT
  aoi.quads,
  aoi.significance,
  c.num_cyclists_per_total_population
-FROM CDB_GetAreasOfInterestLocal('SELECT * FROM commute_data'
+FROM CDB_AreasOfInterestLocal('SELECT * FROM commute_data',
                                 'num_cyclists_per_total_population') As aoi
 JOIN commute_data As c
 ON c.cartodb_id = aoi.rowid;
@ -113,7 +113,7 @@ SELECT
  aoi.quads,
  aoi.significance,
  c.cyclists_per_total_population
-FROM CDB_GetAreasOfInterestLocalRate('SELECT * FROM commute_data'
+FROM CDB_AreasOfInterestLocalRate('SELECT * FROM commute_data',
                                     'num_cyclists',
                                     'total_population') As aoi
 JOIN commute_data As c
--- a/doc/08_interpolation.md
+++ b/doc/08_interpolation.md
@ -2,7 +2,7 @@
 Function to interpolate a numeric attribute of a point in a scatter dataset of points, using one of three methods:
-* [Nearest neighbor](https://en.wikipedia.org/wiki/Nearest-neighbor_interpolation)
+* [Nearest neighbor(s)](https://en.wikipedia.org/wiki/Nearest-neighbor_interpolation)
 * [Barycentric](https://en.wikipedia.org/wiki/Barycentric_coordinate_system)
 * [IDW](https://en.wikipedia.org/wiki/Inverse_distance_weighting)
@ -15,7 +15,7 @@ Function to interpolate a numeric attribute of a point in a scatter dataset of p
 | query   | text | query that returns at least `the_geom` and a numeric value as `attrib` |
 | point   | geometry | The target point to calc the value |
 | method   | integer     | 0:nearest neighbor, 1: barycentric, 2: IDW|
-| p1   | integer     | IDW: limit the number of neighbors, 0->no limit|
+| p1   | integer     | limit the number of neighbors, IDW: 0->no limit, NN: 0-> closest one|
 | p2   | integer     | IDW: order of distance decay, 0-> order 1|
 ### CDB_SpatialInterpolation (geom geometry[], values numeric[], point geometry, method integer DEFAULT 1, p1 integer DEFAULT 0, p2 integer DEFAULT 0)
@ -28,7 +28,7 @@ Function to interpolate a numeric attribute of a point in a scatter dataset of p
 | values | numeric[]   | Array of points' values for the param under study|
 | point   | geometry | The target point to calc the value |
 | method   | integer     | 0:nearest neighbor, 1: barycentric, 2: IDW|
-| p1   | integer     | IDW: limit the number of neighbors, 0->no limit|
+| p1   | integer     | limit the number of neighbors, IDW: 0->no limit, NN: 0-> closest one|
 | p2   | integer     | IDW: order of distance decay, 0-> order 1|
 ### Returns
@ -37,6 +37,9 @@ Function to interpolate a numeric attribute of a point in a scatter dataset of p
 |-------------|------|-------------|
 | value  | numeric | Interpolated value at the given point, `-888.888` if the given point is out of the boundaries of the source points set |
 Default values:
 * -888.888: when using Barycentric, the target point is out of the realm of the input points
 * -777.777: asking for a method not available
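The roles of `p1` and `p2` in the IDW method can be illustrated with a small Python sketch. This is not crankshaft's implementation: the function name `idw` and the tuple-based point representation are assumptions made for illustration. Only the meaning of `p1` (neighbor limit, 0 for no limit) and `p2` (order of distance decay, 0 for order 1) is taken from the tables above.

```python
import math


def idw(points, values, target, p1=0, p2=0):
    """Inverse distance weighting sketch (hypothetical helper).

    points: list of (x, y) tuples; values: list of numbers; target: (x, y)
    p1: maximum number of neighbors to use, 0 means no limit
    p2: order of distance decay, 0 means order 1
    """
    order = p2 if p2 > 0 else 1
    # Pair each value with its distance to the target, nearest first
    dists = sorted(
        (math.hypot(x - target[0], y - target[1]), v)
        for (x, y), v in zip(points, values)
    )
    if p1 > 0:
        dists = dists[:p1]
    # A source point exactly at the target wins outright (avoids 1/0)
    if dists[0][0] == 0:
        return dists[0][1]
    weights = [1.0 / d ** order for d, _ in dists]
    return sum(w * v for w, (_, v) in zip(weights, dists)) / sum(weights)
```

With two source points, a target halfway between them receives the plain average, while setting `p1=1` reduces the result to the value of the single closest point.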
 #### Example Usage
--- a/doc/13_PIA.md
+++ b/doc/13_PIA.md
@ -0,0 +1,33 @@
 ## Pole of inaccessibility (PIA)
 Function to find the [PIA](https://en.wikipedia.org/wiki/Pole_of_inaccessibility) from a given polygon and tolerance, following the quadtree approach by [Vladimir Agafonkin](https://github.com/mourner) described [here](https://github.com/mapbox/polylabel)
 ### CDB_PIA (polygon geometry, tolerance numeric DEFAULT 1.0)
 #### Arguments
 | Name | Type | Description |
 |------|------|-------------|
 | polygon   | geometry  | Target polygon |
 | tolerance   | numeric | Threshold to decide to take a cell into account |
 ### Returns
 | Column Name | Type | Description |
 |-------------|------|-------------|
 | point  | geometry| Pole of inaccessibility |
 #### Example Usage
 ```sql
 with a as(
 select st_geomfromtext('POLYGON((-432540.453078056 4949775.20452642,-432329.947920966 4951361.232584,-431245.028163694 4952223.31516671,-429131.071033529 4951768.00415574,-424622.07505895 4952843.13503987,-423688.327170174 4953499.20752423,-424086.294349759 4954968.38274191,-423068.388925945 4954378.63345336,-423387.653225542 4953355.67417084,-420594.869840519 4953781.00230592,-416026.095299382 4951484.06849063,-412483.018546414 4951024.5410983,-410490.399661215 4954502.24032205,-408186.197521284 4956398.91417441,-407627.262358013 4959300.94633864,-406948.770061627 4959874.85407739,-404949.583326472 4959047.74518163,-402570.908447199 4953743.46829807,-400971.358683991 4952193.11680804,-403533.488084088 4949649.89857885,-406335.177028373 4950193.19571096,-407790.456731515 4952391.46015616,-412060.672398345 4950381.2389307,-410716.93482498 4949156.7509561,-408464.162289794 4943912.8940387,-409350.599394983 4942819.84896006,-408087.791091424 4942451.6711778,-407274.045613725 4940572.4807777,-404446.196589102 4939976.71501489,-402422.964843936 4940450.3670813,-401010.654464241 4939054.8061663,-397647.247369412 4940679.80737878,-395658.413346901 4940528.84765185,-395536.852462953 4938829.79565997,-394268.923462818 4938003.7277717,-393388.720249116 4934757.80596815,-392393.301362444 4934326.71675815,-392573.527618037 4932323.40974412,-393464.640141837 4931903.10653605,-393085.597275686 4931094.7353605,-398426.261165985 4929156.87541607,-398261.174361137 4926238.00816416,-394045.059966834 4925765.18668498,-392982.960705174 4926391.81893628,-393090.272694301 4927176.84692181,-391648.240010564 4924626.06386961,-391889.914625075 4923086.14787613,-394345.177314013 4923235.086036,-395550.878718795 4917812.79243978,-399009.463978251 4912927.7157945,-398948.794855767 4911941.91010796,-398092.636652078 4911806.57392519,-401991.601817112 4911722.9204501,-406225.972607907 4914505.47286319,-411104.994569885 4912569.26941163,-412925.513522316 4913030.3608866,-414630.148884835 
4914436.69169949,-414207.691417276 4919205.78028405,-418306.141109809 4917994.9580478,-424184.700779621 4918938.12432889,-426816.961458921 4923664.37379373,-420956.324227126 4923381.98014807,-420186.661267781 4924286.48693378,-420943.411166194 4926812.76394433,-419779.45457046 4928527.43466337,-419768.767899344 4930681.94459216,-421911.668097113 4930432.40620397,-423482.386112205 4933451.28047252,-427272.814773717 4934151.56473242,-427144.908678797 4939731.77191996,-428982.125554848 4940522.84445172,-428986.133056516 4942437.17281266,-431237.792396792 4947309.68284815,-432476.889648814 4947791.74800037,-432540.453078056 4949775.20452642))', 3857) as g
 ),
 b as (
    select ST_Transform(g, 4326) as g from a
 )
 SELECT st_astext(CDB_PIA(g)) from b;
 ```
--- a/doc/14_densify.md
+++ b/doc/14_densify.md
@ -0,0 +1,35 @@
 ## Densify function
 Iterative densification of a set of scattered points using Delaunay triangulation. The new points are located at the centroids of the grid cells and are assigned the barycentric average of the values at each cell's vertices.
 ### CDB_Densify(geomin geometry[], colin numeric[], iterations integer)
 #### Arguments
 | Name | Type | Description |
 |------|------|-------------|
 | geomin   | geometry[]  | Array of points geometries |
 | colin | numeric[]   | Array of points' values |
 | iterations   | integer     | Number of iterations |
 ### Returns
 Returns a table object
 | Name | Type | Description |
 |------|------|-------------|
 | geomout   | geometry  | Geometries of new dataset of points|
 | colout | numeric   | Values of points|
 #### Example Usage
 ```sql
 with data as (
    select
    ARRAY[7.0,8.0,1.0,2.0,3.0,5.0,6.0,4.0] as colin,
    ARRAY[ST_GeomFromText('POINT(2.1744 41.4036)'),ST_GeomFromText('POINT(2.1228 41.3809)'),ST_GeomFromText('POINT(2.1511 41.3742)'),ST_GeomFromText('POINT(2.1528 41.4136)'),ST_GeomFromText('POINT(2.165 41.3917)'),ST_GeomFromText('POINT(2.1498 41.3713)'),ST_GeomFromText('POINT(2.1533 41.3683)'),ST_GeomFromText('POINT(2.131386 41.413998)')] as geomin
 )
 select CDB_Densify(geomin, colin, 2) from data;
 ```
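The description above says each new point sits at the centroid of a triangulation cell and takes the barycentric average of the cell's vertex values. A minimal Python sketch of that single step follows. It assumes the Delaunay triangles are already available, and the helper name `centroid_with_value` is hypothetical, not part of crankshaft.

```python
def centroid_with_value(triangle, vertex_values):
    """Return the centroid of one triangle and its interpolated value.

    triangle: three (x, y) tuples; vertex_values: three numbers.
    Hypothetical helper for illustration only.
    """
    cx = sum(x for x, _ in triangle) / 3.0
    cy = sum(y for _, y in triangle) / 3.0
    # At the centroid all three barycentric weights equal 1/3,
    # so the barycentric average reduces to the plain mean.
    value = sum(vertex_values) / 3.0
    return (cx, cy), value
```

Each iteration would add one such point per triangle and re-triangulate, which is why the point count grows quickly with the number of iterations.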
--- a/doc/15_tinmap.md
+++ b/doc/15_tinmap.md
@ -0,0 +1,36 @@
 ## TINMAP function
 Generates a fake contour map, in the form of a TIN map, from a set of scattered points. Depends on **CDB_Densify**.
 Its iterative nature lets the user smooth the final result as much as desired, but with an exponential increase in time cost.
 ### CDB_TINmap(geomin geometry[], colin numeric[], iterations integer)
 #### Arguments
 | Name | Type | Description |
 |------|------|-------------|
 | geomin   | geometry[]  | Array of points geometries |
 | colin | numeric[]   | Array of points' values |
 | iterations   | integer     | Number of iterations |
 ### Returns
 Returns a table object
 | Name | Type | Description |
 |------|------|-------------|
 | geomout   | geometry  | Geometries of new dataset of polygons|
 | colout | numeric   | Values of each cell|
 #### Example Usage
 ```sql
 with data as (
    select
    ARRAY[7.0,8.0,1.0,2.0,3.0,5.0,6.0,4.0] as colin,
    ARRAY[ST_GeomFromText('POINT(2.1744 41.4036)'),ST_GeomFromText('POINT(2.1228 41.3809)'),ST_GeomFromText('POINT(2.1511 41.3742)'),ST_GeomFromText('POINT(2.1528 41.4136)'),ST_GeomFromText('POINT(2.165 41.3917)'),ST_GeomFromText('POINT(2.1498 41.3713)'),ST_GeomFromText('POINT(2.1533 41.3683)'),ST_GeomFromText('POINT(2.131386 41.413998)')] as geomin
 )
 select CDB_TINmap(geomin, colin, 2) from data;
 ```
--- a/doc/19_contour.md
+++ b/doc/19_contour.md
@ -0,0 +1,50 @@
 ## Contour maps
 Function to generate a contour map from a scattered dataset of points, using one of these three methods:
 * [Nearest neighbor](https://en.wikipedia.org/wiki/Nearest-neighbor_interpolation)
 * [Barycentric](https://en.wikipedia.org/wiki/Barycentric_coordinate_system)
 * [IDW](https://en.wikipedia.org/wiki/Inverse_distance_weighting)
 ### CDB_Contour (geom geometry[], values numeric[], resolution integer, buffer numeric, method integer, classmethod integer, steps integer)
 #### Arguments
 | Name | Type | Description |
 |------|------|-------------|
 | geom   | geometry[]  | Array of points' geometries |
 | values | numeric[]   | Array of points' values for the param under study|
 | buffer   | numeric     | Value between 0 and 1 for spatial buffer of the set of points|
 | method   | integer     | 0:nearest neighbor, 1: barycentric, 2: IDW|
 | classmethod   | integer     | 0:equals, 1: heads&tails, 2:jenks, 3:quantiles |
 | steps   | integer     | Number of steps in the classification|
 | max_time   | integer     | if <= 0: max processing time in seconds (smart resolution), if >0: resolution in meters|
 ### Returns
 Returns a table object
 | Name | Type | Description |
 |------|------|-------------|
 | the_geom   | geometry  | Geometries of the classified contour map|
 | avg_value | numeric   | Avg value of the area|
 | min_value | numeric   | Min value of the area|
 | max_value | numeric   | Max value of the area|
 | bin | integer   | Index of the class of the area|
 #### Example Usage
 ```sql
 WITH a AS (
    SELECT
    ARRAY[800, 700, 600, 500, 400, 300, 200, 100]::numeric[] AS vals,
    ARRAY[ST_GeomFromText('POINT(2.1744 41.403)',4326),ST_GeomFromText('POINT(2.1228 41.380)',4326),ST_GeomFromText('POINT(2.1511 41.374)',4326),ST_GeomFromText('POINT(2.1528 41.413)',4326),ST_GeomFromText('POINT(2.165 41.391)',4326),ST_GeomFromText('POINT(2.1498 41.371)',4326),ST_GeomFromText('POINT(2.1533 41.368)',4326),ST_GeomFromText('POINT(2.131386 41.41399)',4326)] AS g
 ),
 b as(
 SELECT
    foo.*
 FROM
    a,
    cdb_crankshaft.CDB_contour(a.g, a.vals,  0.0, 1, 3, 5, 60) foo
 )
 SELECT bin, avg_value from b order by bin;
 ```
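The `bin` column is the class index produced by the chosen `classmethod`. As a rough illustration, the sketch below assigns equal-interval bins in Python. It assumes that `0:equals` means equal-width intervals and that bin indices start at 0; neither is confirmed by this document, and `equal_interval_bins` is a hypothetical name.

```python
def equal_interval_bins(values, steps):
    """Assign each value to one of `steps` equal-width classes.

    Hypothetical helper; bins are numbered from 0 (an assumption).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / float(steps)
    bins = []
    for v in values:
        if width == 0:
            # All values are identical: everything falls in the first class
            bins.append(0)
        else:
            # Clamp so the maximum value lands in the last class
            bins.append(min(int((v - lo) / width), steps - 1))
    return bins
```

The other class methods (heads & tails, Jenks, quantiles) place the class breaks differently but yield the same kind of per-area index.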
--- a/release/crankshaft--0.3.1--0.4.0.sql
+++ b/release/crankshaft--0.3.1--0.4.0.sql
--- a/release/crankshaft--0.4.0--0.4.1.sql
+++ b/release/crankshaft--0.4.0--0.4.1.sql
--- a/release/crankshaft--0.4.0.sql
+++ b/release/crankshaft--0.4.0.sql
--- a/release/crankshaft--0.4.1--0.4.2.sql
+++ b/release/crankshaft--0.4.1--0.4.2.sql
--- a/release/crankshaft--0.4.1.sql
+++ b/release/crankshaft--0.4.1.sql
--- a/release/crankshaft--0.4.2.sql
+++ b/release/crankshaft--0.4.2.sql
--- a/release/crankshaft.control
+++ b/release/crankshaft.control
@ -1,5 +1,5 @@
 comment = 'CartoDB Spatial Analysis extension'
-default_version = '0.3.1'
+default_version = '0.4.2'
 requires = 'plpythonu, postgis'
 superuser = true
 schema = cdb_crankshaft
--- a/release/python/0.4.0/crankshaft/crankshaft/init.py
+++ b/release/python/0.4.0/crankshaft/crankshaft/init.py
@ -0,0 +1,5 @@
 """Import all modules"""
 import crankshaft.random_seeds
 import crankshaft.clustering
 import crankshaft.space_time_dynamics
 import crankshaft.segmentation
--- a/release/python/0.4.0/crankshaft/crankshaft/clustering/init.py
+++ b/release/python/0.4.0/crankshaft/crankshaft/clustering/init.py
@ -0,0 +1,3 @@
 """Import all functions from for clustering"""
 from moran import *
 from kmeans import *
--- a/release/python/0.4.0/crankshaft/crankshaft/clustering/kmeans.py
+++ b/release/python/0.4.0/crankshaft/crankshaft/clustering/kmeans.py
@ -0,0 +1,18 @@
 from sklearn.cluster import KMeans
 import plpy
 def kmeans(query, no_clusters, no_init=20):
    data = plpy.execute('''select array_agg(cartodb_id order by cartodb_id) as ids,
        array_agg(ST_X(the_geom) order by cartodb_id) xs,
        array_agg(ST_Y(the_geom) order by cartodb_id) ys from ({query}) a
        where the_geom is not null
    '''.format(query=query))
    xs  = data[0]['xs']
    ys  = data[0]['ys']
    ids = data[0]['ids']
    km = KMeans(n_clusters= no_clusters, n_init=no_init)
    labels = km.fit_predict(zip(xs,ys))
    return zip(ids,labels)
--- a/release/python/0.4.0/crankshaft/crankshaft/clustering/moran.py
+++ b/release/python/0.4.0/crankshaft/crankshaft/clustering/moran.py
@ -0,0 +1,262 @@
 """
 Moran's I geostatistics (global clustering & outliers presence)
 """
 # TODO: Fill in local neighbors which have null/NoneType values with the
 #       average of their neighborhood
 import pysal as ps
 import plpy
 from collections import OrderedDict
 # crankshaft module
 import crankshaft.pysal_utils as pu
 # High level interface ---------------------------------------
 def moran(subquery, attr_name,
          w_type, num_ngbrs, permutations, geom_col, id_col):
    """
    Moran's I (global)
    Implementation building neighbors with a PostGIS database and Moran's I
     core clusters with PySAL.
    Andy Eschbacher
    """
    qvals = OrderedDict([("id_col", id_col),
                         ("attr1", attr_name),
                         ("geom_col", geom_col),
                         ("subquery", subquery),
                         ("num_ngbrs", num_ngbrs)])
    query = pu.construct_neighbor_query(w_type, qvals)
    plpy.notice('** Query: %s' % query)
    try:
        result = plpy.execute(query)
        # if there are no neighbors, exit
        if len(result) == 0:
            return pu.empty_zipped_array(2)
        plpy.notice('** Query returned with %d rows' % len(result))
    except plpy.SPIError as err:
        # plpy.error raises, so emit the diagnostic notices first
        plpy.notice('** Query failed: "%s"' % query)
        plpy.notice('** Error: %s' % err)
        plpy.error('Error: areas of interest query failed, check input parameters')
        return pu.empty_zipped_array(2)
    ## collect attributes
    attr_vals = pu.get_attributes(result)
    ## calculate weights
    weight = pu.get_weight(result, w_type, num_ngbrs)
    ## calculate moran global
    moran_global = ps.esda.moran.Moran(attr_vals, weight,
                                       permutations=permutations)
    return zip([moran_global.I], [moran_global.EI])
 def moran_local(subquery, attr,
                w_type, num_ngbrs, permutations, geom_col, id_col):
    """
    Moran's I implementation for PL/Python
    Andy Eschbacher
    """
    # geometries with attributes that are null are ignored
    # resulting in a collection of not as near neighbors
    qvals = OrderedDict([("id_col", id_col),
                         ("attr1", attr),
                         ("geom_col", geom_col),
                         ("subquery", subquery),
                         ("num_ngbrs", num_ngbrs)])
    query = pu.construct_neighbor_query(w_type, qvals)
    try:
        result = plpy.execute(query)
        # if there are no neighbors, exit
        if len(result) == 0:
            return pu.empty_zipped_array(5)
    except plpy.SPIError:
        # plpy.error raises, so emit the diagnostic notice first
        plpy.notice('** Query failed: "%s"' % query)
        plpy.error('Error: areas of interest query failed, check input parameters')
        return pu.empty_zipped_array(5)
    attr_vals = pu.get_attributes(result)
    weight = pu.get_weight(result, w_type, num_ngbrs)
    # calculate LISA values
    lisa = ps.esda.moran.Moran_Local(attr_vals, weight,
                                     permutations=permutations)
    # find quadrants for each geometry
    quads = quad_position(lisa.q)
    return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
 def moran_rate(subquery, numerator, denominator,
               w_type, num_ngbrs, permutations, geom_col, id_col):
    """
    Moran's I Rate (global)
    Andy Eschbacher
    """
    qvals = OrderedDict([("id_col", id_col),
                         ("attr1", numerator),
                         ("attr2", denominator),
                         ("geom_col", geom_col),
                         ("subquery", subquery),
                         ("num_ngbrs", num_ngbrs)])
    query = pu.construct_neighbor_query(w_type, qvals)
    plpy.notice('** Query: %s' % query)
    try:
        result = plpy.execute(query)
        # if there are no neighbors, exit
        if len(result) == 0:
            return pu.empty_zipped_array(2)
        plpy.notice('** Query returned with %d rows' % len(result))
    except plpy.SPIError as err:
        # plpy.error raises, so emit the diagnostic notices first
        plpy.notice('** Query failed: "%s"' % query)
        plpy.notice('** Error: %s' % err)
        plpy.error('Error: areas of interest query failed, check input parameters')
        return pu.empty_zipped_array(2)
    ## collect attributes
    numer = pu.get_attributes(result, 1)
    denom = pu.get_attributes(result, 2)
    weight = pu.get_weight(result, w_type, num_ngbrs)
    ## calculate moran global rate
    lisa_rate = ps.esda.moran.Moran_Rate(numer, denom, weight,
                                         permutations=permutations)
    return zip([lisa_rate.I], [lisa_rate.EI])
 def moran_local_rate(subquery, numerator, denominator,
                     w_type, num_ngbrs, permutations, geom_col, id_col):
    """
        Moran's I Local Rate
        Andy Eschbacher
    """
    # geometries with values that are null are ignored
    # resulting in a collection of not as near neighbors
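    # attribute keys are sorted when the query is built, so use attr1/attr2
    # to keep the numerator mapped to attr1 and the denominator to attr2
    qvals = OrderedDict([("id_col", id_col),
                         ("attr1", numerator),
                         ("attr2", denominator),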
                         ("geom_col", geom_col),
                         ("subquery", subquery),
                         ("num_ngbrs", num_ngbrs)])
    query = pu.construct_neighbor_query(w_type, qvals)
    try:
        result = plpy.execute(query)
        # if there are no neighbors, exit
        if len(result) == 0:
            return pu.empty_zipped_array(5)
    except plpy.SPIError as err:
        # plpy.error raises, so emit the diagnostic notices first
        plpy.notice('** Query failed: "%s"' % query)
        plpy.notice('** Error: %s' % err)
        plpy.error('Error: areas of interest query failed, check input parameters')
        return pu.empty_zipped_array(5)
    ## collect attributes
    numer = pu.get_attributes(result, 1)
    denom = pu.get_attributes(result, 2)
    weight = pu.get_weight(result, w_type, num_ngbrs)
    # calculate LISA values
    lisa = ps.esda.moran.Moran_Local_Rate(numer, denom, weight,
                                          permutations=permutations)
    # find quadrants for each geometry
    quads = quad_position(lisa.q)
    return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
 def moran_local_bv(subquery, attr1, attr2,
                   permutations, geom_col, id_col, w_type, num_ngbrs):
    """
        Moran's I (local) Bivariate (untested)
    """
    plpy.notice('** Constructing query')
    qvals = OrderedDict([("id_col", id_col),
                         ("attr1", attr1),
                         ("attr2", attr2),
                         ("geom_col", geom_col),
                         ("subquery", subquery),
                         ("num_ngbrs", num_ngbrs)])
    query = pu.construct_neighbor_query(w_type, qvals)
    try:
        result = plpy.execute(query)
        # if there are no neighbors, exit
        if len(result) == 0:
            return pu.empty_zipped_array(4)
    except plpy.SPIError:
        # plpy.error raises, so emit the diagnostic notice first
        plpy.notice('** Query failed: "%s"' % query)
        plpy.error("Error: areas of interest query failed, " \
                   "check input parameters")
        return pu.empty_zipped_array(4)
    ## collect attributes
    attr1_vals = pu.get_attributes(result, 1)
    attr2_vals = pu.get_attributes(result, 2)
    # create weights
    weight = pu.get_weight(result, w_type, num_ngbrs)
    # calculate LISA values
    lisa = ps.esda.moran.Moran_Local_BV(attr1_vals, attr2_vals, weight,
                                        permutations=permutations)
    plpy.notice("len of Is: %d" % len(lisa.Is))
    # find clustering of significance
    lisa_sig = quad_position(lisa.q)
    plpy.notice('** Finished calculations')
    return zip(lisa.Is, lisa_sig, lisa.p_sim, weight.id_order)
 # Low level functions ----------------------------------------
 def map_quads(coord):
    """
        Map a quadrant number to Moran's I designation
        HH=1, LH=2, LL=3, HL=4
        Input:
        @param coord (int): quadrant of a specific measurement
        Output:
            classification (one of 'HH', 'LH', 'LL', or 'HL')
    """
    if coord == 1:
        return 'HH'
    elif coord == 2:
        return 'LH'
    elif coord == 3:
        return 'LL'
    elif coord == 4:
        return 'HL'
    else:
        return None
 def quad_position(quads):
    """
        Produce Moran's I classification based on quadrant numbers
        Input:
        @param quads ndarray: an array of quads classified by
          1-4 (PySAL default)
        Output:
        @param list: an array of quads classified by 'HH', 'LL', etc.
    """
    return [map_quads(q) for q in quads]
--- a/release/python/0.4.0/crankshaft/crankshaft/pysal_utils/init.py
+++ b/release/python/0.4.0/crankshaft/crankshaft/pysal_utils/init.py
@ -0,0 +1,2 @@
 """Import all functions for pysal_utils"""
 from crankshaft.pysal_utils.pysal_utils import *
--- a/release/python/0.4.0/crankshaft/crankshaft/pysal_utils/pysal_utils.py
+++ b/release/python/0.4.0/crankshaft/crankshaft/pysal_utils/pysal_utils.py
@ -0,0 +1,188 @@
 """
    Utilities module for generic PySAL functionality, mainly centered on
      translating queries into numpy arrays or PySAL weights objects
 """
 import numpy as np
 import pysal as ps
 def construct_neighbor_query(w_type, query_vals):
    """Return query (a string) used for finding neighbors
        @param w_type text: type of neighbors to calculate ('knn' or 'queen')
        @param query_vals dict: values used to construct the query
    """
    if w_type.lower() == 'knn':
        return knn(query_vals)
    else:
        return queen(query_vals)
 ## Build weight object
 def get_weight(query_res, w_type='knn', num_ngbrs=5):
    """
        Construct PySAL weight from return value of query
        @param query_res dict-like: query results with attributes and neighbors
    """
    # if w_type.lower() == 'knn':
    #     row_normed_weights = [1.0 / float(num_ngbrs)] * num_ngbrs
    #     weights = {x['id']: row_normed_weights for x in query_res}
    # else:
    #     weights = {x['id']: [1.0 / len(x['neighbors'])] * len(x['neighbors'])
    #                         if len(x['neighbors']) > 0
    #                         else [] for x in query_res}
    neighbors = {x['id']: x['neighbors'] for x in query_res}
    print 'len of neighbors: %d' % len(neighbors)
    built_weight = ps.W(neighbors)
    built_weight.transform = 'r'
    return built_weight
 def query_attr_select(params):
    """
        Create portion of SELECT statement for attributes involved in query.
        @param params: dict of information used in query (column names,
                       table name, etc.)
    """
    attr_string = ""
    template = "i.\"%(col)s\"::numeric As attr%(alias_num)s, "
    if 'time_cols' in params:
        ## if markov analysis
        attrs = params['time_cols']
        for idx, val in enumerate(attrs):
            attr_string += template % {"col": val, "alias_num": idx + 1}
    else:
        ## if moran's analysis
        attrs = [k for k in params
                 if k not in ('id_col', 'geom_col', 'subquery', 'num_ngbrs', 'subquery')]
        for idx, val in enumerate(sorted(attrs)):
            attr_string += template % {"col": params[val], "alias_num": idx + 1}
    return attr_string
 def query_attr_where(params):
    """
      Construct where conditions when building neighbors query
        Create portion of WHERE clauses for weeding out NULL-valued geometries
        Input: dict of params:
            {'subquery': ...,
             'numerator': 'data1',
             'denominator': 'data2',
             '': ...}
        Output: 'idx_replace."data1" IS NOT NULL AND idx_replace."data2" IS NOT NULL'
        Input:
        {'subquery': ...,
         'time_cols': ['time1', 'time2', 'time3'],
         'etc': ...}
        Output: 'idx_replace."time1" IS NOT NULL AND idx_replace."time2" IS NOT
          NULL AND idx_replace."time3" IS NOT NULL'
    """
    attr_string = []
    template = "idx_replace.\"%s\" IS NOT NULL"
    if 'time_cols' in params:
        ## markov where clauses
        attrs = params['time_cols']
        # add values to template
        for attr in attrs:
            attr_string.append(template % attr)
    else:
        ## moran where clauses
        # get keys
        attrs = sorted([k for k in params
                        if k not in ('id_col', 'geom_col', 'subquery', 'num_ngbrs', 'subquery')])
        # add values to template
        for attr in attrs:
            attr_string.append(template % params[attr])
        if len(attrs) == 2:
            attr_string.append("idx_replace.\"%s\" <> 0" % params[attrs[1]])
    out = " AND ".join(attr_string)
    return out
 def knn(params):
    """SQL query for k-nearest neighbors.
        @param params: dict of values to fill template
    """
    attr_select = query_attr_select(params)
    attr_where = query_attr_where(params)
    replacements = {"attr_select": attr_select,
                    "attr_where_i": attr_where.replace("idx_replace", "i"),
                    "attr_where_j": attr_where.replace("idx_replace", "j")}
    query = "SELECT " \
                "i.\"{id_col}\" As id, " \
                "%(attr_select)s" \
                "(SELECT ARRAY(SELECT j.\"{id_col}\" " \
                              "FROM ({subquery}) As j " \
                              "WHERE " \
                                "i.\"{id_col}\" <> j.\"{id_col}\" AND " \
                                "%(attr_where_j)s " \
                              "ORDER BY " \
                                "j.\"{geom_col}\" <-> i.\"{geom_col}\" ASC " \
                              "LIMIT {num_ngbrs})" \
                ") As neighbors " \
            "FROM ({subquery}) As i " \
            "WHERE " \
                "%(attr_where_i)s " \
            "ORDER BY i.\"{id_col}\" ASC;" % replacements
    return query.format(**params)
 ## SQL query for finding queens neighbors (all contiguous polygons)
 def queen(params):
    """SQL query for queen neighbors.
        @param params dict: information to fill query
    """
    attr_select = query_attr_select(params)
    attr_where = query_attr_where(params)
    replacements = {"attr_select": attr_select,
                    "attr_where_i": attr_where.replace("idx_replace", "i"),
                    "attr_where_j": attr_where.replace("idx_replace", "j")}
    query = "SELECT " \
                "i.\"{id_col}\" As id, " \
                "%(attr_select)s" \
                "(SELECT ARRAY(SELECT j.\"{id_col}\" " \
                 "FROM ({subquery}) As j " \
                 "WHERE i.\"{id_col}\" <> j.\"{id_col}\" AND " \
                       "ST_Touches(i.\"{geom_col}\", j.\"{geom_col}\") AND " \
                       "%(attr_where_j)s)" \
                ") As neighbors " \
            "FROM ({subquery}) As i " \
            "WHERE " \
                "%(attr_where_i)s " \
            "ORDER BY i.\"{id_col}\" ASC;" % replacements
    return query.format(**params)
 ## to add more weight methods open a ticket or pull request
 def get_attributes(query_res, attr_num=1):
    """
        @param query_res: query results with attributes and neighbors
        @param attr_num: attribute number (1, 2, ...)
    """
    return np.array([x['attr' + str(attr_num)] for x in query_res], dtype=np.float)
 def empty_zipped_array(num_nones):
    """
        prepare return values for cases of empty weights objects (no neighbors)
        Input:
        @param num_nones int: number of columns (e.g., 4)
        Output:
        [(None, None, None, None)]
    """
    return [tuple([None] * num_nones)]
--- a/release/python/0.4.0/crankshaft/crankshaft/random_seeds.py
+++ b/release/python/0.4.0/crankshaft/crankshaft/random_seeds.py
@ -0,0 +1,11 @@
 """Random seed generator used for non-deterministic functions in crankshaft"""
 import random
 import numpy
 def set_random_seeds(value):
    """
    Set the seeds of the RNGs (Random Number Generators)
    used internally.
    """
    random.seed(value)
    numpy.random.seed(value)
--- a/release/python/0.4.0/crankshaft/crankshaft/segmentation/init.py
+++ b/release/python/0.4.0/crankshaft/crankshaft/segmentation/init.py
@ -0,0 +1 @@
 from segmentation import * 
--- a/release/python/0.4.0/crankshaft/crankshaft/segmentation/segmentation.py
+++ b/release/python/0.4.0/crankshaft/crankshaft/segmentation/segmentation.py
@ -0,0 +1,176 @@
 """
 Segmentation creation and prediction
 """
 import sklearn
 import numpy as np
 import plpy
 from sklearn.ensemble import GradientBoostingRegressor
 from sklearn import metrics
 from sklearn.cross_validation import train_test_split
 # Lower level functions
 #----------------------
 def replace_nan_with_mean(array):
    """
        Input:
            @param array: an array of floats which may have null-valued entries
        Output:
            array with nans filled in with the mean of the dataset
    """
     # np.where returns a tuple of (row indices, column indices)
     indices = np.where(np.isnan(array))
     # replace each nan entry with the mean of the non-nan values in its column
     for row, col in zip(*indices):
         array[row, col] = np.mean(array[~np.isnan(array[:, col]), col])
    return array
 def get_data(variable, feature_columns, query):
    """
        Fetch data from the database, clean, and package into
          numpy arrays
        Input:
            @param variable: name of the target variable
            @param feature_columns: list of column names
            @param query: subquery that data is pulled from for the packaging
        Output:
            prepared data, packaged into NumPy arrays
    """
    columns = ','.join(['array_agg("{col}") As "{col}"'.format(col=col) for col in feature_columns])
    try:
        data = plpy.execute('''SELECT array_agg("{variable}") As target, {columns} FROM ({query}) As a'''.format(
            variable=variable,
            columns=columns,
            query=query))
    except Exception, e:
        plpy.error('Failed to access data to build segmentation model: %s' % e)
    # extract target data from plpy object
    target = np.array(data[0]['target'])
     # stack the n feature arrays (each of length m) as the columns of an m x n array
    features = np.column_stack([np.array(data[0][col], dtype=float) for col in feature_columns])
    return replace_nan_with_mean(target), replace_nan_with_mean(features)
 # High level interface
 # --------------------
 def create_and_predict_segment_agg(target, features, target_features, target_ids, model_parameters):
    """
     Version of create_and_predict_segment that works on arrays that come straight from the SQL calling
     the function.
         Input:
             @param target: The 1D array of length NSamples containing the target variable we want the model to predict
             @param features: The 2D array of size NSamples * NFeatures that forms the input to the model
             @param target_features: The 2D array of features for the rows to predict on
             @param target_ids: A 1D array of target_ids that will be used to associate the results of the prediction with the rows which they come from
            @param model_parameters: A dictionary containing parameters for the model.
    """
    clean_target = replace_nan_with_mean(target)
    clean_features = replace_nan_with_mean(features)
    target_features = replace_nan_with_mean(target_features)
    model, accuracy = train_model(clean_target, clean_features, model_parameters, 0.2)
    prediction = model.predict(target_features)
     accuracy_array = [accuracy]*prediction.shape[0]
     return zip(target_ids, prediction, accuracy_array)
 def create_and_predict_segment(query, variable, target_query, model_params):
    """
    generate a segment with machine learning
    Stuart Lynn
    """
    ## fetch column names
    try:
        columns = plpy.execute('SELECT * FROM ({query}) As a LIMIT 1  '.format(query=query))[0].keys()
    except Exception, e:
        plpy.error('Failed to build segmentation model: %s' % e)
    ## extract column names to be used in building the segmentation model
    feature_columns = set(columns) - set([variable, 'cartodb_id', 'the_geom', 'the_geom_webmercator'])
    ## get data from database
    target, features = get_data(variable, feature_columns, query)
    model, accuracy = train_model(target, features, model_params, 0.2)
    cartodb_ids, result = predict_segment(model, feature_columns, target_query)
    accuracy_array = [accuracy]*result.shape[0]
    return zip(cartodb_ids, result, accuracy_array)
 def train_model(target, features, model_params, test_split):
    """
        Train the Gradient Boosting model on the provided data and calculate the accuracy of the model
        Input:
             @param target: 1D Array of the variable that the model is to be trained to predict
             @param features: 2D Array NSamples * NFeatures to use in training the model
             @param model_params: A dictionary of model parameters, the full specification can be found on the
                 scikit learn page for [GradientBoostingRegressor](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html)
             @param test_split: The fraction of the data to be withheld for testing the model / calculating the accuracy
    """
    features_train, features_test, target_train, target_test = train_test_split(features, target, test_size=test_split)
    model = GradientBoostingRegressor(**model_params)
    model.fit(features_train, target_train)
     accuracy = calculate_model_accuracy(model, features_test, target_test)
    return model, accuracy
 def calculate_model_accuracy(model, features, target):
    """
        Calculate the mean squared error of the model prediction
        Input:
            @param model: model trained from input features
            @param features: features to make a prediction from
            @param target: target to compare prediction to
        Output:
             mean squared error of the model prediction compared to the target
    """
    prediction = model.predict(features)
    return metrics.mean_squared_error(prediction, target)
 def predict_segment(model, features, target_query):
    """
    Use the provided model to predict the values for the new feature set
        Input:
            @param model: The pretrained model
             @param features: A list of features to use in the model prediction (list of column names)
             @param target_query: The query to run to obtain the data to predict on and the cartodb_ids associated with it.
    """
    batch_size = 1000
    joined_features = ','.join(['"{0}"::numeric'.format(a) for a in features])
    try:
        cursor = plpy.cursor('SELECT Array[{joined_features}] As features FROM ({target_query}) As a'.format(
            joined_features=joined_features,
            target_query=target_query))
    except Exception, e:
        plpy.error('Failed to build segmentation model: %s' % e)
    results = []
    while True:
        rows = cursor.fetch(batch_size)
        if not rows:
            break
        batch = np.row_stack([np.array(row['features'], dtype=float) for row in rows])
         # TODO: impute with the global column means; per-batch means make predictions depend on how rows are batched
        batch = replace_nan_with_mean(batch)
        prediction = model.predict(batch)
        results.append(prediction)
    try:
        cartodb_ids = plpy.execute('''SELECT array_agg(cartodb_id ORDER BY cartodb_id) As cartodb_ids FROM ({0}) As a'''.format(target_query))[0]['cartodb_ids']
    except Exception, e:
        plpy.error('Failed to build segmentation model: %s' % e)
    return cartodb_ids, np.concatenate(results)
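The column-mean imputation that `replace_nan_with_mean` performs on 2D arrays can be illustrated without NumPy. What follows is a minimal sketch under stated assumptions: `fill_nan_with_column_mean` is a hypothetical pure-Python analogue, not part of crankshaft, and it assumes every column holds at least one non-NaN value.

```python
import math


def fill_nan_with_column_mean(rows):
    """Return a copy of `rows` with each NaN replaced by its column's mean.

    Hypothetical helper for illustration only; the mean of a column is
    computed over its non-NaN entries.
    """
    num_cols = len(rows[0])
    means = []
    for col in range(num_cols):
        observed = [row[col] for row in rows if not math.isnan(row[col])]
        means.append(sum(observed) / float(len(observed)))
    # Substitute the column mean wherever a value is NaN
    return [[means[col] if math.isnan(val) else val
             for col, val in enumerate(row)]
            for row in rows]


nan = float('nan')
filled = fill_nan_with_column_mean([[1.0, nan],
                                    [3.0, 4.0],
                                    [nan, 8.0]])
```

Because the means come from whatever rows are passed in, imputing one batch at a time (as `predict_segment` does) can fill the same gap with different values depending on the batch.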
--- a/release/python/0.4.0/crankshaft/crankshaft/space_time_dynamics/init.py
+++ b/release/python/0.4.0/crankshaft/crankshaft/space_time_dynamics/init.py
@ -0,0 +1,2 @@
 """Import all functions from the space-time dynamics libraries."""
 from markov import *
--- a/release/python/0.4.0/crankshaft/crankshaft/space_time_dynamics/markov.py
+++ b/release/python/0.4.0/crankshaft/crankshaft/space_time_dynamics/markov.py
@ -0,0 +1,189 @@
 """
 Spatial dynamics measurements using Spatial Markov
 """
 import numpy as np
 import pysal as ps
 import plpy
 import crankshaft.pysal_utils as pu
 def spatial_markov_trend(subquery, time_cols, num_classes=7,
                         w_type='knn', num_ngbrs=5, permutations=0,
                         geom_col='the_geom', id_col='cartodb_id'):
    """
        Predict the trends of a unit based on:
        1. history of its transitions to different classes (e.g., 1st quantile -> 2nd quantile)
        2. average class of its neighbors
        Inputs:
        @param subquery string: e.g., SELECT the_geom, cartodb_id,
          interesting_time_column FROM table_name
        @param time_cols list of strings: list of strings of column names
        @param num_classes (optional): number of classes to break distribution
          of values into. Currently uses quantile bins.
        @param w_type string (optional): weight type ('knn' or 'queen')
        @param num_ngbrs int (optional): number of neighbors (if knn type)
        @param permutations int (optional): number of permutations for test
          stats
        @param geom_col string (optional): name of column which contains the
          geometries
        @param id_col string (optional): name of column which has the ids of
          the table
        Outputs:
         @param trend_up float: probability that a geom will move to a higher
           class
         @param trend_down float: probability that a geom will move to a lower
           class
        @param trend float: (trend_up - trend_down) / trend_static
        @param volatility float: a measure of the volatility based on
          probability stddev(prob array)
    """
    if len(time_cols) < 2:
        plpy.error('More than one time column needs to be passed')
    qvals = {"id_col": id_col,
             "time_cols": time_cols,
             "geom_col": geom_col,
             "subquery": subquery,
             "num_ngbrs": num_ngbrs}
    try:
        query_result = plpy.execute(
            pu.construct_neighbor_query(w_type, qvals)
        )
        if len(query_result) == 0:
            return zip([None], [None], [None], [None], [None])
    except plpy.SPIError, err:
        plpy.debug('Query failed with exception %s: %s' % (err, pu.construct_neighbor_query(w_type, qvals)))
        plpy.error('Query failed, check the input parameters')
        return zip([None], [None], [None], [None], [None])
    ## build weight
    weights = pu.get_weight(query_result, w_type)
    weights.transform = 'r'
    ## prep time data
    t_data = get_time_data(query_result, time_cols)
    plpy.debug('shape of t_data %d, %d' % t_data.shape)
    plpy.debug('number of weight objects: %d, %d' % (weights.sparse).shape)
    plpy.debug('first num elements: %f' % t_data[0, 0])
    sp_markov_result = ps.Spatial_Markov(t_data,
                                         weights,
                                         k=num_classes,
                                         fixed=False,
                                         permutations=permutations)
    ## get lag classes
    lag_classes = ps.Quantiles(
        ps.lag_spatial(weights, t_data[:, -1]),
        k=num_classes).yb
     ## look up probability distribution for each unit according to class and lag class
    prob_dist = get_prob_dist(sp_markov_result.P,
                              lag_classes,
                              sp_markov_result.classes[:, -1])
    ## find the ups and down and overall distribution of each cell
    trend_up, trend_down, trend, volatility = get_prob_stats(prob_dist,
                                                             sp_markov_result.classes[:, -1])
    ## output the results
    return zip(trend, trend_up, trend_down, volatility, weights.id_order)
 def get_time_data(markov_data, time_cols):
    """
        Extract the time columns and bin appropriately
    """
    num_attrs = len(time_cols)
    return np.array([[x['attr' + str(i)] for x in markov_data]
                     for i in range(1, num_attrs+1)], dtype=float).transpose()
 ## not currently used
 def rebin_data(time_data, num_time_per_bin):
     """
         Convert an n x l matrix into an n x ceil(l/m) matrix where the values
          are reduced (averaged) over each group of m consecutive columns:
           1 2 3 4    1.5 3.5
           5 6 7 8 -> 5.5 7.5
           9 8 7 6    8.5 6.5
           5 4 3 2    4.5 2.5
           if m = 2, the 4 x 4 matrix is transformed to a 4 x 2 matrix.
         This process effectively resamples the data at a time span m times
          longer than that of the input data.
         For cases when there is a remainder (e.g., l = 5 and m = 3 leaves a
          remainder of 2), the remaining two columns are binned together as
          the last time period, while the first three are binned together for
          the first period.
         Input:
           @param time_data n x l  ndarray: measurements of an attribute at
            different time intervals
           @param num_time_per_bin int: number of columns (m) to average into
            a new column
         Output:
           n x ceil(l / m) ndarray of resampled time series
     """
     if time_data.shape[1] % num_time_per_bin == 0:
         ## if fit is perfect, then use it
         n_max = time_data.shape[1] // num_time_per_bin
     else:
         ## fit remainders into an additional column
         n_max = time_data.shape[1] // num_time_per_bin + 1
    return np.array([time_data[:, num_time_per_bin * i:num_time_per_bin * (i+1)].mean(axis=1)
                     for i in range(n_max)]).T
 def get_prob_dist(transition_matrix, lag_indices, unit_indices):
    """
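         Given an array of transition matrices, look up the probability
         distribution associated with each (lag class, unit class) pair passed
         Input:
         @param transition_matrix ndarray[k,k,k]: one k x k transition matrix
           per lag class
         @param lag_indices ndarray: class index of the spatial lag of each unit
         @param unit_indices ndarray: class index of each unit
         Output:
         Array of probability distributions (one row of length k per unit)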
    """
    return np.array([transition_matrix[(lag_indices[i], unit_indices[i])]
                     for i in range(len(lag_indices))])
 def get_prob_stats(prob_dist, unit_indices):
    """
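         get the statistics of the probability distributions
         Outputs:
             @param trend_up ndarray(float): sum of probabilities for upward
                movement (relative to the unit index of that prob)
             @param trend_down ndarray(float): sum of probabilities for downward
                movement (relative to the unit index of that prob)
             @param trend ndarray(float): difference of upward and downward
                movements, divided by the probability of staying in the same
                class (undefined where that probability is zero)
             @param volatility ndarray(float): standard deviation of each
                probability distribution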
    """
    num_elements = len(unit_indices)
    trend_up = np.empty(num_elements, dtype=float)
    trend_down = np.empty(num_elements, dtype=float)
    trend = np.empty(num_elements, dtype=float)
    for i in range(num_elements):
        trend_up[i] = prob_dist[i, (unit_indices[i]+1):].sum()
        trend_down[i] = prob_dist[i, :unit_indices[i]].sum()
        if prob_dist[i, unit_indices[i]] > 0.0:
            trend[i] = (trend_up[i] - trend_down[i]) / prob_dist[i, unit_indices[i]]
        else:
             trend[i] = np.nan
    ## calculate volatility of distribution
    volatility = prob_dist.std(axis=1)
    return trend_up, trend_down, trend, volatility
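The per-unit arithmetic in `get_prob_stats` can be worked through for a single unit without NumPy. The sketch below is illustrative only: `trend_stats` is a hypothetical helper, and `probs` stands for one row of the looked-up probability distribution, meaning the transition probabilities out of the unit's current class.

```python
def trend_stats(probs, current_class):
    """Return (trend_up, trend_down, trend) for one unit (illustrative helper).

    probs: transition probabilities out of the unit's current class
    current_class: index of the class the unit is currently in
    """
    # Probability mass on classes above / below the current one
    trend_up = sum(probs[current_class + 1:])
    trend_down = sum(probs[:current_class])
    # Probability of staying in the same class
    static = probs[current_class]
    if static > 0.0:
        trend = (trend_up - trend_down) / static
    else:
        trend = None  # undefined when the unit can never stay in place
    return trend_up, trend_down, trend


up, down, trend = trend_stats([0.125, 0.5, 0.25, 0.125], 1)
```

For this unit in class 1, the upward mass is 0.25 + 0.125 = 0.375, the downward mass is 0.125, and the trend is (0.375 - 0.125) / 0.5 = 0.5.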
--- a/release/python/0.4.0/crankshaft/setup.py
+++ b/release/python/0.4.0/crankshaft/setup.py
@ -0,0 +1,49 @@
 """
 CartoDB Spatial Analysis Python Library
 See:
 https://github.com/CartoDB/crankshaft
 """
 from setuptools import setup, find_packages
 setup(
    name='crankshaft',
    version='0.0.0',
    description='CartoDB Spatial Analysis Python Library',
    url='https://github.com/CartoDB/crankshaft',
    author='Data Services Team - CartoDB',
    author_email='dataservices@cartodb.com',
    license='MIT',
    classifiers=[
        'Development Status :: 3 - Alpha',
         'Intended Audience :: Mapping community',
        'Topic :: Maps :: Mapping Tools',
        'License :: OSI Approved :: MIT License',
        'Programming Language :: Python :: 2.7',
    ],
    keywords='maps mapping tools spatial analysis geostatistics',
    packages=find_packages(exclude=['contrib', 'docs', 'tests']),
    extras_require={
        'dev': ['unittest'],
        'test': ['unittest', 'nose', 'mock'],
    },
    # The choice of component versions is dictated by what's
    # provisioned in the production servers.
    # IMPORTANT NOTE: please don't change this line. Instead issue a ticket to systems for evaluation.
    install_requires=['joblib==0.8.3', 'numpy==1.6.1', 'scipy==0.14.0', 'pysal==1.11.2', 'scikit-learn==0.14.1'],
    requires=['pysal', 'numpy', 'sklearn'],
    test_suite='test'
 )
--- a/release/python/0.4.0/crankshaft/setup.py-r
+++ b/release/python/0.4.0/crankshaft/setup.py-r
@ -0,0 +1,49 @@
 """
 CartoDB Spatial Analysis Python Library
 See:
 https://github.com/CartoDB/crankshaft
 """
 from setuptools import setup, find_packages
 setup(
    name='crankshaft',
    version='0.0.0',
    description='CartoDB Spatial Analysis Python Library',
    url='https://github.com/CartoDB/crankshaft',
    author='Data Services Team - CartoDB',
    author_email='dataservices@cartodb.com',
    license='MIT',
    classifiers=[
        'Development Status :: 3 - Alpha',
         'Intended Audience :: Mapping community',
        'Topic :: Maps :: Mapping Tools',
        'License :: OSI Approved :: MIT License',
        'Programming Language :: Python :: 2.7',
    ],
    keywords='maps mapping tools spatial analysis geostatistics',
    packages=find_packages(exclude=['contrib', 'docs', 'tests']),
    extras_require={
        'dev': ['unittest'],
        'test': ['unittest', 'nose', 'mock'],
    },
    # The choice of component versions is dictated by what's
    # provisioned in the production servers.
    # IMPORTANT NOTE: please don't change this line. Instead issue a ticket to systems for evaluation.
    install_requires=['joblib==0.8.3', 'numpy==1.6.1', 'scipy==0.14.0', 'pysal==1.11.2', 'scikit-learn==0.14.1'],
    requires=['pysal', 'numpy', 'sklearn'],
    test_suite='test'
 )
--- a/release/python/0.4.0/crankshaft/test/fixtures/kmeans.json
+++ b/release/python/0.4.0/crankshaft/test/fixtures/kmeans.json
@ -0,0 +1 @@
 [{"xs": [9.917239463463458, 9.042767302696836, 10.798929825304187, 8.763751051762995, 11.383882954810852, 11.018206993460897, 8.939526075734316, 9.636159342565252, 10.136336896960058, 11.480610059427342, 12.115011910725082, 9.173267848893428, 10.239300931201738, 8.00012512174072, 8.979962292282131, 9.318376124429575, 10.82259513754284, 10.391747171927115, 10.04904588886165, 9.96007160443463, -0.78825626804569, -0.3511819898577426, -1.2796410003764271, -0.3977049391203402, 2.4792311265774667, 1.3670311632092624, 1.2963504112955613, 2.0404844103073025, -1.6439708506073223, 0.39122885445645805, 1.026031821452462, -0.04044477160482201, -0.7442346929085072, -0.34687120826243034, -0.23420359971379054, -0.5919629143336708, -0.202903054395391, -0.1893399644841902, 1.9331834251176807, -0.12321054392851609], "ys": [8.735627063679981, 9.857615954045011, 10.81439096759407, 10.586727233537191, 9.232919976568622, 11.54281262696508, 8.392787912674466, 9.355119689665944, 9.22380703532752, 10.542142541823122, 10.111980619367035, 10.760836265570738, 8.819773453269804, 10.25325722424816, 9.802077905695608, 8.955420161552611, 9.833801181904477, 10.491684241001613, 12.076108669877556, 11.74289693140474, -0.5685725015474191, -0.5715728344759778, -0.20180907868635137, 0.38431336480089595, -0.3402202083684184, -2.4652736827783586, 0.08295159401756182, 0.8503818775816505, 0.6488691600321166, 0.5794762568230527, -0.6770063922144103, -0.6557616416449478, -1.2834289177624947, 0.1096318195532717, -0.38986922166834853, -1.6224497706950238, 0.09429787743230483, 0.4005097316394031, -0.508002811195673, -1.2473463371366507], "ids": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]}]
--- a/release/python/0.4.0/crankshaft/test/fixtures/markov.json
+++ b/release/python/0.4.0/crankshaft/test/fixtures/markov.json
@ -0,0 +1 @@
 [[0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 0], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 1], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 2], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 3], [0.0, 0.065217391304347824, 0.065217391304347824, 0.33605067580764519, 4], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 5], [0.1875, 0.23999999999999999, 0.12, 0.23731835158706122, 6], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 7], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 8], [0.19047619047619049, 0.16, 0.0, 0.32594478059941379, 9], [-0.23529411764705882, 0.0, 0.19047619047619047, 0.31356338348865387, 10], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 11], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 12], [0.027777777777777783, 0.11111111111111112, 0.088888888888888892, 0.30339641183779581, 13], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 14], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 15], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 16], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 17], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 18], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 19], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 20], [0.078947368421052641, 0.073170731707317083, 0.0, 0.36451788667842738, 21], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 22], [-0.16666666666666663, 0.18181818181818182, 0.27272727272727271, 0.20246415864836445, 23], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 24], [0.1875, 0.23999999999999999, 0.12, 
0.23731835158706122, 25], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 26], [-0.043478260869565216, 0.0, 0.041666666666666664, 0.37950991789118999, 27], [0.22222222222222221, 0.18181818181818182, 0.0, 0.31701083225750354, 28], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 29], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 30], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 31], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 32], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 33], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 34], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 35], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 36], [0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 37], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 38], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 39], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 40], [0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 41], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 42], [0.0, 0.0, 0.0, 0.40000000000000002, 43], [0.0, 0.065217391304347824, 0.065217391304347824, 0.33605067580764519, 44], [0.078947368421052641, 0.073170731707317083, 0.0, 0.36451788667842738, 45], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 46], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 47]]
--- a/release/python/0.4.0/crankshaft/test/fixtures/moran.json
+++ b/release/python/0.4.0/crankshaft/test/fixtures/moran.json
@ -0,0 +1,52 @@
 [[0.9319096128346788, "HH"],
 [-1.135787401862846, "HL"],
 [0.11732030672508517, "LL"],
 [0.6152779669180425, "LL"],
 [-0.14657336660125297, "LH"],
 [0.6967858120189607, "LL"],
 [0.07949310115714454, "HH"],
 [0.4703198759258987, "HH"],
 [0.4421125200498064, "HH"],
 [0.5724288737143592, "LL"],
 [0.8970743435692062, "LL"],
 [0.18327334401918674, "LL"],
 [-0.01466729201304962, "HL"],
 [0.3481559372544409, "LL"],
 [0.06547094736902978, "LL"],
 [0.15482141569329988, "HH"],
 [0.4373841193538136, "HH"],
 [0.15971286468915544, "LL"],
 [1.0543588860308968, "HH"],
 [1.7372866900020818, "HH"],
 [1.091998586053999, "LL"],
 [0.1171572584252222, "HH"],
 [0.08438455015300014, "LL"],
 [0.06547094736902978, "LL"],
 [0.15482141569329985, "HH"],
 [1.1627044812890683, "HH"],
 [0.06547094736902978, "LL"],
 [0.795275137550483, "HH"],
 [0.18562939195219, "LL"],
 [0.3010757406693439, "LL"],
 [2.8205795942839376, "HH"],
 [0.11259190602909264, "LL"],
 [-0.07116352791516614, "HL"],
 [-0.09945240794119009, "LH"],
 [0.18562939195219, "LL"],
 [0.1832733440191868, "LL"],
 [-0.39054253768447705, "HL"],
 [-0.1672071289487642, "HL"],
 [0.3337669247916343, "HH"],
 [0.2584386102554792, "HH"],
 [-0.19733845476322634, "HL"],
 [-0.9379282899805409, "LH"],
 [-0.028770969951095866, "LH"],
 [0.051367269430983485, "LL"],
 [-0.2172548045913472, "LH"],
 [0.05136726943098351, "LL"],
 [0.04191046803899837, "LL"],
 [0.7482357030403517, "HH"],
 [-0.014585767863118111, "LH"],
 [0.5410013139159929, "HH"],
 [1.0223932668429925, "LL"],
 [1.4179402898927476, "LL"]]
--- a/release/python/0.4.0/crankshaft/test/fixtures/neighbors.json
+++ b/release/python/0.4.0/crankshaft/test/fixtures/neighbors.json
@ -0,0 +1,54 @@
 [
    {"neighbors": [48, 26, 20, 9, 31], "id": 1, "value": 0.5},
    {"neighbors": [30, 16, 46, 3, 4], "id": 2, "value": 0.7},
    {"neighbors": [46, 30, 2, 12, 16], "id": 3, "value": 0.2},
    {"neighbors": [18, 30, 23, 2, 52], "id": 4, "value": 0.1},
    {"neighbors": [47, 40, 45, 37, 28], "id": 5, "value": 0.3},
    {"neighbors": [10, 21, 41, 14, 37], "id": 6, "value": 0.05},
    {"neighbors": [8, 17, 43, 25, 12], "id": 7, "value": 0.4},
    {"neighbors": [17, 25, 43, 22, 7], "id": 8, "value": 0.7},
    {"neighbors": [39, 34, 1, 26, 48], "id": 9, "value": 0.5},
    {"neighbors": [6, 37, 5, 45, 49], "id": 10, "value": 0.04},
    {"neighbors": [51, 41, 29, 21, 14], "id": 11, "value": 0.08},
    {"neighbors": [44, 46, 43, 50, 3], "id": 12, "value": 0.2},
    {"neighbors": [45, 23, 14, 28, 18], "id": 13, "value": 0.4},
    {"neighbors": [41, 29, 13, 23, 6], "id": 14, "value": 0.2},
    {"neighbors": [36, 27, 32, 33, 24], "id": 15, "value": 0.3},
    {"neighbors": [19, 2, 46, 44, 28], "id": 16, "value": 0.4},
    {"neighbors": [8, 25, 43, 7, 22], "id": 17, "value": 0.6},
    {"neighbors": [23, 4, 29, 14, 13], "id": 18, "value": 0.3},
    {"neighbors": [42, 16, 28, 26, 40], "id": 19, "value": 0.7},
    {"neighbors": [1, 48, 31, 26, 42], "id": 20, "value": 0.8},
    {"neighbors": [41, 6, 11, 14, 10], "id": 21, "value": 0.1},
    {"neighbors": [25, 50, 43, 31, 44], "id": 22, "value": 0.4},
    {"neighbors": [18, 13, 14, 4, 2], "id": 23, "value": 0.1},
    {"neighbors": [33, 49, 34, 47, 27], "id": 24, "value": 0.3},
    {"neighbors": [43, 8, 22, 17, 50], "id": 25, "value": 0.4},
    {"neighbors": [1, 42, 20, 31, 48], "id": 26, "value": 0.6},
    {"neighbors": [32, 15, 36, 33, 24], "id": 27, "value": 0.3},
    {"neighbors": [40, 45, 19, 5, 13], "id": 28, "value": 0.8},
    {"neighbors": [11, 51, 41, 14, 18], "id": 29, "value": 0.3},
    {"neighbors": [2, 3, 4, 46, 18], "id": 30, "value": 0.1},
    {"neighbors": [20, 26, 1, 50, 48], "id": 31, "value": 0.9},
    {"neighbors": [27, 36, 15, 49, 24], "id": 32, "value": 0.3},
    {"neighbors": [24, 27, 49, 34, 32], "id": 33, "value": 0.4},
    {"neighbors": [47, 9, 39, 40, 24], "id": 34, "value": 0.3},
    {"neighbors": [38, 51, 11, 21, 41], "id": 35, "value": 0.3},
    {"neighbors": [15, 32, 27, 49, 33], "id": 36, "value": 0.2},
    {"neighbors": [49, 10, 5, 47, 24], "id": 37, "value": 0.5},
    {"neighbors": [35, 21, 51, 11, 41], "id": 38, "value": 0.4},
    {"neighbors": [9, 34, 48, 1, 47], "id": 39, "value": 0.6},
    {"neighbors": [28, 47, 5, 9, 34], "id": 40, "value": 0.5},
    {"neighbors": [11, 14, 29, 21, 6], "id": 41, "value": 0.4},
    {"neighbors": [26, 19, 1, 9, 31], "id": 42, "value": 0.2},
    {"neighbors": [25, 12, 8, 22, 44], "id": 43, "value": 0.3},
    {"neighbors": [12, 50, 46, 16, 43], "id": 44, "value": 0.2},
    {"neighbors": [28, 13, 5, 40, 19], "id": 45, "value": 0.3},
    {"neighbors": [3, 12, 44, 2, 16], "id": 46, "value": 0.2},
    {"neighbors": [34, 40, 5, 49, 24], "id": 47, "value": 0.3},
    {"neighbors": [1, 20, 26, 9, 39], "id": 48, "value": 0.5},
    {"neighbors": [24, 37, 47, 5, 33], "id": 49, "value": 0.2},
    {"neighbors": [44, 22, 31, 42, 26], "id": 50, "value": 0.6},
    {"neighbors": [11, 29, 41, 14, 21], "id": 51, "value": 0.01},
    {"neighbors": [4, 18, 29, 51, 23], "id": 52, "value": 0.01}
  ]
--- a/release/python/0.4.0/crankshaft/test/fixtures/neighbors_markov.json
+++ b/release/python/0.4.0/crankshaft/test/fixtures/neighbors_markov.json
--- a/release/python/0.4.0/crankshaft/test/helper.py
+++ b/release/python/0.4.0/crankshaft/test/helper.py
@ -0,0 +1,13 @@
 import unittest
 from mock_plpy import MockPlPy
 plpy = MockPlPy()
 import sys
 sys.modules['plpy'] = plpy
 import os
 def fixture_file(name):
    dir = os.path.dirname(os.path.realpath(__file__))
    return os.path.join(dir, 'fixtures', name)
--- a/release/python/0.4.0/crankshaft/test/mock_plpy.py
+++ b/release/python/0.4.0/crankshaft/test/mock_plpy.py
@ -0,0 +1,52 @@
 import re
 class MockCursor:
    def __init__(self, data):
        self.cursor_pos = 0
        self.data = data
    def fetch(self, batch_size):
        batch = self.data[self.cursor_pos : self.cursor_pos + batch_size]
        self.cursor_pos += batch_size
        return batch
 class MockPlPy:
    def __init__(self):
        self._reset()
    def _reset(self):
        self.infos = []
        self.notices = []
        self.debugs = []
        self.logs = []
        self.warnings = []
        self.errors = []
        self.fatals = []
        self.executes = []
         self.results = []
         self.prepares = []
    def _define_result(self, query, result):
        pattern = re.compile(query, re.IGNORECASE | re.MULTILINE)
        self.results.append([pattern, result])
    def notice(self, msg):
        self.notices.append(msg)
    def debug(self, msg):
         self.debugs.append(msg)
    def info(self, msg):
        self.infos.append(msg)
    def cursor(self, query):
        data = self.execute(query)
        return MockCursor(data)
    def execute(self, query): # TODO: additional arguments
         for result in self.results:
             if result[0].match(query):
                 return result[1]
         return []
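The `fetch` contract of `MockCursor` (return the next slice, and an empty list once the data is exhausted) is what lets a batched loop such as the one in `predict_segment` terminate. As a rough, self-contained sketch: `ListCursor` and `batch_sizes` are hypothetical names introduced here for illustration, with `ListCursor` mirroring `MockCursor`'s behavior.

```python
class ListCursor(object):
    """Hypothetical stand-in with the same fetch() contract as MockCursor."""

    def __init__(self, data):
        self.pos = 0
        self.data = data

    def fetch(self, batch_size):
        # Slicing past the end of a list yields [], which signals exhaustion
        batch = self.data[self.pos:self.pos + batch_size]
        self.pos += batch_size
        return batch


def batch_sizes(cursor, batch_size):
    """Drain the cursor and return the size of each non-empty batch."""
    sizes = []
    while True:
        rows = cursor.fetch(batch_size)
        if not rows:
            break
        sizes.append(len(rows))
    return sizes


sizes = batch_sizes(ListCursor(list(range(7))), 3)
```

Seven rows fetched three at a time arrive as batches of 3, 3, and 1; the fourth fetch returns an empty list and ends the loop.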
--- a/release/python/0.4.0/crankshaft/test/test_cluster_kmeans.py
+++ b/release/python/0.4.0/crankshaft/test/test_cluster_kmeans.py
@ -0,0 +1,38 @@
 import unittest
 import numpy as np
 # from mock_plpy import MockPlPy
 # plpy = MockPlPy()
 #
 # import sys
 # sys.modules['plpy'] = plpy
 from helper import plpy, fixture_file
 import crankshaft.clustering as cc
 import crankshaft.pysal_utils as pu
 from crankshaft import random_seeds
 import json
 class KMeansTest(unittest.TestCase):
     """Testing class for k-means clustering functions"""
    def setUp(self):
        plpy._reset()
        self.cluster_data = json.loads(open(fixture_file('kmeans.json')).read())
        self.params = {"subquery": "select * from table",
                       "no_clusters": "10"
                       }
    def test_kmeans(self):
        data = self.cluster_data
         plpy._define_result('select', data)
         clusters = cc.kmeans('subquery', 2)
         labels = [a[1] for a in clusters]
         c1 = [a for a in clusters if a[1] == 0]
         c2 = [a for a in clusters if a[1] == 1]
         self.assertEqual(len(np.unique(labels)), 2)
         self.assertEqual(len(c1), 20)
         self.assertEqual(len(c2), 20)
--- a/release/python/0.4.0/crankshaft/test/test_clustering_moran.py
+++ b/release/python/0.4.0/crankshaft/test/test_clustering_moran.py
@ -0,0 +1,88 @@
 import unittest
 import numpy as np
 # from mock_plpy import MockPlPy
 # plpy = MockPlPy()
 #
 # import sys
 # sys.modules['plpy'] = plpy
 from helper import plpy, fixture_file
 import crankshaft.clustering as cc
 import crankshaft.pysal_utils as pu
 from crankshaft import random_seeds
 import json
 class MoranTest(unittest.TestCase):
    """Testing class for Moran's I functions"""
    def setUp(self):
        plpy._reset()
        self.params = {"id_col": "cartodb_id",
                       "attr1": "andy",
                       "attr2": "jay_z",
                       "subquery": "SELECT * FROM a_list",
                       "geom_col": "the_geom",
                       "num_ngbrs": 321}
        self.params_markov = {"id_col": "cartodb_id",
                              "time_cols": ["_2013_dec", "_2014_jan", "_2014_feb"],
                              "subquery": "SELECT * FROM a_list",
                              "geom_col": "the_geom",
                              "num_ngbrs": 321}
        self.neighbors_data = json.loads(open(fixture_file('neighbors.json')).read())
        self.moran_data = json.loads(open(fixture_file('moran.json')).read())
    def test_map_quads(self):
        """Test map_quads"""
        self.assertEqual(cc.map_quads(1), 'HH')
        self.assertEqual(cc.map_quads(2), 'LH')
        self.assertEqual(cc.map_quads(3), 'LL')
        self.assertEqual(cc.map_quads(4), 'HL')
        self.assertEqual(cc.map_quads(33), None)
        self.assertEqual(cc.map_quads('andy'), None)
    def test_quad_position(self):
        """Test quad_position"""
        quads = np.array([1, 2, 3, 4], int)
        ans = np.array(['HH', 'LH', 'LL', 'HL'])
        test_ans = cc.quad_position(quads)
        self.assertTrue((test_ans == ans).all())
    def test_moran_local(self):
        """Test Moran's I local"""
        data = [ { 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
        plpy._define_result('select', data)
        random_seeds.set_random_seeds(1234)
        result = cc.moran_local('subquery', 'value', 'knn', 5, 99, 'the_geom', 'cartodb_id')
        result = [(row[0], row[1]) for row in result]
        expected = self.moran_data
        for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
            self.assertAlmostEqual(res_val, exp_val)
            self.assertEqual(res_quad, exp_quad)
    def test_moran_local_rate(self):
        """Test Moran's I rate"""
        data = [ { 'id': d['id'], 'attr1': d['value'], 'attr2': 1, 'neighbors': d['neighbors'] } for d in self.neighbors_data]
        plpy._define_result('select', data)
        random_seeds.set_random_seeds(1234)
        result = cc.moran_local_rate('subquery', 'numerator', 'denominator', 'knn', 5, 99, 'the_geom', 'cartodb_id')
        self.assertIsNotNone(result)
        result = [(row[0], row[1]) for row in result]
        expected = self.moran_data
        for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
            self.assertAlmostEqual(res_val, exp_val)
    def test_moran(self):
        """Test Moran's I global"""
        data = [{ 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
        plpy._define_result('select', data)
        random_seeds.set_random_seeds(1235)
        result = cc.moran('table', 'value', 'knn', 5, 99, 'the_geom', 'cartodb_id')
        self.assertIsNotNone(result)
        result_moran = result[0][0]
        expected_moran = np.array([row[0] for row in self.moran_data]).mean()
        self.assertAlmostEqual(expected_moran, result_moran, delta=10e-2)
--- a/release/python/0.4.0/crankshaft/test/test_pysal_utils.py
+++ b/release/python/0.4.0/crankshaft/test/test_pysal_utils.py
@ -0,0 +1,142 @@
 import unittest
 import crankshaft.pysal_utils as pu
 from crankshaft import random_seeds
 class PysalUtilsTest(unittest.TestCase):
    """Testing class for utility functions related to PySAL integrations"""
    def setUp(self):
        self.params = {"id_col": "cartodb_id",
                       "attr1": "andy",
                       "attr2": "jay_z",
                       "subquery": "SELECT * FROM a_list",
                       "geom_col": "the_geom",
                       "num_ngbrs": 321}
        self.params_array = {"id_col": "cartodb_id",
                             "time_cols": ["_2013_dec", "_2014_jan", "_2014_feb"],
                             "subquery": "SELECT * FROM a_list",
                             "geom_col": "the_geom",
                             "num_ngbrs": 321}
    def test_query_attr_select(self):
        """Test query_attr_select"""
        ans = "i.\"andy\"::numeric As attr1, " \
              "i.\"jay_z\"::numeric As attr2, "
        ans_array = "i.\"_2013_dec\"::numeric As attr1, " \
                    "i.\"_2014_jan\"::numeric As attr2, " \
                    "i.\"_2014_feb\"::numeric As attr3, "
        self.assertEqual(pu.query_attr_select(self.params), ans)
        self.assertEqual(pu.query_attr_select(self.params_array), ans_array)
    def test_query_attr_where(self):
        """Test pu.query_attr_where"""
        ans = "idx_replace.\"andy\" IS NOT NULL AND " \
              "idx_replace.\"jay_z\" IS NOT NULL AND " \
              "idx_replace.\"jay_z\" <> 0"
        ans_array = "idx_replace.\"_2013_dec\" IS NOT NULL AND " \
                    "idx_replace.\"_2014_jan\" IS NOT NULL AND " \
                    "idx_replace.\"_2014_feb\" IS NOT NULL"
        self.assertEqual(pu.query_attr_where(self.params), ans)
        self.assertEqual(pu.query_attr_where(self.params_array), ans_array)
    def test_knn(self):
        """Test knn neighbors constructor"""
        ans = "SELECT i.\"cartodb_id\" As id, " \
                     "i.\"andy\"::numeric As attr1, " \
                     "i.\"jay_z\"::numeric As attr2, " \
                     "(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
                                   "FROM (SELECT * FROM a_list) As j " \
                                   "WHERE " \
                                    "i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
                                    "j.\"andy\" IS NOT NULL AND " \
                                    "j.\"jay_z\" IS NOT NULL AND " \
                                    "j.\"jay_z\" <> 0 " \
                                   "ORDER BY " \
                                    "j.\"the_geom\" <-> i.\"the_geom\" ASC " \
                      "LIMIT 321)) As neighbors " \
              "FROM (SELECT * FROM a_list) As i " \
              "WHERE i.\"andy\" IS NOT NULL AND " \
                    "i.\"jay_z\" IS NOT NULL AND " \
                    "i.\"jay_z\" <> 0 " \
              "ORDER BY i.\"cartodb_id\" ASC;"
        ans_array = "SELECT i.\"cartodb_id\" As id, " \
              "i.\"_2013_dec\"::numeric As attr1, " \
              "i.\"_2014_jan\"::numeric As attr2, " \
              "i.\"_2014_feb\"::numeric As attr3, " \
              "(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
                            "FROM (SELECT * FROM a_list) As j " \
                            "WHERE i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
                                  "j.\"_2013_dec\" IS NOT NULL AND " \
                                  "j.\"_2014_jan\" IS NOT NULL AND " \
                                  "j.\"_2014_feb\" IS NOT NULL " \
                            "ORDER BY j.\"the_geom\" <-> i.\"the_geom\" ASC " \
                            "LIMIT 321)) As neighbors " \
              "FROM (SELECT * FROM a_list) As i " \
              "WHERE i.\"_2013_dec\" IS NOT NULL AND " \
                    "i.\"_2014_jan\" IS NOT NULL AND " \
                    "i.\"_2014_feb\" IS NOT NULL "\
              "ORDER BY i.\"cartodb_id\" ASC;"
        self.assertEqual(pu.knn(self.params), ans)
        self.assertEqual(pu.knn(self.params_array), ans_array)
    def test_queen(self):
        """Test queen neighbors constructor"""
        ans = "SELECT i.\"cartodb_id\" As id, " \
                     "i.\"andy\"::numeric As attr1, " \
                     "i.\"jay_z\"::numeric As attr2, " \
                     "(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
                                   "FROM (SELECT * FROM a_list) As j " \
                                   "WHERE " \
                                   "i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
                                   "ST_Touches(i.\"the_geom\", " \
                                              "j.\"the_geom\") AND " \
                                   "j.\"andy\" IS NOT NULL AND " \
                                   "j.\"jay_z\" IS NOT NULL AND " \
                                   "j.\"jay_z\" <> 0)" \
                                  ") As neighbors " \
              "FROM (SELECT * FROM a_list) As i " \
              "WHERE i.\"andy\" IS NOT NULL AND " \
                    "i.\"jay_z\" IS NOT NULL AND " \
                    "i.\"jay_z\" <> 0 " \
              "ORDER BY i.\"cartodb_id\" ASC;"
        self.assertEqual(pu.queen(self.params), ans)
    def test_construct_neighbor_query(self):
        """Test construct_neighbor_query"""
        # Compare to raw knn query
        self.assertEqual(pu.construct_neighbor_query('knn', self.params),
                         pu.knn(self.params))
    def test_get_attributes(self):
        """Test get_attributes"""
        # TODO: add real assertions; the placeholder below always passes
        self.assertEqual(True, True)
    def test_get_weight(self):
        """Test get_weight"""
        self.assertEqual(True, True)
    def test_empty_zipped_array(self):
        """Test empty_zipped_array"""
        ans2 = [(None, None)]
        ans4 = [(None, None, None, None)]
        self.assertEqual(pu.empty_zipped_array(2), ans2)
        self.assertEqual(pu.empty_zipped_array(4), ans4)
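Review note: `test_empty_zipped_array` pins the helper's contract to a one-row list whose single tuple holds `n` `None` values. A minimal sketch consistent with those two assertions follows. The function name mirrors the test, but the body is an assumption for illustration and may differ from the actual `pysal_utils` code.

```python
def empty_zipped_array(num_nones):
    """Hypothetical sketch: a single all-None row, shaped like an empty result."""
    # One tuple with `num_nones` entries, wrapped in a list so callers
    # can iterate over it as if it were a list of result rows.
    return [tuple([None] * num_nones)]


two_columns = empty_zipped_array(2)
four_columns = empty_zipped_array(4)
```

Returning a row of `None` values rather than an empty list keeps the column count stable for callers that unpack each row.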
--- a/release/python/0.4.0/crankshaft/test/test_segmentation.py
+++ b/release/python/0.4.0/crankshaft/test/test_segmentation.py
@ -0,0 +1,64 @@
 import unittest
 import numpy as np
 from helper import plpy, fixture_file
 import crankshaft.segmentation as segmentation
 import json
 class SegmentationTest(unittest.TestCase):
    """Testing class for segmentation functions"""
    def setUp(self):
        plpy._reset()
    def generate_random_data(self, n_samples, random_state, row_type=False):
        x1 = random_state.uniform(size=n_samples)
        x2 = random_state.uniform(size=n_samples)
        x3 = random_state.randint(0, 4, size=n_samples)
        y = x1 + x2 * x2 + x3
        cartodb_id = range(len(x1))
        if row_type:
            return [{'features': vals} for vals in zip(x1, x2, x3)], y
        else:
            return [dict(zip(['x1', 'x2', 'x3', 'target', 'cartodb_id'],
                             [x1, x2, x3, y, cartodb_id]))]
    def test_replace_nan_with_mean(self):
        # TODO: add an assertion; this test currently only builds its input
        test_array = np.array([1.2, np.nan, 3.2, np.nan, np.nan])
    def test_create_and_predict_segment(self):
        n_samples = 1000
        random_state_train = np.random.RandomState(13)
        random_state_test = np.random.RandomState(134)
        training_data = self.generate_random_data(n_samples, random_state_train)
        test_data, test_y = self.generate_random_data(n_samples, random_state_test, row_type=True)
        ids = [{'cartodb_ids': range(len(test_data))}]
        rows = [{'x1': 0, 'x2': 0, 'x3': 0, 'y': 0, 'cartodb_id': 0}]
        plpy._define_result(r'select \* from  \(select \* from training\) a  limit 1', rows)
        plpy._define_result(r'.*from \(select \* from training\) as a', training_data)
        plpy._define_result(r'select array_agg\(cartodb\_id order by cartodb\_id\) as cartodb_ids from \(.*\) a', ids)
        plpy._define_result(r'.*select \* from test.*', test_data)
        model_parameters = {'n_estimators': 1200,
                            'max_depth': 3,
                            'subsample': 0.5,
                            'learning_rate': 0.01,
                            'min_samples_leaf': 1}
        result = segmentation.create_and_predict_segment(
                'select * from training',
                'target',
                'select * from test',
                model_parameters)
        prediction = [r[1] for r in result]
        rmse = np.sqrt(np.mean(np.square(np.array(prediction) - np.array(test_y))))
        self.assertEqual(len(result), len(test_data))
        self.assertTrue(result[0][2] < 0.01)
        self.assertTrue(rmse < 0.5 * np.mean(test_y))
--- a/release/python/0.4.0/crankshaft/test/test_space_time_dynamics.py
+++ b/release/python/0.4.0/crankshaft/test/test_space_time_dynamics.py
@ -0,0 +1,324 @@
 import unittest
 import numpy as np
 # from mock_plpy import MockPlPy
 # plpy = MockPlPy()
 #
 # import sys
 # sys.modules['plpy'] = plpy
 from helper import plpy, fixture_file
 import crankshaft.space_time_dynamics as std
 from crankshaft import random_seeds
 import json
 class SpaceTimeTests(unittest.TestCase):
    """Testing class for Markov Functions."""
    def setUp(self):
        plpy._reset()
        self.params = {"id_col": "cartodb_id",
                       "time_cols": ['dec_2013', 'jan_2014', 'feb_2014'],
                       "subquery": "SELECT * FROM a_list",
                       "geom_col": "the_geom",
                       "num_ngbrs": 321}
        self.neighbors_data = json.loads(open(fixture_file('neighbors_markov.json')).read())
        self.markov_data = json.loads(open(fixture_file('markov.json')).read())
        self.time_data = np.array([i * np.ones(10, dtype=float) for i in range(10)]).T
        self.transition_matrix = np.array([
                [[ 0.96341463, 0.0304878 , 0.00609756, 0.        , 0.        ],
                 [ 0.06040268, 0.83221477, 0.10738255, 0.        , 0.        ],
                 [ 0.        , 0.14      , 0.74      , 0.12      , 0.        ],
                 [ 0.        , 0.03571429, 0.32142857, 0.57142857, 0.07142857],
                 [ 0.        , 0.        , 0.        , 0.16666667, 0.83333333]],
                [[ 0.79831933, 0.16806723, 0.03361345, 0.        , 0.        ],
                 [ 0.0754717 , 0.88207547, 0.04245283, 0.        , 0.        ],
                 [ 0.00537634, 0.06989247, 0.8655914 , 0.05913978, 0.        ],
                 [ 0.        , 0.        , 0.06372549, 0.90196078, 0.03431373],
                 [ 0.        , 0.        , 0.        , 0.19444444, 0.80555556]],
                [[ 0.84693878, 0.15306122, 0.        , 0.        , 0.        ],
                 [ 0.08133971, 0.78947368, 0.1291866 , 0.        , 0.        ],
                 [ 0.00518135, 0.0984456 , 0.79274611, 0.0984456 , 0.00518135],
                 [ 0.        , 0.        , 0.09411765, 0.87058824, 0.03529412],
                 [ 0.        , 0.        , 0.        , 0.10204082, 0.89795918]],
                [[ 0.8852459 , 0.09836066, 0.        , 0.01639344, 0.        ],
                 [ 0.03875969, 0.81395349, 0.13953488, 0.        , 0.00775194],
                 [ 0.0049505 , 0.09405941, 0.77722772, 0.11881188, 0.0049505 ],
                 [ 0.        , 0.02339181, 0.12865497, 0.75438596, 0.09356725],
                 [ 0.        , 0.        , 0.        , 0.09661836, 0.90338164]],
                [[ 0.33333333, 0.66666667, 0.        , 0.        , 0.        ],
                 [ 0.0483871 , 0.77419355, 0.16129032, 0.01612903, 0.        ],
                 [ 0.01149425, 0.16091954, 0.74712644, 0.08045977, 0.        ],
                 [ 0.        , 0.01036269, 0.06217617, 0.89637306, 0.03108808],
                 [ 0.        , 0.        , 0.        , 0.02352941, 0.97647059]]]
                 )
    def test_spatial_markov(self):
        """Test Spatial Markov."""
        data = [ { 'id': d['id'],
                   'attr1': d['y1995'],
                   'attr2': d['y1996'],
                   'attr3': d['y1997'],
                   'attr4': d['y1998'],
                   'attr5': d['y1999'],
                   'attr6': d['y2000'],
                   'attr7': d['y2001'],
                   'attr8': d['y2002'],
                   'attr9': d['y2003'],
                   'attr10': d['y2004'],
                   'attr11': d['y2005'],
                   'attr12': d['y2006'],
                   'attr13': d['y2007'],
                   'attr14': d['y2008'],
                   'attr15': d['y2009'],
                   'neighbors': d['neighbors'] } for d in self.neighbors_data]
        plpy._define_result('select', data)
        random_seeds.set_random_seeds(1234)
        result = std.spatial_markov_trend('subquery', ['y1995', 'y1996', 'y1997', 'y1998', 'y1999', 'y2000', 'y2001', 'y2002', 'y2003', 'y2004', 'y2005', 'y2006', 'y2007', 'y2008', 'y2009'], 5, 'knn', 5, 0, 'the_geom', 'cartodb_id')
        self.assertTrue(result is not None)
        result = [(row[0], row[1], row[2], row[3], row[4]) for row in result]
        expected = self.markov_data
        for ([res_trend, res_up, res_down, res_vol, res_id],
             [exp_trend, exp_up, exp_down, exp_vol, exp_id]
             ) in zip(result, expected):
            self.assertAlmostEqual(res_trend, exp_trend)
    def test_get_time_data(self):
        """Test get_time_data"""
        data = [ { 'attr1': d['y1995'],
                   'attr2': d['y1996'],
                   'attr3': d['y1997'],
                   'attr4': d['y1998'],
                   'attr5': d['y1999'],
                   'attr6': d['y2000'],
                   'attr7': d['y2001'],
                   'attr8': d['y2002'],
                   'attr9': d['y2003'],
                   'attr10': d['y2004'],
                   'attr11': d['y2005'],
                   'attr12': d['y2006'],
                   'attr13': d['y2007'],
                   'attr14': d['y2008'],
                   'attr15': d['y2009'] } for d in self.neighbors_data]
        result = std.get_time_data(data, ['y1995', 'y1996', 'y1997', 'y1998', 'y1999', 'y2000', 'y2001', 'y2002', 'y2003', 'y2004', 'y2005', 'y2006', 'y2007', 'y2008', 'y2009'])
        ## expected was prepared from PySAL example:
        ### f = ps.open(ps.examples.get_path("usjoin.csv"))
        ### pci = np.array([f.by_col[str(y)] for y in range(1995, 2010)]).transpose()
        ### rpci = pci / (pci.mean(axis = 0))
        expected = np.array([[ 0.87654416, 0.863147, 0.85637567, 0.84811668, 0.8446154,  0.83271652
        ,  0.83786314, 0.85012593, 0.85509656, 0.86416612, 0.87119375, 0.86302631
        ,  0.86148267, 0.86252252, 0.86746356],
         [ 0.9188951,  0.91757931, 0.92333258, 0.92517289, 0.92552388, 0.90746978
        ,  0.89830489, 0.89431991, 0.88924794, 0.89815176, 0.91832091, 0.91706054
        ,  0.90139505, 0.87897455, 0.86216858],
         [ 0.82591007, 0.82548596, 0.81989793, 0.81503235, 0.81731522, 0.78964559
        ,  0.80584442, 0.8084998,  0.82258551, 0.82668196, 0.82373724, 0.81814804
        ,  0.83675961, 0.83574199, 0.84647177],
         [ 1.09088176, 1.08537689, 1.08456418, 1.08415404, 1.09898841, 1.14506948
        ,  1.12151133, 1.11160697, 1.10888621, 1.11399806, 1.12168029, 1.13164797
        ,  1.12958508, 1.11371818, 1.09936775],
         [ 1.10731446, 1.11373944, 1.13283638, 1.14472559, 1.15910025, 1.16898201
        ,  1.17212488, 1.14752303, 1.11843284, 1.11024964, 1.11943471, 1.11736468
        ,  1.10863242, 1.09642516, 1.07762337],
         [ 1.42269757, 1.42118434, 1.44273502, 1.43577571, 1.44400684, 1.44184737
        ,  1.44782832, 1.41978227, 1.39092208, 1.4059372,  1.40788646, 1.44052766
        ,  1.45241216, 1.43306098, 1.4174431 ],
         [ 1.13073885, 1.13110513, 1.11074708, 1.13364636, 1.13088149, 1.10888138
        ,  1.11856629, 1.13062931, 1.11944984, 1.12446239, 1.11671008, 1.10880034
        ,  1.08401709, 1.06959206, 1.07875225],
         [ 1.04706124, 1.04516831, 1.04253372, 1.03239987, 1.02072545, 0.99854316
        ,  0.9880258,  0.99669587, 0.99327676, 1.01400905, 1.03176742, 1.040511
        ,  1.01749645, 0.9936394,  0.98279746],
         [ 0.98996986, 1.00143564, 0.99491,  1.00188408, 1.00455845, 0.99127006
        ,  0.97925917, 0.9683482,  0.95335147, 0.93694787, 0.94308213, 0.92232874
        ,  0.91284091, 0.89689833, 0.88928858],
         [ 0.87418391, 0.86416601, 0.84425695, 0.8404494,  0.83903044, 0.8578708
        ,  0.86036185, 0.86107306, 0.8500772,  0.86981998, 0.86837929, 0.87204141
        ,  0.86633032, 0.84946077, 0.83287146],
         [ 1.14196118, 1.14660262, 1.14892712, 1.14909594, 1.14436624, 1.14450183
        ,  1.12349752, 1.12596664, 1.12213996, 1.1119989,  1.10257792, 1.10491258
        ,  1.11059842, 1.10509795, 1.10020097],
         [ 0.97282463, 0.96700147, 0.96252588, 0.9653878,  0.96057687, 0.95831051
        ,  0.94480909, 0.94804195, 0.95430286, 0.94103989, 0.92122519, 0.91010201
        ,  0.89280392, 0.89298243, 0.89165385],
         [ 0.94325468, 0.96436902, 0.96455242, 0.95243009, 0.94117647, 0.9480927
        ,  0.93539182, 0.95388718, 0.94597005, 0.96918424, 0.94781281, 0.93466815
        ,  0.94281559, 0.96520315, 0.96715441],
         [ 0.97478408, 0.98169225, 0.98712809, 0.98474769, 0.98559897, 0.98687073
        ,  0.99237486, 0.98209969, 0.9877653,  0.97399471, 0.96910087, 0.98416665
        ,  0.98423613, 0.99823861, 0.99545704],
         [ 0.85570269, 0.85575915, 0.85986132, 0.85693406, 0.8538012,  0.86191535
        ,  0.84981451, 0.85472102, 0.84564835, 0.83998883, 0.83478547, 0.82803648
        ,  0.8198736,  0.82265395, 0.8399404 ],
         [ 0.87022047, 0.85996258, 0.85961813, 0.85689572, 0.83947136, 0.82785597
        ,  0.86008789, 0.86776298, 0.86720209, 0.8676334,  0.89179317, 0.94202108
        ,  0.9422231,  0.93902708, 0.94479184],
         [ 0.90134907, 0.90407738, 0.90403991, 0.90201769, 0.90399238, 0.90906632
        ,  0.92693339, 0.93695966, 0.94242697, 0.94338265, 0.91981796, 0.91108804
        ,  0.90543476, 0.91737138, 0.94793657],
         [ 1.1977611,  1.18222564, 1.18439158, 1.18267865, 1.19286723, 1.20172869
        ,  1.21328691, 1.22624778, 1.22397075, 1.23857042, 1.24419893, 1.23929384
        ,  1.23418676, 1.23626739, 1.26754398],
         [ 1.24919678, 1.25754773, 1.26991161, 1.28020651, 1.30625667, 1.34790023
        ,  1.34399863, 1.32575181, 1.30795492, 1.30544841, 1.30303302, 1.32107766
        ,  1.32936244, 1.33001241, 1.33288462],
         [ 1.06768004, 1.03799276, 1.03637303, 1.02768449, 1.03296093, 1.05059016
        ,  1.03405057, 1.02747623, 1.03162734, 0.9961416,  0.97356208, 0.94241549
        ,  0.92754547, 0.92549227, 0.92138102],
         [ 1.09475614, 1.11526796, 1.11654299, 1.13103948, 1.13143264, 1.13889622
        ,  1.12442212, 1.13367018, 1.13982256, 1.14029944, 1.11979401, 1.10905389
        ,  1.10577769, 1.11166825, 1.09985155],
         [ 0.76530058, 0.76612841, 0.76542451, 0.76722683, 0.76014284, 0.74480073
        ,  0.76098396, 0.76156903, 0.76651952, 0.76533288, 0.78205934, 0.76842416
        ,  0.77487118, 0.77768683, 0.78801192],
         [ 0.98391336, 0.98075816, 0.98295341, 0.97386015, 0.96913803, 0.97370819
        ,  0.96419154, 0.97209861, 0.97441313, 0.96356162, 0.94745352, 0.93965462
        ,  0.93069645, 0.94020973, 0.94358232],
         [ 0.83561828, 0.82298088, 0.81738502, 0.81748588, 0.80904801, 0.80071489
        ,  0.83358256, 0.83451613, 0.85175032, 0.85954307, 0.86790024, 0.87170334
        ,  0.87863799, 0.87497981, 0.87888675],
         [ 0.98845573, 1.02092428, 0.99665283, 0.99141823, 0.99386619, 0.98733195
        ,  0.99644997, 0.99669587, 1.02559097, 1.01116651, 0.99988024, 0.97906749
        ,  0.99323123, 1.00204939, 0.99602148],
         [ 1.14930913, 1.15241949, 1.14300962, 1.14265542, 1.13984683, 1.08312397
        ,  1.05192626, 1.04230892, 1.05577278, 1.08569751, 1.12443486, 1.08891079
        ,  1.08603695, 1.05997314, 1.02160943],
         [ 1.11368269, 1.1057147,  1.11893431, 1.13778669, 1.1432272,  1.18257029
        ,  1.16226243, 1.16009196, 1.14467789, 1.14820235, 1.12386598, 1.12680236
        ,  1.12357937, 1.1159258,  1.12570828],
         [ 1.30379431, 1.30752186, 1.31206366, 1.31532267, 1.30625667, 1.31210239
        ,  1.29989156, 1.29203193, 1.27183516, 1.26830786, 1.2617743,  1.28656675
        ,  1.29734097, 1.29390205, 1.29345446],
         [ 0.83953719, 0.82701448, 0.82006005, 0.81188876, 0.80294864, 0.78772975
        ,  0.82848011, 0.8259679,  0.82435705, 0.83108634, 0.84373784, 0.83891093
        ,  0.84349247, 0.85637272, 0.86539395],
         [ 1.23450087, 1.2426022,  1.23537935, 1.23581293, 1.24522626, 1.2256767
        ,  1.21126648, 1.19377804, 1.18355337, 1.19674434, 1.21536573, 1.23653297
        ,  1.27962009, 1.27968392, 1.25907738],
         [ 0.9769662,  0.97400719, 0.98035944, 0.97581531, 0.95543282, 0.96480308
        ,  0.94686376, 0.93679073, 0.92540049, 0.92988835, 0.93442917, 0.92100464
        ,  0.91475304, 0.90249622, 0.9021363 ],
         [ 0.84986886, 0.8986851,  0.84295997, 0.87280534, 0.85659368, 0.88937573
        ,  0.894401, 0.90448993, 0.95495898, 0.92698333, 0.94745352, 0.92562488
        ,  0.96635366, 1.02520312, 1.0394296 ],
         [ 1.01922808, 1.00258203, 1.00974428, 1.00303417, 0.99765073, 1.00759019
        ,  0.99192968, 0.99747298, 0.99550759, 0.97583768, 0.9610168,  0.94779638
        ,  0.93759089, 0.93353431, 0.94121705],
         [ 0.86367411, 0.85558932, 0.85544346, 0.85103025, 0.84336613, 0.83434854
        ,  0.85813595, 0.84667961, 0.84374558, 0.85951183, 0.87194227, 0.89455097
        ,  0.88283929, 0.90349491, 0.90600675],
         [ 1.00947534, 1.00411055, 1.00698819, 0.99513687, 0.99291086, 1.00581626
        ,  0.98850522, 0.99291168, 0.98983209, 0.97511924, 0.96134615, 0.96382634
        ,  0.95011401, 0.9434686,  0.94637765],
         [ 1.05712571, 1.05459419, 1.05753012, 1.04880786, 1.05103857, 1.04800023
        ,  1.03024941, 1.04200483, 1.0402554,  1.03296979, 1.02191682, 1.02476275
        ,  1.02347523, 1.02517684, 1.04359571],
         [ 1.07084189, 1.06669497, 1.07937623, 1.07387988, 1.0794043,  1.0531801
        ,  1.07452771, 1.09383478, 1.1052447,  1.10322136, 1.09167939, 1.08772756
        ,  1.08859544, 1.09177338, 1.1096083 ],
         [ 0.86719222, 0.86628896, 0.86675156, 0.86425632, 0.86511809, 0.86287327
        ,  0.85169796, 0.85411285, 0.84886336, 0.84517414, 0.84843858, 0.84488343
        ,  0.83374329, 0.82812044, 0.82878599],
         [ 0.88389211, 0.92288667, 0.90282398, 0.91229186, 0.92023286, 0.92652175
        ,  0.94278865, 0.93682452, 0.98655146, 0.992237, 0.9798497,  0.93869677
        ,  0.96947771, 1.00362626, 0.98102351],
         [ 0.97082064, 0.95320233, 0.94534081, 0.94215593, 0.93967,  0.93092109
        ,  0.92662519, 0.93412152, 0.93501274, 0.92879506, 0.92110542, 0.91035556
        ,  0.90430364, 0.89994694, 0.90073864],
         [ 0.95861858, 0.95774543, 0.98254811, 0.98919472, 0.98684824, 0.98882205
        ,  0.97662234, 0.95601578, 0.94905385, 0.94934888, 0.97152609, 0.97163004
        ,  0.9700702,  0.97158948, 0.95884908],
         [ 0.83980439, 0.84726737, 0.85747,  0.85467221, 0.8556751,  0.84818516
        ,  0.85265681, 0.84502402, 0.82645665, 0.81743586, 0.83550406, 0.83338919
        ,  0.83511679, 0.82136617, 0.80921874],
         [ 0.95118156, 0.9466212,  0.94688098, 0.9508583,  0.9512441,  0.95440787
        ,  0.96364363, 0.96804412, 0.97136214, 0.97583768, 0.95571724, 0.96895368
        ,  0.97001634, 0.97082733, 0.98782366],
         [ 1.08910044, 1.08248968, 1.08492895, 1.08656923, 1.09454249, 1.10558188
        ,  1.1214086,  1.12292577, 1.13021031, 1.13342735, 1.14686068, 1.14502975
        ,  1.14474747, 1.14084037, 1.16142926],
         [ 1.06336033, 1.07365823, 1.08691496, 1.09764846, 1.11669863, 1.11856702
        ,  1.09764283, 1.08815849, 1.08044313, 1.09278827, 1.07003204, 1.08398066
        ,  1.09831768, 1.09298232, 1.09176125],
         [ 0.79772065, 0.78829196, 0.78581151, 0.77615922, 0.77035744, 0.77751194
        ,  0.79902974, 0.81437881, 0.80788828, 0.79603865, 0.78966436, 0.79949807
        ,  0.80172182, 0.82168155, 0.85587911],
         [ 1.0052447,  1.00007696, 1.00475899, 1.00613942, 1.00639561, 1.00162979
        ,  0.99860739, 1.00814981, 1.00574316, 0.99030032, 0.97682565, 0.97292596
        ,  0.96519561, 0.96173403, 0.95890284],
         [ 0.95808419, 0.9382568,  0.9654441,  0.95561201, 0.96987289, 0.96608031
        ,  0.99727185, 1.00781194, 1.03484236, 1.05333619, 1.0983263,  1.1704974
        ,  1.17025154, 1.18730553, 1.14242645]])
        self.assertTrue(np.allclose(result, expected))
        self.assertTrue(type(result) == type(expected))
        self.assertTrue(result.shape == expected.shape)
    def test_rebin_data(self):
        """Test rebin_data"""
        ## sample in double the time (even case since 10 % 2 = 0):
        ##   (0+1)/2, (2+3)/2, (4+5)/2, (6+7)/2, (8+9)/2
        ## = 0.5,     2.5,     4.5,     6.5,     8.5
        ans_even = np.array([(i + 0.5) * np.ones(10, dtype=float)
                             for i in range(0, 10, 2)]).T
        self.assertTrue(np.array_equal(std.rebin_data(self.time_data, 2), ans_even))
        ## sample in triple the time (uneven since 10 % 3 = 1):
        ##   (0+1+2)/3, (3+4+5)/3, (6+7+8)/3, (9)/1
        ## = 1,         4,         7,         9
        ans_odd  = np.array([i * np.ones(10, dtype=float)
                             for i in (1, 4, 7, 9)]).T
        self.assertTrue(np.array_equal(std.rebin_data(self.time_data, 3), ans_odd))
    def test_get_prob_dist(self):
        """Test get_prob_dist"""
        lag_indices = np.array([1, 2, 3, 4])
        unit_indices = np.array([1, 3, 2, 4])
        answer = np.array([
            [ 0.0754717 , 0.88207547, 0.04245283, 0.        , 0.        ],
            [ 0.        , 0.        , 0.09411765, 0.87058824, 0.03529412],
            [ 0.0049505 , 0.09405941, 0.77722772, 0.11881188, 0.0049505 ],
            [ 0.        , 0.        , 0.        , 0.02352941, 0.97647059]
        ])
        result = std.get_prob_dist(self.transition_matrix, lag_indices, unit_indices)
        self.assertTrue(np.array_equal(result, answer))
    def test_get_prob_stats(self):
        """Test get_prob_stats"""
        probs = np.array([
            [ 0.0754717 , 0.88207547, 0.04245283, 0.        , 0.        ],
            [ 0.        , 0.        , 0.09411765, 0.87058824, 0.03529412],
            [ 0.0049505 , 0.09405941, 0.77722772, 0.11881188, 0.0049505 ],
            [ 0.        , 0.        , 0.        , 0.02352941, 0.97647059]
        ])
        unit_indices = np.array([1, 3, 2, 4])
        answer_up = np.array([0.04245283, 0.03529412, 0.12376238, 0.])
        answer_down = np.array([0.0754717, 0.09411765, 0.0990099, 0.02352941])
        answer_trend = np.array([-0.03301887 / 0.88207547, -0.05882353 / 0.87058824,  0.02475248 / 0.77722772, -0.02352941 / 0.97647059])
        answer_volatility = np.array([ 0.34221495,  0.33705421,  0.29226542,  0.38834223])
        result = std.get_prob_stats(probs, unit_indices)
        result_up = result[0]
        result_down = result[1]
        result_trend = result[2]
        result_volatility = result[3]
        self.assertTrue(np.allclose(result_up, answer_up))
        self.assertTrue(np.allclose(result_down, answer_down))
        self.assertTrue(np.allclose(result_trend, answer_trend))
        self.assertTrue(np.allclose(result_volatility, answer_volatility))
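Review note: the expected arrays in `test_get_prob_stats` are consistent with a simple reading of each probability row. On that reading, "up" is the mass in classes above the unit's current class, "down" is the mass below it, "trend" is their difference scaled by the probability of staying, and "volatility" is the population standard deviation of the row. The sketch below is a hypothetical reconstruction inferred only from those test values, in plain Python. The name `prob_stats_sketch` and the 0-based reading of `unit_indices` are assumptions, and the actual `std.get_prob_stats` may be implemented differently.

```python
import math


def prob_stats_sketch(probs, unit_indices):
    """Hypothetical reconstruction of (up, down, trend, volatility) per row."""
    ups, downs, trends, volatilities = [], [], [], []
    for row, idx in zip(probs, unit_indices):
        up = sum(row[idx + 1:])   # probability of moving to a higher class
        down = sum(row[:idx])     # probability of moving to a lower class
        stay = row[idx]           # probability of remaining in class idx
        mean = sum(row) / len(row)
        # population standard deviation of the row's probabilities
        volatility = math.sqrt(sum((p - mean) ** 2 for p in row) / len(row))
        ups.append(up)
        downs.append(down)
        trends.append((up - down) / stay)
        volatilities.append(volatility)
    return ups, downs, trends, volatilities


# Inputs copied from test_get_prob_stats
probs = [
    [0.0754717, 0.88207547, 0.04245283, 0.0, 0.0],
    [0.0, 0.0, 0.09411765, 0.87058824, 0.03529412],
    [0.0049505, 0.09405941, 0.77722772, 0.11881188, 0.0049505],
    [0.0, 0.0, 0.0, 0.02352941, 0.97647059],
]
unit_indices = [1, 3, 2, 4]
up, down, trend, volatility = prob_stats_sketch(probs, unit_indices)
```

A positive trend means upward transitions outweigh downward ones for that unit, and dividing by the stay probability makes units that rarely change class less sensitive to small imbalances.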
--- a/release/python/0.4.1/crankshaft/crankshaft/__init__.py
+++ b/release/python/0.4.1/crankshaft/crankshaft/__init__.py
@ -0,0 +1,5 @@
 """Import all modules"""
 import crankshaft.random_seeds
 import crankshaft.clustering
 import crankshaft.space_time_dynamics
 import crankshaft.segmentation
--- a/release/python/0.4.1/crankshaft/crankshaft/clustering/__init__.py
+++ b/release/python/0.4.1/crankshaft/crankshaft/clustering/__init__.py
@ -0,0 +1,3 @
 """Import all functions for clustering"""
 from moran import *
 from kmeans import *
--- a/release/python/0.4.1/crankshaft/crankshaft/clustering/kmeans.py
+++ b/release/python/0.4.1/crankshaft/crankshaft/clustering/kmeans.py
@ -0,0 +1,18 @@
 from sklearn.cluster import KMeans
 import plpy
 def kmeans(query, no_clusters, no_init=20):
    data = plpy.execute('''select array_agg(cartodb_id order by cartodb_id) as ids,
        array_agg(ST_X(the_geom) order by cartodb_id) xs,
        array_agg(ST_Y(the_geom) order by cartodb_id) ys from ({query}) a
        where the_geom is not null
    '''.format(query=query))
    xs  = data[0]['xs']
    ys  = data[0]['ys']
    ids = data[0]['ids']
    km = KMeans(n_clusters=no_clusters, n_init=no_init)
    labels = km.fit_predict(zip(xs,ys))
    return zip(ids,labels)
--- a/release/python/0.4.1/crankshaft/crankshaft/clustering/moran.py
+++ b/release/python/0.4.1/crankshaft/crankshaft/clustering/moran.py
@ -0,0 +1,243 @@
 """
 Moran's I geostatistics (global clustering & outliers presence)
 """
 # TODO: Fill in local neighbors which have null/NoneType values with the
 #       average of their neighborhood
 import pysal as ps
 import plpy
 from collections import OrderedDict
 # crankshaft module
 import crankshaft.pysal_utils as pu
 # High level interface ---------------------------------------
 def moran(subquery, attr_name,
          w_type, num_ngbrs, permutations, geom_col, id_col):
    """
    Moran's I (global)
    Implementation building neighbors with a PostGIS database and Moran's I
     core clusters with PySAL.
    Andy Eschbacher
    """
    qvals = OrderedDict([("id_col", id_col),
                         ("attr1", attr_name),
                         ("geom_col", geom_col),
                         ("subquery", subquery),
                         ("num_ngbrs", num_ngbrs)])
    query = pu.construct_neighbor_query(w_type, qvals)
    try:
        result = plpy.execute(query)
        # if there are no neighbors, exit
        if len(result) == 0:
            return pu.empty_zipped_array(2)
    except plpy.SPIError, e:
        plpy.error('Analysis failed: %s' % e)
        return pu.empty_zipped_array(2)
    ## collect attributes
    attr_vals = pu.get_attributes(result)
    ## calculate weights
    weight = pu.get_weight(result, w_type, num_ngbrs)
    ## calculate moran global
    moran_global = ps.esda.moran.Moran(attr_vals, weight,
                                       permutations=permutations)
    return zip([moran_global.I], [moran_global.EI])
 def moran_local(subquery, attr,
                w_type, num_ngbrs, permutations, geom_col, id_col):
    """
    Moran's I implementation for PL/Python
    Andy Eschbacher
    """
    # geometries with null attribute values are ignored, so the
    # resulting neighbors may be farther away than the true nearest ones
    qvals = OrderedDict([("id_col", id_col),
                         ("attr1", attr),
                         ("geom_col", geom_col),
                         ("subquery", subquery),
                         ("num_ngbrs", num_ngbrs)])
    query = pu.construct_neighbor_query(w_type, qvals)
    try:
        result = plpy.execute(query)
        # if there are no neighbors, exit
        if len(result) == 0:
            return pu.empty_zipped_array(5)
    except plpy.SPIError, e:
        plpy.error('Analysis failed: %s' % e)
        return pu.empty_zipped_array(5)
    attr_vals = pu.get_attributes(result)
    weight = pu.get_weight(result, w_type, num_ngbrs)
    # calculate LISA values
    lisa = ps.esda.moran.Moran_Local(attr_vals, weight,
                                     permutations=permutations)
    # find quadrants for each geometry
    quads = quad_position(lisa.q)
    return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
 def moran_rate(subquery, numerator, denominator,
               w_type, num_ngbrs, permutations, geom_col, id_col):
    """
    Moran's I Rate (global)
    Andy Eschbacher
    """
    qvals = OrderedDict([("id_col", id_col),
                         ("attr1", numerator),
                         ("attr2", denominator),
                         ("geom_col", geom_col),
                         ("subquery", subquery),
                         ("num_ngbrs", num_ngbrs)])
    query = pu.construct_neighbor_query(w_type, qvals)
    try:
        result = plpy.execute(query)
        # if there are no neighbors, exit
        if len(result) == 0:
            return pu.empty_zipped_array(2)
    except plpy.SPIError, e:
        plpy.error('Analysis failed: %s' % e)
        return pu.empty_zipped_array(2)
    ## collect attributes
    numer = pu.get_attributes(result, 1)
    denom = pu.get_attributes(result, 2)
    weight = pu.get_weight(result, w_type, num_ngbrs)
    ## calculate moran global rate
    lisa_rate = ps.esda.moran.Moran_Rate(numer, denom, weight,
                                         permutations=permutations)
    return zip([lisa_rate.I], [lisa_rate.EI])
 def moran_local_rate(subquery, numerator, denominator,
                     w_type, num_ngbrs, permutations, geom_col, id_col):
    """
        Moran's I Local Rate
        Andy Eschbacher
    """
    # geometries with null values are ignored, so the
    # resulting neighbors may be farther away than the true nearest ones
    qvals = OrderedDict([("id_col", id_col),
                         # keys are sorted to assign attr1/attr2 aliases, so
                         # use "attr1"/"attr2" to keep numerator first
                         ("attr1", numerator),
                         ("attr2", denominator),
                         ("geom_col", geom_col),
                         ("subquery", subquery),
                         ("num_ngbrs", num_ngbrs)])
    query = pu.construct_neighbor_query(w_type, qvals)
    try:
        result = plpy.execute(query)
        # if there are no neighbors, exit
        if len(result) == 0:
            return pu.empty_zipped_array(5)
    except plpy.SPIError, e:
        plpy.error('Analysis failed: %s' % e)
        return pu.empty_zipped_array(5)
    ## collect attributes
    numer = pu.get_attributes(result, 1)
    denom = pu.get_attributes(result, 2)
    weight = pu.get_weight(result, w_type, num_ngbrs)
    # calculate LISA values
    lisa = ps.esda.moran.Moran_Local_Rate(numer, denom, weight,
                                          permutations=permutations)
    # find quadrants for each geometry
    quads = quad_position(lisa.q)
    return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
 def moran_local_bv(subquery, attr1, attr2,
                   permutations, geom_col, id_col, w_type, num_ngbrs):
    """
        Moran's I (local) Bivariate (untested)
    """
    qvals = OrderedDict([("id_col", id_col),
                         ("attr1", attr1),
                         ("attr2", attr2),
                         ("geom_col", geom_col),
                         ("subquery", subquery),
                         ("num_ngbrs", num_ngbrs)])
    query = pu.construct_neighbor_query(w_type, qvals)
    try:
        result = plpy.execute(query)
        # if there are no neighbors, exit
        if len(result) == 0:
            return pu.empty_zipped_array(4)
    except plpy.SPIError:
        plpy.error("Error: areas of interest query failed, " \
                   "check input parameters")
        return pu.empty_zipped_array(4)
    ## collect attributes
    attr1_vals = pu.get_attributes(result, 1)
    attr2_vals = pu.get_attributes(result, 2)
    # create weights
    weight = pu.get_weight(result, w_type, num_ngbrs)
    # calculate LISA values
    lisa = ps.esda.moran.Moran_Local_BV(attr1_vals, attr2_vals, weight,
                                        permutations=permutations)
    # find clustering of significance
    lisa_sig = quad_position(lisa.q)
    return zip(lisa.Is, lisa_sig, lisa.p_sim, weight.id_order)
 # Low level functions ----------------------------------------
 def map_quads(coord):
    """
        Map a quadrant number to Moran's I designation
        HH=1, LH=2, LL=3, HL=4
        Input:
        @param coord (int): quadrant of a specific measurement
        Output:
            classification (one of 'HH', 'LH', 'LL', or 'HL')
    """
    if coord == 1:
        return 'HH'
    elif coord == 2:
        return 'LH'
    elif coord == 3:
        return 'LL'
    elif coord == 4:
        return 'HL'
    else:
        return None
 def quad_position(quads):
    """
        Produce Moran's I classifications from quadrant numbers
        Input:
        @param quads ndarray: an array of quads classified by
          1-4 (PySAL default)
        Output:
        @param list: an array of quads classified by 'HH', 'LL', etc.
    """
    return [map_quads(q) for q in quads]
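The quadrant mapping above can also be expressed as a dictionary lookup. This is a minimal stand-alone sketch, not the code in this commit: `QUAD_LABELS` and `label_quads` are hypothetical names, and the input is assumed to be any iterable of PySAL quadrant codes (1 to 4).

```python
# Hypothetical sketch: QUAD_LABELS and label_quads are illustrative names
# and are not part of crankshaft.
QUAD_LABELS = {1: 'HH', 2: 'LH', 3: 'LL', 4: 'HL'}


def label_quads(quads):
    """Map quadrant codes (1-4) to labels; any other code maps to None."""
    return [QUAD_LABELS.get(int(q)) for q in quads]


if __name__ == '__main__':
    print(label_quads([1, 3, 4, 2, 9]))
```

The behavior matches `map_quads` for the four known codes and returns `None` for anything else, so it would be a drop-in alternative only under that assumption.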
--- a/release/python/0.4.1/crankshaft/crankshaft/pysal_utils/__init__.py
+++ b/release/python/0.4.1/crankshaft/crankshaft/pysal_utils/__init__.py
@ -0,0 +1,2 @@
 """Import all functions for pysal_utils"""
 from crankshaft.pysal_utils.pysal_utils import *
--- a/release/python/0.4.1/crankshaft/crankshaft/pysal_utils/pysal_utils.py
+++ b/release/python/0.4.1/crankshaft/crankshaft/pysal_utils/pysal_utils.py
@ -0,0 +1,188 @@
 """
    Utilities module for generic PySAL functionality, mainly centered on
      translating queries into numpy arrays or PySAL weights objects
 """
 import numpy as np
 import pysal as ps
 def construct_neighbor_query(w_type, query_vals):
    """Return query (a string) used for finding neighbors
        @param w_type text: type of neighbors to calculate ('knn' or 'queen')
        @param query_vals dict: values used to construct the query
    """
    if w_type.lower() == 'knn':
        return knn(query_vals)
    else:
        return queen(query_vals)
 ## Build weight object
 def get_weight(query_res, w_type='knn', num_ngbrs=5):
    """
        Construct PySAL weight from return value of query
        @param query_res dict-like: query results with attributes and neighbors
    """
    # if w_type.lower() == 'knn':
    #     row_normed_weights = [1.0 / float(num_ngbrs)] * num_ngbrs
    #     weights = {x['id']: row_normed_weights for x in query_res}
    # else:
    #     weights = {x['id']: [1.0 / len(x['neighbors'])] * len(x['neighbors'])
    #                         if len(x['neighbors']) > 0
    #                         else [] for x in query_res}
    neighbors = {x['id']: x['neighbors'] for x in query_res}
    print 'len of neighbors: %d' % len(neighbors)
    built_weight = ps.W(neighbors)
    built_weight.transform = 'r'
    return built_weight
 def query_attr_select(params):
    """
        Create portion of SELECT statement for attributes involved in query.
        @param params: dict of information used in query (column names,
                       table name, etc.)
    """
    attr_string = ""
    template = "i.\"%(col)s\"::numeric As attr%(alias_num)s, "
    if 'time_cols' in params:
        ## if markov analysis
        attrs = params['time_cols']
        for idx, val in enumerate(attrs):
            attr_string += template % {"col": val, "alias_num": idx + 1}
    else:
        ## if moran's analysis
        attrs = [k for k in params
                 if k not in ('id_col', 'geom_col', 'subquery', 'num_ngbrs')]
        for idx, val in enumerate(sorted(attrs)):
            attr_string += template % {"col": params[val], "alias_num": idx + 1}
    return attr_string
 def query_attr_where(params):
    """
      Construct where conditions when building neighbors query
        Create portion of WHERE clauses for weeding out NULL-valued geometries
        Input: dict of params:
            {'subquery': ...,
             'numerator': 'data1',
             'denominator': 'data2',
             '': ...}
        Output: 'idx_replace."data1" IS NOT NULL AND idx_replace."data2" IS NOT NULL'
        Input:
        {'subquery': ...,
         'time_cols': ['time1', 'time2', 'time3'],
         'etc': ...}
        Output: 'idx_replace."time1" IS NOT NULL AND idx_replace."time2" IS NOT
          NULL AND idx_replace."time3" IS NOT NULL'
    """
    attr_string = []
    template = "idx_replace.\"%s\" IS NOT NULL"
    if 'time_cols' in params:
        ## markov where clauses
        attrs = params['time_cols']
        # add values to template
        for attr in attrs:
            attr_string.append(template % attr)
    else:
        ## moran where clauses
        # get keys
        attrs = sorted([k for k in params
                        if k not in ('id_col', 'geom_col', 'subquery', 'num_ngbrs')])
        # add values to template
        for attr in attrs:
            attr_string.append(template % params[attr])
        if len(attrs) == 2:
            attr_string.append("idx_replace.\"%s\" <> 0" % params[attrs[1]])
    out = " AND ".join(attr_string)
    return out
 def knn(params):
    """SQL query for k-nearest neighbors.
        @param params dict: values to fill template
    """
    attr_select = query_attr_select(params)
    attr_where = query_attr_where(params)
    replacements = {"attr_select": attr_select,
                    "attr_where_i": attr_where.replace("idx_replace", "i"),
                    "attr_where_j": attr_where.replace("idx_replace", "j")}
    query = "SELECT " \
                "i.\"{id_col}\" As id, " \
                "%(attr_select)s" \
                "(SELECT ARRAY(SELECT j.\"{id_col}\" " \
                              "FROM ({subquery}) As j " \
                              "WHERE " \
                                "i.\"{id_col}\" <> j.\"{id_col}\" AND " \
                                "%(attr_where_j)s " \
                              "ORDER BY " \
                                "j.\"{geom_col}\" <-> i.\"{geom_col}\" ASC " \
                              "LIMIT {num_ngbrs})" \
                ") As neighbors " \
            "FROM ({subquery}) As i " \
            "WHERE " \
                "%(attr_where_i)s " \
            "ORDER BY i.\"{id_col}\" ASC;" % replacements
    return query.format(**params)
 ## SQL query for finding queens neighbors (all contiguous polygons)
 def queen(params):
    """SQL query for queen neighbors.
        @param params dict: information to fill query
    """
    attr_select = query_attr_select(params)
    attr_where = query_attr_where(params)
    replacements = {"attr_select": attr_select,
                    "attr_where_i": attr_where.replace("idx_replace", "i"),
                    "attr_where_j": attr_where.replace("idx_replace", "j")}
    query = "SELECT " \
                "i.\"{id_col}\" As id, " \
                "%(attr_select)s" \
                "(SELECT ARRAY(SELECT j.\"{id_col}\" " \
                 "FROM ({subquery}) As j " \
                 "WHERE i.\"{id_col}\" <> j.\"{id_col}\" AND " \
                       "ST_Touches(i.\"{geom_col}\", j.\"{geom_col}\") AND " \
                       "%(attr_where_j)s)" \
                ") As neighbors " \
            "FROM ({subquery}) As i " \
            "WHERE " \
                "%(attr_where_i)s " \
            "ORDER BY i.\"{id_col}\" ASC;" % replacements
    return query.format(**params)
 ## to add more weight methods open a ticket or pull request
 def get_attributes(query_res, attr_num=1):
    """
        @param query_res: query results with attributes and neighbors
        @param attr_num: attribute number (1, 2, ...)
    """
    return np.array([x['attr' + str(attr_num)] for x in query_res], dtype=np.float)
 def empty_zipped_array(num_nones):
    """
        prepare return values for cases of empty weights objects (no neighbors)
        Input:
        @param num_nones int: number of columns (e.g., 4)
        Output:
        [(None, None, None, None)]
    """
    return [tuple([None] * num_nones)]
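To illustrate how the Moran branch of `query_attr_where` turns a parameter dict into a WHERE fragment, here is a minimal stand-alone sketch. The names `RESERVED` and `where_clause`, and the column names `cyclists` and `population`, are assumptions made for the example; the real function emits an `idx_replace` placeholder that `knn` and `queen` later swap for the `i` or `j` alias, whereas this sketch takes the alias directly.

```python
# Hypothetical sketch of the Moran branch of query_attr_where.
# RESERVED and where_clause are illustrative names, not crankshaft API.
RESERVED = ('id_col', 'geom_col', 'subquery', 'num_ngbrs')


def where_clause(params, alias):
    """Build NOT NULL conditions for every attribute key, in sorted order."""
    attrs = sorted(k for k in params if k not in RESERVED)
    parts = ['%s."%s" IS NOT NULL' % (alias, params[k]) for k in attrs]
    if len(attrs) == 2:
        # with two attributes the second is treated as a denominator,
        # so rows where it equals zero are excluded as well
        parts.append('%s."%s" <> 0' % (alias, params[attrs[1]]))
    return ' AND '.join(parts)


if __name__ == '__main__':
    params = {'id_col': 'cartodb_id',
              'attr1': 'cyclists',
              'attr2': 'population',
              'geom_col': 'the_geom',
              'subquery': 'SELECT * FROM commute_data',
              'num_ngbrs': 5}
    print(where_clause(params, 'i'))
```

Because the keys are sorted before use, the key names decide which column ends up as the first and second attribute; `attr1` and `attr2` sort in the intended order.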
--- a/release/python/0.4.1/crankshaft/crankshaft/random_seeds.py
+++ b/release/python/0.4.1/crankshaft/crankshaft/random_seeds.py
@ -0,0 +1,11 @@
 """Random seed generator used for non-deterministic functions in crankshaft"""
 import random
 import numpy
 def set_random_seeds(value):
    """
    Set the seeds of the RNGs (Random Number Generators)
    used internally.
    """
    random.seed(value)
    numpy.random.seed(value)
--- a/release/python/0.4.1/crankshaft/crankshaft/segmentation/__init__.py
+++ b/release/python/0.4.1/crankshaft/crankshaft/segmentation/__init__.py
@ -0,0 +1 @@
 from segmentation import * 
--- a/release/python/0.4.1/crankshaft/crankshaft/segmentation/segmentation.py
+++ b/release/python/0.4.1/crankshaft/crankshaft/segmentation/segmentation.py
@ -0,0 +1,176 @@
 """
 Segmentation creation and prediction
 """
 import sklearn
 import numpy as np
 import plpy
 from sklearn.ensemble import GradientBoostingRegressor
 from sklearn import metrics
 from sklearn.cross_validation import train_test_split
 # Lower level functions
 #----------------------
 def replace_nan_with_mean(array):
    """
        Input:
            @param array: an array of floats which may have null-valued entries
        Output:
            array with nans filled in with the mean of the dataset
    """
    # returns an array of rows and column indices
    indices = np.where(np.isnan(array))
    # iterate through entries which have nan values
    for row, col in zip(*indices):
        array[row, col] = np.mean(array[~np.isnan(array[:, col]), col])
    return array
 def get_data(variable, feature_columns, query):
    """
        Fetch data from the database, clean, and package into
          numpy arrays
        Input:
            @param variable: name of the target variable
            @param feature_columns: list of column names
            @param query: subquery that data is pulled from for the packaging
        Output:
            prepared data, packaged into NumPy arrays
    """
    columns = ','.join(['array_agg("{col}") As "{col}"'.format(col=col) for col in feature_columns])
    try:
        data = plpy.execute('''SELECT array_agg("{variable}") As target, {columns} FROM ({query}) As a'''.format(
            variable=variable,
            columns=columns,
            query=query))
    except Exception, e:
        plpy.error('Failed to access data to build segmentation model: %s' % e)
    # extract target data from plpy object
    target = np.array(data[0]['target'])
    # put n feature data arrays into an n x m array of arrays
    features = np.column_stack([np.array(data[0][col], dtype=float) for col in feature_columns])
    return replace_nan_with_mean(target), replace_nan_with_mean(features)
 # High level interface
 # --------------------
 def create_and_predict_segment_agg(target, features, target_features, target_ids, model_parameters):
    """
    Version of create_and_predict_segment that works on arrays that come straight from the SQL calling
    the function.
        Input:
            @param target: The 1D array of length NSamples containing the target variable we want the model to predict
            @param features: The 2D array of size NSamples * NFeatures that forms the input to the model
            @param target_features: The 2D array of features for the rows to predict on
            @param target_ids: A 1D array of target_ids that will be used to associate the results of the prediction with the rows which they come from
            @param model_parameters: A dictionary containing parameters for the model.
    """
    clean_target = replace_nan_with_mean(target)
    clean_features = replace_nan_with_mean(features)
    target_features = replace_nan_with_mean(target_features)
    model, accuracy = train_model(clean_target, clean_features, model_parameters, 0.2)
    prediction = model.predict(target_features)
    accuracy_array = [accuracy]*prediction.shape[0]
    return zip(target_ids, prediction, np.full(prediction.shape, accuracy_array))
 def create_and_predict_segment(query, variable, target_query, model_params):
    """
    generate a segment with machine learning
    Stuart Lynn
    """
    ## fetch column names
    try:
        columns = plpy.execute('SELECT * FROM ({query}) As a LIMIT 1  '.format(query=query))[0].keys()
    except Exception, e:
        plpy.error('Failed to build segmentation model: %s' % e)
    ## extract column names to be used in building the segmentation model
    feature_columns = set(columns) - set([variable, 'cartodb_id', 'the_geom', 'the_geom_webmercator'])
    ## get data from database
    target, features = get_data(variable, feature_columns, query)
    model, accuracy = train_model(target, features, model_params, 0.2)
    cartodb_ids, result = predict_segment(model, feature_columns, target_query)
    accuracy_array = [accuracy]*result.shape[0]
    return zip(cartodb_ids, result, accuracy_array)
 def train_model(target, features, model_params, test_split):
    """
        Train the Gradient Boosting model on the provided data and calculate the accuracy of the model
        Input:
            @param target: 1D Array of the variable that the model is to be trained to predict
            @param features: 2D Array NSamples * NFeatures to use in training the model
            @param model_params: A dictionary of model parameters, the full specification can be found on the
                scikit learn page for [GradientBoostingRegressor](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html)
            @param test_split: The fraction of the data to be withheld for testing the model / calculating the accuracy
    """
    features_train, features_test, target_train, target_test = train_test_split(features, target, test_size=test_split)
    model = GradientBoostingRegressor(**model_params)
    model.fit(features_train, target_train)
    # score on the withheld split only, as described in the docstring
    accuracy = calculate_model_accuracy(model, features_test, target_test)
    return model, accuracy
 def calculate_model_accuracy(model, features, target):
    """
        Calculate the mean squared error of the model prediction
        Input:
            @param model: model trained from input features
            @param features: features to make a prediction from
            @param target: target to compare prediction to
        Output:
            mean squared error of the model prediction compared to the target
    """
    prediction = model.predict(features)
    return metrics.mean_squared_error(prediction, target)
 def predict_segment(model, features, target_query):
    """
    Use the provided model to predict the values for the new feature set
        Input:
            @param model: The pretrained model
            @param features: A list of features to use in the model prediction (list of column names)
            @param target_query: The query to run to obtain the data to predict on and the cartodb_ids associated with it.
    """
    batch_size = 1000
    joined_features = ','.join(['"{0}"::numeric'.format(a) for a in features])
    try:
        cursor = plpy.cursor('SELECT Array[{joined_features}] As features FROM ({target_query}) As a'.format(
            joined_features=joined_features,
            target_query=target_query))
    except Exception, e:
        plpy.error('Failed to build segmentation model: %s' % e)
    results = []
    while True:
        rows = cursor.fetch(batch_size)
        if not rows:
            break
        batch = np.row_stack([np.array(row['features'], dtype=float) for row in rows])
        # TODO: fix this. The mean should be global; per-batch means will cause inconsistent fills
        batch = replace_nan_with_mean(batch)
        prediction = model.predict(batch)
        results.append(prediction)
    try:
        cartodb_ids = plpy.execute('''SELECT array_agg(cartodb_id ORDER BY cartodb_id) As cartodb_ids FROM ({0}) As a'''.format(target_query))[0]['cartodb_ids']
    except Exception, e:
        plpy.error('Failed to build segmentation model: %s' % e)
    return cartodb_ids, np.concatenate(results)
--- a/release/python/0.4.1/crankshaft/crankshaft/space_time_dynamics/__init__.py
+++ b/release/python/0.4.1/crankshaft/crankshaft/space_time_dynamics/__init__.py
@ -0,0 +1,2 @@
 """Import all functions from the space-time dynamics modules."""
 from markov import *
--- a/release/python/0.4.1/crankshaft/crankshaft/space_time_dynamics/markov.py
+++ b/release/python/0.4.1/crankshaft/crankshaft/space_time_dynamics/markov.py
@ -0,0 +1,189 @@
 """
 Spatial dynamics measurements using Spatial Markov
 """
 import numpy as np
 import pysal as ps
 import plpy
 import crankshaft.pysal_utils as pu
 def spatial_markov_trend(subquery, time_cols, num_classes=7,
                         w_type='knn', num_ngbrs=5, permutations=0,
                         geom_col='the_geom', id_col='cartodb_id'):
    """
        Predict the trends of a unit based on:
        1. history of its transitions to different classes (e.g., 1st quantile -> 2nd quantile)
        2. average class of its neighbors
        Inputs:
        @param subquery string: e.g., SELECT the_geom, cartodb_id,
          interesting_time_column FROM table_name
        @param time_cols list of strings: list of strings of column names
        @param num_classes (optional): number of classes to break distribution
          of values into. Currently uses quantile bins.
        @param w_type string (optional): weight type ('knn' or 'queen')
        @param num_ngbrs int (optional): number of neighbors (if knn type)
        @param permutations int (optional): number of permutations for test
          stats
        @param geom_col string (optional): name of column which contains the
          geometries
        @param id_col string (optional): name of column which has the ids of
          the table
        Outputs:
        @param trend_up float: probability that a geom will move to a higher
          class
        @param trend_down float: probability that a geom will move to a lower
          class
        @param trend float: (trend_up - trend_down) / trend_static
        @param volatility float: a measure of the volatility based on
          probability stddev(prob array)
    """
    if len(time_cols) < 2:
        plpy.error('More than one time column needs to be passed')
    qvals = {"id_col": id_col,
             "time_cols": time_cols,
             "geom_col": geom_col,
             "subquery": subquery,
             "num_ngbrs": num_ngbrs}
    try:
        query_result = plpy.execute(
            pu.construct_neighbor_query(w_type, qvals)
        )
        if len(query_result) == 0:
            return zip([None], [None], [None], [None], [None])
    except plpy.SPIError, e:
        plpy.debug('Query failed with exception %s: %s' % (e, pu.construct_neighbor_query(w_type, qvals)))
        plpy.error('Analysis failed: %s' % e)
        return zip([None], [None], [None], [None], [None])
    ## build weight
    weights = pu.get_weight(query_result, w_type)
    weights.transform = 'r'
    ## prep time data
    t_data = get_time_data(query_result, time_cols)
    plpy.debug('shape of t_data %d, %d' % t_data.shape)
    plpy.debug('number of weight objects: %d, %d' % (weights.sparse).shape)
    plpy.debug('first num elements: %f' % t_data[0, 0])
    sp_markov_result = ps.Spatial_Markov(t_data,
                                         weights,
                                         k=num_classes,
                                         fixed=False,
                                         permutations=permutations)
    ## get lag classes
    lag_classes = ps.Quantiles(
        ps.lag_spatial(weights, t_data[:, -1]),
        k=num_classes).yb
    ## look up probability distribution for each unit according to class and lag class
    prob_dist = get_prob_dist(sp_markov_result.P,
                              lag_classes,
                              sp_markov_result.classes[:, -1])
    ## find the ups and down and overall distribution of each cell
    trend_up, trend_down, trend, volatility = get_prob_stats(prob_dist,
                                                             sp_markov_result.classes[:, -1])
    ## output the results
    return zip(trend, trend_up, trend_down, volatility, weights.id_order)
 def get_time_data(markov_data, time_cols):
    """
        Extract the time columns and bin appropriately
    """
    num_attrs = len(time_cols)
    return np.array([[x['attr' + str(i)] for x in markov_data]
                     for i in range(1, num_attrs+1)], dtype=float).transpose()
 ## not currently used
 def rebin_data(time_data, num_time_per_bin):
    """
        Convert an n x l matrix into an n x ceil(l / m) matrix where the
         values are reduced (averaged) over every m consecutive columns:
          1 2 3 4    1.5 3.5
          5 6 7 8 -> 5.5 7.5
          9 8 7 6    8.5 6.5
          5 4 3 2    4.5 2.5
          if m = 2, the 4 x 4 matrix is transformed to a 4 x 2 matrix.
        This process effectively resamples the data at a time span m times
         longer than that of the input data.
        For cases when there is a remainder (remainder(5/3) = 2), the remaining
         two columns are binned together as the last time period, while the
         first three are binned together for the first period.
        Input:
          @param time_data n x l  ndarray: measurements of an attribute at
           different time intervals
          @param num_time_per_bin int: number of columns (m) to average into
           a new column
        Output:
          n x ceil(l / m) ndarray of resampled time series
    """
    if time_data.shape[1] % num_time_per_bin == 0:
        ## if fit is perfect, then use it
        n_max = time_data.shape[1] / num_time_per_bin
    else:
        ## fit remainders into an additional column
        n_max = time_data.shape[1] / num_time_per_bin + 1
    return np.array([time_data[:, num_time_per_bin * i:num_time_per_bin * (i+1)].mean(axis=1)
                     for i in range(n_max)]).T
 def get_prob_dist(transition_matrix, lag_indices, unit_indices):
    """
        Given an array of transition matrices, look up the probability
        associated with the arrangements passed
        Input:
        @param transition_matrix ndarray[k,k,k]:
        @param lag_indices ndarray:
        @param unit_indices ndarray:
        Output:
        Array of probability distributions
    """
    return np.array([transition_matrix[(lag_indices[i], unit_indices[i])]
                     for i in range(len(lag_indices))])
 def get_prob_stats(prob_dist, unit_indices):
    """
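        Get the statistics of the probability distributions
        Outputs:
            @param trend_up ndarray(float): sum of probabilities for upward
               movement (relative to the unit index of that prob)
            @param trend_down ndarray(float): sum of probabilities for downward
               movement (relative to the unit index of that prob)
            @param trend ndarray(float): difference of upward and downward
               movements, divided by the probability of staying in the
               same class (undefined when that probability is zero)
            @param volatility ndarray(float): standard deviation of each
               probability distribution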
    """
    num_elements = len(unit_indices)
    trend_up = np.empty(num_elements, dtype=float)
    trend_down = np.empty(num_elements, dtype=float)
    trend = np.empty(num_elements, dtype=float)
    for i in range(num_elements):
        trend_up[i] = prob_dist[i, (unit_indices[i]+1):].sum()
        trend_down[i] = prob_dist[i, :unit_indices[i]].sum()
        if prob_dist[i, unit_indices[i]] > 0.0:
            trend[i] = (trend_up[i] - trend_down[i]) / prob_dist[i, unit_indices[i]]
        else:
            trend[i] = None
    ## calculate volatility of distribution
    volatility = prob_dist.std(axis=1)
    return trend_up, trend_down, trend, volatility
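The arithmetic in `get_prob_stats` can be checked by hand on a single row. The sketch below is a pure-Python, stand-alone illustration using only the standard library (the committed code works on NumPy arrays); `prob_stats_row` is a hypothetical helper name, and the sample row is the last row of `probs` with unit index 4 from the unit test in this diff.

```python
# Hypothetical single-row sketch of get_prob_stats; prob_stats_row is an
# illustrative name and is not part of crankshaft.
from statistics import pstdev


def prob_stats_row(row, unit_index):
    """Return (trend_up, trend_down, trend, volatility) for one distribution."""
    up = sum(row[unit_index + 1:])      # probability of moving to a higher class
    down = sum(row[:unit_index])        # probability of moving to a lower class
    static = row[unit_index]            # probability of staying in the same class
    trend = (up - down) / static if static > 0.0 else None
    volatility = pstdev(row)            # population std, like ndarray.std()
    return up, down, trend, volatility


if __name__ == '__main__':
    row = [0.0, 0.0, 0.0, 0.02352941, 0.97647059]
    print(prob_stats_row(row, 4))
```

For this row the upward sum is empty, so the trend reduces to `-0.02352941 / 0.97647059`, which agrees with the last entry of `answer_trend` in the test.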
--- a/release/python/0.4.1/crankshaft/setup.py
+++ b/release/python/0.4.1/crankshaft/setup.py
@ -0,0 +1,49 @@
 """
 CartoDB Spatial Analysis Python Library
 See:
 https://github.com/CartoDB/crankshaft
 """
 from setuptools import setup, find_packages
 setup(
    name='crankshaft',
    version='0.0.0',
    description='CartoDB Spatial Analysis Python Library',
    url='https://github.com/CartoDB/crankshaft',
    author='Data Services Team - CartoDB',
    author_email='dataservices@cartodb.com',
    license='MIT',
    classifiers=[
        'Development Status :: 3 - Alpha',
        'Intended Audience :: Mapping community',
        'Topic :: Maps :: Mapping Tools',
        'License :: OSI Approved :: MIT License',
        'Programming Language :: Python :: 2.7',
    ],
    keywords='maps mapping tools spatial analysis geostatistics',
    packages=find_packages(exclude=['contrib', 'docs', 'tests']),
    extras_require={
        'dev': ['unittest'],
        'test': ['unittest', 'nose', 'mock'],
    },
    # The choice of component versions is dictated by what's
    # provisioned in the production servers.
    # IMPORTANT NOTE: please don't change this line. Instead issue a ticket to systems for evaluation.
    install_requires=['joblib==0.8.3', 'numpy==1.6.1', 'scipy==0.14.0', 'pysal==1.11.2', 'scikit-learn==0.14.1'],
    requires=['pysal', 'numpy', 'sklearn'],
    test_suite='test'
 )
--- a/release/python/0.4.1/crankshaft/setup.py-r
+++ b/release/python/0.4.1/crankshaft/setup.py-r
@ -0,0 +1,49 @@
 """
 CartoDB Spatial Analysis Python Library
 See:
 https://github.com/CartoDB/crankshaft
 """
 from setuptools import setup, find_packages
 setup(
    name='crankshaft',
    version='0.0.0',
    description='CartoDB Spatial Analysis Python Library',
    url='https://github.com/CartoDB/crankshaft',
    author='Data Services Team - CartoDB',
    author_email='dataservices@cartodb.com',
    license='MIT',
    classifiers=[
        'Development Status :: 3 - Alpha',
        'Intended Audience :: Mapping community',
        'Topic :: Maps :: Mapping Tools',
        'License :: OSI Approved :: MIT License',
        'Programming Language :: Python :: 2.7',
    ],
    keywords='maps mapping tools spatial analysis geostatistics',
    packages=find_packages(exclude=['contrib', 'docs', 'tests']),
    extras_require={
        'dev': ['unittest'],
        'test': ['unittest', 'nose', 'mock'],
    },
    # The choice of component versions is dictated by what's
    # provisioned in the production servers.
    # IMPORTANT NOTE: please don't change this line. Instead issue a ticket to systems for evaluation.
    install_requires=['joblib==0.8.3', 'numpy==1.6.1', 'scipy==0.14.0', 'pysal==1.11.2', 'scikit-learn==0.14.1'],
    requires=['pysal', 'numpy', 'sklearn'],
    test_suite='test'
 )
--- a/release/python/0.4.1/crankshaft/test/fixtures/kmeans.json
+++ b/release/python/0.4.1/crankshaft/test/fixtures/kmeans.json
@ -0,0 +1 @@
 [{"xs": [9.917239463463458, 9.042767302696836, 10.798929825304187, 8.763751051762995, 11.383882954810852, 11.018206993460897, 8.939526075734316, 9.636159342565252, 10.136336896960058, 11.480610059427342, 12.115011910725082, 9.173267848893428, 10.239300931201738, 8.00012512174072, 8.979962292282131, 9.318376124429575, 10.82259513754284, 10.391747171927115, 10.04904588886165, 9.96007160443463, -0.78825626804569, -0.3511819898577426, -1.2796410003764271, -0.3977049391203402, 2.4792311265774667, 1.3670311632092624, 1.2963504112955613, 2.0404844103073025, -1.6439708506073223, 0.39122885445645805, 1.026031821452462, -0.04044477160482201, -0.7442346929085072, -0.34687120826243034, -0.23420359971379054, -0.5919629143336708, -0.202903054395391, -0.1893399644841902, 1.9331834251176807, -0.12321054392851609], "ys": [8.735627063679981, 9.857615954045011, 10.81439096759407, 10.586727233537191, 9.232919976568622, 11.54281262696508, 8.392787912674466, 9.355119689665944, 9.22380703532752, 10.542142541823122, 10.111980619367035, 10.760836265570738, 8.819773453269804, 10.25325722424816, 9.802077905695608, 8.955420161552611, 9.833801181904477, 10.491684241001613, 12.076108669877556, 11.74289693140474, -0.5685725015474191, -0.5715728344759778, -0.20180907868635137, 0.38431336480089595, -0.3402202083684184, -2.4652736827783586, 0.08295159401756182, 0.8503818775816505, 0.6488691600321166, 0.5794762568230527, -0.6770063922144103, -0.6557616416449478, -1.2834289177624947, 0.1096318195532717, -0.38986922166834853, -1.6224497706950238, 0.09429787743230483, 0.4005097316394031, -0.508002811195673, -1.2473463371366507], "ids": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]}]
--- a/release/python/0.4.1/crankshaft/test/fixtures/markov.json
+++ b/release/python/0.4.1/crankshaft/test/fixtures/markov.json
@ -0,0 +1 @@
 [[0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 0], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 1], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 2], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 3], [0.0, 0.065217391304347824, 0.065217391304347824, 0.33605067580764519, 4], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 5], [0.1875, 0.23999999999999999, 0.12, 0.23731835158706122, 6], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 7], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 8], [0.19047619047619049, 0.16, 0.0, 0.32594478059941379, 9], [-0.23529411764705882, 0.0, 0.19047619047619047, 0.31356338348865387, 10], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 11], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 12], [0.027777777777777783, 0.11111111111111112, 0.088888888888888892, 0.30339641183779581, 13], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 14], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 15], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 16], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 17], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 18], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 19], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 20], [0.078947368421052641, 0.073170731707317083, 0.0, 0.36451788667842738, 21], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 22], [-0.16666666666666663, 0.18181818181818182, 0.27272727272727271, 0.20246415864836445, 23], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 24], [0.1875, 0.23999999999999999, 0.12, 
0.23731835158706122, 25], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 26], [-0.043478260869565216, 0.0, 0.041666666666666664, 0.37950991789118999, 27], [0.22222222222222221, 0.18181818181818182, 0.0, 0.31701083225750354, 28], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 29], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 30], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 31], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 32], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 33], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 34], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 35], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 36], [0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 37], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 38], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 39], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 40], [0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 41], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 42], [0.0, 0.0, 0.0, 0.40000000000000002, 43], [0.0, 0.065217391304347824, 0.065217391304347824, 0.33605067580764519, 44], [0.078947368421052641, 0.073170731707317083, 0.0, 0.36451788667842738, 45], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 46], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 47]]
--- a/release/python/0.4.1/crankshaft/test/fixtures/moran.json
+++ b/release/python/0.4.1/crankshaft/test/fixtures/moran.json
@ -0,0 +1,52 @@
 [[0.9319096128346788, "HH"],
 [-1.135787401862846, "HL"],
 [0.11732030672508517, "LL"],
 [0.6152779669180425, "LL"],
 [-0.14657336660125297, "LH"],
 [0.6967858120189607, "LL"],
 [0.07949310115714454, "HH"],
 [0.4703198759258987, "HH"],
 [0.4421125200498064, "HH"],
 [0.5724288737143592, "LL"],
 [0.8970743435692062, "LL"],
 [0.18327334401918674, "LL"],
 [-0.01466729201304962, "HL"],
 [0.3481559372544409, "LL"],
 [0.06547094736902978, "LL"],
 [0.15482141569329988, "HH"],
 [0.4373841193538136, "HH"],
 [0.15971286468915544, "LL"],
 [1.0543588860308968, "HH"],
 [1.7372866900020818, "HH"],
 [1.091998586053999, "LL"],
 [0.1171572584252222, "HH"],
 [0.08438455015300014, "LL"],
 [0.06547094736902978, "LL"],
 [0.15482141569329985, "HH"],
 [1.1627044812890683, "HH"],
 [0.06547094736902978, "LL"],
 [0.795275137550483, "HH"],
 [0.18562939195219, "LL"],
 [0.3010757406693439, "LL"],
 [2.8205795942839376, "HH"],
 [0.11259190602909264, "LL"],
 [-0.07116352791516614, "HL"],
 [-0.09945240794119009, "LH"],
 [0.18562939195219, "LL"],
 [0.1832733440191868, "LL"],
 [-0.39054253768447705, "HL"],
 [-0.1672071289487642, "HL"],
 [0.3337669247916343, "HH"],
 [0.2584386102554792, "HH"],
 [-0.19733845476322634, "HL"],
 [-0.9379282899805409, "LH"],
 [-0.028770969951095866, "LH"],
 [0.051367269430983485, "LL"],
 [-0.2172548045913472, "LH"],
 [0.05136726943098351, "LL"],
 [0.04191046803899837, "LL"],
 [0.7482357030403517, "HH"],
 [-0.014585767863118111, "LH"],
 [0.5410013139159929, "HH"],
 [1.0223932668429925, "LL"],
 [1.4179402898927476, "LL"]]
--- a/release/python/0.4.1/crankshaft/test/fixtures/neighbors.json
+++ b/release/python/0.4.1/crankshaft/test/fixtures/neighbors.json
@ -0,0 +1,54 @@
 [
    {"neighbors": [48, 26, 20, 9, 31], "id": 1, "value": 0.5},
    {"neighbors": [30, 16, 46, 3, 4], "id": 2, "value": 0.7},
    {"neighbors": [46, 30, 2, 12, 16], "id": 3, "value": 0.2},
    {"neighbors": [18, 30, 23, 2, 52], "id": 4, "value": 0.1},
    {"neighbors": [47, 40, 45, 37, 28], "id": 5, "value": 0.3},
    {"neighbors": [10, 21, 41, 14, 37], "id": 6, "value": 0.05},
    {"neighbors": [8, 17, 43, 25, 12], "id": 7, "value": 0.4},
    {"neighbors": [17, 25, 43, 22, 7], "id": 8, "value": 0.7},
    {"neighbors": [39, 34, 1, 26, 48], "id": 9, "value": 0.5},
    {"neighbors": [6, 37, 5, 45, 49], "id": 10, "value": 0.04},
    {"neighbors": [51, 41, 29, 21, 14], "id": 11, "value": 0.08},
    {"neighbors": [44, 46, 43, 50, 3], "id": 12, "value": 0.2},
    {"neighbors": [45, 23, 14, 28, 18], "id": 13, "value": 0.4},
    {"neighbors": [41, 29, 13, 23, 6], "id": 14, "value": 0.2},
    {"neighbors": [36, 27, 32, 33, 24], "id": 15, "value": 0.3},
    {"neighbors": [19, 2, 46, 44, 28], "id": 16, "value": 0.4},
    {"neighbors": [8, 25, 43, 7, 22], "id": 17, "value": 0.6},
    {"neighbors": [23, 4, 29, 14, 13], "id": 18, "value": 0.3},
    {"neighbors": [42, 16, 28, 26, 40], "id": 19, "value": 0.7},
    {"neighbors": [1, 48, 31, 26, 42], "id": 20, "value": 0.8},
    {"neighbors": [41, 6, 11, 14, 10], "id": 21, "value": 0.1},
    {"neighbors": [25, 50, 43, 31, 44], "id": 22, "value": 0.4},
    {"neighbors": [18, 13, 14, 4, 2], "id": 23, "value": 0.1},
    {"neighbors": [33, 49, 34, 47, 27], "id": 24, "value": 0.3},
    {"neighbors": [43, 8, 22, 17, 50], "id": 25, "value": 0.4},
    {"neighbors": [1, 42, 20, 31, 48], "id": 26, "value": 0.6},
    {"neighbors": [32, 15, 36, 33, 24], "id": 27, "value": 0.3},
    {"neighbors": [40, 45, 19, 5, 13], "id": 28, "value": 0.8},
    {"neighbors": [11, 51, 41, 14, 18], "id": 29, "value": 0.3},
    {"neighbors": [2, 3, 4, 46, 18], "id": 30, "value": 0.1},
    {"neighbors": [20, 26, 1, 50, 48], "id": 31, "value": 0.9},
    {"neighbors": [27, 36, 15, 49, 24], "id": 32, "value": 0.3},
    {"neighbors": [24, 27, 49, 34, 32], "id": 33, "value": 0.4},
    {"neighbors": [47, 9, 39, 40, 24], "id": 34, "value": 0.3},
    {"neighbors": [38, 51, 11, 21, 41], "id": 35, "value": 0.3},
    {"neighbors": [15, 32, 27, 49, 33], "id": 36, "value": 0.2},
    {"neighbors": [49, 10, 5, 47, 24], "id": 37, "value": 0.5},
    {"neighbors": [35, 21, 51, 11, 41], "id": 38, "value": 0.4},
    {"neighbors": [9, 34, 48, 1, 47], "id": 39, "value": 0.6},
    {"neighbors": [28, 47, 5, 9, 34], "id": 40, "value": 0.5},
    {"neighbors": [11, 14, 29, 21, 6], "id": 41, "value": 0.4},
    {"neighbors": [26, 19, 1, 9, 31], "id": 42, "value": 0.2},
    {"neighbors": [25, 12, 8, 22, 44], "id": 43, "value": 0.3},
    {"neighbors": [12, 50, 46, 16, 43], "id": 44, "value": 0.2},
    {"neighbors": [28, 13, 5, 40, 19], "id": 45, "value": 0.3},
    {"neighbors": [3, 12, 44, 2, 16], "id": 46, "value": 0.2},
    {"neighbors": [34, 40, 5, 49, 24], "id": 47, "value": 0.3},
    {"neighbors": [1, 20, 26, 9, 39], "id": 48, "value": 0.5},
    {"neighbors": [24, 37, 47, 5, 33], "id": 49, "value": 0.2},
    {"neighbors": [44, 22, 31, 42, 26], "id": 50, "value": 0.6},
    {"neighbors": [11, 29, 41, 14, 21], "id": 51, "value": 0.01},
    {"neighbors": [4, 18, 29, 51, 23], "id": 52, "value": 0.01}
  ]
--- a/release/python/0.4.1/crankshaft/test/fixtures/neighbors_markov.json
+++ b/release/python/0.4.1/crankshaft/test/fixtures/neighbors_markov.json
--- a/release/python/0.4.1/crankshaft/test/helper.py
+++ b/release/python/0.4.1/crankshaft/test/helper.py
@ -0,0 +1,13 @@
 import unittest
 from mock_plpy import MockPlPy
 plpy = MockPlPy()
 import sys
 sys.modules['plpy'] = plpy
 import os
 def fixture_file(name):
    test_dir = os.path.dirname(os.path.realpath(__file__))
    return os.path.join(test_dir, 'fixtures', name)
--- a/release/python/0.4.1/crankshaft/test/mock_plpy.py
+++ b/release/python/0.4.1/crankshaft/test/mock_plpy.py
@ -0,0 +1,52 @@
 import re
 class MockCursor:
    def __init__(self, data):
        self.cursor_pos = 0
        self.data = data
    def fetch(self, batch_size):
        batch = self.data[self.cursor_pos : self.cursor_pos + batch_size]
        self.cursor_pos += batch_size
        return batch
 class MockPlPy:
    def __init__(self):
        self._reset()
    def _reset(self):
        self.infos = []
        self.notices = []
        self.debugs = []
        self.logs = []
        self.warnings = []
        self.errors = []
        self.fatals = []
        self.executes = []
        self.results = []
        self.prepares = []
    def _define_result(self, query, result):
        pattern = re.compile(query, re.IGNORECASE | re.MULTILINE)
        self.results.append([pattern, result])
    def notice(self, msg):
        self.notices.append(msg)
    def debug(self, msg):
        self.debugs.append(msg)
    def info(self, msg):
        self.infos.append(msg)
    def cursor(self, query):
        data = self.execute(query)
        return MockCursor(data)
    def execute(self, query): # TODO: additional arguments
        for result in self.results:
            if result[0].match(query):
                return result[1]
        return []
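The lookup in `execute` relies on `re.match`, which only matches at the start of the query string, so a pattern registered with `_define_result` behaves like a case-insensitive prefix test. The sketch below shows this with a trimmed-down copy of the class; the name `MiniMockPlPy` and the sample queries are illustrative assumptions, not part of the commit.

```python
import re


class MiniMockPlPy(object):
    """Trimmed-down stand-in for MockPlPy (hypothetical name, same lookup logic)."""

    def __init__(self):
        self.results = []

    def _define_result(self, query, result):
        pattern = re.compile(query, re.IGNORECASE | re.MULTILINE)
        self.results.append([pattern, result])

    def execute(self, query):
        # first registered pattern that matches at the start of the query wins
        for pattern, result in self.results:
            if pattern.match(query):
                return result
        return []


plpy = MiniMockPlPy()
plpy._define_result('select', [{'id': 1}])

# Starts with "select" (case-insensitive), so the canned rows come back.
matched = plpy.execute('SELECT * FROM a_list')
# Contains "select" but does not start with it, so nothing is returned.
unmatched = plpy.execute('WITH t AS (SELECT 1) SELECT * FROM t')
```

A test that needs to match text in the middle of a query has to register a pattern with a leading `.*`, as the segmentation tests do.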
--- a/release/python/0.4.1/crankshaft/test/test_cluster_kmeans.py
+++ b/release/python/0.4.1/crankshaft/test/test_cluster_kmeans.py
@ -0,0 +1,38 @@
 import unittest
 import numpy as np
 # from mock_plpy import MockPlPy
 # plpy = MockPlPy()
 #
 # import sys
 # sys.modules['plpy'] = plpy
 from helper import plpy, fixture_file
 import crankshaft.clustering as cc
 import crankshaft.pysal_utils as pu
 from crankshaft import random_seeds
 import json
 class KMeansTest(unittest.TestCase):
    """Testing class for k-means clustering functions"""
    def setUp(self):
        plpy._reset()
        self.cluster_data = json.loads(open(fixture_file('kmeans.json')).read())
        self.params = {"subquery": "select * from table",
                       "no_clusters": "10"
                       }
    def test_kmeans(self):
        data = self.cluster_data
        plpy._define_result('select', data)
        clusters = cc.kmeans('subquery', 2)
        labels = [a[1] for a in clusters]
        c1 = [a for a in clusters if a[1] == 0]
        c2 = [a for a in clusters if a[1] == 1]
        self.assertEqual(len(np.unique(labels)), 2)
        self.assertEqual(len(c1), 20)
        self.assertEqual(len(c2), 20)
--- a/release/python/0.4.1/crankshaft/test/test_clustering_moran.py
+++ b/release/python/0.4.1/crankshaft/test/test_clustering_moran.py
@ -0,0 +1,88 @@
 import unittest
 import numpy as np
 # from mock_plpy import MockPlPy
 # plpy = MockPlPy()
 #
 # import sys
 # sys.modules['plpy'] = plpy
 from helper import plpy, fixture_file
 import crankshaft.clustering as cc
 import crankshaft.pysal_utils as pu
 from crankshaft import random_seeds
 import json
 class MoranTest(unittest.TestCase):
    """Testing class for Moran's I functions"""
    def setUp(self):
        plpy._reset()
        self.params = {"id_col": "cartodb_id",
                       "attr1": "andy",
                       "attr2": "jay_z",
                       "subquery": "SELECT * FROM a_list",
                       "geom_col": "the_geom",
                       "num_ngbrs": 321}
        self.params_markov = {"id_col": "cartodb_id",
                              "time_cols": ["_2013_dec", "_2014_jan", "_2014_feb"],
                              "subquery": "SELECT * FROM a_list",
                              "geom_col": "the_geom",
                              "num_ngbrs": 321}
        self.neighbors_data = json.loads(open(fixture_file('neighbors.json')).read())
        self.moran_data = json.loads(open(fixture_file('moran.json')).read())
    def test_map_quads(self):
        """Test map_quads"""
        self.assertEqual(cc.map_quads(1), 'HH')
        self.assertEqual(cc.map_quads(2), 'LH')
        self.assertEqual(cc.map_quads(3), 'LL')
        self.assertEqual(cc.map_quads(4), 'HL')
        self.assertEqual(cc.map_quads(33), None)
        self.assertEqual(cc.map_quads('andy'), None)
    def test_quad_position(self):
        """Test quad_position"""
        quads = np.array([1, 2, 3, 4], np.int)
        ans = np.array(['HH', 'LH', 'LL', 'HL'])
        test_ans = cc.quad_position(quads)
        self.assertTrue((test_ans == ans).all())
    def test_moran_local(self):
        """Test Moran's I local"""
        data = [ { 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
        plpy._define_result('select', data)
        random_seeds.set_random_seeds(1234)
        result = cc.moran_local('subquery', 'value', 'knn', 5, 99, 'the_geom', 'cartodb_id')
        result = [(row[0], row[1]) for row in result]
        expected = self.moran_data
        for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
            self.assertAlmostEqual(res_val, exp_val)
            self.assertEqual(res_quad, exp_quad)
    def test_moran_local_rate(self):
        """Test Moran's I rate"""
        data = [ { 'id': d['id'], 'attr1': d['value'], 'attr2': 1, 'neighbors': d['neighbors'] } for d in self.neighbors_data]
        plpy._define_result('select', data)
        random_seeds.set_random_seeds(1234)
        result = cc.moran_local_rate('subquery', 'numerator', 'denominator', 'knn', 5, 99, 'the_geom', 'cartodb_id')
        print 'result == None? ', result == None
        result = [(row[0], row[1]) for row in result]
        expected = self.moran_data
        for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
            self.assertAlmostEqual(res_val, exp_val)
    def test_moran(self):
        """Test Moran's I global"""
        data = [{ 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
        plpy._define_result('select', data)
        random_seeds.set_random_seeds(1235)
        result = cc.moran('table', 'value', 'knn', 5, 99, 'the_geom', 'cartodb_id')
        print 'result == None?', result == None
        result_moran = result[0][0]
        expected_moran = np.array([row[0] for row in self.moran_data]).mean()
        self.assertAlmostEqual(expected_moran, result_moran, delta=10e-2)
--- a/release/python/0.4.1/crankshaft/test/test_pysal_utils.py
+++ b/release/python/0.4.1/crankshaft/test/test_pysal_utils.py
@ -0,0 +1,142 @@
 import unittest
 import crankshaft.pysal_utils as pu
 from crankshaft import random_seeds
 class PysalUtilsTest(unittest.TestCase):
    """Testing class for utility functions related to PySAL integrations"""
    def setUp(self):
        self.params = {"id_col": "cartodb_id",
                       "attr1": "andy",
                       "attr2": "jay_z",
                       "subquery": "SELECT * FROM a_list",
                       "geom_col": "the_geom",
                       "num_ngbrs": 321}
        self.params_array = {"id_col": "cartodb_id",
                             "time_cols": ["_2013_dec", "_2014_jan", "_2014_feb"],
                             "subquery": "SELECT * FROM a_list",
                             "geom_col": "the_geom",
                             "num_ngbrs": 321}
    def test_query_attr_select(self):
        """Test query_attr_select"""
        ans = "i.\"andy\"::numeric As attr1, " \
              "i.\"jay_z\"::numeric As attr2, "
        ans_array = "i.\"_2013_dec\"::numeric As attr1, " \
                    "i.\"_2014_jan\"::numeric As attr2, " \
                    "i.\"_2014_feb\"::numeric As attr3, "
        self.assertEqual(pu.query_attr_select(self.params), ans)
        self.assertEqual(pu.query_attr_select(self.params_array), ans_array)
    def test_query_attr_where(self):
        """Test pu.query_attr_where"""
        ans = "idx_replace.\"andy\" IS NOT NULL AND " \
              "idx_replace.\"jay_z\" IS NOT NULL AND " \
              "idx_replace.\"jay_z\" <> 0"
        ans_array = "idx_replace.\"_2013_dec\" IS NOT NULL AND " \
                    "idx_replace.\"_2014_jan\" IS NOT NULL AND " \
                    "idx_replace.\"_2014_feb\" IS NOT NULL"
        self.assertEqual(pu.query_attr_where(self.params), ans)
        self.assertEqual(pu.query_attr_where(self.params_array), ans_array)
    def test_knn(self):
        """Test knn neighbors constructor"""
        ans = "SELECT i.\"cartodb_id\" As id, " \
                     "i.\"andy\"::numeric As attr1, " \
                     "i.\"jay_z\"::numeric As attr2, " \
                     "(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
                                   "FROM (SELECT * FROM a_list) As j " \
                                   "WHERE " \
                                    "i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
                                    "j.\"andy\" IS NOT NULL AND " \
                                    "j.\"jay_z\" IS NOT NULL AND " \
                                    "j.\"jay_z\" <> 0 " \
                                   "ORDER BY " \
                                    "j.\"the_geom\" <-> i.\"the_geom\" ASC " \
                      "LIMIT 321)) As neighbors " \
              "FROM (SELECT * FROM a_list) As i " \
              "WHERE i.\"andy\" IS NOT NULL AND " \
                    "i.\"jay_z\" IS NOT NULL AND " \
                    "i.\"jay_z\" <> 0 " \
              "ORDER BY i.\"cartodb_id\" ASC;"
        ans_array = "SELECT i.\"cartodb_id\" As id, " \
              "i.\"_2013_dec\"::numeric As attr1, " \
              "i.\"_2014_jan\"::numeric As attr2, " \
              "i.\"_2014_feb\"::numeric As attr3, " \
              "(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
                            "FROM (SELECT * FROM a_list) As j " \
                            "WHERE i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
                                  "j.\"_2013_dec\" IS NOT NULL AND " \
                                  "j.\"_2014_jan\" IS NOT NULL AND " \
                                  "j.\"_2014_feb\" IS NOT NULL " \
                            "ORDER BY j.\"the_geom\" <-> i.\"the_geom\" ASC " \
                            "LIMIT 321)) As neighbors " \
              "FROM (SELECT * FROM a_list) As i " \
              "WHERE i.\"_2013_dec\" IS NOT NULL AND " \
                    "i.\"_2014_jan\" IS NOT NULL AND " \
                    "i.\"_2014_feb\" IS NOT NULL "\
              "ORDER BY i.\"cartodb_id\" ASC;"
        self.assertEqual(pu.knn(self.params), ans)
        self.assertEqual(pu.knn(self.params_array), ans_array)
    def test_queen(self):
        """Test queen neighbors constructor"""
        ans = "SELECT i.\"cartodb_id\" As id, " \
                     "i.\"andy\"::numeric As attr1, " \
                     "i.\"jay_z\"::numeric As attr2, " \
                     "(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
                                   "FROM (SELECT * FROM a_list) As j " \
                                   "WHERE " \
                                   "i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
                                   "ST_Touches(i.\"the_geom\", " \
                                              "j.\"the_geom\") AND " \
                                   "j.\"andy\" IS NOT NULL AND " \
                                   "j.\"jay_z\" IS NOT NULL AND " \
                                   "j.\"jay_z\" <> 0)" \
                                  ") As neighbors " \
              "FROM (SELECT * FROM a_list) As i " \
              "WHERE i.\"andy\" IS NOT NULL AND " \
                    "i.\"jay_z\" IS NOT NULL AND " \
                    "i.\"jay_z\" <> 0 " \
              "ORDER BY i.\"cartodb_id\" ASC;"
        self.assertEqual(pu.queen(self.params), ans)
    def test_construct_neighbor_query(self):
        """Test construct_neighbor_query"""
        # Compare to raw knn query
        self.assertEqual(pu.construct_neighbor_query('knn', self.params),
                         pu.knn(self.params))
    def test_get_attributes(self):
        """Test get_attributes"""
        ## need to add tests
        self.assertEqual(True, True)
    def test_get_weight(self):
        """Test get_weight"""
        self.assertEqual(True, True)
    def test_empty_zipped_array(self):
        """Test empty_zipped_array"""
        ans2 = [(None, None)]
        ans4 = [(None, None, None, None)]
        self.assertEqual(pu.empty_zipped_array(2), ans2)
        self.assertEqual(pu.empty_zipped_array(4), ans4)
--- a/release/python/0.4.1/crankshaft/test/test_segmentation.py
+++ b/release/python/0.4.1/crankshaft/test/test_segmentation.py
@ -0,0 +1,64 @@
 import unittest
 import numpy as np
 from helper import plpy, fixture_file
 import crankshaft.segmentation as segmentation
 import json
 class SegmentationTest(unittest.TestCase):
    """Testing class for segmentation functions"""
    def setUp(self):
        plpy._reset()
    def generate_random_data(self, n_samples, random_state, row_type=False):
        x1 = random_state.uniform(size=n_samples)
        x2 = random_state.uniform(size=n_samples)
        x3 = random_state.randint(0, 4, size=n_samples)
        y = x1 + x2 * x2 + x3
        cartodb_id = range(len(x1))
        if row_type:
            return [{'features': vals} for vals in zip(x1, x2, x3)], y
        else:
            return [dict(zip(['x1', 'x2', 'x3', 'target', 'cartodb_id'],
                             [x1, x2, x3, y, cartodb_id]))]
    def test_replace_nan_with_mean(self):
        test_array = np.array([1.2, np.nan, 3.2, np.nan, np.nan])
    def test_create_and_predict_segment(self):
        n_samples = 1000
        random_state_train = np.random.RandomState(13)
        random_state_test = np.random.RandomState(134)
        training_data = self.generate_random_data(n_samples, random_state_train)
        test_data, test_y = self.generate_random_data(n_samples, random_state_test, row_type=True)
        ids =  [{'cartodb_ids': range(len(test_data))}]
        rows =  [{'x1': 0,'x2':0,'x3':0,'y':0,'cartodb_id':0}]
        plpy._define_result(r'select \* from  \(select \* from training\) a  limit 1', rows)
        plpy._define_result(r'.*from \(select \* from training\) as a', training_data)
        plpy._define_result(r'select array_agg\(cartodb\_id order by cartodb\_id\) as cartodb_ids from \(.*\) a', ids)
        plpy._define_result(r'.*select \* from test.*', test_data)
        model_parameters =  {'n_estimators': 1200,
                             'max_depth': 3,
                             'subsample' : 0.5,
                             'learning_rate': 0.01,
                             'min_samples_leaf': 1}
        result = segmentation.create_and_predict_segment(
                'select * from training',
                'target',
                'select * from test',
                model_parameters)
        prediction = [r[1] for r in result]
        # "accuracy" here is the root-mean-square error of the predictions
        accuracy = np.sqrt(np.mean(np.square(np.array(prediction) - np.array(test_y))))
        self.assertEqual(len(result), len(test_data))
        self.assertTrue(result[0][2] < 0.01)
        self.assertTrue(accuracy < 0.5 * np.mean(test_y))
--- a/release/python/0.4.1/crankshaft/test/test_space_time_dynamics.py
+++ b/release/python/0.4.1/crankshaft/test/test_space_time_dynamics.py
@ -0,0 +1,324 @@
 import unittest
 import numpy as np
 # from mock_plpy import MockPlPy
 # plpy = MockPlPy()
 #
 # import sys
 # sys.modules['plpy'] = plpy
 from helper import plpy, fixture_file
 import crankshaft.space_time_dynamics as std
 from crankshaft import random_seeds
 import json
 class SpaceTimeTests(unittest.TestCase):
    """Testing class for Markov Functions."""
    def setUp(self):
        plpy._reset()
        self.params = {"id_col": "cartodb_id",
                       "time_cols": ['dec_2013', 'jan_2014', 'feb_2014'],
                       "subquery": "SELECT * FROM a_list",
                       "geom_col": "the_geom",
                       "num_ngbrs": 321}
        self.neighbors_data = json.loads(open(fixture_file('neighbors_markov.json')).read())
        self.markov_data = json.loads(open(fixture_file('markov.json')).read())
        self.time_data = np.array([i * np.ones(10, dtype=float) for i in range(10)]).T
        self.transition_matrix = np.array([
                [[ 0.96341463, 0.0304878 , 0.00609756, 0.        , 0.        ],
                 [ 0.06040268, 0.83221477, 0.10738255, 0.        , 0.        ],
                 [ 0.        , 0.14      , 0.74      , 0.12      , 0.        ],
                 [ 0.        , 0.03571429, 0.32142857, 0.57142857, 0.07142857],
                 [ 0.        , 0.        , 0.        , 0.16666667, 0.83333333]],
                [[ 0.79831933, 0.16806723, 0.03361345, 0.        , 0.        ],
                 [ 0.0754717 , 0.88207547, 0.04245283, 0.        , 0.        ],
                 [ 0.00537634, 0.06989247, 0.8655914 , 0.05913978, 0.        ],
                 [ 0.        , 0.        , 0.06372549, 0.90196078, 0.03431373],
                 [ 0.        , 0.        , 0.        , 0.19444444, 0.80555556]],
                [[ 0.84693878, 0.15306122, 0.        , 0.        , 0.        ],
                 [ 0.08133971, 0.78947368, 0.1291866 , 0.        , 0.        ],
                 [ 0.00518135, 0.0984456 , 0.79274611, 0.0984456 , 0.00518135],
                 [ 0.        , 0.        , 0.09411765, 0.87058824, 0.03529412],
                 [ 0.        , 0.        , 0.        , 0.10204082, 0.89795918]],
                [[ 0.8852459 , 0.09836066, 0.        , 0.01639344, 0.        ],
                 [ 0.03875969, 0.81395349, 0.13953488, 0.        , 0.00775194],
                 [ 0.0049505 , 0.09405941, 0.77722772, 0.11881188, 0.0049505 ],
                 [ 0.        , 0.02339181, 0.12865497, 0.75438596, 0.09356725],
                 [ 0.        , 0.        , 0.        , 0.09661836, 0.90338164]],
                [[ 0.33333333, 0.66666667, 0.        , 0.        , 0.        ],
                 [ 0.0483871 , 0.77419355, 0.16129032, 0.01612903, 0.        ],
                 [ 0.01149425, 0.16091954, 0.74712644, 0.08045977, 0.        ],
                 [ 0.        , 0.01036269, 0.06217617, 0.89637306, 0.03108808],
                 [ 0.        , 0.        , 0.        , 0.02352941, 0.97647059]]]
                 )
    def test_spatial_markov(self):
        """Test Spatial Markov."""
        data = [ { 'id': d['id'],
                   'attr1': d['y1995'],
                   'attr2': d['y1996'],
                   'attr3': d['y1997'],
                   'attr4': d['y1998'],
                   'attr5': d['y1999'],
                   'attr6': d['y2000'],
                   'attr7': d['y2001'],
                   'attr8': d['y2002'],
                   'attr9': d['y2003'],
                   'attr10': d['y2004'],
                   'attr11': d['y2005'],
                   'attr12': d['y2006'],
                   'attr13': d['y2007'],
                   'attr14': d['y2008'],
                   'attr15': d['y2009'],
                   'neighbors': d['neighbors'] } for d in self.neighbors_data]
        print(str(data[0]))
        plpy._define_result('select', data)
        random_seeds.set_random_seeds(1234)
        result = std.spatial_markov_trend('subquery', ['y1995', 'y1996', 'y1997', 'y1998', 'y1999', 'y2000', 'y2001', 'y2002', 'y2003', 'y2004', 'y2005', 'y2006', 'y2007', 'y2008', 'y2009'], 5, 'knn', 5, 0, 'the_geom', 'cartodb_id')
        self.assertIsNotNone(result)
        result = [(row[0], row[1], row[2], row[3], row[4]) for row in result]
        print(result[0])
        expected = self.markov_data
        for ([res_trend, res_up, res_down, res_vol, res_id],
             [exp_trend, exp_up, exp_down, exp_vol, exp_id]
             ) in zip(result, expected):
            self.assertAlmostEqual(res_trend, exp_trend)
    def test_get_time_data(self):
        """Test get_time_data"""
        data = [ { 'attr1': d['y1995'],
                   'attr2': d['y1996'],
                   'attr3': d['y1997'],
                   'attr4': d['y1998'],
                   'attr5': d['y1999'],
                   'attr6': d['y2000'],
                   'attr7': d['y2001'],
                   'attr8': d['y2002'],
                   'attr9': d['y2003'],
                   'attr10': d['y2004'],
                   'attr11': d['y2005'],
                   'attr12': d['y2006'],
                   'attr13': d['y2007'],
                   'attr14': d['y2008'],
                   'attr15': d['y2009'] } for d in self.neighbors_data]
        result = std.get_time_data(data, ['y1995', 'y1996', 'y1997', 'y1998', 'y1999', 'y2000', 'y2001', 'y2002', 'y2003', 'y2004', 'y2005', 'y2006', 'y2007', 'y2008', 'y2009'])
        ## expected was prepared from PySAL example:
        ### f = ps.open(ps.examples.get_path("usjoin.csv"))
        ### pci = np.array([f.by_col[str(y)] for y in range(1995, 2010)]).transpose()
        ### rpci = pci / (pci.mean(axis = 0))
        expected = np.array([[ 0.87654416, 0.863147, 0.85637567, 0.84811668, 0.8446154,  0.83271652
        ,  0.83786314, 0.85012593, 0.85509656, 0.86416612, 0.87119375, 0.86302631
        ,  0.86148267, 0.86252252, 0.86746356],
         [ 0.9188951,  0.91757931, 0.92333258, 0.92517289, 0.92552388, 0.90746978
        ,  0.89830489, 0.89431991, 0.88924794, 0.89815176, 0.91832091, 0.91706054
        ,  0.90139505, 0.87897455, 0.86216858],
         [ 0.82591007, 0.82548596, 0.81989793, 0.81503235, 0.81731522, 0.78964559
        ,  0.80584442, 0.8084998,  0.82258551, 0.82668196, 0.82373724, 0.81814804
        ,  0.83675961, 0.83574199, 0.84647177],
         [ 1.09088176, 1.08537689, 1.08456418, 1.08415404, 1.09898841, 1.14506948
        ,  1.12151133, 1.11160697, 1.10888621, 1.11399806, 1.12168029, 1.13164797
        ,  1.12958508, 1.11371818, 1.09936775],
         [ 1.10731446, 1.11373944, 1.13283638, 1.14472559, 1.15910025, 1.16898201
        ,  1.17212488, 1.14752303, 1.11843284, 1.11024964, 1.11943471, 1.11736468
        ,  1.10863242, 1.09642516, 1.07762337],
         [ 1.42269757, 1.42118434, 1.44273502, 1.43577571, 1.44400684, 1.44184737
        ,  1.44782832, 1.41978227, 1.39092208, 1.4059372,  1.40788646, 1.44052766
        ,  1.45241216, 1.43306098, 1.4174431 ],
         [ 1.13073885, 1.13110513, 1.11074708, 1.13364636, 1.13088149, 1.10888138
        ,  1.11856629, 1.13062931, 1.11944984, 1.12446239, 1.11671008, 1.10880034
        ,  1.08401709, 1.06959206, 1.07875225],
         [ 1.04706124, 1.04516831, 1.04253372, 1.03239987, 1.02072545, 0.99854316
        ,  0.9880258,  0.99669587, 0.99327676, 1.01400905, 1.03176742, 1.040511
        ,  1.01749645, 0.9936394,  0.98279746],
         [ 0.98996986, 1.00143564, 0.99491,  1.00188408, 1.00455845, 0.99127006
        ,  0.97925917, 0.9683482,  0.95335147, 0.93694787, 0.94308213, 0.92232874
        ,  0.91284091, 0.89689833, 0.88928858],
         [ 0.87418391, 0.86416601, 0.84425695, 0.8404494,  0.83903044, 0.8578708
        ,  0.86036185, 0.86107306, 0.8500772,  0.86981998, 0.86837929, 0.87204141
        ,  0.86633032, 0.84946077, 0.83287146],
         [ 1.14196118, 1.14660262, 1.14892712, 1.14909594, 1.14436624, 1.14450183
        ,  1.12349752, 1.12596664, 1.12213996, 1.1119989,  1.10257792, 1.10491258
        ,  1.11059842, 1.10509795, 1.10020097],
         [ 0.97282463, 0.96700147, 0.96252588, 0.9653878,  0.96057687, 0.95831051
        ,  0.94480909, 0.94804195, 0.95430286, 0.94103989, 0.92122519, 0.91010201
        ,  0.89280392, 0.89298243, 0.89165385],
         [ 0.94325468, 0.96436902, 0.96455242, 0.95243009, 0.94117647, 0.9480927
        ,  0.93539182, 0.95388718, 0.94597005, 0.96918424, 0.94781281, 0.93466815
        ,  0.94281559, 0.96520315, 0.96715441],
         [ 0.97478408, 0.98169225, 0.98712809, 0.98474769, 0.98559897, 0.98687073
        ,  0.99237486, 0.98209969, 0.9877653,  0.97399471, 0.96910087, 0.98416665
        ,  0.98423613, 0.99823861, 0.99545704],
         [ 0.85570269, 0.85575915, 0.85986132, 0.85693406, 0.8538012,  0.86191535
        ,  0.84981451, 0.85472102, 0.84564835, 0.83998883, 0.83478547, 0.82803648
        ,  0.8198736,  0.82265395, 0.8399404 ],
         [ 0.87022047, 0.85996258, 0.85961813, 0.85689572, 0.83947136, 0.82785597
        ,  0.86008789, 0.86776298, 0.86720209, 0.8676334,  0.89179317, 0.94202108
        ,  0.9422231,  0.93902708, 0.94479184],
         [ 0.90134907, 0.90407738, 0.90403991, 0.90201769, 0.90399238, 0.90906632
        ,  0.92693339, 0.93695966, 0.94242697, 0.94338265, 0.91981796, 0.91108804
        ,  0.90543476, 0.91737138, 0.94793657],
         [ 1.1977611,  1.18222564, 1.18439158, 1.18267865, 1.19286723, 1.20172869
        ,  1.21328691, 1.22624778, 1.22397075, 1.23857042, 1.24419893, 1.23929384
        ,  1.23418676, 1.23626739, 1.26754398],
         [ 1.24919678, 1.25754773, 1.26991161, 1.28020651, 1.30625667, 1.34790023
        ,  1.34399863, 1.32575181, 1.30795492, 1.30544841, 1.30303302, 1.32107766
        ,  1.32936244, 1.33001241, 1.33288462],
         [ 1.06768004, 1.03799276, 1.03637303, 1.02768449, 1.03296093, 1.05059016
        ,  1.03405057, 1.02747623, 1.03162734, 0.9961416,  0.97356208, 0.94241549
        ,  0.92754547, 0.92549227, 0.92138102],
         [ 1.09475614, 1.11526796, 1.11654299, 1.13103948, 1.13143264, 1.13889622
        ,  1.12442212, 1.13367018, 1.13982256, 1.14029944, 1.11979401, 1.10905389
        ,  1.10577769, 1.11166825, 1.09985155],
         [ 0.76530058, 0.76612841, 0.76542451, 0.76722683, 0.76014284, 0.74480073
        ,  0.76098396, 0.76156903, 0.76651952, 0.76533288, 0.78205934, 0.76842416
        ,  0.77487118, 0.77768683, 0.78801192],
         [ 0.98391336, 0.98075816, 0.98295341, 0.97386015, 0.96913803, 0.97370819
        ,  0.96419154, 0.97209861, 0.97441313, 0.96356162, 0.94745352, 0.93965462
        ,  0.93069645, 0.94020973, 0.94358232],
         [ 0.83561828, 0.82298088, 0.81738502, 0.81748588, 0.80904801, 0.80071489
        ,  0.83358256, 0.83451613, 0.85175032, 0.85954307, 0.86790024, 0.87170334
        ,  0.87863799, 0.87497981, 0.87888675],
         [ 0.98845573, 1.02092428, 0.99665283, 0.99141823, 0.99386619, 0.98733195
        ,  0.99644997, 0.99669587, 1.02559097, 1.01116651, 0.99988024, 0.97906749
        ,  0.99323123, 1.00204939, 0.99602148],
         [ 1.14930913, 1.15241949, 1.14300962, 1.14265542, 1.13984683, 1.08312397
        ,  1.05192626, 1.04230892, 1.05577278, 1.08569751, 1.12443486, 1.08891079
        ,  1.08603695, 1.05997314, 1.02160943],
         [ 1.11368269, 1.1057147,  1.11893431, 1.13778669, 1.1432272,  1.18257029
        ,  1.16226243, 1.16009196, 1.14467789, 1.14820235, 1.12386598, 1.12680236
        ,  1.12357937, 1.1159258,  1.12570828],
         [ 1.30379431, 1.30752186, 1.31206366, 1.31532267, 1.30625667, 1.31210239
        ,  1.29989156, 1.29203193, 1.27183516, 1.26830786, 1.2617743,  1.28656675
        ,  1.29734097, 1.29390205, 1.29345446],
         [ 0.83953719, 0.82701448, 0.82006005, 0.81188876, 0.80294864, 0.78772975
        ,  0.82848011, 0.8259679,  0.82435705, 0.83108634, 0.84373784, 0.83891093
        ,  0.84349247, 0.85637272, 0.86539395],
         [ 1.23450087, 1.2426022,  1.23537935, 1.23581293, 1.24522626, 1.2256767
        ,  1.21126648, 1.19377804, 1.18355337, 1.19674434, 1.21536573, 1.23653297
        ,  1.27962009, 1.27968392, 1.25907738],
         [ 0.9769662,  0.97400719, 0.98035944, 0.97581531, 0.95543282, 0.96480308
        ,  0.94686376, 0.93679073, 0.92540049, 0.92988835, 0.93442917, 0.92100464
        ,  0.91475304, 0.90249622, 0.9021363 ],
         [ 0.84986886, 0.8986851,  0.84295997, 0.87280534, 0.85659368, 0.88937573
        ,  0.894401, 0.90448993, 0.95495898, 0.92698333, 0.94745352, 0.92562488
        ,  0.96635366, 1.02520312, 1.0394296 ],
         [ 1.01922808, 1.00258203, 1.00974428, 1.00303417, 0.99765073, 1.00759019
        ,  0.99192968, 0.99747298, 0.99550759, 0.97583768, 0.9610168,  0.94779638
        ,  0.93759089, 0.93353431, 0.94121705],
         [ 0.86367411, 0.85558932, 0.85544346, 0.85103025, 0.84336613, 0.83434854
        ,  0.85813595, 0.84667961, 0.84374558, 0.85951183, 0.87194227, 0.89455097
        ,  0.88283929, 0.90349491, 0.90600675],
         [ 1.00947534, 1.00411055, 1.00698819, 0.99513687, 0.99291086, 1.00581626
        ,  0.98850522, 0.99291168, 0.98983209, 0.97511924, 0.96134615, 0.96382634
        ,  0.95011401, 0.9434686,  0.94637765],
         [ 1.05712571, 1.05459419, 1.05753012, 1.04880786, 1.05103857, 1.04800023
        ,  1.03024941, 1.04200483, 1.0402554,  1.03296979, 1.02191682, 1.02476275
        ,  1.02347523, 1.02517684, 1.04359571],
         [ 1.07084189, 1.06669497, 1.07937623, 1.07387988, 1.0794043,  1.0531801
        ,  1.07452771, 1.09383478, 1.1052447,  1.10322136, 1.09167939, 1.08772756
        ,  1.08859544, 1.09177338, 1.1096083 ],
         [ 0.86719222, 0.86628896, 0.86675156, 0.86425632, 0.86511809, 0.86287327
        ,  0.85169796, 0.85411285, 0.84886336, 0.84517414, 0.84843858, 0.84488343
        ,  0.83374329, 0.82812044, 0.82878599],
         [ 0.88389211, 0.92288667, 0.90282398, 0.91229186, 0.92023286, 0.92652175
        ,  0.94278865, 0.93682452, 0.98655146, 0.992237, 0.9798497,  0.93869677
        ,  0.96947771, 1.00362626, 0.98102351],
         [ 0.97082064, 0.95320233, 0.94534081, 0.94215593, 0.93967,  0.93092109
        ,  0.92662519, 0.93412152, 0.93501274, 0.92879506, 0.92110542, 0.91035556
        ,  0.90430364, 0.89994694, 0.90073864],
         [ 0.95861858, 0.95774543, 0.98254811, 0.98919472, 0.98684824, 0.98882205
        ,  0.97662234, 0.95601578, 0.94905385, 0.94934888, 0.97152609, 0.97163004
        ,  0.9700702,  0.97158948, 0.95884908],
         [ 0.83980439, 0.84726737, 0.85747,  0.85467221, 0.8556751,  0.84818516
        ,  0.85265681, 0.84502402, 0.82645665, 0.81743586, 0.83550406, 0.83338919
        ,  0.83511679, 0.82136617, 0.80921874],
         [ 0.95118156, 0.9466212,  0.94688098, 0.9508583,  0.9512441,  0.95440787
        ,  0.96364363, 0.96804412, 0.97136214, 0.97583768, 0.95571724, 0.96895368
        ,  0.97001634, 0.97082733, 0.98782366],
         [ 1.08910044, 1.08248968, 1.08492895, 1.08656923, 1.09454249, 1.10558188
        ,  1.1214086,  1.12292577, 1.13021031, 1.13342735, 1.14686068, 1.14502975
        ,  1.14474747, 1.14084037, 1.16142926],
         [ 1.06336033, 1.07365823, 1.08691496, 1.09764846, 1.11669863, 1.11856702
        ,  1.09764283, 1.08815849, 1.08044313, 1.09278827, 1.07003204, 1.08398066
        ,  1.09831768, 1.09298232, 1.09176125],
         [ 0.79772065, 0.78829196, 0.78581151, 0.77615922, 0.77035744, 0.77751194
        ,  0.79902974, 0.81437881, 0.80788828, 0.79603865, 0.78966436, 0.79949807
        ,  0.80172182, 0.82168155, 0.85587911],
         [ 1.0052447,  1.00007696, 1.00475899, 1.00613942, 1.00639561, 1.00162979
        ,  0.99860739, 1.00814981, 1.00574316, 0.99030032, 0.97682565, 0.97292596
        ,  0.96519561, 0.96173403, 0.95890284],
         [ 0.95808419, 0.9382568,  0.9654441,  0.95561201, 0.96987289, 0.96608031
        ,  0.99727185, 1.00781194, 1.03484236, 1.05333619, 1.0983263,  1.1704974
        ,  1.17025154, 1.18730553, 1.14242645]])
        self.assertTrue(np.allclose(result, expected))
        self.assertTrue(type(result) == type(expected))
        self.assertTrue(result.shape == expected.shape)
    def test_rebin_data(self):
        """Test rebin_data"""
        ## sample in double the time (even case since 10 % 2 = 0):
        ##   (0+1)/2, (2+3)/2, (4+5)/2, (6+7)/2, (8+9)/2
        ## = 0.5,     2.5,     4.5,     6.5,     8.5
        ans_even = np.array([(i + 0.5) * np.ones(10, dtype=float)
                             for i in range(0, 10, 2)]).T
        self.assertTrue(np.array_equal(std.rebin_data(self.time_data, 2), ans_even))
        ## sample in triple the time (uneven since 10 % 3 = 1):
        ##   (0+1+2)/3, (3+4+5)/3, (6+7+8)/3, (9)/1
        ## = 1,         4,         7,         9
        ans_odd  = np.array([i * np.ones(10, dtype=float)
                             for i in (1, 4, 7, 9)]).T
        self.assertTrue(np.array_equal(std.rebin_data(self.time_data, 3), ans_odd))
    def test_get_prob_dist(self):
        """Test get_prob_dist"""
        lag_indices = np.array([1, 2, 3, 4])
        unit_indices = np.array([1, 3, 2, 4])
        answer = np.array([
            [ 0.0754717 , 0.88207547, 0.04245283, 0.        , 0.        ],
            [ 0.        , 0.        , 0.09411765, 0.87058824, 0.03529412],
            [ 0.0049505 , 0.09405941, 0.77722772, 0.11881188, 0.0049505 ],
            [ 0.        , 0.        , 0.        , 0.02352941, 0.97647059]
        ])
        result = std.get_prob_dist(self.transition_matrix, lag_indices, unit_indices)
        self.assertTrue(np.array_equal(result, answer))
    def test_get_prob_stats(self):
        """Test get_prob_stats"""
        probs = np.array([
            [ 0.0754717 , 0.88207547, 0.04245283, 0.        , 0.        ],
            [ 0.        , 0.        , 0.09411765, 0.87058824, 0.03529412],
            [ 0.0049505 , 0.09405941, 0.77722772, 0.11881188, 0.0049505 ],
            [ 0.        , 0.        , 0.        , 0.02352941, 0.97647059]
        ])
        unit_indices = np.array([1, 3, 2, 4])
        answer_up = np.array([0.04245283, 0.03529412, 0.12376238, 0.])
        answer_down = np.array([0.0754717, 0.09411765, 0.0990099, 0.02352941])
        answer_trend = np.array([-0.03301887 / 0.88207547, -0.05882353 / 0.87058824,  0.02475248 / 0.77722772, -0.02352941 / 0.97647059])
        answer_volatility = np.array([ 0.34221495,  0.33705421,  0.29226542,  0.38834223])
        result = std.get_prob_stats(probs, unit_indices)
        result_up = result[0]
        result_down = result[1]
        result_trend = result[2]
        result_volatility = result[3]
        self.assertTrue(np.allclose(result_up, answer_up))
        self.assertTrue(np.allclose(result_down, answer_down))
        self.assertTrue(np.allclose(result_trend, answer_trend))
        self.assertTrue(np.allclose(result_volatility, answer_volatility))
--- a/release/python/0.4.2/crankshaft/crankshaft/__init__.py
+++ b/release/python/0.4.2/crankshaft/crankshaft/__init__.py
@ -0,0 +1,5 @@
 """Import all modules"""
 import crankshaft.random_seeds
 import crankshaft.clustering
 import crankshaft.space_time_dynamics
 import crankshaft.segmentation
--- a/release/python/0.4.2/crankshaft/crankshaft/clustering/__init__.py
+++ b/release/python/0.4.2/crankshaft/crankshaft/clustering/__init__.py
@ -0,0 +1,3 @@
 """Import all functions for clustering"""
 from moran import *
 from kmeans import *
--- a/release/python/0.4.2/crankshaft/crankshaft/clustering/kmeans.py
+++ b/release/python/0.4.2/crankshaft/crankshaft/clustering/kmeans.py
@ -0,0 +1,18 @@
 from sklearn.cluster import KMeans
 import plpy
 def kmeans(query, no_clusters, no_init=20):
    data = plpy.execute('''select array_agg(cartodb_id order by cartodb_id) as ids,
        array_agg(ST_X(the_geom) order by cartodb_id) xs,
        array_agg(ST_Y(the_geom) order by cartodb_id) ys from ({query}) a
        where the_geom is not null
    '''.format(query=query))
    xs  = data[0]['xs']
    ys  = data[0]['ys']
    ids = data[0]['ids']
    km = KMeans(n_clusters= no_clusters, n_init=no_init)
    labels = km.fit_predict(zip(xs,ys))
    return zip(ids,labels)
--- a/release/python/0.4.2/crankshaft/crankshaft/clustering/moran.py
+++ b/release/python/0.4.2/crankshaft/crankshaft/clustering/moran.py
@ -0,0 +1,250 @@
 """
 Moran's I geostatistics (global clustering & outliers presence)
 """
 # TODO: Fill in local neighbors which have null/NoneType values with the
 #       average of their neighborhood
 import pysal as ps
 import plpy
 from collections import OrderedDict
 # crankshaft module
 import crankshaft.pysal_utils as pu
 # High level interface ---------------------------------------
 def moran(subquery, attr_name,
          w_type, num_ngbrs, permutations, geom_col, id_col):
    """
    Moran's I (global)
    Implementation building neighbors with a PostGIS database and Moran's I
     core clusters with PySAL.
    Andy Eschbacher
    """
    qvals = OrderedDict([("id_col", id_col),
                         ("attr1", attr_name),
                         ("geom_col", geom_col),
                         ("subquery", subquery),
                         ("num_ngbrs", num_ngbrs)])
    query = pu.construct_neighbor_query(w_type, qvals)
    try:
        result = plpy.execute(query)
        # if there are no neighbors, exit
        if len(result) == 0:
            return pu.empty_zipped_array(2)
    except plpy.SPIError as e:
        plpy.error('Analysis failed: %s' % e)
        return pu.empty_zipped_array(2)
    # collect attributes
    attr_vals = pu.get_attributes(result)
    # calculate weights
    weight = pu.get_weight(result, w_type, num_ngbrs)
    # calculate moran global
    moran_global = ps.esda.moran.Moran(attr_vals, weight,
                                       permutations=permutations)
    return zip([moran_global.I], [moran_global.EI])
 def moran_local(subquery, attr,
                w_type, num_ngbrs, permutations, geom_col, id_col):
    """
    Moran's I implementation for PL/Python
    Andy Eschbacher
    """
    # geometries with null attribute values are ignored, so a geometry's
    # neighbors may not be its true nearest neighbors
    qvals = OrderedDict([("id_col", id_col),
                         ("attr1", attr),
                         ("geom_col", geom_col),
                         ("subquery", subquery),
                         ("num_ngbrs", num_ngbrs)])
    query = pu.construct_neighbor_query(w_type, qvals)
    try:
        result = plpy.execute(query)
        # if there are no neighbors, exit
        if len(result) == 0:
            return pu.empty_zipped_array(5)
    except plpy.SPIError as e:
        plpy.error('Analysis failed: %s' % e)
        return pu.empty_zipped_array(5)
    attr_vals = pu.get_attributes(result)
    weight = pu.get_weight(result, w_type, num_ngbrs)
    # calculate LISA values
    lisa = ps.esda.moran.Moran_Local(attr_vals, weight,
                                     permutations=permutations)
    # find quadrants for each geometry
    quads = quad_position(lisa.q)
    return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
 def moran_rate(subquery, numerator, denominator,
               w_type, num_ngbrs, permutations, geom_col, id_col):
    """
    Moran's I Rate (global)
    Andy Eschbacher
    """
    qvals = OrderedDict([("id_col", id_col),
                         ("attr1", numerator),
                         ("attr2", denominator),
                         ("geom_col", geom_col),
                         ("subquery", subquery),
                         ("num_ngbrs", num_ngbrs)])
    query = pu.construct_neighbor_query(w_type, qvals)
    try:
        result = plpy.execute(query)
        # if there are no neighbors, exit
        if len(result) == 0:
            return pu.empty_zipped_array(2)
    except plpy.SPIError as e:
        plpy.error('Analysis failed: %s' % e)
        return pu.empty_zipped_array(2)
    # collect attributes
    numer = pu.get_attributes(result, 1)
    denom = pu.get_attributes(result, 2)
    weight = pu.get_weight(result, w_type, num_ngbrs)
    # calculate moran global rate
    lisa_rate = ps.esda.moran.Moran_Rate(numer, denom, weight,
                                         permutations=permutations)
    return zip([lisa_rate.I], [lisa_rate.EI])
 def moran_local_rate(subquery, numerator, denominator,
                     w_type, num_ngbrs, permutations, geom_col, id_col):
    """
        Moran's I Local Rate
        Andy Eschbacher
    """
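    # geometries with null values are ignored, so a geometry's
    # neighbors may not be its true nearest neighbors
    # keys are "attr1"/"attr2" (as in moran_rate) because pysal_utils sorts
    # the attribute keys: "denominator" would sort before "numerator" and
    # swap the attr1/attr2 aliases read below
    qvals = OrderedDict([("id_col", id_col),
                         ("attr1", numerator),
                         ("attr2", denominator),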
                         ("geom_col", geom_col),
                         ("subquery", subquery),
                         ("num_ngbrs", num_ngbrs)])
    query = pu.construct_neighbor_query(w_type, qvals)
    try:
        result = plpy.execute(query)
        # if there are no neighbors, exit
        if len(result) == 0:
            return pu.empty_zipped_array(5)
    except plpy.SPIError as e:
        plpy.error('Analysis failed: %s' % e)
        return pu.empty_zipped_array(5)
    # collect attributes
    numer = pu.get_attributes(result, 1)
    denom = pu.get_attributes(result, 2)
    weight = pu.get_weight(result, w_type, num_ngbrs)
    # calculate LISA values
    lisa = ps.esda.moran.Moran_Local_Rate(numer, denom, weight,
                                          permutations=permutations)
    # find quadrants for each geometry
    quads = quad_position(lisa.q)
    return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
 def moran_local_bv(subquery, attr1, attr2,
                   permutations, geom_col, id_col, w_type, num_ngbrs):
    """
        Moran's I (local) Bivariate (untested)
    """
    qvals = OrderedDict([("id_col", id_col),
                         ("attr1", attr1),
                         ("attr2", attr2),
                         ("geom_col", geom_col),
                         ("subquery", subquery),
                         ("num_ngbrs", num_ngbrs)])
    query = pu.construct_neighbor_query(w_type, qvals)
    try:
        result = plpy.execute(query)
        # if there are no neighbors, exit
        if len(result) == 0:
            return pu.empty_zipped_array(4)
    except plpy.SPIError:
        plpy.error("Error: areas of interest query failed, "
                   "check input parameters")
        return pu.empty_zipped_array(4)
    # collect attributes
    attr1_vals = pu.get_attributes(result, 1)
    attr2_vals = pu.get_attributes(result, 2)
    # create weights
    weight = pu.get_weight(result, w_type, num_ngbrs)
    # calculate LISA values
    lisa = ps.esda.moran.Moran_Local_BV(attr1_vals, attr2_vals, weight,
                                        permutations=permutations)
    # find clustering of significance
    lisa_sig = quad_position(lisa.q)
    return zip(lisa.Is, lisa_sig, lisa.p_sim, weight.id_order)
 # Low level functions ----------------------------------------
 def map_quads(coord):
    """
        Map a quadrant number to Moran's I designation
        HH=1, LH=2, LL=3, HL=4
        Input:
        @param coord (int): quadrant of a specific measurement
        Output:
            classification (one of 'HH', 'LH', 'LL', or 'HL')
    """
    if coord == 1:
        return 'HH'
    elif coord == 2:
        return 'LH'
    elif coord == 3:
        return 'LL'
    elif coord == 4:
        return 'HL'
    else:
        return None
 def quad_position(quads):
    """
        Produce Moran's I classifications from quadrant numbers
        Input:
        @param quads ndarray: an array of quads classified by
          1-4 (PySAL default)
        Output:
        @param list: an array of quads classified by 'HH', 'LL', etc.
    """
    return [map_quads(q) for q in quads]
--- a/release/python/0.4.2/crankshaft/crankshaft/pysal_utils/__init__.py
+++ b/release/python/0.4.2/crankshaft/crankshaft/pysal_utils/__init__.py
@ -0,0 +1,2 @@
 """Import all functions for pysal_utils"""
 from crankshaft.pysal_utils.pysal_utils import *
--- a/release/python/0.4.2/crankshaft/crankshaft/pysal_utils/pysal_utils.py
+++ b/release/python/0.4.2/crankshaft/crankshaft/pysal_utils/pysal_utils.py
@ -0,0 +1,201 @@
 """
    Utilities module for generic PySAL functionality, mainly centered on
      translating queries into numpy arrays or PySAL weights objects
 """
 import numpy as np
 import pysal as ps
 def construct_neighbor_query(w_type, query_vals):
    """Return query (a string) used for finding neighbors
        @param w_type text: type of neighbors to calculate ('knn' or 'queen')
        @param query_vals dict: values used to construct the query
    """
    if w_type.lower() == 'knn':
        return knn(query_vals)
    else:
        return queen(query_vals)
 # Build weight object
 def get_weight(query_res, w_type='knn', num_ngbrs=5):
    """
        Construct PySAL weight from return value of query
        @param query_res dict-like: query results with attributes and neighbors
    """
    # if w_type.lower() == 'knn':
    #     row_normed_weights = [1.0 / float(num_ngbrs)] * num_ngbrs
    #     weights = {x['id']: row_normed_weights for x in query_res}
    # else:
    #     weights = {x['id']: [1.0 / len(x['neighbors'])] * len(x['neighbors'])
    #                         if len(x['neighbors']) > 0
    #                         else [] for x in query_res}
    neighbors = {x['id']: x['neighbors'] for x in query_res}
    print('len of neighbors: %d' % len(neighbors))
    built_weight = ps.W(neighbors)
    built_weight.transform = 'r'
    return built_weight
 def query_attr_select(params):
    """
        Create portion of SELECT statement for attributes involved in query.
        @param params: dict of information used in query (column names,
                       table name, etc.)
    """
    attr_string = ""
    template = "i.\"%(col)s\"::numeric As attr%(alias_num)s, "
    if 'time_cols' in params:
        # if markov analysis
        attrs = params['time_cols']
        for idx, val in enumerate(attrs):
            attr_string += template % {"col": val, "alias_num": idx + 1}
    else:
        # if moran's analysis
        attrs = [k for k in params
                 if k not in ('id_col', 'geom_col', 'subquery',
                              'num_ngbrs')]
        for idx, val in enumerate(sorted(attrs)):
            attr_string += template % {"col": params[val],
                                       "alias_num": idx + 1}
    return attr_string
 def query_attr_where(params):
    """
      Construct where conditions when building neighbors query
        Create portion of WHERE clauses for weeding out rows with NULL-valued
          attributes (and, when two attributes are given, rows where the
          second attribute is zero)
        Input: dict of params:
            {'subquery': ...,
             'attr1': 'data1',
             'attr2': 'data2',
             ...}
        Output: 'idx_replace."data1" IS NOT NULL AND idx_replace."data2"
                IS NOT NULL AND idx_replace."data2" <> 0'
        Input:
        {'subquery': ...,
         'time_cols': ['time1', 'time2', 'time3'],
         'etc': ...}
        Output: 'idx_replace."time1" IS NOT NULL AND idx_replace."time2" IS NOT
          NULL AND idx_replace."time3" IS NOT NULL'
    """
    attr_string = []
    template = "idx_replace.\"%s\" IS NOT NULL"
    if 'time_cols' in params:
        # markov where clauses
        attrs = params['time_cols']
        # add values to template
        for attr in attrs:
            attr_string.append(template % attr)
    else:
        # moran where clauses
        # get keys
        attrs = sorted([k for k in params
                        if k not in ('id_col', 'geom_col', 'subquery',
                                     'num_ngbrs')])
        # add values to template
        for attr in attrs:
            attr_string.append(template % params[attr])
        if len(attrs) == 2:
            attr_string.append("idx_replace.\"%s\" <> 0" % params[attrs[1]])
    out = " AND ".join(attr_string)
    return out
 def knn(params):
    """SQL query for k-nearest neighbors.
        @param params dict: values to fill template
    """
    attr_select = query_attr_select(params)
    attr_where = query_attr_where(params)
    replacements = {"attr_select": attr_select,
                    "attr_where_i": attr_where.replace("idx_replace", "i"),
                    "attr_where_j": attr_where.replace("idx_replace", "j")}
    query = "SELECT " \
                "i.\"{id_col}\" As id, " \
                "%(attr_select)s" \
                "(SELECT ARRAY(SELECT j.\"{id_col}\" " \
                              "FROM ({subquery}) As j " \
                              "WHERE " \
                                "i.\"{id_col}\" <> j.\"{id_col}\" AND " \
                                "%(attr_where_j)s " \
                              "ORDER BY " \
                                "j.\"{geom_col}\" <-> i.\"{geom_col}\" ASC " \
                              "LIMIT {num_ngbrs})" \
                ") As neighbors " \
            "FROM ({subquery}) As i " \
            "WHERE " \
                "%(attr_where_i)s " \
            "ORDER BY i.\"{id_col}\" ASC;" % replacements
    return query.format(**params)
 # SQL query for finding queens neighbors (all contiguous polygons)
 def queen(params):
    """SQL query for queen neighbors.
        @param params dict: information to fill query
    """
    attr_select = query_attr_select(params)
    attr_where = query_attr_where(params)
    replacements = {"attr_select": attr_select,
                    "attr_where_i": attr_where.replace("idx_replace", "i"),
                    "attr_where_j": attr_where.replace("idx_replace", "j")}
    query = "SELECT " \
                "i.\"{id_col}\" As id, " \
                "%(attr_select)s" \
                "(SELECT ARRAY(SELECT j.\"{id_col}\" " \
                 "FROM ({subquery}) As j " \
                 "WHERE i.\"{id_col}\" <> j.\"{id_col}\" AND " \
                       "ST_Touches(i.\"{geom_col}\", j.\"{geom_col}\") AND " \
                       "%(attr_where_j)s)" \
                ") As neighbors " \
            "FROM ({subquery}) As i " \
            "WHERE " \
                "%(attr_where_i)s " \
            "ORDER BY i.\"{id_col}\" ASC;" % replacements
    return query.format(**params)
 # to add more weight methods open a ticket or pull request
 def get_attributes(query_res, attr_num=1):
    """
        @param query_res: query results with attributes and neighbors
        @param attr_num: attribute number (1, 2, ...)
    """
    return np.array([x['attr' + str(attr_num)] for x in query_res],
                    dtype=float)
 def empty_zipped_array(num_nones):
    """
        prepare return values for cases of empty weights objects (no neighbors)
        Input:
        @param num_nones int: number of columns (e.g., 4)
        Output:
        [(None, None, None, None)]
    """
    return [tuple([None] * num_nones)]
--- a/release/python/0.4.2/crankshaft/crankshaft/random_seeds.py
+++ b/release/python/0.4.2/crankshaft/crankshaft/random_seeds.py
@ -0,0 +1,11 @@
 """Random seed generator used for non-deterministic functions in crankshaft"""
 import random
 import numpy
 def set_random_seeds(value):
    """
    Set the seeds of the RNGs (Random Number Generators)
    used internally.
    """
    random.seed(value)
    numpy.random.seed(value)
--- a/release/python/0.4.2/crankshaft/crankshaft/segmentation/__init__.py
+++ b/release/python/0.4.2/crankshaft/crankshaft/segmentation/__init__.py
@ -0,0 +1 @@
 from segmentation import * 
--- a/release/python/0.4.2/crankshaft/crankshaft/segmentation/segmentation.py
+++ b/release/python/0.4.2/crankshaft/crankshaft/segmentation/segmentation.py
@ -0,0 +1,176 @@
 """
 Segmentation creation and prediction
 """
 import sklearn
 import numpy as np
 import plpy
 from sklearn.ensemble import GradientBoostingRegressor
 from sklearn import metrics
 from sklearn.cross_validation import train_test_split
 # Lower level functions
 #----------------------
 def replace_nan_with_mean(array):
    """
        Input:
            @param array: an array of floats which may have null-valued entries
        Output:
            array with nans filled in with the mean of the dataset
    """
    # returns an array of rows and column indices
    indices = np.where(np.isnan(array))
    # iterate through entries which have nan values
    for row, col in zip(*indices):
            array[row, col] = np.mean(array[~np.isnan(array[:, col]), col])
    return array
 def get_data(variable, feature_columns, query):
    """
        Fetch data from the database, clean, and package into
          numpy arrays
        Input:
            @param variable: name of the target variable
            @param feature_columns: list of column names
            @param query: subquery that data is pulled from for the packaging
        Output:
            prepared data, packaged into NumPy arrays
    """
    columns = ','.join(['array_agg("{col}") As "{col}"'.format(col=col) for col in feature_columns])
    try:
        data = plpy.execute('''SELECT array_agg("{variable}") As target, {columns} FROM ({query}) As a'''.format(
            variable=variable,
            columns=columns,
            query=query))
    except Exception, e:
        plpy.error('Failed to access data to build segmentation model: %s' % e)
    # extract target data from plpy object
    target = np.array(data[0]['target'])
    # put n feature data arrays into an n x m array of arrays
    features = np.column_stack([np.array(data[0][col], dtype=float) for col in feature_columns])
    return replace_nan_with_mean(target), replace_nan_with_mean(features)
 # High level interface
 # --------------------
 def create_and_predict_segment_agg(target, features, target_features, target_ids, model_parameters):
    """
    Version of create_and_predict_segment that works on arrays that come straight from the SQL calling
    the function.
        Input:
            @param target: The 1D array of length NSamples containing the target variable we want the model to predict
            @param features: The 2D array of size NSamples * NFeatures that forms the input to the model
            @param target_features: The 2D array of features for the rows the model is asked to predict on
            @param target_ids: A 1D array of target_ids that will be used to associate the results of the prediction with the rows which they come from
            @param model_parameters: A dictionary containing parameters for the model.
    """
    clean_target = replace_nan_with_mean(target)
    clean_features = replace_nan_with_mean(features)
    target_features = replace_nan_with_mean(target_features)
    model, accuracy = train_model(clean_target, clean_features, model_parameters, 0.2)
    prediction = model.predict(target_features)
    accuracy_array = [accuracy]*prediction.shape[0]
    return zip(target_ids, prediction, np.full(prediction.shape, accuracy_array))
 def create_and_predict_segment(query, variable, target_query, model_params):
    """
    generate a segment with machine learning
    Stuart Lynn
    """
    ## fetch column names
    try:
        columns = plpy.execute('SELECT * FROM ({query}) As a LIMIT 1  '.format(query=query))[0].keys()
    except Exception, e:
        plpy.error('Failed to build segmentation model: %s' % e)
    ## extract column names to be used in building the segmentation model
    feature_columns = set(columns) - set([variable, 'cartodb_id', 'the_geom', 'the_geom_webmercator'])
    ## get data from database
    target, features = get_data(variable, feature_columns, query)
    model, accuracy = train_model(target, features, model_params, 0.2)
    cartodb_ids, result = predict_segment(model, feature_columns, target_query)
    accuracy_array = [accuracy]*result.shape[0]
    return zip(cartodb_ids, result, accuracy_array)
 def train_model(target, features, model_params, test_split):
    """
        Train the Gradient Boosting model on the provided data and calculate the accuracy of the model
        Input:
            @param target: 1D Array of the variable that the model is to be trained to predict
            @param features: 2D Array NSamples * NFeatures to use in training the model
            @param model_params: A dictionary of model parameters, the full specification can be found on the
                scikit learn page for [GradientBoostingRegressor](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html)
            @param test_split: The fraction of the data to be withheld for testing the model / calculating the accuracy
    """
    features_train, features_test, target_train, target_test = train_test_split(features, target, test_size=test_split)
    model = GradientBoostingRegressor(**model_params)
    model.fit(features_train, target_train)
    accuracy = calculate_model_accuracy(model, features_test, target_test)
    return model, accuracy
 def calculate_model_accuracy(model, features, target):
    """
        Calculate the mean squared error of the model prediction
        Input:
            @param model: model trained from input features
            @param features: features to make a prediction from
            @param target: target to compare prediction to
        Output:
            mean squared error of the model prediction compared to the target
    """
    prediction = model.predict(features)
    return metrics.mean_squared_error(prediction, target)
 def predict_segment(model, features, target_query):
    """
    Use the provided model to predict the values for the new feature set
        Input:
            @param model: The pretrained model
            @param features: A list of features to use in the model prediction (list of column names)
            @param target_query: The query to run to obtain the data to predict on and the cartodb_ids associated with it.
    """
    batch_size = 1000
    joined_features = ','.join(['"{0}"::numeric'.format(a) for a in features])
    try:
        cursor = plpy.cursor('SELECT Array[{joined_features}] As features FROM ({target_query}) As a'.format(
            joined_features=joined_features,
            target_query=target_query))
    except Exception, e:
        plpy.error('Failed to build segmentation model: %s' % e)
    results = []
    while True:
        rows = cursor.fetch(batch_size)
        if not rows:
            break
        batch = np.row_stack([np.array(row['features'], dtype=float) for row in rows])
        # TODO: NaNs are filled with the per-batch column mean; this should be the global mean. This will cause weird effects
        batch = replace_nan_with_mean(batch)
        prediction = model.predict(batch)
        results.append(prediction)
    try:
        cartodb_ids = plpy.execute('''SELECT array_agg(cartodb_id ORDER BY cartodb_id) As cartodb_ids FROM ({0}) As a'''.format(target_query))[0]['cartodb_ids']
    except Exception, e:
        plpy.error('Failed to build segmentation model: %s' % e)
    return cartodb_ids, np.concatenate(results)
--- a/release/python/0.4.2/crankshaft/crankshaft/space_time_dynamics/__init__.py
+++ b/release/python/0.4.2/crankshaft/crankshaft/space_time_dynamics/__init__.py
@ -0,0 +1,2 @@
 """Import all functions from clustering libraries."""
 from markov import *
--- a/release/python/0.4.2/crankshaft/crankshaft/space_time_dynamics/markov.py
+++ b/release/python/0.4.2/crankshaft/crankshaft/space_time_dynamics/markov.py
@ -0,0 +1,189 @@
 """
 Spatial dynamics measurements using Spatial Markov
 """
 import numpy as np
 import pysal as ps
 import plpy
 import crankshaft.pysal_utils as pu
 def spatial_markov_trend(subquery, time_cols, num_classes=7,
                         w_type='knn', num_ngbrs=5, permutations=0,
                         geom_col='the_geom', id_col='cartodb_id'):
    """
        Predict the trends of a unit based on:
        1. history of its transitions to different classes (e.g., 1st quantile -> 2nd quantile)
        2. average class of its neighbors
        Inputs:
        @param subquery string: e.g., SELECT the_geom, cartodb_id,
          interesting_time_column FROM table_name
        @param time_cols list of strings: list of strings of column names
        @param num_classes (optional): number of classes to break distribution
          of values into. Currently uses quantile bins.
        @param w_type string (optional): weight type ('knn' or 'queen')
        @param num_ngbrs int (optional): number of neighbors (if knn type)
        @param permutations int (optional): number of permutations for test
          stats
        @param geom_col string (optional): name of column which contains the
          geometries
        @param id_col string (optional): name of column which has the ids of
          the table
        Outputs:
        @param trend_up float: probability that a geom will move to a higher
          class
        @param trend_down float: probability that a geom will move to a lower
          class
        @param trend float: (trend_up - trend_down) / trend_static
        @param volatility float: a measure of the volatility based on
          probability stddev(prob array)
    """
    if len(time_cols) < 2:
        plpy.error('More than one time column needs to be passed')
    qvals = {"id_col": id_col,
             "time_cols": time_cols,
             "geom_col": geom_col,
             "subquery": subquery,
             "num_ngbrs": num_ngbrs}
    try:
        query_result = plpy.execute(
            pu.construct_neighbor_query(w_type, qvals)
        )
        if len(query_result) == 0:
            return zip([None], [None], [None], [None], [None])
    except plpy.SPIError, e:
        plpy.debug('Query failed with exception %s: %s' % (e, pu.construct_neighbor_query(w_type, qvals)))
        plpy.error('Analysis failed: %s' % e)
        return zip([None], [None], [None], [None], [None])
    ## build weight
    weights = pu.get_weight(query_result, w_type)
    weights.transform = 'r'
    ## prep time data
    t_data = get_time_data(query_result, time_cols)
    plpy.debug('shape of t_data %d, %d' % t_data.shape)
    plpy.debug('number of weight objects: %d, %d' % (weights.sparse).shape)
    plpy.debug('first num elements: %f' % t_data[0, 0])
    sp_markov_result = ps.Spatial_Markov(t_data,
                                         weights,
                                         k=num_classes,
                                         fixed=False,
                                         permutations=permutations)
    ## get lag classes
    lag_classes = ps.Quantiles(
        ps.lag_spatial(weights, t_data[:, -1]),
        k=num_classes).yb
    ## look up probability distribution for each unit according to class and lag class
    prob_dist = get_prob_dist(sp_markov_result.P,
                              lag_classes,
                              sp_markov_result.classes[:, -1])
    ## find the ups and down and overall distribution of each cell
    trend_up, trend_down, trend, volatility = get_prob_stats(prob_dist,
                                                             sp_markov_result.classes[:, -1])
    ## output the results
    return zip(trend, trend_up, trend_down, volatility, weights.id_order)
 def get_time_data(markov_data, time_cols):
    """
        Extract the time columns and bin appropriately
    """
    num_attrs = len(time_cols)
    return np.array([[x['attr' + str(i)] for x in markov_data]
                     for i in range(1, num_attrs+1)], dtype=float).transpose()
 ## not currently used
 def rebin_data(time_data, num_time_per_bin):
    """
        Convert an n x l matrix into an n x ceil(l / m) matrix where the
         values are reduced (averaged) over each group of m consecutive
         columns:
          1 2 3 4    1.5 3.5
          5 6 7 8 -> 5.5 7.5
          9 8 7 6    8.5 6.5
          5 4 3 2    4.5 2.5
          if m = 2, the 4 x 4 matrix is transformed to a 4 x 2 matrix.
        This process effectively resamples the data at a time span m times
         longer than that of the input data.
        For cases when there is a remainder (remainder(5/3) = 2), the remaining
         two columns are binned together as the last time period, while the
         first three are binned together for the first period.
        Input:
          @param time_data n x l  ndarray: measurements of an attribute at
           different time intervals
          @param num_time_per_bin int: number of columns (m) to average into a
           new column
        Output:
          n x ceil(l / m) ndarray of resampled time series
    """
    if time_data.shape[1] % num_time_per_bin == 0:
        ## if fit is perfect, then use it
        n_max = time_data.shape[1] // num_time_per_bin
    else:
        ## fit remainders into an additional column
        n_max = time_data.shape[1] // num_time_per_bin + 1
    return np.array([time_data[:, num_time_per_bin * i:num_time_per_bin * (i+1)].mean(axis=1)
                     for i in range(n_max)]).T
 def get_prob_dist(transition_matrix, lag_indices, unit_indices):
    """
        Given an array of transition matrices, look up the probability
        associated with the arrangements passed
        Input:
        @param transition_matrix ndarray[k,k,k]:
        @param lag_indices ndarray:
        @param unit_indices ndarray:
        Output:
        Array of probability distributions
    """
    return np.array([transition_matrix[(lag_indices[i], unit_indices[i])]
                     for i in range(len(lag_indices))])
 def get_prob_stats(prob_dist, unit_indices):
    """
        get the statistics of the probability distributions
        Outputs:
            @param trend_up ndarray(float): sum of probabilities for upward
               movement (relative to the unit index of that prob)
            @param trend_down ndarray(float): sum of probabilities for downward
               movement (relative to the unit index of that prob)
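            @param trend ndarray(float): difference of upward and downward
               movements, divided by the probability of staying in the
               same class (undefined where that probability is zero)
            @param volatility ndarray(float): standard deviation of each
               unit's probability distribution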
    """
    num_elements = len(unit_indices)
    trend_up = np.empty(num_elements, dtype=float)
    trend_down = np.empty(num_elements, dtype=float)
    trend = np.empty(num_elements, dtype=float)
    for i in range(num_elements):
        trend_up[i] = prob_dist[i, (unit_indices[i]+1):].sum()
        trend_down[i] = prob_dist[i, :unit_indices[i]].sum()
        if prob_dist[i, unit_indices[i]] > 0.0:
            trend[i] = (trend_up[i] - trend_down[i]) / prob_dist[i, unit_indices[i]]
        else:
            trend[i] = None
    ## calculate volatility of distribution
    volatility = prob_dist.std(axis=1)
    return trend_up, trend_down, trend, volatility
--- a/release/python/0.4.2/crankshaft/setup.py
+++ b/release/python/0.4.2/crankshaft/setup.py
@ -0,0 +1,49 @@
 """
 CartoDB Spatial Analysis Python Library
 See:
 https://github.com/CartoDB/crankshaft
 """
 from setuptools import setup, find_packages
 setup(
    name='crankshaft',
    version='0.0.0',
    description='CartoDB Spatial Analysis Python Library',
    url='https://github.com/CartoDB/crankshaft',
    author='Data Services Team - CartoDB',
    author_email='dataservices@cartodb.com',
    license='MIT',
    classifiers=[
        'Development Status :: 3 - Alpha',
        'Intended Audience :: Mapping community',
        'Topic :: Maps :: Mapping Tools',
        'License :: OSI Approved :: MIT License',
        'Programming Language :: Python :: 2.7',
    ],
    keywords='maps mapping tools spatial analysis geostatistics',
    packages=find_packages(exclude=['contrib', 'docs', 'tests']),
    extras_require={
        'dev': ['unittest'],
        'test': ['unittest', 'nose', 'mock'],
    },
    # The choice of component versions is dictated by what's
    # provisioned in the production servers.
    # IMPORTANT NOTE: please don't change this line. Instead issue a ticket to systems for evaluation.
    install_requires=['joblib==0.8.3', 'numpy==1.6.1', 'scipy==0.14.0', 'pysal==1.11.2', 'scikit-learn==0.14.1'],
    requires=['pysal', 'numpy', 'sklearn'],
    test_suite='test'
 )
--- a/release/python/0.4.2/crankshaft/setup.py-r
+++ b/release/python/0.4.2/crankshaft/setup.py-r
@ -0,0 +1,49 @@
 """
 CartoDB Spatial Analysis Python Library
 See:
 https://github.com/CartoDB/crankshaft
 """
 from setuptools import setup, find_packages
 setup(
    name='crankshaft',
    version='0.0.0',
    description='CartoDB Spatial Analysis Python Library',
    url='https://github.com/CartoDB/crankshaft',
    author='Data Services Team - CartoDB',
    author_email='dataservices@cartodb.com',
    license='MIT',
    classifiers=[
        'Development Status :: 3 - Alpha',
        'Intended Audience :: Mapping community',
        'Topic :: Maps :: Mapping Tools',
        'License :: OSI Approved :: MIT License',
        'Programming Language :: Python :: 2.7',
    ],
    keywords='maps mapping tools spatial analysis geostatistics',
    packages=find_packages(exclude=['contrib', 'docs', 'tests']),
    extras_require={
        'dev': ['unittest'],
        'test': ['unittest', 'nose', 'mock'],
    },
    # The choice of component versions is dictated by what's
    # provisioned in the production servers.
    # IMPORTANT NOTE: please don't change this line. Instead issue a ticket to systems for evaluation.
    install_requires=['joblib==0.8.3', 'numpy==1.6.1', 'scipy==0.14.0', 'pysal==1.11.2', 'scikit-learn==0.14.1'],
    requires=['pysal', 'numpy', 'sklearn'],
    test_suite='test'
 )
--- a/release/python/0.4.2/crankshaft/test/fixtures/kmeans.json
+++ b/release/python/0.4.2/crankshaft/test/fixtures/kmeans.json
@ -0,0 +1 @@
 [{"xs": [9.917239463463458, 9.042767302696836, 10.798929825304187, 8.763751051762995, 11.383882954810852, 11.018206993460897, 8.939526075734316, 9.636159342565252, 10.136336896960058, 11.480610059427342, 12.115011910725082, 9.173267848893428, 10.239300931201738, 8.00012512174072, 8.979962292282131, 9.318376124429575, 10.82259513754284, 10.391747171927115, 10.04904588886165, 9.96007160443463, -0.78825626804569, -0.3511819898577426, -1.2796410003764271, -0.3977049391203402, 2.4792311265774667, 1.3670311632092624, 1.2963504112955613, 2.0404844103073025, -1.6439708506073223, 0.39122885445645805, 1.026031821452462, -0.04044477160482201, -0.7442346929085072, -0.34687120826243034, -0.23420359971379054, -0.5919629143336708, -0.202903054395391, -0.1893399644841902, 1.9331834251176807, -0.12321054392851609], "ys": [8.735627063679981, 9.857615954045011, 10.81439096759407, 10.586727233537191, 9.232919976568622, 11.54281262696508, 8.392787912674466, 9.355119689665944, 9.22380703532752, 10.542142541823122, 10.111980619367035, 10.760836265570738, 8.819773453269804, 10.25325722424816, 9.802077905695608, 8.955420161552611, 9.833801181904477, 10.491684241001613, 12.076108669877556, 11.74289693140474, -0.5685725015474191, -0.5715728344759778, -0.20180907868635137, 0.38431336480089595, -0.3402202083684184, -2.4652736827783586, 0.08295159401756182, 0.8503818775816505, 0.6488691600321166, 0.5794762568230527, -0.6770063922144103, -0.6557616416449478, -1.2834289177624947, 0.1096318195532717, -0.38986922166834853, -1.6224497706950238, 0.09429787743230483, 0.4005097316394031, -0.508002811195673, -1.2473463371366507], "ids": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]}]
--- a/release/python/0.4.2/crankshaft/test/fixtures/markov.json
+++ b/release/python/0.4.2/crankshaft/test/fixtures/markov.json
@ -0,0 +1 @@
 [[0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 0], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 1], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 2], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 3], [0.0, 0.065217391304347824, 0.065217391304347824, 0.33605067580764519, 4], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 5], [0.1875, 0.23999999999999999, 0.12, 0.23731835158706122, 6], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 7], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 8], [0.19047619047619049, 0.16, 0.0, 0.32594478059941379, 9], [-0.23529411764705882, 0.0, 0.19047619047619047, 0.31356338348865387, 10], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 11], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 12], [0.027777777777777783, 0.11111111111111112, 0.088888888888888892, 0.30339641183779581, 13], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 14], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 15], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 16], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 17], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 18], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 19], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 20], [0.078947368421052641, 0.073170731707317083, 0.0, 0.36451788667842738, 21], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 22], [-0.16666666666666663, 0.18181818181818182, 0.27272727272727271, 0.20246415864836445, 23], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 24], [0.1875, 0.23999999999999999, 0.12, 
0.23731835158706122, 25], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 26], [-0.043478260869565216, 0.0, 0.041666666666666664, 0.37950991789118999, 27], [0.22222222222222221, 0.18181818181818182, 0.0, 0.31701083225750354, 28], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 29], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 30], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 31], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 32], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 33], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 34], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 35], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 36], [0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 37], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 38], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 39], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 40], [0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 41], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 42], [0.0, 0.0, 0.0, 0.40000000000000002, 43], [0.0, 0.065217391304347824, 0.065217391304347824, 0.33605067580764519, 44], [0.078947368421052641, 0.073170731707317083, 0.0, 0.36451788667842738, 45], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 46], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 47]]
--- a/release/python/0.4.2/crankshaft/test/fixtures/moran.json
+++ b/release/python/0.4.2/crankshaft/test/fixtures/moran.json
@ -0,0 +1,52 @@
 [[0.9319096128346788, "HH"],
 [-1.135787401862846, "HL"],
 [0.11732030672508517, "LL"],
 [0.6152779669180425, "LL"],
 [-0.14657336660125297, "LH"],
 [0.6967858120189607, "LL"],
 [0.07949310115714454, "HH"],
 [0.4703198759258987, "HH"],
 [0.4421125200498064, "HH"],
 [0.5724288737143592, "LL"],
 [0.8970743435692062, "LL"],
 [0.18327334401918674, "LL"],
 [-0.01466729201304962, "HL"],
 [0.3481559372544409, "LL"],
 [0.06547094736902978, "LL"],
 [0.15482141569329988, "HH"],
 [0.4373841193538136, "HH"],
 [0.15971286468915544, "LL"],
 [1.0543588860308968, "HH"],
 [1.7372866900020818, "HH"],
 [1.091998586053999, "LL"],
 [0.1171572584252222, "HH"],
 [0.08438455015300014, "LL"],
 [0.06547094736902978, "LL"],
 [0.15482141569329985, "HH"],
 [1.1627044812890683, "HH"],
 [0.06547094736902978, "LL"],
 [0.795275137550483, "HH"],
 [0.18562939195219, "LL"],
 [0.3010757406693439, "LL"],
 [2.8205795942839376, "HH"],
 [0.11259190602909264, "LL"],
 [-0.07116352791516614, "HL"],
 [-0.09945240794119009, "LH"],
 [0.18562939195219, "LL"],
 [0.1832733440191868, "LL"],
 [-0.39054253768447705, "HL"],
 [-0.1672071289487642, "HL"],
 [0.3337669247916343, "HH"],
 [0.2584386102554792, "HH"],
 [-0.19733845476322634, "HL"],
 [-0.9379282899805409, "LH"],
 [-0.028770969951095866, "LH"],
 [0.051367269430983485, "LL"],
 [-0.2172548045913472, "LH"],
 [0.05136726943098351, "LL"],
 [0.04191046803899837, "LL"],
 [0.7482357030403517, "HH"],
 [-0.014585767863118111, "LH"],
 [0.5410013139159929, "HH"],
 [1.0223932668429925, "LL"],
 [1.4179402898927476, "LL"]]
--- a/release/python/0.4.2/crankshaft/test/fixtures/neighbors.json
+++ b/release/python/0.4.2/crankshaft/test/fixtures/neighbors.json
@ -0,0 +1,54 @@
 [
    {"neighbors": [48, 26, 20, 9, 31], "id": 1, "value": 0.5},
    {"neighbors": [30, 16, 46, 3, 4], "id": 2, "value": 0.7},
    {"neighbors": [46, 30, 2, 12, 16], "id": 3, "value": 0.2},
    {"neighbors": [18, 30, 23, 2, 52], "id": 4, "value": 0.1},
    {"neighbors": [47, 40, 45, 37, 28], "id": 5, "value": 0.3},
    {"neighbors": [10, 21, 41, 14, 37], "id": 6, "value": 0.05},
    {"neighbors": [8, 17, 43, 25, 12], "id": 7, "value": 0.4},
    {"neighbors": [17, 25, 43, 22, 7], "id": 8, "value": 0.7},
    {"neighbors": [39, 34, 1, 26, 48], "id": 9, "value": 0.5},
    {"neighbors": [6, 37, 5, 45, 49], "id": 10, "value": 0.04},
    {"neighbors": [51, 41, 29, 21, 14], "id": 11, "value": 0.08},
    {"neighbors": [44, 46, 43, 50, 3], "id": 12, "value": 0.2},
    {"neighbors": [45, 23, 14, 28, 18], "id": 13, "value": 0.4},
    {"neighbors": [41, 29, 13, 23, 6], "id": 14, "value": 0.2},
    {"neighbors": [36, 27, 32, 33, 24], "id": 15, "value": 0.3},
    {"neighbors": [19, 2, 46, 44, 28], "id": 16, "value": 0.4},
    {"neighbors": [8, 25, 43, 7, 22], "id": 17, "value": 0.6},
    {"neighbors": [23, 4, 29, 14, 13], "id": 18, "value": 0.3},
    {"neighbors": [42, 16, 28, 26, 40], "id": 19, "value": 0.7},
    {"neighbors": [1, 48, 31, 26, 42], "id": 20, "value": 0.8},
    {"neighbors": [41, 6, 11, 14, 10], "id": 21, "value": 0.1},
    {"neighbors": [25, 50, 43, 31, 44], "id": 22, "value": 0.4},
    {"neighbors": [18, 13, 14, 4, 2], "id": 23, "value": 0.1},
    {"neighbors": [33, 49, 34, 47, 27], "id": 24, "value": 0.3},
    {"neighbors": [43, 8, 22, 17, 50], "id": 25, "value": 0.4},
    {"neighbors": [1, 42, 20, 31, 48], "id": 26, "value": 0.6},
    {"neighbors": [32, 15, 36, 33, 24], "id": 27, "value": 0.3},
    {"neighbors": [40, 45, 19, 5, 13], "id": 28, "value": 0.8},
    {"neighbors": [11, 51, 41, 14, 18], "id": 29, "value": 0.3},
    {"neighbors": [2, 3, 4, 46, 18], "id": 30, "value": 0.1},
    {"neighbors": [20, 26, 1, 50, 48], "id": 31, "value": 0.9},
    {"neighbors": [27, 36, 15, 49, 24], "id": 32, "value": 0.3},
    {"neighbors": [24, 27, 49, 34, 32], "id": 33, "value": 0.4},
    {"neighbors": [47, 9, 39, 40, 24], "id": 34, "value": 0.3},
    {"neighbors": [38, 51, 11, 21, 41], "id": 35, "value": 0.3},
    {"neighbors": [15, 32, 27, 49, 33], "id": 36, "value": 0.2},
    {"neighbors": [49, 10, 5, 47, 24], "id": 37, "value": 0.5},
    {"neighbors": [35, 21, 51, 11, 41], "id": 38, "value": 0.4},
    {"neighbors": [9, 34, 48, 1, 47], "id": 39, "value": 0.6},
    {"neighbors": [28, 47, 5, 9, 34], "id": 40, "value": 0.5},
    {"neighbors": [11, 14, 29, 21, 6], "id": 41, "value": 0.4},
    {"neighbors": [26, 19, 1, 9, 31], "id": 42, "value": 0.2},
    {"neighbors": [25, 12, 8, 22, 44], "id": 43, "value": 0.3},
    {"neighbors": [12, 50, 46, 16, 43], "id": 44, "value": 0.2},
    {"neighbors": [28, 13, 5, 40, 19], "id": 45, "value": 0.3},
    {"neighbors": [3, 12, 44, 2, 16], "id": 46, "value": 0.2},
    {"neighbors": [34, 40, 5, 49, 24], "id": 47, "value": 0.3},
    {"neighbors": [1, 20, 26, 9, 39], "id": 48, "value": 0.5},
    {"neighbors": [24, 37, 47, 5, 33], "id": 49, "value": 0.2},
    {"neighbors": [44, 22, 31, 42, 26], "id": 50, "value": 0.6},
    {"neighbors": [11, 29, 41, 14, 21], "id": 51, "value": 0.01},
    {"neighbors": [4, 18, 29, 51, 23], "id": 52, "value": 0.01}
  ]
--- a/release/python/0.4.2/crankshaft/test/fixtures/neighbors_markov.json
+++ b/release/python/0.4.2/crankshaft/test/fixtures/neighbors_markov.json
--- a/release/python/0.4.2/crankshaft/test/helper.py
+++ b/release/python/0.4.2/crankshaft/test/helper.py
@ -0,0 +1,13 @@
 import unittest
 from mock_plpy import MockPlPy
 plpy = MockPlPy()
 import sys
 sys.modules['plpy'] = plpy
 import os
 def fixture_file(name):
    dir = os.path.dirname(os.path.realpath(__file__))
    return os.path.join(dir, 'fixtures', name)
--- a/release/python/0.4.2/crankshaft/test/mock_plpy.py
+++ b/release/python/0.4.2/crankshaft/test/mock_plpy.py
@ -0,0 +1,52 @@
 import re
 class MockCursor:
    def __init__(self, data):
        self.cursor_pos = 0
        self.data = data
    def fetch(self, batch_size):
        batch = self.data[self.cursor_pos : self.cursor_pos + batch_size]
        self.cursor_pos += batch_size
        return batch
 class MockPlPy:
    def __init__(self):
        self._reset()
    def _reset(self):
        self.infos = []
        self.notices = []
        self.debugs = []
        self.logs = []
        self.warnings = []
        self.errors = []
        self.fatals = []
        self.executes = []
        self.results = []
        self.prepares = []
    def _define_result(self, query, result):
        pattern = re.compile(query, re.IGNORECASE | re.MULTILINE)
        self.results.append([pattern, result])
    def notice(self, msg):
        self.notices.append(msg)
    def debug(self, msg):
        self.debugs.append(msg)
    def info(self, msg):
        self.infos.append(msg)
    def cursor(self, query):
        data = self.execute(query)
        return MockCursor(data)
    def execute(self, query): # TODO: additional arguments
        for result in self.results:
            if result[0].match(query):
                return result[1]
        return []
--- a/release/python/0.4.2/crankshaft/test/test_cluster_kmeans.py
+++ b/release/python/0.4.2/crankshaft/test/test_cluster_kmeans.py
@ -0,0 +1,38 @@
 import unittest
 import numpy as np
 # from mock_plpy import MockPlPy
 # plpy = MockPlPy()
 #
 # import sys
 # sys.modules['plpy'] = plpy
 from helper import plpy, fixture_file
 import numpy as np
 import crankshaft.clustering as cc
 import crankshaft.pysal_utils as pu
 from crankshaft import random_seeds
 import json
 class KMeansTest(unittest.TestCase):
    """Testing class for k-means functions"""
    def setUp(self):
        plpy._reset()
        self.cluster_data = json.loads(open(fixture_file('kmeans.json')).read())
        self.params = {"subquery": "select * from table",
                       "no_clusters": "10"
                       }
    def test_kmeans(self):
        data = self.cluster_data
        plpy._define_result('select' ,data)
        clusters = cc.kmeans('subquery', 2)
        labels  = [a[1] for a in clusters]
        c1 = [a for a in clusters if a[1]==0]
        c2 = [a for a in clusters if a[1]==1]
        self.assertEqual(len(np.unique(labels)),2)
        self.assertEqual(len(c1),20)
        self.assertEqual(len(c2),20)
--- a/release/python/0.4.2/crankshaft/test/test_clustering_moran.py
+++ b/release/python/0.4.2/crankshaft/test/test_clustering_moran.py
@ -0,0 +1,106 @@
 import unittest
 import numpy as np
 # from mock_plpy import MockPlPy
 # plpy = MockPlPy()
 #
 # import sys
 # sys.modules['plpy'] = plpy
 from helper import plpy, fixture_file
 import crankshaft.clustering as cc
 import crankshaft.pysal_utils as pu
 from crankshaft import random_seeds
 import json
 class MoranTest(unittest.TestCase):
    """Testing class for Moran's I functions"""
    def setUp(self):
        plpy._reset()
        self.params = {"id_col": "cartodb_id",
                       "attr1": "andy",
                       "attr2": "jay_z",
                       "subquery": "SELECT * FROM a_list",
                       "geom_col": "the_geom",
                       "num_ngbrs": 321}
        self.params_markov = {"id_col": "cartodb_id",
                              "time_cols": ["_2013_dec", "_2014_jan",
                                            "_2014_feb"],
                              "subquery": "SELECT * FROM a_list",
                              "geom_col": "the_geom",
                              "num_ngbrs": 321}
        self.neighbors_data = json.loads(
          open(fixture_file('neighbors.json')).read())
        self.moran_data = json.loads(
          open(fixture_file('moran.json')).read())
    def test_map_quads(self):
        """Test map_quads"""
        self.assertEqual(cc.map_quads(1), 'HH')
        self.assertEqual(cc.map_quads(2), 'LH')
        self.assertEqual(cc.map_quads(3), 'LL')
        self.assertEqual(cc.map_quads(4), 'HL')
        self.assertEqual(cc.map_quads(33), None)
        self.assertEqual(cc.map_quads('andy'), None)
    def test_quad_position(self):
        """Test quad_position"""
        quads = np.array([1, 2, 3, 4], int)
        ans = np.array(['HH', 'LH', 'LL', 'HL'])
        test_ans = cc.quad_position(quads)
        self.assertTrue((test_ans == ans).all())
    def test_moran_local(self):
        """Test Moran's I local"""
        data = [{'id': d['id'],
                 'attr1': d['value'],
                 'neighbors': d['neighbors']} for d in self.neighbors_data]
        plpy._define_result('select', data)
        random_seeds.set_random_seeds(1234)
        result = cc.moran_local('subquery', 'value',
                                'knn', 5, 99, 'the_geom', 'cartodb_id')
        result = [(row[0], row[1]) for row in result]
        zipped_values = zip(result, self.moran_data)
        for ([res_val, res_quad], [exp_val, exp_quad]) in zipped_values:
            self.assertAlmostEqual(res_val, exp_val)
            self.assertEqual(res_quad, exp_quad)
    def test_moran_local_rate(self):
        """Test Moran's I rate"""
        data = [{'id': d['id'],
                 'attr1': d['value'],
                 'attr2': 1,
                 'neighbors': d['neighbors']} for d in self.neighbors_data]
        plpy._define_result('select', data)
        random_seeds.set_random_seeds(1234)
        result = cc.moran_local_rate('subquery', 'numerator', 'denominator',
                                     'knn', 5, 99, 'the_geom', 'cartodb_id')
        result = [(row[0], row[1]) for row in result]
        zipped_values = zip(result, self.moran_data)
        for ([res_val, res_quad], [exp_val, exp_quad]) in zipped_values:
            self.assertAlmostEqual(res_val, exp_val)
    def test_moran(self):
        """Test Moran's I global"""
        data = [{'id': d['id'],
                 'attr1': d['value'],
                 'neighbors': d['neighbors']} for d in self.neighbors_data]
        plpy._define_result('select', data)
        random_seeds.set_random_seeds(1235)
        result = cc.moran('table', 'value',
                          'knn', 5, 99, 'the_geom', 'cartodb_id')
        result_moran = result[0][0]
        expected_moran = np.array([row[0] for row in self.moran_data]).mean()
        self.assertAlmostEqual(expected_moran, result_moran, delta=10e-2)
--- a/release/python/0.4.2/crankshaft/test/test_pysal_utils.py
+++ b/release/python/0.4.2/crankshaft/test/test_pysal_utils.py
@ -0,0 +1,142 @@
 import unittest
 import crankshaft.pysal_utils as pu
 from crankshaft import random_seeds
 class PysalUtilsTest(unittest.TestCase):
    """Testing class for utility functions related to PySAL integrations"""
    def setUp(self):
        self.params = {"id_col": "cartodb_id",
                       "attr1": "andy",
                       "attr2": "jay_z",
                       "subquery": "SELECT * FROM a_list",
                       "geom_col": "the_geom",
                       "num_ngbrs": 321}
        self.params_array = {"id_col": "cartodb_id",
                             "time_cols": ["_2013_dec", "_2014_jan", "_2014_feb"],
                             "subquery": "SELECT * FROM a_list",
                             "geom_col": "the_geom",
                             "num_ngbrs": 321}
    def test_query_attr_select(self):
        """Test query_attr_select"""
        ans = "i.\"andy\"::numeric As attr1, " \
              "i.\"jay_z\"::numeric As attr2, "
        ans_array = "i.\"_2013_dec\"::numeric As attr1, " \
                    "i.\"_2014_jan\"::numeric As attr2, " \
                    "i.\"_2014_feb\"::numeric As attr3, "
        self.assertEqual(pu.query_attr_select(self.params), ans)
        self.assertEqual(pu.query_attr_select(self.params_array), ans_array)
    def test_query_attr_where(self):
        """Test pu.query_attr_where"""
        ans = "idx_replace.\"andy\" IS NOT NULL AND " \
              "idx_replace.\"jay_z\" IS NOT NULL AND " \
              "idx_replace.\"jay_z\" <> 0"
        ans_array = "idx_replace.\"_2013_dec\" IS NOT NULL AND " \
                    "idx_replace.\"_2014_jan\" IS NOT NULL AND " \
                    "idx_replace.\"_2014_feb\" IS NOT NULL"
        self.assertEqual(pu.query_attr_where(self.params), ans)
        self.assertEqual(pu.query_attr_where(self.params_array), ans_array)
    def test_knn(self):
        """Test knn neighbors constructor"""
        ans = "SELECT i.\"cartodb_id\" As id, " \
                     "i.\"andy\"::numeric As attr1, " \
                     "i.\"jay_z\"::numeric As attr2, " \
                     "(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
                                   "FROM (SELECT * FROM a_list) As j " \
                                   "WHERE " \
                                    "i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
                                    "j.\"andy\" IS NOT NULL AND " \
                                    "j.\"jay_z\" IS NOT NULL AND " \
                                    "j.\"jay_z\" <> 0 " \
                                   "ORDER BY " \
                                    "j.\"the_geom\" <-> i.\"the_geom\" ASC " \
                      "LIMIT 321)) As neighbors " \
              "FROM (SELECT * FROM a_list) As i " \
              "WHERE i.\"andy\" IS NOT NULL AND " \
                    "i.\"jay_z\" IS NOT NULL AND " \
                    "i.\"jay_z\" <> 0 " \
              "ORDER BY i.\"cartodb_id\" ASC;"
        ans_array = "SELECT i.\"cartodb_id\" As id, " \
              "i.\"_2013_dec\"::numeric As attr1, " \
              "i.\"_2014_jan\"::numeric As attr2, " \
              "i.\"_2014_feb\"::numeric As attr3, " \
              "(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
                            "FROM (SELECT * FROM a_list) As j " \
                            "WHERE i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
                                  "j.\"_2013_dec\" IS NOT NULL AND " \
                                  "j.\"_2014_jan\" IS NOT NULL AND " \
                                  "j.\"_2014_feb\" IS NOT NULL " \
                            "ORDER BY j.\"the_geom\" <-> i.\"the_geom\" ASC " \
                            "LIMIT 321)) As neighbors " \
              "FROM (SELECT * FROM a_list) As i " \
              "WHERE i.\"_2013_dec\" IS NOT NULL AND " \
                    "i.\"_2014_jan\" IS NOT NULL AND " \
                    "i.\"_2014_feb\" IS NOT NULL "\
              "ORDER BY i.\"cartodb_id\" ASC;"
        self.assertEqual(pu.knn(self.params), ans)
        self.assertEqual(pu.knn(self.params_array), ans_array)
    def test_queen(self):
        """Test queen neighbors constructor"""
        ans = "SELECT i.\"cartodb_id\" As id, " \
                     "i.\"andy\"::numeric As attr1, " \
                     "i.\"jay_z\"::numeric As attr2, " \
                     "(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
                                   "FROM (SELECT * FROM a_list) As j " \
                                   "WHERE " \
                                   "i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
                                   "ST_Touches(i.\"the_geom\", " \
                                              "j.\"the_geom\") AND " \
                                   "j.\"andy\" IS NOT NULL AND " \
                                   "j.\"jay_z\" IS NOT NULL AND " \
                                   "j.\"jay_z\" <> 0)" \
                                  ") As neighbors " \
              "FROM (SELECT * FROM a_list) As i " \
              "WHERE i.\"andy\" IS NOT NULL AND " \
                    "i.\"jay_z\" IS NOT NULL AND " \
                    "i.\"jay_z\" <> 0 " \
              "ORDER BY i.\"cartodb_id\" ASC;"
        self.assertEqual(pu.queen(self.params), ans)
    def test_construct_neighbor_query(self):
        """Test construct_neighbor_query"""
        # Compare to raw knn query
        self.assertEqual(pu.construct_neighbor_query('knn', self.params),
                         pu.knn(self.params))
    def test_get_attributes(self):
        """Test get_attributes"""
        ## need to add tests
        self.assertEqual(True, True)
    def test_get_weight(self):
        """Test get_weight"""
        self.assertEqual(True, True)
    def test_empty_zipped_array(self):
        """Test empty_zipped_array"""
        ans2 = [(None, None)]
        ans4 = [(None, None, None, None)]
        self.assertEqual(pu.empty_zipped_array(2), ans2)
        self.assertEqual(pu.empty_zipped_array(4), ans4)
--- a/release/python/0.4.2/crankshaft/test/test_segmentation.py
+++ b/release/python/0.4.2/crankshaft/test/test_segmentation.py
@ -0,0 +1,64 @@
 import unittest
 import numpy as np
 from helper import plpy, fixture_file
 import crankshaft.segmentation as segmentation
 import json
 class SegmentationTest(unittest.TestCase):
    """Testing class for segmentation functions"""
    def setUp(self):
        plpy._reset()
    def generate_random_data(self,n_samples,random_state,  row_type=False):
        x1 = random_state.uniform(size=n_samples)
        x2 = random_state.uniform(size=n_samples)
        x3 = random_state.randint(0, 4, size=n_samples)
        y = x1+x2*x2+x3
        cartodb_id  = range(len(x1))
        if row_type:
            return [ {'features': vals} for vals in zip(x1,x2,x3)], y
        else:
            return  [dict( zip(['x1','x2','x3','target', 'cartodb_id'],[x1,x2,x3,y,cartodb_id]))]
    def test_replace_nan_with_mean(self):
        test_array = np.array([1.2, np.nan, 3.2, np.nan, np.nan])
    def test_create_and_predict_segment(self):
        n_samples = 1000
        random_state_train = np.random.RandomState(13)
        random_state_test = np.random.RandomState(134)
        training_data = self.generate_random_data(n_samples, random_state_train)
        test_data, test_y = self.generate_random_data(n_samples, random_state_test, row_type=True)
        ids =  [{'cartodb_ids': range(len(test_data))}]
        rows =  [{'x1': 0,'x2':0,'x3':0,'y':0,'cartodb_id':0}]
        # Raw strings keep the regex escapes (\*, \(, \_) literal
        plpy._define_result(r'select \* from  \(select \* from training\) a  limit 1', rows)
        plpy._define_result(r'.*from \(select \* from training\) as a', training_data)
        plpy._define_result(r'select array_agg\(cartodb\_id order by cartodb\_id\) as cartodb_ids from \(.*\) a', ids)
        plpy._define_result(r'.*select \* from test.*', test_data)
        model_parameters =  {'n_estimators': 1200,
                             'max_depth': 3,
                             'subsample' : 0.5,
                             'learning_rate': 0.01,
                             'min_samples_leaf': 1}
        result = segmentation.create_and_predict_segment(
                'select * from training',
                'target',
                'select * from test',
                model_parameters)
        prediction = [r[1] for r in result]
        # Root-mean-square error of the predictions against the true targets
        rmse = np.sqrt(np.mean(np.square(np.array(prediction) - np.array(test_y))))
        self.assertEqual(len(result), len(test_data))
        self.assertTrue(result[0][2] < 0.01)
        self.assertTrue(rmse < 0.5 * np.mean(test_y))
--- a/release/python/0.4.2/crankshaft/test/test_space_time_dynamics.py
+++ b/release/python/0.4.2/crankshaft/test/test_space_time_dynamics.py
@ -0,0 +1,324 @@
 import unittest
 import numpy as np
 # from mock_plpy import MockPlPy
 # plpy = MockPlPy()
 #
 # import sys
 # sys.modules['plpy'] = plpy
 from helper import plpy, fixture_file
 import crankshaft.space_time_dynamics as std
 from crankshaft import random_seeds
 import json
 class SpaceTimeTests(unittest.TestCase):
    """Testing class for Markov Functions."""
    def setUp(self):
        plpy._reset()
        self.params = {"id_col": "cartodb_id",
                       "time_cols": ['dec_2013', 'jan_2014', 'feb_2014'],
                       "subquery": "SELECT * FROM a_list",
                       "geom_col": "the_geom",
                       "num_ngbrs": 321}
        self.neighbors_data = json.loads(open(fixture_file('neighbors_markov.json')).read())
        self.markov_data = json.loads(open(fixture_file('markov.json')).read())
        self.time_data = np.array([i * np.ones(10, dtype=float) for i in range(10)]).T
        self.transition_matrix = np.array([
                [[ 0.96341463, 0.0304878 , 0.00609756, 0.        , 0.        ],
                 [ 0.06040268, 0.83221477, 0.10738255, 0.        , 0.        ],
                 [ 0.        , 0.14      , 0.74      , 0.12      , 0.        ],
                 [ 0.        , 0.03571429, 0.32142857, 0.57142857, 0.07142857],
                 [ 0.        , 0.        , 0.        , 0.16666667, 0.83333333]],
                [[ 0.79831933, 0.16806723, 0.03361345, 0.        , 0.        ],
                 [ 0.0754717 , 0.88207547, 0.04245283, 0.        , 0.        ],
                 [ 0.00537634, 0.06989247, 0.8655914 , 0.05913978, 0.        ],
                 [ 0.        , 0.        , 0.06372549, 0.90196078, 0.03431373],
                 [ 0.        , 0.        , 0.        , 0.19444444, 0.80555556]],
                [[ 0.84693878, 0.15306122, 0.        , 0.        , 0.        ],
                 [ 0.08133971, 0.78947368, 0.1291866 , 0.        , 0.        ],
                 [ 0.00518135, 0.0984456 , 0.79274611, 0.0984456 , 0.00518135],
                 [ 0.        , 0.        , 0.09411765, 0.87058824, 0.03529412],
                 [ 0.        , 0.        , 0.        , 0.10204082, 0.89795918]],
                [[ 0.8852459 , 0.09836066, 0.        , 0.01639344, 0.        ],
                 [ 0.03875969, 0.81395349, 0.13953488, 0.        , 0.00775194],
                 [ 0.0049505 , 0.09405941, 0.77722772, 0.11881188, 0.0049505 ],
                 [ 0.        , 0.02339181, 0.12865497, 0.75438596, 0.09356725],
                 [ 0.        , 0.        , 0.        , 0.09661836, 0.90338164]],
                [[ 0.33333333, 0.66666667, 0.        , 0.        , 0.        ],
                 [ 0.0483871 , 0.77419355, 0.16129032, 0.01612903, 0.        ],
                 [ 0.01149425, 0.16091954, 0.74712644, 0.08045977, 0.        ],
                 [ 0.        , 0.01036269, 0.06217617, 0.89637306, 0.03108808],
                 [ 0.        , 0.        , 0.        , 0.02352941, 0.97647059]]]
                 )
    def test_spatial_markov(self):
        """Test Spatial Markov."""
        data = [ { 'id': d['id'],
                   'attr1': d['y1995'],
                   'attr2': d['y1996'],
                   'attr3': d['y1997'],
                   'attr4': d['y1998'],
                   'attr5': d['y1999'],
                   'attr6': d['y2000'],
                   'attr7': d['y2001'],
                   'attr8': d['y2002'],
                   'attr9': d['y2003'],
                   'attr10': d['y2004'],
                   'attr11': d['y2005'],
                   'attr12': d['y2006'],
                   'attr13': d['y2007'],
                   'attr14': d['y2008'],
                   'attr15': d['y2009'],
                   'neighbors': d['neighbors'] } for d in self.neighbors_data]
        print(str(data[0]))
        plpy._define_result('select', data)
        random_seeds.set_random_seeds(1234)
        result = std.spatial_markov_trend('subquery', ['y1995', 'y1996', 'y1997', 'y1998', 'y1999', 'y2000', 'y2001', 'y2002', 'y2003', 'y2004', 'y2005', 'y2006', 'y2007', 'y2008', 'y2009'], 5, 'knn', 5, 0, 'the_geom', 'cartodb_id')
        self.assertIsNotNone(result)
        result = [(row[0], row[1], row[2], row[3], row[4]) for row in result]
        print(result[0])
        expected = self.markov_data
        for ([res_trend, res_up, res_down, res_vol, res_id],
             [exp_trend, exp_up, exp_down, exp_vol, exp_id]
             ) in zip(result, expected):
            self.assertAlmostEqual(res_trend, exp_trend)
    def test_get_time_data(self):
        """Test get_time_data"""
        data = [ { 'attr1': d['y1995'],
                   'attr2': d['y1996'],
                   'attr3': d['y1997'],
                   'attr4': d['y1998'],
                   'attr5': d['y1999'],
                   'attr6': d['y2000'],
                   'attr7': d['y2001'],
                   'attr8': d['y2002'],
                   'attr9': d['y2003'],
                   'attr10': d['y2004'],
                   'attr11': d['y2005'],
                   'attr12': d['y2006'],
                   'attr13': d['y2007'],
                   'attr14': d['y2008'],
                   'attr15': d['y2009'] } for d in self.neighbors_data]
        result = std.get_time_data(data, ['y1995', 'y1996', 'y1997', 'y1998', 'y1999', 'y2000', 'y2001', 'y2002', 'y2003', 'y2004', 'y2005', 'y2006', 'y2007', 'y2008', 'y2009'])
        ## expected was prepared from PySAL example:
        ### f = ps.open(ps.examples.get_path("usjoin.csv"))
        ### pci = np.array([f.by_col[str(y)] for y in range(1995, 2010)]).transpose()
        ### rpci = pci / (pci.mean(axis = 0))
        expected = np.array([[ 0.87654416, 0.863147, 0.85637567, 0.84811668, 0.8446154,  0.83271652
        ,  0.83786314, 0.85012593, 0.85509656, 0.86416612, 0.87119375, 0.86302631
        ,  0.86148267, 0.86252252, 0.86746356],
         [ 0.9188951,  0.91757931, 0.92333258, 0.92517289, 0.92552388, 0.90746978
        ,  0.89830489, 0.89431991, 0.88924794, 0.89815176, 0.91832091, 0.91706054
        ,  0.90139505, 0.87897455, 0.86216858],
         [ 0.82591007, 0.82548596, 0.81989793, 0.81503235, 0.81731522, 0.78964559
        ,  0.80584442, 0.8084998,  0.82258551, 0.82668196, 0.82373724, 0.81814804
        ,  0.83675961, 0.83574199, 0.84647177],
         [ 1.09088176, 1.08537689, 1.08456418, 1.08415404, 1.09898841, 1.14506948
        ,  1.12151133, 1.11160697, 1.10888621, 1.11399806, 1.12168029, 1.13164797
        ,  1.12958508, 1.11371818, 1.09936775],
         [ 1.10731446, 1.11373944, 1.13283638, 1.14472559, 1.15910025, 1.16898201
        ,  1.17212488, 1.14752303, 1.11843284, 1.11024964, 1.11943471, 1.11736468
        ,  1.10863242, 1.09642516, 1.07762337],
         [ 1.42269757, 1.42118434, 1.44273502, 1.43577571, 1.44400684, 1.44184737
        ,  1.44782832, 1.41978227, 1.39092208, 1.4059372,  1.40788646, 1.44052766
        ,  1.45241216, 1.43306098, 1.4174431 ],
         [ 1.13073885, 1.13110513, 1.11074708, 1.13364636, 1.13088149, 1.10888138
        ,  1.11856629, 1.13062931, 1.11944984, 1.12446239, 1.11671008, 1.10880034
        ,  1.08401709, 1.06959206, 1.07875225],
         [ 1.04706124, 1.04516831, 1.04253372, 1.03239987, 1.02072545, 0.99854316
        ,  0.9880258,  0.99669587, 0.99327676, 1.01400905, 1.03176742, 1.040511
        ,  1.01749645, 0.9936394,  0.98279746],
         [ 0.98996986, 1.00143564, 0.99491,  1.00188408, 1.00455845, 0.99127006
        ,  0.97925917, 0.9683482,  0.95335147, 0.93694787, 0.94308213, 0.92232874
        ,  0.91284091, 0.89689833, 0.88928858],
         [ 0.87418391, 0.86416601, 0.84425695, 0.8404494,  0.83903044, 0.8578708
        ,  0.86036185, 0.86107306, 0.8500772,  0.86981998, 0.86837929, 0.87204141
        ,  0.86633032, 0.84946077, 0.83287146],
         [ 1.14196118, 1.14660262, 1.14892712, 1.14909594, 1.14436624, 1.14450183
        ,  1.12349752, 1.12596664, 1.12213996, 1.1119989,  1.10257792, 1.10491258
        ,  1.11059842, 1.10509795, 1.10020097],
         [ 0.97282463, 0.96700147, 0.96252588, 0.9653878,  0.96057687, 0.95831051
        ,  0.94480909, 0.94804195, 0.95430286, 0.94103989, 0.92122519, 0.91010201
        ,  0.89280392, 0.89298243, 0.89165385],
         [ 0.94325468, 0.96436902, 0.96455242, 0.95243009, 0.94117647, 0.9480927
        ,  0.93539182, 0.95388718, 0.94597005, 0.96918424, 0.94781281, 0.93466815
        ,  0.94281559, 0.96520315, 0.96715441],
         [ 0.97478408, 0.98169225, 0.98712809, 0.98474769, 0.98559897, 0.98687073
        ,  0.99237486, 0.98209969, 0.9877653,  0.97399471, 0.96910087, 0.98416665
        ,  0.98423613, 0.99823861, 0.99545704],
         [ 0.85570269, 0.85575915, 0.85986132, 0.85693406, 0.8538012,  0.86191535
        ,  0.84981451, 0.85472102, 0.84564835, 0.83998883, 0.83478547, 0.82803648
        ,  0.8198736,  0.82265395, 0.8399404 ],
         [ 0.87022047, 0.85996258, 0.85961813, 0.85689572, 0.83947136, 0.82785597
        ,  0.86008789, 0.86776298, 0.86720209, 0.8676334,  0.89179317, 0.94202108
        ,  0.9422231,  0.93902708, 0.94479184],
         [ 0.90134907, 0.90407738, 0.90403991, 0.90201769, 0.90399238, 0.90906632
        ,  0.92693339, 0.93695966, 0.94242697, 0.94338265, 0.91981796, 0.91108804
        ,  0.90543476, 0.91737138, 0.94793657],
         [ 1.1977611,  1.18222564, 1.18439158, 1.18267865, 1.19286723, 1.20172869
        ,  1.21328691, 1.22624778, 1.22397075, 1.23857042, 1.24419893, 1.23929384
        ,  1.23418676, 1.23626739, 1.26754398],
         [ 1.24919678, 1.25754773, 1.26991161, 1.28020651, 1.30625667, 1.34790023
        ,  1.34399863, 1.32575181, 1.30795492, 1.30544841, 1.30303302, 1.32107766
        ,  1.32936244, 1.33001241, 1.33288462],
         [ 1.06768004, 1.03799276, 1.03637303, 1.02768449, 1.03296093, 1.05059016
        ,  1.03405057, 1.02747623, 1.03162734, 0.9961416,  0.97356208, 0.94241549
        ,  0.92754547, 0.92549227, 0.92138102],
         [ 1.09475614, 1.11526796, 1.11654299, 1.13103948, 1.13143264, 1.13889622
        ,  1.12442212, 1.13367018, 1.13982256, 1.14029944, 1.11979401, 1.10905389
        ,  1.10577769, 1.11166825, 1.09985155],
         [ 0.76530058, 0.76612841, 0.76542451, 0.76722683, 0.76014284, 0.74480073
        ,  0.76098396, 0.76156903, 0.76651952, 0.76533288, 0.78205934, 0.76842416
        ,  0.77487118, 0.77768683, 0.78801192],
         [ 0.98391336, 0.98075816, 0.98295341, 0.97386015, 0.96913803, 0.97370819
        ,  0.96419154, 0.97209861, 0.97441313, 0.96356162, 0.94745352, 0.93965462
        ,  0.93069645, 0.94020973, 0.94358232],
         [ 0.83561828, 0.82298088, 0.81738502, 0.81748588, 0.80904801, 0.80071489
        ,  0.83358256, 0.83451613, 0.85175032, 0.85954307, 0.86790024, 0.87170334
        ,  0.87863799, 0.87497981, 0.87888675],
         [ 0.98845573, 1.02092428, 0.99665283, 0.99141823, 0.99386619, 0.98733195
        ,  0.99644997, 0.99669587, 1.02559097, 1.01116651, 0.99988024, 0.97906749
        ,  0.99323123, 1.00204939, 0.99602148],
         [ 1.14930913, 1.15241949, 1.14300962, 1.14265542, 1.13984683, 1.08312397
        ,  1.05192626, 1.04230892, 1.05577278, 1.08569751, 1.12443486, 1.08891079
        ,  1.08603695, 1.05997314, 1.02160943],
         [ 1.11368269, 1.1057147,  1.11893431, 1.13778669, 1.1432272,  1.18257029
        ,  1.16226243, 1.16009196, 1.14467789, 1.14820235, 1.12386598, 1.12680236
        ,  1.12357937, 1.1159258,  1.12570828],
         [ 1.30379431, 1.30752186, 1.31206366, 1.31532267, 1.30625667, 1.31210239
        ,  1.29989156, 1.29203193, 1.27183516, 1.26830786, 1.2617743,  1.28656675
        ,  1.29734097, 1.29390205, 1.29345446],
         [ 0.83953719, 0.82701448, 0.82006005, 0.81188876, 0.80294864, 0.78772975
        ,  0.82848011, 0.8259679,  0.82435705, 0.83108634, 0.84373784, 0.83891093
        ,  0.84349247, 0.85637272, 0.86539395],
         [ 1.23450087, 1.2426022,  1.23537935, 1.23581293, 1.24522626, 1.2256767
        ,  1.21126648, 1.19377804, 1.18355337, 1.19674434, 1.21536573, 1.23653297
        ,  1.27962009, 1.27968392, 1.25907738],
         [ 0.9769662,  0.97400719, 0.98035944, 0.97581531, 0.95543282, 0.96480308
        ,  0.94686376, 0.93679073, 0.92540049, 0.92988835, 0.93442917, 0.92100464
        ,  0.91475304, 0.90249622, 0.9021363 ],
         [ 0.84986886, 0.8986851,  0.84295997, 0.87280534, 0.85659368, 0.88937573
        ,  0.894401, 0.90448993, 0.95495898, 0.92698333, 0.94745352, 0.92562488
        ,  0.96635366, 1.02520312, 1.0394296 ],
         [ 1.01922808, 1.00258203, 1.00974428, 1.00303417, 0.99765073, 1.00759019
        ,  0.99192968, 0.99747298, 0.99550759, 0.97583768, 0.9610168,  0.94779638
        ,  0.93759089, 0.93353431, 0.94121705],
         [ 0.86367411, 0.85558932, 0.85544346, 0.85103025, 0.84336613, 0.83434854
        ,  0.85813595, 0.84667961, 0.84374558, 0.85951183, 0.87194227, 0.89455097
        ,  0.88283929, 0.90349491, 0.90600675],
         [ 1.00947534, 1.00411055, 1.00698819, 0.99513687, 0.99291086, 1.00581626
        ,  0.98850522, 0.99291168, 0.98983209, 0.97511924, 0.96134615, 0.96382634
        ,  0.95011401, 0.9434686,  0.94637765],
         [ 1.05712571, 1.05459419, 1.05753012, 1.04880786, 1.05103857, 1.04800023
        ,  1.03024941, 1.04200483, 1.0402554,  1.03296979, 1.02191682, 1.02476275
        ,  1.02347523, 1.02517684, 1.04359571],
         [ 1.07084189, 1.06669497, 1.07937623, 1.07387988, 1.0794043,  1.0531801
        ,  1.07452771, 1.09383478, 1.1052447,  1.10322136, 1.09167939, 1.08772756
        ,  1.08859544, 1.09177338, 1.1096083 ],
         [ 0.86719222, 0.86628896, 0.86675156, 0.86425632, 0.86511809, 0.86287327
        ,  0.85169796, 0.85411285, 0.84886336, 0.84517414, 0.84843858, 0.84488343
        ,  0.83374329, 0.82812044, 0.82878599],
         [ 0.88389211, 0.92288667, 0.90282398, 0.91229186, 0.92023286, 0.92652175
        ,  0.94278865, 0.93682452, 0.98655146, 0.992237, 0.9798497,  0.93869677
        ,  0.96947771, 1.00362626, 0.98102351],
         [ 0.97082064, 0.95320233, 0.94534081, 0.94215593, 0.93967,  0.93092109
        ,  0.92662519, 0.93412152, 0.93501274, 0.92879506, 0.92110542, 0.91035556
        ,  0.90430364, 0.89994694, 0.90073864],
         [ 0.95861858, 0.95774543, 0.98254811, 0.98919472, 0.98684824, 0.98882205
        ,  0.97662234, 0.95601578, 0.94905385, 0.94934888, 0.97152609, 0.97163004
        ,  0.9700702,  0.97158948, 0.95884908],
         [ 0.83980439, 0.84726737, 0.85747,  0.85467221, 0.8556751,  0.84818516
        ,  0.85265681, 0.84502402, 0.82645665, 0.81743586, 0.83550406, 0.83338919
        ,  0.83511679, 0.82136617, 0.80921874],
         [ 0.95118156, 0.9466212,  0.94688098, 0.9508583,  0.9512441,  0.95440787
        ,  0.96364363, 0.96804412, 0.97136214, 0.97583768, 0.95571724, 0.96895368
        ,  0.97001634, 0.97082733, 0.98782366],
         [ 1.08910044, 1.08248968, 1.08492895, 1.08656923, 1.09454249, 1.10558188
        ,  1.1214086,  1.12292577, 1.13021031, 1.13342735, 1.14686068, 1.14502975
        ,  1.14474747, 1.14084037, 1.16142926],
         [ 1.06336033, 1.07365823, 1.08691496, 1.09764846, 1.11669863, 1.11856702
        ,  1.09764283, 1.08815849, 1.08044313, 1.09278827, 1.07003204, 1.08398066
        ,  1.09831768, 1.09298232, 1.09176125],
         [ 0.79772065, 0.78829196, 0.78581151, 0.77615922, 0.77035744, 0.77751194
        ,  0.79902974, 0.81437881, 0.80788828, 0.79603865, 0.78966436, 0.79949807
        ,  0.80172182, 0.82168155, 0.85587911],
         [ 1.0052447,  1.00007696, 1.00475899, 1.00613942, 1.00639561, 1.00162979
        ,  0.99860739, 1.00814981, 1.00574316, 0.99030032, 0.97682565, 0.97292596
        ,  0.96519561, 0.96173403, 0.95890284],
         [ 0.95808419, 0.9382568,  0.9654441,  0.95561201, 0.96987289, 0.96608031
        ,  0.99727185, 1.00781194, 1.03484236, 1.05333619, 1.0983263,  1.1704974
        ,  1.17025154, 1.18730553, 1.14242645]])
        self.assertTrue(np.allclose(result, expected))
        self.assertTrue(type(result) == type(expected))
        self.assertTrue(result.shape == expected.shape)
    def test_rebin_data(self):
        """Test rebin_data"""
        ## sample in double the time (even case since 10 % 2 = 0):
        ##   (0+1)/2, (2+3)/2, (4+5)/2, (6+7)/2, (8+9)/2
        ## = 0.5,     2.5,     4.5,     6.5,     8.5
        ans_even = np.array([(i + 0.5) * np.ones(10, dtype=float)
                             for i in range(0, 10, 2)]).T
        self.assertTrue(np.array_equal(std.rebin_data(self.time_data, 2), ans_even))
        ## sample in triple the time (uneven since 10 % 3 = 1):
        ##   (0+1+2)/3, (3+4+5)/3, (6+7+8)/3, (9)/1
        ## = 1,         4,         7,         9
        ans_odd  = np.array([i * np.ones(10, dtype=float)
                             for i in (1, 4, 7, 9)]).T
        self.assertTrue(np.array_equal(std.rebin_data(self.time_data, 3), ans_odd))
    def test_get_prob_dist(self):
        """Test get_prob_dist"""
        lag_indices = np.array([1, 2, 3, 4])
        unit_indices = np.array([1, 3, 2, 4])
        answer = np.array([
            [ 0.0754717 , 0.88207547, 0.04245283, 0.        , 0.        ],
            [ 0.        , 0.        , 0.09411765, 0.87058824, 0.03529412],
            [ 0.0049505 , 0.09405941, 0.77722772, 0.11881188, 0.0049505 ],
            [ 0.        , 0.        , 0.        , 0.02352941, 0.97647059]
        ])
        result = std.get_prob_dist(self.transition_matrix, lag_indices, unit_indices)
        self.assertTrue(np.array_equal(result, answer))
    def test_get_prob_stats(self):
        """Test get_prob_stats"""
        probs = np.array([
            [ 0.0754717 , 0.88207547, 0.04245283, 0.        , 0.        ],
            [ 0.        , 0.        , 0.09411765, 0.87058824, 0.03529412],
            [ 0.0049505 , 0.09405941, 0.77722772, 0.11881188, 0.0049505 ],
            [ 0.        , 0.        , 0.        , 0.02352941, 0.97647059]
        ])
        unit_indices = np.array([1, 3, 2, 4])
        answer_up = np.array([0.04245283, 0.03529412, 0.12376238, 0.])
        answer_down = np.array([0.0754717, 0.09411765, 0.0990099, 0.02352941])
        answer_trend = np.array([-0.03301887 / 0.88207547, -0.05882353 / 0.87058824,  0.02475248 / 0.77722772, -0.02352941 / 0.97647059])
        answer_volatility = np.array([ 0.34221495,  0.33705421,  0.29226542,  0.38834223])
        result = std.get_prob_stats(probs, unit_indices)
        result_up = result[0]
        result_down = result[1]
        result_trend = result[2]
        result_volatility = result[3]
        self.assertTrue(np.allclose(result_up, answer_up))
        self.assertTrue(np.allclose(result_down, answer_down))
        self.assertTrue(np.allclose(result_trend, answer_trend))
        self.assertTrue(np.allclose(result_volatility, answer_volatility))
--- a/src/pg/crankshaft.control
+++ b/src/pg/crankshaft.control
@ -1,5 +1,5 @@
 comment = 'CartoDB Spatial Analysis extension'
-default_version = '0.3.1'
+default_version = '0.4.2'
 requires = 'plpythonu, postgis'
 superuser = true
 schema = cdb_crankshaft
--- a/src/pg/sql/08_interpolation.sql
+++ b/src/pg/sql/08_interpolation.sql
@ -1,6 +1,8 @@
-- 0: nearest neighbor
+-- 0: nearest neighbor(s)
 -- 1: barycentric
 -- 2: IDW
 -- 3: kriging ---> TODO
 CREATE OR REPLACE FUNCTION CDB_SpatialInterpolation(
    IN query text,
@ -50,12 +52,19 @@ DECLARE
    vc numeric;
    output numeric;
 BEGIN
-    output :=  -999.999;
+    -- output :=  -999.999;
-    -- nearest
+
    -- nearest neighbors
    -- p1: limit the number of neighbors, 0-> closest one
    IF method = 0 THEN
-        WITH    a as (SELECT unnest(geomin) as g, unnest(colin) as v)
+        IF p1 = 0 THEN
-        SELECT a.v INTO output FROM a ORDER BY point<->a.g LIMIT 1;
+            p1 := 1;
        END IF;
        WITH    a as (SELECT unnest(geomin) as g, unnest(colin) as v),
                b as (SELECT a.v as v FROM a ORDER BY point<->a.g LIMIT p1::integer)
        SELECT avg(b.v) INTO output FROM b;
        RETURN output;
    -- barycentric
@ -121,6 +130,11 @@ BEGIN
        SELECT sum(b.f)/sum(b.k) INTO output FROM b;
        RETURN output;
    -- kriging
    ELSIF method = 3 THEN
    --  TO DO
    END IF;
    RETURN -777.777;
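The new `method = 0` branch averages the values of the `p1` closest input points, treating `p1 = 0` as "closest one only". A minimal Python sketch of that behaviour, assuming plain Euclidean distance on `(x, y)` tuples in place of the PostGIS `<->` operator; `nearest_neighbors_value` is a hypothetical name:

```python
import math


def nearest_neighbors_value(points, values, target, k=0):
    """Average the values of the k points closest to target (k=0 means 1)."""
    if not points:
        return None  # avg() over an empty set is NULL in SQL
    k = 1 if k == 0 else k
    # rank the (point, value) pairs by distance to the target
    ranked = sorted(zip(points, values),
                    key=lambda pv: math.dist(pv[0], target))
    nearest = [v for _, v in ranked[:k]]
    return sum(nearest) / len(nearest)


pts = [(0, 0), (1, 0), (5, 5)]
vals = [10, 20, 90]
closest = nearest_neighbors_value(pts, vals, (0.2, 0))
two_avg = nearest_neighbors_value(pts, vals, (0.2, 0), k=2)
all_avg = nearest_neighbors_value(pts, vals, (0.2, 0), k=3)
```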
--- a/src/pg/sql/10_moran.sql
+++ b/src/pg/sql/10_moran.sql
@ -10,7 +10,7 @@ CREATE OR REPLACE FUNCTION
      id_col TEXT DEFAULT 'cartodb_id')
 RETURNS TABLE (moran NUMERIC, significance NUMERIC)
 AS $$
-  from crankshaft.clustering import moran_local
+  from crankshaft.clustering import moran
  # TODO: use named parameters or a dictionary
  return moran(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col)
 $$ LANGUAGE plpythonu;
--- a/src/pg/sql/13_PIA.sql
+++ b/src/pg/sql/13_PIA.sql
@ -0,0 +1,123 @@
 -- Based on:
 -- https://github.com/mapbox/polylabel/blob/master/index.js
 -- https://sites.google.com/site/polesofinaccessibility/
 -- Requires: https://github.com/CartoDB/cartodb-postgresql
 CREATE OR REPLACE FUNCTION CDB_PIA(
    IN polygon geometry,
    IN tolerance numeric DEFAULT 1.0
    )
 RETURNS geometry  AS $$
 DECLARE
    env geometry[];
    cells geometry[];
    cell geometry;
    best_c geometry;
    best_d numeric;
    test_d numeric;
    test_mx numeric;
    test_h numeric;
    test_cells geometry[];
    width numeric;
    height numeric;
    h numeric;
    i integer;
    n integer;
    sqr numeric;
    p geometry;
 BEGIN
    sqr := |/2;
    polygon := ST_Transform(polygon, 3857);
    -- grid #0 cell size
    height := ST_YMax(polygon) - ST_YMin(polygon);
    width := ST_XMax(polygon) - ST_XMin(polygon);
    h := 0.5*LEAST(height, width);
    -- grid #0
    with c1 as(
        SELECT cdb_crankshaft.CDB_RectangleGrid(polygon, h, h) as c
    )
    SELECT array_agg(c) INTO cells FROM c1;
    -- 1st guess: centroid
    best_c := ST_Centroid(polygon);
    best_d := cdb_crankshaft._Signed_Dist(polygon, best_c);
    -- looping the loop
    n := array_length(cells,1);
    i := 1;
    LOOP
        EXIT WHEN i > n;
        cell := cells[i];
        i := i+1;
        -- cell side size, it's square
        test_h := ST_XMax(cell) - ST_XMin(cell) ;
        -- check distance
        test_d := cdb_crankshaft._Signed_Dist(polygon, ST_Centroid(cell));
        IF test_d > best_d THEN
            best_d := test_d;
            -- i was already incremented above, so cells[i] would be the next cell
            best_c := cell;
        END IF;
        -- longest distance within the cell
        test_mx := test_d + (test_h/2 * sqr);
        -- if the cell has no chance to contain the desired point, continue
        CONTINUE WHEN test_mx - best_d <= tolerance;
        -- resample the cell
        with c1 as(
            SELECT cdb_crankshaft.CDB_RectangleGrid(cell, test_h/2, test_h/2) as c
        )
        SELECT array_agg(c) INTO test_cells FROM c1;
        -- concat the new cells to the former array
        cells := cells || test_cells;
        -- prepare next iteration
        n := array_length(cells,1);
    END LOOP;
    RETURN ST_transform(ST_Centroid(best_c), 4326);
 END;
 $$ language plpgsql IMMUTABLE;
 -- signed distance point to polygon with holes
 -- negative if the point is outside the polygon
 CREATE OR REPLACE FUNCTION _Signed_Dist(
    IN polygon geometry,
    IN point geometry
    )
 RETURNS numeric  AS $$
 DECLARE
    i integer;
    within integer;
    holes integer;
    dist numeric;
 BEGIN
    dist := 1e999;
    SELECT LEAST(dist, ST_distance(point, ST_ExteriorRing(polygon))::numeric) INTO dist;
    SELECT CASE WHEN ST_Within(point,polygon) THEN 1 ELSE -1 END INTO within;
    SELECT ST_NumInteriorRings(polygon) INTO holes;
    IF holes > 0 THEN
        FOR i IN 1..holes
        LOOP
            SELECT LEAST(dist, ST_distance(point, ST_InteriorRingN(polygon, i))::numeric) INTO dist;
        END LOOP;
    END IF;
    dist := dist * within::numeric;
    RETURN dist;
 END;
 $$ language plpgsql IMMUTABLE;
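The search in `CDB_PIA` can be pictured with a small pure-Python sketch of the same grid-refinement idea. It is not a port: it assumes planar coordinates, takes the polygon as a list of rings (exterior first, holes after), seeds the search with the first grid cell instead of the centroid, and splits a cell into four quadrants directly rather than calling `CDB_RectangleGrid`. All function names are hypothetical.

```python
import math


def _seg_dist(p, a, b):
    """Distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def signed_dist(rings, p):
    """Distance to the nearest ring edge, negative when p is outside."""
    inside, best = False, math.inf
    for ring in rings:
        for a, b in zip(ring, ring[1:] + ring[:1]):
            # even-odd ray casting over all rings also handles holes
            if (a[1] > p[1]) != (b[1] > p[1]) and \
                    p[0] < (b[0] - a[0]) * (p[1] - a[1]) / (b[1] - a[1]) + a[0]:
                inside = not inside
            best = min(best, _seg_dist(p, a, b))
    return best if inside else -best


def pole_of_inaccessibility(rings, tolerance=1.0):
    xs = [x for x, _ in rings[0]]
    ys = [y for _, y in rings[0]]
    h = 0.5 * min(max(xs) - min(xs), max(ys) - min(ys))
    # initial grid: (centre x, centre y, half side) per square cell
    cells = []
    x = min(xs)
    while x < max(xs):
        y = min(ys)
        while y < max(ys):
            cells.append((x + h / 2, y + h / 2, h / 2))
            y += h
        x += h
    best_p, best_d = None, -math.inf
    while cells:
        cx, cy, r = cells.pop()
        d = signed_dist(rings, (cx, cy))
        if d > best_d:
            best_p, best_d = (cx, cy), d
        # d + r*sqrt(2) bounds the distance reachable anywhere in the cell
        if d + r * math.sqrt(2) - best_d <= tolerance:
            continue
        half = r / 2
        cells.extend((cx + sx * half, cy + sy * half, half)
                     for sx in (-1, 1) for sy in (-1, 1))
    return best_p, best_d


rect = [[(0, 0), (10, 0), (10, 4), (0, 4)]]
pole, radius = pole_of_inaccessibility(rect, tolerance=0.1)
```

For the 10 by 4 rectangle the largest inscribed circle has radius 2, so `radius` should land within the tolerance of that value.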
--- a/src/pg/sql/14_densify.sql
+++ b/src/pg/sql/14_densify.sql
@ -0,0 +1,67 @@
 --
 -- Iterative densification of a set of points using Delaunay triangulation
 -- each new point is assigned the average value of the 3 vertices of its triangle (centroid)
 --
 -- @param geomin - array of geometries (points)
 --
 -- @param colin - array of numeric values at those points
 --
 -- @param iterations - integer, number of iterations
 --
 --
 -- Returns: TABLE(geomout geometry, colout numeric)
 --
 --
 CREATE OR REPLACE FUNCTION CDB_Densify(
    IN geomin geometry[],
    IN colin numeric[],
    IN iterations integer
    )
 RETURNS TABLE(geomout geometry, colout numeric)  AS $$
 DECLARE
    geotemp geometry[];
    coltemp numeric[];
    i integer;
    gs geometry[];
    g geometry;
    vertex geometry[];
    va numeric;
    vb numeric;
    vc numeric;
    center geometry;
    centerval numeric;
    tmp integer;
 BEGIN
    geotemp := geomin;
    coltemp := colin;
    FOR i IN 1..iterations
    LOOP
        -- generate TIN
        WITH    a as (SELECT unnest(geotemp) AS e),
                b as (SELECT ST_DelaunayTriangles(ST_Collect(a.e),0.001, 0) AS t FROM a),
                c as (SELECT (ST_Dump(t)).geom AS v FROM b)
        SELECT array_agg(v) INTO gs FROM c;
        -- loop cells
        FOREACH g IN ARRAY gs
        LOOP
            -- append centroid
            SELECT ST_Centroid(g) INTO center;
            geotemp := array_append(geotemp, center);
            -- retrieve the value of each vertex
            WITH a AS (SELECT (ST_DumpPoints(g)).geom AS v)
            SELECT array_agg(v) INTO vertex FROM a;
            WITH a AS(SELECT unnest(geotemp) as geo, unnest(coltemp) as c)
            SELECT c INTO va FROM a WHERE ST_Equals(geo, vertex[1]);
            WITH a AS(SELECT unnest(geotemp) as geo, unnest(coltemp) as c)
            SELECT c INTO vb FROM a WHERE ST_Equals(geo, vertex[2]);
            WITH a AS(SELECT unnest(geotemp) as geo, unnest(coltemp) as c)
            SELECT c INTO vc FROM a WHERE ST_Equals(geo, vertex[3]);
            -- calc the value at the center
            centerval := (va + vb + vc) / 3;
            -- append the value
            coltemp := array_append(coltemp, centerval);
        END LOOP;
    END LOOP;
    RETURN QUERY SELECT unnest(geotemp ) as geomout, unnest(coltemp ) as colout;
 END;
 $$ language plpgsql IMMUTABLE;
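One iteration of `CDB_Densify` adds, for every triangle, its centroid as a new point whose value is the mean of the three vertex values. A minimal Python sketch of a single iteration, assuming the triangulation is computed elsewhere and passed in as index triples (the SQL obtains it from `ST_DelaunayTriangles`); `densify_once` is a hypothetical name:

```python
def densify_once(points, values, triangles):
    """Append one centroid point and its averaged value per triangle."""
    new_points = list(points)
    new_values = list(values)
    for i, j, k in triangles:
        cx = (points[i][0] + points[j][0] + points[k][0]) / 3
        cy = (points[i][1] + points[j][1] + points[k][1]) / 3
        new_points.append((cx, cy))
        # value at the centroid: mean of the three vertex values
        new_values.append((values[i] + values[j] + values[k]) / 3)
    return new_points, new_values


dense_points, dense_values = densify_once(
    [(0, 0), (3, 0), (0, 3)], [3, 6, 9], [(0, 1, 2)])
```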
--- a/src/pg/sql/15_tinmap.sql
+++ b/src/pg/sql/15_tinmap.sql
@ -0,0 +1,43 @@
 CREATE OR REPLACE FUNCTION CDB_TINmap(
    IN geomin geometry[],
    IN colin numeric[],
    IN iterations integer
    )
 RETURNS TABLE(geomout geometry, colout numeric)  AS $$
 DECLARE
    p geometry[];
    vals numeric[];
    gs geometry[];
    g geometry;
    vertex geometry[];
    centerval numeric;
    va numeric;
    vb numeric;
    vc numeric;
    coltemp numeric[];
 BEGIN
    SELECT array_agg(dens.geomout), array_agg(dens.colout) INTO p, vals FROM cdb_crankshaft.CDB_Densify(geomin, colin, iterations) dens;
    WITH    a as (SELECT unnest(p) AS e),
            b as (SELECT ST_DelaunayTriangles(ST_Collect(a.e),0.001, 0) AS t FROM a),
            c as (SELECT (ST_Dump(t)).geom AS v FROM b)
        SELECT array_agg(v) INTO gs FROM c;
    FOREACH g IN ARRAY gs
    LOOP
        -- retrieve the vertices of each triangle
        WITH a AS (SELECT (ST_DumpPoints(g)).geom AS v)
            SELECT array_agg(v) INTO vertex FROM a;
        -- retrieve the value of each vertex
        WITH a AS(SELECT unnest(p) as geo, unnest(vals) as c)
            SELECT c INTO va FROM a WHERE ST_Equals(geo, vertex[1]);
        WITH a AS(SELECT unnest(p) as geo, unnest(vals) as c)
            SELECT c INTO vb FROM a WHERE ST_Equals(geo, vertex[2]);
        WITH a AS(SELECT unnest(p) as geo, unnest(vals) as c)
            SELECT c INTO vc FROM a WHERE ST_Equals(geo, vertex[3]);
        -- calc the value at the center
        centerval := (va + vb + vc) / 3;
        -- append the value
        coltemp := array_append(coltemp, centerval);
    END LOOP;
    RETURN QUERY SELECT unnest(gs) as geomout, unnest(coltemp ) as colout;
 END;
 $$ language plpgsql IMMUTABLE;
--- a/src/pg/sql/19_contour.sql
+++ b/src/pg/sql/19_contour.sql
@ -0,0 +1,208 @@
 CREATE OR REPLACE FUNCTION CDB_Contour(
    IN geomin geometry[],
    IN colin numeric[],
    IN buffer numeric,
    IN intmethod integer,
    IN classmethod integer,
    IN steps integer,
    IN max_time integer DEFAULT 60000
    )
 RETURNS TABLE(
    the_geom geometry,
    bin integer,
    min_value numeric,
    max_value numeric,
    avg_value numeric
 )  AS $$
 DECLARE
    cell_count integer;
    tin geometry[];
    resolution integer;
 BEGIN
    -- nasty trick to override issue #121
    IF max_time = 0 THEN
        max_time = -90;
    END IF;
    resolution := max_time;
    max_time := -1 * resolution;
    -- calc the optimal number of cells for the current dataset
    SELECT
    CASE intmethod
        WHEN 0 THEN round(3.7745903782 * max_time - 9.4399210051 * array_length(geomin,1) - 1350.8778213073)
        WHEN 1 THEN round(2.2855592156 * max_time - 87.285217133 * array_length(geomin,1) + 17255.7085601797)
        WHEN 2 THEN  round(0.9799471999 * max_time - 127.0334085369 * array_length(geomin,1) + 22707.9579721218)
        ELSE 10000
    END INTO cell_count;
    -- we don't have iterative barycentric interpolation in CDB_interpolation,
    --    and it's a costly function, so let's make a custom one here till
    --    we update the code
    -- tin := ARRAY[]::geometry[];
    IF intmethod=1 THEN
        WITH
            a as (SELECT unnest(geomin) AS e),
            b as (SELECT ST_DelaunayTriangles(ST_Collect(a.e),0.001, 0) AS t FROM a),
            c as (SELECT (ST_Dump(t)).geom as v FROM b)
        SELECT array_agg(v) INTO tin FROM c;
    END IF;
    -- Delaunay stuff performed just ONCE!!
    -- magic
    RETURN QUERY
    WITH
    convexhull as (
        SELECT
            ST_ConvexHull(ST_Collect(geomin)) as g,
            buffer * |/ st_area(ST_ConvexHull(ST_Collect(geomin)))/PI() as r
    ),
    envelope as (
        SELECT
            st_expand(a.g, a.r) as e
        FROM  convexhull a
    ),
    envelope3857 as(
        SELECT
            ST_Transform(e, 3857) as geom
        FROM envelope
    ),
    resolution as(
        SELECT
            CASE WHEN resolution <= 0  THEN
                round(|/ (
                 ST_area(geom) / abs(cell_count)
             ))
            ELSE
                resolution
            END AS cell
        FROM envelope3857
    ),
    grid as(
        SELECT
            ST_Transform(cdb_crankshaft.CDB_RectangleGrid(e.geom, r.cell, r.cell), 4326) as geom
        FROM envelope3857 e, resolution r
    ),
    interp as(
        SELECT
            geom,
            CASE
                WHEN intmethod=1 THEN cdb_crankshaft._interp_in_tin(geomin, colin, tin, ST_Centroid(geom))
                ELSE cdb_crankshaft.CDB_SpatialInterpolation(geomin, colin, ST_Centroid(geom), intmethod)
            END as val
        FROM grid
    ),
    classes as(
        SELECT CASE
            WHEN classmethod = 0 THEN
                cdb_crankshaft.CDB_EqualIntervalBins(array_agg(val), steps)
            WHEN classmethod = 1 THEN
                cdb_crankshaft.CDB_HeadsTailsBins(array_agg(val), steps)
            WHEN classmethod = 2 THEN
                cdb_crankshaft.CDB_JenksBins(array_agg(val), steps)
            ELSE
                cdb_crankshaft.CDB_QuantileBins(array_agg(val), steps)
            END as b
        FROM interp
        where val is not null
    ),
    classified as(
        SELECT
        i.*,
        width_bucket(i.val, c.b) as bucket
        FROM interp i left join classes c
        ON 1=1
    ),
    classified2 as(
        SELECT
            geom,
            val,
            CASE
                WHEN bucket = steps THEN bucket - 1
                ELSE bucket
            END as b
        FROM classified
    ),
    final as(
        SELECT
            st_union(geom) as the_geom,
            b as bin,
            min(val) as min_value,
            max(val) as max_value,
            avg(val) as avg_value
        FROM classified2
        GROUP BY bin
    )
    SELECT
        *
    FROM final
    where final.bin is not null
    ;
 END;
 $$ language plpgsql;
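The way `CDB_Contour` turns its last argument into a grid cell size is easy to misread, because the parameter is still called `max_time` while a positive value now acts as the resolution. A Python sketch of that logic as read from the function body, with the coefficients copied from the `CASE` expression; `contour_cell_size` and `CELL_COUNT_MODEL` are hypothetical names, and Python's `round` may differ from PostgreSQL's on exact halves:

```python
import math

# (per unit of max_time, per input point, intercept), keyed by intmethod
CELL_COUNT_MODEL = {
    0: (3.7745903782, -9.4399210051, -1350.8778213073),
    1: (2.2855592156, -87.285217133, 17255.7085601797),
    2: (0.9799471999, -127.0334085369, 22707.9579721218),
}


def contour_cell_size(area_m2, n_points, intmethod, resolution):
    """Cell side for the grid covering an envelope of area_m2 (EPSG:3857)."""
    if resolution == 0:
        resolution = -90          # the workaround for issue #121
    if resolution > 0:
        return resolution         # caller fixed the cell size explicitly
    max_time = -resolution        # negative input: treat it as a time budget
    if intmethod in CELL_COUNT_MODEL:
        a, b, c = CELL_COUNT_MODEL[intmethod]
        cell_count = round(a * max_time + b * n_points + c)
    else:
        cell_count = 10000
    # like the SQL, this divides by zero if the model yields 0 cells
    return round(math.sqrt(area_m2 / abs(cell_count)))


fixed = contour_cell_size(1e10, 10, 0, 500)
derived = contour_cell_size(1e10, 10, 5, -1)
```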
 -- =====================================================================
 -- Interp in grid, so we can use barycentric with a precalculated tin (NNI)
 -- =====================================================================
 CREATE OR REPLACE FUNCTION _interp_in_tin(
    IN geomin geometry[],
    IN colin numeric[],
    IN tin geometry[],
    IN point geometry
    )
 RETURNS numeric AS
 $$
 DECLARE
    g geometry;
    vertex geometry[];
    sg numeric;
    sa numeric;
    sb numeric;
    sc numeric;
    va numeric;
    vb numeric;
    vc numeric;
    output numeric;
 BEGIN
    -- get the cell the point is within
    WITH
        a as (SELECT unnest(tin) as v),
        b as (SELECT v FROM a WHERE ST_Within(point, v))
    SELECT v INTO g FROM b;
    -- if we're out of the data realm,
    -- return null
    IF g is null THEN
        RETURN null;
    END IF;
    -- vertex of the selected cell
    WITH a AS (
        SELECT (ST_DumpPoints(g)).geom AS v
    )
    SELECT array_agg(v) INTO vertex FROM a;
    -- retrieve the value of each vertex
    WITH a AS(SELECT unnest(geomin) as geo, unnest(colin) as c)
        SELECT c INTO va FROM a WHERE ST_Equals(geo, vertex[1]);
    WITH a AS(SELECT unnest(geomin) as geo, unnest(colin) as c)
        SELECT c INTO vb FROM a WHERE ST_Equals(geo, vertex[2]);
    WITH a AS(SELECT unnest(geomin) as geo, unnest(colin) as c)
            SELECT c INTO vc FROM a WHERE ST_Equals(geo, vertex[3]);
    -- calc the areas
    SELECT
        ST_area(g),
        ST_area(ST_MakePolygon(ST_MakeLine(ARRAY[point, vertex[2], vertex[3], point]))),
        ST_area(ST_MakePolygon(ST_MakeLine(ARRAY[point, vertex[1], vertex[3], point]))),
        ST_area(ST_MakePolygon(ST_MakeLine(ARRAY[point,vertex[1],vertex[2], point]))) INTO sg, sa, sb, sc;
    output := (coalesce(sa,0) * coalesce(va,0) + coalesce(sb,0) * coalesce(vb,0) + coalesce(sc,0) * coalesce(vc,0)) / coalesce(sg,1);
    RETURN output;
 END;
 $$
 language plpgsql;
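`_interp_in_tin` weights each vertex value by the area of the sub-triangle opposite that vertex and divides by the area of the whole triangle. A minimal Python sketch of that area-weighted (barycentric) interpolation, assuming planar `(x, y)` tuples and a point already known to lie inside the triangle; the function names are hypothetical:

```python
def tri_area(a, b, c):
    """Unsigned area of the triangle a-b-c."""
    return abs((b[0] - a[0]) * (c[1] - a[1])
               - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def barycentric_value(tri, vals, p):
    """Interpolate vals (one per vertex of tri) at point p inside tri."""
    a, b, c = tri
    total = tri_area(a, b, c)
    if total == 0:
        return None  # degenerate triangle
    wa = tri_area(p, b, c)  # sub-triangle opposite vertex a
    wb = tri_area(p, a, c)  # sub-triangle opposite vertex b
    wc = tri_area(p, a, b)  # sub-triangle opposite vertex c
    return (wa * vals[0] + wb * vals[1] + wc * vals[2]) / total


tri = ((0, 0), (4, 0), (0, 4))
at_vertex = barycentric_value(tri, (10, 20, 30), (0, 0))
at_edge_mid = barycentric_value(tri, (10, 20, 30), (2, 0))
at_centroid = barycentric_value(tri, (10, 20, 30), (4 / 3, 4 / 3))
```

At a vertex the result is that vertex's value, and at the centroid it is the plain mean of the three values.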
--- a/src/pg/sql/cdb_utils.sql
+++ b/src/pg/sql/cdb_utils.sql
@ -0,0 +1,447 @@
 --
 -- Fill given extent with a rectangular coverage
 --
 -- @param ext Extent to fill. Only rectangles with center point falling
 --            inside the extent (or at the lower or leftmost edge) will
 --            be emitted. The returned rectangles will have the same SRID
 --            as this extent.
 --
 -- @param width Width of each rectangle
 --
 -- @param height Height of each rectangle
 --
 -- @param origin Optional origin to allow for exact tiling.
 --               If omitted the origin will be 0,0.
 --               The parameter is checked for having the same SRID
 --               as the extent.
 --
 --
 CREATE OR REPLACE FUNCTION CDB_RectangleGrid(ext GEOMETRY, width FLOAT8, height FLOAT8, origin GEOMETRY DEFAULT NULL)
 RETURNS SETOF GEOMETRY
 AS $$
 DECLARE
  h GEOMETRY; -- rectangle cell
  hstep FLOAT8; -- horizontal step
  vstep FLOAT8; -- vertical step
  hw FLOAT8; -- half width
  hh FLOAT8; -- half height
  vstart FLOAT8;
  hstart FLOAT8;
  hend FLOAT8;
  vend FLOAT8;
  xoff FLOAT8;
  yoff FLOAT8;
  xgrd FLOAT8;
  ygrd FLOAT8;
  x FLOAT8;
  y FLOAT8;
  srid INTEGER;
 BEGIN
  srid := ST_SRID(ext);
  xoff := 0;
  yoff := 0;
  IF origin IS NOT NULL THEN
    IF ST_SRID(origin) != srid THEN
      RAISE EXCEPTION 'SRID mismatch between extent (%) and origin (%)', srid, ST_SRID(origin);
    END IF;
    xoff := ST_X(origin);
    yoff := ST_Y(origin);
  END IF;
  --RAISE DEBUG 'X offset: %', xoff;
  --RAISE DEBUG 'Y offset: %', yoff;
  hw := width/2.0;
  hh := height/2.0;
  xgrd := hw;
  ygrd := hh;
  --RAISE DEBUG 'X grid size: %', xgrd;
  --RAISE DEBUG 'Y grid size: %', ygrd;
  hstep := width;
  vstep := height;
  -- Tweak horizontal start on hstep grid from origin
  hstart := xoff + ceil((ST_XMin(ext)-xoff)/hstep)*hstep;
  --RAISE DEBUG 'hstart: %', hstart;
  -- Tweak vertical start on vstep grid from origin
  vstart := yoff + ceil((ST_Ymin(ext)-yoff)/vstep)*vstep;
  --RAISE DEBUG 'vstart: %', vstart;
  hend := ST_XMax(ext);
  vend := ST_YMax(ext);
  --RAISE DEBUG 'hend: %', hend;
  --RAISE DEBUG 'vend: %', vend;
  x := hstart;
  WHILE x < hend LOOP -- over X
    y := vstart;
    h := ST_MakeEnvelope(x-hw, y-hh, x+hw, y+hh, srid);
    WHILE y < vend LOOP -- over Y
      RETURN NEXT h;
      h := ST_Translate(h, 0, vstep);
      y := yoff + round(((y + vstep)-yoff)/ygrd)*ygrd; -- round to grid
    END LOOP;
    x := xoff + round(((x + hstep)-xoff)/xgrd)*xgrd; -- round to grid
  END LOOP;
  RETURN;
 END
 $$ LANGUAGE 'plpgsql' IMMUTABLE;
 --
 -- Calculate the equal interval bins for a given column
 --
 -- @param in_array A numeric array of numbers from which to determine
 --                   the bin boundaries
 --
 -- @param breaks The number of bins you want to find.
 --
 --
 -- Returns: upper edges of bins
 --
 --
 CREATE OR REPLACE FUNCTION CDB_EqualIntervalBins ( in_array NUMERIC[], breaks INT ) RETURNS NUMERIC[] as $$
 DECLARE
    diff numeric;
    min_val numeric;
    max_val numeric;
    tmp_val numeric;
    i INT := 1;
    reply numeric[];
 BEGIN
    SELECT min(e), max(e) INTO min_val, max_val FROM ( SELECT unnest(in_array) e ) x WHERE e IS NOT NULL;
    diff = (max_val - min_val) / breaks::numeric;
    LOOP
        IF i < breaks THEN
            tmp_val = min_val + i::numeric * diff;
            reply = array_append(reply, tmp_val);
            i := i+1;
        ELSE
            reply = array_append(reply, max_val);
            EXIT;
        END IF;
    END LOOP;
    RETURN reply;
 END;
 $$ language plpgsql IMMUTABLE;
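`CDB_EqualIntervalBins` returns the upper edge of each bin: `breaks - 1` evenly spaced values followed by the maximum. A minimal Python sketch of the same computation, assuming a non-empty input; `equal_interval_bins` is a hypothetical name:

```python
def equal_interval_bins(values, breaks):
    """Upper edges of `breaks` equal-width bins over the non-null values."""
    data = [v for v in values if v is not None]
    lo, hi = min(data), max(data)
    step = (hi - lo) / breaks
    # interior edges first, then the maximum as the last upper edge
    return [lo + i * step for i in range(1, breaks)] + [hi]


edges = equal_interval_bins([0, 10, 4, None], 4)
```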
 --
 -- Determine the Heads/Tails classifications from a numeric array
 --
 -- @param in_array A numeric array of numbers to determine the best
 --            bins based on the Heads/Tails method.
 --
 -- @param breaks The number of bins you want to find.
 --
 --
 CREATE OR REPLACE FUNCTION CDB_HeadsTailsBins ( in_array NUMERIC[], breaks INT) RETURNS NUMERIC[] as $$
 DECLARE
    element_count INT4;
    arr_mean numeric;
    i INT := 2;
    reply numeric[];
 BEGIN
    -- get the total size of our row
    element_count := array_upper(in_array, 1) - array_lower(in_array, 1);
    -- ensure the ordering of in_array
    SELECT array_agg(e) INTO in_array FROM (SELECT unnest(in_array) e ORDER BY e) x;
    -- stop if no rows
    IF element_count IS NULL THEN
        RETURN NULL;
    END IF;
    -- stop if our breaks are more than our input array size
    IF element_count < breaks THEN
        RETURN in_array;
    END IF;
    -- get our mean value
    SELECT avg(v) INTO arr_mean FROM (  SELECT unnest(in_array) as v ) x;
    reply = Array[arr_mean];
    -- slice our bread
    LOOP
        IF i > breaks THEN  EXIT;  END IF;
        SELECT avg(e) INTO arr_mean FROM ( SELECT unnest(in_array) e) x WHERE e > reply[i-1];
        IF arr_mean IS NOT NULL THEN
            reply = array_append(reply, arr_mean);
        END IF;
        i := i+1;
    END LOOP;
    RETURN reply;
 END;
 $$ language plpgsql IMMUTABLE;
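The heads/tails loop repeatedly takes the mean of whatever lies above the previous break. A minimal Python sketch of that core loop; it leaves out the SQL function's early return for inputs with fewer elements than breaks, and `heads_tails_bins` is a hypothetical name:

```python
def heads_tails_bins(values, breaks):
    """Breaks at the mean, then at the mean of each successive tail."""
    data = sorted(values)
    if not data:
        return None
    bins = [sum(data) / len(data)]
    while len(bins) < breaks:
        tail = [v for v in data if v > bins[-1]]
        if not tail:
            break  # nothing left above the last break
        bins.append(sum(tail) / len(tail))
    return bins


ht = heads_tails_bins([1, 1, 1, 2, 2, 3, 10], 3)
```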
 --
 -- Determine the Jenks classifications from a numeric array
 --
 -- @param in_array A numeric array of numbers to determine the best
 --            bins based on the Jenks method.
 --
 -- @param breaks The number of bins you want to find.
 --
 -- @param iterations The number of different starting positions to test.
 --
 -- @param invert Optional whether to return the top of each bin (default)
 --               or the bottom. BOOLEAN, default=FALSE.
 --
 --
 CREATE OR REPLACE FUNCTION CDB_JenksBins ( in_array NUMERIC[], breaks INT, iterations INT DEFAULT 5, invert BOOLEAN DEFAULT FALSE) RETURNS NUMERIC[] as $$
 DECLARE
    element_count INT4;
    arr_mean NUMERIC;
    bot INT;
    top INT;
    tops INT[];
    classes INT[][];
    i INT := 1; j INT := 1;
    curr_result NUMERIC[];
    best_result NUMERIC[];
    seedtarget TEXT;
    quant NUMERIC[];
    shuffles INT;
 BEGIN
    -- get the total size of our row
    element_count := array_length(in_array, 1); --array_upper(in_array, 1) - array_lower(in_array, 1);
    -- ensure the ordering of in_array
    SELECT array_agg(e) INTO in_array FROM (SELECT unnest(in_array) e ORDER BY e) x;
    -- stop if no rows
    IF element_count IS NULL THEN
        RETURN NULL;
    END IF;
    -- stop if our breaks are more than our input array size
    IF element_count < breaks THEN
        RETURN in_array;
    END IF;
    shuffles := LEAST(GREATEST(floor(2500000.0/(element_count::float*iterations::float)), 1), 750)::int;
    -- get our mean value
    SELECT avg(v) INTO arr_mean FROM (  SELECT unnest(in_array) as v ) x;
    -- assume best is actually Quantile
    SELECT cdb_crankshaft.CDB_QuantileBins(in_array, breaks) INTO quant;
    -- if data is very very large, just return quant and be done
    IF element_count > 5000000 THEN
        RETURN quant;
    END IF;
    -- change quant into bottom, top markers
    LOOP
        IF i = 1 THEN
            bot = 1;
        ELSE
            -- use last top to find this bot
            bot = top+1;
        END IF;
        IF i = breaks THEN
            top = element_count;
        ELSE
            SELECT count(*) INTO top FROM ( SELECT unnest(in_array) as v) x WHERE v <= quant[i];
        END IF;
        IF i = 1 THEN
            classes = ARRAY[ARRAY[bot,top]];
        ELSE
            classes = ARRAY_CAT(classes,ARRAY[bot,top]);
        END IF;
        IF i > breaks THEN EXIT; END IF;
        i = i+1;
    END LOOP;
    best_result = cdb_crankshaft.CDB_JenksBinsIteration( in_array, breaks, classes, invert, element_count, arr_mean, shuffles);
    --set the seed so we can ensure the same results
    SELECT setseed(0.4567) INTO seedtarget;
    --loop through random starting positions
    LOOP
        IF j > iterations-1 THEN  EXIT;  END IF;
        i = 1;
        tops = ARRAY[element_count];
        LOOP
            IF i = breaks THEN  EXIT;  END IF;
            SELECT array_agg(distinct e) INTO tops FROM (SELECT unnest(array_cat(tops, ARRAY[floor(random()*element_count::float)::int])) as e ORDER BY e) x WHERE e != 1;
            i = array_length(tops, 1);
        END LOOP;
        i = 1;
        LOOP
            IF i > breaks THEN  EXIT;  END IF;
            IF i = 1 THEN
                bot = 1;
            ELSE
                bot = top+1;
            END IF;
            top = tops[i];
            IF i = 1 THEN
                classes = ARRAY[ARRAY[bot,top]];
            ELSE
                classes = ARRAY_CAT(classes,ARRAY[bot,top]);
            END IF;
            i := i+1;
        END LOOP;
        curr_result = cdb_crankshaft.CDB_JenksBinsIteration( in_array, breaks, classes, invert, element_count, arr_mean, shuffles);
        IF curr_result[1] > best_result[1] THEN
            best_result = curr_result;
            j = j-1; -- if we found a better result, add one more search
        END IF;
        j = j+1;
    END LOOP;
    RETURN (best_result)[2:array_upper(best_result, 1)];
 END;
 $$ language plpgsql IMMUTABLE;
 --
 -- Perform a single iteration of the Jenks classification
 --
 CREATE OR REPLACE FUNCTION CDB_JenksBinsIteration ( in_array NUMERIC[], breaks INT, classes INT[][], invert BOOLEAN, element_count INT4, arr_mean NUMERIC, max_search INT DEFAULT 50) RETURNS NUMERIC[] as $$
 DECLARE
    tmp_val numeric;
    new_classes int[][];
    tmp_class int[];
    i INT := 1;
    j INT := 1;
    side INT := 2;
    sdam numeric;
    gvf numeric := 0.0;
    new_gvf numeric;
    arr_gvf numeric[];
    class_avg numeric;
    class_max_i INT;
    class_min_i INT;
    class_max numeric;
    class_min numeric;
    reply numeric[];
 BEGIN
    -- Calculate the sum of squared deviations from the array mean (SDAM).
    SELECT sum((arr_mean - e)^2) INTO sdam FROM (  SELECT unnest(in_array) as e ) x;
    --Identify the breaks for the lowest GVF
    LOOP
        i = 1;
        LOOP
            -- get our mean
            SELECT avg(e) INTO class_avg FROM ( SELECT unnest(in_array[classes[i][1]:classes[i][2]]) as e) x;
            -- find the deviation
            SELECT sum((class_avg-e)^2) INTO tmp_val FROM (   SELECT unnest(in_array[classes[i][1]:classes[i][2]]) as e  ) x;
            IF i = 1 THEN
                arr_gvf = ARRAY[tmp_val];
                -- init our min/max map for later
                class_max = arr_gvf[i];
                class_min = arr_gvf[i];
                class_min_i = 1;
                class_max_i = 1;
            ELSE
                arr_gvf = array_append(arr_gvf, tmp_val);
            END IF;
            i := i+1;
            IF i > breaks THEN EXIT; END IF;
        END LOOP;
        -- calculate our new GVF
        SELECT sdam-sum(e) INTO new_gvf FROM (  SELECT unnest(arr_gvf) as e  ) x;
        -- if no improvement was made, exit
        IF new_gvf < gvf THEN EXIT; END IF;
        gvf = new_gvf;
        IF j > max_search THEN EXIT; END IF;
        j = j+1;
        i = 1;
        LOOP
            --establish directionality (upward through classes or downward)
            IF arr_gvf[i] < class_min THEN
                class_min = arr_gvf[i];
                class_min_i = i;
            END IF;
            IF arr_gvf[i] > class_max THEN
                class_max = arr_gvf[i];
                class_max_i = i;
            END IF;
            i := i+1;
            IF i > breaks THEN EXIT; END IF;
        END LOOP;
        IF class_max_i > class_min_i THEN
            class_min_i = class_max_i - 1;
        ELSE
            class_min_i = class_max_i + 1;
        END IF;
            --Move from higher class to a lower gid order
            IF class_max_i > class_min_i THEN
                classes[class_max_i][1] = classes[class_max_i][1] + 1;
                classes[class_min_i][2] = classes[class_min_i][2] + 1;
            ELSE -- Move from lower class UP into a higher class by gid
                classes[class_max_i][2] = classes[class_max_i][2] - 1;
                classes[class_min_i][1] = classes[class_min_i][1] - 1;
            END IF;
    END LOOP;
    i = 1;
    LOOP
        IF invert = TRUE THEN
            side = 1; --default returns top side of breaks, invert returns bottom side
        END IF;
        reply = array_append(reply, in_array[classes[i][side]]);
        i = i+1;
        IF i > breaks THEN  EXIT; END IF;
    END LOOP;
    RETURN array_prepend(gvf, reply);
 END;
 $$ language plpgsql IMMUTABLE;
 --
 -- Determine the Quantile classifications from a numeric array
 --
 -- @param in_array A numeric array of numbers to determine the best
 --            bins based on the Quantile method.
 --
 -- @param breaks The number of bins you want to find.
 --
 --
 CREATE OR REPLACE FUNCTION CDB_QuantileBins ( in_array NUMERIC[], breaks INT) RETURNS NUMERIC[] as $$
 DECLARE
    element_count INT4;
    break_size numeric;
    tmp_val numeric;
    i INT := 1;
    reply numeric[];
 BEGIN
    -- sort our values
    SELECT array_agg(e) INTO in_array FROM (SELECT unnest(in_array) e ORDER BY e ASC) x;
    -- get the total size of our data
    element_count := array_length(in_array, 1);
    break_size :=  element_count::numeric / breaks;
    -- slice our bread
    LOOP
        IF i < breaks THEN
            IF break_size * i % 1 > 0 THEN
                SELECT e INTO tmp_val FROM ( SELECT unnest(in_array) e LIMIT 1 OFFSET ceil(break_size * i) - 1) x;
            ELSE
                SELECT avg(e) INTO tmp_val FROM ( SELECT unnest(in_array) e LIMIT 2 OFFSET ceil(break_size * i) - 1 ) x;
            END IF;
        ELSIF i = breaks THEN
            -- select the last value
            SELECT max(e) INTO tmp_val FROM ( SELECT unnest(in_array) e ) x;
        ELSE
            EXIT;
        END IF;
        reply = array_append(reply, tmp_val);
        i := i+1;
    END LOOP;
    RETURN reply;
 END;
 $$ language plpgsql IMMUTABLE;
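`CDB_QuantileBins` picks one sorted value when the break position is fractional and averages two neighbouring values when it is a whole number, then closes with the maximum. A minimal Python sketch, assuming a non-empty input with at least as many elements as breaks; it uses exact integer arithmetic, so it may differ from the SQL where `numeric` division rounds, and `quantile_bins` is a hypothetical name:

```python
def quantile_bins(values, breaks):
    """Upper edges of `breaks` quantile bins over the sorted values."""
    data = sorted(values)
    n = len(data)
    bins = []
    for i in range(1, breaks):
        pos = -(-n * i // breaks)      # ceil(n * i / breaks)
        if (n * i) % breaks:           # fractional position: take one value
            bins.append(data[pos - 1])
        else:                          # whole position: average two values
            pair = data[pos - 1:pos + 1]
            bins.append(sum(pair) / len(pair))
    bins.append(data[-1])              # last edge is the maximum
    return bins


quartiles = quantile_bins(range(1, 11), 4)
```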
--- a/src/pg/test/expected/02_moran_test.out
+++ b/src/pg/test/expected/02_moran_test.out
@ -5,6 +5,12 @@ SET client_min_messages TO WARNING;
 \set ECHO none
 _cdb_random_seeds
 (1 row)
 moran|significance
 0.3399|-0.0196
 (1 row)
 _cdb_random_seeds
 (1 row)
 code|quads
 01|HH
@ -0,0 +1,2 @@
 """Import all functions for pysal_utils"""
 from crankshaft.pysal_utils.pysal_utils import *
@ -0,0 +1,2 @@
 """Import all functions from clustering libraries."""
 from markov import *
@ -0,0 +1 @@
							[{"xs": [9.917239463463458, 9.042767302696836, 10.798929825304187, 8.763751051762995, 11.383882954810852, 11.018206993460897, 8.939526075734316, 9.636159342565252, 10.136336896960058, 11.480610059427342, 12.115011910725082, 9.173267848893428, 10.239300931201738, 8.00012512174072, 8.979962292282131, 9.318376124429575, 10.82259513754284, 10.391747171927115, 10.04904588886165, 9.96007160443463, -0.78825626804569, -0.3511819898577426, -1.2796410003764271, -0.3977049391203402, 2.4792311265774667, 1.3670311632092624, 1.2963504112955613, 2.0404844103073025, -1.6439708506073223, 0.39122885445645805, 1.026031821452462, -0.04044477160482201, -0.7442346929085072, -0.34687120826243034, -0.23420359971379054, -0.5919629143336708, -0.202903054395391, -0.1893399644841902, 1.9331834251176807, -0.12321054392851609], "ys": [8.735627063679981, 9.857615954045011, 10.81439096759407, 10.586727233537191, 9.232919976568622, 11.54281262696508, 8.392787912674466, 9.355119689665944, 9.22380703532752, 10.542142541823122, 10.111980619367035, 10.760836265570738, 8.819773453269804, 10.25325722424816, 9.802077905695608, 8.955420161552611, 9.833801181904477, 10.491684241001613, 12.076108669877556, 11.74289693140474, -0.5685725015474191, -0.5715728344759778, -0.20180907868635137, 0.38431336480089595, -0.3402202083684184, -2.4652736827783586, 0.08295159401756182, 0.8503818775816505, 0.6488691600321166, 0.5794762568230527, -0.6770063922144103, -0.6557616416449478, -1.2834289177624947, 0.1096318195532717, -0.38986922166834853, -1.6224497706950238, 0.09429787743230483, 0.4005097316394031, -0.508002811195673, -1.2473463371366507], "ids": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]}]
@ -0,0 +1 @@
							[[0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 0], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 1], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 2], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 3], [0.0, 0.065217391304347824, 0.065217391304347824, 0.33605067580764519, 4], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 5], [0.1875, 0.23999999999999999, 0.12, 0.23731835158706122, 6], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 7], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 8], [0.19047619047619049, 0.16, 0.0, 0.32594478059941379, 9], [-0.23529411764705882, 0.0, 0.19047619047619047, 0.31356338348865387, 10], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 11], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 12], [0.027777777777777783, 0.11111111111111112, 0.088888888888888892, 0.30339641183779581, 13], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 14], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 15], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 16], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 17], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 18], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 19], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 20], [0.078947368421052641, 0.073170731707317083, 0.0, 0.36451788667842738, 21], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 22], [-0.16666666666666663, 0.18181818181818182, 0.27272727272727271, 0.20246415864836445, 23], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 24], [0.1875, 0.23999999999999999, 0.12, 
0.23731835158706122, 25], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 26], [-0.043478260869565216, 0.0, 0.041666666666666664, 0.37950991789118999, 27], [0.22222222222222221, 0.18181818181818182, 0.0, 0.31701083225750354, 28], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 29], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 30], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 31], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 32], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 33], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 34], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 35], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 36], [0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 37], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 38], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 39], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 40], [0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 41], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 42], [0.0, 0.0, 0.0, 0.40000000000000002, 43], [0.0, 0.065217391304347824, 0.065217391304347824, 0.33605067580764519, 44], [0.078947368421052641, 0.073170731707317083, 0.0, 0.36451788667842738, 45], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 46], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 47]]