Giter Site home page Giter Site logo

jhetherly / sparse_linear_binning Goto Github PK

View Code? Open in Web Editor NEW
1.0 2.0 0.0 1.58 MB

Python function for performing a linear binning (a la Wand and Jones) and optimized for sparsity

License: MIT License

Python 0.09% C++ 98.20% C 1.71%
linear-binning data-reduction sparse-data

sparse_linear_binning's Introduction

Build Status

Performs a linear binning technique described in Wand and Jones on a regularly-spaced grid in an arbitrary number of dimensions. The asymptotic behavior of this binning technique performs better than so-called simple binning (i.e. as in histograms). Each data point in d-dimensional space must have an associated weight (for equally weighted points just use a weight of 1.0 for each point).

For example, within a (segment of a) 2D grid with corners A, B, C, and D and a 2D data point P with weight wP:

A-----------------------------------B
|        |                          |
|                                   |
|        |                          |
|- - - - P- - - - - - - - - - - - - |
|        |                          |
D-----------------------------------C
  • Assign a weight to corner A of the proportion of area between P and C (times wP)
  • Assign a weight to corner B of the proportion of area between P and D (times wP)
  • Assign a weight to corner C of the proportion of area between P and A (times wP)
  • Assign a weight to corner D of the proportion of area between P and B (times wP)

Note that the under- and overflow bins need to be accounted for when specifying the numbers of grid points in each dimension (grid points act as bin centers). For instance, if you want grid points in steps of 0.1 in a range of [0,1] (i.e. (0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1)), specify the number of grid points to be 11. Internally, the grid points are stored in a high performance, C++-based hash table (sparsepp). This allows for finer binning in some circumstances because the hash table doesn't allocates memory for grid points with near-zero weight. To accommodate arbitrary numbers of bins along each dimension, an arbitrary precision numeric library (boost multiprecision) may be used internally and will negatively impact performance. If this degradation in performance is unacceptable, consider reducing the number of grid points in such a way that the product of grid points in all dimensions is less than the numeric maximum of "unsigned long long" on your system. For instance, in 20 dimenisons with each dimension having 51 grid points gives a total of 14171098670753043575626125424226001 potential grid points at which point the arbitrary precision library must take care of all arithmetic related to determining grid points.

Quickstart

  • pip install sparse_linear_binning

or

Example

This constructs one million random 2D points in the unit square with random weights and constructs a grid of 51 by 51 (can be different along different dimensions) linearly binned "bin centers." The boundaries of the grid of bin centers are specified by extents and can be thought of as the under- and overflow bins (i.e. these are the coordinates of the first and last bin centers).

from sparse_linear_binning import sparse_linear_binning
import numpy as np

# generate one million random 2D points and weights
# (should take less than a second to bin)
n_samples=1000000
D=2

# coordinates, weights, and extents must be of type "double"
sample_coords = np.random.random(size=(n_samples, D))
sample_weights = np.random.random(size=n_samples)
extents = np.tile([0., 1.], D).reshape((D, 2))
n_bins = np.full(D, 51)

coords, weights = sparse_linear_binning(sample_coords, sample_weights,
                                        extents, n_bins)

# check that weights on grid match original weights
print(np.allclose(weights.sum(), sample_weights.sum()))

Dependencies

  • numpy

sparse_linear_binning's People

Contributors

jhetherly avatar

Stargazers

 avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.