
wilcrofter / buzzhash


A Julia package based on S. Dasgupta, C. F. Stevens, and S. Navlakha (2017). A neural algorithm for a fundamental computing problem. Science, 358, 6364:793-796.

License: Other


buzzhash's Introduction

BuzzHash

A locality-sensitive hashing algorithm based on Dasgupta et al., A neural algorithm for a fundamental computing problem (preprint).

(See also Groschner et al., Dendritic Integration of Sensory Evidence in Perceptual Decision-Making.)

An implementation of the algorithm described by Dasgupta et al. in the above reference, since published in Science [1]. The algorithm is inspired by the olfactory system of Drosophila melanogaster, though it is not a simulation of that system. It is, instead, a biomimetic abstraction applicable to the computer science problem of similarity search, but it might also serve as an abstract model of neural organization in other organisms and neural subsystems. In that regard, the authors mention three possibilities: mouse olfaction, rat cerebellum, and rat hippocampus (including entorhinal cortex, dentate gyrus, and hilar cells).

The algorithm is simple. Its input is a real vector, x, which in the fly's case has 50 components representing the aggregate output of 50 types of olfactory receptor. The first step is to form x-mean(x), i.e., to subtract the mean of x from each of its components. This mimics the biological step known as divisive normalization. The mean-centered input is then multiplied by a sparse 0/1 matrix, A, which expands the number of components in x-mean(x). In the fly's case, the 50 components are expanded to 2000, each of the 2000 values being a sum of about 6 of the original 50. The non-zero entries of A are randomly chosen and, of course, once chosen are fixed. The final step zeroizes all but the largest few components of A(x-mean(x)) and, by default, sets the retained components to 1, leaving a sparse vector to act as a tag. (Setting the retained components to 1 may be skipped by calling with clip=false.) In the fly's case the top 5% are retained.
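The steps above can be sketched as follows. This is a minimal NumPy illustration of the same idea, not the package's Julia code; the function names here are invented:

```python
import numpy as np

def sparse_projection(rng, kc, pn, pn_per_kc):
    """Sparse 0/1 matrix A: each of the kc output rows sums
    pn_per_kc randomly chosen input components."""
    A = np.zeros((kc, pn))
    for row in A:
        row[rng.choice(pn, size=pn_per_kc, replace=False)] = 1.0
    return A

def fly_hash(A, x, top_n, clip=True):
    """Mean-center x, expand with A, keep only the top_n largest values."""
    y = A @ (x - x.mean())              # mean-centering + sparse expansion
    tag = np.zeros_like(y)
    top = np.argsort(y)[-top_n:]        # indices of the top_n components
    tag[top] = 1.0 if clip else y[top]  # clip=True yields a 0/1 tag
    return tag
```

With the fly's parameters, `A = sparse_projection(np.random.default_rng(0), 2000, 50, 6)` and `fly_hash(A, x, 100)` keeps the top 5% of the 2000 expanded components.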

With high probability, the columns of matrix A are linearly independent, which means the input, x-mean(x), can be recovered from the product, A(x-mean(x)). Note, however, that the final step, which zeroizes all but the largest values of A(x-mean(x)), makes recovery from the final hash approximate at best.
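As a quick check of that claim, the mean-centered input can be recovered from the un-clipped product via a pseudo-inverse. This is a NumPy sketch under the fly's dimensions, not the package's inverse(A):

```python
import numpy as np

rng = np.random.default_rng(1)
pn, kc = 50, 2000

# Sparse 0/1 expansion matrix: 6 randomly placed ones per row.
A = np.zeros((kc, pn))
for row in A:
    row[rng.choice(pn, size=6, replace=False)] = 1.0

x = rng.normal(size=pn)
centered = x - x.mean()
y = A @ centered                 # expanded vector, before any clipping

A_pinv = np.linalg.pinv(A)       # left inverse when A has full column rank
recovered = A_pinv @ y
print(np.allclose(recovered, centered))  # True when rank(A) == pn
```

Recovery is exact here because y = A(x-mean(x)) is computed before the top-N clipping step; applied to the clipped hash, the same pseudo-inverse gives only an approximation.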

This package provides three functions: sprand_fd, to form the matrix A; buzzhash(A, x, topN), to apply the algorithm to x, retaining the topN largest values; and inverse(A), to form the matrix that recovers x-mean(x) from the product A(x-mean(x)). The notebook usage.ipynb illustrates basic usage.

Interested parties should also see @dataplayer12's Python implementation and his very nice explanatory post on Medium.

TODO: The initial implementations of sprand_fd(KC, PN, PN_per_KC) and inverse(A) require optimization; they perform very poorly for large KC.

References:

[1] S. Dasgupta, C. F. Stevens, and S. Navlakha (2017). A neural algorithm for a fundamental computing problem. Science, 358, 6364:793-796.

[2] Charles F. Stevens, A statistical property of fly odor responses is conserved across odors.

[3] Charles F. Stevens, What the fly's nose tells the fly's brain.

[4] John Myles White, MNIST.jl.

[future] Cengiz Pehlevan, Alexander Genkin, and Dmitri B. Chklovskii, A clustering neural network model of insect olfaction. doi: https://doi.org/10.1101/226746

[future] A. A. Fenton et al., Unmasking the CA1 ensemble place code by exposures to small and large environments: more place cells and multiple, irregularly-arranged, and expanded place fields in the larger space.

[future] Kiehn and Forssberg, Scientific Background: The Brain's Navigational Place and Grid Cell System (PDF).


buzzhash's Issues

Could this be(come) a rolling hash?

I'm so glad this was done in Julia. I have significant experience writing Python, C++, etc., but I think it makes sense to transition to Julia for almost all of those use cases.

I'm just learning about rolling hashes and, as I understand it, the benefit is that you can reuse the already-calculated values between the window's ends for the next slide step. But maybe that savings is moot in cases like this, where a matrix multiplication against a precalculated binary lookup table is so cheap. In other words, the full multiplication might be at least as fast as subtracting the trailing element and adding the leading one. I don't know.

I just think the fly's neural processing is cool and elegant in and of itself, anyway. But I've been wondering whether this has been, or could be, adapted to genome mapping (a.k.a. alignment against a reference). I wonder if it would be any faster (or more amenable to long nucleotide chains over a 4-character alphabet) than minimap2 (GitHub repo).

"buzzmap", anyone? (taken)
"buzzgenmap"??

example code

Brilliant job! I have just seen this interesting paper and found the open-source code. I have looked through your code, but there doesn't seem to be an example. Could you provide some example code? Thank you very much.

usage

It would be nice to have some examples.
