Giter Site home page Giter Site logo

ebc's Introduction

Ensemble Biclustering for Classification

####Note

This is a forked version of EBC python implementation. For the final release please refer to this repo.

A python implementation of the Ensemble Biclustering for Classification (EBC) algorithm. Although having "biclustering" in its name, EBC is a co-clustering algorithm that allows you to perform co-clustering on very large sparse N-dimensional matrix. For details and examples of using EBC please reference this paper.

##Files

  • ebc.py: an implementation of EBC algorithm using sparse matrix.
  • ebc2d.py: a vectorized implementation of EBC algorithm using (2D) numpy dense array.
  • matrix.py: a N-dimensional sparse matrix implementation using python dict that allows basic get/set/sum operations.
  • resources/: this directory contains many example data that could be used to run EBC on.

##Usage

####General Usage

Using ebc is easy. First you need a sparse matrix constructed as a SparseMatrix instance defined in matrix.py. You can use matrix.py's built in read_data() method to construct the sparse matrix.

Once you have the sparse matrix and import the ebc module, you can simply do the following:

ebc = EBC(sparse_matrix, n_clusters=[30, 125], max_iterations=10)
cXY, objective, iter = ebc.run()

The returned cXY contains the clustering assignments along each axis in a list of lists, objective contains the final objective function value, iter contains the iterations that it ran for to converge.

####Efficiency Considerations

In short, EBC is built for highly sparse multi-dimensional matrix, while EBC2D is a vectorized version of EBC, and is built for less sparse 2D matrix. The running time of EBC increases linearly with the number of non-zero elements, while the running time of EBC2D will only increase when matrix size increases.

  • If your input matrix is highly sparse or you need support for N-dimensional matrix with N > 2, you should use EBC class in ebc.py.
  • If your input matrix is not very sparse (e.g. 5% density), and can fit into memory with numpy, you can choose to use EBC2D class in ebc2d.py. Note that EBC2D only supports 2D matrix, since large multi-dimensional dense matrix can hardly fit into memory.

To give you more sense of the efficiency, the EBC implementation runs for ~50 seconds on our 14052 x 7272 example matrix (which is >99% sparse) on a MacBook, and in this case it is faster than EBC2D. However, in our testing, for a 5000 x 5000 matrix of 95% sparsity, EBC2D can get a x5 speedup over EBC. And this speedup grows as the sparsity decreases.

####Dependency

To run this implementation of EBC algorithm, you need to have numpy installed.

##References

  • Percha, Bethany, and Russ B. Altman. "Learning the Structure of Biomedical Relationships from Unstructured Text." PLoS computational biology 11.7 (2015).
  • Dhillon, Inderjit S., Subramanyam Mallela, and Dharmendra S. Modha. "Information-theoretic co-clustering." Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2003.

ebc's People

Contributors

blpercha avatar yuhaozhang avatar

Stargazers

 avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.