rlichtenwalter / mrmr

License: GNU General Public License v3.0

Improved mRMR is a re-implementation of the minimum redundancy maximum relevance (mRMR) feature selection algorithm with emphasis on greatly increased performance (1000x or greater on large data sets) and an improved user interface. There are no disadvantages to using this utility instead of the original release by Hanchuan Peng, and the benefits include:

- results identical to the original mRMR implementation by Hanchuan Peng, except that the rank order of attributes with tied metric values may differ, which is statistically inconsequential
- incorporation of all improvements from the Fast-MRMR implementation by Sergio Ramírez
- additional performance improvements, such as avoiding the computation of mutual information for zero-entropy attributes, and careful selection and use of data structures
- output for each attribute includes its selection rank, entropy, mutual information with the class attribute, and mRMR score in an easily parsed format friendly to downstream manipulation
- operates directly on original textual data, requiring no transformation into a one-time binary representation
- robust data set parser that fails gracefully on bad input and reports the location of the first error
- modular support in the code for arbitrary discretization routines, with several examples already provided and implemented
- support for outputting the result of parsing and discretization so that it can be verified and analyzed with external tools
- supports stream-based processing, and can operate equally well reading a data set from standard input, in a pipeline, from a named pipe, or with process substitution
- standard GNU getopt_long POSIX-compliant option processing, including full-featured -h/--help capability, informative error messages, graceful failure, and sensible defaults
- high-quality, C++14-compliant code base


BUILDING
===============
1. To build, enter the project directory and type 'make'.
2. To run from the project directory, type 'bin/mrmr -h' to get usage information and additional help.

EXAMPLE USAGE
===============
The following two commands are equivalent in effect when run from the project top-level directory.

< example.tsv bin/mrmr
bin/mrmr -t '\t' -c 1 -d 'truncate' example.tsv

Note: The implemented discretization functionality is minimal. Feature values are expected either to be contiguous integers starting from 0 or to discretize to such integers, but this is not currently checked.

VERSION HISTORY
===============
0.93 (beta)
	- Added a header for easier use as a library, principally in support of Python bindings.

0.92 (beta)
	- Added example data and updated README with example usage.
	- Added note about expected feature value ranges.

0.91 (beta)
	- Refactored discretization code.
	- Added 'truncate' discretization procedure and made it the default.
	- Added support for new warning log level and made it the default.
	- Now warn in case a discretization method is not explicitly chosen.
	- Implemented attribute domain translation and integer overflow detection.

0.9 (beta)
	- Support for delimiter specification.
	- Minor improvement to log handling.
	- Small incidental code changes.

0.1 (beta)
	- Initial release.

mrmr's Issues

Example file?

Is it possible to provide a small toy example? For example, I am wondering how to format my input file. Currently it looks like this:

Class,f1,f2,f3,...
2,236432.0283,10734.80089,403899.139
2,268311.6713,16047.0898,480746.0285
2,273036.9338,14640.56709,451720.3672
2,259769.1156,25957.11897,414723.1245
2,293445.6201,20009.28208,457957.967
2,233897.5317,18004.15151,338560.9224
2,304474.4254,22450.29259,443437.309
1,110916.2286,42055.95477,3866734.609
1,127235.2357,33883.33067,3482291.13
1,156927.8094,37390.5515,2636100.025
1,141650.6519,48998.13556,2864093.868
1,127707.6032,58833.52719,3297159.141
1,129534.684,73803.55167,2277034.988