
christiansch / skml

scikit-learn compatible multi-label classification

Home Page: http://skml.readthedocs.io/en/latest/

License: MIT License

Languages: Makefile 0.70%, Python 99.30%

Topics: machinelearning, multi-label-classification, scikit-learn, machine-learning, machine-learning-library, multi-label-learning, multi-label-problem, python, python3, artificial-intelligence, data-science, data-mining, multi-label

skml's Introduction

skml

[Travis CI build badge: https://travis-ci.org/ChristianSch/skml, branch master]

scikit-learn compatible multi-label classification implementations.

A multi-label classification (MLC) problem is given if a subset of labels y ⊆ L (rather than a single label) is to be predicted for each example.

Currently Supported

  • Problem Transformations:
    • Binary Relevance
    • Label Powerset
    • Classifier Chains
    • Probabilistic Classifier Chains
  • Ensembles:
    • Ensemble Classifier Chain
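
To make the transformation idea concrete, here is a minimal sketch of binary relevance written against plain scikit-learn; it only illustrates the concept and does not use or mirror skml's own classes (all names below are illustrative):

import numpy as np
from sklearn.linear_model import LogisticRegression

class SimpleBinaryRelevance:
    """Illustrative only: fit one independent binary classifier per label column."""

    def __init__(self, base_estimator=LogisticRegression):
        self.base_estimator = base_estimator

    def fit(self, X, y):
        # y is an (n_samples, n_labels) binary indicator matrix.
        self.estimators_ = []
        for j in range(y.shape[1]):
            clf = self.base_estimator()
            clf.fit(X, y[:, j])
            self.estimators_.append(clf)
        return self

    def predict(self, X):
        # Re-assemble the per-label predictions into an indicator matrix.
        return np.column_stack([clf.predict(X) for clf in self.estimators_])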

Installation

For production use, install via pip: ` pip install skml `

For development, clone this repo and, inside the skml directory, run the following:

pip install -e .[dev]
python setup.

Supported Python Versions

Due to dependencies, we do not check for a working distribution of skml on the following Python versions:

  • 3.2

skml's People

Contributors

christiansch

Forkers

asdhob will241 bw996

skml's Issues

travis doesn't work 😢

Traceback (most recent call last):
  File "/home/travis/virtualenv/python2.7.12/bin/green", line 11, in <module>
    sys.exit(main())
  File "/home/travis/virtualenv/python2.7.12/lib/python2.7/site-packages/green/cmdline.py", line 75, in main
    result = run(test_suite, stream, args, testing)
  File "/home/travis/virtualenv/python2.7.12/lib/python2.7/site-packages/green/runner.py", line 92, in run
    targets = [(target, manager.Queue()) for target in toParallelTargets(suite, args.targets)]
  File "/home/travis/virtualenv/python2.7.12/lib/python2.7/site-packages/green/loader.py", line 60, in toParallelTargets
    proto_test_list = toProtoTestList(suite)
  File "/home/travis/virtualenv/python2.7.12/lib/python2.7/site-packages/green/loader.py", line 45, in toProtoTestList
    toProtoTestList(i, test_list, doing_completions)
  File "/home/travis/virtualenv/python2.7.12/lib/python2.7/site-packages/green/loader.py", line 45, in toProtoTestList
    toProtoTestList(i, test_list, doing_completions)
  File "/home/travis/virtualenv/python2.7.12/lib/python2.7/site-packages/green/loader.py", line 45, in toProtoTestList
    toProtoTestList(i, test_list, doing_completions)
  File "/home/travis/virtualenv/python2.7.12/lib/python2.7/site-packages/green/loader.py", line 45, in toProtoTestList
    toProtoTestList(i, test_list, doing_completions)
  File "/home/travis/virtualenv/python2.7.12/lib/python2.7/site-packages/green/loader.py", line 37, in toProtoTestList
    getattr(suite, exception_method)()
  File "/home/travis/virtualenv/python2.7.12/lib/python2.7/site-packages/green/loader.py", line 235, in testFailure
    raise ImportError(message)
ImportError: Failed to import test.test_br computed from filename /home/travis/build/ChristianSch/skml/test/test_br.py
Traceback (most recent call last):
  File "/home/travis/virtualenv/python2.7.12/lib/python2.7/site-packages/green/loader.py", line 212, in loadFromModuleFilename
    __import__(dotted_module)
  File "/home/travis/build/ChristianSch/skml/test/test_br.py", line 18, in <module>
    X, y = load_dataset('yeast')
  File "/home/travis/build/ChristianSch/skml/skml/datasets/load_datasets.py", line 17, in load_dataset
    data = fetch_mldata('yeast')
  File "/home/travis/virtualenv/python2.7.12/lib/python2.7/site-packages/sklearn/datasets/mldata.py", line 142, in fetch_mldata
    mldata_url = urlopen(urlname)
  File "/opt/python/2.7.12/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/opt/python/2.7.12/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/opt/python/2.7.12/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/opt/python/2.7.12/lib/python2.7/urllib2.py", line 467, in error
    result = self._call_chain(*args)
  File "/opt/python/2.7.12/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/opt/python/2.7.12/lib/python2.7/urllib2.py", line 654, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/opt/python/2.7.12/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/opt/python/2.7.12/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/opt/python/2.7.12/lib/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/opt/python/2.7.12/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/opt/python/2.7.12/lib/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 500: INTERNAL SERVER ERROR
make: *** [test] Error 1

about scikit-multilearn

Hi there!

I am wondering if you are one of the original owners/maintainers of scikit-multilearn, and if this repository will carry that project over. It seems that the former is already inactive: some pull requests are not being answered, multiple issues remain open, and dependencies are not managed properly for all Python versions (especially graph-tools and MEKA's Java dependency).

I am just wondering what the vision for that library would be. I believe that providing a multi-label classification library in Python, to augment scikit-learn, is very useful.

I'd also like to contribute in whatever way I can. I am currently a graduate student doing research in bioinformatics (hence the interest in multi-label classification). I'd be happy to help restructure the project, write docs, refactor some code, and clean up some unit tests.

Perhaps we can start by managing all dependencies to create a successful travis build. It seems that graph-tools is tricky, and the docker image provided is only for Python 3. In addition to that, there's also the MEKA extension to take care of.

Maybe we could omit support for these features for a while and focus on the "easier" ones first?

Thank you so much; I'd really love to help out with this project and find people who are still interested in continuing to maintain scikit-multilearn.

ModuleNotFoundError: No module named 'skml'

How do I install skml? I get the error `ModuleNotFoundError: No module named 'skml'` when I try to run the code. When I use `pip install skml` I get the error:

ERROR: Could not find a version that satisfies the requirement skml (from versions: none)
ERROR: No matching distribution found for skml
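
A possible workaround (an assumption, not from the original thread): since no release appears to be published on PyPI, installing directly from the GitHub repository should work:

pip install git+https://github.com/ChristianSch/skml.git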

python3: module "future" not found

File "$HOME.virtualenvs/skml/lib/python3.6/site-packages/skmultilearn/dataset.py", line 2, in <module>
    from future import standard_library
ModuleNotFoundError: No module named 'future'
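
A plausible fix (an assumption, the thread does not confirm it): `future` is a separate dependency pulled in by skmultilearn and can be installed explicitly:

pip install future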

implement label space down sampling

Just like in the original PCC paper, we'd like to introduce an easy way to remove labels from a given label vector. A few methods come to mind (a sketch of the most-frequent variant follows the list):

  • by-threshold: only retain labels that occur in, say, 95% of the instances
  • most-frequent: keep only the top-k labels that occur most frequently (see the PCC paper)
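
A minimal sketch of the most-frequent variant, assuming y is a NumPy binary indicator matrix (the function name is illustrative, not skml's API):

import numpy as np

def downsample_labels_most_frequent(y, k):
    """Return y restricted to its k most frequent labels, plus the kept column indices."""
    counts = y.sum(axis=0)                 # how often each label occurs across instances
    keep = np.argsort(counts)[::-1][:k]    # column indices of the top-k most frequent labels
    return y[:, keep], keep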

datasets

Currently load_dataset isn't really helpful. I guess mirroring the datasets and having a custom load method that does not depend on skmultilearn would be good, as the current one is broken.
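
A rough sketch of what such a self-contained loader could look like, assuming the mirrored datasets are stored as local CSV files with the label columns last (file layout and signature are assumptions, not the current implementation):

import numpy as np

def load_dataset(path, n_labels):
    """Load a mirrored multi-label dataset from a local CSV file.

    Assumes the last n_labels columns hold the binary label indicators,
    e.g. n_labels=14 for the yeast dataset.
    """
    data = np.loadtxt(path, delimiter=",")
    X = data[:, :-n_labels]
    y = data[:, -n_labels:].astype(int)
    return X, y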

ECC: warnings on tests

/home/user/Documents/dev/skml/skml/ensemble/ensemble_classifier_chains.py:93: RuntimeWarning: invalid value encountered in greater_equal
  return (out >= self.threshold).astype(int)
/home/user/Documents/dev/skml/skml/ensemble/ensemble_classifier_chains.py:93: RuntimeWarning: invalid value encountered in true_divide
  out = preds / W_norm
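
Both warnings point at NaNs appearing when W_norm contains zeros; a hedged sketch of a guard, using dummy arrays and the variable names from the warnings (the thresholding step is assumed):

import numpy as np

# Dummy stand-ins for the ensemble's summed votes and per-label weight norms.
preds = np.array([[0.6, 0.0], [1.2, 0.4]])
W_norm = np.array([2.0, 0.0])   # a zero norm is what triggers the warnings
threshold = 0.5

# Divide only where the norm is non-zero, so no NaN reaches the comparison.
out = np.divide(preds, W_norm, out=np.zeros_like(preds), where=(W_norm != 0))
labels = (out >= threshold).astype(int)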

parallelization of predict methods

Well, when running any classifier (including PCC), fitting finishes in no time; the predictions, however, take quite some time and, in typical Python fashion, run on only a single CPU core. It's 2018 though, and most processors have around 4-6 cores plus a bunch of threads, so we should make use of them.
This issue tries to clarify if and how we can utilize multi-core CPUs properly.

The first idea is to parallelize the predict methods just like sklearn does, via Parallel:

        all_importances = Parallel(n_jobs=self.n_jobs,
                                   backend="threading")(
            delayed(getattr)(tree, 'feature_importances_')
            for tree in self.estimators_)
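
Applied to skml, the same pattern could look roughly like the standalone sketch below; the chains argument and the averaging/thresholding step are assumptions, not skml's current code:

import numpy as np
from joblib import Parallel, delayed

def predict_parallel(chains, X, threshold=0.5, n_jobs=-1):
    """Run each fitted chain's predict on its own worker, then average the votes."""
    all_preds = Parallel(n_jobs=n_jobs, backend="threading")(
        delayed(chain.predict)(X) for chain in chains
    )
    out = np.mean(all_preds, axis=0)
    return (out >= threshold).astype(int)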

refactoring

  • rename estimator to base_estimator maybe?
