Giter Site home page Giter Site logo

scikit-learn-contrib / sklearn-ann Goto Github PK

View Code? Open in Web Editor NEW
14.0 14.0 3.0 180 KB

Integration with (approximate) nearest neighbors libraries for scikit-learn + clustering based on with kNN-graphs.

Home Page: https://sklearn-ann.readthedocs.io/en/latest/

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
approximate-nearest-neighbor-search clustering knn knn-graphs scikit-learn

sklearn-ann's Introduction

scikit-learn-contrib

scikit-learn-contrib is a github organization for gathering high-quality scikit-learn compatible projects. It also provides a template for establishing new scikit-learn compatible projects.

Vision

With the explosion of the number of machine learning papers, it becomes increasingly difficult for users and researchers to implement and compare algorithms. Even when authors release their software, it takes time to learn how to use it and how to apply it to one's own purposes. The goal of scikit-learn-contrib is to provide easy-to-install and easy-to-use high-quality machine learning software. With scikit-learn-contrib, users can install a project by pip install sklearn-contrib-project-name and immediately try it on their data with the usual fit, predict and transform methods. In addition, projects are compatible with scikit-learn tools such as grid search, pipelines, etc.

Projects

If you would like to include your own project in scikit-learn-contrib, take a look at the workflow.

A simple-but-efficient density-based clustering algorithm that can find clusters of arbitrary size, shapes and densities in two-dimensions. Higher dimensions are first reduced to 2-D using the t-sne. The algorithm relies on a single parameter K, the number of nearest neighbors.

Read The Docs, Read the Paper

Maintained by: Mohamed Abbas

Large-scale linear classification, regression and ranking.

Maintained by Mathieu Blondel and Fabian Pedregosa.

Fast and modular Generalized Linear Models with support for models missing in scikit-learn.

Maintained by Mathurin Massias, Pierre-Antoine Bannier, Quentin Klopfenstein and Quentin Bertrand.

A Python implementation of Jerome Friedman's Multivariate Adaptive Regression Splines.

Maintained by Jason Rudy and Mehdi.

Python module to perform under sampling and over sampling with various techniques.

Maintained by Guillaume Lemaitre, Fernando Nogueira, Dayvid Oliveira and Christos Aridas.

Factorization machines and polynomial networks for classification and regression in Python.

Maintained by Vlad Niculae.

Confidence intervals for scikit-learn forest algorithms.

Maintained by Ariel Rokem, Kivan Polimis and Bryna Hazelton.

A high performance implementation of HDBSCAN clustering.

Maintained by Leland McInnes, jc-healy, c-north and Steve Astels.

A library of sklearn compatible categorical variable encoders.

Maintained by Will McGinnis and Paul Westenthanner

Python implementations of the Boruta all-relevant feature selection method.

Maintained by Daniel Homola

Pandas integration with sklearn.

Maintained by Israel Saeta Pérez

Machine learning with logical rules in Python.

Maintained by Florian Gardin, Ronan Gautier, Nicolas Goix and Jean-Matthieu Schertzer.

A Python implementation of the stability selection feature selection algorithm.

Maintained by Thomas Huijskens

Metric learning algorithms in Python.

Maintained by CJ Carey, Yuan Tang, William de Vazelhes, Aurélien Bellet and Nathalie Vauquier.

sklearn-ann's People

Contributors

flying-sheep avatar frankier avatar ivirshup avatar pre-commit-ci[bot] avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

sklearn-ann's Issues

Maintenance status

Hi, is this still maintained?

We’d be interested to include something like it into scanpy.

In case you’re no longer interested in maintenance, maybe we could transfer this project to scverse and take over, publish it to PyPI, …

Add ScaNN

This is a top scoring system on ann-benchmarks

Specialised parameter search

Can get k = max knn graph and then iteratively truncate it and rerun

Can produce histogram as well as validation indices

Installation instructions

It would be nice if there were documented installation instructions, including information about what "extras" were available.

Likely downstream of #21 and getting a name on pypi.

Document parameters of FAISS

It would be nice if the parameters of the transformers were documented more fully.

In particular, FAISS's index_key argument is not particularly intuitive, and I think can have fairly complex variations.

Fix pre-commit check on main

Seems like the no-commit-to-branch hook gets run on main. I assume pre-commit.ci skips it somehow. We should do that too for that nice green checkmark

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.