scikit-learn-contrib / project-template Goto Github PK

A template for scikit-learn extensions

Home Page: http://contrib.scikit-learn.org/project-template/

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

project-template's Introduction

scikit-learn-contrib

scikit-learn-contrib is a github organization for gathering high-quality scikit-learn compatible projects. It also provides a template for establishing new scikit-learn compatible projects.

Vision

With the explosion of the number of machine learning papers, it becomes increasingly difficult for users and researchers to implement and compare algorithms. Even when authors release their software, it takes time to learn how to use it and how to apply it to one's own purposes. The goal of scikit-learn-contrib is to provide easy-to-install and easy-to-use high-quality machine learning software. With scikit-learn-contrib, users can install a project by pip install sklearn-contrib-project-name and immediately try it on their data with the usual fit, predict and transform methods. In addition, projects are compatible with scikit-learn tools such as grid search, pipelines, etc.

Projects

If you would like to include your own project in scikit-learn-contrib, take a look at the workflow.

DenMune: Density-peak clustering using mutual nearest neighbors

A simple-but-efficient density-based clustering algorithm that can find clusters of arbitrary size, shapes and densities in two-dimensions. Higher dimensions are first reduced to 2-D using the t-sne. The algorithm relies on a single parameter K, the number of nearest neighbors.

Read The Docs, Read the Paper

Maintained by: Mohamed Abbas

lightning

Large-scale linear classification, regression and ranking.

Maintained by Mathieu Blondel and Fabian Pedregosa.

skglm

Fast and modular Generalized Linear Models with support for models missing in scikit-learn.

Maintained by Mathurin Massias, Pierre-Antoine Bannier, Quentin Klopfenstein and Quentin Bertrand.

py-earth

A Python implementation of Jerome Friedman's Multivariate Adaptive Regression Splines.

Maintained by Jason Rudy and Mehdi.

imbalanced-learn

Python module to perform under sampling and over sampling with various techniques.

Maintained by Guillaume Lemaitre, Fernando Nogueira, Dayvid Oliveira and Christos Aridas.

polylearn

Factorization machines and polynomial networks for classification and regression in Python.

Maintained by Vlad Niculae.

forest-confidence-interval

Confidence intervals for scikit-learn forest algorithms.

Maintained by Ariel Rokem, Kivan Polimis and Bryna Hazelton.

hdbscan

A high performance implementation of HDBSCAN clustering.

Maintained by Leland McInnes, jc-healy, c-north and Steve Astels.

categorical-encoding

A library of sklearn compatible categorical variable encoders.

Maintained by Will McGinnis and Paul Westenthanner

boruta_py

Python implementations of the Boruta all-relevant feature selection method.

Maintained by Daniel Homola

sklearn-pandas

Pandas integration with sklearn.

Maintained by Israel Saeta Pérez

skope-rules

Machine learning with logical rules in Python.

Maintained by Florian Gardin, Ronan Gautier, Nicolas Goix and Jean-Matthieu Schertzer.

stability-selection

A Python implementation of the stability selection feature selection algorithm.

Maintained by Thomas Huijskens

metric-learn

Metric learning algorithms in Python.

Maintained by CJ Carey, Yuan Tang, William de Vazelhes, Aurélien Bellet and Nathalie Vauquier.

project-template's People

Contributors

Stargazers

Watchers

Forkers

tsterbak skltrial rpgone jnothman vighneshbirodkar quick-start mrastgoo gummibearehausen arokem mechcoder shubhankar90 leezqcst alderasus prem2017 huyhoang17 romainbrault qirun-chen quipa joaorafaelm vakili73 dundee2002 warvito johannfaouzi vhsven gsel9 gshahriar glemaitre souravsingh hacktuarial ccaloian kautumn06 torasonic zachglassman john-james-ai polishtits sbunzel adrinjalali flopska omkar-kumbhar fchmiel paulsanjo apoorvagnihotri aglingael stjordanis gregorymorse leockl pandinosaurus tsu3010 chengning-zhang wreise rth hydron063 shmlzr scikit-mine db11051998 neteeramc machinedog henriquepassos pseudotetraden cmftall ferrod29 shuhuig ashokcse135 belalanik lencof wenjiez samiazn gmartinonqm graue70 gaetandu gregorystrubel wiwang avx99 data-corentinv adamcatto krooonal samgorr lacava kiakoro simeloni juan-bolivar aliciajoseph jjanborowka mschemm milaguiar

project-template's Issues

Should add the package directory to `sys.path` by default

The Quick Start guide claims that running the following instruction is sufficient to generate the documentation:

$ cd doc
$ make html

However, it will fail because the package to document is located at neither the directory of doc/conf.py nor the cwd. The file conf.py will fail to import the package. While this is not an issue in CI since it automatically adds the top-level directory to sys.path, it does become troublesome when following the quick start guide.

Solution: add sys.path.insert(0, os.path.abspath('..')) into conf.py.

Apply changes done on sklearn-extra here.

@rth you did some nice cleanups on sklearn-extra's CI and setup, would it be not too time consuming for you to apply those changes to the template as well?

Does Templates supports pandas output

There are a lot of changes in scikit-learn 1.2.0 where it supports pandas output for all of the transformers but how can I use it in a custom transformer template ?

Does this features is available for custom transformer template ??

There is no TemplateRegressor

Please add a TemplateRegressor example in the skltemplate/_template.py

[Documentation] Search Page is empty

Autogenerated search.html is empty.

It should look something like that.

License when using this template

This repo includes a LICENSE file which says at the top:

Copyright (c) 2016, Vighnesh Birodkar and scikit-learn-contrib contributors

I have several questions about the rules regarding a new project I'm creating with this template.

For my project, can I use my own license or do I have to keep the existing license?
project-template has a 3-Clause BSD license. If I use my own license, does it have to also be BSD 3-Clause or can it be different (e.g. Apache 2.0, MIT)?
Do I have to credit scikit-learn-contrib and/or Vighnesh Birodkar anywhere in my repo/source code?

errors with class arguments to check_estimator

I'm trying to validate an estimator which takes class arguments and seem to be getting multiple error messages:

Taking the TemplateEstimator example and adding an additional argument my_class:

def __init__(self, demo_param='demo_param', my_class=None):
    self.demo_param = demo_param
    self.my_class = my_class

It is then called passing class Gaussian inherited from Likelihood:

import abc
import tensorflow as tf

class Likelihood:
    __metaclass__ = abc.ABCMeta

    @abc.abstractmethod
    def predict(self, latent_val):
        raise NotImplementedError("Subclass should implement this.")

class Gaussian(Likelihood):
    def __init__(self, log_var=-2.0):     
        self.log_var = tf.Variable(log_var, name="log_theta")

    def predict(self, latent_val):
        return latent_val

like = Gaussian()
estimator = TemplateEstimator(demo_param='demo_param', my_class=like)
check_estimator(estimator)

throwing an error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-dd2c6598c00e> in <module>()
      1 like = Gaussian()
      2 estimator = TemplateEstimator(demo_param='demo_param', my_class=like)
----> 3 check_estimator(estimator)

/opt/conda/envs/py3/lib/python3.5/site-packages/sklearn/utils/estimator_checks.py in check_estimator(Estimator)
    263     for check in _yield_all_checks(name, estimator):
    264         try:
--> 265             check(name, estimator)
    266         except SkipTest as message:
    267             # the only SkipTest thrown currently results from not

/opt/conda/envs/py3/lib/python3.5/site-packages/sklearn/utils/testing.py in wrapper(*args, **kwargs)
    289             with warnings.catch_warnings():
    290                 warnings.simplefilter("ignore", self.category)
--> 291                 return fn(*args, **kwargs)
    292 
    293         return wrapper

/opt/conda/envs/py3/lib/python3.5/site-packages/sklearn/utils/estimator_checks.py in check_estimators_dtypes(name, estimator_orig)
    837 
    838     for X_train in [X_train_32, X_train_64, X_train_int_64, X_train_int_32]:
--> 839         estimator = clone(estimator_orig)
    840         set_random_state(estimator, 1)
    841         estimator.fit(X_train, y)

/opt/conda/envs/py3/lib/python3.5/site-packages/sklearn/base.py in clone(estimator, safe)
     59     new_object_params = estimator.get_params(deep=False)
     60     for name, param in six.iteritems(new_object_params):
---> 61         new_object_params[name] = clone(param, safe=False)
     62     new_object = klass(**new_object_params)
     63     params_set = new_object.get_params(deep=False)

/opt/conda/envs/py3/lib/python3.5/site-packages/sklearn/base.py in clone(estimator, safe)
     50     elif not hasattr(estimator, 'get_params'):
     51         if not safe:
---> 52             return copy.deepcopy(estimator)
     53         else:
     54             raise TypeError("Cannot clone object '%s' (type %s): "

/opt/conda/envs/py3/lib/python3.5/copy.py in deepcopy(x, memo, _nil)
    180                             raise Error(
    181                                 "un(deep)copyable object of type %s" % cls)
--> 182                 y = _reconstruct(x, rv, 1, memo)
    183 
    184     # If is its own copy, don't memoize.

/opt/conda/envs/py3/lib/python3.5/copy.py in _reconstruct(x, info, deep, memo)
    295     if state is not None:
    296         if deep:
--> 297             state = deepcopy(state, memo)
    298         if hasattr(y, '__setstate__'):
    299             y.__setstate__(state)

/opt/conda/envs/py3/lib/python3.5/copy.py in deepcopy(x, memo, _nil)
    153     copier = _deepcopy_dispatch.get(cls)
    154     if copier:
--> 155         y = copier(x, memo)
    156     else:
    157         try:

/opt/conda/envs/py3/lib/python3.5/copy.py in _deepcopy_dict(x, memo)
    241     memo[id(x)] = y
    242     for key, value in x.items():
--> 243         y[deepcopy(key, memo)] = deepcopy(value, memo)
    244     return y
    245 d[dict] = _deepcopy_dict

/opt/conda/envs/py3/lib/python3.5/copy.py in deepcopy(x, memo, _nil)
    180                             raise Error(
    181                                 "un(deep)copyable object of type %s" % cls)
--> 182                 y = _reconstruct(x, rv, 1, memo)
    183 
    184     # If is its own copy, don't memoize.

/opt/conda/envs/py3/lib/python3.5/copy.py in _reconstruct(x, info, deep, memo)
    295     if state is not None:
    296         if deep:
--> 297             state = deepcopy(state, memo)
    298         if hasattr(y, '__setstate__'):
    299             y.__setstate__(state)

/opt/conda/envs/py3/lib/python3.5/copy.py in deepcopy(x, memo, _nil)
    153     copier = _deepcopy_dispatch.get(cls)
    154     if copier:
--> 155         y = copier(x, memo)
    156     else:
    157         try:

/opt/conda/envs/py3/lib/python3.5/copy.py in _deepcopy_dict(x, memo)
    241     memo[id(x)] = y
    242     for key, value in x.items():
--> 243         y[deepcopy(key, memo)] = deepcopy(value, memo)
    244     return y
    245 d[dict] = _deepcopy_dict

/opt/conda/envs/py3/lib/python3.5/copy.py in deepcopy(x, memo, _nil)
    180                             raise Error(
    181                                 "un(deep)copyable object of type %s" % cls)
--> 182                 y = _reconstruct(x, rv, 1, memo)
    183 
    184     # If is its own copy, don't memoize.

/opt/conda/envs/py3/lib/python3.5/copy.py in _reconstruct(x, info, deep, memo)
    295     if state is not None:
    296         if deep:
--> 297             state = deepcopy(state, memo)
    298         if hasattr(y, '__setstate__'):
    299             y.__setstate__(state)

/opt/conda/envs/py3/lib/python3.5/copy.py in deepcopy(x, memo, _nil)
    153     copier = _deepcopy_dispatch.get(cls)
    154     if copier:
--> 155         y = copier(x, memo)
    156     else:
    157         try:

/opt/conda/envs/py3/lib/python3.5/copy.py in _deepcopy_dict(x, memo)
    241     memo[id(x)] = y
    242     for key, value in x.items():
--> 243         y[deepcopy(key, memo)] = deepcopy(value, memo)
    244     return y
    245 d[dict] = _deepcopy_dict

/opt/conda/envs/py3/lib/python3.5/copy.py in deepcopy(x, memo, _nil)
    180                             raise Error(
    181                                 "un(deep)copyable object of type %s" % cls)
--> 182                 y = _reconstruct(x, rv, 1, memo)
    183 
    184     # If is its own copy, don't memoize.

/opt/conda/envs/py3/lib/python3.5/copy.py in _reconstruct(x, info, deep, memo)
    295     if state is not None:
    296         if deep:
--> 297             state = deepcopy(state, memo)
    298         if hasattr(y, '__setstate__'):
    299             y.__setstate__(state)

/opt/conda/envs/py3/lib/python3.5/copy.py in deepcopy(x, memo, _nil)
    153     copier = _deepcopy_dispatch.get(cls)
    154     if copier:
--> 155         y = copier(x, memo)
    156     else:
    157         try:

/opt/conda/envs/py3/lib/python3.5/copy.py in _deepcopy_dict(x, memo)
    241     memo[id(x)] = y
    242     for key, value in x.items():
--> 243         y[deepcopy(key, memo)] = deepcopy(value, memo)
    244     return y
    245 d[dict] = _deepcopy_dict

/opt/conda/envs/py3/lib/python3.5/copy.py in deepcopy(x, memo, _nil)
    180                             raise Error(
    181                                 "un(deep)copyable object of type %s" % cls)
--> 182                 y = _reconstruct(x, rv, 1, memo)
    183 
    184     # If is its own copy, don't memoize.

/opt/conda/envs/py3/lib/python3.5/copy.py in _reconstruct(x, info, deep, memo)
    295     if state is not None:
    296         if deep:
--> 297             state = deepcopy(state, memo)
    298         if hasattr(y, '__setstate__'):
    299             y.__setstate__(state)

/opt/conda/envs/py3/lib/python3.5/copy.py in deepcopy(x, memo, _nil)
    153     copier = _deepcopy_dispatch.get(cls)
    154     if copier:
--> 155         y = copier(x, memo)
    156     else:
    157         try:

/opt/conda/envs/py3/lib/python3.5/copy.py in _deepcopy_dict(x, memo)
    241     memo[id(x)] = y
    242     for key, value in x.items():
--> 243         y[deepcopy(key, memo)] = deepcopy(value, memo)
    244     return y
    245 d[dict] = _deepcopy_dict

/opt/conda/envs/py3/lib/python3.5/copy.py in deepcopy(x, memo, _nil)
    172                     reductor = getattr(x, "__reduce_ex__", None)
    173                     if reductor:
--> 174                         rv = reductor(4)
    175                     else:
    176                         reductor = getattr(x, "__reduce__", None)

TypeError: can't pickle _thread.lock objects

Add minimal examples of classifier and transformer that pass check_estimator

FYI scikit-learn/scikit-learn#6526

CSS broken somehow; rendering parameters doesn't work

Also see discussion here:
https://github.com/scikit-learn-contrib/imbalanced-learn/blob/master/doc/conf.py#L320

https://sklearn-template.readthedocs.io/en/latest/generated/skltemplate.TemplateEstimator.html#skltemplate.TemplateEstimator

The parameters are not rendered correctly, they should look like this:
https://imbalanced-learn.org/stable/generated/imblearn.under_sampling.CondensedNearestNeighbour.html#imblearn.under_sampling.CondensedNearestNeighbour

or this:
http://contrib.scikit-learn.org/forest-confidence-interval/reference/forestci.html#calc-inbag

The imbalanced-learn docs have app.add_stylesheet("basic.css") in conf.py which would fix this for the template project, but that breaks the alignment in the API overview.

There seems to be some difference in how all the contrib projects are set up and some seem to work and some have glitches :-/ anyway, the current template doesn't render parameters properly.

Migration to CircleCI 2.0

Hi,

I'm wondering whether there should be some changes to the code to support the migration to CircleCI 2.0. If so, it would be useful for people who have used the template for their projects and for future users.

Best regards,
Johann

check_estimator outside of unittests

Hi there,
Thanks for this awesome template 👍

For the moment calls to check_estimator are made insides unittest (see test_common.py)

For my project I find it can be useful to automatically run calls to check_estimators for all classes in some pre-defined modules

proposed solution

a python script at the root of the project, implementing the following process:

load all classes from a module
filter those classes according to a given criterion (I check if they have a .fit method)
call check_estimator with generate_only=True, and sequentially run all tests for all candidate estimators
pretty print the result on the standard output

Here is my code

import inspect

import sklearn
from sklearn.utils.estimator_checks import check_estimator

import skmine.itemsets
# TODO : add other modules here

MODULES = [
    skmine.itemsets,
]

OK = '\x1b[42m[ OK ]\x1b[0m'
FAIL = "\x1b[41m[FAIL]\x1b[0m"

def is_estimator(e):
    _, est = e
    meth = getattr(est, "fit", None)
    return callable(meth)

if __name__ == '__main__':
    for module in MODULES:
        clsmembers = inspect.getmembers(skmine.itemsets, inspect.isclass)
        estimators = filter(is_estimator, clsmembers)
        for est_name, est in estimators:
            # from sklearn 0.23 check_estimator takes an instance as input
            obj = est() if sklearn.__version__[:4] >= '0.23' else est
            checks = check_estimator(obj, generate_only=True)
            for arg, check in checks:
                check_name = check.func.__name__  # unwrap partial function
                desc = '{} === {}'.format(est_name, check_name)
                try:
                    check(arg)
                    print(OK, desc)
                except Exception as e:
                    print(FAIL, desc, e)

and here is the kind of output I get when calling

python check_estimators.py

from the project root

LIMITS

Note that with sklearn.__version__ >= '0.23' we run check_estimator with an instance of an estimator. The above script always instantiate it with default parameters, so it's not perfect, but the point here is to provide a quick check for compatibility

Migrate to Azure pipelines for CI

If we what to be more in sync with CI used in scikit-learn, scikit-learn-extra, imbalanced-learn etc.

check_estimator

Hello,

I'm currently developing a library thanks to your project, and I notice that in sklearn 0.17 (as seen in the requirements), check_estimator should take a class instance. Whereas in master it should take a class object:

In version 0.17.x

    Parameters
    ----------
    Estimator : class
        Class to check.
    """
    name = Estimator.__class__.__name__

In master

    Parameters
    ----------
    Estimator : class
        Class to check. Estimator is a class object (not an instance).
    """
    name = Estimator.__name__

Any idea on how to handle these different cases?

Incompatible dependencies in .travis.yml environment

The following Travis-CI test environment currently raises an "UnsatisfiableError"

project-template/.travis.yml

Lines 14 to 15 in e588f28

    
           - env: PYTHON_VERSION="3.6" NUMPY_VERSION="1.13.1" SCIPY_VERSION="0.19.1" 
        
                  SKLEARN_VERSION="0.20.3"

Shortened Console Log

$ export PYTHON_VERSION="3.6"
$ export NUMPY_VERSION="1.13.1"
$ export SCIPY_VERSION="0.19.1"
$ export SKLEARN_VERSION="0.20.3"
$ conda create -n testenv --yes python=$PYTHON_VERSION pip
$ conda install --yes numpy==$NUMPY_VERSION scipy==$SCIPY_VERSION scikit-learn==$SKLEARN_VERSION cython nose pytest pytest-cov
  
Collecting package metadata (current_repodata.json): done
Solving environment: failed
Collecting package metadata (repodata.json): done
Solving environment: failed
UnsatisfiableError: The following specifications were found to be incompatible with a past
explicit spec that is not an explicit spec in this operation (python):
  - scipy==0.19.1 -> numpy[version='>=1.9.3,<2.0a0'] -> python[version='>=3.6,<3.7.0a0']
The following specifications were found to be incompatible with each other:
  - scikit-learn==0.20.3 -> blas==1.0=mkl
  - scikit-learn==0.20.3 -> libgcc-ng[version='>=7.3.0']
  - scikit-learn==0.20.3 -> libstdcxx-ng[version='>=7.3.0']
  - scikit-learn==0.20.3 -> mkl[version='>=2019.1,<2020.0a0'] -> intel-openmp
  - scikit-learn==0.20.3 -> numpy[version='>=1.11.3,<2.0a0'] -> libgfortran-ng[version='>=7,<8.0a0']
  - scikit-learn==0.20.3 -> numpy[version='>=1.11.3,<2.0a0'] -> libopenblas[version='>=0.3.3,<1.0a0']
  - scikit-learn==0.20.3 -> numpy[version='>=1.11.3,<2.0a0'] -> python[version='>=3.6,<3.7.0a0'] -> libffi[version='>=3.2.1,<4.0a0']
  - scikit-learn==0.20.3 -> numpy[version='>=1.11.3,<2.0a0'] -> python[version='>=3.6,<3.7.0a0'] -> ncurses[version='>=6.1,<7.0a0']
  - scikit-learn==0.20.3 -> numpy[version='>=1.11.3,<2.0a0'] -> python[version='>=3.6,<3.7.0a0'] -> openssl[version='>=1.1.1a,<1.1.2a'] -> ca-certificates
  - scikit-learn==0.20.3 -> numpy[version='>=1.11.3,<2.0a0'] -> python[version='>=3.6,<3.7.0a0'] -> pip -> setuptools -> certifi[version='>=2016.09']
  - scikit-learn==0.20.3 -> numpy[version='>=1.11.3,<2.0a0'] -> python[version='>=3.6,<3.7.0a0'] -> pip -> wheel
  - scikit-learn==0.20.3 -> numpy[version='>=1.11.3,<2.0a0'] -> python[version='>=3.6,<3.7.0a0'] -> readline[version='>=7.0,<8.0a0']
  - scikit-learn==0.20.3 -> numpy[version='>=1.11.3,<2.0a0'] -> python[version='>=3.6,<3.7.0a0'] -> sqlite[version='>=3.26.0,<4.0a0'] -> libedit[version='>=3.1.20181209,<3.2.0a0']
  - scikit-learn==0.20.3 -> numpy[version='>=1.11.3,<2.0a0'] -> python[version='>=3.6,<3.7.0a0'] -> tk[version='>=8.6.8,<8.7.0a0'] -> zlib[version='>=1.2.11,<1.3.0a0']
  - scikit-learn==0.20.3 -> scipy
  - scipy==0.19.1 -> numpy[version='>=1.9.3,<2.0a0'] -> blas==1.0=mkl

Travis CI Raw Console Log

Minimal example to reproduce

conda create -n testenv python=3.6 --yes
conda activate testenv
conda install numpy==1.13.1 scipy==0.19.1 scikit-learn==0.20.3

Debugging notes

Each of the following also raise this error:

conda install numpy==1.13.1 scipy scikit-learn==0.20.3 cython nose pytest pytest-cov
conda install numpy scipy==0.19.1 scikit-learn==0.20.3 cython nose pytest pytest-cov

... but this does not:

conda install numpy==1.16.4 scipy==1.2.1 scikit-learn==0.20.3 cython nose pytest pytest-cov

Simplify CI template

To follow up on @glemaitre's comment in #31 (comment)

The current CI template is somewhat outdated and a bit complex. A few thoughts on what could be simplified,

drop circle CI altogether in favor of using readthedocs.org for the documentation: the latter requires much technical effort to setup and maintain. @glemaitre can share his experience with imbalance-learn
Appveyor: I find spotlight's setup fairly clean https://github.com/maciejkula/spotlight/blob/master/appveyor.yml some things might be adapted from there. Maybe drop the wheel uploading: for maintainers new to Appveyor this may be a bit too much, and it can always be added subsequently in the project.

pngmath vs. imgmath

When building the documentation with make html I receive the following warning:
sphinx.ext.pngmath has been deprecated. Please use sphinx.ext.imgmath instead.

What do you think, should we adjust doc/conf.py to use the newer extension?
Currently, I am not aware of any negative impact.

Running Sphinx v1.4.5.

Travis CI is run on python 2.7 only

Currently there are 4 versions of Python that should be tested according to the .travis.yml file but Python 2.7.14 is used for every build in Travis (see the latest build).

Can't import skltemplate in doc/conf.py

PR #41 introduced from skltemplate import __version__ to doc/conf.py. But now when I run make html this gives me ModuleNotFoundError: No module named 'skltemplate'. So I was wondering what the intended workflow is for building the docs.

EDIT: I realise now that none of the docstring imports for the api documentation work either. So the intention must be for skltemplate to be in the python path when the documentation is generated, but how?

Add _OneToOneFeatureMixin to class templates

Scikit-learn recently updated all classes to have the get_feature_names_out method in order to allow retrieving feature names from pipelines (scikit-learn/scikit-learn#21308). The templates could be updated to reflect this new "template".

I believe it is just as simple as add _OneToOneFeatureMixin to each class. e.g.

from sklearn.base import BaseEstimator, ClassifierMixin, TransformerMixin, _OneToOneFeatureMixin

and

class TemplateEstimator(BaseEstimator, _OneToOneFeatureMixin):

What do you think?

use readthedocs sphinx theme

The readthedocs sphinx theme has a significantly better menu. The default menu is horrible.
We should advocate using the readthedocs theme, I think.

Clarify distinction between TemplateEstimator and TemplateClassifier

To me it's not obvious what the difference between the two classes is. Maybe TemplateEstimator is thought to do regression, but then this should reflect in the docstring (calling it estimator doesn't suffice IMHO, because every classifer or transformer is a BaseEstimator, too). Also a good hint would be to include the RegressorMixin.

Move project to scikit-learn-contrib

Hi!

Very nice project. We recently created the scikit-learn-contrib organization on github and I think it would be the perfect home for this project. Would you be willing to move your project there? If it's ok with you, I will give you membership to the organization and then you can transfer this repo to the organization.

https://github.com/scikit-learn-contrib

A possibly more explicit name for the repo would be just "project-template" although I am fine if you prefer to keep the current name.

CC @amueller @jnothman

how to run doctests?

How are doctests in the docs run here? Are they run on CI? I think they should be.

Add appveyor CI

Here are the files used in scikit-learn and lightning:
https://github.com/scikit-learn/scikit-learn/blob/master/appveyor.yml
https://github.com/scikit-learn-contrib/lightning/blob/master/appveyor.yml

How to adapt an existing project to this template

Very nice work on this template!

The documentation currently states that to create your own library from this template, one should clone this repo and change the origin url. That is fine when starting a new project from scratch but it is unclear what's the best way to proceed when adapting an existing project to this template: i.e. it's not possible (or very unpractical) to keep the original project git history and the one from this template.

I'm aware that it is under BSD licence, but I was wondering what would be the best way of giving credit to the developers of this template when adapting it?

A short explanation of this in the readme could also be helpful.

Use setuptools_scm

It would be nice, if this template could use setuptools_scm for automated versioning:
https://github.com/pypa/setuptools_scm

Update of the documentation

We should:

Update the documentation
Update the README

The README should contain less information and we will use the documentation webpage to give information regarding editing the template

Cython version?

I just wanted to know if there is a reason why you specify a cython version? Are there any problems using newer versions than the ones proposed? If no, do you use them for backwards compability?

Testing

Would it be possible to update "test.sh" in the folder ci_scripts such that it uses pytest instead of nosetests?

Scikit-Learn switched to pytest since nosetests is no longer maintained to the best of my knowledge/

ValueError in Installation example code

I have just installed the template (and am using scikit-learn v 0.16.1; update I get the same error with scikit-learn v 0.17.1). When I run the example code listed under "Installation and Usage:"

>>> from skltemplate import TemplateEstimator
>>> estimator = TemplateEstimator()
>>> estimator.fit(np.arange(10), np.arange(10))

I get the following ValueError:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "skltemplate/template.py", line 33, in fit
    X, y = check_X_y(X, y)
  File "/Library/Python/2.7/site-packages/sklearn/utils/validation.py", line 454, in check_X_y
    check_consistent_length(X, y)
  File "/Library/Python/2.7/site-packages/sklearn/utils/validation.py", line 174, in check_consistent_length
    "%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [ 1 10]

I am not very familiar with the checks that check_X_y is doing, but perhaps this is because X, in this case, is 1-dimensional? I checked this by running:

>>> estimator.fit(np.zeros([10, 10]), np.arange(10))

And no exception was raised.

Extra dependencies in section "Modifying the Documentation"

In the readme file, section "Modifying the Documentation", instead of:
pip install sphinx matplotlib sphinx-gallery
perhaps it should be:
pip install sphinx matplotlib sphinx-gallery numpydoc sphinx_rtd_theme

What do you think?

test_estimator not valid for vectorizers

We are currently developing a transformer compatible with sk-learn that behaves
as a vectorizer for graph type object. The test estimator method injects arrays of
n by m which are not valid to our current input.Tfidf vectorizer supports also a kind
o input that is not an array of n by m features, but rather a vector of strings.
Can the check_estimator constraints be softened for a vectorizer type transformer?

project-template/skltemplate/tests/test_common.py

Line 7 in ac1f099

return check_estimator(TemplateEstimator)


Configuration error:
There is a programable error in your configuration file:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sphinx/config.py", line 161, in __init__
    execfile_(filename, config)
  File "/usr/local/lib/python3.6/site-packages/sphinx/util/pycompat.py", line 150, in execfile_
    exec_(code, _globals)
  File "conf.py", line 18, in <module>
    import sphinx_rtd_theme
ModuleNotFoundError: No module named 'sphinx_rtd_theme'

make: *** [html] Error 2

GitHub bot applying the template to the projects it is used in

When some goodie is added to the template, it would be nice to add it to all the projects using the template.

When something worth to add to the forks is added to master branch of this repo the bot should

create a patch with git format-patch
walk the list of known projects in this repo in some text format. resolve their names in order to handle renamings
clone each one and try to apply the patch
if the patch is applied successfully, create a PR to that repo

python3.6.0 dateutil issue while make of pgadmin

make[1]: Entering directory `/usr/src/pgadmin4-3.0/docs/en_US'
Generating code-snippet.rst for some of the important classes...
sphinx-build -W -b html -d _build/doctrees   . _build/html
Running Sphinx v1.7.5

Configuration error:
There is a programable error in your configuration file:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sphinx/config.py", line 161, in __init__
    execfile_(filename, config)
  File "/usr/local/lib/python3.6/site-packages/sphinx/util/pycompat.py", line 150, in execfile_
    exec_(code, _globals)
  File "conf.py", line 27, in <module>
    import config
  File "/usr/src/pgadmin4-3.0/docs/en_US/../../web/config.py", line 29, in <module>
    from pgadmin.utils import env, IS_PY2, IS_WIN, fs_short_path
  File "/usr/src/pgadmin4-3.0/web/pgadmin/__init__.py", line 31, in <module>
    from pgadmin.utils import PgAdminModule, driver
  File "/usr/src/pgadmin4-3.0/web/pgadmin/utils/__init__.py", line 19, in <module>
    from .preferences import Preferences
  File "/usr/src/pgadmin4-3.0/web/pgadmin/utils/preferences.py", line 18, in <module>
    import dateutil.parser as dateutil_parser
ModuleNotFoundError: No module named 'dateutil'

make[1]: *** [html] Error 2
make[1]: Leaving directory `/usr/src/pgadmin4-3.0/docs/en_US'
make: *** [docs] Error 2

Please advise!

check_estimator(...) should be passed instances in scikit-learn>=0.24

The current approach gets us a:

TypeError: Passing a class was deprecated in version 0.23 and isn't supported anymore from 0.24.Please pass an instance instead.

	- env: PYTHON_VERSION="3.6" NUMPY_VERSION="1.13.1" SCIPY_VERSION="0.19.1"
	SKLEARN_VERSION="0.20.3"