
civisanalytics / python-glmnet

A python port of the glmnet package for fitting generalized linear models via penalized maximum likelihood.

License: Other

Python 100.00%
lasso elasticnet glmnet glm r python

python-glmnet's Introduction

Python GLMNET

This is a Python wrapper for the Fortran library used in the R package glmnet. While the library includes linear, logistic, Cox, Poisson, and multiple-response Gaussian models, only linear and logistic are implemented in this package.

The API follows the conventions of Scikit-Learn, so it is expected to work with tools from that ecosystem.

Installation

requirements

python-glmnet requires Python version >= 3.6, scikit-learn, numpy, and scipy. Installation from source or via pip requires a Fortran compiler.

conda

conda install -c conda-forge glmnet

pip

pip install glmnet

source

glmnet depends on numpy, scikit-learn and scipy. A working Fortran compiler is also required to build the package. For Mac users, brew install gcc will take care of this requirement.

git clone git@github.com:civisanalytics/python-glmnet.git
cd python-glmnet
python setup.py install

Usage

General

By default, LogitNet and ElasticNet fit a series of models using the lasso penalty (α = 1) and up to 100 values of λ (determined by the algorithm). After the path of λ values is computed, performance metrics for each value of λ are computed using 3-fold cross validation. The value of λ corresponding to the best performing model is saved as the lambda_max_ attribute. The largest value of λ whose model performance is within cut_point * standard_error of the best scoring model is saved as the lambda_best_ attribute.

The predict and predict_proba methods accept an optional parameter lamb which is used to select which model(s) will be used to make predictions. If lamb is omitted, lambda_best_ is used.

Both models will accept dense or sparse arrays.
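
The selection rule described above can be sketched in plain Python. This is an illustration only, not the package's implementation: the function name select_lambdas, the toy scores, and the choice to take the standard error at the best-scoring λ (with a sample standard deviation) are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, stdev


def select_lambdas(lambda_path, fold_scores, cut_point=1.0):
    """Pick (lambda_max, lambda_best) from per-fold CV scores.

    lambda_path : lambdas in decreasing order (largest first).
    fold_scores : one list of scores per fold, each aligned with lambda_path;
                  higher scores are assumed to be better.
    """
    n_folds = len(fold_scores)
    # Mean and standard error of the score at each lambda, across folds.
    per_lambda = list(zip(*fold_scores))
    means = [mean(s) for s in per_lambda]
    std_errs = [stdev(s) / sqrt(n_folds) for s in per_lambda]

    # Index of the best-scoring lambda.
    best = max(range(len(means)), key=means.__getitem__)
    lambda_max = lambda_path[best]

    # Largest lambda whose mean score is within cut_point standard errors of
    # the best score. The path is decreasing, so the first hit is the largest.
    threshold = means[best] - cut_point * std_errs[best]
    lambda_best = next(
        lam for lam, m in zip(lambda_path, means) if m >= threshold
    )
    return lambda_max, lambda_best


# Hypothetical scores for a 4-value path and 3 folds.
path = [1.0, 0.5, 0.1, 0.01]
scores = [
    [0.50, 0.79, 0.78, 0.79],  # fold 1
    [0.52, 0.81, 0.84, 0.80],  # fold 2
    [0.48, 0.80, 0.81, 0.78],  # fold 3
]
lambda_max, lambda_best = select_lambdas(path, scores)
print(lambda_max, lambda_best)
```

In this toy data the best mean score occurs at λ = 0.1, but λ = 0.5 scores within one standard error of it, so the more heavily regularized model is the one chosen as lambda_best.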

Regularized Logistic Regression

from glmnet import LogitNet

m = LogitNet()
m = m.fit(x, y)

Prediction is similar to Scikit-Learn:

# predict labels
p = m.predict(x)
# or probability estimates
p = m.predict_proba(x)

Regularized Linear Regression

from glmnet import ElasticNet

m = ElasticNet()
m = m.fit(x, y)

Predict:

p = m.predict(x)

python-glmnet's People

Contributors

eli-goodfriend, eric-valente, jacksonlee-civis, kcrum, lingling93, mheilman, mwjin, victorrodriguez, wlattner

python-glmnet's Issues

installing issue on Windows 10

On Windows 10 cmd, I typed the following command
conda install -c conda-forge glmnet

But the error message says

PackagesNotFoundError: The following packages are not available from current channels:

  - glmnet

Current channels:

  - https://conda.anaconda.org/conda-forge/win-64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://repo.anaconda.com/pkgs/main/win-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/win-64
  - https://repo.anaconda.com/pkgs/r/noarch
  - https://repo.anaconda.com/pkgs/msys2/win-64
  - https://repo.anaconda.com/pkgs/msys2/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

Deprecation warnings from sklearn

The latest master threw me some warning such as

/home/samuel/anaconda3/lib/python3.5/site-packages/sklearn/cross_validation.py:44: 
DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved.
 Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.

  "This module will be removed in 0.20.", DeprecationWarning)

and

/home/samuel/anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py:395: 
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. 
Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.

  DeprecationWarning)

scikit-learn is already at 0.18, so for now the warnings are just annoying, but they would be worth looking into before the deprecated code paths are removed.
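
As a stopgap on the user side, DeprecationWarning can be silenced around a specific call with the standard library's warnings module. The sketch below is a generic pattern, not something the package provides: call_quietly and noisy_add are hypothetical names, and noisy_add merely stands in for a call that triggers a deprecation warning. A warning raised at import time would need the import itself wrapped in the same way.

```python
import warnings


def call_quietly(func, *args, **kwargs):
    """Call func while suppressing DeprecationWarning only."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        return func(*args, **kwargs)


def noisy_add(a, b):
    # Hypothetical stand-in for a call that triggers a deprecation warning.
    warnings.warn("old API", DeprecationWarning)
    return a + b


# Record any warnings that escape, to show that none do.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = call_quietly(noisy_add, 1, 2)
print(result, len(caught))
```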

ElasticNet.fit raises ValueError when not converging instead of just issuing a warning

Thank you for your package, and for making it available on conda.

If I set a max_iter which is too low, instead of getting a convergence warning as in sklearn, the fit simply fails with an error. Can this be fixed easily? I'm trying to get the solution for a single lambda (and from what I understood, if I use a default path, I have no guarantee that glmnet will go to the end of it; it may stop early, which I don't want).
Reproduce with:

from celer.datasets import make_correlated_data
from sklearn.linear_model import ElasticNet
import glmnet
from numpy.linalg import norm
import numpy as np

np.random.seed(0)
X = np.random.randn(100, 200)
X = np.asfortranarray(X)
y = np.random.randn(100)
alpha_max = norm(X.T @ y, ord=np.inf) / len(y)


clf2 = glmnet.ElasticNet(alpha=1, lambda_path=[
                         alpha_max, alpha_max/100], standardize=False, fit_intercept=False, tol=1e-10, max_iter=1).fit(X, y)

output:

/home/mathurin/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/glmnet/errors.py:66: RuntimeWarning: Model did not converge for smaller values of lambda, returning solution for the largest 3 values.
  warnings.warn("Model did not converge for smaller values of lambda, "
/home/mathurin/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/glmnet/errors.py:66: RuntimeWarning: Model did not converge for smaller values of lambda, returning solution for the largest 3 values.
  warnings.warn("Model did not converge for smaller values of lambda, "
/home/mathurin/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/glmnet/util.py:202: RuntimeWarning: lambda_path has a single value, this may be an intercept-only model.
  warnings.warn("lambda_path has a single value, this may be an "
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-97963ff96fcf> in <module>
     10 
     11 
---> 12 clf2 = glmnet.ElasticNet(alpha=1, lambda_path=[
     13                          alpha_max, alpha_max/100], standardize=False, fit_intercept=False, tol=1e-10, max_iter=1).fit(X, y)

~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/glmnet/linear.py in fit(self, X, y, sample_weight, relative_penalties, groups)
    236                 self._cv = GroupKFold(n_splits=self.n_splits)
    237 
--> 238             cv_scores = _score_lambda_path(self, X, y, groups,
    239                                            sample_weight,
    240                                            relative_penalties,

~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/glmnet/util.py in _score_lambda_path(est, X, y, groups, sample_weight, relative_penalties, scoring, n_jobs, verbose)
     64         warnings.simplefilter(action, UndefinedMetricWarning)
     65 
---> 66         scores = Parallel(n_jobs=n_jobs, verbose=verbose, backend='threading')(
     67             delayed(_fit_and_score)(est, scorer, X, y, sample_weight, relative_penalties,
     68                                     est.lambda_path_, train_idx, test_idx)

~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/joblib/parallel.py in __call__(self, iterable)
   1041             # remaining jobs.
   1042             self._iterating = False
-> 1043             if self.dispatch_one_batch(iterator):
   1044                 self._iterating = self._original_iterator is not None
   1045 

~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
    859                 return False
    860             else:
--> 861                 self._dispatch(tasks)
    862                 return True
    863 

~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/joblib/parallel.py in _dispatch(self, batch)
    777         with self._lock:
    778             job_idx = len(self._jobs)
--> 779             job = self._backend.apply_async(batch, callback=cb)
    780             # A job can complete so quickly than its callback is
    781             # called before we get here, causing self._jobs to

~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
    206     def apply_async(self, func, callback=None):
    207         """Schedule a func to be run"""
--> 208         result = ImmediateResult(func)
    209         if callback:
    210             callback(result)

~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
    570         # Don't delay the application, to avoid keeping the input
    571         # arguments in memory
--> 572         self.results = batch()
    573 
    574     def get(self):

~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/joblib/parallel.py in __call__(self)
    260         # change the default number of processes to -1
    261         with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262             return [func(*args, **kwargs)
    263                     for func, args, kwargs in self.items]
    264 

~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/joblib/parallel.py in <listcomp>(.0)
    260         # change the default number of processes to -1
    261         with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262             return [func(*args, **kwargs)
    263                     for func, args, kwargs in self.items]
    264 

~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/glmnet/util.py in _fit_and_score(est, scorer, X, y, sample_weight, relative_penalties, score_lambda_path, train_inx, test_inx)
    117 
    118     lamb = np.clip(score_lambda_path, m.lambda_path_[-1], m.lambda_path_[0])
--> 119     return scorer(m, X[test_inx, :], y[test_inx], lamb=lamb)
    120 
    121 

~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/glmnet/scorer.py in _passthrough_scorer(estimator, *args, **kwargs)
    187 def _passthrough_scorer(estimator, *args, **kwargs):
    188     """Function that wraps estimator.score"""
--> 189     return estimator.score(*args, **kwargs)
    190 
    191 

~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/glmnet/linear.py in score(self, X, y, lamb)
    437 
    438         # pred will have shape (n_samples, n_lambda)
--> 439         pred = self.predict(X, lamb=lamb)
    440 
    441         # Reverse the args of the r2_score function from scikit-learn. The

~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/glmnet/linear.py in predict(self, X, lamb)
    414             Predicted response value for each sample given each value of lambda
    415         """
--> 416         return self.decision_function(X, lamb)
    417 
    418     def score(self, X, y, lamb=None):

~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/glmnet/linear.py in decision_function(self, X, lamb)
    392         # single value of lambda
    393         if lamb.shape[0] == 1:
--> 394             z = z.squeeze(axis=-1)
    395         return z
    396 

ValueError: cannot select an axis to squeeze out which has size not equal to one

ping @agramfort
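
The behavior requested here (return the current iterate and warn, rather than raise) can be sketched with a toy coordinate-descent lasso solver in pure Python. This is not glmnet's Fortran solver or the package's code: lasso_cd, soft_threshold, and the local ConvergenceWarning class are hypothetical names for illustration, and the solver assumes no column of X is all zeros.

```python
import warnings


class ConvergenceWarning(UserWarning):
    """Hypothetical warning category, named after scikit-learn's."""


def soft_threshold(z, gamma):
    """Soft-thresholding operator used by lasso coordinate descent."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, max_iter=100, tol=1e-8):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by coordinate descent.

    Returns the coefficients even if max_iter is exhausted; in that case a
    ConvergenceWarning is issued instead of an exception being raised.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            rho = 0.0      # correlation of feature j with the partial residual
            norm_sq = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
                norm_sq += X[i][j] ** 2
            new_bj = soft_threshold(rho / n, lam) / (norm_sq / n)
            max_change = max(max_change, abs(new_bj - beta[j]))
            beta[j] = new_bj
        if max_change < tol:
            return beta
    warnings.warn(
        "did not converge within max_iter iterations", ConvergenceWarning
    )
    return beta


X = [[1.0, 0.5], [0.0, 1.0], [1.0, -1.0], [2.0, 0.0]]
y = [1.0, 2.0, -1.0, 2.0]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    beta_short = lasso_cd(X, y, lam=0.01, max_iter=1)    # warns, still returns
    beta_full = lasso_cd(X, y, lam=0.01, max_iter=1000)  # converges silently
print(beta_short, beta_full, len(caught))
```

With this design the caller decides how strict to be: the warning can be promoted to an error with a warnings filter, while the default leaves a usable, if unconverged, solution.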

ElasticNet and LogitNet fail "check_estimator"

If I run

from glmnet import ElasticNet, LogitNet
from sklearn.utils.estimator_checks import check_estimator
check_estimator(ElasticNet)
check_estimator(LogitNet)

then each estimator check fails.

For the ElasticNet, the error is

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-40-d2891e7905ab> in <module>()
----> 1 check_estimator(ElasticNet)

~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/utils/estimator_checks.py in check_estimator(Estimator)
    263     for check in _yield_all_checks(name, estimator):
    264         try:
--> 265             check(name, estimator)
    266         except SkipTest as message:
    267             # the only SkipTest thrown currently results from not

~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/utils/testing.py in wrapper(*args, **kwargs)
    289             with warnings.catch_warnings():
    290                 warnings.simplefilter("ignore", self.category)
--> 291                 return fn(*args, **kwargs)
    292
    293         return wrapper

~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/utils/estimator_checks.py in check_sample_weights_list(name, estimator_orig)
    429         sample_weight = [3] * 10
    430         # Test that estimators don't raise any exception
--> 431         estimator.fit(X, y, sample_weight=sample_weight)
    432
    433

~/anaconda3/envs/civis/lib/python3.6/site-packages/glmnet/linear.py in fit(self, X, y, sample_weight, relative_penalties)
    186             sample_weight = np.ones(X.shape[0])
    187
--> 188         self._fit(X, y, sample_weight, relative_penalties)
    189
    190         if self.n_splits >= 3:

~/anaconda3/envs/civis/lib/python3.6/site-packages/glmnet/linear.py in _fit(self, X, y, sample_weight, relative_penalties)
    225
    226         _y = y.astype(dtype=np.float64, order='F', copy=True)
--> 227         _sample_weight = sample_weight.astype(dtype=np.float64, order='F',
    228                                               copy=True)
    229

AttributeError: 'list' object has no attribute 'astype'

and for the LogitNet, it's

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-42-b458d16bd33c> in <module>()
----> 1 check_estimator(LogitNet)

~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/utils/estimator_checks.py in check_estimator(Estimator)
    263     for check in _yield_all_checks(name, estimator):
    264         try:
--> 265             check(name, estimator)
    266         except SkipTest as message:
    267             # the only SkipTest thrown currently results from not

~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/utils/testing.py in wrapper(*args, **kwargs)
    289             with warnings.catch_warnings():
    290                 warnings.simplefilter("ignore", self.category)
--> 291                 return fn(*args, **kwargs)
    292
    293         return wrapper

~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/utils/estimator_checks.py in check_sample_weights_list(name, estimator_orig)
    429         sample_weight = [3] * 10
    430         # Test that estimators don't raise any exception
--> 431         estimator.fit(X, y, sample_weight=sample_weight)
    432
    433

~/anaconda3/envs/civis/lib/python3.6/site-packages/glmnet/logistic.py in fit(self, X, y, sample_weight, relative_penalties)
    196                                            self.scoring, classifier=True,
    197                                            n_jobs=self.n_jobs,
--> 198                                            verbose=self.verbose)
    199
    200             self.cv_mean_score_ = np.atleast_1d(np.mean(cv_scores, axis=0))

~/anaconda3/envs/civis/lib/python3.6/site-packages/glmnet/util.py in _score_lambda_path(est, X, y, sample_weight, relative_penalties, cv, scoring, classifier, n_jobs, verbose)
     69             delayed(_fit_and_score)(est, scorer, X, y, sample_weight, relative_penalties,
     70                                     est.lambda_path_, train_idx, test_idx)
---> 71             for (train_idx, test_idx) in cv)
     72
     73     return scores

~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    777             # was dispatched. In particular this covers the edge
    778             # case of Parallel used with an exhausted iterator.
--> 779             while self.dispatch_one_batch(iterator):
    780                 self._iterating = True
    781             else:

~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
    623                 return False
    624             else:
--> 625                 self._dispatch(tasks)
    626                 return True
    627

~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in _dispatch(self, batch)
    586         dispatch_timestamp = time.time()
    587         cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 588         job = self._backend.apply_async(batch, callback=cb)
    589         self._jobs.append(job)
    590

~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py in apply_async(self, func, callback)
    109     def apply_async(self, func, callback=None):
    110         """Schedule a func to be run"""
--> 111         result = ImmediateResult(func)
    112         if callback:
    113             callback(result)

~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py in __init__(self, batch)
    330         # Don't delay the application, to avoid keeping the input
    331         # arguments in memory
--> 332         self.results = batch()
    333
    334     def get(self):

~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self)
    129
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132
    133     def __len__(self):

~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0)
    129
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132
    133     def __len__(self):

~/anaconda3/envs/civis/lib/python3.6/site-packages/glmnet/util.py in _fit_and_score(est, scorer, X, y, sample_weight, relative_penalties, score_lambda_path, train_inx, test_inx)
    112     """
    113     m = clone(est)
--> 114     m = m._fit(X[train_inx, :], y[train_inx], sample_weight[train_inx], relative_penalties)
    115
    116     lamb = np.clip(score_lambda_path, m.lambda_path_[-1], m.lambda_path_[0])

TypeError: only integer scalar arrays can be converted to a scalar index

I would expect that these objects should pass the check_estimator checks.

BUG: ModuleNotFoundError: No module named '_glmnet'

Hello,
I have forked the repo to work on a new feature. Following the contributing documents, I tried to run pytest on the test files, but could not. Apparently, the _glmnet module is missing. I tried to import the modules directly, and that failed as well. Could you help me? I get the following error.

      1 import pkg_resources
      2 
----> 3 from .logistic import LogitNet
      4 from .linear import ElasticNet
      5 

~/Documents/GitHub/GLM-Net/glmnet/logistic.py in <module>
     12 
     13 from .errors import _check_error_flag
---> 14 from _glmnet import lognet, splognet, lsolns
     15 from glmnet.util import (_fix_lambda_path,
     16                          _check_user_lambda,

ModuleNotFoundError: No module named '_glmnet'
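
Judging from the traceback, the compiled extension is imported by the top-level name _glmnet, so one plausible cause is that the Fortran extension has not been built in the environment running the tests (for example, when working from a fresh source checkout). A quick way to check, using only the standard library, is sketched below; extension_available is a hypothetical helper name.

```python
import importlib.util


def extension_available(module_name):
    """Return True if a top-level module can be located on the import path."""
    return importlib.util.find_spec(module_name) is not None


for name in ("_glmnet", "glmnet"):
    status = "found" if extension_available(name) else "missing"
    print(f"{name}: {status}")
```

If _glmnet is reported as missing, building and installing the package first (the README's source instructions use python setup.py install, which needs a Fortran compiler) would be the thing to try before running pytest.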

Unable to install for python in Windows using pip or conda

Hi,
So I have this project where I am forced to use 'glmnet'. For the last 10 hours I have tried almost everything possible to install it, but it gives some sort of lame 'numpy' dependency error even though I have the latest version of numpy installed. From the original documentation at https://pypi.org/project/glmnet/ I found out that it needs a Fortran compiler; I have installed that too and ensured it is working, but to no avail. I have pasted the error output below for better understanding. Any help with this will be greatly appreciated, as I cannot find any recent post on this that works for the current versions of Python and pip. Also, if I need to downgrade something so that I can install it, please let me know. I have tried this too, somewhat haphazardly, but it has not worked.
Thanks in advance.

ERROR:
C:\Users\PMLS>pip install glmnet
Collecting glmnet
Using cached glmnet-2.2.1.tar.gz (90 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [2 lines of output]
install requires: 'numpy'. use pip or easy_install.
$ pip install numpy
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Merge with glmnet1?

Hi,

An almost fully functional glmnet_python version from Stanford has been around for a few months now. It has been tested and validated with base R versions with several use cases. There is also an extensive vignette with many examples and feature documentation.

However, it needs a little bit more work in getting it pip installable, by someone knowledgeable about how it is done. Essentially, it has been successfully installed on Centos 6.7/64-bit linux machines but not tested in others.

It would be great to get the help of this community to get the two versions integrated. Here is the code repository:

https://github.com/bbalasub1/glmnet_python

The best starting point for looking at this project is the jupyter notebook here:

https://github.com/bbalasub1/glmnet_python/blob/master/test/glmnet_examples.ipynb

Thank you & Regards,
Bala

Package does not build with `--use-pep517` (with Poetry)

The package does not build with pip using --use-pep517 with poetry. This is mainly an issue as the flag cannot be disabled with poetry (see python-poetry/poetry#3433).

A reproducible example is given below:

mkdir tmp
cd tmp
poetry init 
# click through steps...
poetry add glmnet
> Using version ^2.2.1 for glmnet
> 
> Updating dependencies
> Resolving dependencies... (0.1s)
> 
> Writing lock file
> 
> Package operations: 6 installs, 0 updates, 0 removals
> 
>   • Installing numpy (1.23.3)
>   • Installing joblib (1.2.0)
>   • Installing scipy (1.9.2)
>   • Installing threadpoolctl (3.1.0)
>   • Installing scikit-learn (1.1.2)
>   • Installing glmnet (2.2.1): Failed
> 
>   CalledProcessError
> 
>  ...
> 
>    Installing build dependencies: started
>    Installing build dependencies: finished with status 'done'
>    Getting requirements to build wheel: started
>    Getting requirements to build wheel: finished with status 'error'
>    error: subprocess-exited-with-error
>    
>    × Getting requirements to build wheel did not run successfully.
>    │ exit code: 1
>    ╰─> [2 lines of output]
>        install requires: 'numpy'. use pip or easy_install.
>          $ pip install numpy
>        [end of output]

The error comes from the beginning of the setup.py file:

python-glmnet/setup.py

Lines 12 to 18 in 813c06f

import setuptools  # noqa: F401
try:
    from numpy.distutils.core import Extension, setup
except ImportError:
    sys.exit("install requires: 'numpy'."
             " use pip or easy_install."
             " \n  $ pip install numpy")

which seems to use this SO answer to compile the FORTRAN code from the original R package: https://stackoverflow.com/a/55358607/5861244

A solution in the short term is to call

poetry run python -m pip install glmnet --no-use-pep517
poetry add glmnet 

as mentioned here: python-poetry/poetry#3433 (comment)

Versions for completeness:

poetry run python --version
> Python 3.10.7
poetry --version
> Poetry (version 1.2.0)

offset in glmnet

Hi there,

In the R implementation, glmnet allows supplying the model with an offset, a vector to be included in the linear predictor. It seems that the current Python implementation does not yet have it. Do you intend to support this soon? I ask because I found a placeholder in your code. Thanks.

offset = np.zeros((X.shape[0], n_classes), dtype=np.float64, order='F')
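
For readers unfamiliar with the term, an offset is a known per-observation quantity added to the linear predictor with a fixed coefficient of 1, so it is not estimated. The sketch below shows where it would enter for a single observation in a logistic model; the function names and numbers are made up for illustration, and this is not code from the package.

```python
from math import exp


def linear_predictor(x, coef, intercept, offset=0.0):
    """eta = intercept + x . coef + offset, for a single observation."""
    return intercept + sum(xj * bj for xj, bj in zip(x, coef)) + offset


def logistic_probability(eta):
    """Inverse logit link."""
    return 1.0 / (1.0 + exp(-eta))


x = [1.0, 2.0]
coef = [0.5, -0.25]
eta_plain = linear_predictor(x, coef, intercept=0.1)
eta_offset = linear_predictor(x, coef, intercept=0.1, offset=0.7)
print(eta_plain, eta_offset)
print(logistic_probability(eta_plain), logistic_probability(eta_offset))
```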

Create new release for PIP past 2.0

Please consider creating a new release for PIP (after 2.0).
There are some great new features added after release 2.0, including issue #40, #41 lower_limit and upper_limit, and #31 max_features.

Failed to find libgfortran.3.dylib

I ran into an error when trying to pip install glmnet, which I wanted to share with you. I'm using macOS High Sierra (10.13.6). The output I receive is shown below. I also ran locate to see if I could find the dynamic library, which I did, and have included that output as well. If you can tell me how to resolve this, I would appreciate it.

Sincerely,
Michiel

(venv) Zarathustra:allosteric-inference-master Zarathustra$ pip3 install glmnet
Collecting glmnet
  Using cached https://files.pythonhosted.org/packages/c7/97/6f92f20fc193478c5d5927396c8d691abbdaa7774fd67e8a08fdeb1a2470/glmnet-2.0.0.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/tq/yhb6dsx952l1wcmx5qnxp0940000gn/T/pip-install-u44dd773/glmnet/setup.py", line 38, in <module>
        GFORTRAN_LIB = get_lib_dir('libgfortran.3.dylib')
      File "/private/var/folders/tq/yhb6dsx952l1wcmx5qnxp0940000gn/T/pip-install-u44dd773/glmnet/setup.py", line 30, in get_lib_dir
        raise Exception("Failed to find {}".format(dylib))
    Exception: Failed to find libgfortran.3.dylib
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/tq/yhb6dsx952l1wcmx5qnxp0940000gn/T/pip-install-u44dd773/glmnet/
(venv) Zarathustra:allosteric-inference-master Zarathustra$ locate libgfortran.3.dylib
/Applications/MATLAB_R2017b.app/sys/os/maci64/libgfortran.3.dylib
/Applications/Tellurium.app/Contents/Resources/telocal/python-3.6.3/lib/python3.6/site-packages/scipy/.dylibs/libgfortran.3.dylib
/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libgfortran.3.dylib
/Users/Zarathustra/ETH/git_repo/allosteric-inference-master/venv/lib/python3.6/site-packages/scipy/.dylibs/libgfortran.3.dylib
/Users/Zarathustra/ETH/repositories/allosteric-inference-master/python_server/tutorial/lib/python3.6/site-packages/scipy/.dylibs/libgfortran.3.dylib
/Users/Zarathustra/ETH/repositories/allosteric-inference-master/venv/lib/python3.6/site-packages/scipy/.dylibs/libgfortran.3.dylib
/Users/Zarathustra/ETH/repositories/allosteric-inference-master/venv/python_server/tutorial/lib/python3.6/site-packages/scipy/.dylibs/libgfortran.3.dylib
/Users/Zarathustra/miniconda2/lib/libgfortran.3.dylib
/Users/Zarathustra/miniconda2/pkgs/libgfortran-3.0.1-h93005f0_2/lib/libgfortran.3.dylib
/Users/Zarathustra/miniconda3/lib/libgfortran.3.dylib
/Users/Zarathustra/miniconda3/pkgs/libgfortran-3.0.1-h93005f0_2/lib/libgfortran.3.dylib
/usr/local/lib/python3.6/site-packages/scipy/.dylibs/libgfortran.3.dylib

Update CircleCI config to v2

CircleCI v1 configuration files were discontinued at the end of August 2018. We should update to use a v2 configuration file.

IndexError when predicting one sample

Hey there,

when doing leave-one-out cross-validation, LogitNet fails with an IndexError when I try to predict the label of the single test sample.
Here is a simple working example:

X = np.random.randint(0,10, size=(22,10))
y = np.random.randint(0,2,size=(22,))


X_train = X[:-1,:]
y_train = y[:-1]

X_test = X[[-1]]
y_test = y[[-1]]

m = LogitNet(alpha=0.8, tol=0.3, max_iter=2000)

m.fit(X_train, y_train)

m.predict(X_test)

In my case it fails with

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-178-2b6b26202b08> in <module>()
     15 m.fit(X_train, y_train)
     16 
---> 17 m.predict(X_test)

/usr/local/lib/python3.5/dist-packages/glmnet/logistic.py in predict(self, X, lamb)
    474         """
    475 
--> 476         scores = self.predict_proba(X, lamb)
    477         indices = scores.argmax(axis=1)
    478 

/usr/local/lib/python3.5/dist-packages/glmnet/logistic.py in predict_proba(self, X, lamb)
    443         # reshape z to (n_samples, n_classes, n_lambda)
    444         n_lambda = len(np.atleast_1d(lamb))
--> 445         z = z.reshape(z.shape[0], -1, n_lambda)
    446 
    447         if z.shape[1] == 1:

IndexError: tuple index out of range

As a side note, I could imagine it refers to #25 but I am not sure.

When switching to leave-2-out cv the problem does not occur anymore.
Best and Merry Christmas!

Class variable CV not available from instance of LogitNet? Python version issue?

from glmnet import LogitNet
m3 = LogitNet()
xd = trn

%time m3 = m3.fit(X=xd, y=yd_trn)

the fit command gives me an error

Traceback (most recent call last):

  File "<timed exec>", line 1, in <module>

  File "XXX/anaconda3/lib/python3.6/site-packages/glmnet/logistic.py", line 206, in fit
    self.cv = self.CV(n_splits=self.n_splits, shuffle=True,

AttributeError: 'LogitNet' object has no attribute 'CV'

I have 'fixed' it by changing code to LogitNet.CV .... I am not clear on why this should be required.

I am using Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 13 2017, 12:02:49)

What can be the reason for different results compared to R glmnet?

Hi,

I have the following pipeline.
First I apply ridge regression using 10-cv to find the best lambda.
I get same lambda max and lambda best as in R cv.glmnet.

Next, I refit the model using the best lambda from the first step, without intercept and compare it to the results of R glmnet.
The coefficients and predictions are different. Why is that?

Comparison of coefficients:
R
(Intercept) 0
f1 -0.004059542
f2 0.377331808
f3 1.006589044
f4 0.876858914
f5 0.140710854
f6 730268.470575249
f7 244447.850561236
f8 537663.923355049
f9 176279.892636801
f10 662.748853227
f11 739399.127039033

python:
Intercept 0
f1 -0.16957
f2 0.33352
f3 0.80749
f4 0.71330
f5 0.11385
f6 801091.27661
f7 293769.02256
f8 557147.70998
f9 251954.31707
f10 797640.12411
f11 1086129.27954

Thanks

Getting glmnet Error no. 7777

Hey there,

when trying to fit a LogitNet using

import numpy as np
from glmnet import LogitNet

X = np.ones((22, 1))
y_ = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

m = LogitNet(alpha=0.8, maxiter=2000, n_splits=3, tol=0.3)

m.fit(X,y_)

I get

RuntimeError                              Traceback (most recent call last)
<ipython-input-30-7c128fa244fb> in <module>()
----> 1 m.fit(X,y_)

/usr/local/lib/python3.5/dist-packages/glmnet/logistic.py in fit(self, X, y, sample_weight, relative_penalties)
    187 
    188         # fit the model
--> 189         self._fit(X, y, sample_weight, relative_penalties)
    190 
    191         # score each model on the path of lambda values found by glmnet and

/usr/local/lib/python3.5/dist-packages/glmnet/logistic.py in _fit(self, X, y, sample_weight, relative_penalties)
    363         # raises RuntimeError if self.jerr_ is nonzero
    364         self.jerr_ = jerr
--> 365         _check_glmnet_error_flag(self.jerr_)
    366 
    367         # glmnet may not return the requested number of lambda values, so we

/usr/local/lib/python3.5/dist-packages/glmnet/util.py in _check_glmnet_error_flag(jerr)
    138         else:
    139             msg = "glmnet error no. {}"
--> 140             raise RuntimeError(msg.format(jerr))
    141 
    142 

RuntimeError: glmnet error no. 7777

Any idea why?
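
In the R glmnet package, error code 7777 is reported as "all used predictors have zero variance", which would fit the all-ones X above. A quick pre-fit check for constant columns can be sketched with NumPy; the helper name `constant_columns` is made up for illustration and is not part of glmnet.

```python
import numpy as np


def constant_columns(X):
    """Return the indices of columns whose values are all identical."""
    X = np.asarray(X, dtype=float)
    # np.ptp is max - min per column; zero means the column never varies.
    return np.flatnonzero(np.ptp(X, axis=0) == 0)


X = np.ones((22, 1))
print(constant_columns(X))  # lists column 0, the only (constant) column
```

Dropping or replacing such columns before calling fit would be one way to avoid the error, assuming that is indeed the cause here.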

glmnet in Python is always deterministic, regardless of seed

I was hoping to get some clarification on why glmnet for Python is always deterministic regardless of seed, even though the documentation states that the solver is not deterministic (e.g. https://github.com/civisanalytics/python-glmnet/blob/master/glmnet/linear.py#L77). For example, each of the following runs returns the same result, whether or not a seed is set:

from glmnet import ElasticNet
import io
import numpy as np
import pandas as pd
import requests
from sklearn.preprocessing import StandardScaler


# Load data
url = 'https://raw.githubusercontent.com/CCS-Lab/easyml/master/Python/datasets/prostate.csv'
s = requests.get(url).content
prostate = pd.read_csv(io.StringIO(s.decode('utf-8')))

# Generate coefficients from data by hand
X, y = prostate.drop('lpsa', axis=1).values, prostate['lpsa'].values
sclr = StandardScaler()
X_preprocessed = sclr.fit_transform(X)

# no random state
coefficients = []
for i in range(10):
    model = ElasticNet(alpha=1, standardize=False, cut_point=0.0, n_lambda=200)
    print(id(model))
    model.fit(X_preprocessed, y)
    coefficients.append(np.asarray(model.coef_))
print(coefficients)

# seed set at outer level
np.random.seed(43210)
coefficients = []
for i in range(10):
    model = ElasticNet(alpha=1, standardize=False, cut_point=0.0, n_lambda=200)
    print(id(model))
    model.fit(X_preprocessed, y)
    coefficients.append(np.asarray(model.coef_))
print(coefficients)

# seed set at inner level
coefficients = []
for i in range(10):
    np.random.seed(43210)
    model = ElasticNet(alpha=1, standardize=False, cut_point=0.0, n_lambda=200)
    print(id(model))
    model.fit(X_preprocessed, y)
    coefficients.append(np.asarray(model.coef_))
print(coefficients)

# seed set at function level
coefficients = []
for i in range(10):
    random_state = np.random.RandomState(i)
    model = ElasticNet(alpha=1, standardize=False, cut_point=0.0, n_lambda=200, random_state=random_state)
    print(id(model))
    model.fit(X_preprocessed, y)
    coefficients.append(np.asarray(model.coef_))
print(coefficients)

coefficients = []
random_state = np.random.RandomState(43210)
for i in range(10):
    model = ElasticNet(alpha=1, standardize=False, cut_point=0.0, n_lambda=200, random_state=random_state)
    print(id(model))
    model.fit(X_preprocessed, y)
    coefficients.append(np.asarray(model.coef_))
print(coefficients)

This behavior is in direct contrast with the behavior observed in the R version of the glmnet package:

library(easyml) # devtools::install_github("CCS-Lab/easyml", subdir = "R")
library(glmnet)

data("prostate", package = "easyml")

# Set X, y, and scale X
X <- as.matrix(prostate[, -9])
y <- prostate[, 9]
X_scaled <- scale(X)

# no seed
m <- 10
n <- ncol(X)
Z <- matrix(NA, nrow = m, ncol = n)
for (i in (1:m)) {
  model_cv <- cv.glmnet(X_scaled, y, standardize = FALSE)
  model <- glmnet(X_scaled, y)
  coefs <- coef(model, s = model_cv$lambda.min)
  Z[i, ] <- as.numeric(coefs)[-1]
}
print(Z)

# Seed set at outer level
set.seed(43210)
m <- 10
n <- ncol(X)
Z <- matrix(NA, nrow = m, ncol = n)
for (i in (1:m)) {
  model_cv <- cv.glmnet(X_scaled, y, standardize = FALSE)
  model <- glmnet(X_scaled, y)
  coefs <- coef(model, s = model_cv$lambda.min)
  Z[i, ] <- as.numeric(coefs)[-1]
}
print(Z)

# Seed set at inner level
Z <- matrix(NA, nrow = m, ncol = n)
for (i in (1:m)) {
  set.seed(43210)
  model_cv <- cv.glmnet(X_scaled, y, standardize = FALSE)
  model <- glmnet(X_scaled, y)
  coefs <- coef(model, s = model_cv$lambda.min)
  Z[i, ] <- as.numeric(coefs)[-1]
}
print(Z)

# Different seed set each loop at inner level
Z <- matrix(NA, nrow = m, ncol = n)
for (i in (1:m)) {
  set.seed(i)
  model_cv <- cv.glmnet(X_scaled, y, standardize = FALSE)
  model <- glmnet(X_scaled, y)
  coefs <- coef(model, s = model_cv$lambda.min)
  Z[i, ] <- as.numeric(coefs)[-1]
}
print(Z)

If the https://github.com/civisanalytics/python-glmnet/ version of glmnet is a wrapper around the Fortran code, why is the behavior in R and Python different?

Mac wheel library linking issue

I've been using glmnet==2.2.1 installed from Mac wheels with gcc==9.3.0 with no issues. But when my colleagues who didn't have gcc installed yet tried to go through the setup, they ran into the following:

python -c "import glmnet"

ImportError: dlopen(/Users/rockwellweiner/model/.venv/lib/python3.7/site-packages/_glmnet.cpython-37m-darwin.so, 2): Library not loaded: /usr/local/opt/gcc/lib/gcc/9/libgfortran.5.dylib Referenced from: /Users/rockwellweiner/model/.venv/lib/python3.7/site-packages/_glmnet.cpython-37m-darwin.so Reason: image not found

Symlinking /usr/local/opt/gcc/lib/gcc/10 to /usr/local/opt/gcc/lib/gcc/9 seems to do the trick but is obviously not ideal; is there maybe a change to setup.py or the wheel build script that would support both?

model.fit fails due to reshaping error in predict_proba()

Hey there,
I've been trying to fit an Elastic Net with your toolbox and ran into an error:

In the logistic.py class in the predict_proba() function you have the following code:

        z = self.decision_function(X, lamb)
        expit(z, z)
        # z = np.atleast_2d(z)

        # reshape z to (n_samples, n_classes, n_lambda)
        n_lambda = len(np.atleast_1d(lamb))
        z = z.reshape(z.shape[0], -1, n_lambda)

However, when the passed X is only one-dimensional and, say, n_lambda = 86, z.shape will be (86,) rather than (1, 86). The reshape then fails, since it tries to shape an array of 86 elements into an 86 x K x 86 array.

As you can see, I added the
z = np.atleast_2d(z)
line which takes care of the reshaping problem. However, I then get this kind of error:

/usr/local/lib/python3.4/dist-packages/glmnet/logistic.py in predict(self, X, lamb)
    478         indices = scores.argmax(axis=1)
    479 
--> 480         return self.classes_[indices]
    481 
    482     def score(self, X, y, lamb=None):

IndexError: index 85 is out of bounds for axis 1 with size 2

since then the output is apparently not in the expected shape anymore.
I believe this error could be fixed with a simple axis=0 in line 478, but I do not have the full overview, so I thought it better to report back to you.

Best,
Sophie
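
The reshape failure described in this report can be reproduced with NumPy alone. The sketch below uses a stand-in array of 86 decision values rather than a fitted model, so the variable names are illustrative and nothing here is taken from logistic.py beyond the reshape call quoted above.

```python
import numpy as np

n_lambda = 86
z = np.zeros(n_lambda)  # stand-in for 1-D decision values, shape (86,)

try:
    # Same call as in predict_proba: asks for shape (86, -1, 86) from 86 elements.
    z.reshape(z.shape[0], -1, n_lambda)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)

# Promoting z to 2-D first gives shape (1, 86), which reshapes cleanly.
z2 = np.atleast_2d(z)
reshaped = z2.reshape(z2.shape[0], -1, n_lambda)
print(reshaped.shape)
```

This only shows why the reshape itself stops failing; it says nothing about the later indexing error in predict, which depends on how the reshaped scores are reduced.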

ReStructuredText README for PyPI

It'd be nice if the README were .rst instead of .md, for PyPI.

I think one could just

pandoc --from=markdown --to=rst --output=README.rst README.md

Error with sklearn's make_scorer

I have been trying to use
F1 = make_scorer(fbeta_score, beta=1, labels = ['1', '2'], average='micro')
and F2
as the scoring parameter for logistic regression but glmnet throws the below error. It runs smoothly with scoring = 'accuracy', though. I tried to study the code but couldn't find a way to work with customized scores. Any help would be appreciated.

TypeError Traceback (most recent call last)
/home/Dados/Redes_Neurais_II/Dissertacao/Teste pdf 201224.py in
4110 n_splits=n_splits, min_lambda_ratio=min_lambda_ratio, tol=tol,
4111 scoring=scoring, n_jobs=-1, random_state=1, verbose=True)
----> 4112 clf_cv = lgnetcv.fit(x_train, y_train)
4113 print(f'\nMelhor lambda = {clf_cv.lambda_best_} para alpha = {alpha}')
4114 # Usa todo o conj de treinamento (inclusive valid) para achar coeficientes finais

~/.local/lib/python3.6/site-packages/glmnet/logistic.py in fit(self, X, y, sample_weight, relative_penalties, groups)
248 self.scoring,
249 n_jobs=self.n_jobs,
--> 250 verbose=self.verbose)
251
252 self.cv_mean_score_ = np.atleast_1d(np.mean(cv_scores, axis=0))

~/.local/lib/python3.6/site-packages/glmnet/util.py in _score_lambda_path(est, X, y, groups, sample_weight, relative_penalties, scoring, n_jobs, verbose)
67 delayed(_fit_and_score)(est, scorer, X, y, sample_weight, relative_penalties,
68 est.lambda_path_, train_idx, test_idx)
---> 69 for (train_idx, test_idx) in cv_split)
70
71 return scores

~/.local/lib/python3.6/site-packages/joblib/parallel.py in __call__(self, iterable)
1015
1016 with self._backend.retrieval_context():
-> 1017 self.retrieve()
1018 # Make sure that we get a last message telling us we are done
1019 elapsed_time = time.time() - self._start_time

~/.local/lib/python3.6/site-packages/joblib/parallel.py in retrieve(self)
907 try:
908 if getattr(self._backend, 'supports_timeout', False):
--> 909 self._output.extend(job.get(timeout=self.timeout))
910 else:
911 self._output.extend(job.get())

/usr/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
642 return self._value
643 else:
--> 644 raise self._value
645
646 def _set(self, i, obj):

/usr/lib/python3.6/multiprocessing/pool.py in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
117 job, i, func, args, kwds = task
118 try:
--> 119 result = (True, func(*args, **kwds))
120 except Exception as e:
121 if wrap_exception and func is not _helper_reraises_exception:

~/.local/lib/python3.6/site-packages/joblib/_parallel_backends.py in __call__(self, *args, **kwargs)
606 def __call__(self, *args, **kwargs):
607 try:
--> 608 return self.func(*args, **kwargs)
609 except KeyboardInterrupt:
610 # We capture the KeyboardInterrupt and reraise it as

~/.local/lib/python3.6/site-packages/joblib/parallel.py in __call__(self)
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
255 return [func(*args, **kwargs)
--> 256 for func, args, kwargs in self.items]
257
258 def __len__(self):

~/.local/lib/python3.6/site-packages/joblib/parallel.py in <listcomp>(.0)
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
255 return [func(*args, **kwargs)
--> 256 for func, args, kwargs in self.items]
257
258 def __len__(self):

~/.local/lib/python3.6/site-packages/glmnet/util.py in _fit_and_score(est, scorer, X, y, sample_weight, relative_penalties, score_lambda_path, train_inx, test_inx)
117
118 lamb = np.clip(score_lambda_path, m.lambda_path_[-1], m.lambda_path_[0])
--> 119 return scorer(m, X[test_inx, :], y[test_inx], lamb=lamb)
120
121

TypeError: __call__() got an unexpected keyword argument 'lamb'
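
The last frame shows the scorer being invoked as scorer(m, X, y, lamb=lamb), so whatever callable is used has to accept a lamb keyword, which a scorer built by sklearn's make_scorer does not. A sketch of a wrapper that does accept it is below. It assumes that predict(X, lamb=...) returns one column of predictions per λ, and it uses a stub estimator and a hand-written accuracy metric so that it runs without glmnet. Whether the scoring parameter accepts such a callable directly is not confirmed here, and the names `make_lamb_scorer` and `StubModel` are made up for illustration.

```python
import numpy as np


def make_lamb_scorer(metric, **metric_kwargs):
    """Wrap metric(y_true, y_pred, **kwargs) into a scorer that accepts `lamb`."""
    def scorer(estimator, X, y, lamb=None):
        pred = np.asarray(estimator.predict(X, lamb=lamb))
        if pred.ndim == 1:  # predictions for a single lambda
            return metric(y, pred, **metric_kwargs)
        # Otherwise compute one score per lambda column.
        return np.array([metric(y, pred[:, j], **metric_kwargs)
                         for j in range(pred.shape[1])])
    return scorer


class StubModel:
    """Hypothetical stand-in for a fitted classifier: predicts 1 where x > lambda."""
    def predict(self, X, lamb=None):
        lamb = np.atleast_1d(lamb)
        return (X[:, [0]] > lamb[None, :]).astype(int)


def accuracy(y_true, y_pred):
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


X = np.array([[0.1], [0.4], [0.6], [0.9]])
y = np.array([0, 0, 1, 1])
scores = make_lamb_scorer(accuracy)(StubModel(), X, y, lamb=np.array([0.5, 0.0]))
print(scores)  # one accuracy value per lambda
```

With scikit-learn installed, make_lamb_scorer(fbeta_score, beta=1, average='micro') would wrap the metric from this report in the same way.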

Use external joblib and six packages, instead of sklearn.externals versions

Starting in version 0.21 of sklearn, certain packages previously provided in the sklearn.externals module are deprecated. These include joblib and six, which glmnet uses in its util.py and scorer.py files, respectively. To reduce FutureWarnings upon module load, and to avoid breakage when sklearn-0.23 is released and joblib and six are no longer found under sklearn.externals, it would be a good idea to depend on these packages directly and add them to the requirements.txt file.
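
Until the dependency change lands, one transitional pattern is to prefer the standalone package and fall back to the vendored copy only when it is missing. The helper below is a generic sketch (the name `import_first` is made up here); it is not taken from glmnet's util.py or scorer.py.

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in module_names that is available."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: "
                      + ", ".join(module_names))


# Intended use (not executed here, since joblib may not be installed):
#   joblib = import_first("joblib", "sklearn.externals.joblib")
#   six = import_first("six", "sklearn.externals.six")

# Demonstration with the standard library: the first name is missing on
# purpose, so the helper falls back to the second.
json_module = import_first("module_that_does_not_exist_xyz", "json")
print(json_module.__name__)
```

Listing joblib and six in requirements.txt, as suggested above, would make the fallback branch unnecessary.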

BUG: unable to build on Windows

I'm trying to build this package on Windows and not having much luck. Using the mingw-w64 Fortran compiler (installed from Anaconda) and Visual Studio 2015, I get these errors:

glmnet5.o : error LNK2001: unresolved external symbol _gfortran_runtime_error_at
glmnet5.o : error LNK2001: unresolved external symbol _gfortran_internal_pack
glmnet5.o : error LNK2001: unresolved external symbol _gfortran_internal_unpack

A morning of Googling hasn't been much help. The closest I found was this StackOverflow question, but setting the compiler=mingw32 flag in setup.cfg leads to a different error, ValueError: Unknown MS Compiler version 1900.

Have you successfully built this package on Windows?

Compilation issue on MacOS

I am running into a number of Fortran compilation issues on MacOS. I followed the installation instructions but when running python setup.py install I get the following warning: (repeated for many lines)

glmnet/src/glmnet/glmnet5.f90:683:72:

       subroutine get_int_parms(sml,eps,big,mnlam,rsqmax,pmin,exmx)          772
                                                                        1
Warning: Line truncated at (1) [-Wline-truncation]

After these, there are warnings about unused labels.
At the end it returns an error.

ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status
error: Command "/usr/local/bin/gfortran -Wall -g -pie -headerpad_max_install_names build/temp.macosx-10.9-x86_64-3.6/build/src.macosx-10.9-x86_64-3.6/glmnet/_glmnetmodule.o build/temp.macosx-10.9-x86_64-3.6/build/src.macosx-10.9-x86_64-3.6/build/src.macosx-10.9-x86_64-3.6/glmnet/fortranobject.o build/temp.macosx-10.9-x86_64-3.6/glmnet/src/glmnet/glmnet5.o -L/usr/local/Cellar/gcc/6.2.0/lib/gcc/6 -L/usr/local/Cellar/gcc/6.2.0/lib/gcc/6 -L/usr/local/Cellar/gcc/6.2.0/lib/gcc/6/gcc/x86_64-apple-darwin15.6.0/6.2.0 -lgfortran -o build/lib.macosx-10.9-x86_64-3.6/_glmnet.cpython-36m-darwin.so" failed with exit status 1

About one and a half years ago I successfully installed python-glmnet, and I cannot remember any problems at that time. I am installing in a conda environment. The required packages scipy, numpy, and scikit-learn have been installed.
Any ideas?

Inaccurate porting of covariance vs naive method

if X.shape[1] > X.shape[0]:
    # the glmnet docs suggest using a different algorithm for the case
    # of p >> n
    algo_flag = 2
else:
    algo_flag = 1

glmnet actually does a slightly different check than just a "n" vs "p" comparison like this. It invokes method 1 (covariance method) if p <= 500. The covariance method keeps track of a matrix of covariances C(i,j) for every feature i and every active feature j. And under the hood, C is allocated as a pxp matrix (even though we use much less memory than that usually); this was done out of simplicity because it's very hard to write clever data structures in Fortran. So even when n >> p, if p is also large, this is not a viable default option on most machines.

Anyways, I'd suggest changing to

if X.shape[1] <= 500:
    algo_flag = 1
else:
    algo_flag = 2
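
To put a rough number on the memory argument, here is a back-of-the-envelope calculation of a dense p x p allocation. It assumes 8-byte double-precision entries, which is an assumption of this sketch rather than something stated above.

```python
def covariance_matrix_bytes(p, bytes_per_entry=8):
    """Memory needed for a dense p x p matrix (8-byte doubles assumed)."""
    return p * p * bytes_per_entry


for p in (500, 5000, 50000):
    gib = covariance_matrix_bytes(p) / 2**30
    print(f"p = {p:>6}: {gib:.3f} GiB")
```

Under that assumption the allocation at p = 500 is 2,000,000 bytes, while at p = 50,000 it is 20,000,000,000 bytes, which illustrates why a cutoff on p alone matters even when n is much larger than p.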

Regularized Cox Regression

A wrapper for regularized Cox regression would be awesome. I'm thinking about trying it; would it be similar to LogitNet?

Math domain error when fitting a model

Hey there,
I sometimes get the following error when fitting the model for cross validation:

ValueError                                Traceback (most recent call last)
<ipython-input-39-ed9d5b075d90> in <module>()
     38         print(X_train.shape)
     39 
---> 40     m.fit(X_train,y_train)
     41     scores.append(m.score(X_test,y_test))
     42 

/usr/local/lib/python3.5/dist-packages/glmnet/logistic.py in fit(self, X, y, sample_weight, relative_penalties)
    196                                            self.scoring, classifier=True,
    197                                            n_jobs=self.n_jobs,
--> 198                                            verbose=self.verbose)
    199 
    200             self.cv_mean_score_ = np.atleast_1d(np.mean(cv_scores, axis=0))

/usr/local/lib/python3.5/dist-packages/glmnet/util.py in _score_lambda_path(est, X, y, sample_weight, relative_penalties, cv, scoring, classifier, n_jobs, verbose)
     69             delayed(_fit_and_score)(est, scorer, X, y, sample_weight, relative_penalties,
     70                                     est.lambda_path_, train_idx, test_idx)
---> 71             for (train_idx, test_idx) in cv)
     72 
     73     return scores

/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    777             # was dispatched. In particular this covers the edge
    778             # case of Parallel used with an exhausted iterator.
--> 779             while self.dispatch_one_batch(iterator):
    780                 self._iterating = True
    781             else:

/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
    623                 return False
    624             else:
--> 625                 self._dispatch(tasks)
    626                 return True
    627 

/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py in _dispatch(self, batch)
    586         dispatch_timestamp = time.time()
    587         cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 588         job = self._backend.apply_async(batch, callback=cb)
    589         self._jobs.append(job)
    590 

/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/_parallel_backends.py in apply_async(self, func, callback)
    109     def apply_async(self, func, callback=None):
    110         """Schedule a func to be run"""
--> 111         result = ImmediateResult(func)
    112         if callback:
    113             callback(result)

/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/_parallel_backends.py in __init__(self, batch)
    330         # Don't delay the application, to avoid keeping the input
    331         # arguments in memory
--> 332         self.results = batch()
    333 
    334     def get(self):

/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py in __call__(self)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132 
    133     def __len__(self):

/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132 
    133     def __len__(self):

/usr/local/lib/python3.5/dist-packages/glmnet/util.py in _fit_and_score(est, scorer, X, y, sample_weight, relative_penalties, score_lambda_path, train_inx, test_inx)
    112     """
    113     m = clone(est)
--> 114     m = m._fit(X[train_inx, :], y[train_inx], sample_weight[train_inx], relative_penalties)
    115 
    116     lamb = np.clip(score_lambda_path, m.lambda_path_[-1], m.lambda_path_[0])

/usr/local/lib/python3.5/dist-packages/glmnet/logistic.py in _fit(self, X, y, sample_weight, relative_penalties)
    370         self.lambda_path_ = self.lambda_path_[:self.n_lambda_]
    371         # also fix the first value of lambda
--> 372         self.lambda_path_ = _fix_lambda_path(self.lambda_path_)
    373         self.intercept_path_ = self.intercept_path_[:, :self.n_lambda_]
    374         # also trim the compressed coefficient matrix

/usr/local/lib/python3.5/dist-packages/glmnet/util.py in _fix_lambda_path(lambda_path)
    122     reasonable. The method below matches what is done in the R/glmnent wrapper."""
    123     if lambda_path.shape[0] > 2:
--> 124         lambda_0 = math.exp(2 * math.log(lambda_path[1]) - math.log(lambda_path[2]))
    125         lambda_path[0] = lambda_0
    126     return lambda_path

ValueError: math domain error

I guess there is an uncaught zero, -inf, or similar value in the lambda_path that is causing the math library to fail. Could you add a condition?
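
One way to add such a condition, sketched here as a standalone function rather than as the library's actual fix, is to extrapolate the first lambda only when the two values it depends on are strictly positive, and otherwise leave the path unchanged. The name `fix_lambda_path_guarded` and the fall-through behaviour are assumptions for illustration.

```python
import math


def fix_lambda_path_guarded(lambda_path):
    """Extrapolate path[0] from path[1] and path[2], as in the traceback above,
    but skip the extrapolation when either value is not strictly positive
    (math.log raises ValueError for zero or negative input)."""
    path = list(lambda_path)
    if len(path) > 2 and path[1] > 0 and path[2] > 0:
        path[0] = math.exp(2 * math.log(path[1]) - math.log(path[2]))
    return path


# The first entries below are arbitrary placeholder values.
print(fix_lambda_path_guarded([100.0, 0.5, 0.25]))  # first entry is extrapolated
print(fix_lambda_path_guarded([100.0, 0.5, 0.0]))   # left unchanged, no exception
```

Whether silently keeping the original first value is the right behaviour, as opposed to raising a clearer error, is a design choice the maintainers would need to make.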

Docs are not consistent

It looks like the doc strings have slightly different names for various attributes that get set during CV?

ElasticNet.predict returns 0d array on 1-row inputs

The glmnet.ElasticNet.predict method outputs an array with dimension 0 when given one row to predict. It should return an array with dimension 1 and shape (1,). Presumably LogitNet.predict_proba has the same problem.

Code to reproduce using glmnet v2.0.0:

from sklearn import datasets
X, y = datasets.make_regression(n_samples=9, n_features=4, random_state=0)

import glmnet
print(glmnet.__version__)
gl = glmnet.ElasticNet(random_state=0)
gl.fit(X, y)

print(gl.predict(X[:2], lamb=[20, 10]).shape)
print(gl.predict(X[:1], lamb=[20, 10]).shape)
print(gl.predict(X[:2]).shape)
print(gl.predict(X[:1]).shape)

Actual output:

2.0.0+18.ga25bcef
(2, 2)
(2,)
(2,)
()

Expected output:

2.0.0+18.ga25bcef
(2, 2)
(2,)
(2,)
(1,)

I'd originally reported this under #30 , but I believe it's a different issue.
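
A 0-d array is, for example, what np.squeeze returns for a (1, 1) array. The sketch below shows that behaviour and one possible guard with np.atleast_1d; the arrays are placeholders, and the code path inside ElasticNet.predict is assumed here, not quoted.

```python
import numpy as np

X_one = np.array([[0.5, -1.0, 2.0, 0.0]])      # one row, four features
coef = np.array([[1.0], [2.0], [0.0], [3.0]])  # one column per lambda (here: one)

raw = X_one @ coef                  # shape (1, 1)
squeezed = raw.squeeze()            # shape (): the 0-d case from the report
guarded = np.atleast_1d(squeezed)   # shape (1,): the expected result

print(raw.shape, squeezed.shape, guarded.shape)
```

np.atleast_1d leaves arrays that already have one or more dimensions untouched, so the other three shapes in the reproduction would be unaffected by such a guard.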

Where to put 'lambda.1se'

I want to use 'lambda.1se' to choose the model, but I couldn't find which parameter corresponds to it.
Can you help me with that?
Thank you!
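
For context, the README describes lambda_best_ as the largest λ whose cross-validated score is within cut_point * standard_error of the best score, so a cut_point of 1.0 would play the role of R's lambda.1se, and lambda_max_ (or a cut_point of 0.0) that of lambda.min. The selection rule itself can be sketched in plain Python; the function name `select_lambda` and the toy numbers below are made up for illustration, and this is not the library's code.

```python
def select_lambda(lambdas, mean_scores, std_errors, cut_point=1.0):
    """Return (lambda at the best score,
               largest lambda scoring within cut_point SEs of the best).

    Higher scores are assumed to be better.
    """
    best = max(range(len(lambdas)), key=lambda i: mean_scores[i])
    threshold = mean_scores[best] - cut_point * std_errors[best]
    eligible = [lam for lam, s in zip(lambdas, mean_scores) if s >= threshold]
    return lambdas[best], max(eligible)


lambdas = [1.0, 0.5, 0.25, 0.125]
mean_scores = [0.60, 0.78, 0.80, 0.79]
std_errors = [0.03, 0.03, 0.03, 0.03]

print(select_lambda(lambdas, mean_scores, std_errors, cut_point=1.0))
print(select_lambda(lambdas, mean_scores, std_errors, cut_point=0.0))
```

With cut_point=1.0 the rule picks the more heavily regularized λ = 0.5, whose score of 0.78 is still within one standard error of the best score of 0.80; with cut_point=0.0 both values coincide at λ = 0.25.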
