Comments (12)

kcrum commented on June 6, 2024

Thanks for reporting this @Visdoom. Could you please add a minimal working example that would produce such an error?

from python-glmnet.

Visdoom commented on June 6, 2024

I will try to work on it

kcrum commented on June 6, 2024

Thank you!

Visdoom commented on June 6, 2024

Hey there, I found some examples that reliably reproduce that error in my code:

from numpy import array
from glmnet import LogitNet

m = LogitNet(alpha=0.8, max_iter=2000, tol=0.3, n_splits=3)
X_train = array([[8], [9], [8], [4], [8], [9], [10], [4], [5], [7], [6], [7], [9], [9], [6], [6], [4], [10], [5], [8], [8], [9], [8], [6], [7], [7]])

y_train = array(['DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'SC', 'SC', 'SC', 'SC', 'SC', 'SC', 'SC', 'SC', 'SC', 'SC', 'SC', 'SC', 'SC'], dtype=object)

m.fit(X_train, y_train)

Visdoom commented on June 6, 2024

Other data would be:

X_train = array([[  7.,   7.,  15.,  10.,  13.,  14.,   9.,  13.,  11.,  10.,  10.,
          10.,  13.,  14.,  10.,   8.,   8.,  10.,  11.,  12.,   8.,  10.,
          18.,   5.,  15.,  12.,  12.,  10.,  10.,  10.,  12.,   8.,  11.,
          11.,   8.,  15.,  11.,  13.]])
y_train = array(['CBC1', 'CBC1', 'CBC1', 'CBC1', 'CBC1', 'CBC1', 'CBC1', 'CBC1',
        'CBC1', 'CBC1', 'CBC1', 'CBC1', 'CBC1', 'CBC1', 'CBC1', 'CBC1',
        'CBC1', 'CBC1', 'CBC2', 'CBC2', 'CBC2', 'CBC2', 'CBC2', 'CBC2',
        'CBC2', 'CBC2', 'CBC2', 'CBC2', 'CBC2', 'CBC2', 'CBC2', 'CBC2',
        'CBC2', 'CBC2', 'CBC2', 'CBC2', 'CBC2', 'CBC1'], dtype=object)

or

X_train = array([[  9.        ,  17.        ,  20.        ,  11.        ,
          13.        ,  14.        ,  15.        ,  17.        ,
          15.        ,  16.        ,  13.        ,  16.        ,
          14.        ,  16.        ,  11.        ,  17.        ,
          12.        ,  18.        ,  11.        ,   9.        ,
          16.        ,  16.        ,  15.        ,  18.        ,
          16.        ,  13.        ,  11.        ,  14.        ,
          14.        ,  15.        ,  15.        ,  18.        ,
          15.        ,  13.        ,  15.        ,  18.        ,
          15.        ,   9.43743297]])
y_train = array(['CBC2', 'CBC2', 'CBC2', 'CBC2', 'CBC2', 'CBC2', 'CBC2', 'CBC2',
        'CBC2', 'CBC2', 'CBC2', 'CBC2', 'CBC2', 'CBC2', 'CBC2', 'CBC2',
        'CBC2', 'CBC2', 'CBC2', 'CBC5T', 'CBC5T', 'CBC5T', 'CBC5T', 'CBC5T',
        'CBC5T', 'CBC5T', 'CBC5T', 'CBC5T', 'CBC5T', 'CBC5T', 'CBC5T',
        'CBC5T', 'CBC5T', 'CBC5T', 'CBC5T', 'CBC5T', 'CBC5T', 'CBC5T'], dtype=object)

Note: X_train is shown transposed here for readability; transpose it back (X_train.T) before fitting.

I hope that helps

kcrum commented on June 6, 2024

glmnet.LogitNet is expecting numbers for the dependent variable, not strings (or np.objects). You will want to cast your dependent variables to integers. For example:

y = (y_train == 'DC').astype(int)

will set 'DC' to 1 and everything else to 0.

Visdoom commented on June 6, 2024

I do classification on a large scale, and it works in most cases even though I use the dependent variable as it is. I don't think that this is the problem.
If you want, I can provide an example with the same dependent variable that fits without error, so you can compare.
Best,
S.

kcrum commented on June 6, 2024

Huh, that's surprising. When I run the example you posted, it does raise the same "Math domain error"; however, when I replace y_train with integers as I showed in my comment, the error goes away. Do you see the same thing?

kcrum commented on June 6, 2024

Hmmmm, now I'm starting to think the issue is something else. I'm guessing a number <= 0 is being passed to one of the np.log calls on line 124 in that last block of the traceback you posted. I'll reopen and investigate...

Visdoom commented on June 6, 2024

Yes, when I replace the strings in y_train with booleans or ints, it works for me as well.

kcrum commented on June 6, 2024

It seems the third CV fold returns a lambda path mostly filled with zeros, and this is causing the error you're seeing. In this fold the covariates for the 'DC' class are effectively identical to those of the 'SC' class, so the best fit coefficient would be zero. Therefore it makes sense that the Fortran code would return a lambda path full of zeros, since no penalty is necessary to shrink the best fit coefficient of zero.
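
To illustrate the failure mode: this is only a minimal sketch, assuming the zero-valued lambdas end up in a logarithm somewhere downstream. The names `extrapolate_first_lambda` and `safe_extrapolate_first_lambda` are hypothetical and not part of python-glmnet, and the extrapolation formula is just an example of a computation that takes logs of path entries.

```python
import math


def extrapolate_first_lambda(lambda_path):
    """Hypothetical helper: log-linear extrapolation of the first entry
    from the second and third entries of a lambda path."""
    return math.exp(2 * math.log(lambda_path[1]) - math.log(lambda_path[2]))


def safe_extrapolate_first_lambda(lambda_path):
    """Same computation, but returns None when a log would be undefined."""
    if lambda_path[1] <= 0 or lambda_path[2] <= 0:
        return None
    return extrapolate_first_lambda(lambda_path)


# A strictly positive path extrapolates without trouble.
good_path = [float("inf"), 0.1, 0.05, 0.025]
good_first = extrapolate_first_lambda(good_path)

# A path mostly filled with zeros makes math.log raise ValueError.
zero_path = [float("inf"), 0.0, 0.0, 0.0]
try:
    extrapolate_first_lambda(zero_path)
    raised = False
except ValueError:
    raised = True

# The guarded variant signals the degenerate path instead of raising.
guarded = safe_extrapolate_first_lambda(zero_path)
```

Note that Python's `math.log` raises on zero, whereas `np.log` would return `-inf` with a runtime warning, so which symptom appears depends on which of the two the code actually calls.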

I don't know your use case, but as @wlattner mentioned to me offline, it typically doesn't make much sense to use glmnet on a univariate problem. It may be worthwhile to add a warning against the univariate case, but personally I don't think this issue merits any changes to python-glmnet, since it results from fairly pathological data that doesn't make for a well-formed problem.

I could be persuaded otherwise, however, so I'm curious to hear what you think. Thank you for filing issues here!

Visdoom commented on June 6, 2024

Hey @kcrum

Thanks for investigating! I encountered that error while searching a feature space automatically, so it is indeed a rather rare case. I agree that it does not make sense to use glmnet on univariate cases, but I personally am in favor of adding a warning, since such cases are better caught in an automated approach, e.g. feature selection with goodness of fit as the selection criterion.
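
A warning of this kind could be sketched roughly as follows. This is only an illustration under my own assumptions: `warn_if_univariate` is a hypothetical helper, not an existing python-glmnet function, and it assumes X is a non-empty 2-D sequence with one row per sample.

```python
import warnings


def warn_if_univariate(X):
    """Hypothetical check: warn when X has a single feature column.

    Returns True if a warning was issued, False otherwise.
    """
    n_features = len(X[0])
    if n_features == 1:
        warnings.warn(
            "X has only one feature; an elastic net fit on univariate "
            "data may be ill-posed.",
            UserWarning,
        )
        return True
    return False


# In an automated feature search, the warning can be recorded and the
# candidate feature set skipped instead of letting the fit fail.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = warn_if_univariate([[8], [9], [4]])

not_flagged = warn_if_univariate([[1, 2], [3, 4]])
```

Calling it before `fit` in the search loop would let the caller decide whether to skip a single-feature candidate or fit it anyway.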
