Math domain error when fitting a model (python-glmnet, 12 comments, closed)

civisanalytics commented on June 6, 2024

Math domain error when fitting a model

Hey there, I sometimes get the following error when fitting the model for cross validation…

from python-glmnet.

Comments (12)

kcrum commented on June 6, 2024

Thanks for reporting this @Visdoom. Could you please add a minimal working example that would produce such an error?

Visdoom commented on June 6, 2024

I will try to work on it

kcrum commented on June 6, 2024

Thank you!

Visdoom commented on June 6, 2024

Hey there, I found some examples that reliably reproduce that error in my code:

```python
from numpy import array
from glmnet import LogitNet

# Elastic-net logistic regression with 3-fold cross-validation
m = LogitNet(alpha=0.8, max_iter=2000, tol=0.3, n_splits=3)

# One feature, 26 samples
X_train = array([[8], [9], [8], [4], [8], [9], [10], [4], [5], [7], [6], [7], [9],
                 [9], [6], [6], [4], [10], [5], [8], [8], [9], [8], [6], [7], [7]])
# String labels: 13 x 'DC' followed by 13 x 'SC'
y_train = array(['DC'] * 13 + ['SC'] * 13, dtype=object)

m.fit(X_train, y_train)  # per this report, raises the "math domain error"
```

Visdoom commented on June 6, 2024

Other data would be:

```python
from numpy import array

# Second example: 38 samples, classes 'CBC1' and 'CBC2'
X_train = array([[7., 7., 15., 10., 13., 14., 9., 13., 11., 10., 10.,
                  10., 13., 14., 10., 8., 8., 10., 11., 12., 8., 10.,
                  18., 5., 15., 12., 12., 10., 10., 10., 12., 8., 11.,
                  11., 8., 15., 11., 13.]]).T  # .T restores shape (38, 1)
y_train = array(['CBC1'] * 18 + ['CBC2'] * 19 + ['CBC1'], dtype=object)

# Third example: 38 samples, classes 'CBC2' and 'CBC5T'
X_train = array([[9., 17., 20., 11., 13., 14., 15., 17., 15., 16., 13., 16.,
                  14., 16., 11., 17., 12., 18., 11., 9., 16., 16., 15., 18.,
                  16., 13., 11., 14., 14., 15., 15., 18., 15., 13., 15., 18.,
                  15., 9.43743297]]).T  # .T restores shape (38, 1)
y_train = array(['CBC2'] * 19 + ['CBC5T'] * 19, dtype=object)
```

X_train is written transposed (as a single row) for readability; the `.T` turns each one back into a column of 38 samples with one feature.

I hope that helps.

kcrum commented on June 6, 2024

glmnet.LogitNet expects numbers for the dependent variable, not strings (or NumPy arrays of dtype object). You will want to cast your dependent variable to integers. For example:

```python
y = (y_train == 'DC').astype(int)
```

will set 'DC' to 1 and everything else to 0.
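For arbitrary two-class labels (not only 'DC'/'SC'), one hedged option is `np.unique(..., return_inverse=True)`, which also returns the original class values so predictions can be mapped back. Note that it sorts the classes, so 'DC' becomes 0 and 'SC' becomes 1, the opposite of the comparison above. `encode_binary_labels` is a hypothetical helper, not part of python-glmnet:

```python
import numpy as np


def encode_binary_labels(y):
    """Map a two-class label array (strings, objects, ...) to 0/1 integers.

    Hypothetical helper. Returns (y_int, classes), where classes[k] is the
    original label that was encoded as k (classes are sorted).
    """
    classes, y_int = np.unique(np.asarray(y), return_inverse=True)
    if classes.shape[0] != 2:
        raise ValueError(f"expected exactly 2 classes, got {classes.shape[0]}")
    return y_int.astype(int), classes


# Usage with labels like the ones in this thread
y_train = np.array(['DC'] * 13 + ['SC'] * 13, dtype=object)
y_int, classes = encode_binary_labels(y_train)
print(classes, y_int)
```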

Visdoom commented on June 6, 2024

I do classification on a large scale, and it works in most cases even though I use the dependent variable as it is. I don't think that this is the problem.
If you want, I can find an example with the same dependent variable that works, so you can compare.
Best,
S.

kcrum commented on June 6, 2024

Huh, that's surprising. When I run the example you posted, it does raise the same "Math domain error"; however, when I replace y_train with integers as shown in my comment, the error goes away. Do you see the same thing?

kcrum commented on June 6, 2024

Hmmmm, now I'm starting to think the issue is something else. I'm guessing a number <= 0 is being passed to one of the np.log calls on line 124 in that last block of the traceback you posted. I'll reopen and investigate...
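One detail that may help narrow this down: the wording of the exception hints at which logarithm failed. A "math domain error" is the `ValueError` raised by functions in Python's standard-library `math` module, such as `math.log` for inputs <= 0. NumPy's `np.log` does not raise on such inputs: it returns `-inf` for 0 and `nan` for negative values and only emits a RuntimeWarning. A quick sketch comparing the two (`compare_logs` is just an illustrative name):

```python
import math

import numpy as np


def compare_logs(value):
    """Return (math.log result, np.log result) for a scalar input.

    math.log raises ValueError for value <= 0; np.log instead returns
    -inf (for 0) or nan (for negatives) with a RuntimeWarning, which
    np.errstate silences here.
    """
    try:
        math_result = math.log(value)
    except ValueError as exc:
        math_result = f"ValueError: {exc}"
    with np.errstate(divide="ignore", invalid="ignore"):
        np_result = np.log(value)
    return math_result, np_result


for v in (1.0, 0.0, -1.0):
    print(v, compare_logs(v))
```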

Visdoom commented on June 6, 2024

Yes, when I replace the strings in y_train with booleans or integers, it works for me as well.

kcrum commented on June 6, 2024

It seems the third CV fold returns a lambda path mostly filled with zeros, and this is causing the error you're seeing. In this fold the covariates for the 'DC' class are effectively identical to those of the 'SC' class, so the best fit coefficient would be zero. Therefore it makes sense that the Fortran code would return a lambda path full of zeros, since no penalty is necessary to shrink the best fit coefficient of zero.
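The diagnosis above can be illustrated with a small sketch. It assumes the standard glmnet recipe rather than python-glmnet's actual Fortran code: the path starts at λ_max, the smallest penalty that keeps every coefficient at zero, which for a logistic model with standardized features is max_j |x_j · (y − ȳ)| / (n·α), and then decreases geometrically. With a single feature, x · (y − ȳ) is zero exactly when the feature has the same mean in both classes, so λ_max, and with it the whole path, collapses to zero, and any step that takes the path's logarithm fails. The helper names and the defaults `n_lambda=100`, `min_ratio=1e-4` are illustrative assumptions:

```python
import math

import numpy as np


def lambda_max(X, y, alpha):
    """Smallest penalty at which all slope coefficients are zero (sketch).

    Assumes a glmnet-style logistic model with an unpenalized intercept and
    standardized features: lambda_max = max_j |x_j . (y - mean(y))| / (n * alpha).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns are all zeros after centering anyway
    Xs = (X - X.mean(axis=0)) / std
    return float(np.max(np.abs(Xs.T @ (y - y.mean()))) / (len(y) * alpha))


def lambda_path(lam_max, n_lambda=100, min_ratio=1e-4):
    """Geometric sequence from lam_max down to lam_max * min_ratio."""
    return lam_max * np.geomspace(1.0, min_ratio, n_lambda)


# One feature whose values are distributed identically in both classes
X_fold = np.array([[1.0], [2.0], [3.0], [1.0], [2.0], [3.0]])
y_fold = np.array([0, 0, 0, 1, 1, 1])

path = lambda_path(lambda_max(X_fold, y_fold, alpha=0.8))
print(path[:3])  # all zeros
try:
    math.log(path[1])  # any log-scale step on this path fails
except ValueError as exc:
    print("ValueError:", exc)
```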

I don't know your use case, but as @wlattner mentioned to me offline, it typically doesn't make much sense to use glmnet in a univariate problem. It may be worthwhile to add a warning against the univariate case, but personally I don't think this issue merits any changes to python-glmnet, since it is the result of fairly pathological data that doesn't make for a well-formed problem.
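The warning suggested here could be as simple as a shape check before fitting. A minimal sketch, assuming the check runs in user code (or a thin wrapper) before calling `fit`; `warn_if_univariate` and its message are hypothetical, not part of python-glmnet:

```python
import warnings

import numpy as np


def warn_if_univariate(X):
    """Warn when X has a single feature column; return the feature count.

    Hypothetical pre-fit check: with one feature, the elastic-net penalty
    has little to select and the lambda path can degenerate.
    """
    X = np.asarray(X)
    n_features = 1 if X.ndim == 1 else X.shape[1]
    if n_features == 1:
        warnings.warn(
            "Fitting a regularization path on a single feature; "
            "the lambda path may degenerate.",
            UserWarning,
            stacklevel=2,
        )
    return n_features
```

Using `UserWarning` means an automated pipeline can escalate it with `warnings.simplefilter("error", UserWarning)` if it prefers a hard failure.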

I could be persuaded otherwise, however, so I'm curious to hear what you think. Thank you for filing issues here!

Visdoom commented on June 6, 2024

Hey @kcrum

Thanks for investigating! I've encountered that error when searching a feature space automatically, so it is indeed a rather rare case. I agree that it does not make sense to use glmnet on univariate cases, but I am personally in favor of adding a warning, since warnings are easier to catch in an automated approach, e.g. feature selection with goodness of fit as the selection criterion.
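For an automated feature search like this, one hedged pattern is to treat a `ValueError` during fitting as a signal to skip that candidate subset rather than abort the whole search. A sketch, assuming a scikit-learn-style estimator factory (for example `lambda: LogitNet(n_splits=3)`) and in-sample `score` as the goodness-of-fit criterion; `score_feature_subsets` is a hypothetical helper:

```python
from itertools import combinations

import numpy as np


def score_feature_subsets(make_model, X, y, max_size=2):
    """Fit one model per feature subset, skipping subsets whose fit fails.

    make_model: zero-argument callable returning an unfitted estimator with
    fit(X, y) and score(X, y) methods (scikit-learn style; an assumption).
    Returns ({subset: score} for successful fits, [skipped subsets]).
    """
    X = np.asarray(X)
    scores, skipped = {}, []
    for size in range(1, max_size + 1):
        for subset in combinations(range(X.shape[1]), size):
            cols = list(subset)
            model = make_model()
            try:
                model.fit(X[:, cols], y)
            except ValueError:  # e.g. "math domain error" on a degenerate path
                skipped.append(subset)
                continue
            scores[subset] = model.score(X[:, cols], y)
    return scores, skipped
```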
