Comments (2)
Thank you for the feedback! I agree. I will change everything to lowercase.
Furthermore, I have been looking into your issue. Many of the distributions, such as lognorm, come from scipy, so the loc/scale parameters are probably best described in the scipy documentation.
For the lognormal distribution, the mean and std dev of the underlying normal correspond to log(scale) and the shape parameter, respectively.
For demonstration:
import numpy as np
import scipy.stats as st
from distfit import distfit

# Underlying normal parameters: "scale" here plays the role of mu
loc = 5
scale = 10

# scipy's lognorm takes a shape parameter (here 3), loc, and scale = exp(mu)
sample_dist = st.lognorm.rvs(3, loc=loc, scale=np.exp(scale), size=10000)

dfit = distfit('parametric', todf=True, distr=["lognorm"])
dfit.fit_transform(sample_dist)

print('Estimated loc: %g, input loc: %g' % (dfit.model['loc'], loc))
print('Estimated mu or scale: %g, input scale: %g' % (np.log(dfit.model['scale']), scale))
[distfit] >INFO> fit
[distfit] >INFO> transform
[distfit] >INFO> [lognorm] [0.36 sec] [RSS: 1.76437e-10] [loc=5.069 scale=22043.122]
[distfit] >INFO> Compute confidence intervals [parametric]
Estimated loc: 5.06934, input loc: 5
Estimated mu or scale: 10.0008, input scale: 10
The loc and scale are nicely estimated.
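The same mu = log(scale) relation can be checked with scipy alone, without distfit. A minimal sketch using the same parameter values as above (mu=10, sigma=3, loc=5); loc is fixed via floc to keep the fit stable:

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)
mu, sigma, loc = 10.0, 3.0, 5.0

# A shifted lognormal sample: exp(Normal(mu, sigma)) + loc
x = loc + np.exp(rng.normal(mu, sigma, size=100_000))

# scipy's parameterization: shape s = sigma, scale = exp(mu)
s_hat, loc_hat, scale_hat = st.lognorm.fit(x, floc=loc)
print('Estimated sigma: %.2f, estimated mu: %.2f' % (s_hat, np.log(scale_hat)))
```

With the shift fixed, log of the fitted scale recovers mu and the fitted shape recovers sigma.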
If I now do the same for your case, but first without the filters, the estimated mu seems pretty close.
import numpy as np
from distfit import distfit

mu = 13.8
loc = 47.55

# Note: this draws from a *normal* distribution with a very large scale
x_sim = np.random.normal(loc=loc, scale=np.exp(mu), size=10000)
# x_sim = np.append([*filter(lambda x: x <= 80, x_sim)], np.random.normal(loc=90, scale=10, size=50))
# x_sim = np.array([*filter(lambda x: x >= 0, x_sim)])

dfit = distfit('parametric', todf=True, distr=["lognorm"])
dfit.fit_transform(x_sim)
dfit.bootstrap(x_sim, n_boots=1)

print('Estimated mu or scale: %g, input scale: %g' % (np.log(dfit.model['scale']), mu))
Estimated mu or scale: 17.3597, input scale: 13.8
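As a side note on the commented-out filters: truncating a sample before fitting changes the moments any fit tries to match. A minimal numpy sketch with made-up numbers (mean 50, std 10, cutoff 60 are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=50, scale=10, size=100_000)

# Dropping everything above a cutoff (as the filters above would) pulls the
# sample mean down, so a fit on the filtered data sees different moments
x_trunc = x[x <= 60]
print('full mean: %.2f, truncated mean: %.2f' % (x.mean(), x_trunc.mean()))
```

This is why fitting first without the filters makes the comparison cleaner.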
Check out this thread on Stack Overflow.
I will read more into the resources you provided. Regarding the "manually" simulated 2nd plot (where the pdf basically looks like a corner and one cannot see the bars of the contained histogram), I now understand why it looks this way: the upper-limit confidence interval explodes. E.g., the empirical values in my distribution range up to …, but after fitting the popular distribution and then executing the bootstrap test, the 95% upper confidence interval boundary (for, e.g., the Pareto distribution) is estimated to be at ….
So I made sure to reread the information you provided. Thank you very much for clarifying the relation between mean, SD and log(loc) and log(scale). Still, as far as I understand it, negative values should not be possible under the distribution, since log(negative) results in a complex number with an imaginary component.
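That understanding matches what numpy and scipy actually do: the real-valued log of a negative number is undefined (nan), and scipy's lognorm assigns zero density outside its support. A minimal sketch (with a nonzero loc the support shifts to x > loc):

```python
import numpy as np
import scipy.stats as st

# Real log of a negative number is undefined (nan; the warning is suppressed here)
with np.errstate(invalid='ignore'):
    print(np.log(-1.0))        # nan

# The complex-aware log returns the principal value with an imaginary part
print(np.emath.log(-1.0))      # approximately pi*1j

# Hence the (unshifted) lognormal has zero density for x <= 0
print(st.lognorm.pdf(-1.0, s=1.0))
print(st.lognorm.pdf(0.0, s=1.0))
```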