rc / dist_mixtures Goto Github PK
View Code? Open in Web Editor NEWLicense: BSD 3-Clause "New" or "Revised" License
License: BSD 3-Clause "New" or "Revised" License
dist_mixtures ============= A package for fitting histograms of spatial orientations by a mixture of von Mises distributions. License: New BSD License, see the LICENSE file. Data ---- This package has been developed to analyze muscle cells directions data coming from aortic segments. The related data files can be found at [1]. [1] https://github.com/rc/dist_mixtures_data Installation ------------ Download the source code from [1]. [1] https://github.com/rc/dist_mixtures The following dependencies need to be installed: - NumPy (http://numpy.org) - SciPy (http://scipy.org) - Git version - Matplotlib (http://matplotlib.sourceforge.net) - Statsmodels (http://statsmodels.sourceforge.net) - Git version Then ``cd`` into the ``dist_mixtures/`` directory. Usage ----- Run ``fit_von_mises.py`` without arguments to see the help message:: $ ./fit_von_mises.py Usage: fit_von_mises.py [options] pattern data_dir Fit data files with names matching a given pattern by a mixture of von Mises distributions. Options: --version show program's version number and exit -h, --help show this help message and exit -o dirname, --output-dir=dirname output directory [default: output] -c filename, --conf=filename use configuration file with parameter sets. Ignored, if n_components option is given. -n positive_int, --n-components=positive_int number of components of the mixture [default: 2] -p kappa0,mu0,kappa1,mu1,..., --parameters=kappa0,mu0,kappa1,mu1,... initial guess of von Mises parameters for each component as a comma separated list, e.g., for two components: "1,0,1,0" corresponding to kappa0, mu0, kappa1, mu1 respectively. The location parameter mu should be given in degrees in [-90, 90[. -d pattern, --dir-pattern=pattern pattern that subdirectories should match [default: *] -m positive_int, --merge-bins=positive_int number of consecutive bins in data to merge [default: None] --plot-bins-step=positive_int step to choose bins from all bins for histogram plots [default: 4] --spread-data spread raw data using their counts instead of just repeating them -a, --area-angles compute and draw angles of two systems of fibres determined by equal histogram areas --no-neg-shift do not add 180 degrees to negative angles -s, --show show the figures Example runs ------------ By default, the results are stored into a directory called ``output``. Use ``-o`` option to change that. - Analyze histograms in ``*.txt`` files of data sets in ``data/<dataset name>`` directories, assume 2 von Mises mixture (VMM) components:: $ ./fit_von_mises.py "*.txt" data/ -n 2 - Run analysis with parameters given in a file. This example shows how to run the analysis with varying number of VMM components:: $ ./fit_von_mises.py "*.txt" data/ -c examples/psets/n_components.py See other example parameter set files in ``examples/psets/``.
VonMisesMixture.rvs_mix for a large sample size=50000 up to 1 Million
I got a small (2 to 3) percent observations out of range.
-> add modulo/remainder
Merge 'updates' branch to 'sensitivity' branch, add Struct class to parents of other classes. This would allow:
ipdb> print res.model
VonMisesMixture
_data_attr:
list: ['exog', 'endog', 'data.exog', 'data.endog', 'data.orig_endog', 'data.orig_exog']
confint_dist:
<scipy.stats.distributions.norm_gen object at 0x29343d0>
data:
<statsmodels.base.data.ModelData object at 0x38a4d10>
endog:
(40097,) array of float64
exog:
None
k_constant:
0
that is one would see immediately which attributes objects have.
In the sensitivity branch:
$ file dist_mixtures/mixture_von_mises.py
dist_mixtures/mixture_von_mises.py: Python script, ASCII text executable, with CRLF, LF line terminators
Originally, it was:
$ file dist_mixtures/mixture_von_mises.py
dist_mixtures/mixture_von_mises.py: Python script, ASCII text executable, with CRLF line terminators
Maybe autocrlf [1] setting needs to be fixed?
[1] http://stackoverflow.com/questions/4181870/git-on-windows-what-do-the-crlf-settings-mean
I just ran into a bug with fit_ls for a test case when endog, which is currently the count, is int (int32 returned by np.histogram).
It might be better to convert the data in the models to float if they are not already, since there might be other, less obvious problems with other dtypes.
(not a problem with the current data which is float64)
I am running the code with the test data, looking for two components as follows:
$ ./fit_von_mises.py "*.txt" test-data oo -s
When nm
is used, the two components obtained are:
dist0: shape=3.7856, loc=-3.1304, prob=0.5117
dist1: shape=1.7051, loc=3.0787, prob=0.4883
And with bfgs
:
dist0: shape=2.3769, loc=3.1221, prob=0.5000
dist1: shape=2.3769, loc=3.1221, prob=0.5000
The bfgs
result seems strange - can you reproduce it?
It looks like the vonmises.cdf doesn't work correctly if loc is different from zero.
I think it doesn't shift the support correctly.
I tried to use stats.vonmises.cdf for the chisquare test, but get negative cdf values.
I may have asked this already, but (my colleagues ask...): Is there a way of saying (a statistical test with some p value?) that our data come
a) from a single
b) from a mixture of von Mises distributions?
I am sceptical about b), as given a sufficient count of mixture components, anything can be fitted. I tried 4, and got perfect fits for even rather irregular data.
The question is related to the hypothesis "there are two preferential direction symmetric w.r.t. axis (axis = e.g. the location of a single von Mises)" - I think this can be tested only under assumption of the von Mises mixture. Then even if rejected, in reality there can be two preferential direction symmetric w.r.t. axis, but with a different distribution, which cannot be verified. What do you think?
the original data is binned, so we have a discretized distribution
I don't think the approximation error is large for the estimation
>>> plt.plot(stats.vonmises.pdf(xxpi-d/2., 4)[1:])
[<matplotlib.lines.Line2D object at 0x0DAE1E10>]
>>> plt.show()
>>> np.max(np.abs(np.diff(stats.vonmises.cdf(xxpi, 4)) / d - stats.vonmises.pdf(xxpi, 4)[1:]))
0.015765796647894514
>>> np.max(np.abs(np.diff(stats.vonmises.cdf(xxpi, 4)) / d - stats.vonmises.pdf(xxpi-d/2., 4)[1:]))
0.00015579844725244207
>>> np.max(np.abs(np.diff(stats.vonmises.cdf(xxpi, 4)) / d / stats.vonmises.pdf(xxpi-d/2., 4)[1:] -1))
0.0008251767346727501
Note: cdf in cython looks pretty fast
some possibilities:
Because of the periodicity of cosine, the parameters are not globally unique, but they are locally identified.
for example kappa only needs to be positive
loc only needs to be in interval (-pi, pi) (?)
We don't impose this during estimation to avoid boundary conditions
However, to get parameter estimates that are more easily interpretable, we could normalize the params to the observationally equivalent "standard" form.
e.g. getting rid of negative kappa by adjusting loc
>>> xx = np.linspace(-2*np.pi, 2*np.pi, 21)
>>> -np.cos(xx)
array([-1. , -0.80901699, -0.30901699, 0.30901699, 0.80901699,
1. , 0.80901699, 0.30901699, -0.30901699, -0.80901699,
-1. , -0.80901699, -0.30901699, 0.30901699, 0.80901699,
1. , 0.80901699, 0.30901699, -0.30901699, -0.80901699, -1. ])
>>> np.cos(xx+np.pi)
array([-1. , -0.80901699, -0.30901699, 0.30901699, 0.80901699,
1. , 0.80901699, 0.30901699, -0.30901699, -0.80901699,
-1. , -0.80901699, -0.30901699, 0.30901699, 0.80901699,
1. , 0.80901699, 0.30901699, -0.30901699, -0.80901699, -1. ])
>>> np.cos(xx-np.pi)
array([-1. , -0.80901699, -0.30901699, 0.30901699, 0.80901699,
1. , 0.80901699, 0.30901699, -0.30901699, -0.80901699,
-1. , -0.80901699, -0.30901699, 0.30901699, 0.80901699,
1. , 0.80901699, 0.30901699, -0.30901699, -0.80901699, -1. ])
the histogram misses one slice
I didn't look at the code yet
Some notes: when comparing the results of the two classes:
What causes the last two bullets?
(Commenting on results of $ ./fit_von_mises.py "*.txt" test-data/ -c examples/psets/models.py --spread-data
)
Traceback (most recent call last):
File "./fit_von_mises.py", line 128, in <module>
main()
File "./fit_von_mises.py", line 118, in main
logs, alog = analyze(source, psets, options)
File "./analyses/fit_mixture.py", line 110, in analyze
pl.plot_histogram_comparison(pset.output_dir, res, source, ii)
File "./analyses/plots.py", line 116, in plot_histogram_comparison
ret_sizes=True)
File "./aorta/dist_mixtures/mixture_von_mises.py", line 380, in rvs_mix
size=sizes[ii]))
File "~/software/usr/local/lib/python/dist-packages/scipy/stats/distributions.py", line 627, in rvs
vals = self._rvs(*args)
File "~/software/usr/local/lib/python/dist-packages/scipy/stats/distributions.py", line 5748, in _rvs
return mtrand.vonmises(0.0, b, size=self._size)
File "mtrand.pyx", line 2240, in mtrand.RandomState.vonmises (numpy/random/mtrand/mtrand.c:10433)
File "mtrand.pyx", line 201, in mtrand.cont2_array_sc (numpy/random/mtrand/mtrand.c:1867)
ValueError: negative dimensions are not allowed
(To test failure in PR #10 comments, I'm not able to add a comment there now (?))
I'm getting the same failure, I think I have seen this before.
One addition I wanted to make is to rearrange components in normalize_params, so we get a unique parameterization (instead of permutations).
I thought of sorting by probability, but that can change which is the left-out component in probabilities, the last part of params.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.