Giter Site home page Giter Site logo

dist_mixtures's Introduction

dist_mixtures
=============

A package for fitting histograms of spatial orientations by a mixture of von
Mises distributions.

License: New BSD License, see the LICENSE file.

Data
----

This package has been developed to analyze muscle cells directions data coming
from aortic segments. The related data files can be found at [1].

[1] https://github.com/rc/dist_mixtures_data

Installation
------------

Download the source code from [1].

[1] https://github.com/rc/dist_mixtures

The following dependencies need to be installed:

- NumPy (http://numpy.org)
- SciPy (http://scipy.org) - Git version
- Matplotlib (http://matplotlib.sourceforge.net)
- Statsmodels (http://statsmodels.sourceforge.net) - Git version

Then ``cd`` into the ``dist_mixtures/`` directory.

Usage
-----

Run ``fit_von_mises.py`` without arguments to see the help message::

  $ ./fit_von_mises.py
  Usage: fit_von_mises.py [options] pattern data_dir

  Fit data files with names matching a given pattern by a mixture of von Mises
  distributions.

  Options:
    --version             show program's version number and exit
    -h, --help            show this help message and exit
    -o dirname, --output-dir=dirname
                          output directory [default: output]
    -c filename, --conf=filename
                          use configuration file with parameter sets. Ignored,
                          if n_components option is given.
    -n positive_int, --n-components=positive_int
                          number of components of the mixture [default: 2]
    -p kappa0,mu0,kappa1,mu1,..., --parameters=kappa0,mu0,kappa1,mu1,...
                          initial guess of von Mises parameters for each
                          component as a comma separated list, e.g., for two
                          components: "1,0,1,0" corresponding to kappa0, mu0,
                          kappa1, mu1 respectively. The location parameter mu
                          should be given in degrees in [-90, 90[.
    -d pattern, --dir-pattern=pattern
                          pattern that subdirectories should match [default: *]
    -m positive_int, --merge-bins=positive_int
                          number of consecutive bins in data to merge [default:
                          None]
    --plot-bins-step=positive_int
                          step to choose bins from all bins for histogram plots
                          [default: 4]
    --spread-data         spread raw data using their counts instead of just
                          repeating them
    -a, --area-angles     compute and draw angles of two systems of fibres
                          determined by equal histogram areas
    --no-neg-shift        do not add 180 degrees to negative angles
    -s, --show            show the figures

Example runs
------------

By default, the results are stored into a directory called ``output``. Use
``-o`` option to change that.

- Analyze histograms in ``*.txt`` files of data sets in ``data/<dataset name>``
  directories, assume 2 von Mises mixture (VMM) components::

    $ ./fit_von_mises.py "*.txt" data/ -n 2

- Run analysis with parameters given in a file. This example shows how to run
  the analysis with varying number of VMM components::

    $ ./fit_von_mises.py "*.txt" data/ -c examples/psets/n_components.py

  See other example parameter set files in ``examples/psets/``.

dist_mixtures's People

Contributors

josef-pkt avatar rc avatar

Watchers

 avatar  avatar  avatar  avatar

Forkers

josef-pkt

dist_mixtures's Issues

Merge 'updates' branch

Merge 'updates' branch to 'sensitivity' branch, add Struct class to parents of other classes. This would allow:

ipdb> print res.model
VonMisesMixture
  _data_attr:
    list: ['exog', 'endog', 'data.exog', 'data.endog', 'data.orig_endog', 'data.orig_exog']
  confint_dist:
    <scipy.stats.distributions.norm_gen object at 0x29343d0>
  data:
    <statsmodels.base.data.ModelData object at 0x38a4d10>
  endog:
    (40097,) array of float64
  exog:
    None
  k_constant:
    0

that is one would see immediately which attributes objects have.

fix coding (excessive LFs in dist_mixtures/mixture_von_mises.py)

In the sensitivity branch:

$ file dist_mixtures/mixture_von_mises.py
dist_mixtures/mixture_von_mises.py: Python script, ASCII text executable, with CRLF, LF line terminators

Originally, it was:

$ file dist_mixtures/mixture_von_mises.py
dist_mixtures/mixture_von_mises.py: Python script, ASCII text executable, with CRLF line terminators

Maybe autocrlf [1] setting needs to be fixed?

[1] http://stackoverflow.com/questions/4181870/git-on-windows-what-do-the-crlf-settings-mean

design: convert endog, exog to float ?

I just ran into a bug with fit_ls for a test case when endog, which is currently the count, is int (int32 returned by np.histogram).

It might be better to convert the data in the models to float if they are not already, since there might be other, less obvious problems with other dtypes.

(not a problem with the current data which is float64)

differences between nm and bfgs solutions

I am running the code with the test data, looking for two components as follows:

$ ./fit_von_mises.py "*.txt" test-data oo  -s

When nm is used, the two components obtained are:

dist0: shape=3.7856, loc=-3.1304, prob=0.5117
dist1: shape=1.7051, loc=3.0787, prob=0.4883

And with bfgs:

dist0: shape=2.3769, loc=3.1221, prob=0.5000
dist1: shape=2.3769, loc=3.1221, prob=0.5000

The bfgs result seems strange - can you reproduce it?

problem in vonmises cdf of scipy.stats

It looks like the vonmises.cdf doesn't work correctly if loc is different from zero.
I think it doesn't shift the support correctly.

I tried to use stats.vonmises.cdf for the chisquare test, but get negative cdf values.

Verify the kind of distribution

I may have asked this already, but (my colleagues ask...): Is there a way of saying (a statistical test with some p value?) that our data come

a) from a single
b) from a mixture of von Mises distributions?

I am sceptical about b), as given a sufficient count of mixture components, anything can be fitted. I tried 4, and got perfect fits for even rather irregular data.

The question is related to the hypothesis "there are two preferential direction symmetric w.r.t. axis (axis = e.g. the location of a single von Mises)" - I think this can be tested only under assumption of the von Mises mixture. Then even if rejected, in reality there can be two preferential direction symmetric w.r.t. axis, but with a different distribution, which cannot be verified. What do you think?

binning

the original data is binned, so we have a discretized distribution

I don't think the approximation error is large for the estimation

>>> plt.plot(stats.vonmises.pdf(xxpi-d/2., 4)[1:])
[<matplotlib.lines.Line2D object at 0x0DAE1E10>]
>>> plt.show()
>>> np.max(np.abs(np.diff(stats.vonmises.cdf(xxpi, 4)) / d -  stats.vonmises.pdf(xxpi, 4)[1:]))
0.015765796647894514
>>> np.max(np.abs(np.diff(stats.vonmises.cdf(xxpi, 4)) / d -  stats.vonmises.pdf(xxpi-d/2., 4)[1:]))
0.00015579844725244207
>>> np.max(np.abs(np.diff(stats.vonmises.cdf(xxpi, 4)) / d /  stats.vonmises.pdf(xxpi-d/2., 4)[1:] -1))
0.0008251767346727501

Note: cdf in cython looks pretty fast

some possibilities:

  • computational: we could take advantage that we only have 180 unique values
  • goodness of fit:
    • chi-square test could be almost directly applied
    • Watson (Anderson-Darling/Von Mises statistic): I need to check, it should work especially since
      we need bootstrap p-values anyway.
    • parametric bootstrap: we might have to apply the binning to mimic the data sampling/binning
  • alternative estimators that take binning into account: maybe just as sensitivity check
    My guess is that it wouldn't change the results but would be computationally more direct.
    Disadvantage: We don't get standard maximum likelihood statistics and inference.

normalize params

Because of the periodicity of cosine, the parameters are not globally unique, but they are locally identified.

for example kappa only needs to be positive
loc only needs to be in interval (-pi, pi) (?)

We don't impose this during estimation to avoid boundary conditions
However, to get parameter estimates that are more easily interpretable, we could normalize the params to the observationally equivalent "standard" form.

e.g. getting rid of negative kappa by adjusting loc

>>> xx = np.linspace(-2*np.pi, 2*np.pi, 21)
>>> -np.cos(xx)
array([-1.        , -0.80901699, -0.30901699,  0.30901699,  0.80901699,
        1.        ,  0.80901699,  0.30901699, -0.30901699, -0.80901699,
       -1.        , -0.80901699, -0.30901699,  0.30901699,  0.80901699,
        1.        ,  0.80901699,  0.30901699, -0.30901699, -0.80901699, -1.        ])
>>> np.cos(xx+np.pi)
array([-1.        , -0.80901699, -0.30901699,  0.30901699,  0.80901699,
        1.        ,  0.80901699,  0.30901699, -0.30901699, -0.80901699,
       -1.        , -0.80901699, -0.30901699,  0.30901699,  0.80901699,
        1.        ,  0.80901699,  0.30901699, -0.30901699, -0.80901699, -1.        ])
>>> np.cos(xx-np.pi)
array([-1.        , -0.80901699, -0.30901699,  0.30901699,  0.80901699,
        1.        ,  0.80901699,  0.30901699, -0.30901699, -0.80901699,
       -1.        , -0.80901699, -0.30901699,  0.30901699,  0.80901699,
        1.        ,  0.80901699,  0.30901699, -0.30901699, -0.80901699, -1.        ])

VonMisesMixtureBinned vs. VonMisesMixture

Some notes: when comparing the results of the two classes:

  • the fits are very similar, but the pdf plots plots slightly shifted by a "half of histogram bin width"
  • this shift can be removed by using the --spread-data option,
  • then the fits are visually almost the same, with VonMisesMixture looking slightly better.
  • aic and bic are much higher for VonMisesMixtureBinned
  • p-value is 0 for VonMisesMixtureBinned, 1 for VonMisesMixture

What causes the last two bullets?

(Commenting on results of $ ./fit_von_mises.py "*.txt" test-data/ -c examples/psets/models.py --spread-data)

VonMisesMixtureBinned does not work with statsmodels from git

Traceback (most recent call last):
  File "./fit_von_mises.py", line 128, in <module>
    main()
  File "./fit_von_mises.py", line 118, in main
    logs, alog = analyze(source, psets, options)
  File "./analyses/fit_mixture.py", line 110, in analyze
    pl.plot_histogram_comparison(pset.output_dir, res, source, ii)
  File "./analyses/plots.py", line 116, in plot_histogram_comparison
    ret_sizes=True)
  File "./aorta/dist_mixtures/mixture_von_mises.py", line 380, in rvs_mix
    size=sizes[ii]))
  File "~/software/usr/local/lib/python/dist-packages/scipy/stats/distributions.py", line 627, in rvs
    vals = self._rvs(*args)
  File "~/software/usr/local/lib/python/dist-packages/scipy/stats/distributions.py", line 5748, in _rvs
    return mtrand.vonmises(0.0, b, size=self._size)
  File "mtrand.pyx", line 2240, in mtrand.RandomState.vonmises (numpy/random/mtrand/mtrand.c:10433)
  File "mtrand.pyx", line 201, in mtrand.cont2_array_sc (numpy/random/mtrand/mtrand.c:1867)
ValueError: negative dimensions are not allowed

normalize_params reorder components in params

(To test failure in PR #10 comments, I'm not able to add a comment there now (?))

I'm getting the same failure, I think I have seen this before.
One addition I wanted to make is to rearrange components in normalize_params, so we get a unique parameterization (instead of permutations).

I thought of sorting by probability, but that can change which is the left-out component in probabilities, the last part of params.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.