abcpy's Introduction

ABCpy is a scientific library written in Python for Bayesian uncertainty quantification in the absence of a likelihood function; it parallelizes existing approximate Bayesian computation (ABC) algorithms and other likelihood-free inference schemes.

Content

ABCpy presently includes the following ABC algorithms:

The above can be used with the following distances:

Moreover, we provide the following methods for directly approximating the likelihood functions:

The above likelihood approximation methods can be used with the following samplers:

Additional features are:

ABCpy addresses the needs of domain scientists and data scientists by providing

  • a fully modularized framework that is easy to use and easy to extend,
  • a quick way to integrate your generative model into the framework (from C++, R, etc.),
  • a non-intrusive, user-friendly way to parallelize inference computations (from your laptop to clusters, supercomputers and AWS), and
  • an intuitive way to perform inference on hierarchical models or, more generally, on Bayesian networks.

Documentation

For more information, check out the documentation.

Further, we provide a collection of models to which ABCpy has been successfully applied. This is a good place to look for more complicated inference setups.

Quick installation and requirements

ABCpy can be installed from pip:

pip install abcpy

Check here for more details.

Basic requirements are listed in requirements.txt; these include the packages required for MPI parallelization, which is very often used. However, we also provide support for parallelization with Apache Spark (see below).

Additional packages are required for additional features:

  • torch is needed in order to use neural networks to learn summary statistics. It can be installed by running pip install -r requirements/neural_networks_requirements.txt
  • In order to use Apache Spark for parallelization, findspark and pyspark are required; install them by pip install -r requirements/backend-spark.txt

Troubleshooting mpi4py installation

mpi4py requires a working MPI implementation to be installed; check the official docs for more info. On Ubuntu, that can be installed with:

sudo apt-get install libopenmpi-dev

Even when that is present, running pip install mpi4py can sometimes lead to errors. In fact, as specified in the official docs, the mpicc compiler needs to be in the search path. If that is not the case, a workaround is:

env MPICC=/path/to/mpicc pip install mpi4py

In some cases, even the above may not be enough. A possibility is using conda (conda install mpi4py), which usually handles package dependencies better than pip. Alternatively, you can try installing mpi4py directly from the system package manager; on Ubuntu, you can do:

sudo apt install python3-mpi4py 

which, however, does not work inside virtual environments.
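
Once mpi4py is installed, a quick sanity check (assuming mpiexec is on your PATH) is the following command, which should print two distinct ranks:

mpiexec -n 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_rank())"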

Author

ABCpy was written by Ritabrata Dutta, Warwick University, and Marcel Schoengens, CSCS, ETH Zurich, and is presently actively maintained by Lorenzo Pacchiardi, Oxford University, and Ritabrata Dutta, Warwick University. Please feel free to submit bug reports or feature requests. We'd also love to hear about your experiences with ABCpy in general. Drop us an email!

We want to thank Prof. Antonietta Mira, Università della Svizzera italiana, and Prof. Jukka-Pekka Onnela, Harvard University, for helpful contributions and advice; Avinash Ummadisinghu and Nicole Widmern for, respectively, developing the dynamic-MPI backend and making ABCpy suitable for hierarchical models; and finally CSCS (Swiss National Supercomputing Centre) for their generous support.

Citation

There is a paper in the Journal of Statistical Software. If you use ABCpy in your publication, we would appreciate a citation; you can use this BibTeX reference.

Other References

Publications in which ABCpy was applied:

  • L. Pacchiardi, R. Dutta. "Generalized Bayesian Likelihood-Free Inference Using Scoring Rules Estimators", 2021, arXiv:2104.03889.

  • L. Pacchiardi, R. Dutta. "Score Matched Conditional Exponential Families for Likelihood-Free Inference", 2022, Journal of Machine Learning Research 23(38):1−71.

  • R. Dutta, K. Zouaoui-Boudjeltia, C. Kotsalos, A. Rousseau, D. Ribeiro de Sousa, J. M. Desmet, A. Van Meerhaeghe, A. Mira, and B. Chopard. "Interpretable pathological test for Cardio-vascular disease: Approximate Bayesian computation with distance learning.", 2020, arXiv:2010.06465.

  • R. Dutta, S. Gomes, D. Kalise, L. Pacchiardi. "Using mobility data in the design of optimal lockdown strategies for the COVID-19 pandemic in England.", 2021, PLOS Computational Biology, 17(8), e1009236.

  • L. Pacchiardi, P. Künzli, M. Schöngens, B. Chopard, R. Dutta, "Distance-Learning for Approximate Bayesian Computation to Model a Volcanic Eruption", 2021, Sankhya B, 83(1), 288-317.

  • R. Dutta, J. P. Onnela, A. Mira, "Bayesian Inference of Spreading Processes on Networks", 2018, Proceedings of the Royal Society A, 474(2215), 20180129.

  • R. Dutta, Z. Faidon Brotzakis and A. Mira, "Bayesian Calibration of Force-fields from Experimental Data: TIP4P Water", 2018, Journal of Chemical Physics 149, 154110.

  • R. Dutta, B. Chopard, J. Lätt, F. Dubois, K. Zouaoui Boudjeltia and A. Mira, "Parameter Estimation of Platelets Deposition: Approximate Bayesian Computation with High Performance Computing", 2018, Frontiers in Physiology, 9.

  • A. Ebert, R. Dutta, K. Mengersen, A. Mira, F. Ruggeri and P. Wu, "Likelihood-free parameter estimation for dynamic queueing networks: case study of passenger flow in an international airport terminal", 2021, Journal of the Royal Statistical Society: Series C (Applied Statistics), 70.3: 770-792.

License

ABCpy is published under the BSD 3-clause license, see here.

Contribute

You are very welcome to contribute to ABCpy.

If you want to contribute code, there are a few things to consider:

abcpy's People

Contributors

anthonyebert, blackythebluecat, mbianco, mschoengens, pkzli, shoshijak, statrita2004, ummavi, uweschmitt


abcpy's Issues

Deprecation warnings glmnet

from abcpy.distances import LogReg
/users/mvalle/spark/py/lib/python3.5/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)

Getting lots of ConvergenceWarnings and no results when trying to run tutorial

There is a sort of tutorial in the Getting Started section of the docs:

import numpy as np
import abcpy.backends
import abcpy.continuousmodels
import abcpy.distances
import abcpy.inferences
import abcpy.perturbationkernel
import abcpy.statistics

def tutorial_abcpy():
    mu = abcpy.continuousmodels.Uniform([[150], [200]], name="mu")
    sigma = abcpy.continuousmodels.Uniform([[5], [40]], name="sigma")
    # define the model
    height = abcpy.continuousmodels.Normal([mu, sigma], name='height')
    statistics_calculator = abcpy.statistics.Identity(degree=2, cross=False)
    distance_calculator = abcpy.distances.LogReg(statistics_calculator, seed=42)
    kernel = abcpy.perturbationkernel.DefaultKernel([mu, sigma])
    backend = abcpy.backends.BackendDummy()
    sampler = abcpy.inferences.PMCABC([height], [distance_calculator], backend, kernel, seed=1)
    eps_arr = np.array([.75])
    steps = 100
    height_obs = [160.82499176, 167.24266737, 185.71695756, 153.7045709, 163.40568812, 140.70658699, 169.59102084, 172.81041696, 187.38782738, 179.66358934, 176.63417241, 189.16082803, 181.98288443, 170.18565017, 183.78493886, 166.58387299, 161.9521899, 155.69213073, 156.17867343, 144.51580379, 170.29847515, 197.96767899, 153.36646527, 162.22710198, 158.70012047, 178.53470703, 170.77697743, 164.31392633, 165.88595994, 177.38083686, 146.67058471763457, 179.41946565658628, 238.02751620619537, 206.22458790620766, 220.89530574344568, 221.04082532837026, 142.25301427453394, 261.37656571434275, 171.63761180867033, 210.28121820385866, 237.29130237612236, 175.75558340169619, 224.54340549862235, 197.42448680731226, 165.88273684581381, 166.55094082844519, 229.54308602661584, 222.99844054358519, 185.30223966014586, 152.69149367593846, 206.94372818527413, 256.35498655339154, 165.43140916577741, 250.19273595481803, 148.87781549665536, 223.05547559193792, 230.03418198709608, 146.13611923127021, 138.24716809523139, 179.26755740864527, 141.21704876815426, 170.89587081800852, 222.96391329259626, 188.27229523693822, 202.67075179617672, 211.75963110985992, 217.45423324370509]

    epsilon_percentile = 10
    journal = sampler.sample([height_obs], steps, eps_arr, epsilon_percentile=epsilon_percentile)
    return journal

Running this does not error, but it doesn't seem to complete either. I get:

In [4]: tutorial_abcpy()
INFO:abcpy.inferences:Starting PMC iterations
INFO:abcpy.inferences:Broadcasting parameters
INFO:abcpy.inferences:Resampling parameters
/usr/local/lib/python3.6/dist-packages/sklearn/svm/_base.py:986: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  "the number of iterations.", ConvergenceWarning)
/usr/local/lib/python3.6/dist-packages/sklearn/svm/_base.py:986: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  "the number of iterations.", ConvergenceWarning)

The convergence warning just keeps repeating. Is there an example I could try to check that things are working properly locally?

As an aside, the Binder is also not working right now; it gives: ModuleNotFoundError: No module named 'abcpy'.

Issue in `_sample_parameter_statistics` method in `Semiautomatic` class, when using a misspecified statistical model

Consider the case in which a prior with positive probability on the negative numbers is placed on a parameter of a model for which that parameter needs to be positive. It is therefore impossible to generate from the model with such draws.

However, when using the Semiautomatic method on it, a cryptic error is raised, which makes it hard for the user to spot the problem. The following small example considers a Gaussian model in which a negative uniform prior is placed on the variance parameter:

from abcpy.backends import BackendDummy as backendD
from abcpy.continuousmodels import Normal, Uniform
from abcpy.statistics import Identity

backend = backendD()

mean = 0
sigma = Uniform([[-1], [0]], name="sigma")
x = Normal([mean, sigma], name="x")

statistics_calculator = Identity(degree=1, cross=False)

from abcpy.summaryselections import Semiautomatic

summary_selection = Semiautomatic([x], statistics_calculator, backend, n_samples=5,
                          n_samples_per_param=1)

The error this returns is:

  File "/homes/pacchiar/Documents/OxWaSP/ABC-project/Code/ABCpy-integration/issue_misspecified_model.py", line 15, in <module>
    summary_selection = Semiautomatic([x], statistics_calculator, backend, n_samples=5, n_samples_per_param=1)
  File "/homes/pacchiar/.local/lib/python3.7/site-packages/abcpy/summaryselections.py", line 73, in __init__
    sample_parameters_statistics_pds = self.backend.map(self._sample_parameter_statistics, rng_pds)
  File "/homes/pacchiar/.local/lib/python3.7/site-packages/abcpy/backends/base.py", line 185, in map
    result_pds = PDSDummy(list(result_map))
  File "/homes/pacchiar/.local/lib/python3.7/site-packages/abcpy/summaryselections.py", line 112, in _sample_parameter_statistics
    return (parameter, statistics)
UnboundLocalError: local variable 'statistics' referenced before assignment

It seems that Semiautomatic does not properly check the validity of the model and tries to sample from it anyway. It would be useful to return a more explanatory error message; I believe the issue may be in the graphtools part. A sketch of a clearer check follows.
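
As a sketch of the kind of check that could help (a hypothetical wrapper, not the actual abcpy API): validate the sampled parameters against the model before simulating, and fail with an explicit message instead of leaving statistics unassigned:

def sample_parameter_statistics_checked(model, parameters, simulate, summarize):
    # Hypothetical wrapper around the sampling step: if the model rejects the
    # parameters (e.g. a negative sigma drawn from the prior), raise a clear
    # error instead of falling through to "return (parameter, statistics)"
    # with "statistics" never assigned.
    if not model._check_input(parameters):
        raise ValueError(
            "Model %r cannot be simulated with parameters %s; "
            "check that the prior support matches the model's constraints."
            % (model.name, parameters))
    data = simulate(parameters)
    statistics = summarize(data)
    return parameters, statistics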

Incompatibility with latest version of `numpy` and `matplotlib`

After installing abcpy manually (not via pip), I encountered issues when running the examples. Specifically, I had the latest versions of numpy (1.25.2) and matplotlib (3.7.2) installed.

Running the examples (mpirun -np 10 --bind-to core python backends/mpi/pmcabc_gaussian.py), I obtain the following error messages:

[...]
INFO:abcpy.inferences:Calculating acceptances threshold
INFO:abcpy.inferences:Calculating weights
INFO:abcpy.inferences:Calculate weights
INFO:abcpy.inferences:Calculating covariance matrix
Traceback (most recent call last):
  File "/home/testuser/abcpy/examples/backends/mpi/pmcabc_gaussian.py", line 118, in <module>
    journal = infer_parameters(backend, logging_level=logging.INFO)
  File "/home/testuser/abcpy/examples/backends/mpi/pmcabc_gaussian.py", line 73, in infer_parameters
    journal = sampler.sample([height_obs], steps, eps_arr, n_sample, n_samples_per_param, epsilon_percentile)
  File "/home/testuser/.pyenv/versions/3.9.1/lib/python3.9/site-packages/abcpy/inferences.py", line 933, in sample
    new_cov_mats = self.kernel.calculate_cov(self.accepted_parameters_manager)
  File "/home/testuser/.pyenv/versions/3.9.1/lib/python3.9/site-packages/abcpy/perturbationkernel.py", line 144, in calculate_cov
    all_covs.append(kernel.calculate_cov(accepted_parameters_manager, kernel_index))
  File "/home/testuser/.pyenv/versions/3.9.1/lib/python3.9/site-packages/abcpy/perturbationkernel.py", line 266, in calculate_cov
    (np.float, np.float32, np.float64, np.int, np.int32, np.int64)):
  File "/home/testuser/.pyenv/versions/3.9.1/lib/python3.9/site-packages/numpy/__init__.py", line 319, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------

and

[...]
[0.00317671]
 [0.00472121]
 [0.00478266]]
{'mu': 177.04224197656742, 'sigma': 14.735985720015503}
(array([[185.31015072,  -1.10226865],
       [ -1.10226865,  33.72395735]]), dict_keys(['mu', 'sigma']))
{'type_model': ['Normal'], 'type_dist_func': ['LogReg'], 'steps': 3, 'epsilon_init': array([0.75]), 'n_samples': 250, 'n_samples_per_param': 10, 'epsilon_percentile': 10, 'covFactor': 2, 'full_output': 0, 'epsilon_arr': [0.75, 0.7402597402597402, 0.7402597402597402]}
Traceback (most recent call last):
  File "/home/testuser/abcpy/examples/backends/mpi/pmcabc_gaussian.py", line 119, in <module>
    analyse_journal(journal)
  File "/home/testuser/abcpy/examples/backends/mpi/pmcabc_gaussian.py", line 91, in analyse_journal
    journal.plot_posterior_distr(path_to_save="posterior.png")
  File "/home/testuser/.pyenv/versions/3.9.1/lib/python3.9/site-packages/abcpy/output.py", line 847, in plot_posterior_distr
    fig, axes = scatterplot_matrix(datat, meanpost, parameters_to_show,
  File "/home/testuser/.pyenv/versions/3.9.1/lib/python3.9/site-packages/abcpy/output.py", line 663, in scatterplot_matrix
    if ax.is_first_col():
AttributeError: 'Axes' object has no attribute 'is_first_col'
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------

Downgrading the packages as

  • numpy from 1.25.2 to 1.23.5
  • matplotlib from 3.7.2 to 3.5.3

fixes the problems.

Are there any plans to update the repository to account for the API changes in these major dependencies?
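
For reference, a minimal sketch of the change the first traceback points at (assuming the isinstance check in perturbationkernel.py only needs to recognize numeric scalars): np.float and np.int were deprecated aliases for the Python builtins and were removed in NumPy 1.24, so the builtins can take their place.

import numpy as np

def is_numeric_scalar(value):
    # float/int replace the removed np.float/np.int aliases; the sized
    # NumPy types still cover NumPy scalars.
    return isinstance(value, (float, np.float32, np.float64,
                              int, np.int32, np.int64))

print(is_numeric_scalar(3.0), is_numeric_scalar(np.int64(2)))  # True True

The second traceback is a Matplotlib API change: Axes.is_first_col() was removed in Matplotlib 3.6, and ax.get_subplotspec().is_first_col() is the usual replacement.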

MPI required?

Is MPI a requirement for abcpy? I didn't see that mentioned on the Installation page. When I attempt to pip install it, I get:

    _configtest.c:2:10: fatal error: 'mpi.h' file not found
    #include <mpi.h>
             ^~~~~~~
    1 error generated.

SABC does not save the right distance in the journal after resampling

NOTE: I will refer to the current master version of the file abcpy/inferences.py

The SABC algorithm keeps two distance arrays internally: distances, the values returned by the Distance class (the relevant numbers when analyzing the results of a simulation), and smooth_distances, which is what the algorithm actually uses almost everywhere.

The problem is that the resampling (the if at line 1230) is (correctly) done using smooth_distances, but only smooth_distances is updated afterwards (line 1236).

As a consequence, distances is not reindexed during the resampling and, from that moment on, this wrong matrix is saved in the journal at every iteration.

Possible bug fix: add the following line after line 1236:

distances = distances[index_resampled]
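
A self-contained illustration of the invariant (variable names mirror the issue): after resampling by index, every per-particle array has to be reindexed together.

import numpy as np

rng = np.random.default_rng(0)
distances = rng.random(5)          # values returned by the Distance class
smooth_distances = rng.random(5)   # values the algorithm actually uses
index_resampled = rng.integers(0, 5, size=5)

smooth_distances = smooth_distances[index_resampled]  # what the code does today
distances = distances[index_resampled]  # the proposed fix keeps both in sync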

SABC returns the same set of parameters in the journal in the case of full_output=1

I'm running SABC with full_output=1, and the parameters list of arrays in the Journal contains, around each re-sample, duplicated arrays of parameters.
Calling a_i the array of parameters generated in the i-th step of the algorithm (i from 1 to #steps), in the case of 5 steps with a re-sample after the 2nd step the result is

[a_2, a_2, a_5, a_5, a_5]

Running the code in a debugger and looking inside the journal.add_parameters function, the journal.parameters list after each call becomes:
[a_1]
[a_2, a_2]
[a_2, a_2, a_3]
[a_2, a_2, a_4, a_4]
[a_2, a_2, a_5, a_5, a_5]

Issue with multivariate parameters

I have a problem using a multivariate distribution for the parameters of a model.

The following is the smallest example I could come up with to reproduce my issue:

import numpy as np
from abcpy.backends import BackendDummy as Backend
from abcpy.continuousmodels import (InputConnector, MultivariateNormal,
                                    ProbabilisticModel)
from abcpy.distances import Euclidean
from abcpy.inferences import PMCABC
from abcpy.perturbationkernel import DefaultKernel
from abcpy.statistics import Identity
from numpy.random import multivariate_normal

mu = [0, 0]
cov = [[2, 0.5], [0.5, 2]]

data = multivariate_normal(mu, cov, 100).tolist()

class IdentityModel(ProbabilisticModel):
    def __init__(self, parameters, name):
        parameters = InputConnector.from_list(parameters)
        super().__init__(parameters, name)

    def forward_simulate(self, input_values, k, rng=np.random.RandomState()):
        # Extract the input parameters
        return [input_values]

    def _check_input(self, parameters):
        return True

    def _check_output(self, parameters):
        return True

    def pdf(self, input_values, x):
        raise NotImplementedError

    def get_output_dimension(self):
        # self.x was never defined in the original snippet; the identity
        # output here is two-dimensional
        return 2


statistics = Identity(2, False)
distance = Euclidean(statistics)

mv = MultivariateNormal([mu, cov])

model = IdentityModel([mv], "model")
kernel = DefaultKernel([mv])
backend = Backend()


sampler = PMCABC([model], [distance], backend, kernel, seed=1)

T, n_sample, n_samples_per_param = 2, 100, 1

eps_arr = np.array([500])
epsilon_percentile = 10

journal = sampler.sample(
    [data], T, eps_arr, n_sample, n_samples_per_param, epsilon_percentile, full_output=1
)

The error message I'm concerned about is:

$ python multivar.py
Traceback (most recent call last):
  File "multivar.py", line 58, in <module>
    [data], T, eps_arr, n_sample, n_samples_per_param, epsilon_percentile, full_output=1
  File "/Users/uweschmitt/Projects/scripts_uwe/abcpy_examples/venv37/lib/python3.7/site-packages/abcpy/inferences.py", line 488, in sample
    self.accepted_parameters_manager.get_accepted_parameters_bds_values(kernel.models))
  File "/Users/uweschmitt/Projects/scripts_uwe/abcpy_examples/venv37/lib/python3.7/site-packages/abcpy/acceptedparametersmanager.py", line 147, in get_accepted_parameters_bds_values
    accepted_bds_values[i].append(self.accepted_parameters_bds.value()[i][index])
IndexError: index 1 is out of bounds for axis 0 with size 1

Looks like there are some indexing issues for the non-scalar distribution.

Linear or Non-linear Regression

First of all, thanks for releasing such a useful library. I am new to the Bayesian analysis. I am wondering if there is an example of linear or non-linear regression using abcpy? The problem I am trying to solve is a non-linear problem. I have the experimental measurements and a forward model that takes in a lot of parameters as input. The goal is to find the best fitting parameters such that the numerical prediction is similar to the experimental measurements. Therefore, it would be very helpful if there is an example of a regression problem.
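
As far as I know there is no dedicated regression example in the repository, but the usual pattern is to wrap the forward model as a ProbabilisticModel, put priors on its parameters, and run an ABC sampler against the measurements. Below is a minimal sketch (ExpModel, its exponential forward function and all numeric values are hypothetical; the abcpy calls mirror the snippets elsewhere on this page):

import numpy as np
from abcpy.backends import BackendDummy
from abcpy.continuousmodels import InputConnector, ProbabilisticModel, Uniform
from abcpy.distances import Euclidean
from abcpy.inferences import PMCABC
from abcpy.perturbationkernel import DefaultKernel
from abcpy.statistics import Identity

class ExpModel(ProbabilisticModel):
    # Hypothetical nonlinear forward model y(t) = a * exp(b * t) + noise.
    def __init__(self, parameters, name="exp_model"):
        super().__init__(InputConnector.from_list(parameters), name)

    def forward_simulate(self, input_values, k, rng=np.random.RandomState()):
        a_val, b_val = input_values
        t = np.linspace(0, 1, 50)
        return [a_val * np.exp(b_val * t) + rng.normal(0, 0.1, t.size) for _ in range(k)]

    def _check_input(self, input_values):
        return len(input_values) == 2

    def _check_output(self, values):
        return True

    def get_output_dimension(self):
        return 50

    def pdf(self, input_values, x):
        raise NotImplementedError  # likelihood-free: only simulation is needed

a = Uniform([[0.5], [2.0]], name="a")
b = Uniform([[0.5], [3.0]], name="b")
model = ExpModel([a, b])

# Stand-in for the experimental measurements (true a=1.2, b=1.5)
obs = [1.2 * np.exp(1.5 * np.linspace(0, 1, 50))]

distance_calculator = Euclidean(Identity(degree=1, cross=False))
sampler = PMCABC([model], [distance_calculator], BackendDummy(), DefaultKernel([a, b]), seed=1)

steps, n_samples, n_samples_per_param = 3, 100, 1
eps_arr = np.array([10.0])
journal = sampler.sample([obs], steps, eps_arr, n_samples, n_samples_per_param, epsilon_percentile=10)
print(journal.posterior_mean())  # posterior means for 'a' and 'b'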

`Semiautomatic` summary selection not working for multidimensional parameters

The Semiautomatic class gives an error when it is run on a model with a multidimensional parameter. It does, however, work when the model has two one-dimensional parameters. Here is a small piece of code reproducing the error:

from abcpy.backends import BackendDummy as backendD
from abcpy.continuousmodels import MultivariateNormal
from abcpy.statistics import Identity

backend = backendD()

mean = MultivariateNormal([[0, 100], [[1, 0], [0, 1]]], name="mean")
x = MultivariateNormal([mean, [[1, 0], [0, 1]]], name="x")

statistics_calculator = Identity(degree=1, cross=False)

from abcpy.summaryselections import Semiautomatic

summary_selection = Semiautomatic([x], statistics_calculator, backend, n_samples=5, n_samples_per_param=1)

The corresponding error is:

 File "/homes/pacchiar/Documents/OxWaSP/ABC-project/Code/ABCpy-integration/issue _multidim_param.py", line 14, in <module>
    summary_selection = Semiautomatic([x], statistics_calculator, backend, n_samples=5, n_samples_per_param=1)
  File "/homes/pacchiar/.local/lib/python3.7/site-packages/abcpy/summaryselections.py", line 84, in __init__
    self.coefficients_learnt[ind, :] = regr.coef_
ValueError: could not broadcast input array from shape (2,2) into shape (2)

I guess the error is caused by an issue with the shape of the arrays used to fit the linear regression model. In fact, adding the following line after line 78 in summaryselections.py fixes it:

      sample_parameters = sample_parameters.reshape((n_samples, -1))
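
A toy illustration of the shape problem (assumed shapes, using scikit-learn directly): without the flattening, each sample of a 2-D parameter carries an extra axis, so the per-parameter regression coefficients come out with an extra dimension and cannot be assigned into a single row of coefficients_learnt.

import numpy as np
from sklearn.linear_model import LinearRegression

n_samples, n_stats = 5, 2
rng = np.random.RandomState(0)
statistics = rng.randn(n_samples, n_stats)
sample_parameters = rng.randn(n_samples, 2, 1)  # a 2-D parameter, with an extra axis

# Flattening to one parameter vector per row makes each column a valid 1-D
# regression target, so regr.coef_ has shape (n_stats,) as the code expects.
sample_parameters = sample_parameters.reshape((n_samples, -1))  # (5, 2)
regr = LinearRegression().fit(statistics, sample_parameters[:, 0])
print(regr.coef_.shape)  # (2,) -- fits into coefficients_learnt[ind, :]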

Spack package

Hi,

I'm trying to make this package spack-installable, see spack/spack#25713. While doing so, a few questions came up:

  1. sklearn is in requirements.txt; isn't that deprecated long ago?
  2. sphinx, sphinx_rtd_theme and coverage are listed in requirements.txt; these look like development dependencies. Can they be moved to a separate dev-requirements.txt or something?
  3. There are no lower or upper bounds on the dependencies in requirements.txt; what happens if a dependency makes breaking changes?

Thanks!

Exception: Failed to find libgfortran.3.dylib

Hi,

I was interested in trying out abcpy, but ran into an error when trying to pip install it, which I wanted to share with you. I'm using macOS High Sierra (10.13.6). The output I received is below. I also ran locate to see if I could find the dynamic library, which I did; the output displaying this result is attached as well.

best regards,
Michiel

(venv) Zarathustra:allosteric-inference-master Zarathustra$ sudo -H pip3 install abcpy
Password:
Collecting abcpy
  Downloading https://files.pythonhosted.org/packages/10/db/b399869792c81bb6046a43282c9cfa8be33db52d1b1876a0eb93a82cb38d/abcpy-0.5.2-py3-none-any.whl (83kB)
    100% |████████████████████████████████| 92kB 2.7MB/s 
Collecting coverage (from abcpy)
  Downloading https://files.pythonhosted.org/packages/a3/7e/c94c21d643bfe7017615994df7b52292a33c8dcf36a6f694af110594edba/coverage-4.5.1-cp36-cp36m-macosx_10_12_x86_64.whl (178kB)
    100% |████████████████████████████████| 184kB 10.4MB/s 
Requirement already satisfied: scipy in ./venv/lib/python3.6/site-packages (from abcpy) (1.1.0)
Collecting sphinx-rtd-theme (from abcpy)
  Downloading https://files.pythonhosted.org/packages/87/30/7460f7b77b6e8a080dd3688f750fe5d5666c49358f8941449c5b128fa97d/sphinx_rtd_theme-0.4.1-py2.py3-none-any.whl (5.4MB)
    100% |████████████████████████████████| 5.4MB 3.6MB/s 
Collecting sklearn (from abcpy)
  Downloading https://files.pythonhosted.org/packages/1e/7a/dbb3be0ce9bd5c8b7e3d87328e79063f8b263b2b1bfa4774cb1147bfcd3f/sklearn-0.0.tar.gz
Collecting sphinx==1.4.8 (from abcpy)
  Downloading https://files.pythonhosted.org/packages/b5/db/a93672c16532ee4066cebe47f9a72e774c9be44b976110a7a16cecc016fd/Sphinx-1.4.8-py2.py3-none-any.whl (1.6MB)
    100% |████████████████████████████████| 1.6MB 6.7MB/s 
Collecting glmnet==2.0.0 (from abcpy)
  Downloading https://files.pythonhosted.org/packages/c7/97/6f92f20fc193478c5d5927396c8d691abbdaa7774fd67e8a08fdeb1a2470/glmnet-2.0.0.tar.gz (102kB)
    100% |████████████████████████████████| 112kB 16.7MB/s 
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/tmp/pip-install-1bcfpco3/glmnet/setup.py", line 38, in <module>
        GFORTRAN_LIB = get_lib_dir('libgfortran.3.dylib')
      File "/private/tmp/pip-install-1bcfpco3/glmnet/setup.py", line 30, in get_lib_dir
        raise Exception("Failed to find {}".format(dylib))
    Exception: Failed to find libgfortran.3.dylib
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/tmp/pip-install-1bcfpco3/glmnet/
(venv) Zarathustra:allosteric-inference-master Zarathustra$ locate libgfortran.3.dylib
/Applications/MATLAB_R2017b.app/sys/os/maci64/libgfortran.3.dylib
/Applications/Tellurium.app/Contents/Resources/telocal/python-3.6.3/lib/python3.6/site-packages/scipy/.dylibs/libgfortran.3.dylib
/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libgfortran.3.dylib
/Users/Zarathustra/ETH/git_repo/allosteric-inference-master/venv/lib/python3.6/site-packages/scipy/.dylibs/libgfortran.3.dylib
/Users/Zarathustra/ETH/repositories/allosteric-inference-master/python_server/tutorial/lib/python3.6/site-packages/scipy/.dylibs/libgfortran.3.dylib
/Users/Zarathustra/ETH/repositories/allosteric-inference-master/venv/lib/python3.6/site-packages/scipy/.dylibs/libgfortran.3.dylib
/Users/Zarathustra/ETH/repositories/allosteric-inference-master/venv/python_server/tutorial/lib/python3.6/site-packages/scipy/.dylibs/libgfortran.3.dylib
/Users/Zarathustra/miniconda2/lib/libgfortran.3.dylib
/Users/Zarathustra/miniconda2/pkgs/libgfortran-3.0.1-h93005f0_2/lib/libgfortran.3.dylib
/Users/Zarathustra/miniconda3/lib/libgfortran.3.dylib
/Users/Zarathustra/miniconda3/pkgs/libgfortran-3.0.1-h93005f0_2/lib/libgfortran.3.dylib
/usr/local/lib/python3.6/site-packages/scipy/.dylibs/libgfortran.3.dylib

abcpy installation issue

I failed to install the abcpy package and have absolutely no idea about the cause.
During installation a 200-line log file was generated, but I cannot pinpoint the problem in it.
E.g., there is the error "Microsoft Visual C++ 14.0 or greater is required", but in fact Visual Studio 2019 Developer v16.11.3 is installed.
I attached the log below; maybe it will help to hunt down the problem.
Thanks in advance.
ABCpy.txt

error: extension '_glmnet' has Fortran sources but no Fortran compiler found

Hey, I was installing glmnet using pip install glmnet, and also tried using the wheel file. I'm getting the following error:
Building wheel for glmnet (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: '..\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'..\pip-install-z85qz6m4\glmnet_7f7a7f35ff654772862ef9f8358e2a4e\setup.py'"'"'; __file__='"'"'..\pip-install-z85qz6m4\glmnet_7f7a7f35ff654772862ef9f8358e2a4e\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\LENOVO~1\AppData\Local\Temp\pip-wheel-_0ejym25'
cwd: ..\pip-install-z85qz6m4\glmnet_7f7a7f35ff654772862ef9f8358e2a4e
Complete output (56 lines):
running bdist_wheel
running build
running config_cc
unifing config_cc, config, build_clib, build_ext, build commands --compiler options
running config_fc
unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
running build_src
build_src
building extension "glmnet" sources
f2py options: []
adding 'build\src.win-amd64-3.6\build\src.win-amd64-3.6\glmnet\fortranobject.c' to sources.
adding 'build\src.win-amd64-3.6\build\src.win-amd64-3.6\glmnet' to include_dirs.
build_src: building npy-pkg config files
running build_py
creating build\lib.win-amd64-3.6
creating build\lib.win-amd64-3.6\glmnet
copying glmnet\doc.py -> build\lib.win-amd64-3.6\glmnet
copying glmnet\errors.py -> build\lib.win-amd64-3.6\glmnet
copying glmnet\linear.py -> build\lib.win-amd64-3.6\glmnet
copying glmnet\logistic.py -> build\lib.win-amd64-3.6\glmnet
copying glmnet\scorer.py -> build\lib.win-amd64-3.6\glmnet
copying glmnet\util.py -> build\lib.win-amd64-3.6\glmnet
copying glmnet\__init__.py -> build\lib.win-amd64-3.6\glmnet
running build_ext
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
customize MSVCCompiler using build_ext
get_default_fcompiler: matching types: '['gnu', 'intelv', 'absoft', 'compaqv', 'intelev', 'gnu95', 'g95', 'intelvem', 'intelem', 'flang']'
customize GnuFCompiler
Could not locate executable g77
Could not locate executable f77
customize IntelVisualFCompiler
Could not locate executable ifort
Could not locate executable ifl
customize AbsoftFCompiler
Could not locate executable f90
customize CompaqVisualFCompiler
Could not locate executable DF
customize IntelItaniumVisualFCompiler
Could not locate executable efl
customize Gnu95FCompiler
Could not locate executable gfortran
Could not locate executable f95
customize G95FCompiler
Could not locate executable g95
customize IntelEM64VisualFCompiler
customize IntelEM64TFCompiler
Could not locate executable efort
Could not locate executable efc
customize PGroupFlangCompiler
Could not locate executable flang
don't know how to compile Fortran code on platform 'nt'
warning: build_ext: f77_compiler=None is not available.

building '_glmnet' extension
error: extension '_glmnet' has Fortran sources but no Fortran compiler found

ERROR: Failed building wheel for glmnet
Running setup.py clean for glmnet
Failed to build glmnet
Installing collected packages: glmnet
Running setup.py install for glmnet ... error
ERROR: Command errored out with exit status 1:

 building '_glmnet' extension
error: extension '_glmnet' has Fortran sources but no Fortran compiler found

Can you help me with this?

sklearn deprecation

Many thanks for this excellent package!

sklearn is being deprecated according to this deprecation schedule. It is currently in the repo's requirements.txt file and can cause build errors (depending on when the package is installed, per the deprecation schedule). It would be great if sklearn could be removed from the requirements and its imports ported to scikit-learn.

Cheers!
Joe

TypeError (actually more like a NameError?) in inferences.py

I am trying to start the process from a journal file read in from disk, and getting a TypeError from inferences.py.

Earlier in the code accepted_cov_mats = None; then, in these lines (435-438):

                new_cov_mats = self.kernel.calculate_cov(self.accepted_parameters_manager)
                # Since each entry of new_cov_mats is a numpy array, we can multiply like this
                # accepted_cov_mats = [covFactor * new_cov_mat for new_cov_mat in new_cov_mats]
                accepted_cov_mats = self._compute_accepted_cov_mats(covFactor, accepted_cov_mats)

This throws a TypeError, as None is not valid for the second parameter of self._compute_accepted_cov_mats.
Inside self._compute_accepted_cov_mats, the second parameter is named new_cov_mats.
Should the commented-out line be restored, or should the second argument accepted_cov_mats actually be new_cov_mats? A sketch of the latter is below.
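
For what it's worth, this is how the call would read under the second interpretation (a sketch, assuming _compute_accepted_cov_mats expects the freshly computed matrices):

                new_cov_mats = self.kernel.calculate_cov(self.accepted_parameters_manager)
                # pass the matrices just computed, not the (possibly None) previous ones
                accepted_cov_mats = self._compute_accepted_cov_mats(covFactor, new_cov_mats)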

Best,
Henry

get_parameters() vs. parameters[-1]

Hi there, I've got an issue with the function get_parameters(). I am inferring three parameters (a1, a2, eps2) that are initially uniformly distributed in given intervals.
I run the following:

import numpy as np
import statistics
import time
from IterMapModel import IterMap

# Load data
data = [np.loadtxt('synth_dat_test.dat',dtype=float)]

# Define Graphical Model
from abcpy.continuousmodels import Uniform
lp = len(data[0])
a1 = Uniform([[1.3], [1.7]], name='a1')
a2 = Uniform([[1.6], [2.1]], name='a2')
eps2 = Uniform([[0.2], [0.5]], name='eps2')
timeseries = IterMap([lp, a1,a2,eps2], name = 'timeseries')

# Define backend
from abcpy.backends import BackendDummy as Backend
backend = Backend()

# Define Statistics
from abcpy.statistics import Identity
statistics_calculator = Identity(degree=2, cross=False)

from abcpy.summaryselections import Semiautomatic
summary_selection = Semiautomatic([timeseries], statistics_calculator, backend, n_samples = 10, n_samples_per_param = 1, seed = 1)
statistics_calculator.statistics = lambda x, f2 = summary_selection.transformation, f1 = statistics_calculator.statistics: f2(f1(x))

# Define distance
from abcpy.distances import Euclidean
distance_calculator = Euclidean(statistics_calculator)

# Define kernel
from abcpy.perturbationkernel import DefaultKernel
kernel = DefaultKernel([a1,a2,eps2])

# SABC ##
tsabc=time.time()
from abcpy.inferences import SABC
sampler = SABC([timeseries], [distance_calculator], backend, kernel, seed = 1)
steps, epsilon, n_samples, n_samples_per_param = 5, 100, 20, 1
tstart = time.time()
journal_sabc = sampler.sample(data, steps, epsilon, n_samples, n_samples_per_param, full_output=0)

samples = journal_sabc.parameters[-1]
#### Posterior sample for parameter a1 obtained using parameters[-1] #### 
print("Parameters[-1] gives for parameter a1: ")
print(samples[:,0])
print("Posterior mean from parameters[-1] is ")
print(statistics.mean(samples[:,0]))
#### Posterior sample for parameter a1 obtained using get_parameters() #### 
a1_post_sample, a2_post_sample, eps2_post_sample, post_weights = \
    np.array(journal_sabc.get_parameters()['a1']), np.array(journal_sabc.get_parameters()['a2']), \
    np.array(journal_sabc.get_parameters()['eps2']), np.array(journal_sabc.get_weights())
print("Get_parameters() gives for parameter a1: ")
print(a1_post_sample.flatten())
print("Posterior mean from get_parameters() is ")
print(statistics.mean(a1_post_sample.flatten()))

print("Posterior_mean() gives for a1: ")
print(journal_sabc.posterior_mean()[0])

The output I get is as follows:

Parameters[-1] gives for parameter a1: 
[1.68659718 1.39658876 1.50724835 1.53917481 1.30862889 1.50724835
 1.39658876 1.68659718 1.62423885 1.48039411 1.6568104  1.39658876
 1.62423885 1.50724835 1.41777341 1.68130142 1.6568104  1.39658876
 1.39658876 1.63262029]
Posterior mean from parameters[-1] is 
1.5249937330687782
Get_parameters() gives for parameter a1: 
[1.41777341 1.68130142 1.49302175 1.50724835 1.48039411 1.53917481
 1.63262029 1.50282747 1.44300723 1.6568104  1.47250143 1.39658876
 1.49733069 1.57906456 1.30862889 1.66946639 1.62423885 1.65467304
 1.38761389 1.68659718]
Posterior mean from get_parameters() is 
1.5315441460199923
Posterior_mean() gives for a1: 
1.5249937330687782

The posterior parameter samples obtained with parameters[-1] DIFFER from the parameter samples obtained with the function get_parameters().
IMPORTANT: this difference exists only if 'full_output' = 0 AND 'steps' is odd (3,5,7,...). However, if I set 'full_output' = 1 OR 'steps' is even (2,4,6,...) then the two posterior parameter samples coincide.

Is there any explanation for this "odd" behaviour?

The mean value of the parameter samples obtained with parameters[-1] is the same as the mean provided by the function posterior_mean(). It seems that, while parameters[-1] behaves correctly, there might be some problem with get_parameters().

I would appreciate some help!
Thanks a lot!

Guide for Windows users to install abcpy

To start with, I appreciate the eth-cscs team's excellent work on abcpy. I have been struggling to install abcpy on Windows for a really long time. For Windows users, the installation problem is that abcpy requires glmnet, and glmnet needs a Fortran compiler. One method mentioned by the eth-cscs team is installing abcpy without glmnet, see #97 (comment). Here I share my successful experience installing abcpy with glmnet. I use conda to manage my environment.

Step One: Install mingw-w64 (it includes a Fortran compiler).

conda install -c conda-forge prophet

I had installed the prophet package before; it pulled in mingw-w64 for me.
You can install mingw-w64 in conda in your own way.

Step Two: Install mpi4py and Microsoft MPI.

conda install -c conda-forge mpi4py msmpi

Step Three: Install glmnet and abcpy.

pip install glmnet abcpy

This will successfully compile and install glmnet, and abcpy will also install successfully. However, you still cannot import glmnet; you will get:

Error DLL load failed while importing _glmnet

You need one last step.

Step Four: Copy DLL to path

Find the DLLs at Anaconda\Lib\site-packages\_glmnet\.lib\*.dll and copy them to Anaconda\Lib\site-packages. A runtime alternative is sketched below.
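
As a possible alternative to copying files (a sketch, untested; the path is an assumption based on the layout above), Python 3.8+ on Windows can register the bundled DLL directory at runtime before the import:

import os
import sysconfig

# Assumed location of the DLLs shipped with _glmnet (see Step Four above).
dll_dir = os.path.join(sysconfig.get_paths()["purelib"], "_glmnet", ".lib")
if hasattr(os, "add_dll_directory") and os.path.isdir(dll_dir):
    os.add_dll_directory(dll_dir)  # Windows-only, Python 3.8+

import glmnet  # _glmnet's DLL dependencies should now resolve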

restarting SABC run from a saved journal file

Dear abcpy-team,

I have some trouble restarting the SABC algorithm after it was saved to a journal file. The error message is short, so I post it below:

INFO:abcpy.inferences:Broadcasting parameters
INFO:abcpy.inferences:Initial accepted parameters
Traceback (most recent call last):
  File "./ln_like_abcpy.py", line 498, in <module>
    main_spitzer()
  File "./ln_like_abcpy.py", line 496, in main_spitzer
    infer_parameters_spitzer(obs, backend=backend, type='multi', steps=250, n_sample=2000, n_samples_per_param=8, epsilon=30000., epsilon_percentile=None)
  File "./ln_like_abcpy.py", line 439, in infer_parameters_spitzer
    journal = sampler.sample([obs], steps, epsilon, n_sample, n_samples_per_param, beta=2, delta=0.2, v=0.3, full_output=True, ar_cutoff=0.00001, journal_file='/u/sfellenberg/workspace/Python/periodicty/abcpy_runs/spitzer_journal_model2.obj')
  File "/u/sfellenberg/.local/lib/python3.8/site-packages/abcpy/inferences.py", line 1705, in sample
    params_and_dists = self.backend.collect(params_and_dists_pds)
  File "/u/sfellenberg/.local/lib/python3.8/site-packages/abcpy/backends/mpi.py", line 336, in collect
    raise item
AttributeError: Exception occured while calling the map function _accept_parameter: 'NoneType' object has no attribute 'value'
Exception ignored in: <function BackendMPIScheduler.__del__ at 0x151f70766670>
Traceback (most recent call last):
  File "/u/sfellenberg/.local/lib/python3.8/site-packages/abcpy/backends/mpi.py", line 404, in __del__
AttributeError: 'NoneType' object has no attribute 'Finalize'

The error is only raised when the journal_file keyword is specified; without it, everything works fine and the file I specified is generated. The journal file itself shows no obvious corruption.
