Giter Site home page Giter Site logo

nuclear-verification-and-disarmament / bicyclus Goto Github PK

View Code? Open in Web Editor NEW
3.0 2.0 1.0 866 KB

Bicyclus, a Bayesian Inference module for Cyclus

License: BSD 3-Clause "New" or "Revised" License

Python 99.73% Shell 0.27%
bayesian-inference pymc cyclus nuclear-fuel-cycle

bicyclus's Introduction

Bicyclus

Code style: black

Bicyclus is a Bayesian Inference module for Cyclus. Specifically, it is an interface written in Python3 that connects Cyclus, a nuclear fuel simulator, with PyMC (formerly PyMC3), a Python package for Bayesian statistics using Markov chain Monte Carlo algorithms.

Bicyclus, Bicyclus, Bicyclus
I want to use my Bicyclus
I want to use Cyclus
I want to infer with Cyclus
I want to infer what I like.

Freddie Mercury (somewhen, somewhere, probably)

Features

  • Run inference processes on nuclear fuel cycles simulated with Cyclus.
  • Store quantities of interest and simulation output that are not part of the inference process, in separate files. These can be used for later analysis.
  • Use PyMC distributions as priors (WIP, currently restricted to a subset of distributions).
  • Features utility functions for post-processing and plotting posterior distributions.

How-to

Requirements

The following table lists the software requirements. All dependencies listed below will be installed automatically through pip3 except for Cyclus, which the user must install themself.

Name Tested with version Notes
Python 3.10.6
PyMC 4.2.0 must be used with PyMC v4.x or larger
Aesara 2.8.2 only for PyMC v4.x
PyTensor only for PyMC v5.x or larger
Arviz 0.12.1
NumPy 1.23.3
Scipy 1.9.1
Pandas 1.5.0 only for plotting
matplotlib 3.6.0 only for plotting
seaborn 0.12.0 only for plotting
Cyclus 1.5.5-59-gb1a858e3 Must be installed by the user

Installation

Install Bicyclus and all dependencies.

$ git clone https://github.com/Nuclear-Verification-and-Disarmament/bicyclus.git
$ cd bicyclus
$ pip3 install ".[plotting]"

If you do not want to install dependencies needed for plotting, run the following commands:

$ git clone https://github.com/Nuclear-Verification-and-Disarmament/bicyclus.git
$ cd bicyclus
$ pip3 install .

You can still run the inference process, but features from bicyclus/visualize may not be available.

Tutorial

Inference mode

A minimum working example (MWE) can be found in the examples directory. At the moment, Bicyclus can be used through a driver script that has to be written on a case-by-case basis. However, the MWE provided should be a good starting point for any new driver script.

Forward mode

Bicyclus can be used to perform large-scale forward simulations with Cyclus, which can be useful, e.g., to perform sensity analyses. This so-called forward mode uses Quasi Monte Carlo sampling (specifically, Sobol sequences) to efficiently sample the input parameter space. Furthermore, the driver script and file structure used in the inference mode can largely be reused here.

An MWE might be provided at a later stage.

Pitfalls

  • Depending on the runtime of one Cyclus run, subprocess's timeout value has to be adapted. It is defined in bicyclus/blackbox/blackbox.py (CyclusCliModel.simulate) and is currently set to 300 seconds (as of September 2022).

Legacy code

Originally, this work was developed as part of a Bachelor's thesis by Lewin Bormann. If you are interested in the git blame and log beyond this repository's initial commit, please visit the original repository. That repository also contains two applications of complex nuclear fuel cycles and reconstruction scenarios.

Contributing: Usage of pre-commit hooks

We follow the Black code style. Run the following command from the root directory to enable use of the pre-commit hook. This will automatically run black when comitting and thus will ensure proper formatting of the committed code.

$ git config --local core.hooksPath .githooks/

bicyclus's People

Contributors

maxschalz avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar

Forkers

maxschalz

bicyclus's Issues

Add deterministic random start value generator

This would be part of the util subpackage and could look something like this:

def generate_start_value(sample_parameters, n=1):
    """Generate random start values. Ensure that the random generator has been deterministically seeded!"""
    def genstart():
        d = {}
        for (k, v) in sample_parameters.items():
            if type(v) is list:
                d[k] = np.random.random() * (v[1] - v[0]) + v[0]
            elif type(v) is dict:
                if v['type'] == 'Normal':
                    d[k] = (np.random.randn()*v['sigma']) + v['mu']
                elif v['type'] == 'TruncatedNormal':
                    d[k] = pm.TruncatedNormal.dist(**{p: pv for (p, pv) in v.items() if p != 'type'}).random()
                else:
                    assert False, f'Unknown parameter type {k} => {v}'
            else:
                assert False, f'Unknown parameter type {k} => {v}'
        return d

    if n == 1:
        return genstart()
    return [genstart() for i in range(0, n)]

Taken from Lewin Bormann's Bayesian Cycle. Note that pm.XY.dist.random() is PyMC3 and must be replaced by pm.draw(pm.XY.dist(...)), see PyMC docs.

If this is added, it is crucial that reproducibility is tested! AFAIK there's the possibility to pass a seed keyword to pm.draw. I should check if this is needed, or if I can fix the seeds before.

Use updated likelihood calcuations, e.g., in examples

Do not pymc's eval function but rather calculate the log likelihood of, e.g., a Normal distribution either by hand or using scipy, both of which are much faster because PyMC does some lengthy stuff in the background.

Allow non-uniform distributions in forward mode

Should be relatively easy to implement. Before doing this, I should investigate if this interferes with Sobol's efficient coverage of the parameter space. If so, a warning should be emitted.

Relevant code snippet:

for param in parameters.values():
if param["type"] != "Uniform":
msg = (
"At the moment, parameter distributions must be Uniform"
" distributions."
)
raise ValueError(msg)

visualize/merge.py: Investigate Pandas VisibleDeprecationWarning

See warning below:

python3 ../bicyclus/visualize/merge.py  data/cyclus_trace_BicyclusExample_BicyclusExample_29979821_000

2022-09-23 11:37:03.507647 :: running with Namespace(infiles=['data/cyclus_trace_BicyclusExample_BicyclusExample_29980243_000_0000.cdf'], trace_dir='/max/data/', jobid=-1, json=False, trace_plot_file='plot_merge_trace.{format}', density_plot_file='plot_merge_density.{format}', histogram_plot_file='plot_merge_histogram.{format}', hist_kind='kde', format='png', combined=False, outcdf='merged.cdf', hist_vars=None, dim='chain')
2022-09-23 11:37:03.620978 :: <xarray.Dataset>
Dimensions:     (chain: 4, draw: 200)
Coordinates:
  * chain       (chain) int64 0 1 2 3
  * draw        (draw) int64 0 1 2 3 4 5 6 7 ... 192 193 194 195 196 197 198 199
Data variables:
    feed_assay  (chain, draw) float64 ...
Attributes:
    created_at:                 2022-09-23T09:36:00.598989
    arviz_version:              0.12.1
    inference_library:          pymc
    inference_library_version:  4.2.0
    sampling_time:              2255.8700861930847
    tuning_steps:               100
2022-09-23 11:37:03.622291 :: Available posterior distributions: ['feed_assay']
<xarray.Dataset>
Dimensions:     (chain: 4, draw: 200)
Coordinates:
  * chain       (chain) int64 0 1 2 3
  * draw        (draw) int64 0 1 2 3 4 5 6 7 ... 192 193 194 195 196 197 198 199
Data variables:
    feed_assay  (chain, draw) float64 ...
Attributes:
    created_at:                 2022-09-23T09:36:00.598989
    arviz_version:              0.12.1
    inference_library:          pymc
    inference_library_version:  4.2.0
    sampling_time:              2255.8700861930847
    tuning_steps:               100
             mean   sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
feed_assay  0.007  0.0   0.007    0.007        0.0      0.0     705.0     513.0   1.01
2022-09-23 11:37:03.674212 :: Plotting trace
2022-09-23 11:37:04.374128 :: Plotting density
2022-09-23 11:37:04.556923 :: Plotting all histograms in array for job -1
/home/max/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pandas/core/common.py:245: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  result = np.asarray(values, dtype=dtype)
/home/max/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pandas/core/common.py:245: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  result = np.asarray(values, dtype=dtype)
/home/max/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pandas/core/common.py:245: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  result = np.asarray(values, dtype=dtype)
/home/max/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pandas/core/common.py:245: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  result = np.asarray(values, dtype=dtype)

Possible future features

  • Add maximum number of iterations to Slice sampler.
    I think this is a good idea, however what should happen if that limit gets exceeded?
    Simply raise an error and kill the process or restart with another initial value?

    This question is irrelevant inasmuch as the maximum number of iterations of Slice is not part of Bicyclus.
    This must be implemented in the user's driver script.
  • Improve class structure somehow?
  • Copy useful functions from Pakistan/Elbonia case study into bicyclus?
  • Add examples and minimum working example. Remember to take into account #3
  • Should we redesign the logging function? AFAIK, it currently opens a file stream but does not close it properly. See write_to_log_file(...) in log.py. One simple way to do this could be to use a with open() context manager in log_print?
  • Allow to use either PyMC4 or PyMC5? Related to #27

Update likelihood file

Notably:

  • Make blackbox.LikelihoodFunction an abstract base class
    class LikelihoodFunction:
    def __init__(self):
    pass
    def log_likelihood(self, X):
    assert False, "LikelihoodFunction.log_likelihood needs to be overridden"
  • Write docstrings
  • Are there unused/unnecessary classes that can be removed (e.g., simplegrad)?

Upgrade from PyMC3 to PyMC

  • Move all PyMC3 and Theano dependencies to PyMC and Aesara.
    • setup.cfg, README.
    • examples/
    • toymodel/ This is an untracked directory, not part of the repository!
    • blackbox/
    • util/ except for above-mentioned function!
    • cyclus_db/
    • visualize/
  • Upgrade bicyclus.util.save_trace, taking into account #9 .
  • Test upgraded version

Make installable package

  • Add setup.py Used setup.cfg which is preferred for whatever reason.

  • Add list of dependencies โžก๏ธ see README

  • ๐Ÿšจ How to deal with PyMC and Theano cherry-picked dependencies? ๐Ÿšจ See comment below.
    According to the setuptools documentation,

    When your project is installed (e.g. using pip), all of the dependencies not already installed will be located (via PyPI), downloaded, built (if necessary), and installed [...].

    Thus, I cannot simply put PyMC3 and arviz into the requirements list. Maybe I could point pip to the specific cherry-picked versions? I think for now, I will not put it in the requirements list and simply mention it in the README.

`iter_sample` does not work

Using pm.iter_sample in examples/run.py raises the following error:

Traceback (most recent call last):
  File "/home/bicyclus/examples/run.py", line 316, in <module>
    main()
  File "/home/bicyclus/examples/run.py", line 312, in main
    sample(args, pymc_model, initvals)
  File "/home/bicyclus/examples/run.py", line 290, in sample
    for trace in sampler:
  File "/home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py", line 981, in iter_sample
    for i, (strace, _) in enumerate(sampling):
  File "/home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py", line 1062, in _iter_sample
    point, stats = step.step(point)
  File "/home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/step_methods/arraystep.py", line 155, in step
    apoint = DictToArrayBijection.map({v.name: point[v.name] for v in self.vars})
  File "/home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/step_methods/arraystep.py", line 155, in <dictcomp>
    apoint = DictToArrayBijection.map({v.name: point[v.name] for v in self.vars})
KeyError: 'feed_assay_interval__'

I am currently investigating the error. Both pm.sample and pm.iter_sample eventually call pymc/sampling.py:1025 (function _iter_sample) to perform the sampling. I suspect that pm.sample does something to the initial values that transforms them into the correct format, compare the following pdb output:

  • Using pm.sample:
    (Pdb) w
      /home/bicyclus/toymodel/toymodel.py(452)<module>()
    -> main()
      /home/bicyclus/toymodel/toymodel.py(447)main()
    -> arviz_summary, trace, emu, solver_str = run_simulations(model, **kwargs)
      /home/bicyclus/toymodel/toymodel.py(291)run_simulations()
    -> trace = pm.sample(
      /home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py(655)sample()
    -> mtrace = _sample_many(**sample_args)
      /home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py(769)_sample_many()
    -> trace = _sample(
      /home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py(915)_sample()
    -> for it, (strace, diverging) in enumerate(sampling):
      /home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/fastprogress/fastprogress.py(41)__iter__()
    -> for i,o in enumerate(self.gen):
    > /home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py(1025)_iter_sample()
    -> model = modelcontext(model)
    (Pdb) start
    {'large_norm_interval__': array(1.44301298), 'small_unif_interval__': array(-0.17056391)}
    
  • Using pm.iter_sample:
    (Pdb) w
      /home/bicyclus/toymodel/toymodel.py(452)<module>()
    -> main()
      /home/bicyclus/toymodel/toymodel.py(447)main()
    -> arviz_summary, trace, emu, solver_str = run_simulations(model, **kwargs)
      /home/bicyclus/toymodel/toymodel.py(283)run_simulations()
    -> for trace in sampler:
      /home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py(981)iter_sample()
    -> for i, (strace, _) in enumerate(sampling):
    > /home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py(1025)_iter_sample()
    -> model = modelcontext(model)
    (Pdb) start
    {'large_norm': 7559.40626711426, 'small_unif': 0.2070833560593321}
    

Unnecessary use of deepcopy in example?

Is the deepcopy line needed?

bicyclus/examples/run.py

Lines 61 to 65 in 8317d3e

self.mut_model = deepcopy(self.model)
self.mut_model["simulation"]["recipe"][0]["nuclide"][0].update(
comp=parameters["feed_assay"]) # U235
self.mut_model["simulation"]["recipe"][0]["nuclide"][1].update(
comp=1.-parameters["feed_assay"]) # U238

In an actual use case, I would probably outsource all of this to an external file writer. Is there a possibility to showcase this in the example without making it unnecessary complex? Would there be an advantage?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.