
echidna's People

Contributors

arushanova, ashleyrback, edleming, jwaterfield, mjmottram, pgjones


echidna's Issues

Grid search minimiser

The Fit class/module should also include a grid search minimiser, similar to the algorithm currently implemented by the LimitSetting._get_chi_squared method.
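As a sketch only (the callable-plus-dictionary interface below is hypothetical, not echidna's actual API), an exhaustive grid-search minimiser could look like:

```python
import itertools

import numpy


def grid_search(test_statistic, param_grids):
    """Evaluate test_statistic at every combination of trial parameter
    values and return the smallest value found, with its parameters."""
    names = list(param_grids)
    best_stat, best_values = float("inf"), None
    for values in itertools.product(*(param_grids[n] for n in names)):
        params = dict(zip(names, values))
        stat = test_statistic(params)
        if stat < best_stat:
            best_stat, best_values = stat, params
    return best_stat, best_values


# Example: minimise a simple quadratic over a coarse grid
stat, values = grid_search(
    lambda p: (p["rate"] - 3.0) ** 2 + (p["shift"] + 1.0) ** 2,
    {"rate": numpy.linspace(0.0, 5.0, 51),
     "shift": numpy.linspace(-2.0, 2.0, 41)})
```

The exhaustive loop is essentially what LimitSetting._get_chi_squared already does for a single signal scaling; a Fit-level version would generalise it to several floated parameters.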

CSS/style links don't work for online version of sphinx docs

The style information for the auto-generated _static html files, displaying the sphinx documentation, is not properly included when these static pages are hosted on GitHub via gh-pages.

The style information is included e.g. here. But the specified relative path _static/default.css doesn't exist on GitHub, so the docs appear in just basic html (see https://snoplusuk.github.io/echidna/docs/index.html).

For the homepage (https://snoplusuk.github.io/echidna/) this issue was remedied by following this solution - see _config and head.html. But since the sphinx html pages are auto-generated, I'm not sure how to implement a similar solution.

Add method for calculating chi squared

Chi squared is calculated by summing the log likelihood and penalty term in quadrature.

Will form the get_chi_squared method in #4. The methods written in #2 and #5 will be called from within this method.

Core data structure

This needs to be:

  • 3D: radius, energy and time
  • Plotting/output functionality

Large Matrices

Echidna currently uses numpy arrays to store the spectra data. This is fine for spectra with a small number of dimensions and a small number of bins, but for spectra with many dimensions, each with a large number of bins, it will not work: there is not enough memory to store the matrix.

For example, currently if we want to store energy, xpos, ypos and zpos each with 500 bins then the line

self._data = numpy.zeros(shape=(500, 500, 500, 500), dtype=float)

is called which throws a MemoryError as the matrix is too large.

One possible solution is to use scipy's sparse matrices, which store only the non-zero elements of the matrix. However, if there are not enough zero elements in the matrix, we would have to look at using PyTables instead. Moving towards one of these solutions should also improve echidna's overall efficiency.

See this blog post for more details:
http://www.philippsinger.info/?p=464
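As a rough sketch of the scipy option (note that scipy's sparse matrices are strictly two-dimensional, so an N-dimensional spectrum would first have to be flattened onto two axes):

```python
import numpy
from scipy import sparse

# A dense 500 x 500 x 500 x 500 float64 array needs 500**4 * 8 bytes,
# i.e. 0.5 TB, hence the MemoryError:
dense_bytes = 500 ** 4 * 8

# A sparse matrix stores only the filled bins. scipy.sparse is 2-D only,
# so here the (energy, xpos, ypos, zpos) spectrum is flattened to
# (energy, xpos * ypos * zpos):
data = sparse.dok_matrix((500, 500 * 500 * 500), dtype=float)
data[250, 0] = 1.0  # fill a single bin; only this entry is stored
```

If the spectra turn out not to be sparse enough for this to pay off, PyTables' on-disk chunked arrays are the fallback mentioned above.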

Convert ChiSquared

Convert the current ChiSquared class into three new classes derived from TestStatistic.

floating point comparison

Change all floating-point comparisons to use numpy.allclose instead of ==; the latter is causing many bugs. Integer cases (e.g. numbers of bins) can still use == / !=.
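A minimal illustration of why == fails for floats but remains safe for integer quantities:

```python
import numpy

# Repeated binary-fraction arithmetic makes exact float comparison unreliable
width = 0.1
exact = 3 * width == 0.3                # False: 3 * 0.1 is 0.30000000000000004
close = numpy.allclose(3 * width, 0.3)  # True: compares within a tolerance

# Integer quantities such as bin counts are exact, so == / != is fine
num_bins = 10
```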

Incorrect value for `Spectra._num_decays` for multi_ntuple_spectrum

When appending to existing spectrum, _num_decays is not updated. I added the following to the start of my limit setting code:

    print "sum:", Te130_0n2b.sum()
    print "raw_events:", Te130_0n2b._raw_events
    print "num_decays:", Te130_0n2b._num_decays

I know that from all ~20 ntuples I combined, there were 200034 events (then multiplied by 10 to give 10 years), so the output of the above should be:

sum: <2000340
raw_events: 2000340
num_decays: 2000340

Instead what I get is:

sum: 1688010.0
raw_events: 0
num_decays: 10002

The sum seems a little lower than I would have thought for T_1/2 = 6.2e24. The main problem is that raw_events is zero and num_decays is far too small, probably just the first ntuple that was read in. If we are appending to an existing spectrum, these values need to be updated.
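A minimal sketch of the fix (the Spectra class here is a stand-in; only the attribute names _data, _raw_events and _num_decays are taken from the issue): whenever one spectrum is appended to another, the book-keeping counters must be accumulated along with the bin contents.

```python
import numpy


class Spectra(object):
    """Minimal stand-in for echidna's Spectra class (illustrative only)."""
    def __init__(self, data, raw_events, num_decays):
        self._data = numpy.array(data, dtype=float)
        self._raw_events = raw_events
        self._num_decays = num_decays


def append_spectrum(target, source):
    # Update the bin contents AND both counters together
    target._data += source._data
    target._raw_events += source._raw_events
    target._num_decays += source._num_decays


# Combining two "ntuples" worth of events into one running total
total = Spectra([0.0, 0.0], 0, 0)
for counts in ([10.0, 5.0], [3.0, 2.0]):
    events = int(sum(counts))
    append_spectrum(total, Spectra(counts, events, events))
```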

Sphinx doc build error with ubuntu

I get the following error when making the html using the latest stable release on ubuntu:

sphinx-build -b html -d _build/doctrees . _build/html
Running Sphinx v1.1.3

Extension error:
Could not import extension sphinx.ext.napoleon (exception: No module named napoleon)
make: *** [html] Error 1
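napoleon only ships with Sphinx itself (as sphinx.ext.napoleon) from Sphinx 1.3 onwards, and Ubuntu's packaged Sphinx 1.1.3 predates that. Two possible fixes, assuming pip is available:

```shell
# Option 1: upgrade Sphinx so that sphinx.ext.napoleon exists (>= 1.3)
pip install --upgrade sphinx

# Option 2: keep Sphinx 1.1.3, install the standalone package and change
# the extension name in the docs conf.py from "sphinx.ext.napoleon" to
# "sphinxcontrib.napoleon"
pip install sphinxcontrib-napoleon
```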

Sampling MC to make a "fake data" spectrum

@drjeannewilson and I were discussing how we could build into echidna the ability to create a "fake dataset" by selecting a random sample of say 10% of MC events.

We thought the best way to do this may be to add a get_valid-type method to the extractor base class that would be called per event. A new class member (e.g. _sample_fraction) would set the fraction you wish to sample. By default it would return True for all events (_sample_fraction = 1.0).

I think the only tricky part here is how we make sure it is only called once per event.

We should also consider this in the Extractor discussions in #96. This is the sort of thing that we probably want to happen (where possible) in a uniform way, rather than in an ad-hoc way via a pyroot script filling the spectrum.

@drjeannewilson has agreed to start having a look into this.
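A minimal sketch of the proposed hook (the names get_valid and _sample_fraction come from the discussion above; everything else is hypothetical):

```python
import random


class Extractor(object):
    """Stand-in for the extractor base class (illustrative only)."""
    def __init__(self, sample_fraction=1.0):
        self._sample_fraction = sample_fraction

    def get_valid(self):
        # Called once per event: keep the event with probability
        # _sample_fraction. random.random() is in [0, 1), so with the
        # default of 1.0 this returns True for every event.
        return random.random() < self._sample_fraction


random.seed(42)  # fixed seed so the "fake dataset" is reproducible
extractor = Extractor(sample_fraction=0.1)
kept = sum(extractor.get_valid() for _ in range(100000))
```

The once-per-event requirement then reduces to making sure the fill loop calls get_valid in exactly one place.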

Add requirements to README

Add information to README file about requirements and how to install requirements using pip and requirements.txt

Test spectra unittests failing

Unittests give the error

======================================================================
ERROR: test_slicing (test_spectra.TestSpectra)
Test the slicing shirnks the spectra in the correct way.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ashley/snoplus/software/echidna/echidna/echidna/test/test_spectra.py", line 94, in test_slicing
    test_spectra._time_high))
  File "echidna/core/spectra.py", line 154, in shrink
    raise ValueError("Energy high is below exist bound")
ValueError: Energy high is below exist bound

The assertRaises method takes the function and the function's separate arguments as arguments, e.g.:

self.assertRaises(ValueError,
                  test_spectra.shrink,
                  test_spectra._energy_low,
                  2 * test_spectra._energy_high,
                  test_spectra._radial_low,
                  test_spectra._radial_high,
                  test_spectra._time_low,
                  test_spectra._time_high)

But I then get

======================================================================
FAIL: test_slicing (test_spectra.TestSpectra)
Test the slicing shirnks the spectra in the correct way.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ashley/snoplus/software/echidna/echidna/echidna/test/test_spectra.py", line 120, in test_slicing
    test_spectra._time_bins / 2))
AssertionError: False is not true

----------------------------------------------------------------------

I don't know how to fix this second failure.

Fit config

Create a FitConfig class to book-keep the parameters in the fit

Separate floating and fixed backgrounds in limit setting

Currently the code rescales all background spectra for each signal scaling. The code can be optimised by only scaling the backgrounds that are being floated, with the rest of the backgrounds scaled and fixed to their priors when the get_limit function is initialised.

I am currently thinking of separating backgrounds into two types, each passed as a list: one where the backgrounds only have an accompanying prior, and one where they have a prior plus a numpy array of counts for floating the corresponding background. This will probably require quite a major rewrite of the limit_setting code. Do you agree, @ashleyrback, before I proceed?

Missing argparse import in `multi_ntuple_spectrum`

Currently the start of the example script looks as follows:

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-s", "--save_path",
                        type=str,
                        default="./",
                        help="Enter destination path for .hdf5 spectra files.")

It is missing an import argparse and running the script results in:

  File "echidna/scripts/multi_ntuple_spectrum.py", line 75, in <module>
    parser = argparse.ArgumentParser()
NameError: name 'argparse' is not defined

The import argparse could be inserted just after if __name__ == "__main__":.

Fit parameter

Create a FitParameter class to define the parameters in the fit

`Fit` class

Create a new central Fit class to manage/book-keep the fitting process.

Add class for chi squared calculation

Base class for handling chi squared calculation

Inputs

  • 1D data histogram
  • 1D montecarlo histogram
  • Systematics that need to be accounted for

Methods

  • get_log_likelihood
  • get_penalty_term (accounts for systematics)
  • get_chi_squared

Output

  • Calculated chi squared for the Monte Carlo compared to the data
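A skeleton matching the inputs, methods and output listed above (the Pearson form of the bin sum and the sum-of-squared-pulls penalty are placeholders for illustration, not the project's chosen definitions):

```python
import numpy


class ChiSquared(object):
    """Base class for the chi-squared calculation (sketch only)."""
    def __init__(self, systematics=None):
        # systematics: pull values for each systematic accounted for
        self._systematics = systematics or []

    def get_log_likelihood(self, data, montecarlo):
        # Placeholder bin-by-bin statistic (Pearson chi-squared form)
        return numpy.sum((data - montecarlo) ** 2 / montecarlo)

    def get_penalty_term(self):
        # Accounts for systematics: one squared pull per systematic
        return sum(pull ** 2 for pull in self._systematics)

    def get_chi_squared(self, data, montecarlo):
        return self.get_log_likelihood(data, montecarlo) + \
            self.get_penalty_term()


calculator = ChiSquared(systematics=[0.5])
chi_squared = calculator.get_chi_squared(numpy.array([12.0, 8.0]),
                                         numpy.array([10.0, 10.0]))
```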

Images for README/Wiki

GitHub doesn't have any functionality for uploading images directly for inclusion in the Wiki/README, but you can upload images in comments and then copy the link to the image file that way.

The sole purpose of this issue is for uploading images for inclusion in the Wiki/README!

Minimiser function

The Fit class created in #101 should include a method that can be passed to any minimiser. This method should be of the form:

def func(self, *args):
    # Match values in args back with the corresponding parameter
    # For each value in args implement the relevant change
    # Combine Spectra to form a single expected spectrum
    # Project onto fit dimensions and flatten
    # Call compute_statistic method of test statistic
    return test_statistic

Check scaling logic in zero_nu_limit

As mentioned in #76, the scaling logic in zero_nu_limit should be checked to ensure it matches the assumptions made about the Spectra loaded into the script.

Extractors

I'm going to use the limit-setting part of echidna to look at how pile-up rejection helps improve the 0nu limit. At the moment, in order to create a Spectra instance, I have to prepare the ROOT file first (i.e. apply the pile-up cuts) and then just fill an empty spectrum with the energies. This raises the question of whether we want to add more extractors. The options I can see, with comments, are below.

  1. We add more extractors.
    These extractors would essentially represent the cuts. I personally would need to implement about five extractors, one for each classifier I use for cuts.
  • The extractors should then depend on the input, so that users can choose which extractor (i.e. cut) values to apply.
  • The good thing about this is that the cut value can be one of the fitting parameters.
  • The downside is: do we actually need this?
  2. We do not add more extractors.
    It is quite simple for users to keep the code that prepares the PDF to themselves; they can apply any cuts before turning their files into a Spectra instance.
  • This avoids the impression that creating a spectrum is hard-coded; on the contrary, a simple example and some recommendations would allow freedom at this step.
  • Many extractors shouldn't be included in echidna, because they have a single purpose, creating a spectrum, and they can easily be used outside echidna.
  • The ds structure might change again, and this wouldn't affect echidna.
  • If users want to implement a cut as one of the parameters, they can add an extractor themselves.

Plot titles

Add the ability to set a plot title and a legend.
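If the plot module uses matplotlib (assumed here), a minimal sketch of the two options:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt

figure, axis = plt.subplots()
axis.plot([0, 1, 2], [1, 4, 9], label="Te130 0n2b")  # label feeds the legend
axis.set_title("Energy spectrum")  # plot title
axis.legend(loc="best")
```

In the plot functions these could become optional title= and legend= keyword arguments (hypothetical names).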

Add class for limit setting

Base class to deal with limit setting

Inputs

  • 1D energy histogram for signal
  • 1D energy histogram for combined background (in signal region of interest)
  • Required precision for limit, e.g. 90% CL, 3 sigma etc.

Output

  • Limit
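One common way to meet this spec, sketched here under assumed inputs (the delta chi-squared scan below is an illustration, not necessarily the method echidna will adopt): raise the signal scaling until the test statistic relative to the background-only expectation crosses the threshold for the requested confidence level.

```python
import numpy


def find_limit(signal, background, data, delta=2.71):
    """Scan signal scalings; return the first whose chi squared against
    the data reaches `delta` (2.71 ~ 90% CL for one parameter)."""
    for scale in numpy.linspace(0.0, 100.0, 10001):
        expected = background + scale * signal
        chi_squared = numpy.sum((data - expected) ** 2 / expected)
        if chi_squared >= delta:
            return scale
    return None  # no limit found within the scanned range


# Background-only "data" with flat 1D energy histograms (toy numbers)
signal = numpy.array([1.0, 1.0])
background = numpy.array([100.0, 100.0])
limit = find_limit(signal, background, background.copy())
```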

Set up iRODS server

For sharing HDF5s. Ask Francesca and Alex.

  • Set up iRODS server
  • Integration with python-irodsclient, so that HDF5s can be loaded straight from the iRODS server.

Separate half-life weights into their own function

Remove the half-life weights from the fill_spectrum functions and put them in their own function which can be applied separately.

Edit 15/04/2015:
Jeanne's suggestion is to remove the timing axis from the Spectra class and apply the timing as an analytic function. This means that whenever you return a number of counts from the spectrum, i.e. in Spectra.sum(), you would need to multiply the number of events from the 2D spectrum by the result of the appropriate time model.
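A sketch of what the stand-alone weighting could look like once the timing axis is analytic (function name and signature are hypothetical); numpy.expm1 is used because 1 - exp(-x) underflows to zero for the tiny exponents a 0n2b half-life produces:

```python
import numpy


def apply_time_model(counts, half_life, livetime):
    """Scale counts from the 2D spectrum by the fraction of nuclei that
    decay within `livetime` (same time units as `half_life`)."""
    decay_constant = numpy.log(2.0) / half_life
    # -expm1(-x) == 1 - exp(-x), but stays accurate for very small x
    fraction = -numpy.expm1(-decay_constant * livetime)
    return counts * fraction


# e.g. T_1/2 = 6.2e24 y over a 10 year livetime: a tiny decayed fraction
weighted = apply_time_model(numpy.array([100.0, 200.0]), 6.2e24, 10.0)
```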

Problems with `plot` module

Some problems with the plot module:

  • plot_projection bars are too wide, so you can't see any detail in the spectrum when plotted.
  • plot_surface fails with the error below. This may be due to removing the Axes3D import that I thought was unused, in #76.

I'll investigate.

ValueError                                Traceback (most recent call last)
<ipython-input-13-3378db744f5a> in <module>()
----> 1 plot.plot_surface(spectrum_1, "energy_mc", "radial_mc")

/home/ashley/snoplus/software/echidna/echidna/echidna/output/plot.pyc in plot_surface(spectra, dimension1, dimension2)
    192     """
    193     fig = plt.figure()
--> 194     axis = fig.add_subplot(111, projection='3d')
    195     index1 = spectra.get_config().get_index(dimension1)
    196     index2 = spectra.get_config().get_index(dimension2)

/usr/lib/pymodules/python2.7/matplotlib/figure.pyc in add_subplot(self, *args, **kwargs)
    894         else:
    895             projection_class, kwargs, key = process_projection_requirements(
--> 896                 self, *args, **kwargs)
    897 
    898             # try to find the axes with this key in the stack

/usr/lib/pymodules/python2.7/matplotlib/projections/__init__.pyc in process_projection_requirements(figure, *args, **kwargs)
    117 
    118     if isinstance(projection, basestring) or projection is None:
--> 119         projection_class = get_projection_class(projection)
    120     elif hasattr(projection, '_as_mpl_axes'):
    121         projection_class, extra_kwargs = projection._as_mpl_axes()

/usr/lib/pymodules/python2.7/matplotlib/projections/__init__.pyc in get_projection_class(projection)
     61         return projection_registry.get_projection_class(projection)
     62     except KeyError:
---> 63         raise ValueError("Unknown projection '%s'" % projection)
     64 
     65 

ValueError: Unknown projection '3d'
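That is consistent with the Axes3D removal: importing Axes3D is what registers the '3d' projection with matplotlib, so the import is needed even though the name is never referenced directly. Restoring it should fix the error:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt
# The import itself registers the '3d' projection, even though Axes3D
# is never used again below
from mpl_toolkits.mplot3d import Axes3D  # noqa

fig = plt.figure()
axis = fig.add_subplot(111, projection="3d")
```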

Add top-level documentation to echidna

@drjeannewilson pointed out during some discussions that, for a new user, there is a lack of documentation explaining the basic code structure:

  • What is in each of the top-level directories?
  • What are the main objectives of each part of the code?
  • What are the first steps for a new user to familiarise themselves with the code?
  • Perhaps have some entry-level example scripts:
    • How do you make a basic Spectra object?
    • How do you fill a Spectra object with e.g. some random numbers?
    • How do you smear a spectrum?
    • How do you plot a spectrum?
    • How do you create a very simple limit setting/fitting routine?

Add float checking/conversion to Spectra.scale

I noticed when running some verification tests on #76, that if you call the scale method on a spectrum with a number of events that is not a float, there is a chance all _data can be set to zero e.g.:

    Xe136_0n2b_n1 = store.load("data/pr_76/Xe136_0n2b_n1_smeared.hdf5")
    print Xe136_0n2b_n1.sum()
    Xe136_0n2b_n1.scale(1000)
    print Xe136_0n2b_n1.sum()

yields:

47577.0
0.0

Debugging the code, within the scale method:

(Pdb) print numpy.multiply(self._data, num_decays / self._num_decays)
[[ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 ..., 
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]]

As you can see, all elements become zero because Python is performing integer division (num_decays is much smaller than self._num_decays). But if I cast num_decays to a float, everything goes as expected:

(Pdb) print numpy.multiply(self._data, float(num_decays) / self._num_decays)
[[  2.49716652e-06   2.49716662e-06   2.49716671e-06 ...,   2.49713646e-06
    2.49713631e-06   2.49713615e-06]
 [  6.01030304e-06   6.01030326e-06   6.01030348e-06 ...,   6.01022526e-06
    6.01022488e-06   6.01022450e-06]
 [  6.11828942e-06   6.11828962e-06   6.11828982e-06 ...,   6.11818621e-06
    6.11818580e-06   6.11818539e-06]
 ..., 
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00 ...,   0.00000000e+00
    0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00 ...,   0.00000000e+00
    0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00 ...,   0.00000000e+00
    0.00000000e+00   0.00000000e+00]]

The solution should be quite simple, either we cast num_decays as a float (as above) or raise a TypeError if you supply a value for num_decays that is not of float type.
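A sketch of the first option (the helper function is illustrative; in echidna the cast would go inside Spectra.scale):

```python
import numpy


def scale_factor(num_decays, current_num_decays):
    """Return the multiplicative factor used when scaling, forcing true
    division so an integer num_decays cannot floor to zero (Python 2)."""
    if not isinstance(num_decays, (int, float)):
        raise TypeError("num_decays must be numeric, got %r" % num_decays)
    return float(num_decays) / current_num_decays


data = numpy.ones((2, 2)) * 47577.0 / 4.0  # toy spectrum summing to 47577
scaled = numpy.multiply(data, scale_factor(1000, 47577))
```

Alternatively, adding from __future__ import division at the top of the module makes / true division everywhere in Python 2.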

Add method for calculating penalty term

Method to calculate the penalty term which is used in the chi squared calculation. This takes into account any systematics which are associated with the fit.

Will form the get_penalty_term method in #4

Fit results

Create a FitResults class to store/report the results of the fit and for offline analyses, e.g. correlations.

Some functionality will be similar to the SystAnalyser class.

Should have a main _data member that is a numpy array with one dimension for each fit parameter. This will store the value of the test statistic calculated for each combination of fit parameter values tested.
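A sketch of that storage layout (parameter names and grids are made up for illustration):

```python
import numpy

# One axis per fit parameter, one cell per combination of trial values
param_values = {
    "signal_rate": numpy.linspace(0.0, 100.0, 11),
    "energy_shift": numpy.linspace(-0.1, 0.1, 5),
}
stats = numpy.zeros(tuple(len(v) for v in param_values.values()))

# Fill with a toy test statistic for every grid point
for i, rate in enumerate(param_values["signal_rate"]):
    for j, shift in enumerate(param_values["energy_shift"]):
        stats[i, j] = (rate - 50.0) ** 2 + (100.0 * shift) ** 2

# Offline analysis then reduces to array operations, e.g. the best fit:
best = numpy.unravel_index(numpy.argmin(stats), stats.shape)
```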
