
echidna's People

Contributors

arushanova, ashleyrback, edleming, jwaterfield, mjmottram, pgjones


echidna's Issues

Grid search minimiser

The Fit class/module should also include a grid search minimiser, similar to the algorithm currently implemented by the LimitSetting._get_chi_squared method.
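As a sketch only (the callable-plus-dictionary interface below is hypothetical, not echidna's actual API), an exhaustive grid-search minimiser could look like:

```python
import itertools

import numpy


def grid_search(test_statistic, param_grids):
    """Evaluate test_statistic at every combination of trial parameter
    values and return the smallest value found, with its parameters."""
    names = list(param_grids)
    best_stat, best_values = float("inf"), None
    for values in itertools.product(*(param_grids[n] for n in names)):
        params = dict(zip(names, values))
        stat = test_statistic(params)
        if stat < best_stat:
            best_stat, best_values = stat, params
    return best_stat, best_values


# Example: minimise a simple quadratic over a coarse grid
stat, values = grid_search(
    lambda p: (p["rate"] - 3.0) ** 2 + (p["shift"] + 1.0) ** 2,
    {"rate": numpy.linspace(0.0, 5.0, 51),
     "shift": numpy.linspace(-2.0, 2.0, 41)})
```

The exhaustive loop is essentially what LimitSetting._get_chi_squared already does for a single signal scaling; a Fit-level version would generalise it to several floated parameters.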

CSS/style links don't work for online version of sphinx docs

The style information for the auto-generated _static html files, displaying the sphinx documentation, is not properly included when these static pages are hosted on GitHub via gh-pages.

The style information is included e.g. here. But the specified relative path _static/default.css doesn't exist on GitHub, so the docs appear in just basic html (see https://snoplusuk.github.io/echidna/docs/index.html).

For the homepage (https://snoplusuk.github.io/echidna/) this issue was remedied by following this solution - see _config and head.html. But since the sphinx html pages are auto-generated, I'm not sure how to implement a similar solution.

Add method for calculating chi squared

Chi squared is calculated by summing the log likelihood and penalty term in quadrature.

Will form the get_chi_squared method in #4. The methods written in #2 and #5 will be called from within this method.

Core data structure

This needs to be:

  • 3D: radius, energy and time
  • Plotting/output functionality

Large Matrices

Echidna currently uses numpy arrays to store the spectra data. This is fine for spectra with a small number of dimensions and a small number of bins, but for spectra with many dimensions, each with a large number of bins, it will not work: there is not enough memory to store the matrix.

For example, currently if we want to store energy, xpos, ypos and zpos each with 500 bins then the line

self._data = numpy.zeros(shape=(500, 500, 500, 500), dtype=float)

is called which throws a MemoryError as the matrix is too large.

One possible solution is to use scipy's sparse matrices, which store only the non-zero elements of the matrix. However, if there are not enough zero elements in the matrix, we would have to look at using PyTables instead. Moving towards one of these solutions should also improve echidna's overall efficiency.

See this blog post for more details:
http://www.philippsinger.info/?p=464
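As a rough sketch of the scipy option (note that scipy's sparse matrices are strictly two-dimensional, so an N-dimensional spectrum would first have to be flattened onto two axes):

```python
import numpy
from scipy import sparse

# A dense 500 x 500 x 500 x 500 float64 array needs 500**4 * 8 bytes,
# i.e. 0.5 TB, hence the MemoryError:
dense_bytes = 500 ** 4 * 8

# A sparse matrix stores only the filled bins. scipy.sparse is 2-D only,
# so here the (energy, xpos, ypos, zpos) spectrum is flattened to
# (energy, xpos * ypos * zpos):
data = sparse.dok_matrix((500, 500 * 500 * 500), dtype=float)
data[250, 0] = 1.0  # fill a single bin; only this entry is stored
```

If the spectra turn out not to be sparse enough for this to pay off, PyTables' on-disk chunked arrays are the fallback mentioned above.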

Convert ChiSquared

Convert the current ChiSquared class into three new classes derived from TestStatistic.

floating point comparison

Change all floating-point comparisons to use numpy.allclose instead of ==; the latter is causing many bugs. Integer cases (e.g. numbers of bins) can still use == / !=.
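A minimal illustration of why == fails for floats but remains safe for integer quantities:

```python
import numpy

# Repeated binary-fraction arithmetic makes exact float comparison unreliable
width = 0.1
exact = 3 * width == 0.3                # False: 3 * 0.1 is 0.30000000000000004
close = numpy.allclose(3 * width, 0.3)  # True: compares within a tolerance

# Integer quantities such as bin counts are exact, so == / != is fine
num_bins = 10
```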

Incorrect value for `Spectra._num_decays` for multi_ntuple_spectrum

When appending to existing spectrum, _num_decays is not updated. I added the following to the start of my limit setting code:

    print "sum:", Te130_0n2b.sum()
    print "raw_events:", Te130_0n2b._raw_events
    print "num_decays:", Te130_0n2b._num_decays

I know that from all ~20 ntuples I combined, there were 200034 events (then multiplied by 10 to give 10 years), so the output of the above should be:

sum: <2000340
raw_events: 2000340
num_decays: 2000340

Instead what I get is:

sum: 1688010.0
raw_events: 0
num_decays: 10002

The sum seems a little lower than I would have thought for T_1/2 = 6.2e24. The main problem is that raw_events is zero and num_decays is far too small, probably just the first ntuple that was read in. If we are appending to an existing spectrum, these values need to be updated.
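A minimal sketch of the fix (the Spectra class here is a stand-in; only the attribute names _data, _raw_events and _num_decays are taken from the issue): whenever one spectrum is appended to another, the book-keeping counters must be accumulated along with the bin contents.

```python
import numpy


class Spectra(object):
    """Minimal stand-in for echidna's Spectra class (illustrative only)."""
    def __init__(self, data, raw_events, num_decays):
        self._data = numpy.array(data, dtype=float)
        self._raw_events = raw_events
        self._num_decays = num_decays


def append_spectrum(target, source):
    # Update the bin contents AND both counters together
    target._data += source._data
    target._raw_events += source._raw_events
    target._num_decays += source._num_decays


# Combining two "ntuples" worth of events into one running total
total = Spectra([0.0, 0.0], 0, 0)
for counts in ([10.0, 5.0], [3.0, 2.0]):
    events = int(sum(counts))
    append_spectrum(total, Spectra(counts, events, events))
```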

Sphinx doc build error with ubuntu

I get the following error when making the html using the latest stable release on ubuntu:

sphinx-build -b html -d _build/doctrees . _build/html
Running Sphinx v1.1.3

Extension error:
Could not import extension sphinx.ext.napoleon (exception: No module named napoleon)
make: *** [html] Error 1
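napoleon only ships with Sphinx itself (as sphinx.ext.napoleon) from Sphinx 1.3 onwards, and Ubuntu's packaged Sphinx 1.1.3 predates that. Two possible fixes, assuming pip is available:

```shell
# Option 1: upgrade Sphinx so that sphinx.ext.napoleon exists (>= 1.3)
pip install --upgrade sphinx

# Option 2: keep Sphinx 1.1.3, install the standalone package and change
# the extension name in the docs conf.py from "sphinx.ext.napoleon" to
# "sphinxcontrib.napoleon"
pip install sphinxcontrib-napoleon
```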

Sampling MC to make a "fake data" spectrum

@drjeannewilson and I were discussing how we could build into echidna the ability to create a "fake dataset" by selecting a random sample of say 10% of MC events.

We thought the best way to do this may be to add a get_valid-type method to the extractor base class that would be called per event. A new class member (e.g. _sample_fraction) would set the fraction you wish to sample. By default it would return True for all events (_sample_fraction = 1.0).

I think the only tricky part here is how we make sure it is only called once per event.

We should also consider this in the Extractor discussions in #96. This is the sort of thing that we probably want to happen (where possible) in a uniform way, rather than in an ad-hoc way via a pyroot script filling the spectrum.

@drjeannewilson has agreed to start having a look into this.
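A minimal sketch of the proposed hook (the names get_valid and _sample_fraction come from the discussion above; everything else is hypothetical):

```python
import random


class Extractor(object):
    """Stand-in for the extractor base class (illustrative only)."""
    def __init__(self, sample_fraction=1.0):
        self._sample_fraction = sample_fraction

    def get_valid(self):
        # Called once per event: keep the event with probability
        # _sample_fraction. random.random() is in [0, 1), so with the
        # default of 1.0 this returns True for every event.
        return random.random() < self._sample_fraction


random.seed(42)  # fixed seed so the "fake dataset" is reproducible
extractor = Extractor(sample_fraction=0.1)
kept = sum(extractor.get_valid() for _ in range(100000))
```

The once-per-event requirement then reduces to making sure the fill loop calls get_valid in exactly one place.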

Add requirements to README

Add information to README file about requirements and how to install requirements using pip and requirements.txt

Test spectra unittests failing

Unittests give the error

======================================================================
ERROR: test_slicing (test_spectra.TestSpectra)
Test the slicing shirnks the spectra in the correct way.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ashley/snoplus/software/echidna/echidna/echidna/test/test_spectra.py", line 94, in test_slicing
    test_spectra._time_high))
  File "echidna/core/spectra.py", line 154, in shrink
    raise ValueError("Energy high is below exist bound")
ValueError: Energy high is below exist bound

The assertRaises method takes the function and the function's separate arguments as arguments, e.g.:

self.assertRaises(ValueError,
                  test_spectra.shrink,
                  test_spectra._energy_low,
                  2 * test_spectra._energy_high,
                  test_spectra._radial_low,
                  test_spectra._radial_high,
                  test_spectra._time_low,
                  test_spectra._time_high)

But I then get

======================================================================
FAIL: test_slicing (test_spectra.TestSpectra)
Test the slicing shirnks the spectra in the correct way.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ashley/snoplus/software/echidna/echidna/echidna/test/test_spectra.py", line 120, in test_slicing
    test_spectra._time_bins / 2))
AssertionError: False is not true

----------------------------------------------------------------------

I don't know how to fix this second failure.

Fit config

Create a FitConfig class to book-keep the parameters in the fit

Separate floating and fixed backgrounds in limit setting

Currently the code rescales all background spectra for each signal scaling. The code can be optimised by only scaling the backgrounds that are being floated, with the rest of the backgrounds scaled and fixed to their priors when the get_limit function is initialised.

I am currently thinking of separating backgrounds into two types, each passed as a list: one where the backgrounds only have an accompanying prior, and one where they have a prior plus a numpy array of counts for floating the corresponding background. This will probably require quite a major rewrite of the limit_setting code. Do you agree, @ashleyrback, before I proceed?

Missing argparse import in `multi_ntuple_spectrum`

Currently the start of the example script looks as follows:

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-s", "--save_path",
                        type=str,
                        default="./",
                        help="Enter destination path for .hdf5 spectra files.")

It is missing an import argparse and running the script results in:

  File "echidna/scripts/multi_ntuple_spectrum.py", line 75, in <module>
    parser = argparse.ArgumentParser()
NameError: name 'argparse' is not defined

The import argparse could be inserted just after if __name__ == "__main__":.

Fit parameter

Create a FitParameter class to define the parameters in the fit

`Fit` class

Create a new central Fit class to manage/book-keep the fitting process.

Add class for chi squared calculation

Base class for handling chi squared calculation

Inputs

  • 1D data histogram
  • 1D montecarlo histogram
  • Systematics that need to be accounted for

Methods

  • get_log_likelihood
  • get_penalty_term (accounts for systematics)
  • get_chi_squared

Output

  • Calculated chi squared for the Monte Carlo compared to the data
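A skeleton matching the inputs, methods and output listed above (the Pearson form of the bin sum and the sum-of-squared-pulls penalty are placeholders for illustration, not the project's chosen definitions):

```python
import numpy


class ChiSquared(object):
    """Base class for the chi-squared calculation (sketch only)."""
    def __init__(self, systematics=None):
        # systematics: pull values for each systematic accounted for
        self._systematics = systematics or []

    def get_log_likelihood(self, data, montecarlo):
        # Placeholder bin-by-bin statistic (Pearson chi-squared form)
        return numpy.sum((data - montecarlo) ** 2 / montecarlo)

    def get_penalty_term(self):
        # Accounts for systematics: one squared pull per systematic
        return sum(pull ** 2 for pull in self._systematics)

    def get_chi_squared(self, data, montecarlo):
        return self.get_log_likelihood(data, montecarlo) + \
            self.get_penalty_term()


calculator = ChiSquared(systematics=[0.5])
chi_squared = calculator.get_chi_squared(numpy.array([12.0, 8.0]),
                                         numpy.array([10.0, 10.0]))
```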

Images for README/Wiki

GitHub doesn't have any functionality for uploading images directly for inclusion in the Wiki/README, but you can upload images in comments and then copy the link to the image file that way.

The sole purpose of this issue is for uploading images for inclusion in the Wiki/README!

Minimiser function

The Fit class created in #101 should include a method that can be passed to any minimiser. This method should be of the form:

def func(self, *args):
    # Match values in args back with the corresponding parameter
    # For each value in args implement the relevant change
    # Combine Spectra to form a single expected spectrum
    # Project onto fit dimensions and flatten
    # Call compute_statistic method of test statistic
    return test_statistic

Check scaling logic in zero_nu_limit

As mentioned in #76, the scaling logic in zero_nu_limit should be checked to ensure it matches the assumptions made about the Spectra loaded into the script.

Extractors

I'm going to use the limit-setting part of echidna to look at how pile-up rejection helps improve the 0nu limit. At the moment, in order to create a Spectra instance, I have to prepare the ROOT file first (i.e. apply the pile-up cuts) and then just fill an empty spectrum with the energies. This raises the question of whether we want to add more extractors. The options I can see, with comments, are below.

  1. We add more extractors.
    These extractors would essentially represent the cuts. I personally would need to implement about five extractors, one for each classifier I use for cuts.
  • The extractors should then depend on the input, so that users can choose which extractor (i.e. cut) values to apply.
  • The good thing about this is that the cut value can be one of the fitting parameters.
  • The downside is: do we actually need this?
  2. We do not add more extractors.
    It is quite simple for users to keep the code that prepares the PDF to themselves; they can apply any cuts before turning their files into a Spectra instance.
  • This avoids the impression that creating a spectrum is hard-coded; on the contrary, a simple example and some recommendations would allow freedom at this step.
  • Many extractors shouldn't be included in echidna, because they have a single purpose, creating a spectrum, and they can easily be used outside echidna.
  • The ds structure might change again, and this wouldn't affect echidna.
  • If users want to implement a cut as one of the parameters, they can add an extractor themselves.

Plot titles

Add the ability to set a plot title and a legend.
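If the plot module uses matplotlib (assumed here), a minimal sketch of the two options:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt

figure, axis = plt.subplots()
axis.plot([0, 1, 2], [1, 4, 9], label="Te130 0n2b")  # label feeds the legend
axis.set_title("Energy spectrum")  # plot title
axis.legend(loc="best")
```

In the plot functions these could become optional title= and legend= keyword arguments (hypothetical names).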

Add class for limit setting

Base class to deal with limit setting

Inputs

  • 1D energy histogram for signal
  • 1D energy histogram for combined background (in signal region of interest)
  • Required precision for limit, e.g. 90% CL, 3 sigma etc.

Output

  • Limit
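One common way to meet this spec, sketched here under assumed inputs (the delta chi-squared scan below is an illustration, not necessarily the method echidna will adopt): raise the signal scaling until the test statistic relative to the background-only expectation crosses the threshold for the requested confidence level.

```python
import numpy


def find_limit(signal, background, data, delta=2.71):
    """Scan signal scalings; return the first whose chi squared against
    the data reaches `delta` (2.71 ~ 90% CL for one parameter)."""
    for scale in numpy.linspace(0.0, 100.0, 10001):
        expected = background + scale * signal
        chi_squared = numpy.sum((data - expected) ** 2 / expected)
        if chi_squared >= delta:
            return scale
    return None  # no limit found within the scanned range


# Background-only "data" with flat 1D energy histograms (toy numbers)
signal = numpy.array([1.0, 1.0])
background = numpy.array([100.0, 100.0])
limit = find_limit(signal, background, background.copy())
```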

Set up iRODS server

For sharing HDF5s. Ask Francesca and Alex.

  • Set up iRODS server
  • Integration with python-irodsclient, so that HDF5s can be loaded straight from the iRODS server.

Separate half-life weights into their own function

Remove the half-life weights from the fill_spectrum functions and put them in their own function which can be applied separately.

Edit 15/04/2015:
Jeanne's suggestion is to remove the timing axis from the Spectra class and apply the timing as an analytic function. This means that whenever you return a number of counts from the spectrum, i.e. in Spectra.sum(), you would need to multiply the number of events from the 2D spectrum by the result of the appropriate time model.
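A sketch of what the stand-alone weighting could look like once the timing axis is analytic (function name and signature are hypothetical); numpy.expm1 is used because 1 - exp(-x) underflows to zero for the tiny exponents a 0n2b half-life produces:

```python
import numpy


def apply_time_model(counts, half_life, livetime):
    """Scale counts from the 2D spectrum by the fraction of nuclei that
    decay within `livetime` (same time units as `half_life`)."""
    decay_constant = numpy.log(2.0) / half_life
    # -expm1(-x) == 1 - exp(-x), but stays accurate for very small x
    fraction = -numpy.expm1(-decay_constant * livetime)
    return counts * fraction


# e.g. T_1/2 = 6.2e24 y over a 10 year livetime: a tiny decayed fraction
weighted = apply_time_model(numpy.array([100.0, 200.0]), 6.2e24, 10.0)
```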

Problems with `plot` module

Some problems with the plot module:

  • plot_projection bars are too wide, so you can't see any detail in the spectrum when plotted.
  • plot_surface fails with the error below. This may be due to removing the Axes3D import that I thought was unused, in #76.

I'll investigate.

ValueError                                Traceback (most recent call last)
<ipython-input-13-3378db744f5a> in <module>()
----> 1 plot.plot_surface(spectrum_1, "energy_mc", "radial_mc")

/home/ashley/snoplus/software/echidna/echidna/echidna/output/plot.pyc in plot_surface(spectra, dimension1, dimension2)
    192     """
    193     fig = plt.figure()
--> 194     axis = fig.add_subplot(111, projection='3d')
    195     index1 = spectra.get_config().get_index(dimension1)
    196     index2 = spectra.get_config().get_index(dimension2)

/usr/lib/pymodules/python2.7/matplotlib/figure.pyc in add_subplot(self, *args, **kwargs)
    894         else:
    895             projection_class, kwargs, key = process_projection_requirements(
--> 896                 self, *args, **kwargs)
    897 
    898             # try to find the axes with this key in the stack

/usr/lib/pymodules/python2.7/matplotlib/projections/__init__.pyc in process_projection_requirements(figure, *args, **kwargs)
    117 
    118     if isinstance(projection, basestring) or projection is None:
--> 119         projection_class = get_projection_class(projection)
    120     elif hasattr(projection, '_as_mpl_axes'):
    121         projection_class, extra_kwargs = projection._as_mpl_axes()

/usr/lib/pymodules/python2.7/matplotlib/projections/__init__.pyc in get_projection_class(projection)
     61         return projection_registry.get_projection_class(projection)
     62     except KeyError:
---> 63         raise ValueError("Unknown projection '%s'" % projection)
     64 
     65 

ValueError: Unknown projection '3d'
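That is consistent with the Axes3D removal: importing Axes3D is what registers the '3d' projection with matplotlib, so the import is needed even though the name is never referenced directly. Restoring it should fix the error:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt
# The import itself registers the '3d' projection, even though Axes3D
# is never used again below
from mpl_toolkits.mplot3d import Axes3D  # noqa

fig = plt.figure()
axis = fig.add_subplot(111, projection="3d")
```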

Add top-level documentation to echidna

@drjeannewilson pointed out during some discussions that, for a new user, there is a lack of documentation explaining the basic code structure:

  • What is in each of the top-level directories?
  • What are the main objectives of each part of the code?
  • What are the first steps for a new user to familiarise themselves with the code?
  • Perhaps have some entry-level example scripts:
    • How do you make a basic Spectra object?
    • How do you fill a Spectra object with e.g. some random numbers?
    • How do you smear a spectrum?
    • How do you plot a spectrum?
    • How do you create a very simple limit setting/fitting routine?

Add float checking/conversion to Spectra.scale

I noticed when running some verification tests on #76, that if you call the scale method on a spectrum with a number of events that is not a float, there is a chance all _data can be set to zero e.g.:

    Xe136_0n2b_n1 = store.load("data/pr_76/Xe136_0n2b_n1_smeared.hdf5")
    print Xe136_0n2b_n1.sum()
    Xe136_0n2b_n1.scale(1000)
    print Xe136_0n2b_n1.sum()

yields:

47577.0
0.0

Debugging the code, within the scale method:

(Pdb) print numpy.multiply(self._data, num_decays / self._num_decays)
[[ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 ..., 
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]]

As you can see, all elements become zero because Python is performing integer division (num_decays is much smaller than self._num_decays). But if I cast num_decays to a float, everything goes as expected:

(Pdb) print numpy.multiply(self._data, float(num_decays) / self._num_decays)
[[  2.49716652e-06   2.49716662e-06   2.49716671e-06 ...,   2.49713646e-06
    2.49713631e-06   2.49713615e-06]
 [  6.01030304e-06   6.01030326e-06   6.01030348e-06 ...,   6.01022526e-06
    6.01022488e-06   6.01022450e-06]
 [  6.11828942e-06   6.11828962e-06   6.11828982e-06 ...,   6.11818621e-06
    6.11818580e-06   6.11818539e-06]
 ..., 
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00 ...,   0.00000000e+00
    0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00 ...,   0.00000000e+00
    0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00 ...,   0.00000000e+00
    0.00000000e+00   0.00000000e+00]]

The solution should be quite simple, either we cast num_decays as a float (as above) or raise a TypeError if you supply a value for num_decays that is not of float type.
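A sketch of the first option (the helper function is illustrative; in echidna the cast would go inside Spectra.scale):

```python
import numpy


def scale_factor(num_decays, current_num_decays):
    """Return the multiplicative factor used when scaling, forcing true
    division so an integer num_decays cannot floor to zero (Python 2)."""
    if not isinstance(num_decays, (int, float)):
        raise TypeError("num_decays must be numeric, got %r" % num_decays)
    return float(num_decays) / current_num_decays


data = numpy.ones((2, 2)) * 47577.0 / 4.0  # toy spectrum summing to 47577
scaled = numpy.multiply(data, scale_factor(1000, 47577))
```

Alternatively, adding from __future__ import division at the top of the module makes / true division everywhere in Python 2.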

Add method for calculating penalty term

Method to calculate the penalty term which is used in the chi squared calculation. This takes into account any systematics which are associated with the fit.

Will form the get_penalty_term method in #4

Fit results

Create a FitResults class to store/report the results of the fit and for offline analyses, e.g. correlations.

Some functionality will be similar to the SystAnalyser class.

Should have a main _data member that is a numpy array with one dimension for each fit parameter. This will store the value of the test statistic calculated for each combination of fit parameter values tested.
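A sketch of that storage layout (parameter names and grids are made up for illustration):

```python
import numpy

# One axis per fit parameter, one cell per combination of trial values
param_values = {
    "signal_rate": numpy.linspace(0.0, 100.0, 11),
    "energy_shift": numpy.linspace(-0.1, 0.1, 5),
}
stats = numpy.zeros(tuple(len(v) for v in param_values.values()))

# Fill with a toy test statistic for every grid point
for i, rate in enumerate(param_values["signal_rate"]):
    for j, shift in enumerate(param_values["energy_shift"]):
        stats[i, j] = (rate - 50.0) ** 2 + (100.0 * shift) ** 2

# Offline analysis then reduces to array operations, e.g. the best fit:
best = numpy.unravel_index(numpy.argmin(stats), stats.shape)
```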
