ic's Issues

PyQt for python 3

Vicente Herrero has developed a very nice lab-tools GUI based on PyQt. I was thinking of a similar tool for fast analysis and online monitoring of data.

After some attempts, it appears that all conda channels that provide PyQt, both for Mac OS X and for Linux, are Python 2.7 only.

Questions: has anyone found a conda install for Python 3.5/3.6?
Otherwise, shall we install it with another tool (e.g. pip3)?

Test Irene._store_pmaps

There is a (deliberate) bug in Irene._store_pmaps which is not picked up by any of our current tests.

  • Write a test that points out the bug.
  • Fix the bug.
  • Check whether the name can be used instead of the (magic) number.

Modifying the format of S2Si table

Currently S2Si is being written to file as a "sparse table". This means the following.

This is the representation in memory of S2Si:

{0: [[1353, array([  7.88733907,  10.91102902,  15.70949351,   8.74186014,
            0.        ,   0.        ])],
  [1354, array([ 13.91424864,  22.49485269,  31.07545674,  22.1793893 ,
            6.21694207,   0.        ])],
  [1356, array([  9.93309068,  15.09844199,  14.55788197,   5.48848607,
            0.        ,   0.        ])],
  [1359, array([  0.        ,  10.83277617,  18.66492619,  15.41744935,
            0.        ,   0.        ])]]}

Notice that the array with energies contains a number of zeros.

This is the table representation:

array([(0, 1820002, 0, 1353, 0,   7.88733912),
       (0, 1820002, 0, 1353, 1,  10.91102886),
       (0, 1820002, 0, 1353, 2,  15.70949364),
       (0, 1820002, 0, 1353, 3,   8.74186039),
       (0, 1820002, 0, 1354, 0,  13.91424847),
       (0, 1820002, 0, 1354, 1,  22.49485207),
       (0, 1820002, 0, 1354, 2,  31.07545662),
       (0, 1820002, 0, 1354, 3,  22.17938995),
       (0, 1820002, 0, 1354, 4,   6.21694183),
       (0, 1820002, 0, 1356, 0,   9.93309021),
       (0, 1820002, 0, 1356, 1,  15.09844208),
       (0, 1820002, 0, 1356, 2,  14.55788231),
       (0, 1820002, 0, 1356, 3,   5.48848629),
       (0, 1820002, 0, 1359, 1,  10.83277607),
       (0, 1820002, 0, 1359, 2,  18.66492653),
       (0, 1820002, 0, 1359, 3,  15.417449  )], 
      dtype=[('event', '<i4'), ('evtDaq', '<i4'), ('peak', 'u1'), ('nsipm', '<i2'), ('nsample', '<i2'), ('ene', '<f4')])

Notice that the table representation eliminates the entries with zeros. The original array can still be reproduced, since the indexes are kept.

However:

  1. The PMAPS tables are small
  2. Reconstructing the array from the indexes rather than just reading the full array from file may be inefficient and in any case painful.

Thus, we intend to change the format and write the full array to file (zeros included).
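For reference, rebuilding the dense arrays from the sparse rows looks roughly like the sketch below (illustrative names, not the actual IC code); writing the full arrays to file makes this step unnecessary.

import numpy as np

def sparse_rows_to_dense(rows, n_samples):
    """Rebuild {nsipm: energies} from sparse (nsipm, nsample, ene) rows,
    reinserting the zeros that the sparse table format drops."""
    dense = {}
    for nsipm, nsample, ene in rows:
        if nsipm not in dense:
            dense[nsipm] = np.zeros(n_samples)
        dense[nsipm][nsample] = ene
    return dense

# The rows for SiPM 1359 above, with 6 samples in the peak:
rows = [(1359, 1, 10.83277607), (1359, 2, 18.66492653), (1359, 3, 15.417449)]
# sparse_rows_to_dense(rows, 6)
# -> {1359: array([ 0. , 10.83277607, 18.66492653, 15.417449, 0. , 0. ])}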
@jacg, @gonzaponte, @neuslopez please comment

Investigate why test_define_window crashes on Travis Linux

It works on Travis OS X, it works on JG's linux machine, it works on JJ's Mac ... but core dumps on Travis Linux.

The test can be found in invisible_cities/core/core_functions_test.py.

It has been marked as skipif on Travis Linux.

Work flow in ICARO

I am trying to document (and also to understand) the workflow for a "typical" developer working in ICARO (the analysis repository). The important points are:

  1. ICARO is a client of IC. This means that all relevant, general code (e.g. fitting_functions.py) developed while working on a specific analysis must be exported to IC (e.g. fitting_functions.py belongs in /core).

  2. Does that mean that everyone working in ICARO has to go through the chores of forking IC, creating the corresponding branch, adding fitting_functions.py, and making a PR? This scheme is the simplest one, but it requires that everyone working on analysis interacts with IC, which increases the number of developers working in IC, which may create complications (or not).

  3. Is there an alternative? One could imagine creating an /export_to_IC folder in ICARO, so that the normal ICARO user includes /export_to_ic/core/fitting_functions.py in her pull request. When the PR is approved (by the dictator(s) of ICARO), the dictator(s) then move fitting_functions.py to IC.

  4. I tend to prefer the second method, but I don't know whether we hit here the usual situation best described in Spanish as "desnudar a un santo para vestir a otro" (roughly: robbing Peter to pay Paul).
    What do you think, @jacg, @jmbenlloch?

(let's agree on something, since I am writing documentation...)

describe JCK-1

We are effectively moving towards JCK-1 syntax, and the last PR tries hard to follow the JCK notions. However, if we are going to ask other developers to follow it, we need a description/discussion (with examples, which can be taken from the Irene.py code) to illustrate the most important ideas.

Event ID numbering in Monte Carlo

We need unique event ID numbers across an MC production batch.
At present event IDs start at '0' for every file in the same production, so we cannot merge files.
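Until the MC provides unique IDs, merging requires re-numbering on the fly, roughly as in this illustrative sketch:

def renumber_events(per_file_event_ids):
    """Offset the per-file event IDs (each file starting at 0) so that
    the merged production has unique IDs. Illustrative sketch only."""
    merged, offset = [], 0
    for ids in per_file_event_ids:
        merged.extend(i + offset for i in ids)
        if ids:
            offset += max(ids) + 1
    return merged

# renumber_events([[0, 1, 2], [0, 1]])  ->  [0, 1, 2, 3, 4]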

Implementation inconsistency in pmaps_functions

Functions S12df_select_event and S12df_select_peak look like they should have symmetric implementations, but one uses the parameter of the lambda while the other uses the parameter of the outer function. Investigate whether this is a bug and, if so, add a test. A sketch of the pattern follows.
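A hypothetical reconstruction of the pattern (not the actual IC code) shows why such an asymmetry can hide:

# df is assumed to be a pandas DataFrame with an 'event' column.
def select_a(df, n):
    f = lambda m: df[df.event == m]   # uses the lambda's own parameter
    return f(n)

def select_b(df, n):
    f = lambda m: df[df.event == n]   # ignores m; closes over the outer n
    return f(n)

# Both behave identically when the lambda is called with the same value,
# so a naive test cannot tell them apart; they diverge only if the lambda
# ever escapes and is called with a different argument.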

How do we want to distribute or package IC?

At the moment we have a very simple setup.py which we invoke with the develop command in order to compile the cython modules. This results in the products (not only the .so files, which are the actual modules, but also the Cython-generated .c files, which are just intermediate garbage by-products) being placed in our source tree. This has served us just fine as an initial quick-and-dirty solution that let us get on with writing the code, but it is not a very good long-term solution.

Who do we expect to be using IC, and how do we expect them to install it?

Once we know the answer to the last question, we should decide how to move forward with our setup.py.

New directories in IC

I would like to expand the IC directories to include analysis in the following way:

  • one directory for calibrations: /calib
  • one directory per relevant analysis. Rather than sticking everything into /analysis, I prefer that each main analysis topic has its own folder. Right now we can already define /kripton, /na22 and /co56. We can choose between invisible_cities/analysis/kripton and invisible_cities/kripton. I prefer the latter, since the /analysis label is somewhat arbitrary and simply hides what is going on in the repo.

If you agree with this proposal I would like to submit a pull request creating the directories with a readme.rst where I will add some guidelines concerning the specifics of the analysis.

Please confirm.

Add Jupyter notebook to environment in manage.sh script?

I suggest adding jupyter notebook to the environments defined by manage.sh. Most users will eventually work with notebooks, and adding it in manage.sh avoids having to add it later, when they want to run a notebook. This way, when they work in the NB repository they will only need to define one extra environment variable (ICNBTDIR) (to make the references to their code and files independent of user details).

If you agree, this can also be added to the next pull request on manage.sh.

Revisiting PMAPS IO

Recent development has made clear that we need to handle the IO of PMAPS in a consistent way.

In fact, PMAPS IO is an interesting problem because it is the first case in IC where we need to handle a persistent representation that is different from the transient representation of the data. Before PMAPS the transient representation is identical to the persistent representation (arrays of waveforms).

To summarise, the (almost) agreed transient representation of the PMAPS is:

S1/S2 = {event: {peak: namedtuple('t', 'E')}} where t and E are np.arrays.

S2Si = {event: {peak: {nsipm: [E]}}} where E is a np.array of energies (not sparse, i.e. zeros included)

Do we all agree on this? The current representation of S2Si is different:
S2Si = {event:{peak:[nsipm, ([N],[E])]}}

where [N] is a vector of sample numbers and [E] is a sparse vector of energies (i.e. zeros not included). HOWEVER, we can get rid of [N] by writing the zeros in [E] (giving a direct mapping in time with the S2), at very little cost, since after SiPM zero suppression the vectors are very small. In addition, we can substitute a dictionary for the list, as suggested by @gonzaponte, a more consistent and elegant representation.
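In code, the proposal would look roughly like this (a sketch with illustrative values):

import numpy as np
from collections import namedtuple

Peak = namedtuple('Peak', 't E')

# S1/S2 = {event: {peak: Peak}}, t and E being np.arrays
s2 = {0: {0: Peak(t=np.array([101000., 102000.]),
                  E=np.array([30.2, 12.1]))}}

# S2Si = {event: {peak: {nsipm: E}}}, E dense (zeros included) and
# mapping slice-by-slice onto the times of the S2 peak above
s2si = {0: {0: {1353: np.array([7.9, 0.0]),
                1359: np.array([0.0, 10.8])}}}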

AGREED?

Coming to persistent representation, we use pytables. The relevant fields are:

class S12(tb.IsDescription):
    """Store for a S1/S2
    The table maps a S12:
    peak is the index of the S12 dictionary, running over the number of peaks found
    time and energy of the peak.
    """
    event  = tb.Int32Col(pos=0)
    evtDaq = tb.Int32Col(pos=1)
    peak   = tb.UInt8Col(pos=2)  # peak number
    time   = tb.Float32Col(pos=3) # time in ns
    ene    = tb.Float32Col(pos=4) # energy in pes


class S2Si(tb.IsDescription):
    """Store for a S2Si
    The table maps a S2Si
    peak is the same as the S2 peak
    nsipm gives the SiPM number
    only energies are stored (times are defined in S2)
    """
    event   = tb.Int32Col(pos=0)
    evtDaq  = tb.Int32Col(pos=1)
    peak    = tb.UInt8Col(pos=2)  # peak number
    nsipm   = tb.Int16Col(pos=3)  # sipm number
    nsample = tb.Int16Col(pos=4) # sample number
    ene     = tb.Float32Col(pos=5) # energy in pes

In the tables above, we distinguish between event and evtDaq. This is due to historical reasons: in the real data the events always come with a number, while in the MC we did not have such a unique number.

This is totally confusing and should be solved like this:

  1. MC events should also come with a unique number.
  2. The field evtDaq should be eliminated. In this way, we only have one event number to care about, the one that goes to both the persistent and the transient representation of the data.

Agree?

I will continue the discussion later. Please, let's agree on the issues above. Also, @gonzaponte, @jacg, @neuslopez, @jmbenlloch, take a look at my branch PmapWriter for a working example of a writer. The writer includes a test, but care has to be taken when modifying Irene to add it (I will do it shortly).

More soon.

Error in df_to_pmaps_dict when reading an empty file

I have encountered a bug when trying to read back pmaps produced with the new pmap writer interface. For some reason, the file is empty, which means that the read_pmaps function in reco/pmaps_functions.py returns empty data frames. This causes the following error in df_to_pmaps_dict:

ValueError: Buffer dtype mismatch, expected 'int' but got 'double'

I think that, by default, pandas DataFrames use doubles unless the data explicitly says otherwise. Therefore the "event" column, which should be of type int, loses its type information when the frame is empty. This causes the cython function to crash, because it expects an int array.

Question:

  1. What caused the empty file? The new pmap-writer interface or an external error? I think it is the latter. @jmbenlloch, can you confirm (file number 8)?

If nobody complains, I will create a new PR with a simple protection in df_to_pmaps_dict (as well as in df_to_s2si_dict) and a couple of new tests to ensure we don't fall into this trap again.
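The protection could be as simple as this sketch (a hypothetical helper; the real guard would live inside df_to_pmaps_dict and df_to_s2si_dict):

import numpy as np

def event_numbers_as_int(df):
    """Return the event column as int32, guarding against empty
    DataFrames, whose columns default to float64 and would crash
    the cython code expecting an int buffer."""
    if df.empty:
        return np.empty(0, dtype=np.int32)
    return df['event'].values.astype(np.int32, copy=False)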

Irene error: 'IndexError: Out of bounds on buffer access (axis 0)'

@jahernando asked me to run Irene over Na data (run 2948 from the last campaign). @gonzaponte sent me a set of parameters to do so. This is the configuration file:

PATH_IN /analysis/2948/hdf5/data
PATH_OUT /analysis/2948/hdf5/irene
FILE_IN run_2948.gdc*next.001.next1el_2948.root.h5
FILE_OUT run_2948.gdcsnext.001.next1el_2948.root.h5
COMPRESSION ZLIB4

RUN_NUMBER 2948

NPRINT 100
PRINT_EMPTY_EVENTS 1

## JJ
NBASELINE 38000
THR_TRIGGER 5

## JJ
NMAU 100
THR_MAU 3

## JJ
THR_CSUM_S1 0.2
THR_CSUM_S2 1.0

## JJ
NMAU_SIPM 100
THR_SIPM 3.5

S1_TMIN 99
S1_TMAX 101
S1_STRIDE 4
S1_LMIN 4
S1_LMAX 16

S2_TMIN 101
S2_TMAX 1200
S2_STRIDE 40
S2_LMIN 60
S2_LMAX 100000

THR_SIPM_S2 20

NEVENTS 0
RUN_ALL True

Data files can be copied from Canfranc or from here:
https://www.cern.ch/jobenllo/run_2948.gdc1next.001.next1el_2948.root.h5
https://www.cern.ch/jobenllo/run_2948.gdc2next.001.next1el_2948.root.h5

This is the error I'm getting:

$ python $ICDIR/cities/irene.py -c /home/jmbenlloch/productions/configs_new//irene_r2948_1.conf
COMPRESSION            => ZLIB4
FILE_IN                => /analysis/2948/hdf5/data/run_2948.gdc*next.001.next1el_2948.root.h5
FILE_OUT               => /analysis/2948/hdf5/irene/run_2948.gdcsnext.001.next1el_2948.root.h5
INFO                   => False
NBASELINE              => 38000
NEVENTS                => 0
NMAU                   => 100
NMAU_SIPM              => 100
NPRINT                 => 100
PATH_IN                => /analysis/2948/hdf5/data
PATH_OUT               => /analysis/2948/hdf5/irene
PRINT_EMPTY_EVENTS     => 1
RUN_ALL                => True
RUN_NUMBER             => 2948
S1_LMAX                => 16
S1_LMIN                => 4
S1_STRIDE              => 4
S1_TMAX                => 101
S1_TMIN                => 99
S2_LMAX                => 100000
S2_LMIN                => 60
S2_STRIDE              => 40
S2_TMAX                => 1200
S2_TMIN                => 101
SKIP                   => 0
THR_CSUM_S1            => 0.2
THR_CSUM_S2            => 1.0
THR_MAU                => 3
THR_SIPM               => 3.5
THR_SIPM_S2            => 20
THR_TRIGGER            => 5
VERBOSITY              => 20

                 Irene will run a max of -1 events
                 Input Files = ['/analysis/2948/hdf5/data/run_2948.gdc1next.001.next1el_2948.root.h5', '/analysis/2948/hdf5/data/run_2948.gdc2next.001.next1el_2948.root.h5']
                 Output File = /analysis/2948/hdf5/irene/run_2948.gdcsnext.001.next1el_2948.root.h5
                          

                 S1 parameters S12Params(tmin=99000.0, tmax=101000.0, stride=4, lmin=4, lmax=16, rebin=False)

                 S2 parameters S12Params(tmin=101000.0, tmax=1200000.0, stride=40, lmin=60, lmax=100000, rebin=True)

                 S2Si parameters
                 threshold min charge per SiPM = 3.5 pes
                 threshold min charge in  S2   = 20 pes
                          
Opening /analysis/2948/hdf5/data/run_2948.gdc1next.001.next1el_2948.root.h5... Events in file = 132
# PMT                  => 12
# SiPM                 => 1792
PMT WL                 => 32000
SIPM WL                => 800
Traceback (most recent call last):
  File "/home/jmbenlloch/IC/invisible_cities/cities/irene.py", line 376, in <module>
    IRENE(sys.argv)
  File "/home/jmbenlloch/IC/invisible_cities/cities/irene.py", line 366, in IRENE
    nevt = irene.run(nmax=nevts, print_empty=CFP.PRINT_EMPTY_EVENTS)
  File "/home/jmbenlloch/IC/invisible_cities/cities/irene.py", line 249, in run
    CWF = self.deconv_pmt(pmtrwf[evt])
  File "/home/jmbenlloch/IC/invisible_cities/cities/base_cities.py", line 319, in deconv_pmt
    thr_trigger = self.thr_trigger)
  File "invisible_cities/sierpe/blr.pyx", line 91, in invisible_cities.sierpe.blr.deconv_pmt (invisible_cities/sierpe/blr.c:3912)
  File "invisible_cities/sierpe/blr.pyx", line 108, in invisible_cities.sierpe.blr.deconv_pmt (invisible_cities/sierpe/blr.c:3667)
  File "invisible_cities/sierpe/blr.pyx", line 38, in invisible_cities.sierpe.blr.deconvolve_signal (invisible_cities/sierpe/blr.c:2430)
IndexError: Out of bounds on buffer access (axis 0)

Any idea?

Diomira speed

We now have a rather stable version of DIOMIRA, which runs at ~0.5 events/second on a fast processor (i7 at 3.1 GHz) under Mac OS X.

I propose to investigate:

  1. dependence of speed on the system (e.g. Linux, lxplus machines at CERN)
  2. dependence on compression parameters. Currently RWF files are 150 kB per event with ZLIB4 and MCRD files are 110 kB per event. Any major speed gain if we decrease the compression a bit? Any major speed loss if we increase it?

bug in irene (store_pmaps)

A bug has been found in irene (store_pmaps) which caused S2s not to be written to the PMAPS file. The bug was introduced when refactoring the run method to create the store_pmaps function. A fix (and a test) will be submitted shortly.

Describe management strategy for notebook input and output

Notebooks produce html files (output) and consume data files (input). Neither data files nor html files belong in a version-controlled repository. Instead they will be placed on an independent, non-version-controlled server, henceforth baptised the gallery. Document the details and the philosophy behind it.

How to handle static html produced by NBs

I have investigated GitHub Pages a bit, and it does not appear to be an especially intelligent solution.

The reason is this: GitHub Pages is yet another GitHub repository, with some tweaks to present pretty pages if the user wants to put in the work. But the .html docs are still kept under GitHub!

Thus, using GitHub Pages is not conceptually different from just uploading the html files to our repository and taking advantage of git's ability to read them directly. If we are going to put html pages in git, better to do so in our own repository than to start some new repository just for that (with all the added overhead).

Perhaps they can be put somewhere else, but is not obvious where.

My proposal is that we keep the html in our repository unless we find a better solution soon. The whole purpose of the NBs is that they can be read; currently, since we are stripping them and not pushing the html, they are not useful.

Analysis: argument in favor of independent repository

The last couple of days have made obvious an important difference between the workflows in IC and in Notebooks, which can be summarised (simplified) like this:

In IC the developer "follows the main repository". This makes sense, since IC is primarily concerned with reconstruction, from low-level to high-level objects. This is done by chaining cities and adding auxiliary functions to core. Plus tests. The developers are somewhat generalist, and can often work on any aspect of the reconstruction.

The workflow calls for rebasing often onto the main repository, so that there is never too much divergence between a given developer's fork and the main repository.

Instead, as we have "discovered" in the last couple of days, the workflow for Notebooks is different. Developers are mainly concerned with their own analysis, and in general not interested in what is happening in other analyses. Thus, a workflow with a Develop and a Deliver branch seems OK, and git gives us the tools to facilitate this.

It turns out that Notebooks mimic quite well what will happen in Analysis. In fact, developers will most likely prepare a draft version of their analysis in the NBs and then translate it to scripts. But the workflow is still the same: they will want to work for a long time in a Develop branch, running in their own fork, and eventually export the results of their work to Deliver.

This brings me to the proposal of creating a separate repository for analysis. It has no real cost, since the Analysis repo could import all IC modules by simply setting PYTHONPATH, as we do for Notebooks. It would separate Analysis (which has fewer intrinsic tests, more contributors, and is in general messier) from Reconstruction (which must be crystal clear), and it would permit a simple workflow (with no rebases, etc.) for people just interested in analysis.

Your opinion, @jacg and @jmbenlloch?

investigate documentation tools, choices and structure

We need to add a /docs directory to the repository. The proposal is to add it at the top level (next to invisible_cities), possibly with some extra sub-structure (such as /cities, /analysis, /core, etc.).

Open questions:

  1. JJ argues that adding clean, well-structured jupyter notebooks as docs is OK and good. They need to be reviewed anyway, before entering the repository.
  2. Are we using Sphinx for docs? Any other tool?
  3. What would be the structure of the /docs recommended by Sphinx?
  4. Any other issues, suggestions?

Splitting *core* in two directories (*core* and *reco*)

The directory core lumps together rather different animals. I propose to split core into two separate directories:
a) core, which would keep what I think are really core modules: configure, core_functions, exceptions, log_config, random_sampling, system_of_units and mpl_f (plus testing and cython modules)
b) reco, which would, for now, get all the rest: nh5, params, peak_functions, pmaps_functions, tbl_f, wfm_f and mctrk_f (plus testing and cython modules)
Opinions?

Wrong type in last evt in s12df_to_s12l

The function is supposed to return a dictionary of dictionaries holding two np.ndarrays. When looping over the full input DataFrame, the last event is not cast to np.ndarray and remains a list. A sketch of the bug pattern follows.
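A hedged sketch of the bug pattern (not the actual IC code): the cast happens on every event boundary, but never after the loop ends.

import numpy as np

def group_values_by_event(events, values):
    # assumes at least one event in the input
    out, current, buf = {}, events[0], []
    for evt, v in zip(events, values):
        if evt != current:
            out[current] = np.array(buf)  # cast on each event change...
            current, buf = evt, []
        buf.append(v)
    out[current] = buf  # ...but the last event misses the cast
    return out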

RuntimeWarnings in test_rebin_wf2

In Python 2 test_rebin_wf2 typically gives 3 RuntimeWarnings; in Python 3 the same test typically gives about 100 such warnings!

This could well be a symptom of a failure to deal correctly and completely with the differences in division between Python 2 and 3.

In any case, someone who understands what the test tries to achieve should

  1. figure out whether these warnings are a symptom of a bug

  2. silence them.
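For reference, the division difference in question:

from __future__ import division  # makes / mean true division in Python 2

print(1 / 2)   # 0.5 in Python 3 (and in Python 2 with the import above);
               # 0 in plain Python 2
print(1 // 2)  # 0 everywhere: explicit floor division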

Create a bash command argument completion file for `manage.sh`

Our manage.sh script is acquiring (#77) a large number of long, inconvenient-to-type command arguments. It would benefit immensely from completion help from the shell. I get the impression that most of us are using bash or some derivative. Let's provide a file which teaches bash how to complete manage.sh's arguments for us.

How this can be done seems to be described here.

Once made, this file can be sourced to enable the completion. You could do this in your .bashrc or similar.

PMAPs interface

The PMAPs interface we currently have is an accidental by-product of the implementation choices we happen to be going with right now: no effort has been made to design a decent interface for PMAPs.

The presence or absence of zeros (whether it be in the HDF5 file or in the datastructure used in the transient representation) in the S2Sis is a matter of implementation: the client doesn't care how and where the information is stored, as long as it is accessible conveniently and quickly.

Similarly, whether times or sample numbers are stored in the energy plane components of the pmaps and/or the tracking plane components, is purely a detail of implementation: the client just needs to access information as quickly and conveniently as possible.

Our choice to store times and energies in the S12s, but only energies in the S2Sis, is an implementation detail which we currently throw in the client's face. This choice of implementation leaks out through the interface: compare and contrast how to access energies and their corresponding times in the different planes, when you are holding on to some peak p:

  • S12s:
    • energy: p.E[n] OK.
    • time p.t[n] Nice symmetry.
  • S2Sis:
    • energy: p[n] Eh? Where did the .E go?
    • time: erm ... a_related_but_different_p[event_no].t[n]? WTF!

This is an abominable, inconsistent interface with a horrible abstraction leak. It is like it is because the implementors (that's us!) haven't bothered to think about the client (that's also us!) interface. It needs to be killed with fire!

What needs to be done:

  • Collect use cases, showing the sorts of things we expect clients to do with PMAPs.
  • Design an interface which allows the client to use PMAPs cleanly, efficiently and consistently (a first sketch follows this list).
  • Hide implementation behind interface.
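As a seed for the design discussion, here is a rough sketch of the kind of symmetric access a client might expect (illustrative names only, not a settled proposal):

class SiPeak:
    """A tracking-plane peak exposing the same .t / .E access as an
    S12 peak; the times are shared with the matching S2 peak.
    Illustrative sketch only."""
    def __init__(self, times, energies):
        self.t = times     # np.array, one entry per slice
        self.E = energies  # np.array, zeros included

# Client code then reads identically in both planes:
#   s2_peak.E[n], s2_peak.t[n]
#   si_peak.E[n], si_peak.t[n]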

If y'all provide me with use cases, I can tackle the rest in April. Of course, this is best done con peras, but the collection of use cases can start right now. The collection of use cases is also the part where I can probably contribute least.

There is lots more to be said on this topic, but this should be enough for a start.

Error in Irene

I have run Irene on MC Kr at CERN, but 41 of the jobs had errors during processing. All of them are like this:

Opening ./ic_dst_NEXT_v0_08_06_Kr_ACTIVE_7_0_7bar_MCRD_10000.root.h5... Events in file = 10000
# PMT                  => 12
# SiPM                 => 1792
PMT WL                 => 48000
SIPM WL                => 1200
event in file = 99, total = 100
error
Traceback (most recent call last):
  File "/afs/cern.ch/work/j/jobenllo/software/IC/invisible_cities/cities/irene.py", line 548, in <module>
    IRENE(sys.argv)
  File "/afs/cern.ch/work/j/jobenllo/software/IC/invisible_cities/cities/irene.py", line 540, in IRENE
    nevt = fpp.run(nmax=nevts, store_pmaps=True)
  File "/afs/cern.ch/work/j/jobenllo/software/IC/invisible_cities/cities/irene.py", line 429, in run
    rebin  = False)
  File "invisible_cities/core/peak_functions_c.pyx", line 105, in invisible_cities.core.peak_functions_c.find_S12 (invisible_cities/core/peak_functions_c.c:5121)
  File "invisible_cities/core/peak_functions_c.pyx", line 134, in invisible_cities.core.peak_functions_c.find_S12 (invisible_cities/core/peak_functions_c.c:4232)
IndexError: Out of bounds on buffer access (axis 0)

I have checked the logs to find a file in which the error appears early, in particular:

lxplus.cern.ch:/eos/experiment/next/dataic/DiomiraMS/Kr/ic_dst_NEXT_v0_08_06_Kr_ACTIVE_7_0_7bar_MCRD_10000.root.h5

The error occurs before event 200. The config file I am using is this one:

PATH_IN .
PATH_OUT .
FILE_IN ic_dst_NEXT_v0_08_06_Kr_ACTIVE_7_0_7bar_MCRD_10000.root.h5
FILE_OUT pmaps_dst_NEXT_v0_08_06_Kr_ACTIVE_7_0_7bar_MCRD_10000.root.h5
COMPRESSION ZLIB4

RUN_NUMBER 0
NPRINT 100

NBASELINE 38000
THR_TRIGGER 5

NMAU 100
THR_MAU 3
THR_CSUM 0.5
S1_TMIN 10
S1_TMAX 590
S1_STRIDE 4
S1_LMIN 10
S1_LMAX 20
S2_TMIN 10
S2_TMAX 1190
S2_STRIDE 40
S2_LMIN 100
S2_LMAX 100000
THR_ZS 5
THR_SIPM_S2 25

NEVENTS 20
RUN_ALL True

Any idea @jjgomezcadenas ?

PMAP Writer

After a quick discussion, we believe that there is a handy solution for separating the functionality of computing PMAPS from that of saving them to file. We propose to write a PMapWriter beast (aka class) that takes care of that chore and can be called either by Irene or by other clients. We propose to open a specific branch for this. @jmbenlloch has finger-volunteered to take this job.
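A rough sketch of the shape such a beast could take (illustrative only, assuming a pytables-backed file; not the actual branch code):

import tables as tb

class PMapWriter:
    """Decouples PMAP persistence from PMAP computation: a city
    computes S1/S2/S2Si and hands them over. Illustrative sketch."""
    def __init__(self, filename):
        self._file = tb.open_file(filename, 'w')

    def write_event(self, event_number, s1, s2, s2si):
        raise NotImplementedError  # append rows to the S12/S2Si tables

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self._file.close()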

Investigate how plot functions might be tested

Our code base contains functions whose purpose is to create plots. These are currently untested. Options include

  1. not testing them at all
  2. testing them ad hoc, perhaps in notebooks
  3. trying to write proper automated tests

@jacg is to investigate the feasibility of option 3

@jjgomezcadenas please make a list of all such functions. (I would do it myself, but given that matplotlib is sometimes used in our code to perform calculations rather than to make plots, I'm likely to get it wrong :-)

script manage.sh behaves in an unexpected way in Mac OS X

source manage.sh c

Unrecognized command: c

Usage:

source -bash install_and_check X.Y
soruce -bash install X.Y
source -bash work_in_python_version X.Y
bash -bash make_environment X.Y
bash -bash run_tests
bash -bash compile_and_test
bash -bash download_test_db
bash -bash clean
Saving session...
...copying shared history...
...saving history...truncating history files...
...completed.
Deleting expired sessions...none found.

[Process completed]

Record conda and pip package versions for reproducibility

In order to be able to reproduce old runs, it is not enough to check out the particular version of IC with which the run was performed: one must also ensure that exactly the same versions of the packages installed with conda and pip (listed in the make_environment function of manage.sh) are used.

This means that we need an easy way to record and store the versions of such packages (for example, by capturing the output of conda list and pip freeze alongside the run).

Two small changes in manage.sh

Here is a proposal for two small changes in manage.sh (I can do them, but let's agree that this is OK first).

  1. In the function ic_env

    echo setting PYTHONPATH
    export PYTHONPATH=$ICTDIR:$PYTHONPATH

has the side effect that each time the script is executed (and it can be executed many times in the same session) the variable PYTHONPATH grows (although it always points to the same place, i.e. the top IC directory).

On the other hand, I believe that PYTHONPATH should point exclusively to the top-level IC directory and to nothing else when working in IC (this is the current situation). Then one could simply do instead:

echo setting PYTHONPATH
export PYTHONPATH=$ICTDIR

  2. In the same function:

    echo setting ICDIR
    export ICTDIR=`pwd`

    echo setting ICTDIR
    export ICDIR=$ICTDIR/invisible_cities/

    echo setting PYTHONPATH
    export PYTHONPATH=$ICTDIR:$PYTHONPATH

I think it would be better:

export ICTDIR=`pwd`

echo ICDIR has been set to $ICDIR

(etc for all the others)

Error installing magit on Mac OS X 10.10.2

We have been trying to configure magit in Paola's emacs, but it doesn't work. This is the error we are getting:

Warning (el-get): Your Emacs doesn't support HTTPS (TLS).
Warning (el-get): Failed to retrieve emacswiki package list: .
Warning (initialization): An error occurred while loading `/Users/paola/.emacs':

Symbol's value as variable is void: package-archive-contents

To ensure normal operation, you should investigate and remove the
cause of the error in your initialization file.  Start Emacs with
the `--debug-init' option to view a complete error backtrace.

Any idea how to solve that?

Structure of the analysis repository (anatomy of ICARO)

The analysis repository (ICARO) will contain a number of essentially independent analyses (often, but not exclusively, based on radioactive calibration sources). Describe the organisation of the repository and the proposed workflow.

Continuous delivery

I would like us to state in writing what we have been saying offline:

IC will be continuously delivered via the master branch in nextic/IC.

Consequently, there will be no release branch, no release tags, etc.

Another consequence of this is that we must be extremely strict about keeping the master branch usable at all times.

Acknowledging this here should be enough for now, but eventually such a statement has to make it into the manual, so this issue should be closed automatically by the merge of the PR which adds the statement to the manual (unless we decide against continuous delivery).

Module names do not conform to PEP8

PEP8 suggests that

Modules should have short, all-lowercase names.

advice which IC does not follow at present. If we are going to follow this advice, we should probably make the change before opening up the repository to the rest of the team.

Wrong number of slices in S2Si dict

We have a new bug (thanks to @neuslopez for pointing it out). S2Si dicts are filled with zeros when a slice is empty for any sensor with charge. However, this is done only for slices before the last one present in the input; trailing zeros (empty slices at the end) are not included.

Solving this is non-trivial, since the information about the actual number of slices is missing from the input (it is stored in the S2-PMT pmaps). My proposal is that, as suggested in #165, we store all empty slices as zeros on disk. This way the problem moves to the writing side, which has full information. A sketch of the padding follows the note below.

Note: we are just fine for now since we are not requiring slice info in the SiPMs.
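On the writing side the padding is trivial, since the number of slices is fixed by the matching S2 peak (illustrative sketch):

import numpy as np

def pad_to_n_slices(si_energies, n_slices):
    """Pad a SiPM energy vector with trailing zeros up to the number
    of slices of the matching S2 peak. Illustrative sketch only."""
    padded = np.zeros(n_slices)
    padded[:len(si_energies)] = si_energies
    return padded

# pad_to_n_slices([7.9, 10.9], 4)  ->  array([ 7.9, 10.9,  0. ,  0. ])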

Cities init method arguments

We need to discuss whether it is more convenient for the city constructors to list just their own arguments, taking the ones belonging to the classes they inherit from as keywords, or to list them all and be more explicit. Both options are sketched below.
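The two options on a hypothetical city (not actual IC code):

class City:
    def __init__(self, run_number=0, files_in=None, file_out=None):
        self.run_number = run_number
        self.files_in   = files_in
        self.file_out   = file_out

class IreneExplicit(City):
    # Option A: repeat the inherited arguments; verbose but self-documenting.
    def __init__(self, run_number=0, files_in=None, file_out=None,
                 nmau=100, thr_mau=3):
        super().__init__(run_number, files_in, file_out)
        self.nmau, self.thr_mau = nmau, thr_mau

class IreneKwargs(City):
    # Option B: list only its own arguments; the rest forwarded as keywords.
    def __init__(self, nmau=100, thr_mau=3, **kwds):
        super().__init__(**kwds)
        self.nmau, self.thr_mau = nmau, thr_mau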

Describe the "standard branch-fork-main-repository" development cycle

I would like to document the standard sequence that a developer will follow to implement her contributions to the main repository. Here is the flow (tentative).

  1. Developer opens a branch in her local repository (dev-branch)
  2. Developer commits a set of changes to dev-branch, until the task she has set out to do is completed (n commits to dev-branch).
  3. Developer squashes the commits to n1 < n (often n1 = 1), to clean up and tidy the history of that particular development.
  4. Developer pushes dev-branch to her fork (origin/dev-branch) and waits for Travis to approve.
  5. Developer makes a pull-request.
  6. If the pull request is accepted, dev-branch is rebased onto master in the main repo (icmaster)
  7. Developer now fetches icmaster
  8. Then rebases dev-branch on top of icmaster
  9. Then deletes dev-branch
  10. Finally deletes origin/dev-branch.

Could you experts please:

  1. Confirm that the sequence is correct; edit, add to or subtract from the sequence above (Jacek)
  2. Describe how to perform tasks within magit (Jacek)
  3. Describe how to perform the tasks with raw git commands (JM? or simply cut-and-paste the git commands issued by magit).

We need to nail down this procedure before opening up to other collaborators, or we will be in trouble...

numpy compatibility

This will give an error in numpy 1.11 but not in numpy 1.12:
np.random.normal(scale=[.2, 0, 4])

I believe those of us who ran
source manage.sh install_and_check X.Y
before Jan 15th have v1.11, and those who ran it after have v1.12.

At the moment, this means Travis-CI will show that Anastasia passes all her tests, but if someone who started working in IC before Jan 15th tries to fetch my branch and test it, they will get this error:

ValueError: scale <= 0

I can write some uglier code to accommodate the old version of numpy, or do we all want to use the new version of numpy? Is this a symptom of a larger issue? Or are we all just responsible for continually updating the packages used by our minicondas?
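If we do decide to accommodate numpy 1.11, the "uglier code" could look like this sketch (illustrative; whether we want this rather than pinning the numpy version is exactly the question):

import numpy as np

def normal_allowing_zero_scale(loc, scale):
    """np.random.normal for per-element scales where some entries are 0.
    Works on numpy 1.11 (which rejects scale <= 0) as well as on 1.12."""
    loc   = np.asarray(loc,   dtype=float)
    scale = np.asarray(scale, dtype=float)
    out   = loc.copy()                # zero scale means no smearing
    nonzero = scale > 0
    out[nonzero] = np.random.normal(loc[nonzero], scale[nonzero])
    return out

# normal_allowing_zero_scale([0, 0, 0], [.2, 0, 4])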

Manage.sh does not work in a Mac

We have tried to install IC using source manage.sh install_and_check 3.5, but wget is not installed, so miniconda is not downloaded and we cannot continue with the installation.

It seems that curl is available on the Mac. We should check whether it is installed by default on both Mac and Linux; in that case maybe we should change manage.sh to use curl.
