mashephe / amptools
A utility library for performing amplitude analysis on particle physics data.
AmpTools is a library to facilitate performing unbinned maximum likelihood fits of experimental data to a coherent sum of amplitudes. For additional documentation refer to the AmpTools_User_Guide.pdf file distributed with this code. If you use AmpTools for data analysis that results in a publication, please cite the source using the DOI specific to the version you used, which can be located by following the general DOI: doi.org/10.5281/zenodo.5039377
The entire source tree, including the Tutorials, should build from this top-level directory by invoking make. Prior to building, be sure that root-config is in your path and check/adjust Makefile.settings as needed.
Three modules are included with the distribution, and individual README files are contained within each module. These README files should be referenced for details about various releases.
AmpTools: This is the main AmpTools library. Once compiled it includes no executable code, but provides functionality and an interface that the user can utilize to perform analyses.
Tutorials: This contains a couple of examples of how to utilize the AmpTools library. We try to keep these up to date as AmpTools develops. It is recommended that users explore the Dalitz tutorial.
AmpPlotter: This is an optional package that provides a GUI interface for viewing the projections of a fit. It enables visualization of the contributions of various amplitudes to the fit.
CUDA is now installed as a module for AlmaLinux9 on cvmfs. This is how to set it up on ifarm9:
module use /cvmfs/oasis.opensciencegrid.org/jlab/scicomp/sw/el9/modulefiles
module load cuda
export CUDA_INSTALL_PATH=/cvmfs/oasis.opensciencegrid.org/jlab/scicomp/sw/el9/cuda/11.4.2/
However, compiling AmpTools 0.15.2 with GPU acceleration on this node fails:
make gpu
...
-> Compiling GPUAmpProductKernel.cu
/usr/include/stdio.h(183): error: attribute "__malloc__" does not take arguments
/usr/include/stdio.h(195): error: attribute "__malloc__" does not take arguments
At present the framework uses the same data structures, caches, etc. for the calculation of the generated normalization-integral matrix as for the accepted one. In cases where the acceptance is small, O(1%), this places an unnecessary burden on memory. The generated NIs never need to be recomputed throughout a fit, so there is no need for the caching mechanisms that are in place for the accepted NIs (which are recomputed if the amplitudes contain free parameters). Maybe there is a way to reduce memory usage for the generated NIs, e.g., compute in blocks of events and sum?
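The block-and-sum idea above could be sketched as follows. This is a minimal illustration with hypothetical names (the real AmpTools data layout differs): only one block's worth of products is in flight at a time, and the per-block partial sums are folded into a running total.

```cpp
#include <vector>
#include <complex>
#include <algorithm>
#include <cstddef>

// Hypothetical sketch: accumulate a generated normalization integral
// N_ab = (1/N) * sum_i A_a(x_i) * conj(A_b(x_i)) in fixed-size blocks
// of events, so only one block needs to be resident at a time.
// amps[a][i] holds amplitude a evaluated at event i.
std::complex<double> blockedNormInt(
    const std::vector<std::vector<std::complex<double>>>& amps,
    std::size_t a, std::size_t b, std::size_t blockSize )
{
  std::complex<double> sum( 0, 0 );
  std::size_t nEvents = amps[a].size();
  for( std::size_t start = 0; start < nEvents; start += blockSize ){
    std::complex<double> blockSum( 0, 0 );            // partial sum for this block
    std::size_t end = std::min( start + blockSize, nEvents );
    for( std::size_t i = start; i < end; ++i )
      blockSum += amps[a][i] * std::conj( amps[b][i] );
    sum += blockSum;                                  // fold block into total
  }
  return sum / static_cast<double>( nEvents );
}
```

In a real implementation each block of amplitudes would be computed on demand (and, on the GPU, could reuse one fixed-size device buffer), since the generated NIs are computed once and never revisited.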
It looks like the loadEvent method in the AmpVecs class is commented out on the GPU? Needs investigation.
There are a couple of AmpTools classes that would be useful to use in ROOT:
ConfigurationInfo
NormIntInterface
FitResults
This is helpful for configuring a fit or managing the results of a fit. The FitResults and NormIntInterface probably need certain member data and constructors excluded by preprocessor macros. One only needs to be able to construct them from text files and retrieve results.
It would also be useful to distribute a logon script that does the appropriate loading.
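One way this could be wired up, sketched here as an assumption (the class list comes from this issue; whether these classes can be dictionary-generated as-is has not been verified), is a ROOT LinkDef fragment passed to rootcling:

```cpp
// LinkDef.h sketch -- generate ROOT dictionaries for the classes above, e.g.:
//   rootcling -f AmpToolsDict.cc ConfigurationInfo.h NormIntInterface.h \
//             FitResults.h LinkDef.h
#ifdef __CLING__
#pragma link off all globals;
#pragma link off all classes;
#pragma link off all functions;

#pragma link C++ class ConfigurationInfo+;
#pragma link C++ class NormIntInterface+;
#pragma link C++ class FitResults+;
#endif
```

As noted above, FitResults and NormIntInterface would probably need fit-machinery members and constructors hidden behind preprocessor macros so that only the text-file constructors and accessors are exposed to the dictionary.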
When running the Dalitz tutorial on the GPU it appears that using the GPU to generate MC results in garbage. This may be due to the fact that when using the GPU accelerated amplitude manager the calculation of intensities is done on the CPU after the GPU calculated amplitudes are copied back to the CPU. (Fitting does the sum log( I ) calculation entirely on the GPU.) So, GPU MC generation exposes a slightly different path through the code.
For some fits the same accepted MC can be used for multiple reactions. (For example, if each reaction is a different beam polarization state.) In these cases the framework will read in multiple copies of the accepted MC. It would be nice if it could discover the MC is reused and avoid replicating it.
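The discovery step could be as simple as keying a cache on the MC source identifier. A minimal sketch, with entirely hypothetical names (MCSample, MCCache), assuming the file name uniquely identifies a sample:

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an in-memory accepted-MC sample.
struct MCSample { std::vector<double> events; };

// Sketch: reactions requesting the same accepted MC file share one
// in-memory copy instead of each loading their own.
class MCCache {
public:
  std::shared_ptr<MCSample> get( const std::string& source ){
    auto it = m_cache.find( source );
    if( it != m_cache.end() ) return it->second;   // reuse the existing copy
    auto sample = std::make_shared<MCSample>();    // a real version loads from disk here
    m_cache[source] = sample;
    return sample;
  }
private:
  std::map<std::string, std::shared_ptr<MCSample>> m_cache;
};
```

Reference counting via shared_ptr also keeps the sample alive exactly as long as any reaction still uses it.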
Add a feature that allows users to disable automatic symmetrization of amplitudes in the AmplitudeManager based on the identity of the particles in the reaction declaration.
Useful for comparing 2D distributions.
It seems possible to improve the precision of the "data term" in the likelihood calculation. One can add the same constant to ln(I) for every event so that when the sum is performed over all events the average value of the offset ln(I) is about zero. This will maintain maximal precision in the sum.
This offset could be subtracted from the total sum to leave the likelihood unchanged. Depending on how this is implemented, one can hide this offset from the user. This may also be useful to shift the likelihood to numbers that might be more acceptable to MINUIT.
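The trick above can be sketched in a few lines; the function name is hypothetical and the two-pass structure is just one possible implementation. In exact arithmetic the result is unchanged, but centering the summands reduces floating-point cancellation in the accumulated sum:

```cpp
#include <vector>
#include <cmath>
#include <cstddef>

// Sketch of the proposed offset trick: subtract a constant c from each
// ln(I) so the summands average to zero, then add N*c back at the end.
double offsetLogSum( const std::vector<double>& intensities )
{
  std::size_t n = intensities.size();
  double c = 0;
  for( double I : intensities ) c += std::log( I );
  c /= n;                                   // average ln(I): the offset

  double sum = 0;
  for( double I : intensities ) sum += std::log( I ) - c;  // centered sum
  return sum + n * c;                       // restore the true value
}
```

In a production fit one would compute the offset once (e.g. at the first iteration) rather than in a per-call first pass, which is also where hiding the offset from the user, or exposing it as a deliberate shift for MINUIT, would come in.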
The algorithm in FitResults::ampParMap to distinguish the amplitude parameters from the scale parameters needs rewriting to use the ConfigurationInfo. It currently does not allow the case where a scale parameter is also an amplitude parameter.
The NormIntInterface assumes that amplitude integrals are not used in the fit, but these integrals are needed to renormalize the amplitudes. In the case of a floating parameter, they must be recalculated at each fit iteration. The NormIntInterface needs to be modified to recognize such a scenario and cache not only the accepted but also the generated MC in order to perform such a calculation.
Is there a way to have the AmpPlotter generate background-subtracted plots (in the case where background is supplied)? Right now the behavior is to overlay data with a stack of background + signal, but maybe it would be useful to subtract the background from the data and compare with the signal?
I have been trying to perform fits using AmpTools using the group build of halld_sim. The details of the Signal events are below: a 1p state in positive reflectivity (PosRefl), mass range 1.10 - 1.15 GeV, and t range 0.15 - 0.30. When I fit, the fit converges with a positive log-likelihood. Not sure what is going on.
The details of the software version are shown below.
##################### Meta Info on software used ##############################
AMPTOOLS_HOME /group/halld/Software/builds/Linux_CentOS7.7-x86_64-gcc4.8.5/amptools/AmpTools-0.14.4 ; AMPTOOLS_VERSION 0.14.4
HALLD_SIM_HOME /group/halld/Software/builds/Linux_CentOS7.7-x86_64-gcc4.8.5/halld_sim/halld_sim-4.42.0 ; HALLD_SIM_VERSION 4.42.0
which fit : /group/halld/Software/builds/Linux_CentOS7.7-x86_64-gcc4.8.5/halld_sim/halld_sim-4.42.0/Linux_CentOS7.7-x86_64-gcc4.8.5/bin/fit
which omegapi_plotter : /group/halld/Software/builds/Linux_CentOS7.7-x86_64-gcc4.8.5/halld_sim/halld_sim-4.42.0/Linux_CentOS7.7-x86_64-gcc4.8.5/bin/omegapi_plotter
###############################################################################
The output log file is at volatile/halld/home/ksuresh/ampToolsFits/neutralb1/ClassicTests/TestingAmpTools/EmulatedSignal/OutputEmulated.log
To reproduce the problem on JLab's ifarm:
source /group/halld/Software/build_scripts/gluex_env_jlab.csh
Then, the source code is at
/lustre19/expphy/volatile/halld/home/ksuresh/ampToolsFits/neutralb1/ClassicTests/TestingAmpTools/EmulatedSignal
Commands to run:
./runAll.csh will generate Signal and PhaseSpace and then perform the fit.
./runSignal.csh and ./runPhaseSpace.csh generate Signal and PhaseSpace.
runFit.csh fits the generated Signal and PhaseSpace sample.
I confirmed that this issue still occurs with the master branch of AmpTools.
To run with the master version of AmpTools:
source /work/halld/ksuresh/MY_HALLD_SIM_CPU2/setup_gluex.csh
and retry runFit.csh.
In the new tag of 0.10.2 kMaxParticles was changed from 7 to 6. Was there a reason to reduce this, or can it be switched back to 7 for the next tag?
There is a report that if the config file does not have initialize commands for amplitudes, this will result in a NaN production parameter for amplitudes in MPI. This needs some investigation to see if there are obvious mistakes. Maybe the default parameter initialization is not properly conveyed through the MPI framework?
It may be useful if the package built using relative paths appropriate for a checkout of the repository. These could then be overridden by user environment variables if desired. A top-level Makefile might also be nice that builds AmpTools, AmpPlotter, and the Tutorial library and executables.
Some people would like to use the results of a fit to get event-by-event weights for custom plotting of anything from the event four-vectors. Need to come up with a method to accommodate this. Look at what Ryan and Nils were doing and try to find a way to make it more efficient.
Redefine output streams to avoid overlapping output in MPI environments.
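One possible shape for this, sketched with a hypothetical helper name, is to route each rank's stream to its own log file so output from different processes can never interleave (in a real build the rank would come from MPI_Comm_rank; here it is a parameter):

```cpp
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical sketch: redirect a stream (by default std::cout) to a
// per-rank log file such as amptools_rank3.log.  Returns the file name.
// Caveat: the static ofstream supports one redirect per process, which
// is all this sketch needs.
std::string redirectOutputForRank( int rank, std::ostream& os = std::cout )
{
  std::ostringstream name;
  name << "amptools_rank" << rank << ".log";
  static std::ofstream log;          // must outlive the redirected stream
  log.open( name.str() );
  os.rdbuf( log.rdbuf() );           // route this stream into the file
  return name.str();
}
```

An alternative is a prefixing streambuf that tags each line with the rank and still writes to a shared console, which keeps interactive use convenient at the cost of more code.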
If one inadvertently registers an amplitude or data reader in the AmpToolsInterface more than once, only the most recently registered one is managed by the interface. This leads to resource problems, e.g., too many ROOT files open in the case of data readers, since all of the registered data readers are not cleaned up.
Around line 194 in the AmpToolsInterface the package will print a warning if the data reader is missing. This could be the case, for example, if a user specifies a data reader that is not registered. A warning is not severe enough, as the program continues to run. There should be an error and exit here -- or elsewhere we should catch the case where a user specifies an unregistered data reader and exit.
if (!dataRdr)
  report( WARNING, kModule ) << "not creating a DataReader for data associated with reaction " << reactionName << endl;
if (!genMCRdr)
  report( WARNING, kModule ) << "not creating a DataReader for generated MC associated with reaction " << reactionName << endl;
if (!accMCRdr)
  report( WARNING, kModule ) << "not creating a DataReader for accepted MC associated with reaction " << reactionName << endl;
Mark Dalton reported issues with the normalization of the amplitudes when he uses signal and background files. Since he does accidental correction and sideband subtraction, both files contain events with positive and negative weights. There are a few pieces of the code which did not foresee this case:
I made a branch to fix this issue, but I am not allowed to push it to origin for review. Would it be OK to add my user name? Otherwise, I have to send the patch as a text file.
The current method to give 2 amplitudes the same phase value requires setting amplitudes to be real, and multiplying them by a phase value. This does not allow for the phase difference to be easily accessed via the FitResults
class, and requires one to calculate the phase difference and its error manually.
Adjust the build systems so that libraries that contain MPI, GPU, and normal code are better handled; it is easy now for users to mix variants of these. For example, when linking one often wants to link fitters that support MPI but plotters that do not. This should be better managed.
The GPU manager needs to be improved to select a different GPU device for multiple MPI processes running on the same machine.
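A common pattern for this is round-robin assignment by local rank, sketched below under stated assumptions: a real implementation would call cudaGetDeviceCount/cudaSetDevice, and the local rank would come from the launcher (OMPI_COMM_WORLD_LOCAL_RANK is Open MPI's variable; other launchers use different names). The function name is hypothetical.

```cpp
#include <cstdlib>

// Sketch: map MPI processes on one host to distinct GPUs by taking the
// launcher-provided local rank modulo the number of visible devices.
int selectDeviceForLocalRank( int deviceCount,
                              const char* envVar = "OMPI_COMM_WORLD_LOCAL_RANK" )
{
  const char* val = std::getenv( envVar );
  int localRank = val ? std::atoi( val ) : 0;   // no launcher info: use device 0
  return localRank % deviceCount;               // round-robin over the GPUs
}
```

The returned index would then be passed to cudaSetDevice before any device allocations are made.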
One gets a bad_alloc or other crash sometimes when running fits with multiple reactions. This happens at the end of the fit and it does not appear to affect the fit results. It seems to arise when the clear() function of the AmpToolsInterface is called.
Malte was able to avoid the bug by changing line 387 to
if (intensityManager(reactionName)) delete m_intensityManagers[irct];
and moving it down to the bottom of that block, after line 391.
The only class that is really cleaning memory is the AmplitudeManager and, while there are some memory gymnastics with the Amplitude classes, it appears everything is handled OK.
It looks like if a config file has extra unused parameters declared, e.g., parameter reactionscale 1.25, then this might cause undesirable behavior in functions like intensity() in FitResults objects. It looks like the generation of the ampParMap will try to fetch values for all parameters, and if these are not set up or used, then it will trigger an error. This is confusing behavior and this incorrect use case should be handled better.
AmpToolsInterface::resetConfigurationInfo calls AmpToolsInterface::clear. clear deletes the DataReader objects stored in m_uniqueDataSets, and m_uniqueDataSets stores DataReaders created by m_userDataReaders->newDataReader. Inside UserDataReader.h, the m_dataReaderInstances map needs to be cleared when the AmpToolsInterface is cleared; otherwise, after the first (or second?) pass, the objects it points to have been deleted, and they will not be created again when newDataReader is called again. This results in a corrupted-memory error.
Why would someone call resetConfigurationInfo multiple times, you ask? Not in any of the standard tools. I do it with these nifty fits to reset the state multiple times to find a good starting position.
parameter commands in the seed file written by FitResults are missing a newline character -- maybe these should be omitted altogether from the seed, since they are declaration-and-initialization commands rather than purely initialization commands
If the size of the permutation iterator doesn't match the number of particles then the user is feeding four-vectors to the framework that are inconsistent with the number of particles in the reaction vector. This currently causes a failed assertion but only on the GPU. This is an easy mistake to make and should probably be checked for elsewhere with an appropriate message.
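A CPU-side guard could catch this before any kernel runs. The function below is a hypothetical sketch (name and message invented here): it compares the two counts and emits an actionable error instead of letting a GPU-only assertion fire.

```cpp
#include <cstdio>
#include <cstddef>

// Sketch: verify that the number of four-vectors supplied per event
// matches the reaction's declared particle count, and report a clear
// error rather than failing an assertion deep in the GPU code.
bool checkParticleCount( std::size_t nFourVectors, std::size_t nParticles,
                         const char* reactionName )
{
  if( nFourVectors == nParticles ) return true;
  std::fprintf( stderr,
    "ERROR: reaction %s declares %zu particles but the event data supplies "
    "%zu four-vectors -- check the reaction definition and data reader.\n",
    reactionName, nParticles, nFourVectors );
  return false;
}
```

The natural place for such a check is wherever events are first loaded, so the same mistake is caught identically on CPU and GPU builds.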
Would be nice to check an environment variable and let the user disable the banner screen to the console that has the AmpTools version, etc..
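The check itself is one line; the variable name below is purely a suggestion, not an existing AmpTools convention:

```cpp
#include <cstdlib>

// Sketch: suppress the startup banner when a (hypothetical) environment
// variable such as AMPTOOLS_SUPPRESS_BANNER is set to any value.
bool bannerEnabled()
{
  return std::getenv( "AMPTOOLS_SUPPRESS_BANNER" ) == nullptr;
}
```

The banner-printing code would then be wrapped in `if( bannerEnabled() ){ ... }`.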
We need a mechanism to reset the errors and step sizes when one resets parameters. This is done for parameter scans or repeated fits with random starting points. If one fit fails the step sizes become inappropriate to ensure reliable performance in subsequent fits.
Looks like a call to mnrset might be needed somewhere.
We need to check that if a different file, e.g., a .cfg file, is passed in instead of a .fit file then this doesn't generate an obscure crash. A problem was reported where if one runs:
$DALITZ/DalitzExe/plotResults dalitz1.cfg dalitz1.fit dalitz1.root
as instructed in the tutorial, then a strange crash occurs.
$DALITZ/DalitzExe/plotResults dalitz1.fit dalitz1.root
produces the desired outcome.
Add functionality in the ATI to set the parameters in AmplitudeManager and related classes based on the results of the fit instead of what is in the configuration file. This would allow easier MC generation based on the output of a fit.
The stacked 2D histograms are relatively useless in the plotter. This commonly comes up when wanting to compare, e.g., data to a sum of signal + background in some 2D plane. The problem is that the plotter draws the signal + background as a 2D stack, which is not helpful. It would be better to create a single 2D histogram that is the sum and draw it instead. This requires a little reworking of the PlotFactory class.
Hello developer,
On page 11 of the doc (https://github.com/mashephe/AmpTools/blob/master/Tutorials/Dalitz/doc/doc.pdf), it says hm12gen is the acceptance-corrected result. However, from the name, hm12acc seems to be the acceptance-corrected result. Is that a typo, or did I misunderstand something?
Thanks,
Dazhi
I am loading the same MC sample for 4 different orientations in my SDME analysis. I wanted to profit from the recent PR #37 to reuse the memory, but I see a segmentation violation connected to the bootstrap data reader with the current master:
Interestingly enough, this happens even without requesting the bootstrap data reader. Here are two config files:
/work/halld2/home/aaustreg/tmp/4matt/bootstrap/BS_123.cfg (for bootstrapping)
/work/halld2/home/aaustreg/tmp/4matt/bootstrap/test.cfg (standard fit)
The crash happens with and without gpu acceleration. It works without problems for the latest release 0.13.1.
It is not clear how the scale keyword in the configuration file works. I have an amplitude like this:
T_tot = a_1 * T_1 + a_2 * T_2
where the T's are my amplitudes and the a's are the coefficients. It turns out the fitted results depend on the number of events I have, i.e. more events produce larger fitted coefficients. What I want to do is to keep the magnitude of the a's normalized, but I am not sure if this is possible with the scale keyword.
It looks like the likelihood returned by the AmpToolsInterface is not the full likelihood in some cases. It sums the likelihood contributions for the various reactions, but there may be contributions to the likelihood from GaussianBound parameters or user-defined likelihood contributions. These will not be included.
To evaluate the fit quality (for example, for SDME analyses), measured distributions and MC distributions weighted with fit results are compared as seen in FIG. 6 in
https://halldweb.jlab.org/DocDB/0055/005576/015/Paper_Draft__rho_SDMEs_2233003.pdf
But at least for this rho-meson analysis, the distributions are determined mainly by acceptance effects and it's very hard to tell whether the fit is really good or not by just seeing this comparison.
I think the comparison between the acceptance-corrected yield and the model is a somewhat better way to evaluate the fit. For example, in plot c) of FIG. 6 in the above rho-meson paper draft, the standard comparison gives the impression that the fit works well because the discrepancy looks small. But in fact, when you check the acceptance-corrected yield vs the model, clear systematic discrepancies are observed.
In short, I think it's better to check this acceptance-corrected yield ( = ( data - background ) / acceptance ) in the plotter, and what is the easiest way to achieve this?
I'd like to hear your opinions and raise this feature request if this is not implemented.
Thanks!
Keigo Mizutani
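The per-bin quantity requested above could be sketched as follows; the function and argument names are hypothetical, and the acceptance in each bin is taken as accepted MC over generated MC:

```cpp
#include <vector>
#include <cstddef>

// Sketch: corrected yield = ( data - background ) / acceptance per bin,
// with acceptance = accMC / genMC.  Bins with no MC are left at zero
// rather than dividing by zero.
std::vector<double> correctedYield( const std::vector<double>& data,
                                    const std::vector<double>& background,
                                    const std::vector<double>& accMC,
                                    const std::vector<double>& genMC )
{
  std::vector<double> out( data.size(), 0.0 );
  for( std::size_t i = 0; i < data.size(); ++i ){
    if( genMC[i] <= 0 || accMC[i] <= 0 ) continue;   // empty bin: leave 0
    double acceptance = accMC[i] / genMC[i];
    out[i] = ( data[i] - background[i] ) / acceptance;
  }
  return out;
}
```

In the plotter this would amount to filling one extra histogram per projection from existing inputs, though the error propagation (correlated MC statistics in the acceptance) deserves care.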
When running MPI jobs the checks on sums of weights in the dataTerm() method of the Likelihood calculator get applied to the subset of events on the particular node. In reality, they should only be applied to the sums over all nodes on the leader job.
If one is doing many fits in succession, trying to "finalize" each fit will force a flush/reload of the MC. For some applications, there is a non-trivial amount of time needed to load data. If there is sufficient memory, there is no need to flush the MC once loaded. In addition, there are reports of GPU malloc errors after many iterations. Perhaps GPU device memory becomes fragmented and there are no large blocks available as needed for some calculations. The code should be modified to allow the generated MC to be persistent in memory.
Add an option that will write out the FitResults text file periodically during a long fit.