mashephe / amptools
A utility library for performing amplitude analysis on particle physics data.
AmpTools is a library to facilitate performing unbinned maximum likelihood fits of experimental data to a coherent sum of amplitudes. For additional documentation refer to the AmpTools_User_Guide.pdf file distributed with this code. If you use AmpTools for data analysis that results in a publication, please cite the source using the DOI specific to the version you used, which can be located by following the general DOI: doi.org/10.5281/zenodo.5039377
The entire source tree, including the Tutorials, should build from this top-level directory by invoking make. Prior to building, be sure that root-config is in your path and check/adjust Makefile.settings as needed.
Three modules are included with the distribution, and individual README files are contained within each module. These README files should be referenced for details about various releases.
AmpTools: This is the main AmpTools library. Once compiled it includes no executable code, but provides functionality and an interface that the user can utilize to perform analyses.
Tutorials: This contains a couple of examples of how to utilize the AmpTools library. We try to keep these up to date as AmpTools develops. It is recommended that users explore the Dalitz tutorial.
AmpPlotter: This is an optional package that provides a GUI interface for viewing the projections of a fit. It enables visualization of the contributions of various amplitudes to the fit.
CUDA is now installed as a module for AlmaLinux9 on cvmfs. This is how to set it up on ifarm9:
module use /cvmfs/oasis.opensciencegrid.org/jlab/scicomp/sw/el9/modulefiles
module load cuda
export CUDA_INSTALL_PATH=/cvmfs/oasis.opensciencegrid.org/jlab/scicomp/sw/el9/cuda/11.4.2/
However, compiling AmpTools 0.15.2 with GPU acceleration on this node fails:
make gpu
...
-> Compiling GPUAmpProductKernel.cu
/usr/include/stdio.h(183): error: attribute "__malloc__" does not take arguments
/usr/include/stdio.h(195): error: attribute "__malloc__" does not take arguments
At present the framework uses the same data structures, caches, etc. for the calculation of the generated normalization-integral matrix as for the accepted one. In cases where the acceptance is small, O(1%), this places an unnecessary burden on memory. The generated NIs never need to be recomputed throughout a fit, so there is no need for the caching mechanisms that are in place for the accepted NIs (which are recomputed if the amplitudes contain free parameters). Maybe there is a way to reduce memory usage for the generated NIs, e.g., compute in blocks of events and sum?
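The block-and-sum idea above could be sketched as follows. This is a minimal illustration with hypothetical names (the real AmpTools data layout differs): only one block's worth of products is in flight at a time, and the per-block partial sums are folded into a running total.

```cpp
#include <vector>
#include <complex>
#include <algorithm>
#include <cstddef>

// Hypothetical sketch: accumulate a generated normalization integral
// N_ab = (1/N) * sum_i A_a(x_i) * conj(A_b(x_i)) in fixed-size blocks
// of events, so only one block needs to be resident at a time.
// amps[a][i] holds amplitude a evaluated at event i.
std::complex<double> blockedNormInt(
    const std::vector<std::vector<std::complex<double>>>& amps,
    std::size_t a, std::size_t b, std::size_t blockSize )
{
  std::complex<double> sum( 0, 0 );
  std::size_t nEvents = amps[a].size();
  for( std::size_t start = 0; start < nEvents; start += blockSize ){
    std::complex<double> blockSum( 0, 0 );            // partial sum for this block
    std::size_t end = std::min( start + blockSize, nEvents );
    for( std::size_t i = start; i < end; ++i )
      blockSum += amps[a][i] * std::conj( amps[b][i] );
    sum += blockSum;                                  // fold block into total
  }
  return sum / static_cast<double>( nEvents );
}
```

In a real implementation each block of amplitudes would be computed on demand (and, on the GPU, could reuse one fixed-size device buffer), since the generated NIs are computed once and never revisited.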
It looks like the loadEvent method in the AmpVecs class is commented out on the GPU? Needs investigation.
There are a couple of AmpTools classes that would be useful to use in ROOT:
ConfigurationInfo
NormIntInterface
FitResults
This is helpful for configuring a fit or managing the results of a fit. The FitResults and NormIntInterface probably need certain member data and constructors excluded by preprocessor macros. One only needs to be able to construct them from text files and retrieve results.
It would also be useful to distribute a logon script that does the appropriate loading.
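One way this could be wired up, sketched here as an assumption (the class list comes from this issue; whether these classes can be dictionary-generated as-is has not been verified), is a ROOT LinkDef fragment passed to rootcling:

```cpp
// LinkDef.h sketch -- generate ROOT dictionaries for the classes above, e.g.:
//   rootcling -f AmpToolsDict.cc ConfigurationInfo.h NormIntInterface.h \
//             FitResults.h LinkDef.h
#ifdef __CLING__
#pragma link off all globals;
#pragma link off all classes;
#pragma link off all functions;

#pragma link C++ class ConfigurationInfo+;
#pragma link C++ class NormIntInterface+;
#pragma link C++ class FitResults+;
#endif
```

As noted above, FitResults and NormIntInterface would probably need fit-machinery members and constructors hidden behind preprocessor macros so that only the text-file constructors and accessors are exposed to the dictionary.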
When running the Dalitz tutorial on the GPU it appears that using the GPU to generate MC results in garbage. This may be due to the fact that when using the GPU accelerated amplitude manager the calculation of intensities is done on the CPU after the GPU calculated amplitudes are copied back to the CPU. (Fitting does the sum log( I ) calculation entirely on the GPU.) So, GPU MC generation exposes a slightly different path through the code.
For some fits the same accepted MC can be used for multiple reactions. (For example, if each reaction is a different beam polarization state.) In these cases the framework will read in multiple copies of the accepted MC. It would be nice if it could discover the MC is reused and avoid replicating it.
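The discovery step could be as simple as keying a cache on the MC source identifier. A minimal sketch, with entirely hypothetical names (MCSample, MCCache), assuming the file name uniquely identifies a sample:

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an in-memory accepted-MC sample.
struct MCSample { std::vector<double> events; };

// Sketch: reactions requesting the same accepted MC file share one
// in-memory copy instead of each loading their own.
class MCCache {
public:
  std::shared_ptr<MCSample> get( const std::string& source ){
    auto it = m_cache.find( source );
    if( it != m_cache.end() ) return it->second;   // reuse the existing copy
    auto sample = std::make_shared<MCSample>();    // a real version loads from disk here
    m_cache[source] = sample;
    return sample;
  }
private:
  std::map<std::string, std::shared_ptr<MCSample>> m_cache;
};
```

Reference counting via shared_ptr also keeps the sample alive exactly as long as any reaction still uses it.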
Add a feature that allows users to disable automatic symmetrization of amplitudes in the AmplitudeManager based on the identity of the particles in the reaction declaration.
Useful for comparing 2D distributions.
It seems possible to improve the precision of the "data term" in the likelihood calculation. One can add the same constant to ln(I) for every event so that when the sum is performed over all events the average value of the offset ln(I) is about zero. This will maintain maximal precision in the sum.
This offset could be subtracted from the total sum to leave the likelihood unchanged. Depending on how this is implemented, one can hide this offset from the user. This may also be useful to shift the likelihood to numbers that might be more acceptable to MINUIT.
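The trick above can be sketched in a few lines; the function name is hypothetical and the two-pass structure is just one possible implementation. In exact arithmetic the result is unchanged, but centering the summands reduces floating-point cancellation in the accumulated sum:

```cpp
#include <vector>
#include <cmath>
#include <cstddef>

// Sketch of the proposed offset trick: subtract a constant c from each
// ln(I) so the summands average to zero, then add N*c back at the end.
double offsetLogSum( const std::vector<double>& intensities )
{
  std::size_t n = intensities.size();
  double c = 0;
  for( double I : intensities ) c += std::log( I );
  c /= n;                                   // average ln(I): the offset

  double sum = 0;
  for( double I : intensities ) sum += std::log( I ) - c;  // centered sum
  return sum + n * c;                       // restore the true value
}
```

In a production fit one would compute the offset once (e.g. at the first iteration) rather than in a per-call first pass, which is also where hiding the offset from the user, or exposing it as a deliberate shift for MINUIT, would come in.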
The algorithm in FitResults::ampParMap to distinguish the amplitude parameters from the scale parameters needs rewriting to use the ConfigurationInfo. It currently does not allow the case where a scale parameter is also an amplitude parameter.
The NormIntInterface assumes that amplitude integrals are not used in the fit, but these integrals are needed to renormalize the amplitudes. In the case of a floating parameter, they must be recalculated at each fit iteration. The NormIntInterface needs to be modified to recognize such a scenario and cache not only the accepted but also the generated MC in order to perform such a calculation.
Is there a way to have the AmpPlotter generate background-subtracted plots (in the case where background is supplied)? Right now the behavior is to overlay data with a stack of background + signal, but maybe it would be useful to subtract the background from the data and compare with the signal?
I have been trying to perform fits using AmpTools using the group build of halld_sim. The details of the Signal events are below: a 1p state in positive reflectivity (PosRefl), mass range 1.10 - 1.15 GeV, and t range 0.15 - 0.30. When I fit, the fit converges with a positive log-likelihood. Not sure what is going on.
The details of the software version are shown below.
##################### Meta Info on software used ##############################
AMPTOOLS_HOME /group/halld/Software/builds/Linux_CentOS7.7-x86_64-gcc4.8.5/amptools/AmpTools-0.14.4 ; AMPTOOLS_VERSION 0.14.4
HALLD_SIM_HOME /group/halld/Software/builds/Linux_CentOS7.7-x86_64-gcc4.8.5/halld_sim/halld_sim-4.42.0 ; HALLD_SIM_VERSION 4.42.0
which fit : /group/halld/Software/builds/Linux_CentOS7.7-x86_64-gcc4.8.5/halld_sim/halld_sim-4.42.0/Linux_CentOS7.7-x86_64-gcc4.8.5/bin/fit
which omegapi_plotter : /group/halld/Software/builds/Linux_CentOS7.7-x86_64-gcc4.8.5/halld_sim/halld_sim-4.42.0/Linux_CentOS7.7-x86_64-gcc4.8.5/bin/omegapi_plotter
###############################################################################
The output log file is at volatile/halld/home/ksuresh/ampToolsFits/neutralb1/ClassicTests/TestingAmpTools/EmulatedSignal/OutputEmulated.log
To reproduce the problem on JLab's ifarm:
source /group/halld/Software/build_scripts/gluex_env_jlab.csh
Then, the source code is at
/lustre19/expphy/volatile/halld/home/ksuresh/ampToolsFits/neutralb1/ClassicTests/TestingAmpTools/EmulatedSignal
Commands to run:
./runAll.csh will generate Signal and PhaseSpace and then perform the fit.
./runSignal.csh and ./runPhaseSpace.csh generate Signal and PhaseSpace.
runFit.csh fits the generated Signal and PhaseSpace sample.
I confirmed that this issue still occurs with the master branch of AmpTools.
To run with the master version of AmpTools:
source /work/halld/ksuresh/MY_HALLD_SIM_CPU2/setup_gluex.csh
and retry runFit.csh.
In the new tag of 0.10.2 kMaxParticles was changed from 7 to 6. Was there a reason to reduce this, or can it be switched back to 7 for the next tag?
There is a report that if the config file does not have initialize commands for amplitudes, this will result in a NaN production parameter for amplitudes in MPI. This needs some investigation to see if there are obvious mistakes. Maybe the default parameter initialization is not properly conveyed through the MPI framework?
It may be useful if the package built using relative paths appropriate for a checkout of the repository. These could then be overridden by user environment variables if desired. A top-level Makefile might also be nice that builds AmpTools, AmpPlotter, and the Tutorial library and executables.
Some people would like to use the results of a fit to get event-by-event weights for custom plotting of anything from the event four-vectors. Need to come up with a method to accommodate this. Look at what Ryan and Nils were doing and try to find a way to make it more efficient.
Redefine output streams to avoid overlapping output in MPI environments.
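One possible shape for this, sketched with a hypothetical helper name, is to route each rank's stream to its own log file so output from different processes can never interleave (in a real build the rank would come from MPI_Comm_rank; here it is a parameter):

```cpp
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical sketch: redirect a stream (by default std::cout) to a
// per-rank log file such as amptools_rank3.log.  Returns the file name.
// Caveat: the static ofstream supports one redirect per process, which
// is all this sketch needs.
std::string redirectOutputForRank( int rank, std::ostream& os = std::cout )
{
  std::ostringstream name;
  name << "amptools_rank" << rank << ".log";
  static std::ofstream log;          // must outlive the redirected stream
  log.open( name.str() );
  os.rdbuf( log.rdbuf() );           // route this stream into the file
  return name.str();
}
```

An alternative is a prefixing streambuf that tags each line with the rank and still writes to a shared console, which keeps interactive use convenient at the cost of more code.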
If one inadvertently registers an amplitude or data reader in the AmpToolsInterface more than once, only the most recently registered one is managed by the interface. This leads to resource problems, e.g., too many ROOT files open in the case of data readers, since all of the registered data readers are not cleaned up.
Around line 194 in the AmpToolsInterface the package will print a warning if the data reader is missing. This could be the case, for example, if a user specifies a data reader that is not registered. A warning is not severe enough, as the program continues to run. There should be an error and exit here -- or elsewhere we should catch the case where a user specifies an unregistered data reader and exit.
if (!dataRdr)
  report( WARNING, kModule ) << "not creating a DataReader for data associated with reaction " << reactionName << endl;
if (!genMCRdr)
  report( WARNING, kModule ) << "not creating a DataReader for generated MC associated with reaction " << reactionName << endl;
if (!accMCRdr)
  report( WARNING, kModule ) << "not creating a DataReader for accepted MC associated with reaction " << reactionName << endl;
Mark Dalton reported issues with the normalization of the amplitudes when he uses signal and background files. Since he does accidental correction and sideband subtraction, both files contain events with positive and negative weights. There are a few pieces of the code which did not foresee this case:
I made a branch to fix this issue, but I am not allowed to push it to origin for review. Would it be OK to add my user name? Otherwise, I have to send the patch as a text file.
The current method to give 2 amplitudes the same phase value requires setting amplitudes to be real, and multiplying them by a phase value. This does not allow for the phase difference to be easily accessed via the FitResults
class, and requires one to calculate the phase difference and its error manually.
Adjust the build systems so that libraries that contain MPI, GPU, and normal code are better handled; it is easy now for users to mix variants of these. For example, when linking one often wants to link fitters that support MPI but plotters that do not. This should be better managed.
The GPU manager needs to be improved to select a different GPU device for multiple MPI processes running on the same machine.
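A common pattern for this is round-robin assignment by local rank, sketched below under stated assumptions: a real implementation would call cudaGetDeviceCount/cudaSetDevice, and the local rank would come from the launcher (OMPI_COMM_WORLD_LOCAL_RANK is Open MPI's variable; other launchers use different names). The function name is hypothetical.

```cpp
#include <cstdlib>

// Sketch: map MPI processes on one host to distinct GPUs by taking the
// launcher-provided local rank modulo the number of visible devices.
int selectDeviceForLocalRank( int deviceCount,
                              const char* envVar = "OMPI_COMM_WORLD_LOCAL_RANK" )
{
  const char* val = std::getenv( envVar );
  int localRank = val ? std::atoi( val ) : 0;   // no launcher info: use device 0
  return localRank % deviceCount;               // round-robin over the GPUs
}
```

The returned index would then be passed to cudaSetDevice before any device allocations are made.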
One gets a bad_alloc or other crash sometimes when running fits with multiple reactions. This happens at the end of the fit and it does not appear to affect the fit results. It seems to arise when the clear() function of the AmpToolsInterface is called.
Malte was able to avoid the bug by changing line 387 to
if (intensityManager(reactionName)) delete m_intensityManagers[irct];
and moving it down to the bottom of that block, after line 391.
The only class that is really cleaning memory is the AmplitudeManager and, while there are some memory gymnastics with the Amplitude classes, it appears everything is handled OK.
It looks like if a config file has extra unused parameters declared, e.g., parameter reactionscale 1.25, then this might cause undesirable behavior in functions like intensity() in FitResults objects. It looks like the generation of the ampParMap will try to fetch values for all parameters, and if these are not set up or used, then it will trigger an error. This is confusing behavior and this incorrect use case should be handled better.
AmpToolsInterface::resetConfigurationInfo calls AmpToolsInterface::clear. clear deletes the DataReader objects stored in m_uniqueDataSets, and m_uniqueDataSets stores DataReaders created by m_userDataReaders->newDataReader. Inside UserDataReader.h, the m_dataReaderInstances map needs to be cleared when the AmpToolsInterface is cleared; otherwise, after the first (or second?) pass, the objects it points to have been deleted, and they will not be created again when newDataReader is called again. This results in a corrupted-memory error.
Why would someone call resetConfigurationInfo multiple times, you ask? Not in any of the standard tools. I do it with these nifty fits to reset the state multiple times to find a good starting position.
parameter commands in the seed file written by FitResults are missing a newline character -- maybe these should be omitted altogether from the seed, since they are declaration-and-initialization commands rather than purely initialization commands
If the size of the permutation iterator doesn't match the number of particles then the user is feeding four-vectors to the framework that are inconsistent with the number of particles in the reaction vector. This currently causes a failed assertion but only on the GPU. This is an easy mistake to make and should probably be checked for elsewhere with an appropriate message.
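A CPU-side guard could catch this before any kernel runs. The function below is a hypothetical sketch (name and message invented here): it compares the two counts and emits an actionable error instead of letting a GPU-only assertion fire.

```cpp
#include <cstdio>
#include <cstddef>

// Sketch: verify that the number of four-vectors supplied per event
// matches the reaction's declared particle count, and report a clear
// error rather than failing an assertion deep in the GPU code.
bool checkParticleCount( std::size_t nFourVectors, std::size_t nParticles,
                         const char* reactionName )
{
  if( nFourVectors == nParticles ) return true;
  std::fprintf( stderr,
    "ERROR: reaction %s declares %zu particles but the event data supplies "
    "%zu four-vectors -- check the reaction definition and data reader.\n",
    reactionName, nParticles, nFourVectors );
  return false;
}
```

The natural place for such a check is wherever events are first loaded, so the same mistake is caught identically on CPU and GPU builds.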
Would be nice to check an environment variable and let the user disable the banner screen to the console that has the AmpTools version, etc..
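The check itself is one line; the variable name below is purely a suggestion, not an existing AmpTools convention:

```cpp
#include <cstdlib>

// Sketch: suppress the startup banner when a (hypothetical) environment
// variable such as AMPTOOLS_SUPPRESS_BANNER is set to any value.
bool bannerEnabled()
{
  return std::getenv( "AMPTOOLS_SUPPRESS_BANNER" ) == nullptr;
}
```

The banner-printing code would then be wrapped in `if( bannerEnabled() ){ ... }`.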
We need a mechanism to reset the errors and step sizes when one resets parameters. This is done for parameter scans or repeated fits with random starting points. If one fit fails the step sizes become inappropriate to ensure reliable performance in subsequent fits.
Looks like a call to mnrset might be needed somewhere.
We need to check that if a different file, e.g., a .cfg file, is passed in instead of a .fit file then this doesn't generate an obscure crash. A problem was reported where if one runs:
$DALITZ/DalitzExe/plotResults dalitz1.cfg dalitz1.fit dalitz1.root
as instructed in the tutorial, then a strange crash occurs.
$DALITZ/DalitzExe/plotResults dalitz1.fit dalitz1.root
produces the desired outcome.
Add functionality in the ATI to set the parameters in AmplitudeManager and related classes based on the results of the fit instead of what is in the configuration file. This would allow easier MC generation based on the output of a fit.
The stacked 2D histograms are relatively useless in the plotter. This commonly comes up when wanting to compare, e.g., data to a sum of signal + background in some 2D plane. The problem is that the plotter draws the signal + background as a 2D stack, which is not helpful. It would be better to create a single 2D histogram that is the sum and draw it instead. This requires a little reworking of the PlotFactory class.
Hello developer,
On page 11 of the doc (https://github.com/mashephe/AmpTools/blob/master/Tutorials/Dalitz/doc/doc.pdf), it says hm12gen is the acceptance-corrected result. However, from the name, hm12acc seems to be the acceptance-corrected result. Is that a typo, or did I misunderstand something?
Thanks,
Dazhi
I am loading the same MC sample for 4 different orientations in my SDME analysis. I wanted to profit from the recent PR #37 to reuse the memory, but I see a segmentation violation connected to the bootstrap data reader with the current master:
Interestingly enough, this happens even without requesting the bootstrap data reader. Here are two config files:
/work/halld2/home/aaustreg/tmp/4matt/bootstrap/BS_123.cfg (for bootstrapping)
/work/halld2/home/aaustreg/tmp/4matt/bootstrap/test.cfg (standard fit)
The crash happens with and without gpu acceleration. It works without problems for the latest release 0.13.1.
It is not clear how the scale keyword in the configuration file works. I have an amplitude like this:
T_tot = a_1 * T_1 + a_2 * T_2
where the T's are my amplitudes and the a's are the coefficients. It turns out the fitted results depend on the number of events I have, i.e. more events produce larger fitted coefficients. What I want to do is to keep the magnitude of the a's normalized, but I am not sure if this is possible with the scale keyword.
It looks like the likelihood returned by the AmpToolsInterface is not the full likelihood in some cases. It sums the likelihood contributions for the various reactions, but there may be contributions to the likelihood from GaussianBound parameters or user-defined likelihood contributions. These will not be included.
To evaluate the fit quality (for example, for SDME analyses), measured distributions and MC distributions weighted with fit results are compared as seen in FIG. 6 in
https://halldweb.jlab.org/DocDB/0055/005576/015/Paper_Draft__rho_SDMEs_2233003.pdf
But at least for this rho-meson analysis, the distributions are determined mainly by acceptance effects and it's very hard to tell whether the fit is really good or not by just seeing this comparison.
I think the comparison between the acceptance-corrected yield and the model is a somewhat better way to evaluate the fit. For example, in plot c) of FIG. 6 in the above rho-meson paper draft, the standard comparison gives the impression that the fit works well because the discrepancy looks small. But in fact, when you check the acceptance-corrected yield vs the model, clear systematic discrepancies are observed.
In short, I think it's better to check this acceptance-corrected yield ( = ( data - background ) / acceptance ) in the plotter, and what is the easiest way to achieve this?
I'd like to hear your opinions and raise this feature request if this is not implemented.
Thanks!
Keigo Mizutani
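The per-bin quantity requested above could be sketched as follows; the function and argument names are hypothetical, and the acceptance in each bin is taken as accepted MC over generated MC:

```cpp
#include <vector>
#include <cstddef>

// Sketch: corrected yield = ( data - background ) / acceptance per bin,
// with acceptance = accMC / genMC.  Bins with no MC are left at zero
// rather than dividing by zero.
std::vector<double> correctedYield( const std::vector<double>& data,
                                    const std::vector<double>& background,
                                    const std::vector<double>& accMC,
                                    const std::vector<double>& genMC )
{
  std::vector<double> out( data.size(), 0.0 );
  for( std::size_t i = 0; i < data.size(); ++i ){
    if( genMC[i] <= 0 || accMC[i] <= 0 ) continue;   // empty bin: leave 0
    double acceptance = accMC[i] / genMC[i];
    out[i] = ( data[i] - background[i] ) / acceptance;
  }
  return out;
}
```

In the plotter this would amount to filling one extra histogram per projection from existing inputs, though the error propagation (correlated MC statistics in the acceptance) deserves care.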
When running MPI jobs the checks on sums of weights in the dataTerm() method of the Likelihood calculator get applied to the subset of events on the particular node. In reality, they should only be applied to the sums over all nodes on the leader job.
If one is doing many fits in succession, trying to "finalize" each fit will force a flush/reload of the MC. For some applications, there is a non-trivial amount of time needed to load data. If there is sufficient memory, there is no need to flush the MC once loaded. In addition, there are reports of GPU malloc errors after many iterations. Perhaps GPU device memory becomes fragmented and there are no large blocks available as needed for some calculations. The code should be modified to allow the generated MC to be persistent in memory.
Add an option that will write out the FitResults text file periodically during a long fit.