Giter Site home page Giter Site logo

Comments (13)

Sinclert avatar Sinclert commented on September 17, 2024 2

The fix was merged, and version 0.9.3 published

Closing issue.

from madminer.

alexander-held avatar alexander-held commented on September 17, 2024 1

Hi @rbarrue, the changes seem to come from the refactor in #483. The comment above the code you linked (# Filter out callable definitions when saving into HDF5 file) seems to imply that this may be on purpose. In case @Sinclert might have time to take a look or remember more about this, he might know.

What was previously used to refer to the custom observable in the file? If you have a minimal reproducible example of this that you could share, this would be very useful to understand what the behavior should be. I'm happy to help with the Python side of things, but do not know this codebase very well unfortunately.

from madminer.

Sinclert avatar Sinclert commented on September 17, 2024 1

@rbarrue I think I found the "bug":

📝 Context:

After the migration of column-oriented models to object-oriented models described by issue #482 (to which PRs #483, #484 and #485 belong) was completed, both LHEReader and DelphesReader classes are safety-checking the definition of their internal models, to ensure that they do not have SyntaxErrors (NameErrors are actually fine at this stage).

You can see that all reader-based data classes do so:

This standardisation of safety-checks within models creation has caused that, classes that did not include them in the first place (like the LHEReader class), are now doing so (and for good!). You can check how reader classes differ in this approach before Madminer v0.9.0:


🐞 The bug:

Now that both LHEReader and DelphesReader classes use safety-checks under-the-hood, an already existing bug within the Madminer codebase has resurfaced:

Setting the empty string value ("") as default for callable Observable definitions (in this line), is incompatible with safety-checking Observable models at creation (these lines).


🔧 The solution:

The solution would involve using a dummy parseable Python symbol (one that when doing eval(<symbol>) does not raise any errors), so that when safety-checking the callables, no exception is raised. I think the best candidate is None as a string (therefore "None").

Knowing that using the empty string is a problem when safety-checking using Python's eval(), this fix can also be included in other serialization functions, such as:

from madminer.

rbarrue avatar rbarrue commented on September 17, 2024 1

@rbarrue can you confirm whether or not the described bug-fix is enough to solve your problem?

Hi @Sinclert, I'll test the proposed implementation tomorrow and give feedback.

from madminer.

irinaespejo avatar irinaespejo commented on September 17, 2024

Hi @rbarrue

Can you provide us with the snippet where you use add_observable_from_function? Maybe there are useful hints.
Also posting the entire error output can be helpful.

Thank you!

from madminer.

Sinclert avatar Sinclert commented on September 17, 2024

Hey @rbarrue 👋🏻

As Alex pointed out, the hdf5.py interface suffered a big refactor in PR #483. Among the changes, multiple HDF5 keys were renamed, making pre madminer 0.9.0 setting files incompatible with madminer 0.9.0+ HDF5 parsing (as specified in the 0.9.0 release notes). Make sure you have generated your settings with the latest madminer version (0.9.2).

Considering that has been done, I can see that the referenced lines:

# Filter out callable definitions when saving into HDF5 file
observable_defs = [d if isinstance(d, str) else "" for d in observable_defs]
observable_defs = _encode_strings(observable_defs)

were already present before the refactor:

for key in observable_names:
definition = observables[key]
if isinstance(definition, str):
observable_definitions.append(definition.encode("ascii", "ignore"))
else:
observable_definitions.append("".encode("ascii", "ignore"))

So... if you are having problems with an observable definition that worked fine pre v0.9.0, it may be a bug. As my colleagues have pointed out, could you share your set up?

  • What version of madminer are you using to load the settings.
  • What version of madminer was used to save the settings.
  • What custom observables did you include in the madminer settings.

from madminer.

rbarrue avatar rbarrue commented on September 17, 2024

I was running this up until now on MadMiner v0.7.6 (it was working fine there), now I'm rerunning everything (even the setup file creation) on v0.9.2. The calculation of the observables is done properly, since the values are in the 'observations' column in the analysed .h5 files. To mention that this problem appears already in the processing stage if I do lhe.save(shuffle=True).

I'm uploading a minimal working example here with two 50k event .lhe files (sorry for the size, but these were the smallest ones I had), which are analysed and can then be combined at the end. I've already produced a set of analysed .h5 files which you can inspect, and if you want to see the error just run the code as-is, with the process_events functions commented. If you want to reanalyse the files, just uncomment the lines.

@irinaespejo the error is:
File "/cvmfs/sw.el7/python/3.7.2/lib/python3.7/site-packages/madminer/sampling/combine.py", line 127, in combine_and_shuffle
) = load_madminer_settings(filename, include_nuisance_benchmarks=False)
File "/cvmfs/sw.el7/python/3.7.2/lib/python3.7/site-packages/madminer/utils/interfaces/hdf5.py", line 115, in load_madminer_settings
observables[o_name] = Observable(o_name, o_def)
File "", line 6, in init
File "/cvmfs/sw.el7/python/3.7.2/lib/python3.7/site-packages/madminer/models/readers.py", line 50, in post_init
eval(self.val_expression)
File "", line 0

^

from madminer.

rbarrue avatar rbarrue commented on September 17, 2024

(sorry, closed this on accident)
So, the solution would simply be to replace

observable_defs = [d if isinstance(d, str) else "" for d in observable_defs] -> observable_defs = [d if isinstance(d, str) else "None" for d in observable_defs] and do the same in the NuisanceParameter and Systematics operations ?

from madminer.

Sinclert avatar Sinclert commented on September 17, 2024

Yes. That should be it.

Could you run your analysis either (A) modifying your local copy of madminer, or (B) installing a customised fork of madminer, to verify whether or not my proposal fixes your crash?

If the answer is positive, I am open to review a PR including those changes in addition to the necessary tests so that this loading - saving setting inconsistency do not happen again. Finally, I could release version 0.9.3.

from madminer.

Sinclert avatar Sinclert commented on September 17, 2024

@rbarrue can you confirm whether or not the described bug-fix is enough to solve your problem?

from madminer.

rbarrue avatar rbarrue commented on September 17, 2024

Hi @Sinclert, I tested the proposed changes and everything seems to work as expected. In your comment, you mentioned "in addition to the necessary tests so that this loading - saving setting inconsistency do not happen again", are you talking about adding some tests to the tests folder ?

from madminer.

Sinclert avatar Sinclert commented on September 17, 2024

Hi @rbarrue ,

Yes, exactly. Right now the tests folder is pretty empty, but I (we?) would like that to change.

What a better opportunity to start? 🙂

I would envision creating a tests/utils/interfaces/test_hdf5.py file (mimicking the folder structure under madminer), with a small set of test functions covering the saving-loading of madminer settings, in order to check that the model definitions being saved (whether it is Observable,NuisanceParameter or Systematics ) are valid upon eval(self.val_definition) (such as the one done within the models).

from madminer.

rbarrue avatar rbarrue commented on September 17, 2024

Thank you very much @Sinclert !

from madminer.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.