Comments (6)

aportelli commented on June 18, 2024

Could I have the DB please?

from hadrons.

mmphys commented on June 18, 2024

Here are two very similar DBs with different schedules; both exhibit the issue.
db.tar.gz

mmphys commented on June 18, 2024

Grid Commit: 0174f5f742782d1b43e49213bd9d729f7094962e [0174f5f7]
Hadrons Commit: 04e06e8 [04e06e8]
I.e. the latest Grid, but Hadrons predates Fionn's latest changes and the HadronsXmlValidate fix.

aportelli commented on June 18, 2024

I can reproduce it. The error you get is about Grid pointers being inconsistent, which the DB knows nothing about (it does not save the geometries)... This is weird, let me investigate.

aportelli commented on June 18, 2024

Ufff that was a painful one... but it was a bug related to the DB indeed, so good catch!
The error message really did not suggest anything like that and was a randomly appearing, indirect side-effect of the problem.

The short version is: the DB was not restoring the storage type (standard, cache or tmp) of the objects. The momentum phase in the sink function got 'standard' by default instead of 'cache', which means it is subject to garbage collection. Because no later module needed it, it was destroyed as soon as the sink module ended. Later, in the meson contraction, the sink function is called, and it was sometimes possible to dereference the pointer to the destroyed object without triggering a bad access error.

So one additional issue is that the sink function was just capturing the address of the phase, which is fine but unsafe if the phase gets destroyed in the meantime. So once more, it is very important to use the envGet macro, as it actually checks that the object is still alive. This is a general comment; I actually wrote the unsafe code myself 😄.

It is now fixed in develop: the DB restores the storage types and the phases are accessed in a safer way (i.e. the bug would now result in a meaningful error message). Let me know if it works on your side.
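
For readers following along, the failure mode and the safer access pattern can be illustrated with a small standalone sketch. This is not Hadrons code: `ToyEnv`, `Storage`, `makeSink` and `sinkSurvivesGc` are hypothetical names invented for the illustration, the phase is reduced to a single number, and the garbage collector simply assumes that no later module needs any 'standard' object. The point is only that a sink which looks the phase up by name on every call fails with a clear error when the object has been collected, instead of dereferencing a stale address.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Toy stand-ins for illustration only; none of these names are Hadrons' API.
enum class Storage { standard, cache, temporary };

struct Object {
    Storage storage;
    double  value;  // placeholder for a lattice field such as a momentum phase
};

class ToyEnv {
public:
    void create(const std::string &name, Storage s, double value) {
        assert(!name.empty());
        objects_[name] = std::make_unique<Object>(Object{s, value});
    }

    // Destroys every 'standard' object. A real scheduler would first check
    // whether a later module still needs it; here we assume nobody does.
    void garbageCollect() {
        for (auto it = objects_.begin(); it != objects_.end();) {
            if (it->second->storage == Storage::standard) it = objects_.erase(it);
            else ++it;
        }
    }

    // Checked access: throws a meaningful error if the object is gone.
    const Object &get(const std::string &name) const {
        auto it = objects_.find(name);
        if (it == objects_.end())
            throw std::runtime_error("object '" + name + "' does not exist");
        return *it->second;
    }

private:
    std::map<std::string, std::unique_ptr<Object>> objects_;
};

// The sink captures the object's name rather than its address, so every call
// re-validates that the phase still exists. 'env' must outlive the sink.
std::function<double(double)> makeSink(const ToyEnv &env, const std::string &phaseName) {
    return [&env, phaseName](double x) { return env.get(phaseName).value * x; };
}

// Returns true if the sink can still be called after garbage collection.
bool sinkSurvivesGc(Storage s) {
    ToyEnv env;
    env.create("phase", s, 2.0);
    auto sink = makeSink(env, "phase");
    env.garbageCollect();  // stands in for the end of the sink module
    try {
        sink(1.0);
        return true;
    } catch (const std::runtime_error &) {
        return false;  // clean, reportable failure instead of undefined behaviour
    }
}
```

In this sketch, `sinkSurvivesGc(Storage::cache)` succeeds, while `sinkSurvivesGc(Storage::standard)` reports the missing object through an exception, which mirrors the case where the storage type is wrongly restored as 'standard'.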

This would have happened to us in production at one point or another, so thanks for the thorough testing; it saved us a lot of headaches.

mmphys commented on June 18, 2024

Awesome! Thanks for finding all that out and fixing it so quickly!
I'll recompile for both CPU and GPU then restart my jobs - should be a fairly thorough test.
I'll report back how it goes.
Thanks again
