Giter Site home page Giter Site logo

Comments (7)

ChrisRackauckas avatar ChrisRackauckas commented on June 4, 2024

Please share the MWE.

Are you appropriately setting the heap size when moving it to the HPC setting?

And BTW, did you try other BVP solvers like the MIRK ones? Generally those will scale better.

from differentialequations.jl.

novalex98 avatar novalex98 commented on June 4, 2024

Dear Dr. Rackauckas,

I'll develop an MWE. I can't share the code publicly, but I strongly suspect I can replicate it with a generic stiff problem.

The heap size is appropriate (I've tried 1G, 10G, and 50G. The total memory available to the program is 100G. They all behave the same) and I've shown that the problem is with garbage collection, not with the high performance computing environment: I can induce the same issue on my laptop if I add GC.gc() calls to the code. It only showed up on the HPC because garbage collection had to be called because of the limited heap size.

Regarding BVP solvers, this system is highly unstable, and will often have a lot of +infinity/-infinity values around the correct local minimum, so, for example, the Regula Falsi method doesn't find the local minimum. I'll look into the BVP solvers more, but I'm skeptical because its so unstable.

from differentialequations.jl.

novalex98 avatar novalex98 commented on June 4, 2024

I haven't quite replicated the issue, but I'm seeing a situation where garbage collection causes a change that seems related. The two attached codes are the same except that one runs garbage collection and the other doesn't.

The one without garbage collection gives a lot of

"Warning: dt() <= dtmin() at t =0.0"

The one with garbage collection gives far fewer warnings like this, and none after garbage collection runs the first time. in this case that doesn't stop it from pushing through (if anything it improves its performance) but its not clear to me why the appearance of these warnings depends on a garbage collection call later in the code.
Stiff_GC_MWE_3.txt
Stiff_GC_MWE.txt

from differentialequations.jl.

ChrisRackauckas avatar ChrisRackauckas commented on June 4, 2024

I'm not sure I understand the MWE:

  • tol_solve_ivp = 10^(-30) that's impossible to achieve with Float64 if relative values are around 10^4 for some things.
  • The jacobian function is not correct

My immediate guess just by looking at it is that there must be some uninitialized memory that is being used in the RadauIIA5 Jacobain, because it's an odd one. So two questions:

  1. one does this only happen with RadauIIA5 and no other solver?
  2. What happens if you add jac_mat .= 0 to the Jacobian function?

from differentialequations.jl.

novalex98 avatar novalex98 commented on June 4, 2024

"tol_solve_ivp = 10^(-30) that's impossible to achieve with Float64 if relative values are around 10^4 for some things."

That's odd to me because I've had it successfully converge with Radau.

I used Matlab's symbolic math toolbox to get my Jacobian. Where's the error?

from differentialequations.jl.

novalex98 avatar novalex98 commented on June 4, 2024

Adding jac_mat .= 0 to the start of the Jacobians in my original code fixed the problem, at least on my PC, alongside fixing a typo

(in one of my Jacobians I had a mistake of the form
"function(jac_u, u, p, t)
jac_v = [stuff]
return jac_v")
where the variable in the definition of the function and the variable returned were different.

from differentialequations.jl.

ChrisRackauckas avatar ChrisRackauckas commented on June 4, 2024

Okay awesome, this narrowed it down. Basically, for one of the Jacobians in RadauIIA5 there was uninitialized memory being used. In almost any normal case, someone would fill the J with memory, but for safety we normally default Jacobian memory to zero. Because it's uninitialized memory, if you GC vs not GC you'd get the random values in there. This is fixed by SciML/OrdinaryDiffEq.jl#2071 which adds the zeroing to W2, effectively doing the jac_mat .= 0 at construction time so you don't have to. That means any value not specified in the Jacobian defaults to zero, which is true is all other methods but RadauIIA5 is weird and had to define its own Jacobian and we missed adding the safety thing on that spot.

That's odd to me because I've had it successfully converge with Radau.

In theory it's possible, though Float64 can only guarantee a relative accuracy of around 1e-16, i.e. 1 + 1e-16 == 1 in floating point arithmetic, and thus if you have values near 1 (which you seem to have) then trying to make something converge to 1e-30 isn't likely to happen. It's possible, but not always.

I used Matlab's symbolic math toolbox to get my Jacobian. Where's the error?

The file that you sent had:

function Jacobian_func(jac_mat, u, p, t)
	p1, p2, p3, p4 = p
	
end

Not sure if that was intentional but it did expose the fact that the jac_mat was not edited and its memory was unutilized, so the example worked for the cause

where the variable in the definition of the function and the variable returned were different.

Yup so it was using the uninitialized memory, awesome this should have at least deterministic behavior now of treating the Jacobian as zero (and we should find a better way to error in these cases).

from differentialequations.jl.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.