
Comments (11)

gabrielastro avatar gabrielastro commented on July 17, 2024 1

Thanks a lot! I can confirm that UltraNest also works in parallel. Under the line with Nested sampling with UltraNest, might it be good to have some function of UltraNest itself (i.e., not purely from species, which is done at the beginning) confirm to the user that N processors are indeed being used and seen by UltraNest? The same applies to MultiNest. This could save users with a faulty set-up a lot of time by making diagnosis easier. (I will change the title of this thread because parallel running now works with both samplers.)
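A minimal stand-alone check along these lines (the script name and the serial fallback are illustrative on my part, not species code) could be:

```python
# Quick MPI sanity check; run as e.g.:  mpiexec -n 4 python check_mpi.py
# Assumption: mpi4py is installed; otherwise fall back to serial values.
try:
    from mpi4py import MPI

    rank = MPI.COMM_WORLD.Get_rank()
    size = MPI.COMM_WORLD.Get_size()
except ImportError:
    rank, size = 0, 1  # serial fallback when mpi4py is unavailable

print(f"Process {rank} of {size}")
```

If this prints N distinct lines for N requested processes, the MPI set-up is consistent; if it prints "Process 0 of 1" N times, N independent instances are running instead.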

On a related note, would it be possible to allow resume=False as an argument to fit.run_ultranest()? I guess it would correspond to 'overwrite' (i.e., for when one is not trying to resume a run). It would make the interface more intuitive (True and False).

And finally, UltraNest has this very convenient compact output while running:

Creating directory for new run ultranest/run1
[ultranest] Sampling 1000 live points from prior ...
[ultranest] Widening roots to 1196 live points (have 1000 already) ...
[ultranest] Sampling 196 live points from prior ...
[ultranest] Widening roots to 1427 live points (have 1196 already) ...
[ultranest] Sampling 231 live points from prior ...
3235.3(14.70%) | Like=3253.39..3258.58 [3253.3935..3253.3948]*| it/evals=18315/2127774 eff=0.8413% N=1000

Especially the percentage is useful, and it remains compact. Would it be possible to have this for MultiNest too?

from species.

tomasstolker avatar tomasstolker commented on July 17, 2024 1

That would also be at the MultiNest level actually...


gabrielastro avatar gabrielastro commented on July 17, 2024

By the way, the HDF5 version that h5py was built against (1.12.2) should have SWMR (Single-Writer/Multiple-Reader) support:

from h5py import version
from h5py import h5
print('  version.hdf5_version_tuple = ', version.hdf5_version_tuple)
print('  h5.get_config().swmr_min_hdf5_version = ', h5.get_config().swmr_min_hdf5_version)

yields, when run in the parallel environment:

  version.hdf5_version_tuple =  (1, 12, 2)
  h5.get_config().swmr_min_hdf5_version =  (1, 9, 178)

Therefore, from what I see in ~/.local/lib/python3.9/site-packages/h5py/_hl/files.py, SWMR should be possible. I tried adding swmr=True (i.e., with h5py.File(self.database, "r", swmr=swmr) as h5_file: in species/read/read_object.py), but this did not help.

Edit: Actually, it did help because now the error is at a different location:

"[…]/species/read/read_model.py", line 141, in open_database
 with h5py.File(self.database, "a") as hdf5_file

but there it is done only to # Test if the spectra are present in the database (?), so append mode should not be needed there. Other calls to the class do need write access, however. I looked into the code, and it looks a bit too involved for me to change the different calls throughout, pass the parameters along, etc., so I would leave this to the developer 😉.
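As a sketch of the idea (the file and group names here are made up for illustration, not the actual species database layout), a membership test needs only read mode:

```python
import h5py

# Create a small demo file (a stand-in for the species database)
with h5py.File("demo_database.hdf5", "w") as hdf5_file:
    hdf5_file.create_group("models/drift-phoenix")

# Read-only mode ("r") is enough to test whether a group exists,
# so append mode ("a") is not required for this kind of check
with h5py.File("demo_database.hdf5", "r") as hdf5_file:
    present = "models/drift-phoenix" in hdf5_file

print(present)
```

Opening in "r" instead of "a" avoids taking a write lock, which is what blocks concurrent readers in the parallel case.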


tomasstolker avatar tomasstolker commented on July 17, 2024

Thanks for opening this issue! The mode was indeed incorrectly set in ReadModel when the HDF5 file was opened. It should be fixed in commit 093df1d.


gabrielastro avatar gabrielastro commented on July 17, 2024

Excellent! Thank you very much for your quick fix. It works ✔️! Now MultiNest runs in parallel 😄.

By the way (I can open a separate "Issue" if you prefer): when run in parallel, it might be good if species printed the start-up messages (and maybe also those for setting up the fit: Interpolating Data… [DONE] and so on) prefixed with the process number. A common problem when trying to run programs in parallel is that, instead of one instance with N processes, N instances of the program run simultaneously. (This typically comes from using different MPI versions for compiling and for running.) If something like [proc N of M processors] were printed, at least once at start-up, it would be a good confirmation for the user that things are OK. Currently, there is no real way to tell.


tomasstolker avatar tomasstolker commented on July 17, 2024

Feel free to create a pull request for that. It would be low priority to implement from my side since it seems to be running fine.


gabrielastro avatar gabrielastro commented on July 17, 2024

Ok. Yes, it runs fine! As a minimal version, how about the following for the beginning of SpeciesInit() in species/core/species_init.py:

        # Report the MPI rank and size when running with multiple processes
        try:
            from mpi4py import MPI

            mpi_rank = MPI.COMM_WORLD.Get_rank()
            mpi_size = MPI.COMM_WORLD.Get_size()
            species_parallel_msg = f"Proc. {mpi_rank} of {mpi_size}"
        except ModuleNotFoundError:
            species_parallel_msg = ""

        species_msg = f"species v{species.__version__}"

        mess_len = max(len(species_msg), len(species_parallel_msg))
        print(mess_len * "=")
        print(species_msg)
        if species_parallel_msg:
            print(species_parallel_msg)
        print(mess_len * "=")

or perhaps set species_parallel_msg to a non-empty string only if additionally mpi_size > 1, but maybe it is better as is, so that the user may think: "Oh? I could use more than one processor? How nice!" It is probably not quite in the official species coding style, so maybe this can be seen as an informal, poor man's pull request 😁? I tested it, and it works fine both in parallel and without mpi4py support (setting sys.modules['mpi4py'] = None beforehand to fake its absence).
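The fake-missing-module trick mentioned above can be exercised on its own (the message value is just a placeholder for this demonstration):

```python
import sys

# Pretend mpi4py is not installed: a None entry in sys.modules
# makes the import raise ModuleNotFoundError (Python 3.6+)
sys.modules["mpi4py"] = None

try:
    from mpi4py import MPI

    species_parallel_msg = "Proc. 0 of 1"
except ModuleNotFoundError:
    species_parallel_msg = ""

print(repr(species_parallel_msg))  # the fallback branch was taken
```

This is handy for testing the serial code path on a machine where mpi4py is actually installed.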


tomasstolker avatar tomasstolker commented on July 17, 2024

Thanks for the suggestion! I have implemented this in commit ed70768 👍.


tomasstolker avatar tomasstolker commented on July 17, 2024

Especially the percentage is useful, and it remains compact. Would it be possible to have this for MultiNest too?

That would be a feature request for (Py)MultiNest!


gabrielastro avatar gabrielastro commented on July 17, 2024

Of course, that is one way :). I can try asking. Another thing that might be at the species level, though, is making MultiNest cancellable (interruptible) with Ctrl+C, as UltraNest is. Currently, I need to pause IPython with Ctrl+Z and kill %% it, which is not elegant and (over)kill. I tried to find which part of UltraNest does the elegant screen updates but could not; maybe the two (display and interruptibility) could be handled together…
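One generic pattern for this (an assumption on my part, not necessarily how UltraNest actually does it) is to trap SIGINT and let the sampling loop exit cleanly at the next safe point:

```python
import signal

stop_requested = False

def _handle_sigint(signum, frame):
    # Record the interrupt instead of raising KeyboardInterrupt,
    # so a long-running sampler loop can stop at a safe point
    global stop_requested
    stop_requested = True

signal.signal(signal.SIGINT, _handle_sigint)

# Inside the sampler loop one would then check:
#   if stop_requested: save state and return
signal.raise_signal(signal.SIGINT)  # simulate pressing Ctrl+C
print(stop_requested)
```

Whether this can work for MultiNest depends on the Fortran layer yielding control back to Python between iterations, which is exactly the complication with wrapped native samplers.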


gabrielastro avatar gabrielastro commented on July 17, 2024

Ok! Thanks. I guess one usually submits a script on a cluster, where the job can be killed, or uses a Jupyter notebook, where the kernel can be stopped, so this inelegant "pause and kill" is usually not much of an inconvenience…

