
Comments (11)

gabrielastro avatar gabrielastro commented on July 17, 2024 1

Thanks a lot! I can confirm that UltraNest also works in parallel. Under the line with Nested sampling with UltraNest, might it be good to have some function of UltraNest itself (i.e., not purely from species, which is done at the beginning) confirm to the user that N processors are indeed being used and seen by UltraNest? The same applies to MultiNest. This could save users with a faulty set-up a lot of time by making diagnosis easier. (I will change the title of this thread because parallel running now works with both samplers.)
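A minimal stand-alone check along these lines (the script name and the serial fallback are illustrative on my part, not species code) could be:

```python
# Quick MPI sanity check; run as e.g.:  mpiexec -n 4 python check_mpi.py
# Assumption: mpi4py is installed; otherwise fall back to serial values.
try:
    from mpi4py import MPI

    rank = MPI.COMM_WORLD.Get_rank()
    size = MPI.COMM_WORLD.Get_size()
except ImportError:
    rank, size = 0, 1  # serial fallback when mpi4py is unavailable

print(f"Process {rank} of {size}")
```

If this prints N distinct lines for N requested processes, the MPI set-up is consistent; if it prints "Process 0 of 1" N times, N independent instances are running instead.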

On a related note, would it be possible to allow resume=False as an argument to fit.run_ultranest()? I guess it would correspond to 'overwrite' (i.e., for when one is not trying to resume a run). It would make the interface more intuitive (True and False).

And finally, UltraNest has this very convenient compact output while running:

Creating directory for new run ultranest/run1
[ultranest] Sampling 1000 live points from prior ...
[ultranest] Widening roots to 1196 live points (have 1000 already) ...
[ultranest] Sampling 196 live points from prior ...
[ultranest] Widening roots to 1427 live points (have 1196 already) ...
[ultranest] Sampling 231 live points from prior ...
3235.3(14.70%) | Like=3253.39..3258.58 [3253.3935..3253.3948]*| it/evals=18315/2127774 eff=0.8413% N=1000

Especially the percentage is useful, and it remains compact. Would it be possible to have this for MultiNest too?

from species.

tomasstolker avatar tomasstolker commented on July 17, 2024 1

That would also be at the MultiNest level actually...


gabrielastro avatar gabrielastro commented on July 17, 2024

By the way, the HDF5 version that h5py was built against (1.12.2) should have SWMR (Single-Writer/Multiple-Reader) support:

from h5py import version
from h5py import h5
print('  version.hdf5_version_tuple = ', version.hdf5_version_tuple)
print('  h5.get_config().swmr_min_hdf5_version = ', h5.get_config().swmr_min_hdf5_version)

yields, when run in the parallel environment:

  version.hdf5_version_tuple =  (1, 12, 2)
  h5.get_config().swmr_min_hdf5_version =  (1, 9, 178)

Therefore, from what I see in ~/.local/lib/python3.9/site-packages/h5py/_hl/files.py, SWMR should be possible. I tried adding swmr=True (i.e., with h5py.File(self.database, "r", swmr=swmr) as h5_file: in species/read/read_object.py), but this did not help.

Edit: Actually, it did help because now the error is at a different location:

"[…]/species/read/read_model.py", line 141, in open_database
 with h5py.File(self.database, "a") as hdf5_file

but there it is done only to # Test if the spectra are present in the database (?), so append mode should not be needed there. Other calls to the class do need write access, however. I looked into the code, and it looks a bit too involved for me to change the different calls throughout, pass the parameters along, etc., so I would leave this to the developer 😉.
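As a sketch of the idea (the file and group names here are made up for illustration, not the actual species database layout), a membership test needs only read mode:

```python
import h5py

# Create a small demo file (a stand-in for the species database)
with h5py.File("demo_database.hdf5", "w") as hdf5_file:
    hdf5_file.create_group("models/drift-phoenix")

# Read-only mode ("r") is enough to test whether a group exists,
# so append mode ("a") is not required for this kind of check
with h5py.File("demo_database.hdf5", "r") as hdf5_file:
    present = "models/drift-phoenix" in hdf5_file

print(present)
```

Opening in "r" instead of "a" avoids taking a write lock, which is what blocks concurrent readers in the parallel case.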


tomasstolker avatar tomasstolker commented on July 17, 2024

Thanks for opening this issue! The mode was indeed incorrectly set in ReadModel when the HDF5 file was opened. It should be fixed in commit 093df1d.


gabrielastro avatar gabrielastro commented on July 17, 2024

Excellent! Thank you very much for your quick fix. It works ✔️! Now MultiNest runs in parallel 😄.

By the way (I can open a separate "Issue" if you prefer): when run in parallel, it might be good if species printed the start-up messages (and maybe also those for setting up the fit: Interpolating Data… [DONE] and so on) prefixed with the process number. A common problem when trying to run programs in parallel is that, instead of one instance with N processes, N instances of the program run simultaneously. (This typically comes from using different MPI versions for compiling and for running.) If something like [proc N of M processors] were printed, at least once at start-up, it would be a good confirmation for the user that things are OK. Currently, there is no real way to tell.


tomasstolker avatar tomasstolker commented on July 17, 2024

Feel free to create a pull request for that. It would be low priority to implement from my side since it seems to be running fine.


gabrielastro avatar gabrielastro commented on July 17, 2024

Ok. Yes, it runs fine! As a minimal version, how about the following for the beginning of SpeciesInit() in species/core/species_init.py:

        # Report the MPI rank and size when running with multiple processes
        try:
            from mpi4py import MPI

            mpi_rank = MPI.COMM_WORLD.Get_rank()
            mpi_size = MPI.COMM_WORLD.Get_size()
            species_parallel_msg = f"Proc. {mpi_rank} of {mpi_size}"
        except ModuleNotFoundError:
            species_parallel_msg = ""

        species_msg = f"species v{species.__version__}"

        mess_len = max(len(species_msg), len(species_parallel_msg))
        print(mess_len * "=")
        print(species_msg)
        if species_parallel_msg:
            print(species_parallel_msg)
        print(mess_len * "=")

or perhaps set species_parallel_msg to a non-empty string only if additionally mpi_size > 1, but maybe it is better as is, so that the user may think: "Oh? I could use more than one processor? How nice!" It is probably not quite in the official species coding style, so maybe this can be seen as an informal, poor man's pull request 😁? I tested it, and it works fine both in parallel and without mpi4py support (setting sys.modules['mpi4py'] = None beforehand to fake its absence).
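The fake-missing-module trick mentioned above can be exercised on its own (the message value is just a placeholder for this demonstration):

```python
import sys

# Pretend mpi4py is not installed: a None entry in sys.modules
# makes the import raise ModuleNotFoundError (Python 3.6+)
sys.modules["mpi4py"] = None

try:
    from mpi4py import MPI

    species_parallel_msg = "Proc. 0 of 1"
except ModuleNotFoundError:
    species_parallel_msg = ""

print(repr(species_parallel_msg))  # the fallback branch was taken
```

This is handy for testing the serial code path on a machine where mpi4py is actually installed.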


tomasstolker avatar tomasstolker commented on July 17, 2024

Thanks for the suggestion! I have implemented this in commit ed70768 👍.


tomasstolker avatar tomasstolker commented on July 17, 2024

Especially the percentage is useful, and it remains compact. Would it be possible to have this for MultiNest too?

That would be a feature request for (Py)MultiNest!


gabrielastro avatar gabrielastro commented on July 17, 2024

Of course, that is one way :). I can try asking. Another thing that might be at the species level, though, is making MultiNest cancellable (interruptible) with Ctrl+C, as UltraNest is. Currently, I need to pause IPython with Ctrl+Z and kill %% it, which is not elegant and (over)kill. I tried to find which part of UltraNest does the elegant screen updates but could not; maybe the two (display and interruptibility) could be handled together…
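One generic pattern for this (an assumption on my part, not necessarily how UltraNest actually does it) is to trap SIGINT and let the sampling loop exit cleanly at the next safe point:

```python
import signal

stop_requested = False

def _handle_sigint(signum, frame):
    # Record the interrupt instead of raising KeyboardInterrupt,
    # so a long-running sampler loop can stop at a safe point
    global stop_requested
    stop_requested = True

signal.signal(signal.SIGINT, _handle_sigint)

# Inside the sampler loop one would then check:
#   if stop_requested: save state and return
signal.raise_signal(signal.SIGINT)  # simulate pressing Ctrl+C
print(stop_requested)
```

Whether this can work for MultiNest depends on the Fortran layer yielding control back to Python between iterations, which is exactly the complication with wrapped native samplers.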


gabrielastro avatar gabrielastro commented on July 17, 2024

Ok! Thanks. I guess one usually submits a script on a cluster, where the job can be killed, or uses a Jupyter notebook, where the kernel can be stopped, so this inelegant "pause and kill" is usually not much of an inconvenience…

