healthypear / protopipe-grid-interface

Interface to DIRAC for the CTA prototype data analysis pipeline.

Home Page: https://cta-observatory.github.io/protopipe/

License: Other

Languages: Python 94.41%, Shell 0.97%, Dockerfile 4.62%
Topics: cta, grid, protopipe

protopipe-grid-interface's Introduction

hey there 👋

👩‍💻 About Me

- 🔭 I'm a post-doctoral researcher at the Max Planck Institute for Physics, working on very-high-energy astrophysics using imaging atmospheric Cherenkov telescopes
- 📚 I'm currently learning C++20 and Rust
- ⚡ In my free time I like to hike, climb, ski, drink...

🛠 Languages and tools

Python, Bash, Markdown, C, C++, Git, Codecov, NumPy, pandas, Jupyter, GCC, Docker, Vagrant, GitHub, GitLab, Bitbucket, Trello, Linux, Apple, VS Code, Vim, Atom


protopipe-grid-interface's People

Contributors

kosack


protopipe-grid-interface's Issues

Uniformise definition of SE lists throughout an analysis

Currently, such a list (which can be found with cta-prod-show-dataset DATASET_NAME --SEUsage) is relevant for:

  • protopipe-SUBMIT_JOBS when uploading the analysis configuration file,
  • protopipe-UPLOAD_MODELS when uploading models and configuration files.

In the first script this information is hardcoded (which is of course not good), and I am not sure how to pass a list as a DIRAC script argument (I have the feeling that each argument should be unique; @arrabito?).
In the second script I use argparse, which is of course more flexible.

The best option would be to add the list to the analysis_metadata.yaml file (either automatically, from the command above, or leaving this to the user), so that any time those scripts are run it is the default list of SEs. It would also be good to let protopipe-SUBMIT_JOBS override this if necessary, although that brings back the same issue as above.

UPDATE

The best option might actually be to use grid.yaml, as any script can easily retrieve it through analysis_metadata.yaml and protopipe-SUBMIT_JOBS uses it directly.
Still, a fallback default list of SEs should always be defined in case that file is not available (or the user did not override the values from the CLI).
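
A minimal sketch of that resolution order (CLI override, then grid.yaml, then a hardcoded fallback); the GRID.se_list key and the SE names are assumptions for illustration:

```python
import argparse

import yaml

# Hardcoded fallback, used when grid.yaml is missing or incomplete
# (the SE names here are purely illustrative).
DEFAULT_SE_LIST = ["CC-IN2P3-Disk", "DESY-ZN-Disk"]

def resolve_se_list(grid_yaml_path, cli_se_list=None):
    """Return the SE list: CLI override > grid.yaml > hardcoded default."""
    if cli_se_list:  # an explicit override from the command line wins
        return cli_se_list
    try:
        with open(grid_yaml_path) as f:
            grid_cfg = yaml.safe_load(f)
        return grid_cfg["GRID"]["se_list"]  # hypothetical key
    except (OSError, KeyError, TypeError):
        return DEFAULT_SE_LIST

parser = argparse.ArgumentParser()
parser.add_argument("--se-list", nargs="+", default=None,
                    help="override the default list of SEs")
args = parser.parse_args()
se_list = resolve_se_list("grid.yaml", args.se_list)
```

Note that argparse's nargs="+" also answers the question of how to accept a list from the command line.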

Test upgrade to Python 3

Technically the interface to (CTA)DIRAC already supports python3, but this needs to be tested.

If so, this would mean that we can merge the interface with the pipeline and build an all-in-one container.

Create Docker container for a Python 3-based installation

After #50 this will most likely be a strict requirement only for Windows users, for whom I am pretty sure DIRAC has no support (or at best it is not tested).

In any case, it will provide an isolated and reproducible environment for everyone who wants it.

Managing a protopipe-dev environment with (CTA)DIRAC

After the recent changes in the pipeline, the environment also needs pandas, which is not required by the usual pure ctapipe installation, where it is only used for tests and/or docs.

NOTE: the environment for ctapipe 0.11.0 has been updated with pandas for now

The installation of protopipe in its conda development environment, from the local repo copy sent to the grid, should be added to the submit_jobs.py script; see the sketch below.
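
A sketch of what that addition could look like; the environment name, the archive name, and the sandbox layout are assumptions:

```python
# Hypothetical sketch: shell steps that submit_jobs.py could prepend to the
# job executable so the grid node installs the shipped protopipe copy into
# the conda development environment.
pilot_steps = [
    "source activate protopipe-dev",  # hypothetical dev environment name
    "tar xzf protopipe.tar.gz",       # local repo copy shipped in the sandbox
    "pip install -e ./protopipe",     # editable install into the active env
]
job_script = " && ".join(pilot_steps)
```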

Improve submitted job name

Example of current job names

job_analysisName_particletype_runXXX_cleaningMode

where cleaningMode is either "tail" for "tailcut" or "wave" for "wavelet".

Better naming

analysisName_analysisStep_particletype_runXXX

where analysisStep can be TRAINING_energy, TRAINING_classification or DL2
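
A minimal sketch of the proposed scheme (the analysis name in the example and the 3-digit run padding are illustrative):

```python
def build_job_name(analysis_name, analysis_step, particle_type, run_id):
    """Compose analysisName_analysisStep_particletype_runXXX, where
    analysis_step is TRAINING_energy, TRAINING_classification or DL2."""
    return f"{analysis_name}_{analysis_step}_{particle_type}_run{run_id:03d}"

# e.g. build_job_name("prod5_zen20", "TRAINING_energy", "gamma", 42)
# -> "prod5_zen20_TRAINING_energy_gamma_run042"
```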

Make particle type and n_files_per_job command-line options in SUBMIT_JOBS

Is your feature request related to a problem? Please describe.

Submitting the jobs for several particle types requires hand-editing grid.yaml each time, which is error-prone and makes it hard to automate. These values must change per particle type to minimize the total submission time while keeping a reasonably even number of showers per job (so jobs take similar amounts of time). For example:

| particle | num files | files/job | showers/file | jobs | showers/job | submission time (h) |
|----------|-----------|-----------|--------------|------|-------------|---------------------|
| gamma    | 4000      | 10        | 20,000       | 400  | 200,000     | 0.7                 |
| proton   | 24000     | 20        | 50,000       | 1200 | 1,000,000   | 2                   |
| electron | 2000      | 4         | 50,000       | 500  | 200,000     | 0.8                 |

Describe the solution you'd like

Allow setting --files-per-job=<int> and --particle=<type> as command-line options
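
A minimal argparse sketch of the two proposed options; the choices and the fallback behavior are illustrative:

```python
import argparse

parser = argparse.ArgumentParser(prog="protopipe-SUBMIT_JOBS")
parser.add_argument("--particle", choices=["gamma", "proton", "electron"],
                    default=None, help="particle type (overrides grid.yaml)")
parser.add_argument("--files-per-job", type=int, default=None,
                    help="number of input files per job (overrides grid.yaml)")
args = parser.parse_args()
```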

Would you like to contribute with a Pull Request?

Yes

Pin gfal2 to 2.20.2

If the version is greater (only 2.20.4 has been tested), this leads to a weird crash during e.g. the upload of files to DIRAC, where a line of code in this dependency tries to access

errno.ECOMM

which raises

AttributeError: module 'errno' has no attribute 'ECOMM'

which seems weird at first, since the errno documentation for the Python in use (3.8.12) lists this attribute. However, the errno module only exposes the error codes actually defined by the platform's C headers, and ECOMM is not defined on macOS.

It's a bug in gfal2 and it seems to affect only some Unix-based systems (macOS, but not e.g. the Linux instances on the ccin2p3 machines).
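
For reference, a portable lookup would avoid the crash; this is a sketch of the pattern, not gfal2's actual code:

```python
import errno

# errno only exposes the error codes defined by the platform's C headers,
# so portable code must not assume ECOMM exists (it does not on macOS).
ECOMM = getattr(errno, "ECOMM", None)

def is_comm_error(code):
    """True if `code` is ECOMM, on platforms that define it."""
    return ECOMM is not None and code == ECOMM
```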

Containerization

Requirements

  • Add miniconda3 to container (see #52)

List of containers

  • Singularity (should be just a matter of bind mounts, but at the moment I do not plan to maintain it actively...)
  • #52

Merge this repository as a protopipe module

Requirements

Description

The idea is to make the interface a module of protopipe, e.g. protopipe.dirac.

The migration has to be done in a way that the history of this repository doesn't get lost (e.g. via git subtree or a git filter-repo import, which preserve commit history).

test and dry modes do not behave as expected

test launches only the first file and not the first job

dry seems to use 1 file less when more than 1 is selected per job

In general, the expected behavior is that:

  • test launches the 1st job of the batch as it has been defined (so if there are N files per job, only 1 job of N files should be submitted) + config files
  • dry should do the same, but without actually submitting anything (not even configuration files)
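
In code, the expected semantics could look like this sketch (function names are illustrative, not the current implementation):

```python
def select_batch(input_files, n_files_per_job, test=False):
    """Group input files into jobs; in test mode keep only the first job."""
    jobs = [input_files[i:i + n_files_per_job]
            for i in range(0, len(input_files), n_files_per_job)]
    if test:
        return jobs[:1]  # the whole 1st job (N files), not just the 1st file
    return jobs

def submit(jobs, dry=False):
    for job_files in jobs:
        if dry:
            print(f"[dry] would submit a job with {len(job_files)} files")
            continue  # dry mode submits nothing, not even config files
        ...  # the actual DIRAC submission would go here
```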

Add check for the size of a merged file

This could be as simple as checking that it is larger than the last single file, or the more precise check that its size equals the sum of the single files of the bunch (see the sketch below).

NOTE: there could be a problem if one or more of the files in the bunch is not processed at the next level (e.g. the simtel file is corrupt).
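
A sketch of both checks, assuming sizes in bytes; the exact-sum variant presumes the merged file is a plain concatenation, which may not hold exactly for HDF5 tables:

```python
import os

def check_merged_size(merged_path, single_paths, exact=False):
    """Sanity-check a merged file against the single files of its bunch."""
    merged_size = os.path.getsize(merged_path)
    if exact:
        # Stricter check: equality with the summed input sizes; for HDF5
        # outputs a small deviation may be normal container overhead.
        return merged_size == sum(os.path.getsize(p) for p in single_paths)
    # Cheap check: the merged file must be larger than the last single file.
    return merged_size > os.path.getsize(single_paths[-1])
```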

Unit-testing for DIRAC-related operations

It would be really helpful for future automation to have a set of unit tests for:

  • submitting 1 job of 1 file
  • submitting 1 job of N files (which will also merge the files on the running grid site)
  • downloading 1 file
  • uploading 1 file

Caveats

  • test files need to be available (the ones used by the protopipe integration tests pipeline should be OK)
  • not sure if it's really easy to do: if the selected DIRAC site has an issue during the CI testing process, the process will fail because of (CTA)DIRAC and not because of this interface
  • not sure how to deal with the certificate in this case (if it's even possible)

@kosack @arrabito @bregeon
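
A minimal pytest sketch of an upload/download round-trip; the helper imports and the LFN are hypothetical, and a valid proxy certificate is assumed to be available to the CI job:

```python
import filecmp

def test_upload_download_roundtrip(tmp_path):
    # Hypothetical helpers; the real interface exposes these operations
    # through its scripts rather than through an importable API.
    from protopipe_grid_interface import download_file, upload_file

    local = tmp_path / "input.h5"
    local.write_bytes(b"test payload")
    lfn = "/vo.cta.in2p3.fr/user/t/test/input.h5"  # hypothetical LFN
    upload_file(local, lfn)

    fetched = tmp_path / "fetched.h5"
    download_file(lfn, fetched)
    assert filecmp.cmp(local, fetched, shallow=False)
```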

Testing

Currently, there are no unit or integration tests...

The whole interface assumes a Docker container

In the case of Linux users, the CTADIRAC framework is installed natively and there is no need to keep both the source and output folders in $HOME.

Also, for Singularity containers the home directory is shared, and it is not obvious that users put (or link) everything there.

The script has to at least differentiate between the source code path and the output path (see the sketch below).

Also, the "shared_folder" name for the output directory makes sense only in the case of a container of some kind...
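
A minimal sketch of such a separation, falling back to the current container layout only when nothing is given; the default folder names are assumptions:

```python
import os

def resolve_paths(source_path=None, output_path=None):
    """Keep source and output locations independent; fall back to the
    container convention (everything under $HOME/shared_folder) only
    if neither path is provided."""
    home = os.path.expanduser("~")
    default = os.path.join(home, "shared_folder")  # container-only convention
    return (source_path or os.path.join(default, "protopipe"),
            output_path or os.path.join(default, "analyses"))
```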

Unable to submit jobs from computing farm

Describe the bug
Hi!
Due to some problems with certificates, I started working on the computing farm we have in Trieste. When I try to submit jobs (or download files from the grid) I get a segmentation fault (core dumped).
I tried to contact some people taking care of the system in Trieste but I got no answers; do you have any suggestions on what I should check?
I tried to debug the job submission but I got no clues about the error.
I attach a screenshot of my terminal with the steps protopipe-SUBMIT_JOBS performs before abruptly stopping.
(screenshot attached: "Schermata del 2022-08-25 16-44-31")

Make this version usable until merged into CTADIRAC

After the update to the latest CTADIRAC, this version of the interface works.

These are the steps that I am following to make it easily available to current and future protopipe users:

  • Test the upgraded version of the CTADIRAC container for protopipe
  • Fork CTADIRAC
  • Add a new definition file to the forked version of CTADIRAC
  • Add a Vagrantfile to the interface repo for Windows and macOS users
  • Set up an account on Singularity Hub and build the container there to allow others to test
  • Update the docs accordingly (#2)

The future plan is to merge this into CTADIRAC (go here and here to follow what happens).

Current plan to split this repository

Not all the code contained in this repository requires the same environment.

In particular, as explained in the README, 2 environments are needed:

  • the DIRAC/CTADIRAC environment,
  • the protopipe environment.

Some of the files require DIRAC (current candidates for migration under CTADIRAC/Interfaces/API):

  • example_configuration.cfg, to configure the job sent on the grid
  • submit_jobs_new_scheme.py, to send the job on the grid
  • delete_files.py, to delete files on the grid
  • download_files.py, to download files from the grid
  • upload_file.py, to upload files (aka the model estimators trained locally) to the grid

of which the first 2 are protopipe-specific, while the other 3 are general-purpose (but set up to work within the protopipe workflow).

Some files are Bash scripts:

  • pilot.sh, required to launch the job,
  • example_upload_models.sh, required to configure upload_file.py based on the protopipe workflow,
  • example_download_and_merge.sh, same but for downloading and merging in one step.

The rest of the code needs only protopipe or general Python:

  • merge_tables.py, to merge the files downloaded from the grid (currently done together with the download operations through a correctly edited example_download_and_merge.sh),
  • example_split_jobs.py, to split lists of simulation files for each particle type.

Note: the merging code could be moved to protopipe, and the merging itself could be done by temporarily switching environments from grid to protopipe and vice versa.

Last, the README file, whose contents are planned to be moved to protopipe's docs (#2).

A job can crash if its name is too long

It is not clear at the moment what the maximum allowed length is.

An easy solution would be to add an option to change the prefix of a job's name.
The prefix should be defined as everything before the analysis step, i.e. "protopipe_" plus the analysis name.
The rest of the job name shows the analysis step, particle type, and run numbers, which is information necessary for debugging.

The cleaning mode suffix could already be removed permanently, as it is useless anyway.

Export mapping of single files to merged tables

Each time submit_jobs.py is run with a grid.yaml configuration file that has n_file_per_job > 1, the resulting output file is a table merged from the single-file output tables.

The current output filename is composed of the first and last run numbers, but the list of runs that make up the merged table is lost.

submit_jobs.py could just export a text table in some format, e.g.:

| merged table name                           | list of simtel runs |
|---------------------------------------------|---------------------|
| TRAINING_energy_gamma_tail_run108-run140.h5 | 108, 120, 140       |
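
A sketch of such an export, e.g. as CSV; the mapping structure and the output filename are illustrative:

```python
import csv

def export_run_mapping(mapping, path="merged_tables_runs.csv"):
    """Write one row per merged table with the simtel runs it contains.

    mapping: e.g. {"TRAINING_energy_gamma_tail_run108-run140.h5": [108, 120, 140]}
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["merged_table_name", "simtel_runs"])
        for name, runs in mapping.items():
            writer.writerow([name, " ".join(str(r) for r in runs)])
```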

Merge into CTADIRAC

Requirements:

  • a working version outside of CTADIRAC (it technically works, but it is not straightforward and easy for users)
  • a tested upgraded version of the CTADIRAC container for protopipe

Next steps:

  • Fork CTADIRAC
  • Add a new container definition file (same as in #4)
  • Move all code appropriately under CTADIRAC/Interfaces/API/protopipe (some will be moved to protopipe, see #5)
  • Create a test job for each step of the pipeline (basically the scripts) and add them progressively to CTADIRAC/Interfaces/test/
  • Open a PR

Unable to create conda env on Mac

Describe the bug
Hi!
when trying to create the protopipe-CTADIRAC env on the MacBook of one of our students, we get the following error:

ResolvePackageNotFound:

  • diracgrid::fts3

We tried both the dev and release environments (following the instructions from the protopipe documentation) but we always get the same error.
We tried to delete and re-create the env after updating conda, but we still get the error. Any idea on how to proceed?

Desktop (please complete the following information):

  • OS: macOS (MacBook)

Analysis config doesn't get uploaded if only 1 job is launched

Describe the bug

If only 1 job is launched (e.g. when launching protopipe-SUBMIT_JOBS with -test True), the output directory on DIRAC doesn't get created until the job completes.
This causes the config file upload (which happens immediately after job submission) to crash.

To Reproduce

Launch any data job (TRAINING, DL2) with only 1 file.

Expected behavior

The script shouldn't crash at config submission.

Proposed solution

Move this step before job submission, as sketched below.
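
A sketch of the reordering, with the DIRAC operations injected as stand-in callables (the real function names in submit_jobs.py may differ):

```python
def run_submission(config_path, output_dir, jobs,
                   ensure_remote_dir, upload_config, submit_job):
    """Upload the analysis configuration before submitting any job.

    The three callables are stand-ins for the real DIRAC operations, so a
    single short job can no longer finish before the config upload happens.
    """
    ensure_remote_dir(output_dir)           # create the output dir up front
    upload_config(config_path, output_dir)  # previously done after submission
    for job in jobs:
        submit_job(job)
```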
