healthypear / protopipe-grid-interface

Interface to DIRAC for the CTA prototype data analysis pipeline.

Home Page: https://cta-observatory.github.io/protopipe/

License: Other

Languages: Python 94.41%, Shell 0.97%, Dockerfile 4.62%
Topics: cta, grid, protopipe

protopipe-grid-interface's Introduction

hey there 👋

👩‍💻 About Me

- 🔭 I'm a post-doctoral researcher at the Max Planck Institute for Physics, working on very-high-energy astrophysics using imaging atmospheric Cherenkov telescopes
- 📚 I'm currently learning C++20 and Rust
- ⚡ In my free time I like to hike, climb, ski, drink...

🛠 Languages and tools

Python, Bash, Markdown, C, C++, Git, Codecov, NumPy, pandas, Jupyter, GCC, Docker, Vagrant, GitHub, GitLab, Bitbucket, Trello, Linux, Apple, VS Code, Vim, Atom


protopipe-grid-interface's People

Contributors

kosack


protopipe-grid-interface's Issues

Uniformise definition of SE lists throughout an analysis

Currently, such a list (which can be found with cta-prod-show-dataset DATASET_NAME --SEUsage) is relevant for:

  • protopipe-SUBMIT_JOBS when uploading the analysis configuration file,
  • protopipe-UPLOAD_MODELS when uploading models and configuration files.

In the first script this information is hardcoded (which is of course not good), and I am not sure how to pass a list as a DIRAC script argument (I have the feeling that each argument should be unique; @arrabito?).
In the second script I use argparse, which is of course more flexible.

The best option would be to add the list to the analysis_metadata.yaml file (either automatically, from the command above, or leaving this to the user), so that any time those scripts are run it is the default list of SEs. It would also be good to let protopipe-SUBMIT_JOBS override this if necessary, although that brings back the same issue as above.

UPDATE

The best option might actually be to use grid.yaml, as any script can easily retrieve it through analysis_metadata.yaml and protopipe-SUBMIT_JOBS uses it directly.
Still, a fallback default list of SEs should always be defined in case that file is not available (or the user did not override the values from the CLI).
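
A minimal sketch of that resolution order (CLI override, then grid.yaml, then a hardcoded fallback); the GRID.se_list key and the SE names are assumptions for illustration:

```python
import argparse

import yaml

# Hardcoded fallback, used when grid.yaml is missing or incomplete
# (the SE names here are purely illustrative).
DEFAULT_SE_LIST = ["CC-IN2P3-Disk", "DESY-ZN-Disk"]

def resolve_se_list(grid_yaml_path, cli_se_list=None):
    """Return the SE list: CLI override > grid.yaml > hardcoded default."""
    if cli_se_list:  # an explicit override from the command line wins
        return cli_se_list
    try:
        with open(grid_yaml_path) as f:
            grid_cfg = yaml.safe_load(f)
        return grid_cfg["GRID"]["se_list"]  # hypothetical key
    except (OSError, KeyError, TypeError):
        return DEFAULT_SE_LIST

parser = argparse.ArgumentParser()
parser.add_argument("--se-list", nargs="+", default=None,
                    help="override the default list of SEs")
args = parser.parse_args()
se_list = resolve_se_list("grid.yaml", args.se_list)
```

Note that argparse's nargs="+" also answers the question of how to accept a list from the command line.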

Test upgrade to Python 3

Technically the interface to (CTA)DIRAC already supports python3, but this needs to be tested.

If so, this would mean that we can merge the interface with the pipeline and build an all-in-one container.

Create Docker container for a Python 3-based installation

After #50 this will most likely be a strict requirement only for Windows users, for whom I am pretty sure DIRAC has no support (or at best it is not tested).

In any case, it will provide an isolated and reproducible environment for everyone who wants it.

Managing a protopipe-dev environment with (CTA)DIRAC

After the recent changes in the pipeline, the environment also needs pandas, which is not required by the usual pure ctapipe installation, where it is only used for tests and/or docs.

NOTE: the environment for ctapipe 0.11.0 has been updated with pandas for now

The installation of protopipe in its conda development environment, from the local repo copy sent to the grid, should be added to the submit_jobs.py script; see the sketch below.
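
A sketch of what that addition could look like; the environment name, the archive name, and the sandbox layout are assumptions:

```python
# Hypothetical sketch: shell steps that submit_jobs.py could prepend to the
# job executable so the grid node installs the shipped protopipe copy into
# the conda development environment.
pilot_steps = [
    "source activate protopipe-dev",  # hypothetical dev environment name
    "tar xzf protopipe.tar.gz",       # local repo copy shipped in the sandbox
    "pip install -e ./protopipe",     # editable install into the active env
]
job_script = " && ".join(pilot_steps)
```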

Improve submitted job name

Example of current job names

job_analysisName_particletype_runXXX_cleaningMode

where cleaningMode is either "tail" for "tailcut" or "wave" for "wavelet".

Better naming

analysisName_analysisStep_particletype_runXXX

where analysisStep can be TRAINING_energy, TRAINING_classification or DL2
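
A minimal sketch of the proposed scheme (the analysis name in the example and the 3-digit run padding are illustrative):

```python
def build_job_name(analysis_name, analysis_step, particle_type, run_id):
    """Compose analysisName_analysisStep_particletype_runXXX, where
    analysis_step is TRAINING_energy, TRAINING_classification or DL2."""
    return f"{analysis_name}_{analysis_step}_{particle_type}_run{run_id:03d}"

# e.g. build_job_name("prod5_zen20", "TRAINING_energy", "gamma", 42)
# -> "prod5_zen20_TRAINING_energy_gamma_run042"
```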

Make particle type and n_files_per_job command-line options in SUBMIT_JOBS

Is your feature request related to a problem? Please describe.

Submitting the jobs for several particle types requires hand-editing grid.yaml each time, which is error-prone and makes it hard to automate. These values must change per particle type to minimize the total submission time while keeping a reasonably even number of showers per job (so jobs take similar amounts of time). For example:

| particle | num files | files/job | showers/file | jobs | showers/job | submission time (h) |
|----------|-----------|-----------|--------------|------|-------------|---------------------|
| gamma    | 4000      | 10        | 20,000       | 400  | 200,000     | 0.7                 |
| proton   | 24000     | 20        | 50,000       | 1200 | 1,000,000   | 2                   |
| electron | 2000      | 4         | 50,000       | 500  | 200,000     | 0.8                 |

Describe the solution you'd like

Allow setting --files-per-job=<int> and --particle=<type> as command-line options
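
A minimal argparse sketch of the two proposed options; the choices and the fallback behavior are illustrative:

```python
import argparse

parser = argparse.ArgumentParser(prog="protopipe-SUBMIT_JOBS")
parser.add_argument("--particle", choices=["gamma", "proton", "electron"],
                    default=None, help="particle type (overrides grid.yaml)")
parser.add_argument("--files-per-job", type=int, default=None,
                    help="number of input files per job (overrides grid.yaml)")
args = parser.parse_args()
```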

Would you like to contribute with a Pull Request?

Yes

Pin gfal2 to 2.20.2

If the version is greater (only 2.20.4 has been tested), this leads to a weird crash during e.g. the upload of files to DIRAC, where a line of code in this dependency tries to access

errno.ECOMM

which raises

AttributeError: module 'errno' has no attribute 'ECOMM'

which seems weird at first, since the errno documentation for the Python in use (3.8.12) lists this attribute. However, the errno module only exposes the error codes actually defined by the platform's C headers, and ECOMM is not defined on macOS.

It's a bug in gfal2 and it seems to affect only some Unix-based systems (macOS, but not e.g. the Linux instances on the ccin2p3 machines).
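
For reference, a portable lookup would avoid the crash; this is a sketch of the pattern, not gfal2's actual code:

```python
import errno

# errno only exposes the error codes defined by the platform's C headers,
# so portable code must not assume ECOMM exists (it does not on macOS).
ECOMM = getattr(errno, "ECOMM", None)

def is_comm_error(code):
    """True if `code` is ECOMM, on platforms that define it."""
    return ECOMM is not None and code == ECOMM
```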

Containerization

Requirements

  • Add miniconda3 to container (see #52)

List of containers

  • Singularity (should be just a matter of bind mounts, but at the moment I do not plan to maintain it actively...)
  • #52

Merge this repository as a protopipe module

Requirements

Description

The idea is to make the interface a module of protopipe, e.g. protopipe.dirac.

The migration has to be done in a way that the history of this repository doesn't get lost (e.g. via git subtree or a git filter-repo import, which preserve commit history).

test and dry modes do not behave as expected

test launches only the first file and not the first job

dry seems to use 1 file less when more than 1 is selected per job

In general, the expected behavior is that:

  • test launches the 1st job of the batch as it has been defined (so if there are N files per job, only 1 job of N files should be submitted) + config files
  • dry should do the same, but without actually submitting anything (not even configuration files)
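
In code, the expected semantics could look like this sketch (function names are illustrative, not the current implementation):

```python
def select_batch(input_files, n_files_per_job, test=False):
    """Group input files into jobs; in test mode keep only the first job."""
    jobs = [input_files[i:i + n_files_per_job]
            for i in range(0, len(input_files), n_files_per_job)]
    if test:
        return jobs[:1]  # the whole 1st job (N files), not just the 1st file
    return jobs

def submit(jobs, dry=False):
    for job_files in jobs:
        if dry:
            print(f"[dry] would submit a job with {len(job_files)} files")
            continue  # dry mode submits nothing, not even config files
        ...  # the actual DIRAC submission would go here
```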

Add check for the size of a merged file

This could be as simple as checking that it is larger than the last single file, or the more precise check that its size equals the sum of the single files of the bunch (see the sketch below).

NOTE: there could be a problem if one or more of the files in the bunch is not processed at the next level (e.g. the simtel file is corrupt).
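
A sketch of both checks, assuming sizes in bytes; the exact-sum variant presumes the merged file is a plain concatenation, which may not hold exactly for HDF5 tables:

```python
import os

def check_merged_size(merged_path, single_paths, exact=False):
    """Sanity-check a merged file against the single files of its bunch."""
    merged_size = os.path.getsize(merged_path)
    if exact:
        # Stricter check: equality with the summed input sizes; for HDF5
        # outputs a small deviation may be normal container overhead.
        return merged_size == sum(os.path.getsize(p) for p in single_paths)
    # Cheap check: the merged file must be larger than the last single file.
    return merged_size > os.path.getsize(single_paths[-1])
```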

Unit-testing for DIRAC-related operations

It would be really helpful for future automation to have a set of unit tests for:

  • submitting 1 job of 1 file
  • submitting 1 job of N files (which will also merge the files on the running grid site)
  • downloading 1 file
  • uploading 1 file

Caveats

  • test files need to be available (the ones used by the protopipe integration tests pipeline should be OK)
  • not sure if it's really easy to do: if the selected DIRAC site has an issue during the CI testing process, the process will fail because of (CTA)DIRAC and not because of this interface
  • not sure how to deal with the certificate in this case (if it's even possible)

@kosack @arrabito @bregeon
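
A minimal pytest sketch of an upload/download round-trip; the helper imports and the LFN are hypothetical, and a valid proxy certificate is assumed to be available to the CI job:

```python
import filecmp

def test_upload_download_roundtrip(tmp_path):
    # Hypothetical helpers; the real interface exposes these operations
    # through its scripts rather than through an importable API.
    from protopipe_grid_interface import download_file, upload_file

    local = tmp_path / "input.h5"
    local.write_bytes(b"test payload")
    lfn = "/vo.cta.in2p3.fr/user/t/test/input.h5"  # hypothetical LFN
    upload_file(local, lfn)

    fetched = tmp_path / "fetched.h5"
    download_file(lfn, fetched)
    assert filecmp.cmp(local, fetched, shallow=False)
```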

Testing

Currently, there are no unit or integration tests...

The whole interface assumes a Docker container

In the case of Linux users, the CTADIRAC framework is installed natively and there is no need to keep both the source and output folders in $HOME.

Also, for Singularity containers the home directory is shared, and it is not obvious that users put (or link) everything there.

The script has to at least differentiate between the source code path and the output path (see the sketch below).

Also, the "shared_folder" name for the output directory makes sense only in the case of a container of some kind...
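
A minimal sketch of such a separation, falling back to the current container layout only when nothing is given; the default folder names are assumptions:

```python
import os

def resolve_paths(source_path=None, output_path=None):
    """Keep source and output locations independent; fall back to the
    container convention (everything under $HOME/shared_folder) only
    if neither path is provided."""
    home = os.path.expanduser("~")
    default = os.path.join(home, "shared_folder")  # container-only convention
    return (source_path or os.path.join(default, "protopipe"),
            output_path or os.path.join(default, "analyses"))
```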

Unable to submit jobs from computing farm

Describe the bug
Hi!
Due to some problems with certificates, I started working on the computing farm we have in Trieste. When I try to submit jobs (or download files from the grid) I get a segmentation fault (core dumped).
I tried to contact some people taking care of the system in Trieste but I got no answers; do you have any suggestions on what I should check?
I tried to debug the job submission but I got no clues about the error.
I attach a screenshot of my terminal with the steps protopipe-SUBMIT_JOBS performs before abruptly stopping.
(screenshot attached: "Schermata del 2022-08-25 16-44-31")

Make this version usable until merged into CTADIRAC

After the update to the latest CTADIRAC, this version of the interface works.

These are the steps that I am following to make it easily available to current and future protopipe users:

  • Test the upgraded version of the CTADIRAC container for protopipe
  • Fork CTADIRAC
  • Add a new definition file to the forked version of CTADIRAC
  • Add a Vagrantfile to the interface repo for Windows and macOS users
  • Set up an account on Singularity Hub and build the container there to allow others to test
  • Update the docs accordingly (#2)

The future plan is to merge this into CTADIRAC (go here and here to follow what happens).

Current plan to split this repository

Not all the code contained in this repository requires the same environment.

In particular, as explained in the README, 2 environments are needed:

  • the DIRAC/CTADIRAC environment,
  • the protopipe environment.

Some of the files require DIRAC (current candidates for migration under CTADIRAC/Interfaces/API):

  • example_configuration.cfg, to configure the job sent on the grid
  • submit_jobs_new_scheme.py, to send the job on the grid
  • delete_files.py, to delete files on the grid
  • download_files.py, to download files from the grid
  • upload_file.py, to upload files (aka the model estimators trained locally) to the grid

of which the first 2 are protopipe-specific, while the other 3 are general-purpose (but set up to work within the protopipe workflow).

Some files are Bash scripts:

  • pilot.sh, required to launch the job,
  • example_upload_models.sh, required to configure upload_file.py based on the protopipe workflow,
  • example_download_and_merge.sh, same but for downloading and merging in one step.

The rest of the code needs only protopipe or general Python:

  • merge_tables.py, to merge the files downloaded from the grid (currently done together with the download operations through a correctly edited example_download_and_merge.sh),
  • example_split_jobs.py, to split lists of simulation files for each particle type.

Note: the merging code could be moved to protopipe, and the merging itself could be done by temporarily switching environments from grid to protopipe and vice versa.

Last, the README file, whose contents are planned to be moved to protopipe's docs (#2).

A job can crash if its name is too long

It is not clear at the moment what the maximum allowed length is.

An easy solution would be to add an option to change the prefix of a job's name.
The prefix should be defined as everything before the analysis step, i.e. "protopipe_" plus the analysis name.
The rest of the job name shows the analysis step, particle type, and run numbers, which is information necessary for debugging.

The cleaning mode suffix could already be removed permanently, as it is useless anyway.

Export mapping of single files to merged tables

Each time submit_jobs.py is run with a grid.yaml configuration file that has n_file_per_job > 1, the resulting output file is a table merged from the single-file output tables.

The current output filename is composed of the first and last run numbers, but the list of runs that make up the merged table is lost.

submit_jobs.py could just export a text table in some format, e.g.:

| merged table name                           | list of simtel runs |
|---------------------------------------------|---------------------|
| TRAINING_energy_gamma_tail_run108-run140.h5 | 108, 120, 140       |
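
A sketch of such an export, e.g. as CSV; the mapping structure and the output filename are illustrative:

```python
import csv

def export_run_mapping(mapping, path="merged_tables_runs.csv"):
    """Write one row per merged table with the simtel runs it contains.

    mapping: e.g. {"TRAINING_energy_gamma_tail_run108-run140.h5": [108, 120, 140]}
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["merged_table_name", "simtel_runs"])
        for name, runs in mapping.items():
            writer.writerow([name, " ".join(str(r) for r in runs)])
```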

Merge into CTADIRAC

Requirements:

  • a working version outside of CTADIRAC (it technically works, but it is not straightforward and easy for users)
  • a tested upgraded version of the CTADIRAC container for protopipe

Next steps:

  • Fork CTADIRAC
  • Add a new container definition file (same as in #4)
  • Move all code appropriately under CTADIRAC/Interfaces/API/protopipe (some will be moved to protopipe, see #5)
  • Create a test job for each step of the pipeline (basically the scripts) and add them progressively to CTADIRAC/Interfaces/test/
  • Open a PR

Unable to create conda env on Mac

Describe the bug
Hi!
when trying to create the protopipe-CTADIRAC env on the MacBook of one of our students, we get the following error:

ResolvePackageNotFound:

  • diracgrid::fts3

We tried both the dev and release environments (following the instructions from the protopipe documentation) but we always get the same error.
We tried to delete and re-create the env after updating conda, but we still get the error. Any idea on how to proceed?

Desktop (please complete the following information):

  • OS: macOS (MacBook)

Analysis config doesn't get uploaded if only 1 job is launched

Describe the bug

If only 1 job is launched (e.g. when launching protopipe-SUBMIT_JOBS with -test True), the output directory on DIRAC doesn't get created until the job completes.
This causes the config file upload (which happens immediately after job submission) to crash.

To Reproduce

Launch any data job (TRAINING, DL2) with only 1 file.

Expected behavior

The script shouldn't crash at config submission.

Proposed solution

Move this step before job submission, as sketched below.
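
A sketch of the reordering, with the DIRAC operations injected as stand-in callables (the real function names in submit_jobs.py may differ):

```python
def run_submission(config_path, output_dir, jobs,
                   ensure_remote_dir, upload_config, submit_job):
    """Upload the analysis configuration before submitting any job.

    The three callables are stand-ins for the real DIRAC operations, so a
    single short job can no longer finish before the config upload happens.
    """
    ensure_remote_dir(output_dir)           # create the output dir up front
    upload_config(config_path, output_dir)  # previously done after submission
    for job in jobs:
        submit_job(job)
```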
