
clas12simulations's People

Contributors

maureeungaro, robertej19, sangbaek


clas12simulations's Issues

Web Interface for Job Submission

Instead of passing arguments through an scard.txt file, which is then copied and parsed into a database, provide a web site where a user can make selections through a GUI. These options would be written directly to a server-side database and eventually parsed into submission scripts and passed to HTCondor.

This is a large project and would require significant changes to the code base.
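
A rough illustration of the server side of this flow: a minimal sketch using Flask, where the framework choice, DB file, table name, and column names are all assumptions rather than existing code.

    # Hypothetical sketch: a web endpoint writing GUI selections straight to the DB.
    import sqlite3
    from flask import Flask, request

    app = Flask(__name__)

    @app.route('/submit', methods=['POST'])
    def submit():
        opts = request.form  # selections made through the GUI
        with sqlite3.connect('clas12.db') as conn:  # assumed DB file
            conn.execute('INSERT INTO scards (username, nevents, gcard) VALUES (?, ?, ?)',
                         (opts['username'], opts['nevents'], opts['gcard']))
        return 'queued', 201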

Cronjob Functionality (future)

This might not be ready to implement yet, but at some point will need cronjobs (or other task scheduler management system) to do things such as:

  • scan the scard table in the database for unsubmitted jobs and submit them to HTCondor
  • scan output directories for completed jobs, update DB records accordingly, and distribute output files if needed
  • other scheduling tasks

This is probably not useful presently, but it will be needed in the future, and the codebase can start being developed at any time. A sketch of the first task appears below.
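
A minimal sketch of that first task, assuming an scards table with a status column (both are assumptions about the schema):

    # Hypothetical cron task: submit any scard rows not yet sent to HTCondor.
    import sqlite3
    import subprocess

    with sqlite3.connect('clas12.db') as conn:
        rows = conn.execute("SELECT id FROM scards WHERE status = 'unsubmitted'").fetchall()
        for (scard_id,) in rows:
            subprocess.check_call(['condor_submit', 'clas12_{0}.condor'.format(scard_id)])
            conn.execute("UPDATE scards SET status = 'submitted' WHERE id = ?", (scard_id,))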

Sanitization of database entries

The inputs to the database coming from exterior files, i.e. scard and gcard files, need to be "sanitized" to the best of our ability to guard against SQL injection attacks.

Currently the scard has minimal to moderate protection against this; the gcard has ZERO sanitization implemented. This should be high priority; we cannot publish a DB without moderate to high levels of security in this regard.
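
The standard defense with sqlite3 is parameterized queries rather than string formatting; a minimal sketch, with the table and column names assumed:

    # Never build SQL by interpolating file contents:
    #   c.execute("INSERT INTO gcards (text) VALUES ('%s')" % gcard_text)  # injectable
    # Use placeholders instead, so sqlite3 handles the escaping:
    import sqlite3

    batch_id = 1                          # example values
    gcard_text = '<gcard>...</gcard>'

    conn = sqlite3.connect('clas12.db')
    conn.execute('INSERT INTO gcards (batch_id, text) VALUES (?, ?)',
                 (batch_id, gcard_text))
    conn.commit()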

Improve GCard directory comprehension & downloading

The current mechanism to find and download gcards (found in gcard_helper.py) is very basic and will probably not work on a general online directory. It works for the example Mauri linked, but it should be made more robust and general to handle more cases, and it should throw error messages if things go wrong.
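
For instance, the download step could fail loudly instead of silently; a minimal standard-library sketch (the URL is the example gcard directory from the issues below):

    # Hypothetical hardened download: clear error message on network/HTTP failure.
    import sys
    try:
        from urllib.request import urlopen
        from urllib.error import URLError
    except ImportError:                    # Python 2 fallback
        from urllib2 import urlopen, URLError

    url = 'https://userweb.jlab.org/~ungaro/tmp/gcards/'
    try:
        listing = urlopen(url, timeout=10).read()
    except URLError as e:
        sys.exit('Could not reach gcard directory {0}: {1}'.format(url, e))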

Improve DEBUG = 0 print functions

The user will be interested in some kind of log (batch ID, submission ID, timestamp); the DEBUG = 0 case should output all of these to the client.
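
A minimal sketch of what that output might look like; the variable names are placeholders:

    # Hypothetical summary printed even when DEBUG = 0.
    import datetime

    batch_id, submission_id = 8, 4        # example values
    timestamp = datetime.datetime.now().isoformat()
    print('Batch {0}, submission {1}, submitted at {2}'.format(
        batch_id, submission_id, timestamp))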

Grab submitted job ID number from HTCondor

When submitting a job using HTCondor, we get a message such as:
jobs 7482391...98 submitted successfully

At the end of the day we will need these HTCondor job IDs for logging purposes, which could perhaps be accomplished with something like subprocess.check_output(), as sketched below.
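
A sketch along those lines; the regex assumes the usual condor_submit summary line ("N job(s) submitted to cluster M."), which should be verified against our pools:

    # Hypothetical capture of the HTCondor cluster ID from condor_submit output.
    import re
    import subprocess

    out = subprocess.check_output(['condor_submit', 'clas12.condor']).decode()
    match = re.search(r'submitted to cluster (\d+)', out)
    if match is None:
        raise RuntimeError('Could not find cluster ID in: {0}'.format(out))
    cluster_id = int(match.group(1))      # log this to the database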

Directories support

We'll need this for the gcards, the generator, the background merging.

Basically the gcard entry (or the others) should support an entire directory containing gcards.
Then the simulation is duplicated with the exact same conditions except it will use each gcard in the directory.

When we discussed this I proposed a directory visible by the submit scripts (same machine). But it could get complicated.

I propose one mechanism only: reading directories visible online. The python script could use wget to fetch them.

Example for the gcard:

gcard: https://username.jlab.org/ungaro/tmp/gcards

The script would download the entire directory and execute N condor submits, one for each gcard in there.
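
A minimal sketch of that mechanism, mirroring the directory with wget and then submitting once per gcard; the file layout and the .condor file are assumptions:

    # Hypothetical: mirror the online gcard directory, then one condor_submit per gcard.
    import glob
    import subprocess

    url = 'https://userweb.jlab.org/~ungaro/tmp/gcards/'
    subprocess.check_call(['wget', '-r', '-np', '-nd', '-A', '*.gcard',
                           '-P', 'gcards/', url])
    for gcard in glob.glob('gcards/*.gcard'):
        # rewrite clas12.condor to point at this gcard (details omitted), then:
        subprocess.check_call(['condor_submit', 'clas12.condor'])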

Redesign Users table & db_user_entry

When an scard is submitted, query the Users table; if the user does not exist, add the user to the DB. Remove the "default" user. Remove the email field and replace it with a host name field, populated with something like whoami, echo $USER, or hostname.
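
A minimal sketch of the lookup-or-insert, using the standard library instead of shelling out; the table and column names are assumed:

    # Hypothetical user registration: add the user on first submission.
    import getpass
    import socket
    import sqlite3

    username = getpass.getuser()          # equivalent of whoami / $USER
    host = socket.gethostname()           # equivalent of hostname

    conn = sqlite3.connect('clas12.db')
    row = conn.execute('SELECT 1 FROM users WHERE username = ?', (username,)).fetchone()
    if row is None:
        conn.execute('INSERT INTO users (username, hostname) VALUES (?, ?)',
                     (username, host))
        conn.commit()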

Allow User to Specify Custom Input Event File

Allow a user to upload and use a custom event generation file, so they do not have to use one of the 3 currently functioning generators if they do not want to. This file would probably have to be copied to the subMIT machine and passed through to HTCondor, then perhaps deleted after the simulation completes so that unneeded files do not build up.

Improve relative paths

The current file structure system lists relative/absolute paths that I think can get corrupted easily if things are not run from their intended locations. This might not be an issue, but there is probably a better way to set up path locations than what the system currently does; one option is sketched below.
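
A common pattern is to anchor every path to the location of the source file rather than the current working directory; a minimal sketch:

    # Hypothetical: resolve paths relative to this file, not to wherever the user ran from.
    import os

    BASE_DIR = os.path.dirname(os.path.abspath(__file__))
    GCARD_DIR = os.path.join(BASE_DIR, '..', 'submission_files', 'gcards')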

Enforce Foreign Key Relations in SQLite DB

FK relations need to be enforced, at least between the scard and users tables (i.e. the username in every scard.txt should match a registered user in the USERS table; otherwise, throw an error).
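
Note that SQLite ignores foreign keys unless they are switched on for each connection; a minimal sketch with simplified, assumed schemas:

    # SQLite enforces FKs only when this pragma is enabled per connection.
    import sqlite3

    conn = sqlite3.connect('clas12.db')
    conn.execute('PRAGMA foreign_keys = ON')
    conn.execute('CREATE TABLE IF NOT EXISTS users (username TEXT PRIMARY KEY)')
    conn.execute('''CREATE TABLE IF NOT EXISTS scards (
                        id INTEGER PRIMARY KEY,
                        username TEXT REFERENCES users(username))''')
    # Inserting an scard whose username is not in users now raises IntegrityError.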

Job Statistics Parsing and Logging

Create output files from runscript containing statistics on runtime, cores used, and other metrics. These can be used in connection with what HTCondor already produces, or be a separate file. Parse and log this information into the MC database; the parsing side is sketched below.
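
A minimal sketch of the parsing side, assuming the runscript writes simple key=value lines; the file name, keys, and stats table are all assumptions:

    # Hypothetical stats parser: read key=value pairs emitted by runscript.sh.
    import sqlite3

    stats = {}
    with open('job_stats.txt') as f:      # e.g. "runtime_s=431\ncores=4"
        for line in f:
            key, _, value = line.strip().partition('=')
            stats[key] = value

    conn = sqlite3.connect('clas12.db')
    conn.execute('INSERT INTO job_stats (runtime_s, cores) VALUES (?, ?)',
                 (stats.get('runtime_s'), stats.get('cores')))
    conn.commit()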

Improved Error Handling

There are a number of errors that can occur:

  • user lists an scard location that doesn't exist
  • user lists a gcard location that doesn't exist
  • connection to the gcard online directory fails / times out
  • connection to the DB fails / times out

You can think of many more. Error handling should be added for these cases; most of it does not yet exist. A sketch of one case follows.
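
For example, the DB case might look like this minimal sketch (the timeout value is arbitrary):

    # Hypothetical: fail with a readable message when the DB cannot be reached.
    import sqlite3
    import sys

    try:
        conn = sqlite3.connect('clas12.db', timeout=10)
        conn.execute('SELECT 1')
    except sqlite3.OperationalError as e:
        sys.exit('Could not connect to the MC database: {0}'.format(e))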

Unique Naming and Logging/Storage of Submission Scripts

The problem is best described with the following example:

I want to submit 5 jobs (Queue = 5) with 100 events each. I run our software, which creates the submission scripts (clas12.condor, runscript.sh) to do this. I submit to HTCondor.

Before any job has processed (job is idle, or running but not complete) I decide I also want to submit 7 jobs (Queue = 7) with 200 events each. I run our software, which creates the submission scripts, and submit the batch.

Instead of having 5 jobs with 100 events and 7 jobs with 200 events, I will instead end up with 14 jobs with 200 events each. I.e., files are not passed to HTCondor in a permanent fashion; they are "live" on the subMIT node until processing is complete.

To fix this, we need to uniquely name each submission script file (maybe just with a unix timestamp) and store the scripts in their own dedicated folder, as sketched below. After simulations are complete these files can be kept long term, or destroyed to save disk space.
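
A minimal sketch of timestamped, per-batch script storage; the directory layout is an assumption:

    # Hypothetical: give every batch its own timestamped submission scripts,
    # so a later batch cannot overwrite files a queued batch still needs.
    import os
    import time

    stamp = int(time.time())                      # unix timestamp
    script_dir = 'submission_files/{0}'.format(stamp)
    os.makedirs(script_dir)
    condor_file = os.path.join(script_dir, 'clas12.condor')
    runscript = os.path.join(script_dir, 'runscript.sh')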

Optimize events / job distribution

User wants to simulate 100 million events. Is it more efficient to submit 1 million jobs with 100 events each, or 100 jobs with 1 million events each? What is the most time efficient way of distributing events? Does this answer change based on what pool you submit to and what resources are available?

The answer to these questions can be included as logic for job submissions so that submissions are handled intelligently. Maybe at the end of the day we change the scard so that, instead of specifying X events and Y jobs, the user just specifies X*Y events and the number of jobs is decided automatically; the split is sketched below.
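
A minimal sketch of the automatic split; the per-job cap is a placeholder that measurements of the pools would eventually determine:

    # Hypothetical: derive (jobs, events per job) from a single user-requested total.
    def split_events(total_events, max_events_per_job=10000):   # cap is a placeholder
        """Return (n_jobs, events_per_job) covering at least total_events."""
        n_jobs = -(-total_events // max_events_per_job)         # ceiling division
        return n_jobs, -(-total_events // n_jobs)

    print(split_events(100 * 10**6))    # (10000, 10000) with the placeholder cap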

Generate submission scripts on server side, use functions not templates

Break the submission-script generation into functions for different sections (header, generator, gemc, statistics logging) that take variable arguments, rather than overwriting templates. Perhaps split the src/ directory into "client" and "server" directories, where the server is where jobs will actually be submitted/run, and the client is where scards are gathered and entered into a DB. Composition is sketched below.
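
A minimal sketch of function-based composition; the section contents are simplified placeholders, not our real submission options:

    # Hypothetical: build the submission script from per-section functions
    # instead of overwriting a template file.
    def header(executable):
        return 'Executable = {0}\n'.format(executable)

    def generator(gen, nevents):
        return 'Arguments = {0} {1}\n'.format(gen, nevents)

    def build_script(executable, gen, nevents, queue):
        return header(executable) + generator(gen, nevents) + 'Queue {0}\n'.format(queue)

    print(build_script('runscript.sh', 'dvcsgen', 100, 5))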

Update submissions tables design

Remove unneeded submission table fields (submission file paths) & add needed ones: submitted on (date, etc.), running on (date, node), completed (date, node), connected back from the database.

One run script file, not 3

Currently on some computing pools (Tier 2, maybe others) we have 3 submission scripts: condor_wrapper, run_job, and runscript.sh. These should be combined intelligently into 1 run script file. If we need separate templates to run on Tier 2 vs. OSG, maybe we should do this, and then choose which template to write based on which pool we are submitting to.

Module directory for subprocedures

If a python script is only called by another main python script, not from the bash shell,
we can regard it as a "subpackage." We can make a separate directory for these with an __init__.py. This is a super minor thing that can be done afterward.

Create tests for pull requests / CI

It might be helpful to start setting up validation tests to aid with pull requests, so we don't have to manually test a bunch of edge cases every time we have a PR. We could use something like https://travis-ci.org/. I haven't had to set this up myself on any previous projects, but it should be mostly straightforward. Perhaps we can discuss this at our next group meeting.
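
As a starting point, even one pytest-style check would let CI catch regressions; a minimal sketch in which the fixture file and the key: value scard format are assumptions:

    # Hypothetical first CI test: an example scard parses into key/value pairs.
    def test_example_scard_parses():
        with open('tests/example_scard.txt') as f:          # assumed fixture file
            pairs = dict(line.split(':', 1) for line in f if ':' in line)
        assert 'nevents' in pairs                           # field name assumed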

Some version of templates missing at some point

Eventually templates will not be used, since every script should be written from scratch from the DB, but here are some notes.

The luminosity option used at the March tutorial was:

    -LUMI_EVENT="NLUMI, 248.5ns, 4ns" -LUMI_P="e-, 10.6GeV, 0deg, 0*deg" -LUMI_V="(0.0, 0.0, -10)cm" -LUMI_SPREAD_V="(0.03, 0.03)cm"

The missing templates had an option to change NLUMI if the luminosity was not 0 in the scard.

gcards_scard is not correctly replaced when copying the gcard, i.e.

    cp gcards_scard out_basename $ClusterIdnnevents_scard

is changed by the python code to, for example,

    cp https://userweb.jlab.org/~ungaro/tmp/gcards/ out
    basename $ClusterId_n10

The URL should instead be replaced with a directory path like $/src/utils/../../submission_files/gcards/gcard_8_batch_4.gcard.

Change keys of genOutput, genExecutable to match current documentation

From around line 200 of file_struct.py:

The keys of genOutput and genExecutable were the names of the generators themselves at the March collaboration meeting, i.e. dvcsgen and generate-dis were used instead of dvcs and dis-rad. Please respect consistency, and please check https://clasweb.jlab.org/clas12/clas12SoftwarePage/html/tutorials/subMit/p1.html

A possible improvement would be to query an online repository (either every time or periodically with a cronjob), using urllib2 for example, and parse a file structure, instead of statically defining these dictionaries in file_struct.py; a sketch follows.
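
A minimal sketch of that query; the manifest URL and JSON format are assumptions, and urllib2 matches the Python 2 interpreter the code currently runs under:

    # Hypothetical: build the generator dictionaries from an online manifest
    # instead of hard-coding them in file_struct.py.
    import json
    import urllib2

    manifest_url = 'https://example.jlab.org/generators.json'   # placeholder URL
    generators = json.load(urllib2.urlopen(manifest_url, timeout=10))
    genExecutable = dict((name, info['executable']) for name, info in generators.items())
    genOutput = dict((name, info['output']) for name, info in generators.items())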

Build setup.py for sake of simplicity

Related to #39.

Currently, we're running individual shell commands for python,
e.g. python utils/*.py, etc.

I'm suggesting we write a setup.py which captures libraries and main scripts at the same time, so that other users can easily run our code; a sketch follows.
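
A minimal setup.py sketch; the package name, layout, and entry point are assumptions:

    # Hypothetical setup.py: declare the package and a console entry point.
    from setuptools import setup, find_packages

    setup(
        name='clas12simulations',
        version='0.1.0',
        packages=find_packages('src'),
        package_dir={'': 'src'},
        entry_points={
            'console_scripts': ['clas12-submit = Submit_Job:main'],  # assumes a main()
        },
    )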

Grab information on pool availability

This is probably not as urgent, but it would be good to have ~live statistics on how busy pools are when it comes time to submit a job, in order to be able to:
1.) estimate how long it will take for a job to start; this could be done by finding out how many cores are available, or how many jobs are currently queued vs. how many cores the pool has, etc.
2.) decide which is the best pool to submit the job to (the best pool being the one that can process the job the soonest).
If you go to http://submit.mit.edu/condormon/index.php you can see occupancy statistics for Tier 2. If we can get an API for this, and for other pools, I think that is what we are looking for. Tier 2 can only run ~1-2K jobs at once from what I've seen, so if there were 25K jobs ahead of us by the time we wanted to submit, we would know to go to OSG or something like that; a local query is sketched below.
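
Where we already have Condor access, condor_status can report totals directly; a minimal sketch (the exact output format varies by pool, so parsing is left open):

    # Hypothetical pool check: ask Condor for machine totals before choosing a pool.
    import subprocess

    out = subprocess.check_output(['condor_status', '-total']).decode()
    print(out)   # summary of machines by state; parse this to estimate free slots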

Running scripts at subMIT

The current version of the code cannot run on my local macOS machine; it fails with the following error messages:

    python src/Submit_Job.py
    Traceback (most recent call last):
      File "src/Submit_Job.py", line 34, in <module>
        db_batch_entry.Batch_Entry(args.scard)
      File "/Users/sangbaek/CLAS12/clas12simulation/src/db_batch_entry.py", line 26, in Batch_Entry
        username = user_validation.user_validation()
      File "/Users/sangbaek/CLAS12/clas12simulation/src/utils/user_validation.py", line 19, in user_validation
        user_already_exists = utils.sql3_grab(strn)
      File "/Users/sangbaek/CLAS12/clas12simulation/src/utils/utils.py", line 79, in sql3_grab
        c.execute(strn)
    sqlite3.OperationalError: no such column: hostname

and at subMIT
    CLAS12 Off Campus Resources
    Database not found, creating!
    Traceback (most recent call last):
      File "src/Submit_Job.py", line 31, in <module>
        create_database.create_database(args)
      File "/afs/lns.mit.edu/user/sangbaek/test/clas12simulations/src/utils/create_database.py", line 19, in create_database
        file_struct.PKs[i],file_struct.foreign_key_relations[i])
      File "/afs/lns.mit.edu/user/sangbaek/test/clas12simulations/src/utils/utils.py", line 57, in create_table
        sql3_exec(strn)
      File "/afs/lns.mit.edu/user/sangbaek/test/clas12simulations/src/utils/utils.py", line 66, in sql3_exec
        printer2('Executing SQL Command: {}'.format(strn)) #Turn this on for explict printing of all DB write commands
    ValueError: zero length field name in format
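
The second failure is a Python-version issue we can fix immediately: auto-numbered '{}' fields in str.format() require Python 2.7+, and the subMIT node appears to be running an older interpreter. Indexing the field explicitly works everywhere:

    # Fix for src/utils/utils.py line 66: '{}' needs an explicit index on Python 2.6.
    strn = 'SELECT 1'  # example command
    print('Executing SQL Command: {0}'.format(strn))   # works on 2.6, 2.7, and 3.x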

Manage concurrency

To deal with concurrency, consider indexing immediately off of BatchID and having that be the only auto-increment column. We might need a more robust solution, but this would be a start; a sketch follows.
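
A minimal sketch of such a table; the second column is a placeholder:

    # Hypothetical batches table: BatchID is the single auto-increment key, so
    # concurrent submissions each receive a unique, monotonically increasing ID.
    import sqlite3

    conn = sqlite3.connect('clas12.db')
    conn.execute('''CREATE TABLE IF NOT EXISTS batches (
                        BatchID INTEGER PRIMARY KEY AUTOINCREMENT,
                        submitted_on TEXT)''')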
