clas12simulations's Issues
Database Table Design
- Develop a v1.0 DB schema covering the pertinent tables and fields
Web Interface for Job Submission
Instead of passing arguments through an scard.txt file, which is then copied and parsed into a database, provide a web site where a user makes selections through a GUI. These options would be written directly to a server-side database and eventually parsed into submission scripts and passed to HTCondor.
This is a large project and would require significant changes to the code base.
db structure - Submissions
Create two more entries - submission on..?
Cronjob Functionality (future)
This might not be ready to implement yet, but at some point will need cronjobs (or other task scheduler management system) to do things such as:
- scan scard table in database for unsubmitted jobs and submit to HTCondor
- scan output directories for completed jobs and update DB records accordingly, and distribute output files if needed
- Other scheduling tasks
This is probably not useful presently, but it will be needed in the future, and the codebase can start being developed at any time.
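As a starting point for the first scan task above, a minimal sketch might look like the following. The `scard` table and its `submitted` flag are hypothetical placeholders, not the real schema:

```python
import sqlite3

def find_unsubmitted(conn):
    """Return BatchIDs of scard rows whose 'submitted' flag is 0."""
    cur = conn.execute("SELECT BatchID FROM scard WHERE submitted = 0")
    return [row[0] for row in cur.fetchall()]

# Demo against an in-memory DB standing in for the real one:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scard (BatchID INTEGER PRIMARY KEY, submitted INTEGER)")
conn.executemany("INSERT INTO scard VALUES (?, ?)", [(1, 1), (2, 0), (3, 0)])
print(find_unsubmitted(conn))  # [2, 3]
```

A cronjob would call something like this on a schedule and pass each returned BatchID to the submission code.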
Enable Job Submission on Holyoke Computing Pool
Through SubMIT or directly, we should be able to submit jobs to Holyoke. This should be done if possible.
Sanitization of database entries
The inputs to the database coming from exterior files, i.e. scard and gcard files, need to be "sanitized" to the best of our ability to protect against SQL injection attacks.
Currently the scard has minimal to moderate protections in place, while the gcard has ZERO sanitization implemented. This should be high priority; we cannot publish a DB without moderate to high levels of security in this regard.
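One standard mitigation is to pass all exterior values to the database as bound parameters rather than formatting them into SQL strings. A minimal sketch, with hypothetical table and column names:

```python
import sqlite3

def insert_scard_row(conn, username, gcard):
    # '?' placeholders let sqlite3 bind the values safely; never build the
    # query with % or .format on user-supplied input.
    conn.execute("INSERT INTO scard (username, gcard) VALUES (?, ?)",
                 (username, gcard))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scard (username TEXT, gcard TEXT)")
hostile = "alice'); DROP TABLE scard;--"
insert_scard_row(conn, hostile, "clas12.gcard")
# The hostile string is stored as plain data and the table survives:
print(conn.execute("SELECT username FROM scard").fetchone()[0])
```

This does not remove the need for input validation (e.g. checking that a gcard entry is a plausible URL or path), but it closes off the injection vector itself.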
Improve GCard directory comprehension & downloading
The current mechanism to find and download gcards (found in gcard_helper.py) is very basic and may not work on a general online directory. It works for the example Mauri linked, but it should be made more robust and general so it works in more cases, and it should throw error messages if things go wrong.
improve debug = 0 print functions
The user will be interested in some kind of log - batch ID, submission ID, timestamp - and the DEBUG = 0 case should output all of these to the client.
Pass arbitrary location of scard to Submit_job.py
Pass an arbitrary location of the scard to submit_job.py as a command line argument (-s) and also to db_batch_entry.py.
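A sketch of the flag using argparse; the long option name and the default value are assumptions:

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Submit a simulation batch")
    parser.add_argument("-s", "--scard", default="scard.txt",
                        help="path to the scard file to parse")
    return parser.parse_args(argv)

print(parse_args(["-s", "/tmp/run1/scard.txt"]).scard)  # /tmp/run1/scard.txt
print(parse_args([]).scard)                             # scard.txt
```

db_batch_entry.py could then take the resolved path as a function argument instead of assuming a fixed location.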
Grab submitted job ID number from HTCondor
When submitting a job using HTCondor, we get a message such as:
jobs 7482391...98 submitted successfully
We will eventually need these HTCondor job IDs for logging purposes, which can maybe be accomplished with something like subprocess.check_output().
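A sketch of pulling the ID field out of such a message. The exact wording of condor_submit's output can vary between HTCondor versions, so the regex is an assumption; in real use the message would come from something like subprocess.check_output(["condor_submit", "clas12.condor"]).

```python
import re

def parse_condor_job_ids(message):
    """Extract the job-ID field from a condor_submit style message.

    Returns the raw ID string (e.g. '7482391...98') or None if the
    message does not match the assumed format.
    """
    match = re.search(r"jobs?\s+([\d.]+)\s+submitted", message)
    return match.group(1) if match else None

print(parse_condor_job_ids("jobs 7482391...98 submitted successfully"))
```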
Enable directory support for background merging
Similar to the gcard and user-specified event file support; we need to enable the file transfer of background-merging files into the simulation.
Enable Job Submission on MIT Tier 3 Computing Pool
As of late February 2019, we are unable to successfully submit and process jobs on MIT Tier 3. This needs to be fixed.
directories support
We'll need this for the gcards, the generator, the background merging.
Basically the gcard entry (or the others) should support an entire directory containing gcards.
Then the simulation is duplicated with the exact same conditions except it will use each gcard in the directory.
When we discussed this I proposed a directory visible to the submit scripts (same machine), but it could get complicated.
I propose one mechanism only, to read directories visible online. The python script could use wget to get them.
Example for the gcard:
gcard: https://username.jlab.org/ungaro/tmp/gcards
The script would download the entire directory and execute N condor submits, one for each gcard in there.
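One way to make the directory comprehension more robust is to first parse the index page for .gcard links, then fetch each one. This sketch assumes the directory serves an Apache-style HTML index listing; it is not the current gcard_helper.py implementation:

```python
from html.parser import HTMLParser

class GcardLinkParser(HTMLParser):
    """Collect hrefs ending in .gcard from a directory index page."""
    def __init__(self):
        super().__init__()
        self.gcards = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if href.endswith(".gcard"):
                self.gcards.append(href)

# Example index snippet; in real use this HTML would be fetched with
# wget or urllib from the gcard URL given in the scard.
listing = ('<a href="clas12_default.gcard">g1</a>'
           '<a href="README.txt">readme</a>'
           '<a href="clas12_rga.gcard">g2</a>')
parser = GcardLinkParser()
parser.feed(listing)
print(parser.gcards)  # ['clas12_default.gcard', 'clas12_rga.gcard']
```

Each collected filename would then drive one condor submit, and an empty result list is a natural place to raise an error message.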
Redesign Users table & db_user_entry
When an scard is submitted, query the Users table; if the user does not exist, add the user to the DB. Remove the "default" user. Remove the email field and replace it with a host name field, populated using something like whoami, echo $USER, or hostname.
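A sketch of the query-then-insert logic against a hypothetical users table. In Submit_Job.py the two values could come from getpass.getuser() and socket.gethostname() rather than being passed in:

```python
import sqlite3

def get_or_create_user(conn, username, hostname):
    """Insert the user into the users table if not already present."""
    cur = conn.execute("SELECT 1 FROM users WHERE username = ?", (username,))
    if cur.fetchone() is None:
        conn.execute("INSERT INTO users (username, hostname) VALUES (?, ?)",
                     (username, hostname))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, hostname TEXT)")
get_or_create_user(conn, "sangbaek", "submit.mit.edu")
get_or_create_user(conn, "sangbaek", "submit.mit.edu")  # second call is a no-op
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 1
```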
implement Bobby's table design
Allow User to Specify Custom Input Event File
Allow a user to upload and use a custom event generation file, so they do not have to use one of the 3 currently functioning generators if they do not want to. This file would probably have to be copied to the subMIT machine and then passed through to HTCondor, and then maybe deleted after simulation is complete so as to not build up unneeded files.
Improve relative paths
The current file structure lists relative/absolute paths that I think can get corrupted easily if things are not run from their intended locations. This might not be an issue, but I feel there might be a better way to set up path locations than what the system currently does.
Create global debug variable & print statement
Set up a global DEBUG variable which can be passed as an argument to any and all scripts to print out messages, similar to VERBOSE.
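A minimal sketch of such a global, assuming a numeric level parsed from a -d/--debug flag; the names are placeholders, not existing code:

```python
import argparse

DEBUG = 0  # module-level default; scripts import and may override this

def parse_debug(argv=None):
    """Read the debug level from the command line, ignoring other flags."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-d", "--debug", type=int, default=0,
                        help="debug verbosity level, 0 = quiet")
    args, _ = parser.parse_known_args(argv)
    return args.debug

def debug_print(level, message):
    """Print message only when the global DEBUG level is at least `level`."""
    if DEBUG >= level:
        print(message)

DEBUG = parse_debug(["--debug", "2"])
debug_print(1, "batch entered into DB")  # printed, since DEBUG == 2
debug_print(3, "full SQL trace")         # suppressed
```

Using parse_known_args lets every script accept -d without clashing with its own argument parser.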
Enforce Foreign Key Relations in SQLite DB
FK relations need to be enforced, at least between the scard and users tables (i.e. the username in every scard.txt should match a registered user in the USERS table; otherwise throw an error).
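Note that SQLite ships with foreign key checking disabled; it must be switched on per connection with a PRAGMA. A small demonstration with hypothetical tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE scard (id INTEGER PRIMARY KEY, "
             "username TEXT NOT NULL REFERENCES users(username))")
conn.execute("INSERT INTO users VALUES ('sangbaek')")
conn.execute("INSERT INTO scard (username) VALUES ('sangbaek')")  # accepted
try:
    conn.execute("INSERT INTO scard (username) VALUES ('nobody')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)  # unregistered user is refused by the DB itself
```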
Job Statistics Parsing and Logging
Create output files from the runscript containing statistics on runtime, cores used, and other metrics. These can be used in connection with what HTCondor already produces, or be a separate file. Parse and log this information into the MC database.
Improved Error Handling
There are a number of errors that can occur:
- user lists an scard location that doesn't exist
- user lists a gcard location that doesn't exist
- connection to the gcard online directory fails / times out
- connection to the DB fails / times out
You can think of many more. Error handling should be added to cover these cases; most of it does not yet exist.
Unique Naming and Logging/Storage of Submission Scripts
The problem is best described with the following example:
I want to submit 5 jobs (Queue = 5) with 100 events each. I run our software, which creates the submission scripts (clas12.condor, runscript.sh) to do this. I submit to HTCondor.
Before any job has processed (job is idle, or running but not complete) I decide I also want to submit 7 jobs (Queue = 7) with 200 events each. I run our software, which creates the submission scripts, and submit the batch.
Instead of 5 jobs with 100 events and 7 jobs with 200 events, I end up with 14 jobs of 200 events each. I.e., files are not passed to HTCondor in a permanent fashion; they stay "live" on the subMIT node until processing is complete.
To fix this, we need to uniquely name each submission script file (maybe just with a unix timestamp) and store the scripts in their own dedicated folder. After simulations are complete these files can be stored long term, or destroyed to save disk space.
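A sketch of timestamp-based naming; the directory name and the microsecond resolution are assumptions:

```python
import os
import time

def unique_script_name(base, directory="submission_files", stamp=None):
    """Append a unix timestamp so each batch gets its own script file."""
    if stamp is None:
        # microseconds, so two submissions in the same second don't collide
        stamp = int(time.time() * 1e6)
    return os.path.join(directory, "{0}_{1}".format(base, stamp))

print(unique_script_name("clas12.condor", stamp=1551400000000000))
# submission_files/clas12.condor_1551400000000000
```

The generated name would be used both for writing the script and for the condor_submit call, so HTCondor never reads a file that a later batch can overwrite.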
Optimize events / job distribution
User wants to simulate 100 million events. Is it more efficient to submit 1 million jobs with 100 events each, or 100 jobs with 1 million events each? What is the most time efficient way of distributing events? Does this answer change based on what pool you submit to and what resources are available?
The answers to these questions can be built into the job submission logic so that submissions are handled intelligently. Maybe, at the end of the day, we change the scard so that instead of specifying X events and Y jobs, the user just specifies X*Y events and the number of jobs is decided automatically.
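A toy version of that logic; the per-job cap is a placeholder value that would really depend on the pool, and the split may overshoot the requested total by up to n_jobs - 1 events:

```python
def split_events(total_events, max_events_per_job=10000):
    """Turn a requested event total into (n_jobs, events_per_job)."""
    n_jobs = -(-total_events // max_events_per_job)   # ceiling division
    events_per_job = -(-total_events // n_jobs)       # may round up slightly
    return n_jobs, events_per_job

print(split_events(100000000))  # (10000, 10000)
print(split_events(250, 100))   # (3, 84) -> simulates 252 events total
```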
Which SQL Server to Use
What RDBMS is the best option for our project? MySQL, PostgreSQL, MariaDB, others?
Generate submission scripts on server side, use functions not templates
Generate submission scripts on the server side using functions, not templates. Break the functions into different sections (header, generator, gemc, statistics logging) that can take variable arguments. Perhaps split the src/ directory into "client" and "server" directories, where the server is where jobs are actually submitted and run, and the client is where scards are gathered and entered into the DB.
Update submissions tables design
Remove unneeded submission table fields (submission file paths) and add needed ones: submitted on (date, etc.), running on (date, node), completed (date, node), connected back from the database.
One run script file, not 3
Currently on some computing pools (Tier 2, maybe others) we have 3 submission scripts: condor_wrapper, run_job, and runscript.sh. These should be combined intelligently into 1 run script file. If we need separate templates to run on Tier 2 vs. OSG, maybe we should do that, and then choose which template to write based on which pool we are submitting to.
Module directory for subprocedures
If a Python script is only called by another main Python script, not from the shell,
we can regard it as a "subpackage." We can make a separate directory for these with an __init__.py. This is a super minor thing that can be done afterward.
Create tests for pull requests / CI
It might be helpful to start setting up validation tests to aid with pull requests so we don't have to manually test a bunch of edge cases every time we have a PR. We could use something like https://travis-ci.org/. I haven't had to set this up myself on any previous projects, but should be mostly straightforward. Perhaps we can discuss this at our next group meeting.
Some version of templates missing at some point
Eventually templates will not be used, so every script should be written from scratch from the DB, but for now some notes.
The luminosity option used at the March tutorial was:
-LUMI_EVENT="NLUMI, 248.5ns, 4ns" -LUMI_P="e-, 10.6GeV, 0deg, 0*deg" -LUMI_V="(0.0, 0.0, -10)cm" -LUMI_SPREAD_V="(0.03, 0.03)cm"
The missing templates had an option to change NLUMI if the luminosity in the scard was not 0.
gcards_scard is not correctly replaced when copying the gcard, i.e.
cp gcards_scard out_basename$ClusterId_nnevents_scard
is changed by the python code to, for example,
cp https://userweb.jlab.org/~ungaro/tmp/gcards/ outbasename$ClusterId_n10
The url should instead be replaced with a directory path like $/src/utils/../../submission_files/gcards/gcard_8_batch_4.gcard.
Change keys of genOutput, genExecutable to match current documentation
From around line 200 of file_struct.py:
The keys of genOutput and genExecutable were the names of the generators themselves at the March collaboration meeting, i.e. dvcsgen and generate-dis were used instead of dvcs and dis-rad. Please keep these consistent. See https://clasweb.jlab.org/clas12/clas12SoftwarePage/html/tutorials/subMit/p1.html
A possible improvement would be to query an online repository (either every time or periodically with a cronjob), using urllib2 for example, and parse a file structure, instead of statically defining these dictionaries in file_struct.py.
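A sketch of building the dictionaries from a fetched manifest instead of hard-coding them. The manifest format (whitespace-separated name/executable/output lines) is invented here; in real use the text would come from a urllib call against the repository. Generator names are taken from the documentation cited above:

```python
def parse_generator_manifest(text):
    """Build genExecutable/genOutput-style dicts from manifest lines."""
    gen_executable, gen_output = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        name, exe, out = line.split()
        gen_executable[name] = exe
        gen_output[name] = out
    return gen_executable, gen_output

manifest = """
# generator   executable     output
dvcs          dvcsgen        dvcs.dat
dis-rad       generate-dis   dis.dat
"""
executables, outputs = parse_generator_manifest(manifest)
print(executables["dvcs"])  # dvcsgen
```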
Build setup.py for sake of simplicity
Related to #39.
Currently, we're invoking individual shell commands for Python,
e.g. python utils/*.py, etc.
I suggest writing a setup.py which captures the libraries and main scripts at the same time, to make it easy for other users to run our code.
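A sketch of what such a setup.py could look like. The version, package layout, and the submit-job entry point (which assumes a main() function is added to Submit_Job.py) are all placeholders:

```python
# setup.py (sketch) -- install with `pip install .`
from setuptools import setup, find_packages

setup(
    name="clas12simulations",
    version="0.1.0",
    packages=find_packages("src"),
    package_dir={"": "src"},
    entry_points={
        "console_scripts": [
            # would let users run `submit-job -s scard.txt` from anywhere,
            # assuming Submit_Job gains a main() function
            "submit-job = Submit_Job:main",
        ]
    },
)
```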
Grab information on pool availability
This is probably not as urgent, but it would be good to be able to have ~live statistics for how busy pools are when it comes time to submit a job, in order to be able to:
1.) have an estimate of how long it will take for a job to be submitted; this could be done by finding out how many cores are available, or how many jobs are currently in the queue vs. how many cores the pool has, etc.
2.) to be able to decide which is the best pool to submit the job to (the best pool will be the one that can process the job the soonest).
If you go onto http://submit.mit.edu/condormon/index.php you can see occupancy statistics for Tier 2. If we can get an API for this, and for other pools, I think that is what we are looking for. Tier 2 can only run ~1-2K jobs at once from what I've seen, so if there are 25K jobs ahead of us when we want to submit, we would know to go to OSG or somewhere else.
Running scripts at subMIT
The current version of the code cannot run on my local macOS machine; it fails with the following error messages:
python src/Submit_Job.py
Traceback (most recent call last):
  File "src/Submit_Job.py", line 34, in <module>
    db_batch_entry.Batch_Entry(args.scard)
  File "/Users/sangbaek/CLAS12/clas12simulation/src/db_batch_entry.py", line 26, in Batch_Entry
    username = user_validation.user_validation()
  File "/Users/sangbaek/CLAS12/clas12simulation/src/utils/user_validation.py", line 19, in user_validation
    user_already_exists = utils.sql3_grab(strn)
  File "/Users/sangbaek/CLAS12/clas12simulation/src/utils/utils.py", line 79, in sql3_grab
    c.execute(strn)
sqlite3.OperationalError: no such column: hostname
and at subMIT
CLAS12 Off Campus Resources Database not found, creating!
Traceback (most recent call last):
  File "src/Submit_Job.py", line 31, in <module>
    create_database.create_database(args)
  File "/afs/lns.mit.edu/user/sangbaek/test/clas12simulations/src/utils/create_database.py", line 19, in create_database
    file_struct.PKs[i],file_struct.foreign_key_relations[i])
  File "/afs/lns.mit.edu/user/sangbaek/test/clas12simulations/src/utils/utils.py", line 57, in create_table
    sql3_exec(strn)
  File "/afs/lns.mit.edu/user/sangbaek/test/clas12simulations/src/utils/utils.py", line 66, in sql3_exec
    printer2('Executing SQL Command: {}'.format(strn)) #Turn this on for explict printing of all DB write commands
ValueError: zero length field name in format
Manage concurrency
To deal with concurrency, consider indexing immediately off of BatchID and making it the only auto-increment column. We might need a more robust solution, but this would be a start.
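In SQLite this could start as simply as making BatchID the lone AUTOINCREMENT primary key; the database then hands out each ID exactly once, even with interleaved inserts. Table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# AUTOINCREMENT guarantees BatchIDs are never reused, so two submissions
# racing on the table cannot end up sharing an ID.
conn.execute("CREATE TABLE batches (BatchID INTEGER PRIMARY KEY AUTOINCREMENT, "
             "scard TEXT)")
first = conn.execute("INSERT INTO batches (scard) VALUES ('run_a')").lastrowid
second = conn.execute("INSERT INTO batches (scard) VALUES ('run_b')").lastrowid
print(first, second)  # 1 2
```

Everything else (submission files, output directories) can then key off the BatchID the database assigned rather than one computed client-side.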
No runscript_files under submission_files
Probably need a .keep file inside the directory