raymondehlers / overwatch

OVERWATCH provides real time detector monitoring and QA using data from the ALICE HLT

Home Page: https://overwatch.readthedocs.io/

License: BSD 3-Clause "New" or "Revised" License

Python 85.98% Shell 0.32% JavaScript 4.23% CSS 0.71% HTML 5.10% CMake 0.95% C++ 2.22% Dockerfile 0.49%
Topics: alice-experiment, hep, physics

overwatch's Introduction

Hi, I'm Raymond 👋

I'm a scientist studying the forces that hold matter together and the physics of the early universe about 1 microsecond after the big bang via high-energy collisions of nuclei, sometimes described as "little bangs", at the Large Hadron Collider at CERN in Geneva, Switzerland.

More specifically, I'm an experimental physicist with a focus on studying high momentum transfer processes known as jets, with a particular interest in jet substructure. We use these jets as calibrated probes to study quantum chromodynamics (QCD) and the quark-gluon plasma (QGP) formed in heavy-ion collisions. I experimentally measure jet observables as a member of the ALICE (A Large Ion Collider Experiment) collaboration, and rigorously extract fundamental physics parameters via statistical methods (i.e., Bayesian inference) as a member of the JETSCAPE collaboration.

I'm currently a postdoctoral researcher at UC Berkeley and Lawrence Berkeley National Lab.

Further info + contact me

You can find out more about me and my work, or reach out, at my website

overwatch's People

Contributors

arturro96, jdmulligan, nabywaniec, ostr00000, raquishp, raymondehlers


overwatch's Issues

Update for reset on request

To accommodate the P2 GUI, we need to handle reset on request. To handle this properly, we need to:

  • Update receiver
    • Test reset on request with receiver
  • Update the process runs merger. The cleanest way probably involves:
    • Naming the reset histogram files differently to distinguish them from standard files.
    • Saving each combined file, named according to the same scheme used when we don't reset histograms.
    • Extensive testing, since we are modifying the merger.
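The naming step above could be sketched as follows. The scheme (subsystem prefix, `reset`/`standard` tag, timestamp) is hypothetical, not the actual Overwatch convention; the point is only that reset-on-request files carry a distinct tag so the merger can treat them separately.

```python
import time

def receiver_filename(subsystem, run_number, is_reset=False, timestamp=None):
    """Build a histogram filename that distinguishes reset-on-request files
    from standard files. The naming scheme is illustrative only."""
    ts = timestamp if timestamp is not None else int(time.time())
    tag = "reset" if is_reset else "standard"
    return f"{subsystem}hists.{run_number}.{tag}.{ts}.root"
```

Keeping the standard scheme for non-reset files means existing processing paths are untouched.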

Final deployment issues

Do not write file with receiver if run number is 0

Now that the HLT resets receivers, it appears that the run number is then reset to 0, but the HLT still responds to requests with an empty payload. The receiver should note the run number and skip writing the file, since it is empty.
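The check described above is simple; a standalone sketch (the function name and call site are assumptions about how the receiver would use it):

```python
def should_write_payload(run_number, payload):
    """Decide whether the receiver should write a file for this response.

    After the HLT resets its receivers, the run number drops to 0 and the
    payload is empty; such responses should be dropped rather than written.
    """
    if run_number == 0 and not payload:
        return False
    return True
```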

OVERWATCH Status page

Implement a status page showing the status of the tunnels, receivers, ongoing run, etc.

However, we may want to restrict access to it.

jsRoot branch deployment

Needed for deployment of the jsRoot branch more broadly. Ordered by criticality.

Critical:

  • Docker
    • Create a single docker image containing both nginx and the web app
      • Will mount data via volumes
    • Can we log to somewhere external? Say the sshfs connected drive? See: #4
  • Utilize vulcanize on layout.html. It currently chokes on Jinja functions, so some care is needed. It should be used when deployed, but should work without it when developing -> Committed to the repo; it should almost always be fine, and updates to it should be rare.
    • Minimize polymer components needed
  • Combine, simplify, and improve initOVERWATCH.sh. Move all configuration to one stub. Consolidate logging information (maybe use log4bash?). Should there be only one script for running processing and creating HLT receivers?
  • Test and/or fix the upstart script (written but untested). Needed for OVERWATCH at CERN on Ubuntu 14.04 -> Irrelevant: SLC 7, Ubuntu 16.04, and Debian 8 all support systemd. The script will need to be edited per install location, but this is not particularly difficult.
  • Use CDN when possible for jsRoot, Polymer

TODO:

  • Limit log file size for receivers. logrotate will probably work.

jsRoot branch web site

Needed for jsRoot branch deployment. Listed by priority:

Critical:

  • Utilize database. Either ZODB or sqlite. ZODB seems very promising. See #2
  • Time slices interface
    • Display
    • Make directly linkable
  • Disable login when behind SSO. See: #6
  • Update validation using database values and more sophisticated checks. This is how we keep everything safe.

Move (and eventually delete) redundant files with the same number of events

We have a very large number of redundant files because the HLT continued (until recently) to send data after the run ended. While not necessarily problematic, it would be good to clean them up from a disk usage perspective. Plus, it means fewer files to manage. We will probably need to reprocess afterwards to ensure that everything works correctly (and we'll need to recreate the combined hists files to ensure they are based on the right timestamp).

To identify the redundant files, we should look at the number of events in a given run. Once we have the same number in two different files (we should check that this doesn't happen to one of the first files because this may be a false positive), then the following files are redundant and can be removed. Perhaps it's best to move them first, verify that everything works, and then delete them afterwards. If nEvents is not in the file, perhaps the comparison script created for the EMCal corrections can be adapted to make comparisons here?

This should probably be done externally of OVERWATCH. I think there was a similar script developed in December of last year.
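The identification step might look like the sketch below. It assumes the `(filename, nEvents)` pairs have already been extracted and ordered in time; how nEvents is read out of the ROOT files is left open, as discussed above.

```python
def find_redundant_files(files_with_nevents):
    """Given (filename, nEvents) pairs in time order for one run, return the
    files that are candidates to move (and eventually delete).

    Once two consecutive files report the same number of events, the run has
    ended and that file plus everything after it is redundant. The very first
    pair is never flagged, guarding against the false-positive case where an
    early file happens to repeat a count.
    """
    for i in range(2, len(files_with_nevents)):
        if files_with_nevents[i][1] == files_with_nevents[i - 1][1]:
            return [name for name, _ in files_with_nevents[i:]]
    return []
```

As suggested above, the returned files would first be moved aside and only deleted once reprocessing is verified.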

What do you think?

Allow additional user options for reprocessing

Allow additional user options for reprocessing via the time dependent merge. Additional options should include:

  • Option to disable scaling by nEvents.
  • Option to change the hot channel warning threshold. This should be done generally enough such that other values could be set in the future.
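One way to keep this general enough for future values is a defaults-plus-overrides dict; the option names below are illustrative, not the actual Overwatch configuration keys:

```python
# Default time-slice reprocessing options (names are hypothetical).
DEFAULT_PROCESSING_OPTIONS = {
    "scaleByNEvents": True,
    "hotChannelThreshold": 100,
}

def resolve_processing_options(user_options):
    """Merge user overrides into the defaults, rejecting unknown keys so
    additional options can be added safely later."""
    unknown = set(user_options) - set(DEFAULT_PROCESSING_OPTIONS)
    if unknown:
        raise ValueError(f"Unknown processing options: {sorted(unknown)}")
    options = dict(DEFAULT_PROCESSING_OPTIONS)
    options.update(user_options)
    return options
```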

Receiver continues to send data after EOR

This causes us to save a bunch of useless data, as it keeps sending the same data over and over after the end of run. This was not the case during the PbPb run in 2015.

For a resolution, perhaps compare the current file with the previous one; if they match, tell the receiver to reset the merger. The best way to achieve this is currently unclear.
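The comparison could be sketched by keeping only a digest of the previous payload, rather than the full file, between requests (the function names are assumptions):

```python
import hashlib

def payload_digest(payload):
    """Digest of a received payload; storing only the previous digest avoids
    holding whole files in memory between requests."""
    return hashlib.sha256(payload).hexdigest()

def is_duplicate_payload(payload, previous_digest):
    """True when the newly received data matches the previously written file,
    i.e. the HLT is still repeating the same histograms after the end of run.
    On a duplicate, the receiver could skip the write and request a merger
    reset."""
    return (previous_digest is not None
            and payload_digest(payload) == previous_digest)
```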

Determine run period using AliEn

Based on Salvatore's suggestion, we could access the run period information from AliEn. This would look something like:

  • Retrieve the run period information via alien_find.
  • Connect to the OVERWATCH database and update a new run period object with the first and last run. This should be straightforward, since the connection is available via the utilities module.
  • Will need to manage how to keep the AliEn connection open.
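The bookkeeping half of the steps above might look like this sketch. It only parses alien_find-style output that has already been retrieved; the path layout (a zero-padded run number as a path segment) is a guess at the catalogue structure, and the actual invocation and connection management would go through the AliEn client.

```python
import re

def run_period_bounds(alien_find_output):
    """Extract the first and last run numbers for a run period from
    alien_find output. Assumes each line contains a segment like
    /000245145/ (zero-padded run number), which is an assumption about
    the catalogue layout."""
    runs = sorted(
        int(m.group(1))
        for line in alien_find_output.splitlines()
        if (m := re.search(r"/000(\d{6})/", line))
    )
    if not runs:
        raise ValueError("No run numbers found in alien_find output")
    return runs[0], runs[-1]
```

The returned (first, last) pair would then be written into the run period object via the database connection from the utilities module.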

jsRoot branch processing

Needed for jsRoot branch deployment. Listed by priority

Critical:

  • Utilize database. See #2
  • Finish updates to time slices.
  • Review subsystem class directory structure. In particular, we need to remove the dependence on dirPrefix, because that can change from system to system (although perhaps this is fine, because we require the "data/" symlink).

Time slices with different processing options are not displayed correctly

Caused by two time slices with different processing options over the same time extent not being differentiated in the filename. For example:

0-3 minutes, scale on:
timeSlice.0.3.root
-> Processing produces the scaled result, as expected
0-3 minutes, scale off:
timeSlice.0.3.root
-> Processing produces the unscaled result, as expected
0-3 minutes, scale on:
-> Processing sees this request as already done -> Returns the existing time slice, but uses data from timeSlice.0.3...
-> This is the unscaled result! Wrong!
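One possible fix is to encode the processing options into the filename, so slices over the same extent with different options no longer collide. A sketch (the naming scheme is illustrative, not the existing one):

```python
import hashlib
import json

def time_slice_filename(min_time, max_time, processing_options):
    """Build a time-slice filename that also encodes the processing options.

    A short, stable digest of the options keeps the filename bounded while
    still distinguishing, e.g., scaled from unscaled slices.
    """
    digest = hashlib.sha256(
        json.dumps(processing_options, sort_keys=True).encode()
    ).hexdigest()[:8]
    return f"timeSlice.{min_time}.{max_time}.{digest}.root"
```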

Handle server failures with ajax properly

Currently, the user just waits. Eventually they will probably give up, but it would be better to handle failures proactively.

At least handle 500 Internal Server Error.

Update to database to handle metadata

Processing relies heavily on metadata. These operations can be very slow, particularly on slow disks. To resolve this, a database should be created (likely built using MongoDB) which caches and manages such metadata, thereby reducing the load on the disk.

A longer term approach could store the data directly there, creating ROOT histograms (or some other visualization tool) on the fly. May nicely integrate with #1.

jsRoot branch interface completion

From #17

TODO:

  • Site side of OVERWATCH status. See #12
  • Display and store HLT mode
  • Make some sort of bookmarks on the run list to make navigation easier. At worst, every 100 runs. Better would be by month. Ideal would be by period, but that is basically impossible without querying the logbook...
  • Implement new processing options. See: #3

jsRoot branch interface

Needed for deployment of the jsRoot branch. Ordered by priority.

Critical:

  • Show spinning wheel when loading an ajax request

Update authentication method

Update the authentication method, likely by adding CERN Single Sign On (SSO) support. This would require reworking the authentication system. Our authentication system is very simple, so this should not be too terrible. For more information, see here. The integration process appears to be very involved.

It may be simpler to implement using SAML2. See the CERN SSO page for more information.

deploy2017 branch documentation

Desired for the deploy2017 branch deployment. Ordered by priority. This would be nice, but less critical than other issues.

TODO:

  • Update detector docs
  • Update module docs
  • Update README.md

Handle slashes and spaces in hist + image names

The TPC QA component has a slash and a space in a histogram name, which then propagates through the code and causes issues when writing the images. The space is fine, but the slash is not. This should be fixed by checking for and replacing dangerous characters when writing filenames.
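The replacement step is a one-liner; a sketch where the replacement character and the set of characters treated as dangerous are choices, not the fix actually adopted:

```python
import re

def sanitize_filename(hist_name):
    """Replace characters that are dangerous in filenames (notably the
    slash from the TPC QA histogram names) before writing images.
    Spaces are kept, since they are harmless on the filesystem."""
    return re.sub(r'[/\\\x00]', "_", hist_name)
```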

jsroot Integration

Make the displayed histograms interactive by using jsroot. Allows zooming, log axis switches, precise value information, etc.

The implementation that we are most likely interested in is to create JSON files along with the images, as described on their documentation and implementation information page. Briefly, we would create the JSON files using TBufferJSON, and these would then be served using jsroot. However, such an approach may require reworking the page, as loading all the images at once would likely make things very slow. The required async requests for the images may further complicate the implementation. Further information is available in their documentation and examples.
