bye_splits

Understand and fix the observed cluster splitting in the CMS Stage 2 reconstruction on FPGAs

License: MIT License

Table of Contents

  1. Installation
  2. Data production
    1. Skimming
    2. Data sources
  3. Reconstruction Chain
    1. Cluster Size Studies
  4. Event Visualization
    1. Setup
    2. Setup in local browser
    3. Visualization in local browser
      1. 2D display app
      2. 3D display app
    4. Visualization with OpenShift OKD4
      1. Additional information
  5. Cluster Radii Studies
  6. Merging plotly and bokeh with flask
    1. Introduction
    2. Flask embedding
      1. Note
  7. Producing tikz standalone pictures


This repository reproduces the CMS HGCAL L1 Stage 2 reconstruction chain in Python for quick testing, and provides event visualization apps. It was originally used to understand and fix the observed cluster splitting.

Installation

# set up the conda environment
conda create -n <EnvName> python=3 pandas uproot pytables h5py
conda activate <EnvName>

# set up an SSH key if not yet done and clone the repository
git clone git@github.com:bfonta/bye_splits.git
# enforce git hooks locally (required for development)
git config core.hooksPath .githooks

The user could also use Mamba, a fast and robust package manager. It is fully compatible with conda packages and supports most of conda’s commands.

Data production

Skimming

To keep file sizes manageable, a skimming step based on ROOT's RDataFrame was implemented. It applies several cuts and performs many type conversions so the output can be read with uproot in later steps. To run it:

python bye_splits/production/produce.py --nevents -1 --particles photons

where "-1" represents all events, and the input file is defined in config.yaml.

Data sources

This framework relies on photon-, electron- and pion-gun samples produced via CRAB. The most up-to-date versions are currently stored under:

  • Photons (PU0): /dpm/in2p3.fr/home/cms/trivcat/store/user/lportale/DoublePhoton_FlatPt-1To100/GammaGun_Pt1_100_PU0_HLTSummer20ReRECOMiniAOD_2210_BCSTC-FE-studies_v3-29-1_realbcstc4/221025_153226/0000/
  • Electrons (PU0): /dpm/in2p3.fr/home/cms/trivcat/store/user/lportale/DoubleElectron_FlatPt-1To100/ElectronGun_Pt1_100_PU200_HLTSummer20ReRECOMiniAOD_2210_BCSTC-FE-studies_v3-29-1_realbcstc4/221102_102633/0000/
  • Pions (PU0): /dpm/in2p3.fr/home/cms/trivcat/store/user/lportale/SinglePion_PT0to200/SinglePion_Pt0_200_PU0_HLTSummer20ReRECOMiniAOD_2210_BCSTC-FE-studies_v3-29-1_realbcstc4/221102_103211/0000
  • Photons (PU200): /eos/user/i/iehle/data/PU200/photons/ntuples
  • Electrons (PU200): /eos/user/i/iehle/data/PU200/electrons/ntuples

The PU0 files above were merged and are stored under /data_CMS/cms/alves/L1HGCAL/ (accessible to LLR users) and under /eos/user/b/bfontana/FPGAs/new_algos/ (accessible to all lxplus and LLR users). The latter is preferred since it is well interfaced with CERN services. The PU200 files were merged and stored under /eos/user/i/iehle/data/PU200/<particle>/.

Reconstruction Chain

The reconstruction chain is implemented in Python. To run it:

python bye_splits/run_chain.py

where the -h flag lists the available options. To use the steps separately in your own script, use the functions defined under bye_splits/tasks/, as done in the iterative_optimization.py script.

To plot results as a function of the trigger cell optimization parameter:

python plot/meta_algorithm.py

The above will create html files with interactive outputs.

Cluster Size Studies

The script bye_splits/scripts/cluster_size.py reads the configuration file bye_splits/scripts/cl_size_params.yaml and runs the reconstruction chain on the .root file corresponding to the chosen particle, repeating the clustering step for the range of cluster radii specified in the parameter file under cl_size: Coeffs.

The most convenient way of running the study is to do:

bash run_cluster_size.sh <username>

where <username> is your lxplus username. This creates .hdf5 files containing Pandas DataFrames with cluster properties (notably energy, eta, phi) and the associated gen-level particle information for each radius (see the reading sketch after the directory listings below). The bash script acts as a wrapper around the python script, setting a few options that are convenient for the cluster size studies and differ from the defaults of the general reconstruction chain. As of now, the output .hdf5 files are written to your local directory using the structure:

├── /<base_dir>
│            ├── out
│            ├── data
│            │   ├──new_algos

with the files ending up in new_algos/. An option to send the files directly to your eos/ directory is being implemented, assuming the structure:

├── /eos/user/<first_letter>/<username>
│                                   ├── out
│                                   ├── data
│                                   │   ├──PU0
│                                   │   │   ├──electrons
│                                   │   │   ├──photons
│                                   │   │   ├──pions
│                                   │   ├──PU200
│                                   │   │   ├──electrons
│                                   │   │   ├──photons
│                                   │   │   ├──pions
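The produced files can be inspected directly with pandas; a minimal sketch, assuming only that each .hdf5 file stores one or more DataFrames (the path and keys below are illustrative and depend on the options chosen in run_cluster_size.sh):

import pandas as pd

# Illustrative path; actual file names depend on the particle and run options.
path = "data/new_algos/cluster_size_photons.hdf5"
with pd.HDFStore(path, mode="r") as store:
    print(store.keys())              # typically one DataFrame per cluster radius
    df = store[store.keys()[0]]      # load the first one
print(df.head())                     # cluster energy, eta, phi, gen-level info, ...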

Event Visualization

The repository creates two web apps that can be visualized in a browser. The code is stored under bye_splits/plot.

Setup

Please install the following from within the conda environment you should have already created:

conda install -c conda-forge pyarrow
#if the above fails: python -m pip install pyarrow
python3 -m pip install --upgrade pip setuptools #to avoid annoying "Setuptools is replacing distutils." warning

Setup in local browser

Since running a browser directly on the server would be slow, we can either:

Use LLR's intranet at llruicms01.in2p3.fr:<port>/display

Forward the port to our local machine via SSH. To establish a connection between the local machine and the remote llruicms01 server, going through the gate, use:

ssh -L <port>:llruicms01.in2p3.fr:<port> -N <llr_username>@llrgate01.in2p3.fr
# for instance: ssh -L 8080:llruicms01.in2p3.fr:8080 -N <llr_username>@llrgate01.in2p3.fr

The two ports do not have to be the same, but using the same one avoids confusion. Leave the terminal open and running (it will not produce any output).

Visualization in local browser

1) 2D display app

In a new terminal window, go to the llruicms01 machine and launch one of the apps, for instance:

bokeh serve bye_splits/plot/display/ --address llruicms01.in2p3.fr --port <port>  --allow-websocket-origin=localhost:<port>
# if visualizing directly at LLR: --allow-websocket-origin=llruicms01.in2p3.fr:<port>

This uses the server-creation capabilities of bokeh, a python package for interactive visualization (docs). Note that the port numbers must match. For further customisation of bokeh serve, see the serve documentation. The above command should give access to the visualization under http://localhost:8080/display. For debugging, run python bye_splits/plot/display/main.py and check that no errors are raised.

2) 3D display app

Make sure you have activated your conda environment: conda activate <EnvName>.

Run the following commands to install the packages needed by the web application (e.g. dash, uproot, awkward) in your conda environment:

conda install dash
python3 -m pip install dash-bootstrap-components
python3 -m pip install dash-bootstrap-templates
conda install pandas pyyaml numpy bokeh awkward uproot h5py pytables
conda install -c conda-forge pyarrow fsspec

Then go to the llruicms01 machine (if you are inside the LLR intranet) or to your preferred machine and launch:

python bye_splits/plot/display_plotly/main.py --port 5004 --host localhost

In a browser, go to http://localhost:5004/. Make sure you have access to the geometry and event files, to be configured in config.yaml.

Visualization with OpenShift OKD4

We use the S2I (Source-to-Image) service via CERN's PaaS (Platform-as-a-Service, based on OpenShift) to deploy and host web apps in the CERN computing environment (here). There are three ways to deploy such an app; S2I is the easiest, but least flexible, of the three (instructions here). It effectively abstracts away the need for Dockerfiles.

We use S2I's simplest possible configuration, defined in app.sh. The image is created together with the packages specified in requirements.txt. Both files are documented here.

We are currently running a pod at https://viz2-hgcal-event-display.app.cern.ch/. The port being served by bokeh in app.sh must match the one the pod is listening to, specified at configuration time before deployment in the OpenShift management console at CERN. The network visibility was also updated to allow access from outside the CERN network.

Additional information

Cluster Radii Studies

A DashApp has been built to interactively explore the effect of cluster size on various cluster properties, which is currently hosted at https://bye-splits-app-hgcal-cl-size-studies.app.cern.ch/. To run the app locally, you can do:

bash run_cluster_app.sh <username>

where <username> is your lxplus username. The app reads the configuration file bye_splits/plot/display_clusters/config.yaml and assumes a directory structure equivalent to the one described in the cluster size step (depending on your choice of Local).

It performs the necessary analysis on the files in the specified directory to generate the data for each page, writing the results to files in the same directory. To minimize duplication and greatly speed up the user experience, if one of these files does not exist in your own directory, the app looks for it under the appropriate directories listed in Data sources, where many of the possible files already exist. The same procedure is used for reading the generated cluster size files, so you can use the app without having run the study yourself.

Merging plotly and bokeh with flask

Introduction

Flask is a python micro web framework that simplifies web development. It is considered "micro" because it is lightweight and only provides essential components. Given that plotly's dashboard framework, dash, runs on top of flask, and that bokeh can produce html components programmatically (which can be embedded in a flask app), it should be possible to develop a flask-powered web app mixing these two plotting packages. Having a common web framework also simplifies future integration.

Flask embedding

The embedding of bokeh and plotly plots within flask is currently demonstrated in plot/join/app.py. Two servers run, one from flask and one from bokeh, so special care is required to ensure that the browser serving the app listens to both ports. Listening to flask's port only will cause plot/join/templates/embed.html to be rendered without the bokeh plots.
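For orientation, a minimal self-contained sketch of mixing the two packages in a single flask page. This is not plot/join/app.py (which runs a separate bokeh server), but the simpler static-embedding route mentioned above, using bokeh.embed.components and plotly's to_html:

from flask import Flask, render_template_string
from bokeh.plotting import figure
from bokeh.embed import components
from bokeh.resources import CDN
import plotly.graph_objects as go

app = Flask(__name__)

PAGE = """
<html><head>{{ bokeh_resources|safe }}</head>
<body>
  {{ bokeh_div|safe }} {{ bokeh_script|safe }}
  {{ plotly_div|safe }}
</body></html>
"""

@app.route("/")
def index():
    bokeh_fig = figure(title="bokeh", height=250)
    bokeh_fig.line([1, 2, 3], [4, 6, 5])
    script, div = components(bokeh_fig)          # static bokeh html + js, no bokeh server

    plotly_fig = go.Figure(go.Scatter(x=[1, 2, 3], y=[4, 6, 5]))
    plotly_div = plotly_fig.to_html(include_plotlyjs="cdn", full_html=False)

    return render_template_string(PAGE, bokeh_resources=CDN.render(),
                                  bokeh_script=script, bokeh_div=div,
                                  plotly_div=plotly_div)

if __name__ == "__main__":
    app.run(port=5000)

Since this route has no bokeh server, it sidesteps the two-port issue, at the cost of the server-side callbacks discussed in the Note below.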

Note

Running a server is required when more advanced callbacks are needed. Currently only bokeh has a server of its own; plotly simply creates an html block with all the required information. If not-so-simple callbacks are required for plotly plots, another port will have to be listened to.

Producing tikz standalone pictures

For illustration purposes, tikz standalone scripts are included under docs/tikz/. To run them (taking docs/tikz/flowchart.tex as an example):

cd docs/tikz/
pdflatex -shell-escape flowchart.tex

The above should produce the flowchart.svg file. The code depends on latex and pdf2svg.

bye_splits's People

Contributors

bfonta, mchiusi, isehle


bye_splits's Issues

Lack of Default "All Events" Option

In order to run the reconstruction chain as currently written, one has to pass a specific integer number of events here; it would be convenient to have a default "all events" option (e.g. by passing nevents=-1).

[Bug Report] The code always returns the same event when a single random event is required

The provide_random_event function always returns the same event (depending on the hardcoded seed: 42)

In my display code the user can ask for a random event to show on the screen. The random event is chosen among the available ones by this function, using a seed. The seed is defined here.

I notice that, since the seed in all cases is 42 (hardcoded value), the provided event is always the same.
Of course, if I change the seed value, the displayed event changes as well.

We can implement a seed choice directly in the display part of the code (I think it is possible, but maybe not elegant), or fix this small problem in event.py.
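One possible direction, sketched with a hypothetical signature (the actual provide_random_event in event.py may differ): expose the seed as an optional argument so that callers such as the display can control it.

import random

def provide_random_event(events, seed=None):
    """Hypothetical sketch: seed=None yields a different event on each call,
    while a fixed seed keeps the choice reproducible."""
    return random.Random(seed).choice(events)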

Broken Link

Please install the following from within the =conda= environment you [[conda_install][should have already created]]:

The internal link that this line references appears to be broken (see the attached screenshot).

HGCAL Event Display

Steps to achieve a useful HGCAL event display, with 2D and 3D plots of geometry plus photon, electron, pion and jet events. Plots for cluster-related studies will also be included.

Data production and skimming

  • add boost options in the skimming #12 @bfonta
  • detect whenever the intermediate file created by get_data() is older than the input ROOT file, and change the reprocess flag to True when that happens @mchiusi
  • produce input files with trigger module sums @portalesHEP

2D plots with bokeh

  • control energy threshold from the GUI @bfonta
  • add colors for module sums #10 @bfonta
    • define data management when different columns have different numbers of items per event
  • improve the hover information on the module sum side @bfonta
  • find correct x and y cell positions #5 #8 @bfonta
  • visualize cells and wafers U and V in a squared grid @bfonta
  • add gen particle information @bfonta

3D plots with plotly/dash

  • add colorbar in 3d plots @mchiusi
  • add layer selection slider in 3d plots @mchiusi
  • control energy threshold from the GUI @mchiusi
  • add gen particle information @mchiusi
    • produce .root files containing gen particle information layer by layer @bfonta.

Clustering-related studies

Joining plots and web deployment with Flask

Add unit tests

As the project grows, we should add simple unit tests, to avoid constant bug fixes. As a start:

  • check provide_* functions for event and geometry data
  • run the reconstruction chain for multiple parameters and guarantee that it completes successfully and the final result makes sense
  • run the validation of the reconstruction chain
  • make sure the event display scripts complete successfully

The above should run automatically every time a PR is submitted for review.

[Planning] ROI finder + Seeding

  • A region of interest (ROI) will initially be defined as a cylinder parallel to the z axis containing a single module per layer; more complex definitions can be envisaged at a later stage, particularly considering more than one module per layer (advantage: potentially increase energy resolution for particles that cross module boundaries). One has to take into account that each FPGA stage 1 board has access to around 2% of the endcap, so a limit exists on how many modules fit into a board.
  • The ROIs will look at a region in the CE-E and another one in the CE-HSi (which might or might not be required), but will integrate all layers in a cylinder around those regions and pass it to the seeding (stage 2)
  • Each ROI might find something worth exploring or not
  • The seeding will receive as input all trigger cells belonging to all ROIs
  • It is up to the clustering (stage 2 after the seeding) to connect physically-related seeds created from different ROIs

Hardcoded paths in produce.cc

Hardcoded input directory in produce.cc

std::string dir = "/eos/user/b/bfontana/FPGAs/new_algos/";

while it should be read from the configuration file instead, e.g.:
photons: /eos/user/i/iehle/data/PU0/photons/skim_photons_0PU_bc_stc_hadd.root
electrons: /eos/user/i/iehle/data/PU0/electrons/skim_electrons_0PU_bc_stc_hadd.root
pions: /eos/user/i/iehle/data/PU0/pions/skim_pions_0PU_bc_stc_hadd.root

In addition, at the moment the same base directory is used for both input and output files:

skim(tree_name, dir + infile, dir + outfile, particles, nevents);

But the input directory could be read-only (for instance when reading data from another user's directory), so it would be better to have a separate base directory for the output, and ideally this output base directory should be configurable as well.

[Bug] Mismatch in TCs and hexa-modules mutual position

Bug description
The mutual position of TCs and hexa-modules is incorrect: some TCs fall outside their hexagonal module or cross its border. See the attached examples.

Code to correct
The geometry of TCs and hex-modules in the electromagnetic section of the calorimeter is computed here.

The variables most closely related to this issue are diamond_x, diamond_y and hex_x, hex_y.

PR #10

(screenshots attached)

[Feature Request] Cluster radii code duplication

A new PR is needed for the following, discussed in the last resolved comment of #47.

(...) involve merging my [@mchiusi] run_radii_chain and @isehle run_cluster_size scripts to use the run_default_chain and run_new_chain. It's not a trivial task because the current code in these latter scripts doesn't allow running the clustering step multiple times. While I don't think it's difficult, I would prefer to open a new PR and address these code repetitions. After making these modifications, we should also test all the chains independently from the webApp. What do you think? (...)

Is your feature request related to a problem? Please describe.
Merging duplicate code.

Describe the solution you'd like and (optionally) propose some code to make it happen
@mchiusi @isehle

Describe alternatives you've considered
@mchiusi @isehle

Fix cluster-TC association id variable

Describe the bug
Currently tc_cluster_id is being used during sample production and for TC association to clusters. We should use tc_multicluster_id instead.

Additional information
An explanation of all "id-like" variables follows, to avoid confusion:

  • cl3d_id is the ID of the 3D cluster, which is defined as the ID of the seed TC
  • tc_cluster_id is the ID of the "2D cluster" to which this TC belongs
  • tc_multicluster_id is the ID of the 3D cluster to which this TC belongs
  • tc_id is the ID of this TC

Note that there is no 2D step in the current clustering algorithm version, so "2D clusters" are effectively single TCs in that case. Note also that "seed TC" does not have a real meaning with the current algorithm; effectively it is the first TC used to build the cluster.

The above clarification was kindly provided by @jbsauvan.
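As a minimal illustration of the intended association (with hypothetical DataFrame and column contents; only the variable names follow the list above), TCs would be matched to 3D clusters by joining tc_multicluster_id against cl3d_id:

import pandas as pd

# Hypothetical frames: one row per trigger cell, one row per 3D cluster.
tcs = pd.DataFrame({"tc_id": [1, 2, 3],
                    "tc_multicluster_id": [10, 10, 11],
                    "tc_energy": [1.2, 0.7, 2.3]})
clusters = pd.DataFrame({"cl3d_id": [10, 11], "cl3d_eta": [1.8, 2.1]})

# Associate each TC with its 3D cluster via tc_multicluster_id (not tc_cluster_id).
assoc = tcs.merge(clusters, left_on="tc_multicluster_id", right_on="cl3d_id", how="left")
print(assoc)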

Remove dependence on old and new cell phis

The reconstruction tasks fill and smooth are currently performing twice as many operations as needed. This comes from the legacy study that motivated this repository, namely optimizing trigger cell phi positions, where two collections of trigger cells (old and new after applying shifts) needed to be processed.
The reconstruction chain must be written such that it only runs each task once, and it should be up to the client to decide how many times it is called.

Extreme Lag When Loading Pile Up Data

The method used to read in the data here, notably uproot's tree.arrays() method, is unable to handle the trigger cell data when pile up is included. Indeed, even selecting just 10 events takes over thirty minutes (I aborted before it finished). We'll need to find another way of reading in the data.

data = tree.arrays(filter_name='/' + '|'.join(allvars) + '/',
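One possible mitigation, sketched here as an assumption rather than a tested fix: read the tree in chunks with uproot's iterate instead of materializing everything with tree.arrays() (file, tree and branch names below are illustrative):

import uproot

# Illustrative names; the point is the chunked access pattern.
with uproot.open("skim_photons_PU200.root") as f:
    tree = f[f.keys()[0]]
    for batch in tree.iterate(filter_name="tc_*", step_size="200 MB", library="ak"):
        print(len(batch))            # process one manageable chunk at a time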

Reducing the number of scripts for running reco chains

At the moment the scripts folder contains the following files:

I plan to reduce this number of files, keeping only the first three from the list. For instance, the run_radii_chain.py script sequentially calls the default and the new chain, resulting in code repetition.

This work will involve merging the run_radii_chain and run_cluster_size scripts into run_default_chain and run_new_chain. It is not a trivial task because the current code in these latter scripts does not allow running the clustering step multiple times with different clustering parameters. We will have to open a new PR to address these changes.

Additional context

  • After making these modifications, we should also test all the chains independently and through the webApp.
  • I have identified an issue in the code merged in PR #47 when running pions: it may happen that the CS is not found for pions, resulting in an incomplete .hdf5 file and thus a webApp error. I have already solved this problem in my own fork and we can merge it here soon. The new working version of the application has been deployed online.
    I would suggest making a single PR, since the fix for the issue above already involves the run_radii_chain.py script.

If you have any suggestions, please write them here, or we can discuss them in person next week when I am back at LLR.
