
Code for "Improving robustness of jet tagging algorithms with adversarial training" (arXiv:2203.13890) Utilized elsewhere yo. Teaching material for undergraduates up to PhD student or postdoc level.

Home Page: https://link.springer.com/article/10.1007/s41781-022-00087-1

License: GNU Lesser General Public License v2.1

Topics: adversarial-attacks, adversarial-training, algorithms, classification, flavor, jet, jet-tagging, particle-physics, pytorch, robustness


Adversarial-Training-for-Jet-Tagging

Code for:

Improving robustness of jet tagging algorithms with adversarial training
A. Stein et al., Comput. Softw. Big Sci., 2022.

Jet Flavor dataset

Obtained from http://mlphysics.ics.uci.edu/ and originally created for

Jet Flavor Classification in High-Energy Physics with Deep Neural Networks
D. Guest et al., Physical Review D, 2016.

Get and prepare dataset

Download

Log in to a copy18 node of the HPC with high bandwidth (the download is about 2.2 GB):

wget http://mlphysics.ics.uci.edu/data/hb_jet_flavor_2016/dataset.json.gz
mkdir -p /hpcwork/<your-account>/jet_flavor_MLPhysics/dataset
mv dataset.json.gz /hpcwork/<your-account>/jet_flavor_MLPhysics/dataset

Extracting the data via awkward arrays

It turns out that reading the file is not straightforward: at some point the data has to be unzipped or extracted, and although the file carries the plain ".json" extension, it actually consists of various JSON-like entries distributed over several lines of the file. Consult the notebook preparations/read_dataset.ipynb for further details and potential alternatives for using the dataset. In the end, I used awkward arrays, with which the next steps become a bit easier.
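As a minimal sketch of that approach, assuming one JSON-like record per line in the gzipped file (the file path and the way fields are accessed are illustrative, not taken from the repository):

import gzip
import json

import awkward as ak

# Read the line-delimited JSON records one by one and build an awkward array.
records = []
with gzip.open("dataset.json.gz", "rt") as f:
    for line in f:
        line = line.strip()
        if line:
            records.append(json.loads(line))

jets = ak.Array(records)       # ragged per-jet structure is preserved
print(len(jets), jets.fields)  # number of jets and available record fields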

A first look at the data

Some initial investigations, before proceeding to the actual framework, are conducted in preparations/explore_dataset.ipynb.

Calculate defaults

To use custom default values that fit the bulk of each distribution well, preliminary studies are done in preparations/defaults.ipynb. It is also the first notebook that makes use of helpers/variables.py.

Clean samples

In order not to store too many versions of the same data, cleaning the samples is not done as a separate step, but comes later during the preprocessing (scaling). There, the arrays are also flattened into their final shape; the result should be a set of usable PyTorch tensors. During the cleaning, I do not cut on any variables, but only modify certain unphysical values and place them in special default bins - i.e. the fractions of jets of a certain flavor in certain pt and eta bins do not change through the subsequent cleaning (and preprocessing) of the data.
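As a hedged sketch of that cleaning idea (the variable, the validity condition, and the default value below are illustrative assumptions, not the repository's actual choices):

import numpy as np

def clean_variable(values, valid_mask, default):
    # Replace unphysical entries by a fixed default bin; no jets are removed,
    # so the flavor fractions per pt/eta bin stay unchanged.
    out = np.asarray(values, dtype=np.float64).copy()
    out[~valid_mask] = default
    return out

# Example: treat NaN or sentinel values of a track-based variable as unphysical.
sig = np.array([1.2, np.nan, -999.0, 0.4])
cleaned = clean_variable(sig, np.isfinite(sig) & (sig > -100.0), default=-1.0)
print(cleaned)  # [ 1.2 -1.  -1.   0.4]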

Calculate sample weights

Sample weights are calculated in preparations/reweighting.ipynb.
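A minimal sketch of one common way to build such weights, assuming the goal is to equalize the (pt, eta) distributions of the flavor classes via 2D histogram ratios (the bin edges, the reference class, and all names are illustrative assumptions, not the notebook's actual procedure):

import numpy as np

def sample_weights(pt, eta, flavor, pt_bins, eta_bins, reference=0):
    # Reference 2D histogram towards which the other flavors are reweighted.
    ref_hist, _, _ = np.histogram2d(pt[flavor == reference],
                                    eta[flavor == reference],
                                    bins=[pt_bins, eta_bins], density=True)
    weights = np.ones_like(pt, dtype=np.float64)
    for fl in np.unique(flavor):
        sel = flavor == fl
        hist, _, _ = np.histogram2d(pt[sel], eta[sel],
                                    bins=[pt_bins, eta_bins], density=True)
        # Per-bin ratio reference/flavor; bins empty for this flavor get zero.
        ratio = np.divide(ref_hist, hist, out=np.zeros_like(ref_hist),
                          where=hist > 0)
        ipt = np.clip(np.digitize(pt[sel], pt_bins) - 1, 0, len(pt_bins) - 2)
        ieta = np.clip(np.digitize(eta[sel], eta_bins) - 1, 0, len(eta_bins) - 2)
        weights[sel] = ratio[ipt, ieta]
    return weights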

Preprocessing

The preprocessing chain: calculate scalers (from the training set only, ignoring default values), apply the scalers (this time do not ignore defaults when applying the scaler; alternatively, set them to zero), perform train/val/test splitting and shuffling, and build sample weights and bins. See preparations/clean_preprocess.ipynb for a first working example of the entire preprocessing chain. Later, evaluate/tools.py can be used to facilitate communication between the training or evaluation scripts and the preprocessing step.
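A minimal sketch of the scaling convention described above (the sentinel default value and the column layout are assumptions for illustration):

import numpy as np

DEFAULT = -1.0  # illustrative default-bin value

def fit_scaler(train_column):
    # Fit mean and std on the training set only, excluding the default bin.
    x = train_column[train_column != DEFAULT]
    return x.mean(), x.std()

def apply_scaler(column, mean, std):
    # Apply to all entries, defaults included (alternative: set them to zero).
    return (column - mean) / std

train = np.array([0.2, 0.5, DEFAULT, 0.9])
mean, std = fit_scaler(train)
print(apply_scaler(train, mean, std))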

Run framework (training, evaluation)

Training

All relevant scripts are placed inside training; e.g. standalone training on the current node is done with training.py, and for submission to the batch system there are training.sh and submit_training.py. Both nominal and adversarial training can be used.
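For the adversarial case, a hedged sketch of one FGSM-style training step in PyTorch, in the spirit of the paper (model, optimizer, and epsilon are placeholders; this is not the repository's actual implementation):

import torch
import torch.nn.functional as F

def adversarial_step(model, optimizer, x, y, epsilon=0.01):
    # Build the FGSM perturbation from the gradient of the loss w.r.t. inputs.
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    x_adv = (x + epsilon * grad.sign()).detach()

    # Train on the perturbed inputs.
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(x_adv), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()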

Evaluation

ROC curves: evaluate/eval_roc_new.py
Training history (loss): evaluate/plot_loss.py
Tagger outputs and discriminator shapes: evaluate/eval_discriminator_shapes.py
Plotting of input variables: evaluate/eval_inputs.py
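As a hedged illustration of the ROC evaluation (the scores and labels below are placeholder arrays, not outputs of the repository's scripts):

import numpy as np
from sklearn.metrics import roc_curve, auc

scores = np.array([0.9, 0.2, 0.75, 0.4, 0.85])  # tagger output per jet
labels = np.array([1, 0, 1, 0, 1])              # 1 = signal flavor, 0 = background

fpr, tpr, _ = roc_curve(labels, scores)
print("AUC:", auc(fpr, tpr))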

