
irslushy / radarqc.jl


RONIN (Random forest Optimized Nonmeteorological IdentificatioN) (currently RadarQC.jl) is a Julia implementation of Dr. Alex DesRosiers' P3 MLQC code for removing non-meteorological gates from airborne radar scans using random forests.

Home Page: https://irslushy.github.io/RadarQC.jl/

Julia 20.02% Jupyter Notebook 74.01% Python 5.97%


RadarQC.jl

RadarQC.jl contains a combined Julia/Python implementation of the algorithm described in DesRosiers and Bell 2023 for removing non-meteorological gates from airborne radar scans. Care has been taken to stay close to the form described in the manuscript, but some changes have been made in the interest of computational speed.

A key part of the process is computing necessary derived parameters from the raw radar moments, which may be custom-specified in a parameters file. Many of the relevant functions for these calculations are contained within RadarQC.jl.
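As an illustration of what a derived parameter can look like, the sketch below computes a local standard deviation ("texture") of a field along a single ray, over a centered window of gates. The function name, the window size, the minimum number of valid gates, and the use of None for missing gates are all assumptions for illustration; this is not RadarQC.jl's implementation, and the parameters it actually computes are the ones listed in the parameters file.

```python
from statistics import pstdev


def local_std(ray, half_window=2, min_valid=3):
    """Standard deviation of each gate's neighborhood along one ray.

    Hypothetical helper: `ray` is a list of gate values with None marking
    missing gates. The window spans `half_window` gates on each side of the
    current gate (truncated at the ends of the ray). Gates whose window
    holds fewer than `min_valid` valid values are returned as None.
    """
    out = []
    for i in range(len(ray)):
        lo = max(0, i - half_window)
        hi = min(len(ray), i + half_window + 1)
        window = [v for v in ray[lo:hi] if v is not None]
        out.append(pstdev(window) if len(window) >= min_valid else None)
    return out


# A smooth stretch of gates followed by a noisy, partly missing segment.
ray = [10.0, 10.0, 10.0, 10.0, 10.0, 30.0, -5.0, 25.0, None, 12.0]
print(local_std(ray))
```

Noisy, speckled echoes tend to produce large local standard deviations compared with smooth precipitation, which is why texture-style fields are a common input to radar QC classifiers.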



Acknowledgments

Much of the data used to train the models in this repository is the product of arduous manual editing of radar scans. ELDORA data is provided by the authors of Bell, Lee, Wolff, & Cai 2013. NOAA P3 TDR data is courtesy of Dr. Paul Reasor, Dr. John Gamache, and Kelly Neighbour. As mentioned above, the code is adapted from the original work of Dr. Alex DesRosiers.


Getting Started:

Setting up the environment (CSU)

After cloning the repository, start Julia using RadarQC as the project directory, either by calling

julia --project=RadarQC

from the parent directory of RadarQC or modifying the JULIA_PROJECT environment variable.
Then, enter package mode in the REPL by pressing ].


Next, run instantiate to download the necessary dependencies. This should both download/install the dependencies and precompile the RadarQC package. Now exit package mode using the delete (Backspace) key. To ensure that everything was installed properly, run using RadarQC at the Julia REPL. No errors or other output should appear if the installation succeeded.

Guide adapted from https://github.com/mmbell/Scythe.jl/tree/main

Setting up the environment (Derecho)

Getting Julia

export JULIA_DEPOT_PATH=$SCRATCH/julia
curl -fsSL https://install.julialang.org | sh

Setting up RadarQC project

After cloning the repository, start Julia using RadarQC as the project directory, either by calling

julia --project=RadarQC

from the parent directory of RadarQC or modifying the JULIA_PROJECT environment variable.
Then, enter package mode in the REPL by pressing ].


Next, run activate "<path/to/repo/RadarQC.jl>". Then run instantiate to download the necessary dependencies. This should both download/install the dependencies and precompile the RadarQC package. If you will be viewing the code in a Jupyter notebook and need access to the Jupyter kernel, also run add IJulia.

Now, exit package mode using the delete key. To ensure that everything was installed properly, run using RadarQC on the Julia REPL. No errors or information should print out if successful.

Guide adapted from https://github.com/mmbell/Scythe.jl/tree/main

Example notebook


If you're looking to jump right in, check out the RadarQC Example Notebook; it contains everything you need to get up and running.



Guide: Processing new data, training, and evaluating a new model



The first step in training a new random forest model is determining which portions of the data will be used for training, testing, and validation. A helpful function here is split_training_testing! - this can be used to automatically split a collection of scans into a training directory and a testing directory. In order for the script to be configured properly, the variables relating to the different paths must be modified by the user - this is shown in the example notebook.

The current configuration produces the 80/20 training/testing split described in the manuscript and places an equal number of scans from each "case" in the testing set. The script is expected to work for other training/testing splits, but this has not yet been tested.
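The splitting rule described above (about 80% of all scans for training, with the testing scans drawn equally from each case) can be sketched as follows. This is not split_training_testing! itself: the dictionary grouping scans by case, the rounding rule, the fixed random seed, and the function name split_by_case are assumptions, and no files are moved.

```python
import random


def split_by_case(scans_by_case, test_fraction=0.2, seed=0):
    """Return (training, testing) lists of scan names.

    Hypothetical sketch: `scans_by_case` maps a case name to its list of
    scan files. About `test_fraction` of all scans go to testing, divided
    equally among the cases; all remaining scans go to training.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    total = sum(len(scans) for scans in scans_by_case.values())
    per_case = round(test_fraction * total) // len(scans_by_case)
    training, testing = [], []
    for case, scans in sorted(scans_by_case.items()):
        if per_case > len(scans):
            raise ValueError(f"case {case!r} has fewer than {per_case} scans")
        chosen = set(rng.sample(scans, per_case))
        testing.extend(s for s in scans if s in chosen)
        training.extend(s for s in scans if s not in chosen)
    return training, testing


cases = {
    "caseA": [f"A_{i:02d}.nc" for i in range(10)],
    "caseB": [f"B_{i:02d}.nc" for i in range(10)],
}
train, test = split_by_case(cases)
print(len(train), len(test))
```

Drawing the same number of testing scans from every case keeps a single large case from dominating the test statistics, at the cost of a test fraction that is only approximately 20% when the totals do not divide evenly.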

Once the training and testing scans have been placed into separate directories, data processing can begin. calculate_features is the primary function used here. It processes a directory of scans (or a single scan) and writes the calculated features to an .h5 file; the desired features are specified by the user in a text file.

For example, if the training scans are located in /cfradials/training/, the desired features are specified in features.txt, and you want to write the calculated input features to training_set.h5, invoke the function as

calculate_features("/cfradials/training", "features.txt", "training_set.h5")
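The exact syntax of the features file is defined by RadarQC.jl and is not documented here. As a rough sketch, assuming one feature name per line with blank lines and text after a # ignored (an assumption, not the documented format), reading such a file could look like this. The feature names in the example are placeholders, and read_feature_list is a hypothetical helper, not part of calculate_features.

```python
def read_feature_list(text):
    """Parse a feature list: one name per line, skipping blanks and # comments.

    The format is an assumption for illustration; check the features file
    shipped with RadarQC.jl for the real syntax.
    """
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            names.append(line)
    return names


example = """
# hypothetical feature names
DBZ
VEL  # inline comment
STD_DBZ
"""
print(read_feature_list(example))
```

Keeping this list in a file rather than in code is what lets the same feature set be reused later when a trained model is applied to new scans.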

If you wish to hold out a validation set from the training dataset, use remove_validation.

Finally, we can train a model to process our data. To do so, use train_model. If the training data is contained in training_set.h5 and you wish to name your trained model trained_model.joblib, invoke it as follows. It's recommended to end the model name in .joblib, since joblib is the method used to serialize the model to disk.

train_model("training_set.h5", "trained_model.joblib")

NOTE: This may take on the order of 20-30 minutes if running on the entire ELDORA set.

The train_model function also includes the option to verify the model on the training set and write the results to a separate .h5 file. To do this, call it as above but include the keyword argument verify=true.
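Whatever the exact layout of the verification file, checking a model comes down to comparing its predictions with the 'Y' labels, two arrays of 0s and 1s. A minimal sketch, assuming the convention stated in this README (1 = meteorological, 0 = non-meteorological); the function name and the choice of scores are illustrative and are not what train_model writes out.

```python
def verification_summary(y_true, y_pred):
    """Confusion counts and simple scores for binary met/non-met labels.

    Convention (from the README): 1 = meteorological, 0 = non-meteorological.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction arrays differ in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # met kept as met
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-met removed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-met wrongly kept
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # met wrongly removed
    total = len(pairs)
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / total if total else float("nan"),
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "recall": tp / (tp + fn) if tp + fn else float("nan"),
    }


print(verification_summary([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Scores computed on the training set itself will usually be optimistic for a random forest, so the testing directory remains the better measure of how the model generalizes.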

Evaluating the model

Now let's apply the trained model to a set of data. The function to use here is QC_scan. Pass it, in order, the input location, the configuration file, and the path to the trained model. Because QC_scan recalculates the input features from this configuration file, it's important to keep the configuration file used to calculate the training features in a known location.

The function will calculate the necessary input features, apply the random forest model, and apply the resulting prediction to the fields specified by the keyword argument VARIABLES_TO_QC. These new variables are then written back to the specified NetCDF file under the original field name concatenated with the keyword argument QC_suffix. If this name is already in use in the NetCDF file, the existing variable will be overwritten.
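The masking step can be sketched in a few lines. Assumptions: fields are held as a dict of flat lists of gate values, predictions follow the 1 = meteorological / 0 = non-meteorological convention from the notes on data conventions, and the fill value and the "_QC" suffix are placeholders rather than RadarQC.jl's defaults. Reading and writing the NetCDF file is omitted, and apply_qc is a hypothetical name.

```python
FILL_VALUE = -32768.0  # placeholder; a real file defines its own _FillValue


def apply_qc(fields, predictions, variables_to_qc, qc_suffix="_QC",
             fill_value=FILL_VALUE):
    """Add masked copies of the selected fields to `fields` (in place).

    Gates predicted non-meteorological (0) are set to `fill_value`; gates
    predicted meteorological (1) keep their value. A field that already has
    the output name is overwritten, as the README describes for QC_scan.
    """
    for name in variables_to_qc:
        values = fields[name]
        if len(values) != len(predictions):
            raise ValueError(f"{name} and predictions differ in length")
        fields[name + qc_suffix] = [
            v if p == 1 else fill_value for v, p in zip(values, predictions)
        ]
    return fields


fields = {"DBZ": [12.0, 35.5, -3.0], "VEL": [1.0, -4.2, 0.3]}
apply_qc(fields, [1, 1, 0], ["DBZ", "VEL"])
print(fields["DBZ_QC"], fields["VEL_QC"])
```

Writing the result under a new name, rather than overwriting the raw moment, keeps the original field available for comparison.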


Notes on data conventions


For the verification 'Y' array in the training scripts, I have adopted the convention that 1 indicates METEOROLOGICAL DATA and 0 indicates NON-METEOROLOGICAL DATA.

Furthermore, for QC-ed variables in the output files, the following is adopted:

FILL_VALUE: Removed during the MLQC process or did not meet QC thresholds
VALUE: Retained during the MLQC process

Data is written to NetCDF files in a CF-compliant layout: in Julia and other column-major languages, fields have dimensions (range x time). If the same file is loaded in a row-major language such as Python, the fields take on dimensions (time x range).
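The two orderings describe the same bytes on disk. In a column-major (range x time) array, gate r of ray t (both 0-based here) lives at flat offset t * n_range + r, which is exactly where a row-major (time x range) reader looks for element [t][r]. The sketch below checks this with plain lists; no NetCDF library is involved, and the helper names are illustrative.

```python
def column_major_offset(r, t, n_range):
    """Flat offset of element (r, t) in a column-major (range x time) array."""
    return t * n_range + r


def row_major_view(buffer, n_time, n_range):
    """Interpret a flat buffer as a row-major (time x range) nested list."""
    return [buffer[t * n_range:(t + 1) * n_range] for t in range(n_time)]


n_range, n_time = 4, 3
# Fill the buffer column-major, storing each element's own (range, time) index.
buffer = [None] * (n_range * n_time)
for t in range(n_time):
    for r in range(n_range):
        buffer[column_major_offset(r, t, n_range)] = (r, t)

rows = row_major_view(buffer, n_time, n_range)
# In the row-major view, the first index is time and the second is range.
print(rows[2][1])
```

Nothing is transposed or copied; only the interpretation of the index order changes, which is why the dimensions simply appear reversed between Julia and Python.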

Contributors

irslushy, cenamiller

radarqc.jl's Issues

Hardcoded links to datasets

There are a lot of hardcoded paths to datasets in the example notebook. In the short term, we can put them in a project directory on our system. Long-term it would be nice to be able to download them. Also would be nice to test performance with larger datasets.
