flare-classifier

Project description

This repository contains work on searching for red dwarf flares with machine learning techniques. The task is formulated as a binary classification problem, where label 1 indicates a flare event and label 0 any other event. Three classifiers were trained on features extracted from light curves: Random Forest, CatBoost, and MLP.
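
To make the feature-based formulation concrete, here is a minimal sketch of turning one light curve into a feature dictionary with a binary label attached. The function name `extract_features`, the three statistics, and the toy flux values are illustrative assumptions. The features this repository actually uses are not listed in this description.

```python
from statistics import mean, median, pstdev


def extract_features(flux):
    """Summarize one light curve (a sequence of flux values) as a feature dict.

    Hypothetical feature set, for illustration only.
    """
    sigma = pstdev(flux)
    peak_above_median = max(flux) - median(flux)
    return {
        "mean": mean(flux),
        "std": sigma,
        # Height of the brightest point above the median, in units of std.
        "peak_excess": peak_above_median / sigma if sigma > 0 else 0.0,
    }


# Toy samples: label 1 marks a flare, label 0 any other event.
samples = [
    ([1.0, 1.0, 5.0, 1.0, 1.0], 1),  # short brightening, flare-like
    ([1.0, 1.0, 1.0, 1.0, 1.0], 0),  # constant brightness
]
dataset = [(extract_features(flux), label) for flux, label in samples]
```

Each `(features, label)` pair in `dataset` is the kind of row a tabular classifier such as Random Forest or CatBoost would be trained on.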

How to run demo DVC pipeline

python3 -mvenv venv
source venv/bin/activate
python3 -mpip install -r requirements.txt
python3 -mpip install dvc
python3 -mdvc repro

The pipeline consists of several stages:

  • prepare_datasets: feature extraction; raw data is combined into train/val/test samples
  • train_rf: training process of Random Forest model
  • train_catboost: training process of CatBoost model
  • train_mlp: training process of MLP model
  • evaluate_rf: Random Forest evaluation
  • evaluate_catboost: CatBoost evaluation
  • evaluate_mlp: MLP evaluation

After the pipeline has run, all metrics and metadata are placed in the metrics folder, each in its respective subfolder.
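
As a rough sketch of how these outputs could be inspected after a run, the helper below gathers every JSON file found one level below the metrics folder. It assumes the metrics are stored as JSON files directly inside the subfolders. The file format and layout are not stated here, and `collect_metrics` is a hypothetical helper, not part of this repository.

```python
import json
from pathlib import Path


def collect_metrics(metrics_dir="metrics"):
    """Return {subfolder name: {file stem: parsed JSON}} for every JSON file found."""
    collected = {}
    # Assumed layout: metrics/<subfolder>/<name>.json
    for path in sorted(Path(metrics_dir).glob("*/*.json")):
        with path.open() as fh:
            collected.setdefault(path.parent.name, {})[path.stem] = json.load(fh)
    return collected
```

Called from the repository root after the pipeline finishes, `collect_metrics()` returns a nested dictionary that can be printed or compared across models.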

Data Version Control

DVC (Data Version Control) is a tool for versioning data alongside git. We use it with our S3-compatible storage, available at s3.lpc.snad.space.

Credentials

You need a login at https://minio.lpc.snad.space; contact [email protected] if you believe you should have one.

Steps to set up aws-cli before using DVC with our remote:

  1. Install aws-cli from your package manager, for example brew install awscli, or via python3 -mpip install awscli
  2. Run aws configure and keep your terminal open
  3. Go to https://minio.lpc.snad.space/access-keys and create an access key. Copy-paste the public and private keys into the terminal, set the region to us-east-1, and leave the output format empty
  4. Run aws configure set default.s3.signature_version s3v4
  5. Save the access key in the browser, keep the policy empty

Start with DVC

  1. Install DVC with s3 extra, for example via python3 -mpip install 'dvc[s3]' (note quotes) or via your package manager
  2. dvc pull gets the data from the remote and puts it where it belongs
  3. dvc add path adds a new or updated data file to the local DVC repository
  4. dvc push pushes data to the remote
  5. git commit commits dvc hashes (not data files themselves) into the local git repo
  6. git push

Do not forget to run dvc push whenever you run git push (maybe we need a git post-commit hook for this?).
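
One possible way to automate this is a git hook written in Python. The sketch below is an assumption, not something this repository ships. It targets the `pre-push` hook rather than `post-commit`, since `pre-push` fires on `git push`, and the function name `dvc_push_before_git_push` is hypothetical.

```python
import subprocess


def dvc_push_before_git_push(runner=subprocess.run):
    """Run `dvc push` and return its exit code.

    A non-zero value, passed to sys.exit() by the hook script,
    makes git abort the push.
    """
    try:
        return runner(["dvc", "push"]).returncode
    except FileNotFoundError:
        # dvc is not on PATH: fail rather than push git history without data.
        return 1
```

An executable `.git/hooks/pre-push` file would contain this function and end with `sys.exit(dvc_push_before_git_push())`. DVC also provides the `dvc install` command, which sets up git hooks for the repository, including a pre-push hook that runs `dvc push`.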

LINCC-Frameworks Python Project Template

This project was automatically generated using the LINCC-Frameworks python-project-template.

For more information about the project template see the readme documentation.

Contributors

anlava, hombit, aiofhuman
