
Open-source photometric classification: https://supernnova.readthedocs.io

License: MIT License

Python 99.84% Shell 0.16%
cosmology supernova deep-learning arxiv reproducible-science recurrent-neural-networks pandas python pytorch bayesian-neural-networks

supernnova's Introduction


Read the documentation

For the main branch: https://supernnova.readthedocs.io

The paper branch differs slightly from master. See "changelog_paper_to_new_branch" or build the docs for that branch.

Installation

Clone this repository (preferred)

git clone https://github.com/supernnova/supernnova.git

or install the pip package (check that the release matches the version you need)

pip install supernnova

Read the paper

Links to the publication: MNRAS, arXiv. All results quoted in these publications were produced using the paper branch, which is frozen for reproducibility.

Please include the full citation if you use this material in your research: A Möller and T de Boissière, MNRAS, Volume 491, Issue 3, January 2020, Pages 4277–4293.

Table of contents

  1. Repository overview
  2. Getting Started
    0. With Poetry (new releases)
    1. With Conda
    2. With Docker
  3. Usage
  4. Reproduce paper
  5. Pipeline Description
  6. Running tests
  7. Build the docs

Repository overview

├── supernnova              --> main module
    ├──data                 --> scripts to create the processed database
    ├──visualization        --> data plotting scripts
    ├──training             --> training scripts
    ├──validation           --> validation scripts
    ├──utils                --> utilities used throughout the module
├── tests                   --> unit tests to check data processing
├── sandbox                 --> WIP scripts

Getting started

With Conda

cd env

# Create conda environment
conda create --name <env> --file <conda_file_of_your_choice>

# Activate conda environment
source activate <env>

With Docker

cd env

# Build docker images
make cpu  # cpu image
make gpu  # gpu image (requires NVIDIA Drivers + nvidia-docker)

# Launch docker container
python launch_docker.py  # add --use_gpu to run a GPU-based container

For more details, see the full setup instructions in the documentation.

Usage

When cloning this repository:

# Create data
python run.py --data  --dump_dir tests/dump --raw_dir tests/raw --fits_dir tests/fits

# Train a baseline RNN
python run.py --train_rnn --dump_dir tests/dump

# Train a variational dropout RNN
python run.py --train_rnn --model variational --dump_dir tests/dump

# Train a Bayes By Backprop RNN
python run.py --train_rnn --model bayesian --dump_dir tests/dump

# Train a RandomForest
python run.py --train_rf --dump_dir tests/dump

When using pip, a full example is available at https://supernnova.readthedocs.io:

# Python
import supernnova.conf as conf
from supernnova.data import make_dataset

# get config args
args = conf.get_args()

# create database
args.data = True                    # conf: making new dataset
args.dump_dir = "tests/dump"        # conf: where the dataset will be saved
args.raw_dir = "tests/raw"          # conf: where raw photometry files are saved
args.fits_dir = "tests/fits"        # conf: where SALT2 fits are saved
settings = conf.get_settings(args)  # conf: set settings
make_dataset.make_dataset(settings) # make dataset

Reproduce paper results

Please switch to the paper branch, then run:

python run_paper.py

General pipeline description

  • Parse raw data in FITS format
  • Create processed database in HDF5 format
  • Train Recurrent Neural Networks (RNN) or Random Forests (RF) to classify photometric lightcurves
  • Validate on test set

Running tests with py.test

PYTHONPATH=$PWD:$PYTHONPATH pytest -W ignore --cov supernnova tests

Build docs

cd docs && make clean && make html && cd ..
firefox docs/_build/html/index.html

supernnova's People

Contributors

anaismoller, dependabot[bot], fjammes, gbpoole, jhu-s, julienpeloton, tdeboissiere


supernnova's Issues

SNANA FLT changed for BAND

SNANA format:
Rick Kessler "changed the name of the FLT column in the data to BAND … SNANA codes accept either FLT or BAND for back-compatibility, and non-SNANA codes will need to do the same. The other ‘either’ option is REDSHIFT_FINAL or REDSHIFT_CMB."
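A minimal sketch of how a non-SNANA reader could accept either column name, following the aliases quoted above. The helper name `normalize_snana_columns` and the sample data are illustrative, not part of the SuperNNova codebase.

```python
import pandas as pd

# Map new-style SNANA column names back to the legacy names, per the quote above.
ALIASES = {"BAND": "FLT", "REDSHIFT_CMB": "REDSHIFT_FINAL"}

def normalize_snana_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Rename new-style SNANA columns to the legacy names, if present."""
    renames = {new: old for new, old in ALIASES.items()
               if new in df.columns and old not in df.columns}
    return df.rename(columns=renames)

# Illustrative data: a file using the new column names
df = pd.DataFrame({"BAND": ["g", "r"], "REDSHIFT_CMB": [0.1, 0.2]})
df = normalize_snana_columns(df)
print(list(df.columns))  # ['FLT', 'REDSHIFT_FINAL']
```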

'features' doesn't exist

Hi Anais and team,

Things are no longer working smoothly on midway. Got an exception about 'features' does not exist. If you have any ideas on what might be behind it I'm all ears.

Log is located at /scratch/midway2/rkessler/PIPPIN_OUTPUT/DJB_SPEC/3_CLAS_old/SNNVANILLATRAIN_TRAIN_SPEC_FIT/output.log on midway2.

Full log is below:

[Data processing] 15s
Traceback (most recent call last):
  File "run.py", line 204, in <module>
    raise e
  File "run.py", line 43, in <module>
    make_dataset.make_dataset(settings)
  File "/project2/rkessler/PRODUCTS/miniconda/envs/snn_gpu/lib/python3.6/contextlib.py", line 52, in inner
    return func(*args, **kwds)
  File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/data/make_dataset.py", line 749, in make_dataset
    data_utils.save_to_HDF5(settings, df)
  File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/utils/data_utils.py", line 576, in save_to_HDF5
    list_training_features + ["FLT"]
AssertionError
/project2/rkessler/PRODUCTS/miniconda/envs/snn_gpu/lib/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
Traceback (most recent call last):
  File "run.py", line 28, in <module>
    settings = conf.get_settings()
  File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/conf.py", line 364, in get_settings
    settings = experiment_settings.ExperimentSettings(args)
  File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/utils/experiment_settings.py", line 57, in __init__
    self.set_feature_lists()
  File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/utils/experiment_settings.py", line 190, in set_feature_lists
    self.all_features = hf["features"][:].astype(str)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/project2/rkessler/PRODUCTS/miniconda/envs/snn_gpu/lib/python3.6/site-packages/h5py/_hl/group.py", line 177, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'features' doesn't exist)"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run.py", line 200, in <module>
    settings = conf.get_settings()
  File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/conf.py", line 364, in get_settings
    settings = experiment_settings.ExperimentSettings(args)
  File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/utils/experiment_settings.py", line 57, in __init__
    self.set_feature_lists()
  File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/utils/experiment_settings.py", line 190, in set_feature_lists
    self.all_features = hf["features"][:].astype(str)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/project2/rkessler/PRODUCTS/miniconda/envs/snn_gpu/lib/python3.6/site-packages/h5py/_hl/group.py", line 177, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'features' doesn't exist)"

Note that, in addition, this exception is uncaught: it does not produce a done file marked FAILURE as it is supposed to, and instead the application terminates.
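One way to turn the uncaught KeyError above into a clear, catchable failure would be to check for the dataset before reading it. This is a hedged sketch, not the project's actual loading code; the function name `load_features` and the error message are illustrative.

```python
import h5py

def load_features(hdf5_path):
    """Read the 'features' dataset, failing with a clear message if absent."""
    with h5py.File(hdf5_path, "r") as hf:
        if "features" not in hf:
            raise RuntimeError(
                f"{hdf5_path} has no 'features' dataset; was the database "
                "created with a compatible SuperNNova version?"
            )
        return hf["features"][:].astype(str)
```

A caller could then catch `RuntimeError` and write the FAILURE done file before exiting.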

Pip update

Pip version is not in sync with recent SNN changes.

The settings are not the same, and this causes crashes.

Cyclic speeding

In train_cyclic, data is loaded twice: once in get_lr and once afterwards. Optimise to load it only once.

TypeError in method process_single_csv

Line 473 of make_dataset.py reads:

for c_ in [2, list(set(len(settings.sntypes.keys())))]:

This is currently throwing the following error:

TypeError: 'int' object is not iterable

This will always happen, because len always returns an int, and set() cannot iterate over an int.
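A hedged sketch of the likely intent behind that line: iterate over the number of target classes, binary first, then full multi-class. Here `sntypes` is an illustrative stand-in for `settings.sntypes`; whether this matches the author's intended fix is an assumption.

```python
# Illustrative stand-in for settings.sntypes (SNANA type code -> class name)
sntypes = {"101": "Ia", "120": "II", "132": "Ibc"}

# Buggy original: [2, list(set(len(sntypes.keys())))]
#   -> TypeError: 'int' object is not iterable, since set() gets an int.
# Likely intent: binary classification plus one class per distinct type.
nb_class_options = [2, len(set(sntypes.keys()))]
print(nb_class_options)  # [2, 3]
```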

problem reading sntypes

It is the missing-types problem and error we discussed when reading from different SNANA sims. Please help!

Let me know if you need more files...

Unusual results for PS1 Data

As discussed on Slack: when fitting PS1 data (HEAD and PHOT files in $PS1_ROOT/lcmerge/PS1_PS1MD_cen_SIGCLIP_FITS/) we get some weird results. Only 450 SNe in the sample are classified as Ia, and this seems to be entirely random. A couple of spectroscopically confirmed SNe Ia are being listed as 100% chance CC, so something's up. I generated some SNN example plots using the following command:

/scratch/midway2/rkessler/PIPPIN_OUTPUT/PANOPTICON-DATA/3_CLAS/old-PS1/job_plot.slurm

and in general the relevant directory is here:
/scratch/midway2/rkessler/PIPPIN_OUTPUT/PANOPTICON-DATA/3_CLAS/SNNTEST_PS_DATAPS1_SNNTEST_PS_PS1+MVCC

AttributeError: 'ExperimentSettings' object has no attribute 'sntype_var'

Using the branch elasticc, the code used to process ZTF fails with error:

File "fink_science/snn/processor.py", line 167, in snn_ia
    ids, pred_probs = classify_lcs(pdf, model, 'cpu')
  File "/home/libs/miniconda/lib/python3.7/site-packages/supernnova/validation/validate_onthefly.py", line 99, in classify_lcs
    df = format_data(df, settings)
  File "/home/libs/miniconda/lib/python3.7/site-packages/supernnova/validation/validate_onthefly.py", line 59, in format_data
    df = pivot_dataframe_single_from_df(df, settings)
  File "/home/libs/miniconda/lib/python3.7/site-packages/supernnova/data/make_dataset.py", line 722, in pivot_dataframe_single_from_df
    + class_columns
AttributeError: 'ExperimentSettings' object has no attribute 'sntype_var'

Any ideas?
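One possible workaround while the branches diverge: read the attribute with a default so that older pickled settings objects, which predate `sntype_var`, do not raise. This is only a sketch; the default value "SNTYPE" is an assumption based on the usual SNANA header key, not a confirmed fix.

```python
# Minimal stand-in for the real ExperimentSettings class, for illustration.
class ExperimentSettings:
    pass

settings = ExperimentSettings()  # simulates an old object missing the attribute

# getattr with a default avoids the AttributeError; "SNTYPE" is an assumed default.
sntype_var = getattr(settings, "sntype_var", "SNTYPE")
print(sntype_var)  # SNTYPE
```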

Database

TypeError: No conversion path for dtype: dtype('<U120')

Raised in data_types_training in make_dataset.
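The "no conversion path" TypeError typically means h5py was asked to store a numpy unicode dtype (such as '<U120'), which HDF5 cannot serialize directly. A hedged sketch of the usual remedy, with illustrative data:

```python
import numpy as np

# Numpy unicode arrays (kind 'U') have no HDF5 conversion path in h5py.
values = np.array(["Ia", "II"], dtype="<U120")

# Encoding to fixed-width bytes (kind 'S') before writing avoids the TypeError;
# h5py's variable-length string dtype is an alternative.
as_bytes = values.astype("S")
print(as_bytes.dtype.kind)  # S
```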

access above path exists

To avoid access issues in cluster deployments, we should add:

import os
os.access(path, os.R_OK)

A recent issue involved supernnova/utils/data_utils.py:
supernnova/utils/data_utils.py
Path(f"{settings.fits_dir}/FITOPT000.FITRES").exists()

but this could happen with other checks too.
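A sketch of the proposed check wrapped into a reusable helper; the function name and error message are illustrative, not existing SuperNNova code.

```python
import os
from pathlib import Path

def check_readable(path):
    """Raise a clear error if a required file is missing or unreadable."""
    p = Path(path)
    # os.access distinguishes permission problems from plain existence checks
    if not (p.exists() and os.access(p, os.R_OK)):
        raise PermissionError(
            f"Cannot read {p}; check that the file exists and that "
            "cluster permissions allow read access."
        )
    return p
```

Such a helper could be called before each `Path(...).exists()` style check, e.g. on the FITOPT000.FITRES path above.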

PS1 classification issues

Morning!

After you fixed the plotting issue from yesterday, I went and reclassified the PS1 data. We were seeing only 450 SNe pass as SNN-classified type Ia, and most of the spectroscopically confirmed Ia were marked as 100% CC by SNN. This hasn't changed since the update pushed yesterday, and we are a bit lost. (to be clear - I did not re-train a model, just re-used the same model on the data again after the update).

I've sent some pictures on slack that I hope demonstrate the problem, and the relevant directories are here:

This is where the PS1 training set was generated and trained:
/scratch/midway2/rkessler/PIPPIN_OUTPUT/BP-PS1-CLASS/
This is where it was used to fit the PS1 data:
/scratch/midway2/rkessler/PIPPIN_OUTPUT/PANOPTICON-DATA/
And tested on simulated PS1 data here:
/scratch/midway2/rkessler/PIPPIN_OUTPUT/BP-PS1-CLASSTEST/

I remade the light-curve plots, they live here: /scratch/midway2/rkessler/PIPPIN_OUTPUT/PANOPTICON-DATA/3_CLAS/SNNTEST_PS_DATAPS1_SNNTEST_PS_PS1+MVCC/dump/lightcurves/SNNTEST_PS_PS1+MVCC/early_prediction

They look good now!

Function pivot_dataframe_batch is not handling concurrent exceptions (fails silently)

In make_dataset.py, lines 696-701 read:

    for chunk_idx in tqdm(list_chunks, desc="Pivoting dataframes", ncols=100):
        parallel_fn = partial(pivot_dataframe_single, settings=settings)
        # Process each file in the chunk in parallel
        with ProcessPoolExecutor(max_workers=max_workers) as executor:
            start, end = chunk_idx[0], chunk_idx[-1] + 1
            executor.map(parallel_fn, list_files[start:end])

The iterator holding the results of the map call is never consumed afterwards. As a side effect, if any of the workers fails, the corresponding exception is never raised, so the failure is silent. (See https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor.map )
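A sketch of the fix: materialize the map iterator so any worker exception is re-raised in the parent. Demonstrated here with ThreadPoolExecutor to keep the example self-contained; `Executor.map` has the same semantics for ProcessPoolExecutor. `pivot_one` is an illustrative stand-in for `pivot_dataframe_single`.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

def pivot_one(fname, settings=None):
    """Stand-in for pivot_dataframe_single: fails on some inputs."""
    if fname.startswith("bad"):
        raise ValueError(f"cannot pivot {fname}")
    return f"pivoted_{fname}"

parallel_fn = partial(pivot_one, settings=None)
with ThreadPoolExecutor(max_workers=2) as executor:
    # list() consumes the iterator; a failure in any worker is re-raised here
    # instead of being dropped silently.
    results = list(executor.map(parallel_fn, ["a.csv", "b.csv"]))
print(results)  # ['pivoted_a.csv', 'pivoted_b.csv']
```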
