supernnova / supernnova
Open Source Photometric classification: https://supernnova.readthedocs.io
License: MIT License
If the model was trained on a database with information such as redshift, but this information is not used in classification, automatically fill those columns with zeros before formatting the data for classification.
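A minimal sketch of the zero-filling step (the column name `HOSTGAL_SPECZ` below is only an illustrative placeholder, not necessarily the actual schema):

```python
import pandas as pd

def fill_unused_features(df, expected_columns):
    """Add any expected column the database lacks (e.g. redshift),
    filled with zeros, so formatting for classification does not fail."""
    for col in expected_columns:
        if col not in df.columns:
            df[col] = 0.0
    return df

# A table that lacks the (unused) redshift column gets it zero-filled.
df = fill_unused_features(pd.DataFrame({"FLUXCAL": [1.0, 2.0]}),
                          ["HOSTGAL_SPECZ"])
```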
In make_dataset.py, lines 696-701 read:
for chunk_idx in tqdm(list_chunks, desc="Pivoting dataframes", ncols=100):
    parallel_fn = partial(pivot_dataframe_single, settings=settings)
    # Process each file in the chunk in parallel
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        start, end = chunk_idx[0], chunk_idx[-1] + 1
        executor.map(parallel_fn, list_files[start:end])
The iterator returned by the map call is never consumed afterwards. Unfortunately, this has the side effect that, if any of the workers fails, the corresponding exception is never raised, so the failure is silent. (See https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor.map)
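A minimal demonstration of the fix (using ThreadPoolExecutor so it runs anywhere; Executor.map behaves the same way for ProcessPoolExecutor):

```python
from concurrent.futures import ThreadPoolExecutor

def work(x):
    if x < 0:
        raise ValueError(f"bad input: {x}")
    return x * x

with ThreadPoolExecutor(max_workers=2) as executor:
    results_iter = executor.map(work, [1, -1, 2])  # nothing raised here

# Only consuming the iterator surfaces a worker's exception; without
# list() (or an equivalent loop) the failure would pass silently.
try:
    results = list(results_iter)
except ValueError as err:
    results = str(err)

print(results)  # bad input: -1
```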
To avoid access issues in cluster deployment we should add an explicit readability check:
import os
os.access(path, os.R_OK)
A recent issue involved this check in supernnova/utils/data_utils.py:
Path(f"{settings.fits_dir}/FITOPT000.FITRES").exists()
but the same could happen with other checks.
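A sketch of combining both checks into one helper:

```python
import os
from pathlib import Path

def readable(path):
    """True only if path exists *and* the current user may read it.

    Path(...).exists() alone can mislead on clusters: a file may exist
    but be unreadable because of permissions, so also check os.R_OK.
    """
    return Path(path).exists() and os.access(path, os.R_OK)
```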
Document on-the-fly classification.
Using the elasticc branch, the code used to process ZTF data fails with this error:
File "fink_science/snn/processor.py", line 167, in snn_ia
ids, pred_probs = classify_lcs(pdf, model, 'cpu')
File "/home/libs/miniconda/lib/python3.7/site-packages/supernnova/validation/validate_onthefly.py", line 99, in classify_lcs
df = format_data(df, settings)
File "/home/libs/miniconda/lib/python3.7/site-packages/supernnova/validation/validate_onthefly.py", line 59, in format_data
df = pivot_dataframe_single_from_df(df, settings)
File "/home/libs/miniconda/lib/python3.7/site-packages/supernnova/data/make_dataset.py", line 722, in pivot_dataframe_single_from_df
+ class_columns
AttributeError: 'ExperimentSettings' object has no attribute 'sntype_var'
Any ideas?
It is the missing-types problem and error we discussed when reading from different SNANA sims. Please help!
Let me know if you need more files...
CSV does not support the redshift label (fixed).
A model trained with additional_train_var does not use it properly.
Some features are repeated and some are misleading...
Dump the feature list in the config dump.
SNANA format:
Rick Kessler "changed the name of the FLT column in the data to BAND … SNANA codes accept either FLT or BAND for back-compatibility, and non-SNANA codes will need to do the same. The other ‘either’ option is REDSHIFT_FINAL or REDSHIFT_CMB."
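A sketch of accepting either name on read (the canonical names chosen here, FLT and REDSHIFT_FINAL, are an assumption):

```python
import pandas as pd

def normalise_snana_columns(df):
    """Map SNANA's 'either' column names onto one canonical name,
    so downstream code only ever sees FLT and REDSHIFT_FINAL."""
    renames = {"BAND": "FLT", "REDSHIFT_CMB": "REDSHIFT_FINAL"}
    mapping = {old: new for old, new in renames.items()
               if old in df.columns and new not in df.columns}
    return df.rename(columns=mapping)

# A table using the newer BAND name comes out with FLT.
df = normalise_snana_columns(pd.DataFrame({"BAND": ["g", "r"]}))
```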
As discussed on Slack: when fitting PS1 data (HEAD and PHOT files in $PS1_ROOT/lcmerge/PS1_PS1MD_cen_SIGCLIP_FITS/) we get some weird results. Only 450 SNe in the sample are classified as Ia, and this seems to be entirely random. A couple of spectroscopically confirmed SNe Ia are being listed as 100% chance CC, so something's up. I generated some SNN example plots using the following command:
/scratch/midway2/rkessler/PIPPIN_OUTPUT/PANOPTICON-DATA/3_CLAS/old-PS1/job_plot.slurm
and in general the relevant directory is here:
/scratch/midway2/rkessler/PIPPIN_OUTPUT/PANOPTICON-DATA/3_CLAS/SNNTEST_PS_DATAPS1_SNNTEST_PS_PS1+MVCC
train_cyclic: data is loaded twice, once in get_lr and once afterwards. Optimise to load it only once.
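One way to load only once is to cache the loader (a sketch with a stand-in loader, not the actual SNN function):

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def load_data(path):
    # Stand-in for the expensive HDF5 load; caching lets get_lr and
    # the subsequent training loop share a single in-memory copy.
    load_data.calls += 1
    return (1, 2, 3)

load_data.calls = 0
lr_data = load_data("train.h5")     # loads from disk
train_data = load_data("train.h5")  # cache hit, no second load
print(load_data.calls)  # 1
```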
TypeError: No conversion path for dtype: dtype('<U120') (from data_types_training in make_dataset).
Morning!
After you fixed the plotting issue from yesterday, I went and reclassified the PS1 data. We were seeing only 450 SNe pass as SNN-classified type Ia, and most of the spectroscopically confirmed Ia were marked as 100% CC by SNN. This hasn't changed since the update pushed yesterday, and we are a bit lost. (to be clear - I did not re-train a model, just re-used the same model on the data again after the update).
I've sent some pictures on slack that I hope demonstrate the problem, and the relevant directories are here:
This is where the PS1 training set was generated and trained:
/scratch/midway2/rkessler/PIPPIN_OUTPUT/BP-PS1-CLASS/
This is where it was used to fit the PS1 data:
/scratch/midway2/rkessler/PIPPIN_OUTPUT/PANOPTICON-DATA/
And tested on simulated PS1 data here:
/scratch/midway2/rkessler/PIPPIN_OUTPUT/BP-PS1-CLASSTEST/
I remade the light-curve plots, they live here: /scratch/midway2/rkessler/PIPPIN_OUTPUT/PANOPTICON-DATA/3_CLAS/SNNTEST_PS_DATAPS1_SNNTEST_PS_PS1+MVCC/dump/lightcurves/SNNTEST_PS_PS1+MVCC/early_prediction
They look good now!
Preprocessing crashes, but HEAD assembly does not. Check whether df.MJD.values[-1] == -777.0 (the SNANA end-of-lightcurve sentinel).
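A sketch of filtering the sentinel rows out before preprocessing (the -777.0 value is taken from the issue):

```python
import pandas as pd

def drop_snana_sentinel(df):
    """Drop SNANA end-of-lightcurve sentinel rows (MJD == -777.0)
    before preprocessing, so downstream steps do not crash on them."""
    return df[df["MJD"] != -777.0].reset_index(drop=True)

df = drop_snana_sentinel(
    pd.DataFrame({"MJD": [58000.1, 58003.4, -777.0]}))
```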
In make_dataset.py, line 558 reads:
list_filters = data_utils.FILTERS
Maybe it should instead be:
list_filters = settings.list_filters
This would allow the user to define which filters to use and the labels they have in the input table.
Support confusion-matrix labels that are not the first five itemised in the dictionary.
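A sketch of building the matrix from an explicit, ordered label list rather than the first five dictionary entries (scikit-learn's confusion_matrix offers the same via its labels= parameter):

```python
def confusion_matrix(y_true, y_pred, labels):
    """Confusion matrix over an explicit, ordered list of labels."""
    idx = {lab: i for i, lab in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[idx[t]][idx[p]] += 1
    return cm

cm = confusion_matrix(["Ia", "II", "Ibc", "Ia"],
                      ["Ia", "II", "II", "Ia"],
                      labels=["Ia", "II", "Ibc"])
print(cm)  # [[2, 0, 0], [0, 1, 0], [0, 1, 0]]
```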
Explore the potential of using Spark to ingest data as alerts -> convert to objects -> build the database.
Generalize the photometry-file search to recurse through subfolders.
Line 473 of make_dataset.py reads:
for c_ in [2, list(set(len(settings.sntypes.keys())))]:
This is currently throwing the following error:
TypeError: 'int' object is not iterable
This will always happen, as len always returns an int.
Change pickle format to something more compatible
make_dataset: list([2, len(settings.sntypes.keys())]) will repeat columns; add a set() into the splits to deduplicate.
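The deduplication the issue suggests, sketched with a toy sntypes dict:

```python
sntypes = {"101": "Ia", "120": "II"}  # toy example with two types

# set() removes the duplicate when len(sntypes) == 2, so the
# per-split columns are not created twice.
splits = sorted(set([2, len(sntypes)]))
print(splits)  # [2]
```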
Add support for the photometric window with CSV inputs.
The pip version is not in sync with recent SNN changes; the settings differ, and so it crashes.
Eliminate the --zspe/--zpho options and just change redshift_label, for simplicity.
Hi Anais and team,
Things are no longer working smoothly on midway. I got an exception saying 'features' does not exist. If you have any ideas about what might be behind it, I'm all ears.
Log is located at /scratch/midway2/rkessler/PIPPIN_OUTPUT/DJB_SPEC/3_CLAS_old/SNNVANILLATRAIN_TRAIN_SPEC_FIT/output.log
on midway2.
Full log is below:
[Data processing] 15s
Traceback (most recent call last):
File "run.py", line 204, in <module>
raise e
File "run.py", line 43, in <module>
make_dataset.make_dataset(settings)
File "/project2/rkessler/PRODUCTS/miniconda/envs/snn_gpu/lib/python3.6/contextlib.py", line 52, in inner
return func(*args, **kwds)
File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/data/make_dataset.py", line 749, in make_dataset
data_utils.save_to_HDF5(settings, df)
File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/utils/data_utils.py", line 576, in save_to_HDF5
list_training_features + ["FLT"]
AssertionError
/project2/rkessler/PRODUCTS/miniconda/envs/snn_gpu/lib/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
Traceback (most recent call last):
File "run.py", line 28, in <module>
settings = conf.get_settings()
File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/conf.py", line 364, in get_settings
settings = experiment_settings.ExperimentSettings(args)
File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/utils/experiment_settings.py", line 57, in __init__
self.set_feature_lists()
File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/utils/experiment_settings.py", line 190, in set_feature_lists
self.all_features = hf["features"][:].astype(str)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/project2/rkessler/PRODUCTS/miniconda/envs/snn_gpu/lib/python3.6/site-packages/h5py/_hl/group.py", line 177, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'features' doesn't exist)"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "run.py", line 200, in <module>
settings = conf.get_settings()
File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/conf.py", line 364, in get_settings
settings = experiment_settings.ExperimentSettings(args)
File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/utils/experiment_settings.py", line 57, in __init__
self.set_feature_lists()
File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/utils/experiment_settings.py", line 190, in set_feature_lists
self.all_features = hf["features"][:].astype(str)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/project2/rkessler/PRODUCTS/miniconda/envs/snn_gpu/lib/python3.6/site-packages/h5py/_hl/group.py", line 177, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'features' doesn't exist)"
Note that, in addition, this exception is uncaught: it does not produce a done file marked FAILURE, as it is supposed to; instead, the application terminates.