supernnova / supernnova
Open Source Photometric classification: https://supernnova.readthedocs.io
License: MIT License
If the model was trained on a database with information such as redshift, but this information is not used in classification, automatically fill those columns with zeros before formatting the data for classification.
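A minimal sketch of the zero-filling step (the column name `HOSTGAL_SPECZ` below is only an illustrative placeholder, not necessarily the actual schema):

```python
import pandas as pd

def fill_unused_features(df, expected_columns):
    """Add any expected column the database lacks (e.g. redshift),
    filled with zeros, so formatting for classification does not fail."""
    for col in expected_columns:
        if col not in df.columns:
            df[col] = 0.0
    return df

# A table that lacks the (unused) redshift column gets it zero-filled.
df = fill_unused_features(pd.DataFrame({"FLUXCAL": [1.0, 2.0]}),
                          ["HOSTGAL_SPECZ"])
```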
In make_dataset.py, lines 696-701 read:
for chunk_idx in tqdm(list_chunks, desc="Pivoting dataframes", ncols=100):
    parallel_fn = partial(pivot_dataframe_single, settings=settings)
    # Process each file in the chunk in parallel
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        start, end = chunk_idx[0], chunk_idx[-1] + 1
        executor.map(parallel_fn, list_files[start:end])
The iterator returned by the map call is never consumed afterwards. Unfortunately, this has the side effect that, if any of the workers fails, the corresponding exception is never raised, so the failure is silent. (See https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor.map)
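A minimal demonstration of the fix (using ThreadPoolExecutor so it runs anywhere; Executor.map behaves the same way for ProcessPoolExecutor):

```python
from concurrent.futures import ThreadPoolExecutor

def work(x):
    if x < 0:
        raise ValueError(f"bad input: {x}")
    return x * x

with ThreadPoolExecutor(max_workers=2) as executor:
    results_iter = executor.map(work, [1, -1, 2])  # nothing raised here

# Only consuming the iterator surfaces a worker's exception; without
# list() (or an equivalent loop) the failure would pass silently.
try:
    results = list(results_iter)
except ValueError as err:
    results = str(err)

print(results)  # bad input: -1
```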
To avoid access issues in cluster deployment we should add an explicit readability check:
import os
os.access(path, os.R_OK)
A recent issue involved this check in supernnova/utils/data_utils.py:
Path(f"{settings.fits_dir}/FITOPT000.FITRES").exists()
but the same could happen with other checks.
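A sketch of combining both checks into one helper:

```python
import os
from pathlib import Path

def readable(path):
    """True only if path exists *and* the current user may read it.

    Path(...).exists() alone can mislead on clusters: a file may exist
    but be unreadable because of permissions, so also check os.R_OK.
    """
    return Path(path).exists() and os.access(path, os.R_OK)
```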
Document on-the-fly classification.
Using the elasticc branch, the code used to process ZTF data fails with this error:
File "fink_science/snn/processor.py", line 167, in snn_ia
ids, pred_probs = classify_lcs(pdf, model, 'cpu')
File "/home/libs/miniconda/lib/python3.7/site-packages/supernnova/validation/validate_onthefly.py", line 99, in classify_lcs
df = format_data(df, settings)
File "/home/libs/miniconda/lib/python3.7/site-packages/supernnova/validation/validate_onthefly.py", line 59, in format_data
df = pivot_dataframe_single_from_df(df, settings)
File "/home/libs/miniconda/lib/python3.7/site-packages/supernnova/data/make_dataset.py", line 722, in pivot_dataframe_single_from_df
+ class_columns
AttributeError: 'ExperimentSettings' object has no attribute 'sntype_var'
Any ideas?
It is the missing-types problem and error we discussed when reading from different SNANA sims. Please help!
Let me know if you need more files...
CSV does not support the redshift label (fixed).
A model trained with additional_train_var does not use it properly.
Some features are repeated and some are misleading...
Dump the feature list in the config dump.
SNANA format:
Rick Kessler "changed the name of the FLT column in the data to BAND … SNANA codes accept either FLT or BAND for back-compatibility, and non-SNANA codes will need to do the same. The other ‘either’ option is REDSHIFT_FINAL or REDSHIFT_CMB."
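A sketch of accepting either name on read (the canonical names chosen here, FLT and REDSHIFT_FINAL, are an assumption):

```python
import pandas as pd

def normalise_snana_columns(df):
    """Map SNANA's 'either' column names onto one canonical name,
    so downstream code only ever sees FLT and REDSHIFT_FINAL."""
    renames = {"BAND": "FLT", "REDSHIFT_CMB": "REDSHIFT_FINAL"}
    mapping = {old: new for old, new in renames.items()
               if old in df.columns and new not in df.columns}
    return df.rename(columns=mapping)

# A table using the newer BAND name comes out with FLT.
df = normalise_snana_columns(pd.DataFrame({"BAND": ["g", "r"]}))
```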
As discussed on Slack: when fitting PS1 data (HEAD and PHOT files in $PS1_ROOT/lcmerge/PS1_PS1MD_cen_SIGCLIP_FITS/) we get some weird results. Only 450 SNe in the sample are classified as Ia, and this seems to be entirely random. A couple of spectroscopically confirmed SNe Ia are being listed as 100% chance CC, so something's up. I generated some SNN example plots using the following command:
/scratch/midway2/rkessler/PIPPIN_OUTPUT/PANOPTICON-DATA/3_CLAS/old-PS1/job_plot.slurm
and in general the relevant directory is here:
/scratch/midway2/rkessler/PIPPIN_OUTPUT/PANOPTICON-DATA/3_CLAS/SNNTEST_PS_DATAPS1_SNNTEST_PS_PS1+MVCC
train_cyclic: data is loaded twice, once in get_lr and once afterwards. Optimise to load it only once.
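One way to load only once is to cache the loader (a sketch with a stand-in loader, not the actual SNN function):

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def load_data(path):
    # Stand-in for the expensive HDF5 load; caching lets get_lr and
    # the subsequent training loop share a single in-memory copy.
    load_data.calls += 1
    return (1, 2, 3)

load_data.calls = 0
lr_data = load_data("train.h5")     # loads from disk
train_data = load_data("train.h5")  # cache hit, no second load
print(load_data.calls)  # 1
```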
TypeError: No conversion path for dtype: dtype('<U120') (from data_types_training in make_dataset).
Morning!
After you fixed the plotting issue from yesterday, I went and reclassified the PS1 data. We were seeing only 450 SNe pass as SNN-classified type Ia, and most of the spectroscopically confirmed Ia were marked as 100% CC by SNN. This hasn't changed since the update pushed yesterday, and we are a bit lost. (to be clear - I did not re-train a model, just re-used the same model on the data again after the update).
I've sent some pictures on slack that I hope demonstrate the problem, and the relevant directories are here:
This is where the PS1 training set was generated and trained:
/scratch/midway2/rkessler/PIPPIN_OUTPUT/BP-PS1-CLASS/
This is where it was used to fit the PS1 data:
/scratch/midway2/rkessler/PIPPIN_OUTPUT/PANOPTICON-DATA/
And tested on simulated PS1 data here:
/scratch/midway2/rkessler/PIPPIN_OUTPUT/BP-PS1-CLASSTEST/
I remade the light-curve plots, they live here: /scratch/midway2/rkessler/PIPPIN_OUTPUT/PANOPTICON-DATA/3_CLAS/SNNTEST_PS_DATAPS1_SNNTEST_PS_PS1+MVCC/dump/lightcurves/SNNTEST_PS_PS1+MVCC/early_prediction
They look good now!
Preprocessing crashes, but HEAD assembly does not. Check whether df.MJD.values[-1] == -777.0 (the SNANA end-of-lightcurve sentinel).
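A sketch of filtering the sentinel rows out before preprocessing (the -777.0 value is taken from the issue):

```python
import pandas as pd

def drop_snana_sentinel(df):
    """Drop SNANA end-of-lightcurve sentinel rows (MJD == -777.0)
    before preprocessing, so downstream steps do not crash on them."""
    return df[df["MJD"] != -777.0].reset_index(drop=True)

df = drop_snana_sentinel(
    pd.DataFrame({"MJD": [58000.1, 58003.4, -777.0]}))
```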
In make_dataset.py, line 558 reads:
list_filters = data_utils.FILTERS
Maybe it should instead be:
list_filters = settings.list_filters
This would allow the user to define which filters to use and the labels they have in the input table.
Support confusion-matrix labels that are not the first five itemised in the dictionary.
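A sketch of building the matrix from an explicit, ordered label list rather than the first five dictionary entries (scikit-learn's confusion_matrix offers the same via its labels= parameter):

```python
def confusion_matrix(y_true, y_pred, labels):
    """Confusion matrix over an explicit, ordered list of labels."""
    idx = {lab: i for i, lab in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[idx[t]][idx[p]] += 1
    return cm

cm = confusion_matrix(["Ia", "II", "Ibc", "Ia"],
                      ["Ia", "II", "II", "Ia"],
                      labels=["Ia", "II", "Ibc"])
print(cm)  # [[2, 0, 0], [0, 1, 0], [0, 1, 0]]
```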
Explore the potential of using Spark to ingest data as alerts -> convert to objects -> build the database.
Generalize the photometry-file search to recurse through subfolders.
Line 473 of make_dataset.py reads:
for c_ in [2, list(set(len(settings.sntypes.keys())))]:
This is currently throwing the following error:
TypeError: 'int' object is not iterable
This will always happen, as len always returns an int.
Change pickle format to something more compatible
make_dataset: list([2, len(settings.sntypes.keys())]) will repeat columns; add a set() into the splits to deduplicate.
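The deduplication the issue suggests, sketched with a toy sntypes dict:

```python
sntypes = {"101": "Ia", "120": "II"}  # toy example with two types

# set() removes the duplicate when len(sntypes) == 2, so the
# per-split columns are not created twice.
splits = sorted(set([2, len(sntypes)]))
print(splits)  # [2]
```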
Add support for the photometric window with CSV inputs.
The pip version is not in sync with recent SNN changes; the settings differ, and so it crashes.
Eliminate the --zspe/--zpho options and just change redshift_label, for simplicity.
Hi Anais and team,
Things are no longer working smoothly on midway. I got an exception saying 'features' does not exist. If you have any ideas about what might be behind it, I'm all ears.
Log is located at /scratch/midway2/rkessler/PIPPIN_OUTPUT/DJB_SPEC/3_CLAS_old/SNNVANILLATRAIN_TRAIN_SPEC_FIT/output.log
on midway2.
Full log is below:
[Data processing] 15s
Traceback (most recent call last):
File "run.py", line 204, in <module>
raise e
File "run.py", line 43, in <module>
make_dataset.make_dataset(settings)
File "/project2/rkessler/PRODUCTS/miniconda/envs/snn_gpu/lib/python3.6/contextlib.py", line 52, in inner
return func(*args, **kwds)
File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/data/make_dataset.py", line 749, in make_dataset
data_utils.save_to_HDF5(settings, df)
File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/utils/data_utils.py", line 576, in save_to_HDF5
list_training_features + ["FLT"]
AssertionError
/project2/rkessler/PRODUCTS/miniconda/envs/snn_gpu/lib/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
Traceback (most recent call last):
File "run.py", line 28, in <module>
settings = conf.get_settings()
File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/conf.py", line 364, in get_settings
settings = experiment_settings.ExperimentSettings(args)
File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/utils/experiment_settings.py", line 57, in __init__
self.set_feature_lists()
File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/utils/experiment_settings.py", line 190, in set_feature_lists
self.all_features = hf["features"][:].astype(str)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/project2/rkessler/PRODUCTS/miniconda/envs/snn_gpu/lib/python3.6/site-packages/h5py/_hl/group.py", line 177, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'features' doesn't exist)"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "run.py", line 200, in <module>
settings = conf.get_settings()
File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/conf.py", line 364, in get_settings
settings = experiment_settings.ExperimentSettings(args)
File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/utils/experiment_settings.py", line 57, in __init__
self.set_feature_lists()
File "/project2/rkessler/PRODUCTS/classifiers/supernnova/supernnova/utils/experiment_settings.py", line 190, in set_feature_lists
self.all_features = hf["features"][:].astype(str)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/project2/rkessler/PRODUCTS/miniconda/envs/snn_gpu/lib/python3.6/site-packages/h5py/_hl/group.py", line 177, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'features' doesn't exist)"
Note that, in addition, this exception is uncaught: it does not produce a done file marked FAILURE, as it is supposed to; instead, the application terminates.