Anne Dattilo: @aedattilo
This repository contains TensorFlow models and data processing code for identifying exoplanets in astrophysical light curves from K2 data. For complete background, see our paper in The Astronomical Journal.
This code is modified from Shallue & Vanderburg (2018). Because of this, some documentation still refers to Kepler instead of K2.
For a full walkthrough of Astronet, refer to exoplanet-ml. It is written for TensorFlow 1.4 and is compatible with both Python 2 and 3.
If you find this code useful, please cite our paper:
Dattilo, A., Vanderburg, A., et al. (2019). Identifying Exoplanets with Deep Learning. II. Two New Super-Earths Uncovered by a Neural Network in K2 Data. The Astronomical Journal, 157(5), 169.
- TensorFlow code for:
- Preprocessing K2 data.
- Building different types of neural network classification models.
- Training and evaluating a new model.
- Using a trained model to generate new predictions.
- Utilities for operating on light curves (see the sketch after this list). These include:
  - Reading Kepler data from `.idl` files.
  - Applying a median filter to smooth and normalize a light curve.
  - Phase folding, splitting, removing periodic events, etc.
- In addition, some C++ implementations of light curve utilities are located in light_curve_util/cc/.
- Utilities derived from third party code.
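As a rough illustration of two of the light curve operations listed above (median-filter normalization and phase folding), here is a minimal NumPy/SciPy sketch. The function names and details are illustrative assumptions and do not mirror the actual light_curve_util API.

```python
# A rough, self-contained illustration of median-filter normalization and
# phase folding using only NumPy/SciPy. Names here are illustrative and do
# not mirror the exact light_curve_util API.
import numpy as np
from scipy.signal import medfilt


def normalize_with_median_filter(flux, window=49):
    """Divide the flux by a running median to remove low-frequency trends."""
    trend = medfilt(flux, kernel_size=window)  # kernel_size must be odd
    return flux / trend


def phase_fold(time, period, t0=0.0):
    """Fold times onto [-period / 2, period / 2) relative to epoch t0."""
    return np.mod(time - t0 + 0.5 * period, period) - 0.5 * period


# Toy usage with synthetic data.
time = np.linspace(0.0, 80.0, 4000)          # days
flux = 1.0 + 0.001 * np.sin(time / 10.0)     # slow trend
flux[np.mod(time, 8.0) < 0.2] -= 0.01        # transit-like dips every 8 days
flat_flux = normalize_with_median_filter(flux)
phase = phase_fold(time, period=8.0)
```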
First, ensure that you have installed the following required packages:
- TensorFlow (instructions)
- Pandas (instructions)
- NumPy (instructions)
- AstroPy (instructions)
- PyDl (instructions)
- Bazel (instructions)
- Abseil Python Common Libraries (instructions)
  - Optional: only required for unit tests.
Verify that all dependencies are satisfied by running the unit tests:
bazel test astronet/... light_curve_util/... third_party/...
Processed light curves are provided under `tfrecord`.
K2 light curves can be downloaded from the Mikulski Archive for Space Telescopes.
`K2_candidates.csv` contains the EPIC IDs and parameters of all targets used in Dattilo et al. (2019). The provided code will process light curves from `.idl` files.
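For a quick look at the candidate table, the CSV can be loaded with pandas. This is only an illustrative peek; the path and the `comment="#"` argument are assumptions about how the file is laid out in your checkout.

```python
# Illustrative peek at the candidate table; path and comment="#" are assumptions.
import pandas as pd

tce_table = pd.read_csv("K2_candidates.csv", comment="#")
print(tce_table.columns.tolist())
print(tce_table.head())
```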
If you would like to process your own data, here is how. Otherwise, skip to Training.
To train a model to identify exoplanets, you will need to provide TensorFlow with training data in TFRecord format. The TFRecord format consists of a set of sharded files containing serialized `tf.Example` protocol buffers.
The command below will generate a set of sharded TFRecord files for the TCEs in the training set. Each `tf.Example` proto will contain the following light curve representations:

- `global_view`: Vector of length 701: a "global view" of the TCE.
- `local_view`: Vector of length 51: a "local view" of the TCE.
In addition, each `tf.Example` will contain the value of each column in the input TCE CSV file (a parsing sketch follows this list). The columns include:

- `rowid`: Integer ID of the row in the TCE table.
- `kepid`: Kepler ID of the target star.
- `tce_plnt_num`: TCE number within the target star.
- `av_training_set`: Autovetter training set label.
- `tce_period`: Period of the detected event, in days.
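For reference, a `tf.Example` with this layout could be read back with the TensorFlow 1.x parsing ops roughly as sketched below. The feature spec (names, lengths, dtypes) is an assumption based on the descriptions above, not the repo's actual input pipeline.

```python
# Illustrative only: one way such a tf.Example could be parsed with the
# TensorFlow 1.x API. The feature spec is an assumption, not the repo's code.
import tensorflow as tf

feature_spec = {
    "global_view": tf.FixedLenFeature([701], tf.float32),
    "local_view": tf.FixedLenFeature([51], tf.float32),
    "av_training_set": tf.FixedLenFeature([], tf.string),
    "tce_period": tf.FixedLenFeature([], tf.float32),
}


def parse_example(serialized_example):
    """Parse one serialized tf.Example into a dict of tensors."""
    return tf.parse_single_example(serialized_example, features=feature_spec)
```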
# Use Bazel to create executable Python scripts.
#
# Alternatively, since all code is pure Python and does not need to be compiled,
# we could invoke the source scripts with the following addition to PYTHONPATH:
# export PYTHONPATH="/path/to/source/dir/:${PYTHONPATH}"
bazel build astronet/...
# Directory to save output TFRecord files into.
TFRECORD_DIR="${HOME}/astronet/tfrecord"
# Preprocess light curves into sharded TFRecord files using 5 worker processes.
bazel-bin/astronet/data/generate_input_records \
--input_tce_csv_file=${TCE_CSV_FILE} \
--kepler_data_dir=${KEPLER_DATA_DIR} \
--output_dir=${TFRECORD_DIR} \
--num_worker_processes=5
When the script finishes you will find 8 training files, 1 validation file and 1 test file in `TFRECORD_DIR`. The files will match the patterns `train-0000?-of-00008`, `val-00000-of-00001` and `test-00000-of-00001` respectively.
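To sanity-check the output, you can iterate over one shard and confirm the record count and the view lengths. This is a minimal sketch using the TensorFlow 1.x record iterator; the shard path assumes the `TFRECORD_DIR` used above.

```python
# Minimal sketch: count the records in one shard and check the view lengths.
# The shard path assumes TFRECORD_DIR="${HOME}/astronet/tfrecord" as above.
import os
import tensorflow as tf

shard = os.path.expanduser("~/astronet/tfrecord/train-00000-of-00008")
count = 0
for serialized in tf.python_io.tf_record_iterator(shard):
    example = tf.train.Example.FromString(serialized)
    if count == 0:
        global_view = example.features.feature["global_view"].float_list.value
        local_view = example.features.feature["local_view"].float_list.value
        print("global_view:", len(global_view), "local_view:", len(local_view))
    count += 1
print("records in shard:", count)
```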
The astronet directory contains several neural network architectures and various configuration options. To train a convolutional neural network to classify K2 TCEs as either "planet" or "not planet", run the following training script:
# Directory to save model checkpoints into.
MODEL_DIR="${HOME}/astronet/model/"
# Run the training script.
bazel-bin/astronet/train \
--model=AstroCNNModel \
--config_name=local_global \
--train_files=${TFRECORD_DIR}/train* \
--eval_files=${TFRECORD_DIR}/val* \
--model_dir=${MODEL_DIR}
Optionally, you can also run a TensorBoard server in a separate process for real-time monitoring of training progress and evaluation metrics.
# Launch TensorBoard server.
tensorboard --logdir ${MODEL_DIR}
The TensorBoard server will show plots of the loss and evaluation metrics as training progresses.
Run the following command to evaluate a model on the test set. The result will be printed on the screen, and a summary file will also be written to the model directory, which will be visible in TensorBoard.
# Run the evaluation script.
bazel-bin/astronet/evaluate \
--model=AstroCNNModel \
--config_name=local_global \
--eval_files=${TFRECORD_DIR}/test* \
--model_dir=${MODEL_DIR}
The output should look something like this:
INFO:tensorflow:Saving dict for global step 10000: accuracy/accuracy = 0.9625159, accuracy/num_correct = 1515.0, auc = 0.988882, confusion_matrix/false_negatives = 10.0, confusion_matrix/false_positives = 49.0, confusion_matrix/true_negatives = 1165.0, confusion_matrix/true_positives = 350.0, global_step = 10000, loss = 0.112445444, losses/weighted_cross_entropy = 0.11295206, num_examples = 1574.
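As a quick interpretation of the confusion-matrix entries in that sample output, precision and recall follow directly from the printed values; the snippet below just reproduces that arithmetic.

```python
# Precision and recall from the confusion-matrix values in the sample output.
tp, fp, fn = 350.0, 49.0, 10.0
precision = tp / (tp + fp)  # ~0.877: fraction of predicted planets that are real
recall = tp / (tp + fn)     # ~0.972: fraction of real planets that are recovered
print("precision: %.3f, recall: %.3f" % (precision, recall))
```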