DeepLINK-T is a variable selection framework that guarantees false discovery rate (FDR) control in time series data. Its three key ingredients are:
- a Long Short-Term Memory (LSTM) autoencoder for generating time series knockoff variables,
- an LSTM prediction network using both original and knockoff variables,
- the application of the knockoffs framework for variable selection with FDR control.
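As a rough illustration of the third ingredient, the sketch below applies the generic knockoff+ threshold to a vector of knockoff statistics `W`, where a large positive `W[j]` means the original variable `j` looks more important than its knockoff copy. This is the textbook selection rule, not code taken from DeepLINK-T; the function name `knockoff_plus_select` and the toy `W` values are assumptions made for the example.

```python
import numpy as np

def knockoff_plus_select(W, q=0.2):
    """Return indices selected by the knockoff+ threshold at target FDR level q."""
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of the statistics.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        # Estimated false discovery proportion at threshold t;
        # the "+1" in the numerator is what makes this knockoff+.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            # Smallest threshold meeting the target: keep all W[j] >= t.
            return np.where(W >= t)[0]
    return np.array([], dtype=int)

# Toy statistics (hypothetical): six strong positives, four near zero.
W = np.array([5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -0.5, 0.3, -0.2, 0.1])
selected = knockoff_plus_select(W, q=0.2)
print(selected)
```

With these toy values the first threshold whose estimated false discovery proportion falls below 0.2 is 2.0, so the six variables with the largest statistics are kept.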
DeepLINK-T requires Python 3 (>= 3.7.6) with the following packages:
- keras >= 2.4.3
- numpy >= 1.18.5
- pandas >= 1.1.0
- tensorflow >= 2.3.0
Clone the GitHub repository and enter the DeepLINK-T directory with
$ git clone https://github.com/zuowx/DeepLINK-T.git
$ cd DeepLINK-T
Create a conda environment for DeepLINK-T and install all packages with
$ conda create -n $ENV_NAME python=3.7.6
$ conda activate $ENV_NAME
$ pip install -r requirements.txt
The inputs of DeepLINK-T are a data tensor of shape (number of subjects, number of time points, number of features) and a response matrix, both in .npy format.
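A minimal sketch of preparing input files with NumPy is shown below. The sizes, the file names `example_X.npy` and `example_y.npy`, and the response layout (one value per subject and time point) are assumptions made for illustration; check the option descriptions of infer.py for the layout your task needs.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
m, n, p = 20, 10, 5  # hypothetical sizes: subjects, time points, features

# Explanatory variables: tensor of shape (subjects, time points, features).
X = rng.normal(size=(m, n, p))
# Assumed response layout: one value per subject and time point,
# driven here by the first feature plus a little noise.
y = X[:, :, 0] + 0.1 * rng.normal(size=(m, n))

# Save both arrays in .npy format (a temporary directory is used here).
out_dir = tempfile.mkdtemp()
x_path = os.path.join(out_dir, "example_X.npy")
y_path = os.path.join(out_dir, "example_y.npy")
np.save(x_path, X)
np.save(y_path, y)

print(np.load(x_path).shape, np.load(y_path).shape)
```

The two saved paths can then be passed as `--input_path` and `--response_path`.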
Variable selection for regression task:
$ python infer.py --input_path $INPUT_VARIABLES --response_path $RESPONSE --output_path $OUTPUT
Variable selection for classification task:
$ python infer.py --input_path $INPUT_VARIABLES --response_path $RESPONSE --output_path $OUTPUT --fit_type classification
The output of DeepLINK-T is a JSON file in which each key is a feature index and each value is the list of that feature's ranks across iterations. The value list may be shorter than the specified number of iterations.
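The following sketch shows one way to summarize such a file in Python. It assumes that a feature contributes a rank only in the iterations where it was selected, which would explain the shorter lists; the helper `summarize_ranks` and the inline example data are hypothetical and not part of DeepLINK-T.

```python
import json

def summarize_ranks(results, n_iter):
    """Map each feature to (selection frequency, mean rank), most frequent first."""
    summary = {}
    for feature, ranks in results.items():
        if ranks:  # skip features that were never selected
            summary[feature] = (len(ranks) / n_iter, sum(ranks) / len(ranks))
    # Sort by descending frequency, then by ascending mean rank.
    return dict(sorted(summary.items(), key=lambda kv: (-kv[1][0], kv[1][1])))

# Hypothetical content in the output format (key = feature index, value = ranks).
# For a real run, load it with: results = json.load(open(output_path))
example = json.loads('{"3": [1, 1, 2], "7": [2, 3], "12": [4]}')
summary = summarize_ranks(example, n_iter=3)

for feature, (freq, mean_rank) in summary.items():
    print(f"feature {feature}: selected in {freq:.0%} of runs, mean rank {mean_rank:.2f}")
```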
-h, --help show this help message and exit
--input_path INPUT_PATH
path to the explanatory variables (tensor in .npy
format (number of subjects, number of time points,
number of feature))
--response_path RESPONSE_PATH
path to the response variables (matrix in .npy format)
--output_path OUTPUT_PATH
output path for selection results (in .json format
with key=feature, value=list of selected ranks in each
run)
--n_iter N_ITER number of iterations for running DeepLINK-T
--q Q targeted FDR level
--n_bottleneck N_BOTTLENECK
number of bottleneck dimension in the autoencoder
--aut_epoch AUT_EPOCH
number of autoencoder training epochs
--aut_lr AUT_LR learning rate for the autoencoder
--aut_norm AUT_NORM normalization for the autoencoder (either bn or ln)
--mlp_epoch MLP_EPOCH
number of prediction training epochs
--mlp_lr MLP_LR learning rate for the prediction network
--fit_type FIT_TYPE either regression or classification
--response_type RESPONSE_TYPE
either sequence or scaler
Run the following command; the output is written to test/test.json:
$ python infer.py --input_path test/test_X.npy --response_path test/test_y.npy --output_path test/test.json
The complete example may take several hours to run on GPU. Users may use the following code for a quicker test:
$ python infer.py --input_path test/test_X.npy --response_path test/test_y.npy --output_path test/test.json --n_iter 1 --aut_epoch 50 --mlp_epoch 50
The inference can be finished in 1 minute when running on a GPU. The output should be an empty test.json file in the test directory.
The code used in the simulation studies is in simulation.py. Options for the script include
-h, --help show this help message and exit
--x_design X_DESIGN factor model design
--y_design Y_DESIGN link function design
--r R number of factors
--m M number of subjects
--n N number of time points
--p P number of features
--s S number of true signals
--rho RHO parameter in the AR(1) covariance structure
--amplitude AMPLITUDE
amplitude of the true signals
--q Q targeted FDR level
--it IT number of iterations for running DeepLINK-T
--n_bottleneck N_BOTTLENECK
number of bottleneck dimension in the autoencoder
--aut_epoch AUT_EPOCH
number of autoencoder training epochs
--aut_lr AUT_LR learning rate for the autoencoder
--aut_norm AUT_NORM normalization for the autoencoder (either bn or ln)
--mlp_epoch MLP_EPOCH
number of prediction training epochs
--mlp_lr MLP_LR learning rate for the prediction network
--output_path OUTPUT_PATH
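For readers unfamiliar with the AR(1) covariance structure controlled by `--rho`, the sketch below builds the standard matrix with entries rho^|i - j| and draws Gaussian samples from it. Which quantity in simulation.py uses this structure is not stated here, so the function `ar1_covariance` and the sizes are assumptions for illustration only.

```python
import numpy as np

def ar1_covariance(p, rho):
    """Return the p x p AR(1) covariance matrix with entries rho ** |i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

# Correlation decays geometrically with the distance between indices.
Sigma = ar1_covariance(4, 0.5)
print(Sigma)

# Draw Gaussian samples with this covariance (hypothetical sample size).
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(4), Sigma, size=1000)
print(samples.shape)
```

A larger `rho` gives stronger correlation between neighbouring indices, while `rho = 0` reduces the matrix to the identity.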
Data used in the real data analyses are in the folder real_data/. Detailed information on each real-world dataset is in real_data/README.md.
Copyright (C) 2024 University of Southern California
Authors: Wenxuan Zuo, Zifan Zhu, Yuxuan Du, Yi-Chun Yeh, Jed A. Fuhrman, Jinchi Lv, Yingying Fan, Fengzhu Sun
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
Commercial users should contact Dr. Fengzhu Sun ([email protected]) or Dr. Yingying Fan ([email protected]), copyright at University of Southern California.