
Multivariate Time Series Forecasting with efficient Transformers. Code for the paper "Long-Range Transformers for Dynamic Spatiotemporal Forecasting."

Home Page: https://arxiv.org/abs/2109.12218

License: MIT License


Spacetimeformer Multivariate Forecasting

This repository contains the code for the paper, "Long-Range Transformers for Dynamic Spatiotemporal Forecasting", Grigsby et al., 2021. (arXiv).

Spacetimeformer is a Transformer that learns temporal patterns like a time series model and spatial patterns like a Graph Neural Network.

Below we give a brief explanation of the problem and method, along with installation instructions and training commands for high-performance results on several datasets.

NEW MARCH 2023! We have updated the public version of the paper to v3 - the final major update expected. See the v3 release notes below.

Data Format

We deal with multivariate sequence to sequence problems that have continuous inputs. The most common example is time series forecasting where we make predictions at future ("target") values given recent history ("context"):

Every model and dataset uses this x_context, y_context, x_target, y_target format. X values are time covariates like the calendar datetime, while Ys are variable values. There can be additional context variables that are not predicted.
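
To make the format concrete, here is a minimal sketch of the shapes involved (the lengths and dimensions below are illustrative assumptions, not the exact dataloader output):

import numpy as np

context_len, target_len = 168, 24   # hypothetical history and forecast lengths
d_x, d_y = 6, 137                   # time covariates vs. predicted variables (illustrative)

x_context = np.zeros((context_len, d_x))  # calendar features for the history window
y_context = np.zeros((context_len, d_y))  # observed variable values in the history window
x_target = np.zeros((target_len, d_x))    # calendar features for the forecast window (known ahead of time)
y_target = np.zeros((target_len, d_y))    # variable values the model must predict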

Spatiotemporal Attention

Typical deep learning time series models group Y values by timestep and learn patterns across time. When using Transformer-based models, this results in "temporal" attention networks that can ignore spatial relationships between variables.

In contrast, Graph Neural Networks and similar methods model spatial relationships with explicit graphs - sharing information across space and time in alternating layers.

Spacetimeformer learns full spatiotemporal patterns between all variables at every timestep.

We implement spatiotemporal attention with a custom Transformer architecture and embedding that flattens multivariate sequences so that each token contains the value of a single variable at a given timestep:
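
As a rough illustration of this flattened format (not the repo's exact embedding code), the sequence length grows from L to L * N and each token carries its value plus time and variable identity:

import torch

L, N = 12, 5                        # timesteps and variables (illustrative)
y = torch.randn(L, N)               # standard "temporal" layout: one row per timestep

tokens = y.reshape(L * N, 1)        # spatiotemporal layout: one scalar value per token
time_idx = torch.arange(L).repeat_interleave(N)  # which timestep each token came from
var_idx = torch.arange(N).repeat(L)              # which variable each token represents
# Each token is embedded with its value + time + variable id, so attention can relate
# any variable at any timestep to any other variable at any other timestep.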

Spacetimeformer processes these longer sequences with a mix of efficient attention mechanisms and Vision-style "windowed" attention.
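
Windowed attention restricts full attention to fixed-size chunks of the much longer flattened sequence. A minimal sketch, assuming the sequence has already been padded to a multiple of the window size (an illustration, not the repo's exact implementation):

import torch

def windowed_attention(tokens: torch.Tensor, attn, window: int) -> torch.Tensor:
    # tokens: (batch, seq_len, d_model); attn: any callable attention layer
    b, s, d = tokens.shape
    assert s % window == 0, "pad the sequence so windows divide it evenly"
    chunks = tokens.reshape(b * (s // window), window, d)  # each window becomes its own batch item
    out = attn(chunks)                                     # full attention, but only within each window
    return out.reshape(b, s, d)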

This repo contains the code for our model as well as several high-quality baselines for common benchmarks and toy datasets.

Paper v3 Release Notes

The Spacetimeformer project began in 2021. The project underwent a major revision in summer 2022, with most of the updates being merged to the public codebase shortly thereafter. However, the updated version of the paper was not released until March 2023. Here we summarize the major changes:

  • Updated Experimental Results and Additional Datasets. Spacetimeformer's L * N spatiotemporal attention format is super flexible but inherently GPU-intensive. We are now able to scale the method to much larger datasets and model sizes. As part of this process we added many new datasets to the codebase - far more than are mentioned in the paper. The model and training routines now also support exogenous variables and mixed-length context sequences.

  • Implementation Changes and Scalability Improvements. Learnable position embeddings and other Transformer architecture adjustments including windowed attention for long sequences.

  • Time Series Tricks and Non-Stationarity. The most common case where time series Transformers fail is distribution shift between the train and test splits, which often happens in forecasting when long-term trends change the magnitude of test-time sequences. In these situations it has been shown that simple linear models can outperform larger Transformers. All models in this repo (including Spacetimeformer) now have options for input normalization, seasonal decomposition, and linear output components that greatly reduce this effect (see the decomposition sketch after this list).

  • Spatiotemporal Attention's Improvements over ST-GNNs and Connections to Vision Transformers. The original purpose of our multivariate sequence format was to provide an easy-to-implement alternative to more complex GNN operations that combined the advantages of time series Transformers. What was not fully clear at the time is how the full (L * N)^2 attention graph can provide a context-dependent and fully spatiotemporal graph learning mechanism. Since 2021, it has also become much easier to motivate Spacetimeformer as a Vision Transformer analogue for time series forecasting. See Appendix A2 and A3 for detailed discussions.
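
As referenced in the Time Series Tricks note above, here is a rough sketch of the moving-average seasonal decomposition used by DLinear-style models (illustrative only; the function and argument names are assumptions, not the repo's exact implementation):

import torch
import torch.nn.functional as F

def decompose(y: torch.Tensor, kernel_size: int = 25):
    # y: (batch, length, variables). Returns (trend, seasonal) with the same shape.
    pad_front = (kernel_size - 1) // 2
    pad_back = kernel_size - 1 - pad_front
    # Repeat edge values so the moving average keeps the original length.
    front = y[:, :1, :].repeat(1, pad_front, 1)
    back = y[:, -1:, :].repeat(1, pad_back, 1)
    padded = torch.cat([front, y, back], dim=1)
    # Moving average over time gives the slowly-varying trend component.
    trend = F.avg_pool1d(padded.transpose(1, 2), kernel_size, stride=1).transpose(1, 2)
    seasonal = y - trend
    return trend, seasonal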

Installation and Training

This repository was written and tested for python 3.8 and pytorch 1.11.0. Note that the training process depends on specific (now outdated) versions of pytorch lightning and torchmetrics.

git clone https://github.com/QData/spacetimeformer.git
cd spacetimeformer
conda create -n spacetimeformer python==3.8
conda activate spacetimeformer
pip install -r requirements.txt
pip install -e .

This installs a python package called spacetimeformer. The package does not install pytorch or torchvision automatically, and you should follow the official pytorch installation instructions for 1.11 depending on your CUDA software version.

Commandline instructions for each experiment can be found using the format: python train.py *model* *dataset* -h.

Models

  • linear: a basic autoregressive linear model. New June 2022: expanded to allow for seasonal decomposition and independent params for each variable (inspired by DLinear).
  • lstnet: a more typical RNN/Conv1D model for multivariate forecasting. Based on the attention-free implementation of LSTNet.
  • lstm: a typical encoder-decoder LSTM without attention. We use scheduled sampling to anneal teacher forcing throughout training (see the sketch after this list).
  • mtgnn: a hybrid GNN that learns its graph structure from data. For more information refer to the paper. We use the implementation from pytorch_geometric_temporal (requires some extra installation).
  • s4: long-sequence state-space model (paper) (requires some extra installation).
  • heuristic: simple heuristics like "repeat the last value in the context sequence" as a sanity-check.
  • spacetimeformer: the multivariate long-range transformer architecture discussed in our paper.
    • Note that the "Temporal" ablation discussed in the paper is a special case of the spacetimeformer model. It is conceptually similar to Informer. Set --embed_method temporal. Spacetimeformer has many configurable options, and we try to provide a thorough explanation with the commandline -h instructions.
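
For the lstm baseline above, the scheduled sampling used to anneal teacher forcing can be pictured with the following minimal sketch (illustrative parameter names; not the repo's exact code):

import random

def teacher_forcing_prob(epoch: int, total_epochs: int, start: float = 1.0, end: float = 0.0) -> float:
    # Linearly anneal the probability of feeding ground truth to the decoder.
    frac = min(epoch / max(total_epochs, 1), 1.0)
    return start + frac * (end - start)

def next_decoder_input(ground_truth, prediction, p_teacher_force: float):
    # With probability p_teacher_force use the true value, otherwise the model's own prediction.
    return ground_truth if random.random() < p_teacher_force else prediction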

Datasets

Spatial Forecasting
  • metr-la and pems-bay: traffic forecasting datasets. We use a very similar setup to DCRNN.
  • precip: daily precipitation data from a latitude-longitude grid over the Continental United States.
  • hangzhou: metro station ridership data.
Time Series Forecasting
  • toy2: is the toy dataset mentioned at the beginning of our experiments section. It is heavily based on the toy dataset in TPA-LSTM.
  • asos: is the codebase's name for what the paper calls "NY-TX Weather."
  • solar_energy: is the codebase's name for the time series benchmark more commonly called "AL Solar."
  • exchange: A common time series benchmark dataset of exchange rates.
  • weather: A common time series benchmark dataset of 21 weather indicators.
  • ettm1: A common time series benchmark dataset of "electricity transformer temperatures" and related variables.
  • traffic: More of a spatiotemporal benchmark for forecasting traffic conditions in 800+ road lanes. There is no roadmap/graph provided, so this makes for a good demo of Spacetimeformer's automatic spatial learning. However, sequence lengths can be very long and this dataset has meaningful distribution shift.
Image Completion
  • mnist: Highlights the similarity between multivariate forecasting and vision models by completing the right side of an MNIST digit given the left side, where each row is a different variable.
  • cifar: A harder image completion task where the variables are color channels and the sequence is flattened across rows.
Copy Tasks
  • copy: Copy binary input sequences with rows shifted by varying amounts. An example of a hard task for Temporal attention that is easy for Spatiotemporal attention.
  • cont_copy: A continuous version of the copy task with additional settings to study distribution shift.
"Global" or Multiseries Datasets
  • m4: The M4 competition dataset (overview). Collection of 100k univariate series at various resolutions.

  • wiki: The Wikipedia web traffic dataset from the Kaggle competition. 145k univariate high-entropy series at a single resolution.

  • monash: Loads the Monash Time Series Forecasting Archive. Up to ~400k univariate time series.

    (We load these benchmarks in an unusual format where the context sequence is all data up until the current time - leading to variable length sequences with padding.)

Logging with Weights and Biases

We used wandb to track all of our results during development, and you can do the same by providing your username and project as environment variables:

export STF_WANDB_ACCT="your_username"
export STF_WANDB_PROJ="your_project_title"
# optionally: change wandb logging directory (defaults to ./data/STF_LOG_DIR)
export STF_LOG_DIR="/somewhere/with/more/disk/space"

wandb logging can then be enabled with the --wandb flag.

There are several figures that can be saved to wandb between epochs. These vary by dataset but can be enabled with --attn_plot (for Transformer attention diagrams) and --plot (for prediction plotting and image completion).
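
For example, an illustrative command (the run name is arbitrary) that logs a toy2 run with both kinds of figures enabled:

python train.py spacetimeformer toy2 --run_name toy2_wandb_demo --gpus 0 --wandb --plot --attn_plot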

Example Training Commands

General Notes:
  1. Commands are listed without GPU counts. For one GPU, add --gpus 0; for three GPUs, --gpus 0 1 2, etc. Some of these models require significant GPU memory (80GB A100s). Other hyperparameter settings were used in older versions of the paper with more limited compute resources. If I have time I will try to update with competitive alternatives for smaller GPUs.

  2. Some datasets require a --data_path to the dataset location on disk. Others are included with the source code or downloaded automatically.

Linear autoregressive model with independent weights and seasonal decomposition (DLinear-style) on ETTm1:

python train.py linear ettm1 --context_points 288 --target_points 96 --run_name linear_ettm1_regression --gpus 0 --use_seasonal_decomp --linear_window 288 --data_path /path/to/ETTm1.csv

Spacetimeformer on Pems-Bay (MAE: ~1.61):

python train.py spacetimeformer pems-bay --batch_size 32 --warmup_steps 1000 --d_model 200 --d_ff 700 --enc_layers 5 --dec_layers 6 --dropout_emb .1 --dropout_ff .3 --run_name pems-bay-spatiotemporal --base_lr 1e-3 --l2_coeff 1e-3 --loss mae --data_path /path/to/pems_bay/ --d_qk 30 --d_v 30 --n_heads 10 --patience 10 --decay_factor .8

Spacetimeformer on MNIST completion:

python train.py spacetimeformer mnist --embed_method spatio-temporal --local_self_attn full --local_cross_attn full --global_self_attn full --global_cross_attn full --run_name mnist_spatiotemporal --context_points 10

Spacetimeformer on AL Solar (MSE: ~7.75):

python train.py spacetimeformer solar_energy --context_points 168 --target_points 24 --d_model 100 --d_ff 400 --enc_layers 5 --dec_layers 5 --l2_coeff 1e-3 --dropout_ff .2 --dropout_emb .1 --d_qk 20 --d_v 20 --n_heads 6 --run_name spatiotemporal_al_solar --batch_size 32 --class_loss_imp 0 --initial_downsample_convs 1 --decay_factor .8 --warmup_steps 1000

More Coming Soon...

Citation

If you use this model in academic work, please feel free to cite our paper:

@misc{grigsby2021longrange,
      title={Long-Range Transformers for Dynamic Spatiotemporal Forecasting}, 
      author={Jake Grigsby and Zhe Wang and Yanjun Qi},
      year={2021},
      eprint={2109.12218},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}


spacetimeformer's Issues

Strange predictions.

Hi! Do you have any ideas about a strange shift in the predictions? I mean, the shape of the prediction is pretty good and looks like the target, but the distance between the context and the prediction is too big.
[test prediction plot images attached]

How many epochs?

I am renting an expensive cloud service ($15/hr), and I just want to know how many epochs this setting runs for, so that I can calculate/estimate the running time.

python train.py spacetimeformer metr-la --start_token_len 3 --batch_size 32 \
--gpus 0 1 2 3 --grad_clip_norm 1 --d_model 128 --d_ff 512 --enc_layers 5 \
--dec_layers 4 --dropout_emb .3 --dropout_ff .3 --dropout_qkv 0 \ 
--run_name spatiotemporal_metr-la --base_lr 1e-3 --l2_coeff 1e-2 \

Inference samples

Hey there!

I loved the paper and the approach you had with this. I wonder if there is a guidance or idea of how to make inference of new data.

Datasets

Could you please share precip and asos datasets? I am very curious to test this model on these datasets.

Purpose of ignore_col Parameter?

Hello -

Thank you for all the improvements in the model, this is truly a very robust implementation.

Quick Question: What is the purpose of the ignore_columns parameter within the CSVTimeSeries class? Specifically, if you were targeting specific columns in a CSV dataset, how would ignoring specific columns change yc_dim and yt_dim?

Thank you for answering a possible simple question.

Best

Guidance on how to modify training loop to use data from longitudinal study

Hello,

I'm working on a project using data from a longitudinal study that has many subjects tied to exam dates and associated features at each of those exam dates. Looking into the core task Spacetimeformer is targeting, it looks like it's more focused on datasets that have a single time index and predicting uniformly against that index. I'm wondering whether it would be too challenging to modify the training loop to support multiple time indices, as each subject will have their own (e.g. Subject 1 has an exam date of 2016/01/02, Subject 2 also has an exam date of 2016/01/02, etc.).

I imagine I can modify the CSVTorchDset to load across subjects to keep it simple (one batch still has one subjects' worth of context + target points), but before digging into it further I was wondering if you had come across this while researching and have any guidance.

Let me know if you'd like to see some sample data.

Thanks! Excellent project by the way.

"CUDA extension for cauchy multiplication not found" when running toy project

Hi!

I get error "cauchy multiplication not found" when running the command below.
I have tried the suggested python setup.py install in the cauchy directory without success.

python train.py spacetimeformer toy2 --run_name spatiotemporal_toy2 --d_model 100 --d_ff 400 --enc_layers 2 --dec_layers 2 --gpus 0 --batch_size 32 --start_token_len 4 --n_heads 4 --grad_clip_norm 1 --early_stopping --trials 1

CUDA extension for cauchy multiplication not found. go to ./extensions/cauchy and try `python setup.py install`

Any tips how to fix this error?

Question about embedding

Hi, great job!

I just have a quick question about the embedding part. Although position embedding is a tradition in Transformer-based models, it seems you embedded many different kinds of information into the original sequence additively. I am not sure if my understanding is correct. The concatenation form is easier for the network to learn at the cost of more parameters, and vice versa for the addition form. I believe the neural network has a certain capability to extract different information from the raw input and project it into a higher-order latent space, but could performance deteriorate if we add too much? And would a large hidden dim mitigate this?

BTW, I have also heard the explanation that (a+b)' = a' + b', meaning the addition and concatenation operations are equal from a gradient perspective. That contradicts my gut feeling.

Thanks very much!

0 value features causing issues with reverse scaling

Hello -

I was getting some nonsense loss function results in my final prediction (ex: MSE >100000 for 0-1 scaled features) and I believe I figured out the source of the issue.

I believe the reverse scaling function was having an issue with my "precipitation" features, which are normally 0. I am working on my own fix but just wanted to pass along.

about number of features in prediction

Thank you for a great, well-organized project!

Looking at the data processing functions, you set the features (the target_cols variable) the same for X and for y. If I want to predict just one feature out of many in the dataset, how should I pass that to the models?

How to interpret the forecast results of Toy2, exchange, NY-TX datasets

Hello dear author! Recently, after I modified it, I tested the model on toy2, exchange, and NY-TX according to the original text, and got the effect pictures from the article. What are the respective prediction targets on these three datasets (using the specified commands without modification)?
(1) In the test of the toy2 dataset, wandb records the prediction results of the test set at different time steps, with eight pictures at a single time step. In the toy2 test set, how do I determine which time period corresponds to the multivariate prediction output?
[toy2 prediction plot images attached]
(2) The result output on exchange is also a single time step with eight pictures. Do the eight graphs here correspond one-to-one to the exchange rates of eight different countries? (Using the other countries' data as multivariate input to predict the output of one country.)
[exchange prediction plot image attached]
(3) What is the prediction goal of the NY-TX dataset? Is it to combine the data of these six stations to predict one of the stations? The article states that the experiment forecasts the temperature of three weather stations in Texas and three in New York. Appendix C.1 shows the forecast map for one of these stations (Figure 6). With so many diagrams, how do we determine which station is being predicted, and what is the approximate time period corresponding to the prediction?
[NY-TX prediction plot image attached]

Questions about Global-Attention and Decoder Fast-Attention

Thanks for uploading the code. This is a very interesting project!

I have a few questions surrounding some implementation details.

1) Global-Attention
What is the purpose of the WindowTime function and how does it help facilitate the concept of "Global-attention"?

2) Decoder Fast-Attention
When the Performer option is selected, the FastAttention for decoder self-attention is not instantiated with the causal flag set to True. This results in the attention mechanism of the forecast sequence attending to future tokens. Is this issue being addressed somewhere else?

getting an error like this for the asos data

File "c:\users\jeeva\downloads\spacetimeformer\spacetimeformer\forecaster.py", line 164, in _log_stats
self.log(f"{section}/{key}", outs[key].mean(), sync_dist=True)
AttributeError: 'float' object has no attribute 'mean'

Misconfiguration Exception

Hello,

I am trying to run your model and followed the steps provided for preparing the environment. I then tried to use the CLI provided, and with each operation I would run into the following error:

Traceback (most recent call last):
  File "train.py", line 447, in <module>
    main(args)
  File "train.py", line 432, in main
    trainer.fit(forecaster, datamodule=data_module)
  File "/home/alfred/anaconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 768, in fit
    self._call_and_handle_interrupt(
  File "/home/alfred/anaconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 721, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/alfred/anaconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/alfred/anaconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1234, in _run
    results = self._run_stage()
  File "/home/alfred/anaconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1321, in _run_stage
    return self._run_train()
  File "/home/alfred/anaconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1351, in _run_train
    self.fit_loop.run()
  File "/home/alfred/anaconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/alfred/anaconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 269, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/alfred/anaconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/alfred/anaconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 246, in advance
    self.trainer._logger_connector.update_train_step_metrics()
  File "/home/alfred/anaconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py", line 197, in update_train_step_metrics
    self._log_gpus_metrics()
  File "/home/alfred/anaconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py", line 225, in _log_gpus_metrics
    self.trainer.lightning_module.log(
  File "/home/alfred/anaconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 366, in log
    raise MisconfigurationException(
pytorch_lightning.utilities.exceptions.MisconfigurationException: You are trying to self.log() but it is not managed by the "Trainer" control flow

I tried fruitlessly to debug it and was wondering if you had any insight.

Thank you!

How to set up the env (ModuleNotFoundError: No module named 'spacetimeformer')

Hello, thank you for opening great project!

Now I'm trying to run the code in the Google Colab.
After checking the README.md file, I ran the code below in the notebook.

!git clone https://github.com/QData/spacetimeformer.git
!pip install -r spacetimeformer/requirements.txt  # SAME AS cd spacetimeformer && pip install -r requirements.txt 
!pip install -e

After all the packages were successfully installed, I tried the code below.

!python spacetimeformer/spacetimeformer/train.py spacetimeformer pems-bay .... (AS written in the `README.md`)

However, it raised an error below

Traceback (most recent call last):
  File "spacetimeformer/train.py", line 11, in <module>
    import spacetimeformer as stf
ModuleNotFoundError: No module named 'spacetimeformer'

I think it's because there's no spacetimeformer.py module in the repo, and spacetimeformer is not included in requirements.txt.
Is there anything I missed while setting the environment?
Or did I make a mistake when running train.py?

Thank you for your help in advance :)

Question on dimensions of Datasets

Hello -

I am looking to try new datasets with your model, but I'm having a little bit of a hard time understanding the x_dim and y_dim hard-coded into train.py.

What exactly do each of these mean?

For example, for the solar_energy data set, I see that the y_dim is 137 because it has 137 features, but where does the 6 come from?

How should I set the epoch?

Hello dear author, I cloned your project code and tested it, and found that the number of epochs cannot be set in the command, and I cannot find the exact location of the epochs in the files. I read the args-related parts of the code (e.g. --batch_size, --workers) and can't seem to find it. How did you set the epochs? If possible, could you point out how to modify the epochs? Thank you very much if you would like to help.

Unable to import spacetimeformer from spacetimeformer

Hello, I'm unable to import spacetimeformer from spacetimeformer. However, I can import spacetimeformer_model. My goal is to run the demo notebook and to run kernel = spacetimeformer(g=3, m=2). Could you help me solve this issue? Thanks!

training_epoch_loop.py", line 486, in _update_learning_rates raise MisconfigurationException( pytorch_lightning.utilities.exceptions.MisconfigurationException: ReduceLROnPlateau conditioned on metric val/forecast_loss which is not available. Available metrics are: ['train/mape', 'train/mae', 'train/mse', 'train/rse', 'train/forecast_loss', 'train/class_loss', 'train/loss', 'train/acc']. Condition can be set using `monitor` key in lr scheduler dict

training_epoch_loop.py", line 486, in _update_learning_rates
raise MisconfigurationException(
pytorch_lightning.utilities.exceptions.MisconfigurationException: ReduceLROnPlateau conditioned on metric val/forecast_loss which is not available. Available metrics are: ['train/mape', 'train/mae', 'train/mse', 'train/rse', 'train/forecast_loss', 'train/class_loss', 'train/loss', 'train/acc']. Condition can be set using monitor key in lr scheduler dict

All forecast values close to mean value

Hello Jake

I have a model which uses spatial attention with GAT and temporal attention with Transformers. I am working on the PEMS-Bay dataset for my project.

During training, the decoder is one-shot, meaning all the timesteps are fed into the decoder at the same time. When I printed out the predictions, they were all near the mean value and did not capture the trends in the data.

Are there any obvious issues that you can see?

I checked the positional encoding and the attention weights from the graphs, normalised with Z-score normalisation, and used inverse scaling before comparison.

Question: Can we use Spacetimeformer for Recommendation?

Hi,
I read your paper last year and was really impressed by the model and its results. Thanks for your contribution. I am working on recommendation engine models, and I was wondering if we could use this model architecture for sequential recommendations (especially short sessions where we have no previous user history or user information)?

Input of Time2Vec embed

I have a question that has been bothering me a lot recently - it appears to me that some people on the web get the time2vec concept wrong. What you described in the article is correct from my point of view, but the implementation is incorrect, at least for the NY-TX dataset.
As I understand it, time2vec is used to create an embedding of time. Thus, the input of the time2vec layer should be some time information, like the day number since the beginning of the data, the timestep, the year, etc. To support my claim, an excerpt from the time2vec paper:

To answer Q3, we trained a model on our synthesized dataset where the input integer (day) is used as the time for Time2Vec

I've run your model with the NY-TX dataset, and from what I see the input of time2vec is just the RAW temperature data, which doesn't contain any time-related information.
I would be grateful if you could dispel my doubts; maybe I misunderstand something. And, thank you for a great article :)

PyTorch 1.12 with device "mps" for apple m1

Hi,
PyTorch 1.12 is finally optimized for the Apple silicon M1 processor, but it requires the torch.device("mps") configuration.

I was not able to set it and got this error when trying to set the device to "mps":

RuntimeError: Expected one of cpu, cuda, xpu, mkldnn, opengl, opencl, ideep, hip, ve, ort, mlc, xla, lazy, vulkan, meta, hpu device type at start of device string: mps

Could you please provide such parameter settings?
Thanks.

multiple-step ahead prediction for PeMS dataset

Hi authors, thanks for sharing the code of this paper. I have a question when trying to implement these models. It seems like the multiple-step ahead prediction is treated in different ways for the PeMS dataset.

For example, in the LSTM model, the output of the decoder at each timestep is used as the decoder input to generate the prediction at the next step, meaning we don't use any ground truth data at the inference step. However, in the spacetimeformer model, we simply use a mask to mask out future positions but we still use ground truth data as the decoder input. Essentially, I think it's equivalent to a rolling 1-step prediction that is different from the multiple-step ahead prediction in the LSTM model.

Training times.

For the example commands stated in the repository, what training times should we expect? And what hardware were these tests run on, keeping in mind that 4 GPUs are specified?

Error in test

I tried to use the model on my custom dataset.
Training was successful.
When I tried to test the model using pytorch_lightning.Trainer.test(), I faced an error after a while.
[error screenshots attached]

I found that the reason is that the means or stds in pred_distrib = pyd.Normal(means, stds) have nan values.
[screenshot attached]

Training and validation had no errors, and even some steps of the test were fine. So why this error in the test? I confirmed that the test data has no nans.

Thanks

jupyter notebooks

Hello, this is really a great project. I found it through the Medium article. Would it be possible to add some jupyter notebooks? That way it would be easier to see the workflow and the outcomes :-D

GPU memory leak

After 80 epochs (8 hours), I got this error

Epoch 80:  87%|████████████████▌  | 750/858 [04:56<00:42,  2.53it/s, loss=0.258]Traceback (most recent call last):
  File "train.py", line 442, in <module>
    main(args)
  File "train.py", line 424, in main
    trainer.fit(forecaster, datamodule=data_module)
  File "/home/u7701783/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 460, in fit
    self._run(model)
  File "/home/u7701783/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 758, in _run
    self.dispatch()
  File "/home/u7701783/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 799, in dispatch
    self.accelerator.start_training(self)
  File "/home/u7701783/.local/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/u7701783/.local/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
    self._results = trainer.run_stage()
  File "/home/u7701783/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in run_stage
    return self.run_train()
  File "/home/u7701783/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 871, in run_train
    self.train_loop.run_training_epoch()
  File "/home/u7701783/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 569, in run_training_epoch
    self.trainer.logger_connector.log_train_epoch_end_metrics(epoch_output)
  File "/home/u7701783/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py", line 325, in log_train_epoch_end_metrics
    self.log_metrics(epoch_log_metrics, {})
  File "/home/u7701783/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py", line 208, in log_metrics
    mem_map = memory.get_memory_profile(self.log_gpu_memory)
  File "/home/u7701783/.local/lib/python3.8/site-packages/pytorch_lightning/core/memory.py", line 365, in get_memory_profile
    memory_map = get_gpu_memory_map()
  File "/home/u7701783/.local/lib/python3.8/site-packages/pytorch_lightning/core/memory.py", line 384, in get_gpu_memory_map
    result = subprocess.run(
  File "/opt/conda/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/bin/nvidia-smi', '--query-gpu=memory.used', '--format=csv,nounits,noheader']' returned non-zero exit status 255.
Epoch 80:  87%|████████▋ | 750/858 [04:57<00:42,  2.52it/s, loss=0.258]         

Getting error on PEMS-BAY dataset

I downloaded the pems-bay.h5 file from https://zenodo.org/record/4263971 and used https://github.com/liyaguang/DCRNN/blob/master/scripts/generate_training_data.py to generate test.npz, train.npz, val.npz files in ./data/pems_bay/.

When I run the command from the README.md python train.py spacetimeformer pems-bay --batch_size 32 --warmup_steps 1000 --d_model 200 --d_ff 700 --enc_layers 5 --dec_layers 6 --dropout_emb .1 --dropout_ff .3 --run_name pems-bay-spatiotemporal --base_lr 1e-3 --l2_coeff 1e-3 --loss mae --data_path ./data/pems_bay/ --d_qk 30 --d_v 30 --n_heads 10 --patience 10 --decay_factor .8

I get the following error

Traceback (most recent call last):
  File "/spacetimeformer-main/spacetimeformer/train.py", line 854, in <module>
    main(args)
  File "/spacetimeformer-main/spacetimeformer/train.py", line 758, in main
    ) = create_dset(args)
  File "/spacetimeformer-main/spacetimeformer/train.py", line 394, in create_dset
    data = stf.data.metr_la.METR_LA_Data(config.data_path)
  File "/spacetimeformer-main/spacetimeformer/data/metr_la/metr_la.py", line 43, in __init__
    x_c_train, y_c_train = self._split_set(context_train)
  File "/spacetimeformer-main/spacetimeformer/data/metr_la/metr_la.py", line 21, in _split_set
    time = 2.0 * x[:, :, 0] - 1.0
IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed

Training loop broken

When trying to run the model, for example as:
python train.py spacetimeformer toy2 --run_name spatiotemporal_toy2 \
--d_model 100 --d_ff 400 --enc_layers 2 --dec_layers 2 \
--gpus 0 --batch_size 32 --start_token_len 4 --n_heads 4 \
--grad_clip_norm 1 --early_stopping --trials 1

Training crashes immediately.

pytorch_lightning.utilities.exceptions.MisconfigurationException: You are trying to self.log() but it is not managed by

ValueError: Found array with dim 3. StandardScaler expected <= 2

When I just followed the README and ran

python train.py spacetimeformer asos --context_points 160 --target_points 40 --start_token_len 8 --grad_clip_norm 1 --gpus 0 --batch_size 128 --d_model 200 --d_ff 800 --enc_layers 3 --dec_layers 3 --local_self_attn none --local_cross_attn none --l2_coeff .01 --dropout_emb .1 --run_name temporal_asos_160-40-nll --loss nll --time_resolution 1 --dropout_ff .2 --n_heads 8 --trials 3 --embed_method temporal --early_stopping --attn_plot --plot

I got the following error:
ValueError: Found array with dim 3. StandardScaler expected <= 2

Feature Request: Possible to Retrieve Raw Prediction Results?

Hello -

I hope all is well. Is it possible to produce raw predicted values from the model? I see the options to generate prediction plots with wandb, but is there any possibility of producing the raw data for each feature in a dataset?

The goal I have is that there is research that does not normalize its input data, so I would like to inverse transform to produce non-normalized distance metrics of the predicted results. I believe inputting non-normalized values to the model would cause issues.

Any thoughts you have would be appreciated and thank you for the wonderful source code + information.

Best!

Using the model to inference

Hi,

Thank you for this module.
Is there an example for inference?
I tried adding self.save_hyperparameters() to the Spacetimeformer_Forecaster init and then using the checkpoint as below, but this requires x_c, y_c, x_t, and y_t.
Shouldn't inference only need the context and not the target? Is there some other step I need to take to produce a model that can be used for inference?

    model = Spacetimeformer_Forecaster.load_from_checkpoint(checkpoint_path=checkpoint_path)

Thank you for your help.

OSError: libtorch_global_deps.so: cannot open shared object file: No such file or directory

After creating the environment and running the script provided in the Example Spacetimeformer Training Commands section of the repo, I get the following stack trace:

Traceback (most recent call last):
File "train.py", line 7, in <module>
import pytorch_lightning as pl
File "/home/fsuser/miniconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/__init__.py", line 30, in <module>
from pytorch_lightning.callbacks import Callback  # noqa: E402
File "/home/fsuser/miniconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/callbacks/__init__.py", line 14, in <module>
from pytorch_lightning.callbacks.base import Callback
File "/home/fsuser/miniconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/callbacks/base.py", line 21, in <module>
import torch
File "/home/fsuser/.local/lib/python3.8/site-packages/torch/__init__.py", line 196, in <module>
_load_global_deps()
File "/home/fsuser/.local/lib/python3.8/site-packages/torch/__init__.py", line 149, in _load_global_deps
ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
File "/home/fsuser/miniconda3/envs/spacetimeformer/lib/python3.8/ctypes/__init__.py", line 369, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /home/fsuser/.local/lib/python3.8/site-packages/torch/lib/libtorch_global_deps.so: cannot open shared object file: No such file or directory

The same stack trace is replicable by simply importing torch in a .py script.

I've checked /home/fsuser/.local/lib/python3.8/site-packages/torch/lib/ and there is no libtorch_global_deps.so file. Do I have to pull it from somewhere or install some other torch library?

I'm running this code on Ubuntu 20.04 with Python 3.8, torch 1.9.0, and CUDA 10.2.

The results of attn in the prediction outputs

Dear authors

I recently started applying spacetimeformer, but I find it a little confusing to understand the attn in the outputs.

Would you mind giving me an introduction to how to utilize the logits, labels, and attn in the model outputs?

How to input npz format data to model training?

Hello, dear author. Thank you very much for providing this spacetimeformer model.
But my dataset is a meteorological dataset in npz format; how do I input npz data into the model?
Which code do I need to look at in the repo?

Thank you very much if you would like to help.

Issue while loading a model

Hi,
After training, I'm trying to save the model and load it from the checkpoint.
But I got a size mismatch error, so I tried to load it as soon as the training was over.
But I still have the issue.
It looks like it's the same issue as #13.

The code :

    # Test from the code
    trainer.test(datamodule=data_module, ckpt_path="best")
    #Added only these two line
    trainer.save_checkpoint("best.ckpt")
    forecaster = forecaster.load_from_checkpoint(checkpoint_path="best.ckpt")

And I got a LOT of size mismatch errors:

RuntimeError: Error(s) in loading state_dict for Spacetimeformer_Forecaster:
	size mismatch for spacetimeformer.enc_embedding.x_emb.embed_weight: copying a param with shape torch.Size([7, 6]) from checkpoint, the shape in current model is torch.Size([5, 6]).
	size mismatch for spacetimeformer.enc_embedding.y_emb.weight: copying a param with shape torch.Size([512, 43]) from checkpoint, the shape in current model is torch.Size([512, 31]).
	size mismatch for spacetimeformer.enc_embedding.var_emb.weight: copying a param with shape torch.Size([18, 512]) from checkpoint, the shape in current model is torch.Size([1, 512]).
	size mismatch for spacetimeformer.dec_embedding.x_emb.embed_weight: copying a param with shape torch.Size([7, 6]) from checkpoint, the shape in current model is torch.Size([5, 6]).
	size mismatch for spacetimeformer.dec_embedding.y_emb.weight: copying a param with shape torch.Size([512, 43]) from checkpoint, the shape in current model is torch.Size([512, 31]).
	size mismatch for spacetimeformer.dec_embedding.var_emb.weight: copying a param with shape torch.Size([18, 512]) from checkpoint, the shape in current model is torch.Size([1, 512]).
	size mismatch for spacetimeformer.classifier.weight: copying a param with shape torch.Size([18, 512]) from checkpoint, the shape in current model is torch.Size([1, 512]).
	size mismatch for spacetimeformer.classifier.bias: copying a param with shape torch.Size([18]) from checkpoint, the shape in current model is torch.Size([1]).

What am I doing wrong ?

Also, I didn't get what --start_token_len does.

Length of decoder start token. Adds this many of the final context points to the start of the target sequence.

For example, if I have one data point every hour and start_token_len = 3, will it predict the 3 hours from now? And train to predict those values?

Best Regards ! And thanks for the model !

Plotting

How can I save the plots offline without using wandb?

about start tokens

Hi, thanks for this great work!

I have been thinking about the architecture, and am curious why we need a start token as decoder input. My understanding is that all the information in the start token has already been included in the encoder input, and it should be injected into the decoder via the encoded sequence (Fig. 3). Can you explain the motivation for using a start token in the decoder?

Thank you!

How to get inference result?

Thanks for your great work.

I want to run inference with a trained spacetimeformer model. The model needs 4 inputs: (x_c, y_c, x_t, y_t).
We have only x_c, y_c, and x_t ready.

  • y_t must be zeros with the right shape, right?

  • output, (logits, labels), attn = spacetimeformer(x_c, y_c, x_t, y_t). The output has two tensors: output.loc and output.scale.
    What is the real output for y_t here, output.loc or output.scale?

    Thanks in advance.

Spacetimeformer invalid accelerator name

I executed the following example from the README:

python train.py spacetimeformer exchange --batch_size 32 --warmup_steps 1000 --d_model 200 --d_ff 700 --enc_layers 5 --dec_layers 6 --dropout_emb .1 --dropout_ff .3 --run_name pems-bay-spatiotemporal --base_lr 1e-3 --l2_coeff 1e-3 --loss mae --d_qk 30 --d_v 30 --n_heads 10 --patience 10 --decay_factor .8 --workers 4 --gpus 0

I get the following error:

Traceback (most recent call last):
  File "train.py", line 851, in <module>
    main(args)
  File "train.py", line 816, in main
    trainer = pl.Trainer(
  File "/opt/miniconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/utilities/argparse.py", line 345, in insert_env_defaults
    return fn(self, **kwargs)
  File "/opt/miniconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 433, in __init__
    self._accelerator_connector = AcceleratorConnector(
  File "/opt/miniconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 193, in __init__
    self._check_config_and_set_final_flags(
  File "/opt/miniconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 292, in _check_config_and_set_final_flags
    raise ValueError(
ValueError: You selected an invalid accelerator name: `accelerator='dp'`. Available names are: cpu, cuda, hpu, ipu, mps, tpu.

I see in the train.py file that accelerator is set to "dp". I don't see "dp" as a valid accelerator name in the PyTorch Lightning documentation; dp is a data-parallel strategy, and the accelerator should be gpu.

PTGT

Hi there,

Could you explain why did you decide to exclude PyTorch Geometric Temporal from the citations?

Bests,

Benedek

Non-Time Independent Variable

Hello and thanks for the great work!
I am working on forecasting battery data where, given time and current, I wish to predict voltage and temperature. So far I have tried putting current in y_context but not y_target, which hasn't been working great. Rather than treat current as a y variable which needs to be predicted, I would like to use it in x_context and x_target when predicting voltage and temperature. I tried just putting it into x_context and x_target, but that didn't work, which I suspect is because it thought the current was another time unit (like hours). Is there a better way to include current in x?

load_from_checkpoint error

model = spacetimeformer_model.Spacetimeformer_Forecaster(d_x=6, d_y=6)
model.load_from_checkpoint(check_point)
data_module, inv_scaler, null_val = create_dset()
trainer = pl.Trainer()

trainer.test(model=model, datamodule=data_module)

There is an error:
RuntimeError: Error(s) in loading state_dict for Spacetimeformer_Forecaster:
Unexpected key(s) in state_dict:

Could you tell me how to solve it? Thank you very much!
