zezhishao / step

Code for our SIGKDD'22 paper Pre-training-Enhanced Spatial-Temporal Graph Neural Network For Multivariate Time Series Forecasting.

License: Apache License 2.0

Python 99.90% Shell 0.10%

graph-neural-networks multivariate-time-series pre-training traffic-forecasting

step's Issues

BUG: Performance issue after cleaning code.

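Question about the adj_mx.pkl dataset file

Hello, I have a question. After the METR-LA data is processed, adj_mx.pkl is copied into the datasets directory. However, I found that deleting this file does not affect the subsequent training, inference, or testing. Is this file actually needed? Is it the graph data? Does the graph learning module that follows not depend on it?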

[Solved][Pinned] ModuleNotFoundError: No module named 'cts'

Hi, this is very nice work. But when I try to run the code, I get an error.

It occurs when running prediction_rescaled = SCALER_REGISTRY.get(self.scaler["func"])(forward_return[0], **self.scaler["args"])

Maybe some code is missing: SCALER_REGISTRY.get(self.scaler["func"]) should return a function, but it was not defined.
Best wishes
Thank you

Question about the implementation details of Equation (2) in discrete_graph_learning.py

Hello, as a GNN beginner I still have some doubts about the part of discrete_graph_learning.py that implements Equation (2):
self.rel_rec = torch.FloatTensor(np.array(encode_one_hot(np.where(np.ones((self.num_nodes, self.num_nodes)))[0]), dtype=np.float32))
self.rel_send = torch.FloatTensor(np.array(encode_one_hot(np.where(np.ones((self.num_nodes, self.num_nodes)))[1]), dtype=np.float32))
My understanding is that this treats each node as both a receiver and a sender, and then represents the relations between nodes as matrices of one-hot vectors. But why can they be represented this way? For example, when self.num_nodes is 3, the two attributes are tensor([[1., 0., 0.], [1., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 1., 0.], [0., 1., 0.], [0., 0., 1.], [0., 0., 1.], [0., 0., 1.]]) and tensor([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]). How do these values represent the relations between nodes?
Further down, the two attributes are multiplied with the node features to obtain the receiver and sender feature representations:
receivers = torch.matmul(self.rel_rec.to(node_feat.device), node_feat)
senders = torch.matmul(self.rel_send.to(node_feat.device), node_feat)
Why does multiplying these two matrices give the receiver and sender feature representations?
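
The mechanism asked about here can be illustrated with a minimal pure-Python sketch. It assumes that encode_one_hot maps index k to row k of the identity matrix, which is consistent with the tensors printed in the question. The helpers one_hot and matmul and the toy node_feat values are illustrative only and are not taken from the repository. Each row of rel_rec and rel_send corresponds to one ordered node pair (i, j), so multiplying a one-hot row by node_feat just copies the feature row of node i (or node j).

```python
def one_hot(index, size):
    """Return a one-hot row vector with a 1.0 at position `index`."""
    return [1.0 if k == index else 0.0 for k in range(size)]


def matmul(a, b):
    """Plain nested-list matrix product, standing in for torch.matmul."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


num_nodes = 3

# All N*N ordered pairs (i, j) in row-major order, the same order in which
# np.where(np.ones((N, N))) returns its row and column indices.
pairs = [(i, j) for i in range(num_nodes) for j in range(num_nodes)]

rel_rec = [one_hot(i, num_nodes) for i, _ in pairs]   # shape (N*N, N)
rel_send = [one_hot(j, num_nodes) for _, j in pairs]  # shape (N*N, N)

# Toy node features (assumed values): one 2-dimensional vector per node.
node_feat = [[10.0, 11.0], [20.0, 21.0], [30.0, 31.0]]

receivers = matmul(rel_rec, node_feat)  # row e holds the features of node i
senders = matmul(rel_send, node_feat)   # row e holds the features of node j

for (i, j), rec, send in zip(pairs, receivers, senders):
    print(f"pair ({i}, {j}): receiver feat {rec}, sender feat {send}")
```

Under this reading, row e of receivers together with row e of senders describes the node pair (i, j), which is the kind of input a pairwise edge scorer can work on.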

DiDi

Hello, how was datasets/PEMS04/data_in4032_out12.pkl generated?

torch.cuda.OutOfMemoryError

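Hello, thank you very much for open-sourcing your work. I have just started trying to reproduce the results, but running TSFormer_PEMS04.py immediately fails with an out-of-memory error. The GPU is a single 16 GB V100, I used the default batch_size=6, and I changed the number of GPUs to 1. Only after reducing batch_size to 4 did it barely run (94% GPU memory usage). Neither the model parameters nor the input data look large to me, so is this memory usage reasonable? Thanks.
(P.S. Could a different torch version cause this? I am on torch 2.0.1 and CUDA 11.7. Setting up an environment here is not as convenient as in the lab or on my own machine, so I first tried the existing environment.)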

Error

RuntimeError: Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [32, 32, 325, 13]

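Can this pre-trained model be used for univariate data?

Hello! I have a question.

My dataset has only two columns (time, quantity). Can this pre-trained model be used with it? Thanks.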

The spectral patch encoder

Hi! Thanks for the wonderful work and the open source code.

In the Patch class, each patch is Fourier transformed and the output is then fed into a linear layer to generate the patch embedding. I wonder why the Fourier transform is applied first, since this is not mentioned in the paper.

Thanks!

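Questions about the datasets used in the Pre-training Stage and the Forecasting Stage

Hello! Thank you for open-sourcing your code and for your contribution to this field.

I have some questions about your work:
Do the Pre-training Stage and the Forecasting Stage train on the same dataset? For example, both use PEMS04, except that the Pre-training Stage uses long sequences from PEMS04 and the Forecasting Stage uses short sequences from PEMS04 (12 time steps).

If so, how does the Pre-training Stage avoid leaking the data that the Forecasting Stage has to predict? The datasets in the paper are all built with a shifted sliding window. For example, time steps 1-12 predict 13-24, time steps 2-13 predict 14-25, and so on. Suppose the long training data is the continuous series of time steps 1-2016 used for pre-training, and the hidden features H generated from it are then fed into GWN. If GWN's short input is time steps 1-12, it will predict 13-24, yet 13-24 has already been seen in the Pre-training Stage. Could this cause label leakage?

Looking forward to your reply!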
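
The window bookkeeping behind this question can be made concrete with a small sketch. It assumes, without confirmation from the source, that the long input and the short input of one sample end at the same time step and that the target starts right after it. The function names make_sample and overlaps are hypothetical, and indices are 1-based as in the question. The sketch only shows which index ranges overlap; it does not model the train/validation/test split.

```python
def make_sample(t_end, long_len, short_len, horizon):
    """Inclusive, 1-based index ranges for the sample whose inputs end at t_end."""
    return {
        "long_input": (t_end - long_len + 1, t_end),
        "short_input": (t_end - short_len + 1, t_end),
        "target": (t_end + 1, t_end + horizon),
    }


def overlaps(a, b):
    """True if two inclusive index ranges share at least one time step."""
    return a[0] <= b[1] and b[0] <= a[1]


LONG_LEN, SHORT_LEN, HORIZON = 2016, 12, 12

first = make_sample(2016, LONG_LEN, SHORT_LEN, HORIZON)
later = make_sample(2028, LONG_LEN, SHORT_LEN, HORIZON)

print(first)
# Within one sample, the long input stops before the target begins.
print(overlaps(first["long_input"], first["target"]))
# The target of an earlier sample does fall inside the input of a later one.
print(overlaps(later["long_input"], first["target"]))
```

Under these assumptions, a single sample never sees its own target, while targets of earlier samples reappear as inputs of later ones, as they do in any sliding-window setup.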

No module named "'step" again

I am sorry, but I have run into a problem that has been reported before. I have tried the suggested solutions, such as setting the working directory and running in cmd, but they failed. I really don't know what to do next.

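Visualizing the loss during training and testing

Hello! If I want to visualize the loss during training and testing (for example, using matplotlib to draw a 2D line chart of loss against epochs), where in the code should I add the visualization function? I could not find where MAE, MAPE, and RMSE are computed. I hope you can answer my question. Thanks!

Tensor dimension mismatch when reproducing

Hello, I am new to time series. When running your STEP code I keep getting this tensor dimension mismatch: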
Traceback (most recent call last):
File "D:\Experiment\Pycode\STEP-github\step\run.py", line 33, in <module>
launch_training(args.cfg, args.gpus)
File "D:\Experiment\Pycode\STEP-github\basicts\launcher.py", line 20, in launch_training
easytorch.launch_training(cfg=cfg, devices=gpus, node_rank=node_rank)
File "D:\APP\Anaconda\envs\TS\lib\site-packages\easytorch\launcher\launcher.py", line 86, in launch_training
train_dist(cfg)
File "D:\APP\Anaconda\envs\TS\lib\site-packages\easytorch\launcher\launcher.py", line 35, in training_func
raise e
File "D:\APP\Anaconda\envs\TS\lib\site-packages\easytorch\launcher\launcher.py", line 31, in training_func
runner.train(cfg)
File "D:\APP\Anaconda\envs\TS\lib\site-packages\easytorch\core\runner.py", line 339, in train
loss = self.train_iters(epoch, iter_index, data)
File "D:\Experiment\Pycode\STEP-github\basicts\runners\base_tsf_runner.py", line 238, in train_iters
forward_return = list(self.forward(data=data, epoch=epoch, iter_num=iter_num, train=True))
File "D:\Experiment\Pycode\STEP-github\step\step_runner\step_runner.py", line 66, in forward
prediction, pred_adj, prior_adj, gsl_coefficient = self.model(history_data=history_data, long_history_data=long_history_data, future_data=None, batch_seen=iter_num, epoch=epoch)
File "D:\APP\Anaconda\envs\TS\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\Experiment\Pycode\STEP-github\step\step_arch\step.py", line 65, in forward
y_hat = self.backend(short_term_history, hidden_states=hidden_states, sampled_adj=sampled_adj).transpose(1, 2)
File "D:\APP\Anaconda\envs\TS\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\Experiment\Pycode\STEP-github\step\step_arch\graphwavenet\model.py", line 187, in forward
gate = self.gate_convs[i](residual)
File "D:\APP\Anaconda\envs\TS\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\APP\Anaconda\envs\TS\lib\site-packages\torch\nn\modules\conv.py", line 307, in forward
return self._conv_forward(input, self.weight, self.bias)
File "D:\APP\Anaconda\envs\TS\lib\site-packages\torch\nn\modules\conv.py", line 303, in _conv_forward
return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [4, 32, 207, 2017]
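I asked GPT and debugged for a long time but could not solve it. Could you suggest a solution? (I only changed file paths and did not touch the model code. Your other two models run fine for me; only this one raises the error.)
If you have time, I would appreciate a reply.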

launch_training() got an unexpected keyword argument 'devices'

Hello, I would like to know whether this bug can be solved.
There is also a bug I encountered when I tried to add some code; I am not sure whether anyone has encountered it before:
rv = reductor(4)
TypeError: cannot pickle 'module' object

Easy Torch Documentation

Hello!
I saw the preprint and I am interested in using this architecture for a problem I am working on. I figure the easiest way to adapt it to my own problem is to change one of the EasyTorch files to fit my data. Is there any documentation or an example file explaining in more detail how it works?
Thanks!

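Questions about pre-training

Hello, I have two questions:
1. How long does your pre-training take? Is it slow?
2. When I run your model on my own dataset and attach the trained model to the downstream task, each epoch takes a long time. Does that mean the downstream task calls the pre-training task every time?

Dataset generalization problem

Hello, have you encountered very large training loss after switching to a different dataset? At the same time, the validation loss does not decrease.

RuntimeError: Numpy is not available

Windows, numpy==1.22.3, torch==1.10, easytorch==1.2.10. I have been stuck on this strange error for two days. I tried upgrading and downgrading torch and numpy. Everything online says it is a version incompatibility, but after changing versions it still does not run. Has the author encountered a similar problem, or do you have any ideas for solving it? Please let me know. Thank you very much!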

File "D:\project\STEP-github\basicts\launcher.py", line 20, in launch_training
easytorch.launch_training(cfg=cfg, devices=gpus, node_rank=node_rank)
TypeError: launch_training() got an unexpected keyword argument 'devices'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\123\AppData\Roaming\Python\Python310\site-packages\easytorch\launcher\launcher.py", line 52, in training_func
runner.train(cfg)
File "C:\Users\123\AppData\Roaming\Python\Python310\site-packages\easytorch\core\runner.py", line 333, in train
self.init_training(cfg)
File "D:\project\STEP-github\basicts\runners\base_tsf_runner.py", line 61, in init_training
super().init_training(cfg)
File "D:\project\STEP-github\basicts\runners\base_runner.py", line 86, in init_training
super().init_training(cfg)
File "C:\Users\123\AppData\Roaming\Python\Python310\site-packages\easytorch\core\runner.py", line 396, in init_training
self.train_data_loader = self.build_train_data_loader(cfg)
File "D:\project\STEP-github\basicts\runners\base_runner.py", line 63, in build_train_data_loader
train_data_loader = super().build_train_data_loader(cfg)
File "C:\Users\123\AppData\Roaming\Python\Python310\site-packages\easytorch\core\runner.py", line 162, in build_train_data_loader
dataset = self.build_train_dataset(cfg)
File "D:\project\STEP-github\basicts\runners\base_tsf_runner.py", line 111, in build_train_dataset
dataset = cfg["DATASET_CLS"](**dataset_args)
File "D:\project\STEP-github\basicts\data\dataset.py", line 21, in __init__
self.data = torch.from_numpy(processed_data).float()
RuntimeError: Numpy is not available

Traceback (most recent call last):
File "D:\project\STEP-github\basicts\launcher.py", line 20, in launch_training
easytorch.launch_training(cfg=cfg, devices=gpus, node_rank=node_rank)
TypeError: launch_training() got an unexpected keyword argument 'devices'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\project\STEP-github\step\run.py", line 26, in <module>
launch_training(args.cfg, args.gpus)
File "D:\project\STEP-github\basicts\launcher.py", line 24, in launch_training
easytorch.launch_training(cfg=cfg, gpus=gpus, node_rank=node_rank)
File "C:\Users\123\AppData\Roaming\Python\Python310\site-packages\easytorch\launcher\launcher.py", line 93, in launch_training
train_dist(cfg)
File "C:\Users\123\AppData\Roaming\Python\Python310\site-packages\easytorch\launcher\launcher.py", line 56, in training_func
raise e
File "C:\Users\123\AppData\Roaming\Python\Python310\site-packages\easytorch\launcher\launcher.py", line 52, in training_func
runner.train(cfg)
File "C:\Users\123\AppData\Roaming\Python\Python310\site-packages\easytorch\core\runner.py", line 333, in train
self.init_training(cfg)
File "D:\project\STEP-github\basicts\runners\base_tsf_runner.py", line 61, in init_training
super().init_training(cfg)
File "D:\project\STEP-github\basicts\runners\base_runner.py", line 86, in init_training
super().init_training(cfg)
File "C:\Users\123\AppData\Roaming\Python\Python310\site-packages\easytorch\core\runner.py", line 396, in init_training
self.train_data_loader = self.build_train_data_loader(cfg)
File "D:\project\STEP-github\basicts\runners\base_runner.py", line 63, in build_train_data_loader
train_data_loader = super().build_train_data_loader(cfg)
File "C:\Users\123\AppData\Roaming\Python\Python310\site-packages\easytorch\core\runner.py", line 162, in build_train_data_loader
dataset = self.build_train_dataset(cfg)
File "D:\project\STEP-github\basicts\runners\base_tsf_runner.py", line 111, in build_train_dataset
dataset = cfg["DATASET_CLS"](**dataset_args)
File "D:\project\STEP-github\basicts\data\dataset.py", line 21, in __init__
self.data = torch.from_numpy(processed_data).float()
RuntimeError: Numpy is not available
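
A note on reading the chained tracebacks above: launcher.py line 20 calls easytorch.launch_training with devices=, and line 24 retries with gpus=, which suggests a keyword fallback for different EasyTorch versions. If so, the first TypeError was already handled, and the exception that actually stops the run is the last one (RuntimeError: Numpy is not available). The sketch below imitates that pattern with stand-in functions. launch_training_old and launch are hypothetical names, not the real EasyTorch or BasicTS API.

```python
def launch_training_old(cfg, gpus=None, node_rank=0):
    """Stand-in for an older launcher that only accepts the `gpus` keyword."""
    return f"trained {cfg} on gpus={gpus}"


def launch(cfg, gpus, launcher):
    """Try the newer `devices` keyword first, then fall back to `gpus`."""
    try:
        return launcher(cfg=cfg, devices=gpus, node_rank=0)
    except TypeError as err:
        # Only swallow the "unexpected keyword argument 'devices'" failure;
        # any other TypeError is re-raised unchanged.
        if "devices" not in str(err):
            raise
        return launcher(cfg=cfg, gpus=gpus, node_rank=0)


print(launch("step/STEP_METR-LA.py", "0", launch_training_old))
```

When the retry itself fails, Python prints both exceptions joined by "During handling of the above exception, another exception occurred", which matches the shape of the log in this issue.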

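Hello, can this be used for single-step forecasting, like the exchange-rate setting in MTGNN?

As in the title.

About easytorch

I get "module 'easytorch' has no attribute 'launch_training'". If I switch to easy-torch, I get "ModuleNotFoundError: No module named ' basicts.data.'" instead. How can this be solved?

Hello, thank you very much for your work, but I have run into a problem

As shown in the figure, I am training TSFormer from scratch on 8 GPUs, but the log shows that after the model is initialized no training follows. The log reports no error, and the GPUs show memory in use and appear to be running. What is going on? I would appreciate your help. Thanks!

How do I pre-train the model with TSFormer? Is there a readme describing the steps?

Hello!
How do I pre-train the model with TSFormer? Is there a readme describing the steps?
For example, how is the following file obtained? Thanks!
TSFormer_METR-LA.pt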

Training time

Hi, could you please give me an idea of how long the transformer training takes? Thanks

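How well does TSFormer extract features from long time series?

TSFormer extracts long-term temporal features, but the downstream task is traffic spatial-temporal forecasting rather than a pure time series task. Did you compare against other time series models in your experiments? Or, when the downstream task is a time series task, does it improve results effectively?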

Question Regarding Self-loop Removal in Graph Structure Learning

Hi! While reviewing your code, I noticed that you remove self-loops in the graph structure learning. I am curious about the rationale behind this decision, as I would like to deepen my understanding of the topic. Could you please help me understand why removing self-loops is important or beneficial in Graph Structure Learning?
Thank you for your time and consideration.
Best regards.

GPU memory issue

Dear author,

Thanks for your open-source code!

I am currently trying to run

python step/run.py --cfg='step/STEP_METR-LA.py' --gpus='0,1'

on two V100 GPUs (with 32GB memory). However, the program fails with an out-of-memory error. I don't think this is the expected behavior. Could you please tell me what the normal GPU memory usage is? Thanks.

The "CFG.RE_SCALE=False" setting

Hello, I have another question. Thanks!
We added the setting "CFG.RE_SCALE=False" to the config file step/TSFormer_METR-LA.py (basicts has this setting), but the computed metrics are still very large, as if the setting had no effect.
How should we configure it so that the metrics (MAE, MSE, RMSE, etc.) come out smaller? Thank you!

ModuleNotFoundError: No module named "'step"

Hello, I have some problems running the training. After I installed the requirements and processed the data as required, the following error occurred when training the model:

input:
!python step/run.py --cfg='step/STEP_METR-LA.py' --gpus='0'

output:
2022-12-05 11:06:42,782 - easytorch-launcher - INFO - Launching EasyTorch training.
Traceback (most recent call last):
File "D:\python-venv\STEP\step\run.py", line 26, in <module>
launch_training(args.cfg, args.gpus)
File "D:\python-venv\STEP\basicts\launcher.py", line 19, in launch_training
easytorch.launch_training(cfg=cfg, gpus=gpus, node_rank=node_rank)
File "D:\python-venv\venv\lib\site-packages\easytorch\launcher\launcher.py", line 80, in launch_training
cfg = init_cfg(cfg, node_rank == 0)
File "D:\python-venv\venv\lib\site-packages\easytorch\launcher\launcher.py", line 13, in init_cfg
cfg = import_config(cfg, verbose=save)
File "D:\python-venv\venv\lib\site-packages\easytorch\config.py", line 245, in import_config
cfg = __import__(path, fromlist=[cfg_name]).CFG
ModuleNotFoundError: No module named "'step"
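
The module name in this error, "'step", begins with a literal single quote. One possible explanation, offered as an assumption: the paths in the traceback are Windows paths, and cmd.exe does not treat single quotes as quoting characters, so --cfg='step/STEP_METR-LA.py' may reach the program with the quotes still attached. The sketch below uses a hypothetical helper, cfg_path_to_module, to show how a config path that keeps its quotes turns into such a module name. It is not EasyTorch's actual import_config code.

```python
import os


def cfg_path_to_module(cfg_path):
    """Hypothetical helper: turn 'pkg/config.py' into the dotted name 'pkg.config'."""
    without_ext, _ = os.path.splitext(cfg_path)
    return without_ext.replace("/", ".").replace("\\", ".")


literal = "'step/STEP_METR-LA.py'"  # quotes kept by the shell (assumed)
clean = "step/STEP_METR-LA.py"      # quotes stripped, or never typed

for path in (literal, clean):
    module = cfg_path_to_module(path)
    top_level = module.split(".")[0]
    print(f"{path!r} -> module {module!r}, top-level package {top_level!r}")
```

If this is the cause, passing the path without single quotes (or with double quotes) would be worth trying.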

Error for running the code

I've got several errors while running the code.
a).
For the forecasting part,

I ran the following command as written in the Readme.md, "python step/run.py --cfg='step/step_METR-LA.py' --gpus='0'", and got:
2022-09-29 15:15:25,915 - easytorch-launcher - INFO - Launching EasyTorch training.
Traceback (most recent call last):
File "step/run.py", line 20, in <module>
launch_training(args.cfg, args.gpus)
File "/home/ght/tsformer/STEP/basicts/launcher.py", line 19, in launch_training
easytorch.launch_training(cfg=cfg, gpus=gpus, node_rank=node_rank)
File "/home/ght/anaconda3/envs/torch/lib/python3.8/site-packages/easytorch/launcher/launcher.py", line 58, in launch_training
cfg = init_cfg(cfg, node_rank == 0)
File "/home/ght/anaconda3/envs/torch/lib/python3.8/site-packages/easytorch/config/utils.py", line 210, in init_cfg
cfg = import_config(cfg, verbose=save)
File "/home/ght/anaconda3/envs/torch/lib/python3.8/site-packages/easytorch/config/utils.py", line 173, in import_config
cfg = __import__(path, fromlist=[cfg_name]).CFG
ModuleNotFoundError: No module named 'step.step_METR-LA'
There seem to be scripts for only one dataset (METR-LA), and none for others such as PEMS04.

b).
For the pretraining part,
I ran "python step/run.py --cfg='step/TSFormer_METR-LA.py' --gpus='0,1,2,3'",
and got:

2022-09-29 15:05:58,680 - easytorch-training - ERROR - Traceback (most recent call last):
File "/home/ght/anaconda3/envs/torch/lib/python3.8/site-packages/easytorch/launcher/launcher.py", line 30, in training_func
runner.train(cfg)
File "/home/ght/anaconda3/envs/torch/lib/python3.8/site-packages/easytorch/core/runner.py", line 361, in train
self.on_epoch_end(epoch)
File "/home/ght/tsformer/STEP/basicts/runners/base_runner.py", line 141, in on_epoch_end
if self.test_data_loader is not None and epoch % self.test_interval == 0:
AttributeError: 'TSFormerRunner' object has no attribute 'test_data_loader'

I checked the scripts, and it seems that test_data_loader is not defined.
How can I solve the above issues? Looking forward to your reply.
: )

What does "section 4" mean in the readme.md?

Hi!

In the Forecasting Stage section of readme.md, you say "Then trian the downstream STGNN (Graph WaveNet) like in section 4". I tried to find that in your paper, Pre-training Enhanced Spatial-temporal Graph Neural Network for Multivariate Time Series Forecasting, but section 4 of the paper does not seem to describe exactly how to carry out this step. Could you please give some instructions on how to proceed? Thanks!

Yours,
Can

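Question about the pre-training data

Hello! Thank you for open-sourcing your code and for your contribution to this field.

I have a question about your work:
How is the data for the Pre-training Stage generated? I could not find this part in your code and would appreciate your guidance.

Looking forward to your reply!

Reproducing the pre-training results

Hello, I would like to ask about reproducing the pre-training. These are my results:

[Reproduction 1] Using the checkpoints, code, and parameters you provide, I can reproduce the downstream task results.
[Reproduction 2] Using the code and parameters you provide, I have difficulty reproducing the pre-training results, and also the downstream task results with the newly pre-trained checkpoints.

Since the checkpoints you provide work for me, the pre-training reproduction problem should not come from machine or dependency differences.
So I would like to ask whether you made any small adjustments to the pre-training parameters or code, or introduced other tricks to help pre-training.

Thank you very much!

Are both the CPU and the GPU used during training?

When using the provided code I noticed that both the CPU and the GPU are busy. Are the CPU and GPU used together for training, in order to make the most of the resources?

How do I use this library? Are there any usage examples?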

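How to configure pre-training on our own dataset

Hello!

We would like to pre-train the model on our own dataset.
We looked at TSFormer_METR-LA.py and do not understand some of the settings, so we would like to ask about them.

CFG.DATASET_INPUT_LEN = 288 * 7 What do 288 and 7 each represent? Thanks!

CFG.MODEL.PARAM = {
"patch_size":12, ############ Does this 12 mean the number of output time steps?
"in_channel":1,
"embed_dim":96,
"num_heads":4,
"mlp_ratio":4,
"dropout":0.1,
"num_token":288 * 7 / 12, ############ Why is this divided by 12?
"mask_ratio":0.75,
"encoder_depth":4,
"decoder_depth":1,
"mode":"pre-train"
}

From TSFormer_METR-LA.py I cannot tell the file name or extension of the raw input data. Which folder should the input data go in, and are there any requirements on the file name? Should it be named like METR-LA.h5, or like scaler_in2016_out12.pkl?

Thanks!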
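
The arithmetic in this config can be worked through in a few lines. The interpretation is an assumption not confirmed in this thread: METR-LA is commonly distributed at one reading every 5 minutes, which would make 288 the number of steps per day and 7 the number of days. Under that reading, "num_token" is the number of non-overlapping patches of length "patch_size" that fit into the input window.

```python
STEPS_PER_DAY = 24 * 60 // 5  # assumes one reading every 5 minutes
DAYS = 7                      # assumes a one-week input window
PATCH_SIZE = 12               # "patch_size" from the config in the question

input_len = STEPS_PER_DAY * DAYS       # CFG.DATASET_INPUT_LEN = 288 * 7
num_token = input_len // PATCH_SIZE    # patches per input window

print(STEPS_PER_DAY, input_len, num_token)
print(PATCH_SIZE * 5, "minutes covered by one patch")

# In Python 3, the expression written in the config is a float, not an int.
print(type(288 * 7 / 12).__name__, 288 * 7 / 12)
```

Note that 288 * 7 / 12 evaluates to the float 168.0 in Python 3, so code that needs an integer token count would have to convert it or use integer division.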

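About the datasets

Hello,

Thank you very much for your work. For the three datasets METR-LA, PEMS-BAY, and PEMS04, the connections between the speed sensors do not change, right? That is, the adjacency matrix does not change over time?

Best regards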

Question about optimal mask ratio

Hi, thank you for your awesome work. I am trying to reproduce the results of your Hyper-parameter Study experiment, but when I changed the mask ratio, I got results that differ from the paper on all three datasets.
To reproduce the problem, I cloned STEP locally, used the dataset you provided, changed only the mask ratio to 0.25 and the GPU count to 4, and then re-trained the TSFormer and STEP models. Here are the results.
Result for PEMS04 dataset,

"2023-04-06 11:44:21,101 - easytorch-training - INFO - Epoch 89 / 100
2023-04-06 11:48:05,343 - easytorch-training - INFO - Result : [train_time: 224.24 (s), lr: 6.25e-05, train_MAE: 17.7495, train_RMSE: 28.6461, train_MAPE: 0.1334]
2023-04-06 11:48:05,345 - easytorch-training - INFO - Start validation.
2023-04-06 11:50:55,722 - easytorch-training - INFO - Result : [val_time: 170.38 (s), val_MAE: 18.1383, val_RMSE: 28.1314, val_MAPE: 0.1222]
2023-04-06 11:50:57,130 - easytorch-training - INFO - Checkpoint checkpoints/STEP_100/e87a2127b73d6ba8cda30085f7beb2b3/STEP_best_val_MAE.pt saved
2023-04-06 11:53:47,204 - easytorch-training - INFO - Evaluate best model on test data for horizon 1, Test MAE: 16.3986, Test RMSE: 26.5760, Test MAPE: 0.1097
2023-04-06 11:53:47,209 - easytorch-training - INFO - Evaluate best model on test data for horizon 2, Test MAE: 16.8978, Test RMSE: 27.5110, Test MAPE: 0.1138
2023-04-06 11:53:47,213 - easytorch-training - INFO - Evaluate best model on test data for horizon 3, Test MAE: 17.3182, Test RMSE: 28.1838, Test MAPE: 0.1168
2023-04-06 11:53:47,216 - easytorch-training - INFO - Evaluate best model on test data for horizon 4, Test MAE: 17.6034, Test RMSE: 28.6846, Test MAPE: 0.1193
2023-04-06 11:53:47,219 - easytorch-training - INFO - Evaluate best model on test data for horizon 5, Test MAE: 17.8579, Test RMSE: 29.0966, Test MAPE: 0.1208
2023-04-06 11:53:47,222 - easytorch-training - INFO - Evaluate best model on test data for horizon 6, Test MAE: 18.0855, Test RMSE: 29.4533, Test MAPE: 0.1219
2023-04-06 11:53:47,226 - easytorch-training - INFO - Evaluate best model on test data for horizon 7, Test MAE: 18.2933, Test RMSE: 29.7824, Test MAPE: 0.1234
2023-04-06 11:53:47,229 - easytorch-training - INFO - Evaluate best model on test data for horizon 8, Test MAE: 18.4903, Test RMSE: 30.0777, Test MAPE: 0.1249
2023-04-06 11:53:47,232 - easytorch-training - INFO - Evaluate best model on test data for horizon 9, Test MAE: 18.6646, Test RMSE: 30.3532, Test MAPE: 0.1263
2023-04-06 11:53:47,235 - easytorch-training - INFO - Evaluate best model on test data for horizon 10, Test MAE: 18.8327, Test RMSE: 30.6055, Test MAPE: 0.1280
2023-04-06 11:53:47,238 - easytorch-training - INFO - Evaluate best model on test data for horizon 11, Test MAE: 19.0170, Test RMSE: 30.8598, Test MAPE: 0.1294
2023-04-06 11:53:47,241 - easytorch-training - INFO - Evaluate best model on test data for horizon 12, Test MAE: 19.2633, Test RMSE: 31.1607, Test MAPE: 0.1311
2023-04-06 11:53:47,286 - easytorch-training - INFO - Result : [test_time: 170.16 (s), test_MAE: 18.0602, test_RMSE: 29.3930, test_MAPE: 0.1221]"

Result for METR-LA dataset,

2023-04-10 06:42:39,345 - easytorch-training - INFO - Epoch 96 / 100
2023-04-10 06:45:14,351 - easytorch-training - INFO - Result : [train_time: 155.01 (s), lr: 1.56e-04, train_MAE: 2.7075, train_RMSE: 5.3461, train_MAPE: 0.0707]
2023-04-10 06:45:14,354 - easytorch-training - INFO - Start validation.
2023-04-10 06:45:56,978 - easytorch-training - INFO - Result : [val_time: 42.62 (s), val_MAE: 2.6742, val_RMSE: 5.1144, val_MAPE: 0.0727]
2023-04-10 06:45:59,152 - easytorch-training - INFO - Checkpoint checkpoints/STEP_100/aa28a6aab40136a6af5885691c378f94/STEP_best_val_MAE.pt saved
2023-04-10 06:47:22,991 - easytorch-training - INFO - Evaluate best model on test data for horizon 1, Test MAE: 2.1564, Test RMSE: 3.7474, Test MAPE: 0.0513
2023-04-10 06:47:22,994 - easytorch-training - INFO - Evaluate best model on test data for horizon 2, Test MAE: 2.4244, Test RMSE: 4.4858, Test MAPE: 0.0598
2023-04-10 06:47:22,998 - easytorch-training - INFO - Evaluate best model on test data for horizon 3, Test MAE: 2.6010, Test RMSE: 4.9821, Test MAPE: 0.0661
2023-04-10 06:47:23,002 - easytorch-training - INFO - Evaluate best model on test data for horizon 4, Test MAE: 2.7384, Test RMSE: 5.3681, Test MAPE: 0.0713
2023-04-10 06:47:23,005 - easytorch-training - INFO - Evaluate best model on test data for horizon 5, Test MAE: 2.8518, Test RMSE: 5.6969, Test MAPE: 0.0758
2023-04-10 06:47:23,009 - easytorch-training - INFO - Evaluate best model on test data for horizon 6, Test MAE: 2.9472, Test RMSE: 5.9612, Test MAPE: 0.0793
2023-04-10 06:47:23,012 - easytorch-training - INFO - Evaluate best model on test data for horizon 7, Test MAE: 3.0293, Test RMSE: 6.1882, Test MAPE: 0.0824
2023-04-10 06:47:23,015 - easytorch-training - INFO - Evaluate best model on test data for horizon 8, Test MAE: 3.1039, Test RMSE: 6.3804, Test MAPE: 0.0852
2023-04-10 06:47:23,019 - easytorch-training - INFO - Evaluate best model on test data for horizon 9, Test MAE: 3.1690, Test RMSE: 6.5411, Test MAPE: 0.0877
2023-04-10 06:47:23,023 - easytorch-training - INFO - Evaluate best model on test data for horizon 10, Test MAE: 3.2300, Test RMSE: 6.6843, Test MAPE: 0.0900
2023-04-10 06:47:23,026 - easytorch-training - INFO - Evaluate best model on test data for horizon 11, Test MAE: 3.2900, Test RMSE: 6.8177, Test MAPE: 0.0923
2023-04-10 06:47:23,030 - easytorch-training - INFO - Evaluate best model on test data for horizon 12, Test MAE: 3.3564, Test RMSE: 6.9536, Test MAPE: 0.0946
2023-04-10 06:47:23,075 - easytorch-training - INFO - Result : [test_time: 83.92 (s), test_MAE: 2.9081, test_RMSE: 5.8956, test_MAPE: 0.0780]

Result for PEMS-BAY dataset,

2023-04-12 21:09:24,701 - easytorch-training - INFO - Epoch 99 / 100
2023-04-12 21:17:14,073 - easytorch-training - INFO - Result : [train_time: 469.37 (s), lr: 3.13e-05, train_MAE: 1.4102, train_RMSE: 3.1304, train_MAPE: 0.0307]
2023-04-12 21:17:14,076 - easytorch-training - INFO - Start validation.
2023-04-12 21:19:04,273 - easytorch-training - INFO - Result : [val_time: 110.20 (s), val_MAE: 1.4673, val_RMSE: 3.0659, val_MAPE: 0.0333]
2023-04-12 21:19:07,612 - easytorch-training - INFO - Checkpoint checkpoints/STEP_100/88e31eeaa70eda996a9b04849238f61f/STEP_best_val_MAE.pt saved
2023-04-12 21:22:46,004 - easytorch-training - INFO - Evaluate best model on test data for horizon 1, Test MAE: 0.8249, Test RMSE: 1.5080, Test MAPE: 0.0159
2023-04-12 21:22:46,013 - easytorch-training - INFO - Evaluate best model on test data for horizon 2, Test MAE: 1.0714, Test RMSE: 2.1557, Test MAPE: 0.0215
2023-04-12 21:22:46,023 - easytorch-training - INFO - Evaluate best model on test data for horizon 3, Test MAE: 1.2304, Test RMSE: 2.6389, Test MAPE: 0.0256
2023-04-12 21:22:46,031 - easytorch-training - INFO - Evaluate best model on test data for horizon 4, Test MAE: 1.3444, Test RMSE: 2.9996, Test MAPE: 0.0288
2023-04-12 21:22:46,038 - easytorch-training - INFO - Evaluate best model on test data for horizon 5, Test MAE: 1.4280, Test RMSE: 3.2660, Test MAPE: 0.0313
2023-04-12 21:22:46,046 - easytorch-training - INFO - Evaluate best model on test data for horizon 6, Test MAE: 1.4926, Test RMSE: 3.4677, Test MAPE: 0.0333
2023-04-12 21:22:46,054 - easytorch-training - INFO - Evaluate best model on test data for horizon 7, Test MAE: 1.5432, Test RMSE: 3.6218, Test MAPE: 0.0349
2023-04-12 21:22:46,062 - easytorch-training - INFO - Evaluate best model on test data for horizon 8, Test MAE: 1.5850, Test RMSE: 3.7426, Test MAPE: 0.0363
2023-04-12 21:22:46,070 - easytorch-training - INFO - Evaluate best model on test data for horizon 9, Test MAE: 1.6201, Test RMSE: 3.8362, Test MAPE: 0.0374
2023-04-12 21:22:46,079 - easytorch-training - INFO - Evaluate best model on test data for horizon 10, Test MAE: 1.6513, Test RMSE: 3.9144, Test MAPE: 0.0385
2023-04-12 21:22:46,088 - easytorch-training - INFO - Evaluate best model on test data for horizon 11, Test MAE: 1.6804, Test RMSE: 3.9816, Test MAPE: 0.0393
2023-04-12 21:22:46,096 - easytorch-training - INFO - Evaluate best model on test data for horizon 12, Test MAE: 1.7121, Test RMSE: 4.0462, Test MAPE: 0.0402
2023-04-12 21:22:46,249 - easytorch-training - INFO - Result : [test_time: 218.64 (s), test_MAE: 1.4320, test_RMSE: 3.3536, test_MAPE: 0.0319]
2023-04-12 21:22:47,739 - easytorch-training - INFO - Checkpoint checkpoints/STEP_100/88e31eeaa70eda996a9b04849238f61f/STEP_099.pt saved

Here is a comparison of the two results, STEP's results are from https://arxiv.org/pdf/2206.09113.pdf, and https://github.com/zezhishao/BasicTS.

I would like to ask you to help me figure it out.
Looking forward to your reply.
Best regards
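
For comparing logs like these against published tables, the per-horizon lines can be extracted with a short script. This is a sketch that assumes the log format shown in this issue; the regular expression and the function name parse_horizons are illustrative. The sample lines are copied from the METR-LA log above, and horizons 3, 6 and 12 are picked only because traffic forecasting papers commonly report them.

```python
import re

HORIZON_RE = re.compile(
    r"horizon (\d+), Test MAE: ([\d.]+), Test RMSE: ([\d.]+), Test MAPE: ([\d.]+)"
)


def parse_horizons(log_text):
    """Map horizon -> (MAE, RMSE, MAPE) for every matching log line."""
    results = {}
    for match in HORIZON_RE.finditer(log_text):
        horizon = int(match.group(1))
        results[horizon] = tuple(float(g) for g in match.groups()[1:])
    return results


sample_log = """
2023-04-10 06:47:22,998 - easytorch-training - INFO - Evaluate best model on test data for horizon 3, Test MAE: 2.6010, Test RMSE: 4.9821, Test MAPE: 0.0661
2023-04-10 06:47:23,009 - easytorch-training - INFO - Evaluate best model on test data for horizon 6, Test MAE: 2.9472, Test RMSE: 5.9612, Test MAPE: 0.0793
2023-04-10 06:47:23,030 - easytorch-training - INFO - Evaluate best model on test data for horizon 12, Test MAE: 3.3564, Test RMSE: 6.9536, Test MAPE: 0.0946
"""

metrics = parse_horizons(sample_log)
for horizon in (3, 6, 12):
    mae, rmse, mape = metrics[horizon]
    print(f"horizon {horizon:>2}: MAE {mae:.2f}  RMSE {rmse:.2f}  MAPE {mape:.2%}")
```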

Question about re-scaling data

Hello,

Thank you very much for your work. In base_tsf_runner.py I found that both val_iters and train_iters contain a re-scale data operation, namely
prediction_rescaled = SCALER_REGISTRY.get(self.scaler["func"])(forward_return[0], **self.scaler["args"])
real_value_rescaled = SCALER_REGISTRY.get(self.scaler["func"])(forward_return[1], **self.scaler["args"])
What do these two lines do, and why can't the loss and metrics be computed directly from the pred returned by forward?

Best regards!
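
As a rough illustration of what a registry lookup followed by a re-scale call can do, here is a minimal sketch. It assumes the data were z-score normalized and that the registered function undoes that normalization. The dict-based SCALER_REGISTRY, the name re_standard_transform, and the mean and std values are stand-ins chosen for this example and are not taken from the repository.

```python
# Hypothetical registry: maps a function name to an inverse-scaling function.
SCALER_REGISTRY = {}


def register(name):
    """Decorator that stores a function in SCALER_REGISTRY under `name`."""
    def wrap(func):
        SCALER_REGISTRY[name] = func
        return func
    return wrap


@register("re_standard_transform")
def re_standard_transform(values, mean, std):
    """Undo z-score normalization: x = z * std + mean."""
    return [v * std + mean for v in values]


# Assumed scaler description, shaped like self.scaler in the quoted lines.
scaler = {"func": "re_standard_transform", "args": {"mean": 50.0, "std": 10.0}}

prediction_normalized = [-1.0, 0.0, 0.5]  # model output in normalized units
target_normalized = [-0.8, 0.0, 0.5]

rescale = SCALER_REGISTRY.get(scaler["func"])
prediction = rescale(prediction_normalized, **scaler["args"])
target = rescale(target_normalized, **scaler["args"])


def mae(pred, real):
    """Mean absolute error between two equally long lists."""
    return sum(abs(p - r) for p, r in zip(pred, real)) / len(pred)


print("MAE in normalized units:", mae(prediction_normalized, target_normalized))
print("MAE in original units:  ", mae(prediction, target))
```

In this toy setup the MAE in original units is std times the MAE in normalized units, so metrics computed before and after re-scaling are not directly comparable with each other.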

Question about time-dependent structure learning

Dear authors,

Thanks very much for your open-source code. I find it very helpful.

I have a minor question as I go through your code. In the paper, you say that you learn structures jointly with $\mathbf{G}^i$, which is global, and $\mathbf{H}^i$, which is time dependent (Eqn. 2). However, when I go through your code, you comment out the $\mathbf{H}$ part (link) as you say that it may lead to instability.

I am curious about this. As training STEP is time-consuming, could you please say something about the consequences of including time-variant features when learning the graph? Thanks for your help.

How to train the model on my own dataset

How can I build my own dataset and train a model on it with STEP?

About STEP-DCRNN

I noticed that STEP-DCRNN experiments appear in the ablation section. How did you incorporate STEP into DCRNN? A code example would be helpful.

easytorch-training - ERROR RuntimeError: Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [8, 32, 207, 13]

Thanks for your earlier reply. I have now rebuilt my environment, but I still ran into some problems. Have you seen this before?

`2022-12-07 13:27:34,613 - easytorch-training - INFO - Epoch 1 / 100
0%| | 0/2997 [00:08<?, ?it/s]
2022-12-07 13:27:42,909 - easytorch-training - ERROR - Traceback (most recent call last):
File "D:\python-venv\STEP\venv2\lib\site-packages\easytorch\launcher\launcher.py", line 52, in training_func
runner.train(cfg)
File "D:\python-venv\STEP\venv2\lib\site-packages\easytorch\core\runner.py", line 351, in train
loss = self.train_iters(epoch, iter_index, data)
File "D:\python-venv\STEP\basicts\runners\base_tsf_runner.py", line 237, in train_iters
forward_return = list(self.forward(data=data, epoch=epoch, iter_num=iter_num, train=True))
File "D:\python-venv\STEP\step\step_runner\step_runner.py", line 66, in forward
prediction, pred_adj, prior_adj, gsl_coefficient = self.model(history_data=history_data, long_history_data=long_history_data, future_data=None, batch_seen=iter_num, epoch=epoch)
File "D:\python-venv\STEP\venv2\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "D:\python-venv\STEP\step\step_arch\step.py", line 65, in forward
y_hat = self.backend(short_term_history, hidden_states=hidden_states, sampled_adj=sampled_adj).transpose(1, 2)
File "D:\python-venv\STEP\venv2\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "D:\python-venv\STEP\step\step_arch\graphwavenet\model.py", line 197, in forward
gate = self.gate_convs[i](residual)
File "D:\python-venv\STEP\venv2\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "D:\python-venv\STEP\venv2\lib\site-packages\torch\nn\modules\conv.py", line 313, in forward
return self._conv_forward(input, self.weight, self.bias)
File "D:\python-venv\STEP\venv2\lib\site-packages\torch\nn\modules\conv.py", line 309, in _conv_forward
return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [8, 32, 207, 13]

Traceback (most recent call last):
File "D:\python-venv\STEP\step\run.py", line 27, in <module>
launch_training(args.cfg, args.gpus)
File "D:\python-venv\STEP\basicts\launcher.py", line 19, in launch_training
easytorch.launch_training(cfg=cfg, gpus=gpus, node_rank=node_rank)
File "D:\python-venv\STEP\venv2\lib\site-packages\easytorch\launcher\launcher.py", line 93, in launch_training
train_dist(cfg)
File "D:\python-venv\STEP\venv2\lib\site-packages\easytorch\launcher\launcher.py", line 56, in training_func
raise e
File "D:\python-venv\STEP\venv2\lib\site-packages\easytorch\launcher\launcher.py", line 52, in training_func
runner.train(cfg)
File "D:\python-venv\STEP\venv2\lib\site-packages\easytorch\core\runner.py", line 351, in train
loss = self.train_iters(epoch, iter_index, data)
File "D:\python-venv\STEP\basicts\runners\base_tsf_runner.py", line 237, in train_iters
forward_return = list(self.forward(data=data, epoch=epoch, iter_num=iter_num, train=True))
File "D:\python-venv\STEP\step\step_runner\step_runner.py", line 66, in forward
prediction, pred_adj, prior_adj, gsl_coefficient = self.model(history_data=history_data, long_history_data=long_history_data, future_data=None, batch_seen=iter_num, epoch=epoch)
File "D:\python-venv\STEP\venv2\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "D:\python-venv\STEP\step\step_arch\step.py", line 65, in forward
y_hat = self.backend(short_term_history, hidden_states=hidden_states, sampled_adj=sampled_adj).transpose(1, 2)
File "D:\python-venv\STEP\venv2\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "D:\python-venv\STEP\step\step_arch\graphwavenet\model.py", line 197, in forward
gate = self.gate_convs[i](residual)
File "D:\python-venv\STEP\venv2\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "D:\python-venv\STEP\venv2\lib\site-packages\torch\nn\modules\conv.py", line 313, in forward
return self._conv_forward(input, self.weight, self.bias)
File "D:\python-venv\STEP\venv2\lib\site-packages\torch\nn\modules\conv.py", line 309, in _conv_forward
return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [8, 32, 207, 13]
(venv2) PS D:\python-venv\STEP>
`

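Borrowing your pre-trained model

Hello, I would also like to use the pre-trained model in a generative adversarial network to generate a time series segment. Do you have any good suggestions? Looking at your code, I understand the pre-training part, but I am a bit stuck on the forecasting part that follows. I have not worked with GNNs before and could not follow that code, so I would like to hear your advice.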

RuntimeError: Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [8, 32, 207, 13]

Hello,

I have an 8 GB 2080 Super GPU, so I followed your instructions and changed the batch size and GPU number in the STEP_METR-LA.py file as follows:

import os
import sys


# TODO: remove it when basicts can be installed by pip
sys.path.append(os.path.abspath(__file__ + "/../../.."))
import torch
from easydict import EasyDict
from basicts.utils.serialization import load_adj

from .step_arch import STEP
from .step_runner import STEPRunner
from .step_loss import step_loss
from .step_data import ForecastingDataset


CFG = EasyDict()

# ================= general ================= #
CFG.DESCRIPTION = "STEP(METR-LA) configuration"
CFG.RUNNER = STEPRunner
CFG.DATASET_CLS = ForecastingDataset
CFG.DATASET_NAME = "METR-LA"
CFG.DATASET_TYPE = "Traffic speed"
CFG.DATASET_INPUT_LEN = 12
CFG.DATASET_OUTPUT_LEN = 12
CFG.DATASET_ARGS = {
    "seq_len": 288 * 7
    }
CFG.GPU_NUM = 1

# ================= environment ================= #
CFG.ENV = EasyDict()
CFG.ENV.SEED = 0
CFG.ENV.CUDNN = EasyDict()
CFG.ENV.CUDNN.ENABLED = True

# ================= model ================= #
CFG.MODEL = EasyDict()
CFG.MODEL.NAME = "STEP"
CFG.MODEL.ARCH = STEP
adj_mx, _ = load_adj("datasets/" + CFG.DATASET_NAME + "/adj_mx.pkl", "doubletransition")
CFG.MODEL.PARAM = {
    "dataset_name": CFG.DATASET_NAME,
    "pre_trained_tsformer_path": "tsformer_ckpt/TSFormer_METR-LA.pt",
    "tsformer_args": {
                    "patch_size":12,
                    "in_channel":1,
                    "embed_dim":96,
                    "num_heads":4,
                    "mlp_ratio":4,
                    "dropout":0.1,
                    "num_token":288 * 7 / 12,
                    "mask_ratio":0.75,
                    "encoder_depth":4,
                    "decoder_depth":1,
                    "mode":"forecasting"
    },
    "backend_args": {
                    "num_nodes" : 207,
                    "supports"  :[torch.tensor(i) for i in adj_mx],         # the supports are not used
                    "dropout"   : 0.3,
                    "gcn_bool"  : True,
                    "addaptadj" : True,
                    "aptinit"   : None,
                    "in_dim"    : 2,
                    "out_dim"   : 12,
                    "residual_channels" : 32,
                    "dilation_channels" : 32,
                    "skip_channels"     : 256,
                    "end_channels"      : 512,
                    "kernel_size"       : 2,
                    "blocks"            : 4,
                    "layers"            : 2
    },
    "dgl_args": {
                "dataset_name": CFG.DATASET_NAME,
                "k": 10,
                "input_seq_len": CFG.DATASET_INPUT_LEN,
                "output_seq_len": CFG.DATASET_OUTPUT_LEN
    }
}
CFG.MODEL.FROWARD_FEATURES = [0, 1, 2]
CFG.MODEL.TARGET_FEATURES = [0]
CFG.MODEL.DDP_FIND_UNUSED_PARAMETERS = True

# ================= optim ================= #
CFG.TRAIN = EasyDict()
CFG.TRAIN.LOSS = step_loss
CFG.TRAIN.OPTIM = EasyDict()
CFG.TRAIN.OPTIM.TYPE = "Adam"
CFG.TRAIN.OPTIM.PARAM= {
    "lr":0.005,
    "weight_decay":1.0e-5,
    "eps":1.0e-8,
}
CFG.TRAIN.LR_SCHEDULER = EasyDict()
CFG.TRAIN.LR_SCHEDULER.TYPE = "MultiStepLR"
CFG.TRAIN.LR_SCHEDULER.PARAM= {
    "milestones":[1, 18, 36, 54, 72],
    "gamma":0.5
}

# ================= train ================= #
CFG.TRAIN.CLIP_GRAD_PARAM = {
    "max_norm": 3.0
}
CFG.TRAIN.NUM_EPOCHS = 100
CFG.TRAIN.CKPT_SAVE_DIR = os.path.join(
    "checkpoints",
    "_".join([CFG.MODEL.NAME, str(CFG.TRAIN.NUM_EPOCHS)])
)
# train data
CFG.TRAIN.DATA = EasyDict()
CFG.TRAIN.NULL_VAL = 0.0
# read data
CFG.TRAIN.DATA.DIR = "datasets/" + CFG.DATASET_NAME
# dataloader args, optional
CFG.TRAIN.DATA.BATCH_SIZE = 8
CFG.TRAIN.DATA.PREFETCH = False
CFG.TRAIN.DATA.SHUFFLE = True
CFG.TRAIN.DATA.NUM_WORKERS = 2
CFG.TRAIN.DATA.PIN_MEMORY = True
# curriculum learning
CFG.TRAIN.CL = EasyDict()
CFG.TRAIN.CL.WARM_EPOCHS = 0
CFG.TRAIN.CL.CL_EPOCHS = 6
CFG.TRAIN.CL.PREDICTION_LENGTH = 12

# ================= validate ================= #
CFG.VAL = EasyDict()
CFG.VAL.INTERVAL = 1
# validating data
CFG.VAL.DATA = EasyDict()
# read data
CFG.VAL.DATA.DIR = "datasets/" + CFG.DATASET_NAME
# dataloader args, optional
CFG.VAL.DATA.BATCH_SIZE = 8
CFG.VAL.DATA.PREFETCH = False
CFG.VAL.DATA.SHUFFLE = False
CFG.VAL.DATA.NUM_WORKERS = 2
CFG.VAL.DATA.PIN_MEMORY = True

# ================= test ================= #
CFG.TEST = EasyDict()
CFG.TEST.INTERVAL = 1
# evaluation
# test data
CFG.TEST.DATA = EasyDict()
# read data
CFG.TEST.DATA.DIR = "datasets/" + CFG.DATASET_NAME
# dataloader args, optional
CFG.TEST.DATA.BATCH_SIZE = 8
CFG.TEST.DATA.PREFETCH = False
CFG.TEST.DATA.SHUFFLE = False
CFG.TEST.DATA.NUM_WORKERS = 2
CFG.TEST.DATA.PIN_MEMORY = True

The main changes are that the batch sizes are set to 8 and the GPU number is set to 1.
However, when I run the command "python step/run.py --cfg='step/STEP_METR-LA.py' --gpus='0'", I get the following error:

Traceback (most recent call last):
  File "/home/trp-mrta/桌面/STEP/STEP/step/run.py", line 26, in <module>
    launch_training(args.cfg, args.gpus)
  File "/home/trp-mrta/桌面/STEP/STEP/basicts/launcher.py", line 19, in launch_training
    easytorch.launch_training(cfg=cfg, gpus=gpus, node_rank=node_rank)
  File "/home/trp-mrta/anaconda3/lib/python3.9/site-packages/easytorch/launcher/launcher.py", line 93, in launch_training
    train_dist(cfg)
  File "/home/trp-mrta/anaconda3/lib/python3.9/site-packages/easytorch/launcher/launcher.py", line 56, in training_func
    raise e
  File "/home/trp-mrta/anaconda3/lib/python3.9/site-packages/easytorch/launcher/launcher.py", line 52, in training_func
    runner.train(cfg)
  File "/home/trp-mrta/anaconda3/lib/python3.9/site-packages/easytorch/core/runner.py", line 351, in train
    loss = self.train_iters(epoch, iter_index, data)
  File "/home/trp-mrta/桌面/STEP/STEP/basicts/runners/base_tsf_runner.py", line 237, in train_iters
    forward_return = list(self.forward(data=data, epoch=epoch, iter_num=iter_num, train=True))
  File "/home/trp-mrta/桌面/STEP/STEP/step/step_runner/step_runner.py", line 66, in forward
    prediction, pred_adj, prior_adj, gsl_coefficient = self.model(history_data=history_data, long_history_data=long_history_data, future_data=None, batch_seen=iter_num, epoch=epoch)
  File "/home/trp-mrta/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/trp-mrta/桌面/STEP/STEP/step/step_arch/step.py", line 65, in forward
    y_hat = self.backend(short_term_history, hidden_states=hidden_states, sampled_adj=sampled_adj).transpose(1, 2)
  File "/home/trp-mrta/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/trp-mrta/桌面/STEP/STEP/step/step_arch/graphwavenet/model.py", line 197, in forward
    gate = self.gate_convs[i](residual)
  File "/home/trp-mrta/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/trp-mrta/anaconda3/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 307, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/trp-mrta/anaconda3/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 303, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [8, 32, 207, 13]

Is there anything I can do to fix this problem? Thanks in advance!
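
One possible reading of this traceback, offered as a hedged sketch rather than a confirmed diagnosis: the tensor reaching gate_convs[i] has four dimensions (batch 8, channels 32, nodes 207, time 13), while conv1d accepts only 2D or 3D input, as the message states. The pure-Python helpers below (check_conv_rank and conv_out_len are hypothetical names, not part of STEP or PyTorch) reproduce that rank check and apply the standard convolution output-length formula to each axis, using the kernel size of 2 from the config above.

```python
def check_conv_rank(shape, conv_dims):
    """Return True if `shape` has a rank that an N-d convolution accepts.

    An N-d convolution takes (channels, *spatial) or (batch, channels, *spatial),
    i.e. rank conv_dims + 1 (unbatched) or conv_dims + 2 (batched).
    """
    return len(shape) in (conv_dims + 1, conv_dims + 2)


def conv_out_len(length, kernel_size, dilation=1, stride=1, padding=0):
    """Output length along one axis for a convolution (standard formula)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


shape = (8, 32, 207, 13)  # size reported in the RuntimeError

print(check_conv_rank(shape, conv_dims=1))  # does a 1-d convolution accept 4D input?
print(check_conv_rank(shape, conv_dims=2))  # does a 2-d convolution accept 4D input?

# Assumption: a (1, 2) kernel, so the node axis is untouched and only time is convolved.
nodes_out = conv_out_len(shape[2], kernel_size=1)
time_out = conv_out_len(shape[3], kernel_size=2)
print(nodes_out, time_out)
```

If the layer is meant to slide only over the last (time) axis, a 2D convolution with a (1, kernel_size) kernel accepts this shape. Whether that matches the author's intended setup, or whether a different PyTorch version is expected instead, is an assumption to verify against the repository's requirements.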

Can Zhang

ModuleNotFoundError for easytorch

When I run the following command, I get an error.
I have already installed easytorch-3.4.7.

bash scripts/data_preparation/all.sh
Traceback (most recent call last):
File "scripts/data_preparation/METR-LA/generate_training_data.py", line 12, in <module>
from basicts.data.transform import standard_transform
File "/home/ght/tsformer/STEP/basicts/data/__init__.py", line 3, in <module>
from easytorch.utils.registry import scan_modules
ModuleNotFoundError: No module named 'easytorch.utils.registry'
Traceback (most recent call last):
File "scripts/data_preparation/METR-LA/generate_training_data.py", line 12, in <module>
from basicts.data.transform import standard_transform
File "/home/ght/tsformer/STEP/basicts/data/__init__.py", line 3, in <module>
from easytorch.utils.registry import scan_modules
ModuleNotFoundError: No module named 'easytorch.utils.registry'

What's the problem? Is it because of the version of easytorch? If so, then what is the right version number?
Looking forward to your reply
: )
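
A quick way to narrow this down, sketched with the standard library only: check which distribution is actually installed and whether the submodule named in the traceback can be found. The helper names module_available and installed_version are hypothetical, and the two distribution names queried at the end are assumptions; the authoritative name and version should come from the repository's requirements file, since differently named distributions can provide an import package called easytorch with different contents.

```python
import importlib.util
from importlib import metadata


def module_available(dotted_name):
    """Return True if `dotted_name` can be located.

    Parent packages are imported as part of the lookup.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent in the dotted path is missing or is not a package.
        return False


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumed candidate distribution names; check the requirements file for the real one.
for dist in ("easytorch", "easy-torch"):
    print(dist, "->", installed_version(dist))

print("easytorch.utils.registry found:", module_available("easytorch.utils.registry"))
```

If the last line prints False while a distribution is reported as installed, the installed package most likely does not contain the module that basicts expects.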

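How to test the model's performance on other datasets.

Hello! I have another question. I trained a model on the METR-LA dataset, and I would like to test this model (.pt) on other datasets (such as the PEMS series). How should I do that? I see there is a test/test_inference.py file in the project, which appears to be the inference code. However, it selects the test dataset through the argument "parser.add_argument('-c', '--cfg', default="step/STEP_METR-LA.py", help='training config')", that is, by pointing to the configuration file used for training. Does this mean the model cannot be tested on other datasets and can only be evaluated on the dataset it was trained on?
I hope you can answer my question. Thank you very much!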

Tsformer pretrain question

In the file tsformer_runner.py, the method "test" is used for testing during training. However, I found that it only uses the last batch of the test_data_loader. Is there something wrong with the for loop, or am I mistaken?
`@torch.no_grad()
@master_only
def test(self):
    """Evaluate the model.

    Args:
        train_epoch (int, optional): current epoch if in training process.
    """

    for _, data in enumerate(self.test_data_loader):
        forward_return = self.forward(data=data, epoch=None, iter_num=None, train=False)
    # re-scale data
    prediction_rescaled = SCALER_REGISTRY.get(self.scaler["func"])(forward_return[0], **self.scaler["args"])
    real_value_rescaled = SCALER_REGISTRY.get(self.scaler["func"])(forward_return[1], **self.scaler["args"])
    # metrics
    for metric_name, metric_func in self.metrics.items():
        metric_item = metric_func(prediction_rescaled, real_value_rescaled, null_val=self.null_val)
        self.update_epoch_meter("test_"+metric_name, metric_item.item())`
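
To illustrate the concern, here is a minimal framework-agnostic sketch of evaluating over every batch instead of only the last one. It is not the repository's code: evaluate_all_batches and mae are hypothetical names, and plain Python lists stand in for tensors (with PyTorch, the per-batch outputs would typically be joined with torch.cat before re-scaling and computing metrics).

```python
def evaluate_all_batches(data_loader, forward_fn, metric_fns):
    """Run forward_fn on every batch and compute metrics over all of them.

    forward_fn(batch) must return (predictions, targets) as sequences.
    """
    predictions, targets = [], []
    for batch in data_loader:
        batch_pred, batch_real = forward_fn(batch)
        # Collect inside the loop so that no batch is overwritten by the next one.
        predictions.extend(batch_pred)
        targets.extend(batch_real)
    return {name: fn(predictions, targets) for name, fn in metric_fns.items()}


def mae(pred, real):
    """Mean absolute error over paired sequences."""
    return sum(abs(p - r) for p, r in zip(pred, real)) / len(pred)


# Toy loader: three batches, each already a (predictions, targets) pair.
toy_loader = [([1.0, 2.0], [1.0, 1.0]), ([3.0], [0.0]), ([5.0], [5.0])]
result = evaluate_all_batches(toy_loader, lambda batch: batch, {"MAE": mae})
print(result)
```

In the toy data the last batch alone has zero error, so a loop that keeps only its final forward_return would report a different value from the one computed over all batches.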

Module import error

Hello, this is nice work, but I ran into some problems when running the code.

Input:

python step/run.py --cfg='step/TSFormer_PEMS04.py' --gpus='0, 1'

Output:
Exception has occurred: TypeError
the 'package' argument is required to perform a relative import for '.basicts.data.dataset'
File "E:\code\Traffic\Analysis\STEP\basicts\data\__init__.py", line 10, in <module>
scan_modules(os.getcwd(), __file__, ["__init__.py", "registry.py"])
File "E:\code\Traffic\Analysis\STEP\step\step_runner\tsformer_runner.py", line 4, in <module>
from basicts.data.registry import SCALER_REGISTRY
File "E:\code\Traffic\Analysis\STEP\step\step_runner\__init__.py", line 1, in <module>
from .tsformer_runner import TSFormerRunner
File "E:\code\Traffic\Analysis\STEP\step\TSFormer_PEMS04.py", line 10, in <module>
from .step_runner import TSFormerRunner
File "E:\code\Traffic\Analysis\STEP\basicts\launcher.py", line 19, in launch_training
easytorch.launch_training(cfg=cfg, gpus=gpus, node_rank=node_rank)
File "E:\code\Traffic\Analysis\STEP\step\run.py", line 26, in <module>
launch_training(args.cfg, args.gpus)
TypeError: the 'package' argument is required to perform a relative import for '.basicts.data.dataset'
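
The message itself comes from importlib.import_module, which treats a name with a leading dot as a relative import and then requires a package argument. One plausible cause on Windows, stated here as an assumption rather than a confirmed diagnosis, is that a file path was turned into a module name by replacing only "/" separators, leaving a stray leading dot. The sketch below builds the dotted name from path parts instead; path_to_module is a hypothetical helper and the example paths are illustrative.

```python
import importlib
from pathlib import PurePosixPath, PureWindowsPath


def path_to_module(file_path, root, path_cls=PurePosixPath):
    """Turn a source file path under `root` into an absolute dotted module name."""
    relative = path_cls(file_path).relative_to(path_cls(root))
    # Joining path parts avoids depending on "/" or "\\" as the separator.
    return ".".join(relative.with_suffix("").parts)


# The same file layout expressed with Windows and POSIX separators.
win_name = path_to_module(r"E:\code\STEP\basicts\data\dataset.py", r"E:\code\STEP", PureWindowsPath)
posix_name = path_to_module("/home/user/STEP/basicts/data/dataset.py", "/home/user/STEP")
print(win_name, posix_name)

# A name with a leading dot is treated as relative and needs `package`.
try:
    importlib.import_module(".basicts.data.dataset")
except TypeError as exc:
    print("TypeError:", exc)
```

Both calls yield the same absolute name, which import_module can resolve without a package argument, provided the project root is on sys.path.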

Hope to add the visualization part of the code

Could you share the visualization code for Figure 3?
BasicTS also does not provide the code for the prediction part. Could that be provided as well?

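Replacing the dataset

Hello, I ran into this problem when trying to switch to a different dataset. I would like to know how the parameter lengths of these two discrete graphs are obtained.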
