
densetnt's Introduction

DenseTNT

  • This is the official implementation of the paper: DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets (ICCV 2021).
  • DenseTNT v1.0 was released on November 1st, 2021.
  • Updates:
    • June 24th, 2023: Add evaluation metrics for Argoverse 2.
    • Sep 3, 2022: Add training code for Argoverse 2.
    • July 25th, 2022: Add detailed code comments.

Argoverse Version

This branch is for Argoverse 2. Code for Argoverse 1 is at another branch.

Quick Start

Requires:

  • Python ≥ 3.8
  • PyTorch ≥ 1.6

1) Install Packages

 pip install -r requirements.txt

2) Install Argoverse 2

Argoverse 2 requires Python ≥ 3.8

pip install av2

3) Compile Cython

Compile the .pyx file into a C file using Cython (installed in step 1):

cd src/ && cython -a utils_cython.pyx && python setup.py build_ext --inplace && cd ../
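
For reference, python setup.py build_ext --inplace then compiles the generated C file into an importable extension module. A typical minimal setup.py for a single .pyx looks roughly like the sketch below; the repository ships its own version, and the NumPy include is only an assumption for .pyx files that use NumPy's C API:

# Illustrative sketch; the repository provides its own setup.py.
import numpy as np
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize("utils_cython.pyx", annotate=True),  # annotate=True mirrors the -a flag above
    include_dirs=[np.get_include()],  # assumption: only needed if the .pyx uses NumPy's C API
)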

Performance

Results on Argoverse 2:

                brier-minFDE  minADE  minFDE  MR
validation set      2.38       1.00    1.71   0.216

DenseTNT

1) Train

Suppose the Argoverse 2 motion forecasting training data is at ./data/train/.

OUTPUT_DIR=argoverse2.densetnt.1; \
GPU_NUM=8; \
python src/run.py --argoverse --argoverse2 --future_frame_num 60 \
  --do_train --data_dir data/train/ --output_dir ${OUTPUT_DIR} \
  --hidden_size 128 --train_batch_size 64 --use_map \
  --core_num 16 --use_centerline --distributed_training ${GPU_NUM} \
  --other_params \
    semantic_lane direction l1_loss \
    goals_2D enhance_global_graph subdivide goal_scoring laneGCN point_sub_graph \
    lane_scoring complete_traj complete_traj-3

2) Evaluate

Suppose the Argoverse 2 motion forecasting validation data is at ./data/val/.

  • Optimize minFDE:
    • Add --do_eval --eval_params optimization MRminFDE=0.0 cnt_sample=9 opti_time=0.1 to the end of the training command (the assembled command is shown below).
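
For reference, the assembled command is simply the training command above with those flags appended, for example:

OUTPUT_DIR=argoverse2.densetnt.1; \
GPU_NUM=8; \
python src/run.py --argoverse --argoverse2 --future_frame_num 60 \
  --do_train --data_dir data/train/ --output_dir ${OUTPUT_DIR} \
  --hidden_size 128 --train_batch_size 64 --use_map \
  --core_num 16 --use_centerline --distributed_training ${GPU_NUM} \
  --other_params \
    semantic_lane direction l1_loss \
    goals_2D enhance_global_graph subdivide goal_scoring laneGCN point_sub_graph \
    lane_scoring complete_traj complete_traj-3 \
  --do_eval --eval_params optimization MRminFDE=0.0 cnt_sample=9 opti_time=0.1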

3) Train Set Predictor (Optional)

Compared with the optimization algorithm (default setting), the set predictor has similar performance but faster inference speed.

After training DenseTNT, suppose the model path is at argoverse2.densetnt.1/model_save/model.16.bin. The command for training the set predictor is:

OUTPUT_DIR=argoverse2.densetnt.set_predict.1; \
MODEL_PATH=argoverse2.densetnt.1/model_save/model.16.bin; \
GPU_NUM=8; \
python src/run.py --argoverse --argoverse2 --future_frame_num 60 \
  --do_train --data_dir data/train/ --output_dir ${OUTPUT_DIR} \
  --hidden_size 128 --train_batch_size 64 --use_map \
  --core_num 16 --use_centerline --distributed_training ${GPU_NUM} \
  --other_params \
    semantic_lane direction l1_loss \
    goals_2D enhance_global_graph subdivide goal_scoring laneGCN point_sub_graph \
    lane_scoring complete_traj \
    set_predict=6 set_predict-6 data_ratio_per_epoch=0.4 set_predict-topk=0 set_predict-one_encoder set_predict-MRratio=0.0 \
    set_predict-train_recover=${MODEL_PATH}

To evaluate the set predictor, just add --do_eval to the end of this training command.

Results of the set predictor on Argoverse 2:

                brier-minFDE  minADE  minFDE  MR
validation set      2.32       0.96    1.62   0.233

Citation

If you find our work useful for your research, please consider citing the paper:

@inproceedings{densetnt,
  title={Densetnt: End-to-end trajectory prediction from dense goal sets},
  author={Gu, Junru and Sun, Chen and Zhao, Hang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={15303--15312},
  year={2021}
}

densetnt's Issues

Running this code for single-machine multi-GPUs

Dear authors, thank you for releasing this wonderful code. Great work!

I am trying to run your code on a single machine with multiple GPUs. I read PyTorch's docs and tutorials on data-parallel training but could not work out how to apply them here. I am wondering whether you could give me some hints. Thank you.
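
For reference, the training commands in this README already launch single-machine multi-GPU training through the --distributed_training flag (its value is the number of GPUs); a sketch based on the Argoverse 2 command above, assuming 4 GPUs:

GPU_NUM=4; \
python src/run.py --argoverse --argoverse2 --future_frame_num 60 \
  --do_train --data_dir data/train/ --output_dir argoverse2.densetnt.1 \
  --hidden_size 128 --train_batch_size 64 --use_map \
  --core_num 16 --use_centerline --distributed_training ${GPU_NUM} \
  --other_params \
    semantic_lane direction l1_loss \
    goals_2D enhance_global_graph subdivide goal_scoring laneGCN point_sub_graph \
    lane_scoring complete_traj complete_traj-3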

How to use PyTorch with Waymo data

Hello,

Thanks for sharing the great work.
Since you evaluated your model on Waymo data, I wonder how you used the Waymo dataloader, which is in TensorFlow. Is there a way to bypass it and still keep the code in PyTorch?
Thanks!

License

Hello

Thank you for the great work!

I would like to use your code.
What is the license for this code?

do_eval

OUTPUT_DIR=models.densetnt.1; \
python src/run.py --argoverse --future_frame_num 30 \
  --do_eval --data_dir ./train/data --output_dir ${OUTPUT_DIR} \
  --hidden_size 128 --train_batch_size 64 --use_map \
  --core_num 16 --use_centerline \
  --eval_params optimization MRminFDE cnt_sample=9 opti_time=0.1

Hello, the above is the command I run for prediction, but the following error always occurs. What could be the reason?


2022-07-08 14:44:26 2022-07-08 14:44:23.000 [INFO] [Driver] File "/home/notebook/code/personal/waymo/DenseTNT/src/modeling/vectornet.py", line 32, in forward
2022-07-08 14:44:26 2022-07-08 14:44:23.000 [INFO] [Driver] hidden_states, lengths = utils.merge_tensors(input_list, device)
2022-07-08 14:44:26 2022-07-08 14:44:23.000 [INFO] [Driver] File "/home/notebook/code/personal/S9048717/waymo/DenseTNT/src/utils.py", line 808, in merge_tensors
2022-07-08 14:44:26 2022-07-08 14:44:23.000 [INFO] [Driver] res[i][:tensor.shape[0
2022-07-08 14:44:26 2022-07-08 14:44:23.000 [INFO] [Driver] RuntimeError: The expanded size of the tensor (128) must match the existing size (64) at non-singleton dimension 1. Target sizes: [19, 128]. Tensor sizes: [19, 64]
2022-07-08 14:44:26 2022-07-08 14:44:24.000 [INFO] [Driver] Task task-20220708144211-34515 run failed, Exit code : 1

Question about goals sampling

Hi,

In the ICCV paper, I noticed that you sample dense goals along the road, as illustrated in the paper's figure.

but in this code repo, you sample goals along the lane centerlines, which is the same as vanilla TNT:

for index_polygon, polygon in enumerate(polygons):
    for i, point in enumerate(polygon):
        hash = get_hash(point)
        if hash not in visit:
            visit[hash] = True
            points.append(point)
    if 'subdivide' in args.other_params:
        subdivide_points = get_subdivide_points(polygon)
        points.extend(subdivide_points)
        subdivide_points = get_subdivide_points(polygon, include_self=True)
mapping['goals_2D'] = np.array(points)

The include_beside parameter of get_subdivide_points() would produce a denser goal set, but it seems you are not using it.

INTERACTION dataset

Hello,
Thanks for sharing the great work.
I wonder how you used DenseTNT on the INTERACTION dataset. Could you release the related code?
Thanks!

Training time on Waymo

Hi,

Thanks for your awesome work.

How long did you train DenseTNT on the Waymo dataset? And how many GPUs (type) did you use?

minADE is not right

When I trained the set predictor, the eval results were as follows:

other_errors {'stage_one_k': 3.0803698258030754, 'stage_one_recall': 0.9638398086076706, 'set_MR_pred': 0.0738903979863932, 'set_minFDE_pred': 1.3842987156892936}
{'minADE': 14.246385467933312, 'minFDE': 1.3842987156892916, 'MR': 0.0738903979863932}
ADE 14.316873007381048
DE@1 9.875993647573898
DE@2 19.429086244927365
DE@3 3.496420582410066

The minADE is clearly wrong. I visualized the predicted trajectories and they are not right either. I think the complete-trajectory module is not sufficiently trained, but I trained the model following your instructions, so where exactly is the problem?

inference question

Dear MARS Lab,

Thanks for sharing the excellent work!
Following your instructions, I successfully reproduced the evaluation results.

Just wanna ask, do you guys have a plan to upload the inference code?

result on AV2

The Argoverse 2 dataset has been released. Have you tried DenseTNT on it? What are the results?

Question about Outcomes

Thanks for the great job.
I have some questions about the definitions of the outputs. What are stage_one_k and stage_one_recall in other_errors? What do FDE, ADE, DE@1, DE@2, and DE@3 mean? Do the FDE and ADE in the outputs represent minFDE1 and minADE1?
Thanks

Why use two optimizers?

Thanks for sharing this great work. I don't understand why 'optimizer_2' is introduced for optimizing the complete trajectory decoder; could you give some pointers? Thanks!

About log for loss

Dear authors,

Thanks for sharing this excellent work!
Could you share the training-loss log? Thanks!

Online Mode

Thank you for releasing your code. I'm confused: the offline model used to produce pseudo-labels does not seem to work, or is the online/test mode simply not released? It seems only the offline mode is currently supported, right?
Thank you very much and I'm looking forward to your response.

Model Weights

Can you share your model weights or alternatively your final outputs?

About the goal set predictor (online)

Dear authors,

Thanks for sharing this excellent work!
I noticed that the training command you provided is for the offline optimization mode. Could you please provide the command for training the goal set predictor (the second stage, online mode)?

A problem about the experiments.

Thank you for sharing the great work! I am confused about the experiments:

The results I get with this code seem far from those in the paper. All params are at their defaults; after 16 training epochs:
loss=4.445
FDE: 3.1506308334590956
MR(2m,4m,6m): (0.4871615584819174, 0.2329565318727421, 0.13079283688769763)

What could be the problem?

Multi-GPU training gets stuck at this point

root@18dc3f8e2e1d:/workspace/wangs/DenseTNT# python src/run.py --argoverse --future_frame_num 30 --do_train --data_dir /workspace/datasets/Argoverse/train/data/ --output_dir models.densetnt.1 --hidden_size 128 --train_batch_size 64 --use_map --core_num 16 --use_centerline --distributed_training 8 --other_params semantic_lane direction l1_loss goals_2D enhance_global_graph subdivide goal_scoring laneGCN point_sub_graph lane_scoring complete_traj complete_traj-3
{'add_prefix': None, 'agent_type': None, 'argoverse': True, 'attention_decay': False, 'autoregression': None, 'core_num': 16, 'cuda_visible_device_num': None, 'data_dir': '/workspace/datasets/Argoverse/train/data/', 'data_dir_for_val': 'val/data/', 'debug': False, 'distributed_training': 8, 'do_eval': False, 'do_test': False, 'do_train': True, 'eval_batch_size': 64, 'eval_params': [], 'future_frame_num': 30, 'future_test_frame_num': 16, 'global_graph_depth': 1, 'gpu_split': 0, 'hidden_dropout_prob': 0.1, 'hidden_size': 128, 'initializer_range': 0.02, 'inter_agent_types': None, 'learning_rate': 0.001, 'log_dir': 'models.densetnt.1', 'lstm': False, 'master_port': '12355', 'max_distance': 50.0, 'method_span': [0, 1], 'mode_num': 6, 'model_recover_path': None, 'model_save_dir': 'models.densetnt.1/model_save', 'multi': None, 'nms_threshold': None, 'no_agents': False, 'no_cuda': False, 'no_sub_graph': False, 'not_use_api': False, 'num_train_epochs': 16.0, 'nuscenes': False, 'old_version': False, 'other_params': {'semantic_lane': True, 'direction': True, 'l1_loss': True, 'goals_2D': True, 'enhance_global_graph': True, 'subdivide': True, 'goal_scoring': True, 'laneGCN': True, 'point_sub_graph': True, 'lane_scoring': True, 'complete_traj': True, 'complete_traj-3': True}, 'output_dir': 'models.densetnt.1', 'placeholder': 0.0, 'reuse_temp_file': False, 'seed': 42, 'single_agent': True, 'stage_one_K': None, 'sub_graph_batch_size': 8000, 'sub_graph_depth': 3, 'temp_file_dir': 'models.densetnt.1/temp_file', 'train_batch_size': 64, 'train_extra': False, 'train_params': [], 'use_centerline': True, 'use_map': True, 'visualize': False, 'waymo': False, 'weight_decay': 0.01}

10/21/2022 01:57:04 - INFO - __main__ - ***** args *****
output_dir models.densetnt.1
other_params ['semantic_lane', 'direction', 'l1_loss', 'goals_2D', 'enhance_global_graph', 'subdivide', 'goal_scoring', 'laneGCN', 'point_sub_graph', 'lane_scoring', 'complete_traj', 'complete_traj-3']
10/21/2022 01:57:11 - INFO - __main__ - device: cuda
Loading dataset ['/workspace/datasets/Argoverse/train/data/']
/opt/conda/lib/python3.8/site-packages/scipy/__init__.py:138: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.4)
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion} is required for this version of "
10/21/2022 01:57:12 - INFO - argoverse.data_loading.vector_map_loader - Loaded root: ArgoverseVectorMap
Running DDP on rank 3.
Running DDP on rank 5.
Running DDP on rank 1.
Running DDP on rank 7.
Running DDP on rank 0.
Running DDP on rank 6.
Running DDP on rank 4.
Running DDP on rank 2.
10/21/2022 01:57:13 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 6
10/21/2022 01:57:13 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 4
10/21/2022 01:57:13 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 2
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 3
10/21/2022 01:57:14 - INFO - argoverse.data_loading.vector_map_loader - Loaded root: ArgoverseVectorMap
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 5
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 1
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 0
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 7
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 7: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 4: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 5: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 6: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
['/workspace/datasets/Argoverse/train/data/129892.csv', '/workspace/datasets/Argoverse/train/data/179439.csv', '/workspace/datasets/Argoverse/train/data/153379.csv', '/workspace/datasets/Argoverse/train/data/11971.csv', '/workspace/datasets/Argoverse/train/data/181683.csv'] ['/workspace/datasets/Argoverse/train/data/209097.csv', '/workspace/datasets/Argoverse/train/data/102649.csv', '/workspace/datasets/Argoverse/train/data/186077.csv', '/workspace/datasets/Argoverse/train/data/74459.csv', '/workspace/datasets/Argoverse/train/data/89887.csv']
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 205942/205942 [06:12<00:00, 552.14it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 205942/205942 [00:07<00:00, 27049.96it/s]
valid data size is 205942

Question About the dense goal sampling

Hi @GentleSmile,

In your paper, the sampling strategy is to densely sample goals on the lanes within 50 meters of the target agent's initial position. However, in your implementation, it seems that you first score the sparse goals and then densify the selected top-150 goals to generate the dense goals:

goals_2D_new = utils.get_neighbour_points(goals_2D[topk_ids], topk_ids=topk_ids, mapping=mapping[i])

which is different from the illustration in the paper.
Is this true? If so, why change it?
I am looking forward to your reply. Thank you in advance

End-to-End training or Two-Stage training ?

While a two-stage training schedule is described in the training-details section of the paper, end-to-end training is used in this repo.

Any reasons for the difference?

Some questions about loss function.

Thank you for sharing the great work! I am confused about the second-stage loss function.

In goals_2D_per_example_calc_loss(), there is an nll_loss computed from "scores" and [mapping[i]['goals_2D_labels']]. The "scores" are over the dense goals, but [mapping[i]['goals_2D_labels']] seems to be an index into the sparse goals. Am I misunderstanding something? Why does this work?
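
For context, here is a minimal, self-contained illustration (toy tensors, not the repo's code) of how F.nll_loss pairs a row of log-scores with an integer index; the question above is essentially whether goals_2D_labels indexes into the same goal set that the scores are computed over:

import torch
import torch.nn.functional as F

# F.nll_loss takes log-probabilities of shape [batch, num_classes] and one integer
# class index per batch element; the index must point into the same set of
# candidates that the scores were computed over.
scores = torch.log_softmax(torch.randn(1, 5), dim=-1)  # log-scores over 5 candidate goals
label = torch.tensor([2])                              # index of the target goal among those 5
loss = F.nll_loss(scores, label)
print(loss.item())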

Hardware

I'm wondering how long it takes you to train the model.
I plan to train the model on two 2080 Ti GPUs, but it takes a lot of time (about 2 h/epoch), and GPU utilization is always low. Is this a problem with the code?
Can we accelerate training by preprocessing the data?

eval_batch_size

When using a small eval_batch_size, the eval results will be bad, because global_graph() pads with zeros up to the maximum length in a batch in utils.merge_tensors(). If merge_tensors is changed to pad to a fixed length, different eval_batch_size values give the same eval result.
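
To illustrate the behavior described above, here is a minimal sketch (toy shapes, not the repo's exact utils.merge_tensors) of per-batch max-length padding versus the proposed fixed-length padding; max_vector_num is a hypothetical fixed cap:

import torch

def merge_tensors_batch_max(tensors, device, hidden_size=128):
    # Pads each batch up to the longest element in that batch, so the padded
    # width depends on which examples happen to share a batch.
    lengths = [t.shape[0] for t in tensors]
    res = torch.zeros(len(tensors), max(lengths), hidden_size, device=device)
    for i, t in enumerate(tensors):
        res[i, :t.shape[0]] = t
    return res, lengths

def merge_tensors_fixed(tensors, device, max_vector_num, hidden_size=128):
    # Proposed variant: always pad to the same fixed length, so results no
    # longer vary with eval_batch_size.
    lengths = [t.shape[0] for t in tensors]
    res = torch.zeros(len(tensors), max_vector_num, hidden_size, device=device)
    for i, t in enumerate(tensors):
        res[i, :t.shape[0]] = t
    return res, lengths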

About Visualization

Hi! Thanks again for your great work!
I trained and evaluated your model, and the performance matches what you reported in your paper.
I'm wondering if you could suggest a way to visualize the predictions or any results, like heatmaps or trajectory lines.
I'm a little confused: is there a function in your code for that, and how do I use it?

Some Training strategy

I found that the best model performance is still obtained with the paper's default settings (batch size 64, 16 epochs), even after increasing the batch size from 64 to 256, increasing the training epochs from 16 to 64, and switching to a cosine learning-rate schedule.

I would like to ask what training strategy you would recommend if I want to improve the model's performance.

It is also puzzling that the model does not get better when the batch size and number of training epochs are increased.

Thanks

Model for Waymo Motion Dataset ?

Thanks for sharing this awesome code.

Are you also planning to release the code for the Waymo Motion Dataset (the winning model of the 2021 Waymo Motion Prediction Challenge)?

How to visualize the results

Your work has been a great help for a beginner, but I don't understand how to visualize the obtained results. I hope to get your reply; thank you again for your work.

Possible performance matching issue, how to optimize minFDE?

Dear authors,

First of all, thank you soooooo much for sharing this terrific work!
I trained and validated the model using the command you kindly provided in the README. The performance on the validation split was ADE = 0.79 and FDE = 1.25 (FYI, I used an 8-GPU setting for training). I think this matches the "DenseTNT w/ 100ms optimization" mode in the paper. So I was wondering whether the command you provided corresponds to the "DenseTNT w/ 100ms optimization" mode, and whether it is possible to train the model in the "DenseTNT w/ 100ms optimization (minFDE)" mode. If so, which parameters should we adjust in the training/testing command?

Some questions about code details

I read the code on the argoverse2 branch; could you help me answer some questions about it?

  1. dataset_argoverse.py L501: should the code be like the following?
lane_type_to_int[LaneType.BIKE] = 2
lane_type_to_int[LaneType.BUS] = 3
  2. dataset_argoverse.py L527: why is vectors.append(vector) not at the end of this for loop?
  3. decoder.py L339: loss[i] += F.nll_loss(pred_probs[i].unsqueeze(0), torch.tensor([argmin], device=device)); should the second parameter be torch.tensor([1], device=device)?
  4. dataset_argoverse.py L428 should add:
if len(focal_track.object_states) != 110:
    return None
because some focal_track object_states have fewer than 110 entries; otherwise the code won't run.
  5. The argoverse2 branch doesn't train the set_predictor. Have you tried training the set_predictor? Is its performance close to that of the offline optimization?

The program is stuck

The progress bar stops updating after it reaches a certain point (see the attached screenshot).
What could be the problem?

The doubt about anchor-free

I'm confused: the paper says DenseTNT is anchor-free; however, in the code, all the lane boundary points (goals_2D) are model inputs. Is this contradictory?

Hung up at random epochs when training

Thanks so much for the sharing and congratulations on your great work! I am looking for your suggestions on what to do with an issue I encountered recently. Thank you really much for your attention!

When I was running the training program, I found that sometimes it would hang at random epochs (not the first epoch, and not at the end of an epoch). When it hangs, GPU usage is 0% while GPU memory stays at the same level, far from full. On the CPU side, memory is far from full and CPU usage is 0%. The program just stops producing output, and training does not continue. The only way out is to kill it, and the traceback shows that it stopped at:
while not spawn_context.join():
In 'top', the program sometimes shows a 'Sleeping' status after hanging. I encountered the problem in different setups (Windows 10 + WSL + Docker (Ubuntu) + Nvidia RTX 3080; Ubuntu 20.04.4 + Nvidia RTX A6000). I am looking forward to your suggestions on this issue.
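
For context, while not spawn_context.join(): is the parent-side wait loop used with torch.multiprocessing.spawn when it is launched with join=False; a minimal sketch of that pattern (illustrative, not the repo's exact run.py):

import torch.multiprocessing as mp

def worker(rank, world_size):
    # per-process DDP setup and training loop would go here
    pass

if __name__ == "__main__":
    world_size = 8
    # join=False returns a ProcessContext instead of blocking inside spawn();
    # ProcessContext.join() returns True only once every worker has exited, so
    # the parent waits here until training finishes or a worker process dies.
    spawn_context = mp.spawn(worker, args=(world_size,), nprocs=world_size, join=False)
    while not spawn_context.join():
        pass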

Thanks again for your kind help and I am looking forward to learning more from you!

CUDA error during training

Dear author, thanks again for sharing the code.

Now I'm trying to train and test on Argoverse Prediction Dataset.

And I met an error at the beginning of training like below.

I'm not sure if it's related to GPU memory or batch_size. (I'm using a single Titan X Pascal.)

Can you give me some advice? Thank you !

['train/data/203014.csv', 'train/data/122663.csv', 'train/data/186083.csv', 'train/data/179329.csv', 'train/data/39652.csv'] ['train/data/1859.csv', 'train/data/99352.csv', 'train/data/31180.csv', 'train/data/175042.csv', 'train/data/79405.csv']
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 205942/205942 [23:46<00:00, 144.36it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 205942/205942 [00:07<00:00, 26514.47it/s]
valid data size is 205942
Traceback (most recent call last):
  File "src/run.py", line 309, in <module>
    main()
  File "src/run.py", line 298, in main
    run(args)
  File "src/run.py", line 280, in run
    while not spawn_context.join():
  File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/jaehyeon/conda_ws/denseTNT/src/run.py", line 198, in demo_basic
    model = VectorNet(args).to(rank)
  File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/nn/modules/module.py", line 607, in to
    return self._apply(convert)
  File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/nn/modules/module.py", line 376, in _apply
    param_applied = fn(param)
  File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/nn/modules/module.py", line 605, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: invalid device ordinal

training Error

When I follow the README.md and run it, an error pops up saying StopIteration. It happened at src/dataset_argoverse.py line 450:
root, dirs, cur_files = os.walk(each_dir).next()
StopIteration

How to fix it? Thanks!

pedestrians question

Hi,
Thank you for sharing the great work.
Since DenseTNT needs to sample final points from the lane centerlines, how do you handle the situation where pedestrians are not on a lane?
I wonder if you could give me some advice.
THX

About model size

Hi authors,

Congratulations on your excellent work! Can you tell me the total number of parameters in your best-performing model on the Argoverse dataset?

Meaning of other_params

Hi,

Thank you for sharing a great work.
It would be great if you could provide some documentation about the meaning of the parameters, especially other_params, such as laneGCN-4.

Thank you!

Doubts in the evaluation and optimization section

Hello, I have a question. I reproduced your results experimentally, using a train-then-evaluate approach. I found that the results after running do_eval (FDE 1.0513439117392331, MR 0.09578942034860154) are far better than those reported during training (FDE: 3.1969465177600322, MR(2m,4m,6m): (0.49478493944897106, 0.23618785871750297, 0.1306969923570714)). I also found that running do_eval recovers a saved model; is that the optimized model? If it is a model, I do not see the newly generated one.

Also, the test part does not seem to be run after optimization. Are the optimization and testing parts run together during evaluation? (I only reproduced the results and have not studied your code in depth, so please forgive me if I misunderstood.) If you can answer my question, I will be very grateful.

MinFDE optimization error

Thank you for your wonderful work!

I'm trying to reproduce your results on the Argoverse dataset and am encountering the following problem.
Let's say I have a dataset stored in folders /datasets/argoverse/train/data and /datasets/argoverse/val/data. I want to optimize minFDE, so I add the following line to my command: --do_eval --eval_params optimization MRminFDE=0.0 cnt_sample=9 opti_time=0.1

OUTPUT_DIR=models.densetnt.1; GPU_NUM=8; CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python src/run.py --argoverse --future_frame_num 30 --do_train --data_dir /datasets/argoverse/train/data --output_dir ${OUTPUT_DIR} --hidden_size 128 --train_batch_size 64 --sub_graph_batch_size 4096 --use_map --core_num 16 --use_centerline --distributed_training ${GPU_NUM} --other_params semantic_lane direction l1_loss goals_2D enhance_global_graph subdivide lazy_points new laneGCN point_sub_graph stage_one stage_one_dynamic=0.95 laneGCN-4 point_level point_level-4 point_level-4-3 complete_traj complete_traj-3 --do_eval --eval_params optimization MRminFDE=0.0 cnt_sample=9 opti_time=0.1 --do_eval --eval_params optimization MRminFDE=0.0 cnt_sample=9 opti_time=0.1

This command fails due to this assertion: https://github.com/Tsinghua-MARS-Lab/DenseTNT/blob/main/src/utils.py#L276, because the models.densetnt.1 folder has not been created yet and my validation data is not in the val/data folder.

This brings me to three questions:

  1. Which command should I use to reproduce the experiment?
  2. Should I first run the command without --do_eval --eval_params optimization MRminFDE=0.0 cnt_sample=9 opti_time=0.1 and then run the command with it? What is the intended use case?
  3. Is there a way to specify folders for train and validation in one command? It seems that the path for validation is just hard-coded in the line https://github.com/Tsinghua-MARS-Lab/DenseTNT/blob/main/src/utils.py#L318

Thank you!
