neeharperi / futuredet

Forecasting from LiDAR via Future Object Detection. CVPR '22

Python 87.03% C++ 5.13% Cuda 7.73% Shell 0.12%

futuredet's People

jie311 jlqzzz sinead-li pinkglove lamhocn davidsshi secretsettler enginbozkurt wangjuenew nuyoah0123

futuredet's Issues

Using Map

Hi,
Thank you for sharing your implementation of the FutureDet model. I ran into some issues while using the Map pipeline. Initially, I used the "nusc_centerpoint_forecast_n3dtfm_detection.py" configuration file.
The size of the bev tensor was not (1, 180, 180) as expected.
To address this, I made the following changes:

After making these changes, the size of the bev tensor became (1, 180, 180), as desired.
After the previous changes, the bev tensor still wasn't the desired size, so I changed the first argument (in_channels) of the Conv2d layer from 6 to 1.

I'm unsure if this change is correct and would appreciate your feedback.

Expecting a re-implementation in MMDetection3D

RuntimeError: CUDA out of memory.

It reported the following error:
Traceback (most recent call last):
File "./tools/train.py", line 143, in <module>
main()
File "./tools/train.py", line 138, in main
logger=logger,
File "/home/lu/Workspace/FutureDet/det3d/torchie/apis/train.py", line 358, in train_detector
trainer.run(data_loaders, cfg.workflow, cfg.total_epochs, cfg=cfg, local_rank=cfg.local_rank)
File "/home/lu/Workspace/FutureDet/det3d/torchie/trainer/trainer.py", line 584, in run
epoch_runner(data_loaders[i], self.epoch, **kwargs)
File "/home/lu/Workspace/FutureDet/det3d/torchie/trainer/trainer.py", line 450, in train
self.model, data_batch, train_mode=True, **kwargs
File "/home/lu/Workspace/FutureDet/det3d/torchie/trainer/trainer.py", line 392, in batch_processor_inline
losses = model(example, return_loss=True)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lu/Workspace/FutureDet/det3d/models/detectors/voxelnet.py", line 50, in forward
x, _ = self.extract_feat(data)
File "/home/lu/Workspace/FutureDet/det3d/models/detectors/voxelnet.py", line 29, in extract_feat
x = self.neck(x)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lu/Workspace/FutureDet/det3d/models/necks/rpn.py", line 153, in forward
x = self.blocks[i](x)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lu/Workspace/FutureDet/det3d/models/utils/misc.py", line 95, in forward
input = module(input)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 399, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 396, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.77 GiB total capacity; 1.52 GiB already allocated; 34.38 MiB free; 1.55 GiB reserved in total by PyTorch)
My GPU is an RTX 3070. How can I fix this problem, or where in the code should I adjust the batch size?
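Two observations may help. First, the message reports only 1.55 GiB reserved by PyTorch and 34.38 MiB free on a 7.77 GiB card, so most of the remaining ~6.2 GiB appears to be held outside this process's caching allocator (for example by another process on the same GPU); `nvidia-smi` shows which processes hold memory. Second, in Det3D-style configs, which FutureDet's `det3d` package appears to follow, the per-GPU batch size is usually the `samples_per_gpu` key of the `data` dict in the experiment config. The excerpt below is a hypothetical sketch: only those two keys are assumed, and the rest of the real config stays as it is.

```python
# Hypothetical excerpt of a Det3D-style experiment config. Only these two keys
# are assumed; keep the train/val/test entries of the real config unchanged.
data = dict(
    samples_per_gpu=1,  # per-GPU batch size: the first knob to lower on an 8 GB card
    workers_per_gpu=2,  # DataLoader worker processes per GPU (uses host RAM, not GPU memory)
)


def memory_outside_pytorch(total_gib: float, reserved_gib: float, free_mib: float) -> float:
    """GPU memory (GiB) not accounted for by this process's PyTorch caching allocator."""
    return total_gib - reserved_gib - free_mib / 1024.0


# Numbers taken from the CUDA out-of-memory message above.
print(round(memory_outside_pytorch(7.77, 1.55, 34.38), 2))  # prints 6.19
```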

res["lidar"]["bev_map"] : axes don't match array

Hi, thank you so much for contributing such great work.

I followed your installation steps and data generation steps.

The process seemed to go smoothly:

However, when I changed the paths to my own and started running, an error occurred:

Have you encountered a similar problem?

No such file or directory: 'models/FutureDetection/nusc_centerpoint_forecast_n3dtf_detection/metrics_summary.json'

Excuse me, I can't find the JSON file "metrics_summary.json" when I evaluate the det3d model.
I would appreciate it if you could provide this file as well!
Thanks!!
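In the upstream nuScenes devkit, `DetectionEval.main()` writes `metrics_summary.json` into its `output_dir` at the end of evaluation, so it is an output of evaluation rather than a file shipped with the code. Assuming FutureDet's forked devkit keeps that behavior, the file should appear in the model directory only after an evaluation run finishes there. A small guard like the hypothetical `load_metrics_summary` below makes that failure mode explicit:

```python
import json
import os


def load_metrics_summary(output_dir: str) -> dict:
    """Load metrics_summary.json from an evaluation output directory."""
    path = os.path.join(output_dir, "metrics_summary.json")
    if not os.path.isfile(path):
        # The file is produced by the evaluation step, not distributed with the repo.
        raise FileNotFoundError(
            f"{path} does not exist yet; run evaluation to completion "
            "with this directory as its output directory first"
        )
    with open(path) as f:
        return json.load(f)
```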

AssertionError: Samples in split doesn't match samples in predictions.

Traceback (most recent call last):
File "/home/lu/Workspace/FutureDet/tools/dist_test.py", line 270, in <module>
main()
File "/home/lu/Workspace/FutureDet/tools/dist_test.py", line 260, in main
association_oracle=args.association_oracle, postprocess=args.postprocess, nogroup=args.nogroup)
File "/home/lu/Workspace/FutureDet/det3d/datasets/nuscenes/nuscenes.py", line 838, in evaluation
nogroup=nogroup
File "/home/lu/Workspace/FutureDet/det3d/datasets/nuscenes/nusc_common.py", line 686, in eval_main
nogroup=nogroup
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/nuscenes/eval/detection/evaluate.py", line 225, in __init__
set(self.gt_boxes.sample_tokens))
AssertionError: Samples in split doesn't match samples in predictions.
As the debug output shows, self.pred_boxes.sample_tokens has 404 entries, which doesn't match self.gt_boxes.sample_tokens, which has 81.

I was using the v1.0-mini dataset, and I changed the following parameters in dist_test.py:
    parser.add_argument("--split", default="mini_val")
    parser.add_argument("--version", default="v1.0-mini")
    parser.add_argument("--modelCheckPoint", default="latest")

I believe this issue is caused by the "val" split not being loaded.
In the v1.0-mini dataset there are 323 training samples and 81 validation (mini_val) samples.
While debugging, I looked at this part of dist_test.py:
    if args.testset:
        print("Use Test Set")
        dataset = build_dataset(cfg.data.test)
    else:
        if args.split == "val" or args.split == "mini_val":
            print("Use Val Set")
            dataset = build_dataset(cfg.data.val)
        else:
            print("Use Train Set")
            cfg.data.val.info_path = cfg.data.val.info_path.replace("infos_val_%dsweeps_withvelo_filter_True"%(cfg.data.val.nsweeps), "infos_train_%dsweeps_withvelo_filter_True"%(cfg.data.val.nsweeps))
            cfg.data.val.ann_file = cfg.data.val.info_path.replace("infos_val_%dsweeps_withvelo_filter_True"%(cfg.data.val.nsweeps), "infos_train_%dsweeps_withvelo_filter_True"%(cfg.data.val.nsweeps))
            dataset = build_dataset(cfg.data.val)
However, dataset.flag.shape is still 323 when I set --split to "mini_val".

Setting --split to "mini_train" gives the same result: dataset.flag.shape = 323.
For mini_val, dataset.flag.shape should be 81.
It seems that the changes to cfg.data.val.info_path and cfg.data.val.ann_file don't take effect.
I can't figure out what is causing the problem, and I'd be very grateful for any help.
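A simplified sketch of the branch quoted above may help narrow this down. Only the "train" branch rewrites `info_path`; for `--split mini_val` the dataset is built from whatever `cfg.data.val.info_path` already points to, so those two assignment lines never run in that case. Since `flag.shape` is 323 for both splits (and 404 = 323 + 81), one thing worth checking, as an assumption rather than a confirmed diagnosis, is whether the val info .pkl named in the config really contains the 81 mini_val samples. The helper names `resolve_info_path` and `count_infos` are hypothetical, and `count_infos` assumes the .pkl stores a plain list of per-sample dicts.

```python
import pickle


def resolve_info_path(split: str, val_info_path: str, nsweeps: int) -> str:
    """Simplified mirror of the split-selection branch in dist_test.py."""
    val_tag = "infos_val_%dsweeps_withvelo_filter_True" % nsweeps
    train_tag = "infos_train_%dsweeps_withvelo_filter_True" % nsweeps
    if split in ("val", "mini_val"):
        # cfg.data.val is used untouched: the path comes straight from the config
        return val_info_path
    # only the "train" branch rewrites the path (val infos -> train infos)
    return val_info_path.replace(val_tag, train_tag)


def count_infos(info_path: str) -> int:
    """Number of samples stored in an info .pkl (assumed to be a list of dicts)."""
    with open(info_path, "rb") as f:
        infos = pickle.load(f)
    return len(infos)
```

Running `count_infos` on the file that `resolve_info_path("mini_val", ...)` returns should print 81 if the mini_val infos were generated as intended.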

RuntimeError: Sizes of tensors must match except in dimension 2. Got 136 and 135 (The offending index is 0)

I ran into this problem when running the command python train.py --experiment FutureDetection --model forecast_n0.
I changed the voxel_size to [0.2, 0.2] on line 101 and to [0.1, 0.1, 0.2] on line 162.
The detailed traceback is as follows:
Traceback (most recent call last):
File "./tools/train.py", line 143, in <module>
main()
File "./tools/train.py", line 138, in main
logger=logger,
File "/home/lu/Workspace/FutureDet/det3d/torchie/apis/train.py", line 358, in train_detector
trainer.run(data_loaders, cfg.workflow, cfg.total_epochs, cfg=cfg, local_rank=cfg.local_rank)
File "/home/lu/Workspace/FutureDet/det3d/torchie/trainer/trainer.py", line 584, in run
epoch_runner(data_loaders[i], self.epoch, **kwargs)
File "/home/lu/Workspace/FutureDet/det3d/torchie/trainer/trainer.py", line 450, in train
self.model, data_batch, train_mode=True, **kwargs
File "/home/lu/Workspace/FutureDet/det3d/torchie/trainer/trainer.py", line 392, in batch_processor_inline
losses = model(example, return_loss=True)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lu/Workspace/FutureDet/det3d/models/detectors/voxelnet.py", line 50, in forward
x, _ = self.extract_feat(data)
File "/home/lu/Workspace/FutureDet/det3d/models/detectors/voxelnet.py", line 29, in extract_feat
x = self.neck(x)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lu/Workspace/FutureDet/det3d/models/necks/rpn.py", line 157, in forward
x = torch.cat(ups, dim=1)
RuntimeError: Sizes of tensors must match except in dimension 2. Got 136 and 135 (The offending index is 0)

The error occurs in the forward function in rpn.py:
    def forward(self, x):
        ups = []
        for i in range(len(self.blocks)):
            x = self.blocks[i](x)
            if i - self._upsample_start_idx >= 0:
                ups.append(self.deblocks[i - self._upsample_start_idx](x))
        if len(ups) > 0:
            x = torch.cat(ups, dim=1)
        return x
The input x.shape is torch.Size([1, 256, 135, 135]),
and the structure of self.blocks is:
    ModuleList(
      (0): Sequential(
        (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
        (1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), bias=False)
        (2): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (3): ReLU()
        (4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (5): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (6): ReLU()
        (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (8): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (9): ReLU()
        (10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (11): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (12): ReLU()
        (13): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (14): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (15): ReLU()
        (16): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (17): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (18): ReLU()
      )
      (1): Sequential(
        (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
        (1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (2): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (3): ReLU()
        (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (5): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (6): ReLU()
        (7): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (8): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (9): ReLU()
        (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (11): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (12): ReLU()
        (13): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (14): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (15): ReLU()
        (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (17): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (18): ReLU()
      )
    )
The structure of self.deblocks is:
    ModuleList(
      (0): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (1): Sequential(
        (0): ConvTranspose2d(256, 256, kernel_size=(2, 2), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
The debug information is as follows:

Is this issue caused by the network parameter settings, or by the change to voxel_size?
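The 136 vs 135 mismatch can be reproduced with shape arithmetic alone from the module printout above: `blocks[1]` halves an odd size of 135 to 68 (rounding up through the padding), and the 2x2 stride-2 `ConvTranspose2d` doubles it back to 136, while `deblocks[0]` keeps 135. This sketch assumes a 108 m grid extent (e.g. a point cloud range of -54 to 54 m, as in CenterPoint's nuScenes configs) and a total backbone stride of 8; both are consistent with 1080 / 8 = 135 but are not confirmed from FutureDet's config. Under those assumptions, the voxel_size change looks like the likely cause: the two upsampled maps only match when the RPN input size is even.

```python
from typing import Tuple


def conv_out(n: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial size after nn.Conv2d (dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1


def deconv_out(n: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial size after nn.ConvTranspose2d (dilation 1, output_padding 0)."""
    return (n - 1) * stride - 2 * padding + kernel


def rpn_upsampled_sizes(h: int) -> Tuple[int, int]:
    """Sizes of the two tensors passed to torch.cat(ups, dim=1) in the RPN above."""
    h0 = conv_out(h + 2, 3, 1)    # blocks[0]: ZeroPad2d(1) + 3x3 stride-1 conv (later convs keep size)
    h1 = conv_out(h0 + 2, 3, 2)   # blocks[1]: ZeroPad2d(1) + 3x3 stride-2 conv
    up0 = conv_out(h0, 1, 1)      # deblocks[0]: 1x1 conv
    up1 = deconv_out(h1, 2, 2)    # deblocks[1]: 2x2 stride-2 transposed conv
    return up0, up1


def bev_feature_size(extent_m: float, voxel_m: float, backbone_stride: int = 8) -> int:
    """RPN input size, assuming a square grid and a stride-8 sparse backbone."""
    return round(extent_m / voxel_m) // backbone_stride


print(rpn_upsampled_sizes(bev_feature_size(108.0, 0.1)))    # (135, 136): torch.cat fails
print(rpn_upsampled_sizes(bev_feature_size(108.0, 0.075)))  # (180, 180): sizes match
```

In other words, with a stride-8 backbone the number of grid cells per side would need to be divisible by 16 for this RPN to concatenate cleanly.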

The number of NMS operations

Hi @neeharperi ,
Thanks for sharing your code.
I have a question about the algorithm. It seems that if we predict N future frames, we need to run NMS N+1 times before back-casting tracking (for example, 7 times if we predict 6 future frames). Is there a way to avoid running NMS so many times?
Thanks again!
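One common way to reduce the number of separate NMS calls (not necessarily what FutureDet does) is batched NMS with a coordinate offset, the same idea `torchvision.ops.batched_nms` uses: shift each timestep's boxes far enough apart that boxes from different timesteps can never overlap, then run a single NMS pass. This sketch uses axis-aligned BEV boxes `(x1, y1, x2, y2)` for simplicity, whereas 3D detectors typically use rotated boxes, so the IoU function here is an assumption. It reduces the number of NMS invocations, not the total amount of suppression work.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # axis-aligned BEV box: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], thresh: float) -> List[int]:
    """Greedy NMS: returns kept indices in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep


def batched_nms(boxes: Sequence[Box], scores: Sequence[float],
                timesteps: Sequence[int], thresh: float) -> List[int]:
    """One NMS pass over all timesteps using the coordinate-offset trick."""
    if not boxes:
        return []
    lo = min(min(b[0], b[1]) for b in boxes)
    hi = max(max(b[2], b[3]) for b in boxes)
    span = hi - lo + 1.0  # larger than the extent of all boxes, so shifted groups are disjoint
    shifted = [tuple(c + t * span for c in b) for b, t in zip(boxes, timesteps)]
    return nms(shifted, scores, thresh)


boxes = [(0.0, 0.0, 2.0, 2.0), (0.1, 0.1, 2.1, 2.1), (0.0, 0.0, 2.0, 2.0)]
scores = [0.9, 0.8, 0.7]
timesteps = [0, 0, 1]  # the third box is the same footprint at the next timestep
print(batched_nms(boxes, scores, timesteps, 0.5))  # [0, 2]
```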

Clues to solve ValueError: res["lidar"]["bev_map"] axes don't match array

Hi, thanks for the great work. I ran into the same issue as issue #2:

  File "~/Repo/FutureDet/det3d/datasets/pipelines/preprocess.py", line 221, in __call__
    res["lidar"]["bev_map"] = bev.transpose(2, 0, 1)
ValueError: axes don't match array

I am using the v1.0-mini nuScenes data, but I don't think that matters.
After debugging, I found that the problem lies in how the BEV map is generated when creating info_xxx.pkl:

FutureDet/det3d/datasets/nuscenes/nusc_common.py, lines 508 to 509 in 960115e:

    ego_map = nusc.get_ego_centric_map(sweeps[0]["sample_data_token"])
    bev = cv2.resize(ego_map, dsize=(180, 180), interpolation=cv2.INTER_CUBIC)

FutureDet/det3d/datasets/nuscenes/nusc_common.py, line 544 in 960115e:

    info["bev"] = bev

The BEV image is loaded by the devkit in your other repository, python-sdk/nuscenes/nuscenes.py#L837-L842.

However, both arrays are 2-D (h, w): for example, ego_map.shape is (900, 900) and bev.shape is (180, 180), so bev.transpose(2, 0, 1) is not a valid operation on them.

I hope this provides some clues toward solving the problem.
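Following that diagnosis, one possible fix (a sketch, not an official patch) is to make the transpose in `preprocess.py` shape-aware: add a channel axis when the map is 2-D, and transpose only when it already has a trailing channel dimension. The helper name `to_chw` is hypothetical.

```python
import numpy as np


def to_chw(bev: np.ndarray) -> np.ndarray:
    """Return a BEV map as (C, H, W), whether it is stored as (H, W) or (H, W, C)."""
    if bev.ndim == 2:
        # single-channel map, as produced by cv2.resize on a 2-D ego map: add a channel axis
        return bev[np.newaxis, ...]
    if bev.ndim == 3:
        # (H, W, C) -> (C, H, W): the case that transpose(2, 0, 1) assumes
        return bev.transpose(2, 0, 1)
    raise ValueError(f"expected a 2-D or 3-D BEV map, got shape {bev.shape}")


# e.g. res["lidar"]["bev_map"] = to_chw(bev)
print(to_chw(np.zeros((180, 180))).shape)  # (1, 180, 180)
```

With a 2-D map this yields a (1, 180, 180) array, so any layer that consumes it directly would need `in_channels=1`, which matches the workaround described in the first issue above.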

The problem with batch size and dataloader_num

Hi, sir. My budget only allows me to rent a single 3090, so to speed up training with your code I increased the batch size to 2 and set the number of dataloader workers to 8. However, training stops at 1925/13725 in the progress bar during the first epoch, with no error reported in the terminal. I then tried to investigate locally in PyCharm with the batch size set to 1 and 8 dataloader workers. This time the error was 'Process finished with exit code 137 (interrupted by signal 9: SIGKILL).' Checking the output logs, I also observed that memory usage increases every few batches. Can this model only be trained with 4 dataloader workers and a batch size of 1? Did you encounter this issue while developing the model?
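Exit code 137 is 128 + 9, i.e. the process was killed by SIGKILL, which on Linux is often the kernel's out-of-memory killer. One known cause of host memory that grows with the number of DataLoader workers (not confirmed for FutureDet) is copy-on-read: forked workers touch the reference counts of the dataset's many small Python objects, such as a list of info dicts loaded from the .pkl, so shared pages gradually get copied into every worker. A common mitigation is to keep the infos in one serialized buffer. The class name `SerializedInfos` is hypothetical; a dataset could hold one of these in place of its plain list of infos.

```python
import pickle
from array import array
from typing import Any, List


class SerializedInfos:
    """Per-sample info dicts stored as one contiguous bytes buffer plus offsets.

    Reading an item only touches two Python objects (the buffer and the offsets
    array), instead of thousands of small dicts and lists whose refcount updates
    would force forked workers to copy the parent's memory pages.
    """

    def __init__(self, infos: List[Any]) -> None:
        blobs = [pickle.dumps(info, protocol=pickle.HIGHEST_PROTOCOL) for info in infos]
        self._offsets = array("q", [0])  # 64-bit start offsets; last entry is the total size
        for blob in blobs:
            self._offsets.append(self._offsets[-1] + len(blob))
        self._buffer = b"".join(blobs)

    def __len__(self) -> int:
        return len(self._offsets) - 1

    def __getitem__(self, idx: int) -> Any:
        start, end = self._offsets[idx], self._offsets[idx + 1]
        return pickle.loads(self._buffer[start:end])  # deserialize on demand
```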

Probability of trajectories

Hi
Thanks for sharing your implementation. I read your paper and explored your code. I understand that the FutureDet network reports all predicted trajectories. Is there a way to rank the trajectories in your implementation, i.e., to tell which trajectory is more likely to happen?
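The output format isn't described in this issue, so the following is a hypothetical post-processing step rather than part of FutureDet: assuming each forecasted trajectory carries a confidence score and an identifier of the current-frame detection it branches from, trajectories can be sorted by score within each origin, with scores normalized to sum to 1 per origin. The normalized values are relative weights, not calibrated probabilities, and `rank_trajectories` is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rank_trajectories(
    trajectories: List[Tuple[str, str, float]],
) -> Dict[str, List[Tuple[str, float]]]:
    """Group (origin_id, trajectory_id, score) tuples by origin, sort by score,
    and normalize scores within each origin so they sum to 1."""
    groups: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for origin, traj_id, score in trajectories:
        groups[origin].append((traj_id, score))
    ranked: Dict[str, List[Tuple[str, float]]] = {}
    for origin, items in groups.items():
        total = sum(score for _, score in items)
        items.sort(key=lambda item: item[1], reverse=True)  # most confident first
        ranked[origin] = [(tid, score / total if total > 0 else 0.0) for tid, score in items]
    return ranked


ranked = rank_trajectories([("car1", "straight", 0.6), ("car1", "turn", 0.2), ("car2", "stop", 0.5)])
print(ranked["car2"])  # [('stop', 1.0)]
```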

How to speed up training?

Hello, thank you for your excellent work. I would like to ask how to speed up training. I have a 12 GB GPU. What are suitable values for samples_per_gpu and the learning rate (lr)? Currently, it takes over 7 days to train a model for the "car" category.
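A commonly used heuristic when changing the batch size is the linear scaling rule: scale the learning rate in proportion to the total batch size (samples_per_gpu times the number of GPUs). It is a starting point for tuning, not a guarantee, and it is not taken from FutureDet's configs; the reference values in the example below are hypothetical.

```python
def linearly_scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling rule: keep lr / total batch size constant."""
    if base_batch <= 0 or new_batch <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch / base_batch


# Hypothetical reference point: lr = 1e-3 at a total batch of 16 (e.g. 4 GPUs x 4 samples).
# On a single 12 GB GPU with samples_per_gpu=4, the total batch is 4.
print(linearly_scaled_lr(1e-3, base_batch=16, new_batch=4))
```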

Problems with training

Thank you so much for your open-source work. Following your instructions, I installed the required libraries and started training, but got the following error:

python train.py --experiment FutureDetection --model forecast_n3dtf
/environment/miniconda3/lib/python3.7/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

FutureWarning,
nuScenes devkit not found!
no apex
No Tensorflow
No APEX!
2022-06-26 15:16:58,755 - INFO - Distributed training: False
2022-06-26 15:16:58,755 - INFO - torch.backends.cudnn.benchmark: False
2022-06-26 15:16:58,839 - INFO - Finish RPN Initialization
2022-06-26 15:16:58,840 - INFO - num_classes: [1]
Use HM Bias: -2.19
2022-06-26 15:16:58,902 - INFO - Finish CenterHead Initialization
Traceback (most recent call last):
File "./tools/train.py", line 140, in <module>
main()
File "./tools/train.py", line 115, in main
datasets = [build_dataset(cfg.data.train)]
File "/home/featurize/work/futuredet/FutureDet/det3d/datasets/builder.py", line 41, in build_dataset
dataset = build_from_cfg(cfg, DATASETS, default_args)
File "/home/featurize/work/futuredet/FutureDet/det3d/utils/registry.py", line 78, in build_from_cfg
return obj_cls(**args)
File "/home/featurize/work/futuredet/FutureDet/det3d/datasets/nuscenes/nuscenes.py", line 528, in __init__
root_path, info_path, pipeline, test_mode=test_mode, class_names=class_names
File "/home/featurize/work/futuredet/FutureDet/det3d/datasets/custom.py", line 37, in __init__
self._set_group_flag()
File "/home/featurize/work/futuredet/FutureDet/det3d/datasets/custom.py", line 164, in _set_group_flag
self.flag = np.ones(len(self), dtype=np.uint8)
File "/home/featurize/work/futuredet/FutureDet/det3d/datasets/nuscenes/nuscenes.py", line 602, in __len__
self.load_infos(self._info_path)
File "/home/featurize/work/futuredet/FutureDet/det3d/datasets/nuscenes/nuscenes.py", line 576, in load_infos
ratios = [frac / v for v in _cls_dist.values()]
File "/home/featurize/work/futuredet/FutureDet/det3d/datasets/nuscenes/nuscenes.py", line 576, in <listcomp>
ratios = [frac / v for v in _cls_dist.values()]
ZeroDivisionError: float division by zero
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 33612) of binary: /environment/miniconda3/bin/python
Traceback (most recent call last):
File "/environment/miniconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/environment/miniconda3/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/environment/miniconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/environment/miniconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/environment/miniconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/environment/miniconda3/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run
)(*cmd_args)
File "/environment/miniconda3/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/environment/miniconda3/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

./tools/train.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2022-06-26_15:17:04
host : featurize
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 33612)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Can you give me some suggestions? Thank you very much!
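The failing line divides by each class's share of the loaded infos, so the ZeroDivisionError means at least one value in `_cls_dist` is 0, i.e. some class in `class_names` never appears in the training infos (for example, if the info .pkl was generated for a different set of classes, or the wrong or an incomplete info file is being loaded). The log also prints "nuScenes devkit not found!", which may be worth ruling out when regenerating the infos. The sketch below assumes `load_infos` follows CenterPoint-style class-balanced resampling and that each info stores its labels under `gt_names`; only the `ratios` line itself is confirmed by the traceback. The hypothetical `class_balance_ratios` fails with a readable message instead of dividing by zero.

```python
from typing import Dict, Iterable, List


def class_balance_ratios(infos: Iterable[dict], class_names: List[str]) -> Dict[str, float]:
    """Per-class resampling ratios, checking first that every class is present."""
    counts = {name: 0 for name in class_names}
    for info in infos:
        for name in set(info["gt_names"]):  # count each sample once per class
            if name in counts:
                counts[name] += 1
    missing = [name for name, c in counts.items() if c == 0]
    if missing:
        raise ValueError(
            f"no samples contain classes {missing}; check class_names "
            "against the gt_names stored in the info .pkl"
        )
    total = sum(counts.values())
    frac = 1.0 / len(class_names)
    return {name: frac / (c / total) for name, c in counts.items()}
```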
