neeharperi / futuredet
Forecasting from LiDAR via Future Object Detection. CVPR '22
Hi,
Thank you for sharing your implementation of the FutureDet model. I ran into some issues while using the Map pipeline. Initially, I used the "nusc_centerpoint_forecast_n3dtfm_detection.py" configuration file.
The size of the bev tensor was not (1, 180, 180) as expected.
To address this, I made the following changes:
After making these changes, the size of the bev tensor became (1, 180, 180) as desired.
After the previous changes, the bev tensor still wasn't the desired size, so I changed the first argument (the number of input channels) of the Conv2d layer from 6 to 1.
I'm unsure if this change is correct and would appreciate your feedback.
It reported the following error:
Traceback (most recent call last):
File "./tools/train.py", line 143, in <module>
main()
File "./tools/train.py", line 138, in main
logger=logger,
File "/home/lu/Workspace/FutureDet/det3d/torchie/apis/train.py", line 358, in train_detector
trainer.run(data_loaders, cfg.workflow, cfg.total_epochs, cfg=cfg, local_rank=cfg.local_rank)
File "/home/lu/Workspace/FutureDet/det3d/torchie/trainer/trainer.py", line 584, in run
epoch_runner(data_loaders[i], self.epoch, **kwargs)
File "/home/lu/Workspace/FutureDet/det3d/torchie/trainer/trainer.py", line 450, in train
self.model, data_batch, train_mode=True, **kwargs
File "/home/lu/Workspace/FutureDet/det3d/torchie/trainer/trainer.py", line 392, in batch_processor_inline
losses = model(example, return_loss=True)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lu/Workspace/FutureDet/det3d/models/detectors/voxelnet.py", line 50, in forward
x, _ = self.extract_feat(data)
File "/home/lu/Workspace/FutureDet/det3d/models/detectors/voxelnet.py", line 29, in extract_feat
x = self.neck(x)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lu/Workspace/FutureDet/det3d/models/necks/rpn.py", line 153, in forward
x = self.blocks[i](x)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lu/Workspace/FutureDet/det3d/models/utils/misc.py", line 95, in forward
input = module(input)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 399, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 396, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.77 GiB total capacity; 1.52 GiB already allocated; 34.38 MiB free; 1.55 GiB reserved in total by PyTorch)
My GPU is an RTX 3070. How can I fix this problem, or where in the code should I adjust the batch size?
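The figures in the error message above can be read as a rough memory budget. The sketch below is plain Python; the helper name `parse_oom_message` is made up for illustration and is not part of FutureDet or PyTorch. It pulls the sizes out of the quoted message and computes how much of the card is neither free nor reserved by this PyTorch process. The values in the message are rounded, so the result is approximate.

```python
import re

# Conversion factors to MiB for the units PyTorch prints.
_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom_message(msg):
    """Return the sizes quoted in a 'CUDA out of memory' message, in MiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, msg)
        if match is None:
            raise ValueError(f"could not find '{name}' in message")
        value, unit = match.groups()
        sizes[name] = float(value) * _TO_MIB[unit]
    return sizes


msg = ("RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB "
       "(GPU 0; 7.77 GiB total capacity; 1.52 GiB already allocated; "
       "34.38 MiB free; 1.55 GiB reserved in total by PyTorch)")

sizes = parse_oom_message(msg)
# Memory on the card that is neither free nor reserved by this process.
unaccounted = sizes["total"] - sizes["reserved"] - sizes["free"]
print(f"unaccounted for: {unaccounted / 1024:.2f} GiB")
```

With these numbers, most of the 7.77 GiB is neither free nor reserved by the training process. That would be consistent with other processes (or a leftover earlier run) holding memory on the same GPU, but this is an inference from the arithmetic, not something the traceback itself states.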
Excuse me, I can't find the JSON file "metrics_summary.json" when I evaluate the det3d model.
I would appreciate it if you could provide this file as well!
Thanks!!
Traceback (most recent call last):
File "/home/lu/Workspace/FutureDet/tools/dist_test.py", line 270, in <module>
main()
File "/home/lu/Workspace/FutureDet/tools/dist_test.py", line 260, in main
association_oracle=args.association_oracle, postprocess=args.postprocess, nogroup=args.nogroup)
File "/home/lu/Workspace/FutureDet/det3d/datasets/nuscenes/nuscenes.py", line 838, in evaluation
nogroup=nogroup
File "/home/lu/Workspace/FutureDet/det3d/datasets/nuscenes/nusc_common.py", line 686, in eval_main
nogroup=nogroup
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/nuscenes/eval/detection/evaluate.py", line 225, in __init__
set(self.gt_boxes.sample_tokens))
AssertionError: Samples in split doesn't match samples in predictions.
As the debug output shows, the length of self.pred_boxes.sample_tokens is 404, which doesn't match the length of self.gt_boxes.sample_tokens, which is 81.
I was using the v1.0-mini dataset, and I changed these parameters in dist_test.py:
parser.add_argument("--split", default="mini_val")
parser.add_argument("--version", default="v1.0-mini")
parser.add_argument("--modelCheckPoint", default="latest")
I attribute this issue to the "val" dataset not being accessed.
The v1.0-mini dataset has 323 training samples and 81 test samples.
I debugged the following code:
if args.testset:
    print("Use Test Set")
    dataset = build_dataset(cfg.data.test)
else:
    if args.split == "val" or args.split == "mini_val":
        print("Use Val Set")
        dataset = build_dataset(cfg.data.val)
    else:
        print("Use Train Set")
        cfg.data.val.info_path = cfg.data.val.info_path.replace("infos_val_%dsweeps_withvelo_filter_True"%(cfg.data.val.nsweeps), "infos_train_%dsweeps_withvelo_filter_True"%(cfg.data.val.nsweeps))
        cfg.data.val.ann_file = cfg.data.val.info_path.replace("infos_val_%dsweeps_withvelo_filter_True"%(cfg.data.val.nsweeps), "infos_train_%dsweeps_withvelo_filter_True"%(cfg.data.val.nsweeps))
        dataset = build_dataset(cfg.data.val)
However, dataset.flag.shape is still 323 when I set '--split' to "mini_val".
When I set '--split' to "mini_train", the output is the same: dataset.flag.shape = 323.
dataset.flag.shape is supposed to be 81.
It seems that cfg.data.val.info_path and cfg.data.val.ann_file have no effect.
I can't figure out what causes the problem, and I would be very grateful for any help.
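The failing assertion compares the set of sample tokens in the predictions with the set in the ground truth, so a set difference shows which side has extras. The sketch below uses made-up tokens (real nuScenes tokens are opaque strings) and a hypothetical helper `compare_token_sets`; only the counts 323, 81, and 404 come from the post. It illustrates one possible reading of those counts, namely that predictions were produced for both splits, since 323 + 81 = 404. That reading is an assumption, not a confirmed diagnosis.

```python
def compare_token_sets(pred_tokens, gt_tokens):
    """Summarise how two collections of sample tokens differ."""
    pred, gt = set(pred_tokens), set(gt_tokens)
    return {
        "only_in_pred": sorted(pred - gt),
        "only_in_gt": sorted(gt - pred),
        "shared": len(pred & gt),
    }


# Toy stand-ins for real nuScenes sample tokens (hypothetical values).
train_tokens = [f"train_{i:03d}" for i in range(323)]
val_tokens = [f"val_{i:03d}" for i in range(81)]

# Suppose predictions were produced for every sample of both splits.
pred_tokens = train_tokens + val_tokens

report = compare_token_sets(pred_tokens, val_tokens)
print(len(pred_tokens), len(report["only_in_pred"]), report["shared"])
```

Running the same comparison on the real `self.pred_boxes.sample_tokens` and `self.gt_boxes.sample_tokens` would show whether the extra tokens belong to the training split.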
I hit this problem when running the command python train.py --experiment FutureDetection --model forecast_n0.
I changed the voxel_size to [0.2, 0.2] in line 101, and to [0.1, 0.1, 0.2] in line 162.
The detailed traceback is as follows:
Traceback (most recent call last):
File "./tools/train.py", line 143, in <module>
main()
File "./tools/train.py", line 138, in main
logger=logger,
File "/home/lu/Workspace/FutureDet/det3d/torchie/apis/train.py", line 358, in train_detector
trainer.run(data_loaders, cfg.workflow, cfg.total_epochs, cfg=cfg, local_rank=cfg.local_rank)
File "/home/lu/Workspace/FutureDet/det3d/torchie/trainer/trainer.py", line 584, in run
epoch_runner(data_loaders[i], self.epoch, **kwargs)
File "/home/lu/Workspace/FutureDet/det3d/torchie/trainer/trainer.py", line 450, in train
self.model, data_batch, train_mode=True, **kwargs
File "/home/lu/Workspace/FutureDet/det3d/torchie/trainer/trainer.py", line 392, in batch_processor_inline
losses = model(example, return_loss=True)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lu/Workspace/FutureDet/det3d/models/detectors/voxelnet.py", line 50, in forward
x, _ = self.extract_feat(data)
File "/home/lu/Workspace/FutureDet/det3d/models/detectors/voxelnet.py", line 29, in extract_feat
x = self.neck(x)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lu/Workspace/FutureDet/det3d/models/necks/rpn.py", line 157, in forward
x = torch.cat(ups, dim=1)
RuntimeError: Sizes of tensors must match except in dimension 2. Got 136 and 135 (The offending index is 0)
It occurs in the forward function in rpn.py:
def forward(self, x):
    ups = []
    for i in range(len(self.blocks)):
        x = self.blocks[i](x)
        if i - self._upsample_start_idx >= 0:
            ups.append(self.deblocks[i - self._upsample_start_idx](x))
    if len(ups) > 0:
        x = torch.cat(ups, dim=1)
    return x
The x.shape is torch.Size([1, 256, 135, 135]),
and the structure of the self.blocks network is
ModuleList(
  (0): Sequential(
    (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
    (1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (2): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
    (3): ReLU()
    (4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (5): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
    (6): ReLU()
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (8): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
    (9): ReLU()
    (10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (11): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
    (12): ReLU()
    (13): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (14): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
    (15): ReLU()
    (16): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (17): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
    (18): ReLU()
  )
  (1): Sequential(
    (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
    (1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), bias=False)
    (2): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
    (3): ReLU()
    (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (5): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
    (6): ReLU()
    (7): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (8): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
    (9): ReLU()
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (11): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
    (12): ReLU()
    (13): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (14): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
    (15): ReLU()
    (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (17): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
    (18): ReLU()
  )
)
The structure of the self.deblocks network is
ModuleList(
  (0): Sequential(
    (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (1): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (1): Sequential(
    (0): ConvTranspose2d(256, 256, kernel_size=(2, 2), stride=(2, 2), bias=False)
    (1): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
    (2): ReLU()
  )
)
The debug information is as follows.
Is this issue caused by the network parameter settings, or by the change of voxel_size?
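The layer parameters printed above are enough to trace the spatial sizes by hand. The sketch below applies the standard output-size formulas for Conv2d and ConvTranspose2d (dilation 1, no output padding) to those layers; the helper names are made up for illustration. It compares the reported 135-wide input with a 180-wide one, the size mentioned elsewhere on this page for the BEV tensor.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a Conv2d along one spatial axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride):
    """Output length of a ConvTranspose2d along one axis (no padding)."""
    return (size - 1) * stride + kernel


def neck_branch_sizes(size):
    """Spatial size of each upsampled branch for the printed layers."""
    # Block 0: ZeroPad2d(1) + 3x3 conv, stride 1; the remaining 3x3 convs
    # use padding=1 and keep the size unchanged.
    b0 = conv_out(size, kernel=3, stride=1, padding=1)
    up0 = conv_out(b0, kernel=1)               # deblock 0: 1x1 conv
    # Block 1: ZeroPad2d(1) + 3x3 conv, stride 2.
    b1 = conv_out(b0, kernel=3, stride=2, padding=1)
    up1 = deconv_out(b1, kernel=2, stride=2)   # deblock 1: 2x2 transposed conv
    return up0, up1


for size in (135, 180):
    print(size, neck_branch_sizes(size))
```

For an input of 135 the two branches come out as 135 and 136, matching the sizes in the error, while for 180 both are 180. With this two-block layout, an odd input side makes the stride-2 branch one cell larger after upsampling, and an even side avoids that. Whether the 135 follows from the voxel_size change depends on the configured point-cloud range, which is not shown here.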
Hi @neeharperi ,
Thanks for sharing your code.
I have a question about the algorithm. It seems that when predicting N frames we need to run NMS N+1 times before the back-casting tracking (for example, 7 times if we predict 6 future frames). Is there a method that avoids running NMS so many times?
Thanks again!
Hi, thanks for the great work. I hit the same problem as issue #2, as follows:
File "~/Repo/FutureDet/det3d/datasets/pipelines/preprocess.py", line 221, in __call__
res["lidar"]["bev_map"] = bev.transpose(2, 0, 1)
ValueError: axes don't match array
Although I use the v1.0-mini nuScenes data, I don't think it matters.
After debugging, I found that the problem lies in the BEV processing step that generates info_xxx.pkl, as follows:
FutureDet/det3d/datasets/nuscenes/nusc_common.py
Lines 508 to 509 in 960115e
The BEV image is loaded from the SDK in your other repo, python-sdk/nuscenes/nuscenes.py#L837-L842.
But its dimensions are (h, w), e.g. ego_map.shape: (900, 900) and bev.shape: (180, 180), so bev.transpose(2, 0, 1) is the wrong operation for it.
I hope this provides a clue for solving the problem.
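The ValueError follows directly from the shapes: a three-axis transpose cannot be applied to a two-axis array. One way to handle both layouts is sketched below with NumPy. The helper `bev_to_chw` is hypothetical and not part of FutureDet, and whether the downstream network expects 1 or 6 channels is not established in this thread.

```python
import numpy as np


def bev_to_chw(bev):
    """Return a BEV map in (channels, height, width) layout."""
    bev = np.asarray(bev)
    if bev.ndim == 2:
        # Single-channel (H, W) map: add a leading channel axis.
        return bev[np.newaxis, ...]
    if bev.ndim == 3:
        # Multi-channel (H, W, C) map: move channels to the front.
        return bev.transpose(2, 0, 1)
    raise ValueError(f"unexpected BEV shape {bev.shape}")


print(bev_to_chw(np.zeros((180, 180))).shape)     # (1, 180, 180)
print(bev_to_chw(np.zeros((180, 180, 6))).shape)  # (6, 180, 180)
```

A (180, 180) map comes out as (1, 180, 180), the shape the first issue on this page describes as expected.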
Hi, sir. My budget allows me to rent only one 3090, so I increased the batch size to 2 and set the dataloader workers to 8 to speed up training with your code. However, I ran into a problem: training stops at 1925/13725 in the progress bar during the first epoch, with no error reported in the terminal. I then tried to investigate locally in PyCharm with the batch size set to 1 and the dataloader workers set to 8. This time the reported error was 'Process finished with exit code 137 (interrupted by signal 9: SIGKILL).' In the output logs, I also observed that memory usage increases every few batches. Can this model only run with 4 dataloader workers and a batch size of 1? Did you encounter this issue while developing the model?
Hi
Thanks for sharing your implementation. I read your article and explored your implementation. I understand that the FutureDet network reports all predicted trajectories. Is there a method for sorting trajectories in your implementation? Is there a method that can tell me which trajectory is more likely to happen?
Hello, thank you for your excellent work. I would like to ask how to speed up training. I have a 12GB GPU. What are the suitable values for samples_per_gpu and learning rate (lr)? Currently, it takes over 7 days to train a model for the "car" category.
Thank you so much for your good open-source work. According to your process, I installed the relevant libraries and trained, but the error is reported as follows:
python train.py --experiment FutureDetection --model forecast_n3dtf
/environment/miniconda3/lib/python3.7/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
Can you give me some suggestions? Thank you very much!
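The quoted text is a FutureWarning, and the excerpt shows no error after it, so the actual failure (if any) is not visible here. For the change the warning itself suggests, a minimal sketch is below: the script accepts `--local_rank` when it is passed and otherwise falls back to the `LOCAL_RANK` environment variable. The function name `parse_local_rank` is made up for illustration and is not how FutureDet's train.py is written.

```python
import argparse
import os


def parse_local_rank(argv=None, environ=None):
    """Read the local rank from --local_rank or the LOCAL_RANK variable."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # torch.distributed.launch passes --local_rank; torchrun sets LOCAL_RANK.
    parser.add_argument(
        "--local_rank",
        type=int,
        default=int(environ.get("LOCAL_RANK", 0)),
    )
    # Ignore any other arguments the real script may define.
    args, _unknown = parser.parse_known_args(argv)
    return args.local_rank


print(parse_local_rank(["--local_rank", "2"], {}))   # 2
print(parse_local_rank([], {"LOCAL_RANK": "1"}))     # 1
print(parse_local_rank([], {}))                      # 0
```

An explicit command-line value takes precedence over the environment variable, so the same script works under either launcher.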
Hi, nice work. Can you tell me when you will update the documentation on how to run this code?
Hi
Thank you for sharing the source code.
I tried to get results on the test split, but I could not. Is there a straightforward solution?
Hi
Thank you for sharing your implementation.
I have a problem with the "Prepare Data for Training and Evaluation" step: I get an out-of-memory error. Can you share your machine specifications?