
futuredet's People

Contributors

neeharperi


futuredet's Issues

Using the map pipeline

Hi,
Thank you for sharing your implementation of the FutureDet model. I ran into some issues while using the map pipeline. Initially, I used the "nusc_centerpoint_forecast_n3dtfm_detection.py" configuration file.
The size of the bev tensor was not (1, 180, 180) as expected.
To address this, I made the following changes:
[screenshot of the changes]
After making these changes, the bev tensor still wasn't the desired size, so I also modified the first input of the Conv2d layer from 6 to 1. After that, the size of the bev tensor became (1, 180, 180) as desired.
[screenshot of the modified layer]

I'm unsure if this change is correct and would appreciate your feedback.
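For reference, a minimal sketch of the change described above, assuming the BEV-map branch's first Conv2d simply needs its in_channels reduced from 6 to 1 (the layer name and output size here are illustrative, not the actual FutureDet module names):

    import torch
    import torch.nn as nn

    # The BEV-map branch reportedly expects a 6-channel map, while the generated
    # bev_map has a single channel, so in_channels goes from 6 to 1.
    bev_conv = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1)

    bev_map = torch.zeros(1, 1, 180, 180)  # (batch, channels, H, W) as in the issue
    out = bev_conv(bev_map)                # a 6-channel conv here would raise a channel mismatch
    print(out.shape)                       # torch.Size([1, 32, 180, 180])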

RuntimeError: CUDA out of memory.

It reported the following error:
Traceback (most recent call last):
File "./tools/train.py", line 143, in
main()
File "./tools/train.py", line 138, in main
logger=logger,
File "/home/lu/Workspace/FutureDet/det3d/torchie/apis/train.py", line 358, in train_detector
trainer.run(data_loaders, cfg.workflow, cfg.total_epochs, cfg=cfg, local_rank=cfg.local_rank)
File "/home/lu/Workspace/FutureDet/det3d/torchie/trainer/trainer.py", line 584, in run
epoch_runner(data_loaders[i], self.epoch, **kwargs)
File "/home/lu/Workspace/FutureDet/det3d/torchie/trainer/trainer.py", line 450, in train
self.model, data_batch, train_mode=True, **kwargs
File "/home/lu/Workspace/FutureDet/det3d/torchie/trainer/trainer.py", line 392, in batch_processor_inline
losses = model(example, return_loss=True)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lu/Workspace/FutureDet/det3d/models/detectors/voxelnet.py", line 50, in forward
x, _ = self.extract_feat(data)
File "/home/lu/Workspace/FutureDet/det3d/models/detectors/voxelnet.py", line 29, in extract_feat
x = self.neck(x)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lu/Workspace/FutureDet/det3d/models/necks/rpn.py", line 153, in forward
x = self.blocks[i](x)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lu/Workspace/FutureDet/det3d/models/utils/misc.py", line 95, in forward
input = module(input)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 399, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 396, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.77 GiB total capacity; 1.52 GiB already allocated; 34.38 MiB free; 1.55 GiB reserved in total by PyTorch)
My GPU is an RTX 3070. How can I fix this problem, and where should I adjust the batch size in the code?
[nvidia-smi screenshot]
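For what it's worth, a sketch of where the effective batch size usually lives in det3d-style configs such as FutureDet's (the field names below follow the common det3d/CenterPoint convention and are an assumption; verify them against the actual experiment config):

    # In the experiment config (e.g. nusc_centerpoint_forecast_*.py),
    # det3d-style codebases typically control GPU memory use via samples_per_gpu.
    data = dict(
        samples_per_gpu=1,   # lower this first when hitting CUDA OOM
        workers_per_gpu=4,   # dataloader worker processes
        # train=..., val=..., test=...  (dataset definitions omitted)
    )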

res["lidar"]["bev_map"] : axes dont match array

Hi, thank you so much for contributing such great work.

I followed your installation steps and data-generation steps.

The process seemed to be plain sailing:
[screenshot of the data-preparation output]

However, when I changed the paths to my own and started to run, an error occurred:
[screenshot of the error]

Have you met a similar problem?

AssertionError: Samples in split doesn't match samples in predictions.

Traceback (most recent call last):
File "/home/lu/Workspace/FutureDet/tools/dist_test.py", line 270, in
main()
File "/home/lu/Workspace/FutureDet/tools/dist_test.py", line 260, in main
association_oracle=args.association_oracle, postprocess=args.postprocess, nogroup=args.nogroup)
File "/home/lu/Workspace/FutureDet/det3d/datasets/nuscenes/nuscenes.py", line 838, in evaluation
nogroup=nogroup
File "/home/lu/Workspace/FutureDet/det3d/datasets/nuscenes/nusc_common.py", line 686, in eval_main
nogroup=nogroup
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/nuscenes/eval/detection/evaluate.py", line 225, in init
set(self.gt_boxes.sample_tokens))
AssertionError: Samples in split doesn't match samples in predictions.
As the debug result shows, the length of self.pred_boxes.sample_tokens is 404, which doesn't match the length of self.gt_boxes.sample_tokens, which is 81.

I was using the v1.0-mini dataset, and I've changed the parameters in dist_test.py:

    parser.add_argument("--split", default="mini_val")
    parser.add_argument("--version", default="v1.0-mini")
    parser.add_argument("--modelCheckPoint", default="latest")

This issue is attributed to the inability to access the "val" dataset.
In the v1.0-mini dataset there are 323 training samples and 81 validation samples.
I debugged the following code in dist_test.py:

    if args.testset:
        print("Use Test Set")
        dataset = build_dataset(cfg.data.test)
    else:
        if args.split == "val" or args.split == "mini_val":
            print("Use Val Set")
            dataset = build_dataset(cfg.data.val)
        else:
            print("Use Train Set")
            cfg.data.val.info_path = cfg.data.val.info_path.replace(
                "infos_val_%dsweeps_withvelo_filter_True" % (cfg.data.val.nsweeps),
                "infos_train_%dsweeps_withvelo_filter_True" % (cfg.data.val.nsweeps))
            cfg.data.val.ann_file = cfg.data.val.info_path.replace(
                "infos_val_%dsweeps_withvelo_filter_True" % (cfg.data.val.nsweeps),
                "infos_train_%dsweeps_withvelo_filter_True" % (cfg.data.val.nsweeps))
            dataset = build_dataset(cfg.data.val)

However, the output of dataset.flag.shape is still 323 when I set '--split' to "mini_val":
[error screenshot 1]
[error screenshot 2]
When '--split' is set to "mini_train", it gives the same output, dataset.flag.shape = 323.
The result of dataset.flag.shape is supposed to be 81.
It seems that cfg.data.val.info_path and cfg.data.val.ann_file don't take effect.
I can't figure out what could cause the problem; I'd appreciate any help.
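As a debugging aid, a minimal sketch (the info file name and structure below are assumptions based on the naming pattern in the snippet above) to check which info file the val dataset is actually built from:

    import pickle

    # Hypothetical path following the naming pattern above; substitute your own.
    info_path = "data/nuScenes/infos_val_10sweeps_withvelo_filter_True.pkl"
    with open(info_path, "rb") as f:
        infos = pickle.load(f)

    # For v1.0-mini, the val infos should cover 81 samples; a length of 323
    # would mean the train infos were loaded instead.
    print(len(infos))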

RuntimeError: Sizes of tensors must match except in dimension 2. Got 136 and 135 (The offending index is 0)

I met this problem when I ran the command python train.py --experiment FutureDetection --model forecast_n0.
I've changed the voxel_size to [0.2, 0.2] in line 101, and [0.1, 0.1, 0.2] in line 162.
The detailed traceback is as follows:
Traceback (most recent call last):
File "./tools/train.py", line 143, in
main()
File "./tools/train.py", line 138, in main
logger=logger,
File "/home/lu/Workspace/FutureDet/det3d/torchie/apis/train.py", line 358, in train_detector
trainer.run(data_loaders, cfg.workflow, cfg.total_epochs, cfg=cfg, local_rank=cfg.local_rank)
File "/home/lu/Workspace/FutureDet/det3d/torchie/trainer/trainer.py", line 584, in run
epoch_runner(data_loaders[i], self.epoch, **kwargs)
File "/home/lu/Workspace/FutureDet/det3d/torchie/trainer/trainer.py", line 450, in train
self.model, data_batch, train_mode=True, **kwargs
File "/home/lu/Workspace/FutureDet/det3d/torchie/trainer/trainer.py", line 392, in batch_processor_inline
losses = model(example, return_loss=True)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lu/Workspace/FutureDet/det3d/models/detectors/voxelnet.py", line 50, in forward
x, _ = self.extract_feat(data)
File "/home/lu/Workspace/FutureDet/det3d/models/detectors/voxelnet.py", line 29, in extract_feat
x = self.neck(x)
File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lu/Workspace/FutureDet/det3d/models/necks/rpn.py", line 157, in forward
x = torch.cat(ups, dim=1)
RuntimeError: Sizes of tensors must match except in dimension 2. Got 136 and 135 (The offending index is 0)

It happens in the forward function in rpn.py:

    def forward(self, x):
        ups = []
        for i in range(len(self.blocks)):
            x = self.blocks[i](x)
            if i - self._upsample_start_idx >= 0:
                ups.append(self.deblocks[i - self._upsample_start_idx](x))
        if len(ups) > 0:
            x = torch.cat(ups, dim=1)
        return x
Here x.shape is torch.Size([1, 256, 135, 135]),
and the structure of the self.blocks network is:

    ModuleList(
      (0): Sequential(
        (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
        (1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), bias=False)
        (2): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (3): ReLU()
        (4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (5): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (6): ReLU()
        (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (8): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (9): ReLU()
        (10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (11): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (12): ReLU()
        (13): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (14): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (15): ReLU()
        (16): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (17): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (18): ReLU()
      )
      (1): Sequential(
        (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
        (1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (2): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (3): ReLU()
        (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (5): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (6): ReLU()
        (7): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (8): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (9): ReLU()
        (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (11): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (12): ReLU()
        (13): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (14): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (15): ReLU()
        (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (17): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (18): ReLU()
      )
    )
The structure of the self.deblocks network is:

    ModuleList(
      (0): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (1): Sequential(
        (0): ConvTranspose2d(256, 256, kernel_size=(2, 2), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
The debug information is as follows:
[debug screenshot]
Is this issue caused by the network parameter settings, or by the change of voxel_size?
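For what it's worth, the 136 vs. 135 mismatch is consistent with an odd BEV grid size after the voxel_size change: the stride-2 block maps 135 down to 68, and the stride-2 ConvTranspose2d maps 68 back up to 136, so the two upsampled branches differ by one pixel. A minimal reproduction using the shapes from the reprs above:

    import torch
    import torch.nn as nn

    x = torch.zeros(1, 128, 135, 135)  # output of self.blocks[0]: H = W = 135 (odd)

    # Head of self.blocks[1] and self.deblocks[1], per the reprs above.
    down = nn.Sequential(nn.ZeroPad2d(1),
                         nn.Conv2d(128, 256, 3, stride=2, bias=False))
    up = nn.ConvTranspose2d(256, 256, 2, stride=2, bias=False)

    y = down(x)              # (1, 256, 68, 68): (135 + 2 - 3) // 2 + 1 = 68
    z = up(y)                # (1, 256, 136, 136): 68 * 2 = 136, not 135
    print(y.shape, z.shape)  # torch.cat with the 135-sized branch then fails

So the likely culprit is a voxel_size / point-cloud-range combination that yields an odd feature-map size; picking values that keep the grid even through every stride-2 stage would avoid the off-by-one (this is an inference from the shapes, not a confirmed fix).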

The number of NMS

Hi @neeharperi ,
Thanks for sharing your code.
I have a question about the algorithm. It seems that we need to run NMS N+1 times if we predict N future frames (for example, 7 times if we predict 6 future frames) before the back-casting tracking. I'd like to know if there is some method so that we do not need to run NMS so many times.
Thanks again!

Clues to solve ValueError: res["lidar"]["bev_map"] axes don't match array

Hi, thanks for the great work. I have met the same issue as issue #2, as follows:

  File "~/Repo/FutureDet/det3d/datasets/pipelines/preprocess.py", line 221, in __call__
    res["lidar"]["bev_map"] = bev.transpose(2, 0, 1)
ValueError: axes don't match array

Although I am using the v1.0-mini nuScenes data, I don't think that matters.
After some debugging, I found that the problem lies in the BEV-map step of generating info_xxx.pkl, as follows:

    ego_map = nusc.get_ego_centric_map(sweeps[0]["sample_data_token"])
    bev = cv2.resize(ego_map, dsize=(180, 180), interpolation=cv2.INTER_CUBIC)

The BEV image is loaded from the SDK in your other repo, python-sdk/nuscenes/nuscenes.py#L837-L842.

But its dimensions are (h, w), e.g. ego_map.shape: (900, 900) and bev.shape: (180, 180), so bev.transpose(2, 0, 1) is the wrong operation for this array.

Hope that can provide clues to solve the problem.
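One possible workaround, as a minimal sketch (this is a guess based on the shapes above, not the authors' fix): give the 2-D map an explicit channel axis instead of transposing it.

    import numpy as np

    bev = np.zeros((180, 180))  # 2-D (h, w) map, as produced by cv2.resize above

    # bev.transpose(2, 0, 1) assumes an (h, w, c) array; for a 2-D map, add the
    # channel axis explicitly so downstream code sees (c, h, w).
    if bev.ndim == 2:
        bev_map = bev[None, :, :]         # (1, 180, 180)
    else:
        bev_map = bev.transpose(2, 0, 1)  # (c, 180, 180)
    print(bev_map.shape)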

The problem about batchsize and dataloader_num

Hi, sir. My budget allows me to rent only one 3090, so I increased the batch size to 2 and set the number of dataloader workers to 8 to accelerate training with your code. However, I encountered an issue: training stops at 1925/13725 in the progress bar during the first epoch, but no error is reported in the terminal. Subsequently, I tried to investigate the issue locally in PyCharm with the batch size set to 1 and the dataloader workers set to 8. This time, the error reported was 'Process finished with exit code 137 (interrupted by signal 9: SIGKILL).' Additionally, upon checking the output logs, I observed that memory usage increases after every few batches. Can the number of dataloader workers for this model only be set to 4, and the batch size only to 1? Did you encounter this issue during the development of the model?

Probability of trajectories

Hi
Thanks for sharing your implementation. I read your article and explored the code. I understand that the FutureDet network reports all predicted trajectories. Is there a method for sorting trajectories in your implementation, i.e., one that can tell me which trajectory is more likely to happen?
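(Not from the authors, just a generic approach.) Since each forecast is seeded by a detection with a confidence score, one simple way to rank trajectories is by that score; a minimal sketch, where the 'score' field name is an assumption:

    # Rank predicted trajectories by the confidence of their seed detection.
    # The "score" key is hypothetical; use whatever field the outputs carry.
    trajectories = [
        {"score": 0.91, "boxes": "..."},
        {"score": 0.34, "boxes": "..."},
    ]
    ranked = sorted(trajectories, key=lambda t: t["score"], reverse=True)
    print(ranked[0]["score"])  # most likely trajectory first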

How to speed up training

Hello, thank you for your excellent work. I would like to ask how to speed up training. I have a 12 GB GPU. What are suitable values for samples_per_gpu and the learning rate (lr)? Currently, it takes over 7 days to train a model for the "car" category.
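Not FutureDet-specific advice, but a common convention when raising samples_per_gpu is the linear-scaling heuristic for the learning rate. A sketch, where the base values are assumptions to illustrate the arithmetic:

    # Linear-scaling heuristic: scale lr proportionally to the total batch size.
    # base_lr and base_batch are hypothetical; read the defaults from the config.
    base_lr = 1e-4    # lr tuned for the base batch size
    base_batch = 4    # total batch size (samples_per_gpu * num_gpus) for base_lr

    new_batch = 8
    new_lr = base_lr * new_batch / base_batch
    print(new_lr)     # 0.0002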

Problems with training

Thank you so much for your good open-source work. Following your process, I installed the relevant libraries and started training, but the error reported is as follows:

python train.py --experiment FutureDetection --model forecast_n3dtf
/environment/miniconda3/lib/python3.7/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

FutureWarning,
nuScenes devkit not found!
no apex
No Tensorflow
No APEX!
2022-06-26 15:16:58,755 - INFO - Distributed training: False
2022-06-26 15:16:58,755 - INFO - torch.backends.cudnn.benchmark: False
2022-06-26 15:16:58,839 - INFO - Finish RPN Initialization
2022-06-26 15:16:58,840 - INFO - num_classes: [1]
Use HM Bias: -2.19
2022-06-26 15:16:58,902 - INFO - Finish CenterHead Initialization
Traceback (most recent call last):
File "./tools/train.py", line 140, in
main()
File "./tools/train.py", line 115, in main
datasets = [build_dataset(cfg.data.train)]
File "/home/featurize/work/futuredet/FutureDet/det3d/datasets/builder.py", line 41, in build_dataset
dataset = build_from_cfg(cfg, DATASETS, default_args)
File "/home/featurize/work/futuredet/FutureDet/det3d/utils/registry.py", line 78, in build_from_cfg
return obj_cls(**args)
File "/home/featurize/work/futuredet/FutureDet/det3d/datasets/nuscenes/nuscenes.py", line 528, in init
root_path, info_path, pipeline, test_mode=test_mode, class_names=class_names
File "/home/featurize/work/futuredet/FutureDet/det3d/datasets/custom.py", line 37, in init
self._set_group_flag()
File "/home/featurize/work/futuredet/FutureDet/det3d/datasets/custom.py", line 164, in _set_group_flag
self.flag = np.ones(len(self), dtype=np.uint8)
File "/home/featurize/work/futuredet/FutureDet/det3d/datasets/nuscenes/nuscenes.py", line 602, in len
self.load_infos(self._info_path)
File "/home/featurize/work/futuredet/FutureDet/det3d/datasets/nuscenes/nuscenes.py", line 576, in load_infos
ratios = [frac / v for v in _cls_dist.values()]
File "/home/featurize/work/futuredet/FutureDet/det3d/datasets/nuscenes/nuscenes.py", line 576, in
ratios = [frac / v for v in _cls_dist.values()]
ZeroDivisionError: float division by zero
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 33612) of binary: /environment/miniconda3/bin/python
Traceback (most recent call last):
File "/environment/miniconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/environment/miniconda3/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/environment/miniconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in
main()
File "/environment/miniconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/environment/miniconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/environment/miniconda3/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run
)(*cmd_args)
File "/environment/miniconda3/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/environment/miniconda3/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

./tools/train.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2022-06-26_15:17:04
host : featurize
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 33612)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Can you give me some suggestions? Thank you very much!
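The ZeroDivisionError comes from ratios = [frac / v for v in _cls_dist.values()] dividing by a class count of zero. As a small diagnostic sketch (names mirror the traceback; the surrounding logic is an assumption, not the actual FutureDet code):

    # _cls_dist maps class names to sample counts; a zero count (e.g. a class
    # that never appears in the generated info file) triggers the division error.
    _cls_dist = {"car": 12345, "pedestrian": 0}  # hypothetical distribution

    empty = [name for name, count in _cls_dist.items() if count == 0]
    if empty:
        print("Classes with zero samples in the infos:", empty)
        # Regenerating the info files, or checking that data preparation found
        # annotations for these classes, is the likely direction for a fix.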

Test split

Hi
Thank you for sharing the source code.
I tried to get results on the test split, but I could not. Is there any straightforward solution?

Machine properties

Hi
Thank you for sharing your implementation.
I have a problem with the "Prepare Data for Training and Evaluation" step: I get an out-of-memory error. Can you share your machine's specifications?
