megvii-basedetection / bevdepth
Official code for BEVDepth.
License: MIT License
Dear authors,
Thank you for your excellent work! I have some questions about the ConvNeXt model. You provided very helpful information in issue #20, and I have a few follow-up questions.
Thank you again for your great contribution to this area!
Hello, thanks for sharing your code.
I see that you conduct data augmentation by randomly sampling time intervals from previous frames.
I wonder whether you could explain some details about it, such as the longest time interval allowed, whether key frames or sweeps are sampled, and how the sampling interval is set at inference time (a generic sketch of such sampling follows below).
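To make the question concrete, here is one generic way such sampling could be implemented (a purely hypothetical sketch, not this repo's actual code):

```python
import random

# Hypothetical helper: randomize the temporal gap (in sweeps) to the previous
# frame during training, and use a fixed gap at inference time.
def sample_prev_sweep_idx(num_sweeps_available: int, max_interval: int,
                          training: bool = True) -> int:
    upper = min(max_interval, num_sweeps_available - 1)
    if upper < 1:
        return 0  # no previous sweep available
    if training:
        return random.randint(1, upper)  # random gap as augmentation
    return upper                         # fixed gap for inference
```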
Which version of ConvNeXt does BEVDepth-pure use, and is it pretrained on ImageNet or on other datasets?
Compared with the ResNet-50 model, do you only use ResNet-101 as the image backbone and keep the other configs (e.g. BEV resolution) unchanged, or do you change other configs as well?
How to perform validation at each training iteration?
Thank you.
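For reference, PyTorch Lightning (which the experiment scripts here build on) can validate more often than once per epoch via Trainer flags; a minimal sketch, assuming a standard Trainer setup rather than this repo's exact scripts:

```python
import pytorch_lightning as pl

# Sketch only: `val_check_interval` accepts a fraction of an epoch or an int
# number of training batches; `check_val_every_n_epoch` controls epoch-level
# frequency instead.
trainer = pl.Trainer(
    val_check_interval=0.25,  # run validation 4 times per training epoch
)
```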
How to resume from the checkpoint if the training is interrupted?
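For reference, resuming in PyTorch Lightning usually looks like the sketch below (the checkpoint path is hypothetical; older Lightning versions used Trainer(resume_from_checkpoint=...) instead of the ckpt_path argument):

```python
import pytorch_lightning as pl

# Sketch only: pass the saved checkpoint path to `fit` to continue training.
# `model` stands for your LightningModule instance.
trainer = pl.Trainer()
# trainer.fit(model, ckpt_path="outputs/exp_name/checkpoints/last.ckpt")
```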
```python
'img_backbone_conf': dict(
    type='RegNet',
    arch='regnetx_1.6gf',
    frozen_stages=0,
    out_indices=[0, 1, 2, 3],
    norm_eval=False,
    init_cfg=dict(type='Pretrained',
                  checkpoint='open-mmlab://regnetx_1.6gf'),
),
'img_neck_conf': dict(
    type='SECONDFPN',
    in_channels=[72, 168, 408, 912],  # [256, 512, 1024, 2048]
    upsample_strides=[0.25, 0.5, 1, 2],
    out_channels=[128, 128, 128, 128],
),
```
Hi,
Thanks for sharing your code.
When I run the code, I met the error below. Can you help me?
My environment is:
cuda==11.2
pytorch==1.9.0
File "/home/CN/zizhang.wu/anaconda3/envs/bevdepth_wxq/lib/python3.7/site-packages/mmcv/cnn/bricks/conv.py", line 42, in build_conv_layer
layer = conv_layer(*args, **kwargs, **cfg_)
File "/home/CN/zizhang.wu/anaconda3/envs/bevdepth_wxq/lib/python3.7/site-packages/mmcv/ops/deform_conv.py", line 347, in __init__
super(DeformConv2dPack, self).__init__(*args, **kwargs)
File "/home/CN/zizhang.wu/anaconda3/envs/bevdepth_wxq/lib/python3.7/site-packages/mmcv/utils/misc.py", line 330, in new_func
output = old_func(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'im2col_step'
I'm just struggling to install pytorch, torchvision, cudatoolkit, and so on. Please help indicate the versions.
Hello, upon training a new model and running evaluation using the nuScenes trainval split, I run into this error:
```
Traceback (most recent call last):
  File "./exps/bev_depth_lss_r50_256x704_128x128_20e_cbgs_2key_da.py", line 156, in <module>
    run_cli()
  File "./exps/bev_depth_lss_r50_256x704_128x128_20e_cbgs_2key_da.py", line 152, in run_cli
    main(args)
  File "./exps/bev_depth_lss_r50_256x704_128x128_20e_cbgs_2key_da.py", line 120, in main
    trainer.test(model, ckpt_path=args.ckpt_path)
  File "/data/bny220000/projects/bevdepth/venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 911, in test
    return self._call_and_handle_interrupt(self._test_impl, model, dataloaders, ckpt_path, verbose, datamodule)
  File "/data/bny220000/projects/bevdepth/venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 698, in _call_and_handle_interrupt
    self.training_type_plugin.reconciliate_processes(traceback.format_exc())
  File "/data/bny220000/projects/bevdepth/venv/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 533, in reconciliate_processes
    raise DeadlockDetectedException(f"DeadLock detected from rank: {self.global_rank} \n {trace}")
pytorch_lightning.utilities.exceptions.DeadlockDetectedException: DeadLock detected from rank: 0
Traceback (most recent call last):
  File "/data/bny220000/projects/bevdepth/venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/data/bny220000/projects/bevdepth/venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 954, in _test_impl
    results = self._run(model, ckpt_path=self.tested_ckpt_path)
  File "/data/bny220000/projects/bevdepth/venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
    self._dispatch()
  File "/data/bny220000/projects/bevdepth/venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1275, in _dispatch
    self.training_type_plugin.start_evaluating(self)
  File "/data/bny220000/projects/bevdepth/venv/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 207, in start_evaluating
    self._results = trainer.run_stage()
  File "/data/bny220000/projects/bevdepth/venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1286, in run_stage
    return self._run_evaluate()
  File "/data/bny220000/projects/bevdepth/venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1334, in _run_evaluate
    eval_loop_results = self._evaluation_loop.run()
  File "/data/bny220000/projects/bevdepth/venv/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 151, in run
    output = self.on_run_end()
  File "/data/bny220000/projects/bevdepth/venv/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 131, in on_run_end
    self._evaluation_epoch_end(outputs)
  File "/data/bny220000/projects/bevdepth/venv/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 231, in _evaluation_epoch_end
    model.test_epoch_end(outputs)
  File "/data/bny220000/projects/bevdepth/exps/bev_depth_lss_r50_256x704_128x128_24e.py", line 367, in test_epoch_end
    self.evaluator.evaluate(all_pred_results, all_img_metas)
  File "/data/bny220000/projects/bevdepth/evaluators/det_mv_evaluators.py", line 212, in evaluate
    self._evaluate_single(result_files[name])
  File "/data/bny220000/projects/bevdepth/evaluators/det_mv_evaluators.py", line 90, in _evaluate_single
    nusc_eval = NuScenesEval(nusc,
  File "/data/bny220000/projects/bevdepth/venv/lib/python3.8/site-packages/nuscenes/eval/detection/evaluate.py", line 80, in __init__
    self.pred_boxes, self.meta = load_prediction(self.result_path, self.cfg.max_boxes_per_sample, DetectionBox,
  File "/data/bny220000/projects/bevdepth/venv/lib/python3.8/site-packages/nuscenes/eval/common/loaders.py", line 47, in load_prediction
    assert len(all_results.boxes[sample_token]) <= max_boxes_per_sample, \
AssertionError: Error: Only <= 500 boxes per sample allowed!
```
Hi~ The visualization in your paper is quite awesome. However, I haven't found any visualization code in the repo. Could you please provide the visualization code here? Thank you!
Hello, thank you for your great work.
I have a question regarding the acceptable arguments of your config file.
Is it possible to set load_interval in the config file, as in mmdetection?
If so, where should I specify the argument?
Thank you for your reply.
Line 76 of bev_depth_head.py (self.neck = build_neck(bev_neck_conf)) raises: 'SECONDFPN is not in the model registry'.
Thanks for the great job!
However, I retrained the model with the config 'bev_depth_lss_r50_256x704_128x128_24e.py', strictly following the README; the depth loss is about 8 and the detection loss is about 6. When I test with the latest checkpoint, the NDS is only about 35% and the mAP about 31%. Is this normal, or is there some other problem?
Thanks for your excellent work!
I use this command to train the net: python [EXP_PATH] --amp_backend native -b 8 --gpus 8
But only GPU 0 works.
How do I use multiple GPUs to train the net?
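For reference, when the --gpus flag is not picked up, constructing the Trainer with explicit multi-GPU settings usually helps; a sketch, assuming the Lightning 1.5+ API (older releases used accelerator='ddp' instead of strategy='ddp'):

```python
import pytorch_lightning as pl

# Sketch only: request 8 GPUs with distributed data parallel.
trainer = pl.Trainer(gpus=8, strategy="ddp")
```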
Hello! Thanks for your great work!
When I went into the source code, I found "sweep", "ida" and "bda". These may be concepts that do not appear in LSS. Could you kindly explain the meaning of these three concepts? Thanks!
--- update ---
Sorry, I was not familiar with the nuScenes data. I googled and found that sweep images are the unannotated intermediate frames (the counterpart of the annotated keyframes). Maybe the first sweep image (index 0 of the sweep axis) in your code is the keyframe. Is that correct?
But I still don't know about "ida" and "bda"; I hope you can kindly explain them (see the sketch after the docstring below). Thanks!
For example, in LSSFPN:

```python
def forward(self,
            sweep_imgs,
            mats_dict,
            timestamps=None,
            is_return_depth=False):
    """Forward function.

    Args:
        sweep_imgs (Tensor): Input images with shape of (B, num_sweeps,
            num_cameras, 3, H, W).
        mats_dict (dict):
            # ...
            ida_mats (Tensor): Transformation matrix for ida with
                shape of (B, num_sweeps, num_cameras, 4, 4).
            # ...
            bda_mat (Tensor): Rotation matrix for bda with shape
                of (B, 4, 4).
            # ...
    """
    batch_size, num_sweeps, num_cams, num_channels, img_height, \
        img_width = sweep_imgs.shape
    # ... ...
```
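To make "ida" and "bda" concrete: they appear to stand for image data augmentation and BEV data augmentation, and the matrices record the transforms applied to the images / BEV grid so the view transformer can account for them. A minimal illustrative sketch of assembling such an image-augmentation matrix (parameter names are hypothetical, not this repo's exact API; compare ida_mat[:2, 2] = ida_tran in the dataset code quoted later in this thread):

```python
import torch

# Illustration only: a 2D resize + crop written as a homogeneous matrix, so
# augmented pixel coordinates can be mapped back to the original camera frame.
def build_ida_mat(resize: float, crop_left: float, crop_top: float) -> torch.Tensor:
    ida_mat = torch.eye(3)
    ida_mat[0, 0] = resize                                  # x scale
    ida_mat[1, 1] = resize                                  # y scale
    ida_mat[:2, 2] = torch.tensor([-crop_left, -crop_top])  # crop shift
    return ida_mat
```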
Have you tried Swin Transformer Tiny as the image backbone, like BEVDet-4D?
I wonder if this change can improve the performance.
Hi @yinchimaoliang, thank you for the great work. Could you kindly provide the per-class performance of the ablation study in Table 1 of the paper? I'm quite interested in it. Many thanks!
Hi Yinhao, thanks for the great work. Sorry to bother you. I wonder whether the code supports, or will support, training and evaluation on the Waymo dataset. Could you please give me some advice?
Hi, thanks for opening access to this nice work!
I have a question about the meaning of "ida", which shows up a lot in the dataset script (e.g. ida_transformation). Does it mean the transformation among the different view cameras?
Hi, I have been reading this code for some time, but I fail to find any multi-frame fusion config. All I have found is num_sweeps=1.
Does this mean the code doesn't support multi-frame fusion yet? In the comparison with SOTA results on the nuScenes val set (0.351 mAP and 0.475 NDS for R50) and test set (0.503 mAP and 0.600 NDS, for which backbone?) in the paper, did BEVDepth use multi-frame fusion? And on the leaderboard? If so, could you please give me some advice on how to reproduce the paper results or the leaderboard results?
For the ResNet-101 validation model in the paper, what BEV resolution do you use?
Could you help indicate the difference in the parameter settings in the exp?
Hi, I found that the implementation in this repo uses the ASPP module.
I am curious about the ablation study of this module, which is not in the manuscript.
Has it been considered in the ablation study of Depth Correction in Table 1?
```python
self.depth_channels = int(
    (self.dbound[1] - self.dbound[0]) / self.dbound[2])
```
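For reference, assuming the depth bound setting dbound = [2.0, 58.0, 0.5] (i.e. min depth, max depth, and bin size in meters; an assumption about the released configs), this works out to:

```python
dbound = [2.0, 58.0, 0.5]  # assumed [min_depth, max_depth, bin_size] in meters
depth_channels = int((dbound[1] - dbound[0]) / dbound[2])
print(depth_channels)  # 112 discrete depth bins
```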
I used the default bev_depth_lss_r50_256x704_128x128_20e_cbgs_2key_da.py config to reproduce the result on 8 A100 80G GPUs. The training loss explodes in the last epochs; the attached loss-curve screenshot (not reproduced here) shows the details.
Another experiment hit the same problem on 4 V100 32G GPUs trained for 40 epochs:
python3 exps/bev_depth_lss_r50_256x704_128x128_20e_cbgs_2key_da.py --amp_backend native -b 8 --gpus 4
Hi, would it be possible to have the training log?
In voxel pooling there is a ranking system that separates all voxel locations from each other. It is calculated as follows in the Python code:

```python
ranks = geom_feats[:, 0] * (self.nx[1] * self.nx[2] * B) \
    + geom_feats[:, 1] * (self.nx[2] * B) \
    + geom_feats[:, 2] * B \
    + geom_feats[:, 3]
```
geom_feats[:, 0] holds the z axis, [:, 1] the y axis, [:, 2] the x axis, and [:, 3] the batch index. The variable nx contains the limits of the x, y, z axes.
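A small self-contained demo of this encoding (dummy sizes, not the repo's values) shows it is just a row-major flattening of the 4-D index, so every distinct voxel/batch combination gets a distinct rank:

```python
import torch

B = 2
nx = torch.tensor([4, 4, 4])  # assumed per-axis grid sizes
geom_feats = torch.tensor([
    [0, 1, 2, 0],  # a voxel in batch 0
    [0, 1, 2, 1],  # same voxel, batch 1 -> different rank
    [0, 1, 3, 0],  # neighbouring cell -> different rank
])
ranks = geom_feats[:, 0] * (nx[1] * nx[2] * B) \
    + geom_feats[:, 1] * (nx[2] * B) \
    + geom_feats[:, 2] * B \
    + geom_feats[:, 3]
print(ranks)  # tensor([12, 13, 14])
```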
The CUDA code for accelerated voxel pooling has a missing multiplication. This might be a mistake, or I might be missing something:

```cuda
atomicAdd(
    &output_features[(batch_idx * num_voxel_y * num_voxel_x
                      + y * num_voxel_x + x) * num_channels + channel_idx],
    input_features[pt_idx * num_channels + channel_idx]);
```
The z coordinate is only 1, therefore it is encoded with (batch * max_x * max_y); but then the y and x coordinates are not multiplied by the batch index, and there is no addition of the batch index as in the Python code.
Is this a mistake?
Thanks,
Cem
I saw that the CBGS experiment config was updated to have 1e-2 weight decay; should the non-CBGS config and the single-frame config also use 1e-2 instead of 1e-7?
BEVDepth/exps/bev_depth_lss_r50_256x704_128x128_24e.py
Lines 368 to 373 in 72116c3
My versions are:
python 3.7
cuda 11.1
torch==1.9.0+cu111
torchvision==0.10.0+cu111
pytorch-lightning: 1.6.5
mmcv-full 1.5.0
mmdet 2.25.0
mmdet3d 0.18.0
mmsegmentation 0.26.0
I get the error "To use bfloat16 with native amp you must install torch greater or equal to 1.10."
at trainer = pl.Trainer.from_argparse_args(args, callbacks=[ema_callback]).
Could you tell us the exact version?
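For context, bf16 with Lightning's native AMP does require torch >= 1.10, while fp16 native AMP works on torch 1.9; a sketch of the 1.6-era Trainer flags (illustrative, not this repo's exact setup):

```python
import pytorch_lightning as pl

# Sketch only: fp16 native AMP (works on torch 1.9); precision="bf16"
# additionally requires torch >= 1.10.
trainer = pl.Trainer(precision=16, amp_backend="native")
```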
You mentioned the paper "Is Pseudo-Lidar needed for Monocular 3D Object Detection".
But do you have comparisons/ablations regarding the way you use the camera intrinsics?
Thanks!
Hello, thank you for open-sourcing your code.
For the released model, it seems that you used previous-sweep aggregation. I can't seem to track down where sweep_idxes and num_sweeps are specified; they just seem to take their defaults here:
BEVDepth/exps/bev_depth_lss_r50_256x704_128x128_24e.py
Lines 222 to 223 in 0f55c61
Should some command-line parameters be specified?
Thank you!
Hello, I have been struggling to recreate the results shown on the detection leaderboard. Can someone provide me with the specific information needed to recreate them?
Thanks for your great work. I tried to reproduce the training process but got stuck on the pytorch_lightning version. There is a series of errors like:
TypeError: can't pickle dict_keys objects
ValueError: You selected an invalid accelerator name: accelerator='ddp'. Available names are: cpu, cuda, hpu, ipu, mps, tpu.
Which version is the stable one for your work?
Could any experts give me some hints? Thanks!
BEVDepth/dataset/nusc_mv_det_dataset.py
Line 75 in a030aae
ida_mat[:2, 2] = ida_tran
Hi, I want to know the difference between sweep_frame and key_frame.
Hi,
Great work!
The script is:
python exps/bev_depth_lss_r50_256x704_128x128_24e.py -b 2 --gpus 2
I met the error below:

```
Missing logger folder: outputs/bev_depth_lss_r50_256x704_128x128_24e/lightning_logs
Missing logger folder: outputs/bev_depth_lss_r50_256x704_128x128_24e/lightning_logs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
Traceback (most recent call last):
  File "exps/bev_depth_lss_r50_256x704_128x128_24e.py", line 489, in <module>
    run_cli()
  File "exps/bev_depth_lss_r50_256x704_128x128_24e.py", line 485, in run_cli
    main(args)
  File "exps/bev_depth_lss_r50_256x704_128x128_24e.py", line 454, in main
    trainer.fit(model)
  File "/home/nio/anaconda3/envs/surrounddepth/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
    self._call_and_handle_interrupt(
  File "/home/nio/anaconda3/envs/surrounddepth/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 721, in _call_and_handle_interrupt
    return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
  File "/home/nio/anaconda3/envs/surrounddepth/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/home/nio/anaconda3/envs/surrounddepth/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/nio/anaconda3/envs/surrounddepth/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1221, in _run
    self._call_callback_hooks("on_fit_start")
  File "/home/nio/anaconda3/envs/surrounddepth/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1636, in _call_callback_hooks
    fn(self, self.lightning_module, *args, **kwargs)
  File "/DATA/xiaoquan.wang/BEVDepth/callbacks/ema.py", line 77, in on_fit_start
    for model_ref in trainer.model.model.modules():
  File "/home/nio/anaconda3/envs/surrounddepth/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1177, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DistributedDataParallel' object has no attribute 'model'
```

(The second DDP rank prints the same traceback.)
I haven't seen the visualization in your code that was illustrated in your paper figures; could you give some tips?
Is it only a matter of changing to the following setting?

```python
'cams': [
    'CAM_FRONT',
],
'Ncams': 1,
```
```python
import logging

import numpy as np

logger = logging.getLogger(__name__)


def safe_log10(x, eps=1e-10):
    result = np.where(x > eps, x, -10)
    np.log10(result, out=result, where=result > 0)
    return result


def safe_log(x, eps=1e-5):
    return np.log(x + eps)


def calculate(gt, pred):
    if gt.shape[0] == 0:
        return np.nan, np.nan, np.nan, np.nan, np.nan, np.nan
    # thresh = np.maximum((gt / pred), (pred / gt))
    # a1 = (thresh < 1.25).mean()
    # a2 = (thresh < 1.25 ** 2).mean()
    # a3 = (thresh < 1.25 ** 3).mean()
    # abs_rel: absolute relative error
    abs_rel = np.mean(np.divide(np.abs(gt - pred), gt,
                                out=np.zeros_like(pred), where=gt != 0))
    sq_rel = np.mean(np.divide((gt - pred) ** 2, gt,
                               out=np.zeros_like(pred), where=gt != 0))
    rmse = (gt - pred) ** 2
    rmse = np.sqrt(rmse.mean())
    rmse_log = (safe_log(gt) - safe_log(pred)) ** 2
    rmse_log = np.sqrt(rmse_log.mean())
    # SILog metric
    err = safe_log(pred) - safe_log(gt)
    silog = np.sqrt(np.mean(err ** 2) - np.mean(err) ** 2) * 100
    if np.isnan(silog):
        silog = 0
    log_10 = (np.abs(safe_log10(gt) - safe_log10(pred))).mean()
    logger.info('abs_rel: {}\t rmse: {}\t log_10: {}\t rmse_log: {}\t'
                ' silog: {}\t sq_rel: {}\t'.format(
                    abs_rel, rmse, log_10, rmse_log, silog, sq_rel))
    return [abs_rel, rmse, log_10, rmse_log, silog, sq_rel]
```
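A quick sanity check of these metrics on dummy depth values (illustrative only, assuming the functions above are in scope):

```python
import numpy as np

gt = np.array([10.0, 20.0, 30.0])    # dummy ground-truth depths in meters
pred = np.array([11.0, 19.0, 33.0])  # dummy predicted depths
abs_rel, rmse, log_10, rmse_log, silog, sq_rel = calculate(gt, pred)
```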
```python
def eval_step(self, batch, batch_idx, prefix: str):
    (sweep_imgs, mats, _, img_metas, _, gt_labels, depth_labels) = batch
    # (sweep_imgs, mats, _, img_metas, _, _) = batch
    if torch.cuda.is_available():
        for key, value in mats.items():
            mats[key] = value.cuda()
        sweep_imgs = sweep_imgs.cuda()
        gt_labels = [gt_label.cuda() for gt_label in gt_labels]
    preds, depth_preds = self.model(sweep_imgs, mats)
    if len(depth_labels.shape) == 5:
        depth_labels = depth_labels[:, 0, ...]
    depth_labels = self.get_downsampled_gt_depth(depth_labels.cuda())
    depth_preds = depth_preds.permute(0, 2, 3, 1).contiguous().view(
        -1, self.depth_channels)
    fg_mask = torch.max(depth_labels, dim=1).values > 0.0
    depth_result = calculate(depth_labels[fg_mask].cpu().numpy(),
                             np.round(depth_preds[fg_mask].cpu().numpy(), 2))
    if isinstance(self.model, torch.nn.parallel.DistributedDataParallel):
        results = self.model.module.get_bboxes(preds, img_metas)
    else:
        results = self.model.get_bboxes(preds, img_metas)
    for i in range(len(results)):
        results[i][0] = results[i][0].tensor.detach().cpu().numpy()
        results[i][1] = results[i][1].detach().cpu().numpy()
        results[i][2] = results[i][2].detach().cpu().numpy()
        results[i].append(img_metas[i])
    return results
```
The following result differs from the ablation experiment in your paper, obtained with your pretrained bev_depth_lss_r50_256x704_128x128_20e_cbgs_2key_da.pth weights.
I noticed the experiment in the paper's introduction about replacing the image-view-transform depth with the point cloud. I wonder whether this part is similar to the depth supervision in BEVDepth (which supervises depth only for keyframes)? Do you replace only the depth of keyframes, or of all frames (including non-keyframes), with the depth GT from the point cloud? Shouldn't there be a synchronization problem between the point cloud and the image in this process?
python ./exps/bev_depth_lss_r50_256x704_128x128_24e.py --amp_backend native -b 1 --gpus 1
gives the error:

```
Traceback (most recent call last):
  File "/mnt/data/home/bny220000/projects/bevdepth/./exps/bev_depth_lss_r50_256x704_128x128_24e.py", line 486, in <module>
    run_cli()
  File "/mnt/data/home/bny220000/projects/bevdepth/./exps/bev_depth_lss_r50_256x704_128x128_24e.py", line 482, in run_cli
    main(args)
  File "/mnt/data/home/bny220000/projects/bevdepth/./exps/bev_depth_lss_r50_256x704_128x128_24e.py", line 452, in main
    trainer.fit(model)
  File "/mnt/data/home/bny220000/projects/bevdepth/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
    self._call_and_handle_interrupt(
  File "/mnt/data/home/bny220000/projects/bevdepth/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 721, in _call_and_handle_interrupt
    return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
  File "/mnt/data/home/bny220000/projects/bevdepth/venv/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/mnt/data/home/bny220000/projects/bevdepth/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/mnt/data/home/bny220000/projects/bevdepth/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1221, in _run
    self._call_callback_hooks("on_fit_start")
  File "/mnt/data/home/bny220000/projects/bevdepth/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1636, in _call_callback_hooks
    fn(self, self.lightning_module, *args, **kwargs)
  File "/mnt/data/home/bny220000/projects/bevdepth/callbacks/ema.py", line 83, in on_fit_start
    trainer.ema_model = ModelEMA(trainer.model.cuda(), 0.9990)
  File "/mnt/data/home/bny220000/projects/bevdepth/callbacks/ema.py", line 44, in __init__
    self.ema = deepcopy(
  File "/usr/lib/python3.9/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/usr/lib/python3.9/copy.py", line 270, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/lib/python3.9/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python3.9/copy.py", line 230, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  [... the recursive deepcopy/_reconstruct/_deepcopy_dict frames repeat several times ...]
  File "/usr/lib/python3.9/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/usr/lib/python3.9/copy.py", line 272, in _reconstruct
    y.__setstate__(state)
  File "/mnt/data/home/bny220000/projects/bevdepth/venv/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 580, in __setstate__
    parameters, expect_sparse_gradient = self._build_params_for_reducer()
  File "/mnt/data/home/bny220000/projects/bevdepth/venv/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 593, in _build_params_for_reducer
    modules_and_parameters = [
  File "/mnt/data/home/bny220000/projects/bevdepth/venv/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 594, in <listcomp>
    [
  File "/mnt/data/home/bny220000/projects/bevdepth/venv/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 597, in <listcomp>
    for parameter in [
  File "/mnt/data/home/bny220000/projects/bevdepth/venv/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 597, in <listcomp>
    for parameter in [
  File "/mnt/data/home/bny220000/projects/bevdepth/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1470, in named_parameters
    for elem in gen:
  File "/mnt/data/home/bny220000/projects/bevdepth/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1415, in _named_members
    members = get_members_fn(module)
  File "/mnt/data/home/bny220000/projects/bevdepth/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1468, in <lambda>
    lambda module: module._parameters.items(),
  File "/mnt/data/home/bny220000/projects/bevdepth/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'LightningDistributedModule' object has no attribute '_parameters'
```
The config params of the 640x1600 model are as follows:

```python
final_dim = (640, 1600)
backbone_conf['final_dim'] = final_dim
ida_aug_conf['final_dim'] = final_dim
ida_aug_conf['resize_lim'] = (0.94, 1.25)
```

The inference results are strange: only the instances in the CAM_BACK FOV seem normal, while objects in the other cameras' FOVs tend to be off by a certain scale.
```
python: /opt/conda/conda-bld/magma-cuda111_1605822518874/work/interface_cuda/interface.cpp:901: void magma_queue_create_internal(magma_device_t, cudaStream_t, cublasHandle_t, cusparseHandle_t, magma_queue**, const char*, const char*, int): Assertion `queue->dCarray__ != __null' failed.
```

Has anyone encountered this problem? It occurs somewhat randomly at different epochs. When it occurs, the training stops, but the GPU memory is not automatically released.
My Environment:
CUDA 11.2
cudatoolkit 11.1
torch 1.9.1+cu111
pytorch-lightning 1.6.0
python 3.7.13
mmdet3d 1.0.0rc4
mmcv 1.6.0
mmcv-full 1.6.1
mmsegmentation 0.27.0
I changed the final_dim value in bev_depth_lss_r50_256x704_128x128_24e.py:

```python
final_dim = (896, 1600)
```

The training loss is normal, but the metrics are very low. How do I train the model with a larger input size?