
faster-voxelpose's People

Contributors

alvinyh, walter0807


faster-voxelpose's Issues

Training suddenly exits partway through

Dear authors, hello! Thank you for your contribution to voxel-based 3D human pose estimation. My question is:
After installing Python 3.8 and the requirements (a few incompatible requirement versions were replaced), I started running train.py. After more than 11 warnings of "the bounding box isn't large sufficiently", the program exited directly and training stopped. Through breakpoint debugging I found that it stops at the dataloader. Do you know what the problem might be? My GPU is a 2080 Ti with 12 GB.

Training fails: why does every epoch display "NaN or Inf found in input tensor"?

The problem appears here:
Epoch: 10
Epoch: [10][0/13] Time: 0.613s (0.613s) Speed: 39.1 samples/s Data: 0.420s (0.420s) Loss: nan (nan) Loss_2d: 0.0019247 (0.0019247) Loss_1d: nan (nan) Loss_bbox: 0.027619 (0.027619) Loss_joint: nan (nan) Memory 0.0
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.

Every training epoch looks like this. How should I deal with it?
I hope to get an answer, thank you!
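
A generic mitigation worth trying (my own suggestion, not a fix confirmed by the authors): clip gradients before each optimizer step so one bad synthetic batch cannot blow Loss_1d or Loss_joint up to NaN. A minimal sketch with assumed names (`loader` and `compute_losses` stand in for the repo's training loop, and max_norm=1.0 is a value to tune, not a documented setting):

```python
import torch

for i, batch in enumerate(loader):
    loss_dict = compute_losses(model, batch)   # stands in for the repo's forward pass
    optimizer.zero_grad()
    loss_dict["total"].backward()
    # Cap the global gradient norm so one bad batch cannot explode the weights.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```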

How to use a trained model to predict from multi-view photos (a newcomer's question)

Thank you for sharing such elegant code. I have a question: in the README I can find the usage of train, which trains models, and evaluate, which evaluates them, but I cannot find how to run prediction. For example, after training the model, I want to feed three new Campus-style photos through the trained model and fuse them into a 3D result. How should I do that? Thank you!
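
For reference, a minimal inference sketch based on the forward signature quoted in other issues on this page; the checkpoint filename and the input preparation are assumptions, not documented usage:

```python
import torch

# Hypothetical prediction script: load a trained checkpoint and run one forward
# pass. `inputs`, `meta`, `cameras` and `resize_transform` must be prepared the
# same way the repo's evaluation code prepares them.
checkpoint = torch.load("model_best.pth.tar", map_location="cuda")  # assumed filename
model.load_state_dict(checkpoint)
model.eval()

with torch.no_grad():
    fused_poses, poses, proposal_centers, _, input_heatmaps = model(
        views=inputs, meta=meta, cameras=cameras,
        resize_transform=resize_transform)

# fused_poses holds the fused 3D joints plus confidence columns.
print(fused_poses.shape)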

Questions about "save_debug_2d_images" function

My GPU is a 3090, so I cannot use the torch 1.4.0 + CUDA 10.1 environment mentioned by the author. The environment I am using is torch 1.11.0 + CUDA 11.3, the lowest versions that support the 30 series. I set GPUS to '0' in the configuration file and num_image to 2k, and during running I get the problem shown in the screenshot (omitted here). The problem is caused by the save_debug_2d_images function; the error message points to a location inside a library.

I hope someone can help me solve this problem. Thank you very much.

Defining own testcase

Hello authors,
Thank you for sharing the code of this amazing work. I am using your framework to solve a multi-person, multi-camera problem in a real-time environment. I wanted to ask whether we need to retrain the model with our own data after setting up the environment, or whether we can use the pre-trained model and test it directly on the new data.
It would be really helpful if you could help me out with this.

Thank you for your time.

Did not find the Loss_off defined in Eq. (3)

The CenterNet head only returns two outputs, which differs from what is described in the paper (screenshots omitted).

Besides, I find that the target has an offset key; however, it is not used in any loss in the loss dict (screenshots omitted).

Is there a reason Loss_off is omitted, or is my understanding wrong?

Where can I find pretrained backbone for Shelf/Campus data?

Hi, thank you for this great work.
I'm trying to run your model on my own data, but I cannot run the Shelf/Campus pretrained models because there is no pretrained backbone for them. The Panoptic backbone has 15 keypoints, while Shelf/Campus use 17, so I get an error if I try to load it under the Shelf/Campus config. Could you upload it, please?
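
Until a matching backbone is uploaded, one workaround sketch (an assumption, not an official recipe): load the Panoptic backbone and skip the tensors whose shapes don't match the 17-keypoint heads.

```python
import torch

# Hedged sketch: copy only shape-compatible weights from the 15-keypoint
# Panoptic backbone into a 17-keypoint Shelf/Campus backbone; the path and
# the `backbone` module object are assumed names.
pretrained = torch.load("panoptic_backbone.pth", map_location="cpu")
own_state = backbone.state_dict()
compatible = {k: v for k, v in pretrained.items()
              if k in own_state and v.shape == own_state[k].shape}
own_state.update(compatible)
backbone.load_state_dict(own_state)
print(f"loaded {len(compatible)}/{len(own_state)} tensors")
```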

Possible bugs?

Hi, I have been trying to train the model.
At first I tried the repo as-is and got this bug (screenshot omitted).

Then I removed the part that saves the images (note that it works fine when evaluating), and the process was killed due to RAM usage with the default 10k samples. I reduced it to 1-5k and managed to load the data successfully. If I understand correctly, in the synthetic-data experiments the whole dataset is loaded into RAM.

After that I was getting CUDA OOM; the culprit seemed to be

    else:
        accu_loss += (loss_2d + loss_1d + loss_bbox) / accumulation_steps

so I removed that part.

The model seemed to train at first, but then I was getting NaN tensors and the accumulated loss was NaN as well.
During validation the debug_save_imgs function actually works, and I can see the predicted 3D poses and their projections. The thing is, when I print the final_fused_poses variable, each joint is always [0,0,0,-1..].
The error for each actor is stuck at 0.0.

I have tried training both on an RTX 3080 (where I can't use torch 1.4.0, since CUDA 11 is the minimum version supported by the RTX 30xx series) and on Colab, where I installed the requirements from the requirements.txt file.


A slight performance drop when running this code; request for a HigherHRNet version

(Screenshots omitted: my experiment results and the official results from the paper.)

Dear authors:
I am grateful for your paper and code. When I tried to run this project to reproduce your work, my result dropped by about 2 mm. Could you explain why?

Does your code correspond to this setting: using [5 views; mask; weights]?

My conda environment is shown in the screenshots (omitted here). My GPU is an RTX 3090, with CUDA 11.3 and torch 1.11.0.

I don't know why the performance is slightly lower when I execute the code following Readme.

I followed the Readme throughout (data preparation, preprocessing, using the same config files, etc.).
However, I found some performance degradation on all datasets.

  • Downloaded weights vs. trained weights (Panoptic): screenshots omitted
  • Downloaded weights vs. trained weights (Shelf): screenshots omitted
  • Downloaded weights vs. trained weights (Campus): screenshots omitted

I wonder if it's because I missed something in the process (for example, should data augmentation be set to true?).

Some questions about the CampusSeq data

I have a few questions about the Campus data:
After downloading the original data, each camera's extrinsic rotation matrix is identical to the one in the parameters downloaded via this repo's link, but the translation vectors differ. The extrinsic translation vectors of the three cameras are [[-1.787557, 1.361094, 5.226973], [4.9229, 1.1614, 6.6849], [-4.9013, 0.5299, 11.2024]].
The translation vectors in the camera parameters downloaded from this repo are [[1774.8953318252247, -5051.695948238737, 1923.3559877015355], [-6240.579909342256, 5247.348264374987, 1947.3802148598609], [11943.56106545541, -1803.8527374133198, 1973.3939116534714]]. How can this be explained?
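
For reference, the magnitudes suggest a unit and convention change rather than different calibrations: the repo's vectors appear to be camera centers in millimeters, while the original release stores the translation t of x_cam = R x_world + t in meters. A small consistency check (my inference, not an authors' statement):

```python
import numpy as np

# Hypothesis: T_repo = -R.T @ t_orig * 1000 (camera center in mm). Rotations
# preserve norms, so under this hypothesis ||T_repo|| == ||t_orig|| * 1000,
# which holds for all three cameras. E.g. camera 1:
t_orig = np.array([-1.787557, 1.361094, 5.226973])                  # meters
T_repo = np.array([1774.8953318252247, -5051.695948238737,
                   1923.3559877015355])                             # millimeters

print(np.linalg.norm(t_orig) * 1000)  # ~5689.4
print(np.linalg.norm(T_repo))         # ~5689.4 -> consistent with the hypothesis
```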

visualize

I'm using Faster-VoxelPose for a project, and I'm experiencing issues during the visualization stage. I suspect these issues might be related to incorrect camera calibration. The results I'm getting are not as expected, especially when using the Shelf and Campus datasets.

Could you please advise on the correct process for camera calibration in Faster-VoxelPose to ensure accurate visualization? Are there any specific parameters or calibration techniques that I should be aware of?

Any tips or common mistakes to avoid in this process would also be highly appreciated.

Thank you for your support.

RuntimeError when training on multiple GPUs

Thanks for your work! When I run python run/train.py --cfg configs/panoptic/jln64.yaml with GPUS: '0,1' set in jln64.yaml, I get the following RuntimeError:

Exception has occurred: RuntimeError
Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
File "/newerDisk/zzh/anaconda3/envs/fvpose/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/newerDisk/zzh/anaconda3/envs/fvpose/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "run/../lib/models/voxelpose.py", line 38, in forward
bbox_preds = self.pose_net(input_heatmaps, meta, cameras, resize_transform)
File "/newerDisk/zzh/anaconda3/envs/fvpose/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "run/../lib/models/human_detection_net.py", line 81, in forward
feature_cubes = self.project_layer(heatmaps, meta, cameras, resize_transform)
File "/newerDisk/zzh/anaconda3/envs/fvpose/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "run/../lib/models/project_whole.py", line 80, in forward
sample_grids[c] = self.project_grid(cameras[curr_seq][c], w, h, nbins, resize_transform, device).squeeze(0)
File "run/../lib/models/project_whole.py", line 52, in project_grid
xy = do_transform(xy, resize_transform)
File "run/../lib/utils/transforms.py", line 62, in affine_transform_pts_cuda
out = torch.mm(t, torch.t(pts_homo))
RuntimeError: arguments are located on different GPUs at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:277
File "/newerDisk/zzh/python-projects/Faster-VoxelPose/lib/core/function.py", line 42, in train_3d
cameras=cameras, resize_transform=resize_transform)
File "/newerDisk/zzh/python-projects/Faster-VoxelPose/run/train.py", line 140, in main
train_3d(config, model, optimizer, train_loader, epoch, final_output_dir, writer_dict)
File "/newerDisk/zzh/python-projects/Faster-VoxelPose/run/train.py", line 170, in <module>
main()

The GPUs used are two GTX 1080 Ti cards. The OS is Ubuntu 16.04.7 LTS. The CUDA version is 10.2 and the PyTorch version is 1.4.0.
There is no problem when running on a single GPU, but the above error occurs when using multiple GPUs.
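
A speculative workaround (not an official patch): under nn.DataParallel each replica runs on its own device, so if the transform matrix `t` is created on the default GPU, the torch.mm in affine_transform_pts_cuda mixes devices. Moving it onto the points' device should avoid that; here is a sketch of the function with the guard added (everything besides the torch.mm line from the traceback is my reconstruction):

```python
import torch

def affine_transform_pts_cuda(pts, t):
    # Move the affine matrix to the same GPU as the points, so a replica on
    # device 1 does not multiply against a matrix living on device 0.
    t = t.to(pts.device)
    pts_homo = torch.cat([pts, torch.ones(pts.shape[0], 1, device=pts.device)], dim=1)
    out = torch.mm(t, torch.t(pts_homo))  # the line that failed in the traceback
    return torch.t(out)
```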

Training gets killed partway through

Dear author, thank you for the source code. After installing Python 3.8, I installed the relevant libraries according to requirements.txt. But when I execute
python run/train.py --cfg configs/campus/jln64.yaml
the process is killed before it finishes a single epoch.

visualization

The visualization fails at the line camera[curr_seq][c]: both curr_seq and c are integers (0-4), but for the Shelf camera settings the second index should be a string key (a letter such as 'k').
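
A minimal sketch of the fix implied above (assuming cameras[curr_seq] is a dict keyed by strings): look the camera up by its c-th key instead of by the integer itself.

```python
# Index the camera dict by its string key rather than by integer position.
cam_keys = sorted(cameras[curr_seq].keys())
cam = cameras[curr_seq][cam_keys[c]]
```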

Training Error

I use CUDA 11.4 and torch 1.11.0.

When I train the model on the Panoptic dataset, this error occurs:

Traceback (most recent call last):
  File "/home/vis/projects/Faster-VoxelPose/run/train.py", line 168, in <module>
    main()
  File "/home/vis/projects/Faster-VoxelPose/run/train.py", line 138, in main
    train_3d(config, model, optimizer, train_loader, epoch, final_output_dir, writer_dict)
  File "/home/vis/projects/Faster-VoxelPose/run/../lib/core/function.py", line 66, in train_3d
    accu_loss.backward()
  File "/home/vis/anaconda3/envs/pose/lib/python3.9/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/vis/anaconda3/envs/pose/lib/python3.9/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass

"RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: 
[torch.cuda.FloatTensor [1, 32, 1]] is at version 8; expected version 6 instead. Hint: the backtrace further above shows the operation 
that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!"

I think the problem is in the code below:

if loss_joint > 0:
    optimizer.zero_grad()
    loss_joint.backward()
    optimizer.step()

if accu_loss > 0 and (i + 1) % accumulation_steps == 0:
    optimizer.zero_grad()
    accu_loss.backward()
    optimizer.step()
    accu_loss = 0.0
else:
    accu_loss += (loss_2d + loss_1d + loss_bbox) / accumulation_steps

There are two losses on which backward() is applied.
If I use only one loss, it works well, but using both causes that error.
Has anyone solved this problem?
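
One plausible fix (a hedged sketch, not an official patch from the authors): fold everything into a single loss and call backward() once per iteration, stepping the optimizer every accumulation_steps. Then no parameter is updated in place while another loss still references the old graph, and the graph is freed each iteration instead of being held in accu_loss.

```python
# Single-backward variant of the loop above; gradients accumulate in the
# .grad buffers across iterations. If loss_joint can be absent (no valid
# proposals), add it conditionally.
loss = (loss_2d + loss_1d + loss_bbox + loss_joint) / accumulation_steps
loss.backward()                       # one backward per iteration, frees the graph
if (i + 1) % accumulation_steps == 0:
    optimizer.step()
    optimizer.zero_grad()
```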

3D Pose visualization

Thanks for your excellent work.
Can you provide the code to project the 3D poses onto the 2D images, like Fig. 5 in the paper?
Thanks a lot.

Number of cameras

Hello! How many cameras did you use in the context of the article?

How did you fine-tune the backbone?

Hello,
Thank you for putting together such a brilliant repository.
I want to further train the backbone as well, which I believe is from here: https://github.com/HRNet/HRNet-Human-Pose-Estimation
However, Faster-VoxelPose's ResNet seems to be incompatible with HRNet because of the output dimensions.
Could you also include the code you used for this part in the readme?

"(ResNet-50 pretrained on COCO dataset and finetuned jointly on Panoptic dataset and MPII)"

Alternatively, could you quickly explain how you modified it? I want to make sure that I don't run into any incompatibility issues.

Thank you again.

Inference time in the paper

Dear authors,

I conducted several tests and can't reproduce the FPS metric you mention in "Table 3: Comparison with SOTA on Panoptic".

My specs:
Python 3.6.10
torch.version: '1.4.0'
torch.version.cuda: '10.1'
GPU: Quadro RTX 6000
Batch size: 1

Example cameras, scene and frame:
Cameras list: ['00_03', '00_06', '00_12', '00_13']
Scene: 160906_pizza1
Frame: 00000347
Config: only difference: CAMERA_NUM: 4

The test excludes data loading; it is just inference in a loop:

from imutils.video import FPS

fps = FPS().start()
for i in range(120):
    final_poses, poses, proposal_centers, _, input_heatmap = model(views=inputs,
                                                                   meta=meta,
                                                                   cameras=cameras,
                                                                   resize_transform=resize_transform)
    fps.update()

fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

[INFO] elapsed time: 9.80
[INFO] approx. FPS: 12.24

The FPS I get is ~12, considerably lower than the ~30 claimed in the paper.
Am I missing something here?

Thank you!

EDIT:
I did the same test on another machine:
Python 3.6.13
torch.version: '1.7.1+cu110'
GPU: Tesla V100
Other parameters were the same

[INFO] elapsed time: 11.99
[INFO] approx. FPS: 10.01
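
One factor worth ruling out (my suggestion, not something from the paper): CUDA kernels launch asynchronously, and the first iterations include cuDNN autotuning and allocator warm-up, so a warm-up phase plus explicit synchronization gives a cleaner number. A sketch reusing the same forward call:

```python
import time
import torch

# Hedged timing sketch: warm up first, then synchronize around the timed
# region so queued GPU work is fully counted.
with torch.no_grad():
    for _ in range(10):  # warm-up passes
        model(views=inputs, meta=meta, cameras=cameras,
              resize_transform=resize_transform)
    torch.cuda.synchronize()
    start = time.time()
    n = 120
    for _ in range(n):
        model(views=inputs, meta=meta, cameras=cameras,
              resize_transform=resize_transform)
    torch.cuda.synchronize()
print(f"approx. FPS: {n / (time.time() - start):.2f}")
```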

How to define the axes in my own scenario

Hello, I am trying to use this model in a real-world setting.
Could you please explain how the axes should be defined during calibration? (I have them as x, z, y, with z being the height dimension.)
It appears that each camera predicts its own pose, and the poses end up in completely different positions. Nonetheless, the projection back onto each camera looks fine when the prediction comes from that camera.

Memory leak when training the network

Dear Authors,

I tried to train the network, but the memory usage on GPU keeps increasing with every iteration.

My specs:
torch: 1.4.0
torchvision: 0.5.0
cuda: 10.1

Steps to reproduce:

  1. Pull the repo
  2. Change the data path in the config to my local dataset
  3. Download the pretrained backbone from the voxelpose repository
  4. Set GPUS: '0' in the config
  5. Run the data preprocessing (resizing images)
  6. Run the training: python run/train.py --cfg configs/panoptic/jln64.yaml

I tried to pinpoint the exact spot where the memory leak occurs, but it seems to happen in different places in the network. The major increases seem to happen:

  • in the ResNet backbone, when the 1st pass out of 5 goes through self.layer1
  • after that, a smaller one in the encoder-decoder part of the pose-net.

I use GPUtil to check the memory, with print statements in the following places:

```python
if views is not None:
    input_heatmaps = torch.stack([self.backbone(view) for view in views], dim=0)
else:
    input_heatmaps = torch.stack(input_heatmaps, dim=0)
print("after self.backbone", GPUtil.getGPUs()[0].memoryUsed)
batch_size = input_heatmaps[0].shape[0]

# human detection network
proposal_heatmaps_2d, proposal_heatmaps_1d, proposal_centers, bbox_preds = \
    self.pose_net(input_heatmaps, meta)
print("after self.pose_net", GPUtil.getGPUs()[0].memoryUsed)
mask = (proposal_centers[:, :, 3] >= 0)

# joint localization network
fused_poses, poses = self.joint_net(meta, input_heatmaps, proposal_centers.detach(), mask)
print("after self.joint_net", GPUtil.getGPUs()[0].memoryUsed)

# compute the training loss
if self.training:
    assert targets is not None, 'proposal ground truth not set'
    proposal2gt = proposal_centers[:, :, 3]
    proposal2gt = torch.where(proposal2gt >= 0, proposal2gt, torch.zeros_like(proposal2gt))

    # compute 2d loss of proposal heatmaps
    loss_2d = F.mse_loss(proposal_heatmaps_2d[:, 0], targets['2d_heatmaps'], reduction='mean')

    # unravel the 1d gt heatmaps and compute 1d loss
    matched_heatmaps_1d = torch.gather(targets['1d_heatmaps'], dim=1,
                                       index=proposal2gt.long().unsqueeze(2)
                                             .repeat(1, 1, proposal_heatmaps_1d.shape[2]))
    loss_1d = F.mse_loss(proposal_heatmaps_1d[mask], matched_heatmaps_1d[mask], reduction='mean')

    # compute the loss of bbox regression, only apply supervision on gt positions
    bbox_preds = torch.gather(bbox_preds, 1, targets['index'].long().view(batch_size, -1, 1).repeat(1, 1, 2))
    loss_bbox = F.l1_loss(bbox_preds[targets['mask']], targets['bbox'][targets['mask']], reduction='mean')

    del proposal_heatmaps_2d, proposal_heatmaps_1d, bbox_preds

    # weighted L1 loss of joint localization
    joints_3d = torch.gather(meta[0]['joints_3d'].float(), dim=1,
                             index=proposal2gt.long().view(batch_size, -1, 1, 1)
                                   .repeat(1, 1, self.num_joints, 3))[mask]
    joints_vis = torch.gather(meta[0]['joints_3d_vis'].float(), dim=1,
                              index=proposal2gt.long().view(batch_size, -1, 1)
                                    .repeat(1, 1, self.num_joints))[mask].unsqueeze(2)
    loss_joint = F.l1_loss(poses[0][mask] * joints_vis, joints_3d[:, :, :2] * joints_vis, reduction="mean") + \
                 F.l1_loss(poses[1][mask] * joints_vis, joints_3d[:, :, ::2] * joints_vis, reduction="mean") + \
                 F.l1_loss(poses[2][mask] * joints_vis, joints_3d[:, :, 1:] * joints_vis, reduction="mean") + \
                 2 * F.l1_loss(fused_poses[mask] * joints_vis, joints_3d * joints_vis, reduction="mean")

    loss_dict = {
        "2d_heatmaps": loss_2d,
        "1d_heatmaps": loss_1d,
        "bbox": 0.1 * loss_bbox,
        "joint": loss_joint,
        "total": loss_2d + loss_1d + 0.1 * loss_bbox + loss_joint
    }
else:
    loss_dict = None
print("after losses block", GPUtil.getGPUs()[0].memoryUsed)

# confidence score
fused_poses = torch.cat([fused_poses, proposal_centers[:, :, 3:5].reshape(batch_size, -1, 1, 2)
                        .repeat(1, 1, self.num_joints, 1)], dim=3)
return fused_poses, poses, proposal_centers.detach(), loss_dict, input_heatmaps
```

Readings are as follows 1st iteration:

after self.backbone 2274.0
after self.pose_net 1680.0
after self.joint_net 1680.0
after losses block 1680.0

2nd iteration

after self.backbone 2630.0
after self.pose_net 2640.0
after self.joint_net 2640.0
after losses block 2640.0

3rd iteration

after self.backbone 2880.0
after self.pose_net 2890.0
after self.joint_net 2890.0
after losses block 2890.0

4th iteration

after self.backbone 3132.0
after self.pose_net 3140.0
after self.joint_net 3140.0
after losses block 3142.0

And it goes on like this until OOM.

Did you experience similar problems?
Thank you for your help.
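
One common cause of this symptom in PyTorch generally (a guess, not a confirmed diagnosis of this repo): keeping references to loss tensors across iterations retains their autograd graphs, so allocated memory grows every step. A minimal illustration of the safe logging pattern, with hypothetical helper names:

```python
# Accumulating the tensor itself keeps every iteration's graph alive:
#   running_loss += loss_dict["total"]        # leaks graph memory
# Converting to a Python float drops the graph reference instead:
running_loss = 0.0
for i, batch in enumerate(loader):            # `loader` is a stand-in name
    loss = forward_and_loss(model, batch)     # hypothetical helper
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    running_loss += loss.item()               # .item() detaches from the graph
```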

training problem

When I trained the model on the Panoptic dataset, I met the following problem. I use torch 1.13 and CUDA 11.8.
File "/workspace/faster_voxel_pose/run/train.py", line 181, in <module>
main()
File "/workspace/faster_voxel_pose/run/train.py", line 151, in main
train_3d(config, model, optimizer, train_loader, epoch, final_output_dir, writer_dict)
File "/workspace/faster_voxel_pose/run/../lib/core/function.py", line 41, in train_3d
final_poses, poses, proposal_centers, loss_dict, input_heatmap = model(views=inputs, meta=meta, targets=targets,
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
return self.module(*inputs[0], **kwargs[0])
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/workspace/faster_voxel_pose/run/../lib/models/voxelpose.py", line 38, in forward
bbox_preds = self.pose_net(input_heatmaps, meta, cameras, resize_transform)
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/workspace/faster_voxel_pose/run/../lib/models/human_detection_net.py", line 94, in forward
proposal_heatmaps_1d = self.c2c_net(torch.flatten(feature_1d, 0, 1)).view(batch_size, self.max_people, -1)
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/workspace/faster_voxel_pose/run/../lib/models/cnns_1d.py", line 131, in forward
hm = self.output_hm(x)
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 313, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 309, in _conv_forward
return F.conv1d(input, weight, bias, self.stride,
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/fx/traceback.py", line 57, in format_stack
return traceback.format_stack()
(Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:114.)
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
File "/workspace/faster_voxel_pose/run/train.py", line 181, in <module>
main()
File "/workspace/faster_voxel_pose/run/train.py", line 151, in main
train_3d(config, model, optimizer, train_loader, epoch, final_output_dir, writer_dict)
File "/workspace/faster_voxel_pose/run/../lib/core/function.py", line 71, in train_3d
accu_loss.backward()
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 32, 1]] is at version 7; expected version 5 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Out of memory when training on the Campus dataset with an RTX 3080 Ti, even with batch size 1

My training environment is as follows:
Python 3.7
torch 1.4
GPU: RTX 3080 Ti with 12 GB of VRAM.
To save GPU memory, I set the batch size to 1 and SYNTHETIC NUM_DATA to 1000.
When running the provided train.py, it runs for a while in epoch 0 and then reports out of memory.
The error message is as follows:
```
Epoch: 0
Save the sampling grid in HDN for sequence synthetic
Epoch: [0][0/1000] Time: 563.691s (563.691s) Speed: 0.0 samples/s Data: 6.174s (6.174s) Loss: nan (nan) Loss_2d: 0.0008510 (0.0008510) Loss_1d: nan (nan) Loss_bbox: 0.012933 (0.012933) Loss_joint: nan (nan) Memory 292969472.0
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
helloo E:\project\Faster-VoxelPose_23_04_25\output\campus\voxelpose_50\jln64\train_00000000
Save the sampling grid in JLN for sequence synthetic
helloo E:\project\Faster-VoxelPose_23_04_25\output\campus\voxelpose_50\jln64\train_00000100
Epoch: [0][100/1000] Time: 0.078s (5.717s) Speed: 12.8 samples/s Data: 0.000s (0.064s) Loss: nan (nan) Loss_2d: 0.0008510 (nan) Loss_1d: nan (nan) Loss_bbox: 0.015552 (383651026093232355625853440229376.000000) Loss_joint: nan (nan) Memory 2886614528.0
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
helloo E:\project\Faster-VoxelPose_23_04_25\output\campus\voxelpose_50\jln64\train_00000200
Epoch: [0][200/1000] Time: 0.077s (2.913s) Speed: 13.0 samples/s Data: 0.000s (0.034s) Loss: nan (nan) Loss_2d: 0.0050989 (nan) Loss_1d: nan (nan) Loss_bbox: 0.039250 (1752363751526759131174129536860160.000000) Loss_joint: nan (nan) Memory 5246115328.0
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
helloo E:\project\Faster-VoxelPose_23_04_25\output\campus\voxelpose_50\jln64\train_00000300
Epoch: [0][300/1000] Time: 0.079s (1.973s) Speed: 12.7 samples/s Data: 0.000s (0.024s) Loss: nan (nan) Loss_2d: inf (nan) Loss_1d: nan (nan) Loss_bbox: 0.025086 (1170183103178998798672389530976256.000000) Loss_joint: nan (nan) Memory 7605616128.0
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
Epoch: [0][400/1000] Time: 0.078s (1.502s) Speed: 12.9 samples/s Data: 0.000s (0.019s) Loss: nan (nan) Loss_2d: 0.0042343 (nan) Loss_1d: nan (nan) Loss_bbox: 0.034341 (878474771048066240862024315699200.000000) Loss_joint: nan (nan) Memory 9965116928.0
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
helloo E:\project\Faster-VoxelPose_23_04_25\output\campus\voxelpose_50\jln64\train_00000400
Traceback (most recent call last):
File "E:\project\Faster-VoxelPose_23_04_25\run\train.py", line 171, in <module>
main()
File "E:\project\Faster-VoxelPose_23_04_25\run\train.py", line 140, in main
train_3d(config, model, optimizer, train_loader, epoch, final_output_dir, writer_dict)
File "E:\project\Faster-VoxelPose_23_04_25\run\..\lib\core\function.py", line 45, in train_3d
cameras=cameras, resize_transform=resize_transform)
File "E:\env\py37_pt140_cu101\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "E:\env\py37_pt140_cu101\lib\site-packages\torch\nn\parallel\data_parallel.py", line 150, in forward
return self.module(*inputs[0], **kwargs[0])
File "E:\env\py37_pt140_cu101\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "E:\project\Faster-VoxelPose_23_04_25\run\..\lib\models\voxelpose.py", line 38, in forward
bbox_preds = self.pose_net(input_heatmaps, meta, cameras, resize_transform)
File "E:\env\py37_pt140_cu101\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "E:\project\Faster-VoxelPose_23_04_25\run\..\lib\models\human_detection_net.py", line 81, in forward
feature_cubes = self.project_layer(heatmaps, meta, cameras, resize_transform)
File "E:\env\py37_pt140_cu101\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "E:\project\Faster-VoxelPose_23_04_25\run\..\lib\models\project_whole.py", line 84, in forward
cubes[i] = torch.mean(F.grid_sample(heatmaps[i], shared_sample_grid, align_corners=True), dim=0).squeeze(0)
File "E:\env\py37_pt140_cu101\lib\site-packages\torch\nn\functional.py", line 2711, in grid_sample
return torch.grid_sampler(input, grid, mode_enum, padding_mode_enum, align_corners)
RuntimeError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 12.00 GiB total capacity; 11.21 GiB already allocated; 0 bytes free; 11.26 GiB reserved in total by PyTorch)

Process finished with exit code 1
```
Could anyone suggest ways to reduce GPU memory consumption during training so that it fits within 12 GB of VRAM?
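
One generic lever to try (my suggestion, not project-specific advice; it requires torch >= 1.6, so not the torch 1.4 mentioned above): automatic mixed precision roughly halves activation memory in the forward pass. A sketch around the model call quoted in other issues on this page:

```python
import torch

# Hedged AMP sketch; `loader` is an assumed name, and the loss-dict key
# "total" follows the code quoted in the memory-leak issue above.
scaler = torch.cuda.amp.GradScaler()
for inputs, meta, targets in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        _, _, _, loss_dict, _ = model(views=inputs, meta=meta, targets=targets)
    scaler.scale(loss_dict["total"]).backward()  # scaled to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()
```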

NAN or INF problem

Dear author, recently I tried to run your code on my server, but 'NaN or Inf found in input tensor' appears as soon as training starts.
I used torch==1.4.0, cudatoolkit==10.1, torchvision==0.5.0, and the rest matches requirements.txt.
For various reasons I changed GPUS: '0,1' to GPUS: '0' and reduced NUM_DATA from 1000 to 500.
I trained the model for 30 epochs, but it still shows 'NaN or Inf found in input tensor'.
In other people's issues it looks like you have already solved this problem; is the new code the fix for it?


Failed to get CMU Panoptic dataset

Hello!
I encountered some problems and failed to download the dataset.
I tried to visit domedb.perception.cs.cmu.edu/; however, it redirects me to https://domedb.perception.cs.cmu.edu:5001/, which requires a login.

I also tried to get the dataset with the script ./scripts/getData.sh 171204_pose1_sample, and got the following result:

Connecting to domedb.perception.cs.cmu.edu (domedb.perception.cs.cmu.edu)|128.2.220.8|:5001... connected.
ERROR: cannot verify domedb.perception.cs.cmu.edu's certificate, issued by '[email protected],CN=Synology Inc. CA,OU=Certificate Authority,O=Synology Inc.,L=Taipei,ST=Taiwan,C=TW':
  Unable to locally verify the issuer's authority.

Article: deployment to the basketball court

Dear Authors,

thank you for publishing your inspiring work.
In the article you mention that you "deployed our model to a basketball court and a retail store". Did this model need any kind of retraining or finetuning to account for the new camera setup or calibration?

Thank you
