silverster98 / humanise Goto Github PK



Official implementation of the NeurIPS22 paper "HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes"

Home Page: https://silverster98.github.io/HUMANISE/

License: MIT License

Python 99.68% Shell 0.32%

3d-scene-understanding deep-learning motion-generation


humanise's Issues

Code release for APD metric

Hi, bro!
When do you plan to release the code for APD evaluation? Thank you!
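Until official code is available, the following minimal sketch may help make the request concrete. It assumes APD means the average L2 distance over all pairs among K samples generated for the same condition; the function name `average_pairwise_distance` and the choice of flattened features (for example joint or marker positions) are assumptions, not the authors' implementation.

```python
import numpy as np


def average_pairwise_distance(samples: np.ndarray) -> float:
    """Mean L2 distance over all unordered pairs of K samples.

    samples: array of shape (K, ...) holding K generations for one condition.
    Hypothetical helper; the feature space used in the paper may differ.
    """
    k = samples.shape[0]
    if k < 2:
        return 0.0
    flat = samples.reshape(k, -1).astype(np.float64)
    # Pairwise differences via broadcasting: (K, 1, D) - (1, K, D) -> (K, K, D)
    diff = flat[:, None, :] - flat[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Keep each unordered pair exactly once (strict upper triangle).
    iu = np.triu_indices(k, k=1)
    return float(dist[iu].mean())
```

A per-action score could then be obtained by averaging this value over all test conditions, though whether the paper aggregates it that way is not confirmed here.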

Hi, I noticed that you mention using a V100 GPU with a batch size of 32 for training. However, I found it hard to fit a batch size of 32 on a single V100 GPU. Could you share more details about your environment and training setup?

Why is `shuffle` set to True when testing?

https://github.com/Silverster98/HUMANISE/blob/main/eval_metric_motion.py#L14

How to render the demo from a different view?

Hi, I noticed that you use a fixed top-down view to visualize the results here. How could I modify the camera pose to get a visualization from another view, such as looking from one corner of the room?
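One generic way to get such a view is to build a look-at camera pose and pass it to the renderer in place of the top-down pose. The sketch below assumes the renderer follows the OpenGL camera convention (the camera looks along its local -Z axis with +Y up, as pyrender does) and that the scene is z-up; the function name `look_at_pose` and the example coordinates are hypothetical.

```python
import numpy as np


def look_at_pose(eye, target, up=(0.0, 0.0, 1.0)) -> np.ndarray:
    """Return a 4x4 camera-to-world pose for a camera at `eye` looking at `target`.

    Assumes the OpenGL convention (camera looks along its local -Z, +Y is up)
    and a z-up world. Both are assumptions about the renderer and the scene.
    """
    eye = np.asarray(eye, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    forward = target - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, np.asarray(up, dtype=np.float64))
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)

    pose = np.eye(4)
    pose[:3, 0] = right
    pose[:3, 1] = true_up
    pose[:3, 2] = -forward  # camera -Z points at the target
    pose[:3, 3] = eye
    return pose


# Example: camera raised in one corner of a room, looking at the room centre.
corner_pose = look_at_pose(eye=(3.0, 3.0, 2.5), target=(0.0, 0.0, 1.0))
```

The construction breaks down when the viewing direction is parallel to `up` (an exactly top-down view), so a different `up` vector would be needed in that case.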

{scene_id}_vh_clean_2.ply doesn't have a field named label

Hi, I was running your script align_motion.py to align the motions with scenes, and it seems that in the line labels[:] = plydata['vertex'].data['label'] you read the property 'label' from {scene_id}_vh_clean_2.ply.
However, I cannot find a 'label' property in that ply file.
Looking forward to your reply!
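A quick way to confirm which per-vertex properties a given file actually carries is to read its PLY header. The sketch below uses only the standard library; `ply_properties` is a hypothetical helper name. It may be worth running it on both {scene_id}_vh_clean_2.ply and the {scene_id}_vh_clean_2.labels.ply file mentioned in another issue on this page, since the label may live in the latter (an assumption, not confirmed by the authors here).

```python
def ply_properties(path):
    """Return {element_name: [property names]} parsed from a PLY header."""
    props = {}
    current = None
    with open(path, "rb") as f:
        for raw in f:
            tokens = raw.decode("ascii", errors="replace").split()
            if not tokens:
                continue
            if tokens[0] == "end_header":
                break  # stop before the (possibly binary) payload
            if tokens[0] == "element":
                current = tokens[1]
                props[current] = []
            elif tokens[0] == "property" and current is not None:
                # The property name is always the last token on the line.
                props[current].append(tokens[-1])
    return props
```

Checking `"label" in ply_properties(path).get("vertex", [])` for each candidate file shows which one the script should be pointed at.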

The process of calculating the generation metrics is extremely slow

The bottleneck is SMPLX_Util.get_body_vertices_sequence, which reloads the SMPL-X pretrained weights on every call. For example, with 1319 examples of the action walk and k == 10, the weights are loaded 13190 times, which makes the I/O time extremely long. My suggestion: instantiate a single SMPL-X model with batch_size=max_motion_len, and select the unmasked entries after running the SMPL-X model.
Here is my code to measure the time cost of the three modes:

import torch
import smplx
import mmengine as me


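# 'cpu' and 'cuda' re-create the SMPL-X model on every iteration;
# 'cuda_static' reuses a single model that is created once up front.
test_mode = 'cuda'  # cpu, cuda, cuda_static

device = 'cuda' if test_mode in ['cuda', 'cuda_static'] else 'cpu'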

seq_len = 60
torch_param = dict()
torch_param['body_pose'] = torch.randn(seq_len, 63).to(device)
torch_param['betas'] = torch.randn(seq_len, 10).to(device)
torch_param['transl'] = torch.randn(seq_len, 3).to(device)
torch_param['global_orient'] = torch.randn(seq_len, 3).to(device)
torch_param['left_hand_pose'] = torch.randn(seq_len, 45).to(device)
torch_param['right_hand_pose'] = torch.randn(seq_len, 45).to(device)

static_model = smplx.create(model_path='data/models_smplx_v1_1/models',
                            model_type='smplx',
                            gender='neutral',
                            num_betas=10,
                            use_pca=False,
                            batch_size=seq_len,
                            ext='npz')

static_model = static_model.to(device)

for i in me.track_iter_progress(range(100)):
    if test_mode in ['cpu', 'cuda']:
        model = smplx.create(model_path='data/models_smplx_v1_1/models',
                             model_type='smplx',
                             gender='neutral',
                             num_betas=10,
                             use_pca=False,
                             batch_size=seq_len,
                             ext='npz').to(device)
        output = model(return_verts=True, **torch_param)
    elif test_mode == 'cuda_static':
        output = static_model(return_verts=True, **torch_param)

When test_mode = 'cpu':
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 100/100, 1.7 task/s, elapsed: 58s, ETA: 0s
When test_mode = 'cuda':
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 100/100, 1.7 task/s, elapsed: 60s, ETA: 0s
When test_mode = 'cuda_static':
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 100/100, 40.2 task/s, elapsed: 2s, ETA: 0s
That is roughly 24x faster (40.2 vs. 1.7 task/s) with seq_len=60, which is half of max_motion_len.

About align my own motion data with HUMANISE setting

Wang, this project is really great work! I'm new to it, and I would like to compare HUMANISE with my own method in my paper.
However, I am not sure how to align my motion data, which is generated by my motion diffusion model, with your setting. Specifically, I don't know how to align the coordinate system of my motion with that of HUMANISE. If I use visualize_dataset.py with my motion data directly, the generated motion has the wrong initial orientation in the visualization.
Could you therefore provide details about the motion coordinate system used in HUMANISE? I would appreciate it.
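If the mismatch is only a rotation about the vertical axis, it can be corrected by rotating the generated trajectory before visualization. The sketch below assumes a z-up world and a pure yaw offset, neither of which is confirmed for HUMANISE here; `rotate_about_z` is a hypothetical helper, and the root orientation (global_orient) would need the same rotation composed into it, which this sketch does not cover.

```python
import numpy as np


def rotate_about_z(points: np.ndarray, yaw: float, pivot=None) -> np.ndarray:
    """Rotate points of shape (..., 3) by `yaw` radians about the z axis."""
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0.0],
                    [s, c, 0.0],
                    [0.0, 0.0, 1.0]])
    pivot = np.zeros(3) if pivot is None else np.asarray(pivot, dtype=np.float64)
    # Row-vector convention: p' = (p - pivot) @ R^T + pivot
    return (points - pivot) @ rot.T + pivot
```

For example, `rotate_about_z(joints, np.pi / 2, pivot=joints[0, 0])` would turn a (T, J, 3) joint sequence by 90 degrees around its first joint in the first frame.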

My.mp4

The code implementation for more actions

Hi! In your paper, you showed some examples of generalizing to 'jump up', 'turn to', 'open' and 'place' motions. Could you provide the code for generating these types of motions? In data/align_motion.py (line 1798-1808), I can only find support for the 4 default actions:

        ## sample valid position and rotation according to sit action
        if action == 'sit':
            action_align = SitAlign(self.annotations, self.instance_to_semantic, self.label_mapping, scene_path, static_scene, static_scene_label, translate_mat, body_vertices, joints_traj)
        elif action == 'stand up':
            action_align = StandUpAlign(self.annotations, self.instance_to_semantic, self.label_mapping, scene_path, static_scene, static_scene_label, translate_mat, body_vertices, joints_traj)
        elif action == 'walk':
            action_align = WalkAlign(self.annotations, self.instance_to_semantic, self.label_mapping, scene_path, static_scene, static_scene_label, translate_mat, body_vertices, joints_traj)
        elif action =='lie':
            action_align = LieAlign(self.annotations, self.instance_to_semantic, self.label_mapping, scene_path, static_scene, static_scene_label, translate_mat, body_vertices, joints_traj)
        else:
            raise Exception('Unsupport action: {}'.format(action))

Thanks!

Align motion for Scannet test split?

Hi @Silverster98

In the paper you mention that you follow Scannet's original train-test split, which gives 16.5k motions in 543 scenes for training and 3.1k motions in 100 scenes for testing.

How do you align motions for Scannet's test split?
In Scannet's test set, I see that every scene only has a "_00_vh_clean_2.ply" file and lacks the other label files, such as "_00_vh_clean_2.labels.ply", ".aggregation.json" and "_vh_clean_2.0.010000.segs.json", which are needed for motion alignment.
Am I missing something?
I also have a minor question.
Scannet has 707 scenes in its train split. Why does HUMANISE have 543 scenes in its train split? Is it because, for some scenes, none of the 10 sampled motions satisfied the alignment constraints during motion alignment?

Looking forward to your reply.
Thank you!

Action-Specific Model path

I ran the evaluation script as follows but got very poor results. The avatar just flies around with meaningless motions. Are there any extra parameter settings, or anything else I have missed?
bash scripts/eval.sh 20220829_194320 "walk"

motion.mp4

Is it possible to align more data with scenes using the same way as you did?

Did you try to align other data, such as opening a door, with the scenes? Thanks for your time.

Extract motion segments from AMASS with BABEL

Hi @Silverster98,
I was running python dataset/babel_process.py --action "walk" to extract walk motion segments. The total number of segments I get, irrespective of duration, is 6345, and the number of selected motions with a duration between 1 s and 4 s is 3844.
However, in the dataset you provide, the pure_motion for walk has only 777 motion segments.

What duration range did you set for the HUMANISE dataset? Is it also 1 s to 4 s?
Did you perform any human or automatic checks to remove segments with inconsistent walk motion (as BABEL's annotations contain some errors) to go from 3844 to 777?

Thanks!
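For reference, the duration filter described in the question above amounts to the following sketch. The dictionary keys 'start_t' and 'end_t' (in seconds) and the function name `filter_by_duration` are hypothetical; the 1 s to 4 s bounds are the ones quoted in the question, not a confirmed HUMANISE setting.

```python
def filter_by_duration(segments, min_sec=1.0, max_sec=4.0):
    """Keep segments whose duration lies in [min_sec, max_sec].

    Each segment is assumed to be a dict with hypothetical keys
    'start_t' and 'end_t' (both in seconds).
    """
    kept = []
    for seg in segments:
        duration = seg["end_t"] - seg["start_t"]
        if min_sec <= duration <= max_sec:
            kept.append(seg)
    return kept
```

Any further reduction, such as the drop from 3844 to 777 segments, would have to come from additional criteria that this filter does not model.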

Visualization

Hi Wang,
Nice work! I notice that you have visualized your generated human motion in .mp4 and .gif files. I would like to use your method as one of our baselines and include it in our qualitative comparisons. Could you provide the code that visualizes all generated poses in a single .png file, like the following example?

Many Thanks!

Quantitative Evaluation

Hi, I ran the following command to evaluate the model.

bash scripts/eval_metric.sh 20220829_194320 "walk"

I get a file named recon.json in the folder, and I have a few questions.

How could I get another file, generation.json, with the generation metrics?
How could I reproduce Table 2 in your paper?

Thanks!

Auxiliary loss

Hi Wang,
I noticed that the paper defines an action recognition loss as one of the auxiliary losses. However, I could not find this loss in the code (return ground_loss, rec_trans_loss, rec_orient_loss, rec_body_pose_loss, rec_body_mesh_loss, kl_loss). Maybe I overlooked it?

Thx

ModuleNotFoundError: No module named 'pointops_cuda'

Hello,

I get an error when executing: bash scripts/train.sh sit

  File "./project/HUMANISE/model/pointtransformer/pointops.py", line 7, in <module>
    import pointops_cuda
ModuleNotFoundError: No module named 'pointops_cuda'

I found the same error reported in POSTECH-CVLab/point-transformer#27.

But when I searched for the file in ./miniconda3 (e.g. find ./miniconda3/ -name "pointops*"), I realized that I don't have this file at all.
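An empty search result suggests that the compiled extension was never built for this environment, since pointops_cuda is a CUDA extension that has to be compiled from the pointops source rather than installed as a regular package. A small check like the one below (standard library only; `extension_available` is a hypothetical helper name) confirms which interpreter is active and whether it can see the module.

```python
import importlib.util
import sys


def extension_available(name: str) -> bool:
    """Return True if `name` can be imported by the current interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("pointops_cuda installed:", extension_available("pointops_cuda"))
```

If this reports False, building the extension from the pointops source directory with the same interpreter is the likely next step; the exact location of that directory in this repository is not confirmed here.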
