
Comments (11)

lucasjinreal commented on May 20, 2024

How did you visualize it? Which tool?

Shimingyi commented on May 20, 2024

Hi @catherineytw ,

Thanks for the question! I think you got the wrong visualization because of a misunderstanding of the skel_in variable. When I wrote the code, I named this variable after the original TensorFlow code. It represents the offsets between parent joints and child joints, rather than positions in global coordinates. I noticed that the knee and foot are located in the same place in your visualization, but based on our code that shouldn't happen, so I wonder if this is the reason.

def bones2skel(bones, bone_mean, bone_std):
    unnorm_bones = bones * bone_std.unsqueeze(0) + bone_mean.repeat(bones.shape[0], 1, 1)
    skel_in = torch.zeros(bones.shape[0], 17, 3).cuda()
    skel_in[:, 1, 0] = -unnorm_bones[:, 0, 0]
    skel_in[:, 4, 0] = unnorm_bones[:, 0, 0]
    skel_in[:, 2, 1] = -unnorm_bones[:, 0, 1]
    skel_in[:, 5, 1] = -unnorm_bones[:, 0, 1]
    skel_in[:, 3, 1] = -unnorm_bones[:, 0, 2]
    skel_in[:, 6, 1] = -unnorm_bones[:, 0, 2]
    skel_in[:, 7, 1] = unnorm_bones[:, 0, 3]
    skel_in[:, 8, 1] = unnorm_bones[:, 0, 4]
    skel_in[:, 9, 1] = unnorm_bones[:, 0, 5]
    skel_in[:, 10, 1] = unnorm_bones[:, 0, 6]
    skel_in[:, 11, 0] = unnorm_bones[:, 0, 7]
    skel_in[:, 12, 0] = unnorm_bones[:, 0, 8]
    skel_in[:, 13, 0] = unnorm_bones[:, 0, 9]
    skel_in[:, 14, 0] = -unnorm_bones[:, 0, 7]
    skel_in[:, 15, 0] = -unnorm_bones[:, 0, 8]
    skel_in[:, 16, 0] = -unnorm_bones[:, 0, 9]
    return skel_in
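
To make the offsets-versus-positions distinction concrete, here is a minimal sketch of how skel_in can be accumulated along the parent chain into global rest-pose positions. The parent list below is only an example H36M-style topology for illustration, not necessarily the joint ordering used in this repository.

import torch

# Example 17-joint parent indices (root = -1); the actual ordering may differ.
PARENTS = [-1, 0, 1, 2, 0, 4, 5, 0, 7, 8, 9, 8, 11, 12, 8, 14, 15]

def offsets_to_rest_pose(skel_in, parents=PARENTS):
    # skel_in: (batch, 17, 3) offsets of each joint relative to its parent.
    positions = torch.zeros_like(skel_in)
    for joint, parent in enumerate(parents):
        if parent == -1:
            continue  # root stays at the origin
        positions[:, joint] = positions[:, parent] + skel_in[:, joint]
    return positions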

catherineytw commented on May 20, 2024

Hi @catherineytw ,

Thanks for the question! I think you got the wrong visualization because of a misunderstanding of the skel_in variable. When I wrote the code, I named this variable after the original TensorFlow code. It represents the offsets between parent joints and child joints, rather than positions in global coordinates. I noticed that the knee and foot are located in the same place in your visualization, but based on our code that shouldn't happen, so I wonder if this is the reason.

def bones2skel(bones, bone_mean, bone_std):
    unnorm_bones = bones * bone_std.unsqueeze(0) + bone_mean.repeat(bones.shape[0], 1, 1)
    skel_in = torch.zeros(bones.shape[0], 17, 3).cuda()
    skel_in[:, 1, 0] = -unnorm_bones[:, 0, 0]
    skel_in[:, 4, 0] = unnorm_bones[:, 0, 0]
    skel_in[:, 2, 1] = -unnorm_bones[:, 0, 1]
    skel_in[:, 5, 1] = -unnorm_bones[:, 0, 1]
    skel_in[:, 3, 1] = -unnorm_bones[:, 0, 2]
    skel_in[:, 6, 1] = -unnorm_bones[:, 0, 2]
    skel_in[:, 7, 1] = unnorm_bones[:, 0, 3]
    skel_in[:, 8, 1] = unnorm_bones[:, 0, 4]
    skel_in[:, 9, 1] = unnorm_bones[:, 0, 5]
    skel_in[:, 10, 1] = unnorm_bones[:, 0, 6]
    skel_in[:, 11, 0] = unnorm_bones[:, 0, 7]
    skel_in[:, 12, 0] = unnorm_bones[:, 0, 8]
    skel_in[:, 13, 0] = unnorm_bones[:, 0, 9]
    skel_in[:, 14, 0] = -unnorm_bones[:, 0, 7]
    skel_in[:, 15, 0] = -unnorm_bones[:, 0, 8]
    skel_in[:, 16, 0] = -unnorm_bones[:, 0, 9]
    return skel_in

Thank you for the quick response.
Based on your explanation, the parameters of the FK layer are the skeletal topology (parents), the joint offset matrix (skel_in), and the joint rotations as quaternions, am I right?

In addition, I am super curious about the adversarial rotation loss.

  1. If the training dataset is aligned, is it possible to minimize the differences between the fake and real rotations (absolute values) instead of their temporal differences?
  2. Is it possible to use CMU mocap data to train the S and Q branches and use the H36M dataset in a semi-supervised style to validate the training?

And the global trajectory reconstruction is another subject that I'm interested in.
I compared the global trajectories reconstructed using the pre-trained model h36m_gt_t.pth, as shown in the gif: the top row shows the global trajectory reconstructed by VideoPose3D (I adapted it a little bit), the second row is the ground truth, and the third row is reconstructed by h36m_gt_t. As shown in the video, the man was walking in circles, but the global trajectory reconstructed by h36m_gt_t was linear. Could you kindly explain?
S11

Any responses will be highly appreciated!
Best regards.

catherineytw commented on May 20, 2024

How did you visualize it? Which tool?

I wrote a simple interface using QGLViewer and rendered the 3D skeletons with OpenGL.

Shimingyi commented on May 20, 2024

@catherineytw

Thank you for the quick response. Based on your explanation, the parameters of the FK layer are the skeletal topology (parents), the joint offset matrix (skel_in), and the joint rotations as quaternions, am I right?

Yes, you are right.
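
For reference, here is a minimal sketch of how an FK layer can assemble global joint positions from those three inputs. It is an illustrative re-implementation under stated assumptions (Hamilton quaternions in (w, x, y, z) order, root at index 0 with parent -1), not the repository's FK code.

import torch

def quat_mul(q, r):
    # Hamilton product of two quaternion tensors of shape (..., 4).
    w1, x1, y1, z1 = q.unbind(-1)
    w2, x2, y2, z2 = r.unbind(-1)
    return torch.stack((
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2), dim=-1)

def quat_rotate(q, v):
    # Rotate vectors v (..., 3) by unit quaternions q (..., 4): q * (0, v) * q^-1.
    qv = torch.cat((torch.zeros_like(v[..., :1]), v), dim=-1)
    q_conj = q * torch.tensor([1.0, -1.0, -1.0, -1.0], device=q.device)
    return quat_mul(quat_mul(q, qv), q_conj)[..., 1:]

def forward_kinematics(parents, offsets, quats):
    # parents: list of parent indices (root = -1)
    # offsets: (batch, joints, 3) parent-relative offsets (skel_in)
    # quats:   (batch, joints, 4) local joint rotations
    positions = torch.zeros_like(offsets)
    global_quats = [None] * len(parents)
    for joint, parent in enumerate(parents):
        if parent == -1:
            global_quats[joint] = quats[:, joint]
            continue
        global_quats[joint] = quat_mul(global_quats[parent], quats[:, joint])
        positions[:, joint] = positions[:, parent] + quat_rotate(global_quats[parent], offsets[:, joint])
    return positions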

In addition, I am super curious about the adversarial rotation loss.

  1. If the training dataset is aligned, is it possible to minimize the differences between the fake and real rotations (absolute values) instead of their temporal differences?
  2. Is it possible to use CMU mocap data to train the S and Q branches and use the H36M dataset in a semi-supervised style to validate the training?
  1. When we wrote this paper, we found some potential problems when applying supervision on the rotations directly, and we didn't explore the problem further. Recently I have done more experiments in this area to get a better feeling for it, and the main reason is the ambiguity of rotations. Basically, aligning the skeletons is not enough to get stable fake-real paired training, although it is still applicable (see the sketch after this list).
  2. It is possible; we also got some results with CMU supervision, but just as I said, the results are not stable.
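
As a side note on the temporal-difference formulation, here is an illustrative sketch (not the paper's implementation) of the kind of input the rotation discriminator can be fed, assuming per-frame joint rotations of shape (batch, frames, joints, 4):

import torch

def rotation_temporal_differences(quats):
    # Frame-to-frame deltas of the rotations; unlike absolute rotations,
    # these are insensitive to a constant alignment offset between the
    # fake and real skeletons, which helps keep the pairing stable.
    return quats[:, 1:] - quats[:, :-1]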

And the global trajectory reconstruction is another subject that I'm interested in.

I didn't convert our prediction into world space for certain reasons. In your visualization, what kind of transformation do you apply to the global rotation and global trajectory? I have my own visualization script, and it aligns with the GT very well, so I wonder if some steps are missing.

Shimingyi commented on May 20, 2024

Here is how we transform the global trajectory; you can insert it after this line and get the correct trajectory in your visualization.

translation_world = (R.T.dot(translation.T) + T).T

Note that this is a trajectory in world space, which cannot be combined with the predicted rotations in camera space. So we can only compare the trajectory itself until we convert our BVH rotations to world space. I didn't do that because of some technical issues at that time; now I am able to fix it, but allow me some time to implement and test it.
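
For clarity, here is a minimal sketch of that conversion applied to a whole trajectory, assuming R is the 3x3 world-to-camera rotation matrix and T is the camera position in world coordinates as stored with the H36M camera parameters:

import numpy as np

def camera_to_world(trajectory, R, T):
    # trajectory: (N, 3) root translations in camera space.
    T = np.asarray(T).reshape(3, 1)        # column vector for broadcasting
    return (R.T.dot(trajectory.T) + T).T   # (N, 3) trajectory in world space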

catherineytw commented on May 20, 2024

@catherineytw

I didn't convert our prediction into world space for certain reasons. In your visualization, what kind of transformation do you apply to the global rotation and global trajectory? I have my own visualization script, and it aligns with the GT very well, so I wonder if some steps are missing.

Thank you for the detailed explanation. Allow me to be more specific: the trajectories in the gif are in camera space. Here is the BVH file I got by running the evaluation code (without changing anything) with h36m_gt_t.pth; the character doesn't walk in circles, and the motion jitters when he turns around. The trajectory validation example on your project page is perfect, but I cannot replicate it. I am really confused and trying to figure out what went wrong.
S11_Walking_1_bvh.txt

Any responses will be highly appreciated!
Best regards.

Shimingyi commented on May 20, 2024

@catherineytw
The trajectory in your visualization is in world space, but ours is in camera space. That's why you find they are different and why ours does not go in circles. If you want to solve it, you need to take the XYZ trajectory from our method and apply the above transformation. For the trajectory prediction, there are a few points that should be clarified:

  1. We assume it's impossible to recover the absolute trajectory because going from 2D to 3D is an ill-posed problem. So our pipeline is a little different from VideoPose, which predicts absolute XYZ values. In our solution, we predict a depth factor and then combine it with the XY movement of the root joint from the original 2D detections to recover an XYZ global trajectory. Also, because all the predictions in our method are relative values, we apply a manual scaling to align the trajectory better. In the default setting, I set it to 8 (code). It causes unnatural results but is adjustable.
  2. So, how do we evaluate the trajectory? We only need to compare our predicted depth_facter with gt_depth_facter, recover translation_rec and gt_translation, and then visualize the translations.

In this open-source code, you can easily get it by inserting the following code after L54:

# Recover the trajectory with pre_proj
translation = (translations * np.repeat(pre_proj[0].cpu().numpy(), 3, axis=-1).reshape((-1, 3))) * 12
with open('./paper_figure/root_path_global/%s.rec.txt' % video_name, 'w') as f:
    for index1 in range(translation.shape[0]):
        f.writelines('%s %s %s\n' % (translation[index1, 0], translation[index1, 1], translation[index1, 2]))

# Recover the trajectory with gt_proj_facters
translation = (translations * np.repeat(proj_facters[0].cpu().numpy(), 3, axis=-1).reshape((-1, 3))) * 12
with open('./paper_figure/root_path_global/%s.gt.txt' % video_name, 'w') as f:
    for index1 in range(translation.shape[0]):
        f.writelines('%s %s %s\n' % (translation[index1, 0], translation[index1, 1], translation[index1, 2]))

Then you can compare these two trajectories. I re-ran the visualization code and was able to get these results:
20220424210759

Regarding the motion jitter problem, I am sorry, but it's hard to avoid within the current framework. Using sparse 2D positions as input loses important information. Our prediction is per-frame, unlike VideoPose, which uses 243 input frames to predict 1, so it is expected to have stronger temporal coherence than ours.

catherineytw commented on May 20, 2024

Then you can compare these two trajectories. I re-ran the visualization code and was able to get these results: 20220424210759

Thank you for the detailed explanation. I added the code and finally got the trajectories on the left; they match perfectly. However, I still cannot get the circular global trajectory in world space on the right. Here is my code, and the world trajectories look exactly like those on the left (as shown in the figure). Could you help me figure out what's wrong?
# ----------------------- Trajectory file --------------------------
if config.arch.translation:
    R, T, f, c, k, p, res_w, res_h = test_data_loader.dataset.cameras[(int(video_name.split('_')[0].replace('S', '')), int(video_name.split('_')[-1]))]
    pose_2d_film = (poses_2d_pixel[0, :, :2].cpu().numpy() - c[:, 0]) / f[:, 0]
    translations = np.ones(shape=(pose_2d_film.shape[0], 3))
    translations[:, :2] = pose_2d_film

    # Recover the trajectory with pre_proj
    translation = (translations * np.repeat(pre_proj[0].cpu().numpy(), 3, axis=-1).reshape((-1, 3))) * 12
    np.save('{}/{}_rec.npy'.format(output_trajectory_path, video_name), translation)
    translation_world = (R.T.dot(translation.T) + T).T
    np.save('{}/{}_rec_world.npy'.format(output_trajectory_path, video_name), translation_world)

    # Recover the trajectory with gt_proj_facters
    translation = (translations * np.repeat(proj_facters[0].cpu().numpy(), 3, axis=-1).reshape((-1, 3))) * 12
    np.save('{}/{}_gt.npy'.format(output_trajectory_path, video_name), translation)
    translation_world = (R.T.dot(translation.T) + T).T
    np.save('{}/{}_gt_world.npy'.format(output_trajectory_path, video_name), translation_world)

[attached images]

By the way, I couldn't agree with you more: precisely recovering the 3D global trajectory is impossible due to depth ambiguity and unknown camera intrinsic/extrinsic parameters. In my opinion, as long as the projected 3D trajectory matches the 2D trajectory in the video, it is valid.

Regarding the motion jitter problem, I am sorry, but it's hard to avoid within the current framework. Using sparse 2D positions as input loses important information. Our prediction is per-frame, unlike VideoPose, which uses 243 input frames to predict 1, so it is expected to have stronger temporal coherence than ours.

In terms of the motion jitter problem, I have several immature ideas I'd like to discuss with you. Is it possible to add rotation angular-velocity and acceleration terms to the adversarial training loss to keep the motion smooth? Or to add joint velocity and acceleration loss terms on the reconstructed skeleton? Since the network takes motion chunks as input, why not make use of the temporal information to mitigate motion jitter?

Shimingyi commented on May 20, 2024

@catherineytw
I have no idea what's wrong with the visualization on the right. Would you mind scheduling a meeting so I can learn more about the inference details? Here is my email: [email protected]

In terms of the motion jitter problem, I have several immature ideas I'd like to discuss with you. Is it possible to add rotation angular-velocity and acceleration terms to the adversarial training loss to keep the motion smooth? Or to add joint velocity and acceleration loss terms on the reconstructed skeleton? Since the network takes motion chunks as input, why not make use of the temporal information to mitigate motion jitter?

An adversarial loss needs to be designed carefully. Even though we found it helpful in some cases, it is still hard to refine the motion to a better level. We ran some experiments on VIBE, which uses adversarial learning to improve human reconstruction, and it also brings only a tiny improvement. Adding velocity and acceleration losses looks great; it's also what we are considering, and it works well in some motion synthesis papers such as PFNN. We do use temporal information as input, but the VideoPose architecture underperformed with our data processing, so I gave it up and changed to the current version with adaptive pooling. Bringing neural FK into the learning process is the key idea of our paper; the current network architecture is just workable and has a lot of room for improvement. I am happy to see more great ideas based on neural FK, and we can talk about it more in the meeting.
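
As a rough illustration of the velocity/acceleration idea (an assumption-laden sketch, not something taken from our codebase), such terms could look like this on reconstructed joint positions of shape (batch, frames, joints, 3); the weights are placeholders:

import torch

def smoothness_loss(pred_positions, gt_positions, w_vel=1.0, w_acc=1.0):
    # Finite-difference velocity and acceleration along the time axis.
    pred_vel = pred_positions[:, 1:] - pred_positions[:, :-1]
    gt_vel = gt_positions[:, 1:] - gt_positions[:, :-1]
    pred_acc = pred_vel[:, 1:] - pred_vel[:, :-1]
    gt_acc = gt_vel[:, 1:] - gt_vel[:, :-1]
    loss_vel = torch.mean(torch.abs(pred_vel - gt_vel))
    loss_acc = torch.mean(torch.abs(pred_acc - gt_acc))
    return w_vel * loss_vel + w_acc * loss_acc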

catherineytw commented on May 20, 2024

Would you mind scheduling a meeting so I can learn more about the inference details? Here is my email: [email protected]

Thank you for your kindness. Is Thursday evening a good time for you? I'm sorry to say I have never used Google Meet; if you don't mind, here is my WeChat ID: Catherineytw. Maybe we can have a conversation on Tencent Meeting?
