How did you visualize it? Which tool did you use?
from motionet.
Hi @catherineytw ,
Thanks for the question! I think you got the wrong visualization because of a misunderstanding of the skel_in
variable. When I wrote the code, I named this variable after the original TensorFlow code. It represents the offsets between parent joints and child joints, rather than positions in global coordinates. I noticed that the knee and the foot are located at the same place in your visualization; based on our code that should not happen, so I wonder if this is the reason.
def bones2skel(bones, bone_mean, bone_std):
    unnorm_bones = bones * bone_std.unsqueeze(0) + bone_mean.repeat(bones.shape[0], 1, 1)
    skel_in = torch.zeros(bones.shape[0], 17, 3).cuda()
    skel_in[:, 1, 0] = -unnorm_bones[:, 0, 0]
    skel_in[:, 4, 0] = unnorm_bones[:, 0, 0]
    skel_in[:, 2, 1] = -unnorm_bones[:, 0, 1]
    skel_in[:, 5, 1] = -unnorm_bones[:, 0, 1]
    skel_in[:, 3, 1] = -unnorm_bones[:, 0, 2]
    skel_in[:, 6, 1] = -unnorm_bones[:, 0, 2]
    skel_in[:, 7, 1] = unnorm_bones[:, 0, 3]
    skel_in[:, 8, 1] = unnorm_bones[:, 0, 4]
    skel_in[:, 9, 1] = unnorm_bones[:, 0, 5]
    skel_in[:, 10, 1] = unnorm_bones[:, 0, 6]
    skel_in[:, 11, 0] = unnorm_bones[:, 0, 7]
    skel_in[:, 12, 0] = unnorm_bones[:, 0, 8]
    skel_in[:, 13, 0] = unnorm_bones[:, 0, 9]
    skel_in[:, 14, 0] = -unnorm_bones[:, 0, 7]
    skel_in[:, 15, 0] = -unnorm_bones[:, 0, 8]
    skel_in[:, 16, 0] = -unnorm_bones[:, 0, 9]
    return skel_in
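To get plottable global positions from skel_in, the offsets have to be accumulated along the kinematic chain. Here is a minimal sketch; note that the parents list below is an assumed standard 17-joint H36M-style topology, not taken from this thread, so verify it against the repository:

```python
import numpy as np

# Assumed 17-joint H36M-style topology (hip = 0) -- check against the repo.
PARENTS = [-1, 0, 1, 2, 0, 4, 5, 0, 7, 8, 9, 8, 11, 12, 8, 14, 15]

def offsets_to_positions(skel_in):
    """Accumulate parent->child offsets (17, 3) into global joint positions.

    Relies on PARENTS[j] < j so parents are resolved before children."""
    pos = np.zeros_like(skel_in)
    for j, p in enumerate(PARENTS):
        if p >= 0:
            pos[j] = pos[p] + skel_in[j]
    return pos
```

Plotting pos instead of skel_in should put the knee and foot in distinct places.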
Thank you for the quick response.
Based on your explanation, the parameters of the FK layer are: the skeletal topology (parents), the joint offset matrix (skel_in), and the joint rotations as quaternions. Am I right?
In addition, I am super curious about the adversarial rotation loss.
- If the training dataset is aligned, is it possible to minimize the differences between the fake and real rotations (absolute values) instead of their temporal differences?
- Is it possible to use CMU mocap data to train the S and Q branches and use the H36M dataset in a semi-supervised style to validate the training?
Global trajectory reconstruction is another subject I'm interested in.
I compared the global trajectories reconstructed using the pre-trained model h36m_gt_t.pth, as shown in the gif: the top row shows the global trajectory reconstructed by VideoPose3D (I modified it a little), the second row is the GT, and the third row was reconstructed by h36m_gt_t. As the video shows, the man was walking in circles, but the global trajectory reconstructed by h36m_gt_t is linear. Could you kindly explain?
Any responses will be highly appreciated!
Best regards.
How did you visualize it? Which tool did you use?
I wrote a simple interface using QGLViewer and rendered the 3D skeletons with OpenGL.
Thank you for the quick response. Based on your explanation, the parameters of the FK layer are: the skeletal topology (parents), the joint offset matrix (skel_in), and the joint rotations as quaternions. Am I right?
Yes, you are right.
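For concreteness, those three inputs can be combined by a plain forward-kinematics pass. The following is a minimal numpy sketch of the idea only, not the repository's actual (batched, differentiable) FK layer:

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def quat_rotate(q, v):
    """Rotate vector v by unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    u = np.array([x, y, z])
    return v + 2.0 * np.cross(u, np.cross(u, v) + w * v)

def forward_kinematics(parents, offsets, quats):
    """parents: list with parents[j] < j (root has -1);
    offsets: (J, 3) parent->child offsets (the skel_in role);
    quats: (J, 4) local joint rotations.
    Returns (J, 3) global joint positions."""
    n = len(parents)
    pos = np.zeros((n, 3))
    rot = [np.array([1.0, 0.0, 0.0, 0.0])] * n
    rot[0] = quats[0]
    for j in range(1, n):
        p = parents[j]
        rot[j] = quat_mul(rot[p], quats[j])                 # accumulate global rotation
        pos[j] = pos[p] + quat_rotate(rot[p], offsets[j])   # rotate offset into world
    return pos
```

With identity quaternions this degenerates to the plain offset accumulation, i.e. the rest pose.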
In addition, I am super curious about the adversarial rotation loss.
- If the training dataset is aligned, is it possible to minimize the differences between the fake and real rotations (absolute values) instead of their temporal differences?
- Is it possible to use CMU mocap data to train the S and Q branches and use the H36M dataset in a semi-supervised style to validate the training?
- When we wrote this paper, we found some potential problems when applying supervision on rotations directly, and we didn't explore the problem further. Recently I have done more experiments in this field to get a better feel for it, and the main reason is the ambiguity of rotation. Basically, aligning the skeleton is not enough for stable fake-real paired training, but it is still applicable.
- Possible; we also got some results with CMU supervision. But as I said, the results are not stable.
And the global trajectory reconstruction is another subject that I'm interested in.
I didn't convert our prediction into world space for a few reasons. In your visualization, what kind of transformation do you apply to the global rotation and global trajectory? I have my own visualization script, and it aligns with the GT very well, so I wonder if some steps are missing.
Here is how we transform the global trajectory; you can insert it after this line and get the correct trajectory in your visualization.
translation_world = (R.T.dot(translation.T) + T).T
Notice that this is a trajectory in world space, which cannot be combined with the predicted rotation in camera space. So we can only compare the trajectories themselves until we can convert our BVH rotation to world space. I didn't do that because of some technical issues at the time; I am now able to fix it, but allow me some time to implement and test it.
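As a self-contained version of that one-liner, here is a numpy sketch. It assumes H36M-style extrinsics X_cam = R(X_world - T), with R the camera rotation matrix and T the camera position as a (3, 1) column; check the repo's camera loader if your shapes differ:

```python
import numpy as np

def camera_to_world(traj_cam, R, T):
    """Map an (N, 3) camera-space trajectory to world space.

    Assumes extrinsics X_cam = R @ (X_world - T), with R a (3, 3)
    rotation matrix and T a (3, 1) camera position, so the inverse
    transform is X_world = R.T @ X_cam + T."""
    return (R.T @ traj_cam.T + T).T
```

Round-tripping a point through the forward extrinsics recovers the original camera-space value, which is a quick sanity check for the shapes.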
I didn't convert our prediction into world space for a few reasons. In your visualization, what kind of transformation do you apply to the global rotation and global trajectory? I have my own visualization script, and it aligns with the GT very well, so I wonder if some steps are missing.
Thank you for the detailed explanation. Allow me to be more specific: the trajectories in the gif were in camera space. Here is the BVH file I got by running the evaluation code (without changing anything) using h36m_gt_t.pth; the subject doesn't walk in circles, and the motion jitters when he turns around. The trajectory-validation example on your project page is perfect, but I cannot replicate it. I am really confused and trying to figure out what goes wrong.
S11_Walking_1_bvh.txt
Any responses will be highly appreciated!
Best regards.
@catherineytw
The trajectory in your visualization is in world space, but ours is in camera space. That's why you find them different and ours is not circular. If you want to solve it, take the XYZ trajectory from our method and apply the transformation above. For the trajectory prediction, a few points should be clarified:
- We assume it's impossible to recover an absolute trajectory, because going from 2D to 3D is an ill-posed problem. So our pipeline is a little different from VideoPose, which predicts absolute XYZ values. In our solution, we predict a depth factor and then assemble it with the XY movement of the root joint from the original 2D detection to recover an XYZ global trajectory. Also, because every prediction in our method is a relative value, we apply a manual scaling to make the trajectory align better; in the default setting I set it to 8 (code). It causes unnatural performance but is adjustable.
- So, how do we evaluate the trajectory? We only need to compare our predicted depth_facter with gt_depth_facter, recover a translation_rec and gt_translation, then visualize the translations.
In this open-source code, you can easily get it by inserting the following code after L54:
# Recover the trajectory with pre_proj
translation = (translations * np.repeat(pre_proj[0].cpu().numpy(), 3, axis=-1).reshape((-1, 3))) * 12
with open('./paper_figure/root_path_global/%s.rec.txt' % video_name, 'w') as f:
    for index1 in range(translation.shape[0]):
        f.writelines('%s %s %s\n' % (translation[index1, 0], translation[index1, 1], translation[index1, 2]))

# Recover the trajectory with gt_proj_facters
translation = (translations * np.repeat(proj_facters[0].cpu().numpy(), 3, axis=-1).reshape((-1, 3))) * 12
with open('./paper_figure/root_path_global/%s.gt.txt' % video_name, 'w') as f:
    for index1 in range(translation.shape[0]):
        f.writelines('%s %s %s\n' % (translation[index1, 0], translation[index1, 1], translation[index1, 2]))
Then you can compare these two trajectories. I re-ran the visualization code and was able to get these results:
Regarding the motion jitter problem, I am sorry it's hard to avoid within the current framework. Using sparse 2D positions as input loses important information. Also, our prediction is per-frame; VideoPose, by contrast, uses 243 input frames to predict 1, so it is expected to have stronger temporal coherence than ours.
Then you can compare these two trajectories. I re-ran the visualization code and was able to get these results:
Thank you for the detailed explanation. I added the code and finally got the trajectories on the left; they match perfectly. However, I still cannot get the circular global trajectory in world space on the right. Here is my code, and the world trajectories look exactly like those on the left (as shown in the figure). Could you help me figure out what's wrong?
#-----------------------Trajectory file--------------------------
if config.arch.translation:
    R, T, f, c, k, p, res_w, res_h = test_data_loader.dataset.cameras[
        (int(video_name.split('_')[0].replace('S', '')), int(video_name.split('_')[-1]))]
    pose_2d_film = (poses_2d_pixel[0, :, :2].cpu().numpy() - c[:, 0]) / f[:, 0]
    translations = np.ones(shape=(pose_2d_film.shape[0], 3))
    translations[:, :2] = pose_2d_film

    # Recover the trajectory with pre_proj
    translation = (translations * np.repeat(pre_proj[0].cpu().numpy(), 3, axis=-1).reshape((-1, 3))) * 12
    np.save('{}/{}_rec.npy'.format(output_trajectory_path, video_name), translation)
    translation_world = (R.T.dot(translation.T) + T).T
    np.save('{}/{}_rec_world.npy'.format(output_trajectory_path, video_name), translation_world)

    # Recover the trajectory with gt_proj_facters
    translation = (translations * np.repeat(proj_facters[0].cpu().numpy(), 3, axis=-1).reshape((-1, 3))) * 12
    np.save('{}/{}_gt.npy'.format(output_trajectory_path, video_name), translation)
    translation_world = (R.T.dot(translation.T) + T).T
    np.save('{}/{}_gt_world.npy'.format(output_trajectory_path, video_name), translation_world)
By the way, I couldn't agree with you more: precisely recovering the 3D global trajectory is impossible due to depth ambiguity and unknown camera intrinsic/extrinsic parameters. In my opinion, as long as the projected 3D trajectory matches the 2D trajectory in the video, it is valid.
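That validation check can be sketched with a distortion-free pinhole projection. Here f and c stand for the focal lengths and principal point from the camera tuple above; the real H36M cameras also carry distortion coefficients k and p, which this sketch ignores:

```python
import numpy as np

def project_to_2d(traj_cam, f, c):
    """Project an (N, 3) camera-space trajectory to pixel coordinates
    using a distortion-free pinhole model: (u, v) = f * (X, Y) / Z + c.

    f, c: (2,) focal lengths and principal point in pixels."""
    xy = traj_cam[:, :2] / traj_cam[:, 2:3]  # perspective divide by depth Z
    return xy * f + c
```

Comparing this projection of the recovered trajectory against the 2D root track is a cheap consistency test that needs no extrinsics at all.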
Regarding the motion jitter problem, I am sorry it's hard to avoid within the current framework. Using sparse 2D positions as input loses important information. Also, our prediction is per-frame; VideoPose, by contrast, uses 243 input frames to predict 1, so it is expected to have stronger temporal coherence than ours.
In terms of the motion jitter problem, I have several immature ideas and want to discuss them with you. Is it possible to add rotation angular-velocity and acceleration terms to the adversarial training loss to keep the motion smooth? Or to add joint velocity and acceleration loss terms on the reconstructed skeleton? Since the network takes motion chunks as input, why not make use of the temporal information to mitigate motion jitter?
@catherineytw
I have no idea what's wrong with the visualization on the right. Would you mind scheduling a meeting so I can learn more about the inference details? Here is my email: [email protected]
In terms of the motion jitter problem, I have several immature ideas and want to discuss them with you. Is it possible to add rotation angular-velocity and acceleration terms to the adversarial training loss to keep the motion smooth? Or to add joint velocity and acceleration loss terms on the reconstructed skeleton? Since the network takes motion chunks as input, why not make use of the temporal information to mitigate motion jitter?
Adversarial losses need to be designed carefully. Even though we found them helpful in some cases, it is still hard to refine the motion to a better level. We ran some experiments on VIBE, which uses adversarial learning to improve human reconstruction, but it also yields only a tiny improvement. Adding velocity and acceleration losses looks great; it is also what we are considering, and it works well in some motion-synthesis papers such as PFNN. We do use temporal information as input, but the VideoPose architecture underperformed with our data pipeline, so I gave it up and switched to the current version with adaptive pooling. Bringing neural FK into the learning process is the key idea of our paper; the current network architecture is just workable and has a lot of room for improvement. I am happy to see more great ideas based on neural FK; we can talk about it more in the meeting.
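For reference, here is a minimal PyTorch sketch of such velocity and acceleration terms on a predicted joint-position sequence. This is my own illustration of the idea being discussed, not code from this repository:

```python
import torch

def vel_acc_loss(pred, gt, w_vel=1.0, w_acc=1.0):
    """pred, gt: (T, J, 3) joint-position sequences.

    Matches finite-difference velocities and accelerations between
    prediction and ground truth, penalizing jitter without
    constraining the absolute poses themselves."""
    pv, gv = pred[1:] - pred[:-1], gt[1:] - gt[:-1]  # velocities, (T-1, J, 3)
    pa, ga = pv[1:] - pv[:-1], gv[1:] - gv[:-1]      # accelerations, (T-2, J, 3)
    return w_vel * (pv - gv).pow(2).mean() + w_acc * (pa - ga).pow(2).mean()
```

Note the loss is invariant to a constant offset between pred and gt, since differencing removes it; that makes it a pure smoothness term to be combined with a position or FK loss.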
Would you mind scheduling a meeting so I can learn more about the inference details? Here is my email: [email protected]
Thank you for your kindness. Is Thursday evening a good time for you? Sorry to say I have never used Google Meet; if you don't mind, here is my WeChat ID: Catherineytw. Maybe we can have the conversation on Tencent Meeting?