
rotate face to neutral pose first (mefamo issue, open, 14 comments)

jimwest commented on August 26, 2024
rotate face to neutral pose first


Comments (14)

Neleac commented on August 26, 2024

Actually a simpler solution than rotating the landmarks would be to project the points onto a plane defined by some local axes.
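For illustration, a minimal numpy sketch of that idea, assuming a (468, 3) MediaPipe landmark array; the anchor indices (eye corners, nose tip) are my own choice, not MeFaMo's:

```python
import numpy as np

def project_to_local_plane(landmarks):
    """Project landmarks onto a plane defined by face-local axes.

    `landmarks` is a (468, 3) array of MediaPipe points. The anchor
    indices below (eye corners, nose tip) are illustrative choices.
    """
    left_eye, right_eye, nose = landmarks[33], landmarks[263], landmarks[1]
    x_axis = right_eye - left_eye                   # across the face
    x_axis /= np.linalg.norm(x_axis)
    down = nose - (left_eye + right_eye) / 2        # roughly toward the chin
    y_axis = down - down.dot(x_axis) * x_axis       # orthogonalize against x
    y_axis /= np.linalg.norm(y_axis)
    centered = landmarks - (left_eye + right_eye) / 2
    # 2D coordinates of every landmark in the face-local plane,
    # invariant to head rotation and translation
    return np.stack([centered @ x_axis, centered @ y_axis], axis=1)
```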


JimWest commented on August 26, 2024

I actually got those points in a better way, but haven't had the time to implement it properly yet.
If you activate the --show_3d parameter and look at the image (the 3D points projected onto 2D), that's pretty much as stable and normalized as you can get with the current MediaPipe model.


qhanson commented on August 26, 2024

Currently, the code uses metric landmarks or normalized landmarks (image pixel space) to calculate blendshape values. I tried both methods and the results look awful.

However, both approaches ignore face identity. Different people have very different faces. I even tried a rigid transformation to map my metric landmarks onto the canonical face provided by MediaPipe, but even neutral faces look different in the transformed (canonical) space. Do you have any suggestions? I am also working on a data-driven blendshape solver (deep learning by collecting enough MetaHuman faces and their blendshape values).
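For reference, a rigid (rotation + translation) alignment like the one described can be computed with the Kabsch algorithm; this is a generic numpy sketch, not MeFaMo's code:

```python
import numpy as np

def rigid_align(src, dst):
    """Find R, t minimizing ||(R @ src_i + t) - dst_i|| (Kabsch algorithm).

    src, dst: (N, 3) corresponding point sets, e.g. metric landmarks
    and the MediaPipe canonical face.
    """
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t

# Usage: aligned = landmarks @ R.T + t
```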


xuguozhi commented on August 26, 2024

> […] I am also working on a data-driven blendshape solver (deep learning by collecting enough MetaHuman faces and their blendshape values).

A deep-learning-based approach seems OK, but it requires a lot of paired data for training.


qhanson commented on August 26, 2024

Yes. It needs massive paired data, such as hundreds of faces. Luckily, MetaHuman is realistic enough to substitute for collecting real human faces. I am working on a MetaHuman project that receives blendshape values and saves the results as images.


xuguozhi commented on August 26, 2024

> Yes. It needs massive paired data, such as hundreds of faces. […]

I am no longer at NetEase, but the image-blendshape pairs from MetaHuman can be acquired easily if you are familiar with UE.


qhanson commented on August 26, 2024

Some updates:

Dataset: send sets of 52 blendshape values to a MetaHuman and capture the corresponding MetaHuman face. Personally, I obtained 30k images covering 40 expressions of 59 MetaHumans.

Method: train a neural network mapping the synthesized MetaHuman faces to the 52 blendshape values.

Result: the neural network converged well on the synthesized dataset, and testing on the synthesized dataset worked well. However, it does not generalize to real human faces.


Neleac commented on August 26, 2024

@qhanson I suggest training the model to directly use MediaPipe landmarks to predict blendshape values. To generate the ground-truth blendshape values for the dataset, you'll have to use something like LiveLinkFace, mentioned in the README. This MediaPipe -> blendshape model is the missing piece for replacing LiveLinkFace.
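A hedged sketch of how such training pairs could be assembled, assuming LiveLinkFace's recorded take CSV layout (a Timecode column, a blendshape count, then per-frame blendshape values; verify against your own recordings before relying on it):

```python
import csv
import numpy as np

def load_livelinkface_csv(path):
    """Read per-frame blendshape targets from a LiveLinkFace recording.

    Assumes the take CSV layout: Timecode, BlendshapeCount, then one
    column per blendshape value. Check your export before relying on it.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        names = header[2:]                   # blendshape column names
        values = np.array([[float(v) for v in row[2:]] for row in reader])
    return names, values                     # values: (frames, n_blendshapes)

# Pair these frame-by-frame with MediaPipe landmarks extracted from the
# synchronized video to get (landmarks, blendshapes) training examples.
```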


iPsych commented on August 26, 2024

@Neleac @qhanson
It seems that I am facing the same problem. I am looking for a better solution for turning already-recorded video into MetaHuman-applicable blendshape output. Currently I'm wrestling with the MediaPipe attention mesh.


qhanson commented on August 26, 2024

> I suggest training the model to directly use MediaPipe landmarks to predict blendshape values. […]

In my experiment, directly learning the mapping (468×3 -> 52) with a 4-layer MLP does not work well. With L1 loss, the output stays the same; with L2 loss, the mouth can open and close while the eyes stay open all the time. This reminds me of the mesh classification problem: passing a rendered mesh or a point cloud of the 468 landmarks might work, but that way we cannot exploit MediaPipe's pretrained weights. I do not know the minimum number of image-blendshape pairs needed. Note: I have not tested this approach.
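For context, a minimal PyTorch sketch of the kind of 4-layer MLP described; the hidden width and the final sigmoid are my assumptions, not details from the experiment:

```python
import torch
import torch.nn as nn

class Landmarks2Blendshapes(nn.Module):
    """468 MediaPipe landmarks (x, y, z) -> 52 blendshape values."""

    def __init__(self, n_landmarks=468, n_blendshapes=52, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_landmarks * 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_blendshapes),
            nn.Sigmoid(),                 # blendshape weights lie in [0, 1]
        )

    def forward(self, x):                 # x: (batch, 468, 3)
        return self.net(x.flatten(1))

model = Landmarks2Blendshapes()
loss_fn = nn.L1Loss()                     # or nn.MSELoss(); see failure modes above
```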


JimWest commented on August 26, 2024

I would try to use a smaller input; you don't need all 468 keypoints. I would start with the ones I'm using in my config file and slowly add more (by looking at the ones that really matter for facial animation). With that you will need far less training data (and training time).
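As an illustration, subsetting the landmarks before training could look like this; the indices here are hypothetical stand-ins for the ones in MeFaMo's config file:

```python
import numpy as np

# Hypothetical subset of expressive points (nose tip, inner lips, eye
# corners, eyelids, mouth corners); the real list is in MeFaMo's config.
KEY_LANDMARKS = [1, 13, 14, 33, 61, 133, 145, 159, 263, 291, 362, 374, 386]

def select_landmarks(all_landmarks):
    """Reduce a (468, 3) MediaPipe array to the few points that matter."""
    return np.asarray(all_landmarks)[KEY_LANDMARKS]
```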


zk2ly commented on August 26, 2024

> Some updates: Dataset: send sets of 52 blendshape values to a MetaHuman and capture the corresponding MetaHuman face. Personally, I obtained 30k images covering 40 expressions of 59 MetaHumans. […]

Can you share your data? I want to use it to train a mediapipe2blendshape network; if it works well, I will share the network with you.


qhanson commented on August 26, 2024

> Can you share your data? […]

For simple experiments, you do not need these datasets to train a model. You can try https://github.com/yeemachine/kalidokit


sylyt62 commented on August 26, 2024

> Some updates: Dataset: send sets of 52 blendshape values to a MetaHuman and capture the corresponding MetaHuman face. […] However, it does not generalize to real human faces.

What loss function did you use to train this network?

There's another morphable head model, FLAME, which offers a tool to generate a 3D mesh from its 100 expression parameters (something like blendshapes). With this tool we could build loss functions by mapping the prediction back to image space (3D -> 2D) and comparing the facial landmarks there, as sketched below.

But it seems that ARKit lacks this kind of tool for doing the mapping. If you use a plain L1 loss or similar, it only measures the similarity of the numbers, not the similarity of the actual expressions. I guess that's why your model doesn't generalize well.
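A hedged PyTorch sketch of such an expression-space loss, assuming a differentiable decoder (e.g., FLAME) from parameters to mesh vertices and a simple pinhole camera; the focal length and principal point are placeholders:

```python
import torch
import torch.nn.functional as F

def reprojection_loss(pred_vertices, target_2d, landmark_idx, focal, cx, cy):
    """Compare predicted and detected landmarks in image space.

    pred_vertices: (B, V, 3) mesh from a differentiable decoder such as FLAME
    target_2d:     (B, K, 2) detected 2D landmarks
    landmark_idx:  indices of the K mesh vertices matching those landmarks
    """
    pts = pred_vertices[:, landmark_idx]            # (B, K, 3)
    x = focal * pts[..., 0] / pts[..., 2] + cx      # pinhole projection
    y = focal * pts[..., 1] / pts[..., 2] + cy
    proj = torch.stack([x, y], dim=-1)              # (B, K, 2)
    return F.l1_loss(proj, target_2d)
```

An L1 in this projected space penalizes differences in the resulting expression rather than in the raw parameter vector, which is the point being made above.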

