
shenhanqian / gaussianavatars


[CVPR 2024 Highlight] The official repo for "GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians"

Home Page: https://shenhanqian.github.io/gaussian-avatars

Python 100.00%

gaussianavatars's People

Contributors

eltociear, emepetres, gdrett, graphdeco, hrspythonix, jakubcerveny, jonathonluiten, shenhanqian, snosixtyboo, szymanowiczs, yzslab


gaussianavatars's Issues

cx, cy question

I found a problem: when cx, cy are not at the center of the image, the projection is wrong.

Question regarding local to global 3D Gaussian rotation

Thanks for the great project and open-source code!

I have a question regarding your implementation about local to global 3D Gaussian rotation.

From the paper, I understood that you optimize the rotations of 3D Gaussians in their local coordinate (mesh face coordinate). To do so, you convert the local 3D Gaussian parameters to the global coordinate using a mesh face rotation matrix.

Specifically for the 3D Gaussian rotation, global rotation r' is computed as the multiplication of the "triangle (mesh face) rotation matrix R" and the "local 3D gaussian rotation matrix r". Thus, r'=Rr (from your paper and figure explanation).

However, I found that the product order of the rotations R and r (in quaternions) is reversed, as below:

return quat_xyzw_to_wxyz(quat_product(quat_wxyz_to_xyzw(rot), quat_wxyz_to_xyzw(face_orien_quat))) # roma

I think it should be the opposite order, if rot is the 3D Gaussian rotation (r) and face_orien_quat is the mesh face rotation (R).
Could you clarify why the quaternion product is computed in the order r' = rR?

Thanks in advance :)
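For anyone cross-checking conventions, here is a small numerical sanity check (using SciPy and an explicit Hamilton product rather than the repo's roma helpers, whose quat_product argument order may follow a different convention) showing how the quaternion product order maps onto the matrix product r' = Rr:

    import numpy as np
    from scipy.spatial.transform import Rotation

    def quat_mul_wxyz(a, b):
        """Hamilton product of two wxyz quaternions (rotation b applied first, then a)."""
        aw, ax, ay, az = a
        bw, bx, by, bz = b
        return np.array([
            aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw,
        ])

    def to_wxyz(rot):
        x, y, z, w = rot.as_quat()  # SciPy returns xyzw
        return np.array([w, x, y, z])

    R_face = Rotation.random()   # triangle (mesh-face) rotation R
    r_local = Rotation.random()  # local Gaussian rotation r

    # Matrix composition from the paper: r' = R r
    R_global = R_face.as_matrix() @ r_local.as_matrix()

    # With the Hamilton convention above, q_R ⊗ q_r corresponds to R @ r (up to sign),
    # so whether the code is "reversed" depends on the convention used by quat_product.
    q_global = quat_mul_wxyz(to_wxyz(R_face), to_wxyz(r_local))
    assert np.allclose(Rotation.from_quat(np.roll(q_global, -1)).as_matrix(), R_global)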

Loss question

Hi,
I have 20,000 frames of data, and each frame has 24 images. But by the time I reach iteration 60,000 and save the first checkpoint, the loss has stabilized around 0.03, and training longer does not reduce it further. What could be the possible reason?

dimensions of self.binding

The initial dimension of binding is 10144, which changes to 62451 after the following step. Is this caused by the number of points in the point cloud?
self.binding = torch.tensor(binding, dtype=torch.int32, device="cuda").squeeze(-1)
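For context, a small illustrative sketch (not taken from the repo) of how a per-Gaussian binding tensor grows: each entry stores the index of the FLAME triangle a Gaussian is attached to, so its length tracks the number of Gaussians (which grows during densification), not the number of triangles:

    import torch

    num_faces = 10144
    binding = torch.arange(num_faces, dtype=torch.int32)  # one Gaussian per face at initialisation
    clone_idx = torch.randint(0, num_faces, (52307,))     # hypothetical picks made by densification
    binding = torch.cat([binding, binding[clone_idx]])    # new Gaussians inherit their parent's face index
    print(binding.shape)                                  # torch.Size([62451])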

pytorch?

Why do the instructions require installing CUDA but not list PyTorch or specify its version?

Are you sure the requirements.txt is complete? There is no torch inside...

Discrepancy in net_type Option for `lpips()` between `train.py` and `metrics.py`

Hi!
I noticed an inconsistency in the metric calculation in train.py.
The lpips() function is used without specifying the net_type option, as seen here:

lpips_test += lpips(batch_img, batch_gt_img).sum().double()

Then, the default net_type for 'lpips()' is set to 'alex':

net_type: str = 'alex',

On the other hand, in metrics.py, the net_type for lpips() is set to 'vgg':

lpipss.append(lpips(renders[idx], gts[idx], net_type='vgg'))

This discrepancy could explain why there are differences in the values.
I believe that using 'alex' might be the more appropriate option, as it is closer to the results presented in the paper.

Could you please confirm this?

Thank you for looking into this.
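If the goal is simply consistency, one option (a hypothetical one-line edit to train.py, reusing the helper's signature quoted above) is to pass the network explicitly in both scripts, e.g. with 'vgg', or with 'alex' if that is what the paper reports:

    lpips_test += lpips(batch_img, batch_gt_img, net_type='vgg').sum().double()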

How to get 3d landmarks?

I am new to this area so this question may be stupid.
I checked out the facial landmark detector "Star loss: Reducing semantic ambiguity in facial landmark detection." as you mentioned in appendix A. It gives a set of 2D landmarks for an image.
How could we get 3D landmarks to fit a FLAME mesh?
Thanks for your time!
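For context, FLAME ships a landmark embedding that places 3D landmarks on the mesh surface, so 2D detections are usually enough: tracking minimizes a 2D reprojection error instead of lifting the detections to 3D. A minimal sketch of such a term (illustrative only, not the authors' tracker; the camera and landmark tensors are assumed to be given):

    import torch

    def landmark_reprojection_loss(lmk3d_flame, lmk2d_detected, K, R, t):
        """lmk3d_flame: (N, 3) landmarks sampled from the FLAME mesh via its landmark embedding
        lmk2d_detected: (N, 2) pixel coordinates from a 2D detector such as STAR
        K: (3, 3) intrinsics; R: (3, 3), t: (3,) world-to-camera extrinsics"""
        cam = lmk3d_flame @ R.T + t              # world -> camera coordinates
        proj = cam @ K.T                         # camera -> homogeneous pixel coordinates
        lmk2d_proj = proj[:, :2] / proj[:, 2:3]  # perspective divide
        return ((lmk2d_proj - lmk2d_detected) ** 2).sum(dim=-1).mean()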

about customize dataset

Dear author:

  1. How can we prepare custom data to generate our own head avatars?
  2. Does it support monocular video as input, like the IMavatar data format? NeRSemble uses 16 calibrated cameras, and it is difficult to build such professional equipment, whereas monocular video can be captured with a mobile phone.

Looking forward to your reply, thank you!

Managing gaussians.binding_counter in the function densify_and_split

First of all, thank you for providing awesome results of your research.

I have a question about the function densify_and_clone in scene/gaussian_model.
[Screenshot of densify_and_clone in scene/gaussian_model.py]

Since the connection counts are already added correctly in line 466, I think the addition at line 465 is redundant.
When I tried it on my toy dataset, it caused an undesirable discrepancy between the number of entries in self._xyz and the sum of self.binding_counter.

When I removed line 465, it worked well.

If I missed something, please let me know.

Thank you.
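A small consistency check (a sketch using the attribute names from the issue, not code from the repo) makes the invariant explicit: after any clone/split/prune step, the per-face counter should match a fresh histogram of the binding tensor and sum to the number of Gaussians:

    import torch

    def check_binding_consistency(binding, binding_counter):
        recount = torch.bincount(binding.long(), minlength=binding_counter.shape[0])
        assert binding_counter.sum() == binding.shape[0]
        assert torch.equal(recount.to(binding_counter.dtype), binding_counter)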

About distributed parallel training

Hi, thanks for your great work!
I have 4 GPUs and plan to use torchrun for distributed parallel training. Before that, I want to know whether this approach would impair the model's capabilities (or, is distributed parallel training even necessary?), since the dataset can be loaded at once on a single GPU (loader_camera_train = DataLoader(scene.getTrainCameras(), batch_size=None, shuffle=True, num_workers=8, pin_memory=True, persistent_workers=True)).
Do you have any ideas?
Thanks!
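In case it helps, here is a hypothetical sketch (not from the repo) of sharding the quoted camera loader across ranks under torchrun; gradient synchronization (e.g. via DDP) would still need to be handled separately, and whether the extra complexity pays off for a dataset that already fits on one GPU is exactly the open question:

    import torch.distributed as dist
    from torch.utils.data import DataLoader
    from torch.utils.data.distributed import DistributedSampler

    dist.init_process_group(backend="nccl")  # torchrun sets the required env vars
    sampler = DistributedSampler(scene.getTrainCameras(), shuffle=True)
    loader_camera_train = DataLoader(scene.getTrainCameras(), batch_size=None,
                                     sampler=sampler, num_workers=8,
                                     pin_memory=True, persistent_workers=True)
    # call sampler.set_epoch(epoch) at the start of each epoch to reshuffle across ranks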

Camera Extrinsics

Hi Shenhan,

I am wondering why you did not use the original camera extrinsics from the NeRSemble dataset. Is it because the original extrinsics are not accurate? I saw in the NeRSemble GitHub repo that they have to apply a scale factor to the camera positions to make everything work.

And, if so, how did you get the new extrinsics? Using COLMAP?

Thanks,
Jeremy

Regarding Camera Information

Thank you very much for this open source work.
I wanted to ask about the cameras used for capturing the raw data in the NeRSemble dataset.
I think I've found a model (Sony Atlas 7.1 MP Model) that seems like the one you used, but I wanted to make sure.
Thanks.

How can I use colmap pose for training?

Hi, thank you very much for your excellent work. I'm trying to use COLMAP poses for GaussianAvatars training, but the results are not ideal. How should I process the COLMAP poses to adapt them for GaussianAvatars training?

I noticed that you have answered a similar question before, suggesting Gram-Schmidt to process the camera poses, but my training results did not align with the FLAME model; it feels like a scale factor is missing.

I'm looking forward to your reply!
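For reference, a COLMAP reconstruction is only defined up to a global similarity transform, so besides re-orthonormalizing the rotations, a scale (and rigid) alignment to the FLAME/metric space is generally needed, which may be the missing factor observed above. A minimal sketch of the Gram-Schmidt step (illustrative, not the repo's code):

    import torch

    def gram_schmidt_rotation(R):
        """Re-orthonormalize a noisy 3x3 rotation estimate (e.g. from COLMAP) column by column."""
        a1, a2 = R[:, 0], R[:, 1]
        b1 = a1 / a1.norm()
        b2 = a2 - (b1 @ a2) * b1
        b2 = b2 / b2.norm()
        b3 = torch.linalg.cross(b1, b2)  # third axis enforces a right-handed orthonormal frame
        return torch.stack([b1, b2, b3], dim=1)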

Concerns regarding the new License paragraph with Toyota

In a recent commit you updated the license and added a new paragraph about Toyota Motor Europe NV/SA.

I am currently working on my master's thesis and wanted to use this project to generate Gaussian splatting human heads that I can use in my own project. According to the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license you are using, I should be allowed to use and even modify this project for non-commercial purposes, as long as I use the same license and attribute the original authors.

But the separate Toyota paragraph explicitly says:

Toyota Motor Europe NV/SA and its affiliated companies retain all intellectual property and proprietary rights in and to this software, related documentation and any modifications thereto. Any use, reproduction, disclosure or distribution of this software and related documentation without an express license agreement from Toyota Motor Europe NV/SA is strictly prohibited.

Could you clarify how these two contradicting parts work together? Am I allowed to use the project under the Creative Commons license, or are use and reproduction prohibited without a separate license from Toyota, with any modifications I make becoming Toyota's intellectual property?

PointClouds

First of all, congratulations on producing this excellent piece of work. After following the guidance, the rendered images are realistic. But when I view the point cloud in MeshLab, it appears to be in a diffuse state. I would like to know whether there is a problem with any of the steps. Thank you very much.
[Screenshot of the point cloud in MeshLab]

COLMAP reading bug

I get the following error while trying to train the model on custom COLMAP data:
CameraInfo.__new__() got an unexpected keyword argument 'image'

Apparently, you forgot to comment out the image argument passed when instantiating CameraInfo in readColmapCameras().

UPD:
Other arguments, such as bg, timestep, and camera_id, are also missing:
TypeError: CameraInfo.__new__() missing 3 required positional arguments: 'bg', 'timestep', and 'camera_id'

UPD2:
It looks like the interfaces have changed, since SceneInfo is also missing arguments when working with COLMAP:
TypeError: SceneInfo.__new__() missing 5 required positional arguments: 'val_cameras', 'train_meshes', 'test_meshes', 'tgt_train_meshes', and 'tgt_test_meshes'

About the color loss in Appendix A. FLAME Tracking

I am implementing the "FLAME Tracking" described in Appendix A, and I have a small question about the color loss.
Do you use the FLAMETex texture space provided by the FLAME authors, or a mesh UV generator such as xatlas to create a customized texture?
Thanks for your time in advance.

ValueError: too many values to unpack (expected 2)

Hello author, sorry to bother you.
During the training step, I encountered the following problem:

Loading Test Cameras [27/04 15:01:33]
100%|█████████████████████████████████████| 2496/2496 [00:00<00:00, 4829.53it/s]
Number of points at initialisation: 10144 [27/04 15:01:36]
Training progress: 0%| | 0/600000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home1/stu/wkx/GaussianAvatars-main/train.py", line 350, in
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
File "/home1/stu/wkx/GaussianAvatars-main/train.py", line 124, in training
render_pkg = render(viewpoint_cam, gaussians, pipe, background)
File "/home1/stu/wkx/GaussianAvatars-main/gaussian_renderer/init.py", line 86, in render
rendered_image, radii = rasterizer(
ValueError: too many values to unpack (expected 2)
Training progress: 0%| | 0/600000 [00:40<?, ?it/s]

I followed the steps one by one, so where did the problem arise?
Could you please answer this question?
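This error means the installed diff-gaussian-rasterization returned more than two values, which usually indicates a different fork/build than the one bundled with this repo (some forks also return depth or alpha). Reinstalling the submodule shipped in the repository is the proper fix; as a stop-gap, the call site can unpack defensively, e.g. with a helper like this sketch:

    def unpack_rasterizer_outputs(outputs):
        """Workaround sketch, not the repo's fix: keep only the first two outputs
        (rendered image and radii) and ignore any extra returns such as depth/alpha."""
        rendered_image, radii = outputs[0], outputs[1]
        return rendered_image, radii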

Can't find simple-knn.

Hello, I am trying to configure the code environment, but I cannot find the page for the simple-knn submodule.

Triangles for the upper and lower teeth

What a great job! Congratulations on the CVPR acceptance! I found that your paper describes:

We also manually add triangles for the upper and lower teeth, which are rigid to the neck and jaw joints, respectively.

Do you plan to share the revised FLAME mesh in the future? It is quite important for me! Thanks a lot!

Build error for diff-gaussian-rasterization submodule

Problem Description:

When attempting to install the diff-gaussian-rasterization submodule as part of the GaussianAvatars project on Windows, I encounter a build error with the following message when using pip install -r requirements.txt:

Building wheel for diff-gaussian-rasterization (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [55 lines of output]
No CUDA runtime is found, using CUDA_HOME='C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7'
running bdist_wheel
running build
running build_py
...
Failed to build diff-gaussian-rasterization simple-knn

and at the same time, torch.cuda.is_available() returns False.

However, the build and installation process succeeds without any issues when using conda with the following command:

conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia

System Environment:

OS: Windows 11
Python Version: 3.10
CUDA Version: 11.7

Steps to Reproduce:
Install PyTorch 2.0.1, torchvision 0.15.2 using pip.
Attempt to build and install the diff-gaussian-rasterization submodule.

Potential Reason:
It seems like the pip installation does not properly handle or recognize the CUDA environment, leading to the build process not finding the CUDA runtime. This issue might be related to the way pip installs PyTorch and its interaction with CUDA, differing from how conda manages package installations and environment configurations.
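A quick way to confirm this diagnosis (an illustrative check, not part of the repo) is to inspect the installed wheel: a CPU-only PyTorch build reports no CUDA version, in which case CUDA extensions such as diff-gaussian-rasterization cannot be compiled against it:

    import torch

    print(torch.__version__)          # a '+cpu' suffix indicates a CPU-only wheel
    print(torch.version.cuda)         # None for CPU-only builds; '11.7' would be expected here
    print(torch.cuda.is_available())  # False matches the behaviour reported above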

Suggested Improvement:

Could the installation documentation or setup scripts be updated to address or provide guidance for this issue? It would be helpful to have a note or workaround for users facing similar issues when using pip to install PyTorch and compile CUDA-dependent submodules.

How can I measure Novel-View Synthesis and Self-Reenactment?

After finishing training, I measured Novel-View Synthesis and Self-Reenactment on my own with metrics.py, but I'm not sure if I did it correctly.

To render for Novel-View Synthesis, is it correct to attach only --skip_train and --skip_val, as below?
Or is there any other operation that needs to be done?

SUBJECT=306

python render.py \
-m output/UNION10EMOEXP_${SUBJECT}_eval_600k \
--skip_train --skip_test

Additionally, can Self-Reenactment be performed as below?

SUBJECT=306
TGT_SUBJECT=306

python render.py \
-t data/UNION10_${TGT_SUBJECT}_EMO1234EXP234589_v16_DS2-0.5x_lmkSTAR_teethV3_SMOOTH_offsetS_whiteBg_maskBelowLine \
-m output/UNION10EMOEXP_${SUBJECT}_eval_600k

It might be that I don’t fully understand Novel-View Synthesis and Self-Reenactment, but I’m not clear on the difference between the two.
Thank you for your research.
I am currently conducting research using your study as a baseline.
I would like to measure the metrics correctly if you could inform me of the proper method.

[F glutil.cpp:332] eglGetDisplay() failed

(avatar) hci@hci:/workspace/psen/GaussianAvatars-main/GaussianAvatars-main$ SUBJECT=306
(avatar) hci@hci:/workspace/psen/GaussianAvatars-main/GaussianAvatars-main$
(avatar) hci@hci:/workspace/psen/GaussianAvatars-main/GaussianAvatars-main$ python train.py \
    -s /workspace/psen/GaussianAvatars-main/NeRSemble/306 \
    -m /workspace/psen/GaussianAvatars-main/output/306 \
    --port 60001 --eval --white_background --bind_to_mesh
Optimizing /workspace/psen/GaussianAvatars-main/output/306
Output folder: /workspace/psen/GaussianAvatars-main/output/306 [29/04 10:11:04]
[F glutil.cpp:332] eglGetDisplay() failed
Aborted (core dumped)
Is there a solution to this problem?

dynamic_offset

Thanks for your brilliant work! I have a few questions.
What is dynamic_offset, and why is it set to None by default?

time

How long are the training and inference times?

GaussianAvatars-Local Viewer problem

Hello, I encountered an error when using the GaussianAvatars-Local Viewer in your project for model preview. When I try to resize the window, the program crashes with the error: "Segmentation fault (core dumped)." How can I resolve this issue? I am running Ubuntu 20.04.

How to obtain the FLAME params for custom input?

Hello, first of all, thanks a lot for the wonderful work and making it open source.

I wanted to run this on a custom dataset. I am using COLMAP data for the camera parameters, but that does not include the mesh information.
Could you please point out how I could obtain the FLAME parameters?

For example, for the dataset 306, 00000.npz has these shape values:
translation.npy (1,3), rotation.npy (1,3), neck_pose.npy (1,3), jaw_pose.npy (1,3), eyes_pose.npy (1,6), shape.npy (300), expr.npy (1,100), static_offset.npy (1,5143,3)
If there is some method that could produce all these values for custom input frames, it would be really helpful.

Thanks in advance!
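For reference, the per-frame parameter files can be inspected directly; a small sketch, assuming the npz keys match the array names listed above:

    import numpy as np

    params = np.load("00000.npz")   # one tracked frame from the pre-processed data
    for key in ("translation", "rotation", "neck_pose", "jaw_pose", "eyes_pose",
                "shape", "expr", "static_offset"):
        print(key, params[key].shape)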

About self-reenactment and cross-identity reenactment

Hi, dear author. I have some questions about self-reenactment and cross-identity reenactment.

  1. For multi-view, the FLAME meshes have vertices at varied positions but share the same topology. Does that mean all the FLAME meshes have the same number of vertices and the same number of triangles?
  2. In the checkpoint, there is only one set of 3D Gaussians (for subject 306, after training, gaussians.binding.shape = torch.Size([86127])). Their location µ, rotation r, and anisotropic scaling s are all in the local space, right?
  3. When I test self-reenactment or cross-identity reenactment, I am actually modifying the parameters of the triangles, thereby altering the Gaussians' parameters in the global space. This is consistent with the training process, so no additional optimization is needed, right?

I'm not very familiar with this field, so I really appreciate it if you could answer my questions!

The values of position_lr_init and scaling_lr

Excellent work!

I have a question: how to calculate the values of self.position_lr_init and self.scaling_lr in OptimizationParams?

I found the annotation "scaled up according to mean triangle scale". Does it mean the triangle scale in your implementation is 0.00016/0.005 = 0.032, i.e., 32 mm? That seems a bit big. Is my calculation wrong? And what about scaling up self.scaling_lr? Thanks!
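Restating the arithmetic in the question (an illustration of the reasoning only; the metric interpretation assumes one scene unit equals one metre):

    # If the original 3DGS position_lr_init (1.6e-4) was divided by a mean triangle
    # scale s to obtain this repo's value (5e-3), because local Gaussian positions
    # are expressed in units of their parent triangle's scale, then:
    s = 1.6e-4 / 5e-3
    print(s)  # 0.032 scene units, i.e. ~32 mm only under the one-unit-per-metre assumption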

Length of Training data

Hi,
I have a question about the training data. Looking forward to your reply.

How many frames were used in your experiments during training a head avatar?
I have conducted some experiments based on the original GS and found that using more frames for training leads to worse results. Do you have any suggestions as to why this is happening?

How to render the canonical space images?

Hi, Shenhan,

Many thanks for releasing your project, it's pretty awesome!

May I inquire how to render the canonical space Avatar's images? Like using the scripts to render novel expressions and poses:
python render.py -m output/UNION10EMOEXP_306_eval_600k --skip_train --skip_val --render_mesh --select_camera_id 8

Looking forward to your reply! Many thanks!

Regarding the necessity of the pure-white background

Hi,

Thanks for your good work. Really appreciate that you guys released the codes.

One thing I noticed is that you define the initial color of Gaussians to be some random color quite close to black (rather than randomly initialized), while the background color of your processed data is white.

I'm not sure whether the high contrast between the initial Gaussians' color and the background color is beneficial for reducing the floaters near the mesh surface. Have you tried data with a noisy (non-pure-white) background before?

Thanks.

xyz_loss increases during training

Hello, I'm running the training code and find that xyz_loss increases during training. Is this expected?
[Plot: xyz_loss curve over training iterations]
Could you please explain losses['xyz'] = F.relu((gaussians._xyz*gaussians.face_scaling[gaussians.binding])[visibility_filter] - opt.threshold_xyz).norm(dim=1).mean() * opt.lambda_xyz while threshold_xyz = 1.? I don't really understand it.
Thanks!
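As a reading aid, here is an annotated restatement of that loss term (a sketch with assumed tensor shapes, not the repo's code): _xyz holds each Gaussian's position in its parent triangle's local frame, multiplying by the triangle's scale converts it to scene units, and the ReLU means only offsets beyond threshold_xyz are penalized, so the summed loss can legitimately grow as densification spreads Gaussians outward:

    import torch
    import torch.nn.functional as F

    def xyz_position_loss(local_xyz, face_scaling, binding, visibility_filter,
                          threshold_xyz, lambda_xyz):
        """local_xyz: (N, 3) Gaussian centres in their parent triangle's local frame
        face_scaling: (F, 1) per-triangle scale factor
        binding: (N,) parent-triangle index per Gaussian
        visibility_filter: (N,) boolean mask of Gaussians visible in the current view"""
        scaled = (local_xyz * face_scaling[binding])[visibility_filter]  # offsets in scene units
        excess = F.relu(scaled - threshold_xyz)                          # zero inside the threshold
        return excess.norm(dim=1).mean() * lambda_xyz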

About pre-processed data

I have applied for and been granted access to the pre-processed data, but when I try to use the OneDrive verification code sent via email, the email arrives so late that the code has already expired. I have tried multiple times without success. Is there any other way to obtain the pre-processed data?

How to obtain FLAME mesh with hair?

Thank you for your great work. I have one question: how do you obtain a FLAME mesh with hair? I checked the FLAME mesh model and found that it does not model hair, but in the pipeline and the data, I noticed that hair is part of the mesh. Could you tell me why? Thanks in advance.
