
ku-cvlab / gaussiantalker

Official implementation of “GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting” by Kyusun Cho, Joungbin Lee, Heeji Yoon, Yeobin Hong, Jaehoon Ko, Sangjun Ahn and Seungryong Kim

License: Other

Python 96.86% Shell 3.14%

gaussiantalker's People

Contributors: joungbinlee, kyustorm7


gaussiantalker's Issues

Question: Pasting the inferred video back onto the original source video

Hey there,

First of all, great project! I would love to try it out. Does it work seamlessly to put the generated video back on top of the raw video?

Like this:

[image]

To retain the real eye and forehead movement, I want to mask out just the nose, mouth, and jaw:

[image]

It would be awesome if someone could answer this before I train on my own video :)
Thanks in advance!

Output on my own training video is unstable

Hi, I tried to train it on two videos; however, I got an unstable head. What are the required criteria for the input video for training?
I got results similar to this:

326222364-304de436-6fbe-4aab-8c68-c8eaa34b432f.mov

What can I do on my end to improve training so that the results do not have this instability?

Training does not converge

Hi, thanks for your amazing work.
I have tried to reproduce the results, and I followed the data-processing procedure in README.md.
However, when I train on the Obama dataset and my own dataset, the training loss does not seem to converge.
Here are my arguments and command:

python train.py \
-s mnt/workspace/talking_head/GaussianTalker/data/ids/Obama \
--model_path /mnt/workspace/talking_head/GaussianTalker/output/Obama \
--configs arguments/64_dim_1_transformer.py

and the loss log seems super weird:
[screenshot 2024-05-15 11:28:20: loss log]

The rendered image has no head, which is different from the original paper.
[screenshot 2024-05-15 11:30:39: rendered image]

[screenshot 2024-05-15 11:29:59: rendered image]

Similar results were observed on my own data. Any ideas?
The only difference between my environment and yours is that I use torch 1.12.1 and CUDA 11.3.
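
One detail worth checking in the command quoted above (an observation, not a confirmed cause): the source path is passed as mnt/workspace/... (relative) while --model_path uses /mnt/workspace/... (absolute). If the leading slash was not simply lost when pasting, train.py would be reading data from a nonexistent relative directory. A minimal check:

    # Sketch: verify the -s source path actually exists as written.
    # Both paths are copied from the command above.
    import os
    print(os.path.isdir("mnt/workspace/talking_head/GaussianTalker/data/ids/Obama"))   # relative, as passed
    print(os.path.isdir("/mnt/workspace/talking_head/GaussianTalker/data/ids/Obama"))  # absolute, as in --model_path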

undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11

Has anyone faced this issue while running:

"pip install -e submodules/custom-bg-depth-diff-gaussian-rasterization"

Obtaining file:///home/nikhil/GaussianTalker/submodules/custom-bg-depth-diff-gaussian-rasterization
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [23 lines of output]
Traceback (most recent call last):
File "/home/nikhil/anaconda3/envs/GaussianTalker/lib/python3.7/site-packages/torch/init.py", line 172, in _load_global_deps
ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
File "/home/nikhil/anaconda3/envs/GaussianTalker/lib/python3.7/ctypes/init.py", line 364, in init
self._handle = _dlopen(self._name, mode)
OSError: /home/nikhil/anaconda3/envs/GaussianTalker/lib/python3.7/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11

  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "<string>", line 36, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/home/nikhil//GaussianTalker/submodules/custom-bg-depth-diff-gaussian-rasterization/setup.py", line 13, in <module>
      from torch.utils.cpp_extension import CUDAExtension, BuildExtension
    File "/home/nikhil/anaconda3/envs/GaussianTalker/lib/python3.7/site-packages/torch/__init__.py", line 217, in <module>
      _load_global_deps()
    File "/home/nikhil/anaconda3/envs/GaussianTalker/lib/python3.7/site-packages/torch/__init__.py", line 178, in _load_global_deps
      _preload_cuda_deps()
    File "/home/nikhil/anaconda3/envs/GaussianTalker/lib/python3.7/site-packages/torch/__init__.py", line 158, in _preload_cuda_deps
      ctypes.CDLL(cublas_path)
    File "/home/nikhil/anaconda3/envs/GaussianTalker/lib/python3.7/ctypes/__init__.py", line 364, in __init__
      self._handle = _dlopen(self._name, mode)
  OSError: /home/nikhil/anaconda3/envs/GaussianTalker/lib/python3.7/site-packages/nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
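
For context, a common cause of this particular undefined symbol (not confirmed for this exact setup): a libcublas from one CUDA build being loaded by a torch wheel built against another, e.g. a leftover nvidia-cublas-cu11 wheel from a different torch installation shadowing the libraries torch expects. Since import torch itself fails here, the installed wheels can be listed without importing it:

    # Sketch: list installed torch/NVIDIA wheels without importing torch,
    # which raises the OSError shown above in this environment.
    import pkg_resources
    for dist in pkg_resources.working_set:
        if dist.project_name.lower().startswith(("torch", "nvidia")):
            print(dist.project_name, dist.version)

If a stray nvidia-cublas-cu11 shows up next to a torch build it does not match, uninstalling it or reinstalling torch with a matching CUDA build is the usual remedy.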

Substitute tri-plane with MLP

Congratulations on the MM acceptance! Since it's hard to learn a canonical face with explicit point clouds, I am wondering what would happen if we replaced the tri-plane representation with a common implicit MLP?

Has anyone compared with GeneFace++?

GaussianTalker is a method built on 3DGS, while GeneFace++ is built on NeRF. Has anyone compared the results, and could you share some discussion?

Question on data preparation

This is great work! I'm new to this field, so I lack a lot of experience. If you happen to have some time, could you help me?
In the Data Preparation section, I tried to run python data_utils/process.py my_dataset/Obama/Obama.mp4, but it gave an error indicating that the track_params.pt file is missing.

After comparing with the directory structure you provided, I found that I'm missing

  1. au.csv
  2. track_params.pt
  3. transforms_train.json
  4. transforms_val.json

Could you please guide me in as much detail as possible on how to obtain the files that I am missing? Thanks a lot!

Here's a printout of my directory structure:
(/data2/conda_envs/GaussianTalker) root@my_workspace/gaussiantalker/xq_dataset/Obama# ls -lh
total 121M
-- aud_ds.npy
-- aud_novel.wav
-- aud.npy (this one does not seem to be included in the directory structure you gave)
-- aud_train.wav
-- aud.wav
-- bc.jpg
-- gt_imgs
-- Obama.mp4
-- ori_imgs
-- torso_imgs
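
As a quick way to track which preprocessing outputs are still missing, a short check can be run over the expected files (a sketch; the dataset root below is hypothetical):

    # Sketch: report which expected preprocessing outputs exist so far.
    from pathlib import Path
    root = Path("my_dataset/Obama")  # hypothetical dataset root; adjust to yours
    expected = ["au.csv", "track_params.pt", "transforms_train.json", "transforms_val.json"]
    for f in expected:
        print(f, "OK" if (root / f).exists() else "MISSING")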

Wish you all the best

Some flickering artifacts in the neck region

This is a very good repo, but I am facing some issues. The Gaussian rasterizer takes torso + background image as input, and the torso comes from the training set; so when performing inference with new audio, there are artifacts in the neck region during rendering, because the torso + background image is from the training set. What could be a solution for this?

Why is the maximum duration of the rendered video only 29 seconds and always 702 frames?

Looking for config file in model/cfg_args
Config file found: model/cfg_args
Rendering model
Namespace(add_point=False, add_points=False, apply_rotation=False, batch=1, batch_size=32, bounds=1.6, canonical_tri_plane_factor_list=['opacity', 'shs'], checkpoint_iterations=[], coarse_iterations=7999, compute_cov3D_python=False, configs='arguments/64_dim_1_transformer.py', convert_SHs_python=False, custom_aud='/User/GaussianTalker/custom_audio/1/audio.npy', custom_sampler=None, custom_wav='/User/GaussianTalker/custom_audio/1/audio.wav', d_model=64, data_device='cuda', dataloader=True, debug=True, debug_from=-1, defor_depth=2, deformation_lr_delay_mult=0.01, deformation_lr_final=1e-05, deformation_lr_init=0.0001, densification_interval=100, densify_from_iter=1000, densify_grad_threshold_after=0.0002, densify_grad_threshold_coarse=0.001, densify_grad_threshold_fine_init=0.0002, densify_until_iter=7000, depth_fine_tuning=True, detect_anomaly=False, drop_prob=0.2, empty_voxel=False, eval=True, expname='', extension='.png', feature_lr=0.0025, ffn_hidden=128, grid_lr_final=0.00016, grid_lr_init=0.0016, grid_pe=0, images='images', ip='127.0.0.1', iteration=10000, iterations=10000, kplanes_config={'grid_dimensions': 2, 'input_coordinate_dim': 3, 'output_coordinate_dim': 32, 'resolution': [64, 64, 64]}, l1_time_planes=0.0001, lambda_dssim=0, lambda_lpips=0, lip_fine_tuning=True, llffhold=8, model_path='model', multires=[1, 2], n_head=2, n_layer=1, net_width=128, no_do=False, no_dr=False, no_ds=False, no_dshs=False, no_dx=False, no_grid=False, only_infer=True, opacity_lr=0.05, opacity_pe=2, opacity_reset_interval=3000, opacity_threshold_coarse=0.005, opacity_threshold_fine_after=0.005, opacity_threshold_fine_init=0.005, percent_dense=0.01, plane_tv_weight=0.0002, port=6009, pos_emb=True, posebase_pe=10, position_lr_delay_mult=0.01, position_lr_final=1.6e-06, position_lr_init=0.00016, position_lr_max_steps=20000, pruning_from_iter=500, pruning_interval=100, quiet=False, render_process=False, resolution=-1, rotation_lr=0.001, save_iterations=[1000, 3000, 4000, 5000, 6000, 7000, 9000, 10000, 12000, 14000, 20000, 30000, 45000, 60000, 30000], scale_rotation_pe=2, scaling_lr=0.005, sh_degree=3, skip_test=True, skip_train=True, skip_video=False, source_path='data_set/wangxueyin', split_gs_in_fine_stage=False, start_checkpoint=None, static_mlp=False, test_iterations=[0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500, 10000, 10500, 11000, 11500, 12000, 12500, 13000, 13500, 14000, 14500, 15000, 15500, 16000, 16500, 17000, 17500, 18000, 18500, 19000, 19500, 20000, 20500, 21000, 21500, 22000, 22500, 23000, 23500, 24000, 24500, 25000, 25500, 26000, 26500, 27000, 27500, 28000, 28500, 29000, 29500, 30000, 30500, 31000, 31500, 32000, 32500, 33000, 33500, 34000, 34500, 35000, 35500, 36000, 36500, 37000, 37500, 38000, 38500, 39000, 39500, 40000, 40500, 41000, 41500, 42000, 42500, 43000, 43500, 44000, 44500, 45000, 45500, 46000, 46500, 47000, 47500, 48000, 48500, 49000, 49500], time_smoothness_weight=0.001, timebase_pe=4, timenet_output=32, timenet_width=64, train_l=['xyz', 'deformation', 'grid', 'f_dc', 'f_rest', 'opacity', 'scaling', 'rotation'], train_tri_plane=True, use_wandb=False, visualize_attention=False, weight_constraint_after=0.2, weight_constraint_init=1, weight_decay_iteration=5000, white_background=True, zerostamp_init=False) [05/06 00:37:09]
feature_dim: 64 [05/06 00:37:09]
Loading trained model at iteration 10000 [05/06 00:37:09]
[INFO] load aud_features: torch.Size([7713, 29, 16]) [05/06 00:37:09]
Reading Training Transforms [05/06 00:37:10]
Reading Test Transforms [05/06 00:37:17]
Generating Video Transforms [05/06 00:37:20]
Reading Custom Transforms [05/06 00:37:20]
Loading Training Cameras [05/06 00:37:26]
Loading Test Cameras [05/06 00:37:26]
Loading Video Cameras [05/06 00:37:26]
Loading Custom Cameras [05/06 00:37:26]
Deformation Net Set aabb [0.75849324 0.90093476 0.45244923] [-0.78521377 -0.8166129 -0.566415 ] [05/06 00:37:26]
Voxel Plane: set aabb= Parameter containing:
tensor([[ 0.7585, 0.9009, 0.4524],
[-0.7852, -0.8166, -0.5664]]) [05/06 00:37:26]
loading model from existsmodel/point_cloud/iteration_10000 [05/06 00:37:28]
============== <scene.Scene object at 0x7800e0d15d50> [05/06 00:37:28]
------------------------------------------------- [05/06 00:37:28]
test set rendering : 702 frames [05/06 00:37:28]
------------------------------------------------- [05/06 00:37:28]
point nums: 43459 [05/06 00:37:28]
Rendering progress: 100%|██████████| 702/702 [00:34<00:00, 20.28it/s]
total frame: 702 [05/06 00:38:03]
FPS: 23.27712135731388 [05/06 00:38:03]

iterations done: begin wirte [05/06 00:39:17]
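
One possible reading of this log (a hypothesis, not a confirmed answer): the line "test set rendering : 702 frames" suggests the rendered length is tied to the dataset's test split rather than to the audio, and 702 frames at the usual 25 fps preprocessing rate works out to roughly the observed cap:

    # Sketch: relate the fixed frame count to the ~29 s duration.
    frames = 702   # "test set rendering : 702 frames", from the log above
    fps = 25       # assumed preprocessing frame rate
    print(frames / fps)  # 28.08 s, roughly the 29 s observed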

How to do inference on a new audio?

Hello, thank you very much for the great work and making it open source.

I tried the code and it was producing good results for the aud_novel.wav kept separately.

But what changes would be required if I want to run inference on new audio?
Just changing the code in render.py to take the new audio file does not seem to work.
It seems there are changes to be made elsewhere too.

Could you point out how I can run inference with new audio?

Thanks in advance.
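
One pointer grounded in this page (a guess at the intended workflow, not a confirmed recipe): the argument dump in the "29 seconds" issue above shows render.py accepting custom_aud and custom_wav arguments, which suggests custom audio is meant to be passed as flags rather than by editing render.py, along these lines (flag names taken from that Namespace dump; everything else is assumed):

    python render.py --model_path <model_path> --configs arguments/64_dim_1_transformer.py \
        --custom_aud <custom_aud>.npy --custom_wav <custom_aud>.wav --skip_train --skip_test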

Very bad lip sync for audios especially in Korean

Hello, thanks for providing such a great project.

The FPS of your work is amazing, but the rendered outputs seem to have poor lip-sync quality. In particular, the lip sync for Korean audio is almost nonexistent.

may_kor.mov

At least I don't think it's because of the length of the training video: it is over 4 minutes long, which should be enough for other person-dependent models.

I'd like to ask: why is that? Or do you have any hypothesis or guess about the problem?

Best,
Junyeong Ahn

The location of the trained model

Hello, thank you for providing such a great project. I finished the training session as shown below:

[image]

However, I cannot find the checkpoint anywhere in my folders.

[image]

No output folder has been created. I would be glad if you could help me out with this!
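
For reference (inferred from the render log in the "29 seconds" issue above, not from the training code itself): a trained model appears to live under whatever was passed as --model_path, e.g. <model_path>/cfg_args and <model_path>/point_cloud/iteration_10000. A quick check, with a hypothetical path:

    # Sketch: look for checkpoint artifacts under the --model_path used for training.
    from pathlib import Path
    model_path = Path("output/Obama")          # hypothetical; use your --model_path value
    print((model_path / "cfg_args").exists())  # saved config, per the render log above
    pc_dir = model_path / "point_cloud"
    if pc_dir.is_dir():
        print(sorted(pc_dir.glob("iteration_*")))  # e.g. .../iteration_10000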

IndexError: list index out of range

When running inference:

File "render.py", line 231, in
render_sets(model.extract(args), hyperparam.extract(args), args.iteration, pipeline.extract(args), args)
File "render.py", line 170, in render_sets
render_set(dataset.model_path, "train", scene.loaded_iter, scene.getTrainCameras(), gaussians, pipeline, audio_dir, batch_size)
File "render.py", line 133, in render_set
cmd = f'ffmpeg -loglevel quiet -y -i {gts_path}/gt.mp4 -i {inf_audio_dir} -c:v copy -c:a aac {gts_path}/{model_path.split("/")[-2]}{name}{iteration}iter_gt.mov'
IndexError: list index out of range

How could I solve this?
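
For what it's worth, the traceback points at model_path.split("/")[-2] inside the ffmpeg f-string: that indexing requires the model path to contain at least two path components. A minimal reproduction (a sketch, with a hypothetical single-component path):

    # Sketch: why split("/")[-2] raises IndexError for a single-component path.
    model_path = "model"            # e.g. --model_path model, no slashes
    parts = model_path.split("/")   # -> ["model"], only one element
    print(parts[-1])                # "model"
    print(parts[-2])                # IndexError: list index out of range

So passing a --model_path with at least two components (e.g. output/obama, or with a trailing slash) should sidestep the crash.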

Running the code at a higher resolution

I am having various problems porting your code to 1440x1440.

I am getting this issue here.
I have tried to play with the params as suggested here... but the result was very bad.

Error when running inference with custom audio

Hello!

I'm getting an error while running inference with custom audio.

total frame: 449 [20/06 12:24:24]
FPS: 97.33466045638134 [20/06 12:24:24]
Traceback (most recent call last):
File "render.py", line 233, in
render_sets(model.extract(args), hyperparam.extract(args), args.iteration, pipeline.extract(args), args)
File "render.py", line 168, in render_sets
render_set(dataset.model_path, "custom", scene.loaded_iter, scene.getCustomCameras(), gaussians, pipeline, audio_dir, batch_size)
File "render.py", line 137, in render_set
cmd = f'ffmpeg -loglevel quiet -y -i {render_path}/renders.mp4 -i {inf_audio_dir} -c:v copy -c:a aac {render_path}/{model_path.split("/")[-2]}{name}{iteration}iter_renders.mov'
IndexError: list index out of range

What may be causing this error?
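
This is the same model_path.split("/")[-2] indexing as in the previous issue; it fails whenever --model_path has fewer than two path components. A defensive variant of that f-string (a sketch with placeholder variables, not the repo's actual fix) could derive the name from the path's basename instead:

    # Sketch: build the output name without assuming two path components.
    import os
    model_path, name, iteration = "model", "custom", 10000        # placeholders
    render_path, inf_audio_dir = "renders", "audio.wav"           # placeholders
    model_dir = os.path.basename(os.path.normpath(model_path))    # -> "model"
    cmd = (f'ffmpeg -loglevel quiet -y -i {render_path}/renders.mp4 -i {inf_audio_dir} '
           f'-c:v copy -c:a aac {render_path}/{model_dir}_{name}_{iteration}iter_renders.mov')
    print(cmd)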

Wrong lines under if statement

GaussianTalker/render.py

Lines 134 to 138 in aead7cf

if name != 'custom':
    cmd = f'ffmpeg -loglevel quiet -y -i {render_path}/renders.mp4 -i {inf_audio_dir} -c:v copy -c:a aac {render_path}/{model_path.split("/")[-2]}_{name}_{iteration}iter_renders.mov'
    os.system(cmd)
cmd = f'ffmpeg -loglevel quiet -y -i {gts_path}/gt.mp4 -i {inf_audio_dir} -c:v copy -c:a aac {gts_path}/{model_path.split("/")[-2]}_{name}_{iteration}iter_gt.mov'
os.system(cmd)

Lines 137 and 138 should be under the if statement.
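
The suggested fix, as I read it (a sketch of the reporter's point, not a confirmed patch), is to indent the gt.mp4 muxing under the same guard, since the custom render path has no ground-truth video to mux:

    if name != 'custom':
        cmd = f'ffmpeg -loglevel quiet -y -i {render_path}/renders.mp4 -i {inf_audio_dir} -c:v copy -c:a aac {render_path}/{model_path.split("/")[-2]}_{name}_{iteration}iter_renders.mov'
        os.system(cmd)
        cmd = f'ffmpeg -loglevel quiet -y -i {gts_path}/gt.mp4 -i {inf_audio_dir} -c:v copy -c:a aac {gts_path}/{model_path.split("/")[-2]}_{name}_{iteration}iter_gt.mov'
        os.system(cmd)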

How to get <custom_aud>.npy?

I have a new_for_inference.wav file for inference, but I am not sure how to get the <custom_aud>.npy file. HELP!!!
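
One hint grounded in this page (an inference, not an official answer): the render log in the "29 seconds" issue above shows aud_features of shape [7713, 29, 16], i.e. per-video-frame DeepSpeech-style audio features, so whatever produces <custom_aud>.npy should yield a float array of that form. A quick sanity check on a candidate file:

    # Sketch: validate a candidate audio-feature file against the shape seen in the logs.
    import numpy as np
    feats = np.load("custom_aud.npy")   # hypothetical path
    print(feats.shape, feats.dtype)     # expect (num_frames, 29, 16) per the render log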

facial_mesh error

facial_mesh = torch.load(mesh_path)["vertices"]
KeyError: 'vertices'

The mesh_path is track_params.pt, but there is no "vertices" information in track_params.pt. Is something wrong?
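
A quick way to see what the tracking step actually saved (a sketch; the filename is taken from the report above, and the file is assumed to store a dict, as the indexing implies):

    # Sketch: inspect the keys stored in track_params.pt.
    import torch
    params = torch.load("track_params.pt", map_location="cpu")
    print(list(params.keys()))   # compare against the expected 'vertices' key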

facial_mesh KeyError: 'vertices'

facial_mesh = torch.load(mesh_path)["vertices"]
KeyError: 'vertices'

I looked at question 3, and unfortunately my code still has the "KeyError" problem. Looking back at data/process.py, there are no vertices. Is there anything extra I can do?
