
ku-cvlab / gaussiantalker

Official implementation of “GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting” by Kyusun Cho, Joungbin Lee, Heeji Yoon, Yeobin Hong, Jaehoon Ko, Sangjun Ahn and Seungryong Kim

License: Other

Python 96.86% Shell 3.14%

gaussiantalker's People

Contributors: joungbinlee, kyustorm7


gaussiantalker's Issues

Question: Pasting the inferred video back onto the original source video

Hey there,

First of all, great project! I would love to try it out. Does it work seamlessly to put the generated video back on top of the raw video?

Like this:

[image]

To retain the real eye and forehead movement, I want to mask out just the nose, mouth, and jaw:

[image]

It would be awesome if someone could answer this before I train on my own video :)
Thanks in advance!

Output on my own training video is unstable

Hi, I tried to train it on two videos; however, I got an unstable head. What are the required criteria for the input video for training?
I got results similar to this:

326222364-304de436-6fbe-4aab-8c68-c8eaa34b432f.mov

What can I do on my end to improve training so that the results do not have this instability?

Training does not converge

Hi, thanks for your amazing work.
I have tried to reproduce the results, and I followed the data-processing procedure in README.md.
However, when I train on the Obama dataset and my own dataset, the training loss does not seem to converge.
Here are my arguments and command:

python train.py \
-s mnt/workspace/talking_head/GaussianTalker/data/ids/Obama \
--model_path /mnt/workspace/talking_head/GaussianTalker/output/Obama \
--configs arguments/64_dim_1_transformer.py

and the loss log seems super weird:
[screenshot 2024-05-15 11:28:20: loss log]

The rendered image has no head, which is different from the original paper.
[screenshot 2024-05-15 11:30:39: rendered image]

[screenshot 2024-05-15 11:29:59: rendered image]

Similar results were observed on my own data. Any ideas?
The only difference between my environment and yours is that I use torch 1.12.1 and CUDA 11.3.
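
One detail worth checking in the command quoted above (an observation, not a confirmed cause): the source path is passed as mnt/workspace/... (relative) while --model_path uses /mnt/workspace/... (absolute). If the leading slash was not simply lost when pasting, train.py would be reading data from a nonexistent relative directory. A minimal check:

    # Sketch: verify the -s source path actually exists as written.
    # Both paths are copied from the command above.
    import os
    print(os.path.isdir("mnt/workspace/talking_head/GaussianTalker/data/ids/Obama"))   # relative, as passed
    print(os.path.isdir("/mnt/workspace/talking_head/GaussianTalker/data/ids/Obama"))  # absolute, as in --model_path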

undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11

Has anyone faced this issue while running:

"pip install -e submodules/custom-bg-depth-diff-gaussian-rasterization"

Obtaining file:///home/nikhil/GaussianTalker/submodules/custom-bg-depth-diff-gaussian-rasterization
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [23 lines of output]
Traceback (most recent call last):
File "/home/nikhil/anaconda3/envs/GaussianTalker/lib/python3.7/site-packages/torch/init.py", line 172, in _load_global_deps
ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
File "/home/nikhil/anaconda3/envs/GaussianTalker/lib/python3.7/ctypes/init.py", line 364, in init
self._handle = _dlopen(self._name, mode)
OSError: /home/nikhil/anaconda3/envs/GaussianTalker/lib/python3.7/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11

  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "<string>", line 36, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/home/nikhil//GaussianTalker/submodules/custom-bg-depth-diff-gaussian-rasterization/setup.py", line 13, in <module>
      from torch.utils.cpp_extension import CUDAExtension, BuildExtension
    File "/home/nikhil/anaconda3/envs/GaussianTalker/lib/python3.7/site-packages/torch/__init__.py", line 217, in <module>
      _load_global_deps()
    File "/home/nikhil/anaconda3/envs/GaussianTalker/lib/python3.7/site-packages/torch/__init__.py", line 178, in _load_global_deps
      _preload_cuda_deps()
    File "/home/nikhil/anaconda3/envs/GaussianTalker/lib/python3.7/site-packages/torch/__init__.py", line 158, in _preload_cuda_deps
      ctypes.CDLL(cublas_path)
    File "/home/nikhil/anaconda3/envs/GaussianTalker/lib/python3.7/ctypes/__init__.py", line 364, in __init__
      self._handle = _dlopen(self._name, mode)
  OSError: /home/nikhil/anaconda3/envs/GaussianTalker/lib/python3.7/site-packages/nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
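
For context, a common cause of this particular undefined symbol (not confirmed for this exact setup): a libcublas from one CUDA build being loaded by a torch wheel built against another, e.g. a leftover nvidia-cublas-cu11 wheel from a different torch installation shadowing the libraries torch expects. Since import torch itself fails here, the installed wheels can be listed without importing it:

    # Sketch: list installed torch/NVIDIA wheels without importing torch,
    # which raises the OSError shown above in this environment.
    import pkg_resources
    for dist in pkg_resources.working_set:
        if dist.project_name.lower().startswith(("torch", "nvidia")):
            print(dist.project_name, dist.version)

If a stray nvidia-cublas-cu11 shows up next to a torch build it does not match, uninstalling it or reinstalling torch with a matching CUDA build is the usual remedy.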

Substitute tri-plane with MLP

Congratulations on the MM acceptance! Since it's hard to learn a canonical face with explicit point clouds, I am wondering what would happen if we replaced the tri-plane representation with a common implicit MLP?

Has anyone compared with GeneFace++?

GaussianTalker is a method built on 3DGS, while GeneFace++ is built on NeRF. Has anyone compared the results, and could you share some discussion?

Question on data preparation

This is great work! I'm new to this field, so I lack a lot of experience. If you happen to have some time, could you help me?
In the Data Preparation section, I tried to run python data_utils/process.py my_dataset/Obama/Obama.mp4, but it gave an error indicating that the track_params.pt file is missing.

After comparing with the directory structure you provided, I found that I'm missing

  1. au.csv
  2. track_params.pt
  3. transforms_train.json
  4. transforms_val.json

Could you please guide me in as much detail as possible on how to obtain the files that I am missing? Thanks a lot!

Here's a printout of my directory structure:
(/data2/conda_envs/GaussianTalker) root@my_workspace/gaussiantalker/xq_dataset/Obama# ls -lh
total 121M
-- aud_ds.npy
-- aud_novel.wav
-- aud.npy (this one does not seem to be included in the directory structure you gave)
-- aud_train.wav
-- aud.wav
-- bc.jpg
-- gt_imgs
-- Obama.mp4
-- ori_imgs
-- torso_imgs
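
As a quick way to track which preprocessing outputs are still missing, a short check can be run over the expected files (a sketch; the dataset root below is hypothetical):

    # Sketch: report which expected preprocessing outputs exist so far.
    from pathlib import Path
    root = Path("my_dataset/Obama")  # hypothetical dataset root; adjust to yours
    expected = ["au.csv", "track_params.pt", "transforms_train.json", "transforms_val.json"]
    for f in expected:
        print(f, "OK" if (root / f).exists() else "MISSING")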

Wish you all the best

Some flickering artifacts in the neck region

This is a very good repo, but I am facing some issues. The Gaussian rasterizer takes torso + background image as input, and the torso comes from the training set; so when performing inference with new audio, there are artifacts in the neck region during rendering, because the torso + background image is from the training set. What could be a solution for this?

Why is the maximum duration of the rendered video only 29 seconds and always 702 frames?

Looking for config file in model/cfg_args
Config file found: model/cfg_args
Rendering model
Namespace(add_point=False, add_points=False, apply_rotation=False, batch=1, batch_size=32, bounds=1.6, canonical_tri_plane_factor_list=['opacity', 'shs'], checkpoint_iterations=[], coarse_iterations=7999, compute_cov3D_python=False, configs='arguments/64_dim_1_transformer.py', convert_SHs_python=False, custom_aud='/User/GaussianTalker/custom_audio/1/audio.npy', custom_sampler=None, custom_wav='/User/GaussianTalker/custom_audio/1/audio.wav', d_model=64, data_device='cuda', dataloader=True, debug=True, debug_from=-1, defor_depth=2, deformation_lr_delay_mult=0.01, deformation_lr_final=1e-05, deformation_lr_init=0.0001, densification_interval=100, densify_from_iter=1000, densify_grad_threshold_after=0.0002, densify_grad_threshold_coarse=0.001, densify_grad_threshold_fine_init=0.0002, densify_until_iter=7000, depth_fine_tuning=True, detect_anomaly=False, drop_prob=0.2, empty_voxel=False, eval=True, expname='', extension='.png', feature_lr=0.0025, ffn_hidden=128, grid_lr_final=0.00016, grid_lr_init=0.0016, grid_pe=0, images='images', ip='127.0.0.1', iteration=10000, iterations=10000, kplanes_config={'grid_dimensions': 2, 'input_coordinate_dim': 3, 'output_coordinate_dim': 32, 'resolution': [64, 64, 64]}, l1_time_planes=0.0001, lambda_dssim=0, lambda_lpips=0, lip_fine_tuning=True, llffhold=8, model_path='model', multires=[1, 2], n_head=2, n_layer=1, net_width=128, no_do=False, no_dr=False, no_ds=False, no_dshs=False, no_dx=False, no_grid=False, only_infer=True, opacity_lr=0.05, opacity_pe=2, opacity_reset_interval=3000, opacity_threshold_coarse=0.005, opacity_threshold_fine_after=0.005, opacity_threshold_fine_init=0.005, percent_dense=0.01, plane_tv_weight=0.0002, port=6009, pos_emb=True, posebase_pe=10, position_lr_delay_mult=0.01, position_lr_final=1.6e-06, position_lr_init=0.00016, position_lr_max_steps=20000, pruning_from_iter=500, pruning_interval=100, quiet=False, render_process=False, resolution=-1, rotation_lr=0.001, save_iterations=[1000, 3000, 4000, 5000, 6000, 7000, 9000, 10000, 12000, 14000, 20000, 30000, 45000, 60000, 30000], scale_rotation_pe=2, scaling_lr=0.005, sh_degree=3, skip_test=True, skip_train=True, skip_video=False, source_path='data_set/wangxueyin', split_gs_in_fine_stage=False, start_checkpoint=None, static_mlp=False, test_iterations=[0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500, 10000, 10500, 11000, 11500, 12000, 12500, 13000, 13500, 14000, 14500, 15000, 15500, 16000, 16500, 17000, 17500, 18000, 18500, 19000, 19500, 20000, 20500, 21000, 21500, 22000, 22500, 23000, 23500, 24000, 24500, 25000, 25500, 26000, 26500, 27000, 27500, 28000, 28500, 29000, 29500, 30000, 30500, 31000, 31500, 32000, 32500, 33000, 33500, 34000, 34500, 35000, 35500, 36000, 36500, 37000, 37500, 38000, 38500, 39000, 39500, 40000, 40500, 41000, 41500, 42000, 42500, 43000, 43500, 44000, 44500, 45000, 45500, 46000, 46500, 47000, 47500, 48000, 48500, 49000, 49500], time_smoothness_weight=0.001, timebase_pe=4, timenet_output=32, timenet_width=64, train_l=['xyz', 'deformation', 'grid', 'f_dc', 'f_rest', 'opacity', 'scaling', 'rotation'], train_tri_plane=True, use_wandb=False, visualize_attention=False, weight_constraint_after=0.2, weight_constraint_init=1, weight_decay_iteration=5000, white_background=True, zerostamp_init=False) [05/06 00:37:09]
feature_dim: 64 [05/06 00:37:09]
Loading trained model at iteration 10000 [05/06 00:37:09]
[INFO] load aud_features: torch.Size([7713, 29, 16]) [05/06 00:37:09]
Reading Training Transforms [05/06 00:37:10]
Reading Test Transforms [05/06 00:37:17]
Generating Video Transforms [05/06 00:37:20]
Reading Custom Transforms [05/06 00:37:20]
Loading Training Cameras [05/06 00:37:26]
Loading Test Cameras [05/06 00:37:26]
Loading Video Cameras [05/06 00:37:26]
Loading Custom Cameras [05/06 00:37:26]
Deformation Net Set aabb [0.75849324 0.90093476 0.45244923] [-0.78521377 -0.8166129 -0.566415 ] [05/06 00:37:26]
Voxel Plane: set aabb= Parameter containing:
tensor([[ 0.7585, 0.9009, 0.4524],
[-0.7852, -0.8166, -0.5664]]) [05/06 00:37:26]
loading model from existsmodel/point_cloud/iteration_10000 [05/06 00:37:28]
============== <scene.Scene object at 0x7800e0d15d50> [05/06 00:37:28]
------------------------------------------------- [05/06 00:37:28]
test set rendering : 702 frames [05/06 00:37:28]
------------------------------------------------- [05/06 00:37:28]
point nums: 43459 [05/06 00:37:28]
Rendering progress: 100%|██████████| 702/702 [00:34<00:00, 20.28it/s]
total frame: 702 [05/06 00:38:03]
FPS: 23.27712135731388 [05/06 00:38:03]

iterations done: begin wirte [05/06 00:39:17]
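
One possible reading of this log (a hypothesis, not a confirmed answer): the line "test set rendering : 702 frames" suggests the rendered length is tied to the dataset's test split rather than to the audio, and 702 frames at the usual 25 fps preprocessing rate works out to roughly the observed cap:

    # Sketch: relate the fixed frame count to the ~29 s duration.
    frames = 702   # "test set rendering : 702 frames", from the log above
    fps = 25       # assumed preprocessing frame rate
    print(frames / fps)  # 28.08 s, roughly the 29 s observed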

How to do inference on a new audio?

Hello, thank you very much for the great work and making it open source.

I tried the code and it was producing good results for the aud_novel.wav kept separately.

But what changes would be required if I want to run inference on new audio?
Just changing the code in render.py to take the new audio file does not seem to work.
It seems there are changes to be made elsewhere too.

Could you point out how I can run inference with new audio?

Thanks in advance.
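
One pointer grounded in this page (a guess at the intended workflow, not a confirmed recipe): the argument dump in the "29 seconds" issue above shows render.py accepting custom_aud and custom_wav arguments, which suggests custom audio is meant to be passed as flags rather than by editing render.py, along these lines (flag names taken from that Namespace dump; everything else is assumed):

    python render.py --model_path <model_path> --configs arguments/64_dim_1_transformer.py \
        --custom_aud <custom_aud>.npy --custom_wav <custom_aud>.wav --skip_train --skip_test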

Very bad lip sync for audios especially in Korean

Hello, thanks for providing such a great project.

The FPS of your work is amazing, but the rendered outputs seem to have poor lip-sync quality. In particular, the lip sync for Korean audio is almost nonexistent.

may_kor.mov

At least I don't think it's because of the length of the training video: it is over 4 minutes long, which should be enough for other person-dependent models.

I'd like to ask: why is that? Or do you have any hypothesis or guess about the problem?

Best,
Junyeong Ahn

The location of the trained model

Hello, thank you for providing such a great project. I finished the training session as shown below:

[image]

However, I cannot find the checkpoint anywhere in my folders.

[image]

No output folder has been created. I would be glad if you could help me out with this!
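
For reference (inferred from the render log in the "29 seconds" issue above, not from the training code itself): a trained model appears to live under whatever was passed as --model_path, e.g. <model_path>/cfg_args and <model_path>/point_cloud/iteration_10000. A quick check, with a hypothetical path:

    # Sketch: look for checkpoint artifacts under the --model_path used for training.
    from pathlib import Path
    model_path = Path("output/Obama")          # hypothetical; use your --model_path value
    print((model_path / "cfg_args").exists())  # saved config, per the render log above
    pc_dir = model_path / "point_cloud"
    if pc_dir.is_dir():
        print(sorted(pc_dir.glob("iteration_*")))  # e.g. .../iteration_10000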

IndexError: list index out of range

When running inference:

File "render.py", line 231, in
render_sets(model.extract(args), hyperparam.extract(args), args.iteration, pipeline.extract(args), args)
File "render.py", line 170, in render_sets
render_set(dataset.model_path, "train", scene.loaded_iter, scene.getTrainCameras(), gaussians, pipeline, audio_dir, batch_size)
File "render.py", line 133, in render_set
cmd = f'ffmpeg -loglevel quiet -y -i {gts_path}/gt.mp4 -i {inf_audio_dir} -c:v copy -c:a aac {gts_path}/{model_path.split("/")[-2]}{name}{iteration}iter_gt.mov'
IndexError: list index out of range

How could I solve this?
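
For what it's worth, the traceback points at model_path.split("/")[-2] inside the ffmpeg f-string: that indexing requires the model path to contain at least two path components. A minimal reproduction (a sketch, with a hypothetical single-component path):

    # Sketch: why split("/")[-2] raises IndexError for a single-component path.
    model_path = "model"            # e.g. --model_path model, no slashes
    parts = model_path.split("/")   # -> ["model"], only one element
    print(parts[-1])                # "model"
    print(parts[-2])                # IndexError: list index out of range

So passing a --model_path with at least two components (e.g. output/obama, or with a trailing slash) should sidestep the crash.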

Running the code at a higher resolution

I am having various problems porting your code to 1440x1440.

I am getting this issue here.
I have tried to play with the params as suggested here... but the result was very bad.

Error when running inference with custom audio

Hello!

I'm getting an error while running inference with custom audio.

total frame: 449 [20/06 12:24:24]
FPS: 97.33466045638134 [20/06 12:24:24]
Traceback (most recent call last):
File "render.py", line 233, in
render_sets(model.extract(args), hyperparam.extract(args), args.iteration, pipeline.extract(args), args)
File "render.py", line 168, in render_sets
render_set(dataset.model_path, "custom", scene.loaded_iter, scene.getCustomCameras(), gaussians, pipeline, audio_dir, batch_size)
File "render.py", line 137, in render_set
cmd = f'ffmpeg -loglevel quiet -y -i {render_path}/renders.mp4 -i {inf_audio_dir} -c:v copy -c:a aac {render_path}/{model_path.split("/")[-2]}{name}{iteration}iter_renders.mov'
IndexError: list index out of range

What may be causing this error?
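
This is the same model_path.split("/")[-2] indexing as in the previous issue; it fails whenever --model_path has fewer than two path components. A defensive variant of that f-string (a sketch with placeholder variables, not the repo's actual fix) could derive the name from the path's basename instead:

    # Sketch: build the output name without assuming two path components.
    import os
    model_path, name, iteration = "model", "custom", 10000        # placeholders
    render_path, inf_audio_dir = "renders", "audio.wav"           # placeholders
    model_dir = os.path.basename(os.path.normpath(model_path))    # -> "model"
    cmd = (f'ffmpeg -loglevel quiet -y -i {render_path}/renders.mp4 -i {inf_audio_dir} '
           f'-c:v copy -c:a aac {render_path}/{model_dir}_{name}_{iteration}iter_renders.mov')
    print(cmd)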

Wrong lines under if statement

GaussianTalker/render.py

Lines 134 to 138 in aead7cf

if name != 'custom':
    cmd = f'ffmpeg -loglevel quiet -y -i {render_path}/renders.mp4 -i {inf_audio_dir} -c:v copy -c:a aac {render_path}/{model_path.split("/")[-2]}_{name}_{iteration}iter_renders.mov'
    os.system(cmd)
cmd = f'ffmpeg -loglevel quiet -y -i {gts_path}/gt.mp4 -i {inf_audio_dir} -c:v copy -c:a aac {gts_path}/{model_path.split("/")[-2]}_{name}_{iteration}iter_gt.mov'
os.system(cmd)

Lines 137 and 138 should be under the if statement.
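
The suggested fix, as I read it (a sketch of the reporter's point, not a confirmed patch), is to indent the gt.mp4 muxing under the same guard, since the custom render path has no ground-truth video to mux:

    if name != 'custom':
        cmd = f'ffmpeg -loglevel quiet -y -i {render_path}/renders.mp4 -i {inf_audio_dir} -c:v copy -c:a aac {render_path}/{model_path.split("/")[-2]}_{name}_{iteration}iter_renders.mov'
        os.system(cmd)
        cmd = f'ffmpeg -loglevel quiet -y -i {gts_path}/gt.mp4 -i {inf_audio_dir} -c:v copy -c:a aac {gts_path}/{model_path.split("/")[-2]}_{name}_{iteration}iter_gt.mov'
        os.system(cmd)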

How to get <custom_aud>.npy?

I have a new_for_inference.wav file for inference, but I am not sure how to get the <custom_aud>.npy file. HELP!!!
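
One hint grounded in this page (an inference, not an official answer): the render log in the "29 seconds" issue above shows aud_features of shape [7713, 29, 16], i.e. per-video-frame DeepSpeech-style audio features, so whatever produces <custom_aud>.npy should yield a float array of that form. A quick sanity check on a candidate file:

    # Sketch: validate a candidate audio-feature file against the shape seen in the logs.
    import numpy as np
    feats = np.load("custom_aud.npy")   # hypothetical path
    print(feats.shape, feats.dtype)     # expect (num_frames, 29, 16) per the render log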

facial_mesh error

facial_mesh = torch.load(mesh_path)["vertices"]
KeyError: 'vertices'

The mesh_path is track_params.pt, but there is no "vertices" information in track_params.pt. Is something wrong?
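
A quick way to see what the tracking step actually saved (a sketch; the filename is taken from the report above, and the file is assumed to store a dict, as the indexing implies):

    # Sketch: inspect the keys stored in track_params.pt.
    import torch
    params = torch.load("track_params.pt", map_location="cpu")
    print(list(params.keys()))   # compare against the expected 'vertices' key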

facial_mesh KeyError: 'vertices'

facial_mesh = torch.load(mesh_path)["vertices"]
KeyError: 'vertices'

I looked at question 3, and unfortunately my code still has the "KeyError" problem. Looking back at data/process.py, there are no vertices. Is there anything extra I can do?
