
FENeRF: Face Editing in Radiance Fields
Official PyTorch implementation

Teaser image

FENeRF: Face Editing in Radiance Fields
Jingxiang Sun, Xuan Wang, Yong Zhang, Xiaoyu Li, Qi Zhang, Yebin Liu and Jue Wang
https://mrtornado24.github.io/FENeRF/

Abstract: Previous portrait image generation methods roughly fall into two categories: 2D GANs and 3D-aware GANs. 2D GANs can generate high-fidelity portraits but with low view consistency. 3D-aware GAN methods can maintain view consistency, but their generated images are not locally editable. To overcome these limitations, we propose FENeRF, a 3D-aware generator that can produce view-consistent and locally editable portrait images. Our method uses two decoupled latent codes to generate corresponding facial semantics and texture in a spatially aligned 3D volume with shared geometry. Benefiting from such an underlying 3D representation, FENeRF can jointly render the boundary-aligned image and semantic mask, and use the semantic mask to edit the 3D volume via GAN inversion. We further show that such a 3D representation can be learned from widely available monocular image and semantic mask pairs. Moreover, we reveal that jointly learning semantics and texture helps to generate finer geometry. Our experiments demonstrate that FENeRF outperforms state-of-the-art methods in various face editing tasks.

Data Preparation

We trained our models on CelebAMask-HQ and FFHQ:

For FFHQ, we estimate the segmentation map for each portrait image with prepare_segmaps.py. We adopt the pretrained face parsing model from SofGAN and convert its semantic categories to the CelebA format. Please download the pretrained model and put it into ./checkpoints.
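
The category conversion itself is repo-specific, but the idea is a simple lookup-table remap of the parsing labels. A minimal sketch, where the id table is purely illustrative and not the actual mapping used by prepare_segmaps.py:

import numpy as np

# Hypothetical mapping from face-parsing ids to CelebA-style ids
# (illustrative only; the real table lives in prepare_segmaps.py).
ID_MAP = {0: 0, 1: 1, 2: 6, 3: 7}

def remap_labels(seg: np.ndarray) -> np.ndarray:
    """Remap every pixel's class id through a lookup table."""
    lut = np.zeros(max(ID_MAP) + 1, dtype=seg.dtype)
    for src, dst in ID_MAP.items():
        lut[src] = dst
    return lut[seg]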

Training a Model

python train_double_latent_semantic.py --curriculum CelebA_double_semantic_texture_embedding_256_dim_96 --output_dir training-runs/debug --num_gpus 4

To continue training from another run, specify the --load_dir=path/to/directory flag.
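
For example, to resume the run started above from its saved state (the directory here is only a placeholder, reusing the --output_dir from the training command):

python train_double_latent_semantic.py --curriculum CelebA_double_semantic_texture_embedding_256_dim_96 --output_dir training-runs/debug --num_gpus 4 --load_dir=training-runs/debug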

Model Results

Rendering Images and Segmentation Maps

python render_multiview_images_double_semantic.py path/to/generator.pth --curriculum CelebA_double_semantic_texture_embedding_256_dim_96 --seeds 0 1 2 3

Rendering Videos with Disentangled Interpolation

python render_video_interpolation_semantic.py path/to/generator.pth --curriculum CelebA_double_semantic_texture_embedding_256_dim_96 --latent_type geo --seeds 0 1 2 3 --trajectory front --save_with_video

You can pass the flag --lock_view_dependence to remove view-dependent effects. This can help mitigate distracting visual artifacts such as shifting eyebrows. However, locking view dependence may lower the visual quality of images (e.g., edges may appear blurrier).
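
For intuition, in pi-GAN-style renderers (on which FENeRF builds), locking view dependence typically means replacing the per-sample viewing directions fed to the color branch with a constant frontal direction. A minimal sketch, with illustrative tensor names:

import torch

def lock_view_dependence(ray_directions: torch.Tensor) -> torch.Tensor:
    """Replace all ray directions with a fixed frontal direction (0, 0, -1),
    so rendered appearance no longer varies with camera pose."""
    locked = torch.zeros_like(ray_directions)
    locked[..., 2] = -1.0
    return locked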

Extracting 3D Shapes

python extract_double_semantic_shapes.py path/to/generator.pth --seed 0
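
The usual recipe behind this kind of script is to evaluate the generator's density on a regular voxel grid and run marching cubes on it. A minimal sketch, where query_sigma is a hypothetical callable wrapping the generator's density output; the grid bound and iso-level are illustrative defaults:

import numpy as np
from skimage import measure

def extract_mesh(query_sigma, resolution=256, bound=0.3, level=10.0):
    """query_sigma maps (N, 3) points to (N,) density values."""
    xs = np.linspace(-bound, bound, resolution)
    pts = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)
    sigma = query_sigma(pts.reshape(-1, 3)).reshape(resolution, resolution, resolution)
    # Extract the iso-surface at the chosen density threshold.
    verts, faces, normals, _ = measure.marching_cubes(sigma, level=level)
    return verts, faces, normals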

Real Portrait Editing

  1. Inversion. Given a reference portrait image and its segmentation map, run inverse_render_double_semantic.py to obtain its latent codes stored as freq_phase_offset_$exp_name.pth:
python inverse_render_double_semantic.py exp_name path/to/generator.pth --image_path data/examples/image.jpg --seg_path data/examples/mask.png --background_mask --image_size 128 --latent_normalize --lambda_seg 1. --lambda_img 0.2 --lambda_percept 1. --lock_view_dependence True --recon
  2. Editing shape. Edit the segmentation map using our UI platform: python ./Painter/run_UI.py. You can load a segmentation map and edit it; press the 'Save Img' button after editing. Then load the latent code obtained in Step 1 and run inversion again:


python inverse_render_double_semantic.py exp_name path/to/generator.pth --image_path data/examples/image.jpg --seg_path data/examples/mask_edit.png --background_mask --image_size 128 --latent_normalize --lambda_seg 1. --lambda_img 0 --lambda_percept 0 --lock_view_dependence True --recon --load_checkpoint True --checkpoint_path freq_phase_offset_$exp_name.pth

After that, you can find a free-view rendering video of the edited portrait in the save directory.

  3. Editing global appearance. Given the inverted latent codes of portraits A and B, you can transfer B's appearance to A by swapping the w_app_phase_shifts, w_app_frequency_offsets, and w_app_phase_shift_offsets of A with those of B, as in the sketch below.
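
A minimal sketch of that swap, assuming the inversion step saved the latent codes in a checkpoint dict under the key names above (exact keys and file names may differ in the actual script):

import torch

APP_KEYS = ["w_app_phase_shifts", "w_app_frequency_offsets", "w_app_phase_shift_offsets"]

ckpt_a = torch.load("freq_phase_offset_A.pth")  # inverted codes of portrait A
ckpt_b = torch.load("freq_phase_offset_B.pth")  # inverted codes of portrait B

# Transfer B's appearance onto A while keeping A's geometry codes.
for key in APP_KEYS:
    ckpt_a[key] = ckpt_b[key]

torch.save(ckpt_a, "freq_phase_offset_A_appB.pth")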

Pretrained Models

We provide pretrained models for FENeRF w/ latent grid and w/o latent grid.

FENeRF w/o latent grid: https://drive.google.com/file/d/1RZvQ7a0sC6k0_N85by71Tepchk_wBBQF/view?usp=sharing

FENeRF w/ latent grid: https://drive.google.com/file/d/1ObhhxPTeuTBOOJwxOL3_xzwBFJIh3Jqg/view?usp=sharing
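
A minimal loading sketch, assuming the checkpoints are whole pickled generator modules saved with torch.save (as in pi-GAN-style training code):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
generator = torch.load("path/to/generator.pth", map_location=device)
generator.eval()  # inference mode for rendering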

Citation

@InProceedings{Sun_2022_CVPR,
    author    = {Sun, Jingxiang and Wang, Xuan and Zhang, Yong and Li, Xiaoyu and Zhang, Qi and Liu, Yebin and Wang, Jue},
    title     = {FENeRF: Face Editing in Neural Radiance Fields},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {7672-7682}
}

fenerf's Issues

poor performance of the Inversion

When I run inversion on a new portrait, the rendered results are very poor:
python inverse_render_double_semantic.py exp_name ./checkpoint/315000_generator.pth --image_path /dataset/cnn/0/com_imgs/0.jpg --seg_path /dataset/cnn/0/parsing_celebahq/masks1024x1024/0.jpg --background_mask --image_size 128 --latent_normalize --lambda_seg 1. --lambda_img 0.2 --lambda_percept 1. --lock_view_dependence True --recon
The input image and mask are shown below:

[input image and mask]

But the rendered results look like this:

[rendered results]

Did I get something wrong?

run_UI.py does not work

When I try to run run_UI.py, something goes wrong:

QObject::moveToThread: Current thread (0x5555f1280ce0) is not the object's thread (0x5555f3a25330).
Cannot move to target thread (0x5555f1280ce0)
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/home/junshen/anaconda3/envs/fenerf/lib/python3.7/site-packages/cv2/qt/plugins" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.
Available platform plugins are: xcb, eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, wayland-egl, wayland, wayland-xcomposite-egl, wayland-xcomposite-glx, webgl.

Models trained on FFHQ

Are the provided models trained on CelebA?
If so, will you release the models trained on FFHQ?

Thank you for your time. Amazing work!!

How to reproduce "Style mixing of latent space"

On your project website, you show a demo of "Style Mixing of latent space", where you provide source 1 and source 2 images (i.e., Figure 7 of your paper).

Can you please guide me on how to reproduce that with your code, using my own images?

About Channels_seg=18

Could you please tell me whether CHANNELS_SEG=18 in the training code is determined by the number of categories in the mask, or by something else? Thanks for your answer!

What is the dimension of the learnable feature grid?

Thanks for your excellent work! I am curious about the details of the learnable feature grid (e_coord): what is its dimension, and do you keep the same dimension when rendering images at different resolutions?

Looking forward to your reply.

Curriculum for FFHQ

Hi, great work! Could you specify the curriculum for FFHQ, or is it the same as for CelebA-HQ? Thank you.

Code release?

Hi,

Do you plan to release the code soon?

Best

Code

You have done very meaningful work, and I want to learn from it. Can you provide the source code?

What if multiple views of the same person are used as input during inversion?

When inverting the latent code, is the input always a single frontal face image?
If I have several images of the same person from different angles, would the inverted latent code and the reconstructed result be better?
If so, how exactly should this be done in inverse_render_double_semantic?

something about FID

Thank you for your great work. I have a few questions about FID:
I used eval_metrics.py to calculate FID with the pre-trained model, but something went wrong, so I used pytorch-fid to calculate FID between the CelebA-Mask dataset and images generated by the pre-trained model; however, the resulting value of about 80 looks wrong. Could you please provide suggestions on calculating FID?

nothing happens

When I run the script to invert an image, nothing seems to happen:

!python inverse_render_double_semantic.py exp_name /content/FENeRF/wo_latent_grid/200000_generator.pth --image_path data/examples/image.jpg --seg_path data/examples/mask_edit.png --background_mask --image_size 128 --latent_normalize --lambda_seg 1. --lambda_img 0 --lambda_percept 0 --lock_view_dependence True --recon --load_checkpoint True --checkpoint_path freq_phase_offset_$exp_name.pth

Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.0], spatial [off]
Loading model from: /usr/local/lib/python3.7/dist-packages/lpips/weights/v0.0/vgg.pth
/usr/local/lib/python3.7/dist-packages/torchvision/transforms/transforms.py:258: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
  "Argument interpolation should be of type InterpolationMode instead of int. "

TypeError: can't multiply sequence by non-int of type 'float'

D:\anaconda\lib\site-packages\torch\functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:2228.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Traceback (most recent call last):
  File "D:/work/vgan/FENeRF-main/inverse_render_double_semantic.py", line 582, in <module>
    checkpoint_path = run_inverse_render(opt, opt.image_path, opt.seg_path)
  File "D:/work/vgan/FENeRF-main/inverse_render_double_semantic.py", line 404, in run_inverse_render
    loss += opt.lambda_norm * norm_loss
TypeError: can't multiply sequence by non-int of type 'float'

./checkpoints not found

Hello author, I don't quite understand your README.md.
Where is ./checkpoints? I can't find it.
Can I render with the model directly, without training one? Can it run on the CPU?

Great work!!! A few queries..

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/x_fahkh/.conda/envs/gmpi/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1071, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/proj/cvl/users/x_fahkh/mn/debug-dmpi/gmpi/prepsem/Bisnet.py", line 23, in forward
    x = F.relu(self.bn(x))
  File "/home/x_fahkh/.conda/envs/gmpi/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1071, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/home/x_fahkh/.conda/envs/gmpi/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 178, in forward
    self.eps,
  File "/home/x_fahkh/.conda/envs/gmpi/lib/python3.7/site-packages/torch/nn/functional.py", line 2279, in batch_norm
    _verify_batch_size(input.size())
  File "/home/x_fahkh/.conda/envs/gmpi/lib/python3.7/site-packages/torch/nn/functional.py", line 2247, in _verify_batch_size
    raise ValueError("Expected more than 1 value per channel when training, got input size {}".format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 128, 1, 1])

When I tried to prepare the segmentation maps for FFHQ, I got this error.

Can you please have a look at it?
