
PadInv - High-fidelity GAN Inversion with Padding Space

Qingyan Bai*, Yinghao Xu*, Jiapeng Zhu, Weihao Xia, Yujiu Yang, Yujun Shen
European Conference on Computer Vision (ECCV) 2022

Figure: Our encoder produces instance-aware coefficients to replace the fixed padding used in the generator. Such a design improves GAN inversion with better spatial details.

[Paper] [Project Page] [ArXiv Paper with Supp] [ECVA Link]

In this work, we propose to involve the padding space of the generator to complement the native latent space, facilitating high-fidelity GAN inversion. Concretely, we replace the constant padding (e.g., usually zeros) used in convolution layers with some instance-aware coefficients. In this way, the inductive bias assumed in the pre-trained model can be appropriately adapted to fit each individual image. We demonstrate that such a space extension allows a more flexible image manipulation, such as the separate control of face contour and facial details, and enables a novel editing manner where users can customize their own manipulations highly efficiently.
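To make the core idea concrete, here is a minimal, hypothetical PyTorch sketch (not the authors' implementation): a 3x3 convolution whose one-pixel border padding is filled with a per-instance coefficient predicted by the encoder, instead of constant zeros. The class name, shapes, and the per-channel coefficient layout are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InstancePaddedConv(nn.Module):
    """Conv layer whose 1-pixel border padding is an instance-aware
    coefficient instead of constant zeros. Illustrative sketch only."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        # padding=0 here: we pad manually so the border can be replaced.
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=0)

    def forward(self, x, pad_coef):
        # x: (N, C, H, W); pad_coef: (N, C), predicted per sample.
        n, c, h, w = x.shape
        # Zero-pad, then overwrite the border ring with the coefficients.
        xp = F.pad(x, (1, 1, 1, 1))              # (N, C, H+2, W+2)
        ring = torch.ones_like(xp)
        ring[:, :, 1:-1, 1:-1] = 0               # 1 on the border, 0 inside
        xp = xp + ring * pad_coef.view(n, c, 1, 1)
        return self.conv(xp)

layer = InstancePaddedConv(8, 16)
x = torch.randn(2, 8, 32, 32)
coef = torch.randn(2, 8)                         # would come from the encoder
y = layer(x, coef)
print(y.shape)                                   # torch.Size([2, 16, 32, 32])
```

Because the coefficients depend on the input image, the padding carries instance-specific spatial information that constant zeros cannot.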

Qualitative Results

From top to bottom: (a) high-fidelity GAN inversion with spatial details, (b) face blending with contour from one image and details from another, and (c) customized manipulations with one image pair.

Additional inversion results.

Additional face blending results.

Additional customized editing results.

Preparation

To train or test PadInv, first prepare the data and the pre-trained GAN checkpoints.

For data, please download FFHQ and the CelebA-HQ test set for the face domain, and LSUN Church and LSUN Bedroom for the outdoor and indoor scene domains, respectively.

For pre-trained GAN checkpoints, you can download them here: StyleGAN2-FFHQ, StyleGAN2-Church, StyleGAN2-Bedroom.

Training

Training Scripts

Please use the following scripts to train PadInv on the corresponding domains.

# Face
bash scripts/encoder_scipts/encoder_stylegan2_ffhq_train.sh 8 your_training_set_path your_test_set_path your_gan_ckp_path --job_name=your_job_name 
# Church
bash scripts/encoder_scipts/encoder_stylegan2_church_train.sh 8 your_training_set_path your_test_set_path your_gan_ckp_path --job_name=your_job_name 
# Bedroom
bash scripts/encoder_scipts/encoder_stylegan2_bedroom_train.sh 8 your_training_set_path your_test_set_path your_gan_ckp_path --job_name=your_job_name 

In the scripts above, '8' is the number of GPUs used for training. 'your_training_set_path' and 'your_test_set_path' are the dataset paths (e.g., data/ffhq.zip, data/CelebA-HQ-test, or data/bedroom_train_lmdb); for training and testing on LSUN, reading an LMDB directory is supported thanks to Hammer. 'your_gan_ckp_path' is the path to the pre-trained GAN checkpoint to be inverted. 'your_job_name' names the training job and its working directory.
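As a sanity check before launching, the sketch below assembles the face-domain command from example paths and prints it without running it. The checkpoint filename is an assumption; substitute your own paths.

```shell
# Dry run: build and print the face-domain training command.
N_GPUS=8
TRAIN_SET=data/ffhq.zip            # training set (zip supported)
TEST_SET=data/CelebA-HQ-test       # test set directory
GAN_CKPT=checkpoints/stylegan2_ffhq.pth   # assumed checkpoint filename

echo bash scripts/encoder_scipts/encoder_stylegan2_ffhq_train.sh \
    "$N_GPUS" "$TRAIN_SET" "$TEST_SET" "$GAN_CKPT" \
    --job_name=padinv_ffhq
```

Drop the leading `echo` to actually start training once the paths resolve.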

Results

  1. Testing metric results and visualization results of inversion can be found in work_dir/your_job_name/results/.
  2. The training log is saved at work_dir/your_job_name/log.txt.
  3. Checkpoints can be found in work_dir/your_job_name/checkpoints/. Note that we save the checkpoints corresponding to the best metrics and the latest ones.

BibTeX

If you find our work or code helpful for your research, please consider citing:

@inproceedings{bai2022high,
  title={High-fidelity GAN inversion with padding space},
  author={Bai, Qingyan and Xu, Yinghao and Zhu, Jiapeng and Xia, Weihao and Yang, Yujiu and Shen, Yujun},
  booktitle={European Conference on Computer Vision},
  pages={36--53},
  year={2022},
  organization={Springer}
}

Acknowledgement

Thanks to Hammer, StyleGAN2, and Pixel2Style2Pixel for sharing the code.


