compvis / net2net

Network-to-Network Translation with Conditional Invertible Neural Networks

Home Page: https://compvis.github.io/net2net/

Python 100.00%

inn pytorch lightning autoencoders gans normalizing-flows generative-model streamlit pytorch-lightning

net2net's Introduction

Net2Net

Code accompanying the NeurIPS 2020 oral paper

Network-to-Network Translation with Conditional Invertible Neural Networks
Robin Rombach*, Patrick Esser*, Björn Ommer
* equal contribution

tl;dr Our approach distills the residual information of one model with respect to another's and thereby enables translation between fixed off-the-shelf expert models such as BERT and BigGAN without having to modify or finetune them.

arXiv | BibTeX | Project Page

News Dec 19th, 2020: added SBERT-to-BigGAN, SBERT-to-BigBiGAN and SBERT-to-AE (COCO)

Requirements

A suitable conda environment named net2net can be created and activated with:

conda env create -f environment.yaml
conda activate net2net

Datasets

CelebA: Create a symlink 'data/CelebA' pointing to a folder which contains the following files:
```
.
├── identity_CelebA.txt
├── img_align_celeba
├── list_attr_celeba.txt
└── list_eval_partition.txt
```
These files can be obtained here.
CelebA-HQ: Create a symlink data/celebahq pointing to a folder containing the .npy files of CelebA-HQ (instructions to obtain them can be found in the PGGAN repository).
FFHQ: Create a symlink data/ffhq pointing to the images1024x1024 folder obtained from the FFHQ repository.
Anime Faces: First download the face images from the Anime Crop dataset and then apply the preprocessing of FFHQ to those images. We only keep images where the underlying dlib face recognition model recognizes a face. Finally, create a symlink data/anime which contains the processed anime face images.
Oil Portraits: Download here. Unpack the content and place the files in data/portraits. It consists of 18k oil portraits, which were obtained by running dlib on a subset of the WikiArt dataset, kindly provided by A Style-Aware Content Loss for Real-time HD Style Transfer.
COCO: Create a symlink data/coco containing the images from the 2017 split in train2017 and val2017, and their annotations in annotations. Files can be obtained from the COCO webpage.
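
The symlinks described above can also be created from Python. The following is a minimal sketch: the helper name `link_dataset` and any source path you pass to it are hypothetical, and only the `data/<name>` link locations come from the list above.

```python
import os
from pathlib import Path


def link_dataset(source_dir, link_path):
    """Create a symlink at link_path that points to source_dir (hypothetical helper)."""
    source = Path(source_dir).expanduser().resolve()
    link = Path(link_path)
    if not source.is_dir():
        raise FileNotFoundError(f"dataset folder not found: {source}")
    # Make sure the parent folder (e.g. data/) exists before linking.
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        link.unlink()  # replace a stale link from an earlier run
    os.symlink(source, link, target_is_directory=True)
    return link
```

For example, `link_dataset("/path/to/images1024x1024", "data/ffhq")` would create the FFHQ link, where the source path is a placeholder for wherever the images were downloaded.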

ML4Creativity Demo

We include a streamlit demo, which utilizes our approach to demonstrate biases of datasets and their creative applications. More information can be found in our paper A Note on Data Biases in Generative Models from the Machine Learning for Creativity and Design workshop at NeurIPS 2020. Download the models from

and place them into logs. Run the demo with

streamlit run ml4cad.py

Training

Our code uses Pytorch-Lightning and thus natively supports things like 16-bit precision, multi-GPU training and gradient accumulation. Training details for any model need to be specified in a dedicated .yaml file. In general, such a config file is structured as follows:

model:
  base_learning_rate: 4.5e-6
  target: <path/to/lightning/module>
  params:
    ...
data:
  target: translation.DataModuleFromConfig
  params:
    batch_size: ...
    num_workers: ...
    train:
      target: <path/to/train/dataset>
      params:
        ...
    validation:
      target: <path/to/validation/dataset>
      params:
        ...
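
To make the target/params convention concrete, here is a minimal sketch of how such an entry could be turned into a Python object. The function name `build_from_target` is hypothetical and this is not necessarily how translation.py does it. A standard-library class stands in for a Lightning module so the example runs without any dependencies.

```python
import importlib


def build_from_target(config):
    """Import the dotted path in config["target"] and call it with config["params"]."""
    module_name, attr_name = config["target"].rsplit(".", 1)
    cls = getattr(importlib.import_module(module_name), attr_name)
    # An empty "params:" entry in YAML loads as None, so fall back to no arguments.
    params = config.get("params") or {}
    return cls(**params)


# Stand-in target from the standard library (an assumption for illustration only).
example = {"target": "fractions.Fraction", "params": {"numerator": 3, "denominator": 4}}
obj = build_from_target(example)
print(obj)  # 3/4
```

The same pattern applies recursively: a module receiving a nested entry such as a dataset config can resolve it the same way.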

Any Pytorch-Lightning model specified under model.target is then trained on the specified data by running the command:

python translation.py --base <path/to/yaml> -t --gpus 0,

All available Pytorch-Lightning trainer arguments can be added via the command line, e.g. run

python translation.py --base <path/to/yaml> -t --gpus 0,1,2,3 --precision 16 --accumulate_grad_batches 2

to train a model on 4 GPUs using 16-bit precision and a 2-step gradient accumulation. More details are provided in the examples below.
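
When combining multiple GPUs with gradient accumulation, it helps to keep track of how many samples contribute to each optimizer step. The sketch below assumes a data-parallel setup in which every GPU processes its own batch of `batch_size` samples; that is an assumption about the distributed mode, and both helper names are hypothetical. The batch size of 16 is only an example value.

```python
def count_gpus(gpus_arg):
    """Count the entries of a comma-separated GPU list such as "0," or "0,1,2,3"."""
    return len([entry for entry in gpus_arg.split(",") if entry.strip()])


def effective_batch_size(batch_size, num_gpus=1, accumulate_grad_batches=1):
    """Samples per optimizer step, assuming one batch per GPU per forward pass."""
    return batch_size * num_gpus * accumulate_grad_batches


# Example: per-GPU batch size 16 with the command-line settings shown above.
print(effective_batch_size(16, count_gpus("0,1,2,3"), 2))  # 128
```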

Training a cINN

Training a cINN for network-to-network translation usually utilizes the Lightning module net2net.models.flows.flow.Net2NetFlow and makes a few further assumptions on the configuration file and model interface:

model:
  base_learning_rate: 4.5e-6
  target: net2net.models.flows.flow.Net2NetFlow
  params:
    flow_config:
      target: <path/to/cinn>
      params:
        ...

    cond_stage_config:
      target: <path/to/network1>
      params:
        ...

    first_stage_config:
      target: <path/to/network2>
      params:
        ...

Here, the entries under flow_config specify the architecture and parameters of the conditional INN; cond_stage_config specifies the first network, whose representation is to be translated into that of another network specified by first_stage_config. Our model net2net.models.flows.flow.Net2NetFlow expects that the first network has an encode() method which produces the representation of interest, while the second network should have an encode() and a decode() method, such that both of them applied sequentially produce the network's output. This allows for a modular combination of arbitrary models of interest. For more details, see the examples below.
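
The interface described above can be illustrated with toy stand-ins. This is a sketch only: the class names, the flow's `forward`/`reverse` method names, and the list-based "tensors" are assumptions for illustration, and the real models operate on PyTorch tensors. Only the `encode()`/`decode()` contract is taken from the text.

```python
class ToyCondStage:
    """Stand-in for the first network: only encode() is required."""

    def encode(self, x):
        return [sum(x) / len(x)]  # a one-dimensional "representation"


class ToyFirstStage:
    """Stand-in for the second network: decode(encode(x)) yields its output."""

    def encode(self, x):
        return [2.0 * v for v in x]

    def decode(self, z):
        return [v / 2.0 for v in z]


class ToyConditionalFlow:
    """Conditional shift: invertible in z for any fixed conditioning c."""

    def forward(self, z, c):
        return [v - c[0] for v in z]  # z -> residual, given c

    def reverse(self, residual, c):
        return [v + c[0] for v in residual]  # residual -> z, given c


def translate(cond_stage, first_stage, flow, x_cond, residual):
    """Map a conditioning input plus a residual to the second network's output."""
    c = cond_stage.encode(x_cond)
    z = flow.reverse(residual, c)
    return first_stage.decode(z)


cond, first, flow = ToyCondStage(), ToyFirstStage(), ToyConditionalFlow()
x = [1.0, 2.0, 3.0]
residual = flow.forward(first.encode(x), cond.encode(x))
print(translate(cond, first, flow, x, residual))  # [1.0, 2.0, 3.0]
```

Because neither expert is modified, any pair of models exposing these methods can be swapped in; only the flow between them is trained.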

Training a cINN - Superresolution

Training details for a cINN to concatenate two autoencoders from different image scales for stochastic superresolution are specified in configs/translation/faces32-to-faces256.yaml.

To train a model for translating from 32 x 32 images to 256 x 256 images on GPU 0, run

python translation.py --base configs/translation/faces32-to-faces256.yaml -t --gpus 0,

and specify any additional training commands as described above. Note that this setup requires two pretrained autoencoder models, one on 32 x 32 images and the other on 256 x 256. If you want to train them yourself on a combination of FFHQ and CelebA-HQ, run

python translation.py --base configs/autoencoder/faces32.yaml -t --gpus <n>,

for the 32 x 32 images; and

python translation.py --base configs/autoencoder/faces256.yaml -t --gpus <n>,

for the model on 256 x 256 images. After training, adapt the corresponding model paths in configs/translation/faces32-to-faces256.yaml. Additionally, we provide weights of pretrained autoencoders for both settings: Weights 32x32; Weights 256x256. To run the training as described above, put them into logs/2020-10-16T17-11-42_FacesFQ32x32/checkpoints/last.ckpt and logs/2020-09-16T16-23-39_FacesXL256z128/checkpoints/last.ckpt, respectively.
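
Since the expected checkpoint locations are nested several folders deep, a small helper can create the folders and copy a downloaded file into place. This is a minimal sketch; the helper name `place_checkpoint` and the name of the downloaded file are hypothetical, while the target paths are the ones listed above.

```python
import shutil
from pathlib import Path


def place_checkpoint(downloaded_file, target_path):
    """Copy a downloaded checkpoint to the path expected by the config (hypothetical helper)."""
    src = Path(downloaded_file).expanduser()
    dst = Path(target_path)
    if not src.is_file():
        raise FileNotFoundError(f"checkpoint not found: {src}")
    # Create logs/<run-name>/checkpoints/ if it does not exist yet.
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return dst
```

For the 32 x 32 autoencoder this could be called as `place_checkpoint("<downloaded-file>", "logs/2020-10-16T17-11-42_FacesFQ32x32/checkpoints/last.ckpt")`.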

Training a cINN - Unpaired Translation

All training scenarios for unpaired translation are specified in the configs in configs/creativity. We provide code and pretrained autoencoder models for three different translation tasks:

Anime ⟷ Photography; see configs/creativity/anime_photography_256.yaml. Download autoencoder checkpoint (Download Anime+Photography) and place into logs/2020-09-30T21-40-22_AnimeAndFHQ/checkpoints/epoch=000007.ckpt.
Oil-Portrait ⟷ Photography; see configs/creativity/portraits_photography_256.yaml. Download autoencoder checkpoint (Download Portrait+Photography) and place into logs/2020-09-29T23-47-10_PortraitsAndFFHQ/checkpoints/epoch=000004.ckpt.
FFHQ ⟷ CelebA-HQ ⟷ CelebA; see configs/creativity/celeba_celebahq_ffhq_256.yaml. Download autoencoder checkpoint (Download FFHQ+CelebAHQ+CelebA) and place into logs/2020-09-16T16-23-39_FacesXL256z128/checkpoints/last.ckpt. Note that this is the same autoencoder checkpoint as for the stochastic superresolution experiment.

To train a cINN on one of these unpaired transfer tasks using the first GPU, simply run

python translation.py --base configs/creativity/<task-of-interest>.yaml -t --gpus 0,

where <task-of-interest>.yaml is one of portraits_photography_256.yaml, celeba_celebahq_ffhq_256.yaml or anime_photography_256.yaml. Providing additional arguments to the pytorch-lightning trainer object is also possible as described above.

In our framework, unpaired translation between domains is formulated as a translation between expert 1, a model which can infer the domain a given image belongs to, and expert 2, a model which can synthesize images of each domain. In the examples provided, we assume that the domain label comes with the dataset and provide the net2net.modules.labels.model.Labelator module, which simply returns a one hot encoding of this label. However, one could also use a classification model which infers the domain label from the image itself. For expert 2, our examples use an autoencoder trained jointly on all domains, which is easily achieved by concatenating datasets together. The provided net2net.data.base.ConcatDatasetWithIndex concatenates datasets and returns the corresponding dataset label for each example, which can then be used by the Labelator class for the translation. The training configurations for the autoencoders used in the creativity experiments are included in configs/autoencoder/anime_photography_256.yaml, configs/autoencoder/celeba_celebahq_ffhq_256.yaml and configs/autoencoder/portraits_photography_256.yaml.
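
The idea of concatenating datasets while keeping track of the domain, and turning that domain label into a one-hot vector, can be sketched in plain Python. This is not the implementation of net2net.data.base.ConcatDatasetWithIndex or the Labelator; the class and function names below are hypothetical and plain lists stand in for image datasets.

```python
import bisect


class ConcatWithIndex:
    """Concatenate datasets; each item comes with the index of its source dataset."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        self.offsets = []  # cumulative sizes, e.g. [2, 5] for sizes 2 and 3
        total = 0
        for dataset in self.datasets:
            total += len(dataset)
            self.offsets.append(total)

    def __len__(self):
        return self.offsets[-1] if self.offsets else 0

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Find the first dataset whose cumulative size exceeds idx.
        k = bisect.bisect_right(self.offsets, idx)
        start = self.offsets[k - 1] if k > 0 else 0
        return self.datasets[k][idx - start], k


def one_hot(label, num_classes):
    """One-hot encoding of a domain label, as a list of floats."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


anime = ["anime_0", "anime_1"]
photos = ["photo_0", "photo_1", "photo_2"]
combined = ConcatWithIndex([anime, photos])
example, domain = combined[2]
print(example, one_hot(domain, num_classes=2))  # photo_0 [0.0, 1.0]
```

The autoencoder sees only the examples, while the domain index is what the label expert encodes for the cINN.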

Unpaired Translation on Custom Datasets

Create pytorch datasets for each of your domains, create a concatenated dataset with ConcatDatasetWithIndex (follow the example in net2net.data.faces.CCFQTrain), train an autoencoder on the concatenated dataset (adjust the data section in configs/autoencoder/celeba_celebahq_ffhq_256.yaml) and finally train a net2net translation model between a Labelator and your autoencoder (adjust the sections data and first_stage_config in configs/creativity/celeba_celebahq_ffhq_256.yaml). You can then also add your new model to the available modes in the ml4cad.py demo to visualize the results.

Training a cINN - Text-to-Image

We provide code to obtain a text-to-image model by translating between a text model (SBERT) and an image decoder. To show the flexibility of our approach, we include code for three different decoders: BigGAN, as described in the paper, BigBiGAN, which is only available as a tensorflow model and thus nicely shows how our approach can work with black-box experts, and an autoencoder.

SBERT-to-BigGAN

Train with

python translation.py --base configs/translation/sbert-to-biggan256.yaml -t --gpus 0,

When running it for the first time, the required models will be downloaded automatically.

SBERT-to-BigBiGAN

Since BigBiGAN is only available on tensorflow-hub, this example has an additional dependency on tensorflow. A suitable environment is provided in env_bigbigan.yaml, and you will need COCO for training. You can then start training with

python translation.py --base configs/translation/sbert-to-bigbigan.yaml -t --gpus 0,

Note that the BigBiGAN class is just a naive wrapper, which converts pytorch tensors to numpy arrays, feeds them to the tensorflow graph and converts the result back to pytorch tensors. It does not require gradients of the expert model and serves as a good example of how to use black-box experts.

SBERT-to-AE

Similarly to the other examples, you can also train your own autoencoder on COCO with

python translation.py --base configs/autoencoder/coco256.yaml -t --gpus 0,

or download a pre-trained one, and translate to it by running

python translation.py --base configs/translation/sbert-to-ae-coco256.yaml -t --gpus 0,

Shout-outs

Thanks to everyone who makes their code and models available.

BigGAN code and weights from: LoreGoetschalckx/GANalyze
Code and weights for the captioning model: https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning

BibTeX

@misc{rombach2020networktonetwork,
      title={Network-to-Network Translation with Conditional Invertible Neural Networks},
      author={Robin Rombach and Patrick Esser and Björn Ommer},
      year={2020},
      eprint={2005.13580},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{esser2020note,
      title={A Note on Data Biases in Generative Models}, 
      author={Patrick Esser and Robin Rombach and Björn Ommer},
      year={2020},
      eprint={2012.02516},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

net2net's Issues

Execution error

Thank you for your surprising work.

During the SBERT-to-BigGAN, SBERT-to-BigBiGAN and SBERT-to-AE (COCO) execution, I received the following error:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "translation.py", line 531, in <module>
    melk()
NameError: name 'melk' is not defined

I'd appreciate it if you could check.

Seeking Advice on Designing an Invertible Neural Network for Fission

First and foremost, I would like to express my sincere gratitude and respect for your work on this repository. The progress and innovations shared here have been immensely insightful and valuable to the community.

I am currently exploring the concept of fission in invertible neural networks, where a single latent representation 'x' can be decomposed into two distinct components 'y' and 'z'. My objective is to parameterize 'z' with a tractable distribution while ensuring that the combination of 'y' and 'z' can be accurately recombined to reconstruct 'x' using the reverse of the model.

Given your expertise in this field, I would greatly appreciate any guidance or suggestions you could provide on the following aspects:

Design Strategies: What are the best practices or strategies in designing such an invertible network that can effectively decompose and recombine representations?
Parameterization of 'z': How can 'z' be parameterized with a tractable distribution, and what are the implications of different distribution choices?
Ensuring Reversibility: What are the key considerations to ensure that the network remains reversible and accurate in the reconstruction phase?

Any insights, references, or examples you could share would be extremely helpful.

Thank you for your time and for the impactful contributions you've made to the field.

Best regards

GPU memory

Hi,

Thank you for your amazing work!
I am trying to replicate your results and training using
python translation.py --base configs/translation/sbert-to-biggan256.yaml -t --gpus 0,
I was wondering which GPU was used to train your model and what batch size you used. I am only able to fit batch_size=2 on a TITAN XP; the default batch_size in the config is 16, but I am not able to launch it even on 4 TITAN XPs without running into memory issues. Is BigGAN or the Sentence Transformer fine-tuned during training (from your paper it seems they are not)? Do you have any insight into what I am missing?

Thank you in advance

How to prepare the data (e.g. CelebA HQ)

Hi,

Thanks for the interesting work. I am trying to reproduce the results by running faces32-to-faces256.yaml. However, I am confused about how to prepare the corresponding dataset.

I have downloaded celebahq dataset from https://drive.google.com/drive/folders/11Vz0fqHS2rXDb5pprgTjpD7S2BAJhi1P, and put them into the data/celebahq folder. And there are 4 subfolders corresponding to different resolutions: 128 x 128, 256 x 256, 512 x 512 and 1024 x 1024, the images are in .jpg format.

My questions are:

Which resolution should we use?
Currently the images are in .jpg format. I am trying to convert them to .npy. However, it looks like I need to convert them following the file names provided in celebahqtrain.txt. Are there any guidelines for that?
Thanks for your time!

README has an error

In the README section on training the unpaired translation tasks, the command is given as:
python translation.py --base configs/translation/<task-of-interest>.yaml -t --gpus 0,

But the translation folder only contains faces32-to-faces256.yaml and nothing else, so I think it should be:

python translation.py --base configs/creativity/<task-of-interest>.yaml -t --gpus 0,

How to apply to new datasets

Hi, if I have a new dataset with source domain x and target domain y, how do I train a model like creativity/portrait-to-photo?

As your paper says, two autoencoders should be trained (ResNet-101 as encoder, BigGAN as decoder):

Train an autoencoder on the source domain x data to obtain encoder-x and decoder-x.
Train an autoencoder on the target domain y data to obtain encoder-y and decoder-y.
Use the pretrained models (encoder-x and decoder-y) to train a cINN, which then learns the translation from z_encx to z_ency?

Is that right?

Could you also provide a tutorial on how to apply this to new datasets? Thank you.

Runtime Error

Hi, thanks for your interesting work. When I run the anime to photography task: python translation.py --base configs/creativity/anime_photography_256.yaml -t --gpus 0, I receive the following error:

Traceback (most recent call last):
  File "translation.py", line 522, in <module>
    trainer.fit(model, data)
  File "/home/projects/miniconda3/envs/net2net/lib/python3.7/site-packages/pytorch_lightning/trainer/states.py", line 48, in wrapped_fn
    result = fn(self, *args, **kwargs)
  File "/home/projects/miniconda3/envs/net2net/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1058, in fit
    results = self.accelerator_backend.spawn_ddp_children(model)
  File "/home/projects/miniconda3/envs/net2net/lib/python3.7/site-packages/pytorch_lightning/accelerators/ddp_backend.py", line 123, in spawn_ddp_children
    results = self.ddp_train(local_rank, mp_queue=None, model=model, is_master=True)
  File "/home/projects/miniconda3/envs/net2net/lib/python3.7/site-packages/pytorch_lightning/accelerators/ddp_backend.py", line 224, in ddp_train
    results = self.trainer.run_pretrain_routine(model)
  File "/home/projects/miniconda3/envs/net2net/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1224, in run_pretrain_routine
    self._run_sanity_check(ref_model, model)
  File "/home/projects/miniconda3/envs/net2net/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1257, in _run_sanity_check
    eval_results = self._evaluate(model, self.val_dataloaders, max_batches, False)
  File "/home/projects/miniconda3/envs/net2net/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 369, in _evaluate
    self.on_validation_batch_end(batch, batch_idx, dataloader_idx)
  File "/home/projects/miniconda3/envs/net2net/lib/python3.7/site-packages/pytorch_lightning/trainer/callback_hook.py", line 156, in on_validation_batch_end
    callback.on_validation_batch_end(self, self.get_model(), batch, batch_idx, dataloader_idx)
  File "/home/projects/net2net/translation.py", line 297, in on_validation_batch_end
    self.log_img(pl_module, batch, batch_idx, split="val")
  File "/home/projects/net2net/translation.py", line 265, in log_img
    images = pl_module.log_images(batch, split=split)
  File "/home/projects/miniconda3/envs/net2net/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/projects/net2net/net2net/models/flows/flow.py", line 157, in log_images
    log["conditioning"] = log_txt_as_img((w,h), xc)
  File "/home/projects/net2net/net2net/modules/util.py", line 18, in log_txt_as_img
    lines = "\n".join(xc[bi][start:start+nc] for start in range(0, len(xc[bi]), nc))
  File "/home/projects/miniconda3/envs/net2net/lib/python3.7/site-packages/torch/_tensor.py", line 589, in __len__
    raise TypeError("len() of a 0-d tensor")
TypeError: len() of a 0-d tensor

I don't know what causes this error. I would greatly appreciate it if you could help me find out the problem. Thanks for your time.

Which dataset do I need for Oil-Portrait ⟷ Photography?

Thank you for sharing your code and pretrained model.

If I want to retrain the unpaired translation task Oil-Portrait ⟷ Photography, which dataset do I need?

For example, how many oil portrait images and how many photographs of real people?

I want to train such a model myself. Thank you.
