
quantart's Introduction

QuantArt

Official PyTorch implementation of the paper:

QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity
Siyu Huang* (Harvard), Jie An* (Rochester), Donglai Wei (BC), Jiebo Luo (Rochester), Hanspeter Pfister (Harvard)
CVPR 2023

We devise a new style transfer framework called QuantArt for high visual-fidelity stylization. The core idea is to push the latent representation of the generated artwork toward the centroids of the real artwork distribution via vector quantization. QuantArt achieves decent performance on various image style transfer tasks.
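
For intuition, the vector quantization step snaps each latent vector of the generated artwork to its nearest codebook centroid. Below is a minimal PyTorch sketch of that operation; it is illustrative only, and the variable names and shapes are assumptions rather than the repository's actual code.

import torch

def quantize(z, codebook):
    # Illustrative nearest-centroid vector quantization (not the repository's code).
    # z:        (N, D) latent vectors of the generated artwork
    # codebook: (K, D) learned centroids of the real-artwork distribution
    d = z.pow(2).sum(1, keepdim=True) - 2 * z @ codebook.t() + codebook.pow(2).sum(1)
    idx = d.argmin(dim=1)          # nearest centroid for each latent vector
    z_q = codebook[idx]            # quantized latents
    z_q = z + (z_q - z).detach()   # straight-through estimator for training
    return z_q, idx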


Dependencies

  • python=3.8.5
  • pytorch=1.7.0
  • pytorch-lightning=1.0.8
  • cuda=10.2

We recommend using conda to create a new environment with all dependencies installed:

conda env create -f environment.yaml
conda activate quantart

Quick Example of Landscape Style Transfer

Download the pre-trained landscape2art model and put it under logs/. Then run

bash test.sh

The stylized landscape images (from imgs/) will be saved in logs/.

Datasets and Pre-trained Models

Stage-1: The datasets and pre-trained models for codebook pretraining are as follows:

Dataset      | Pre-trained Model
MS_COCO      | vqgan_imagenet_f16_1024.ckpt
WikiArt      | vqgan_wikiart_f16_1024.ckpt
LandscapesHQ | vqgan_landscape_f16_1024.ckpt
FFHQ         | vqgan_faceshq_f16_1024.ckpt
Metfaces     | vqgan_metfaces_f16_1024.ckpt
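
To sanity-check a downloaded Stage-1 checkpoint, one way is to inspect its codebook directly. A minimal sketch follows; the state-dict key quantize.embedding.weight follows the VQGAN convention and, like the file path, is an assumption not verified against this repository.

import torch

ckpt = torch.load("vqgan_wikiart_f16_1024.ckpt", map_location="cpu")
state = ckpt.get("state_dict", ckpt)           # Lightning checkpoints nest weights under "state_dict"
codebook = state["quantize.embedding.weight"]  # assumed VQGAN-style key name
print(codebook.shape)                          # e.g. (1024, embed_dim) for an f16_1024 codebook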

Stage-2: The datasets and pre-trained models for style transfer experiments are as follows:

Task                        | Pre-trained Model        | Content      | Style
photo->artwork              | coco2art                 | MS_COCO      | WikiArt
landscape->artwork          | landscape2art            | LandscapesHQ | WikiArt
landscape->artwork (non-VQ) | landscape2art_continuous | LandscapesHQ | WikiArt
face->artwork               | face2art                 | FFHQ         | Metfaces
artwork->artwork            | art2art                  | WikiArt      | WikiArt
photo->photo                | coco2coco                | MS_COCO      | MS_COCO
landscape->landscape        | landscape2landscape      | LandscapesHQ | LandscapesHQ

Testing

Follow Datasets and Pre-trained Models to download additional datasets and pre-trained models. For instance, for the landscape-to-artwork style transfer model, the folder structure should be:

QuantArt
├── configs
├── datasets
│   ├── lhq_1024_jpg
│   │   ├── lhq_1024_jpg
│   │   │   ├── 0000000.jpg
│   │   │   ├── 0000001.jpg
│   │   │   ├── 0000002.jpg
│   │   │   ├── ...
│   ├── painter-by-numbers
│   │   ├── train
│   │   │   ├── 100001.jpg
│   │   │   ├── 100002.jpg
│   │   │   ├── 100003.jpg
│   │   │   ├── ...
│   │   ├── test
│   │   │   ├── 0.jpg
│   │   │   ├── 100000.jpg
│   │   │   ├── 100004.jpg
│   │   │   ├── ...
├── logs
│   ├── landscape2art
│   │   ├── checkpoints
│   │   ├── configs
├── taming
├── environment.yaml
├── main.py
├── train.sh
└── test.sh

Run the following command to test the pre-trained model on the testing dataset:

python -u main.py --base logs/landscape2art/configs/test.yaml -n landscape2art -t False --gpus 0,
  • --base: path to the config file.
  • -n: name of the result folder under logs/.
  • -t: whether to train; set to False for testing.
  • --gpus: GPUs to use.

Training

Stage-1: Prepare the WikiArt dataset as above. Download the file lists painter-by-numbers-train.txt and painter-by-numbers-test.txt and put them under datasets/. Run the following command to train a Stage-1 model (i.e., an autoencoder and a codebook). Four GPUs are recommended but not required.

python -u main.py --base configs/vqgan_wikiart.yaml -t True --gpus 0,1,2,3

Two separate Stage-1 models are required, one for the content dataset and one for the style dataset.

Stage-2: Run bash train.sh or the following command to train a photo-to-artwork model:

python -u main.py --base configs/coco2art.yaml -t True --gpus 0,
  • --base: path to the config file.
  • -n: name of the result folder under logs/.
  • -t: whether to train; set to True for training.
  • --gpus: GPUs to use.
  • --resume_from_checkpoint: resume training from a checkpoint.

More training configs for Stage-2 models can be found in configs/.
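
Note that, as the startup log quoted in the Issues section below shows, the effective learning rate is scaled from the base learning rate in the config. A minimal sketch of that scaling rule, assuming the formula the code prints at startup:

def effective_lr(base_lr, batch_size, num_gpus, accumulate_grad_batches=1):
    # lr = accumulate_grad_batches * num_gpus * batch_size * base_lr,
    # matching the printed line, e.g.
    # "Setting learning rate to 4.50e-06 = 1 * 1 * 1 * 4.50e-06 (base_lr)"
    return accumulate_grad_batches * num_gpus * batch_size * base_lr

print(effective_lr(base_lr=4.5e-6, batch_size=1, num_gpus=1))   # 4.5e-06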

Custom Dataset

Unpaired data: To test on unpaired data, follow the comments in configs/custom_unpaired.yaml to specify the model checkpoints and data paths. Then run

python -u main.py --base configs/custom_unpaired.yaml -n custom_unpaired -t False --gpus 0,

Paired data: To test on paired data, the corresponding content and style images (in two separate folders) must have the same file names. Follow the comments in configs/custom_paired.yaml to specify the model checkpoints and data paths, then run

python -u main.py --base configs/custom_paired.yaml -n custom_paired -t False --gpus 0,
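
Because paired testing relies on matching file names, a quick sanity check of the two folders can catch mismatches before a run. A minimal sketch; the folder paths are placeholders, not paths used by the repository.

import os

content_dir = "datasets/custom/content"   # placeholder path
style_dir = "datasets/custom/style"       # placeholder path

content = set(os.listdir(content_dir))
style = set(os.listdir(style_dir))
unmatched = content ^ style               # files present in only one of the two folders
if unmatched:
    raise SystemExit(f"Unmatched file names: {sorted(unmatched)[:10]}")
print(f"{len(content)} paired content/style images found.")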

Citation

@inproceedings{huang2023quantart,
    title={QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity},
    author={Siyu Huang and Jie An and Donglai Wei and Jiebo Luo and Hanspeter Pfister},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
    month={June},
    year={2023}
}

Acknowledgement

This repository is heavily built upon the amazing VQGAN.

Contact

Siyu Huang ([email protected]).


quantart's Issues

Missing file

Hello, author! This is excellent work. I attempted to reproduce the code, but received an error indicating that the "datasets/painter-by-numbers/train_info.csv" file is missing.

codebook visualization

Hello, based on the information you provided, I successfully replicated the model, and the results are excellent! In your article, you mention the function of the codebook: 'to learn to cluster the artwork distribution in the representation space, where the centroids of all clusters form an artwork codebook Zart.' I'm very interested in how to visually demonstrate this clustering effect on the dataset.

Some questions about the training stages

Hi!

Thanks for the great paper and your implementation.

I want to try to reproduce your code and train a custom model.

First, in training Stage-1, vqgan_wikiart.yaml uses the pretrained model vqgan_imagenet_f16_1024.ckpt and the WikiArt dataset for training the WikiArt-style autoencoder. Does this mean fine-tuning the ImageNet autoencoder on the WikiArt (or any other) dataset?

Moreover, in training Stage-2, I cannot find the style-fidelity trade-off parameters alpha and beta (as described in Figure 3 of the paper) in your code.
I also cannot find the upper part of (c), the inference stage, in Figure 3, so I think it still needs an implementation, or you are only considering visual fidelity.
If I'm wrong, I'd appreciate it if you could tell me where those parameters are, or share your opinion.

Minimal Inference Test

Hi there,

I'm super excited to try this out, but from what I can tell, you need to download one or two large datasets just to test inference. If this is the case, it would be great to have a simple inference test that only requires downloading the minimum number of files.

Thanks!
Jonah

Missing instructions and code (?)

I'm experiencing several issues; I'll list a few here.

  • Creating the conda environment: the pip packages in environment.yaml include "-e .", but setup.py is missing from the root dir.
  • Under Testing, the text says "Follow Datasets and Pre-trained Models to download more datasets and pretrained models. For instance for photo-to-artwork style transfer model, the folder structure should be"; however, I believe it should be landscape-to-artwork by the looks of the folder structure. Then it also becomes clearer that one needs to download lhq_1024_jpg and wikiart for testing.
  • Inside logs/landscape2art/configs/test_paired.yaml for the sample test run, taming.data.paired_image.PairedImageVal is referenced; however, this class does not exist. I tried changing taming.data.paired_image.PairedImageVal to taming.data.paired_image.PairedImageTest, which then runs, but I get corrupted results; see the example below:
    [attached result image: vis_000001]

Can you have a look and see if you can run the examples on a fresh clone on your side?
Thanks for the great work and for making the code publicly available!

Some issues came up while generating the image.

[attached result image: vis_000002]

When I was running the Quick Example of Landscape Style Transfer section, I found that I couldn't generate the correct image. Since it's my first time dealing with this, I couldn't resolve the issue. If you have the time, I would greatly appreciate it if you could patiently explain it to me. Thank you very much.

A question about the training setting

Hello, author! This is excellent work.
Regarding the Training section of your documentation, you mention that 'Four GPUs are recommended but not necessary.' I would like to ask whether a single Nvidia RTX 3090 is sufficient for the complete training process.

Slow inference

On an RTX 4090, inference on 4 images takes more than ten minutes. Is that normal?

(quantart)% bash test.sh
Running on GPUs 0,
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
VQLPIPSWithDiscriminator running with hinge loss.
Restored from logs/landscape2art/checkpoints/last.ckpt
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
accumulate_grad_batches = 1
Setting learning rate to 4.50e-06 = 1 (accumulate_grad_batches) * 1 (num_gpus) * 1 (batchsize) * 4.50e-06 (base_lr)
/home/kas/.conda/envs/quantart/lib/python3.8/site-packages/torch/cuda/__init__.py:104: UserWarning:
NVIDIA GeForce RTX 4090 with CUDA capability sm_89 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA GeForce RTX 4090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))

Testing 0 / 4
Testing 1 / 4
Testing 2 / 4
Testing 3 / 4

(quantart)% tree /home/kas/code/QuantArt/datasets
/home/kas/code/QuantArt/datasets
├── all_data_info.csv.zip
├── lhq_1024_jpg
│   └── lhq_1024_jpg
│       ├── 1.jpg
│       ├── 2.jpg
│       ├── 3.jpg
│       └── 4.jpg
├── painter-by-numbers
│   ├── test
│   │   ├── 1.jpg
│   │   ├── 2.jpg
│   │   ├── 3.jpg
│   │   └── 4.jpg
│   └── train_info.csv
└── train_info.csv

(quantart) % tree logs
logs
└── landscape2art
    ├── checkpoints
    │   └── last.ckpt
    └── configs
        ├── test_paired.yaml
        └── test.yaml

The α parameter seems to be missing from the inference code?

Hi, I'm trying to use bash test.sh to reproduce the results of the paper, but I can't find any declaration of α in this function:
https://github.com/siyuhuang/QuantArt/blob/84d3c83032c03053577159f0af6a137c7a6dae3d/taming/models/vqgan_ref.py#L137

def transfer(self, x, ref, quantize=True):
    with torch.no_grad():
        # Encode the content image and the style reference into quantized latents
        quant_x, _, _ = self.encode(x, quantize=True)
        quant_x = quant_x.detach()
        quant_ref, _, info_ref = self.encode_real(ref, quantize=True)
        indices_ref = info_ref[2]
        quant_ref = quant_ref.detach()

    # Translate content latents conditioned on the style reference
    h_x = self.model_x2y(quant_x, quant_ref)
    if not quantize:
        return quant_x, h_x, quant_ref, torch.zeros(1).to(self.device), [0, 0, 0], indices_ref
    # Quantize the translated latents before decoding
    quant_y, diff_x2y, info_y = self.quantize_dec(h_x)
    indices_y = info_y[2]
    return quant_x, quant_y, quant_ref, diff_x2y, indices_y, indices_ref

Any advice is super appreciated.
