
yeungchenwa / fontdiffuser


[AAAI2024] FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

Home Page: https://yeungchenwa.github.io/fontdiffuser-homepage/

Python 98.93% Shell 1.07%
deep-learning diffusers diffusion font-generation image-generation

fontdiffuser's Introduction

FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

FontDiffuser_LOGO

arXiv preprint · Gradio demo · Homepage · Code

🔥 Model Zoo · 🛠️ Installation · 🏋️ Training · 📺 Sampling · 📱 Run WebUI

🌟 Highlights

Vis_1 Vis_2

  • We propose FontDiffuser, which can generate unseen characters and styles, and which extends to cross-lingual generation, such as Chinese to Korean.
  • FontDiffuser excels at generating complex characters and handling large style variations, and it achieves state-of-the-art performance.
  • The results generated by FontDiffuser can be fed directly into InstructPix2Pix for decoration, as shown in the figure above.
  • We have released the 💻Hugging Face Demo online! Welcome to try it out!

📅 News

  • 2024.01.27: The phase 2 training code is released.
  • 2023.12.20: Our repository is public! 👏🤗
  • 2023.12.19: 🔥🎉 The 💻Hugging Face Demo is public! Welcome to try it out!
  • 2023.12.16: The Gradio app demo is released.
  • 2023.12.10: The source code with phase 1 training and sampling is released.
  • 2023.12.09: 🎉🎉 Our paper is accepted by AAAI 2024.
  • Previously: Our Recommendations-of-Diffusion-for-Text-Image repo is public; it contains a collection of recent papers on diffusion models for text-image generation tasks. Welcome to check it out!

🔥 Model Zoo

Model         Checkpoint                     Status
FontDiffuser  GoogleDrive / BaiduYun:gexg    Released
SCR           GoogleDrive / BaiduYun:gexg    Released

🚧 TODO List

  • Add phase 1 training and sampling scripts.
  • Add WebUI demo.
  • Push demo to Hugging Face.
  • Add phase 2 training script and checkpoint.
  • Add the pre-training of the SCR module.
  • Combine with InstructPix2Pix.

🛠️ Installation

Prerequisites (Recommended)

  • Linux
  • Python 3.9
  • PyTorch 1.13.1
  • CUDA 11.7

Environment Setup

Clone this repo:

git clone https://github.com/yeungchenwa/FontDiffuser.git

Step 0: Download and install Miniconda from the official website.

Step 1: Create a conda environment and activate it.

conda create -n fontdiffuser python=3.9 -y
conda activate fontdiffuser

Step 2: Install the corresponding version of PyTorch following the instructions here.

# Suggested
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117

Step 3: Install the required packages.

pip install -r requirements.txt
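
A quick sanity check that the environment is set up correctly (this simply prints the PyTorch version and whether CUDA is visible):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# Expected with the suggested install: 1.13.1+cu117 True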

🏋️ Training

Data Construction

The training data should be organized as the following file tree (data examples are provided in the directory data_examples/train/):

├──data_examples
│   └── train
│       ├── ContentImage
│       │   ├── char0.png
│       │   ├── char1.png
│       │   ├── char2.png
│       │   └── ...
│       └── TargetImage
│           ├── style0
│           │     ├──style0+char0.png
│           │     ├──style0+char1.png
│           │     └── ...
│           ├── style1
│           │     ├──style1+char0.png
│           │     ├──style1+char1.png
│           │     └── ...
│           ├── style2
│           │     ├──style2+char0.png
│           │     ├──style2+char1.png
│           │     └── ...
│           └── ...
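
A minimal sketch of assembling this layout from your own rendered glyph images (the source paths and the style0/char0 names are placeholders; note the style+char naming of the target files):

mkdir -p data_examples/train/ContentImage
mkdir -p data_examples/train/TargetImage/style0
# content glyphs rendered in the source font
cp renders/content/char0.png data_examples/train/ContentImage/char0.png
# target glyphs rendered in each style, named <style>+<char>.png
cp renders/style0/char0.png data_examples/train/TargetImage/style0/style0+char0.png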

Training Configuration

Before running the training scripts (any of the following three modes), set the training configuration, including distributed training, through:

accelerate config
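
accelerate config walks through an interactive questionnaire. If you just want a default single-machine setup, recent versions of accelerate can also write the configuration non-interactively:

accelerate config default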

Training - Pretraining of SCR

Coming Soon ...

Training - Phase 1

sh train_phase_1.sh

Key arguments (a sketch of the underlying launch command follows this list):
  • data_root: The data root, e.g. ./data_examples.
  • output_dir: The directory where training logs and checkpoints are saved.
  • resolution: The resolution of the UNet in our diffusion model.
  • style_image_size: The resolution of the style image; it can differ from resolution.
  • content_image_size: The resolution of the content image; it should be the same as resolution.
  • channel_attn: Whether to use channel attention in the MCA block.
  • train_batch_size: The batch size used during training.
  • max_train_steps: The maximum number of training steps.
  • learning_rate: The learning rate used during training.
  • ckpt_interval: The interval (in steps) between checkpoint saves.
  • drop_prob: The drop probability used for classifier-free guidance training.
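
For reference, train_phase_1.sh essentially wraps an accelerate launch of train.py with the arguments above. A sketch assembled from those flag names (the values shown are illustrative, not recommendations):

accelerate launch train.py \
  --experience_name=FontDiffuser_training_phase_1 \
  --data_root=data_examples \
  --output_dir=outputs/FontDiffuser \
  --resolution=96 \
  --style_image_size=96 \
  --content_image_size=96 \
  --channel_attn=True \
  --train_batch_size=16 \
  --max_train_steps=440000 \
  --learning_rate=1e-4 \
  --ckpt_interval=40000 \
  --drop_prob=0.1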

Training - Phase 2

After the phase 1 training, put the trained checkpoint files (unet.pth, content_encoder.pth, and style_encoder.pth) into the directory phase_1_ckpt. These parameters will be resumed during phase 2.

sh train_phase_2.sh

Key arguments (an illustrative launch follows this list):
  • phase_2: Flag indicating phase 2 training.
  • phase_1_ckpt_dir: The directory holding the model checkpoints saved by phase 1 training.
  • scr_ckpt_path: The checkpoint path of the pre-trained SCR module. You can download it from the 🔥Model Zoo above.
  • sc_coefficient: The coefficient of the style contrastive loss used for supervision.
  • num_neg: The number of negative samples; defaults to 16.
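
The expected checkpoint layout before launching phase 2, followed by an illustrative set of the phase-2-specific flags (the flag names are from the list above; the exact syntax may differ from the shipped script, and the sc_coefficient value is a placeholder):

# checkpoints copied from phase 1 training
# phase_1_ckpt/
#   ├── unet.pth
#   ├── content_encoder.pth
#   └── style_encoder.pth

accelerate launch train.py \
  --phase_2 \
  --phase_1_ckpt_dir=phase_1_ckpt \
  --scr_ckpt_path=scr_210000.pth \
  --sc_coefficient=0.01 \
  --num_neg=16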

📺 Sampling

Step 1 => Prepare the checkpoint

Option (1): Download the checkpoint from GoogleDrive / BaiduYun:gexg, then put the ckpt folder in the root directory; it should include the files unet.pth, content_encoder.pth, and style_encoder.pth.
Option (2): Put your own re-trained checkpoint folder ckpt in the root directory, including the same files unet.pth, content_encoder.pth, and style_encoder.pth.

Step 2 => Run the script

(1) Sampling an image from a content image and a reference image.

sh script/sample_content_image.sh

Key arguments (an illustrative invocation follows this list):
  • ckpt_dir: The directory containing the model checkpoints.
  • content_image_path: The content/source image path.
  • style_image_path: The style/reference image path.
  • save_image: Set True to save the outputs as images.
  • save_image_dir: The image saving directory; the saved files include an out_single.png and an out_with_cs.png.
  • device: The sampling device; GPU acceleration is recommended.
  • guidance_scale: The classifier-free guidance scale used at sampling time.
  • num_inference_steps: The number of inference steps for DPM-Solver++.
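
An illustrative invocation built from the flags above (the script name sample.py, the flag syntax, and the example paths are assumptions; script/sample_content_image.sh is the authoritative entry point):

python sample.py \
  --ckpt_dir=ckpt \
  --content_image_path=content.png \
  --style_image_path=style.png \
  --save_image \
  --save_image_dir=outputs \
  --device=cuda:0 \
  --guidance_scale=7.5 \
  --num_inference_steps=20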

(2) Sampling an image from a content character.
Note: you may need a TTF file that covers a large set of Chinese characters; you can download one from BaiduYun:wrth.

sh script/sample_content_character.sh

Key arguments (an illustrative invocation follows this list):
  • character_input: If set to True, a character string is used as the content/source input.
  • content_character: The content/source character string.
  • The other parameters are the same as in option (1) above.
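
The same sketch switched to character input (again, the script name and flag syntax are assumptions; the character is just an example, and the ttf_path flag is hypothetical, named after the TTF requirement noted above):

python sample.py \
  --ckpt_dir=ckpt \
  --character_input \
  --content_character="龍" \
  --ttf_path=fonts/chinese.ttf \
  --style_image_path=style.png \
  --save_image \
  --save_image_dir=outputs \
  --device=cuda:0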

📱 Run WebUI

(1) Sampling by FontDiffuser

gradio gradio_app.py
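
The gradio CLI runs the app with hot reload. If it is unavailable in your environment, launching the app as a plain Python script should also work (an assumption; the app may expect the checkpoint files in the default locations described in the Sampling section):

python gradio_app.py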


(2) Sampling by FontDiffuser and Rendering by InstructPix2Pix

Coming Soon ...

🌄 Gallery

Characters at the hard level of complexity

vis_hard

Characters at the medium level of complexity

vis_medium

Characters at the easy level of complexity

vis_easy

Cross-Lingual Generation (Chinese to Korean)

vis_korean

💙 Acknowledgement

Copyright

Citation

@inproceedings{yang2024fontdiffuser,
  title={FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning},
  author={Yang, Zhenhua and Peng, Dezhi and Kong, Yuxin and Zhang, Yuyi and Yao, Cong and Jin, Lianwen},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2024}
}

⭐ Star Rising

Star Rising

fontdiffuser's People

Contributors

yeungchenwa


fontdiffuser's Issues

In src/modules/style_encoder.py, the function style_encoder_textedit_addskip_arch raises an error, so phase 1 training cannot start

When running D:\FontDiffuser-main\train_phase_1.sh with Git Bash 2.46.0.windows.1, the following error is reported:

l257737602 MINGW64 /d/FontDiffuser-main
$ sh train_phase_1.sh
D:\Users\limin\AppData\Local\Programs\Python\Python312\Lib\site-packages\kornia\feature\lightglue.py:44: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
@torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
D:\Users\limin\AppData\Local\Programs\Python\Python312\Lib\site-packages\accelerate\accelerator.py:406: UserWarning: log_with=tensorboard was passed but no supported trackers are currently installed.
warnings.warn(f"log_with={log_with} was passed but no supported trackers are currently installed.")
pygame 2.6.0 (SDL 2.28.4, Python 3.12.1)
Hello from the pygame community. https://www.pygame.org/contribute.html
Load the down block DownBlock2D
Load the down block MCADownBlock2D
The style_attention cross attention dim in Down Block 1 layer is 1024
The style_attention cross attention dim in Down Block 2 layer is 1024
Load the down block MCADownBlock2D
The style_attention cross attention dim in Down Block 1 layer is 1024
The style_attention cross attention dim in Down Block 2 layer is 1024
Load the down block DownBlock2D
Load the up block UpBlock2D
Load the up block StyleRSIUpBlock2D
Load the up block StyleRSIUpBlock2D
Load the up block UpBlock2D
Traceback (most recent call last):
File "D:\FontDiffuser-main\train.py", line 272, in <module>
main()
File "D:\FontDiffuser-main\train.py", line 74, in main
style_encoder = build_style_encoder(args=args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\FontDiffuser-main\src\build.py", line 41, in build_style_encoder
style_image_encoder = StyleEncoder(
^^^^^^^^^^^^^
File "D:\Users\limin\AppData\Local\Programs\Python\Python312\Lib\site-packages\diffusers\configuration_utils.py", line 653, in inner_init
init(self, *args, **init_kwargs)
File "D:\FontDiffuser-main\src\modules\style_encoder.py", line 362, in __init__
self.arch = style_encoder_textedit_addskip_arch(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 64
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "D:\Users\limin\AppData\Local\Programs\Python\Python312\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "D:\Users\limin\AppData\Local\Programs\Python\Python312\Lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
args.func(args)
File "D:\Users\limin\AppData\Local\Programs\Python\Python312\Lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
simple_launcher(args)
File "D:\Users\limin\AppData\Local\Programs\Python\Python312\Lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\Users\limin\AppData\Local\Programs\Python\Python312\python.exe', 'train.py', '--seed=123', '--experience_name=FontDiffuser_training_phase_1', '--data_root=data_examples', '--output_dir=outputs/FontDiffuser', '--report_to=tensorboard', '--resolution=64', '--style_image_size=64', '--content_image_size=64', '--content_encoder_downsample_size=3', '--channel_attn=True', '--content_start_channel=64', '--style_start_channel=64', '--train_batch_size=16', '--perceptual_coefficient=0.01', '--offset_coefficient=0.5', '--max_train_steps=440000', '--ckpt_interval=40000', '--gradient_accumulation_steps=1', '--log_interval=50', '--learning_rate=1e-4', '--lr_scheduler=linear', '--lr_warmup_steps=10000', '--drop_prob=0.1', '--mixed_precision=no']' returned non-zero exit status 1.

l257737602 MINGW64 /d/FontDiffuser-main

The FontDiffuser-main folder at the time:
Link: https://pan.baidu.com/s/1z2A02skrnFJL-59ZyUKWwA?pwd=py98
Extraction code: py98
Sources of the training material:
寒蝉半圆体 (ChillRound): https://github.com/Warren2060/ChillRound
天珩全字库: http://cheonhyeong.com/Simplified/download.html

About InstructPix2Pix

Hello @yeungchenwa ,

Thank you for your excellent work on the FontDiffuser project! I am very curious about how you combined FontDiffuser and InstructPix2Pix. I have tried different text prompts and parameters but haven’t been able to achieve the desired results. Could you please provide an example or some guidance on how to generate these pictures using InstructPix2Pix?

with_instructpix2pix

Training on English letters

Hi @yeungchenwa, thanks for your excellent work, first of all!
I want to ask some questions about training on English letters.

If I add an English dataset, I need to run both the phase 1 and phase 2 training, right?
How many fonts do you think would be sufficient?
Do I need to train from scratch, or would fine-tuning your model be enough?

Hello, I am very interested in your research after reading your paper. But I have a confusion about the second stage of training.

May I ask how to load the model from the first stage during the second stage of training? Is it the total_model.pth from the first stage? It does not seem to load. The downloaded scr_210000.pth does load, but what is its relationship to the first stage of my own training? Please help answer, thank you very much!!

Where can I find the appendix?

Hello, thank you for the excellent paper. The paper mentions that the classification details for the three complexity levels are given in the appendix, but there is no appendix at the end of the paper. Could you share a link to the appendix? Thank you very much!

InfoNCE

Hello, I would like to ask what the concrete code of the phase 2 contrastive loss (InfoNCE) looks like. Thanks!

About how to generate an SCR_210000.pth file

Hello, how was the scr_210000.pth used during the second training phase saved? How can I extract the corresponding SCR weights from the total_model.pth file generated by my own training? Looking forward to your answer, thank you very much!!!!

Training duration

Hello! Thank you very much for your excellent work!
I would like to know how long the training took when you trained on the full dataset with a 3090.

Request for Korean font pre-trained checkpoints

I am very interested in your repository. The checkpoint file you have provided seems to be trained primarily on Chinese characters, as certain shapes such as the Korean character 'ㅇ' do not render properly.

I was wondering if you happened to have a checkpoint file that has been trained on Korean fonts? Having a model pre-trained on Korean data would be incredibly valuable and helpful for my current project. If such a file is available, I would be most grateful if you could share it with me. If not, I completely understand, and I appreciate you taking the time to consider my request.

Thank you in advance for your help. I look forward to hearing from you.

Question about the resolution.

May I ask whether, in the comparative experiments, all the baselines were trained on the same training set as your method? The paper mentions an image size of 96. Regarding DG-Font, whose resolution is limited to 80, how did you address this? Did you directly resize from 80 to 96 for comparison, or did you use another method? Thank you.

BTW, setting GPU memory aside, can the resolution be set arbitrarily for your method?

Dataset-related questions

Hello, this project is really impressive, and I would like to train it myself. Was there any special consideration in selecting the 424 fonts in the dataset, or were they simply downloaded at random from public font libraries? If they were specially selected, what was the selection strategy?

A few questions about training

Hello authors, thank you very much for your work!
I have two questions about training.
1. Suppose I have handwriting in two main styles, cursive and regular script. Is it better to mix them together for training, or to separate them by style and generate each separately?
2. Is there a recommended range for the loss value during training?

Performance on brush calligraphy

Thanks for your work! I want to train the model on a calligraphy dataset, so I would like to ask how the model performs on brush-written characters.

SCR pre-training script release timeline?

First, thanks for open-sourcing your amazing work. Two weeks ago, I commented under an existing issue requesting the script, but you might have missed my comment. It has been a few months since you announced that the SCR pre-training script would be released, so I just want to check in for any potential updates.

Does it support few-shot?

Does the model's inference support few-shot input, or is there a way for the model to take features from multiple reference characters and then run inference? Nice work, by the way.

Questions about the dataset

Hello, your work is great, but as a beginner I do not know this field very deeply, especially which open-source datasets are available. How, and from what sources, did you construct your dataset? Could you provide a link to an open-source dataset for reference? Thank you very much!

Training error

During training, data parallelism is enabled by default, so the model is saved as a DistributedDataParallel; the model then has no unet attribute, which causes an error. How can I extract the needed model parameters from the parallel model?
