Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis

🎉 News
📑 Open-source Plan
📖 Introduction
📊 Evaluation 🥇🥇🔥🔥
🎥 Visualization
🛠️ Usage
📜 License & Citation & Acknowledgments

🎉 News

2024.07.10 🤖 Kolors supports ModelScope.
2024.07.09 💥 Kolors supports ComfyUI. Thanks to @kijai for his great work.
2024.07.06 🔥🔥🔥 We release Kolors, a large text-to-image model trained on billions of text-image pairs. The model is bilingual (Chinese and English) and supports a context length of 256 tokens. For more technical details, please refer to the technical report.
2024.07.03 📊 Kolors took second place on the FlagEval Multimodal Text-to-Image Leaderboard, and first place in both the Chinese and English subjective quality assessments.
2024.07.02 🎉 Congratulations! Our paper on controllable video generation, DragAnything: Motion Control for Anything using Entity Representation, has been accepted by ECCV 2024.
2024.02.08 🎉 Congratulations! Our paper on generative model evaluation, Learning Multi-dimensional Human Preference for Text-to-Image Generation, has been accepted by CVPR 2024.

📑 Open-source Plan

📖 Introduction

Kolors is a large-scale text-to-image generation model based on latent diffusion, developed by the Kuaishou Kolors team. Trained on billions of text-image pairs, Kolors exhibits significant advantages over both open-source and closed-source models in visual quality, complex semantic accuracy, and text rendering for both Chinese and English characters. Furthermore, Kolors supports both Chinese and English inputs, demonstrating strong performance in understanding and generating Chinese-specific content. For more details, please refer to this technical report.

📊 Evaluation

We collected a comprehensive text-to-image evaluation dataset, KolorsPrompts, to compare Kolors with other state-of-the-art open-source and closed-source models. KolorsPrompts includes over 1,000 prompts across 14 categories and 12 evaluation dimensions. The evaluation combines human and machine assessments. On these benchmarks, Kolors performs highly competitively, reaching an industry-leading level.

Human Assessment

For the human evaluation, we invited 50 imagery experts to compare the results generated by different models. The experts rated each image on three criteria: visual appeal, text faithfulness, and overall satisfaction. Kolors achieved the highest overall satisfaction score and led all other models in visual appeal (3.99 vs. 3.92 for the runner-up, Midjourney-v6).

Model	Average Overall Satisfaction	Average Visual Appeal	Average Text Faithfulness
Adobe-Firefly	3.03	3.46	3.84
Stable Diffusion 3	3.26	3.50	4.20
DALL-E 3	3.32	3.54	4.22
Midjourney-v5	3.32	3.68	4.02
Playground-v2.5	3.37	3.73	4.04
Midjourney-v6	3.58	3.92	4.18
Kolors	3.59	3.99	4.17

All models were tested using their April 2024 product versions.
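To make the comparison above concrete, the short Python sketch below (an illustration written for this README, not code from the Kolors repository) holds the human-assessment scores exactly as listed in the table, reports the top model for each criterion, and computes Kolors' margin over the best other model. The helper names `leader` and `margin_over_best_other` are hypothetical.

```python
from __future__ import annotations

# Scores copied verbatim from the human-assessment table (April 2024 product versions).
# Tuple order: (overall satisfaction, visual appeal, text faithfulness).
HUMAN_SCORES: dict[str, tuple[float, float, float]] = {
    "Adobe-Firefly": (3.03, 3.46, 3.84),
    "Stable Diffusion 3": (3.26, 3.50, 4.20),
    "DALL-E 3": (3.32, 3.54, 4.22),
    "Midjourney-v5": (3.32, 3.68, 4.02),
    "Playground-v2.5": (3.37, 3.73, 4.04),
    "Midjourney-v6": (3.58, 3.92, 4.18),
    "Kolors": (3.59, 3.99, 4.17),
}
CRITERIA = ("Overall Satisfaction", "Visual Appeal", "Text Faithfulness")


def leader(criterion: int) -> tuple[str, float]:
    """Return (model, score) for the highest-scoring model on one criterion."""
    model = max(HUMAN_SCORES, key=lambda m: HUMAN_SCORES[m][criterion])
    return model, HUMAN_SCORES[model][criterion]


def margin_over_best_other(model: str, criterion: int) -> float:
    """Score of `model` minus the best score among all other models."""
    others = [s[criterion] for m, s in HUMAN_SCORES.items() if m != model]
    return HUMAN_SCORES[model][criterion] - max(others)


if __name__ == "__main__":
    for index, name in enumerate(CRITERIA):
        best, score = leader(index)
        gap = margin_over_best_other("Kolors", index)
        print(f"{name}: leader {best} ({score:.2f}), Kolors vs. best other {gap:+.2f}")
```

Run as a script, it shows Kolors ahead on overall satisfaction (+0.01) and visual appeal (+0.07), and 0.05 behind DALL-E 3 on text faithfulness.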

Machine Assessment

We used MPS (Multi-dimensional Human Preference Score) on KolorsPrompts as the evaluation metric for machine assessment. Kolors achieved the highest MPS score, which is consistent with the results of the human evaluations.

Model	Overall MPS
Adobe-Firefly	8.5
Stable Diffusion 3	8.9
DALL-E 3	9.0
Midjourney-v5	9.4
Playground-v2.5	9.8
Midjourney-v6	10.2
Kolors	10.3
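The agreement between MPS and the human evaluation can be quantified with a rank correlation. The sketch below is an illustration, not the authors' analysis: it computes Spearman's rank correlation between the Overall MPS column and the Average Overall Satisfaction column of the human-assessment table, assigning tied values their average rank. Choosing Spearman's ρ is our assumption; the technical report may measure agreement differently.

```python
from __future__ import annotations

# Both lists follow the same model order as the two tables in this section.
MODELS = ["Adobe-Firefly", "Stable Diffusion 3", "DALL-E 3", "Midjourney-v5",
          "Playground-v2.5", "Midjourney-v6", "Kolors"]
HUMAN_OVERALL = [3.03, 3.26, 3.32, 3.32, 3.37, 3.58, 3.59]
OVERALL_MPS = [8.5, 8.9, 9.0, 9.4, 9.8, 10.2, 10.3]


def average_ranks(values: list[float]) -> list[float]:
    """1-based ascending ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    for model, h, m in zip(MODELS, average_ranks(HUMAN_OVERALL), average_ranks(OVERALL_MPS)):
        print(f"{model:20s} human rank {h:>4}  MPS rank {m:>4}")
    print(f"Spearman rho: {spearman(HUMAN_OVERALL, OVERALL_MPS):.3f}")
```

For these seven models the result is ρ ≈ 0.991. It falls short of 1.0 only because DALL-E 3 and Midjourney-v5 tie at 3.32 in the human scores, while MPS separates them (9.0 vs. 9.4).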

For more experimental results and details, please refer to our technical report.

🎥 Visualization

High-quality Portrait

Chinese Elements Generation

Complex Semantic Understanding

Text Rendering

The prompts for the cases visualized above can be accessed here.

🛠️ Usage

Requirements

Python 3.8 or later
PyTorch 1.13.1 or later
Transformers 4.26.1 or later
Recommended: CUDA 11.7 or later
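The minimum versions above can be checked from Python before running anything heavy. The following is a standard-library-only sketch; the helper names (`version_tuple`, `meets_minimum`, `check_environment`, `cuda_note`) are hypothetical and not part of the repository. The comparison only looks at the leading numeric release segment, so suffixes such as `.dev0` or `+cu118` are ignored; use `packaging.version` if you need full PEP 440 semantics.

```python
from __future__ import annotations

import re
import sys
from importlib import metadata  # standard library since Python 3.8

# Minimum versions taken from the Requirements list above.
MIN_PYTHON = (3, 8)
MIN_PACKAGES = {"torch": "1.13.1", "transformers": "4.26.1"}
RECOMMENDED_CUDA = "11.7"


def version_tuple(version: str) -> tuple[int, ...]:
    """Leading numeric release segment, e.g. '2.1.0+cu118' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, required: str) -> bool:
    """Compare release segments numerically, padding with zeros ('4.26' == '4.26.0')."""
    a, b = version_tuple(installed), version_tuple(required)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_environment() -> list[str]:
    """Return a list of problems; an empty list means the hard requirements are met."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {sys.version.split()[0]} is older than 3.8")
    for package, required in MIN_PACKAGES.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if not meets_minimum(installed, required):
            problems.append(f"{package} {installed} is older than {required}")
    return problems


def cuda_note() -> str:
    """Report the CUDA version PyTorch was built with (recommended, not required)."""
    try:
        import torch
    except ImportError:
        return "PyTorch not importable; CUDA not checked"
    built_with = torch.version.cuda  # None for CPU-only builds
    if built_with is None:
        return f"CPU-only PyTorch build (CUDA {RECOMMENDED_CUDA}+ recommended)"
    if meets_minimum(built_with, RECOMMENDED_CUDA):
        return f"PyTorch built with CUDA {built_with}"
    return f"PyTorch built with CUDA {built_with} (CUDA {RECOMMENDED_CUDA}+ recommended)"


if __name__ == "__main__":
    for problem in check_environment() or ["all hard requirements met"]:
        print(problem)
    print(cuda_note())
```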

Repository Cloning and Dependency Installation

apt-get install git-lfs
git clone https://github.com/Kwai-Kolors/Kolors
cd Kolors
conda create --name kolors python=3.8
conda activate kolors
pip install -r requirements.txt
python3 setup.py install

Weights download (link):

huggingface-cli download --resume-download Kwai-Kolors/Kolors --local-dir weights/Kolors

or

git lfs clone https://huggingface.co/Kwai-Kolors/Kolors weights/Kolors
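The same download can be scripted with the `huggingface_hub` Python package, whose `snapshot_download` function fetches a whole model repository into a local directory. In this sketch, the list of expected subfolders (`text_encoder`, `vae`, `scheduler`, `unet`) is an assumption about the repository layout, and the helper names are hypothetical; inspect the downloaded directory before relying on them.

```python
from __future__ import annotations

import sys
from pathlib import Path

REPO_ID = "Kwai-Kolors/Kolors"
LOCAL_DIR = Path("weights/Kolors")
# ASSUMPTION: subfolders the inference script loads from weights/Kolors.
# Confirm against the downloaded repository before relying on this list.
EXPECTED_COMPONENTS = ("text_encoder", "vae", "scheduler", "unet")


def download_weights(local_dir: Path = LOCAL_DIR) -> Path:
    """Download the model repository into local_dir via huggingface_hub."""
    # Imported lazily so the checker below works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=REPO_ID, local_dir=str(local_dir)))


def missing_components(local_dir: Path = LOCAL_DIR,
                       expected: tuple[str, ...] = EXPECTED_COMPONENTS) -> list[str]:
    """Return the expected subfolder names that do not exist under local_dir."""
    return [name for name in expected if not (local_dir / name).is_dir()]


if __name__ == "__main__":
    # Only touch the network when explicitly asked to, via a --download argument.
    target = download_weights() if "--download" in sys.argv else LOCAL_DIR
    missing = missing_components(target)
    print("all expected components present" if not missing else f"missing: {missing}")
```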

Inference:

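# Prompt in English: "A photo of a ladybug, macro, zoom, high quality, cinematic, holding a sign that reads '可图' (Kolors)"
python3 scripts/sample.py "一张瓢虫的照片，微距，变焦，高质量，电影，拿着一个牌子，写着“可图”"
# The image will be saved to "scripts/outputs/sample_text.jpg"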
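Because `scripts/sample.py` writes every result to the same `scripts/outputs/sample_text.jpg` file, running it several times in a row overwrites earlier images. The hypothetical helper below (not part of the repository) runs the script once per prompt, exactly as in the command above, and copies each result to a numbered file before the next run. It uses `sys.executable` so the script runs under the same interpreter (for example the `kolors` conda environment); the `runner` parameter exists only so the loop can be tested without a GPU.

```python
from __future__ import annotations

import shutil
import subprocess
import sys
from pathlib import Path
from typing import Callable, Sequence

# Output path given in the Inference example above, relative to the repository root.
DEFAULT_OUTPUT = Path("scripts/outputs/sample_text.jpg")


def build_command(prompt: str) -> list[str]:
    """Command line equivalent to: python3 scripts/sample.py "<prompt>"."""
    return [sys.executable, "scripts/sample.py", prompt]


def run_prompts(prompts: Sequence[str], repo_dir: Path, out_dir: Path,
                runner: Callable[..., object] = subprocess.run) -> list[Path]:
    """Run sample.py once per prompt and copy each result to out_dir/NNN.jpg."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, prompt in enumerate(prompts):
        # check=True makes subprocess.run raise if the script exits with an error.
        runner(build_command(prompt), cwd=repo_dir, check=True)
        target = out_dir / f"{index:03d}.jpg"
        # Copy the fixed-path output away before the next run overwrites it.
        shutil.copyfile(repo_dir / DEFAULT_OUTPUT, target)
        saved.append(target)
    return saved
```

For example, from the repository root: `run_prompts(["a photo of a ladybug, macro", "a red lantern at night"], Path("."), Path("batch_outputs"))`. Each invocation reloads the model, so this approach is simple but slow for large batches.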

Web demo:

python3 scripts/sampleui.py

📜 License & Citation & Acknowledgments

License

Kolors is fully open-sourced for academic research. For commercial use, please fill out this questionnaire and send it to [email protected] to obtain a free-use license agreement.

We open-source Kolors to promote the development of large text-to-image models in collaboration with the open-source community. The code of this project is open-sourced under the Apache-2.0 license. We sincerely urge all developers and users to strictly adhere to the open-source license and to avoid using the open-source model, code, or their derivatives for any purpose that may harm the country and society, or for any service that has not undergone safety evaluation and registration. Note that despite our best efforts to ensure the compliance, accuracy, and safety of the training data, the diversity and combinability of generated content and the probabilistic randomness of the model mean that we cannot guarantee the accuracy or safety of the output, and the model can be misled. This project assumes no legal responsibility for any data security issues, public opinion risks, or risks and liabilities arising from the open-source model and code being misled, abused, misused, or improperly used.

Citation

If you find our work helpful, please cite it!

@article{kolors,
  title={Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis},
  author={Kolors Team},
  journal={arXiv preprint},
  year={2024}
}

Acknowledgments

Thanks to Diffusers for providing the codebase.
Thanks to ChatGLM3 for providing the powerful Chinese language model.

Contact Us

If you want to leave a message for our R&D team and product team, feel free to join our WeChat group. You can also contact us via email ([email protected]).
