
SD-Latent-Interposer

A small neural network to provide interoperability between the latents generated by the different Stable Diffusion models.

I wanted to see if it was possible to pass latents generated by the new SDXL model directly into SDv1.5 models without decoding and re-encoding them using a VAE first.

Installation

To install it, simply clone this repo to your custom_nodes folder using the following command:

```
git clone https://github.com/city96/SD-Latent-Interposer custom_nodes/SD-Latent-Interposer
```

Alternatively, you can download the comfy_latent_interposer.py file to your ComfyUI/custom_nodes folder instead. You may need to install huggingface-hub inside your venv: pip install huggingface-hub

If you need the model weights for something else, they are hosted on Hugging Face under the same Apache 2.0 license as the rest of the repo. The current files are in the "v4.0" subfolder.

Usage

Simply place it where you would normally place a VAE decode followed by a VAE encode. Set the denoise as appropriate to hide any artifacts while keeping the composition. See image below.

[Image: LATENT_INTERPOSER_V3.1_TEST]

Without the interposer, the two latent spaces are incompatible:

[Image: LATENT_INTERPOSER_V3.1]

Local models

The node pulls the required files from huggingface hub by default. You can create a models folder and place the models there if you have a flaky connection or prefer to use it completely offline. The custom node will prefer local files over HF when available. The path should be: ComfyUI/custom_nodes/SD-Latent-Interposer/models

Alternatively, just clone the entire HF repo to it:

```
git clone https://huggingface.co/city96/SD-Latent-Interposer custom_nodes/SD-Latent-Interposer/models
```
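In code, the node's local-first lookup behaves roughly like the sketch below. The helper name and exact paths are illustrative, not the node's actual implementation:

```python
# Rough sketch of local-first model resolution: prefer a file in the
# local models folder, otherwise fall back to the Hugging Face Hub.
# resolve_model() is a hypothetical helper, not part of the node.
import os

def resolve_model(filename, local_dir="models"):
    local_path = os.path.join(local_dir, filename)
    if os.path.isfile(local_path):
        return local_path  # local file wins when present
    from huggingface_hub import hf_hub_download  # only needed for remote fetch
    return hf_hub_download(repo_id="city96/SD-Latent-Interposer", filename=filename)
```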

Supported Models

Model names:

| code | name |
|------|------|
| v1 | Stable Diffusion v1.x |
| xl | SDXL |
| v3 | Stable Diffusion 3 |
| ca | Stable Cascade (Stage A/B) |

Available models:

| From | to v1 | to xl | to v3 | to ca |
|------|-------|-------|-------|-------|
| v1 | - | v4.0 | v4.0 | No |
| xl | v4.0 | - | v4.0 | No |
| v3 | v4.0 | v4.0 | - | No |
| ca | v4.0 | v4.0 | v4.0 | - |

Training

The training code initializes most training parameters from the provided config file. The dataset should be a single .bin file saved with torch.save for each latent version. The format should be [batch, channels, height, width], with the batch dimension spanning the entire dataset, i.e. 88,000.
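For illustration, a dataset file in this format could be produced like this; the shapes and filename are stand-ins, and a real set would have the full ~88,000 latents in the batch dimension:

```python
# Save a latent dataset as a single .bin via torch.save, shaped
# [batch, channels, height, width]. A tiny random batch stands in
# for a real latent dataset here.
import torch

latents = torch.randn(16, 4, 128, 128)  # illustrative, not real latents
torch.save(latents, "v1_latents.bin")

loaded = torch.load("v1_latents.bin")
assert loaded.shape == (16, 4, 128, 128)
```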

Interposer v4.0

The training code currently initializes two copies of the model: one mapping in the target direction and one in the reverse. The losses are defined in terms of this pair.

  • p_loss is the main criterion for the primary model.
  • b_loss is the main criterion for the secondary one.
  • r_loss passes the output of the primary model back through the secondary model and checks it against the source latent (essentially a round trip through the two models).
  • h_loss is the same as r_loss but for the secondary model.
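Put together, the four losses can be sketched as below. The `primary`/`secondary` names and the MSE criterion are assumptions for illustration, not the actual training code:

```python
# Sketch of the four interposer losses: primary maps source->target
# latents, secondary maps target->source.
import torch
import torch.nn as nn

criterion = nn.MSELoss()  # assumed criterion

def interposer_losses(primary, secondary, src, tgt):
    p_out = primary(src)                        # source -> target
    b_out = secondary(tgt)                      # target -> source
    p_loss = criterion(p_out, tgt)              # primary vs. target latent
    b_loss = criterion(b_out, src)              # secondary vs. source latent
    r_loss = criterion(secondary(p_out), src)   # round trip src -> tgt -> src
    h_loss = criterion(primary(b_out), tgt)     # round trip tgt -> src -> tgt
    return p_loss, b_loss, r_loss, h_loss
```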

All models were trained for 50000 steps with either batch size 128 (xl/v1) or 48 (cascade). The training was done locally on an RTX 3080 and a Tesla V100S.

[Image: LATENT_INTERPOSER_V4_LOSS]

Older versions

Interposer v3.1

This is basically a complete rewrite. Replaced the mediocre bunch of conv2d layers with something that looks more like a proper neural network. No VGG loss because I still don't have a better GPU.

Training was done on the combined Flickr2K + DIV2K datasets, with each image processed into six 1024x1024 segments, padded with some of my own random images for a total of 22,000 source images in the dataset.

I think I got rid of most of the XL artifacts, but the color/hue/saturation shift issues are still there. I actually saved the optimizer state this time so I might be able to do 100K steps with visual loss on my P40s. Hopefully they won't burn up.

v3.0 was trained for 500K steps at a constant LR of 1e-4; v3.1 for 1M steps using CosineAnnealingLR to drop the learning rate towards the end. Both used AdamW.
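That setup corresponds roughly to the following; the model and any hyperparameters beyond AdamW, the 1e-4 base LR, and the cosine schedule are placeholders:

```python
# AdamW with CosineAnnealingLR decaying the learning rate over 1M steps.
import torch

model = torch.nn.Linear(4, 4)  # stand-in for the interposer network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1_000_000)

for step in range(3):  # placeholder for the real training loop
    optimizer.step()
    scheduler.step()   # LR follows a cosine curve toward ~0 at T_max
```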

[Image: INTERPOSER_V3.1]

Interposer v1.1

This is the second release using the "spaceship" architecture. It was trained on the Flickr2K dataset, continuing from the v1.0 checkpoint. Overall, it seems to perform a lot better, especially for real-life photos. I also investigated the odd v1->xl artifacts, but in the end they seem inherent to the VAE decoder stage.

[Image: training loss]

Interposer v1.0

Not sure why the training loss is so different; it might be due to the """highly curated""" dataset of 1,000 random images from my Downloads folder that I used to train it.

I probably should've just grabbed LAION.

I also trained a v1-to-v2 model before realizing v1 and v2 share the same latent space. Oh well.

[Image: training loss]

xl-to-v1_interposer
