Comments (5)

robertsehlke commented on July 21, 2024

Not too surprisingly "man" and "woman" classes seem to be entangled. Here is what I get for "a photo of a woman" with the three checkpoints:

(original) [image]

(1) [image]

(2) [image]

from dreambooth-stable-diffusion.

XavierXiao commented on July 21, 2024

Well, I think you should indeed use the original model to produce the reg images, since fine-tuning does some mysterious things to the model (remember that the model is also fine-tuned on the reg images). I have a thought: given that SD cannot generate realistic photos of humans, why not use random photos of people found online for regularization? Maybe you can try that as well. I feel like the original SD model will always produce faint black-and-white images of people for prompts like "photo of a man/woman", so an external, diverse set of human photos may serve as better regularization.

robertsehlke commented on July 21, 2024

The first round of regularization images (from the untuned model) are pretty good/usable, so that should be fine.

I was just wondering why generating images with the regularization class noun after fine-tuning leads to such strong drift/collapse for that noun, especially since you mentioned in the readme that the paper appears to generate regularization images on the fly.

I'll try using more regularization images. On a closer read, the paper does mention that ∼200 × N “a [class noun]” samples are generated, with N being the size of the subject dataset. So for a subject set of 4-5 images we're looking at 800-1000 recommended.
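
The arithmetic behind that estimate can be sketched in a few lines. The helper name `recommended_reg_images` and its `per_image` default are assumptions for illustration, based only on the ∼200 × N figure quoted above; nothing here comes from the repo.

```python
def recommended_reg_images(num_subject_images: int, per_image: int = 200) -> int:
    """Approximate count of "a [class noun]" samples for N subject images.

    per_image=200 mirrors the ~200 x N figure quoted from the paper;
    treat it as a rough guide, not an exact requirement.
    """
    if num_subject_images < 1:
        raise ValueError("need at least one subject image")
    return per_image * num_subject_images


if __name__ == "__main__":
    # Subject sets of 4 and 5 images give the 800-1000 range mentioned above.
    for n in (4, 5):
        print(n, recommended_reg_images(n))
```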

TingTingin commented on July 21, 2024

> Well I think indeed you should use the original model for producing reg images, as fine tuning will do some mysterious thing on the model (remember the model is also fine-tuned on the reg images). I have some thoughts: given that SD cannot generate realistic photo of human, why not use some random images of human you obtained online for regularization? Maybe you can try that as well. I feel like the original SD model will always produce some black-white faint human images with prompt like "photo of a man/woman", so maybe using external, diverse set of human photos serves as better regularization.

Have you tested that? Does it produce better results?

robertsehlke commented on July 21, 2024

I've now tried it with more regularization images (~300, mix of curated images from the original model + internet photos, at the default 800 steps) - it seems to help a little bit with preserving diversity, but the class prior is still degraded.
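
For anyone wanting to reproduce a mixed set like this, here is a minimal sketch of sampling regularization images from two folders. The function names, the folder layout, and the 50/50 default split are all hypothetical; the actual ratio used above is not stated, and the repo has no such helper.

```python
import random
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_images(folder):
    """Return image files in a folder, sorted so sampling is reproducible."""
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)


def mix_reg_images(generated_dir, external_dir, total, generated_fraction=0.5, seed=0):
    """Pick up to `total` images, part model-generated and part external photos.

    generated_fraction is an assumed knob, not a value from the thread.
    """
    rng = random.Random(seed)
    generated = list_images(generated_dir)
    external = list_images(external_dir)
    # Cap each share by what is actually available in its folder.
    n_generated = min(len(generated), round(total * generated_fraction))
    n_external = min(len(external), total - n_generated)
    picked = rng.sample(generated, n_generated) + rng.sample(external, n_external)
    rng.shuffle(picked)
    return picked
```

The returned paths could then be copied into whatever folder the training script reads regularization images from.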

Pretty impressive how well the ad hoc regularization works to generate/edit one intended new concept, but this issue limits it a bit. Not sure if they completely solved it in the Dreambooth paper either (though they're clearly aware) or just staved it off with far more regularization images.
