
For automating the creation of large batches of AI-generated artwork locally.

License: Other

Python 99.34% CSS 0.66%
machine-learning vqgan-clip deep-learning image-generation clip-guided-diffusion generative-art stable-diffusion

ai-art-generator's Introduction

2022-09-28 Update:

Just a note that I've launched Dream Factory, a significant upgrade to this. It's got an (optional) GUI, true simultaneous multi-GPU support, an integrated gallery with full EXIF metadata support, and many other new features.

I dropped VQGAN and Disco Diffusion support to focus on Stable Diffusion, so if you want VQGAN and/or Disco Diffusion you should stick with this repo for now. Otherwise, I encourage everyone to migrate to Dream Factory! I'll continue to patch bug fixes in this repo, but I likely won't be adding new features going forward.

AI Art Generator

For automating the creation of large batches of AI-generated artwork locally. Put your GPU(s) to work cranking out AI-generated artwork 24/7, with the ability to automate large prompt queues combining user-selected subjects, styles/artists, and more! More info on the available models follows the sample images.
Some example images that I've created via this process (these are cherry-picked and sharpened):
[six sample images omitted]
Note that I did not create or train the models used in this project, nor was I involved in the original coding. I've simply modified the original Colab versions so they'll run locally and added some support for automation. The models currently supported are VQGAN+CLIP, CLIP-guided diffusion (Disco Diffusion), and Stable Diffusion.

Requirements

You'll need an Nvidia GPU, preferably with a decent amount of VRAM. 12GB of VRAM is sufficient for 512x512 output images depending on model and settings, and 8GB should be enough for 384x384 (8GB should be considered a reasonable minimum!). To generate 1024x1024 images, you'll need ~24GB of VRAM or more. Generating small images and then upscaling via ESRGAN or some other package provides very good results as well.

It should be possible to run on an AMD GPU, but you'll need to be on Linux to install the ROCm version of Pytorch. I don't have an AMD GPU to throw into a Linux machine so I haven't tested this myself.
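
If you do want to attempt an AMD setup, the only change to the instructions below should be the Pytorch install in step 2; something along the lines of the command below (untested here, and the ROCm version will change over time) installs the ROCm build instead:

pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm5.1.1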

Setup

These instructions were tested on a Windows 10 desktop with an Nvidia 3080 Ti GPU (12GB VRAM), and also on an Ubuntu Server 20.04.3 system with an old Nvidia Tesla M40 GPU (24GB VRAM).

[1] Install Anaconda, open the root terminal, and create a new environment (and activate it):

conda create --name ai-art python=3.9
conda activate ai-art

[2] Install Pytorch:

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

Note that you can customize your Pytorch installation by using the configuration tool on the official Pytorch website.

[3] Install other required Python packages:

conda install -c anaconda git urllib3
pip install transformers keyboard pillow ftfy regex tqdm omegaconf pytorch-lightning IPython kornia imageio imageio-ffmpeg einops torch_optimizer

[4] Clone this repository and switch to its directory:

git clone https://github.com/rbbrdckybk/ai-art-generator
cd ai-art-generator

Note that Linux users may need single quotes around the URL in the clone command.

[5] Clone additional required repositories:

git clone https://github.com/openai/CLIP
git clone https://github.com/CompVis/taming-transformers

[6] Download the default VQGAN pre-trained model checkpoint files:

mkdir checkpoints
curl -L -o checkpoints/vqgan_imagenet_f16_16384.yaml -C - "https://heibox.uni-heidelberg.de/d/a7530b09fed84f80a887/files/?p=%2Fconfigs%2Fmodel.yaml&dl=1"
curl -L -o checkpoints/vqgan_imagenet_f16_16384.ckpt -C - "https://heibox.uni-heidelberg.de/d/a7530b09fed84f80a887/files/?p=%2Fckpts%2Flast.ckpt&dl=1"

Note that Linux users should replace the double quotes in the curl commands with single quotes.

[7] (Optional) Download additional pre-trained models:
Additional models are not necessary, but provide you with more options. Here is a good list of available pre-trained models.
For example, if you also wanted the FFHQ model (trained on faces):

curl -L -o checkpoints/ffhq.yaml -C - "https://app.koofr.net/content/links/0fc005bf-3dca-4079-9d40-cdf38d42cd7a/files/get/2021-04-23T18-19-01-project.yaml?path=%2F2021-04-23T18-19-01_ffhq_transformer%2Fconfigs%2F2021-04-23T18-19-01-project.yaml&force"
curl -L -o checkpoints/ffhq.ckpt -C - "https://app.koofr.net/content/links/0fc005bf-3dca-4079-9d40-cdf38d42cd7a/files/get/last.ckpt?path=%2F2021-04-23T18-19-01_ffhq_transformer%2Fcheckpoints%2Flast.ckpt"

[8] (Optional) Test VQGAN+CLIP:

python vqgan.py -s 128 128 -i 200 -p "a red apple" -o output/output.png

You should see output.png created in the output directory, which should loosely resemble an apple.

[9] Install packages for CLIP-guided diffusion (if you're only interested in VQGAN+CLIP, you can skip everything from here to the end):

pip install ipywidgets omegaconf torch-fidelity einops wandb opencv-python matplotlib lpips datetime timm
conda install pandas

[10] Clone repositories for CLIP-guided diffusion:

git clone https://github.com/crowsonkb/guided-diffusion
git clone https://github.com/assafshocher/ResizeRight
git clone https://github.com/CompVis/latent-diffusion

[11] Download models needed for CLIP-guided diffusion:

mkdir content\models
curl -L -o content/models/256x256_diffusion_uncond.pt -C - "https://openaipublic.blob.core.windows.net/diffusion/jul-2021/256x256_diffusion_uncond.pt"
curl -L -o content/models/512x512_diffusion_uncond_finetune_008100.pt -C - "http://batbot.tv/ai/models/guided-diffusion/512x512_diffusion_uncond_finetune_008100.pt"
curl -L -o content/models/secondary_model_imagenet_2.pth -C - "https://ipfs.pollinations.ai/ipfs/bafybeibaawhhk7fhyhvmm7x24zwwkeuocuizbqbcg5nqx64jq42j75rdiy/secondary_model_imagenet_2.pth"
mkdir content\models\superres
curl -L -o content/models/superres/project.yaml -C - "https://heibox.uni-heidelberg.de/f/31a76b13ea27482981b4/?dl=1"
curl -L -o content/models/superres/last.ckpt -C - "https://heibox.uni-heidelberg.de/f/578df07c8fc04ffbadf3/?dl=1"

Note that Linux users should again replace the double quotes in the curl commands with single quotes, and replace the mkdir backslashes with forward slashes.

[12] (Optional) Test CLIP-guided diffusion:

python diffusion.py -s 128 128 -i 200 -p "a red apple" -o output.png

You should see output.png created in the output directory, which should loosely resemble an apple.

[13] Clone Stable Diffusion repository (if you're not interested in SD, you can skip everything from here to the end):

git clone https://github.com/rbbrdckybk/stable-diffusion

[14] Install additional dependencies required by Stable Diffusion:

pip install diffusers

[15] Download the Stable Diffusion pre-trained checkpoint file:

mkdir stable-diffusion\models\ldm\stable-diffusion-v1
curl -L -o stable-diffusion/models/ldm/stable-diffusion-v1/model.ckpt -C - "https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt"

If the curl command doesn't download the checkpoint, it's because the file is gated behind a login. You'll need to register for a Hugging Face account (only an email address and name are required), and then you can download the checkpoint file from the model page.
After downloading, you'll need to place the .ckpt file in the directory created above and name it model.ckpt.
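
For example, assuming the downloaded file is named sd-v1-4.ckpt and landed in your Downloads folder (illustrative paths only), you could move and rename it like this on Windows:

move %USERPROFILE%\Downloads\sd-v1-4.ckpt stable-diffusion\models\ldm\stable-diffusion-v1\model.ckpt

Linux users would instead use something like:

mv ~/Downloads/sd-v1-4.ckpt stable-diffusion/models/ldm/stable-diffusion-v1/model.ckpt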

[16] (Optional) Test Stable Diffusion:
The easiest way to test SD is to create a simple prompt file with !PROCESS = stablediff and a single subject. See example-prompts.txt and the next section for more information. Assuming you create a simple prompt file called test.txt first, you can test by running:

python make_art.py test.txt

Images should be saved to the output directory if successful (organized into subdirectories named for the date and prompt file).
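
For reference, a minimal test.txt could contain nothing more than the following (the subject is arbitrary; note that, per one of the issues below, you may also need at least one entry in the [styles] section for any work items to be queued):

[subjects]
!PROCESS = stablediff
a red apple

[styles]
photorealistic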

[17] Setup ESRGAN/GFPGAN (if you're not planning to upscale images, you can skip this and everything else):

git clone https://github.com/xinntao/Real-ESRGAN
pip install basicsr facexlib gfpgan
cd Real-ESRGAN
curl -L -o experiments/pretrained_models/RealESRGAN_x4plus.pth -C - "https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth"
python setup.py develop
cd ..
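
Real-ESRGAN is normally invoked automatically via the !USE_UPSCALE directive (see Advanced Usage below), but if you want to test the upscaler by hand on a single image, its bundled inference script can be run from inside the Real-ESRGAN directory. Something like the following should work, though the exact flags may differ depending on the version you cloned (the input path here is just an example):

python inference_realesrgan.py -n RealESRGAN_x4plus -i ../output/output.png -o upscaled --outscale 2

Adding --face_enhance should route the image through GFPGAN as well.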

You're done!

If you're getting errors (other than insufficient GPU VRAM) while running and haven't updated your installation in a while, try updating some of the more important packages, for example:

pip install transformers -U

Usage

Essentially, you just need to create a text file containing the subjects and styles you want to use to generate images. If you have 5 subjects and 20 styles in your prompt file, then a total of 100 output images will be created (20 style images for each subject).

Take a look at example-prompts.txt to see how prompt files should look. You can ignore everything except the [subjects] and [styles] areas for now. Lines beginning with a '#' are comments and will be ignored, and lines beginning with a '!' are settings directives and are explained in the next section. For now, just modify the example subjects and styles with whatever you'd like to use.
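
For instance, a prompt file containing the following (placeholder subjects and styles, not taken from the example file) would queue 2 subjects x 3 styles = 6 output images:

[subjects]
a monkey on a motorcycle
a lighthouse on a rocky coast

[styles]
by Picasso
oil on canvas
watercolor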

After you've populated example-prompts.txt to your liking, you can simply run:

python make_art.py example-prompts.txt

Depending on your hardware and settings, each image will take anywhere from a few seconds to a few hours (on older hardware) to create. If you can run Stable Diffusion, I strongly recommend it for the best results - both in speed and image quality.

Output images are created in the output/[current date]-[prompt file name]/ directory by default. The output directory will contain a JPG file for each image named for the subject & style used to create it. So for example, if you have "a monkey on a motorcycle" as one of your subjects, and "by Picasso" as a style, the output image will be created as output/[current date]-[prompt file name]/a-monkey-on-a-motorcycle-by-picasso.jpg (filenames will vary a bit depending on process used).

You can press CTRL+SHIFT+P at any time to pause execution (the pause will take effect when the current image finishes rendering). Press CTRL+SHIFT+P again to unpause. This is useful if you're running this on your primary computer and need to use your GPU for something else for a while. You can also press CTRL+SHIFT+R to reload the prompt file if you've changed it (the current work queue will be discarded, and a new one will be built from the contents of your prompt file). Note that keyboard input only works on Windows.

The settings used to create each image are saved as metadata in each output JPG file by default. You can read the metadata info back by using any EXIF utility, or by simply right-clicking the image file in Windows Explorer and selecting "properties", then clicking the "details" pane. The "comments" field holds the command used to create the image.
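
If you'd rather read the metadata programmatically, a minimal Pillow sketch like the one below will dump whatever EXIF tags are present (the path is hypothetical, and depending on which tag the settings were written to, the value may appear as raw bytes):

from PIL import Image
from PIL.ExifTags import TAGS

# open one of the generated JPGs (hypothetical path)
im = Image.open("output/2022-01-01-example-prompts/a-monkey-on-a-motorcycle-by-picasso.jpg")

# print every EXIF tag by name; the creation command should be among them
for tag_id, value in im.getexif().items():
    print(f"{TAGS.get(tag_id, tag_id)}: {value}")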

Advanced Usage

Directives can be included in your prompt file to modify settings for all prompts that follow them. These settings directives are specified by putting them on their own line inside the [subjects] area of the prompt file, in the following format:

![setting to change] = [new value]

For [setting to change], valid directives are:

  • PROCESS
  • CUDA_DEVICE
  • WIDTH
  • HEIGHT
  • ITERATIONS (vqgan/diffusion only)
  • CUTS (vqgan/diffusion only)
  • INPUT_IMAGE
  • SEED
  • LEARNING_RATE (vqgan only)
  • TRANSFORMER (vqgan only)
  • OPTIMISER (vqgan only)
  • CLIP_MODEL (vqgan only)
  • D_VITB16, D_VITB32, D_RN101, D_RN50, D_RN50x4, D_RN50x16 (diffusion only)
  • STEPS (stablediff only)
  • SCALE (stablediff only)
  • CHANNELS (stablediff only)
  • SAMPLES (stablediff only)
  • STRENGTH (stablediff only)
  • SD_LOW_MEMORY (stablediff only)
  • USE_UPSCALE (stablediff only)
  • UPSCALE_AMOUNT (stablediff only)
  • UPSCALE_FACE_ENH (stablediff only)
  • UPSCALE_KEEP_ORG (stablediff only)
  • REPEAT

Some examples:

!PROCESS = vqgan

This will set the current AI image-generation process. Valid options are vqgan for VQGAN+CLIP, diffusion for CLIP-guided diffusion (Disco Diffusion), or stablediff for Stable Diffusion.

!CUDA_DEVICE = 0

This will force GPU 0 to be used (the default). Useful if you have multiple GPUs - you can run multiple instances, each with its own prompt file specifying a unique GPU ID.

!WIDTH = 384
!HEIGHT = 384

This will set the output image size to 384x384. A larger output size requires more GPU VRAM. Note that for Stable Diffusion these values should be multiples of 64.

!TRANSFORMER = ffhq

This will tell VQGAN to use the FFHQ transformer (somewhat better at faces), instead of the default (vqgan_imagenet_f16_16384). You can follow step 7 in the setup instructions above to get the ffhq transformer, along with a link to several others.

Whatever you specify here MUST exist in the checkpoints directory as a .ckpt and .yaml file.

!INPUT_IMAGE = samples/face-input.jpg

This will use samples/face-input.jpg (or whatever image you specify) as the starting image, instead of the default random noise. Input images must have the same aspect ratio as your output images for good results. Note that when used with Stable Diffusion, the output image size will be the same as your input image (your height/width settings will be ignored).

!SEED = 42

This will use 42 as the input seed value, instead of a random number (the default). Useful for reproducibility - when all other parameters are identical, using the same seed value should produce an identical image across multiple runs. Set to nothing or -1 to reset to using a random value.

!INPUT_IMAGE = 

Setting any of these values to nothing will return it to its default. So in this example, no starting image will be used.

!STEPS = 50

Sets the number of steps (similar to iterations) when using Stable Diffusion to 50 (the default). Higher values take more time and may improve image quality. Values over 100 rarely produce noticeable differences compared to lower values.

!SCALE = 7.5

Sets the guidance scale when using Stable Diffusion to 7.5 (the default). Higher values (to a point; beyond ~25, results may be strange) will cause the output to adhere more closely to your prompt.

!SAMPLES = 1

Sets the number of times to sample when using Stable Diffusion to 1 (the default). Values over 1 will cause multiple output images to be created for each prompt, at a slight time savings per image. There is no additional GPU VRAM required for increasing this.

!STRENGTH = 0.75

Sets the influence of the starting image to 0.75 (the default). Only relevant when using Stable Diffusion with an input image. Valid values are between 0-1, with 1 corresponding to complete destruction of the input image, and 0 corresponding to leaving the starting image completely intact. Values between 0.25 and 0.75 tend to give interesting results.

!SD_LOW_MEMORY = no

Use a forked repo with much lower GPU memory requirements when using Stable Diffusion (yes/no)? Setting this to yes will switch to a memory-optimized version of SD that allows you to create higher-resolution images with far less GPU memory (512x512 images should only require around 4GB of VRAM). The trade-off is that inference is much slower compared to the default official repo. For comparison: on an RTX 3060, a 512x512 image at default settings takes around 12 seconds to create; with !SD_LOW_MEMORY = yes, the same image takes over a minute. I recommend keeping this off unless you have less than 8GB of GPU VRAM, or want to experiment with creating larger images before upscaling.

!USE_UPSCALE = no

Automatically upscale images created with Stable Diffusion (yes/no)? Uses ESRGAN/GFPGAN (see additional settings below).

!UPSCALE_AMOUNT = 2

How much to scale when !USE_UPSCALE = yes. Default is 2.0x; higher values require more VRAM and time.

!UPSCALE_FACE_ENH = no

Whether or not to use GFPGAN (vs default ESRGAN) when upscaling. GFPGAN provides the best results with faces, but may provide slightly worse results if used on non-face subjects.

!UPSCALE_KEEP_ORG = no

Keep the original unmodified image when upscaling (yes/no)? If set to no (the default), the original image will be deleted. If set to yes, the original image will be saved in an /original subdirectory of the image output folder.

!REPEAT = no

When all jobs in the prompt file are finished, restart back at the top of the file (yes/no)? Default is no, which will simply terminate execution when all jobs are complete.
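
Putting several of these directives together, a complete prompt file might look like the following (all values are illustrative; choose sizes and settings that fit your GPU):

[subjects]
!PROCESS = stablediff
!WIDTH = 512
!HEIGHT = 512
!STEPS = 50
!USE_UPSCALE = yes
!UPSCALE_AMOUNT = 2
a monkey on a motorcycle
a lighthouse on a rocky coast

[styles]
by Picasso
oil on canvas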

TODO: finish settings examples & add usage tips/examples, document random_art.py

ai-art-generator's People

Contributors

lyfeonedge, rbbrdckybk


ai-art-generator's Issues

AMD Graphics Thread

Just letting you know. I have tried and failed to run this configuration on WSL today. I have concluded that it is not possible on Windows 10.

Problem: AMD GPU on Windows 10. Dual booting Ubuntu is an option, but it comes with its own headaches, namely the time it takes to install the OS and the time it takes to make the two play nicely together so I can choose the right one at startup. I think Windows 10 is actually worse than previous OSes for this, too. It's been a while since I've done it, but I remember getting everything set up nicely, and then a few weeks later Windows 10 would just delete the other boot record. In short, AI art should be fun, and dual booting is not fun.

Proposed and Attempted Solution: Install WSL2 with the Ubuntu 20.04 image. Install the dreadful AMD 2020 WSL preview driver, which sadly overwrites the newer driver. Then follow the steps for setup, but with the AMD version of PyTorch.

I made it over multiple hurdles, but when I finally got to the prompt step, it simply gave me the same error I get in Windows: Warning: No GPU found! Using the CPU instead. The iterations will be slow..

I kept looking for things to try, but I was forced to admit defeat after reading the article below and seeing that the feature that would even allow me to use graphics in WSL2 is Windows 11-only (the article is geared toward GUI apps, but it shows how to set up graphics in WSL2, so I believe these are one and the same issue; if I'm wrong about that, maybe there's a way to do this, but I'm at a dead end with the information I've found, so if anybody knows anything, feel free to correct me):
https://learn.microsoft.com/en-us/windows/wsl/tutorials/gui-apps

Here are the takeaways.

You can put in the documentation that it doesn't support WSL2 for Windows 10 and save everyone the time. I can't rule out AMD graphics on Windows 11 for WSL2, and more power to somebody with a newer system who can try that combination.

Here are the notes for how I got as far as I did in case somebody wants to try it on Windows 11 using WSL2:

  1. Enable WSL and Virtual Machine Platform via Control Panel
  2. Install Ubuntu 22.04 from Windows Store.
  3. Enable Virtualization in your bios.
  4. Read here for more info on enabling graphics in WSL2: https://learn.microsoft.com/en-us/windows/wsl/tutorials/gui-apps
    • You should install the driver specified here.
    • Update WSL to WSL2 (probably should do this at the beginning so your Ubuntu image is on WSL2, although it's an easy fix if you don't).
  5. Use this command pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm5.1.1 to install PyTorch for AMD.
  6. Everything else is just following the same steps in this project's readme.

I'll open another issue if I get it working on Ubuntu and let everybody know any changes to the steps I took to do so.

Training the AI?

how does training work? I know it uses checkpoints as trained data but is there a way I can train it manually for my art and specific images?

Running on Anaconda Powershell Error code

I am attempting to run this on Anaconda Powershell, so maybe this is my first mistake, but everything was installing and running smoothly up until putting in the curl prompts under "6] Download the default VQGAN pre-trained model checkpoint files:" step.

I am getting the following error, immediately after the input "curl -L -o checkpoints/vqgan_imagenet_f16_16384.yaml -C - "https://heibox.uni-heidelberg.de/d/a7530b09fed84f80a887/files/?p=%2Fconfigs%2Fmodel.yaml&dl=1" :

Invoke-WebRequest : Parameter cannot be processed because the parameter name 'C' is ambiguous. Possible matches include: -Credential -CertificateThumbprint -Certificate -ContentType. At line:1 char:54 + curl -L -o checkpoints/vqgan_imagenet_f16_16384.yaml -C - "https://he ... + ~~ + CategoryInfo : InvalidArgument: (:) [Invoke-WebRequest], ParameterBindingException + FullyQualifiedErrorId : AmbiguousParameter,Microsoft.PowerShell.Commands.InvokeWebRequestCommand

Thanks in advance!

Smaller Chunks

Congrats, amazing stuff! The results I'm getting are mind blowing!

Unfortunately, like you, I only have an 8GB VRAM GPU.
Is there a way to increase the output size to 720 or 1080 without having to upscale? Perhaps by using a smaller chunk size and waiting longer?

Thanks again for releasing this (and for spending the time to create detailed instructions)

Error while running the test command

Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips/vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Restored from checkpoints/vqgan_imagenet_f16_16384.ckpt
Using device: cuda:0
Optimising using: Adam
Using text prompts: ['a red apple']
Using seed: 1260714667749887246
i: 0, loss: 0.906549, losses: 0.906549
0it [00:00, ?it/s]
Traceback (most recent call last):
File "/home/gaurav/Desktop/ai_art_generators/ai-art-generator/vqgan.py", line 867, in
train(i)
File "/home/gaurav/Desktop/ai_art_generators/ai-art-generator/vqgan.py", line 751, in train
checkin(i, lossAll)
File "/home/gaurav/anaconda3/envs/ai-art/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/gaurav/Desktop/ai_art_generators/ai-art-generator/vqgan.py", line 721, in checkin
TF.to_pil_image(out[0].cpu()).save(args.output, pnginfo=info)
File "/home/gaurav/anaconda3/envs/ai-art/lib/python3.9/site-packages/PIL/Image.py", line 2209, in save
fp = builtins.open(filename, "w+b")
FileNotFoundError: [Errno 2] No such file or directory: 'output/output.png'

Style is a required field, not optional

First, thanks for the work on this project, it's exactly what I was looking for, and the instructions are really good.

I found an issue where if you don't have a style listed, make_art.sh queues up 0 items and ends. The examples file notes that styles are optional, but since you must iterate on them to generate a prompt, it's actually not.

for style in self.styles:

I think it was your intention but I'd like to be able to just list several complete prompts under "subject" and not require other fields.

Thanks!

RuntimeError: CUDA out of memory. diffusion

RuntimeError: CUDA out of memory. Tried to allocate 36.00 MiB (GPU 0; 4.00 GiB total capacity; 3.45 GiB already allocated; 0 bytes free; 3.47 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

AttributeError: 'Image' object has no attribute 'getexif'

Hello,
I get this error when trying to run make_art.py.

python make_art.py test5.txt
Queued 1 work items from test5.txt.

Worker starting job #1:
Command: python vqgan.py -s 384 384 -i 500 -cuts 32 -p "some random prompt artistic, masterpiece | cartoonish" -lr 0.1 -m ViT-B/32 -cd "cuda:0" -sd 3831269064 -o output/2023-05-21-test5/some-random-prompt-cartoonish.png
/root/miniconda3/lib/python3.7/site-packages/pytorch_lightning/utilities/distributed.py:259: LightningDeprecationWarning: pytorch_lightning.utilities.distributed.rank_zero_only has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from pytorch_lightning.utilities instead.
"pytorch_lightning.utilities.distributed.rank_zero_only has been deprecated in v1.8.1 and will"
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips/vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Restored from checkpoints/vqgan_imagenet_f16_16384.ckpt
Using device: cuda:0
Optimising using: Adam
Using text prompts: ['some random prompt', 'cartoonish']
Using seed: 3831269064
i: 0, loss: 1.84399, losses: 0.912632, 0.931359
i: 50, loss: 1.66447, losses: 0.776792, 0.887676
i: 100, loss: 1.52617, losses: 0.664702, 0.861466
i: 150, loss: 1.54887, losses: 0.686982, 0.861885
i: 200, loss: 1.58915, losses: 0.718493, 0.870659
i: 250, loss: 1.48152, losses: 0.622581, 0.858936
i: 300, loss: 1.47938, losses: 0.622537, 0.856846
i: 350, loss: 1.47611, losses: 0.62103, 0.85508
i: 400, loss: 1.58315, losses: 0.718842, 0.86431
i: 450, loss: 1.4626, losses: 0.60904, 0.853565
i: 500, loss: 1.4565, losses: 0.603882, 0.852619
500it [02:30, 3.32it/s]
Exception in thread Thread-1:
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "make_art.py", line 226, in run
exif = im.getexif()
AttributeError: 'Image' object has no attribute 'getexif'

What can I do to fix this?

When doing red apple test.

Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips\vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Restored from checkpoints/vqgan_imagenet_f16_16384.ckpt
Using device: cuda:0
Optimising using: Adam
Using text prompts: ['a red apple']
Using seed: 16544599355900
i: 0, loss: 0.913828, losses: 0.913828
0it [00:02, ?it/s]
Traceback (most recent call last):
File "C:\Users\mikeb\ai-art-generator\vqgan.py", line 867, in
train(i)
File "C:\Users\mikeb\ai-art-generator\vqgan.py", line 751, in train
checkin(i, lossAll)
File "C:\Users\mikeb\anaconda3\envs\ai-art\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "C:\Users\mikeb\ai-art-generator\vqgan.py", line 721, in checkin
TF.to_pil_image(out[0].cpu()).save(args.output, pnginfo=info)
File "C:\Users\mikeb\anaconda3\envs\ai-art\lib\site-packages\PIL\Image.py", line 2317, in save
fp = builtins.open(filename, "w+b")
FileNotFoundError: [Errno 2] No such file or directory: 'output/output.png'

Up-Scaling and Diffusion Not Working?

Is there any way to upscale a generated image? Would that be done via diffusion? If so, the problem is that my diffusion setup is missing the cv2 module, even though everything was installed correctly. Mostly just a general question of how upscaling is done.

Diffusion _pickle.UnpicklingError: invalid load key, '<'.

I am getting the following error while running diffusion.py

 Traceback (most recent call last):
  File "/home/ali/Desktop/ai-art-generator/diffusion.py", line 1650, in <module>
    secondary_model.load_state_dict(torch.load(f'{model_path}/secondary_model_imagenet_2.pth', map_location='cpu'))
  File "/home/ali/miniconda3/envs/ai-art/lib/python3.9/site-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/ali/miniconda3/envs/ai-art/lib/python3.9/site-packages/torch/serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.

The command I am running is

python diffusion.py -s 400 400 -i 250 -p "a shining lighthouse on the shore of a tropical island in a raging storm" -o output/diff.png

Thank you

Tagging Outputs with Source Images?

EDIT: I had several questions originally, but most of them arose from being completely new to this, but I'm reading more and more, and kind of seeing what's there. I have one question that I still think is relevant.

Is it possible to have the outputs tagged with information on the source images the AI mixed together and a fair percentage of the image that came from each of them?

no module named clip

I followed all the steps up to the test, and when I try 'python vqgan.py -s 128 128 -i 200 -p "a red apple" -o output/output.png' it says that there is no module named clip.

Can't run python diffusion.py

Traceback (most recent call last):
  File "F:\Projects\ai-art-generator\diffusion.py", line 1438, in <module>
    sr_model = get_model('superresolution')
  File "F:\Projects\ai-art-generator\diffusion.py", line 1219, in get_model
    model, step = load_model_from_config(config, path_ckpt)
  File "F:\Projects\ai-art-generator\diffusion.py", line 1209, in load_model_from_config
    model = instantiate_from_config(config.model)
  File "F:\Projects\ai-art-generator\./latent-diffusion\ldm\util.py", line 85, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "F:\Projects\ai-art-generator\./latent-diffusion\ldm\util.py", line 93, in get_obj_from_str
    return getattr(importlib.import_module(module, package=None), cls)
  File "E:\anaconda\envs\ai-art\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "F:\Projects\ai-art-generator\./latent-diffusion\ldm\models\diffusion\ddpm.py", line 19, in <module>
    from pytorch_lightning.utilities.distributed import rank_zero_only
ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed'

pytorch-lightning 2.0.4 is installed

No GPU Found?

i7 - 9700k
GTX 1660 Super
Did all commands and first trained models downloaded
"No GPU Found, using CPU"
Desktop PC, 16gb ram

Torch

how do you fix "AssertionError: Torch not compiled with CUDA enabled"
and " ModuleNotFoundError: No module named 'torch._six' "

Can't run step #12 test CLIP-guided diffusion

Traceback (most recent call last):
  File "F:\Projects\ai-art-generator\diffusion.py", line 2211, in <module>
    model.load_state_dict(torch.load(f'{model_path}/{diffusion_model}.pt', map_location='cpu'))
  File "E:\anaconda\envs\ai-art\lib\site-packages\torch\serialization.py", line 815, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "E:\anaconda\envs\ai-art\lib\site-packages\torch\serialization.py", line 1033, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.

CUDA out of memory issue

Hi,

I ran the first test successfully with the lighthouse prompt. It works great! But for the second time, I ran into this issue:

RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 23.69 GiB total capacity; 20.71 GiB already allocated; 187.50 MiB free; 20.78 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I don't understand why, because when I run nvidia-smi, it shows that I have enough free memory. I have run torch.cuda.empty_cache() as well, but it doesn't seem to help.

Any help would be appreciated. Thanks, and great work!

Error when running + Where do I get the images?

RuntimeError: CUDA out of memory. Tried to allocate 72.00 MiB (GPU 0; 4.00 GiB total capacity; 3.35 GiB already allocated; 0 bytes free; 3.41 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Also, the images aren't in the output folder.

reproducing the art style based on people's picture

Thank you for sharing your great work!
I wonder what settings you used to generate the second row of your examples (the two stylized portraits: sample03.jpg and sample04.jpg). Would you share the exact settings for those, please? I tried the ffhq transformer, but the results are quite different from yours and somewhat disappointing!

Output image corners/borders getting deformed.

Hi there, after messing around with the code in this repo I noticed that some of my images have deformed corners/borders, mostly the top-right and bottom-left corners, though in general all corners seem to be affected. This happened a few times across multiple images, while other images were unaffected, so maybe some of the settings I'm using are causing it. I'd appreciate any help, as I haven't been able to figure out the issue myself and I definitely want to keep using this tool to see what I can generate with it. Here are some of the images I generated that have this issue; I'm not sure if GitHub will preserve the metadata on the images, but if it doesn't, let me know and I can share the parameters I used when running the code so other people can easily reproduce the same or similar results.
horror-village-with-dead-trees-a-snowy-tall-mountain-in-th-unreal-engine-high-detailk-ultra-hd-ray-tracing-realistic-photorealistic
painting-of-small-cabin-in-the-middle-of-snowy-mountains-in-the-winter-at-night-in-the-style-of-disney-trending-on-artstation-unreal-engine

Here are some images that do not have this issue:
landscape-wallpaper-wallpaper-5
landscape-wallpaper-wallpaper-4
landscape-wallpaper-wallpaper-3
landscape-wallpaper-wallpaper-2
landscape-wallpaper-wallpaper
