ashleykleynhans / kohya-docker

Stars: 56, Watchers: 3, Forks: 14, Size: 195 KB

Docker image for Kohya_ss Web UI

License: GNU General Public License v3.0

Dockerfile 35.19% Shell 47.76% HCL 17.05%

docker dreambooth kohya-webui lora runpod

kohya-docker's People

bmaltais mitchellfox story-boards-ai 5l1v3r1 fearvel komojini cjvandyk miabrahams jwbadev fcogomez cheffromspace neroyuki journa-ly niko2020

kohya-docker's Issues

[bug] `accelerate` not found with runpod "ashleykza/kohya:1.11.1"

While running a RunPod container with this image, I get `accelerate: not found`. Any tips on how to debug this?

Full logs below:

04:39:07-748602 INFO     Start training LoRA Standard ...                       
04:39:07-750284 INFO     Checking for duplicate image filenames in training data
                         directory...                                           
04:39:07-752158 INFO     Valid image folder names found in:                     
                         /workspace/organize/watches/img                        
04:39:07-753621 INFO     Headless mode, skipping verification if model already  
                         exist... if model already exist it will be             
                         overwritten...                                         
04:39:07-755396 INFO     Folder 20_wxwatch watch: 24 images found               
04:39:07-756743 INFO     Folder 20_wxwatch watch: 480 steps                     
04:39:07-758027 INFO     Total steps: 480                                       
04:39:07-759290 INFO     Train batch size: 2                                    
04:39:07-759959 INFO     Gradient accumulation steps: 1                         
04:39:07-760572 INFO     Epoch: 10                                              
04:39:07-761151 INFO     Regulatization factor: 1                               
04:39:07-761755 INFO     max_train_steps (480 / 2 / 1 * 10 * 1) = 2400          
04:39:07-762504 INFO     stop_text_encoder_training = 0                         
04:39:07-763121 INFO     lr_warmup_steps = 240                                  
04:39:07-763757 INFO     Can't use LR warmup with LR Scheduler constant...      
                         ignoring...                                            
04:39:07-764482 INFO     Saving training config to                              
                         /workspace/organize/watches/model/wxwatches_20240119-04
                         3907.json...                                           
04:39:07-765433 INFO     accelerate launch --num_cpu_threads_per_process=2      
                         "./sdxl_train_network.py" --enable_bucket              
                         --min_bucket_reso=256 --max_bucket_reso=2048           
                         --pretrained_model_name_or_path="stabilityai/stable-dif
                         fusion-xl-base-1.0"                                    
                         --train_data_dir="/workspace/organize/watches/img"     
                         --resolution="1024,1024"                               
                         --output_dir="/workspace/organize/watches/model"       
                         --logging_dir="/workspace/organize/watches/log"        
                         --network_alpha="1" --save_model_as=safetensors        
                         --network_module=networks.lora --text_encoder_lr=5e-05 
                         --unet_lr=0.0001 --network_dim=8                       
                         --output_name="wxwatches"                              
                         --lr_scheduler_num_cycles="10" --no_half_vae           
                         --learning_rate="5e-05" --lr_scheduler="constant"      
                         --train_batch_size="2" --max_train_steps="2400"        
                         --save_every_n_epochs="1" --mixed_precision="fp16"     
                         --save_precision="fp16" --cache_latents                
                         --cache_latents_to_disk --optimizer_type="Adafactor"   
                         --optimizer_args scale_parameter=False                 
                         relative_step=False warmup_init=False                  
                         --max_grad_norm="1" --max_data_loader_n_workers="0"    
                         --bucket_reso_steps=64 --xformers --bucket_no_upscale  
                         --noise_offset=0.0                                     
/bin/sh: 1: accelerate: not found
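The step arithmetic in this log can be checked by hand: 24 images × 20 repeats (the `20_` prefix of the folder name) give 480 steps, and the log computes 480 / 2 / 1 × 10 × 1 = 2400. The sketch below mirrors that formula as printed; the function and argument names are illustrative, not Kohya_ss's own code, and rounding up for uneven divisions is an assumption.

```python
import math
from fractions import Fraction

def max_train_steps(images: int, repeats: int, batch_size: int,
                    grad_accum_steps: int, epochs: int, reg_factor: int) -> int:
    """Reproduce the log's formula:
    (images * repeats) / batch_size / grad_accum_steps * epochs * reg_factor.
    Exact fractions avoid float noise; rounding up is an assumption for
    inputs that do not divide evenly (the log only shows an exact case)."""
    steps_per_epoch = images * repeats  # "Folder 20_wxwatch watch: 480 steps"
    total = Fraction(steps_per_epoch, batch_size * grad_accum_steps) * epochs * reg_factor
    return math.ceil(total)

# Values from the log: 24 images, 20 repeats, batch 2, accumulation 1, 10 epochs, reg factor 1.
print(max_train_steps(24, 20, 2, 1, 10, 1))  # 2400
```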
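The final line shows that `/bin/sh` could not find `accelerate` on its PATH, which likely means the launch command ran outside the Kohya_ss virtual environment. A minimal diagnostic sketch, assuming the venv lives at `/workspace/kohya_ss/venv` (the path that appears elsewhere on this page), reports whether `accelerate` resolves on a given PATH and whether the venv has its own launcher:

```python
import os
import shutil
from pathlib import Path
from typing import Optional

# Assumed venv location, taken from the paths shown elsewhere on this page.
DEFAULT_VENV = Path("/workspace/kohya_ss/venv")

def diagnose_accelerate(venv: Path = DEFAULT_VENV, path_env: Optional[str] = None) -> dict:
    """Report whether `accelerate` resolves on PATH and whether the venv provides it."""
    search_path = os.environ.get("PATH", "") if path_env is None else path_env
    launcher = venv / "bin" / "accelerate"
    return {
        # None means a plain PATH lookup (like the one /bin/sh did) would likely fail too.
        "on_path": shutil.which("accelerate", path=search_path),
        # The venv's own console script, if pip installed one there.
        "venv_launcher": str(launcher) if launcher.is_file() else None,
        # Whether the venv's bin directory is part of the searched PATH at all.
        "venv_bin_on_path": str(venv / "bin") in search_path.split(os.pathsep),
    }

if __name__ == "__main__":
    print(diagnose_accelerate())
```

If `on_path` is `None` but `venv_launcher` is set, activating the venv (`source /workspace/kohya_ss/venv/bin/activate`) before launching is a reasonable first thing to try.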

Require advice on how to use it

I managed to get the Docker container running, but I'm lost on what to do next. Can you advise how to run the UI to do LoRA fine-tuning?

Old school issue popping up

I get this when I try to do the log thing:

    return buttonbox(msg=msg,
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/easygui/boxes/button_box.py", line 95, in buttonbox
    bb = ButtonBox(
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/easygui/boxes/button_box.py", line 147, in __init__
    self.ui = GUItk(msg, title, choices, images, default_choice, cancel_choice, self.callback_ui)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/easygui/boxes/button_box.py", line 263, in __init__
    self.boxRoot = tk.Tk()
  File "/usr/lib/python3.10/tkinter/__init__.py", line 2299, in __init__
    self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: no display name and no $DISPLAY environment variable

I'm not joking: this worked less than a month ago. I had an SDXL 0.9 LoRA running less than three weeks ago.
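The traceback ends in `_tkinter.TclError` because easygui tries to open a Tk window and the container has no X display (`$DISPLAY` is unset). A minimal guard, sketched as hypothetical helpers rather than anything Kohya_ss provides, checks for a display before showing a GUI dialog and otherwise falls back to a default answer:

```python
import os
from typing import Callable, Mapping, Optional

def has_display(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when $DISPLAY is set and non-empty (Tk needs it to open a window)."""
    env = os.environ if env is None else env
    return bool(env.get("DISPLAY"))

def ask_or_default(ask: Callable[[], str], default: str,
                   env: Optional[Mapping[str, str]] = None) -> str:
    """Call the GUI prompt only when a display exists; otherwise return `default`."""
    if has_display(env):
        return ask()
    return default
```

Passing a function that wraps an `easygui.buttonbox(...)` call as `ask` is the intended use; in a headless container the dialog is skipped instead of raising.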

Kohya webUI is not loading

I installed the docker container like so:
sudo docker run -d --name kohya --gpus all -v '/kohya/kohya-docker/workspace' -p 3000:3001 -p 8000:8000 -p 8888:8888 -p 2999:2999 ashleykza/kohya:latest

also without the name:
sudo docker run -d --name kohya --gpus all -v '/kohya/kohya-docker/workspace' -p 3000:3001 -p 8000:8000 -p 8888:8888 -p 2999:2999 ashleykza/kohya:latest
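In both commands, `-p 3000:3001` publishes container port 3001 on host port 3000 (Docker's `-p HOST:CONTAINER` order), and `-v '/kohya/kohya-docker/workspace'` contains no colon, so Docker creates an anonymous volume mounted at that path inside the container rather than bind-mounting a host folder. The sketch below parses those two flag forms to make the mappings explicit; it handles only simple `HOST:CONTAINER` ports and `SOURCE:TARGET` or single-path volumes, not IP-bound ports, protocols or mount options.

```python
from typing import List, Tuple

def parse_publish(spec: str) -> Tuple[int, int]:
    """Parse a simple `-p HOST:CONTAINER` value into (host_port, container_port)."""
    host, container = spec.split(":")
    return int(host), int(container)

def parse_volume(spec: str) -> dict:
    """Classify a `-v` value: a single path means an anonymous volume at that container path."""
    parts = spec.split(":")
    if len(parts) == 1:
        return {"type": "anonymous", "container_path": parts[0]}
    return {"type": "bind_or_named", "source": parts[0], "container_path": parts[1]}

def summarize(args: List[str]) -> dict:
    """Collect -p and -v values from a tokenized `docker run` command."""
    ports, volumes = [], []
    it = iter(args)
    for arg in it:
        if arg == "-p":
            ports.append(parse_publish(next(it)))
        elif arg == "-v":
            volumes.append(parse_volume(next(it)))
    return {"ports": ports, "volumes": volumes}

# The command from the issue, tokenized as the shell would (quotes already removed).
cmd = ("sudo docker run -d --name kohya --gpus all -v /kohya/kohya-docker/workspace "
       "-p 3000:3001 -p 8000:8000 -p 8888:8888 -p 2999:2999 ashleykza/kohya:latest").split()
print(summarize(cmd))
```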

I can see the RunPod, JupyterLab, and RunPod Uploader pages, but the Kohya_ss web UI will not load. I am using localhost:3000. I also tried from another device on my network and cannot access the page; the logs just say "Container is READY!". It was working yesterday. I have recreated the container a few times.

Starting Jupyter Lab...
Jupyter Lab started
Starting RunPod Uploader...
RunPod Uploader started
Running pre-start script...
Template version: 24.0.6
Syncing kohya_ss to workspace, please wait...
Syncing Application Manager to workspace, please wait...
Fixing venv...
Fixing venv. Old Path: /kohya_ss/venv New Path: /workspace/kohya_ss/venv
Configuring accelerate...
Starting Kohya_ss Web UI
Kohya_ss started
Log file: /workspace/logs/kohya_ss.log
All services have been started
RUNPOD_PUBLIC_IP is not set. Skipping FileZilla configuration.
Updating rclone...
2024/04/27 22:49:13 NOTICE: rclone is up to date
Exporting environment variables...
Container is READY!
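Because the container reports itself as ready while the page does not load, it can help to check which published ports actually accept TCP connections from the host. A minimal standard-library probe, using the host ports from the `docker run` command in this issue (the helper name is illustrative), is sketched below. A successful connection only shows that something is listening, not that the Kohya_ss UI is healthy; the log file listed in the output (`/workspace/logs/kohya_ss.log`) is the next place to look.

```python
import socket
from typing import Dict, Iterable

def probe_ports(host: str, ports: Iterable[int], timeout: float = 1.0) -> Dict[int, bool]:
    """Map each port to True if a TCP connection succeeds within `timeout` seconds."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, unreachable or timed out
            results[port] = False
    return results

if __name__ == "__main__":
    print(probe_ports("localhost", [3000, 8000, 8888, 2999]))
```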

Runpod Kohya Finetune process gets killed on launch

This might be related to the recent update to 23.1, as it wasn't happening before. Could it be related to the change to Torch 2.1.2?

I do not see any messages other than "Killed".
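A bare "Killed" message is what a shell prints when a process is terminated by SIGKILL; on Linux that is often the kernel's out-of-memory killer, including when a container hits its memory limit. The sketch below runs a command and reports whether it died from SIGKILL, relying on Python's convention that a negative `returncode` is the signal number; the launched command is only a placeholder.

```python
import signal
import subprocess
import sys
from typing import List

def run_and_classify(cmd: List[str]) -> str:
    """Run `cmd` to completion and describe how it ended."""
    rc = subprocess.run(cmd).returncode
    # subprocess reports death-by-signal as a negative code (-9);
    # a shell wrapper such as `/bin/sh -c` usually reports 128 + 9 = 137 instead.
    if rc in (-signal.SIGKILL, 128 + signal.SIGKILL):
        return "killed by SIGKILL (check memory limits / OOM killer)"
    return "ok" if rc == 0 else f"exited with code {rc}"

if __name__ == "__main__":
    # Placeholder command; substitute the real launch command.
    print(run_and_classify([sys.executable, "-c", "print('hello')"]))
```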

How do I know if the training has started?

I created all the folders and uploaded the data the same way I do on my local PC, but I still get this error in Jupyter: "Image folder does not exist".
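Kohya_ss reads training images from subfolders of the training data directory whose names start with a repeat count, as with `20_wxwatch watch` inside `/workspace/organize/watches/img` in the first log on this page. A rough pre-flight check is sketched below; it assumes that `<repeats>_<name>` convention and a small set of image extensions, both of which are approximations rather than Kohya_ss's exact rules. It confirms the directory exists and counts images per folder.

```python
import re
from pathlib import Path
from typing import Dict

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}  # assumed accepted extensions
FOLDER_RE = re.compile(r"^(\d+)_(.+)$")                  # "<repeats>_<name>", e.g. "20_wxwatch watch"

def scan_train_dir(train_data_dir: str) -> Dict[str, dict]:
    """Return {folder: {repeats, images, steps}} for each valid image folder."""
    root = Path(train_data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Image folder does not exist: {root}")
    report = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        match = FOLDER_RE.match(sub.name)
        if not match:
            continue  # folders without a repeat prefix are ignored in this sketch
        images = sum(1 for f in sub.iterdir() if f.is_file() and f.suffix.lower() in IMAGE_EXTS)
        repeats = int(match.group(1))
        report[sub.name] = {"repeats": repeats, "images": images, "steps": repeats * images}
    return report
```

Pointing this at the path passed as the training data directory (the parent of the `<repeats>_<name>` folders, not one of the folders themselves) should reproduce figures like "24 images found" and "480 steps" from the first log.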

Application manager doesn't stop the application

The stable-diffusion-docker image uses a different port for Kohya_ss than this image, so stopping Kohya_ss using application manager on this image uses the wrong port.

A solution is needed so that stable-diffusion-docker and kohya-docker can use different ports.

Building a new docker image

Hi
I used the prebuilt Docker image ashleykza/kohya:latest and it runs fine. However, when I try to train, I hit an issue; I believe CUDA 11.8 is incompatible with my machine, which uses CUDA 11.6. Is the Dockerfile in this repo the same one used to build ashleykza/kohya:latest? When building it, I get `#0 324.1 E: Couldn't find any package by glob 'python3.10-venv'`, and python3.10-venv is one of the packages it installs. Thanks

UPDATE RCLONE

Hey, would you mind running:

`rclone selfupdate`

Rclone needs to be on the latest version to work with Dropbox.

It would save us the trouble.

Thank you!

Add cron

I would love to have cron added so that I can schedule regular rclone transfers during training. Checkpoint files are huge, so it helps to start transferring them in parts regularly while training is still running.
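Until cron is available in the image, a long-running loop can approximate scheduled transfers. This sketch assumes an rclone remote named `dropbox:` has already been configured (a hypothetical name), and uses `rclone copy`, which skips files already present and unchanged at the destination, together with `--min-age` so that checkpoints still being written are less likely to be picked up. The source path is the output directory from the first log on this page, and the 30-minute interval and 5-minute minimum age are arbitrary.

```python
import subprocess
import time
from typing import List

SOURCE = "/workspace/organize/watches/model"  # output_dir from the first log (example)
DEST = "dropbox:kohya-checkpoints"            # hypothetical rclone remote and folder
INTERVAL_SECONDS = 30 * 60                    # arbitrary sync interval

def build_rclone_cmd(src: str, dest: str, min_age: str = "5m") -> List[str]:
    """Copy only .safetensors files that have not been modified for at least `min_age`."""
    return ["rclone", "copy", src, dest, "--include", "*.safetensors", "--min-age", min_age]

def sync_forever(src: str = SOURCE, dest: str = DEST, interval: int = INTERVAL_SECONDS) -> None:
    """Run rclone copy, then sleep, indefinitely (stop with Ctrl+C)."""
    while True:
        result = subprocess.run(build_rclone_cmd(src, dest))
        print(f"rclone exited with code {result.returncode}")
        time.sleep(interval)
```

Calling `sync_forever()` from a JupyterLab terminal or notebook while training runs would upload each per-epoch checkpoint (with `--save_every_n_epochs="1"` as in the first log) shortly after it is written.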
