
kohya-docker's People

Contributors

ashleykleynhans

kohya-docker's Issues

[bug] `accelerate` not found with runpod "ashleykza/kohya:1.11.1"

While running a RunPod container with this setup, I'm getting an `accelerate: not found` error. Any tips on how to debug it?
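A quick first check, sketched in Python: the error comes from `/bin/sh`, so it is worth confirming whether an `accelerate` executable is visible on the `PATH` that shell inherits, and whether one exists in the venv at `/workspace/kohya_ss/venv` (the path printed by the container startup log in another issue on this page). The helper `find_accelerate` is hypothetical; this is a debugging sketch, not part of kohya_ss.

```python
import os
import shutil
from typing import Dict, Optional

# Assumption: venv location as printed by the container logs ("New Path: /workspace/kohya_ss/venv").
DEFAULT_VENV_BIN = "/workspace/kohya_ss/venv/bin"


def find_accelerate(path_env: Optional[str] = None,
                    venv_bin: str = DEFAULT_VENV_BIN) -> Dict[str, Optional[str]]:
    """Hypothetical helper: report where an `accelerate` executable can be found.

    path_env=None means "use this process's $PATH", which is shutil.which's default.
    """
    on_path = shutil.which("accelerate", path=path_env)
    candidate = os.path.join(venv_bin, "accelerate")
    in_venv = candidate if os.path.isfile(candidate) and os.access(candidate, os.X_OK) else None
    return {"on_path": on_path, "in_venv": in_venv}


if __name__ == "__main__":
    result = find_accelerate()
    print(f"accelerate on PATH: {result['on_path']}")
    print(f"accelerate in venv: {result['in_venv']}")
    if result["on_path"] is None and result["in_venv"] is not None:
        # The binary exists, but a shell using this PATH would not find it.
        print(f"Try adding {DEFAULT_VENV_BIN} to PATH or activating the venv first.")
```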

Full logs below

04:39:07-748602 INFO     Start training LoRA Standard ...                       
04:39:07-750284 INFO     Checking for duplicate image filenames in training data
                         directory...                                           
04:39:07-752158 INFO     Valid image folder names found in:                     
                         /workspace/organize/watches/img                        
04:39:07-753621 INFO     Headless mode, skipping verification if model already  
                         exist... if model already exist it will be             
                         overwritten...                                         
04:39:07-755396 INFO     Folder 20_wxwatch watch: 24 images found               
04:39:07-756743 INFO     Folder 20_wxwatch watch: 480 steps                     
04:39:07-758027 INFO     Total steps: 480                                       
04:39:07-759290 INFO     Train batch size: 2                                    
04:39:07-759959 INFO     Gradient accumulation steps: 1                         
04:39:07-760572 INFO     Epoch: 10                                              
04:39:07-761151 INFO     Regulatization factor: 1                               
04:39:07-761755 INFO     max_train_steps (480 / 2 / 1 * 10 * 1) = 2400          
04:39:07-762504 INFO     stop_text_encoder_training = 0                         
04:39:07-763121 INFO     lr_warmup_steps = 240                                  
04:39:07-763757 INFO     Can't use LR warmup with LR Scheduler constant...      
                         ignoring...                                            
04:39:07-764482 INFO     Saving training config to                              
                         /workspace/organize/watches/model/wxwatches_20240119-04
                         3907.json...                                           
04:39:07-765433 INFO     accelerate launch --num_cpu_threads_per_process=2      
                         "./sdxl_train_network.py" --enable_bucket              
                         --min_bucket_reso=256 --max_bucket_reso=2048           
                         --pretrained_model_name_or_path="stabilityai/stable-dif
                         fusion-xl-base-1.0"                                    
                         --train_data_dir="/workspace/organize/watches/img"     
                         --resolution="1024,1024"                               
                         --output_dir="/workspace/organize/watches/model"       
                         --logging_dir="/workspace/organize/watches/log"        
                         --network_alpha="1" --save_model_as=safetensors        
                         --network_module=networks.lora --text_encoder_lr=5e-05 
                         --unet_lr=0.0001 --network_dim=8                       
                         --output_name="wxwatches"                              
                         --lr_scheduler_num_cycles="10" --no_half_vae           
                         --learning_rate="5e-05" --lr_scheduler="constant"      
                         --train_batch_size="2" --max_train_steps="2400"        
                         --save_every_n_epochs="1" --mixed_precision="fp16"     
                         --save_precision="fp16" --cache_latents                
                         --cache_latents_to_disk --optimizer_type="Adafactor"   
                         --optimizer_args scale_parameter=False                 
                         relative_step=False warmup_init=False                  
                         --max_grad_norm="1" --max_data_loader_n_workers="0"    
                         --bucket_reso_steps=64 --xformers --bucket_no_upscale  
                         --noise_offset=0.0                                     
/bin/sh: 1: accelerate: not found
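The `max_train_steps` line in the log can be reproduced by hand. A small Python sketch, assuming the `20_` prefix on the image folder is the per-image repeat count (24 images * 20 repeats gives the 480 steps the log reports) and that fractional results would be rounded up; the function name is illustrative only:

```python
import math


def max_train_steps(images: int, repeats: int, batch_size: int,
                    grad_accum_steps: int, epochs: int, reg_factor: int = 1) -> int:
    """Illustrative re-derivation of 'max_train_steps (480 / 2 / 1 * 10 * 1) = 2400'."""
    total_steps = images * repeats  # 24 * 20 = 480 ("Total steps: 480")
    # Same order of operations as the log: divide by batch size and gradient
    # accumulation, then multiply by epochs and the regularization factor.
    # Rounding up is an assumption for inputs that do not divide evenly.
    return math.ceil(total_steps / batch_size / grad_accum_steps * epochs * reg_factor)


# Values from the log: 24 images, 20 repeats, batch 2, accumulation 1, 10 epochs.
print(max_train_steps(24, 20, 2, 1, 10, 1))  # 2400
```

With these settings each epoch is 480 / 2 = 240 optimizer steps, so 10 epochs give 2400.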

Require advice on how to use it

Hi

I've managed to start the Docker container, but I'm lost on what to do next. Could you give advice on how to run the UI to do LoRA fine-tuning?

Old school issue popping up

This is what I get when I try to do the log thing:

    return buttonbox(msg=msg,
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/easygui/boxes/button_box.py", line 95, in buttonbox
    bb = ButtonBox(
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/easygui/boxes/button_box.py", line 147, in __init__
    self.ui = GUItk(msg, title, choices, images, default_choice, cancel_choice, self.callback_ui)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/easygui/boxes/button_box.py", line 263, in __init__
    self.boxRoot = tk.Tk()
  File "/usr/lib/python3.10/tkinter/__init__.py", line 2299, in __init__
    self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: no display name and no $DISPLAY environment variable

I'm not joking: this worked less than a month ago. I got an SDXL 0.9 LoRA running less than three weeks ago.
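The last line of the traceback says tkinter could not find an X display, which is expected in a headless container: easygui's `buttonbox` opens a Tk window. A minimal Python sketch of guarding such a prompt, assuming you can change the calling code; the helper names and the fall-back-to-a-default behavior are hypothetical, not how kohya_ss itself handles it:

```python
import os
from typing import Mapping, Optional, Sequence


def has_display(environ: Optional[Mapping[str, str]] = None) -> bool:
    """True if $DISPLAY is set to a non-empty value (tkinter needs it on X11)."""
    env = os.environ if environ is None else environ
    return bool(env.get("DISPLAY"))


def ask_choice(msg: str, choices: Sequence[str], default: str,
               environ: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical wrapper: show an easygui button box only when a display exists."""
    if has_display(environ):
        import easygui  # imported lazily so headless runs never touch tkinter
        answer = easygui.buttonbox(msg=msg, choices=list(choices))
        # Fall back to the default if nothing usable came back.
        return answer if answer is not None else default
    # Headless: skip the dialog and use the default instead of raising TclError.
    print(f"[headless] {msg!r} -> {default!r}")
    return default
```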

Kohya webUI is not loading

I installed the Docker container like so:
sudo docker run -d --name kohya --gpus all -v '/kohya/kohya-docker/workspace' -p 3000:3001 -p 8000:8000 -p 8888:8888 -p 2999:2999 ashleykza/kohya:latest

Also without the name:
sudo docker run -d --name kohya --gpus all -v '/kohya/kohya-docker/workspace' -p 3000:3001 -p 8000:8000 -p 8888:8888 -p 2999:2999 ashleykza/kohya:latest

I'm able to see RunPod, JupyterLab, and the RunPod Uploader, but the kohya_ss web UI will not load. I am on localhost:3000. I tried opening the page from another device on my network and could not access it either, and the logs just say "Container is READY". It was working yesterday. I have recreated the container a few times.

Starting Jupyter Lab...
Jupyter Lab started
Starting RunPod Uploader...
RunPod Uploader started
Running pre-start script...
Template version: 24.0.6
Syncing kohya_ss to workspace, please wait...
Syncing Application Manager to workspace, please wait...
Fixing venv...
Fixing venv. Old Path: /kohya_ss/venv New Path: /workspace/kohya_ss/venv
Configuring accelerate...
Starting Kohya_ss Web UI
Kohya_ss started
Log file: /workspace/logs/kohya_ss.log
All services have been started
RUNPOD_PUBLIC_IP is not set. Skipping FileZilla configuration.
Updating rclone...
2024/04/27 22:49:13 NOTICE: rclone is up to date
Exporting environment variables...
Container is READY!
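Since the log reports every service as started, one way to narrow this down is to check whether anything is actually accepting TCP connections on the port being browsed to. A minimal Python sketch using only the standard library (the function name is illustrative); if nothing answers, the Kohya_ss log at `/workspace/logs/kohya_ss.log` is the next place to look:

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, unreachable, or timed out
        return False
```

Usage: on the host, `port_open("localhost", 3000)`. Note that `-p 3000:3001` in the commands above publishes container port 3001 on host port 3000, so a check run inside the container would target port 3001.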

Runpod Kohya Finetune process gets killed on launch

This might be related to the recent update to 23.1, as it wasn't happening before. Could it be related to the change to Torch 2.1.2?

I do not see any messages other than "Killed".
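A bare "Killed" usually means the process received SIGKILL; a common source in training containers is the Linux out-of-memory killer, although nothing in this report confirms that is what happened here. A minimal Python sketch for telling a signal-terminated launch apart from a normal error exit; the wrapper is hypothetical, and it relies on the POSIX behavior of `subprocess` reporting death by signal N as return code -N:

```python
import signal
import subprocess
import sys
from typing import Sequence


def run_and_explain(cmd: Sequence[str]) -> str:
    """Run `cmd` and describe how it ended (POSIX semantics assumed)."""
    rc = subprocess.run(list(cmd)).returncode
    if rc == 0:
        return "ok"
    if rc < 0:
        sig = signal.Signals(-rc)
        if sig == signal.SIGKILL:
            # The OOM killer uses SIGKILL; the host's kernel log (e.g. dmesg) records it.
            return "terminated by SIGKILL (check the host kernel log for OOM-killer messages)"
        return f"terminated by {sig.name}"
    return f"exited with status {rc}"


if __name__ == "__main__":
    # Example only: wrap a trivial child process; the training command would go here.
    print(run_and_explain([sys.executable, "-c", "print('hello')"]))
```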

Application manager doesn't stop the application

The stable-diffusion-docker image uses a different port for Kohya_ss than this image, so stopping Kohya_ss using application manager on this image uses the wrong port.

Need to find a solution to be able to use different ports for stable-diffusion-docker and kohya-docker.
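One possible direction, sketched in Python: instead of hard-coding Kohya_ss's port, the application manager could read it from an environment variable that each image sets to its own value. The variable name `KOHYA_PORT` and the default in the usage line are hypothetical placeholders, not values either image is known to define:

```python
import os
from typing import Mapping, Optional


def resolve_port(var: str, default: int,
                 env: Optional[Mapping[str, str]] = None) -> int:
    """Read a TCP port from environment variable `var`, falling back to `default`.

    Hypothetical: each image would export its own value so the application
    manager stops the app on the right port.
    """
    source = os.environ if env is None else env
    raw = source.get(var)
    if raw is None or raw.strip() == "":
        return default
    port = int(raw)  # raises ValueError for non-numeric values
    if not 1 <= port <= 65535:
        raise ValueError(f"{var}={raw!r} is not a valid TCP port")
    return port
```

Usage: `port = resolve_port("KOHYA_PORT", default=3001)`, with each image's Dockerfile setting its own `KOHYA_PORT`.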

Building a new docker image

Hi
I used the prebuilt Docker image ashleykza/kohya:latest and it runs fine. However, when trying to train, I hit an issue that I believe is caused by an incompatibility between CUDA 11.8 and my machine, which is using CUDA 11.6. Could you confirm whether the Dockerfile in this repo is the same one used to build "ashleykza/kohya:latest"? I ask because building it myself fails with "#0 324.1 E: Couldn't find any package by glob 'python3.10-venv'", and python3.10-venv is one of the packages it installs. Thanks!
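On the CUDA question: a container ships its own CUDA runtime, and what the host contributes is the NVIDIA driver; the header of `nvidia-smi` prints the highest CUDA version that driver supports. A small Python sketch (function names illustrative) that extracts that number so it can be compared with the image's 11.8. CUDA's minor-version compatibility means a lower number is not automatically fatal, so treat this as a first check only:

```python
import re
import subprocess
from typing import Optional, Tuple

_CUDA_RE = re.compile(r"CUDA Version:\s*(\d+)\.(\d+)")


def driver_cuda_version(nvidia_smi_output: str) -> Optional[Tuple[int, int]]:
    """Parse 'CUDA Version: X.Y' from nvidia-smi's header; None if absent."""
    m = _CUDA_RE.search(nvidia_smi_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def query_driver_cuda_version() -> Optional[Tuple[int, int]]:
    """Run nvidia-smi (must be on PATH) and parse the driver's CUDA version."""
    out = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=True).stdout
    return driver_cuda_version(out)
```

Usage: `v = query_driver_cuda_version()`, then compare `v >= (11, 8)` when `v` is not `None`.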

UPDATE RCLONE

Hey, would you mind running:

`rclone selfupdate`

Rclone needs to be on the latest version to work with Dropbox.

It would save us the trouble.

Thank you!
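If the update should only run when needed, the version check can be scripted. A Python sketch, assuming the first line of `rclone version` has the form `rclone v1.66.0`; the minimum version is a parameter because this request does not say which release Dropbox requires, and the helper names are hypothetical:

```python
import re
import subprocess
from typing import Optional, Tuple

_VERSION_RE = re.compile(r"rclone v(\d+)\.(\d+)\.(\d+)")


def parse_rclone_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from `rclone version` output."""
    m = _VERSION_RE.search(output)
    return (int(m.group(1)), int(m.group(2)), int(m.group(3))) if m else None


def needs_update(output: str, minimum: Tuple[int, int, int]) -> bool:
    current = parse_rclone_version(output)
    # Treat an unparseable version as "update needed" to be safe.
    return current is None or current < minimum


def update_if_needed(minimum: Tuple[int, int, int]) -> bool:
    """Run `rclone selfupdate` when the installed version is below `minimum`."""
    out = subprocess.run(["rclone", "version"], capture_output=True, text=True).stdout
    if needs_update(out, minimum):
        subprocess.run(["rclone", "selfupdate"], check=True)
        return True
    return False
```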

Add cron

I would love cron to be added in order to schedule regular rclone transfers during training. Checkpoint files are huge, so it helps to start transferring them in parts regularly while training is still running.
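Until cron is available in the image, a similar effect can be had with a small loop running alongside training. A Python sketch: `rclone copy` skips files that are already identical at the destination and never deletes anything there, so repeated runs upload new checkpoints incrementally. The source path and the remote name `dropbox:kohya-checkpoints` are placeholders, and the helper names are hypothetical:

```python
import subprocess
import time
from typing import Callable, List


def rclone_copy_cmd(src: str, dst: str) -> List[str]:
    """argv for `rclone copy SRC DST` (copies new/changed files, never deletes)."""
    return ["rclone", "copy", src, dst]


def run_every(interval_s: float, task: Callable[[], None], iterations: int) -> None:
    """Call `task` `iterations` times, sleeping `interval_s` seconds between calls."""
    for i in range(iterations):
        task()
        if i < iterations - 1:
            time.sleep(interval_s)


def sync_checkpoints(src: str, dst: str) -> None:
    # check=False: a failed transfer is simply retried on the next tick.
    subprocess.run(rclone_copy_cmd(src, dst), check=False)


# Placeholder usage: sync every 30 minutes for 24 hours.
# run_every(30 * 60, lambda: sync_checkpoints("/workspace/organize/watches/model",
#                                             "dropbox:kohya-checkpoints"), iterations=48)
```

With cron installed, the equivalent crontab entry would be `*/30 * * * * rclone copy /workspace/organize/watches/model dropbox:kohya-checkpoints`.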
