Comments (5)
Here's the output I'm getting while training.
Running on GPUs 0,
Perfusion: Running in eps-prediction mode
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
DiffusionWrapper has 867.83 M params.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Restored from ./ckpt/v2-1_512-ema-pruned.ckpt with 38 missing and 2 unexpected keys
Missing Keys:
['logvar', 'C_inv', 'target_input', 'model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.target_output', 'model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_v.target_output', 'model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_k.target_output', 'model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.target_output', 'model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_k.target_output', 'model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_v.target_output', 'model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_k.target_output', 'model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_v.target_output', 'model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_k.target_output', 'model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_v.target_output', 'model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_k.target_output', 'model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_v.target_output', 'model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_k.target_output', 'model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_v.target_output', 'model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_k.target_output', 'model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_v.target_output', 'model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_k.target_output', 'model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_v.target_output', 'model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k.target_output', 'model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_v.target_output', 'model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_k.target_output', 'model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_v.target_output', 'model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_k.target_output', 'model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_v.target_output', 'model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_k.target_output', 'model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_v.target_output', 'model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_k.target_output', 'model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_v.target_output', 'model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_k.target_output', 'model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_v.target_output', 'model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.target_output', 'model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_v.target_output', 'embedding_manager.string_to_param_dict.*', 'embedding_manager.initial_embeddings.*', 'embedding_manager.get_embedding_for_tkn.weight']
Unexpected Keys:
['model_ema.decay', 'model_ema.num_updates']
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
accumulate_grad_batches = 4
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
-------------------------------------------------------------
0 | model | DiffusionWrapper | 867 M
1 | first_stage_model | AutoencoderKL | 83.7 M
2 | cond_stage_model | FrozenOpenCLIPEmbedder | 354 M
3 | embedding_manager | EmbeddingManager | 50.6 M
-------------------------------------------------------------
961 K Trainable params
1.3 B Non-trainable params
1.3 B Total params
2,611.042 Total estimated model params size (MB)
Save project config
Save lightning config
Epoch 0: 0%| | 0/5000 [00:00<?, ?it/s]Data shape for DDIM sampling is (4, 4, 64, 64), eta 0.0
Running DDIM Sampling with 50 timesteps
DDIM Sampler: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:22<00:00, 2.21it/s]
Epoch 0: 0%| | 20/5000 [00:59<4:08:41, 3.00s/it, loss=0.0518, v_num=, train/loss_simple_step=0.0522, train/loss_vlb_step=0.000219, train/los
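A note on the "Restored from ./ckpt/v2-1_512-ema-pruned.ckpt with 38 missing and 2 unexpected keys" line above: this is expected rather than an error. The Perfusion-specific parameters (logvar, C_inv, target_input, the per-layer attn2.to_k/to_v target_output entries, and the embedding_manager weights) do not exist in the base Stable Diffusion 2.1 checkpoint, so they are reported as missing and initialized by the new code, while the checkpoint's model_ema.decay and model_ema.num_updates entries have no counterpart in this model and are reported as unexpected. Below is a minimal sketch of how such a non-strict restore typically looks; the function name and loading details are illustrative, not the repo's actual code.

```python
# Hypothetical sketch of restoring a base SD checkpoint into an extended model
# with load_state_dict(strict=False); names here are illustrative only.
import torch

def load_pretrained(model, ckpt_path="./ckpt/v2-1_512-ema-pruned.ckpt"):
    state = torch.load(ckpt_path, map_location="cpu")
    sd = state.get("state_dict", state)  # ldm-style checkpoints nest weights under "state_dict"
    missing, unexpected = model.load_state_dict(sd, strict=False)
    # Newly introduced parameters (C_inv, target_input, *.target_output,
    # embedding_manager.*) show up as "missing"; checkpoint entries the model
    # does not define (model_ema.decay, model_ema.num_updates) as "unexpected".
    print(f"Restored from {ckpt_path} with {len(missing)} missing and {len(unexpected)} unexpected keys")
    return model
```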
Me too.
@SlZeroth @PaulToast I'm experiencing the same thing, even when running on a relatively beefy GPU. Were you able to speed up training somehow?
Well, mainly I was given a more powerful graphics card, haha. I don't think the training was going that slow after all; the numbers are a little misleading. It's not supposed to go through multiple epochs, it just runs until it reaches the max_steps set in the config. You will probably never have to train further than step=600 or so.
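For context, the progress bar counts dataloader batches while training length is controlled by optimizer steps: with accumulate_grad_batches = 4 (as in the log above), the 5000 batches of "Epoch 0" correspond to only 1250 optimizer steps, and at roughly 3 s/it the ~600 steps mentioned above work out to about 2400 batches, i.e. around two hours rather than the multi-epoch run the bar suggests. A hedged sketch of the relevant Lightning settings follows (argument names depend on the PyTorch Lightning version; this is not the repo's exact trainer code):

```python
# Minimal sketch with assumed values: training stops at max_steps, so a full
# epoch of the 5000-batch dataloader never has to complete.
import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,
    precision=16,                # "Using 16bit native Automatic Mixed Precision (AMP)"
    accumulate_grad_batches=4,   # 4 dataloader batches per optimizer step, as in the log
    max_steps=600,               # stop after ~600 optimizer steps, well short of one epoch
)
# trainer.fit(model, data)       # model and data come from the project's configs
```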
Thanks @PaulToast, yeah, that makes sense. I eventually figured it out after I found a mistake in how I was passing the args to the training script 🤦🏻♂️
Related Issues (15)
- Questions about the reason for applying EMA to the target input i*
- AssertionError: String 'birdbath' maps to more than a single token.
- How does this excellent work apply to the webui?
- training image folder option not present in the configs/perfusion_inference_sd_v2.yaml HOT 1
- shape mismatch: value tensor of shape [1024] cannot be broadcast to indexing result of shape [0, 768]
- missing 38 keys and having 2 unexpected keys
- Where should I put the downloaded pre-trained model?
- Device error HOT 4
- Any plan for supporting Support SDXL-1.0? HOT 1
- No font file font "data/DejaVuSans.ttf" HOT 2
- Load Model Problem TypeError: expected str, bytes or os.PathLike object, not NoneType
- Problem when running training with num_vectors_per_token>1 HOT 1
- Using CLIP similarity to automatically select a balanced weight is necessary.
- multiconcept for woman and teddy