diffbir's People

Contributors

0x3f3f3f3fun, dashbe, ziyannchen

diffbir's Issues

RuntimeError: User specified an unsupported autocast device_type 'cuda:0'

Hello Team,

When trying to run the following command: python inference.py --config configs/model/cldm.yaml --ckpt weights/general_full_v1.ckpt --steps 50 --sr_scale 1 --image_size 512 --input results/maxout/ --color_fix_type wavelet --resize_back --output results/detailed/ --disable_preprocess_model --device cuda in a Linux Docker container (with an AMD 7900 XT), I get the following error:

RuntimeError: User specified an unsupported autocast device_type 'cuda:0'

Before commit 30355a1 I was able to perform some image restoration; after pulling the latest commit, I get the error above.

Best regards,

Nikos
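
A hedged observation: torch.autocast expects a device type string such as "cuda", not an indexed device like "cuda:0", so if the script forwards its resolved device string straight into autocast, this RuntimeError appears. A minimal sketch of the kind of workaround (variable names are mine, not from the repo):

import torch

device = "cuda:0"                       # e.g. what a --device argument might resolve to
autocast_device = device.split(":")[0]  # strip the index -> "cuda"

with torch.autocast(device_type=autocast_device):
    pass  # the model forward pass would go here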

24GB GPU out of memory

How much GPU RAM does it need to run? I have a 24 GB 3090 and it still runs out of memory.

Installation problems on Windows

1. When I try to run conda install xformers==0.0.16 -c xformers,

I receive the following message:

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  - xformers==0.0.16

Current channels:

  - https://conda.anaconda.org/xformers/win-64
  - https://conda.anaconda.org/xformers/noarch
  - https://conda.anaconda.org/conda-forge/win-64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://conda.anaconda.org/pytorch/win-64
  - https://conda.anaconda.org/pytorch/noarch
  - https://repo.anaconda.com/pkgs/main/win-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/win-64
  - https://repo.anaconda.com/pkgs/r/noarch
  - https://repo.anaconda.com/pkgs/msys2/win-64
  - https://repo.anaconda.com/pkgs/msys2/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

2. When I try to run pip install -r requirements.txt,

the following error appears:

ERROR: Could not find a version that satisfies the requirement triton (from versions: none)
ERROR: No matching distribution found for triton

Can you please explain what these errors could be related to, and how I can avoid both of them?

Thanks.
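
A hedged workaround, not verified on every Windows setup: as the conda output shows, xformers 0.0.16 is not available from those channels for win-64, and the pip error indicates triton has no Windows wheels at all. Installing xformers from PyPI and removing the triton line from requirements.txt usually gets past both errors (other logs in this list show "No module 'xformers'. Proceeding without it.", so xformers itself appears to be optional):

pip install xformers
# edit requirements.txt: delete or comment out the line that says "triton"
pip install -r requirements.txt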

Question about video memory consumption, and computational resource requirements

There is a similar work, StableSR, which also uses the Stable Diffusion base model, and its GPU memory requirements are huge; sometimes even tiling does not help.

StableSR is not able to properly process images with resolution higher than 190 px. I tried it personally with different data and parameters, and it was just a huge waste of time.

How much more accessible would your work be for most graphics accelerators, such as those with 8 or 16 GB of video memory?

Thank you very much for your attention.

Is the calculation of loss influenced by I_{HQ} during the training of LAControlNet?

Does I_{HQ} only play a role during SwinIR training? I couldn't locate any involvement of I_{HQ} in the LAControlNet training code. Is "|I_{HQ} - I_{reg}|" computed as part of the LAControlNet training process, or does LAControlNet only need I_{reg} as input? If I train LAControlNet directly from I_{HQ} and I_{reg} pairs, will I_{HQ} be involved in the process?
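
For what it's worth, my (hedged) reading of the paper: the pixel-space term |I_{HQ} - I_{reg}| belongs to the stage-1 restoration (SwinIR) training, while stage-2 LAControlNet is trained with the usual latent-diffusion noise-prediction objective, in which I_{HQ} still enters as the clean target latent and I_{reg} as the condition:

\mathcal{L}_{diff} = \mathbb{E}_{z_0,\, c,\, t,\, \epsilon \sim \mathcal{N}(0, I)} \left[ \left\| \epsilon - \epsilon_\theta(z_t, c, t) \right\|_2^2 \right], \qquad z_0 = \mathcal{E}(I_{HQ}), \quad c = \mathcal{E}(I_{reg})

So, on this reading, I_{HQ} is not compared with I_{reg} in stage 2, but it is still needed as the diffusion target.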

PermissionError: [Errno 13] Permission denied

(diffbir) E:\AI>python e:\ai\diffbir\inference.py --config E:\AI\DiffBIR\configs\model\cldm.yaml --ckpt E:\AI\DiffBIR\ckpt --reload_swinir --swinir_ckpt E:\AI\DiffBIR\ckpt --steps 50 --input E:\AI\DiffBIR\lq_dir --sr_scale 1 --image_size 512 --color_fix_type wavelet --resize_back --output E:\AI\DiffBIR\hq_dir
E:\Anaconda3\envs\diffbir\lib\site-packages\torchaudio\backend\utils.py:62: UserWarning: No audio backend is available.
warnings.warn("No audio backend is available.")
No module 'xformers'. Proceeding without it.
Global seed set to 231
ControlLDM: Running in eps-prediction mode
DiffusionWrapper has 865.91 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
E:\Anaconda3\envs\diffbir\lib\site-packages\torch\functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorShape.cpp:2895.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
E:\Anaconda3\envs\diffbir\lib\site-packages\torchvision\models_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
warnings.warn(
E:\Anaconda3\envs\diffbir\lib\site-packages\torchvision\models_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing weights=AlexNet_Weights.IMAGENET1K_V1. You can also use weights=AlexNet_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
Loading model from: E:\Anaconda3\envs\diffbir\lib\site-packages\lpips\weights\v0.1\alex.pth
Traceback (most recent call last):
File "e:\ai\diffbir\inference.py", line 212, in
main()
File "e:\ai\diffbir\inference.py", line 140, in main
load_state_dict(model, torch.load(args.ckpt, map_location="cpu"), strict=True)
File "E:\Anaconda3\envs\diffbir\lib\site-packages\torch\serialization.py", line 699, in load
with _open_file_like(f, 'rb') as opened_file:
File "E:\Anaconda3\envs\diffbir\lib\site-packages\torch\serialization.py", line 230, in _open_file_like
return _open_file(name_or_buffer, mode)
File "E:\Anaconda3\envs\diffbir\lib\site-packages\torch\serialization.py", line 211, in init
super(_open_file, self).init(open(name, mode))
PermissionError: [Errno 13] Permission denied: 'E:\AI\DiffBIR\ckpt'
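
A hedged guess: --ckpt and --swinir_ckpt are both set to E:\AI\DiffBIR\ckpt, which looks like a folder, and opening a directory on Windows raises exactly this Errno 13. Pointing each flag at a checkpoint file itself should help; the filenames below are the released weights, adjust to wherever you actually saved them:

python e:\ai\diffbir\inference.py --config E:\AI\DiffBIR\configs\model\cldm.yaml --ckpt E:\AI\DiffBIR\ckpt\general_full_v1.ckpt --reload_swinir --swinir_ckpt E:\AI\DiffBIR\ckpt\general_swinir_v1.ckpt --steps 50 --input E:\AI\DiffBIR\lq_dir --sr_scale 1 --image_size 512 --color_fix_type wavelet --resize_back --output E:\AI\DiffBIR\hq_dir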

About this error? (run on M2)

(diffbir) pwoj@pwoj-mbpro DiffBIR % python gradio_diffbir.py --ckpt general_full_v1.ckpt --config configs/model/cldm.yaml --reload_swinir --swinir_ckpt general_swinir_v1.ckpt
Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.
Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.
OMP: Error #15: Initializing libiomp5.dylib, but found libomp.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
zsh: abort      python gradio_diffbir.py --ckpt general_full_v1.ckpt --config --reload_swini
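
The log itself points at the usual workaround: two OpenMP runtimes are loaded, and setting KMP_DUPLICATE_LIB_OK (unsafe, but commonly used) lets the process continue. A minimal sketch:

export KMP_DUPLICATE_LIB_OK=TRUE
python gradio_diffbir.py --ckpt general_full_v1.ckpt --config configs/model/cldm.yaml --reload_swinir --swinir_ckpt general_swinir_v1.ckpt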

Failure during training

I started training on an A100 GPU with about 2000 training images. It completed about 900 epochs, then the process ended abruptly without any errors. I can see several checkpoint step files.
I also tried to restart the training by setting the resume path to the folder containing the step files, but it gives the error that {folder} is a directory.
Any help would be highly appreciated.

Thanks
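
A hedged guess: the resume option likely expects one specific checkpoint file rather than the directory holding them, hence the "{folder} is a directory" error. A small sketch for picking the newest step file to pass as the resume path (the folder path below is hypothetical):

import glob
import os

ckpt_dir = "path/to/checkpoints"  # hypothetical folder containing the step=*.ckpt files
ckpts = glob.glob(os.path.join(ckpt_dir, "*.ckpt"))
latest = max(ckpts, key=os.path.getmtime)
print(latest)  # use this file path (not the folder) as the resume checkpoint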

Error in inference

Thanks for the excellent work, but I get some errors here (error screenshot attached).
I put LQ images with size 512 in /home/notebook/data/group/DiffusionFace/DiffBIR/image/TestCrop.
I have no idea what causes the error. Thanks in advance.

failed finding central directory

for general image inference

/home/bc/Projects/OpenSource/DiffBIR/venvDiffBIR/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
/home/bc/Projects/OpenSource/DiffBIR/venvDiffBIR/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/home/bc/Projects/OpenSource/DiffBIR/venvDiffBIR/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=AlexNet_Weights.IMAGENET1K_V1. You can also use weights=AlexNet_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
Loading model from: /home/bc/Projects/OpenSource/DiffBIR/venvDiffBIR/lib/python3.10/site-packages/lpips/weights/v0.1/alex.pth
Traceback (most recent call last):
File "/home/bc/Projects/OpenSource/DiffBIR/inference.py", line 216, in
main()
File "/home/bc/Projects/OpenSource/DiffBIR/inference.py", line 141, in main
load_state_dict(model, torch.load(args.ckpt, map_location="cpu"), strict=True)
File "/home/bc/Projects/OpenSource/DiffBIR/venvDiffBIR/lib/python3.10/site-packages/torch/serialization.py", line 777, in load
with _open_zipfile_reader(opened_file) as opened_zipfile:
File "/home/bc/Projects/OpenSource/DiffBIR/venvDiffBIR/lib/python3.10/site-packages/torch/serialization.py", line 282, in init
super(_open_zipfile_reader, self).init(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
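
A hedged note: this error usually means the checkpoint file is a truncated or corrupted download, since torch.save checkpoints are zip archives. A quick sanity check before re-downloading the weights:

import os
import zipfile

ckpt_path = "weights/general_full_v1.ckpt"  # adjust to the checkpoint being loaded
print(os.path.getsize(ckpt_path))           # a complete download should be gigabytes, not kilobytes
print(zipfile.is_zipfile(ckpt_path))        # False here means the file is incomplete or corrupted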

`requirements.txt`

Thanks for this work! Now, I'm getting a strange error where it's telling me ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed' but I already have pytorch-lightning==1.8.5.post0. Any idea?

Here's the full stack trace

Traceback (most recent call last):
  File "/content/diffbir/inference.py", line 15, in <module>
    from model.cldm import ControlLDM
  File "/content/diffbir/model/cldm.py", line 18, in <module>
    from ldm.models.diffusion.ddpm import LatentDiffusion
  File "/content/diffbir/ldm/models/diffusion/ddpm.py", line 20, in <module>
    from pytorch_lightning.utilities.distributed import rank_zero_only
ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed'
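
A hedged fix that has worked on similar setups: in newer pytorch-lightning releases, rank_zero_only no longer lives in pytorch_lightning.utilities.distributed, so if the installed version doesn't match the pinned one, editing the import in ldm/models/diffusion/ddpm.py to fall back to the newer location avoids the error (alternatively, reinstall the pinned pytorch-lightning version):

try:
    from pytorch_lightning.utilities.distributed import rank_zero_only
except ImportError:
    # location used by newer pytorch-lightning releases
    from pytorch_lightning.utilities.rank_zero import rank_zero_only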

Model downloading problems: HuggingFace connection error

When I tried to run inference, this error happened. I don't know how to solve this problem.

huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.
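
A hedged workaround: the missing file is most likely the OpenCLIP text-encoder weights that the model config pulls from the Hugging Face Hub when the model is built. Pre-fetching them into the local cache on a machine/network that can reach the Hub lets a later run find them; the repo id and filename below are my assumption of the LAION ViT-H/14 weights that open_clip resolves to:

from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="laion/CLIP-ViT-H-14-laion2B-s32B-b79K",  # assumed OpenCLIP weights repo
    filename="open_clip_pytorch_model.bin",           # assumed filename
)
print(path)  # cached location; subsequent runs read from the cache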

Issue when try to train model

I created the split lists train.list and val.list and edited the config files according to the instructions in the README, but I got an error. It seems like there is a problem reading the image data, but I checked again and couldn't find anything wrong. Can you help me?

(error screenshot attached)

Does DiffBIR support input at arbitrary resolutions?

Excellent work, but I noticed while reading the paper and code that it seems to not support arbitrary resolution input (instead, it forcibly scales the image). This feature is supported in both PatchDM and StableSR.

If it indeed doesn't support large image restoration, are there plans to include this feature?

Multi GPU Support?

Loving the results, but I'm maxing out my GPU's 24 GB of VRAM. Can this be run with multiple GPUs, continuing on the second GPU so it doesn't run out of VRAM? Do I have to make any changes to this:

python inference.py \
--input inputs/general \
--config configs/model/cldm.yaml \
--ckpt weights/general_full_v1.ckpt \
--reload_swinir --swinir_ckpt weights/general_swinir_v1.ckpt \
--steps 50 \
--sr_scale 4 \
--image_size 512 \
--color_fix_type wavelet --resize_back \
--output results/general

?

6gb vram LAPTOP, CUDA out of memory

PS E:\AI\DiffBIR\DiffBIR> venv\Scripts\activate.ps1
(venv) PS E:\AI\DiffBIR\DiffBIR> python gradio_diffbir.py --ckpt ./general_full_v1.ckpt --config configs/model/cldm.yaml
--reload_swinir --swinir_ckpt ./general_swinir_v1.ckpt --device cuda
ControlLDM: Running in eps-prediction mode
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
DiffusionWrapper has 865.91 M params.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
E:\AI\DiffBIR\DiffBIR\venv\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ..\aten\src\ATen\native\TensorShape.cpp:3484.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
E:\AI\DiffBIR\DiffBIR\venv\lib\site-packages\torchvision\models_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
E:\AI\DiffBIR\DiffBIR\venv\lib\site-packages\torchvision\models_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=AlexNet_Weights.IMAGENET1K_V1. You can also use weights=AlexNet_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
Loading model from: E:\AI\DiffBIR\DiffBIR\venv\lib\site-packages\lpips\weights\v0.1\alex.pth
reload swinir model from ./general_swinir_v1.ckpt
Traceback (most recent call last):
File "E:\AI\DiffBIR\DiffBIR\gradio_diffbir.py", line 40, in
model.to(args.device)
File "E:\AI\DiffBIR\DiffBIR\venv\lib\site-packages\pytorch_lightning\core\mixins\device_dtype_mixin.py", line 109, in to
return super().to(*args, **kwargs)
File "E:\AI\DiffBIR\DiffBIR\venv\lib\site-packages\torch\nn\modules\module.py", line 1145, in to
return self._apply(convert)
File "E:\AI\DiffBIR\DiffBIR\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
File "E:\AI\DiffBIR\DiffBIR\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
File "E:\AI\DiffBIR\DiffBIR\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
[Previous line repeated 3 more times]
File "E:\AI\DiffBIR\DiffBIR\venv\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
param_applied = fn(param)
File "E:\AI\DiffBIR\DiffBIR\venv\lib\site-packages\torch\nn\modules\module.py", line 1143, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 0; 6.00 GiB total capacity; 5.30 GiB already allocated; 0 bytes free; 5.34 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

How much VRAM does it need to run?

finetune stable diffusion

Hi, thanks for this excellent work. I am working on my own dataset (medical images) now. From the paper I see that Stable Diffusion is frozen, and so is the autoencoder. Could you please share the Stable Diffusion fine-tuning code if possible?
Thanks a lot for your time.

Training duration on A100

I started training with about 2000 images with a batch size of 10.

  1. How long does training take for a set of 2000 images? Currently it is at about epoch 800 and each epoch takes about 2.5 minutes; it doesn't show the total number of epochs to process.
  2. I can see files like step=49999.ckpt etc. created every 10000 steps. If the training is stopped and started again, will it resume from where it stopped?
  3. Can the training be done on CPU only?

Thanks,
Kiran

Plans for other stable diffusion models

Hello, I wanted to ask if there are any plans regarding using other Stable Diffusion models.

I can imagine that Stable Diffusion models fine-tuned on higher-resolution or higher-quality images would work better than the base model.

The sample_log function in cldm.py

@torch.no_grad()
def sample_log(self, cond, steps):
    sampler = SpacedSampler(self)
    b, c, h, w = cond["c_concat"][0].shape
    shape = (b, self.channels, h // 8, w // 8)
    samples = sampler.sample(steps, shape, cond, unconditional_guidance_scale=1.0, unconditional_conditioning=None)
    return samples

Is there a problem with the arguments passed to sampler.sample, or with how the sampler is defined? Could you please clarify?

Is it possible to use the models from JS?

Is there a JS interface? Or maybe the models are hosted on HuggingFace?

PS: I wasn't sure if this is the best way to reach out, happy to use a more suitable channel. Thanks!

Where to host?

Team,

When I try to run this code, I get the following error when pressing run after uploading a photo:

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.43 GiB total capacity; 6.75 GiB already allocated; 10.44 MiB free; 6.82 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Any recommendations on the kind of EC2 instance I would need to run this code? Can you recommend any platforms in addition to AWS?

Will other samplers be supported in the future?

I noticed that you've removed the DDIM option in the current version of the code, even though it didn't seem to work in the initial version. Sampling efficiency is one of the obstacles to practicality, especially for high-resolution images. Will DDIM and DPM-Solver samplers be supported in the future?

No support for Apple M1

cutlassF is not supported because:
    device=cpu (supported: {'cuda'})
flshattF is not supported because:
    device=cpu (supported: {'cuda'})
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
    max(query.shape[-1] != value.shape[-1]) > 128
    Operator wasn't built - see python -m xformers.info for more info
tritonflashattF is not supported because:
    device=cpu (supported: {'cuda'})
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
    max(query.shape[-1] != value.shape[-1]) > 128
    Operator wasn't built - see python -m xformers.info for more info
    triton is not available
smallkF is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    unsupported embed per head: 512

OpenXLab issues

I see that OpenXLab has issues every day. Is it possible for you to host DiffBIR on Hugging Face or Replicate?
They're more reliable.

about inference_face.py

Line 178 in inference_face.py is

restored_img = restored_img[:lq_resized.height, :lq_resized.width, :]

However, the place where restored_img is assigned seems to be line 158:

restored_img = face_helper.paste_faces_to_input_image(
    upsample_img=bg_img
)

but this is inside of

if not args.has_aligned:

If this condition is not satisfied, restored_img is never defined, which causes an error.
What is this variable supposed to be? Is it restored_face?
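
A hedged sketch of one possible edit (variable names are taken from inference_face.py; the else branch is my assumption about the intent), keeping restored_img defined on both paths before the crop on line 178:

if not args.has_aligned:
    restored_img = face_helper.paste_faces_to_input_image(
        upsample_img=bg_img
    )
    # only the pasted-background result needs cropping back to the LQ size
    restored_img = restored_img[:lq_resized.height, :lq_resized.width, :]
else:
    # aligned input: there is no background image to paste onto, so fall back
    # to the restored face crop itself (assumption about what was intended)
    restored_img = restored_face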

Problem with Pytorch Lightning

Traceback (most recent call last):
File "C:\Users\wdson\Downloads\Compressed\DiffBIR-main\inference.py", line 15, in
from model.cldm import ControlLDM
File "C:\Users\wdson\Downloads\Compressed\DiffBIR-main\model\cldm.py", line 18, in
from ldm.models.diffusion.ddpm import LatentDiffusion
File "C:\Users\wdson\Downloads\Compressed\DiffBIR-main\ldm\models\diffusion\ddpm.py", line 20, in
from pytorch_lightning.utilities.distributed import rank_zero_only
ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed'

I am getting this error even though I have PyTorch Lightning installed. Also, I could not find requirements.txt in the repository, so I installed the modules as they were being required.

open clip model erro

When I run the following command:
python inference.py \
--input inputs/demo/general \
--config configs/model/cldm.yaml \
--ckpt weights/general_full_v1.ckpt \
--reload_swinir --swinir_ckpt weights/general_swinir_v1.ckpt \
--steps 50 \
--sr_scale 4 \
--image_size 512 \
--color_fix_type wavelet --resize_back \
--output results/demo/general \
--device cuda

An error occurred (error screenshot attached).

My cache files are shown in a second screenshot (also attached).

Can you help me? How should I solve this problem?

Where is the code for Latent Image Guidance?

I'm sorry to bother you again, as I have already started research based on your work.
I'm very interested in the Latent Image Guidance in Section 3.3, but I only found the classifier-free guidance strength in the inference code. I further found the relevant code in the sampler, but it seems like it's not enabled in the current version?

Installation

Hi, how can I download and install this on Windows? I use Windows; can you please guide me? Thanks.

request.txt

Hello, when I install DiffBIR, I run into a problem.

When I run "pip install -r requirements.txt", cmd shows "ERROR: Could not find a version that satisfies the requirement triton".

The Python version I'm using is 3.11; I don't know if this is the cause of the problem.

Thank you for your time.

(error screenshot attached)

What's more, when I enter "conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3 -c pytorch", cmd shows the output in the following screenshot:
(screenshot attached)

Sorry for my beginner question.

about urls for RealESRGAN checkpoint

Line 326 in realesrganer.py is

model_path=f"https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x{scale}plus.pth",

When scale == 2, it tries to download RealESRGAN_x2plus.pth from the v0.1.0 release, but v0.1.0 doesn't have the x2plus checkpoint. It should be downloaded from v0.2.1 instead.
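
A hedged sketch of a possible fix, picking the release tag per scale instead of hard-coding v0.1.0 (scale is the variable already used on that line):

# x4plus is published under the v0.1.0 release, x2plus under v0.2.1
release_tag = {2: "v0.2.1", 4: "v0.1.0"}.get(scale, "v0.1.0")
model_path = (
    f"https://github.com/xinntao/Real-ESRGAN/releases/download/"
    f"{release_tag}/RealESRGAN_x{scale}plus.pth"
)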

About train

(config screenshot attached)
I want to know what the hq_dir_path and validation_set_size parameters are. Can you illustrate with a specific example? I'd appreciate it if you'd get back to me.
