carperai / drlx

Diffusion Reinforcement Learning Library

License: MIT License

Python 100.00%

drlx's Issues

CUDA Error on second epoch

Seeing an unknown CUDA error on the second epoch. Will try to debug more tomorrow.

Traceback (most recent call last):
  File "/home/paperspace/git/DRLX/train_aesthetics.py", line 12, in <module>
    trainer.train(pipe, Aesthetics())
  File "/home/paperspace/git/DRLX/src/drlx/trainer/ddpo_trainer.py", line 313, in train
    if self.config.train.total_samples is not None:
  File "/home/paperspace/git/DRLX/src/drlx/trainer/ddpo_trainer.py", line 313, in <listcomp>
    if self.config.train.total_samples is not None:
  File "/home/paperspace/.pyenv/versions/3.9.17/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/paperspace/git/DRLX/src/drlx/denoisers/ldm_unet.py", line 125, in postprocess
    images = images.detach().cpu().permute(0,2,3,1).numpy()
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
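
The error message suggests passing CUDA_LAUNCH_BLOCKING=1 so that the failing kernel is reported at the call that launched it. A minimal sketch of one way to do that from Python, assuming the training script is rerun in a child process; `blocking_cuda_env` is a hypothetical helper and not part of DRLX:

```python
import os


def blocking_cuda_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    # With this variable set, CUDA kernel launches run synchronously, so the
    # Python stack trace should point at the call that actually failed.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


env = blocking_cuda_env()

# Hypothetical usage: rerun the failing script under this environment.
# import subprocess, sys
# subprocess.run([sys.executable, "train_aesthetics.py"], env=env, check=True)
```

The variable has to be set before the process initializes CUDA, which is why the sketch builds an environment for a fresh process instead of changing it mid-run.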

When SDXL support?

@Stability-AI launched SDXL 0.9 on June 22. It would be great to see SDXL model support in DRLX.

Add support for BitFit

Paper: https://aclanthology.org/2022.acl-short.1/

Summary (my words):

As a model trainer, it would be nice if we could use this directed policy optimization trainer to train just the bias of the U-net, keeping the weights frozen.

Initial testing shows that this approach allows us to carefully direct the model toward better details / aesthetics while maintaining most of the model's core structure.

Where full weight and bias tuning results in almost complete destruction of SD 2.1-v using just 8 images for finetuning, this method allows pushing past 400 epochs on the same dataset.

Example:

The starting point ^

After just 810 steps ^

This is without any DPO, simply finetuning based on MSE loss and velocity objective.

For comparison, the mode collapse of SD 2.1-v when tuning both weights and biases, which occurs in fewer steps:

This uses the same hyperparameters, e.g. learning rate, scheduler, dataset, and seeds.
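
The bias-only setup described in this issue could be sketched as follows. This is an illustration, not DRLX code: `freeze_all_but_biases` is a hypothetical helper, and it assumes a model that exposes `named_parameters()` the way a `torch.nn.Module` does, with bias parameters whose names end in `bias`. The stand-in objects exist only so the sketch runs without PyTorch.

```python
from types import SimpleNamespace


def freeze_all_but_biases(model):
    """Leave only bias parameters trainable and return their names."""
    trainable = []
    for name, param in model.named_parameters():
        # Assumption: bias parameters are named "<module path>.bias".
        is_bias = name.split(".")[-1] == "bias"
        param.requires_grad = is_bias
        if is_bias:
            trainable.append(name)
    return trainable


class TinyStandIn:
    """Stand-in for a real module; mimics named_parameters() only."""

    def __init__(self):
        self._params = {
            "conv.weight": SimpleNamespace(requires_grad=True),
            "conv.bias": SimpleNamespace(requires_grad=True),
        }

    def named_parameters(self):
        return self._params.items()


model = TinyStandIn()
trainable = freeze_all_but_biases(model)
```

With a real U-Net, the optimizer would then be given only the parameters whose `requires_grad` is still true.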

Local models don't work?

I was trying to use a local safetensors SD model and can't get it to work. Does the current setup always try to download from Hugging Face, even when an explicit file path is given and use_safetensors is set to true?

Models work locally if they were initially downloaded from the hub, but not if I give a file path to a local safetensors model.

`save_samples` is shown in the example configs but it isn't supported by the main branch

DRLX/configs/ddpo_sd_pickapic.yml

Line 41 in 2c20e43

save_samples: False

save_samples is not supported by the main branch. When it is present in the config, loading fails with the following error:

Traceback (most recent call last):
  File "/home/ogezi/miniconda3/lib/python3.9/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/home/ogezi/miniconda3/lib/python3.9/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/home/ogezi/projects/Freda/relations-encoding/spatial_rl.py", line 135, in <module>
    config = DRLXConfig.load_yaml("configs/ddpo_sd.yml")
  File "/home/ogezi/miniconda3/lib/python3.9/site-packages/drlx/configs.py", line 325, in load_yaml
    return cls.from_dict(config)
  File "/home/ogezi/miniconda3/lib/python3.9/site-packages/drlx/configs.py", line 354, in from_dict
    train=TrainConfig.from_dict(config["train"]),
  File "/home/ogezi/miniconda3/lib/python3.9/site-packages/drlx/configs.py", line 12, in from_dict
    return cls(**cfg)
TypeError: __init__() got an unexpected keyword argument 'save_samples'
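
The traceback shows `from_dict` passing every key straight to the constructor (`cls(**cfg)`), so any key the dataclass does not declare raises a `TypeError`. One possible way to tolerate such keys is sketched below. `TrainConfigSketch` and its fields are hypothetical stand-ins, not the real `TrainConfig`, and whether DRLX should warn or fail on unknown keys is a design choice for the maintainers.

```python
import warnings
from dataclasses import dataclass, fields


@dataclass
class TrainConfigSketch:
    """Hypothetical stand-in for a config section; field names are made up."""

    batch_size: int = 4
    num_epochs: int = 1

    @classmethod
    def from_dict(cls, cfg):
        # Keep only keys that are declared as dataclass fields.
        known = {f.name for f in fields(cls)}
        unknown = sorted(set(cfg) - known)
        if unknown:
            warnings.warn(f"Ignoring unsupported config keys: {unknown}")
        return cls(**{k: v for k, v in cfg.items() if k in known})


cfg = TrainConfigSketch.from_dict({"batch_size": 8, "save_samples": False})
```

Here `save_samples` is dropped with a warning instead of aborting the run.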

Add bitsandbytes optimizer support

For 8-bit optimizer training...

Add Direct Preference Optimization support

There should be a way to do Direct Preference Optimization with diffusion models. Ryan Murdock already has it working:
https://twitter.com/advadnoun/status/1677479082752364546

Requires further investigation.
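
For reference, the original DPO objective for a single preference pair can be sketched as below. This is the language-model formulation on scalar log-probabilities; how to obtain comparable log-likelihoods from a diffusion model is the open question in this issue, and nothing here reflects the linked implementation. The function and argument names are illustrative.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one (preferred, rejected) pair.

    logp_* are log-probabilities under the policy being trained,
    ref_logp_* are the same quantities under the frozen reference model.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) == softplus(-margin), computed in a stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
```

The loss falls below `log(2)` once the policy raises the preferred sample's log-probability relative to the reference more than it raises the rejected one's.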

Missing datasets package

Required by pipeline/pickapic_prompts.py

Add ability to load from safetensors

It seems that you presently can't load a model from a safetensors file. Unless I'm mistaken, the library uses diffusers with the SD pipeline under the hood, which should support this.
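
A minimal sketch of how a loader might distinguish a local single-file checkpoint from a hub id or model directory, assuming a recent diffusers release where `StableDiffusionPipeline.from_single_file` is available. `choose_loader` and `load_pipeline` are hypothetical helpers, not DRLX functions, and diffusers is imported lazily so the selection logic runs without it installed.

```python
from pathlib import Path


def choose_loader(model_ref):
    """Pick the diffusers loader name for a model reference."""
    path = Path(model_ref)
    # A single local checkpoint file goes through from_single_file;
    # hub ids and diffusers-format directories go through from_pretrained.
    if path.is_file() and path.suffix in {".safetensors", ".ckpt"}:
        return "from_single_file"
    return "from_pretrained"


def load_pipeline(model_ref):
    """Load a Stable Diffusion pipeline from a file path, directory, or hub id."""
    from diffusers import StableDiffusionPipeline  # imported lazily

    loader = getattr(StableDiffusionPipeline, choose_loader(model_ref))
    return loader(model_ref)
```

Calling `load_pipeline` on a path to a local `.safetensors` file would then avoid treating that path as a hub repository id.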

How to see sampled images?

How can I view the sampled images that are used for reinforcement during training?

Reward model inference

We need to add reward model (RM) inference for cases where the RM is a sizable model. The current setup tries to place the RM on every GPU. This is problematic because in many cases the RM is too big to fit alongside the denoiser model. In the LLM case, the usual solution is to serve the RM from a Triton inference server, or to put the RM on one GPU while the main model uses the remaining GPUs. This should be explored further.
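
The second option, reserving one GPU for the RM, could be sketched as a simple device split. `split_devices` is a hypothetical helper and the choice of the last GPU for the RM is arbitrary; a real integration would also have to move sampled images to the RM device and rewards back.

```python
def split_devices(num_gpus):
    """Reserve the last GPU for the reward model; the rest go to the denoiser."""
    if num_gpus < 2:
        raise ValueError("need at least two GPUs to give the reward model its own device")
    devices = [f"cuda:{i}" for i in range(num_gpus)]
    return devices[:-1], devices[-1]


policy_devices, rm_device = split_devices(4)
```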