
rq-wu / lamp

Official implementation of LAMP: Learn a Motion Pattern by Few-Shot Tuning a Text-to-Image Diffusion Model (few-shot text-to-video diffusion)

Home Page: https://rq-wu.github.io/projects/LAMP/index.html

License: Other

Python 100.00%
aigc diffusion diffusion-model diffusion-models few-shot-learning stable-diffusion text-to-video video-editing

lamp's Introduction

武睿祺(Ruiqi Wu)

I am a master's student at TMCC, College of Computer Science, Nankai University, China, under the supervision of Prof. Ming-Ming Cheng & Dr. Chun-Le Guo. My research interests are computer vision and machine learning, focusing on AIGC and low-level vision.



lamp's People

Contributors

anonymous-3917, eltociear, guspan-tanadi, rq-wu, shashwatnigam99


lamp's Issues

plans for Google Drive?

Hi there - amazing work! Just wondering when you are planning to upload the models to Google Drive - excited to play with them.

Is one-shot learning possible?

Great Work!

I have a question about the setting. Is LAMP suitable only for few-shot learning, or is it also suitable for one-shot learning?
In other words, does LAMP always require 8~16 videos, or is one video enough too?

Thank you in advance.

Regarding the paper

Hi, thank you for the interesting work. I have a question about the proposed method.

a 2D convolution with an output channel of 1 along with a Sigmoid function is added

self.conv_gate = nn.Conv2d(out_channels, 1, 3, stride=1, padding=1)  # c -> 1 channel gate
x_gate = rearrange(x_2d, "b c f h w -> (b f) c h w")  # fold frames into the batch dimension
c = x_gate.shape[1]
x_gate = self.sigmoid(self.conv_gate(x_gate)).repeat(1, c, 1, 1)  # (b*f, 1, h, w) -> (b*f, c, h, w)

I would like to know what the insight is behind using a c -> 1 channel convolution and then repeating the result back to c channels. As a side question, what is the purpose of applying a sigmoid function to this branch before multiplying it with the conv_1d output?
Thanks.
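
For readers following along, here is a minimal, self-contained sketch of the shape bookkeeping in the snippet above. The tensor sizes, the variable conv_1d_out, and the final gated multiplication are assumptions made for illustration, not the authors' exact module:

import torch
import torch.nn as nn
from einops import rearrange

b, c, f, h, w = 2, 8, 16, 32, 32
x_2d = torch.randn(b, c, f, h, w)           # output of the 2D spatial branch (assumed layout)
conv_1d_out = torch.randn(b * f, c, h, w)   # hypothetical output of the temporal 1D-conv branch

conv_gate = nn.Conv2d(c, 1, 3, stride=1, padding=1)
x_gate = rearrange(x_2d, "b c f h w -> (b f) c h w")   # (b*f, c, h, w)
gate = torch.sigmoid(conv_gate(x_gate))                # (b*f, 1, h, w), values squashed to (0, 1)
gate = gate.repeat(1, c, 1, 1)                         # broadcast the single-channel mask to all c channels
fused = gate * conv_1d_out                             # soft, per-pixel gating of the temporal branch
print(fused.shape)                                     # torch.Size([32, 8, 32, 32])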

About multi-action

Thank you for your excellent work! I tried to train three actions at the same time (horse run, birds fly, waterfall), but the result is not as good as for a single action. Can you give me some suggestions?

Evaluation code

Hi! I was wondering if you could share the evaluation code used in LAMP or point me to references that you used for the results reported in the paper? Thank you!

inference_script

Hi, thanks a lot for your interesting work! I know that in your paper you explain that you use the T2I model to generate the first frame during inference, but there doesn't seem to be any code in the "inference_script" that generates the first frame. I'm wondering if I'm mistaken.
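
While waiting for an answer, a first frame can be generated with an off-the-shelf Stable Diffusion text-to-image pipeline. This is only a generic sketch with a placeholder model ID and prompt, not the repository's actual inference path:

import torch
from diffusers import StableDiffusionPipeline

# Plain text-to-image call; LAMP's inference script may wire the first frame differently.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
first_frame = pipe("a horse running on the grassland", num_inference_steps=50).images[0]
first_frame.save("first_frame.png")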

Script for evaluating the model

Thank you for your excellent work. I have noticed that you provided some functions to evaluate the model quantitatively. Can you provide a script to directly evaluate these metrics (i.e., alignment, consistency, diversity)? Thank you very much.
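
As a stopgap until an official script is released (this is not the authors' evaluation code), text-frame alignment and frame consistency are often approximated with CLIP similarities; the exact metric definitions below are assumptions, not taken from the paper:

import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_scores(frames, prompt):
    # frames: list of PIL images from one generated video; prompt: its text condition
    inputs = processor(text=[prompt], images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)  # (F, D)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)    # (1, D)
    alignment = (img @ txt.T).mean().item()    # mean text-frame cosine similarity
    consistency = (img @ img.T).mean().item()  # mean pairwise frame similarity (includes diagonal)
    return alignment, consistency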

Question about the training time

Thanks for your great work! I have a question about the training time: when I train horse_running on my GPU (RTX 3090, 24 GB), it shows about 14 hours for training. I want to know whether this is normal?
Expecting your reply!

multi-gpu training

Thanks for your nice work! I want to ask how to conduct multi-GPU training with your code. I set CUDA_VISIBLE_DEVICES=0,1, but it does not work. Hoping for your reply!
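
A note while waiting for a reply (not verified against this repo): CUDA_VISIBLE_DEVICES only controls which GPUs are visible; it does not by itself parallelize training. If the training script is built on Hugging Face accelerate, as is common for diffusion fine-tuning code, multi-GPU runs are usually started with a launcher, for example "accelerate launch --multi_gpu --num_processes 2 train.py ..." or the PyTorch equivalent "torchrun --nproc_per_node=2 train.py ..." (the script name here is a placeholder).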
