

License: GNU Affero General Public License v3.0

Topics: pytorch-implementation, human-video-generation

everybody_dance_now_pytorch's Introduction

Everybody Dance Now (pytorch)

A PyTorch implementation of "Everybody Dance Now" from Berkeley AI lab, including all functionality except pose normalization.

Other implementations:

yanx27 EverybodyDanceNow reproduced in pytorch

nyoki-pytorch pytorch-EverybodyDanceNow

Also check out densebody_pytorch for 3D human mesh estimation from monocular images.

Environment

  • Ubuntu 18.04 (But 16.04 should be fine too)
  • Python 3.6
  • CUDA 9.0.176
  • PyTorch 0.4.1.post2

For the other necessary packages, run pip install -r requirements.txt for a quick install.

  1. The project requires tensorflow>1.9.0 since the pose estimator is implemented in Keras. If you are using a standalone Keras package, change the corresponding import command in ./pose_estimator/compute_coordinates_for_video.py (see the sketch below); however, you won't be able to use tensorboard this way.
  2. The project uses imageio for video processing, which requires the ffmpeg backend to be downloaded and installed after the pip install. Just follow the error message and you'll get there.
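For note 1, a minimal sketch of the import switch, assuming the script loads the pose model with Keras' load_model (check the actual import lines in the script before editing):

    try:
        # default: the Keras bundled with tensorflow (keeps tensorboard usable)
        from tensorflow.keras.models import load_model
    except ImportError:
        # standalone Keras package, as described in note 1
        from keras.models import load_model

    model = load_model('./pose_estimator/pose_estimator.h5')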

Dataset Preparation

  1. To reproduce our results, download the Afrobeat workout sequence from YouTube (clipconverter is a great downloading tool).

  2. Open the mp4 file with imageio, remove the first 25 seconds (625 frames for our video), and resize the remaining frames to 288×512 (for 16:9 HD video). Save all the frames in a single folder named train_B. It is highly recommended that the frames be named by their index, like 00001.png, 00002.png.
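    A minimal sketch of this step (the video file name and the 25 fps frame rate are assumptions; verify them against your own clip):

    import os
    import imageio
    import numpy as np
    from skimage.transform import resize

    SKIP = 625  # first 25 seconds at 25 fps
    out_dir = './train_B'
    os.makedirs(out_dir, exist_ok=True)

    reader = imageio.get_reader('afrobeat_workout.mp4')  # hypothetical file name
    for i, frame in enumerate(reader):
        if i < SKIP:
            continue
        # resize takes (rows, cols) = (height, width); (288, 512) assumes a
        # landscape 16:9 clip -- swap the tuple for portrait footage
        small = resize(frame, (288, 512), anti_aliasing=True)
        imageio.imwrite(os.path.join(out_dir, '%05d.png' % (i - SKIP + 1)),
                        (small * 255).astype(np.uint8))
    reader.close()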

  3. Download the pre-trained pose estimator from Yandex Disk, put it under the subfolder pose_estimator, then run the following script to estimate the pose for each frame and render the poses into RGB images.

    python ./pose_estimator/compute_coordinates.py

    The script should generate a folder named train_A containing the corresponding pose stick-figure images (also named 00001.png, 00002.png, etc.), plus a numpy file poses.npy holding the estimated poses with shape N×18×2, where N is the number of frames.

    The numpy file is not necessary for training the global generator, but it is needed for training the face-enhancer, since we must estimate and crop the head region from the synthesized frames.

    Note: You can also use OpenPose or any other pose-estimation network for this step. Just make sure you organize your pose data as suggested above.
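    If you bring your own estimator, a quick sanity check on the cached pose file (the dataset path is a placeholder):

    import numpy as np

    poses = np.load('./datasets/cardio_dance/poses.npy')
    # expected: N frames x 18 joints x (x, y)
    assert poses.ndim == 3 and poses.shape[1:] == (18, 2), poses.shape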

  4. Wrap train_A, train_B and poses.npy into the same folder and put it under ./datasets/.
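    A typical resulting layout (the dataset name cardio_dance is just a placeholder):

    ./datasets/cardio_dance/
        train_A/     # pose stick figures: 00001.png, 00002.png, ...
        train_B/     # real frames:        00001.png, 00002.png, ...
        poses.npy    # N x 18 x 2 estimated coordinates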

Use your own dataset

The model is not fully trained/tested on other dancing videos. You are encouraged to play with your own dataset as well, but the performance is not guaranteed.

Empirically, to increase the chance of success in training/testing, it is important that your video:

  • has a fixed, clean background
  • is more than 5 minutes long
  • contains a single person performing "basic" actions (which will be elaborated on later)
  • (Optional, only if you use optical flow loss) contains minimal change in lighting conditions

Conversely, your training will likely fail if your video contains:

  • Intensive movements such as turning around or kneeling down
  • Heavy limb occlusions
  • Large scale variations (e.g. dancer running towards/away from the camera)

If you encounter any failure case, do not hesitate to open an issue and let us know!

Testing

  1. Download the pretrained checkpoints (updated on 20181204).

  2. Prepare the testing sequence: save the skeleton figures in a folder named test_A, slice the corresponding pose coordinates from the previously cached poses.npy, wrap them in a single folder (for example cardio_dance_test), and put it under ./datasets/.

    In addition, the program supports using the first ground-truth frame as a reference, so create a new folder test_B and put inside it the ground-truth frame corresponding to the first item in test_A (with an identical file name, of course).
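    A minimal sketch of the pose-slicing part, assuming the test clip corresponds to frames start..end of the training sequence (all names are placeholders):

    import numpy as np

    poses = np.load('./datasets/cardio_dance/poses.npy')
    start, end = 0, 500  # hypothetical frame range of the test clip
    np.save('./datasets/cardio_dance_test/poses.npy', poses[start:end])
    # copy the matching stick figures into test_A, and the first real
    # frame (same file name) into test_B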

  3. Run the following command for global synthesis:

    sh ./scripts/test_full_512.sh

    This generates a coarse video stored in ./results/$NAME$/$WHICH_EPOCH$/test_clip.avi and caches all synthesized frames for face_enhancer evaluation.

  4. Run the face-enhancer to get the final result.

    python ./face-enhancer/enhance.py

Training

Step I: Training the global pose2vid network.

  1. Prepare the dataset following the instructions above.

  2. For the pose2vid baseline, run the script

    sh ./scripts/train_full_512.sh

    If you wish to incorporate optical flow loss, run the script

    sh ./scripts/train_flow_512.sh

    Warning: this module increases memory cost and slows down training by 40% to 50%. It is also very sensitive to background flow, so use it at your discretion. However, if you can accurately estimate the dancer's body mask, using masked flow could help with temporal smoothing (a sketch follows). Please send a PR if you find masked flow loss effective.
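    For reference, a minimal sketch of what such a masked flow loss could look like; this is not part of the repo, and body_mask is assumed to be 1 inside the dancer and 0 elsewhere:

    import torch

    def masked_flow_loss(flow_pred, flow_gt, body_mask):
        # L1 flow error counted only inside the body mask, so noisy
        # background flow does not dominate the loss
        diff = (flow_pred - flow_gt).abs() * body_mask
        return diff.sum() / body_mask.sum().clamp(min=1.0)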

Step II: Training the local face-enhancing network.

  1. Rename your train_B folder to test_real (or save a copy and rename that).

  2. Test the global pose2vid network (either trained in Step I or initialized from the downloaded pretrained model) on your train_A dataset, and save the results into a folder named test_sync with matching file names.

  3. Open the face-enhancement training script at ./face_enhancement/main.py, modify the dataset_dir, pose_dir, checkpoint dir and log_dir variables (illustrative values below), and run the script.
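    Illustrative values only; check the actual variable names in main.py (in particular, the exact name of the checkpoint variable may differ):

    dataset_dir = './datasets/cardio_dance'        # contains test_real and test_sync
    pose_dir = './datasets/cardio_dance/poses.npy'
    checkpoint_dir = './checkpoints/face_enhancer' # name may differ in the script
    log_dir = './logs/face_enhancer'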

  4. The default network structure is 2 downsampling layers, 6 ResBlocks, and 2 upsampling layers. You can modify it for the best enhancing effect by changing the corresponding parameters at line 22; the crop size is also adjustable at line 23 (the default is 96).
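    For orientation, a minimal sketch of the kind of pose-guided head crop the face-enhancer relies on, assuming the 18-joint OpenPose layout with the nose at index 0 (the repo's actual cropping code may differ):

    import numpy as np

    def crop_head(frame, pose, size=96):
        # center a size x size window on the nose joint, clamped to the frame
        x, y = pose[0]
        h, w = frame.shape[:2]
        x0 = int(np.clip(x - size // 2, 0, w - size))
        y0 = int(np.clip(y - size // 2, 0, h - size))
        return frame[y0:y0 + size, x0:x0 + size]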

Citation

Should you find this implementation useful, please add the following citation to your paper/open-sourced project:

@article{chan2018everybody,
  title={Everybody dance now},
  author={Chan, Caroline and Ginosar, Shiry and Zhou, Tinghui and Efros, Alexei A},
  journal={arXiv preprint arXiv:1808.07371},
  year={2018}
}

Acknowledgement

This repo borrows heavily from pix2pixHD.


everybody_dance_now_pytorch's Issues

compute_coordinates too slow

In Dataset Preparation step 3, when I run python ./pose_estimator/compute_coordinates_for_video.py on the Afrobeat workout sequence (34495 frames), it takes 27 s to process each frame, i.e. about 10 days to finish all frames.

Is there some problem here? Isn't that too slow?

BTW, I pip-installed tensorflow-gpu instead of the tensorflow in your requirements.txt.

compute_coordinates.py input error

When I prepare the dataset exactly as the instructions say and run the command:

python ./pose_estimator/compute_coordinates.py

it gives me the following error:

Traceback (most recent call last):
  File "pose_estimator/compute_coordinates_for_video.py", line 258, in <module>
    cord = cordinates_from_image_file(img, model=model)
  File "pose_estimator/compute_coordinates_for_video.py", line 188, in cordinates_from_image_file
    output1, output2 = model.predict(imageToTest_padded)
  File "/Users/sinangencoglu/anaconda3/envs/sinandl/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1096, in predict
    x, check_steps=True, steps_name='steps', steps=steps)
  File "/Users/sinangencoglu/anaconda3/envs/sinandl/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 2382, in _standardize_user_data
    exception_prefix='input')
  File "/Users/sinangencoglu/anaconda3/envs/sinandl/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_utils.py", line 362, in standardize_input_data
    ' but got array with shape ' + str(data_shape))
ValueError: Error when checking input: expected input_5 to have shape (None, None, 3) but got array with shape (184, 103, 4)

Thank you.

Is it possible to generate frame using OpenPose coordinates?

Hi,

I trained my model and it's working well with the test videos/frames.
Now I want to generate frames using OpenPose coordinates: I will give the coordinates and get the output frame. Is this possible? And if so, how can I do that?

Output video performance

Hello,

I tried to run your code on my test dataset, given a target pose and a source image.
However, the generated video does not look like the source image at all; instead, it looks like the dancing video you show as a demo (the Afrobeat workout sequence from YouTube).
Is this normal?

Thank you

face enhancement questions

Hi, glad to see your great work, really useful. I have two questions about the face-enhancement part:

  1. When doing the face crop, you set crop_size = 48 for 512-frame videos. But that may not be accurate, as the person (or their head) can appear either large or small depending on the distance to the camera. Is there a better way to do the crop? Thanks.

  2. Is there an official paper about the face GAN? I didn't find any related reference in the "Everybody Dance Now" paper. And is there any other implementation of the face GAN on GitHub?

Many thanks.

pretrained pose_estimator error

I got this error when running python ./pose_estimator/compute_coordinates_for_video.py:
2020-09-02 08:55:35.129558: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-09-02 08:55:35.129593: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "./pose_estimator/compute_coordinates_for_video.py", line 251, in <module>
    model = load_model('./pose_estimator/pose_estimator.h5')
  File "/opt/conda/envs/EDN/lib/python3.6/site-packages/tensorflow/python/keras/saving/save.py", line 182, in load_model
    return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
  File "/opt/conda/envs/EDN/lib/python3.6/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 166, in load_model_from_hdf5
    f = h5py.File(filepath, mode='r')
  File "/opt/conda/envs/EDN/lib/python3.6/site-packages/h5py/_hl/files.py", line 408, in __init__
    swmr=swmr)
  File "/opt/conda/envs/EDN/lib/python3.6/site-packages/h5py/_hl/files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (file read failed: time = Wed Sep 2 08:55:36 2020, filename = './pose_estimator/pose_estimator.h5', file descriptor = 4, errno = 5, error message = 'Input/output error', buf = 0x7fffd750fdb0, total read size = 8, bytes this sub-read = 8, bytes actually read = 18446744073709551615, offset = 0)

It seems the pose_estimator.h5 file is broken. Has anybody faced the same problem, and how can it be fixed?

Thanks

Opencl

Can this code run without CUDA?
Can OpenCL be used instead of CUDA?

size compatibility issues

There seem to be compatibility issues with image sizes between README.md and pose_estimator/compute_coordinates_for_video.py. README.md suggests the frames should be resized to 288×512 (for 16:9 HD video), but this leads to an issue when using the function cordinates_from_image_file from compute_coordinates_for_video.py.

Specifically, imageToTest_padded (line 189) will not have the right size for the model suggested in the README, which takes images of size (368, 368). This doesn't work with the given scale_search (line 25) and therefore the multiplier (line 179).

A workaround is to change the transformation (line 189) in cordinates_from_image_file to ensure the input to the model has the right size.

compute_coordinates_for_video.py error

Hi! Thank you for your great job!
I have two questions.

1. When I try to run compute_coordinates_for_video.py after creating the dataset, I get the following error:

    File "./pose_estimator/compute_coordinates_for_video.py", line 248
      if nor os.path.isdir(pose_dir):

Should this be "if not"?

2. I also get another error:

    File "./pose_estimator/compute_coordinates_for_video.py", line 260
      cord = cordinates_from_image_file(img, model=model)
      ^
    SyntaxError: invalid syntax

The dataset directory is set as follows:
img_dir = './datasets/train_B'

Could you let me know how to resolve these errors?

Thanks

Training high-resolution video

Hello, I seem to have found a problem when trying to train on high-resolution video. After I resized the normal 512×512 video to 1024×1024, I found that some rectangular areas on the lower, left, and right edges of the image cannot be learned properly; even after training for many batches this is still the case. (I marked the regions with a red pen in the attached screenshot.) I changed loadsize, finesize and display_winsize to 1024, leaving the rest of the network structure unchanged. What else should I modify for resolutions such as 1024×1024?

[screenshot attached]

Broken Pipe error

I executed the script sh ./scripts/test_full_512.sh, and the following error appeared:
/media/ouc/4T_B/gc/everybody_dance_now_pytorch/models/pose2vidHD_model.py:146: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  input_label = Variable(input_label, volatile=infer)
100%|█████████████████████████████████████████| 500/500 [01:58<00:00, 4.19it/s]
Unknown encoder 'libx264'
Traceback (most recent call last):
  File "/home/ouc/miniconda3/lib/python3.6/site-packages/imageio/plugins/ffmpeg.py", line 727, in _append_data
    self._proc.stdin.write(im.tostring())
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_video.py", line 60, in
    writer.append_data(im)
  File "/home/ouc/miniconda3/lib/python3.6/site-packages/imageio/core/format.py", line 500, in append_data
    return self._append_data(im, total_meta)
  File "/home/ouc/miniconda3/lib/python3.6/site-packages/imageio/plugins/ffmpeg.py", line 734, in _append_data
    raise IOError(msg)
OSError: [Errno 32] Broken pipe

FFMPEG COMMAND:
/home/ouc/miniconda3/bin/ffmpeg -y -f rawvideo -vcodec rawvideo -s 480x480 -pix_fmt rgb24 -r 25.00 -i - -an -vcodec libx264 -pix_fmt yuv420p -crf 25 -v warning /media/ouc/4T_B/gc/everybody_dance_now_pytorch/results/everybody_dance_now_temporal/latest/test_clip.avi

FFMPEG STDERR OUTPUT:

How can I fix this problem? Thanks.

Missing coordinates in some videos

Thank you for your work! I downloaded a Bruno Mars video and resized it to 288×512, but I can't correctly get the coordinates with ./pose_estimator; one point is missing in the results (Figure 1). I have tried the Afrobeat exercise video and I am pretty sure my setup is right (Figure 2).

[Figures 1 and 2: attached result frames]
Source video:
https://www.youtube.com/watch?v=PMivT7MJ41M

It would be great if you can check this out!
