

License: GNU Affero General Public License v3.0

Topics: pytorch-implementation, human-video-generation

everybody_dance_now_pytorch's Introduction

Everybody Dance Now (pytorch)

A PyTorch implementation of "Everybody Dance Now" from Berkeley AI lab, including all functionality except pose normalization.

Other implementations:

yanx27 EverybodyDanceNow reproduced in pytorch

nyoki-pytorch pytorch-EverybodyDanceNow

Also check out densebody_pytorch for 3D human mesh estimation from monocular images.

Environment

  • Ubuntu 18.04 (But 16.04 should be fine too)
  • Python 3.6
  • CUDA 9.0.176
  • PyTorch 0.4.1.post2

For the other necessary packages, run pip install -r requirements.txt for a quick install.

  1. The project requires tensorflow>1.9.0 since the pose estimator is implemented in Keras. If you are using a standalone Keras package, change the corresponding import command in ./pose_estimator/compute_coordinates_for_video.py (see the sketch below); however, you won't be able to use tensorboard this way.
  2. The project uses imageio for video processing, which requires the ffmpeg backend to be downloaded and installed after the pip install. Just follow the error message and you'll get there.
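For note 1, a minimal sketch of the import switch, assuming the script loads the pose model with Keras' load_model (check the actual import lines in the script before editing):

    try:
        # default: the Keras bundled with tensorflow (keeps tensorboard usable)
        from tensorflow.keras.models import load_model
    except ImportError:
        # standalone Keras package, as described in note 1
        from keras.models import load_model

    model = load_model('./pose_estimator/pose_estimator.h5')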

Dataset Preparation

  1. To reproduce our results, download the Afrobeat workout sequence from YouTube (clipconverter is a great downloading tool).

  2. Open the mp4 file with imageio, remove the first 25 seconds (625 frames for our video), and resize the remaining frames to 288×512 (for 16:9 HD video). Save all the frames in a single folder named train_B. It is highly recommended that the frames be named by their index, like 00001.png, 00002.png.
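    A minimal sketch of this step (the video file name and the 25 fps frame rate are assumptions; verify them against your own clip):

    import os
    import imageio
    import numpy as np
    from skimage.transform import resize

    SKIP = 625  # first 25 seconds at 25 fps
    out_dir = './train_B'
    os.makedirs(out_dir, exist_ok=True)

    reader = imageio.get_reader('afrobeat_workout.mp4')  # hypothetical file name
    for i, frame in enumerate(reader):
        if i < SKIP:
            continue
        # resize takes (rows, cols) = (height, width); (288, 512) assumes a
        # landscape 16:9 clip -- swap the tuple for portrait footage
        small = resize(frame, (288, 512), anti_aliasing=True)
        imageio.imwrite(os.path.join(out_dir, '%05d.png' % (i - SKIP + 1)),
                        (small * 255).astype(np.uint8))
    reader.close()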

  3. Download the pre-trained pose estimator from Yandex Disk, put it under the subfolder pose_estimator, then run the following script to estimate the pose for each frame and render the poses into RGB images.

    python ./pose_estimator/compute_coordinates.py

    The script should generate a folder named train_A containing the corresponding pose stick-figure images (also named 00001.png, 00002.png, etc.), plus a numpy file poses.npy holding the estimated poses with shape N×18×2, where N is the number of frames.

    The numpy file is not necessary for training the global generator, but it is needed for training the face-enhancer, since we must estimate and crop the head region from the synthesized frames.

    Note: You can also use OpenPose or any other pose-estimation network for this step. Just make sure you organize your pose data as suggested above.
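    If you bring your own estimator, a quick sanity check on the cached pose file (the dataset path is a placeholder):

    import numpy as np

    poses = np.load('./datasets/cardio_dance/poses.npy')
    # expected: N frames x 18 joints x (x, y)
    assert poses.ndim == 3 and poses.shape[1:] == (18, 2), poses.shape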

  4. Wrap train_A, train_B and poses.npy into the same folder and put it under ./datasets/.
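    A typical resulting layout (the dataset name cardio_dance is just a placeholder):

    ./datasets/cardio_dance/
        train_A/     # pose stick figures: 00001.png, 00002.png, ...
        train_B/     # real frames:        00001.png, 00002.png, ...
        poses.npy    # N x 18 x 2 estimated coordinates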

Use your own dataset

The model is not fully trained/tested on other dancing videos. You are encouraged to play with your own dataset as well, but the performance is not guaranteed.

Empirically, to increase the chance of success in training/testing, it is important that your video:

  • has a fixed, clean background
  • is more than 5 minutes long
  • contains a single person performing "basic" actions (which will be elaborated on later)
  • (Optional, only if you use optical flow loss) contains minimal change in lighting conditions

Conversely, your training will likely fail if your video contains:

  • Intensive movements such as turning around or kneeling down
  • Heavy limb occlusions
  • Large scale variations (e.g. dancer running towards/away from the camera)

If you encounter any failure case, do not hesitate to open an issue and let us know!

Testing

  1. Download the pretrained checkpoints (updated on 20181204).

  2. Prepare the testing sequence: save the skeleton figures in a folder named test_A, slice the corresponding pose coordinates from the previously cached poses.npy, wrap them in a single folder (for example cardio_dance_test), and put it under ./datasets/.

    In addition, the program supports using the first ground-truth frame as a reference, so create a new folder test_B and put inside it the ground-truth frame corresponding to the first item in test_A (with an identical file name, of course).
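    A minimal sketch of the pose-slicing part, assuming the test clip corresponds to frames start..end of the training sequence (all names are placeholders):

    import numpy as np

    poses = np.load('./datasets/cardio_dance/poses.npy')
    start, end = 0, 500  # hypothetical frame range of the test clip
    np.save('./datasets/cardio_dance_test/poses.npy', poses[start:end])
    # copy the matching stick figures into test_A, and the first real
    # frame (same file name) into test_B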

  3. Run the following command for global synthesis:

    sh ./scripts/test_full_512.sh

    This generates a coarse video stored in ./results/$NAME$/$WHICH_EPOCH$/test_clip.avi and caches all synthesized frames for face_enhancer evaluation.

  4. Run the face-enhancer to get the final result.

    python ./face-enhancer/enhance.py

Training

Step I: Training the global pose2vid network.

  1. Prepare the dataset following the instructions above.

  2. For the pose2vid baseline, run the script

    sh ./scripts/train_full_512.sh

    If you wish to incorporate optical flow loss, run the script

    sh ./scripts/train_flow_512.sh

    Warning: this module increases memory cost and slows down training by 40% to 50%. It is also very sensitive to background flow, so use it at your discretion. However, if you can accurately estimate the dancer's body mask, using masked flow could help with temporal smoothing (a sketch follows). Please send a PR if you find masked flow loss effective.
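    For reference, a minimal sketch of what such a masked flow loss could look like; this is not part of the repo, and body_mask is assumed to be 1 inside the dancer and 0 elsewhere:

    import torch

    def masked_flow_loss(flow_pred, flow_gt, body_mask):
        # L1 flow error counted only inside the body mask, so noisy
        # background flow does not dominate the loss
        diff = (flow_pred - flow_gt).abs() * body_mask
        return diff.sum() / body_mask.sum().clamp(min=1.0)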

Step II: Training the local face-enhancing network.

  1. Rename your train_B folder to test_real (or save a copy and rename that).

  2. Test the global pose2vid network (either trained in Step I or initialized from the downloaded pretrained model) on your train_A dataset, and save the results into a folder named test_sync with matching file names.

  3. Open the face-enhancement training script at ./face_enhancement/main.py, modify the dataset_dir, pose_dir, checkpoint dir and log_dir variables (illustrative values below), and run the script.
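    Illustrative values only; check the actual variable names in main.py (in particular, the exact name of the checkpoint variable may differ):

    dataset_dir = './datasets/cardio_dance'        # contains test_real and test_sync
    pose_dir = './datasets/cardio_dance/poses.npy'
    checkpoint_dir = './checkpoints/face_enhancer' # name may differ in the script
    log_dir = './logs/face_enhancer'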

  4. The default network structure is 2 downsampling layers, 6 ResBlocks, and 2 upsampling layers. You can modify it for the best enhancing effect by changing the corresponding parameters at line 22; the crop size is also adjustable at line 23 (the default is 96).
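    For orientation, a minimal sketch of the kind of pose-guided head crop the face-enhancer relies on, assuming the 18-joint OpenPose layout with the nose at index 0 (the repo's actual cropping code may differ):

    import numpy as np

    def crop_head(frame, pose, size=96):
        # center a size x size window on the nose joint, clamped to the frame
        x, y = pose[0]
        h, w = frame.shape[:2]
        x0 = int(np.clip(x - size // 2, 0, w - size))
        y0 = int(np.clip(y - size // 2, 0, h - size))
        return frame[y0:y0 + size, x0:x0 + size]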

Citation

Should you find this implementation useful, please add the following citation to your paper/open-sourced project:

@article{chan2018everybody,
  title={Everybody dance now},
  author={Chan, Caroline and Ginosar, Shiry and Zhou, Tinghui and Efros, Alexei A},
  journal={arXiv preprint arXiv:1808.07371},
  year={2018}
}

Acknowledgement

This repo borrows heavily from pix2pixHD.


everybody_dance_now_pytorch's Issues

compute_coordinates too slow

In Dataset Preparation step 3, when I run python ./pose_estimator/compute_coordinates_for_video.py on the Afrobeat workout sequence (34495 frames), it takes 27 s to process each frame, i.e. about 10 days to finish all frames.

Is there some problem here? Isn't that too slow?

BTW, I pip-installed tensorflow-gpu instead of the tensorflow in your requirements.txt.

compute_coordinates.py input error

When I prepare the dataset exactly as the instructions say and run the command:

python ./pose_estimator/compute_coordinates.py

it gives me the following error:

Traceback (most recent call last):
  File "pose_estimator/compute_coordinates_for_video.py", line 258, in <module>
    cord = cordinates_from_image_file(img, model=model)
  File "pose_estimator/compute_coordinates_for_video.py", line 188, in cordinates_from_image_file
    output1, output2 = model.predict(imageToTest_padded)
  File "/Users/sinangencoglu/anaconda3/envs/sinandl/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1096, in predict
    x, check_steps=True, steps_name='steps', steps=steps)
  File "/Users/sinangencoglu/anaconda3/envs/sinandl/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 2382, in _standardize_user_data
    exception_prefix='input')
  File "/Users/sinangencoglu/anaconda3/envs/sinandl/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_utils.py", line 362, in standardize_input_data
    ' but got array with shape ' + str(data_shape))
ValueError: Error when checking input: expected input_5 to have shape (None, None, 3) but got array with shape (184, 103, 4)

Thank you.

Is it possible to generate frame using OpenPose coordinates?

Hi,

I trained my model and it's working well with the test videos/frames.
Now I want to generate frames using OpenPose coordinates: I will give the coordinates and get the output frame. Is this possible? And if so, how can I do that?

Output video performance

Hello,

I tried to run your code on my test dataset, given a target pose and a source image.
However, the generated video does not look like the source image at all; instead, it looks like the dancing video you show as a demo (the Afrobeat workout sequence from YouTube).
Is this normal?

Thank you

face enhancement questions

Hi, glad to see your great work, really useful. I have two questions about the face-enhancement part:

  1. When doing the face crop, you set crop_size = 48 for 512-frame videos. But that may not be accurate, as the person (or their head) can appear either large or small depending on the distance to the camera. Is there a better way to do the crop? Thanks.

  2. Is there an official paper about the face GAN? I didn't find any related reference in the "Everybody Dance Now" paper. And is there any other implementation of the face GAN on GitHub?

Many thanks.

pretrained pose_estimator error

I got this error when running python ./pose_estimator/compute_coordinates_for_video.py:
2020-09-02 08:55:35.129558: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-09-02 08:55:35.129593: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "./pose_estimator/compute_coordinates_for_video.py", line 251, in <module>
    model = load_model('./pose_estimator/pose_estimator.h5')
  File "/opt/conda/envs/EDN/lib/python3.6/site-packages/tensorflow/python/keras/saving/save.py", line 182, in load_model
    return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
  File "/opt/conda/envs/EDN/lib/python3.6/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 166, in load_model_from_hdf5
    f = h5py.File(filepath, mode='r')
  File "/opt/conda/envs/EDN/lib/python3.6/site-packages/h5py/_hl/files.py", line 408, in __init__
    swmr=swmr)
  File "/opt/conda/envs/EDN/lib/python3.6/site-packages/h5py/_hl/files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (file read failed: time = Wed Sep 2 08:55:36 2020, filename = './pose_estimator/pose_estimator.h5', file descriptor = 4, errno = 5, error message = 'Input/output error', buf = 0x7fffd750fdb0, total read size = 8, bytes this sub-read = 8, bytes actually read = 18446744073709551615, offset = 0)

It seems the pose_estimator.h5 file is broken. Has anybody faced the same problem, and how can it be fixed?

Thanks

Opencl

Can this code run without CUDA?
Can OpenCL be used instead of CUDA?

size compatibility issues

There seem to be compatibility issues with image sizes between README.md and pose_estimator/compute_coordinates_for_video.py. README.md suggests the frames should be resized to 288×512 (for 16:9 HD video), but this leads to an issue when using the function cordinates_from_image_file from compute_coordinates_for_video.py.

Specifically, imageToTest_padded (line 189) will not have the right size for the model suggested in the README, which takes images of size (368, 368). This doesn't work with the given scale_search (line 25) and therefore the multiplier (line 179).

A workaround is to change the transformation (line 189) in cordinates_from_image_file to ensure the input to the model has the right size.

compute_coordinates_for_video.py error

Hi! Thank you for your great job!
I have two questions.

1. When I try to run compute_coordinates_for_video.py after creating the dataset, I get the following error:

    File "./pose_estimator/compute_coordinates_for_video.py", line 248
      if nor os.path.isdir(pose_dir):

Should this be "if not"?

2. I also get another error:

    File "./pose_estimator/compute_coordinates_for_video.py", line 260
      cord = cordinates_from_image_file(img, model=model)
      ^
    SyntaxError: invalid syntax

The dataset directory is set as follows:
img_dir = './datasets/train_B'

Could you let me know how to resolve these errors?

Thanks

Training high-resolution video

Hello, I seem to have found a problem when trying to train on high-resolution video. After I resized the normal 512×512 video to 1024×1024, I found that some rectangular areas on the lower, left, and right edges of the image cannot be learned properly; even after training for many batches this is still the case. (I marked the regions with a red pen in the attached screenshot.) I changed loadsize, finesize and display_winsize to 1024, leaving the rest of the network structure unchanged. What else should I modify for resolutions such as 1024×1024?

[screenshot attached]

Broken Pipe error

I executed the script sh ./scripts/test_full_512.sh, and the following error appeared:
/media/ouc/4T_B/gc/everybody_dance_now_pytorch/models/pose2vidHD_model.py:146: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  input_label = Variable(input_label, volatile=infer)
100%|█████████████████████████████████████████| 500/500 [01:58<00:00, 4.19it/s]
Unknown encoder 'libx264'
Traceback (most recent call last):
  File "/home/ouc/miniconda3/lib/python3.6/site-packages/imageio/plugins/ffmpeg.py", line 727, in _append_data
    self._proc.stdin.write(im.tostring())
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_video.py", line 60, in
    writer.append_data(im)
  File "/home/ouc/miniconda3/lib/python3.6/site-packages/imageio/core/format.py", line 500, in append_data
    return self._append_data(im, total_meta)
  File "/home/ouc/miniconda3/lib/python3.6/site-packages/imageio/plugins/ffmpeg.py", line 734, in _append_data
    raise IOError(msg)
OSError: [Errno 32] Broken pipe

FFMPEG COMMAND:
/home/ouc/miniconda3/bin/ffmpeg -y -f rawvideo -vcodec rawvideo -s 480x480 -pix_fmt rgb24 -r 25.00 -i - -an -vcodec libx264 -pix_fmt yuv420p -crf 25 -v warning /media/ouc/4T_B/gc/everybody_dance_now_pytorch/results/everybody_dance_now_temporal/latest/test_clip.avi

FFMPEG STDERR OUTPUT:

How can I fix this problem? Thanks.

Missing coordinates in some videos

Thank you for your work! I downloaded a Bruno Mars video and resized it to 288×512, but I can't correctly get the coordinates with ./pose_estimator; one point is missing in the results (Figure 1). I have tried the Afrobeat exercise video and I am pretty sure my setup is right (Figure 2).

[Figures 1 and 2: attached result frames]
Source video:
https://www.youtube.com/watch?v=PMivT7MJ41M

It would be great if you can check this out!
