aliaksandrsiarohin / first-order-model

This repository contains the source code for the paper First Order Motion Model for Image Animation

Home Page: https://aliaksandrsiarohin.github.io/first-order-model-website/

License: MIT License

Python 8.95% Jupyter Notebook 91.02% Dockerfile 0.03%
deep-learning image-animation generative-model motion-retargeting

first-order-model's Introduction

!!! Check out our new paper and framework, improved for articulated objects

First Order Motion Model for Image Animation

This repository contains the source code for the paper First Order Motion Model for Image Animation by Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci and Nicu Sebe.

Hugging Face Spaces

Example animations

The videos on the left show the driving videos. For each dataset, the first row on the right shows the source images. The bottom row contains the animated sequences, with motion transferred from the driving video and the object taken from the source image. We trained a separate network for each task.

VoxCeleb Dataset

Screenshot

Fashion Dataset

Screenshot

MGIF Dataset

Screenshot

Installation

We support Python 3. To install the dependencies, run:

pip install -r requirements.txt

YAML configs

There are several configuration files (config/dataset_name.yaml), one for each dataset. See config/taichi-256.yaml for a description of each parameter.

Pre-trained checkpoint

Checkpoints can be found at the following links: google-drive or yandex-disk.

Animation Demo

To run a demo, download a checkpoint and run the following command:

python demo.py  --config config/dataset_name.yaml --driving_video path/to/driving --source_image path/to/source --checkpoint path/to/checkpoint --relative --adapt_scale

The result will be stored in result.mp4.
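For reference, demo.py can also be driven from Python. Below is a minimal, untested sketch based on the load_checkpoints and make_animation functions that demo.py itself uses; the paths and the 256x256 resolution are placeholders, so adjust them to your config:

import imageio
from skimage import img_as_ubyte
from skimage.transform import resize

from demo import load_checkpoints, make_animation

# Read and normalize the inputs to float frames in [0, 1] at the model resolution.
source_image = resize(imageio.imread('path/to/source.png'), (256, 256))[..., :3]
driving_video = [resize(frame, (256, 256))[..., :3]
                 for frame in imageio.mimread('path/to/driving.mp4', memtest=False)]

generator, kp_detector = load_checkpoints(config_path='config/vox-256.yaml',
                                          checkpoint_path='path/to/checkpoint')
predictions = make_animation(source_image, driving_video, generator, kp_detector,
                             relative=True, adapt_movement_scale=True)

# Convert back to uint8 before saving to avoid lossy-conversion warnings.
imageio.mimsave('result.mp4', [img_as_ubyte(frame) for frame in predictions])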

The driving videos and source images should be cropped before they can be used with our method. To obtain semi-automatic crop suggestions you can use python crop-video.py --inp some_youtube_video.mp4. It will generate ffmpeg commands for the crops. To use the script, the face-alignment library is needed:

git clone https://github.com/1adrianb/face-alignment
cd face-alignment
pip install -r requirements.txt
python setup.py install

Animation demo with Docker

If you are having trouble getting the demo to work because of library compatibility issues, and you're running Linux, you might try running it inside a Docker container, which would give you better control over the execution environment.

Requirements: Docker 19.03+ and nvidia-docker installed and able to successfully run the nvidia-docker usage tests.

We'll first build the container.

docker build -t first-order-model .

And now that we have the container available locally, we can use it to run the demo.

docker run -it --rm --gpus all \
       -v $HOME/first-order-model:/app first-order-model \
       python3 demo.py --config config/vox-256.yaml \
           --driving_video driving.mp4 \
           --source_image source.png \
           --checkpoint vox-cpk.pth.tar \
           --result_video result.mp4 \
           --relative --adapt_scale

Colab Demo

Open In Colab Open in Kaggle

@graphemecluster prepared a GUI demo for Google Colab. It also works in Kaggle. For the source code, see demo.ipynb.

For the old demo, see old_demo.ipynb.

Face-swap

It is possible to modify the method to perform face-swap using supervised segmentation masks. For both unsupervised and supervised video editing, such as face-swap, please refer to Motion Co-Segmentation.

Training

To train a model on specific dataset run:

CUDA_VISIBLE_DEVICES=0,1,2,3 python run.py --config config/dataset_name.yaml --device_ids 0,1,2,3

The code will create a folder in the log directory (each run creates a new time-stamped directory). Checkpoints will be saved to this folder. To check the loss values during training, see log.txt. You can also check training-data reconstructions in the train-vis subfolder. By default the batch size is tuned to run on 2 or 4 Titan-X GPUs (apart from speed, it does not make much difference). You can change the batch size in train_params in the corresponding .yaml file.

Evaluation on video reconstruction

To evaluate the reconstruction performance run:

CUDA_VISIBLE_DEVICES=0 python run.py --config config/dataset_name.yaml --mode reconstruction --checkpoint path/to/checkpoint

You will need to specify the path to the checkpoint; the reconstruction subfolder will be created in the checkpoint folder. The generated videos will be stored in this folder, and also in the png subfolder in loss-less '.png' format for evaluation. Instructions for computing the metrics from the paper can be found at https://github.com/AliaksandrSiarohin/pose-evaluation.

Image animation

In order to animate videos run:

CUDA_VISIBLE_DEVICES=0 python run.py --config config/dataset_name.yaml --mode animate --checkpoint path/to/checkpoint

You will need to specify the path to the checkpoint; the animation subfolder will be created in the same folder as the checkpoint. You can find the generated videos there, and their loss-less versions in the png subfolder. By default, videos from the test set will be randomly paired, but you can specify "source,driving" pairs in the corresponding .csv files. The path to this file should be specified in the corresponding .yaml file in the pairs_list setting.
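For illustration, a minimal sketch (not an official tool) of writing such a pairs file; the source/driving column names are an assumption, so check an existing pairs file in the repository for the exact header:

import csv

pairs = [
    ('source_video_1.mp4', 'driving_video_1.mp4'),
    ('source_video_2.mp4', 'driving_video_2.mp4'),
]
with open('data/dataset_name_pairs.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['source', 'driving'])  # assumed column names
    writer.writerows(pairs)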

There are 2 different ways of performing animation: by using absolute keypoint locations or by using relative keypoint locations.

  1. Animation using absolute coordinates: the animation is performed using the absolute positions of the driving video and the appearance of the source image. In this way there are no specific requirements for the driving video and source appearance. However, this usually leads to poor performance, since irrelevant details such as shape are transferred. Check the animate parameters in taichi-256.yaml to enable this mode.

  2. Animation using relative coordinates: from the driving video we first estimate the relative movement of each keypoint, then we add this movement to the absolute keypoint positions in the source image. These keypoints, along with the source image, are used for animation. This usually leads to better performance, but it requires that the object in the first frame of the driving video and in the source image have the same pose (see the sketch below).
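The sketch below illustrates the relative-coordinates idea in isolation; the actual implementation lives in normalize_kp (animate.py) and also transfers the keypoint Jacobians, which this toy version omits:

import numpy as np

def relative_motion(kp_source, kp_driving, kp_driving_initial):
    # Shift the source keypoints by the driving keypoints' displacement
    # relative to the first driving frame.
    return kp_source + (kp_driving - kp_driving_initial)

# Toy example: (num_keypoints, 2) arrays of normalized coordinates.
kp_source = np.array([[0.1, 0.2], [0.4, 0.5]])
kp_driving_initial = np.array([[0.1, 0.1], [0.4, 0.4]])
kp_driving = np.array([[0.2, 0.1], [0.5, 0.4]])
print(relative_motion(kp_source, kp_driving, kp_driving_initial))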

Datasets

  1. Bair. This dataset can be directly downloaded.

  2. Mgif. This dataset can be directly downloaded.

  3. Fashion. Follow the instructions for downloading the dataset.

  4. Taichi. Follow the instructions in data/taichi-loading or instructions from https://github.com/AliaksandrSiarohin/video-preprocessing.

  5. Nemo. Please follow the instructions on how to download the dataset. The dataset should then be preprocessed using the scripts from https://github.com/AliaksandrSiarohin/video-preprocessing.

  6. VoxCeleb. Please follow the instructions from https://github.com/AliaksandrSiarohin/video-preprocessing.

Training on your own dataset

  1. Resize all the videos to the same size, e.g. 256x256; the videos can be in '.gif' or '.mp4' format, or a folder with images. We recommend the latter: for each video, make a separate folder with all the frames in '.png' format. This format is loss-less and has better I/O performance.

  2. Create a folder data/dataset_name with two subfolders, train and test. Put the training videos in train and the testing videos in test.

  3. Create a config config/dataset_name.yaml; in dataset_params, specify the root directory root_dir: data/dataset_name. Also adjust the number of epochs in train_params. A conversion sketch follows this list.
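A minimal conversion sketch for steps 1-3, assuming imageio and scikit-image are installed; the zero-padded frame naming is an assumption, any consistent naming should work:

import os

import imageio
from skimage import img_as_ubyte
from skimage.transform import resize

def video_to_frames(video_path, out_dir, shape=(256, 256)):
    # Write each frame of the video as a loss-less 256x256 png.
    os.makedirs(out_dir, exist_ok=True)
    reader = imageio.get_reader(video_path)
    for i, frame in enumerate(reader):
        imageio.imsave(os.path.join(out_dir, '%07d.png' % i),
                       img_as_ubyte(resize(frame, shape)))
    reader.close()

# Layout: data/dataset_name/{train,test}/<video_name>/0000000.png, ...
video_to_frames('my_video.mp4', 'data/dataset_name/train/my_video')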

Additional notes

Citation:

@InProceedings{Siarohin_2019_NeurIPS,
  author={Siarohin, Aliaksandr and Lathuilière, Stéphane and Tulyakov, Sergey and Ricci, Elisa and Sebe, Nicu},
  title={First Order Motion Model for Image Animation},
  booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
  month = {December},
  year = {2019}
}

first-order-model's People

Contributors

8secz-johndpope, abcdea, aliaksandrsiarohin, angwhen, dependabot[bot], graphemecluster, hamin-lee, kmielnik, lsternlicht, marcasty, mikaelhg, or-toledano, osanseviero, pizzaz93, s0lst1ce, s1ddok, sheth2chintan, ufoym, vsoch, zerocool940711


first-order-model's Issues

512x512 generation

Hi, thank you for this model and support!
Is it possible to train the model for 512x512 size?

demo.py with pytorch 1.0.0 with CPU support error

After removing all .cuda() calls, I ran this command:
sudo python3 demo.py --config config/vox-adv-256.yaml --driving_video ja.mp4 --source_image michal.png --checkpoint vox-adv-cpk.pth.tar --relative --adapt_scale

I got this error:

Traceback (most recent call last):
  File "demo.py", line 123, in <module>
    generator, kp_detector = load_checkpoints(config_path=opt.config, checkpoint_path=opt.checkpoint)
  File "demo.py", line 35, in load_checkpoints
    checkpoint = torch.load(checkpoint_path)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 367, in load
    return _load(f, map_location, pickle_module)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 538, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 504, in persistent_load
    data_type(size), location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 113, in default_restore_location
    result = fn(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 94, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 78, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location='cpu' to map your storages to the CPU.

After changing checkpoint = torch.load(checkpoint_path) in demo.py to checkpoint = torch.load(checkpoint_path, map_location='cpu'),

I got this error:
Illegal instruction
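For reference, the CPU remap in full (a sketch, not an official fix; the subsequent "Illegal instruction" crash is a separate problem and typically means the installed PyTorch binary uses CPU instructions the machine does not support, so a different PyTorch build may be needed):

import torch

# Remap all CUDA tensors in the checkpoint onto the CPU at load time.
checkpoint = torch.load('vox-adv-cpk.pth.tar', map_location='cpu')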

Installation fails on dependencies

Hi,

I cloned the repository and tried to run it, but failed on the dependencies. I'm on OSX 10.15, with a fresh Python 3.8.1 installed via pyenv. I had to install these two dependencies manually, as installing the requirements with pip did not pull them in:

  • numpy
  • cython

The script failed with this message:

ERROR: Could not find a version that satisfies the requirement torch==1.0.0 (from -r requirements.txt (line 25)) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2, 1.4.0)
ERROR: No matching distribution found for torch==1.0.0 (from -r requirements.txt (line 25))

Hope you can help me :)

Motion Reference Frame

Horror

Is there a way to have the motion vectors zeroed out on a certain frame other than the first frame? By default the source video remains unwarped on the first frame, then gets warped by the driving video on subsequent frames.

Say in this example that there will inevitably be heavy distortion, but if I could apply the animation referencing the driving video at around the halfway point (where the face is facing the camera), then the warping when looking up and down may not be so harsh. The source video would then remain least warped around the halfway point, but most distorted at the beginning and end. I've made a horrible example here using a different vector-warping program:
Horror2
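A hypothetical workaround (not a built-in option), assuming make_animation normalizes motion against the first driving frame as the relative mode described in the README implies: prepend the chosen reference frame to the driving video, then drop the extra output frame.

# Variables as in demo.py: source_image, driving_video (list of frames),
# generator and kp_detector already loaded.
ref_idx = len(driving_video) // 2  # e.g. the frame where the face looks at the camera
driving_reordered = [driving_video[ref_idx]] + driving_video

predictions = make_animation(source_image, driving_reordered, generator, kp_detector,
                             relative=True, adapt_movement_scale=True)
predictions = predictions[1:]  # discard the output for the prepended reference frame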

Any data on expected training time? Also, are there plans for realtime visualized output?

Hello. First, I would like to thank you for such a great, fully fleshed-out project.

In regards to training, is there any data on how long it should generally take per GPU? For a GPU with 8GB of memory, how long would it take to finish training on a small-to-medium-sized dataset?

This one kind of ties into the training time: are there any plans for realtime visualization of training data? If training takes a long time, it would be beneficial to see the results updated at a faster rate.

Any input is appreciated. Thanks!

Installation fails on dependencies (2)

Hi,

I cloned the repository and tried to run it, but failed on the dependencies. I'm on Windows 10 with Python 3.6.6 installed. I get the following error:
...
Collecting toolz==0.9.0
Downloading toolz-0.9.0.tar.gz (45 kB)
|████████████████████████████████| 45 kB 1.1 MB/s
ERROR: Could not find a version that satisfies the requirement torch==1.0.0 (from -r requirements.txt (line 25)) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch==1.0.0 (from -r requirements.txt (line 25))

Error occurs when I run demo.py on mayun.mp4 and yyp.jpeg

This is my data.
mayun.mp4.zip
yyp

I run python crop-video.py --inp mayun.mp4.

Then I run python demo.py --config config/vox-256.yaml --driving_video ./mayun.mp4 --source_image ./yyp.jpeg --checkpoint ./vox-cpk.pth.tar --relative --adapt_scale.

It appears that:
Traceback (most recent call last):
  File "demo.py", line 120, in <module>
    driving_video = imageio.mimread(opt.driving_video, memtest=False)
  File "/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/core/functions.py", line 286, in mimread
    for im in reader:
  File "/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/core/format.py", line 397, in iter_data
    im, meta = self._get_data(i)
  File "/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/plugins/ffmpeg.py", line 396, in _get_data
    result, is_new = self._read_frame()
  File "/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/plugins/ffmpeg.py", line 585, in _read_frame
    s, is_new = self._read_frame_data()
  File "/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/plugins/ffmpeg.py", line 571, in _read_frame_data
    raise CannotReadFrameError(fmt % (self._pos, err1, err2))
imageio.core.format.CannotReadFrameError: Could not read frame 1750: Frame is 0 bytes, but expected 6220800.

=== stderr ===
ffmpeg version 4.2 Copyright (c) 2000-2019 the FFmpeg developers
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/home/user/projects/first-order-model/mayun.mp4':
  Duration: 00:01:09.48, start: 0.000000, bitrate: 2095 kb/s
  Stream #0:0(und): Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], 1962 kb/s, 30 fps, 30 tbr, 30k tbn, 60 tbc (default)
  Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 131 kb/s (default)
Stream mapping: Stream #0:0 -> #0:0 (h264 (native) -> rawvideo (native))
Output #0, image2pipe, to 'pipe:':
  Stream #0:0(und): Video: rawvideo (RGB[24] / 0x18424752), rgb24, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 1492992 kb/s, 30 fps, 30 tbn, 30 tbc (default)
... (ffmpeg frame-progress lines trimmed; decoding stops after frame 1725) ...

Can't find Checkpoints

Hello, I would like to know what this error may be. I downloaded the updated checkpoints from Google Drive, but when running the command:

python demo.py --config config/vox-adv-256.yaml --driving_video data/10.mp4 --source_image data/02.png --checkpoint chekpoints/vox-adv-cpk.pth --relative --adapt_scale

The error I got was:

demo.py:25: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
Traceback (most recent call last):
  File "demo.py", line 123, in <module>
    generator, kp_detector = load_checkpoints(config_path=opt.config, checkpoint_path=opt.checkpoint)
  File "demo.py", line 35, in load_checkpoints
    checkpoint = torch.load(checkpoint_path)
  File "C:\Users\jefer\anaconda3\lib\site-packages\torch\serialization.py", line 525, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "C:\Users\jefer\anaconda3\lib\site-packages\torch\serialization.py", line 212, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "C:\Users\jefer\anaconda3\lib\site-packages\torch\serialization.py", line 193, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'chekpoints/vox-adv-cpk.pth'

I'm using W10 and Python 3.7.6
I believe it may be something related to the integrity of the checkpoint download.

Little typo in demo.ipynb file

First of all, amazing work! Between "In[6]" and "In[7]":
"We ca use absolute coordinates(...)" instead of "We can use absolute coordinates(...)"

video is slow and very short

I'm generating facial animation in Colab.
The Obama driving video looks very slow compared to the original 8 seconds.
I also used a 1-minute driving video and it only generated a <30-second video.
How do I increase the speed and the video length?
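One likely cause (an assumption, not a confirmed diagnosis): if the result is saved without an explicit frame rate, imageio's ffmpeg writer falls back to its own default, which changes both playback speed and apparent length. A sketch of preserving the driving video's fps:

import imageio
from skimage import img_as_ubyte

# Read the frame rate of the driving video and reuse it when saving.
reader = imageio.get_reader('driving.mp4')
fps = reader.get_meta_data()['fps']
reader.close()

# ... run make_animation as in demo.py ...
imageio.mimsave('result.mp4', [img_as_ubyte(f) for f in predictions], fps=fps)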

Ask for trained weight in Voxceleb

Thanks for your great work.
I see that VoxCeleb is used for training in your paper, but the weights are not shared at the moment.
So may I ask you to upload the trained weight file for VoxCeleb, please?

Your reply will be appreciated, thank you.

Error on demo.py.

Traceback (most recent call last):
  File "demo.py", line 94, in <module>
    predictions = make_animation(source_image, driving_video, generator, kp_detector, relative=opt.relative, adapt_movement_scale=opt.adapt_scale)
  File "demo.py", line 61, in make_animation
    use_relative_jacobian=relative, adapt_movement_scale=adapt_movement_scale)
  File "/home/chimpinchief/first-order-model/animate.py", line 33, in normalize_kp
    jacobian_diff = torch.matmul(kp_driving['jacobian'], torch.inverse(kp_driving_initial['jacobian']))
RuntimeError: tensor should be 2 dimensional

RAM requirements for demo.py

Hello, what are the RAM requirements for running the demo script? I have 16GB of RAM, but the script failed with a memory error, and I can see in Task Manager that RAM usage was near 100%. How much more RAM would I need to run the script?

I downloaded the first-order-motion-model folder into the git repository and ran this command:
python demo.py --config config\vox-adv-256.yaml --driving_video first-order-motion-model\07.mkv --source_image first-order-motion-model\01.png --checkpoint first-order-motion-model\vox-adv-cpk.pth.tar --relative --adapt_scale

The error I got was:

Traceback (most recent call last):
  File "demo.py", line 119, in <module>
    driving_video = imageio.mimread(opt.driving_video, memtest=False)
  File "C:\Users\mr_j_\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\core\functions.py", line 286, in mimread
    for im in reader:
  File "C:\Users\mr_j_\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\core\format.py", line 397, in iter_data
    im, meta = self._get_data(i)
  File "C:\Users\mr_j_\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\plugins\ffmpeg.py", line 396, in _get_data
    result, is_new = self._read_frame()
  File "C:\Users\mr_j_\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\plugins\ffmpeg.py", line 586, in _read_frame
    result = np.fromstring(s, dtype='uint8')
MemoryError: Unable to allocate 5.93 MiB for an array with shape (6220800,) and data type uint8

I'm using Windows 10 with python 3.7
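A possible memory-saving workaround (an assumption, not a maintainer fix): imageio.mimread keeps every full-resolution frame in RAM, so for a 1920x1080 video the buffer grows quickly. Streaming the frames and downscaling each one immediately keeps only small copies:

import imageio
from skimage.transform import resize

driving_video = []
reader = imageio.get_reader('first-order-motion-model/07.mkv')
for frame in reader:
    # Downscale right away so only 256x256 frames are retained.
    driving_video.append(resize(frame, (256, 256))[..., :3])
reader.close()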

bad result for a new source image?

Dear Aliaksandr Siarohin,
When I use the checkpoint fashion.pth.tar you offered, a test fashion video, and my own image to run demo.py, I get a bad result. Does this indicate poor generalization performance for new images?

Where can I contact you?

Hi, sorry for leaving this here, but where can I contact the developer? I have some questions that might not be appropriate for a GitHub issue. Please leave an email or some social media contact here so I can reach you. Very nice project btw. Love it. Thanks.

Syntax error running demo.py

Hello !

Running the command

python demo.py --config config/fashion-256.yaml --driving_video .//Skills-1.mp4 --source_image ./single.jpeg --checkpoint ./checkpoints --relative --adapt_scale

I got the following error

File "demo.py", line 27
generator = OcclusionAwareGenerator(**config['model_params']['generator_params'],
^
SyntaxError: invalid syntax
Any hint ? Thanks !

How to make the image have the same pose as the first driving frame?

the "source image"
at the beginning it has a pose or angle that is not the same as my video driving.
therefore the "source image" has another angle throughout the video, I want
"source image" has the same angle as the driving head.

for example if "driving" is facing the camera "source image" should be facing the camera

there is way to force this without looking for a "source image" with the same angle of the head of the first frame of driving
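A sketch of one approach (the normalization and metric here are assumptions; newer versions of demo.py ship a similar find_best_frame helper): use the face-alignment library, already required for cropping, to pick the driving frame whose landmarks best match the source image, and start the animation from that frame.

import numpy as np
import face_alignment

# LandmarksType._2D is the spelling in older face-alignment releases.
fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, device='cpu')

def normalize(kp):
    kp = kp - kp.mean(axis=0)       # remove translation
    return kp / np.linalg.norm(kp)  # remove scale

def best_frame(source_image, driving_video):
    # Frames are floats in [0, 1], so rescale to uint8 range for the detector.
    kp_source = normalize(fa.get_landmarks(255 * source_image)[0])
    distances = [((normalize(fa.get_landmarks(255 * frame)[0]) - kp_source) ** 2).sum()
                 for frame in driving_video]
    return int(np.argmin(distances))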

About the Gaussian heatmap

In Monkey-Net you calculate the covariance of the heatmap, but in this improved work I find that you just use a constant variance. Could you explain why?

kp_detector dimension mismatch in concatenation.

Hi,
While training the model, I run into a dimension mismatch error in the UNet where the layers are being concatenated.
The error:

out = torch.cat([out, skip], dim=1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 7 and 6 in dimension 2 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:83

What could be causing this mismatch?

YAMLLoadWarning: calling yaml.load() without Loader= (no NVIDIA driver)

File "demo.py", line 123, in
generator, kp_detector = load_checkpoints(config_path=opt.config, checkpoint_path=opt.checkpoint)
File "demo.py", line 29, in load_checkpoints
generator.cuda()
File "C:\Users\Mohcine\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 304, in cuda
return self._apply(lambda t: t.cuda(device))
File "C:\Users\Mohcine\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 201, in _apply
module._apply(fn)
File "C:\Users\Mohcine\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 201, in _apply
module._apply(fn)
File "C:\Users\Mohcine\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 201, in _apply
module._apply(fn)
[Previous line repeated 3 more times]
File "C:\Users\Mohcine\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 223, in _apply
param_applied = fn(param)
File "C:\Users\Mohcine\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 304, in
return self.apply(lambda t: t.cuda(device))
File "C:\Users\Mohcine\Anaconda3\lib\site-packages\torch\cuda_init
.py", line 196, in _lazy_init
check_driver()
File "C:\Users\Mohcine\Anaconda3\lib\site-packages\torch\cuda_init
.py", line 101, in _check_driver
http://www.nvidia.com/Download/index.aspx""")
AssertionError:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from

I don't have a GPU and I need to run this on my CPU

Thanks everyone !

how to gen pairs_list file

Thank you! I want to train the model on the vox dataset, but some mp4 files seem to be broken links when I download them using load_videos.py, and I don't know how to generate the pairs_list file.

keypoints padding

Isn't it a problem that you set the padding for the keypoints to 0 here? This alters the shape of the output, and therefore the position of the keypoints, by 1/100 or so, which shouldn't affect the results, but I was wondering.

warning- conversion float32 to uint8

Hi, the program runs, but I get a lot of these warnings:
WARNING:root:Lossy conversion from float32 to uint8. Range [0, 1]. Convert image to uint8 prior to saving to suppress this warning.
What am I doing wrong?
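The warning itself is harmless; it can be silenced by converting the frames to uint8 before saving, e.g. (a sketch using the predictions list from demo.py):

import imageio
from skimage import img_as_ubyte

# img_as_ubyte maps float frames in [0, 1] to uint8 explicitly.
imageio.mimsave('result.mp4', [img_as_ubyte(frame) for frame in predictions])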

cropping image

Hi there,
Firstly, your work is awesome; thank you for sharing it on GitHub.
But I have an issue when cropping an image using crop_image.py. It shows an error; can you tell me what's wrong?
fps = video.get_meta_data()['fps']
KeyError: 'fps'

Thanks
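A possible workaround (an assumption, not an official fix): some containers do not report 'fps' in their metadata, so fall back to a default instead of indexing directly:

import imageio

video = imageio.get_reader('some_video.mp4')
meta = video.get_meta_data()
fps = meta.get('fps', 25)  # assume 25 fps when the container does not report it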

Model diverging when training in VoxCeleb

Hi, thanks for your great work!

I am trying to retrain the method on VoxCeleb; I don't use the GAN loss.
Has anyone tried this and faced the same issue?
Is there any strategy in training or some problem with my dataset?
Any advice is appreciated.
Below is my result during the training.

[training result images]

Your reply will be appreciated.

Windows Specific issues

I'm opening this issue to gather all the issues that are specific to Windows. If you use Windows and you get errors when running the code, please post them here, or create another issue and link it here with a comment so we know about it. @AliaksandrSiarohin, the owner of the repository, does not have a machine with Windows installed, so it will be hard for him to test things that are specific to Windows machines, but with enough information it may be possible for him to fix those issues. I will try to fix what I can too, as I do have a Windows computer.

error - demo.py

After I ran this command:
sudo python demo.py --config config/vox-adv-256.yaml --driving_video ja.mp4 --source_image michal.png --checkpoint vox-adv-cpk.pth.tar --relative --adapt_scale

I got this error:
File "demo.py", line 27
generator = OcclusionAwareGenerator(**config['model_params']['generator_params'],
^
SyntaxError: invalid syntax

Problem downloading taichi dataset

When I run this command:
python load_videos.py --metadata taichi-metadata.csv --format .mp4 --out_folder taichi --workers 1
At some point I hit this error; it happens on both Windows and Linux:

2it [02:13, 58.17s/it]multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\first-order-model\data\taichi-loading\load_videos.py", line 76, in run
    crop = img_as_ubyte(resize(crop, args.image_shape, anti_aliasing=True))
  File "C:\ProgramData\Anaconda3\envs\ptlast37\lib\site-packages\skimage\transform\_warps.py", line 166, in resize
    preserve_range=preserve_range)
  File "C:\ProgramData\Anaconda3\envs\ptlast37\lib\site-packages\skimage\transform\_warps.py", line 807, in warp
    raise ValueError("Cannot warp empty image with dimensions", image.shape)
ValueError: ('Cannot warp empty image with dimensions', (120, 0, 3))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\ptlast37\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\first-order-model\data\taichi-loading\load_videos.py", line 78, in run
    except imageio.core.format.CannotReadFrameError:
AttributeError: module 'imageio.core.format' has no attribute 'CannotReadFrameError'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "load_videos.py", line 113, in <module>
    for chunks_data in tqdm(pool.imap_unordered(run, zip(video_ids, args_list))):
  File "C:\ProgramData\Anaconda3\envs\ptlast37\lib\site-packages\tqdm\std.py", line 1091, in __iter__
    for obj in iterable:
  File "C:\ProgramData\Anaconda3\envs\ptlast37\lib\multiprocessing\pool.py", line 748, in next
    raise value
AttributeError: module 'imageio.core.format' has no attribute 'CannotReadFrameError'
2it [02:23, 71.69s/it]

The timing of this error and which videos get downloaded are random (i.e. sometimes it happens after 1 minute, sometimes after 5 minutes; the first time it downloads one set of videos, then the second time it downloads completely different videos, even after emptying the folders first).

How to test Taichi animation?

Hi!
I followed the Google Colab ipynb to test the model and found that it only covers face animation, so I don't know how to test the taichi model. I noticed the repo has a crop-video.py, and the author says we should crop the image and video before testing the model, but I don't know how to crop a video for the taichi model. Are there any precautions for the driving video? Thanks for any reply.

headswap

The current method is limited to a head driving video and a head image. Any suggestions for the case where I provide just a head image and a driving video such as a lecture video, and want to replace the person's head in the video with that head image?

Audio output demo.ipynb

If the source video has an audio track, is it possible to persist the audio through the resize call?
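The demo itself writes a silent result, but the audio can be muxed back afterwards. A sketch of one way to do it (file names are placeholders; ffmpeg must be on PATH):

import subprocess

# Copy the video stream from the result and the audio stream from the driving video.
subprocess.run([
    'ffmpeg', '-y',
    '-i', 'result.mp4',   # silent video produced by the demo
    '-i', 'driving.mp4',  # original video with the audio track
    '-map', '0:v', '-map', '1:a',
    '-c:v', 'copy', '-shortest',
    'result_with_audio.mp4',
])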

__init__() got an unexpected keyword argument 'estimate_jacobian'

Hi, when I downloaded your checkpoint/fashion.pth.tar and ran python demo.py, I encountered this problem:

Traceback (most recent call last):
  File "demo.py", line 92, in <module>
    generator, kp_detector = load_checkpoints(config_path=opt.config, checkpoint_path=opt.checkpoint)
  File "demo.py", line 28, in load_checkpoints
    **config['model_params']['common_params'])
TypeError: __init__() got an unexpected keyword argument 'estimate_jacobian'
Would you please tell me how to solve it?

fashion checkpoint Error on demo.py on colab

I installed it in Google Colab the typical way, with torch version 1.0.0:
!git clone https://github.com/AliaksandrSiarohin/first-order-model.git
cd first-order-model/
!pip install -r requirements.txt

I downloaded the fashion.pth.tar checkpoint and put it at first-order-model/checkpoints/fashion.pth.tar.

I then put the source image and driving videos in the data folder.

When I run the demo:
!python demo.py --config config/fashion-256.yaml --driving_video data/vid.mp4 --source_image data/img3.jpg --checkpoint checkpoints/fashion.pth.tar --relative --adapt_scale
I get the following errors:

/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
WARNING:root:Warning: the frame size for reading (352, 640) is different from the source frame size (640, 352).
WARNING:root:Warning: the frame size for reading (352, 640) is different from the source frame size (640, 352).
/usr/local/lib/python3.6/dist-packages/skimage/transform/_warps.py:105: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
warn("The default mode, 'constant', will be changed to 'reflect' in "
/usr/local/lib/python3.6/dist-packages/skimage/transform/_warps.py:110: UserWarning: Anti-aliasing will be enabled by default in skimage 0.15 to avoid aliasing artifacts when down-sampling images.
warn("Anti-aliasing will be enabled by default in skimage 0.15 to "
Traceback (most recent call last):
  File "demo.py", line 92, in <module>
    generator, kp_detector = load_checkpoints(config_path=opt.config, checkpoint_path=opt.checkpoint)
  File "demo.py", line 35, in load_checkpoints
    checkpoint = torch.load(checkpoint_path)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 367, in load
    return _load(f, map_location, pickle_module)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 545, in _load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 480726 more bytes. The file might be corrupted.
terminate called after throwing an instance of 'c10::Error'
what(): owning_ptr == NullType::singleton() || owning_ptr->refcount_.load() > 0 ASSERT FAILED at /pytorch/c10/util/intrusive_ptr.h:341, please report a bug to PyTorch. intrusive_ptr: Can only intrusive_ptr::reclaim() owning pointers that were created using intrusive_ptr::release(). (reclaim at /pytorch/c10/util/intrusive_ptr.h:341)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7fe861d96fe1 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7fe861d96dfa in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #2: THStorage_free + 0xca (0x7fe862a0caea in /usr/local/lib/python3.6/dist-packages/torch/lib/libcaffe2.so)
frame #3: + 0x4b6317 (0x7fe89c26a317 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so)
frame #4: python3() [0x54f6e6]
frame #5: python3() [0x5734e0]
frame #6: python3() [0x4b1a28]
frame #7: python3() [0x589078]
frame #8: python3() [0x5ade68]
frame #9: python3() [0x5ade7e]
frame #10: python3() [0x5ade7e]
frame #11: python3() [0x5ade7e]
frame #12: python3() [0x56be56]

frame #18: __libc_start_main + 0xe7 (0x7fe8bd840b97 in /lib/x86_64-linux-gnu/libc.so.6)

The error seems to occur at torch.load(), since if I only run
checkpoint = torch.load('checkpoints/fashion.pth.tar')
the error appears:

RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 checkpoint = torch.load('checkpoints/fashion.pth.tar')

1 frames
/usr/local/lib/python3.6/dist-packages/torch/serialization.py in _load(f, map_location, pickle_module)
    543     for key in deserialized_storage_keys:
    544         assert key in deserialized_objects
--> 545     deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
    546     offset = None
    547

RuntimeError: unexpected EOF, expected 480726 more bytes. The file might be corrupted.

The issue appears to be with the fashion.pth.tar file. Using taichi-adv-cpk.pth.tar does not produce the error.

License

Hi.

I see a license was added in #12.

The added license is not advised for software (Creative Commons itself advises not to use it for software). Additionally, it's a proprietary license.

I suggest opting for Apache 2.0 or a similar open source software license, as Creative Commons recommends. This would make the project maximally useful to others.

Error in train.py

Hello,

I just downloaded your repo, set up a conda env with requirements.txt and torch 1.0.0, downloaded the datasets via the links in README.md, and got the following error when calling train.py:

python run.py --config config/mgif-256.yaml --device_ids 0
/gpfs-volume/miniconda3/envs/fomm/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
Traceback (most recent call last):
  File "run.py", line 68, in <module>
    dataset = FramesDataset(is_train=(opt.mode == 'train'), **config['dataset_params'])
TypeError: __init__() got an unexpected keyword argument 'image_shape'

Problems downloading the checkpoints

Hi, I downloaded all the checkpoints, both with a simple download and with the Yandex.Disk app, but all the archives seem to be corrupted.
I tried to open them with WinRAR, 7zip and PeaZip, with the same result.
Can you please help me?
Thank you.

Any chance we could get a Prebuilt Windows Release with dependencies included?

It would be nice if we could get a release with everything included to run the code, or at least the demo, something similar to this: there is a folder called _internal with a Python virtualenv and all the dependencies to run that DeepFake project, and even scripts to do specific tasks quickly, like training. Currently it's hard to run the demo for First Order Motion Model for Image Animation, as it has dependencies that need specific versions of some libraries. In my case I tried to run the demo and it didn't work; after a few hours downloading all the necessary stuff and datasets, I couldn't run it no matter what I tried. It would be nice to have something that lets us test at least the demo right away, so we can then decide what to continue testing.

How to run demo.py with pytorch 1.0.0 with CPU support

I get the following error:

C:\Users\USER\source\repos\AliaksandrSiarohin\first-order-model>python demo.py --config config/vox-256.yaml --driving_video first-order-motion-model/04.mp4 --source_image first-order-motion-model/05.png --checkpoint first-order-motion-model/vox-cpk.pth.tar --relative --adapt_scale
demo.py:25: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(f)
Traceback (most recent call last):
  File "demo.py", line 123, in <module>
    generator, kp_detector = load_checkpoints(config_path=opt.config, checkpoint_path=opt.checkpoint)
  File "demo.py", line 29, in load_checkpoints
    generator.cuda()
  File "C:\Users\USER\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 260, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "C:\Users\USER\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 187, in _apply
    module._apply(fn)
  File "C:\Users\USER\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 187, in _apply
    module._apply(fn)
  File "C:\Users\USER\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 187, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "C:\Users\USER\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 193, in _apply
    param.data = fn(param.data)
  File "C:\Users\USER\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 260, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "C:\Users\USER\anaconda3\lib\site-packages\torch\cuda\__init__.py", line 161, in _lazy_init
    _check_driver()
  File "C:\Users\USER\anaconda3\lib\site-packages\torch\cuda\__init__.py", line 75, in _check_driver
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

Hardware GPU requirements

Hi, I followed the steps to run it locally.

I'm getting the following error

/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/.local/lib/python3.6/site-packages/imageio/core/format.py:403: UserWarning: Could not read last frame of /home/Projects/IA/first-order-model/crop.mp4.
  warn('Could not read last frame of %s.' % uri)
/home/.local/lib/python3.6/site-packages/skimage/transform/_warps.py:105: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
  warn("The default mode, 'constant', will be changed to 'reflect' in "
/home/.local/lib/python3.6/site-packages/skimage/transform/_warps.py:110: UserWarning: Anti-aliasing will be enabled by default in skimage 0.15 to avoid aliasing artifacts when down-sampling images.
  warn("Anti-aliasing will be enabled by default in skimage 0.15 to "
Traceback (most recent call last):
  File "demo.py", line 123, in <module>
    generator, kp_detector = load_checkpoints(config_path=opt.config, checkpoint_path=opt.checkpoint)
  File "demo.py", line 35, in load_checkpoints
    checkpoint = torch.load(checkpoint_path)
  File "/home.local/lib/python3.6/site-packages/torch/serialization.py", line 367, in load
    return _load(f, map_location, pickle_module)
  File "/home/.local/lib/python3.6/site-packages/torch/serialization.py", line 538, in _load
    result = unpickler.load()
  File "/home/.local/lib/python3.6/site-packages/torch/serialization.py", line 504, in persistent_load
    data_type(size), location)
  File "/home/.local/lib/python3.6/site-packages/torch/serialization.py", line 113, in default_restore_location
    result = fn(storage, location)
  File "/home/.local/lib/python3.6/site-packages/torch/serialization.py", line 95, in _cuda_deserialize
    return obj.cuda(device)
  File "/home/.local/lib/python3.6/site-packages/torch/_utils.py", line 76, in _cuda
    return new_type(self.size()).copy_(self, non_blocking)
RuntimeError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 1.96 GiB total capacity; 794.95 MiB already allocated; 17.06 MiB free; 1.05 MiB cached)
(myenv-first-order-model)

when executing the following:

python demo.py --config config/vox-adv-256.yaml --driving_video crop.mp4 --source_image sabri.JPG --checkpoint "/media/Projects/IA/first-order-model-checkpoints/vox-adv-cpk.pth.tar" --relative --adapt_scale

What is the license for this repository?

There's no license information for this repository. When no license is specified, it's implied that all rights are reserved, and the code is not open source or free: no one can modify, redistribute or contribute to it without explicit permission from the copyright holder. Lately in the deep learning community, innovative projects have been more and more restrictive with their licenses. I believe this has slowed down innovation and implementation of these solutions considerably, as no one is willing to risk investing so much time into something that in the end cannot be freely used or is outrageously expensive. I'd really appreciate it if you'd consider a license without any crippling restrictions, as I'm sure there are lots of people itching to make something awesome with it. (Remember when GPT-2's best model was kept private? When they finally released it, AI Dungeon 2 was born. That was mind-blowing, as no one had ever thought of using it that way.)
