
Comments (57)

adeptflax avatar adeptflax commented on May 25, 2024 22

I'm going to train a 512x512 face model and release it to the public under the public domain.

adeptflax avatar adeptflax commented on May 25, 2024 20

Here it is: https://github.com/adeptflax/motion-models with any additional info you might want to know. I uploaded the model to mediafire. Hopefully that doesn't cause any issues.

william-nz avatar william-nz commented on May 25, 2024 3

@adeptflax First off, thanks for doing this :)

I'm having an issue:
_pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified.
from here
File "demo.py", line 42, in load_checkpoints checkpoint = torch.load(checkpoint_path)

I think it has something to do with the file format of the checkpoint. Any ideas?

AliaksandrSiarohin avatar AliaksandrSiarohin commented on May 25, 2024 2

Yes, you should use scale_factor = 0.0625. In other words, kp_detector and dense_motion should always operate on the same 64x64 resolution.
This sigma is a parameter of the anti-aliasing filter used for downsampling; in principle any value could be used, and I selected the one used by default in scikit-image, so sigma=1.5 is the default for 256x256. But I don't think it affects the results that much. So you can leave it equal to 1.5, or you can avoid loading this dense_motion_network.down.weight parameter by removing it from the state_dict.
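A minimal sketch of that second option (assuming the repo's load_checkpoints() flow; checkpoint_path and the already-constructed generator are placeholders here):

import torch

checkpoint = torch.load(checkpoint_path, map_location='cpu')
generator_state = checkpoint['generator']

# The anti-aliasing kernel is a fixed Gaussian built from sigma inside
# AntiAliasInterpolation2d, not a learned weight, so the resized model can
# safely keep its own kernel instead of the checkpoint's 13x13 one.
generator_state.pop('dense_motion_network.down.weight', None)

# strict=False tolerates the key removed above.
generator.load_state_dict(generator_state, strict=False)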

LopsidedJoaw avatar LopsidedJoaw commented on May 25, 2024 2

@AliaksandrSiarohin @5agado I have run some tests using the method detailed in point 2.

Generally the result looks like this:

ezgif-1-3f05db10770d

It would be good to get your thoughts on whether this is an issue of using a checkpoint trained on 256 x 256 images, or if I am doing something wrong...

Many thanks for your excellent work.

dreammonkey avatar dreammonkey commented on May 25, 2024 2

Hi all,

I was wondering if anyone has succeeded in retraining the network to support 512x512 (or higher) images?
Before attempting this myself, I thought it might be a good idea to check whether anyone has succeeded in retraining, and if so, whether that person would be kind enough to share the checkpoints/configuration with the community? 🙏

Kind regards

bigboss97 avatar bigboss97 commented on May 25, 2024 2

@adeptflax Thank you so much for your hard work. I managed to run your 512 version. Just for comparison, here are my old 256 footage and the new 512 version:

result256.mp4
result512.mp4

AliaksandrSiarohin avatar AliaksandrSiarohin commented on May 25, 2024 1
  1. The only reliable method is to retrain on high-resolution videos.
  2. You can also try to use an off-the-shelf video super-resolution method.
  3. Since all the networks are fully convolutional, you can actually try to use the pretrained checkpoints trained on 256 images. In order to do this, change the size in
    source_image = resize(source_image, (256, 256))[..., :3]
    to the size that you want. It may also be beneficial to change the scale_factor parameter in the config, in both places that currently read
    scale_factor: 0.25
    (kp_detector_params and dense_motion_params). For example, if you want 512-resolution images, change it to 0.125, so that the input resolution for these networks is always 64 (see the sketch below).

If you have any luck with these, please share your findings.
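A minimal sketch of what point 3 amounts to for a 512x512 run (assuming demo.py's variable names; source.png and driving.mp4 are placeholder paths):

import imageio
from skimage.transform import resize

target_size = 512                  # desired resolution
scale_factor = 64 / target_size    # 0.125 for 512, 0.0625 for 1024 -- the value
                                   # to set in the config so kp_detector and
                                   # dense_motion still see 64x64 inputs

source_image = imageio.imread('source.png')
driving_video = imageio.mimread('driving.mp4', memtest=False)

source_image = resize(source_image, (target_size, target_size))[..., :3]
driving_video = [resize(frame, (target_size, target_size))[..., :3]
                 for frame in driving_video]

# In the config, put that scale_factor value in both places that currently read
# scale_factor: 0.25 (kp_detector_params and dense_motion_params).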

eps696 avatar eps696 commented on May 25, 2024 1

@pidginred I've used it for rather artistic purposes (applying it to face-like imagery), so I cannot confirm 100%. It definitely behaved very similarly at 1024 and 256 resolutions, though.
Speaking of animation quality, quite a lot has been said here about the necessity of similarity in poses (or face expressions) between the source image and the starting video frame. I think you may want to check that first.

adeptflax avatar adeptflax commented on May 25, 2024 1

I got it trained. I will be uploading it shortly.

5agado avatar 5agado commented on May 25, 2024

@AliaksandrSiarohin thanks for the feedback.

Notice, however, that point 3 doesn't work out of the box. If I change the scale factors as you mention, I get an error about incompatible shapes.

Also, as I'm planning to try out some super-resolution methods for this, I'm curious about what you mean with "shell video super-resolution method"?

AliaksandrSiarohin avatar AliaksandrSiarohin commented on May 25, 2024

Can you post the error message you got?
I mean some video super-resolution method, like one from https://paperswithcode.com/task/video-super-resolution

5agado avatar 5agado commented on May 25, 2024

@AliaksandrSiarohin

Error(s) in loading state_dict for OcclusionAwareGenerator:
	size mismatch for dense_motion_network.down.weight: copying a param with shape torch.Size([3, 1, 13, 13]) from checkpoint, the shape in current model is torch.Size([3, 1, 29, 29]).

AliaksandrSiarohin avatar AliaksandrSiarohin commented on May 25, 2024

Ah yes, you are right.
Can you try, in

sigma = (1 / scale - 1) / 2

hard-setting sigma = 1.5?
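A minimal sketch of why the shapes clash (the kernel size is derived from sigma inside AntiAliasInterpolation2d; antialias_kernel_size below is only an illustration, not a function from the repo):

def antialias_kernel_size(scale):
    sigma = (1 / scale - 1) / 2
    return 2 * round(sigma * 4) + 1

print(antialias_kernel_size(0.25))   # 13 -> the [3, 1, 13, 13] weights in the 256 checkpoint
print(antialias_kernel_size(0.125))  # 29 -> the [3, 1, 29, 29] shape the 512 config builds
# Hard-setting sigma = 1.5 keeps the 13x13 kernel, so the pretrained weights load.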

5agado avatar 5agado commented on May 25, 2024

Cool, that worked! Could it be generalized to other resolutions?
I'll do some tests and comparisons using super-resolution.

AliaksandrSiarohin avatar AliaksandrSiarohin commented on May 25, 2024

What do you mean? Generalized?

5agado avatar 5agado commented on May 25, 2024

Is the scale factor proportional to image size? For example, if I wanted to try 1024x1024, should I use scale_factor = 0.0625?

Also, is the fixed sigma (1.5) valid only for size 512? What about size 1024?

I was interested in generalizing my setup so that these values can be derived automatically from the given image size.

5agado avatar 5agado commented on May 25, 2024

Thanks so much for the support, really valuable info here!

CarolinGao avatar CarolinGao commented on May 25, 2024

Hi, have you retrained on high-resolution videos? If I don't retrain on a new dataset and instead just do as point 3 mentions, can I get a good result?

AliaksandrSiarohin avatar AliaksandrSiarohin commented on May 25, 2024

See https://github.com/tg-bomze/Face-Image-Motion-Model for point 2

pidginred avatar pidginred commented on May 25, 2024

@AliaksandrSiarohin

sigma=1.5 does not work for 1024x1024 source images (with scale factor of 0.0625). I get the following error:

  File "C:\Users\admin\git\first-order-model\modules\util.py", line 180, in forward
    out = torch.cat([out, skip], dim=1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 1 and 2 in dimension 2 at c:\a\w\1\s\tmp_conda_3.6_061433\conda\conda-bld\pytorch_1544163532679\work\aten\src\thc\generic/THCTensorMath.cu:83

But I can confirm that hard coding sigma=1.5 works only for 512x512 images (with scale factor of 0.125).

Can you please let us know the correct setting for 1024x1024 images? Thank you for your wonderful work.

AliaksandrSiarohin avatar AliaksandrSiarohin commented on May 25, 2024

@pidginred Can you provide the full stack trace and your configs?

pidginred avatar pidginred commented on May 25, 2024

@AliaksandrSiarohin Certainly! Here are the changes I made (for 1024x1024 / 0.0625) & the full error stack:

Diffs

diff --git a/config/vox-256.yaml b/config/vox-256.yaml
index abfe9a2..10fce42 100644
--- a/config/vox-256.yaml
+++ b/config/vox-256.yaml
@@ -23,7 +23,7 @@ model_params:
      temperature: 0.1
      block_expansion: 32
      max_features: 1024
-     scale_factor: 0.25
+     scale_factor: 0.0625
      num_blocks: 5
   generator_params:
     block_expansion: 64
@@ -35,7 +35,7 @@ model_params:
       block_expansion: 64
       max_features: 1024
       num_blocks: 5
-      scale_factor: 0.25
+      scale_factor: 0.0625
   discriminator_params:
     scales: [1]
     block_expansion: 32
diff --git a/demo.py b/demo.py
index 848b3df..28bea70 100644
--- a/demo.py
+++ b/demo.py
@@ -134,7 +134,7 @@ if __name__ == "__main__":
     reader.close()
     driving_video = imageio.mimread(opt.driving_video, memtest=False)
 
-    source_image = resize(source_image, (256, 256))[..., :3]
+    source_image = resize(source_image, (1024, 1024))[..., :3]
     driving_video = [resize(frame, (256, 256))[..., :3] for frame in driving_video]
     generator, kp_detector = load_checkpoints(config_path=opt.config, checkpoint_path=opt.checkpoint, cpu=opt.cpu)
 
diff --git a/modules/util.py b/modules/util.py
index 8ec1d25..cb8b149 100644
--- a/modules/util.py
+++ b/modules/util.py
@@ -202,7 +202,7 @@ class AntiAliasInterpolation2d(nn.Module):
     """
     def __init__(self, channels, scale):
         super(AntiAliasInterpolation2d, self).__init__()
-        sigma = (1 / scale - 1) / 2
+        sigma = 1.5 # Hard coded as per issues/20#issuecomment-600784060
         kernel_size = 2 * round(sigma * 4) + 1
         self.ka = kernel_size // 2
         self.kb = self.ka - 1 if kernel_size % 2 == 0 else self.ka

Full Errors

(base) C:\Users\admin\git\first-order-model-1024>python demo.py  --config config/vox-256.yaml --driving_video driving.mp4 --source_image source.jpg --checkpoint "C:\Users\admin\Downloads\vox-cpk.pth.tar" --relative --adapt_scale
demo.py:27: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
Traceback (most recent call last):
  File "demo.py", line 150, in <module>
    predictions = make_animation(source_image, driving_video, generator, kp_detector, relative=opt.relative, adapt_movement_scale=opt.adapt_scale, cpu=opt.cpu)
  File "demo.py", line 65, in make_animation
    kp_driving_initial = kp_detector(driving[:, :, 0])
  File "C:\Users\admin\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\admin\Anaconda3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "C:\Users\admin\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\admin\git\first-order-model-1024\modules\keypoint_detector.py", line 53, in forward
    feature_map = self.predictor(x)
  File "C:\Users\admin\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\admin\git\first-order-model-1024\modules\util.py", line 196, in forward
    return self.decoder(self.encoder(x))
  File "C:\Users\admin\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\admin\git\first-order-model-1024\modules\util.py", line 180, in forward
    out = torch.cat([out, skip], dim=1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 1 and 2 in dimension 2 at c:\a\w\1\s\tmp_conda_3.6_061433\conda\conda-bld\pytorch_1544163532679\work\aten\src\thc\generic/THCTensorMath.cu:83

eps696 avatar eps696 commented on May 25, 2024

@pidginred The fixed sigma worked on my side for any resolution, including 1024x1024. It's not the cause of your problem.

pidginred avatar pidginred commented on May 25, 2024

@eps696 What was your scale factor for 1024x1024? And did you get a proper output?

eps696 avatar eps696 commented on May 25, 2024

@pidginred Same as yours, 0.0625.
But I also resize driving_video, not only source_image (which I see you don't).

pidginred avatar pidginred commented on May 25, 2024

@eps696 Confirmed that worked. However, I lost almost all eye & mouth tracking (compared to 256x256), and it results in lots of weird artifacts and very poor-quality output.

Are you getting good quality results (in terms of animation) using 1024x1024 compared to 256x256?

zpeiguo avatar zpeiguo commented on May 25, 2024

@AliaksandrSiarohin @5agado I have run some tests using the method detailed in point 2.

Generally the result looks like this:

ezgif-1-3f05db10770d

It would be good to get your thoughts on whether this is an issue of using a checkpoint trained on 256 x 256 images, or if I am doing something wrong...

Many thanks for your excellent work.

I had the same problem

zpeiguo avatar zpeiguo commented on May 25, 2024

@eps696
Can you share the revised file? After I followed the above steps, the facial movements were normal, but the mouth could not open.

eps696 avatar eps696 commented on May 25, 2024

@zpeiguo That project is not released yet, sorry.
And this topic is about high-res images; check other issues for 'normality' of movements.

shillerz avatar shillerz commented on May 25, 2024

@eps696
Can you share the revised file? After I followed the above steps, the facial movements were normal, but the mouth could not open.

Same here. Mouth won't open. I believe the best option is to retrain everything at 512 resolution.

boraturant avatar boraturant commented on May 25, 2024

@eps696 Confirmed that worked. However, I lost almost complete eye & mouth tracking (compared to 256x256), and it results in lots of weird artifacts and very poor quality output.

Are you getting good quality results (in terms of animation) using 1024x1024 compared to 256x256?

I have also tested the third method with 512; the animation quality is lower than at 256. I have no judgement as to why; I expected the quality to be the same with the same 64x64 keypoint resolution.

BloodBlackNothingness avatar BloodBlackNothingness commented on May 25, 2024

I got method 3 working on Windows 10 following the steps above and successfully output a 512 version. However, the results are of much lower quality animation wise. Hoping we can get a 512 or higher checkpoint trained soon.

bigboss97 avatar bigboss97 commented on May 25, 2024

I got method 3 working on Windows 10 following the steps above and successfully output a 512 version. However, the results are of much lower quality animation wise. Hoping we can get a 512 or higher checkpoint trained soon.

I also followed method 3 and the animation is not acceptable :-( Mouth does not open at all and the face is distorted all the time.
Maybe have to use AI to upscale 256 to 512 video :-)

BloodBlackNothingness avatar BloodBlackNothingness commented on May 25, 2024

I got method 3 working on Windows 10 following the steps above and successfully output a 512 version. However, the results are of much lower quality animation wise. Hoping we can get a 512 or higher checkpoint trained soon.

I also followed method 3 and the animation is not acceptable :-( Mouth does not open at all and the face is distorted all the time.
Maybe have to use AI to upscale 256 to 512 video :-)

Yes in theory. It depends on the video output quality I suppose. I have tried with Topaz Labs software and it also enhances distortions.

lschaupp avatar lschaupp commented on May 25, 2024

@AliaksandrSiarohin @5agado I have run some tests using the method detailed in point 2.

Generally the result looks like this:

ezgif-1-3f05db10770d

It would be good to get your thoughts on whether this is an issue of using a checkpoint trained on 256 x 256 images, or if I am doing something wrong...

Many thanks for your excellent work.

Which super resolution network did you end up using? :)

SophistLu avatar SophistLu commented on May 25, 2024

I got method 3 working on Windows 10 following the steps above and successfully output a 512 version. However, the results are of much lower quality animation wise. Hoping we can get a 512 or higher checkpoint trained soon.

In demo.py, I tried also resizing "driving_video", and it works:
driving_video = [resize(frame, (512, 512))[..., :3] for frame in driving_video]

bigboss97 avatar bigboss97 commented on May 25, 2024

In demo.py, I tried also resizing "driving_video", and it works:
driving_video = [resize(frame, (512, 512))[..., :3] for frame in driving_video]

Yes, it ran. But my result (animation) was terrible.

konstiantyn avatar konstiantyn commented on May 25, 2024

Untitled
How can I change the blending mask size?

TracelessLe avatar TracelessLe commented on May 25, 2024

@AliaksandrSiarohin @5agado I have run some tests using the method detailed in point 2.

Generally the result looks like this:

ezgif-1-3f05db10770d

It would be good to get your thoughts on whether this is an issue of using a checkpoint trained on 256 x 256 images, or if I am doing something wrong...

Many thanks for your excellent work.

Hi @LopsidedJoaw, which super-resolution method did you use to get the 320x320 result from the 256x256 input, as your gif shows?

LopsidedJoaw avatar LopsidedJoaw commented on May 25, 2024

I used the same method described in the first 10 or so entries on this post.

TracelessLe avatar TracelessLe commented on May 25, 2024

I used the same method described in the first 10 or so entries on this post.


Got that, thank you. :)

bigboss97 avatar bigboss97 commented on May 25, 2024

Can't wait.
Please also share the process. I think many people are interested.
Thanks.

adeptflax avatar adeptflax commented on May 25, 2024

It's going to take 5 days to train on an RTX 3090. I'm also going to train a 512x512 motion-cosegmentation model and release it to the public domain as well.

LopsidedJoaw avatar LopsidedJoaw commented on May 25, 2024

adeptflax avatar adeptflax commented on May 25, 2024

I need these models for a project I'm working on, so I might as well release them to the public.

 avatar commented on May 25, 2024

When trying to run the 512 model with this command: python demo.py --config config/vox-512.yaml --driving_video videos/2.mp4 --source_image images/4.jpg --checkpoint checkpoints/first-order-model-checkpoint-94.pth.tar --relative --adapt_scale --cpu I get the following error:

/home/USER/miniconda3/envs/first/lib/python3.7/site-packages/imageio/core/format.py:403: UserWarning: Could not read last frame of /home/USER/General/Creating animated characters/First order motion model/first-order-model/videos/2.mp4.
  warn('Could not read last frame of %s.' % uri)
/home/USER/miniconda3/envs/first/lib/python3.7/site-packages/skimage/transform/_warps.py:105: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
  warn("The default mode, 'constant', will be changed to 'reflect' in "
/home/USER/miniconda3/envs/first/lib/python3.7/site-packages/skimage/transform/_warps.py:110: UserWarning: Anti-aliasing will be enabled by default in skimage 0.15 to avoid aliasing artifacts when down-sampling images.
  warn("Anti-aliasing will be enabled by default in skimage 0.15 to "
demo.py:27: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
Traceback (most recent call last):
  File "demo.py", line 144, in <module>
    generator, kp_detector = load_checkpoints(config_path=opt.config, checkpoint_path=opt.checkpoint, cpu=opt.cpu)
  File "demo.py", line 44, in load_checkpoints
    generator.load_state_dict(checkpoint['generator'])
  File "/home/USER/miniconda3/envs/first/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for OcclusionAwareGenerator:
	size mismatch for dense_motion_network.down.weight: copying a param with shape torch.Size([3, 1, 13, 13]) from checkpoint, the shape in current model is torch.Size([3, 1, 29, 29]).

It runs fine with the 256 model.
Has anyone run into the same problem or does anyone know how it could be fixed?

Update: I've fixed the problem. I had to change sigma to 1.5 as described here:
https://github.com/adeptflax/motion-models
#20 (comment) (it also describes how to change 256 to 512 in the demo.py file)

Steps to fix:

  1. In demo.py, change everything from 256 to 512 around this line: source_image = resize(source_image, (256, 256))[..., :3]
  2. Change sigma to 1.5 in modules/util.py: sigma = (1 / scale - 1) / 2 becomes sigma = 1.5
  3. Use videos of 512x512 resolution

shyamjithMC avatar shyamjithMC commented on May 25, 2024

Update: I've fixed the problem. I had to change sigma to 1.5 as described here:
https://github.com/adeptflax/motion-models
#20 (comment) (it also describes how to change 256 to 512 in the demo.py file)

When trying to run the 512 model with this command: python demo.py --config config/vox-512.yaml --driving_video videos/2.mp4 --source_image images/4.jpg --checkpoint checkpoints/first-order-model-checkpoint-94.pth.tar --relative --adapt_scale --cpu I get the following error:

/home/USER/miniconda3/envs/first/lib/python3.7/site-packages/imageio/core/format.py:403: UserWarning: Could not read last frame of /home/USER/General/Creating animated characters/First order motion model/first-order-model/videos/2.mp4.
  warn('Could not read last frame of %s.' % uri)
/home/USER/miniconda3/envs/first/lib/python3.7/site-packages/skimage/transform/_warps.py:105: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
  warn("The default mode, 'constant', will be changed to 'reflect' in "
/home/USER/miniconda3/envs/first/lib/python3.7/site-packages/skimage/transform/_warps.py:110: UserWarning: Anti-aliasing will be enabled by default in skimage 0.15 to avoid aliasing artifacts when down-sampling images.
  warn("Anti-aliasing will be enabled by default in skimage 0.15 to "
demo.py:27: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
Traceback (most recent call last):
  File "demo.py", line 144, in <module>
    generator, kp_detector = load_checkpoints(config_path=opt.config, checkpoint_path=opt.checkpoint, cpu=opt.cpu)
  File "demo.py", line 44, in load_checkpoints
    generator.load_state_dict(checkpoint['generator'])
  File "/home/USER/miniconda3/envs/first/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for OcclusionAwareGenerator:
	size mismatch for dense_motion_network.down.weight: copying a param with shape torch.Size([3, 1, 13, 13]) from checkpoint, the shape in current model is torch.Size([3, 1, 29, 29]).

It runs fine with the 256 model.
Has anyone run into the same problem or does anyone know how it could be fixed?

I have the same issue

Animan8000 avatar Animan8000 commented on May 25, 2024

@adeptflax First off, thanks for doing this :)

I'm having an issue:
_pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified.
from here
File "demo.py", line 42, in load_checkpoints checkpoint = torch.load(checkpoint_path)

I think it has something to do with the file format of the checkpoint. Any ideas?

Same error here "_pickle.UnpicklingError: A load persistent id instruction was encountered,
but no persistent_load function was specified."

william-nz avatar william-nz commented on May 25, 2024

@bigboss97 did you do anything to the 512 checkpoint from @adeptflax to get it to work?

bigboss97 avatar bigboss97 commented on May 25, 2024

@william-nz No, the only thing I've done... I downloaded the file and ran it with the above (512) modifications. I saw that a 512x512 video was generated. That's all. I'm not very convinced by my result, so I posted it here in the hope that people can judge it for themselves. Probably I have still done something wrong 😆
When I get the time I'll do more experiments with it.

zhaoruiqiff avatar zhaoruiqiff commented on May 25, 2024

@AliaksandrSiarohin @5agado I have run some tests using the method detailed in point 2.

Generally the result looks like this:

ezgif-1-3f05db10770d

It would be good to get your thoughts on whether this is an issue of using a checkpoint trained on 256 x 256 images, or if I am doing something wrong...

Many thanks for your excellent work.

Hi @LopsidedJoaw, the video you showed looks very high-resolution. Which super-resolution method did you use to get the result? Thanks!

mdv3101 avatar mdv3101 commented on May 25, 2024

Hi @AliaksandrSiarohin,
I am training a model on a 512x512 dataset from scratch. After 15 epochs, the loss is decreasing, but the keypoints seem to be confined to just a small region.

image

Any idea why this is happening? I haven't made any changes to the code; I have only modified the config file.

dataset_params:
  root_dir: 512_dataset/
  frame_shape: [512, 512, 3]
  id_sampling: False
  pairs_list: data/vox256.csv
  augmentation_params:
    flip_param:
      horizontal_flip: True
      time_flip: True
    jitter_param:
      brightness: 0.1
      contrast: 0.1
      saturation: 0.1
      hue: 0.1


model_params:
  common_params:
    num_kp: 10
    num_channels: 3
    estimate_jacobian: True
  kp_detector_params:
     temperature: 0.1
     block_expansion: 32
     max_features: 1024
     scale_factor: 0.25
     num_blocks: 5
  generator_params:
    block_expansion: 64
    max_features: 512
    num_down_blocks: 2
    num_bottleneck_blocks: 6
    estimate_occlusion_map: True
    dense_motion_params:
      block_expansion: 64
      max_features: 1024
      num_blocks: 5
      scale_factor: 0.25
  discriminator_params:
    scales: [1]
    block_expansion: 32
    max_features: 512
    num_blocks: 4
    sn: True

train_params:
  num_epochs: 100
  num_repeats: 75
  epoch_milestones: [60,90]
  lr_generator: 2.0e-4
  lr_discriminator: 2.0e-4
  lr_kp_detector: 2.0e-4
  batch_size: 4
  scales: [1, 0.5, 0.25, 0.125]
  checkpoint_freq: 5
  transform_params:
    sigma_affine: 0.05
    sigma_tps: 0.005
    points_tps: 5
  loss_weights:
    generator_gan: 0
    discriminator_gan: 1
    feature_matching: [10, 10, 10, 10]
    perceptual: [10, 10, 10, 10, 10]
    equivariance_value: 10
    equivariance_jacobian: 10

reconstruction_params:
  num_videos: 1000
  format: '.mp4'

animate_params:
  num_pairs: 50
  format: '.mp4'
  normalization_params:
    adapt_movement_scale: True
    use_relative_movement: True
    use_relative_jacobian: True

visualizer_params:
  kp_size: 5
  draw_border: True
  colormap: 'gist_rainbow'

Here is the loss till 15 epochs:

00000000) perceptual - 121.42917; equivariance_value - 0.71458; equivariance_jacobian - 0.75562
00000001) perceptual - 109.27000; equivariance_value - 0.35340; equivariance_jacobian - 0.65690
00000002) perceptual - 100.28600; equivariance_value - 0.16266; equivariance_jacobian - 0.56337
00000003) perceptual - 96.12051; equivariance_value - 0.14541; equivariance_jacobian - 0.51318
00000004) perceptual - 93.17576; equivariance_value - 0.14200; equivariance_jacobian - 0.48087
00000005) perceptual - 90.71331; equivariance_value - 0.15415; equivariance_jacobian - 0.47770
00000006) perceptual - 88.90341; equivariance_value - 0.22227; equivariance_jacobian - 0.49095
00000007) perceptual - 86.39249; equivariance_value - 0.21560; equivariance_jacobian - 0.47799
00000008) perceptual - 84.61519; equivariance_value - 0.20801; equivariance_jacobian - 0.46283
00000009) perceptual - 84.08470; equivariance_value - 0.21185; equivariance_jacobian - 0.46702
00000010) perceptual - 82.73890; equivariance_value - 0.20613; equivariance_jacobian - 0.45508
00000011) perceptual - 81.45905; equivariance_value - 0.19839; equivariance_jacobian - 0.44276
00000012) perceptual - 81.00780; equivariance_value - 0.20207; equivariance_jacobian - 0.44244
00000013) perceptual - 80.08536; equivariance_value - 0.19849; equivariance_jacobian - 0.43349
00000014) perceptual - 79.34811; equivariance_value - 0.19838; equivariance_jacobian - 0.42291
00000015) perceptual - 78.98586; equivariance_value - 0.19916; equivariance_jacobian - 0.41774
00000016) perceptual - 78.48245; equivariance_value - 0.19998; equivariance_jacobian - 0.41450

sadluck avatar sadluck commented on May 25, 2024

@Animan8000 @william-nz Did you guys ever manage to get over the "_pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified." error? I'm getting the same thing.

Animan8000 avatar Animan8000 commented on May 25, 2024

@Animan8000 @william-nz Did you guys ever manage to get over the "_pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified." error? I'm getting the same thing.

Nope

Inferencer avatar Inferencer commented on May 25, 2024

@Animan8000 @william-nz Did you guys ever manage to get over the "_pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified." error? I'm getting the same thing.

Another user seems to have fixed it for themselves; I haven't tried it myself yet, so I haven't run into the error:
adeptflax/motion-models#2 (comment)

ulucsahin avatar ulucsahin commented on May 25, 2024

With the 512x512 model shared above, the code runs smoothly with the suggested changes. First of all, thank you for sharing it. However, the results I get are no different from upscaled 256x256 results. The animation is not bad, but the output video is blurry, as if I had upscaled a 256x256 output to 512x512. Is this expected?
