12dmodel / deep_motion_mag

Tensorflow implementation of Learning-based Video Motion Magnification
License: MIT License
I'm wondering if there is any description of the unzipped dataset.
I downloaded the tfrecords files from the list, merged them into one file, and unzipped it. However, there were some warnings during the unzipping process, and there is only one file, train.tfrecords. Is this expected?
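A quick way to sanity-check the merged file (a minimal sketch, assuming a TF1-era install like the one the repo targets) is to count the records it contains; a cleanly parsed file should iterate to the end without errors:

```python
# Minimal sanity check for the merged file: iterate all records and count them.
# Uses the TF1-era API since the repo targets TensorFlow 1.x.
import tensorflow as tf

count = sum(1 for _ in tf.python_io.tf_record_iterator("train.tfrecords"))
print("parsed records:", count)
```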
Hi.
I want to understand the data preparation better. Can you share the code for the data preparation? How exactly did you do the color perturbation?
I have run into a really weird issue during testing: a dimension mismatch at
enc = tf.concat([texture_enc, shape_enc], axis=3)
and the error message is
InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [1,84,68,32] vs. shape[1] = [1,83,68,32]
[[Node: ynet_3frames/decoder/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](ynet_3frames/decoder/Relu, ynet_3frames/manipulator/add_1, ynet_3frames/decoder/concat/axis)]]
I have faced an error (below) while running run_on_test_videos.sh on my sequence of images.
Wondering what it may be; the only clue I got was tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [1,322,364,32] vs. shape[1] = [1,321,364,32]. It seems the shapes do not match somewhere, but I do not know why, or what I should change in the code.
However, looking at the images I put in data/vids/'videoname', their size is 727 x 642. Then I realized that the error shows numbers close to half of these image dimensions.
The problem does not seem to be with 727 (since ceil(727/2) = 364 matches in both shape[0] and shape[1] in the error message), so the problem is probably with the dimension 642, as 322 does not match 321 (642/2 + 1 and 642/2, respectively).
So, my question is: is there any known restriction on the image sizes? Must they be odd? Which image sizes avoid the rounding errors that break the program?
Knowing this would help me crop the images to a size that works.
Thank you in advance.
The full error:
Traceback (most recent call last):
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [1,322,364,32] vs. shape[1] = [1,321,364,32]
[[{{node ynet_3frames/decoder/concat}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 102, in <module>
main(arguments)
File "main.py", line 83, in main
args.velocity_mag)
File "/home/jonathan/Dropbox/UnB/motionMag/deep_motion_mag-master/magnet.py", line 279, in run
out_amp = self.inference(prev_frame, frame, amplification_factor)
File "/home/jonathan/Dropbox/UnB/motionMag/deep_motion_mag-master/magnet.py", line 235, in inference
[amplification_factor]})
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [1,322,364,32] vs. shape[1] = [1,321,364,32]
[[node ynet_3frames/decoder/concat (defined at /home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
Original stack trace for 'ynet_3frames/decoder/concat':
File "main.py", line 102, in <module>
main(arguments)
File "main.py", line 83, in main
args.velocity_mag)
File "/home/jonathan/Dropbox/UnB/motionMag/deep_motion_mag-master/magnet.py", line 265, in run
self.setup_for_inference(checkpoint_dir, image_width, image_height)
File "/home/jonathan/Dropbox/UnB/motionMag/deep_motion_mag-master/magnet.py", line 209, in setup_for_inference
self._build_feed_model()
File "/home/jonathan/Dropbox/UnB/motionMag/deep_motion_mag-master/magnet.py", line 196, in _build_feed_model
False)
File "/home/jonathan/Dropbox/UnB/motionMag/deep_motion_mag-master/magnet.py", line 145, in image_transformer
return self._decoder(self.texture_b, self.out_shape_enc)
File "/home/jonathan/Dropbox/UnB/motionMag/deep_motion_mag-master/magnet.py", line 119, in _decoder
enc = tf.concat([texture_enc, shape_enc], axis=3)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py", line 1420, in concat
return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_array_ops.py", line 1257, in concat_v2
"ConcatV2", values=values, axis=axis, name=name)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
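If the mismatch really does come from odd dimensions rounding differently when halved inside the network, as the numbers above suggest, one workaround is to crop every input frame so both sides are a multiple of 4 before running the script. This is only a sketch of that idea (the frame directory and extension are placeholders), not a confirmed fix; any explicit width/height arguments passed to the script would need to match the cropped size.

```python
# Crop each frame so width and height are multiples of 4, on the assumption
# that the ConcatOp mismatch is caused by odd sizes rounding differently
# through the encoder's downsampling and the decoder's upsampling.
import glob
import os
from PIL import Image

def crop_to_multiple(path, multiple=4):
    im = Image.open(path)
    w, h = im.size
    w2, h2 = w - (w % multiple), h - (h % multiple)
    if (w2, h2) != (w, h):
        im.crop((0, 0, w2, h2)).save(path)

for frame in sorted(glob.glob(os.path.join("data", "vids", "videoname", "*.png"))):
    crop_to_multiple(frame)
```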
{
"background_file":"/home/tiam/project_data/tf_deep_mag/raw_ingredients/coco_train2017/000000225345.jpg",
"third_frame_motion":1.0,
"sample_properties":[
{
"color_add":[
1.925357951045477,
9.502529585992523,
7.880988956721406
],
"fg_idx":1311,
"orientation":3.301299684358326,
"motion_dir":5.8994013477181255,
"shape_type":2,
"color_mult":[
1.0,
1.0,
1.0
],
"aspect_ratio":2.191457788030521,
"scaling":0.195800786759281,
"motion_amount":0.3861106423132381,
"center":[
190.46849859664286,
208.1545052627958
]
},
{
"color_add":[
-10.01393090981086,
-7.44447289150088,
-3.836791152177721
],
"fg_idx":4712,
"orientation":1.2146459169689545,
"motion_dir":2.8215923059585046,
"shape_type":2,
"color_mult":[
1.0,
1.0,
1.0
],
"aspect_ratio":1.575751231357717,
"scaling":0.7695978265063551,
"motion_amount":0.9252538154237787,
"center":[
138.44686122207858,
274.563682226355
]
},
{
"color_add":[
13.653071468187918,
14.028489528534774,
-8.215312919870719
],
"fg_idx":5718,
"orientation":2.054352406981038,
"motion_dir":0.10612433944136192,
"shape_type":3,
"color_mult":[
1.0,
1.0,
1.0
],
"aspect_ratio":1.759935147917405,
"scaling":0.799292509942318,
"motion_amount":1.008468522062427,
"center":[
299.6627111293551,
244.2818327143585
]
},
{
"color_add":[
11.15283742278356,
-2.560571143035471,
13.147733175999939
],
"fg_idx":3507,
"orientation":0.4777896338358551,
"motion_dir":3.184019233633535,
"shape_type":1,
"color_mult":[
1.0,
1.0,
1.0
],
"aspect_ratio":2.7324625189657796,
"scaling":0.10427110663734562,
"motion_amount":0.4029012709278865,
"center":[
374.5391294523404,
85.92793533987069
]
},
{
"color_add":[
-7.741979262278923,
10.189081151868574,
14.194209720121702
],
"fg_idx":1416,
"orientation":4.676984784008412,
"motion_dir":4.486443504656097,
"shape_type":2,
"color_mult":[
1.0,
1.0,
1.0
],
"aspect_ratio":0.11384226239960371,
"scaling":0.5548262804937897,
"motion_amount":0.0843749640576295,
"center":[
215.72180825160055,
9.213579577060514
]
},
{
"color_add":[
-2.3170936238533315,
-8.917824166917462,
3.6573973430047566
],
"fg_idx":4341,
"orientation":5.684192681269192,
"motion_dir":1.8791328006121593,
"shape_type":3,
"color_mult":[
1.0,
1.0,
1.0
],
"aspect_ratio":2.6267749876376825,
"scaling":0.6529448261831385,
"motion_amount":0.43769959396502617,
"center":[
41.803534791626944,
61.853745682737994
]
},
{
"color_add":[
-7.576833772172929,
8.619156100953902,
6.769010895167728
],
"fg_idx":1717,
"orientation":6.17109738993578,
"motion_dir":0.6974625676398899,
"shape_type":1,
"color_mult":[
1.0,
1.0,
1.0
],
"aspect_ratio":0.37686866068980446,
"scaling":0.6900220937610626,
"motion_amount":0.20012596632715263,
"center":[
125.69403191253784,
93.55391836687608
]
},
{
"color_add":[
-8.870124580096775,
-14.199621328432295,
-2.6015452878369505
],
"fg_idx":663,
"orientation":3.8206371992641466,
"motion_dir":3.524134968760174,
"shape_type":1,
"color_mult":[
1.0,
1.0,
1.0
],
"aspect_ratio":1.3452644797115627,
"scaling":0.45703039713457094,
"motion_amount":0.5315456823000649,
"center":[
337.18518966576335,
293.3950208811551
]
},
{
"color_add":[
-12.987699689796031,
5.82086276136825,
-5.900104564786595
],
"fg_idx":3452,
"orientation":0.5694792141183294,
"motion_dir":5.072288980793892,
"shape_type":2,
"color_mult":[
1.0,
1.0,
1.0
],
"aspect_ratio":0.8311780579834941,
"scaling":0.34532467780479253,
"motion_amount":0.9680977920488166,
"center":[
167.71642386436253,
142.27622959742757
]
}
],
"color_amplification_factor":1.0,
"amplification_factor":28.351030206327763,
"bg_properties":{
"motion_dir":5.543601538543549,
"color_mult":[
1.0,
1.0,
1.0
],
"color_add":[
-10.990955175457486,
-14.4096269084179,
-13.60394413364702
],
"motion_amount":0.030528508874763105,
"bg_blur_amount":4.732548688741219
}
}
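For reference, a minimal sketch for reading one of these meta files in Python (the filename is hypothetical; the field names come from the JSON above):

```python
# Load one meta file and print a few of the per-object parameters shown above.
import json

with open("meta/000000.json") as f:  # hypothetical filename
    meta = json.load(f)

print("amplification_factor:", meta["amplification_factor"])
for i, obj in enumerate(meta["sample_properties"]):
    print(i, "shape_type:", obj["shape_type"], "center:", obj["center"])
```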
I downloaded the whole dataset to figure out how you prepare the data for training, and I understand most of the parameters in the meta files. However, I am not clear about shape_type, color_mult, and third_frame_motion in the meta files. Thank you for your help!
Hello:
In the json file of the training dataset, the values of the parameters "motion_amount" and "center" are floating point. How do you paste foreground objects onto the background image when pixel positions are integers? The paper mentions that you reconstruct the image in the continuous domain with bicubic interpolation. Do you first enlarge the image using bicubic interpolation, then paste the foreground objects and downsample the synthesized image? Can you describe the reconstruction process?
When I run convert_3frames_data_to_tfrecords.py, what specific steps should I follow?
Thank you for your answer!
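One standard way to composite a foreground at a fractional position is to resample it with a bicubic warp before alpha-blending, which avoids upsampling the whole image. The sketch below is only a guess at the process described in the paper, not the authors' code:

```python
# Paste a foreground at a fractional (x, y) center by translating it with a
# bicubic warp, then alpha-blending onto the background. All inputs are
# float32 in [0, 1]; bg is HxWx3, fg is hxwx3, and alpha is an hxw mask.
import cv2
import numpy as np

def paste_subpixel(bg, fg, alpha, center):
    H, W = bg.shape[:2]
    h, w = fg.shape[:2]
    # translation that puts the foreground's center at the fractional position
    tx, ty = center[0] - w / 2.0, center[1] - h / 2.0
    M = np.float32([[1, 0, tx], [0, 1, ty]])
    fg_t = cv2.warpAffine(fg, M, (W, H), flags=cv2.INTER_CUBIC)
    a_t = cv2.warpAffine(alpha, M, (W, H), flags=cv2.INTER_CUBIC)[..., None]
    return a_t * fg_t + (1.0 - a_t) * bg
```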
Can you please briefly explain how to extract the heart rate from the baby clip example using the output of the decoder?
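Not the authors' answer, but one common way to estimate heart rate from the magnified output is to track the mean intensity of a region over time and take the dominant frequency; the output path, ROI, and frame rate below are placeholders:

```python
# Estimate heart rate from the magnified frames: average a region of interest
# per frame, then find the strongest frequency in a plausible heart-rate band.
import glob

import cv2
import numpy as np

fps = 30.0                                    # assumption: the clip's frame rate
frames = sorted(glob.glob("out/baby/*.png"))  # hypothetical output directory

signal = []
for path in frames:
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE).astype(np.float64)
    signal.append(img[100:200, 100:200].mean())  # example ROI on the chest

signal = np.asarray(signal)
signal -= signal.mean()
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
band = (freqs > 0.7) & (freqs < 3.0)          # 42-180 beats per minute
peak_hz = freqs[band][np.argmax(spectrum[band])]
print("estimated heart rate: %.1f bpm" % (peak_hz * 60.0))
```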
When running on custom data:
InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [1,360,248,32] vs. shape[1] = [1,360,247,32]
I have no idea where the numbers 360, 248 are coming from. Image width/height parameters are correct for my data. Any ideas?
I was working with this code on Google Colab but was facing various issues, so to spare other users the same trouble I am uploading my notebook with full instructions.
Hope it will help you.
You can find it at:
https://github.com/its-mayank/Video-Magnification/blob/master/Video_Magnification.ipynb
It is the original work of this paper's authors, and all rights are reserved by them. I have just implemented and shared it so that others do not have to face the same problems.
In modules.py, I found that res_manipulator returns enc_b + diff rather than enc_a + diff (line 38).
According to Eq. 3 in the paper, isn't the latter correct?
I might be understanding it wrong...
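For context, a sketch of Eq. 3 as I read it, with g and h standing in for the learned convolutional blocks (the names here are illustrative, not the repo's):

```python
# Eq. 3 as I read it: G_m(M_a, M_b, alpha) = M_a + h(alpha * g(M_b - M_a)),
# i.e. the amplified shape difference is added back to enc_a, not enc_b.
# g and h stand for the learned convolutional blocks; names are illustrative.
def manipulator(enc_a, enc_b, alpha, g, h):
    diff = h(alpha * g(enc_b - enc_a))  # amplified difference of shape encodings
    return enc_a + diff                 # base is M_a according to the equation
```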
Hello,
I am wondering if there is any source code showing how you synthesize the amplified ground truth for your training dataset?
If so, can you share it?
Looking forward to your reply.
Thanks.
I downloaded the image dataset and found that there are four image directories.
I guess that 'amplified', 'frameA', and 'frameB' are Y, Xa, and Xb in the paper.
I wonder whether 'frameC' is X'b or not.
And do you also have the loss term L1(V'b, V'Y) in the code?
I couldn't find the perturbed Y in the dataset.
Best,
Hi,
In the paper, regularization is used to drive the separation of the texture and shape representations.
I'm doing similar work, but I don't have frameC to do the regularization. If there is no regularization, will the separation be poor? And are there other alternatives to regularization?
Thank you~
Sorry, where are Y' and V'Y?
Hello:
I want to ask how the txt file used in the training process is generated, and what its function is. Also, the txt file contains more than 100,000 parameter entries, so they should not be in one-to-one correspondence? Looking forward to your reply!
Hi Tiam @12dmodel ,
I'm wondering if there is a way to remove/revert the perturbation for frameB and amplified?
Best,
Sheng
According to the repository's MIT license I could use and distribute it commercially, but that's not what the repository description says.
It would be important to adjust the license of the repository to avoid future licensing problems with people who either did not read it or want to take advantage of the incorrect licensing configuration on GitHub.
I was reading the research papers and am trying to amplify color. I can set all the parameters that are listed, but I don't know where to set the spatial frequency cutoff, represented as λc in the paper.
Paper - http://people.csail.mit.edu/mrub/papers/vidmag.pdf
Thank you
Could you please send me the raw videos?
For the throat video, the FPS is 1900. Do we have to use such a high-speed camera to get this effect?
Thank you!
Best regards!
sh run_temporal_on_test_videos.sh o3f_hmhm2_bg_qnoise_mix4_nl_n_t_ds3 my_own_video 20 1.12 1.6 30 2 differenceOfIIR
I am running the above command on 800 frames. It takes around 30 minutes to process. Is something wrong with my command?