12dmodel / deep_motion_mag

Tensorflow implementation of Learning-based Video Motion Magnification
License: MIT License
I'm wondering if there is any description of the unzipped dataset.
I downloaded the tfrecords files from the list, merged them into one file, and unzipped it. However, there were some warnings during the unzipping process, and there is only one file, train.tfrecords. Is this expected?
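A quick way to sanity-check the merged file (a minimal sketch, assuming a TF1-era install like the one the repo targets) is to count the records it contains; a cleanly parsed file should iterate to the end without errors:

```python
# Minimal sanity check for the merged file: iterate all records and count them.
# Uses the TF1-era API since the repo targets TensorFlow 1.x.
import tensorflow as tf

count = sum(1 for _ in tf.python_io.tf_record_iterator("train.tfrecords"))
print("parsed records:", count)
```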
Hi.
I want to understand the data preparation better. Can you share the code for the data preparation? How exactly did you do the color perturbation?
I have run into a really weird issue during testing: a dimension mismatch at
enc = tf.concat([texture_enc, shape_enc], axis=3)
and the error message is
InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [1,84,68,32] vs. shape[1] = [1,83,68,32]
[[Node: ynet_3frames/decoder/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](ynet_3frames/decoder/Relu, ynet_3frames/manipulator/add_1, ynet_3frames/decoder/concat/axis)]]
I have faced an error (below) while running run_on_test_videos.sh on my sequence of images.
Wondering what it may be; the only clue I got was tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [1,322,364,32] vs. shape[1] = [1,321,364,32]. It seems the shapes do not match somewhere, but I do not know why, or what I should change in the code.
However, looking at the images I put in data/vids/'videoname', their size is 727 x 642. Then I realized that the error shows numbers close to half of these image dimensions.
The problem does not seem to be with 727 (since ceil(727/2) = 364 matches in both shape[0] and shape[1] in the error message), so the problem is probably with the dimension 642, as 322 does not match 321 (642/2 + 1 and 642/2, respectively).
So, my question is: is there any known restriction on the image sizes? Must they be odd? Which image sizes avoid the rounding errors that break the program?
Knowing this would help me crop the images to a size that works.
Thank you in advance.
The full error:
Traceback (most recent call last):
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [1,322,364,32] vs. shape[1] = [1,321,364,32]
[[{{node ynet_3frames/decoder/concat}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 102, in <module>
main(arguments)
File "main.py", line 83, in main
args.velocity_mag)
File "/home/jonathan/Dropbox/UnB/motionMag/deep_motion_mag-master/magnet.py", line 279, in run
out_amp = self.inference(prev_frame, frame, amplification_factor)
File "/home/jonathan/Dropbox/UnB/motionMag/deep_motion_mag-master/magnet.py", line 235, in inference
[amplification_factor]})
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [1,322,364,32] vs. shape[1] = [1,321,364,32]
[[node ynet_3frames/decoder/concat (defined at /home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
Original stack trace for 'ynet_3frames/decoder/concat':
File "main.py", line 102, in <module>
main(arguments)
File "main.py", line 83, in main
args.velocity_mag)
File "/home/jonathan/Dropbox/UnB/motionMag/deep_motion_mag-master/magnet.py", line 265, in run
self.setup_for_inference(checkpoint_dir, image_width, image_height)
File "/home/jonathan/Dropbox/UnB/motionMag/deep_motion_mag-master/magnet.py", line 209, in setup_for_inference
self._build_feed_model()
File "/home/jonathan/Dropbox/UnB/motionMag/deep_motion_mag-master/magnet.py", line 196, in _build_feed_model
False)
File "/home/jonathan/Dropbox/UnB/motionMag/deep_motion_mag-master/magnet.py", line 145, in image_transformer
return self._decoder(self.texture_b, self.out_shape_enc)
File "/home/jonathan/Dropbox/UnB/motionMag/deep_motion_mag-master/magnet.py", line 119, in _decoder
enc = tf.concat([texture_enc, shape_enc], axis=3)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py", line 1420, in concat
return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_array_ops.py", line 1257, in concat_v2
"ConcatV2", values=values, axis=axis, name=name)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/home/jonathan/Dropbox/UnB/motionMag/momagenv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
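If the mismatch really does come from odd dimensions rounding differently when halved inside the network, as the numbers above suggest, one workaround is to crop every input frame so both sides are a multiple of 4 before running the script. This is only a sketch of that idea (the frame directory and extension are placeholders), not a confirmed fix; any explicit width/height arguments passed to the script would need to match the cropped size.

```python
# Crop each frame so width and height are multiples of 4, on the assumption
# that the ConcatOp mismatch is caused by odd sizes rounding differently
# through the encoder's downsampling and the decoder's upsampling.
import glob
import os
from PIL import Image

def crop_to_multiple(path, multiple=4):
    im = Image.open(path)
    w, h = im.size
    w2, h2 = w - (w % multiple), h - (h % multiple)
    if (w2, h2) != (w, h):
        im.crop((0, 0, w2, h2)).save(path)

for frame in sorted(glob.glob(os.path.join("data", "vids", "videoname", "*.png"))):
    crop_to_multiple(frame)
```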
{
"background_file":"/home/tiam/project_data/tf_deep_mag/raw_ingredients/coco_train2017/000000225345.jpg",
"third_frame_motion":1.0,
"sample_properties":[
{
"color_add":[
1.925357951045477,
9.502529585992523,
7.880988956721406
],
"fg_idx":1311,
"orientation":3.301299684358326,
"motion_dir":5.8994013477181255,
"shape_type":2,
"color_mult":[
1.0,
1.0,
1.0
],
"aspect_ratio":2.191457788030521,
"scaling":0.195800786759281,
"motion_amount":0.3861106423132381,
"center":[
190.46849859664286,
208.1545052627958
]
},
{
"color_add":[
-10.01393090981086,
-7.44447289150088,
-3.836791152177721
],
"fg_idx":4712,
"orientation":1.2146459169689545,
"motion_dir":2.8215923059585046,
"shape_type":2,
"color_mult":[
1.0,
1.0,
1.0
],
"aspect_ratio":1.575751231357717,
"scaling":0.7695978265063551,
"motion_amount":0.9252538154237787,
"center":[
138.44686122207858,
274.563682226355
]
},
{
"color_add":[
13.653071468187918,
14.028489528534774,
-8.215312919870719
],
"fg_idx":5718,
"orientation":2.054352406981038,
"motion_dir":0.10612433944136192,
"shape_type":3,
"color_mult":[
1.0,
1.0,
1.0
],
"aspect_ratio":1.759935147917405,
"scaling":0.799292509942318,
"motion_amount":1.008468522062427,
"center":[
299.6627111293551,
244.2818327143585
]
},
{
"color_add":[
11.15283742278356,
-2.560571143035471,
13.147733175999939
],
"fg_idx":3507,
"orientation":0.4777896338358551,
"motion_dir":3.184019233633535,
"shape_type":1,
"color_mult":[
1.0,
1.0,
1.0
],
"aspect_ratio":2.7324625189657796,
"scaling":0.10427110663734562,
"motion_amount":0.4029012709278865,
"center":[
374.5391294523404,
85.92793533987069
]
},
{
"color_add":[
-7.741979262278923,
10.189081151868574,
14.194209720121702
],
"fg_idx":1416,
"orientation":4.676984784008412,
"motion_dir":4.486443504656097,
"shape_type":2,
"color_mult":[
1.0,
1.0,
1.0
],
"aspect_ratio":0.11384226239960371,
"scaling":0.5548262804937897,
"motion_amount":0.0843749640576295,
"center":[
215.72180825160055,
9.213579577060514
]
},
{
"color_add":[
-2.3170936238533315,
-8.917824166917462,
3.6573973430047566
],
"fg_idx":4341,
"orientation":5.684192681269192,
"motion_dir":1.8791328006121593,
"shape_type":3,
"color_mult":[
1.0,
1.0,
1.0
],
"aspect_ratio":2.6267749876376825,
"scaling":0.6529448261831385,
"motion_amount":0.43769959396502617,
"center":[
41.803534791626944,
61.853745682737994
]
},
{
"color_add":[
-7.576833772172929,
8.619156100953902,
6.769010895167728
],
"fg_idx":1717,
"orientation":6.17109738993578,
"motion_dir":0.6974625676398899,
"shape_type":1,
"color_mult":[
1.0,
1.0,
1.0
],
"aspect_ratio":0.37686866068980446,
"scaling":0.6900220937610626,
"motion_amount":0.20012596632715263,
"center":[
125.69403191253784,
93.55391836687608
]
},
{
"color_add":[
-8.870124580096775,
-14.199621328432295,
-2.6015452878369505
],
"fg_idx":663,
"orientation":3.8206371992641466,
"motion_dir":3.524134968760174,
"shape_type":1,
"color_mult":[
1.0,
1.0,
1.0
],
"aspect_ratio":1.3452644797115627,
"scaling":0.45703039713457094,
"motion_amount":0.5315456823000649,
"center":[
337.18518966576335,
293.3950208811551
]
},
{
"color_add":[
-12.987699689796031,
5.82086276136825,
-5.900104564786595
],
"fg_idx":3452,
"orientation":0.5694792141183294,
"motion_dir":5.072288980793892,
"shape_type":2,
"color_mult":[
1.0,
1.0,
1.0
],
"aspect_ratio":0.8311780579834941,
"scaling":0.34532467780479253,
"motion_amount":0.9680977920488166,
"center":[
167.71642386436253,
142.27622959742757
]
}
],
"color_amplification_factor":1.0,
"amplification_factor":28.351030206327763,
"bg_properties":{
"motion_dir":5.543601538543549,
"color_mult":[
1.0,
1.0,
1.0
],
"color_add":[
-10.990955175457486,
-14.4096269084179,
-13.60394413364702
],
"motion_amount":0.030528508874763105,
"bg_blur_amount":4.732548688741219
}
}
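For reference, a minimal sketch for reading one of these meta files in Python (the filename is hypothetical; the field names come from the JSON above):

```python
# Load one meta file and print a few of the per-object parameters shown above.
import json

with open("meta/000000.json") as f:  # hypothetical filename
    meta = json.load(f)

print("amplification_factor:", meta["amplification_factor"])
for i, obj in enumerate(meta["sample_properties"]):
    print(i, "shape_type:", obj["shape_type"], "center:", obj["center"])
```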
I downloaded the whole dataset to figure out how you prepare the data for training, and I understand most of the parameters in the meta files. However, I am not clear about shape_type, color_mult, and third_frame_motion in the meta files. Thank you for your help!
Hello:
In the json file of the training dataset, the values of the parameters "motion_amount" and "center" are floating point. How do you paste foreground objects onto the background image when pixel positions are integers? The paper mentions that you reconstruct the image in the continuous domain with bicubic interpolation. Do you first enlarge the image using bicubic interpolation, then paste the foreground objects and downsample the synthesized image? Can you describe the reconstruction process?
When I run convert_3frames_data_to_tfrecords.py, what specific steps should I follow?
Thank you for your answer!
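One standard way to composite a foreground at a fractional position is to resample it with a bicubic warp before alpha-blending, which avoids upsampling the whole image. The sketch below is only a guess at the process described in the paper, not the authors' code:

```python
# Paste a foreground at a fractional (x, y) center by translating it with a
# bicubic warp, then alpha-blending onto the background. All inputs are
# float32 in [0, 1]; bg is HxWx3, fg is hxwx3, and alpha is an hxw mask.
import cv2
import numpy as np

def paste_subpixel(bg, fg, alpha, center):
    H, W = bg.shape[:2]
    h, w = fg.shape[:2]
    # translation that puts the foreground's center at the fractional position
    tx, ty = center[0] - w / 2.0, center[1] - h / 2.0
    M = np.float32([[1, 0, tx], [0, 1, ty]])
    fg_t = cv2.warpAffine(fg, M, (W, H), flags=cv2.INTER_CUBIC)
    a_t = cv2.warpAffine(alpha, M, (W, H), flags=cv2.INTER_CUBIC)[..., None]
    return a_t * fg_t + (1.0 - a_t) * bg
```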
Can you please briefly explain how to extract the heart rate from the baby clip example using the output of the decoder?
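Not the authors' answer, but one common way to estimate heart rate from the magnified output is to track the mean intensity of a region over time and take the dominant frequency; the output path, ROI, and frame rate below are placeholders:

```python
# Estimate heart rate from the magnified frames: average a region of interest
# per frame, then find the strongest frequency in a plausible heart-rate band.
import glob

import cv2
import numpy as np

fps = 30.0                                    # assumption: the clip's frame rate
frames = sorted(glob.glob("out/baby/*.png"))  # hypothetical output directory

signal = []
for path in frames:
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE).astype(np.float64)
    signal.append(img[100:200, 100:200].mean())  # example ROI on the chest

signal = np.asarray(signal)
signal -= signal.mean()
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
band = (freqs > 0.7) & (freqs < 3.0)          # 42-180 beats per minute
peak_hz = freqs[band][np.argmax(spectrum[band])]
print("estimated heart rate: %.1f bpm" % (peak_hz * 60.0))
```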
When running on custom data:
InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [1,360,248,32] vs. shape[1] = [1,360,247,32]
I have no idea where the numbers 360, 248 are coming from. Image width/height parameters are correct for my data. Any ideas?
I was working with this code on Google Colab but was facing various issues, so to spare other users the same trouble I am uploading my notebook with full instructions.
Hope it will help you.
You can find it at:
https://github.com/its-mayank/Video-Magnification/blob/master/Video_Magnification.ipynb
It is the original work of this paper's authors, and all rights are reserved by them. I have just implemented and shared it so that others do not have to face the same problems.
In modules.py, I found that res_manipulator returns enc_b + diff rather than enc_a + diff (line 38).
According to Eq. 3 in the paper, isn't the latter correct?
I might be understanding it wrong...
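For context, a sketch of Eq. 3 as I read it, with g and h standing in for the learned convolutional blocks (the names here are illustrative, not the repo's):

```python
# Eq. 3 as I read it: G_m(M_a, M_b, alpha) = M_a + h(alpha * g(M_b - M_a)),
# i.e. the amplified shape difference is added back to enc_a, not enc_b.
# g and h stand for the learned convolutional blocks; names are illustrative.
def manipulator(enc_a, enc_b, alpha, g, h):
    diff = h(alpha * g(enc_b - enc_a))  # amplified difference of shape encodings
    return enc_a + diff                 # base is M_a according to the equation
```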
Hello,
I am wondering if there is any source code showing how you synthesize the amplified ground truth for your training dataset?
If so, can you share it?
Looking forward to your reply.
Thanks.
I downloaded the image dataset and found that there are four image directories.
I guess that 'amplified', 'frameA', and 'frameB' are Y, Xa, and Xb in the paper.
I wonder whether 'frameC' is X'b or not.
And do you also have the loss term L1(V'b, V'Y) in the code?
I couldn't find the perturbed Y in the dataset.
Best,
Hi,
In the paper, regularization is used to drive the separation of the texture and shape representations.
I'm doing similar work, but I don't have frameC to do the regularization. If there is no regularization, will the separation be poor? And are there other alternatives to regularization?
Thank you~
Sorry, where are Y' and V'Y?
Hello:
I want to ask how the txt file used in the training process is generated, and what its function is. Also, the txt file contains more than 100,000 parameter entries, so they should not be in one-to-one correspondence? Looking forward to your reply!
Hi Tiam @12dmodel ,
I'm wondering if there is a way to remove/revert the perturbation for frameB and amplified?
Best,
Sheng
According to the repository's MIT license I could use and distribute it commercially, but that's not what the repository description says.
It would be important to adjust the license of the repository to avoid future licensing problems with people who either did not read it or want to take advantage of the incorrect licensing configuration on GitHub.
I was reading the research papers and am trying to amplify color. I can set all the parameters that are listed, but I don't know where to set the spatial frequency cutoff, represented as λc in the paper.
Paper - http://people.csail.mit.edu/mrub/papers/vidmag.pdf
Thank you
Could you please send me the raw videos?
For the throat video, the FPS is 1900. Do we have to use such a high-speed camera to get this effect?
Thank you!
Best regards!
sh run_temporal_on_test_videos.sh o3f_hmhm2_bg_qnoise_mix4_nl_n_t_ds3 my_own_video 20 1.12 1.6 30 2 differenceOfIIR
I am running the above command on 800 frames. It takes around 30 minutes to process. Is something wrong with my command?