avinashpaliwal / super-slomo

PyTorch implementation of Super SloMo by Jiang et al.

License: MIT License

Python 72.48% Jupyter Notebook 27.21% Dockerfile 0.32%
Topics: video-frame-interpolation, super-slomo, pytorch-implmention, pytorch, slow-motion, deep-learning, deep-neural-networks, convolutional-neural-networks, frame-interpolation, slomo

super-slomo's People

Contributors

avinashpaliwal, cbskarmory, cyremur, sniklaus, ss18, stg7, tamasino52


super-slomo's Issues

It doesn't convert faster when I set the parameter (batch_size) to a number greater than one.

Describe the bug
It doesn't convert faster when I set the parameter (batch_size) to a number greater than one.

To Reproduce
Steps to reproduce the behavior:
e.g.

  1. Run video_to_slomo.py.
  2. Set batch_size to a number greater than one.
  3. Error/Abnormal behavior: it uses more CUDA memory but doesn't convert any faster.

Expected behavior
It actually converts faster.

Interpolated results/error output
Add frames or terminal output as applicable.

Desktop (please complete the following information):

  • OS: Windows
  • Device Used GPU
  • Setup Info: PyTorch 0.4.1, Python 3.6

Additional context
GPU utilization is below 10% (on a Quadro P6000).

AttributeError: module 'ctypes' has no attribute 'windll' error when running video_to_slomo.py on Ubuntu 18.04

Hi, I'm trying to run the video_to_slomo.py script on Ubuntu and I get the following error:

Traceback (most recent call last):
  File "video_to_slomo.py", line 199, in <module>
    main()
  File "video_to_slomo.py", line 95, in main
    ctypes.windll.kernel32.SetFileAttributesW(extractionDir, FILE_ATTRIBUTE_HIDDEN)
AttributeError: module 'ctypes' has no attribute 'windll'

I try to run the script using the following command:

python video_to_slomo.py --ffmpeg /usr/bin/ffmpeg --video ~/Super-SloMo/sample.mp4 --sf 4 --checkpoint checkpoint.ckpt --fps 240 --output output.mp4
Any ideas? I'm running on Ubuntu 18.04.
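
For context, ctypes.windll exists only on Windows, which is why the call fails on Ubuntu. A portable guard would look roughly like this (a sketch with a hypothetical helper, not the repo's actual fix):

import os
import ctypes

FILE_ATTRIBUTE_HIDDEN = 0x02

def hide_extraction_dir(path):
    # Hypothetical helper: ctypes.windll is Windows-only, so guard the call;
    # on Linux/macOS a dot-prefixed directory name serves the same purpose.
    if os.name == 'nt':
        ctypes.windll.kernel32.SetFileAttributesW(path, FILE_ATTRIBUTE_HIDDEN)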

The output size is changed compared to the input

Hey guys,
I trained a model using images of size (1440, 720). After using the checkpoint with video_to_slomo.py, I get two kinds of images in the tmpSuperSloMo/output folder: one is (1440, 720), the other two are (1440, 704) (with --sf 3). I don't understand: where did the lost pixels go?
Another issue: my own dataset has mean [0.696, 0.456, 0.957], and I changed the mean values in train.py. After training the model with train.py, I forgot to change the mean values in video_to_slomo.py, which default to [0.429, 0.431, 0.397], and got some images in the hidden tmpSuperSloMo/output folder. I compared those images with the results produced after changing the mean values in video_to_slomo.py, and they are exactly the same! Why does the mean parameter in video_to_slomo.py have no effect? Did I misunderstand something? Please let me know.
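
For context, a minimal sketch of how the normalization is meant to pair up (an assumption about the scripts' transforms, not their exact code): the mean used to normalize frames on the way in has to match the negated mean used to de-normalize tensors on the way out, in both train.py and video_to_slomo.py.

import torchvision.transforms as transforms

mean = [0.696, 0.456, 0.957]  # must be changed consistently in BOTH scripts
std = [1, 1, 1]

normalize = transforms.Normalize(mean=mean, std=std)                  # image -> tensor
denormalize = transforms.Normalize(mean=[-m for m in mean], std=std)  # tensor -> image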

Issue with the output when evaluating with the pretrained model

Hi, forgive me if this sounds a bit dense; I don't really know anything about machine learning, but I have a passing interest in making slomo videos. I'm currently evaluating on a CPU (AMD Threadripper 1920X) as I don't have a CUDA device, but I'm having trouble with the output from the video_to_slomo.py script. As a reference, I used ffmpeg to convert your original gif to mp4 and tried to create a video, but the output looks a bit off. This is what the converted video looks like: Link here. Another video I tried also looks like this. Any ideas why?

Thanks

Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor

I am getting this error when running on the GPU; it runs fine on the CPU.
Maybe the temporary fix for #7 introduced this error; I am not able to figure it out.

      handler_name    : VideoHandler
      encoder         : Lavc56.60.100 mjpeg
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> mjpeg (native))
Press [q] to stop, [?] for help
frame=   72 fps=0.0 q=2.0 Lsize=N/A time=00:00:03.00 bitrate=N/A    
video:2098kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
  0%|                                                                          | 0/71 [00:00<?, ?it/s]/users/TeamVideoSummarization/.local/lib/python2.7/site-packages/torch/nn/functional.py:1820: UserWarning: nn.functional.upsample_bilinear is deprecated. Use nn.functional.upsample instead.
  warnings.warn("nn.functional.upsample_bilinear is deprecated. Use nn.functional.upsample instead.")
cuda:0

Traceback (most recent call last):
  File "video_to_slomo.py", line 218, in <module>
    main()
  File "video_to_slomo.py", line 185, in main
    g_I0_F_t_0 = flowBackWarp(I0, F_t_0)
  File "/users/TeamVideoSummarization/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/users/TeamVideoSummarization/shivansh/Super-SloMo/model.py", line 276, in forward
    x = self.gridX.unsqueeze(0).expand_as(u).float() + u
RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #3 'other'

Pardon me if this is a silly question.
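
For context, this error means one operand lives on the CPU while the other is on the GPU. A common pattern that avoids it is registering the mesh grid as a module buffer, so it follows the module in .to(device) (a sketch, not the repo's exact model.py):

import torch
import torch.nn as nn

class BackWarpGrid(nn.Module):
    def __init__(self, W, H):
        super().__init__()
        gridY, gridX = torch.meshgrid(torch.arange(H), torch.arange(W))
        # Buffers registered this way move with the module on .to(device),
        # so gridX/gridY end up on the same device as the input tensors.
        self.register_buffer("gridX", gridX.float())
        self.register_buffer("gridY", gridY.float())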

create_dataset.py issues on Google Colab

Describe the bug

I am trying to train the model via Google Colab and I am running into issues creating the dataset. I get multiple errors when using ffmpeg to extract the frames of the Adobe240fps dataset.

Sample Output:

Output #0, image2, to './data/extracted/GOPR9641/%04d.jpg':
  Metadata:
    major_brand     : mp42
    minor_version   : 1
    compatible_brands: mp41mp42isom
    encoder         : Lavf57.83.100
    Stream #0:0(und): Video: mjpeg, yuvj420p(pc), 640x360 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 59.94 fps, 59.94 tbn, 59.94 tbc (default)
    Metadata:
      creation_time   : 2016-08-27T23:13:56.000000Z
      handler_name    : Core Media Video
      encoder         : Lavc57.107.100 mjpeg
    Side data:
      cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x55f598cd6000] DTS 4248335 < 4272693 out of order
[image2 @ 0x55f598cd7800] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1062 >= 1062
[image2 @ 0x55f598cd7800] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1063 >= 1063
[image2 @ 0x55f598cd7800] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1064 >= 1064
[mjpeg @ 0x55f599c04000] Invalid pts (1066) <= last (1066)
[image2 @ 0x55f598cd7800] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1066 >= 1066
Video encoding failed
Conversion failed!
Error converting file:GOPR9641.mp4. Exiting.

To Reproduce

I am running the code below:
! python3 './data/create_dataset.py' --ffmpeg 'ffmpeg' --dataset 'adobe240fps' --videos_folder './unzipfile/original_high_fps_videos' --dataset_folder './data/'

I did edit create_dataset.py. I still retained the --ffmpeg argument, but the binary name is now hardcoded into the os.system call, so the line in the extract_frames function has been changed to:

retn = os.system('ffmpeg -i {} -vf scale={}:{} -vsync 0 -qscale:v 2 {}/%04d.jpg'.format( os.path.join(inDir, video), args.img_width, args.img_height, os.path.join(outDir, os.path.splitext(video)[0])))

Aside from that, I changed os.mkdir to os.makedirs, although I don't think that should affect anything.

Additional context

I did try running the code on a smaller, selected dataset (IMG_0181.m4v, IMG_0180.m4v and IMG_0174.m4v); it extracted the frames correctly and I was able to train the model. I am not sure if there is a compatibility issue with Colab's versions of the dependencies?

Also, I just finished running the script in Colab, and I added a checker to see which videos were extracted and which failed (a sketch of such a checker follows the lists below).

Failed: ['GOPR9636.mp4', 'GOPR9640.mp4', 'GOPR9641.mp4', 'GOPR9643.mp4', 'GOPR9644.mp4', 'GOPR9645.mp4', 'GOPR9646.mp4', 'GOPR9648.mp4', 'GOPR9649.mp4', 'GOPR9652.mp4', 'GOPR9659.mp4', 'IMG_0003.mov', 'IMG_0004b.mov', 'IMG_0006.mov', 'IMG_0007.mov', 'IMG_0008.mov', 'IMG_0010.mov', 'IMG_0011.mov', 'IMG_0018.mov', 'IMG_0021.mov', 'IMG_0025.mov', 'IMG_0026.mov', 'IMG_0033.mov', 'IMG_0034.mov', 'IMG_0034a.mov', 'IMG_0036.mov', 'IMG_0037a.mov', 'IMG_0042.mov', 'IMG_0046.mov', 'IMG_0047.mov', 'IMG_0054a.mov', 'IMG_0054b.mov', 'IMG_0055.mov', 'IMG_0150.m4v', 'IMG_0151.m4v', 'IMG_0152.m4v', 'IMG_0153.m4v', 'IMG_0154.m4v', 'IMG_0155.m4v', 'IMG_0156.m4v', 'IMG_0157.m4v', 'IMG_0160.m4v', 'IMG_0161.m4v', 'IMG_0162.m4v', 'IMG_0163.m4v', 'IMG_0164.m4v', 'IMG_0167.m4v', 'IMG_0169.m4v', 'IMG_0170.m4v', 'IMG_0171.m4v', 'IMG_0172.m4v', 'IMG_0173.m4v', 'IMG_0174.m4v', 'IMG_0175.m4v', 'IMG_0176.m4v', 'IMG_0177.m4v', 'IMG_0178.m4v', 'IMG_0180.m4v'] 
Extracted:['720p_240fps_2.mov', '720p_240fps_3.mov', '720p_240fps_5.mov', '720p_240fps_6.mov', 'GOPR9633.mp4', 'GOPR9634.mp4', 'GOPR9637b.mp4', 'GOPR9638.mp4', 'GOPR9639.mp4', 'GOPR9642.mp4', 'GOPR9647.mp4', 'GOPR9650.mp4', 'GOPR9651.mp4', 'GOPR9653.mp4', 'GOPR9654a.mp4', 'GOPR9654b.mp4', 'GOPR9655a.mp4', 'GOPR9655b.mp4', 'GOPR9656.mp4', 'GOPR9657.mp4', 'GOPR9658.MP4', 'GOPR9660.MP4', 'IMG_0001.mov', 'IMG_0002.mov', 'IMG_0005.mov', 'IMG_0009.mov', 'IMG_0012.mov', 'IMG_0013.mov', 'IMG_0014.mov', 'IMG_0016.mov', 'IMG_0017.mov', 'IMG_0019.mov', 'IMG_0020.mov', 'IMG_0022.mov', 'IMG_0024.mov', 'IMG_0028.mov', 'IMG_0029.mov', 'IMG_0030.mov', 'IMG_0031.mov', 'IMG_0032.mov', 'IMG_0035.mov', 'IMG_0037.mov', 'IMG_0038.mov', 'IMG_0039.mov', 'IMG_0040.mov', 'IMG_0041.mov', 'IMG_0043.mov', 'IMG_0044.mov', 'IMG_0045.mov', 'IMG_0052.mov', 'IMG_0056.mov', 'IMG_0058.mov', 'IMG_0200.MOV', 'IMG_0212.MOV']
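
For reference, a checker along those lines could look like this (a hypothetical reconstruction; run_ffmpeg_extract stands in for the ffmpeg call in create_dataset.py):

import os
import subprocess

videos_folder = './unzipfile/original_high_fps_videos'
out_root = './data/extracted'

def run_ffmpeg_extract(in_dir, video, out_root):
    # Hypothetical stand-in for the extraction call; returns ffmpeg's exit
    # code, which is non-zero when a conversion fails.
    out_dir = os.path.join(out_root, os.path.splitext(video)[0])
    os.makedirs(out_dir, exist_ok=True)
    return subprocess.call(['ffmpeg', '-i', os.path.join(in_dir, video),
                            '-vsync', '0', '-qscale:v', '2',
                            os.path.join(out_dir, '%04d.jpg')])

failed, extracted = [], []
for video in sorted(os.listdir(videos_folder)):
    retn = run_ffmpeg_extract(videos_folder, video, out_root)
    (failed if retn else extracted).append(video)

print('Failed:', failed)
print('Extracted:', extracted)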

The output video stutters every few seconds?

The output video stutters every few seconds. When you slow the frame rate down in Premiere Pro or Final Cut Pro X, it's utterly obvious. Do you have this issue? I have tried different resolutions and bitrates, your model and my model, but the problem still persists.

Bad PSNR=27.7

My PSNR=27.7

If I download 300K frames (as the paper says) from high-quality 240fps YouTube videos, will the PSNR rise? And if I use 1080p or 4K videos for training, will the result be even better? It takes several days to train on a Titan Xp.

cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Describe the bug
After starting the script, the error "cuDNN error: CUDNN_STATUS_EXECUTION_FAILED" appears and the script stops.

To Reproduce

  1. Run the script

Expected behavior
It should generate a video :)

Interpolated results/error output
Full Output at the end.
Traceback (most recent call last):
  File "video_to_slomo.py", line 216, in <module>
    main()
  File "video_to_slomo.py", line 165, in main
    flowOut = flowComp(torch.cat((I0, I1), dim=1))
  File "D:\Programme\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Programme\SuperSloMo\model.py", line 197, in forward
    x = F.leaky_relu(self.conv1(x), negative_slope = 0.1)
  File "D:\Programme\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Programme\Anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Desktop (please complete the following information):

  • OS: Windows 10 Pro 64-bit
  • Device Used: GPU (nvidia rtx 2070)
  • Setup Info: Python 3.7; pytorch 1.1.0; py3.7_cuda90_cudnn7_1; cudatoolkit 9.0

Additional context
(base) PS D:\Programme\SuperSloMo> python video_to_slomo.py --ffmpeg D:\Programme\ffmpeg\bin\ --video D:\in\C0093.mp4 --sf 4 --checkpoint D:\Programme\SuperSloMo\SuperSloMo.ckpt --fps 400 --output D:\out\400fps.mp4 --batch_size 1
D:\Programme\ffmpeg\bin\ffmpeg -i D:\in\C0093.mp4 -vsync 0 -qscale:v 2 tmpSuperSloMo\input/%06d.jpg
ffmpeg version 4.1.4 Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 9.1.1 (GCC) 20190716
configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth
libavutil 56. 22.100 / 56. 22.100
libavcodec 58. 35.100 / 58. 35.100
libavformat 58. 20.100 / 58. 20.100
libavdevice 58. 5.100 / 58. 5.100
libavfilter 7. 40.101 / 7. 40.101
libswscale 5. 3.100 / 5. 3.100
libswresample 3. 3.100 / 3. 3.100
libpostproc 55. 3.100 / 55. 3.100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0000029f5c1c8d40] st: 0 edit list: 1 Missing key frame while searching for timestamp: 1000
[mov,mp4,m4a,3gp,3g2,mj2 @ 0000029f5c1c8d40] st: 0 edit list 1 Cannot find an index entry before timestamp: 1000.
Guessed Channel Layout for Input Stream #0.1 : stereo
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'D:\in\C0093.mp4':
Metadata:
major_brand : XAVC
minor_version : 16785407
compatible_brands: XAVCmp42iso2
creation_time : 2018-07-28T16:48:54.000000Z
Duration: 00:00:02.88, start: 0.000000, bitrate: 116527 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709/bt709/iec61966-2-4), 1920x1080 [SAR 1:1 DAR 16:9], 102672 kb/s, 100 fps, 100 tbr, 100k tbn, 200 tbc (default)
Metadata:
creation_time : 2018-07-28T16:48:54.000000Z
handler_name : Video Media Handler
encoder : AVC Coding
Stream #0:1(und): Audio: pcm_s16be (twos / 0x736F7774), 48000 Hz, stereo, s16, 1536 kb/s (default)
Metadata:
creation_time : 2018-07-28T16:48:54.000000Z
handler_name : Sound Media Handler
Stream #0:2(und): Data: none (rtmd / 0x646D7472), 819 kb/s (default)
Metadata:
creation_time : 2018-07-28T16:48:54.000000Z
handler_name : Timed Metadata Media Handler
timecode : 04:23:13:92
Stream mapping:
Stream #0:0 -> #0:0 (h264 (native) -> mjpeg (native))
Press [q] to stop, [?] for help
[swscaler @ 0000029f5c64d600] deprecated pixel format used, make sure you did set range correctly
Output #0, image2, to 'tmpSuperSloMo\input/%06d.jpg':
Metadata:
major_brand : XAVC
minor_version : 16785407
compatible_brands: XAVCmp42iso2
encoder : Lavf58.20.100
Stream #0:0(und): Video: mjpeg, yuvj420p(pc), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 100 fps, 100 tbn, 100 tbc (default)
Metadata:
creation_time : 2018-07-28T16:48:54.000000Z
handler_name : Video Media Handler
encoder : Lavc58.35.100 mjpeg
Side data:
cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1
frame= 288 fps=175 q=2.0 Lsize=N/A time=00:00:02.88 bitrate=N/A speed=1.75x
video:30649kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
0%| | 0/287 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "video_to_slomo.py", line 216, in <module>
    main()
  File "video_to_slomo.py", line 165, in main
    flowOut = flowComp(torch.cat((I0, I1), dim=1))
  File "D:\Programme\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Programme\SuperSloMo\model.py", line 197, in forward
    x = F.leaky_relu(self.conv1(x), negative_slope = 0.1)
  File "D:\Programme\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Programme\Anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

CPU Mode only using 2 threads on Mac

Describe the bug
When running the slo-mo conversion on a Mac, I can only get 2 threads to process the video in CPU mode (no CUDA). RAM sits around 12 gigabytes used for processing, but I have 32 available.

To Reproduce

  1. Running this script to convert: python video_to_slomo.py --ffmpeg /Users/user/slomo/ffmpeg/bin/ --video /Users/user/slomo/input/mountain_timelapse.mp4 --sf 4 --checkpoint /Users/user/slomo/superslomo/SuperSloMo.ckpt --fps 119.880 --output /Users/user/slomo/output/mountain120.mp4 --batch_size 2
  2. No matter what I change the batch_size to, it doesn't change the amount of CPU or RAM usage. I've tried changing it to 10 and nothing happens. If I run the same thing on Windows (no CUDA), a value of 2 eats up 32 gigs of RAM and runs the processor on multiple cores.

Expected behavior
I would expect that as I increase the value of the batch_size on a Mac, that the CPU usage would go up and the process would complete faster.

Interpolated results/error output
(base) MacBook:superslomo user$ python video_to_slomo.py --ffmpeg /Users/user/slomo/ffmpeg/bin/ --video /Users/user/slomo/input/mountain_timelapse.mp4 --sf 4 --checkpoint /Users/user/slomo/superslomo/SuperSloMo.ckpt --fps 119.880 --output /Users/user/slomo/output/mountain120.mp4 --batch_size 2
/Users/user/slomo/ffmpeg/bin/ffmpeg -i /Users/user/slomo/input/mountain_timelapse.mp4 -vsync 0 -qscale:v 2 .tmpSuperSloMo/input/%06d.jpg
ffmpeg version 4.2 Copyright (c) 2000-2019 the FFmpeg developers
built with Apple LLVM version 10.0.1 (clang-1001.0.46.4)
configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-appkit --enable-avfoundation --enable-coreimage --enable-audiotoolbox
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/Users/user/slomo/input/mountain_timelapse.mp4':
Metadata:
major_brand : avc1
minor_version : 538182144
compatible_brands: avc1isom
creation_time : 2019-07-21T19:08:34.000000Z
comment : DE=None, Type=Timelapse, HQ=Normal, Mode=P
Duration: 00:00:10.04, start: 0.000000, bitrate: 35507 kb/s
Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709), 1920x1080 [SAR 1:1 DAR 16:9], 35334 kb/s, 29.97 fps, 29.97 tbr, 30k tbn, 59.94 tbc (default)
Metadata:
creation_time : 2019-07-21T19:08:34.000000Z
handler_name : ?DJI.AVC
encoder : AVC encoder
Stream #0:1(eng): Data: none (priv / 0x76697270), 1690 kb/s
Metadata:
creation_time : 2019-07-21T19:08:34.000000Z
handler_name : ?DJI.Meta
Stream mapping:
Stream #0:0 -> #0:0 (h264 (native) -> mjpeg (native))
Press [q] to stop, [?] for help
[swscaler @ 0x7f7f58cb8000] deprecated pixel format used, make sure you did set range correctly
Output #0, image2, to '.tmpSuperSloMo/input/%06d.jpg':
Metadata:
major_brand : avc1
minor_version : 538182144
compatible_brands: avc1isom
comment : DE=None, Type=Timelapse, HQ=Normal, Mode=P
encoder : Lavf58.29.100
Stream #0:0(eng): Video: mjpeg, yuvj420p(pc), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 29.97 fps, 29.97 tbn, 29.97 tbc (default)
Metadata:
creation_time : 2019-07-21T19:08:34.000000Z
handler_name : ?DJI.AVC
encoder : Lavc58.54.100 mjpeg
Side data:
cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1
frame= 301 fps=206 q=2.0 Lsize=N/A time=00:00:10.04 bitrate=N/A speed=6.89x
video:140142kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
0%| | 0/150 [00:00<?, ?it/s]^[/Users/user/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py:2479: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/Users/user/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py:1350: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")

Desktop (please complete the following information):

  • OS: MacOS Mojave 10.14.6
  • Using CPU
  • 2.2 GHz Intel Core i7 6 cores, 32 Gig RAM
  • Setup Info
  1. Installed Python 3.7 using Anaconda
  2. Using ffmpeg 4.2
  3. Using provided model
  4. Using stable Pytorch 1.2 for Mac, Conda, Python 3.7, CUDA None

Additional context
The video processed fine using CUDA on a Windows machine. Was just trying to get it to work on a Mac using CPU only.
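
For what it's worth, PyTorch's intra-op thread count can be raised explicitly; whether that helps on the Mac build here is untested (torch.set_num_threads is a real API; the core count below is just this machine's):

import torch

print(torch.get_num_threads())  # what PyTorch picked by default
torch.set_num_threads(6)        # physical core count of the i7 above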

ffmpeg problem

Hi, Thanks for your great project.

I have a problem with ffmpeg. I installed it with conda:

conda install -c https://conda.anaconda.org/menpo ffmpeg

Then I run the script with --ffmpeg `which ffmpeg`

and get the error: sh: 1: /home/xyliu/miniconda3/envs/DL/bin/ffmpeg/ffmpeg: not found

Could you please tell me how to fix this? Thank you!
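
For context, the doubled path in the error (.../bin/ffmpeg/ffmpeg) suggests the script joins the given path with the binary name, so --ffmpeg is expected to be a directory (an assumption about video_to_slomo.py, not a quote of it):

import os

ffmpeg_dir = '/home/xyliu/miniconda3/envs/DL/bin'  # pass the directory...
ffmpeg_cmd = os.path.join(ffmpeg_dir, 'ffmpeg')    # ...the script appends the name
print(ffmpeg_cmd)  # /home/xyliu/miniconda3/envs/DL/bin/ffmpeg

`which ffmpeg` returns the binary path itself, which is why joining it with "ffmpeg" again produces the "not found" path above.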

cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Describe the bug
After running the script, the error "cuDNN error: CUDNN_STATUS_EXECUTION_FAILED" appears.

To Reproduce
Steps to reproduce the behavior:
Run script

Expected behavior
Generate video with 120fps

Interpolated results/error output

(base) C:\Users\Amos\SloMo\SuperSloMo>python video_to_slomo.py --ffmpeg C:\Users\Amos\SloMo\ffmpeg\bin\ --video C:\Users\Amos\SloMo\Input\beachvideo.mp4 --sf 5 --checkpoint C:\Users\Amos\SloMo\SuperSloMo\SuperSloMo.ckpt --fps 120 --output C:\Users\Amos\SloMo\Output\beachvideo120.mp4 --batch_size 1
C:\Users\Amos\SloMo\ffmpeg\bin\ffmpeg -i C:\Users\Amos\SloMo\Input\beachvideo.mp4 -vsync 0 -qscale:v 2 tmpSuperSloMo\input/%06d.jpg
ffmpeg version 4.1.4 Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 9.1.1 (GCC) 20190716
configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth
libavutil 56. 22.100 / 56. 22.100
libavcodec 58. 35.100 / 58. 35.100
libavformat 58. 20.100 / 58. 20.100
libavdevice 58. 5.100 / 58. 5.100
libavfilter 7. 40.101 / 7. 40.101
libswscale 5. 3.100 / 5. 3.100
libswresample 3. 3.100 / 3. 3.100
libpostproc 55. 3.100 / 55. 3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'C:\Users\Amos\SloMo\Input\beachvideo.mp4':
Metadata:
major_brand : 3gp5
minor_version : 0
compatible_brands: 3gp5isom
creation_time : 2018-07-27T13:58:06.000000Z
location : +30.2159-085.8796/
location-eng : +30.2159-085.8796/
Duration: 00:00:15.70, start: 0.000000, bitrate: 35206 kb/s
Stream #0:0(und): Video: h264 (Baseline) (avc1 / 0x31637661), yuv420p(tv, smpte170m/bt470bg/smpte170m), 3840x2160, 35074 kb/s, 30 fps, 30 tbr, 90k tbn, 180k tbc (default)
Metadata:
creation_time : 2018-07-27T13:58:06.000000Z
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 124 kb/s (default)
Metadata:
creation_time : 2018-07-27T13:58:06.000000Z
Stream mapping:
Stream #0:0 -> #0:0 (h264 (native) -> mjpeg (native))
Press [q] to stop, [?] for help
[swscaler @ 0000016d53584240] deprecated pixel format used, make sure you did set range correctly
Output #0, image2, to 'tmpSuperSloMo\input/%06d.jpg':
Metadata:
major_brand : 3gp5
minor_version : 0
compatible_brands: 3gp5isom
location-eng : +30.2159-085.8796/
location : +30.2159-085.8796/
encoder : Lavf58.20.100
Stream #0:0(und): Video: mjpeg, yuvj420p(pc), 3840x2160, q=2-31, 200 kb/s, 30 fps, 30 tbn, 30 tbc (default)
Metadata:
creation_time : 2018-07-27T13:58:06.000000Z
encoder : Lavc58.35.100 mjpeg
Side data:
cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1
frame= 471 fps= 35 q=2.0 Lsize=N/A time=00:00:15.70 bitrate=N/A speed=1.18x
video:268721kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
0%| | 0/470 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "video_to_slomo.py", line 216, in <module>
    main()
  File "video_to_slomo.py", line 165, in main
    flowOut = flowComp(torch.cat((I0, I1), dim=1))
  File "C:\Users\Amos\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Amos\SloMo\SuperSloMo\model.py", line 197, in forward
    x = F.leaky_relu(self.conv1(x), negative_slope = 0.1)
  File "C:\Users\Amos\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Amos\Anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Desktop (please complete the following information):

  • OS: Windows
  • Device Used GPU
  • Setup Info: PyTorch 1.1, CUDA 9.0, Python 3.7

Many visual artefacts

Hello, I've tried this interesting script on my Mac. It seems to generate a slow-mo video without any issues (slowly, but no errors); however, the end result is full of artefacts like so:

[GIF of the output (ezgif.com video-to-gif) showing the artefacts]

I've tried with several videos and always got these blinking colors. Any ideas what I should try next? Thanks!

I can't save the slomo video

Hi, thanks for your sharing.
When I test video_to_slomo.py, I can't save the slomo video. Is there some limit on the size of the input video? My input video is 1280*720. The command line shows:
[swscaler @ 0x3b3a5e0] deprecated pixel format used, make sure you did set range correctly
Past duration 0.999992 too large
Last message repeated 1 times
Input stream #0:0 frame changed from size:1280x704 fmt:yuvj420p to size:1280x720 fmt:yuvj420p
Input stream #0:0 frame changed from size:1280x720 fmt:yuvj420p to size:1280x704 fmt:yuvj420p
[swscaler @ 0x39ded20] deprecated pixel format used, make sure you did set range correctly
Input stream #0:0 frame changed from size:1280x704 fmt:yuvj420p to size:1280x720 fmt:yuvj420p
Input stream #0:0 frame changed from size:1280x720 fmt:yuvj420p to size:1280x704 fmt:yuvj420p
[swscaler @ 0x12ea680] deprecated pixel format used, make sure you did set range correctly

Use conda to install torchvision(-cpu)

Hi,
In the readme, you say to use "pip install torchvision", but actually everything can be done with conda:

For CPU:

conda install pytorch-cpu=0.4.1 torchvision-cpu -c pytorch

For GPU:

conda install pytorch=0.4.1 torchvision -c pytorch

Nice work!

How to run train.ipynb?

Hi, Avinash Paliwal
I tried to open train.ipynb with Jupyter Notebook, but it reports: "Unreadable Notebook: F:\Super-SloMo-master\train.ipynb NotJSONError('Notebook does not appear to be JSON: '{\n "nbformat": 4,\n "nbformat_minor"...',)"
Is this a bug, or is there another way to run train.ipynb?

Clarification of results provided

Hey, I just wanted a few clarifications regarding the results provided in the Readme:

  1. What do the superslomo_adobe240 results mean? Are these the test results on the adobe240 dataset, or is the network trained on just the adobe240 dataset?

  2. Any idea why the results of superslomo are so much lower here compared to the paper (30 vs 33.14), even though they come from the evaluation script provided by the authors?

Bad performance

First of all, Thanks for your great work!

When I run video_to_slomo.py to slow down my custom video, the result is not good enough!
I cut out a little fragment, as follows:
[GIF of the output fragment]

Freezing model possibility

Hi,
Thanks first for the nice work you made!

feature request:
I was planning to try the result of your work in TensorFlow.js.
For this I need to convert the model from TensorFlow to TensorFlow.js, which means that I need to freeze the model first.
Apparently, from what I found out, freezing the model requires at least the following data:
--input_graph_def filename.pbtxt
--input_checkpoint filename.ckpt
--output_node_names
--output_graph filename.pb

see:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py

At the moment, only --input_checkpoint filename.ckpt is available.

Describe the solution you'd like
Is there a way you could upload the filename.pbtxt and output_node_names?

What do you think?
Thanks again!
Arno

Multi-GPU Support

Is your feature request related to a problem? Please describe.
I'm training on a very large dataset and I wish I could use some old GPUs to speed up the process, even a little bit.

Describe the solution you'd like
Native multi-GPU support

Describe alternatives you've considered
I tried to implement it myself, but I'm totally lost, as I have zero experience with deep learning and PyTorch (a common starting point is sketched below).

Additional context
N/A
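
For reference, the usual first step in PyTorch is nn.DataParallel, which splits each batch across all visible GPUs; a minimal sketch against the two networks named in train.py (untested against this codebase, and the channel counts are assumptions):

import torch.nn as nn
import model  # the repo's model.py

flowComp = model.UNet(6, 4)
ArbTimeFlowIntrp = model.UNet(20, 5)

# forward() now scatters each batch across the visible GPUs and gathers the
# outputs; note that checkpoints saved this way gain a "module." key prefix.
flowComp = nn.DataParallel(flowComp).cuda()
ArbTimeFlowIntrp = nn.DataParallel(ArbTimeFlowIntrp).cuda()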

bug: VGG16 without pretrained=True option?

In train.py line 111, vgg16 = torchvision.models.vgg16(), there may be a missing option: pretrained=True.
By default, pretrained=False.
Thus, the perceptual loss is likely computed against random VGG features and does not work.
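
A sketch of the suggested fix, with the feature extractor frozen (the conv4_3 slicing mirrors how such perceptual losses are usually built and is an assumption about train.py):

import torch.nn as nn
import torchvision

# Load ImageNet weights so the perceptual loss compares meaningful features,
# then freeze them; only the interpolation networks should be trained.
vgg16 = torchvision.models.vgg16(pretrained=True)
vgg16_conv_4_3 = nn.Sequential(*list(vgg16.children())[0][:22])
for param in vgg16_conv_4_3.parameters():
    param.requires_grad = False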

suggest to add tqdm()

in video_to_slomo.py

from tqdm import tqdm
for _, (frame0, frame1) in enumerate(tqdm(videoFramesloader), 0):

works well.

About the output result

Thanks for your work.
I tried it with my own video:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/home/chenchen/PycharmProjects/Super-SloMo/0103_1.mov':
Metadata:
major_brand : qt
minor_version : 0
compatible_brands: qt
creation_time : 2019-01-03 03:05:52
Duration: 00:00:40.33, start: 0.000000, bitrate: 5425 kb/s
Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 90 kb/s (default)
Metadata:
creation_time : 2019-01-03 03:05:52
handler_name : Core Media Data Handler
Stream #0:1(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709), 960x540, 5328 kb/s, 29.98 fps, 29.97 tbr, 600 tbn, 1200 tbc (default)
Metadata:
rotate : 90
creation_time : 2019-01-03 03:05:52
handler_name : Core Media Data Handler
encoder : H.264
Side data:
displaymatrix: rotation of -90.00 degrees
Stream #0:2(und): Data: none (mebx / 0x7862656D), 0 kb/s (default)
Metadata:
creation_time : 2019-01-03 03:05:52
handler_name : Core Media Data Handler
Stream #0:3(und): Data: none (mebx / 0x7862656D), 0 kb/s (default)
Metadata:
creation_time : 2019-01-03 03:05:52
handler_name : Core Media Data Handler

and set fps = 240, sf = 8.
But the output video has only one frame (it looks like the first frame of the original video). I wonder if there is anything wrong with my configuration?

Is there anything wrong with the speed?

To Reproduce
Steps to reproduce the behavior:
Just run video_to_slomo.py.
My ffmpeg is built from source, and my PyTorch is 1.0 with GPU support.

Expected behavior
There is something like "[04:25<3:00:25, 37.33s/it]"
and it runs really slowly.

Additional context
I don't think the version of PyTorch caused the speed problem.
How fast does this model run for you?

A question about the code

Hi, sorry for disturbing you.
I just can't understand why the backWarp.forward() function in model.py is written as
x = self.gridX.unsqueeze(0).expand_as(u).float() + u
y = self.gridY.unsqueeze(0).expand_as(v).float() + v

but I think it should be:
x = self.gridX.unsqueeze(0).expand_as(u).float() - u
y = self.gridY.unsqueeze(0).expand_as(v).float() - v

With the optical flow from I0 to I1 (F_0_1) and frame I1, I think there should be a '-' rather than a '+'.
I'm just a beginner in ML, sorry for disturbing.
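
For reference, a self-contained sketch of backward (gather) warping as it is commonly written, which is where the '+' comes from: for every output pixel x, the flow says where that pixel's content sits in the source frame, so the source is sampled at x + flow(x). The sign depends on which frame the flow is anchored to, not on the warping direction (this is a sketch, not the repo's exact code):

import torch
import torch.nn.functional as F

def backwarp(img, flow):
    # Sample `img` at (grid + flow): output pixel (x, y) is read from the
    # source frame at (x + u, y + v).
    B, _, H, W = img.shape
    gridY, gridX = torch.meshgrid(torch.arange(H), torch.arange(W))
    x = gridX.unsqueeze(0).float() + flow[:, 0]
    y = gridY.unsqueeze(0).float() + flow[:, 1]
    # grid_sample expects coordinates normalized to [-1, 1]
    x = 2.0 * x / (W - 1) - 1.0
    y = 2.0 * y / (H - 1) - 1.0
    return F.grid_sample(img, torch.stack((x, y), dim=3), align_corners=True)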

I'm just curious how to set weights for losses

As the authors reveal in their paper:

The weights have been set empirically using a validation set as λr = 0.8, λp = 0.005, λw = 0.4, and λs = 1.

I'm just curious to know how you set the weights for the losses here; is there any trick?

loss = 204 * recnLoss + 102 * warpLoss + 0.005 * prcpLoss + loss_smooth
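
One consistent reading (an assumption, not something the author has confirmed here): these are the paper's weights rescaled from a [0, 1] pixel range to [0, 255], since 0.8 × 255 = 204 (λr) and 0.4 × 255 = 102 (λw), while λp = 0.005 is applied as-is and λs = 1 leaves loss_smooth unscaled.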

Why not use OpenCV instead of ffmpeg?

I want to ask why ffmpeg is being used instead of OpenCV's VideoCapture/VideoWriter, etc. If there is no specific reason, I can create a PR implementing OpenCV, which would remove the dependency on the ffmpeg executable.
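
For illustration, the OpenCV equivalents of the decode/encode steps would look roughly like this (a sketch of the proposal, not the actual PR):

import cv2

# Decode frames without an external ffmpeg binary...
cap = cv2.VideoCapture('input.mp4')
fps = cap.get(cv2.CAP_PROP_FPS)
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame)
cap.release()

# ...and encode the (interpolated) frames back into a video.
h, w = frames[0].shape[:2]
out = cv2.VideoWriter('output.mp4', cv2.VideoWriter_fourcc(*'mp4v'), fps * 4, (w, h))
for frame in frames:  # interpolated frames would be written here as well
    out.write(frame)
out.release()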

Where should skip connections go?

I see that your skip connections go into the second layer of each "up" block. According to the paper, it seems they should go into the first one. Not sure how important this is, but what's your take on this?

RuntimeError: CUDA error: out of memory

python video_to_slomo.py --ffmpeg D:\program_data\python\Super-SloMo\path\to\ffmpeg\bin --video D:\program_data\python\Super-SloMo\path\to\123.mp4 --sf 3 --checkpoint D:\program_data\python\Super-SloMo\path\to\checkpoint.ckpt --fps 72 --batch_size 1 --output D:\program_data\python\Super-SloMo\path\to\output.mp4

D:\program_data\python\Super-SloMo\path\to\ffmpeg\bin\ffmpeg -i D:\program_data\python\Super-SloMo\path\to\123.mp4 -vsync 0 -qscale:v 2 tmpSuperSloMo\input/%06d.jpg
ffmpeg version N-91520-gbce4da85e8 Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 7.3.1 (GCC) 20180722
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-bzlib --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth
  libavutil      56. 18.102 / 56. 18.102
  libavcodec     58. 21.106 / 58. 21.106
  libavformat    58. 17.101 / 58. 17.101
  libavdevice    58.  4.101 / 58.  4.101
  libavfilter     7. 26.100 /  7. 26.100
  libswscale      5.  2.100 /  5.  2.100
  libswresample   3.  2.100 /  3.  2.100
  libpostproc    55.  2.100 / 55.  2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'D:\program_data\python\Super-SloMo\path\to\123.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42mp41isomM4A
    creation_time   : 2019-01-04T08:31:32.000000Z
    iTunSMPB        :  00000000 00000A40 000003AC 000000000003BA14 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    encoder         : Nero AAC codec / 1.5.4.0
  Duration: 00:00:05.16, start: 0.000000, bitrate: 3156 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv), 1920x1080, 3061 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default)
    Metadata:
      creation_time   : 2019-01-04T08:31:32.000000Z
      handler_name    : L-SMASH Video Media Handler
      encoder         : AVC Coding
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 132 kb/s (default)
    Metadata:
      creation_time   : 2019-01-04T08:31:32.000000Z
      handler_name    : Sound Media Handler
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> mjpeg (native))
Press [q] to stop, [?] for help
[swscaler @ 00000194fd361f80] deprecated pixel format used, make sure you did set range correctly
Output #0, image2, to 'tmpSuperSloMo\input/%06d.jpg':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42mp41isomM4A
    iTunSMPB        :  00000000 00000A40 000003AC 000000000003BA14 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    encoder         : Lavf58.17.101
    Stream #0:0(und): Video: mjpeg, yuvj420p(pc), 1920x1080, q=2-31, 200 kb/s, 23.98 fps, 23.98 tbn, 23.98 tbc (default)
    Metadata:
      creation_time   : 2019-01-04T08:31:32.000000Z
      handler_name    : L-SMASH Video Media Handler
      encoder         : Lavc58.21.106 mjpeg
    Side data:
      cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1
frame=  122 fps=0.0 q=2.0 Lsize=N/A time=00:00:05.08 bitrate=N/A speed=6.92x
video:28266kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Traceback (most recent call last):
  File "video_to_slomo.py", line 209, in <module>
    main()
  File "video_to_slomo.py", line 158, in main
    flowOut = flowComp(torch.cat((I0, I1), dim=1))
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\program_data\python\Super-SloMo\model.py", line 199, in forward
    s2 = self.down1(s1)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\program_data\python\Super-SloMo\model.py", line 71, in forward
    x = F.leaky_relu(self.conv2(x), negative_slope = 0.1)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\torch\nn\modules\conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: out of memory

My GPU is a GTX 1060.

No such file or directory: 'adobe240fps/test_list.txt'

(base) E:\Super-SloMo>python data\create_dataset.py --ffmpeg_dir D:\ffmpeg\bin --videos_folder data\adobe240fps --dataset_folder dataser --dataset adobe240fps
Traceback (most recent call last):
  File "data\create_dataset.py", line 139, in <module>
    main()
  File "data\create_dataset.py", line 96, in main
    f = open("adobe240fps/test_list.txt", "r")
FileNotFoundError: [Errno 2] No such file or directory: 'adobe240fps/test_list.txt'

The pre-downloaded Adobe 240fps dataset has been decompressed to E:\Super-SloMo\data\adobe240fps.

This message is always printed when I run the command. The question is: what does adobe240fps/test_list.txt refer to, exactly?

Any information about the inference speed of this algorithm

Hi,
Thanks for sharing this awesome project!

My friends have evaluated SepConv; the result is good, but inference is slow even on a PC. I don't have a GPU available right now (it's shared with someone else), so I'm not able to compare the two algorithms. Is there any information about the inference speed of this algorithm?
Or is there any chance that Super-SloMo could run on an ARM64 phone?

Thanks

scene change

When a scene change occurs in a video, the inserted frames are a mess.
How can this situation be handled? @avinashpaliwal
Many thanks!

How to reduce the memory use?

When I try to convert a short 1920*1080, 5-second mp4 video, I get a CUDA runtime error. If I run it on the CPU, there is still not enough memory.

Convert .pytorch model to .ckpt?

I was wondering if there is a .ckpt version available of the "SepConv - L_F" model, or perhaps if there is a way to convert the .pytorch model to a .ckpt one?

valPSNR increases very slowly

Nice work, thanks!

I have a question about convergence speed

I've run your training code and have almost passed 50 epochs. The valPSNR is still under 15.5.

Is that normal?

Also, the created clips seem not to be sequential; is that OK?

Audio in the output file?

Specifically for the use case where you increase the frame rate and don't slow the video down, I was wondering if an argument could be added to preserve the audio in the resulting video?

I'm actually finding this script does some really convincing work at increasing framerate and keeping things in real-time, but it strips the audio.

CUDA out of memory

After frequent use I got a "CUDA out of memory" error. I tried changing the batch size, but it didn't help; the script could use a GPU cache cleaner.
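
For reference, PyTorch does expose such a cleaner, though it only returns cached, unused blocks to the driver; it cannot free tensors that are still referenced:

import torch

# Release cached GPU memory between runs; referenced tensors are unaffected.
torch.cuda.empty_cache()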

AttributeError: module 'torch.nn.functional' has no attribute 'interpolate'

An error occurred while running video_to_slomo.py (PyTorch 0.4.0, Python 3.6).

python video_to_slomo.py --video /path/to/video36.mp4 --sf 5 --checkpoint ../SuperSloMo.ckpt --fps 60 --output ./results/output.mp4

The error log:
ffmpeg -i path/to/video36.mp4 -vsync 0 -qscale:v 2 .tmpSuperSloMo/input/%06d.jpg
ffmpeg version 4.0.3-static https://johnvansickle.com/ffmpeg/ Copyright (c) 2000-2018 the FFmpeg developers
built with gcc 6.3.0 (Debian 6.3.0-18+deb9u1) 20170516
configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc-6 --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gray --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg
libavutil 56. 14.100 / 56. 14.100
libavcodec 58. 18.100 / 58. 18.100
libavformat 58. 12.100 / 58. 12.100
libavdevice 58. 3.100 / 58. 3.100
libavfilter 7. 16.100 / 7. 16.100
libswscale 5. 1.100 / 5. 1.100
libswresample 3. 1.100 / 3. 1.100
libpostproc 55. 1.100 / 55. 1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/media/ext/gaoliqing/video-caption/train-video/video36.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf56.13.100
Duration: 00:00:11.04, start: 0.000000, bitrate: 496 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 320x240, 359 kb/s, 29.97 fps, 29.97 tbr, 30k tbn, 59.94 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : SoundHandler
Stream mapping:
Stream #0:0 -> #0:0 (h264 (native) -> mjpeg (native))
Press [q] to stop, [?] for help
[swscaler @ 0x649ec80] deprecated pixel format used, make sure you did set range correctly
Output #0, image2, to '.tmpSuperSloMo/input/%06d.jpg':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.12.100
Stream #0:0(und): Video: mjpeg, yuvj420p(pc), 320x240, q=2-31, 200 kb/s, 29.97 fps, 29.97 tbn, 29.97 tbc (default)
Metadata:
handler_name : VideoHandler
encoder : Lavc58.18.100 mjpeg
Side data:
cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1
frame= 330 fps=0.0 q=2.0 Lsize=N/A time=00:00:11.01 bitrate=N/A speed=82.4x
video:4311kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
0%| | 0/329 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "video_to_slomo.py", line 216, in <module>
    main()
  File "video_to_slomo.py", line 165, in main
    flowOut = flowComp(torch.cat((I0, I1), dim=1))
  File "/home/xxx/anaconda3/envs/pytorch0.4.0_py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xxx/code/Super-SloMo-master/model.py", line 204, in forward
    x = self.up1(x, s5)
  File "/home/xxx/anaconda3/envs/pytorch0.4.0_py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xxxx/code/Super-SloMo-master/model.py", line 130, in forward
    x = F.interpolate(x, scale_factor=2, mode='bilinear')
AttributeError: module 'torch.nn.functional' has no attribute 'interpolate'

Looking forward to your reply, thanks!
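
For context, F.interpolate was only added in PyTorch 0.4.1; on 0.4.0 the equivalent (since-deprecated) call is F.upsample. Upgrading PyTorch should fix this, or a fallback along these lines (a sketch, not the repo's code):

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
# Use F.interpolate when available (0.4.1+); fall back to F.upsample on 0.4.0.
upsample = getattr(F, 'interpolate', F.upsample)
x = upsample(x, scale_factor=2, mode='bilinear')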

About preparing training data

Thank you for sharing your nice work!

I used create_dataset.py to build the training dataset as you described, but I find that the 12 frames in each clip are not continuous.

Maybe L.69 in create_dataset.py should be modified:
images = os.listdir(os.path.join(root, file)) -> images = sorted(os.listdir(os.path.join(root, file)))

Or did I misunderstand, and the frames in each clip are not supposed to be continuous?
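
For reference, os.listdir makes no ordering guarantee, so the frames of a clip can come back shuffled; the sorted() wrapper above is the standard fix (root and file below are hypothetical example names):

import os

root, file = 'dataset/train', 'clip_0001'
# Frames are named 0001.jpg ... 0012.jpg, so a lexicographic sort restores
# their temporal order.
images = sorted(os.listdir(os.path.join(root, file)))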

CUDA out of memory when training a model to .ckpt?

So I'm trying to train my own custom model, but I am getting an out-of-memory error:

(base) C:\Users\deama\Desktop\Super-SloMo-master>python train.py --dataset_root dataset --checkpoint_dir checkpoints
Dataset SuperSloMo
    Number of datapoints: 558
    Root Location: dataset/train
    Transforms (if any): Compose(
                             ToTensor()
                             Normalize(mean=[0.429, 0.431, 0.397], std=[1, 1, 1])
                         )
 Dataset SuperSloMo
    Number of datapoints: 5
    Root Location: dataset/validation
    Transforms (if any): Compose(
                             ToTensor()
                             Normalize(mean=[0.429, 0.431, 0.397], std=[1, 1, 1])
                         )

Epoch:  0
C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\functional.py:2423: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
Traceback (most recent call last):
  File "train.py", line 253, in <module>
    intrpOut = ArbTimeFlowIntrp(torch.cat((I0, I1, F_0_1, F_1_0, F_t_1, F_t_0, g_I1_F_t_1, g_I0_F_t_0), dim=1))
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\deama\Desktop\Super-SloMo-master\model.py", line 207, in forward
    x  = self.up4(x, s2)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\deama\Desktop\Super-SloMo-master\model.py", line 134, in forward
    x = F.leaky_relu(self.conv2(torch.cat((x, skpCn), 1)), negative_slope = 0.1)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 320, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 45.38 MiB (GPU 0; 4.00 GiB total capacity; 3.46 GiB already allocated; 118.97 MiB free; 47.47 MiB cached)

Is the only fix to get a GPU with more VRAM or to use lower-resolution training data? I used the provided adobe240fps dataset.

Facing an issue with TensorBoard

I am facing an issue displaying data in TensorBoard. I can connect to TensorBoard, but it does not show any data.

Using --inspect, the output I get is:

(tensorflowJatin) jatin@pragati:~/Super-SloMo$ tensorboard --inspect --logdir log --port 6007
I0122 16:47:18.144290 MainThread program.py:165] Not bringing up TensorBoard, but inspecting event files.
I0122 16:47:18.144289 140234084505344 program.py:165] Not bringing up TensorBoard, but inspecting event files.
======================================================================
Processing event files... (this can take a few minutes)
======================================================================

Found event files in:
log

These tags are in log:
audio -
histograms -
images -
scalars -
tensor -
======================================================================

Event statistics for log:
audio -
graph -
histograms -
images -
scalars -
sessionlog:checkpoint -
sessionlog:start -
sessionlog:stop -
tensor -
======================================================================

Thanks for your help

About the warpLoss computation.

Why are g_I0_F_t_0 and g_I1_F_t_1 used for calculating warpLoss instead of g_I0_F_t_0_f and g_I1_F_t_1_f? Are there any theoretical reasons for that? I know it's consistent with the paper, but the paper doesn't explain it in much detail either. Looking forward to your opinion.
