As is well known, RIFE doesn't handle repeating patterns well.
I'm using high scale values, even in UHD, to fight the distorted/disordered patterns that otherwise show up in some interpolated frames.
But I can't get the TRT optimization pass to run successfully with scale=4 and ensemble=True in UHD.
Will succeed:
RIFE(clip, model='4.4', scale=4, num_streams=1, sc=True, sc_threshold=0.12, ensemble=False, trt=True)
Will fail:
RIFE(clip, model='4.4', scale=4, num_streams=1, sc=True, sc_threshold=0.12, ensemble=True, trt=True)
My only guess is that the GPU is running out of memory.
On my old Nvidia GTX 1070 (8 GB), TRT optimization would already fail in UHD at scale=1.
Now, with my Nvidia RTX 4090 (24 GB), TRT optimization fails with scale=4 and ensemble=True (for all RIFE models).
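To get a feel for the numbers, here is a rough back-of-envelope estimate (my own assumption, not a model of vsrife's actual allocations): a single UHD frame in RGBH (fp16, 3 channels) is already about 47 MiB, and with scale=4 the flow is computed at 4x the width and 4x the height, so frame-sized intermediates grow by roughly 16x before ensemble roughly doubles the number of forward passes.

```python
# Ballpark memory estimate for UHD RGBH frames.
# Assumption: fp16 (2 bytes), 3 channels; this is illustrative arithmetic only,
# not an accounting of vsrife's or TensorRT's real intermediate tensors.
W, H, C, BYTES_FP16 = 3840, 2160, 3, 2

frame_bytes = W * H * C * BYTES_FP16
print(f"one UHD RGBH frame: {frame_bytes / 2**20:.1f} MiB")  # ~47.5 MiB

# scale=4 -> ~4x width and ~4x height -> ~16x the pixels,
# and ensemble=True roughly doubles the forward passes.
scaled_bytes = frame_bytes * 4 * 4 * 2
print(f"scale=4 + ensemble, per frame-sized buffer: {scaled_bytes / 2**30:.2f} GiB")
```

The real peak usage during engine building is much higher, since TensorRT keeps activations, tactic workspaces, and the engine itself resident at the same time.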
The script...
import os, sys
import vapoursynth as vs
core = vs.core
sys.path.append(r"P:\_Apps\StaxRip\StaxRip 2.29.0-x64 (VapourSynth)\Apps\Plugins\VS\Scripts")
core.std.LoadPlugin(r"P:\_Apps\StaxRip\StaxRip 2.29.0-x64 (VapourSynth)\Apps\Plugins\Dual\L-SMASH-Works\LSMASHSource.dll", altsearchpath=True)
clip = core.lsmas.LibavSMASHSource(r"T:\Test\tst.mp4")
os.environ["CUDA_MODULE_LOADING"] = "LAZY"  # set before CUDA is initialized, or it has no effect
import torch
from vsrife import RIFE
clip = core.resize.Bicubic(clip, format=vs.RGBH, matrix_in_s="709")
# RIFE calls with different workspace sizes that didn't help
#clip = RIFE(clip, model='4.4', scale=4, num_streams=1, sc=True, sc_threshold=0.12, ensemble=True, trt=True, trt_max_workspace_size=536870912)
#clip = RIFE(clip, model='4.4', scale=4, num_streams=1, sc=True, sc_threshold=0.12, ensemble=True, trt=True, trt_max_workspace_size=4294967296)
clip = RIFE(clip, model='4.4', scale=4, num_streams=1, sc=True, sc_threshold=0.12, ensemble=True, trt=True)
clip = core.resize.Bicubic(clip, format=vs.YUV420P8, matrix_s="709")
clip.set_output()
...will produce lots of these warnings while processing...
...
[10/01/2023-11:53:31] [TRT] [E] 2: [virtualMemoryBuffer.cpp::nvinfer1::StdVirtualMemoryBufferImpl::resizePhysical::140] Error Code 2: OutOfMemory (no further information)
[10/01/2023-11:53:31] [TRT] [E] 2: [virtualMemoryBuffer.cpp::nvinfer1::StdVirtualMemoryBufferImpl::resizePhysical::140] Error Code 2: OutOfMemory (no further information)
[10/01/2023-11:53:31] [TRT] [W] Requested amount of GPU memory (8589934592 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
...
...and eventually end with the error output:
Python exception:
Traceback (most recent call last):
File "src\cython\vapoursynth.pyx", line 2866, in vapoursynth._vpy_evaluate
File "src\cython\vapoursynth.pyx", line 2867, in vapoursynth._vpy_evaluate
File "T:\Test\tst.vpy", line 12, in <module>
clip = RIFE(clip, model='4.4', scale=4, num_streams=1, sc=True, sc_threshold=0.12, ensemble=True, trt=True)
File "D:\Python\Python310\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Python\Python310\lib\site-packages\vsrife\__init__.py", line 219, in RIFE
flownet = lowerer(
File "D:\Python\Python310\lib\site-packages\torch_tensorrt\fx\lower.py", line 316, in __call__
return do_lower(module, inputs)
File "D:\Python\Python310\lib\site-packages\torch_tensorrt\fx\passes\pass_utils.py", line 118, in pass_with_validation
processed_module = pass_(module, input, *args, **kwargs)
File "D:\Python\Python310\lib\site-packages\torch_tensorrt\fx\lower.py", line 313, in do_lower
lower_result = pm(module)
File "D:\Python\Python310\lib\site-packages\torch\fx\passes\pass_manager.py", line 238, in __call__
out = _pass(out)
File "D:\Python\Python310\lib\site-packages\torch\fx\passes\pass_manager.py", line 238, in __call__
out = _pass(out)
File "D:\Python\Python310\lib\site-packages\torch_tensorrt\fx\passes\lower_pass_manager_builder.py", line 202, in lower_func
lowered_module = self._lower_func(
File "D:\Python\Python310\lib\site-packages\torch_tensorrt\fx\lower.py", line 178, in lower_pass
interp_res: TRTInterpreterResult = interpreter(mod, input, module_name)
File "D:\Python\Python310\lib\site-packages\torch_tensorrt\fx\lower.py", line 130, in __call__
interp_result: TRTInterpreterResult = interpreter.run(
File "D:\Python\Python310\lib\site-packages\torch_tensorrt\fx\fx2trt.py", line 252, in run
assert engine
AssertionError
I also tried 0.25x/0.5x/2x/4x the default workspace size, but it didn't help.
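For reference, converting the byte counts above to GiB (my arithmetic, not from the logs): the two trt_max_workspace_size values from the commented-out calls are 0.5 GiB and 4 GiB, while the allocation the TRT log complains about is exactly 8 GiB.

```python
# Convert the raw byte counts from the script and the TRT log into GiB.
GIB = 1 << 30  # 1 GiB in bytes

sizes = {
    "trt_max_workspace_size (smaller try)": 536870912,
    "trt_max_workspace_size (larger try)": 4294967296,
    "failed allocation from the TRT log": 8589934592,
}
for label, n in sizes.items():
    print(f"{label}: {n / GIB:g} GiB")
# 0.5 GiB, 4 GiB, and 8 GiB respectively
```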