
opendvc's Introduction

Our latest works on learned video compression:

  • Hierarchical Learned Video Compression (HLVC) (CVPR 2020) [Paper] [Codes]

  • Recurrent Learned Video Compression (RLVC) (IEEE J-STSP 2021) [Paper] [Codes]

  • Perceptual Learned Video Compression (PLVC) (IJCAI 2022) [Paper] [Codes]

  • Advanced Learned Video Compression (ALVC) (IEEE T-CSVT 2022) [Paper] [Codes]

OpenDVC -- An open source implementation of the DVC Video Compression Method

An open source Tensorflow implementation of the paper:

Lu, Guo, et al. "DVC: An end-to-end deep video compression framework." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2019.

The original DVC method is only optimized for PSNR. In our OpenDVC codes, we provide the PSNR-optimized re-implementation, denoted as OpenDVC (PSNR), and also the MS-SSIM-optimized model, denoted as OpenDVC (MS-SSIM).

If our open source codes are helpful for your research, especially if you compare with the MS-SSIM model of OpenDVC in your paper, please cite our technical report:

@article{yang2020opendvc,
  title={Open{DVC}: An Open Source Implementation of the {DVC} Video Compression Method},
  author={Yang, Ren and Van Gool, Luc and Timofte, Radu},
  journal={arXiv preprint arXiv:2006.15862},
  year={2020}
}

If you have any questions or find any bugs, please feel free to contact:

Ren Yang @ ETH Zurich, Switzerland

Email: [email protected]

Dependency

  • Tensorflow 1.12

  • Tensorflow-compression 1.0 (Download link)

    (After downloading, put the folder "tensorflow_compression" in the same directory as the codes.)

  • Pre-trained models (Download link)

    (Download the folder "OpenDVC_model" into the same directory as the codes.)

  • BPG (Download link) -- needed only for the PSNR models

    (In our PSNR model, we use BPG to compress I-frames instead of training learned image compression models.)

  • Context-adaptive image compression model, Lee et al., ICLR 2019 (Paper, Model) -- needed only for the MS-SSIM models

    (In our MS-SSIM model, we use Lee et al., ICLR 2019 to compress I-frames.)

Test codes

Preparation

Following Lu et al., DVC, we feed RGB images into the deep encoder. To compress a YUV video, please first convert it to PNG images with the following command.

ffmpeg -pix_fmt yuv420p -s WidthxHeight -i Name.yuv -vframes Frame path_to_PNG/f%03d.png

Note that OpenDVC currently only supports frames whose height and width are multiples of 16 (the original DVC method requires multiples of 64). Therefore, when using OpenDVC, please first crop the frames, e.g.,

ffmpeg -pix_fmt yuv420p -s 1920x1080 -i Name.yuv -vframes Frame -filter:v "crop=1920:1072:0:0" path_to_PNG/f%03d.png
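If the resolution is not known in advance, ffmpeg's crop filter also accepts expressions, so the crop size does not have to be hard-coded. A hedged variant (assuming an ffmpeg build that supports crop expressions) which trims any input to the nearest multiples of 16 would be:

ffmpeg -pix_fmt yuv420p -s WidthxHeight -i Name.yuv -vframes Frame -filter:v "crop=trunc(iw/16)*16:trunc(ih/16)*16:0:0" path_to_PNG/f%03d.png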

We have uploaded a prepared sequence, BasketballPass, here as a test demo; it contains the PNG files of the first 100 frames.

Encoder for video

The arguments of the OpenDVC encoder (OpenDVC_test_video.py) include:

--path, the path to PNG files;

--frame, the total frame number to compress;

--GOP, the GOP size, e.g., 10;

--mode, compress with the PSNR or MS-SSIM optimized model;

--metric, evaluate quality in terms of PSNR or MS-SSIM;

--python_path, the path to python (only used for the MS-SSIM models to run Lee et al., ICLR 2019 on I-frames);

--CA_model_path, the path to CA_EntropyModel_Test of Lee et al., ICLR 2019 (only used for the MS-SSIM models);

--l, lambda value. The pre-trained PSNR models are trained with four lambda values, i.e., 256, 512, 1024 and 2048, with increasing bit-rate/PSNR. The MS-SSIM models are trained with lambda values of 8, 16, 32 and 64, with increasing bit-rate/MS-SSIM;

--N, filter number in CNN (Do not change);

--M, channel number of latent representations (Do not change).

For example:

python OpenDVC_test_video.py --path BasketballPass --mode PSNR  --metric PSNR --l 1024
python OpenDVC_test_video.py --path BasketballPass --mode MS-SSIM  --metric MS-SSIM --python_path python --CA_model_path ./CA_EntropyModel_Test --l 32

The OpenDVC encoder generates the encoded bit-stream and compressed frames in two folders.

path = args.path + '/' # path to PNG
path_com = args.path + '_com_' + args.mode  + '_' + str(args.l) + '/' # path to compressed frames
path_bin = args.path + '_bin_' + args.mode  + '_' + str(args.l) + '/' # path to encoded bit-streams
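To sanity-check the quality reported via --metric, the PSNR between a raw frame and its compressed counterpart can be recomputed independently. A minimal sketch using NumPy and Pillow (this helper is not part of the OpenDVC codes; the file paths below follow the folder naming above):

import numpy as np
from PIL import Image

def psnr(raw_png, com_png):
    # Load both frames as float arrays in [0, 255] and compare them.
    x = np.asarray(Image.open(raw_png), dtype=np.float64)
    y = np.asarray(Image.open(com_png), dtype=np.float64)
    mse = np.mean((x - y) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)

print(psnr('BasketballPass/f002.png', 'BasketballPass_com_PSNR_1024/f002.png'))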

Decoder for video

The corresponding video decoder is OpenDVC_test_video_decoder.py, with the following arguments:

--path_bin, the path to bitstreams;

--path_com, the path to save the decoded frames;

--frame, the total frame number to decode;

--GOP, the GOP size, e.g., 10;

--Height, the height of frames;

--Width, the width of frames; 

  (In a practical scenario, the GOP size and resolution can be written into the file header during encoding; a minimal sketch of such a header is given after the example command below.)

--mode, compress with the PSNR or MS-SSIM optimized model;

--python_path, the path to python (only used for the MS-SSIM models to run Lee et al., ICLR 2019 on I-frames);

--CA_model_path, the path to CA_EntropyModel_Test of Lee et al., ICLR 2019 (only used for the MS-SSIM models);

--l, lambda value;

--N, filter number in CNN (Do not change);

--M, channel number of latent representations (Do not change).

For example:

python OpenDVC_test_video_decoder.py --path_bin BasketballPass_bin_PSNR_1024 --path_com BasketballPass_dec_PSNR_1024 --mode PSNR --l 1024 --Height 240 --Width 416 --GOP 10 --frame 100
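As noted in the argument list above, a practical codec would store the GOP size and resolution in a small file header instead of passing them on the command line. The following is a purely illustrative sketch of such a header (this format is not used by the OpenDVC bitstreams):

import struct

HEADER_FMT = '<HHH'  # Width, Height and GOP size as three unsigned 16-bit integers

def write_header(f, width, height, gop):
    # Prepend the sequence-level information to the bitstream file.
    f.write(struct.pack(HEADER_FMT, width, height, gop))

def read_header(f):
    # Read the same information back at the decoder side.
    width, height, gop = struct.unpack(HEADER_FMT, f.read(struct.calcsize(HEADER_FMT)))
    return width, height, gop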

Encoder for one frame

We also provide the encoder for compressing one frame (OpenDVC_test_P-frame.py), which can be used more flexibly. The arguments are as follows:

--ref, the path to the reference frame, e.g., ./ref.png. In (Open)DVC, it should be the previous compressed frame;

--raw, the path to the current raw frame which is to be compressed, e.g., ./raw.png;

--com, the path to save the compressed frame;

--bin, the path to save the bitstream;

--mode, compress with the PSNR or MS-SSIM optimized model;

--metric, evaluate quality in terms of PSNR or MS-SSIM;

--l, lambda value;

--N, filter number in CNN (Do not change);

--M, channel number of latent representations (Do not change).

For example:

python OpenDVC_test_P-frame.py --ref BasketballPass_com/f001.png --raw BasketballPass/f002.png --com BasketballPass_com/f002.png --bin BasketballPass_bin/002.bin --mode PSNR  --metric PSNR --l 1024
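Since each P-frame uses the previous compressed frame as its reference, the single-frame encoder can be chained over a GOP with a simple loop. A minimal sketch (assuming the I-frame f001.png has already been compressed to BasketballPass_com/f001.png; error handling is omitted):

import os

# Compress frames 2..10 of one GOP, each referencing the previous compressed frame.
for k in range(2, 11):
    ref = 'BasketballPass_com/f%03d.png' % (k - 1)  # previous compressed frame
    raw = 'BasketballPass/f%03d.png' % k            # current raw frame
    com = 'BasketballPass_com/f%03d.png' % k
    bin_path = 'BasketballPass_bin/%03d.bin' % k
    os.system('python OpenDVC_test_P-frame.py --ref %s --raw %s --com %s --bin %s'
              ' --mode PSNR --metric PSNR --l 1024' % (ref, raw, com, bin_path))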

Decoder for one frame

The corresponding decoder for one frame is OpenDVC_test_P-frame_decoder.py, whose arguments are the same as those of the encoder, excluding "--raw" and "--metric".

For example:

python OpenDVC_test_P-frame_decoder.py --ref BasketballPass_com/f001.png --bin BasketballPass_bin/002.bin --com BasketballPass_com/f002.png  --mode PSNR --l 1024

Training your own models

Preparation

  • Download the training data. We train the models on the Vimeo90k dataset (Download link) (82G). After downloading, please run the following code to generate "folder.npy", which contains the directories of all training samples.
import os
import fnmatch
import numpy as np

def find(pattern, path):
    # Walk the dataset directory and collect every folder that contains
    # a file matching the pattern (here, the first frame "im1.png" of each clip).
    result = []
    for root, dirs, files in os.walk(path):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                result.append(root)
    return result

folder = find('im1.png', 'path_to_vimeo90k/vimeo_septuplet/sequences/')
np.save('folder.npy', folder)
  • Compress I-frames. In OpenDVC (PSNR), we compress I-frames (im1.png) with BPG 444 at QP = 22, 27, 32 and 37 for the models with lambda = 2048, 1024, 512 and 256, respectively. In OpenDVC (MS-SSIM), we compress I-frames with Lee et al., ICLR 2019 at quality level = 2, 3, 5 and 7 for the models with lambda = 8, 16, 32 and 64. The Vimeo90k dataset has ~90k 7-frame clips, so "im1.png" in each clip needs to be compressed as the I-frame (a batch-processing sketch is given after this preparation list). For example:
bpgenc -f 444 -m 9 im1.png -o im1_QP27.bpg -q 27
bpgdec im1_QP27.bpg -o im1_bpg444_QP27.png        
python path_to_CA_model/encode.py --model_type 1 --input_path im1.png --compressed_file_path im1_level5.bin --quality_level 5
python path_to_CA_model/decode.py --compressed_file_path im1_level5.bin --recon_path im1_level5_ssim.png      
  • Download the pre-trained optical flow models. Put the folder "motion_flow" (Download link) in the same directory as the codes.
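Since im1.png in every clip has to be compressed, the BPG commands above can be run in a loop over the directories stored in folder.npy. A minimal sketch for the PSNR models (assuming bpgenc/bpgdec are on the PATH; the QP should be chosen to match the target lambda):

import os
import numpy as np

QP = 27  # e.g., QP = 27 corresponds to the lambda = 1024 model
folders = np.load('folder.npy')

for f in folders:
    # Compress and reconstruct the I-frame of each clip with BPG 444.
    src = os.path.join(f, 'im1.png')
    bpg = os.path.join(f, 'im1_QP%d.bpg' % QP)
    rec = os.path.join(f, 'im1_bpg444_QP%d.png' % QP)
    os.system('bpgenc -f 444 -m 9 %s -o %s -q %d' % (src, bpg, QP))
    os.system('bpgdec %s -o %s' % (bpg, rec))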

Training the PSNR models

Run OpenDVC_train_PSNR.py to train the PSNR models, e.g.,

python OpenDVC_train_PSNR.py --l 1024

Training the MS-SSIM models

We fine-tune the MS-SSIM models of lambda = 8, 16, 32 and 64 from the PSNR models of lambda = 256, 512, 1024 and 2048, respectively. For instance,

python OpenDVC_train_MS-SSIM.py --l 32

Performance

As shown in the figures in the repository, our OpenDVC (PSNR) model achieves PSNR performance comparable to the results reported in Lu et al., DVC (PSNR optimized), and our OpenDVC (MS-SSIM) model clearly outperforms DVC in terms of MS-SSIM.


opendvc's Issues

Unable to reproduce the reported bit-rate results compared to H.265

Hi,

First of all, thank you very much for taking the time to provide an open-source implementation of this paper. I have a question regarding deploying this method. I'm using the PSNR 1024 model to compress random videos (I have tried samples with different resolutions and frame rates). Then I compress the same video using H.265 with CRF=28. However, for all of them, the OpenDVC results are about twice the size of those achieved with H.265 (although it works for the provided BasketballPass sample video). Is there anything I'm missing?

about motion_flow

Hello, thanks for open-sourcing OpenDVC. I'm a beginner in end-to-end video compression; may I ask a simple question?
Are the pre-trained optical flow models trained by yourself, or do they use SpyNet's weights?

Something wrong in Readme.md

bpgenc -f 444 -m 9 im1.png -o im1_QP27.bpg -q 27
bpgdec im1_QP27.bpg -o im1_bpg444_QP27.bpg

should be

bpgenc -f 444 -m 9 im1.png -o im1_QP27.bpg -q 27
bpgdec im1_QP27.bpg -o im1_bpg444_QP27.png

python OpenDVC_test_video.py --path BasketballPass --model PSNR --metric PSNR --l 1024
python OpenDVC_test_video.py --path BasketballPass --model MS-SSIM --metric MS-SSIM --python python --CA_model_path ./CA_EntropyModel_Test --l 32

should be

python OpenDVC_test_video.py --path BasketballPass --mode PSNR --metric PSNR --l 1024
python OpenDVC_test_video.py --path BasketballPass --mode MS-SSIM --metric MS-SSIM --python python --CA_model_path ./CA_EntropyModel_Test --l 32

Optical flow values get too large during training

I changed the code to train the model with only the end-to-end loss and from scratch (without the pre-trained optical flow network), and the optical flow values get very large as training progresses. I wonder whether you have faced the same issue and how you solved it.

Of course, one workaround is to use multi-loss training. However, since the original DVC paper has done everything end-to-end, I think that's a problem with my code.

Thanks!

Amin

--
PS. I changed the learning rate, and the problem still arises, but in later iterations.

I frame coding

Hi Yang, thanks for sharing your source code. I have a question about the I-frame coding. From your code, I see that you use BPG for I-frame coding. Actually, the coding performance of BPG is much higher than that of the I-frame coding in x265 (very fast). The quality of the I-frame also has a large influence on the following P-frame coding. So I think the comparison between OpenDVC and x265 (shown in Fig. 3 of your technical report) may not be fair. If I have misunderstood anything, please tell me.

motion net weights

Can you share the weights of the motion net (model.ckpt-200000)?
Thank you very much!

InternalError: cudnn poolforward launch failed

I have installed tf-gpu 1.12, CUDA 9 and tf-compression 1.0, and I used the command below for testing:
python OpenDVC_test_video.py --path BasketballPass --model PSNR --metric PSNR --l 1024

the information is :
5074244104368395475

"cudnn poolforward launch failed"

Does anyone have a good idea how to solve it?

Which validation dataset is used for the pre-trained models?

I wonder whether you have used the training/validation split that is included in the Vimeo-90k dataset or used the whole dataset to train the published pre-trained models?

Since the find(pattern, path) script defined here results in folders.npy containing all sequences in Vimeo-90k, it seems training is done on the whole dataset. If this is true, then what dataset is used for validation?

AttributeError: 'EntropyBottleneck'

I get this error whenever I try to run psnr.py and the other files:

AttributeError: 'EntropyBottleneck' object has no attribute '_assert_input_compatibility'
I included the tensorflow_compression library. I am using python==3.7 and tensorflow==1.13.0 because tensorflow==1.12.0 would not install for me; except for the tensorflow libraries, I installed all other libraries listed in requirements.txt with the exact versions.

About the motion compensation network

Thank you for your code!

I wonder about the effect of the motion compensation network. Once we get Y1_warp from the optical flow, could we skip the motion compensation network and compute res = Y_raw - Y_warp? Is that OK?

question about "Training your own models"

When I train my models, the following problem occurs:
Operation received an exception: Status: 5, message: could not create a view primitive descriptor, in file tensorflow/core/kernels/mkl_slice_op.cc:300
I do not know how to solve it. Thank you for your help.

why not create a requirements.txt

Hi @RenYang-home, nice work!
I am a beginner in video compression. Thank you very much for open-sourcing the code, but I also ran into some problems when trying to create an environment for OpenDVC.

Reproducing the problem

I created a new environment for OpenDVC:

$ conda create -n compression python=3.8

Install tensorflow

$ pip install tensorflow-gpu==1.12
Looking in indexes: https://pypi.doubanio.com/simple
ERROR: Could not find a version that satisfies the requirement tensorflow-gpu==1.12
ERROR: No matching distribution found for tensorflow-gpu==1.12

Advice

why not create a requirements.txt

How to create it

First, install pigar:

$ pip install pigar

Second, use it to create requirements.txt:

# execute in the root directory
$ pigar

A small mistake in the README in 'Decoder for one frame'

Hi @RenYang-home, there is a mistake in the one-frame decoding example in the README:

python OpenDVC_test_P-frame_decoder.py --ref BasketballPass_com/f001.png --bin BasketballPass_bin/002.bin --com BasketballPass_com/f002.png  --model PSNR --l 1024

There is no --model in the command-line parsing; it should be --mode:

parser = argparse.ArgumentParser(
      formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument("--ref", default='ref.png')
# parser.add_argument("--raw", default='raw.png')
parser.add_argument("--com", default='dec.png')
parser.add_argument("--bin", default='bitstream.bin')
parser.add_argument("--mode", default='PSNR', choices=['PSNR', 'MS-SSIM'])             --------> this
# parser.add_argument("--metric", default='PSNR', choices=['PSNR', 'MS-SSIM'])
parser.add_argument("--l", type=int, default=1024, choices=[8, 16, 32, 64, 256, 512, 1024, 2048])
parser.add_argument("--N", type=int, default=128, choices=[128])
parser.add_argument("--M", type=int, default=128, choices=[128])

The argument parsing above is from OpenDVC_test_P-frame_decoder.py; the corrected command is as follows:

python OpenDVC_test_P-frame_decoder.py --ref BasketballPass_com/f001.png --bin BasketballPass_bin/002.bin --com BasketballPass_com/f002.png  --mode PSNR --l 1024

Questions about installing tensorflow-compression

Using pip install compression-1.0.tar.gz

Processing ./compression-1.0.tar.gz
ERROR: Command errored out with exit status 1:
command: /home/user058/software/anaconda3/envs/CYJN/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-iq30nor1/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-iq30nor1/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-tw2692z4
cwd: /tmp/pip-req-build-iq30nor1/
Complete output (5 lines):
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/user058/software/anaconda3/envs/CYJN/lib/python3.6/tokenize.py", line 452, in open
    buffer = _builtin_open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-req-build-iq30nor1/setup.py'
----------------------------------------
WARNING: Discarding file:///home/user058/CYJ/Open/OpenDVC-master/compression-1.0.tar.gz. Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Could you help me figure out what is wrong? Thank you very much!
In addition, I would like to know which Python version you use.

A little suggestion for the README

Hello, first of all thanks for open-sourcing OpenDVC. My master's project benefits a lot from your contributions. But there is a tiny error in the README.md that I hope you could fix.
The example in the README.md:

python OpenDVC_test_video_decoder.py --path_bin BasketballPass_bin_PSNR_1024 --path_com BasketballPass_dec_PSNR_1024 --model PSNR --l 1024 --Height 240 --Width 416 --GOP 10 --frame 100

should be modified as

python OpenDVC_test_video_decoder.py --path_bin BasketballPass_bin_PSNR_1024 --path_com BasketballPass_dec_PSNR_1024 --mode PSNR --l 1024 --Height 240 --Width 416 --GOP 10 --frame 100

because --model does not match any argument.

Besides, I hope you could update the dependencies, because Tensorflow 1.12 may be too old for most users to get access to.

Thanks, RenYang.

Different lambda values

Hi!

I have a question regarding the pre-trained models: did you train a model from scratch for each lambda value?

Many thanks,

Amin

Issue when running training code

Firstly, I'd like to thank you for sharing this open-source version of DVC!

I was able to run the inference on the BasketballPass dataset; however, I ran into some issues while trying to train the model using the given training instructions:

python OpenDVC_train_PSNR.py --l 1024

I get the following error:

2021-02-14 07:13:48.321268: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2021-02-14 07:13:48.337753: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2021-02-14 07:14:06.276314: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at mkl_slice_op.cc:303 : Aborted: Operation received an exception:Status: 5, message: could not create a view primitive descriptor, in file tensorflow/core/kernels/mkl_slice_op.cc:300

I am currently using TF 1.12 and other libraries as specified in the run instructions and have installed a compatible version of MKL and MKLDNN.

Would you have any insights on what might be causing this issue? Any help would be really appreciated!

Training Iterations

How did you decide on a good time to terminate the training? Did you try different checkpoints, or did you use some kind of evaluation on a different dataset?

Thanks!

About BPG tool

Following the installation instructions for bpg-0.9.8, 'make' or 'make install' on Linux does not seem to work. In addition, I would like to know how to run the bpgenc command on Windows.

Performance comparison

Sorry to bother you again; I would like to know whether you could share the source code of the performance comparison with H.264 and H.265.
