vchitect / vbench

[CVPR2024 Highlight] VBench - We Evaluate Video Generation

Home Page: https://vchitect.github.io/VBench-project/

License: Apache License 2.0

Topics: aigc, evaluation-kit, gen-ai, stable-diffusion, text-to-video, video-generation, benchmark, dataset

vbench's Introduction


This repository contains the implementation of the following paper:

VBench: Comprehensive Benchmark Suite for Video Generative Models
Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin+, Yu Qiao+, Ziwei Liu+
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

🔥 Updates

  • [04/2024] We release all the videos we sampled and used for VBench evaluation. See details here.
  • [03/2024] 🔥 VBench-Reliability 🔥 We now support evaluating the reliability (e.g., culture, fairness, bias, safety) of video generative models.
  • [03/2024] 🔥 VBench-I2V 🔥 We now support evaluating Image-to-Video (I2V) models. We also provide Image Suite.
  • [03/2024] We support evaluating customized videos! See here for instructions.
  • [01/2024] PyPI package released! Simply pip install vbench.
  • [12/2023] 🔥🔥 VBench 🔥🔥 Evaluation code released for 16 Text-to-Video (T2V) evaluation dimensions.
    • ['subject_consistency', 'background_consistency', 'temporal_flickering', 'motion_smoothness', 'dynamic_degree', 'aesthetic_quality', 'imaging_quality', 'object_class', 'multiple_objects', 'human_action', 'color', 'spatial_relationship', 'scene', 'temporal_style', 'appearance_style', 'overall_consistency']
  • [11/2023] Prompt Suites released. (See prompt lists here)

📣 Overview

We propose VBench, a comprehensive benchmark suite for video generative models. We design a comprehensive and hierarchical Evaluation Dimension Suite that decomposes "video generation quality" into multiple well-defined dimensions to facilitate fine-grained and objective evaluation. For each dimension and each content category, we carefully design a Prompt Suite as test cases, and sample Generated Videos from a set of video generation models. For each evaluation dimension, we specifically design an Evaluation Method Suite, which uses a carefully crafted method or designated pipeline for automatic objective evaluation. We also conduct Human Preference Annotation on the generated videos for each dimension, and show that VBench evaluation results are well aligned with human perceptions. VBench provides valuable insights from multiple perspectives.

🎓 Evaluation Results

We visualize VBench evaluation results of various publicly available video generation models, as well as Gen-2 and Pika, across 16 VBench dimensions. We normalize the results per dimension for clearer comparisons. (See numeric values at our Leaderboard)
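
As a rough illustration of the normalization step, here is a minimal sketch of per-dimension min-max normalization across models. The exact scheme used for the official radar chart may differ, and the scores below are made-up placeholders:

    def normalize_per_dimension(scores):
        # scores: {model_name: {dimension: raw_score}}
        dims = {d for per_model in scores.values() for d in per_model}
        out = {m: {} for m in scores}
        for d in dims:
            vals = [s[d] for s in scores.values() if d in s]
            lo, hi = min(vals), max(vals)
            for m, s in scores.items():
                if d in s:
                    # map the best model to 1 and the worst to 0 on this dimension
                    out[m][d] = 0.5 if hi == lo else (s[d] - lo) / (hi - lo)
        return out

    # made-up placeholder scores, for illustration only
    raw = {"model_a": {"human_action": 0.92}, "model_b": {"human_action": 0.88}}
    print(normalize_per_dimension(raw))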

🔨 Installation

Install with pip

pip install vbench

To evaluate some video generation quality dimensions, you also need to install detectron2:

pip install detectron2@git+https://github.com/facebookresearch/detectron2.git

If there is an error during detectron2 installation, see here.

Download VBench_full_info.json to your running directory to read the benchmark prompt suites.
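
If you prefer to script the download, here is a sketch using Python's standard library; the raw URL is an assumption based on the repository layout (vbench/VBench_full_info.json on the default branch), so verify it against the repo:

    import urllib.request

    # Assumed raw URL; adjust the branch/path if the repository layout changes.
    URL = ("https://raw.githubusercontent.com/Vchitect/VBench/master/"
           "vbench/VBench_full_info.json")
    urllib.request.urlretrieve(URL, "VBench_full_info.json")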

Install with git clone

git clone https://github.com/Vchitect/VBench.git
pip install -r VBench/requirements.txt
pip install VBench

If there is an error during detectron2 installation, see here.

Usage

Use VBench to evaluate videos and video generative models.

  • A side note: VBench is designed for evaluating different models on a standard benchmark. Therefore, by default, we enforce evaluation on the standard VBench prompt lists to ensure fair comparisons among different video generation models; that is also why we give warnings when a required video is not found. The required set of prompts is defined in VBench_full_info.json (see the sketch below). However, we understand that many users would like to use VBench to evaluate their own videos, or videos generated from prompts that do not belong to the VBench Prompt Suite, so we also added the function of Evaluating Your Own Videos. Simply turn the custom_input flag on, and you can evaluate your own videos.
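
The sketch below is a hypothetical pre-flight check (not part of the VBench API) that lists which benchmark videos are missing before a standard run. It assumes VBench_full_info.json is a JSON list whose entries carry a "prompt_en" string and a "dimension" list, and that videos follow the "{prompt}-{index}.mp4" naming used for sampled videos (see the default paths later in this README); the number of samples per prompt (n_samples) is also an assumption:

    import json
    import os

    def missing_videos(full_info_path, videos_path, dimension, n_samples=5):
        # Collect the expected "{prompt}-{index}.mp4" files for one dimension
        # and report the ones not present under videos_path.
        with open(full_info_path) as f:
            entries = json.load(f)
        missing = []
        for entry in entries:
            if dimension not in entry.get("dimension", []):
                continue
            for i in range(n_samples):
                name = f"{entry['prompt_en']}-{i}.mp4"
                if not os.path.exists(os.path.join(videos_path, name)):
                    missing.append(name)
        return missing

    print(missing_videos("VBench_full_info.json", "sampled_videos/lavie/human_action", "human_action"))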

[New] Evaluate Your Own Videos

We support evaluating any video. Simply provide the path to the video file, or the path to the folder that contains your videos. There is no requirement on the videos' names.

  • Note: We support customized videos / prompts for the following dimensions: 'subject_consistency', 'background_consistency', 'motion_smoothness', 'dynamic_degree', 'aesthetic_quality', 'imaging_quality'

To evaluate videos with customized input prompts, run our script with the custom_input flag on:

python evaluate.py \
    --dimension $DIMENSION \
    --videos_path /path/to/folder_or_video/ \
    --custom_input

Alternatively, you can use our command-line interface:

vbench evaluate \
    --dimension $DIMENSION \
    --videos_path /path/to/folder_or_video/ \
    --custom_input

Evaluation on the Standard Prompt Suite of VBench

Command line:
    vbench evaluate --videos_path $VIDEO_PATH --dimension $DIMENSION

For example:

    vbench evaluate --videos_path "sampled_videos/lavie/human_action" --dimension "human_action"
Python:
    from vbench import VBench
    my_VBench = VBench(device, <path/to/VBench_full_info.json>, <path/to/save/dir>)
    my_VBench.evaluate(
        videos_path = <video_path>,
        name = <name>,
        dimension_list = [<dimension>, <dimension>, ...],
    )

For example:

    from vbench import VBench
    my_VBench = VBench(device, "vbench/VBench_full_info.json", "evaluation_results")
    my_VBench.evaluate(
        videos_path = "sampled_videos/lavie/human_action",
        name = "lavie_human_action",
        dimension_list = ["human_action"],
    )
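
Once a run finishes, you can inspect the saved scores. The sketch below assumes the results land in a JSON file named after name in the save directory, with each dimension mapping to an aggregate score followed by per-video results; the filename and exact layout are assumptions, so adjust them to what your run actually writes:

    import json

    # Assumed output filename; check your save directory for the actual name.
    with open("evaluation_results/lavie_human_action_eval_results.json") as f:
        results = json.load(f)

    for dim, (score, per_video) in results.items():
        print(f"{dim}: {score:.4f} over {len(per_video)} videos")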

Example of Evaluating VideoCrafter-1.0

We have provided scripts to download VideoCrafter-1.0 samples, and the corresponding evaluation scripts.

# download sampled videos
sh scripts/download_videocrafter1.sh

# evaluate VideoCrafter-1.0
sh scripts/evaluate_videocrafter1.sh

💎 Pre-Trained Models

[Optional] Download the pre-trained weights into ~/.cache/vbench, following the guidance in the model_path.txt file for each model under the pretrained folder.
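
To double-check what has been downloaded, here is a small, purely illustrative sketch that lists the contents of the cache directory mentioned above:

    import os

    cache = os.path.expanduser("~/.cache/vbench")
    for root, _dirs, files in os.walk(cache):
        for name in files:
            # print paths relative to the cache root for readability
            print(os.path.relpath(os.path.join(root, name), cache))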

📑 Prompt Suite

We provide prompt lists at prompts/.

Check out details of prompt suites, and instructions for how to sample videos for evaluation.

📑 Sampled Videos

Dataset Download

To facilitate future research and to ensure full transparency, we release all the videos we sampled and used for VBench evaluation. You can download them on Google Drive.

See detailed explanations of the sampled videos here.

We also provide the detailed settings for the models under evaluation here.

🏄 Evaluation Method Suite

To perform evaluation on one dimension, run this:

python evaluate.py --videos_path $VIDEOS_PATH --dimension $DIMENSION
  • The complete list of dimensions:
    ['subject_consistency', 'background_consistency', 'temporal_flickering', 'motion_smoothness', 'dynamic_degree', 'aesthetic_quality', 'imaging_quality', 'object_class', 'multiple_objects', 'human_action', 'color', 'spatial_relationship', 'scene', 'temporal_style', 'appearance_style', 'overall_consistency']
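
    If your sampled videos follow the per-dimension folder layout used in the examples above, a sketch for sweeping all 16 dimensions from the Python API looks like this (paths and the run name are placeholders; the shell script below covers the multi-model case):

        import torch
        from vbench import VBench

        ALL_DIMENSIONS = [
            "subject_consistency", "background_consistency", "temporal_flickering",
            "motion_smoothness", "dynamic_degree", "aesthetic_quality",
            "imaging_quality", "object_class", "multiple_objects", "human_action",
            "color", "spatial_relationship", "scene", "temporal_style",
            "appearance_style", "overall_consistency",
        ]

        device = torch.device("cuda")
        my_VBench = VBench(device, "vbench/VBench_full_info.json", "evaluation_results")
        for dim in ALL_DIMENSIONS:
            my_VBench.evaluate(
                videos_path=f"sampled_videos/lavie/{dim}",  # placeholder per-dimension folders
                name=f"lavie_{dim}",
                dimension_list=[dim],
            )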
    

Alternatively, you can evaluate multiple models and multiple dimensions using this script:

bash evaluate.sh
  • The default sampled video paths:
    vbench_videos/{model}/{dimension}/{prompt}-{index}.mp4/gif
    

To filter static videos in the temporal flickering dimension, run this:

python static_filter.py --videos_path $VIDEOS_PATH
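
For intuition, "static" here means videos with almost no temporal change, which would presumably score deceptively well on flickering. The actual criterion of static_filter.py is not documented in this README (it may rely on optical flow); the following is only a simple frame-difference heuristic to illustrate the idea, with an arbitrary threshold:

    import cv2
    import numpy as np

    def looks_static(video_path, threshold=1.0):
        # Flag a video as static when the mean absolute difference between
        # consecutive frames stays below an (arbitrary) threshold.
        cap = cv2.VideoCapture(video_path)
        ok, prev = cap.read()
        diffs = []
        while ok:
            ok, frame = cap.read()
            if not ok:
                break
            diffs.append(np.abs(frame.astype(np.float32) - prev.astype(np.float32)).mean())
            prev = frame
        cap.release()
        return bool(diffs) and float(np.mean(diffs)) < threshold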

✒️ Citation

If you find our repo useful for your research, please consider citing our paper:

 @InProceedings{huang2023vbench,
     title={{VBench}: Comprehensive Benchmark Suite for Video Generative Models},
     author={Huang, Ziqi and He, Yinan and Yu, Jiashuo and Zhang, Fan and Si, Chenyang and Jiang, Yuming and Zhang, Yuanhan and Wu, Tianxing and Jin, Qingyang and Chanpaisit, Nattapol and Wang, Yaohui and Chen, Xinyuan and Wang, Limin and Lin, Dahua and Qiao, Yu and Liu, Ziwei},
     booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
     year={2024}
 }

♥️ Acknowledgement

VBench Contributors

Order is based on the time of joining the project:

Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Nattapol Chanpaisit, Xiaojie Xu.

Open-Sourced Repositories

This project wouldn't be possible without the following open-sourced repositories: AMT, UMT, RAM, CLIP, RAFT, GRiT, IQA-PyTorch, ViCLIP, and LAION Aesthetic Predictor.


vbench's Issues

Evaluation issue

I wrote a script for evaluation, but I encountered an issue.

1. Script:

from vbench import VBench
import torch
device = torch.device("cuda")
my_VBench = VBench(device, "VBench_full_info.json", "evaluation_results")
my_VBench.evaluate(
    videos_path = "./videocrafter/spatial_relationship",  # there are several videos in this directory
    name = "spatial_relationship",
    dimension_list = ["spatial_relationship"],
)

2. Issue:

RuntimeWarning: invalid value encountered in scalar divide
ret = ret.dtype.type(ret / rcount)

3. Evaluation results: (screenshot attached in the original issue)

Evaluation data issues

Hello, is there a unified configuration for the evaluation data (mp4 files), such as fps, duration, resolution, etc.?

And do different settings (such as duration and resolution) of the mp4 files influence the final evaluation results from VBench? I'm not sure about this.

generating videos from images

Hello, author. After reading your paper, I have some doubts about certain details. May I directly input a video clip (without any prompts) for evaluation? Is your method applicable to videos generated from images? Looking forward to your response.

What does the result mean?

{ "imaging_quality": [ 0.6686933368155237, [ { "video_path": "/data/*.mp4", "video_results": 56.49358892440796 },

This is part of the result. I can see that video_results is the imaging_quality score, but what does '0.6686933368155237' mean?

pretrained/umt_model

Is there a model.txt file missing in the umt_model folder? Do I need to download it?

Some problems

Hi,

I installed VBench with pip, but I encountered some problems.

  1. I encountered the following error when evaluating motion_smoothness: (screenshot attached in the original issue)
  2. At line 61 of vbench/dynamic_degree.py, line 110 of vbench/motion_smoothness.py, and line 174 of vbench/utils.py, VBench only supports the 'mp4' format.

About the size and number of frames

Hi,

Does VBench have limitations on the video size and the number of frames of generated videos? What settings for video size and number of frames were used in the evaluation shown in Table 1 of your paper?

Cannot find and process videos under provided directory

I run the following:

vbench evaluate --videos_path ./demo/videos --dimension temporal_flickering

and get the following:

args: Namespace(func=<function evaluate at 0x7f3b34fd6950>, output_path='./evaluation_results/', full_json_dir='./VBench_full_info.json', videos_path='./demo/videos', dimension='temporal_flickering', load_ckpt_from_local=None, read_frame=None)
start evaluation
Evaluation meta data saved to ./evaluation_results/temporal_flickering_full_info.json
0it [00:00, ?it/s]

It seems it cannot find the videos under the directory.

conda environment issue

ERROR: Could not find a version that satisfies the requirement detectron2==0.6 (from versions: none)
ERROR: No matching distribution found for detectron2==0.6
I can't pip install detectron2==0.6.

About sampled videos

Hi, great work!

Will the sampled videos from different video generation models be released? Especially the videos generated by Gen-2 and Pika.

Thanks!

single dimension running problem

Hi, the path "prompts/prompts_per_dimension" contains prompt files for only some of the evaluation dimensions. If I want to evaluate the "background_consistency" dimension, there is no corresponding prompt text file (maybe background_consistency.txt) for generating videos. What can I do to perform evaluation on "background_consistency"?

How to run inference for a new video?

Hello, I'm trying to run inference. I've installed the package and tried the code here.

I have a new directory of videos, called "videos".

(venv) (base) yonatan:~/VBench$ ls videos/
Iron_Man.mp4  birthday.mp4  lavie_human_action_full_info.json  skateboarding_dog.mp4

When I try to run

vbench evaluate --videos_path "videos" --dimension "human_action"

I receive an error indicating that no data is found.
When I try through the code, I also get a long list of missing videos:

>>> from vbench import VBench
>>> my_VBench = VBench('cuda', 'VBench_full_info.json', 'videos')
>>> my_VBench.evaluate(videos_path='videos', name='my_test', dimension_list=['human_action'])
WARNING!!! This required video is not found! Missing benchmark videos can lead to unfair evaluation result. The missing video is: A person is riding a bike-0.mp4
...
...

I guess your code only supports your existing evaluated videos, and the instructions do not yet cover inference on new videos?

The question is: how can I run the evaluation for a new video + description? Thank you very much 🙏

how to execute the project ?

It's a great effort from the authors. Can anyone give a step-by-step explanation of how to run the project? Your effort will be highly appreciated.
