AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception

How do Multimodal LLMs perform on Image Aesthetics Perception?

1*Yipo Huang, 1*Quan Yuan, 1Xiangfei Sheng, 1Zhichao Yang, 2Haoning Wu,
1Pengfei Chen, 3Yuzhe Yang, 1#Leida Li, 2Weisi Lin
1Xidian University, 2Nanyang Technological University, 3OPPO Research Institute
*Equal contribution. #Corresponding author.
If you like this research, please give us a star ⭐ on GitHub.

We construct a high-quality Expert-labeled Aesthetic Perception Database (EAPD), based on which we further build the golden benchmark to evaluate four abilities of MLLMs on image aesthetics perception, including Aesthetic Perception (AesP), Aesthetic Empathy (AesE), Aesthetic Assessment (AesA) and Aesthetic Interpretation (AesI).

News

  • [2024/01/20] 🎉 Congrats to SPHINX-MoE for achieving new SOTAs on AesP and AesE!
  • [2024/01/18] 🤗 The AesBench database is now available on Hugging Face!
  • [2024/01/17] 🚩 We have released the evaluation database and code of AesBench! Check Here for more details.

GPT-4V and Gemini Pro Vision!

Here is the comparison of GPT-4V, Gemini Pro Vision, and other OA MLLMs on AesP.

| Rank | MLLM | Tec. Qua. | Col. Lig. | Comp. | Content | NIs | AIs | AGIs | Yes-No | What | How | Why | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 🥇 | SPHINX-MoE | 66.67% | 76.31% | 72.68% | 66.31% | 75.84% | 72.19% | 68.88% | 69.12% | 62.18% | 80.38% | 88.05% | 72.93% |
| 🥈 | Q-Instruct | 66.03% | 74.48% | 73.68% | 68.09% | 76.48% | 69.70% | 69.28% | 64.68% | 63.31% | 85.28% | 86.34% | 72.61% |
| 🥉 | GPT-4V | 69.02% | 74.66% | 71.72% | 65.57% | 75.67% | 72.58% | 65.82% | 68.93% | 64.67% | 76.70% | 84.46% | 72.08% |
| 4 | Gemini Pro Vision | 65.08% | 74.57% | 72.24% | 67.97% | 74.63% | 69.62% | 70.03% | 64.70% | 64.95% | 78.71% | 90.24% | 71.99% |
| 5 | ShareGPT4V | 62.18% | 71.90% | 69.29% | 64.89% | 70.79% | 71.57% | 63.96% | 69.32% | 61.33% | 72.01% | 77.56% | 69.18% |
| 6 | mPLUG-Owl2 | 60.90% | 70.57% | 68.30% | 62.77% | 72.23% | 64.71% | 64.10% | 65.59% | 58.64% | 73.02% | 80.73% | 67.89% |
| 7 | LLaVA-1.5 | 53.85% | 70.16% | 67.40% | 59.93% | 69.10% | 65.71% | 62.37% | 62.36% | 58.92% | 70.71% | 81.22% | 66.32% |
| 8 | Qwen-VL | 54.81% | 66.25% | 62.91% | 60.64% | 68.30% | 58.85% | 59.44% | 61.25% | 55.38% | 67.53% | 74.15% | 63.21% |
| 9 | LLaVA | 46.79% | 63.59% | 65.30% | 64.54% | 64.29% | 61.10% | 60.77% | 65.39% | 52.27% | 61.18% | 74.88% | 62.43% |
| 10 | InstructBLIP | 37.82% | 55.36% | 55.43% | 57.09% | 57.06% | 55.86% | 47.21% | 59.84% | 45.01% | 54.98% | 56.34% | 54.29% |
| 11 | MiniGPT-v2 | 56.73% | 56.44% | 51.74% | 50.00% | 56.74% | 53.24% | 50.93% | 53.99% | 43.06% | 58.73% | 66.10% | 54.18% |
| 12 | GLM | 55.77% | 54.61% | 51.25% | 48.94% | 54.90% | 55.24% | 47.34% | 60.95% | 44.62% | 48.48% | 55.61% | 52.96% |
| 13 | Otter | 35.90% | 54.28% | 51.65% | 51.06% | 51.04% | 50.62% | 51.20% | 56.10% | 44.48% | 51.37% | 49.02% | 50.96% |
| 14 | IDEFICS-Instruct | 37.50% | 52.87% | 52.84% | 51.06% | 52.97% | 50.12% | 48.40% | 50.96% | 44.62% | 51.09% | 60.73% | 50.82% |
| 15 | MiniGPT-4 | 39.42% | 41.31% | 42.67% | 44.33% | 41.57% | 42.89% | 41.36% | 47.23% | 32.01% | 41.99% | 46.10% | 41.93% |
| 16 | TinyGPT-V | 21.79% | 24.52% | 22.13% | 28.01% | 22.71% | 24.69% | 24.34% | 32.39% | 17.99% | 19.77% | 19.27% | 23.71% |

Here is the comparison of GPT-4V, Gemini Pro Vision, and other OA MLLMs on AesE.

| Rank | MLLM | Emotion | Interest | Uniqueness | Vibe | NIs | AIs | AGIs | Yes-No | What | How | Why | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 🥇 | SPHINX-MoE | 68.59% | 80.65% | 75.86% | 82.14% | 74.72% | 75.19% | 69.02% | 74.95% | 62.89% | 72.71% | 88.48% | 73.32% |
| 🥈 | Q-Instruct | 68.64% | 83.86% | 75.86% | 80.00% | 76.65% | 72.19% | 66.62% | 64.30% | 67.42% | 81.57% | 86.76% | 72.68% |
| 🥉 | Gemini Pro Vision | 66.87% | 87.50% | 70.00% | 79.09% | 70.60% | 72.35% | 71.53% | 67.50% | 64.52% | 72.25% | 90.37% | 71.37% |
| 4 | ShareGPT4V | 66.48% | 80.65% | 68.97% | 78.72% | 70.95% | 73.69% | 67.29% | 67.75% | 65.58% | 72.71% | 83.58% | 70.75% |
| 5 | GPT-4V | 65.06% | 72.41% | 62.07% | 80.15% | 73.87% | 72.08% | 62.27% | 68.67% | 64.02% | 70.07% | 84.20% | 70.16% |
| 6 | mPLUG-Owl2 | 65.60% | 77.42% | 65.52% | 78.07% | 71.03% | 71.57% | 66.22% | 68.05% | 64.16% | 70.14% | 83.82% | 69.89% |
| 7 | LLaVA-1.5 | 62.49% | 80.65% | 75.85% | 78.93% | 69.26% | 69.58% | 65.43% | 62.37% | 64.16% | 71.71% | 84.07% | 68.32% |
| 8 | LLaVA | 58.61% | 80.63% | 65.52% | 75.83% | 67.01% | 66.96% | 58.38% | 67.95% | 55.95% | 60.14% | 79.66% | 64.68% |
| 9 | Qwen-VL | 58.67% | 83.87% | 72.41% | 73.90% | 63.88% | 67.08% | 61.57% | 60.65% | 58.07% | 66.14% | 79.90% | 64.18% |
| 10 | MiniGPT-v2 | 52.52% | 58.06% | 44.83% | 58.07% | 55.86% | 55.85% | 50.27% | 57.81% | 43.48% | 53.43% | 66.42% | 54.36% |
| 11 | GLM | 53.13% | 70.97% | 44.83% | 55.29% | 56.58% | 54.86% | 48.67% | 60.65% | 41.78% | 50.43% | 64.95% | 53.96% |
| 12 | InstructBLIP | 49.64% | 58.06% | 51.72% | 61.50% | 55.06% | 55.24% | 48.94% | 55.88% | 50.99% | 51.43% | 58.33% | 53.89% |
| 13 | Otter | 48.42% | 70.97% | 51.72% | 63.21% | 53.05% | 55.74% | 52.39% | 54.77% | 51.84% | 53.43% | 54.41% | 53.64% |
| 14 | IDEFICS-Instruct | 43.93% | 64.52% | 62.07% | 64.06% | 50.72% | 53.12% | 49.07% | 50.20% | 41.08% | 52.43% | 66.42% | 50.82% |
| 15 | MiniGPT-4 | 39.78% | 38.71% | 24.14% | 39.04% | 42.70% | 37.78% | 35.51% | 50.61% | 31.59% | 31.86% | 38.48% | 39.35% |
| 16 | TinyGPT-V | 30.36% | 29.03% | 31.03% | 35.40% | 32.50% | 36.03% | 26.99% | 36.00% | 29.89% | 28.86% | 31.62% | 32.04% |

Here is the comparison of GPT-4V, Gemini Pro Vision, and other OA MLLMs on AesA.

| Rank | MLLM | NIs | AIs | AGIs | Overall |
|---|---|---|---|---|---|
| 🥇 | Q-Instruct | 62.20% | 49.75% | 40.69% | 52.86% |
| 🥈 | GPT-4V | 59.98% | 46.92% | 40.59% | 50.86% |
| 🥉 | mPLUG-Owl2 | 57.78% | 49.50% | 40.83% | 50.57% |
| 4 | SPHINX-MoE | 57.62% | 48.50% | 38.70% | 49.93% |
| 5 | Gemini Pro Vision | 54.17% | 48.39% | 42.20% | 49.38% |
| 6 | ShareGPT4V | 54.65% | 48.38% | 35.90% | 47.82% |
| 7 | InstructBLIP | 52.73% | 47.88% | 34.84% | 46.54% |
| 8 | Qwen-VL | 54.25% | 39.28% | 40.43% | 46.25% |
| 9 | LLaVA | 51.69% | 48.00% | 34.31% | 45.96% |
| 10 | LLaVA-1.5 | 50.08% | 48.13% | 34.97% | 45.46% |
| 11 | IDEFICS-Instruct | 50.00% | 47.76% | 33.78% | 45.00% |
| 12 | Otter | 49.20% | 48.25% | 34.04% | 44.86% |
| 13 | TinyGPT-V | 44.06% | 41.65% | 44.81% | 43.57% |
| 14 | MiniGPT-4 | 41.65% | 36.28% | 35.90% | 38.57% |
| 15 | GLM | 38.92% | 37.78% | 35.90% | 37.79% |
| 16 | MiniGPT-v2 | 27.05% | 31.92% | 36.97% | 31.11% |

Here is the comparison of GPT-4V, Gemini Pro Vision, and other OA MLLMs on AesI.

| Rank | Model | Relevance | Precision | Completeness | Overall |
|---|---|---|---|---|---|
| 🥇 | GPT-4V | 1.385 | 1.151 | 1.366 | 1.301 |
| 🥈 | ShareGPT4V | 1.440 | 1.117 | 1.331 | 1.296 |
| 🥉 | SPHINX-MoE | 1.501 | 1.171 | 1.130 | 1.267 |
| 4 | Gemini Pro Vision | 1.416 | 1.087 | 1.164 | 1.222 |
| 5 | Qwen-VL | 1.393 | 1.006 | 1.175 | 1.192 |
| 6 | mPLUG-Owl2 | 1.402 | 1.016 | 1.130 | 1.182 |
| 7 | IDEFICS-Instruct | 1.406 | 1.007 | 1.126 | 1.180 |
| 8 | LLaVA-1.5 | 1.397 | 0.953 | 1.120 | 1.157 |
| 9 | InstructBLIP | 1.372 | 0.863 | 1.144 | 1.126 |
| 10 | LLaVA | 1.374 | 0.918 | 1.084 | 1.125 |
| 11 | Otter | 1.242 | 0.848 | 0.989 | 1.027 |
| 12 | Q-Instruct | 1.222 | 0.939 | 0.898 | 1.020 |
| 13 | MiniGPT-v2 | 1.191 | 0.868 | 0.948 | 1.003 |
| 14 | MiniGPT-4 | 1.158 | 0.823 | 1.016 | 0.999 |
| 15 | GLM | 1.122 | 0.729 | 0.944 | 0.932 |
| 16 | TinyGPT-V | 0.871 | 0.511 | 0.720 | 0.701 |
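
For readers reusing the AesI numbers: the Overall column appears to match the unweighted mean of Relevance, Precision and Completeness, up to rounding in the third decimal. This is an observation from the published scores, not a formula stated in this README, so treat it as an assumption. A minimal Python sketch that checks it on a few rows copied from the table (the names `AESI_ROWS`, `mean_of_subscores` and `max_deviation` are illustrative, not part of the AesBench code):

```python
# Scores copied from the AesI table: (relevance, precision, completeness, overall).
AESI_ROWS = {
    "GPT-4V": (1.385, 1.151, 1.366, 1.301),
    "ShareGPT4V": (1.440, 1.117, 1.331, 1.296),
    "SPHINX-MoE": (1.501, 1.171, 1.130, 1.267),
    "Gemini Pro Vision": (1.416, 1.087, 1.164, 1.222),
    "Qwen-VL": (1.393, 1.006, 1.175, 1.192),
}


def mean_of_subscores(relevance, precision, completeness):
    """Unweighted mean of the three AesI sub-scores (assumed aggregation)."""
    return (relevance + precision + completeness) / 3


def max_deviation(rows):
    """Largest absolute gap between the recomputed mean and the reported Overall."""
    return max(
        abs(mean_of_subscores(r, p, c) - overall)
        for r, p, c, overall in rows.values()
    )


if __name__ == "__main__":
    for model, (r, p, c, overall) in AESI_ROWS.items():
        recomputed = mean_of_subscores(r, p, c)
        print(f"{model}: recomputed={recomputed:.3f} reported={overall:.3f}")
    print(f"largest gap: {max_deviation(AESI_ROWS):.4f}")
```

The small gaps that remain are what one would expect if the reported Overall was averaged from unrounded sub-scores. The percentage tables for AesP, AesE and AesA do not follow a simple unweighted mean of their columns, so the same check does not carry over to them.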

Submission Guideline

  • via GitHub Release: Please see our release for details.

Acknowledgement

Special thanks are extended to the 32 aesthetic experts who participated in our experiments, whose rich aesthetic experience and responsible attitude played a crucial role in the construction of the dataset. We highlight the following:

Wei Liu, Xin Liu, Luxia Chen, Tianjiao Gu, Dahai Tian, Ziyan Ou, et al.

Many thanks are extended to collaborators, for their kind assistance in data collection and MLLM deployment:

Zhichao Duan and Pangu Xie.

Citation

If you find our work interesting, please feel free to cite our paper:

@article{AesBench,
    title={AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception},
    author={Huang, Yipo and Yuan, Quan and Sheng, Xiangfei and Yang, Zhichao and Wu, Haoning and Chen, Pengfei and Yang, Yuzhe and Li, Leida and Lin, Weisi},
    journal={arXiv preprint arXiv:2401.08276},
    year={2024},
}
