Official repository of the paper: Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics

Home Page: https://arxiv.org/pdf/2403.14077.pdf

deepfake-detection deepfake-images image-forensics chatgpt-4 multimodal-llm ai-generated-image-detection

Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics

This is the official repository of the paper: Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics

Shan Jia, Reilin Lyu, Kangran Zhao, Yize Chen, Zhiyuan Yan, Yan Ju, Chuanbo Hu, Xin Li, Baoyuan Wu, Siwei Lyu

Summary

In this work, we investigate the capabilities of multimodal large language models (LLMs) in DeepFake detection. We conducted qualitative and quantitative experiments and show that multimodal LLMs can expose AI-generated images through careful experimental design and prompt engineering. This is notable because LLMs are not inherently tailored for media forensic tasks, and the process does not require programming. We discuss the limitations of multimodal LLMs for these tasks and suggest possible improvements.

Two multimodal LLMs have been evaluated: GPT4V and Gemini 1.0 Pro.

Test-data

The dataset used in this study can be downloaded from the following link. It contains 1,000 StyleGAN2-generated face images, 1,000 Latent Diffusion-generated images, and 1,000 real faces from the FFHQ dataset, all derived from the DF^3 dataset. Both raw and post-processed (pped) data are provided.

The test data has the following structure:

Test_data
|--Real_512Size 
|--StyleGAN_raw_512size 
|--StyleGAN_pped_256size
|--LD_raw_256Size
|--LD_pped_512Size
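
The folder layout above can be indexed with a few lines of Python. This is a minimal sketch, not part of the official repository: the function name `index_test_data`, the label convention (0 = real, 1 = AI-generated), and the accepted file extensions are all assumptions made for illustration. Only the folder names come from the structure shown above.

```python
from pathlib import Path

# Folder names copied from the structure above.
# Label convention is an assumption: 0 = real, 1 = AI-generated.
FOLDER_LABELS = {
    "Real_512Size": 0,
    "StyleGAN_raw_512size": 1,
    "StyleGAN_pped_256size": 1,
    "LD_raw_256Size": 1,
    "LD_pped_512Size": 1,
}

# Assumed image extensions; adjust to match the downloaded files.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def index_test_data(root):
    """Return a list of (path, folder, label) tuples for images under root."""
    root = Path(root)
    records = []
    for folder, label in FOLDER_LABELS.items():
        folder_path = root / folder
        if not folder_path.is_dir():
            continue  # tolerate a partially downloaded dataset
        for path in sorted(folder_path.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                records.append((path, folder, label))
    return records
```

Calling `index_test_data("Test_data")` returns one record per image, which makes it easy to evaluate the raw and post-processed subsets separately by filtering on the folder name.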

Comparison with ML-based detectors

We will release all responses from the two multimodal LLMs upon the paper’s acceptance.

Table: Comparison of AUC (%) in detecting DeepFake faces

| Method | Raw SG2 | Raw LD | Pped SG2 | Pped LD |
|---|---|---|---|---|
| CNN-aug | 96.5 | 58.6 | 53.2 | 52.4 |
| GAN-DCT | 53.4 | 75.4 | 44.4 | 56.0 |
| Nodown | 99.6 | 97.1 | 47.4 | 44.9 |
| BeyondtheSpectrum | 98.1 | 77.3 | 45.4 | 46.9 |
| PSM | 99.2 | 82.5 | 73.1 | 71.3 |
| GLFF | 97.5 | 86.7 | 80.6 | 79.4 |
| Gemini 1.0 (zero-shot) | 76.6 | 75.1 | 77.5 | 81.5 |
| GPT4V (zero-shot) | 77.2 | 79.5 | 88.7 | 89.8 |
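
For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen AI-generated image receives a higher score than a randomly chosen real one. The sketch below computes it by pairwise comparison. It is an illustration only: the function name `auc` and the example scores are invented here, and how a per-image score is obtained from an LLM response is described in the paper, not in this sketch.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 = AI-generated, 0 = real (assumed convention).
    scores: higher means "more likely AI-generated". Ties count as 0.5.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented toy scores: three of the four (fake, real) pairs are ranked correctly.
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The double loop is quadratic in the number of images, which is fine for a test set of a few thousand; a rank-based implementation would be preferable for larger sets.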

The following figure shows examples of GPT4V responses for DeepFake face detection. Left: results for AI-generated images. Right: results for real faces. Responses for AI-generated faces are labeled in pink, and responses for real faces are labeled in green. Both success cases (with check marks) and failure cases (with crosses) are shown. See the paper for details.

[Figure: GPT4V responses for AI-generated faces (left) and real faces (right)]

Citation

@misc{jia2024chatgpt,
      title={Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics}, 
      author={Shan Jia and Reilin Lyu and Kangran Zhao and Yize Chen and Zhiyuan Yan and Yan Ju and Chuanbo Hu and Xin Li and Baoyuan Wu and Siwei Lyu},
      year={2024},
      eprint={2403.14077},
      archivePrefix={arXiv},
}
