
lasted's Introduction

LASTED

Implementation code for the paper "Generalizable Synthetic Image Detection via Language-guided Contrastive Learning"

Table of Contents

  • Background
  • Dependency
  • Usage
  • Citation
  • Acknowledgments

Background

The heightened realism of AI-generated images can be attributed to the rapid development of synthetic models, including generative adversarial networks (GANs) and diffusion models (DMs). The malevolent use of synthetic images, such as the dissemination of fake news or the creation of fake profiles, however, raises significant concerns about image authenticity. Although many forensic algorithms have been developed for detecting synthetic images, their performance, especially their generalization capability, is still far from adequate to cope with the increasing number of synthetic models.

The heightened realism of AI-generated images raises significant concerns regarding image authenticity.

In this work, we propose a simple yet highly effective synthetic image detection method based on language-guided contrastive learning and a new formulation of the detection problem. We first augment the training images with carefully designed textual labels, enabling us to use joint text-image contrastive learning for forensic feature extraction. In addition, we formulate synthetic image detection as an identification problem, which is vastly different from traditional classification-based approaches. Our proposed LanguAge-guided SynThEsis Detection (LASTED) model achieves much improved generalizability to unseen image generation models and delivers promising performance that far exceeds state-of-the-art competitors by +22.66% accuracy and +15.24% AUC.

Illustration of our proposed LASTED. The training images are first augmented with carefully designed textual labels, and then the image/text encoders are jointly trained.
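
For intuition, here is a minimal sketch of the language-guided contrastive idea described above, assuming CLIP-style encoders and the four textual labels that appear later on this page; the actual objective is defined in main.py and may differ:

import torch
import torch.nn.functional as F
import clip

# Hypothetical sketch: pull each image embedding toward the embedding of its
# assigned textual label and away from the other labels (CLIP-style).
device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
labels = ["Real Photo", "Synthetic Photo", "Real Painting", "Synthetic Painting"]
text_tokens = clip.tokenize(labels).to(device)

def contrastive_loss(images, label_idx, temperature=0.07):
    img = F.normalize(model.encode_image(images).float(), dim=-1)
    txt = F.normalize(model.encode_text(text_tokens).float(), dim=-1)
    logits = img @ txt.T / temperature  # similarity of each image to each textual label
    return F.cross_entropy(logits, label_idx)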

Dependency

  • torch 1.9.0
  • clip 1.0

Usage

  1. Prepare the training/testing list file (e.g., annotation/Test.txt) via preprocess.py (a guessed sketch of the list format follows these steps).

  2. To train LASTED: set isTrain=1, then run sh main.sh.

  3. To test LASTED: set isTrain=0 and test_file='Test.txt', then run sh main.sh. LASTED will detect the images listed in annotation/Test.txt and report the detection results.
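
The list format itself is produced by preprocess.py and is not documented here; judging from the issues below, it appears to be one image path and one integer class label per line. The following illustration is a guess (the paths, and the 0/1/2/3 label order for real photo / synthetic photo / real painting / synthetic painting, are hypothetical):

data/lsun/real_000001.jpg 0
data/progan/fake_000001.jpg 1
data/danbooru/real_000001.jpg 2
data/sd/fake_000001.jpg 3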

Note: The pretrained LASTED model and the related testing datasets can be downloaded from Google Drive. The training dataset can be downloaded from Baidu Pan (~120GB, 4 subsets) or Google Drive (~30GB, 3 subsets). To acquire the unzip code for the dataset, please fill in the License Form and/or contact us by email ([email protected]).

Citation

If you use this code/dataset in your research, please cite the reference:

@article{lasted,
  title={Generalizable Synthetic Image Detection via Language-guided Contrastive Learning},
  author={H. Wu and J. Zhou and S. Zhang},
  journal={arXiv preprint arXiv:2305.13800},
  year={2023}
}

Acknowledgments

lasted's People

Contributors: highwaywu

Forkers: zj56, alanderk

lasted's Issues

Questions

Sorry, I am very new to this, but I was quite interested in your repository and how it works. After several attempts I think I have a version of it that works, sort of, on Windows. But while experimenting, I noticed that the files don't seem to do what I thought they did. Correct me if I am wrong, but this is more of a training script than inference using your pretrained model, correct?

If I am missing something, please let me know. I would love to use this to try to identify synthetic images from a set of real ones, but I am not sure I know exactly how. Let me know if you have time to discuss this. I appreciate your work!

Training problem

Hi, very nice work. I'm trying to reproduce the results on a new image dataset, but I'm running into a blocking problem.
The error output is as follows:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 3, 3], but got 5-dimensional input of size [16, 1, 3, 448, 448] instead

It occurs on line 360 of the main.py file, when the function train_one_epoch() is called.
Do you have any idea how I can solve it?

I'm pretty sure that 16 is the batch size, because I tried changing it several times, while [448, 448] is the image size.

Thank you
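
For reference (a general PyTorch pattern, not a confirmed fix for this repository): an error of this shape usually means the data loader yields a 5-D batch [B, N, C, H, W] with an extra views/crops dimension N, while the first convolution expects 4-D input. Flattening the first two dimensions before the forward pass is a common workaround:

import torch

# Hypothetical illustration using the shapes reported above: [16, 1, 3, 448, 448]
x = torch.randn(16, 1, 3, 448, 448)
b, n, c, h, w = x.shape
x = x.view(b * n, c, h, w)  # -> [16, 3, 448, 448], the 4-D input a conv layer expects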

Model just predicts most images as real (painting or photo)

Hi, I tried using your model to detect AI-generated photos like those from SD, SDXL, DALL-E, etc. However, most of the predictions are "real". Do you see any problem with my code?

import clip
import gradio as gr
import numpy as np
import torch
import torchvision.transforms as transforms
from PIL import Image

from model import LASTED

LABELS = ["Real Photo", "Synthetic Photo", "Real Painting", "Synthetic Painting"]


def modify_state_dict(sd: dict) -> dict:
    # Strip the "module." prefix that nn.DataParallel adds to parameter names
    new_sd = dict()
    for k, v in sd.items():
        new_sd[k.replace("module.", "")] = v
    return new_sd


def classify(image: Image.Image):
    with torch.inference_mode():
        tensor_in = transform(image).unsqueeze(0).to(device)
        text = clip.tokenize(LABELS).to(device)

        # CLIP-style zero-shot scoring: compare the image embedding with each label embedding
        image_features = model.clip_model.encode_image(tensor_in)
        text_features = model.clip_model.encode_text(text)

        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)
        similarity = (
            (100.0 * image_features @ text_features.T)
            .softmax(dim=-1)
            .detach()
            .cpu()
            .numpy()
        )
        print(f"similarity: {similarity}")

        return np.array(LABELS)[np.argmax(similarity, axis=1)].tolist()


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")

    transform = transforms.Compose(
        [
            # transforms.ToPILImage(),
            transforms.Resize((448, 448)),
            transforms.ToTensor(),
            transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),  # ImageNet statistics
        ]
    )

    print("Loading model...")
    model = LASTED()
    model.load_state_dict(
        modify_state_dict(torch.load("LASTED_pretrained.pt", map_location="cpu"))
    )
    model.eval()
    model.to(device)
    print("Done!")

    demo = gr.Interface(
        fn=classify,
        inputs=[gr.Image(label="input image", type="pil")],
        outputs=[gr.Text(label="predicted label")],
    )

    demo.launch(server_name="0.0.0.0", server_port=80)
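
One detail worth checking (an observation, not a confirmed fix): the transform above normalizes with ImageNet statistics, while OpenAI's CLIP preprocessing uses its own constants; if LASTED was trained with CLIP-style preprocessing, the mismatch could skew predictions toward one class. A minimal sketch of the CLIP-style transform, assuming the same 448x448 input size:

import torchvision.transforms as transforms

# CLIP's published normalization constants (from openai/CLIP); whether LASTED
# trained with these is an assumption to verify against main.py.
clip_transform = transforms.Compose(
    [
        transforms.Resize((448, 448)),
        transforms.ToTensor(),
        transforms.Normalize(
            (0.48145466, 0.4578275, 0.40821073),
            (0.26862954, 0.26130258, 0.27577711),
        ),
    ]
)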

Question about dataset used in the paper

Hello,

Great paper, great results; thanks for your contribution to the field.

I was wondering if you could answer a question I had about the datasets used for training. In the paper you say:

"We form the training dataset by including four categories of data, namely, real photos from LSUN [79], real paintings from Danbooru [3], synthetic photos by ProGAN [49], and synthetic paintings by Stable Diffusion (SD) [9, 11] from [6].

The image synthesis models ProGAN and SD here are deliberately trained on LSUN and Danbooru, respectively, forcing the detector to learn more discriminative representations from visually similar real and synthetic images."

I wonder why you didn't choose to train both ProGAN and SD on LSUN and Danbooru together. Surely, by training both methods on both domains, you would have more coverage of the "fake" space and hence better generalisation.

Additionally, I just want to confirm: did this work require you to train a ProGAN model on LSUN data and a Stable Diffusion model on Danbooru data? Or did you obtain these models from previous works, and if so, where?

Thanks!

Request for an Alternative Download Method for the Training Dataset

Hi!

I would like to express my interest in your project and my desire to download the training dataset. However, I have encountered a significant issue with the current download method via Baidu Pan.

The download speed I am experiencing is quite slow, averaging around 100 KB/s, which makes it impractical to obtain the dataset in a reasonable timeframe. It would take me more than a day to download a single dataset.

Considering the challenges I am facing with the current method, I wanted to kindly inquire if there might be an alternative means of making the training dataset available. Your assistance or guidance on this matter would be greatly appreciated.

Thanks!

Training Data Issue

Hi!

I'm trying to train your network on a custom dataset built from COCO and a Stable Diffusion subset, each with 200K images.
I first tried the r1 setting, i.e., [Real, Synthetic] with 0/1 labels in the text file, but the loss is not dropping.

Then I tried to divide my data with r3, i.e., [Real, Synthetic, Real Painting, Synthetic Painting], each 200K images, and assigned them labels of 0, 1, 2, 3 accordingly. The loss increased instead, with the same batch size. Maybe there is some confusion in how to prepare the training file with labels. Can you provide a sample training txt file with image paths?

Thanks in advance

Issue with Openset testset

I have found that some of the synthetic images in the Artist & Danbooru folders reproduce real photos, not paintings as stated in the paper. Moreover, there is no annotation file in the repository. How can I use the Openset test set, and where can I find the annotation file?

Some questions about rerunning the code

Hello, I am new to research. I have already configured all the settings following the README file, but when I run the code via sh main.sh, a problem occurs: 'DataParallel' object has no attribute 'data_root'. Here are my settings in main.sh:
python main.py \
    --model 'LASTED' \
    --train_file 'annotation/Train_Photos_num199244.txt' \
    --num_class 2 \
    --val_ratio 0.005 \
    --test_file 'annotation/Test_MidjourneyV5_num2000.txt' \
    --isTrain 1 \
    --lr 0.0001 \
    --resume '' \
    --data_size 448 \
    --batch_size 6 \
    --gpu '1,2' \
    2>&1 | tee weights/log.log
If I missed something, please let me know. Thank you!
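
For context, this is general PyTorch behaviour rather than a confirmed diagnosis for this repository: wrapping a model in nn.DataParallel hides custom attributes behind .module, so code that reads model.data_root after wrapping fails with exactly this message. A minimal sketch:

import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.data_root = "data/"  # custom attribute on the bare model
        self.fc = nn.Linear(4, 2)

net = nn.DataParallel(Net())
# net.data_root             -> AttributeError: 'DataParallel' object has no attribute 'data_root'
print(net.module.data_root)  # accessing the attribute through .module works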

Some questions about training

Hello, may I ask approximately how many epochs the model was trained for in your experiments? I would like to use it as a reference. Thank you.

Question about training time

Hello again,

You mention in the paper that you trained using 4 A100s.

We are looking to replicate your results; could you give us an idea of how long it took to train the model on your hardware?

Thanks!

Do you have a paper/technical report to refer to more implementation details?

It seems like you are using CLIP with 4 possible textual descriptions and then using cosine similarity for classification, just like CLIP. However, unlike CLIP, where the cardinality of the labels (i.e., the number of possible text sentences) is practically unlimited, at least during training, in LASTED it is only 4. I wonder how much of an uplift there is if we train on the same CLIP image encoder: LASTED versus something like just adding a classification head on top of the CLIP image encoder with a standard multi-class cross-entropy loss.
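
For concreteness, here is a minimal sketch of the baseline this question proposes (not code from the repo): freezing the CLIP image encoder and training a linear classification head over the four classes with cross-entropy, instead of LASTED's text-guided contrastive formulation:

import torch
import torch.nn as nn
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
encoder, _ = clip.load("ViT-B/32", device=device)  # backbone choice is an assumption
head = nn.Linear(512, 4).to(device)  # 512 = ViT-B/32 embedding width; 4 classes
criterion = nn.CrossEntropyLoss()

def loss_fn(images, labels):
    with torch.no_grad():  # encoder kept frozen in this baseline
        feats = encoder.encode_image(images).float()
    return criterion(head(feats), labels)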

Train data and Openset issues

Hello,
The practical test set seems to contain only paintings and no real/fake photos, so I was wondering if it would be a better idea to benchmark the model on a combination of the two. Then, I referred to the provided Openset and was confused why all the names are prepended with real_; aren't all the images there generated by GANs/DMs? What does real_ stand for in the image names?

Finally, to make it possible for other studies to compare with yours, could you provide the full training set (or scripts to generate it from the datasets mentioned in the paper, so that it doesn't overlap with the provided test sets)? I wanted to reproduce your results by training LASTED on my end and then modifying how it works to see if improvements can be made. In that regard, providing any random seeds, etc., used during benchmarking would be appreciated as well, so that the same metric values could be obtained.
