Aesthetically Relevant Image Captioning

Zhipeng Zhong, Fei Zhou and Guoping Qiu

This repository is the official PyTorch implementation of Aesthetically Relevant Image Captioning (arXiv). The paper has been accepted as an oral paper at AAAI 2023.

Contents

  1. Introduction
  2. Installation
  3. Dataset
  4. Train
  5. Test
  6. Results
  7. Citation
  8. Acknowledgment
  9. License

Introduction

In this paper, we study image aesthetic quality assessment (AQA) and image aesthetic captioning (IAC) together and present a new IAC method termed Aesthetically Relevant Image Captioning (ARIC). Based on the observation that most textual comments on an image are about objects and their interactions rather than aspects of aesthetics, we first introduce the concept of the Aesthetic Relevance Score (ARS) of a sentence and develop a model to automatically label a sentence with its ARS. We then use the ARS to design the ARIC model, which includes an ARS-weighted IAC loss function and an ARS-based diverse aesthetic caption selector (DACS). We present extensive experimental results to show the soundness of the ARS concept and the effectiveness of the ARIC model by demonstrating that texts with higher ARS values predict the aesthetic ratings more accurately, and that the new ARIC model generates more accurate, more aesthetically relevant and more diverse image captions. Furthermore, a large new research database containing 510K images with over 5 million comments and 350K aesthetic scores is made available.

Multi-modal Aesthetic Quality Assessment

Dataset

1. Dataset preparation

You need to download the DPC2022 dataset.

The DPC2022 dataset can be downloaded from DPC2022 (access code: bo3c). Note: the link is sometimes unavailable because the dataset does not always pass Baidudisk's review for illegal data. Please let me know if the link is dead again.

2. Dataset format

We manually selected the 1022 most frequently appearing words related to image aesthetics, such as shot, color, composition, light, focus, background, subject, detail, contrast, etc. We also manually selected 2146 words related to objects, such as eye, sky, face, ribbon, water, tree, flower, expression, hand, bird, glass, dog, hair, cat, smile, sun, window, car, etc. Both word lists are available in the directory aesthetic_and_object_word in DPC2022.
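As a rough illustration of how such word lists can be used, the sketch below counts aesthetic and object words in a sentence. It uses only the example words quoted above rather than the full lists, and the helper `count_vocabulary_hits` is hypothetical: the actual scoring used to build DPC2022 is not described here and may tokenize or normalize differently.

```python
import re

# Excerpts of the two word lists quoted above; the full lists are in the
# aesthetic_and_object_word directory of DPC2022.
AESTHETIC_WORDS = {"shot", "color", "composition", "light", "focus",
                   "background", "subject", "detail", "contrast"}
OBJECT_WORDS = {"eye", "sky", "face", "ribbon", "water", "tree", "flower",
                "expression", "hand", "bird", "glass", "dog", "hair", "cat",
                "smile", "sun", "window", "car"}


def count_vocabulary_hits(sentence):
    """Return (aesthetic_count, object_count) for one sentence.

    Hypothetical helper: lowercases the text, keeps alphabetic tokens only,
    and counts exact matches (no stemming or lemmatization).
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    aesthetic = sum(token in AESTHETIC_WORDS for token in tokens)
    objects = sum(token in OBJECT_WORDS for token in tokens)
    return aesthetic, objects


print(count_vocabulary_hits("Great composition and light, but the sky is dull."))
# prints (2, 1)
```

Exact matching keeps the sketch simple; plural forms such as "colors" would be missed without stemming.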

The following illustrates a data annotation example, 245925.json:

   "ID": "245925",
   ...
   "avg_all_users_score": 5.3108, (aesthetic score)
   ...
   "raw_comments": [...],  (raw comments without any preprocessing)
   "clean_comments": [...], (each comment split into sentences, with emoji, strange spellings, symbols and punctuation marks removed)
   "score_comments": [      (labeled sentences)
       {
            "comment": {
                "comment": sentence text
                "tfidf_cscore": normalized TF-IDF score
                "length": length of the sentence
                "words": [...] the words and their parts of speech after tokenizing the sentence
            },
            "sum_score": ARS score
            "norm_sum_score": normalized ARS score
            "tfidf": 
            "length": score based on the number of words in the sentence
            "aesthetic": score based on the number of aesthetic words in the sentence
            "objective": score based on the number of object words in the sentence
            "sentiment": sentiment score
            "sentiments": {
                "neg_score": 
                "neu_score": 
                "pos_score": 
            },
            "clip_score": CLIP score
            "norm_clip_score": normalized CLIP score
        },
   ]
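A minimal sketch of reading one annotation file and ranking its sentences by normalized ARS. It assumes that `score_comments` is a list of objects, each holding a nested `comment` object and a `norm_sum_score` field, as the illustration above suggests; the real files may differ. The function name `top_sentences_by_ars` and the demo record are made up for illustration.

```python
import json
import os
import tempfile


def top_sentences_by_ars(annotation_path, k=3):
    """Return the k (sentence, score) pairs with the highest norm_sum_score."""
    with open(annotation_path, encoding="utf-8") as f:
        record = json.load(f)
    scored = [
        (entry["comment"]["comment"], entry["norm_sum_score"])
        for entry in record["score_comments"]
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up record for demonstration only; the values are not from DPC2022.
demo = {
    "ID": "000000",
    "avg_all_users_score": 5.0,
    "score_comments": [
        {"comment": {"comment": "nice dog"}, "norm_sum_score": 0.2},
        {"comment": {"comment": "great use of light and contrast"},
         "norm_sum_score": 0.9},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "000000.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(demo, f)
    print(top_sentences_by_ars(path, k=1))
```

Sorting on `norm_sum_score` rather than `sum_score` is an arbitrary choice here; either field could be used in the same way.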



Installation

Known issue: with torch version 1.7.1, distributed training (data distribution) has some bugs whose cause I have not yet identified.

  • Python 3.8.0
  • torch 1.7.1+cu110
  • NVIDIA GPU + CUDA
  1. Clone repo

    git clone https://github.com/PengZai/ARIC.git

  2. Install dependent packages

    cd ARIC
    conda create -n ARIC python=3.8.0
    conda activate ARIC
    pip install -r requirements.txt 
    pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
    pip install git+https://github.com/openai/CLIP.git
    
    

Train

1. Prepare components

You need to download the pretrained models and metrics and move them to workspaceFolder (access code: spd5).

2. Modify Config.py

Change the following paths to the correct locations on your machine.

    --huggingface_model_root (huggingface model bert, roberta, vit)
    --image_root 
    --image_bottom_up_attention_feature_root
    --data_root (directory clean_and_all_score)
    --train_root (directory containing train.csv)
    --comments_root (directory containing train_split_by_val_scored_comment_id_pair.csv)
    --test_and_val_root (directory containing test.csv and val.csv)
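Before launching training, it can help to confirm that every path above exists. The sketch below is a stand-alone check, not part of the repository: it redeclares the same flag names with argparse (the real Config.py may define them differently) and assumes each one points to a directory. `build_parser` and `missing_paths` are hypothetical helper names.

```python
import argparse
import os

# Flag names copied from the list above.
PATH_FLAGS = [
    "huggingface_model_root",
    "image_root",
    "image_bottom_up_attention_feature_root",
    "data_root",
    "train_root",
    "comments_root",
    "test_and_val_root",
]


def build_parser():
    """Build a parser exposing one string option per path flag."""
    parser = argparse.ArgumentParser(description="Path sanity check (illustrative)")
    for name in PATH_FLAGS:
        parser.add_argument("--" + name, type=str, default="")
    return parser


def missing_paths(args):
    """Return the flag names whose value is not an existing directory."""
    return [name for name in PATH_FLAGS
            if not os.path.isdir(getattr(args, name))]


# Example: only --image_root is set, so the other flags are reported.
args = build_parser().parse_args(["--image_root", os.getcwd()])
for name in missing_paths(args):
    print("missing or not a directory: --" + name)
```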

3. Train VisualGPT and generate captions using the Diverse Aesthetic Caption Selector (DACS)

   bash dist_train.sh

4. VisualGPT

5. DACS

Test

1. Image AQA based on generated captions

   bash dist_generation_eval.sh

Results

There are some Diverse Aesthetic Caption Selector (DACS) results

Citation

@article{zhong2022aesthetically,
  title={Aesthetically Relevant Image Captioning},
  author={Zhong, Zhipeng and Zhou, Fei and Qiu, Guoping},
  journal={arXiv preprint arXiv:2211.15378},
  year={2022}
}

Acknowledgment

Our code is built on VisualGPT. We thank the authors for sharing their code.

License

The code and DPC2022 dataset are released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License for NonCommercial use only. Any commercial use should get formal permission first.
