
Plot2Code Benchmark

The Plot2Code benchmark is now open-sourced on Hugging Face (ARC Lab) and GitHub. More information can be found in our paper.

This repository contains the code for an evaluation pipeline that generates Python code from reference plots, executes the generated code to draw plots, and then calculates various evaluation metrics to assess the quality of the generated code.
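For intuition, the execution step can be pictured as running each generated snippet in a throwaway namespace and saving the resulting figure. The sketch below is illustrative only; the helper name and file handling are assumptions, not the repository's actual implementation.

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

def run_generated_code(code_str, output_path):
    # Hypothetical helper: execute a generated pyplot snippet and save whatever it draws.
    namespace = {}
    try:
        exec(code_str, namespace)      # run the model-generated code
        plt.savefig(output_path)      # save the rendered plot
        return True
    except Exception as err:          # generated code may fail to run
        print(f"Execution failed: {err}")
        return False
    finally:
        plt.close("all")              # avoid leaking figures between samples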

Why do we need Plot2Code?

  • 🧐 While MLLMs have demonstrated potential in visual contexts, their capabilities in visual coding tasks have not been thoroughly evaluated. Plot2Code offers a platform for comprehensive assessment of these models.

  • 🤗 We initiated the Plot2Code project so that anyone can gauge how well AI assistants generate code that renders a given reference plot, keeping the evaluation relevant to real-world applications.

  • 💻 Plot2Code supports both modalities (text and images) for input and output, enabling an exploration of each modality's influence.

Supported Tasks

Plot2Code is primarily designed as a benchmark for code generation from scientific plots. Specifically, it supports the following settings:

  • Text2Image: We provide instructions to the assistant, requesting it to generate pyplot code and subsequently render the plots.
  • Image2Image: Referred to as the Direct Asking setting in our paper. We input the reference plot directly and ask the assistant to generate pyplot code that renders a similar plot.
  • I+T 2 Image: Referred to as the Conditional Asking setting in our paper. Both the instruction and the reference plot are provided as input.

By employing these settings, we can investigate the impact of each input modality on the quality of the final rendered plots.
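For illustration, the three settings roughly correspond to which parts of the user message are filled in. The snippet below sketches this with the openai>1.12.0 chat message format; the helper name and prompt handling are placeholders, not the repository's actual prompts.

import base64

def build_user_content(instruction=None, image_path=None):
    # Hypothetical helper: Text2Image uses only the instruction, Image2Image only the
    # reference plot, and I+T 2 Image (Conditional Asking) uses both.
    content = []
    if instruction is not None:
        content.append({"type": "text", "text": instruction})
    if image_path is not None:
        with open(image_path, "rb") as f:
            encoded = base64.b64encode(f.read()).decode()
        content.append({"type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{encoded}"}})
    return content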

Requirements

  • NumPy
  • Matplotlib==3.8.4
  • Pillow
  • Levenshtein
  • openai>1.12.0

You can install the required packages using the following command:

pip install -r requirements.txt

How to Download

You can use the following commands to download the dataset:

git lfs install
mkdir data
cd data
git clone https://huggingface.co/datasets/TencentARC/Plot2Code
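If the Hugging Face repository is in a format the datasets library can read (an assumption; the git clone above is the documented route), it may also be loadable directly:

from datasets import load_dataset

# Assumes the TencentARC/Plot2Code repo exposes splits readable by the datasets library.
dataset = load_dataset("TencentARC/Plot2Code")
print(dataset)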

Usage

  1. Generate code from the reference plots. Add --instruct for the conditional setting.
export OPENAI_API_KEY=[API_KEY]
export OPENAI_API_BASE=[API_BASE]

# GPT-4V generate code (direct asking)
python -m plot2code.gpt4v_generate_code --prompt_strategy default

# GPT-4V generate code (conditional asking)
python -m plot2code.gpt4v_generate_code --prompt_strategy default --instruct

# GPT-4V generate code (conditional asking with CoT)
python -m plot2code.gpt4v_generate_code --prompt_strategy CoT --instruct
  2. Execute the generated code to render the plots.
python -m plot2code.execute_generated_code --model_name "$model_name" --prompt_strategy $prompt_strategy
  3. Evaluate the similarity between the generated plots and the ground-truth plots.
echo "Calculating text match score..."
python -m plot2code.eval.text_match_score  --model_name "$model_name"  --prompt_strategy $prompt_strategy

echo "Calculating gpt-4v evaluation score..."
python -m plot2code.eval.gpt4v_evaluations_score  --model_name "$model_name"  --prompt_strategy $prompt_strategy

echo "Combining evaluation results..."
python -m plot2code.eval.combine_evaluation_results  --model_name "$model_name"  --prompt_strategy $prompt_strategy

See scripts for more details.
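As a rough illustration of what a text-match style comparison can look like (not the exact metric defined in the paper), one could compare the text elements of the generated and reference figures with the Levenshtein package from the requirements:

import Levenshtein
from matplotlib.text import Text

def figure_texts(fig):
    # Collect all text drawn on a matplotlib figure (titles, tick labels, legends, ...).
    return " ".join(t.get_text() for t in fig.findobj(Text) if t.get_text())

def text_match_ratio(generated_fig, reference_fig):
    # Similarity in [0, 1] between the text of two figures (illustrative only).
    return Levenshtein.ratio(figure_texts(generated_fig), figure_texts(reference_fig))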

News

  • 🔥 [2024/05] We open-sourced the Plot2Code benchmark. Stay tuned for this project! 😆

License

This project is open-sourced under the Apache-2.0 license. The evaluation code and datasets are fully open for academic research and can be used for commercial purposes with official written permission.

Citation

The code and models in this repository were mostly developed for, or derived from, the paper below. Please cite it if you find the repository helpful.

@misc{wu2024plot2code,
      title={Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots}, 
      author={Chengyue Wu and Yixiao Ge and Qiushan Guo and Jiahao Wang and Zhixuan Liang and Zeyu Lu and Ying Shan and Ping Luo},
      year={2024},
      eprint={2405.07990},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
