pipilurj / g-llava

Official github repo of G-LLaVA

Python 94.94% HTML 1.49% JavaScript 1.94% CSS 0.35% Shell 1.28%

g-llava's Introduction

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

This repository contains the code and data for the paper titled "G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model".

Paper, Dataset, Models (G-LLaVA-7B, G-LLaVA-13B)

Install Packages

cd G-LLaVA
conda create -n gllava python=3.10 -y
conda activate gllava
pip install -e .

Enable Deepspeed

pip install deepspeed

Data Preparation

Download our dataset.

Place the data under playground/data. Here is the data structure:

playground/data/
├── images/
│   ├── geo3k/
│   ├── geoqa_plus/
│   ├── test/
├── alignment.json
├── qa_tuning.json
├── test_question.jsonl
├── test_answers.jsonl

"test_question.jsonl" and "test_answers.jsonl" correspond to the test set of GeoQA.

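Before starting training, it can help to confirm that the download matches the layout above. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` is hypothetical, and the entry names are copied from the tree listing as written. Note that the evaluation commands further down spell the question file `test_questions.jsonl`, so adjust the list to whatever your download actually contains.

```python
from pathlib import Path

# Entries copied from the tree listing above; a trailing "/" marks a directory.
# Assumption: adjust these names if your download uses different spellings.
EXPECTED_ENTRIES = [
    "images/geo3k/",
    "images/geoqa_plus/",
    "images/test/",
    "alignment.json",
    "qa_tuning.json",
    "test_question.jsonl",
    "test_answers.jsonl",
]


def missing_entries(root, expected=EXPECTED_ENTRIES):
    """Return the expected entries that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    missing = []
    for entry in expected:
        path = root / entry
        # Directories and files are checked separately so a stray file
        # named like a directory is still reported as missing.
        exists = path.is_dir() if entry.endswith("/") else path.is_file()
        if not exists:
            missing.append(entry)
    return missing
```

Calling `missing_entries("playground/data")` from the repository root returns an empty list when everything is in place.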
First Stage Alignment

This stage enables the model to better interpret the content of geometric figures.

bash scripts/run_alignment.sh

Second Stage Instruction Tuning

This stage equips the model with stronger ability for solving geometry problems.

bash scripts/run_qa.sh

Evaluation

Generate responses from the model.

bash scripts/eval_multi.sh \
                path-to-model \
                playground/data/test_questions.jsonl \
                path-to-output \
                path-to-image-folder \
                num_gpus \
                temperature

Run automatic evaluation to calculate the accuracy.

python scripts/geo_acc_calculate.py \
             --ground_truth_file playground/data/test_answers.jsonl \
             --predictions_file path-to-output-file

Here are some example commands:

bash scripts/eval_multi.sh /path/to/checkpoint/ playground/data/test_questions.jsonl results_try/Gllava-test playground/data/images/ 8 0

python scripts/geo_acc_calculate.py  --ground_truth_file playground/data/test_answers.jsonl --predictions_file results_try/Gllava-test_merged.jsonl
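For intuition about what the accuracy step computes, here is a minimal sketch of exact-match scoring over two JSON Lines files. It is not the repository's `scripts/geo_acc_calculate.py`, whose matching rules may differ. The field names `question_id` and `text` and the helper names `load_jsonl` and `accuracy` are assumptions made for illustration; check the actual files before relying on them.

```python
import json
from pathlib import Path


def load_jsonl(path, id_key="question_id", value_key="text"):
    """Read a JSON Lines file into {id: value}.

    The default key names are assumptions, not taken from the repository.
    """
    records = {}
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            row = json.loads(line)
            records[row[id_key]] = row[value_key]
    return records


def accuracy(ground_truth, predictions):
    """Fraction of ground-truth ids whose prediction matches after
    trimming whitespace and ignoring case. Missing predictions count as wrong."""
    if not ground_truth:
        return 0.0
    correct = sum(
        1
        for qid, answer in ground_truth.items()
        if str(predictions.get(qid, "")).strip().upper() == str(answer).strip().upper()
    )
    return correct / len(ground_truth)
```

With both files loaded through `load_jsonl`, `accuracy(ground_truth, predictions)` gives a value between 0 and 1. A real scorer would first need to extract the chosen option from the model's free-form response, which this sketch does not attempt.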

Acknowledgement

The project is built on top of the amazing LLaVA repository. Thanks for their great work!

If you find our code and dataset helpful to your research, please consider citing us with this BibTeX:

@misc{gao2023gllava,
      title={G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model}, 
      author={Jiahui Gao and Renjie Pi and Jipeng Zhang and Jiacheng Ye and Wanjun Zhong and Yufei Wang and Lanqing Hong and Jianhua Han and Hang Xu and Zhenguo Li and Lingpeng Kong},
      year={2023},
      eprint={2312.11370},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}


g-llava's Issues

Reproduce GeoQA: acc without alignment phase is larger than with alignment

Thank you for your brilliant work on Geometric problems.

I used the provided scripts and found that the accuracy of models without alignment is 64.5%, while the one with alignment is 63.3%, which is quite different from the original paper.

It seems that the alignment phase is unnecessary. What is the problem?

Evaluation on MathVista

Hello! Thanks for your great work, which is very insightful.

I'm now trying to reproduce the results in the paper, and I have already succeeded on the Geo.

However, I'm struggling with MathVista and I’m very confused as to why there is no content related to MathVista in the code repository, which is the main benchmark demonstrated in the paper.

Can you show how to implement the evaluation?

[Question] Help regarding evaluation

@SumilerGAO @pipilurj Could you please help me a little with evaluating a LLaVA model on a custom dataset? We are doing something similar to your work, but in a different domain, and are now in the evaluation phase.
Could I please get some help? What ways are there to evaluate LLaVA?

Below is a screenshot of an evaluation script I ran from the original LLaVA repo. Could you please explain what the values below mean?
I executed: llava/eval/summarize_gpt_review.py

Checkpoint request

Are the checkpoints provided anywhere, either the original ones or any of the replicated ones?

Are there any training details that need attention?

I tried training using the code from this repository, but the model I trained myself performed poorly on MathVista.
Could you tell me if there are any training details that need attention, and is it possible to provide the official checkpoint?

Detailed training parameters

Thank you for your good work, but I found that the parameters in the training script in the code repository are different from those in the article. What are the real training parameters?

Could you please upload the checkpoint after the two-stage fine-tuning?

Hello, thanks for your great work!
It is a bit hard to fine-tune on my own machine (A100). Could you please upload the checkpoint after the two-stage fine-tuning? Thanks a lot!

Where is the appendix?

The paper doesn't have an appendix.
XD

share the prompts?

In the paper you said:

"we use text-only ChatGPT 3.5 to create image captions based on these human-labeled QA pairs".

Would you share the prompt you used?

Unable to load the Huggingface model due to missing "preprocessor_config.json"

# Load model directly
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("renjiepi/G-LLaVA-7B")
model = AutoModelForCausalLM.from_pretrained("renjiepi/G-LLaVA-7B")

Error:

OSError: renjiepi/G-LLaVA-7B does not appear to have a file named preprocessor_config.json. Checkout 'https://huggingface.co/renjiepi/G-LLaVA-7B/tree/main' for available files.

I am using transformers==4.41.0.

The difference between the two training scripts

Hello author, first of all thank you for your excellent work.

Secondly, I have a question about training details.
Apart from the model path and data set path, what are the differences between run_alignment.sh and run_qa.sh?
Why is there freeze_backbone in run_alignment.sh but not in run_qa.sh?

Looking forward to your reply
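One way to answer the first question for yourself is to compare the command-line flags passed in the two scripts. The following is a rough sketch under stated assumptions: it presumes both scripts pass options in the usual `--name value` form, and the helper names `parse_flags` and `diff_flags` are hypothetical. It only lists the differences; it does not explain why the authors chose them.

```python
import re

# Matches "--name", optionally followed by a value that is not itself a flag
# and is not a line-continuation backslash.
FLAG_PATTERN = re.compile(r"--([A-Za-z0-9_\-]+)(?:[ =]+(?!--)([^\s\\]+))?")


def parse_flags(script_text):
    """Map each --flag in a shell script to its value (True for bare flags)."""
    flags = {}
    for name, value in FLAG_PATTERN.findall(script_text):
        flags[name] = value or True
    return flags


def diff_flags(first, second):
    """Return {flag: (value_in_first, value_in_second)} for flags that differ.

    A flag absent from one script shows up as None on that side.
    """
    differences = {}
    for name in sorted(set(first) | set(second)):
        left, right = first.get(name), second.get(name)
        if left != right:
            differences[name] = (left, right)
    return differences
```

To use it, read `scripts/run_alignment.sh` and `scripts/run_qa.sh` as text, pass each through `parse_flags`, and hand the two results to `diff_flags`. A flag such as `freeze_backbone` that appears in only one script will be reported with `None` on the other side.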

image-caption pairs

Where are the image-caption pairs? The linked dataset contains only the QA pairs.

How you translated GeoQA-Plus into English

Thanks for your brilliant work on geometry problems.

May I ask how you translated GeoQA-Plus into English? Did you use GPT for the translation? And how did you ensure the consistency of the original problem's meaning?

By the way, which type of GPU did you use for training, and how many? Thank you.