g-llava's Introduction

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

This repository contains the code and data for the paper titled "G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model".

Paper, Dataset, Models (G-LLaVA-7B, G-LLaVA-13B)

Install Packages

cd G-LLaVA
conda create -n gllava python=3.10 -y
conda activate gllava
pip install -e .

Enable DeepSpeed

pip install deepspeed

Data Preparation

Download our dataset.

Place the data under playground/data. Here is the data structure:

playground/data/
├── images/
│   ├── geo3k/
│   ├── geoqa_plus/
│   ├── test/
├── alignment.json
├── qa_tuning.json
├── test_question.jsonl
├── test_answers.jsonl
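
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` is hypothetical, and the question file is accepted under either `test_question.jsonl` or `test_questions.jsonl` because both spellings appear in this README.

```python
from pathlib import Path

# Directory and file names copied from the tree above.
EXPECTED_DIRS = ["images/geo3k", "images/geoqa_plus", "images/test"]
EXPECTED_FILES = ["alignment.json", "qa_tuning.json", "test_answers.jsonl"]
# Both spellings of the question file appear in this README, so either is accepted.
QUESTION_FILE_NAMES = ["test_question.jsonl", "test_questions.jsonl"]


def missing_entries(root):
    """Return the expected entries that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    if not any((root / name).is_file() for name in QUESTION_FILE_NAMES):
        missing.append(" or ".join(QUESTION_FILE_NAMES))
    return missing


if __name__ == "__main__":
    # Run from the repository root; prints nothing when the layout is complete.
    for entry in missing_entries("playground/data"):
        print(f"missing: {entry}")
```

The check only looks at names and does not validate the contents of the JSON or JSONL files.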

"test_question.jsonl" and "test_answers.jsonl" correspond to the test set of GeoQA.

First Stage Alignment

This stage enables the model to better interpret the content of geometric figures.

bash scripts/run_alignment.sh

Second Stage Instruction Tuning

This stage equips the model with stronger ability for solving geometry problems.

bash scripts/run_qa.sh

Evaluation

Generate responses from the model.

bash scripts/eval_multi.sh \
                path-to-model \
                playground/data/test_questions.jsonl \
                path-to-output \
                path-to-image-folder \
                num_gpus \
                temperature

Run automatic evaluation to calculate the accuracy.

python scripts/geo_acc_calculate.py \
             --ground_truth_file playground/data/test_answers.jsonl \
             --predictions_file path-to-output-file
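
For readers who want to see what such an accuracy computation amounts to, here is a minimal sketch. It is not the logic of `scripts/geo_acc_calculate.py`, whose internals are not described here. It assumes that both files can be reduced to a mapping from a question identifier to an already-extracted answer string. The field names passed to `load_jsonl_field` are hypothetical and must be adapted to the actual JSONL schema.

```python
import json


def load_jsonl_field(path, key_field, value_field):
    """Map key_field -> value_field for every JSON object in a JSONL file.

    The field names are supplied by the caller; they are assumptions,
    not the schema used by this repository.
    """
    table = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                record = json.loads(line)
                table[record[key_field]] = record[value_field]
    return table


def accuracy(ground_truth, predictions):
    """Fraction of ground-truth items whose prediction matches.

    Comparison ignores case and surrounding whitespace.
    Questions with no prediction count as wrong.
    """
    if not ground_truth:
        raise ValueError("ground truth is empty")
    correct = sum(
        1
        for key, answer in ground_truth.items()
        if str(predictions.get(key, "")).strip().upper() == str(answer).strip().upper()
    )
    return correct / len(ground_truth)
```

If the model output is free-form text rather than a bare answer choice, an extraction step would be needed before calling `accuracy`; that step is outside this sketch.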

Here are example commands:

bash scripts/eval_multi.sh /path/to/checkpoint/ playground/data/test_questions.jsonl results_try/Gllava-test playground/data/images/ 8 0

python scripts/geo_acc_calculate.py --ground_truth_file playground/data/test_answers.jsonl --predictions_file results_try/Gllava-test_merged.jsonl
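
The positional order of the `eval_multi.sh` arguments is easy to get wrong, so the sketch below assembles the command from named parameters. The helper names are hypothetical. The `_merged.jsonl` suffix is an assumption inferred from the example pair above, where the output `results_try/Gllava-test` is later read as `results_try/Gllava-test_merged.jsonl`.

```python
import shlex


def build_eval_command(model_path, question_file, output_prefix,
                       image_folder, num_gpus, temperature):
    """Return the argument list for scripts/eval_multi.sh in the order shown above."""
    return [
        "bash", "scripts/eval_multi.sh",
        str(model_path),
        str(question_file),
        str(output_prefix),
        str(image_folder),
        str(num_gpus),
        str(temperature),
    ]


def merged_predictions_path(output_prefix):
    # Assumption: the merged predictions file is the output prefix
    # plus "_merged.jsonl", as in the example commands.
    return f"{output_prefix}_merged.jsonl"


cmd = build_eval_command(
    model_path="/path/to/checkpoint/",
    question_file="playground/data/test_questions.jsonl",
    output_prefix="results_try/Gllava-test",
    image_folder="playground/data/images/",
    num_gpus=8,
    temperature=0,
)
print(shlex.join(cmd))
print(merged_predictions_path("results_try/Gllava-test"))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting issues for paths containing spaces.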

Acknowledgement

The project is built on top of the amazing LLaVA repository. Thanks for their great work!

If you find our code and dataset helpful to your research, please consider citing us with this BibTeX:

@misc{gao2023gllava,
      title={G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model}, 
      author={Jiahui Gao and Renjie Pi and Jipeng Zhang and Jiacheng Ye and Wanjun Zhong and Yufei Wang and Lanqing Hong and Jianhua Han and Hang Xu and Zhenguo Li and Lingpeng Kong},
      year={2023},
      eprint={2312.11370},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

g-llava's People

Contributors

sumilergao, pipilurj

g-llava's Issues

Reproduce GeoQA: acc without alignment phase is larger than with alignment

Thank you for your brilliant work on Geometric problems.

I used the provided scripts and found that the model trained without the alignment stage reaches 64.5% accuracy, while the one trained with alignment reaches 63.3%, which differs from the results reported in the paper.

It seems that the alignment phase is unnecessary. What is the problem?

Evaluation on MathVista

Hello! Thanks for your great work, which is very insightful.

I'm now trying to reproduce the results in the paper, and I have already succeeded on the Geo.

However, I'm struggling with MathVista, and I'm confused as to why the code repository contains nothing related to MathVista, which is the main benchmark reported in the paper.

Can you show how to implement the evaluation?

[Question] Help regarding evaluation

@SumilerGAO @pipilurj Could you please help me a little with evaluating a LLaVA model on a custom dataset? We are doing something similar to your work, but in a different domain, and are now in the evaluation phase.
Could I please get some help? For example, what ways are there to evaluate LLaVA?

Below is a screenshot of an evaluation script I ran from the original LLaVA repo. Could you please explain what the values below mean?
I executed: llava/eval/summarize_gpt_review.py

image

Checkpoint request

Are the checkpoints provided anywhere, either the original ones or any of the replicated ones?

Are there any training details that need attention?

I tried training with the code from this repository, but the model I trained performed poorly on MathVista.
Could you tell me whether there are any training details that need attention, and whether it is possible to provide the official checkpoint?

Detailed training parameters

image
Thank you for your good work. I found that the parameters in the training script in the code repository differ from those in the paper. What are the actual training parameters?

share the prompts?

In the paper you said:

"we use text-only ChatGPT 3.5 to create image captions based on these human-labeled QA pairs".

Would you share the prompt you used?

Unable to load the Huggingface model due to missing "preprocessor_config.json"

# Load model directly
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("renjiepi/G-LLaVA-7B")
model = AutoModelForCausalLM.from_pretrained("renjiepi/G-LLaVA-7B")

Error:

OSError: renjiepi/G-LLaVA-7B does not appear to have a file named preprocessor_config.json. Checkout 'https://huggingface.co/renjiepi/G-LLaVA-7B/tree/main' for available files.

I am using transformers==4.41.0.

The difference between the two training scripts

Hello author, first of all thank you for your excellent work.

Secondly, I have a question about training details.
Apart from the model path and dataset path, what are the differences between run_alignment.sh and run_qa.sh?
Why is freeze_backbone set in run_alignment.sh but not in run_qa.sh?

Looking forward to your reply.

How did you translate GeoQA-Plus into English?

Thanks for your brilliant work on geometry problems.

May I ask how you translated GeoQA-Plus into English? Did you use GPT for the translation? And how did you ensure that the original problem's meaning was preserved?

By the way, which type of GPU, and how many, did you use for training? Thank you.

Dataset release?

This is great work. May I ask when you will release the dataset? Thanks!
