dangnha / zaloai2023-elementary-math-solving

This project forked from dinhquy-nguyen-1704/zaloai2023-elementary-math-solving

Baseline achieving 0.8 accuracy on the private test set in the ZaloAI Challenge 2023 Elementary Math Solving

ZaloAI2023-Elementary-Math-Solving

1. Introduction

This repository presents a baseline solution for the Elementary Math Solving task of the ZaloAI Challenge 2023. It builds on the mathematical reasoning capabilities of the DeepSeek-Math model and reaches about 80% accuracy on the competition's private test set.

2. Getting Started

git clone https://github.com/dinhquy-nguyen-1704/ZaloAI2023-Elementary-Math-Solving.git
cd ZaloAI2023-Elementary-Math-Solving
pip install -r requirements.txt
huggingface-cli login
wandb login

3. Finetune

Fine-tuning uses only the competition's training set of just over 1,000 samples.

To rerun fine-tuning, execute the following command:

python main.py --hf_account <HuggingFace account> --model_hf_name <HuggingFace model's name>

You can also find the fine-tuned model I've trained at [🤗 Models] and the merged version at [🤗 Models].
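The two flags used throughout this README look like the two halves of a Hugging Face repository id (`account/model`). The sketch below shows one way a script could parse them and join them; the function name `build_repo_id` and the joining step are assumptions for illustration, not code taken from `main.py`.

```python
import argparse


def build_repo_id(argv=None):
    """Parse the two CLI flags and join them as 'account/model'.

    Hypothetical helper: the real scripts may combine the flags differently.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--hf_account", required=True)
    parser.add_argument("--model_hf_name", required=True)
    args = parser.parse_args(argv)
    return f"{args.hf_account}/{args.model_hf_name}"


# Passing the arguments explicitly instead of reading them from sys.argv.
repo_id = build_repo_id(
    ["--hf_account", "quynguyen1704",
     "--model_hf_name", "deepseek-math-7b-rl-zaloai-v2"]
)
print(repo_id)
```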

4. Inference

To run inference with a fine-tuned model on an elementary math multiple-choice question, use one of the following commands.

Chain of Thought:

python inference_cot.py --hf_account <HuggingFace account> --model_hf_name <HuggingFace model's name>

Few-shot Chain of Thought:

python inference_few_shot_cot.py --hf_account <HuggingFace account> --model_hf_name <HuggingFace model's name>

You can also run inference with the model I fine-tuned:

Chain of Thought:

python inference_cot.py --hf_account quynguyen1704 --model_hf_name deepseek-math-7b-rl-zaloai-v2

Few-shot Chain of Thought:

python inference_few_shot_cot.py --hf_account quynguyen1704 --model_hf_name deepseek-math-7b-rl-zaloai-v2
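To make the difference between the two prompting modes concrete, here is a minimal sketch of how a chain-of-thought prompt and a few-shot chain-of-thought prompt might be assembled. The template wording, the sample questions, and both function names are illustrative assumptions; the prompts actually used by `inference_cot.py` and `inference_few_shot_cot.py` may differ.

```python
def build_cot_prompt(question, choices):
    """Zero-shot CoT: the question, its options, and a step-by-step cue."""
    options = "\n".join(choices)
    return (
        f"Question: {question}\n{options}\n"
        "Solve step by step, then give the final answer as A, B, C or D.\n"
    )


def build_few_shot_cot_prompt(question, choices, examples):
    """Few-shot CoT: worked examples are prepended to the zero-shot prompt."""
    shots = [
        build_cot_prompt(ex["question"], ex["choices"]) + ex["solution"] + "\n"
        for ex in examples
    ]
    return "\n".join(shots + [build_cot_prompt(question, choices)])


# Made-up worked example, used only to show the prompt layout.
examples = [
    {
        "question": "What is 2 + 3?",
        "choices": ["A. 4", "B. 5", "C. 6", "D. 7"],
        "solution": "2 + 3 = 5, so the answer is B.",
    }
]
prompt = build_few_shot_cot_prompt(
    "What is 7 x 6?", ["A. 36", "B. 40", "C. 42", "D. 48"], examples
)
print(prompt)
```

The few-shot variant spends part of the context on worked examples, so it trades prompt length for a demonstration of the expected answer format.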

5. Evaluate

To evaluate the model's accuracy on the private test set, run one of the following commands:

Chain of Thought:

python evaluate_cot.py --hf_account <HuggingFace account> --model_hf_name <HuggingFace model's name> --max_new_tokens <max new tokens>

Few-shot Chain of Thought:

python evaluate_few_shot_cot.py --hf_account <HuggingFace account> --model_hf_name <HuggingFace model's name> --max_new_tokens <max new tokens>

Replace the placeholders with your own model's account and name to evaluate it.
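The accuracy reported by evaluation is the fraction of questions whose predicted option matches the gold option. A minimal sketch of that computation follows; the function name and the sample lists are hypothetical and are not taken from the evaluation scripts.

```python
def accuracy(predictions, gold):
    """Fraction of positions where the predicted letter equals the gold letter."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Made-up predictions for four questions; three of them are correct.
score = accuracy(["A", "B", "C", "D"], ["A", "C", "C", "D"])
print(f"{score:.0%}")
```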

Chain of Thought with vLLM:

You can also evaluate with vLLM, using the merged model named in the command below. With vLLM, evaluating all 332 questions in the test set takes about 30 minutes, compared with about 4 hours without it. The trade-off is slightly lower answer quality.

python evaluate_vllm.py --hf_account quynguyen1704 --model_hf_name deepseek-math-7b-rl-zaloai-vllm --max_new_tokens 2048
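As a quick check on these timings, the snippet below converts the quoted totals (332 questions, about 30 minutes with vLLM, about 4 hours without) into per-question latency and a speedup factor. The inputs are the approximate figures stated above, so the outputs are rough estimates.

```python
NUM_QUESTIONS = 332
VLLM_MINUTES = 30       # "about 30 minutes" with vLLM
BASELINE_HOURS = 4      # "4 hours" without vLLM

vllm_sec_per_q = VLLM_MINUTES * 60 / NUM_QUESTIONS
baseline_sec_per_q = BASELINE_HOURS * 3600 / NUM_QUESTIONS
speedup = (BASELINE_HOURS * 60) / VLLM_MINUTES

print(f"vLLM:     {vllm_sec_per_q:.1f} s/question")      # 5.4 s/question
print(f"baseline: {baseline_sec_per_q:.1f} s/question")  # 43.4 s/question
print(f"speedup:  {speedup:.0f}x")                       # 8x
```

The baseline estimate of roughly 43 seconds per question is broadly in line with the 40 to 60 seconds per question noted under Limitations.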

6. Results

The following table summarizes the results of the model after fine-tuning. When the model runs out of tokens before producing a final answer (A, B, C, or D), answer E is output instead.

| Model | Max_new_tokens | Prompt | Note | Accuracy |
|---|---|---|---|---|
| deepseek-math-7b-rl | 500 | CoT | | 67% |
| deepseek-math-7b-rl | 1024 | CoT | | 82% |
| deepseek-math-7b-rl | 1024 | Few-shot CoT | | 80% |
| deepseek-math-7b-rl | 2048 | CoT | vLLM | 80% |
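The fallback to answer E can be sketched as a small parsing step: look for a final option letter in the generated text and return E when none is found, for example because generation was cut off. The regular expression and the function name `extract_answer` are assumptions for illustration; the README does not show how the repository parses answers.

```python
import re

# Matches phrases such as "answer is C", "Answer: B" or "answer: (D)".
ANSWER_PATTERN = re.compile(r"[Aa]nswer(?:\s+is)?\s*:?\s*\(?([ABCD])\b")


def extract_answer(generated_text):
    """Return the last A-D letter that follows an answer cue, else 'E'."""
    matches = ANSWER_PATTERN.findall(generated_text)
    return matches[-1] if matches else "E"


print(extract_answer("So 6 x 7 = 42. The final answer is C."))
print(extract_answer("First, compute 12 x 3 = 36, then add"))  # truncated output
```

This is a simple heuristic: a truncated generation never reaches the answer cue, so it maps to E and is counted as incorrect.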

7. Limitations

DeepSeek-Math-7B-RL is an LLM with strong mathematical reasoning capabilities in English, Chinese, and Vietnamese. However, this approach still has drawbacks:

  • With max_new_tokens = 500, there are many questions in the private dataset where the model doesn't have enough tokens to generate a final answer.
  • With max_new_tokens = 1024, inference is slow, averaging about 40 to 60 seconds per question.
