trotsky1997 / mathblackbox Goto Github PK

View Code? Open in Web Editor NEW

439.0 439.0 52.0 15 KB

Python 99.09% Shell 0.91%

mathblackbox's Introduction

Zhang Di - ShangHai AI Lab

Hello this is Zhang Di, An AI devloper at ShangHai AI Lab, and PhD students of Fudan Univ.

Former Full-time ML developer of Alibaba .Inc

Former M.Eng of USTC Robotics Lab and Internship at Ant Group, MIT Han Lab.

mathblackbox's People

Contributors

Stargazers

Watchers

Forkers

apollohuang1 banyan-god phungvanduy lazycat420 haailabs retromorph luca-git sithukaungset polya20 pathfindermilan serignecisse zplus1 ifitsmanu aiseei hbcbh1999 hughbzhang shadowkun maybelaterornot dtragoud xieisabug itharindu le-big-mac lilleswing taruvaid bird-laboratories marclove ashwinrajendraprasad citymap lawrencefeng17 jamesdhope netzkontrast alcidesmorales thomascherickal asapsav anietieakpan sonnydev fivejjs anzhihun jjhw moea-git hooji kaynewest codeaudit qianzhouyi2 bochen0909 lucasjbrew manfar babcockt18 tiandiao123 sidu tianyu-z bhaskatripathi

mathblackbox's Issues

MathBlackBox Jupyter Notebook

Good afternoon,

I have refactored the code for MathBlackBox to be a Jupyter notebook. It requires you put in your own API key, by default it uses DeepSeek V2 Coder, although I cannot guarantee functionality because I ran out of API credits. Any OpenAI compatible API will work.
LLaMA_3B 2.ipynb.zip
I hope someone tries it and sees if it works properly.

Comparing with self-consistency?

The method seems very powerful but expensive. I'm wondering how is it compared with self-consistency under similar computational budgets?

evaluation 步骤是如何工作的

小模型由于性能问题，很多时候没法对任务进行正确评估，这个你们是怎么解决的？

Application of This Method to Powerful Closed-Source Models (e.g., GPT-4 Turbo) and Its Effect?

The work in the paper seems very powerful and can make significant progress on the 8B open-source model. Then, when using this method on a more powerful model and taking advantage of better instruction-following ability and reasoning ability, can relatively breakthrough mathematical reasoning results be achieved?

Question about MathBlackBox

Hi! 😊 Your MathBlackBox project caught my eye. Amazing work on this python repository! ✨ Could you send me more details on Telegram? Also, please review my work and follow me on GitHub @nectariferous. Thanks!

Pass@k or Pass@1?

After seeing this work, I read the paper and found that the effect is very good. When reading the code, I found that this line of code seems to cause the indicator to degenerate from pass@1 to pass@k. Is my point of view correct?

MathBlackBox/run_with_earlystopping.py

Line 769 in 390a894

if check(ground_truth,answer) and 'testtime' in DATA_NAME:

I am not saying that pass@k is not a good indicator. The default evaluation indicator of gsm8k is usually equivalent to pass@1, but https://arxiv.org/pdf/2205.14318 also uses pass@k, and they are far from reaching this score. But if we can clearly mark the relationship between the value of k and the corresponding score, then we can better understand the paper.

Furthermore, it may be difficult to get the ground truth in reality, and pass@1 is actually more in line with reality. Do you know if there is a better way to evaluate pass@1?

If I understand it incorrectly, please kindly correct me.

ground truth knowledge

Hi, the paper was a very interesting read and this technique seems to have a lot of potential. However, looking at the code - if I understand it correctly - I have noticed that a significant portion of it is dependent on the knowledge of the correct answer - the 'ground truth'. If this knowledge is not available to the program until the final result validation, how does the program perform then? Thanks.

trotsky1997 / mathblackbox Goto Github PK

mathblackbox's Introduction

Zhang Di - ShangHai AI Lab

mathblackbox's People

Contributors

Stargazers

Watchers

Forkers

mathblackbox's Issues

MathBlackBox Jupyter Notebook

Comparing with self-consistency?

evaluation 步骤是如何工作的

Application of This Method to Powerful Closed-Source Models (e.g., GPT-4 Turbo) and Its Effect?

Question about MathBlackBox

Pass@k or Pass@1?

ground truth knowledge

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent