Giter Site home page Giter Site logo

mathblackbox's Introduction

Zhang Di - ShangHai AI Lab

Hello this is Zhang Di, An AI devloper at ShangHai AI Lab, and PhD students of Fudan Univ.

Former Full-time ML developer of Alibaba .Inc

Former M.Eng of USTC Robotics Lab and Internship at Ant Group, MIT Han Lab.

CV

mathblackbox's People

Contributors

trotsky1997 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

mathblackbox's Issues

MathBlackBox Jupyter Notebook

Good afternoon,

I have refactored the code for MathBlackBox to be a Jupyter notebook. It requires you put in your own API key, by default it uses DeepSeek V2 Coder, although I cannot guarantee functionality because I ran out of API credits. Any OpenAI compatible API will work.
LLaMA_3B 2.ipynb.zip
I hope someone tries it and sees if it works properly.

Comparing with self-consistency?

The method seems very powerful but expensive. I'm wondering how is it compared with self-consistency under similar computational budgets?

Pass@k or Pass@1?

After seeing this work, I read the paper and found that the effect is very good. When reading the code, I found that this line of code seems to cause the indicator to degenerate from pass@1 to pass@k. Is my point of view correct?

if check(ground_truth,answer) and 'testtime' in DATA_NAME:

I am not saying that pass@k is not a good indicator. The default evaluation indicator of gsm8k is usually equivalent to pass@1, but https://arxiv.org/pdf/2205.14318 also uses pass@k, and they are far from reaching this score. But if we can clearly mark the relationship between the value of k and the corresponding score, then we can better understand the paper.

Furthermore, it may be difficult to get the ground truth in reality, and pass@1 is actually more in line with reality. Do you know if there is a better way to evaluate pass@1?

If I understand it incorrectly, please kindly correct me.

ground truth knowledge

Hi, the paper was a very interesting read and this technique seems to have a lot of potential. However, looking at the code - if I understand it correctly - I have noticed that a significant portion of it is dependent on the knowledge of the correct answer - the 'ground truth'. If this knowledge is not available to the program until the final result validation, how does the program perform then? Thanks.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.