Comments (4)

geekinglcq commented on August 26, 2024

[image]
In the table at https://huggingface.co/BAAI/AquilaChat2-34B, Qwen's MMLU result has dropped straight to 0?

from aquila2.

pkulhr1998 commented on August 26, 2024

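[image] In the table at https://huggingface.co/BAAI/AquilaChat2-34B, Qwen's MMLU result has dropped straight to 0?

MMLU is four-way multiple choice, so even random guessing scores about 25. That the Aquila team managed to produce a 0 is absurd. Both the official Qwen repo and OpenCompass have open-source evaluation code that works. Even if you can't get your own evaluation right, you could run someone else's code or report the already published scores.

ChatGLM2 is also given 0 on BUSTM, and, even funnier, the Qwen-7B RAFT score is NaN. And you still have the nerve to put out press releases and PR every day boasting that you are number one. That may fool outsiders and satisfy BAAI's management, but don't make yourselves a laughingstock in front of people who know the field.

For real evaluation you still have to look at OpenCompass. You can't even copy the homework properly.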
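
The chance-level argument above can be turned into a quick sanity check to run before a results table is published. The following is a minimal sketch, not anyone's actual evaluation code: the function name `flag_suspicious` and the `CHANCE_LEVEL` table are hypothetical, the three anomalous entries mirror the ones reported in this thread, and the fourth entry is a made-up placeholder. A score slightly below chance can occur from noise, so a flag means "recheck the pipeline", not "definitely wrong".

```python
import math

# Chance-level accuracy in percent for multiple-choice benchmarks.
# Only MMLU's four-option format is taken from the thread; extend as needed.
CHANCE_LEVEL = {"MMLU": 25.0}


def flag_suspicious(scores):
    """Return (model, benchmark, reason) tuples for scores worth rechecking."""
    flagged = []
    for (model, bench), value in scores.items():
        if value is None or math.isnan(value):
            flagged.append((model, bench, "missing or NaN"))
        elif value == 0.0:
            flagged.append((model, bench, "exactly zero"))
        elif bench in CHANCE_LEVEL and value < CHANCE_LEVEL[bench]:
            flagged.append((model, bench, "below chance"))
    return flagged


scores = {
    ("Qwen", "MMLU"): 0.0,                  # anomaly reported in this thread
    ("ChatGLM2", "BUSTM"): 0.0,             # anomaly reported in this thread
    ("Qwen-7B", "RAFT"): float("nan"),      # anomaly reported in this thread
    ("hypothetical-model", "MMLU"): 55.0,   # placeholder, not a real result
}

for model, bench, reason in flag_suspicious(scores):
    print(f"{model} / {bench}: {reason}")
```

Checking for exactly zero and NaN separately from the chance threshold keeps the check useful for benchmarks whose chance level is unknown, such as generation tasks.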

xuanricheng commented on August 26, 2024

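[image] Leaving aside that the GSM8K/MMLU scores are lower than those measured by Qwen officially and by OpenCompass as a third party, the ARC-e result is far too off. Qwen-14B should score above 80 on this benchmark, yet here it is cut to 40+, well below Qwen-7B. WMT has the same problem of Qwen-14B scoring worse than Qwen-7B. I hope the maintainers can check the results carefully. At the very least, outliers that are this obviously wrong deserve a double check?

Thank you very much for your feedback and for your attention to the accuracy of our leaderboard. We take community feedback seriously and are committed to providing accurate and fair information.
Following your reminder, we rechecked all the evaluation data overnight and found that there were indeed a small number of errors. The score of 47.3 you pointed out for Qwen-14B on ARC-e is actually its ARC-c score; this was an oversight when the leaderboard was posted. We also found that the WMT scores were the result of a simple data-processing error. We are very sorry for the confusion and inconvenience these errors caused, and we thank you for your careful and timely correction.
We have fixed these errors and updated the leaderboard results in our README and related documentation. You can now see the corrected evaluation results in our article and in the GitHub README.
Thank you again for your valuable input. Your feedback is very important to us, and we look forward to the Qwen team and our BAAI team staying in touch.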
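
The kind of outlier discussed above, a larger model in the same family scoring far below a smaller one, can be caught mechanically. Below is a rough sketch under stated assumptions: `size_regressions` and the 5-point `margin` are hypothetical choices, the 47.3 figure is the mislabeled Qwen-14B value mentioned in the reply, and every other number is a made-up placeholder rather than a real result. A larger model can legitimately score lower, so a hit is a prompt to recheck, not proof of an error.

```python
def size_regressions(results, margin=5.0):
    """Find benchmarks where a larger model trails the next smaller one.

    results: {benchmark: [(params_in_billions, score), ...]} for one family.
    Returns (benchmark, smaller_size, larger_size, drop) tuples where the
    drop exceeds `margin` points.
    """
    found = []
    for bench, rows in results.items():
        ordered = sorted(rows)  # sort by parameter count
        for (s_size, s_score), (l_size, l_score) in zip(ordered, ordered[1:]):
            drop = s_score - l_score
            if drop > margin:
                found.append((bench, s_size, l_size, round(drop, 1)))
    return found


example = {
    # 47.3 is the mislabeled value from the thread; 70.0 is a placeholder.
    "ARC-e": [(7, 70.0), (14, 47.3)],
    # Entirely hypothetical benchmark with the expected ordering.
    "other-benchmark": [(7, 50.0), (14, 60.0)],
}

for bench, small, large, drop in size_regressions(example):
    print(f"{bench}: {large}B is {drop} points below {small}B")
```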

xuanricheng commented on August 26, 2024

[image] In the table at https://huggingface.co/BAAI/AquilaChat2-34B, Qwen's MMLU result has dropped straight to 0?

Please refer to this issue: #59
