Evaluation of RoLLMs

Official code used for evaluating Romanian LLMs as proposed in Masala et al. 2024. This repo is a fork of the popular FastChat) repo used for LLM-as-a-Judge evaluation. On top of the existing framework we add a suite of Romanian benchmarks:

RoMTBench - the manually translated version of MTBench in Romanian
RoCulturaBench - a novel professionaly designed and built dataset for evaluating Romanian cultural knowledge of LLMs

The questions and (human) answers for RoMTBench and RoCulturaBench can be found here and here respectively.

Models answers and judgments will be uploaded soon.

Evaluation

For evaluating LLMs using (Ro)MTBench and RoCulturaX please see fastchat/llm_judge.

Citation

@misc{masala2024vorbecstiromanecsterecipetrain,
      title={"Vorbe\c{s}ti Rom\^ane\c{s}te?" A Recipe to Train Powerful Romanian LLMs with English Instructions}, 
      author={Mihai Masala and Denis C. Ilie-Ablachim and Alexandru Dima and Dragos Corlatescu and Miruna Zavelca and Ovio Olaru and Simina Terian and Andrei Terian and Marius Leordeanu and Horia Velicu and Marius Popescu and Mihai Dascalu and Traian Rebedea},
      year={2024},
      eprint={2406.18266},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.18266}, 
}

Acknowledgement

This repo benefits from FastChat. We thank them for their wonderful work.

openllm-ro / fastchat Goto Github PK

fastchat's Introduction

Evaluation of RoLLMs

Evaluation

Citation

Acknowledgement

fastchat's People

Contributors

Stargazers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent