Topic: llm-evaluation-framework

Something interesting about llm-evaluation-framework

👇 Here are 12 public repositories matching this topic...

aws-samples / fm-leaderboarder

FM-Leaderboard-er lets you create a leaderboard to find the best LLM or prompt for your own business use case, based on your data, tasks, and prompts.

Organization: aws-samples

llm-benchmarking llm-evaluation llm-evaluation-framework

bowen-upenn / llm_token_bias

Official PyTorch implementation of the paper "A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners".

User: bowen-upenn

Home Page: https://arxiv.org/abs/2406.11050

large-language-models llm llm-reasoning reasoning token-bias llm-evaluation llm-evaluation-framework

confident-ai / deepeval

The LLM Evaluation Framework

Organization: confident-ai

Home Page: https://docs.confident-ai.com/

evaluation-metrics evaluation-framework llm-evaluation llm-evaluation-framework llm-evaluation-metrics

jaaack-wang / multi-problem-eval-llm

Evaluating LLMs with Multiple Problems at once: A New Paradigm for Probing LLM Capabilities

User: jaaack-wang

Home Page: https://arxiv.org/pdf/2406.10786

explainable-ai large-language-models llm llm-eval llm-evaluation-framework llm-prompting

nagababumo / building-and-evaluating-advanced-rag

User: nagababumo

llamaindex llm-evaluation llm-evaluation-framework python rag retrieval-augmented-generation

networks-learning / prediction-powered-ranking

Code for "Prediction-Powered Ranking of Large Language Models", arXiv 2024.

Organization: networks-learning

llm-eval llm-evaluation llm-evaluation-framework ranking-algorithm prediction-powered-inference rank-sets

parea-ai / parea-sdk-py

Python SDK for experimenting with, testing, evaluating, and monitoring LLM-powered applications - Parea AI (YC S23)

Organization: parea-ai

Home Page: https://docs.parea.ai/sdk/python

llm llm-evaluation llm-tools llmops llms-benchmarking llm-eval llm-evaluation-framework llm-evaluation-toolkit prompt-engineering generative-ai

parea-ai / parea-sdk-ts

TypeScript SDK for experimenting with, testing, evaluating, and monitoring LLM-powered applications - Parea AI (YC S23)

Organization: parea-ai

Home Page: https://docs.parea.ai/sdk/typescript

llm llm-evaluation llm-evaluation-framework llm-evaluation-toolkit llm-tools llms llms-benchmarking llm-eval prompt-engineering

promptfoo / promptfoo

Test your prompts, agents, and RAGs. Use LLM evals to improve your app's quality and catch problems. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.

Organization: promptfoo

Home Page: https://www.promptfoo.dev/

llm prompt-engineering prompts llmops prompt-testing testing rag evaluation evaluation-framework llm-eval

psycoy / mixeval

The official evaluation suite and dynamic data release for MixEval.

User: psycoy

Home Page: https://mixeval.github.io/

benchmark benchmark-mixture benchmarking-framework benchmarking-suite evaluation evaluation-framework foundation-models large-language-model large-language-models large-multimodal-models llm-evaluation llm-evaluation-framework llm-inference mixeval

stair-lab / villm-eval

Evaluation of Language Models in Non-English Languages

Organization: stair-lab

Home Page: https://villm-eval.readthedocs.io/en/latest/

llm-evaluation-framework llms-benchmarking

zhuohaoyu / kieval

[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models

User: zhuohaoyu

acl2024 explainable-ai llm llm-evaluation llm-evaluation-framework llm-evaluation-metrics llm-evaluation-toolkit machine-learning
