Topic: llm-evaluation (Goto Github)
Something interesting about llm-evaluation.
llm-evaluation,Upload, score, and visually compare multiple LLM-graded summaries simultaneously!
User: adamcoscia
Home Page: https://arxiv.org/abs/2403.04760
llm-evaluation,The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment all in one place.
Organization: agenta-ai
Home Page: http://www.agenta.ai
llm-evaluation,Template for an AI application that extracts job information from a job description using OpenAI functions and LangChain
Organization: agenta-ai
Home Page: https://agenta.ai
llm-evaluation,Evaluating LLMs with CommonGen-Lite
Organization: allenai
Home Page: https://inklab.usc.edu/CommonGen/
llm-evaluation,A collection of hands-on notebooks for LLM practitioners
User: antoniogr7
llm-evaluation,FactScoreLite is an implementation of the FactScore metric, designed for detailed accuracy assessment in text generation. This package builds upon the framework provided by the original FactScore repository, which is no longer maintained and contains outdated functions.
User: armingh2000
llm-evaluation,Python SDK for running evaluations on LLM generated responses
Organization: athina-ai
Home Page: https://docs.athina.ai
llm-evaluation,FM-Leaderboard-er allows you to create a leaderboard to find the best LLM/prompt for your own business use case, based on your own data, tasks, and prompts
Organization: aws-samples
llm-evaluation,A comprehensive resource hub compiling all LLM papers accepted at the International Conference on Learning Representations (ICLR) in 2024.
User: azminewasi
llm-evaluation,Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
Organization: babelscape
Home Page: https://arxiv.org/abs/2404.08676
llm-evaluation,Cookbooks and tutorials on Literal AI
Organization: chainlit
Home Page: https://cloud.getliteral.ai/
llm-evaluation,The implementation for the EMNLP 2023 paper "Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators"
User: chanliang
Home Page: https://arxiv.org/abs/2310.07289
llm-evaluation,The LLM Evaluation Framework
Organization: confident-ai
Home Page: https://docs.confident-ai.com/
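To give a feel for how a framework like this is typically driven from Python, here is a minimal sketch using deepeval's documented test-case/metric API. The example content, the 0.7 threshold, and the single-metric setup are arbitrary assumptions, and exact imports may vary between versions.

```python
# Minimal deepeval-style evaluation sketch (assumes deepeval is installed and an
# OPENAI_API_KEY is available for the LLM-graded metric; values are illustrative).
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# One test case: the prompt sent to your LLM and the answer it actually produced.
test_case = LLMTestCase(
    input="What are the main causes of LLM hallucination?",
    actual_output=(
        "Hallucinations are mainly caused by gaps in training data "
        "and decoding strategies that favor fluency over accuracy."
    ),
)

# An LLM-graded metric with a pass/fail threshold (0.7 is an arbitrary choice).
metric = AnswerRelevancyMetric(threshold=0.7)

# Runs the metric over the test cases and reports pass/fail per case.
evaluate(test_cases=[test_case], metrics=[metric])
```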
llm-evaluation,Official repo for the paper "PHUDGE: Phi-3 as Scalable Judge". Evaluate your LLMs with or without a custom rubric or reference answer, in absolute or relative mode, and more. It also collects available tools, methods, repos, and code for hallucination detection, LLM evaluation, grading, and more; a generic sketch of the underlying LLM-as-judge pattern follows this entry.
User: deshwalmahesh
Home Page: https://arxiv.org/abs/2405.08029
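Several projects in this list (PHUDGE above, and the GPT-based graders further down) rely on the same LLM-as-judge pattern: send a question, a candidate answer, and a rubric to a judge model, then parse a score out of its reply. The sketch below is a generic illustration of that pattern using the OpenAI Python client, not code from any of these repositories; the judge model name, rubric, and 1-to-5 score range are assumptions.

```python
# Generic LLM-as-judge sketch (not taken from PHUDGE); model and rubric are arbitrary.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def judge(question: str, answer: str, rubric: str) -> int:
    """Ask a judge model to score an answer from 1 (worst) to 5 (best) against a rubric."""
    prompt = (
        f"Rubric:\n{rubric}\n\n"
        f"Question:\n{question}\n\n"
        f"Answer:\n{answer}\n\n"
        "Reply with a single integer score from 1 (worst) to 5 (best)."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice of judge model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return int(response.choices[0].message.content.strip())


score = judge(
    question="Summarize the causes of the 2008 financial crisis.",
    answer="It was caused entirely by a single bank.",
    rubric="Score factual coverage and the absence of unsupported claims.",
)
print(score)
```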
llm-evaluation,Visualize LLM Evaluations for OpenAI Assistants
User: euskoog
Home Page: https://openai-assistants-evals-dash.vercel.app/
llm-evaluation,Link your OpenAI Assistants to a custom store + Evaluate Assistant responses
User: euskoog
llm-evaluation,Large Model Evaluation Experiments
Organization: evaluation-tools
llm-evaluation,Exploring the depths of LLMs 🚀
User: giacomomeloni
llm-evaluation,🐢 Open-Source Evaluation & Testing for LLMs and ML models
Organization: giskard-ai
Home Page: https://docs.giskard.ai
llm-evaluation,LLMs Evaluation
User: gurpreetkaurjethra
llm-evaluation,Official implementation for the paper *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
Organization: hkust-nlp
Home Page: https://hkust-nlp.github.io/dart-math/
llm-evaluation,DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models
Organization: intuit-ai-research
llm-evaluation,[Personalize@EACL 2024] LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models.
User: ivarfresh
llm-evaluation,A framework for automatically manipulating and evaluating the political ideology of LLMs with two ideology tests: Wahl-O-Mat and Political Compass Test.
User: j0st
Home Page: https://huggingface.co/spaces/jost/PoliticalLLM
llm-evaluation,A prompt collection for testing and evaluation of LLMs.
User: kwinkunks
llm-evaluation,🪢 Open source LLM engineering platform: Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Organization: langfuse
Home Page: https://langfuse.com/docs
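To show what "observability plus evals" looks like in practice, here is a minimal tracing sketch that assumes Langfuse's drop-in OpenAI wrapper from the v2-era Python SDK; import paths and method names may differ in newer releases, and the model choice is arbitrary.

```python
# Minimal Langfuse tracing sketch (assumes LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY,
# and OPENAI_API_KEY are set in the environment; v2-era SDK assumed).
from langfuse.openai import openai  # drop-in replacement that records traces

response = openai.chat.completions.create(
    model="gpt-4o-mini",  # arbitrary model choice
    messages=[{"role": "user", "content": "Name three LLM evaluation metrics."}],
)
print(response.choices[0].message.content)
# The call above now appears as a trace in the Langfuse UI, where scores and
# evaluations can be attached to it.
```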
llm-evaluation,A framework for building scenario-simulation projects in which human and LLM-based agents can participate, with a user-friendly web UI to visualize simulations and support for automatic evaluation at the agent-action level.
Organization: llm-evaluation-s-always-fatiguing
llm-evaluation,LeanEuclid is a benchmark for autoformalization in the domain of Euclidean geometry, targeting the proof assistant Lean.
User: loganrjmurphy
Home Page: http://arxiv.org/abs/2405.17216
llm-evaluation,Prompty makes it easy to create, manage, debug, and evaluate LLM prompts for your AI applications. Prompty is an asset class and format for LLM prompts designed to enhance observability, understandability, and portability for developers.
Organization: microsoft
Home Page: https://prompty.ai
llm-evaluation,Code and data for Koo et al.'s ACL 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators"
Organization: minnesotanlp
Home Page: https://minnesotanlp.github.io/cobbler-project-page/
llm-evaluation,Code for "Prediction-Powered Ranking of Large Language Models", arXiv 2024.
Organization: networks-learning
llm-evaluation,Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for the evaluation of LLMs and aimed at exploring the technical frontier of generative AI.
User: onejune2018
llm-evaluation,Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Organization: parea-ai
Home Page: https://docs.parea.ai/sdk/python
llm-evaluation,TypeScript SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Organization: parea-ai
Home Page: https://docs.parea.ai/sdk/typescript
llm-evaluation,A list of LLM tools and projects
User: petroivaniuk
llm-evaluation,Find better generation parameters for your LLM
User: praful932
Home Page: https://llmsearch.netlify.app
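The idea of searching for better generation parameters can be illustrated with a small grid search. The sketch below uses Hugging Face transformers directly with a toy exact-match scorer; it is a generic illustration under assumed model, parameter grids, and metric, not the llmsearch API.

```python
# Generic grid search over sampling parameters (not the llmsearch API);
# model, grids, and scoring function are arbitrary choices for illustration.
from itertools import product

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
target = "Paris"
inputs = tokenizer(prompt, return_tensors="pt")

best = None
for temperature, top_p in product([0.3, 0.7, 1.0], [0.8, 0.95]):
    output_ids = model.generate(
        **inputs,
        do_sample=True,
        temperature=temperature,
        top_p=top_p,
        max_new_tokens=10,
        pad_token_id=tokenizer.eos_token_id,
    )
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    score = 1.0 if target in text else 0.0  # toy metric: does the target string appear?
    if best is None or score > best[0]:
        best = (score, temperature, top_p)

print(f"best score={best[0]} at temperature={best[1]}, top_p={best[2]}")
```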
llm-evaluation,Test your prompts, agents, and RAGs. Use LLM evals to improve your app's quality and catch problems. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
Organization: promptfoo
Home Page: https://www.promptfoo.dev/
llm-evaluation,The official evaluation suite and dynamic data release for MixEval.
User: psycoy
Home Page: https://mixeval.github.io/
llm-evaluation,Framework for LLM evaluation, guardrails and security
Organization: raga-ai-hub
Home Page: https://www.raga.ai/llms
llm-evaluation,A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
Organization: re-align
Home Page: https://allenai.github.io/re-align/
llm-evaluation,Open-Source Evaluation for GenAI Application Pipelines
Organization: relari-ai
Home Page: https://docs.relari.ai/
llm-evaluation,This repository contains the lab work for the Coursera course "Generative AI with Large Language Models".
User: rochitasundar
Home Page: https://www.coursera.org/account/accomplishments/certificate/8JAYVEUAQF56
llm-evaluation,Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate.
Organization: rungalileo
Home Page: https://www.rungalileo.io/hallucinationindex
llm-evaluation,Awesome papers involving LLMs in Social Science.
Organization: value4ai
llm-evaluation,EnsembleX utilizes the Knapsack algorithm to optimize Large Language Model (LLM) ensembles for quality-cost trade-offs, offering tailored suggestions across various domains through a Streamlit dashboard visualization.
User: vidhyavarshanyjs
Home Page: https://ensemblex.streamlit.app
llm-evaluation,Open-LLM-Leaderboard: Open-Style Question Evaluation. Paper at https://arxiv.org/abs/2406.07545
Organization: vila-lab
Home Page: https://huggingface.co/spaces/Open-Style/OSQ-Leaderboard
llm-evaluation,Superpipe - optimized LLM pipelines for structured data
Organization: villagecomputing
Home Page: https://superpipe.ai
llm-evaluation,[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. "Compressing LLMs: The Truth is Rarely Pure and Never Simple."
Organization: vita-group
Home Page: https://arxiv.org/abs/2310.01382
llm-evaluation,Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements
Organization: yandex-research
Home Page: https://arxiv.org/abs/2401.06766
llm-evaluation,[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
User: zhuohaoyu