stanford-futuredata / ares
Home Page: https://ares-ai.vercel.app/
License: Apache License 2.0
I'm not sure how the RAGAS score is computed from annotations in RAG_Automatic_Evaluation/RAGAS_Scoring.py:
# Lines 68-72
sampled_y_labels = dataset.sample(n=300, random_state=42)
context_relevance_prediction = sum(dataset["Context_Relevance_Label"].tolist()) / len(sampled_y_labels)
answer_relevance_prediction = sum(dataset["Answer_Relevance_Label"].tolist()) / len(sampled_y_labels)
context_scores.append(context_relevance_prediction)
answer_relevance_scores.append(answer_relevance_prediction)
While I'm not sure what this code is trying to compute, I ran it as a sanity check and got NaN outputs.
Any help understanding this issue and pointers to the relevant sections in the paper would be greatly appreciated. Thanks!
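For what it's worth, a hedged guess at what those lines intend: the fraction of positive (1) labels over a random sample of annotations. The sketch below uses synthetic placeholder labels (the real ones would come from the annotation file) and shows why mixing the full dataset's sum with the sample's length, as the quoted snippet does, gives an inconsistent result:

```python
import random

# Synthetic placeholder labels; column names follow the snippet above.
context_labels = [1, 0, 1, 1] * 100
answer_labels = [1, 1, 0, 1] * 100

rng = random.Random(42)
sample_idx = rng.sample(range(len(context_labels)), k=300)

# Sum over the *sample* and divide by the *sample* size. The quoted code
# instead sums the full dataset column but divides by len(sampled_y_labels),
# mixing denominators; NaN entries in the column would also propagate into
# the sum, which could explain the NaN outputs reported above.
context_relevance = sum(context_labels[i] for i in sample_idx) / len(sample_idx)
answer_relevance = sum(answer_labels[i] for i in sample_idx) / len(sample_idx)
```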
Hello,
I am currently working with the project and have a question regarding the test_dataset used within. Could you please clarify whether the test_dataset needs to be domain-specific, particularly tailored to the RAG domain, or if a generic labeled dataset is suitable for this purpose?
What is this labels list and what should it look like? I created a list with string elements [query,....], but I am getting a KeyError. Please add an example of such a list to the documentation.
Hi,
I have tried to reproduce the paper, or more specifically, follow the step by step instructions and unfortunately, nothing works.
As for the things I've detected so far in the python script version:
1.- The current requirements.txt can't be installed as instructed by the README.md file because of conflicting library versions.
2.- Sample document_filepath.tsv file in example_files has 6 examples and the column "Documents".
3.- The synthetic generation example code fails, as the number of documents available is less than the given number --documents_sampled 10000.
4.- If you change documents_sampled to 5 so it doesn't fail, it fails later anyway, since the step that generates the negative alternatives requires at least 100 samples.
So with the given documents in the example_files folder, it's impossible to generate a synthetic dataset.
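To illustrate the constraint (with hypothetical names, since I haven't traced the exact sampling call in the repo): sampling without replacement cannot draw more items than the population contains, which matches the failure with 6 example documents; sampling with replacement would be one way for the script to tolerate small example files.

```python
import random

documents = ["doc %d" % i for i in range(6)]  # the example file has 6 rows
documents_sampled = 10000

# random.sample draws without replacement and raises ValueError when k
# exceeds the population size, consistent with the failure described above.
try:
    random.sample(documents, k=documents_sampled)
    failed = False
except ValueError:
    failed = True

# random.choices draws with replacement, so any k works even for 6 documents.
with_replacement = random.choices(documents, k=documents_sampled)
```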
Following the new Vercel documentation at https://ares-ai.vercel.app/synth_gen/ is an absolute hit and miss because of its copy-pasted regions.
But to make things even worse, the Python code in the ares-ai library is different from the Python scripts, so if you try to run the code using example_files/document_filepath.tsv, it fails too!! In the original file, you only needed to pass a "Document" column for ARES to generate the synthetic dataset, but now Query and Answer columns are also required. Otherwise you get the following error:
Error: The DataFrame is missing the following required column(s): Query, Answer.
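Based on that error message, a minimal input file would now need all three columns. A sketch of such a TSV (column names taken from the error; the row contents are placeholders, not values the library prescribes):

```python
import csv
import io

# Build a TSV with the three columns the error message asks for.
rows = [
    {"Document": "In-domain passage text ...", "Query": "", "Answer": ""},
]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["Document", "Query", "Answer"], delimiter="\t")
writer.writeheader()
writer.writerows(rows)
tsv_text = buf.getvalue()
```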
So it seems the requirements for ARES are quite a bit more complex than expected. The README file contains the following information:
"The ARES training pipeline is three steps:
Generate synthetic queries and answers from in-domain passages"
Then:
"A human preference validation set of annotated query, document, and answer triples for the evaluation criteria (e.g. context relevance, answer faithfulness, and/or answer relevance). There should be at least 50 examples but several hundred examples is ideal."
But to generate the synthetic dataset, it requires query, document, and answer triples instead of an in-domain passages file as described.
There are tons of other inconsistencies, but given your code and documentation it's impossible to reproduce even the most basic examples.
After running the training classifier code available at https://ares-ai.vercel.app/training_classifier/, the process fails with the following error:
RuntimeError: Parent directory checkpoints/microsoft-mdeberta-v3-base does not exist.
I would expect the process to check if the folder exists and create it otherwise before writing the checkpoint to disk.
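A sketch of that suggested fix: create the checkpoint's parent directory before writing instead of assuming it exists. The path below is a stand-in for checkpoints/microsoft-mdeberta-v3-base, and the final write stands in for whatever save call the training script actually uses:

```python
import os
import tempfile

# Stand-in for the checkpoint path used by the training script.
base = tempfile.mkdtemp()
checkpoint_path = os.path.join(base, "checkpoints", "microsoft-mdeberta-v3-base", "model.pt")

# Create the parent directory if needed; exist_ok makes this a no-op when
# the directory is already present, so it is safe to call unconditionally.
os.makedirs(os.path.dirname(checkpoint_path), exist_ok=True)

# Placeholder for the real checkpoint write (e.g. a torch.save call).
with open(checkpoint_path, "wb") as f:
    f.write(b"")
```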
In the paper, there is a strong-negative generation method for constructing negative samples to train the LLM judges, but I can't find any code for it in the repo. Was it actually not used to produce the final results in the paper?
The readme mentions the following:
Optional: Initialize OpenAI or TogetherAI API key with the following command.
However, I am not able to import ARES without setting the OpenAI key; this line
from ares import ARES
gives the following error:
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
Can I use ARES without the OpenAI key? The readme claims it can work with custom RAG models.
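One workaround that may apply here, assuming the import-time failure comes from an OpenAI client being constructed when the module loads: set a placeholder key before importing. Any real OpenAI call would still fail with that dummy value, but code paths using a custom RAG model could proceed.

```python
import os

# Set a dummy key only if no real one is configured; this is NOT a valid
# credential and any actual OpenAI API call made with it will be rejected.
os.environ.setdefault("OPENAI_API_KEY", "placeholder-not-a-real-key")

# With the variable set, the import should no longer raise at load time:
# from ares import ARES
```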
I am following along the instructions in the new README.md and they don't work as expected.
Note: I installed ARES using the instructions at https://ares-ai.vercel.app/installation/ , given that the package version has not been bumped for the new release: the previous codebase was 0.2.3, and the current version on PyPI is still 0.2.3.
In the Quick Start 1 tutorial, these wget commands point to datasets that were deleted during the last update:
wget https://raw.githubusercontent.com/stanford-futuredata/ARES/new-dev/data/datasets/nq_few_shot_prompt_v1.tsv
wget https://raw.githubusercontent.com/stanford-futuredata/ARES/new-dev/data/datasets_v2/nq/nq_labeled_output.tsv
wget https://raw.githubusercontent.com/stanford-futuredata/ARES/new-dev/data/datasets_v2/nq/nq_unlabeled_output.tsv
returns error 404 for all files:
--2024-04-23 00:39:10-- https://raw.githubusercontent.com/stanford-futuredata/ARES/new-dev/data/datasets_v2/nq/nq_unlabeled_output.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-04-23 00:39:10 ERROR 404: Not Found.
When executing the ues_idp block for the first time, a ModuleNotFoundError is raised: the vLLM package is missing, so it has to be installed manually.
In step 2, synthetic dataset generation, document_filepath is expected to be a list but a str is passed. The synthetic_queries_filename parameter is also incorrect: the correct name is synthetic_queries_filenames, and it is of type list, not str.
The file nq_few_shot_prompt_for_synthetic_query_generation.tsv under examples only has Query and Document columns. Running the synthetic dataset generation code returns KeyError: 'Context_Relevance_Label', as that column is missing from the file.
In Step 3, the path to the training dataset should be data/output/synthetic_queries_1.tsv, as in the previous code block; the "data/" prefix is missing from the beginning of the path.
Parameter 'training_dataset' for classifier_model is expected to be of type list, received str instead.
Parameter 'validation_set' for classifier_model is expected to be of type list, received str instead.
Parameter 'label_column' for classifier_model is expected to be of type list, received str instead.
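Putting the three type errors above together, the parameters apparently need to be wrapped in lists rather than passed as bare strings. A hedged sketch of what a conforming configuration might look like (parameter names from the error messages; the file names and label value are placeholders, not tested against the library):

```python
# Each value is a one-element list, matching the "expected list, received
# str" errors reported above. Paths and label name are hypothetical.
classifier_config = {
    "training_dataset": ["data/output/synthetic_queries_1.tsv"],
    "validation_set": ["nq_labeled_output.tsv"],
    "label_column": ["Context_Relevance_Label"],
}
```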
There might be more errors once I am able to run the code, but so far I have not been able to generate the synthetic dataset with Flan because of the incorrect few-shot file.
Hi, thank you for sharing this wonderful project. I'd like to evaluate my Korean RAG applications, but there is no RAG evaluation framework that supports multiple languages. Could you add multilingual support for non-English developers? If you guide me on how to add the code, I'll contribute it.
Thanks.
Hi, it would be interesting to see how the lower cost Claude models compare vs. GPT-4 for labeling!