
ares's People

Contributors

alexisdeschamps, dependabot[bot], jonsaadfalcon, robbym-dev, tm17-abcgen


ares's Issues

RAGAS score calculation from annotations is unclear

I'm not sure how the RAGAS score is computed from annotations in RAG_Automatic_Evaluation/RAGAS_Scoring.py:

# Lines 68-72
sampled_y_labels = dataset.sample(n=300, random_state=42)
context_relevance_prediction = sum(dataset["Context_Relevance_Label"].tolist()) / len(sampled_y_labels)
answer_relevance_prediction = sum(dataset["Answer_Relevance_Label"].tolist()) / len(sampled_y_labels)
context_scores.append(context_relevance_prediction)
answer_relevance_scores.append(answer_relevance_prediction)

While I'm not sure what this code is trying to compute, I ran it as a sanity check and got NaN outputs (screenshot omitted).

Any help understanding this issue and pointers to the relevant sections in the paper would be greatly appreciated. Thanks!
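For reference, here is a hedged guess at what the snippet above may be intending. Note that `mean_label_over_sample` is a hypothetical helper, not code from the repo: the original sums over the full dataset but divides by the sample size, and NaNs in the label columns would propagate into the result.

```python
import pandas as pd

# Hypothetical sketch of the apparent intent: the mean of a binary label
# column over the *sampled* rows. dropna() avoids the NaN outputs
# reported above; the original mixes full-dataset sums with sample size.
def mean_label_over_sample(dataset: pd.DataFrame, column: str,
                           n: int = 300, seed: int = 42) -> float:
    sample = dataset.sample(n=min(n, len(dataset)), random_state=seed)
    return sample[column].dropna().mean()

df = pd.DataFrame({
    "Context_Relevance_Label": [1, 0, 1, 1, None],
    "Answer_Relevance_Label": [1, 1, 0, 1, 1],
})
print(mean_label_over_sample(df, "Context_Relevance_Label", n=5))  # prints 0.75
```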

Clarification Needed on the Specificity of test_dataset

Hello,

I am currently working with the project and have a question regarding the test_dataset used within it. Could you please clarify whether the test_dataset needs to be domain-specific, particularly tailored to the RAG domain, or whether a generic labeled dataset is suitable for this purpose?

--labels <label columns>

What is this labels list, and what should it look like? I created a list of string elements like [query,....], but it raises a KeyError. Please add one example of such a list to the documentation.
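For what it's worth, here is a hypothetical guess, not confirmed by the docs: judging from the label column names that appear elsewhere in this repo (e.g. RAGAS_Scoring.py), --labels likely expects label *column* names rather than data columns such as "query".

```python
# Hypothetical sketch (assumption, not from the ARES documentation):
# the --labels argument presumably names label columns like these.
labels = [
    "Context_Relevance_Label",
    "Answer_Relevance_Label",
]
print(labels)
```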

Documentation and code are so broken!

Hi,

I have tried to reproduce the paper, or more specifically, to follow the step-by-step instructions, and unfortunately nothing works.

As for the things I've detected so far in the python script version:

1. The current requirements.txt can't be installed as instructed by the README.md because of conflicting library versions.
2. The sample document_filepath.tsv file in example_files has only 6 examples, and its column is named "Documents".
3. The synthetic generation example code fails because the number of documents sampled is smaller than the given --documents_sampled 10000.
4. If you change documents_sampled to 5 so that step doesn't fail, it fails later anyway, because the step that generates the negative alternatives requires at least 100 samples.

So with the given documents in the example_files folder, it's impossible to generate a synthetic dataset.
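Given those constraints, a quick pre-flight check before launching generation could catch this early. This is a sketch assuming the thresholds reported above (sampling fails when the corpus is smaller than --documents_sampled, and negative generation needs at least 100 samples); `preflight` is a hypothetical helper, not part of ARES:

```python
import pandas as pd

# Hypothetical pre-flight check, based on the failure modes described above.
def preflight(df: pd.DataFrame, documents_sampled: int,
              min_for_negatives: int = 100) -> None:
    if len(df) < documents_sampled:
        raise ValueError(
            f"corpus has {len(df)} documents, fewer than the "
            f"{documents_sampled} requested by --documents_sampled")
    if documents_sampled < min_for_negatives:
        raise ValueError(
            f"--documents_sampled must be >= {min_for_negatives} "
            f"for negative-alternative generation")

# The bundled example file has only 6 documents:
example = pd.DataFrame({"Document": [f"passage {i}" for i in range(6)]})
```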

Following the new Vercel documentation at https://ares-ai.vercel.app/synth_gen/ is hit and miss because of the copy-pasted sections. For example, on that same page:

  • The document paths alternate between data and /data, and between output and /output, making the sample code fail.
  • The sample dataset name is incorrect: the repo contains nq_ratio_0.6_.tsv and nq_ratio_0.5.tsv, but the documentation uses nq_ratio_0.5_.tsv.
  • Both the nq_ratio_0.5_.tsv and the nq_ratio_0.6_.tsv files have fewer than 10000 documents, so the example command fails.
  • The model-choice section is copied straight from the training-classifier section and gives incorrect information.

To make things even worse, the Python code in the ares-ai library differs from the Python scripts, so running the code with example_files/document_filepath.tsv fails too. In the original file, you only needed to pass a "Document" column for ARES to generate the synthetic dataset, but now Query and Answer columns are also required. Otherwise you get the following error:

Error: The DataFrame is missing the following required column(s): Query, Answer.

So it seems the requirements for ARES are more complex than expected. The README file states the following:

"The ARES training pipeline is three steps:

Generate synthetic queries and answers from in-domain passages"

Then:

"A human preference validation set of annotated query, document, and answer triples for the evaluation criteria (e.g. context relevance, answer faithfulness, and/or answer relevance). There should be at least 50 examples but several hundred examples is ideal."

But to generate the synthetic dataset, it requires query, document, and answer triples instead of an in-domain passages file as described.

There are tons of other inconsistencies, but with the current code and documentation it's impossible to reproduce even the most basic examples.

strong negative generation

In the paper, there is a strong-negative generation method for constructing negative samples to train the LLM judges, but I can't find any code for it in this repo. Was it actually not used to produce the final results in the paper?

Unable to import without setting OpenAI key

The readme mentions the following:

Optional: Initialize OpenAI or TogetherAI API key with the following command.

However, I am not able to import ARES without setting the OpenAI key; this line

from ares import ARES

gives the following error:

openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

Can I use ARES without the OpenAI key? The readme claims it can work with custom RAG models.
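The traceback suggests an OpenAI client is constructed at import time, so a placeholder key lets the import proceed. This is a workaround sketch, not an official fix, and the dummy value would still break any real OpenAI call:

```python
import os

# Workaround sketch (assumption: the OpenAI client is built at module
# import, so any non-empty value satisfies the check). Real OpenAI calls
# will still fail with this dummy key; a custom/local RAG configuration
# should never reach them.
os.environ.setdefault("OPENAI_API_KEY", "placeholder-not-a-real-key")

# from ares import ARES  # the import should now succeed
```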

New README file instructions are incorrect

I am following along the instructions in the new README.md and they don't work as expected.

Note: I have installed ARES using the instructions at https://ares-ai.vercel.app/installation/, given that the Python package version has not been bumped for any new release: the previous codebase was 0.2.3, and the current version on PyPI is still 0.2.3.

In the Quick Start 1 tutorial, these wget commands point to datasets that were deleted during the last update:

wget https://raw.githubusercontent.com/stanford-futuredata/ARES/new-dev/data/datasets/nq_few_shot_prompt_v1.tsv
wget https://raw.githubusercontent.com/stanford-futuredata/ARES/new-dev/data/datasets_v2/nq/nq_labeled_output.tsv
wget https://raw.githubusercontent.com/stanford-futuredata/ARES/new-dev/data/datasets_v2/nq/nq_unlabeled_output.tsv

All of them return a 404 error:

--2024-04-23 00:39:10--  https://raw.githubusercontent.com/stanford-futuredata/ARES/new-dev/data/datasets_v2/nq/nq_unlabeled_output.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-04-23 00:39:10 ERROR 404: Not Found.

When executing the ues_idp block for the first time, a ModuleNotFoundError is raised: the vLLM package is missing and has to be installed manually.

In step 2, synthetic dataset generation, document_filepath is expected to be a list, but a str is passed. The synthetic_queries_filename parameter name is also incorrect: the correct name is synthetic_queries_filenames, and it is of type list, not str.

The file nq_few_shot_prompt_for_synthetic_query_generation.tsv under examples only has Query and Document columns. Running the synthetic dataset generation code raises KeyError: 'Context_Relevance_Label', as that column is missing from the file.

In Step 3, the path to the training dataset should be data/output/synthetic_queries_1.tsv, as in the previous code block; data/ is missing at the beginning of the path.

Parameter 'training_dataset' for classifier_model is expected to be of type list, received str instead.
Parameter 'validation_set' for classifier_model is expected to be of type list, received str instead.
Parameter 'label_column' for classifier_model is expected to be of type list, received str instead.
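Those three mismatches suggest the same workaround: wrap each single value in a one-element list. A hypothetical config sketch (parameter names taken from the errors above; the file paths reuse names mentioned earlier in this report and are illustrative):

```python
# Hypothetical workaround sketch: wrap each str argument in a list, since
# classifier_model reportedly expects list-typed values. Paths are
# illustrative examples, not guaranteed to exist in the repo.
classifier_config = {
    "training_dataset": ["data/output/synthetic_queries_1.tsv"],
    "validation_set": ["nq_labeled_output.tsv"],
    "label_column": ["Context_Relevance_Label"],
}
```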

There might be more errors once I am able to run the code, but so far I have not been able to generate the synthetic dataset using flan because of the incorrect few-shot file.

[Feature Request] Multilingual support

Hi, thank you for sharing this wonderful project. I'd like to evaluate my Korean RAG applications, but there is no RAG evaluation framework that supports multiple languages. Could you add this feature for non-English developers? If you guide me on how to add code for multilingual support, I'll contribute it.

Thanks.
