
ares's People

Contributors

alexisdeschamps, dependabot[bot], jonsaadfalcon, robbym-dev, tm17-abcgen


ares's Issues

RAGAS score calculation from annotations is unclear

I'm not sure how the RAGAS score is computed from annotations in RAG_Automatic_Evaluation/RAGAS_Scoring.py:

# Lines 68-72
sampled_y_labels = dataset.sample(n=300, random_state=42)
context_relevance_prediction = sum(dataset["Context_Relevance_Label"].tolist()) / len(sampled_y_labels)
answer_relevance_prediction = sum(dataset["Answer_Relevance_Label"].tolist()) / len(sampled_y_labels)
context_scores.append(context_relevance_prediction)
answer_relevance_scores.append(answer_relevance_prediction)

While I'm not sure what this code is trying to compute, I ran it as a sanity check and got NaN outputs (screenshot omitted).

Any help understanding this issue and pointers to the relevant sections in the paper would be greatly appreciated. Thanks!
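For reference, here is a hedged guess at what the snippet above may be intending. Note that `mean_label_over_sample` is a hypothetical helper, not code from the repo: the original sums over the full dataset but divides by the sample size, and NaNs in the label columns would propagate into the result.

```python
import pandas as pd

# Hypothetical sketch of the apparent intent: the mean of a binary label
# column over the *sampled* rows. dropna() avoids the NaN outputs
# reported above; the original mixes full-dataset sums with sample size.
def mean_label_over_sample(dataset: pd.DataFrame, column: str,
                           n: int = 300, seed: int = 42) -> float:
    sample = dataset.sample(n=min(n, len(dataset)), random_state=seed)
    return sample[column].dropna().mean()

df = pd.DataFrame({
    "Context_Relevance_Label": [1, 0, 1, 1, None],
    "Answer_Relevance_Label": [1, 1, 0, 1, 1],
})
print(mean_label_over_sample(df, "Context_Relevance_Label", n=5))  # prints 0.75
```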

Clarification Needed on the Specificity of test_dataset

Hello,

I am currently working with the project and have a question regarding the test_dataset used within it. Could you please clarify whether the test_dataset needs to be domain-specific, particularly tailored to the RAG domain, or whether a generic labeled dataset is suitable for this purpose?

--labels <label columns>

What is this labels list, and what should it look like? I created a list of string elements like [query,....], but it raises a KeyError. Please add one example of such a list to the documentation.
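For what it's worth, here is a hypothetical guess, not confirmed by the docs: judging from the label column names that appear elsewhere in this repo (e.g. RAGAS_Scoring.py), --labels likely expects label *column* names rather than data columns such as "query".

```python
# Hypothetical sketch (assumption, not from the ARES documentation):
# the --labels argument presumably names label columns like these.
labels = [
    "Context_Relevance_Label",
    "Answer_Relevance_Label",
]
print(labels)
```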

Documentation and code are so broken!

Hi,

I have tried to reproduce the paper, or more specifically, to follow the step-by-step instructions, and unfortunately nothing works.

As for the things I've detected so far in the python script version:

1. The current requirements.txt can't be installed as instructed by the README.md because of conflicting library versions.
2. The sample document_filepath.tsv file in example_files has only 6 examples, and its column is named "Documents".
3. The synthetic generation example code fails because the number of documents sampled is smaller than the given --documents_sampled 10000.
4. If you change documents_sampled to 5 so that step doesn't fail, it fails later anyway, because the step that generates the negative alternatives requires at least 100 samples.

So with the given documents in the example_files folder, it's impossible to generate a synthetic dataset.
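Given those constraints, a quick pre-flight check before launching generation could catch this early. This is a sketch assuming the thresholds reported above (sampling fails when the corpus is smaller than --documents_sampled, and negative generation needs at least 100 samples); `preflight` is a hypothetical helper, not part of ARES:

```python
import pandas as pd

# Hypothetical pre-flight check, based on the failure modes described above.
def preflight(df: pd.DataFrame, documents_sampled: int,
              min_for_negatives: int = 100) -> None:
    if len(df) < documents_sampled:
        raise ValueError(
            f"corpus has {len(df)} documents, fewer than the "
            f"{documents_sampled} requested by --documents_sampled")
    if documents_sampled < min_for_negatives:
        raise ValueError(
            f"--documents_sampled must be >= {min_for_negatives} "
            f"for negative-alternative generation")

# The bundled example file has only 6 documents:
example = pd.DataFrame({"Document": [f"passage {i}" for i in range(6)]})
```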

Following the new Vercel documentation at https://ares-ai.vercel.app/synth_gen/ is hit and miss because of the copy-pasted sections. For example, on that same page:

  • The document paths alternate between data and /data, and between output and /output, making the sample code fail.
  • The sample dataset name is incorrect: the repo contains nq_ratio_0.6_.tsv and nq_ratio_0.5.tsv, but the documentation uses nq_ratio_0.5_.tsv.
  • Both the nq_ratio_0.5_.tsv and the nq_ratio_0.6_.tsv files have fewer than 10000 documents, so the example command fails.
  • The model-choice section is copied straight from the training-classifier section and gives incorrect information.

To make things even worse, the Python code in the ares-ai library differs from the Python scripts, so running the code with example_files/document_filepath.tsv fails too. In the original file, you only needed to pass a "Document" column for ARES to generate the synthetic dataset, but now Query and Answer columns are also required. Otherwise you get the following error:

Error: The DataFrame is missing the following required column(s): Query, Answer.

So it seems the requirements for ARES are more complex than expected. The README file states the following:

"The ARES training pipeline is three steps:

Generate synthetic queries and answers from in-domain passages"

Then:

"A human preference validation set of annotated query, document, and answer triples for the evaluation criteria (e.g. context relevance, answer faithfulness, and/or answer relevance). There should be at least 50 examples but several hundred examples is ideal."

But to generate the synthetic dataset, it requires query, document, and answer triples instead of an in-domain passages file as described.

There are tons of other inconsistencies, but with the current code and documentation it's impossible to reproduce even the most basic examples.

strong negative generation

In the paper, there is a strong-negative generation method for constructing negative samples to train the LLM judges, but I can't find any code for it in this repo. Was it actually not used to produce the final results in the paper?

Unable to import without setting OpenAI key

The readme mentions the following:

Optional: Initialize OpenAI or TogetherAI API key with the following command.

However, I am not able to import ARES without setting the OpenAI key; this line

from ares import ARES

gives the following error:

openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

Can I use ARES without the OpenAI key? The readme claims it can work with custom RAG models.
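The traceback suggests an OpenAI client is constructed at import time, so a placeholder key lets the import proceed. This is a workaround sketch, not an official fix, and the dummy value would still break any real OpenAI call:

```python
import os

# Workaround sketch (assumption: the OpenAI client is built at module
# import, so any non-empty value satisfies the check). Real OpenAI calls
# will still fail with this dummy key; a custom/local RAG configuration
# should never reach them.
os.environ.setdefault("OPENAI_API_KEY", "placeholder-not-a-real-key")

# from ares import ARES  # the import should now succeed
```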

New README file instructions are incorrect

I am following along the instructions in the new README.md and they don't work as expected.

Note: I have installed ARES using the instructions at https://ares-ai.vercel.app/installation/, given that the Python package version has not been bumped for any new release: the previous codebase was 0.2.3, and the current version on PyPI is still 0.2.3.

In the Quick Start 1 tutorial, these wget commands point to datasets that were deleted during the last update:

wget https://raw.githubusercontent.com/stanford-futuredata/ARES/new-dev/data/datasets/nq_few_shot_prompt_v1.tsv
wget https://raw.githubusercontent.com/stanford-futuredata/ARES/new-dev/data/datasets_v2/nq/nq_labeled_output.tsv
wget https://raw.githubusercontent.com/stanford-futuredata/ARES/new-dev/data/datasets_v2/nq/nq_unlabeled_output.tsv

All of them return a 404 error:

--2024-04-23 00:39:10--  https://raw.githubusercontent.com/stanford-futuredata/ARES/new-dev/data/datasets_v2/nq/nq_unlabeled_output.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-04-23 00:39:10 ERROR 404: Not Found.

When executing the ues_idp block for the first time, a ModuleNotFoundError is raised: the vLLM package is missing and has to be installed manually.

In step 2, synthetic dataset generation, document_filepath is expected to be a list, but a str is passed. The synthetic_queries_filename parameter name is also incorrect: the correct name is synthetic_queries_filenames, and it is of type list, not str.

The file nq_few_shot_prompt_for_synthetic_query_generation.tsv under examples only has Query and Document columns. Running the synthetic dataset generation code raises KeyError: 'Context_Relevance_Label', as that column is missing from the file.

In Step 3, the path to the training dataset should be data/output/synthetic_queries_1.tsv, as in the previous code block; data/ is missing at the beginning of the path.

Parameter 'training_dataset' for classifier_model is expected to be of type list, received str instead.
Parameter 'validation_set' for classifier_model is expected to be of type list, received str instead.
Parameter 'label_column' for classifier_model is expected to be of type list, received str instead.
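Those three mismatches suggest the same workaround: wrap each single value in a one-element list. A hypothetical config sketch (parameter names taken from the errors above; the file paths reuse names mentioned earlier in this report and are illustrative):

```python
# Hypothetical workaround sketch: wrap each str argument in a list, since
# classifier_model reportedly expects list-typed values. Paths are
# illustrative examples, not guaranteed to exist in the repo.
classifier_config = {
    "training_dataset": ["data/output/synthetic_queries_1.tsv"],
    "validation_set": ["nq_labeled_output.tsv"],
    "label_column": ["Context_Relevance_Label"],
}
```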

There might be more errors once I am able to run the code, but so far I have not been able to generate the synthetic dataset using flan because of the incorrect few-shot file.

[Feature Request] Multilingual support

Hi, thank you for sharing this wonderful project. I'd like to evaluate my Korean RAG applications, but there is no RAG evaluation framework that supports multiple languages. Could you add this feature for non-English developers? If you guide me on how to add code for multilingual support, I'll contribute it.

Thanks.
