webqna / webqa

License: Creative Commons Zero v1.0 Universal

Shell 100.00%

webqa's Issues

Dataset download failed

The 51 blocks are downloaded, but errors always occur during the decompression process.

This site can’t be reached when clicking "Download Data -> Main Data"

Hi,

I cannot connect to the server. I also tried to use wget from PSC, but the connection timed out. Could you please check the server status?

JSON file required, but eval returns TSV

Hi, I really like your work, and I am trying to run some benchmark evaluations on WebQA.
From my understanding, I need to run vlp/eval.py, which gives me predictions in TSV format.
However, uploading to the server requires a JSON file. Am I missing a step? I do not know whether the conversion is done for me or whether I have to create the JSON file manually. Thank you!
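If the conversion does have to be done by hand, a minimal sketch could look like the following. It assumes the TSV has a header row and one column that identifies each prediction; the default column name `id` and the layout of the resulting JSON are hypothetical, so the format the server actually expects should be checked before uploading.

```python
import csv
import json


def tsv_to_json(tsv_path, json_path, id_column="id"):
    """Read a TSV with a header row and write a JSON object keyed by id_column.

    id_column is a hypothetical name; replace it with the identifier
    column that the prediction file really contains.
    """
    records = {}
    with open(tsv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        for row in reader:
            key = row.pop(id_column)  # the remaining columns become the value
            records[key] = row
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
    return records
```

The function returns the same dictionary it writes, which makes it easy to inspect a few entries before submitting the file.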

Is there a template for the short explanation paper?

To be included in the Neurips2021 Competition write-up, authors must provide a short explanation (~1 page) of what they did and what insights they discovered by Oct 29, 2021.

This is stated on the webqna homepage. Is there a template for the short explanation paper?

The API_URL server is refusing the connection for local evaluation

This site can’t be reached, and 128.2.205.68 refused to connect. How can we get the API_URL server running locally? Could an admin troubleshoot the issue on the target machine?
API_URL = "http://128.2.205.68:5000/metric_post"
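To separate a refused or timed-out connection from a problem in the evaluation script itself, a quick TCP check against the host and port in API_URL can help. This is only a sketch: the helper name `can_connect` is made up for illustration, and a successful connection says nothing about whether the `/metric_post` endpoint behaves correctly.

```python
import socket
from urllib.parse import urlparse


def can_connect(url, timeout=5.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    # Fall back to the scheme's default port when the URL does not give one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call (not executed here):
# can_connect("http://128.2.205.68:5000/metric_post")
```

If the check fails from several networks, the server itself is most likely down or firewalled, which only the maintainers can fix.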

Different snippet_id but same fact and url for text documents

Hi, I'm a student looking at a dataset.

I took a look at the dataset and realized that there was data in the text document that had a different snippet id but the exact same fact and wiki url.

For example, in WebQA_train_val.json

{
    "title": "2008 Summer Olympics",
    "fact": "The theme song of the 2008 Summer Olympics was \"You and Me,\" which was composed by Chen Qigang, the musical director of the opening ceremony.",
    "url": "https://en.wikipedia.org/wiki/2008_Summer_Olympics",
    "snippet_id": "d5bbd0e20dba11ecb1e81171463288e9_7"
}
{
    "title": "2008 Summer Olympics",
    "fact": "The theme song of the 2008 Summer Olympics was \"You and Me,\" which was composed by Chen Qigang, the musical director of the opening ceremony.",
    "url": "https://en.wikipedia.org/wiki/2008_Summer_Olympics",
    "snippet_id": "d5bbd13c0dba11ecb1e81171463288e9_8"
}
{
    "title": "2008 Summer Olympics",
    "fact": "The theme song of the 2008 Summer Olympics was \"You and Me,\" which was composed by Chen Qigang, the musical director of the opening ceremony.",
    "url": "https://en.wikipedia.org/wiki/2008_Summer_Olympics",
    "snippet_id": "d5bcc8440dba11ecb1e81171463288e9_14"
}

All three examples have different snippet_ids,
but the same fact: "The theme song of the 2008 Summer Olympics was "You and Me," which was composed by Chen Qigang, the musical director of the opening ceremony.",
and the same url: "https://en.wikipedia.org/wiki/2008_Summer_Olympics".

My understanding is that different text documents are given different snippet_ids.
If there is something I am missing or misunderstanding, I would appreciate it if you could let me know.

I haven't figured out if there are more examples like this, but I'd like to correct my misconceptions first.

Thank you for your help.
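To find out whether there are more cases like this, the snippets can be grouped by their (fact, url) pair. The sketch below assumes the text snippets have already been collected into a flat list of dictionaries with the `fact`, `url`, and `snippet_id` keys shown above; how they are nested inside WebQA_train_val.json is not covered here.

```python
from collections import defaultdict


def find_duplicate_facts(snippets):
    """Map each (fact, url) pair that occurs more than once to its snippet_ids."""
    groups = defaultdict(list)
    for snippet in snippets:
        groups[(snippet["fact"], snippet["url"])].append(snippet["snippet_id"])
    # Keep only the pairs that are shared by at least two snippet_ids.
    return {pair: ids for pair, ids in groups.items() if len(ids) > 1}
```

An empty result would mean the three snippets above are an isolated case; a large one would suggest the duplication is systematic.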

The data downloaded from Google Drive is inconsistent with the demo

The data downloaded from Google Drive is inconsistent with the data displayed by Have_a_Look_WebQA.ipynb in the demo folder, and the content of Keywords_A is missing. Is there any way to download the data displayed in the demo version?

Unexpected end of archive error when unzipping with 7z

Error summary

I encountered an "Unexpected end of archive" error when unzipping with 7z. I cloned this repo and followed the instructions. The error happened after running download_imgs.sh and confirming that all 51 chunks and the imgs.lineidx file are present in the directory. All our attempts have failed on three different platforms: Windows, AWS Ubuntu, and GCP Ubuntu. The error message looks like:

Failed unzipping details

Attempt 1: Windows local

Platform: Windows 11 Home
Disk: 300+ GB free storage
7z Version: V22.01 (x64)

Attempt 2: Linux local

Platform: Windows WSL Ubuntu
Disk: 300+ GB free storage
7z version: V16.02

Attempt 3: AWS

System specs:

Deep Learning AMI GPU CUDA 11.1.1 (Ubuntu 18.04) 20230405
Image ID: ami-01b901ded27bef504

Attempt 4: GCP

System specs:

Lastly

I included the attempt details in the hope that the issue can be reproduced. I wonder whether it has to do with the 7z version or with the download script possibly being out of date.
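One way to tell a truncated download apart from a 7z version problem is to record the size and checksum of every chunk and compare the listing across machines or with someone whose extraction worked. The sketch below is an assumption-laden helper: the glob pattern `imgs.7z.*` is a guess at how the chunks are named and should be adjusted to the real file names.

```python
import hashlib
from pathlib import Path


def summarize_chunks(directory, pattern="imgs.7z.*"):
    """Return (name, size in bytes, SHA-256 hex digest) for each matching file.

    The default pattern is hypothetical; pass the pattern that matches
    the downloaded chunk files.
    """
    rows = []
    for path in sorted(Path(directory).glob(pattern)):
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            # Read in 1 MiB blocks so large chunks do not have to fit in memory.
            for block in iter(lambda: f.read(1 << 20), b""):
                digest.update(block)
        rows.append((path.name, path.stat().st_size, digest.hexdigest()))
    return rows
```

A chunk that is much smaller than its neighbours, or whose digest differs between two downloads, points to the download rather than to 7z.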

Dataset JSON file doesn't have "Keywords_A" for any question

Webqa_train_val.json
Keywords_A is missing for every query.
However, this key is used in your baseline evaluation and in the Take_a_look_WebQA.ipynb file.
Could you please share a version of the data that includes this key, so that we can evaluate our models better?
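A short check like the one below can confirm how widespread the missing key is before asking for a new data version. It assumes the loaded JSON is a dictionary that maps each question id to a dictionary of fields, which is an assumption about the file layout rather than something stated here; the function name is also just illustrative.

```python
def count_missing_key(dataset, key="Keywords_A"):
    """Return (number of entries without `key`, total number of entries)."""
    missing = [qid for qid, entry in dataset.items() if key not in entry]
    return len(missing), len(dataset)


# Example call (not executed here), assuming the file sits in the working directory:
# import json
# with open("WebQA_train_val.json", encoding="utf-8") as f:
#     print(count_missing_key(json.load(f)))
```

If the two numbers are equal, the key is absent from the whole file and not just from some splits.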