
webqa's Issues

JSON file required but eval returns TSV

Hi, I really like your work and am trying to evaluate some benchmarks on WebQA.
As I understand it, I need to run vlp/eval.py, which writes predictions in TSV format.
However, uploading to the server requires a JSON file. Am I missing a step? I can't tell whether the conversion is done for me or whether I have to create the JSON file manually. Thank you!
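
If the conversion does have to be done by hand, a minimal sketch could look like the following. The column names (`guid`, `answer`) and the choice to key the output by the first column are assumptions made for illustration; neither the layout of the TSV written by vlp/eval.py nor the schema the server expects is stated here, so both should be checked against the submission instructions.

```python
import csv
import io
import json


def tsv_to_json(tsv_text, key_column=0):
    """Turn TSV rows (with a header line) into a dict keyed by one column.

    The layout is hypothetical: adjust the key column and field names
    to whatever the evaluation script actually writes.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    converted = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        converted[row[key_column]] = dict(zip(header, row))
    return converted


# Hypothetical predictions, not taken from the real eval output.
sample = "guid\tanswer\nq1\tBlue\nq2\tYes\n"
converted = tsv_to_json(sample)
json_text = json.dumps(converted, indent=2)
print(json_text)
```

Writing `json_text` to a file then gives a JSON document with one entry per prediction row.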

Is there a template for the short explanation paper?

To be included in the NeurIPS 2021 Competition write-up, authors must provide a short explanation (~1 page) of what they did and what insights they discovered by Oct 29, 2021.

This is what the webqna homepage states. Is there a template for the short explanation paper?

Different snippet id but Same fact and url for text document

Hi, I'm a student looking at a dataset.

I took a look at the dataset and realized that there was data in the text document that had a different snippet id but the exact same fact and wiki url.

For example, in WebQA_train_val.json

{
    "title": "2008 Summer Olympics",
    "fact": "The theme song of the 2008 Summer Olympics was \"You and Me,\" which was composed by Chen Qigang, the musical director of the opening ceremony.",
    "url": "https://en.wikipedia.org/wiki/2008_Summer_Olympics",
    "snippet_id": "d5bbd0e20dba11ecb1e81171463288e9_7"
}
{
    "title": "2008 Summer Olympics",
    "fact": "The theme song of the 2008 Summer Olympics was \"You and Me,\" which was composed by Chen Qigang, the musical director of the opening ceremony.",
    "url": "https://en.wikipedia.org/wiki/2008_Summer_Olympics",
    "snippet_id": "d5bbd13c0dba11ecb1e81171463288e9_8"
}
{
    "title": "2008 Summer Olympics",
    "fact": "The theme song of the 2008 Summer Olympics was \"You and Me,\" which was composed by Chen Qigang, the musical director of the opening ceremony.",
    "url": "https://en.wikipedia.org/wiki/2008_Summer_Olympics",
    "snippet_id": "d5bcc8440dba11ecb1e81171463288e9_14"
}

All three examples have different snippet_ids,
but the same fact: "The theme song of the 2008 Summer Olympics was "You and Me," which was composed by Chen Qigang, the musical director of the opening ceremony.",
and the same url: "https://en.wikipedia.org/wiki/2008_Summer_Olympics".

My understanding is that different text documents are given different snippet_ids.
If there is something I am missing or misunderstanding, I would appreciate it if you could let me know.

I haven't figured out if there are more examples like this, but I'd like to correct my misconceptions first.

Thank you for your help.
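
To check whether there are more cases like this, one option is to group snippets by their fact and url and list every group that has more than one snippet_id. This is a minimal sketch that assumes the text snippets have already been collected into a flat list of dicts with the four keys shown above; how they are nested inside WebQA_train_val.json is not covered here. The last snippet in the sample is hypothetical and only there to show a non-duplicate.

```python
from collections import defaultdict


def find_duplicate_facts(snippets):
    """Group snippets by (fact, url); keep groups with more than one snippet_id."""
    groups = defaultdict(list)
    for snippet in snippets:
        groups[(snippet["fact"], snippet["url"])].append(snippet["snippet_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


fact = ('The theme song of the 2008 Summer Olympics was "You and Me," which was '
        'composed by Chen Qigang, the musical director of the opening ceremony.')
url = "https://en.wikipedia.org/wiki/2008_Summer_Olympics"

snippets = [
    {"title": "2008 Summer Olympics", "fact": fact, "url": url,
     "snippet_id": "d5bbd0e20dba11ecb1e81171463288e9_7"},
    {"title": "2008 Summer Olympics", "fact": fact, "url": url,
     "snippet_id": "d5bbd13c0dba11ecb1e81171463288e9_8"},
    {"title": "2008 Summer Olympics", "fact": fact, "url": url,
     "snippet_id": "d5bcc8440dba11ecb1e81171463288e9_14"},
    # Hypothetical extra snippet with a different fact.
    {"title": "Example", "fact": "A different fact.", "url": url,
     "snippet_id": "hypothetical_0"},
]

duplicates = find_duplicate_facts(snippets)
for (dup_fact, dup_url), ids in duplicates.items():
    print(len(ids), "snippet_ids share one fact at", dup_url)
```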

The data downloaded from Google Drive is inconsistent with the demo

The data downloaded from Google Drive is inconsistent with the data displayed by Have_a_Look_WebQA.ipynb in the demo folder, and the content of Keywords_A is missing. Is there any way to download the data shown in the demo version?

Unexpected end of archive error when unzipping with 7z

Error summary

I encountered an "Unexpected end of archive" error when unzipping with 7z. I cloned this repo and followed the instructions. The error happened after running the download_imgs.sh file and confirming that all 51 chunks and the imgs.lineidx file are present in the directory. All our attempts have failed on three different platforms: Windows, AWS Ubuntu, and GCP Ubuntu. The error message looks like:

[image: 7z error message]

Failed unzipping details

Attempt 1: Windows local

  • Platform: Windows 11 Home
  • Disk: 300+ GB free storage
  • 7z Version: V22.01 (x64)

Attempt 2: Linux local

  • Platform: Windows WSL Ubuntu
  • Disk: 300+ GB free storage
  • 7z version: V16.02

Attempt 3: AWS

System specs:

  • Deep Learning AMI GPU CUDA 11.1.1 (Ubuntu 18.04) 20230405
  • Image ID: ami-01b901ded27bef504

Attempt 4: GCP

System specs:
[images: GCP instance specs]

Lastly

I included the attempt details in the hope that the issue can be reproduced. I wonder whether it has to do with the 7z version, or with the download script perhaps being out of date.
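
A truncated or empty chunk can also produce this kind of error, so before suspecting 7z itself it may be worth checking more than file presence. The sketch below counts the files matching a pattern and reports their sizes, so an empty or unusually small chunk stands out. The file names and the `chunk.*` pattern are hypothetical (the real chunk naming is not given here), and the demo runs on throwaway files in a temporary directory.

```python
import tempfile
from pathlib import Path


def check_chunks(directory, pattern, expected_count):
    """Count files matching `pattern` and report their sizes in bytes."""
    files = sorted(Path(directory).glob(pattern))
    sizes = {f.name: f.stat().st_size for f in files}
    return {
        "found": len(files),
        "count_ok": len(files) == expected_count,
        "empty": [name for name, size in sizes.items() if size == 0],
        "sizes": sizes,
    }


# Demo on hypothetical files; chunk.002 is left empty on purpose.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(1, 4):
        data = b"" if i == 2 else b"x"
        (Path(tmp) / f"chunk.{i:03d}").write_bytes(data)
    report = check_chunks(tmp, "chunk.*", expected_count=3)

print(report)
```

For the real download, `expected_count` would be 51, and any chunk much smaller than its neighbours (other than the last one) would be a candidate for re-downloading.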

Dataset json file doesn't have "Keywords_A" for any question.

In WebQA_train_val.json, Keywords_A is missing for every query.
However, this key is used in your baseline evaluation and in the Take_a_look_WebQA.ipynb file.
Could you please share a version of the data that includes this key, so that we can evaluate our models better?
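
For anyone wanting to confirm this on their own copy, a minimal sketch of the check is below. It assumes the file loads (for example with `json.load`) into a dict that maps a question id to an entry dict; that structure and the sample entries are assumptions, and only the key name Keywords_A comes from the report above.

```python
def count_missing_key(samples, key="Keywords_A"):
    """Return (ids of entries lacking `key`, total number of entries)."""
    missing = [guid for guid, entry in samples.items() if key not in entry]
    return missing, len(samples)


# Hypothetical entries standing in for the loaded dataset.
samples = {
    "q1": {"Q": "What colour is the roof?", "A": ["Blue"]},
    "q2": {"Q": "Is the bridge arched?", "A": ["Yes"], "Keywords_A": "yes"},
    "q3": {"Q": "How many towers are there?", "A": ["Two"]},
}

missing, total = count_missing_key(samples)
print(f"{len(missing)} of {total} entries have no Keywords_A")
```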

Not all examples have positive images?

Hi, I went through WebQA_train_val.json and found that out of 41739 examples, only 21465 have positive image ids. Is this normal, or did I make a mistake during preprocessing?
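
The count itself can be reproduced with a short sketch like the one below. The key name `img_posFacts` (and `txt_posFacts` in the sample data) is an assumption about how positive sources are stored, as is the dict-of-entries structure; both should be adjusted to match the actual file.

```python
def count_with_positive_images(samples, key="img_posFacts"):
    """Count entries whose list under `key` is present and non-empty."""
    with_pos = sum(1 for entry in samples.values() if entry.get(key))
    return with_pos, len(samples)


# Hypothetical entries: one with a positive image, two with text sources only.
samples = {
    "q1": {"img_posFacts": [{"image_id": 1}], "txt_posFacts": []},
    "q2": {"img_posFacts": [], "txt_posFacts": [{"snippet_id": "a"}]},
    "q3": {"img_posFacts": [], "txt_posFacts": [{"snippet_id": "b"}]},
}

with_pos, total = count_with_positive_images(samples)
print(f"{with_pos} of {total} examples have at least one positive image")
```

If the entries without positive images turn out to have positive text sources instead, the split would be expected rather than a preprocessing error, but that is a guess the maintainers would need to confirm.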

Metrics

In your paper, you tested on Img-based and Txt-based datasets. However, the leaderboard does not make this distinction clear. Are the leaderboard results an average of the two?
