
EDGQA

Code for the ISWC 2021 research track paper "EDG-based Question Decomposition for Complex Question Answering over Knowledge Bases".

Citation

@inproceedings{Hu2021edg,
  author    = {Xixin Hu and
               Yiheng Shu and
               Xiang Huang and
               Yuzhong Qu},
  editor    = {Andreas Hotho and
               Eva Blomqvist and
               Stefan Dietze and
               Achille Fokoue and
               Ying Ding and
               Payam M. Barnaghi and
               Armin Haller and
               Mauro Dragoni and
               Harith Alani},
  title     = {EDG-Based Question Decomposition for Complex Question Answering over
               Knowledge Bases},
  booktitle = {The Semantic Web - {ISWC} 2021 - 20th International Semantic Web Conference,
               {ISWC} 2021, Virtual Event, October 24-28, 2021, Proceedings},
  series    = {Lecture Notes in Computer Science},
  volume    = {12922},
  pages     = {128--145},
  publisher = {Springer},
  year      = {2021},
  url       = {https://doi.org/10.1007/978-3-030-88361-4\_8},
  doi       = {10.1007/978-3-030-88361-4\_8},
}

What is EDGQA?

EDGQA is a QA system over knowledge bases based on Entity-Description Graphs (EDGs). EDGQA is currently implemented for DBpedia and has been evaluated on LC-QuAD 1.0 and QALD-9.

[Figure: SPARQL queries (a, c) and EDGs (b, d) for two example questions]

The figure above shows SPARQL queries (a and c) and EDGs (b and d) for two example natural language questions. A dashed line connects a description to an intermediate entity. The node and edge types of an EDG are defined as follows.

[Figure: node and edge types of an EDG]

Generating such an EDG represents a question as a combination of entities and their descriptions, which provides a structure for understanding and answering complex questions. More details are given in our paper.

1. Requirements

  • JDK 1.8.0
  • Maven
  • Python 3.6

Knowledge base dumps and linking systems are also needed.

1.1 Knowledge Base

In EDGQA, DBpedia 1604 (for LC-QuAD) and DBpedia 1610 (for QALD-9) are stored in Virtuoso.

You can deploy DBpedia locally or use the public online endpoint (which may be incompatible with the datasets). Then fill in the server address and port in src/main/java/cn/edu/nju/ws/edgqa/utils/kbutil/KBUtil.java.
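To check that the address and port you enter in KBUtil.java point at a working endpoint, you can send a query by hand. The sketch below is a minimal example, assuming the endpoint is served at http://<host>:<port>/sparql (port 8890 and the /sparql path are Virtuoso's usual defaults, but check your deployment). VirtuosoEndpoint and its methods are hypothetical names, not part of EDGQA, and the code uses only the JDK so it also runs on JDK 1.8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of EDGQA) for testing a Virtuoso SPARQL endpoint.
// Assumes the endpoint is served at http://<host>:<port>/sparql.
class VirtuosoEndpoint {
    private final String host;
    private final int port;

    VirtuosoEndpoint(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Base endpoint address, e.g. http://localhost:8890/sparql
    String endpointUrl() {
        return "http://" + host + ":" + port + "/sparql";
    }

    // GET URL carrying the URL-encoded query in the standard "query" parameter.
    String queryUrl(String sparql) {
        try {
            return endpointUrl() + "?query=" + URLEncoder.encode(sparql, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM supports UTF-8, so this branch is unreachable in practice.
            throw new IllegalStateException(e);
        }
    }

    // Sends the query, asking for SPARQL JSON results, and returns the raw body.
    String execute(String sparql) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(queryUrl(sparql)).openConnection();
        conn.setRequestProperty("Accept", "application/sparql-results+json");
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }
}
```

For example, new VirtuosoEndpoint("localhost", 8890).execute("SELECT ?s WHERE { ?s ?p ?o } LIMIT 1") should return a small JSON result set if the endpoint is reachable; an IOException usually means the address, port, or path is wrong.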

1.2 Linking tools

EDGQA uses three linking tools: Earl, Falcon, and Dexter.

See the linking_tools directory and follow its instructions to set up the three linking systems. Then fill in the server address and port in src/main/java/cn/edu/nju/ws/edgqa/utils/linking/LinkingTool.java.

For more information:

1.3 Semantic matching models

EDGQA employs BERT-based classifiers as semantic matching models for relation detection and query reranking.

See the models directory for instructions on deploying the models.
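Reranking with a semantic matching model amounts to scoring each candidate against the question and sorting by that score. The sketch below illustrates only this ranking step, assuming the model can be called through a scoring function. MatchScorer and Reranker are hypothetical names; the actual BERT models are the ones deployed from the models directory, and EDGQA's own reranking logic may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a BERT-based semantic matching model:
// returns a relevance score for a (question, candidate) pair.
interface MatchScorer {
    double score(String question, String candidate);
}

class Reranker {
    // Returns the candidates sorted by descending model score.
    static List<String> rerank(String question, List<String> candidates, MatchScorer scorer) {
        // Score each candidate once; model calls are the expensive part.
        Map<String, Double> scores = new HashMap<>();
        for (String c : candidates) {
            scores.put(c, scorer.score(question, c));
        }
        List<String> sorted = new ArrayList<>(candidates);
        // List.sort is stable, so equally scored candidates keep their original order.
        sorted.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return sorted;
    }
}
```

Caching the scores keeps the number of model calls equal to the number of candidates, regardless of how many comparisons the sort performs.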

2. Run QA

2.1 Run EDGQA

Program arguments are defined in src/main/java/cn/edu/nju/ws/edgqa/main/QAArgs.java.

Run configurations for IntelliJ IDEA 2019.3 and later versions are stored in EDGQA/.run.

Run src/main/java/cn/edu/nju/ws/edgqa/main/EDGQA.java with the following CLI arguments:

-d --dataset: 'lc-quad', 'qald-9'
-tr --train: 'true' for training set, 'false' for test set
-r --run: 'autotest', 'single', or 'serial_number'
-uc --use_cache: 'true' for using linking cache, 'false' otherwise
-cc --create_cache: 'true' for creating linking cache, 'false' otherwise
-gll --global_linking: 'true' for using global linking, 'false' otherwise
-lll --local_linking: 'true' for using local linking, 'false' otherwise
-qd --question_decomposition: 'true' for using EDG to decompose the question, 'false' otherwise
-rr --reranking: 'true' for re-ranking by EDG block, 'false' otherwise
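Each flag above has a short and a long form. The sketch below is a hypothetical illustration of normalizing such pairs into one name-to-value map; it is not the parser in QAArgs.java, and the ArgsSketch class, its flag helper, and the caller-supplied defaults are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of parsing the flags listed above into a name -> value map.
// EDGQA's real argument handling lives in QAArgs.java.
class ArgsSketch {
    // Each pair is {short form, long form}; both map to the long name without "--".
    private static final String[][] FLAGS = {
        {"-d", "--dataset"}, {"-tr", "--train"}, {"-r", "--run"},
        {"-uc", "--use_cache"}, {"-cc", "--create_cache"},
        {"-gll", "--global_linking"}, {"-lll", "--local_linking"},
        {"-qd", "--question_decomposition"}, {"-rr", "--reranking"}
    };

    static Map<String, String> parse(String[] args) {
        Map<String, String> aliases = new HashMap<>();
        for (String[] f : FLAGS) {
            String name = f[1].substring(2); // "--dataset" -> "dataset"
            aliases.put(f[0], name);
            aliases.put(f[1], name);
        }
        Map<String, String> values = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            String name = aliases.get(args[i]);
            if (name == null) throw new IllegalArgumentException("Unknown flag: " + args[i]);
            if (i + 1 >= args.length) throw new IllegalArgumentException("Missing value for " + args[i]);
            values.put(name, args[i + 1]);
        }
        String dataset = values.get("dataset");
        if (dataset != null && !dataset.equals("lc-quad") && !dataset.equals("qald-9")) {
            throw new IllegalArgumentException("dataset must be 'lc-quad' or 'qald-9'");
        }
        return values;
    }

    // Boolean flags take 'true' or 'false'; an absent flag falls back to the given default.
    static boolean flag(Map<String, String> values, String name, boolean dflt) {
        String v = values.get(name);
        return v == null ? dflt : Boolean.parseBoolean(v);
    }
}
```

For instance, the arguments "-d lc-quad -uc true" would yield a map containing dataset=lc-quad and use_cache=true.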

Because the linking tools are slow, caching the linking results of the test questions speeds up testing. The cache must be built on the first run and is reused on subsequent runs. Use the use_cache and create_cache arguments above to configure caching.

[Optional] A prebuilt cache of linking results is available in this Google Drive directory. Place it in the current directory, i.e., EDGQA/cache. The cache is not required to run EDGQA.
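The interplay of use_cache and create_cache can be pictured as a memoizing wrapper around the linking call. This is a minimal in-memory sketch, assuming one cache entry per question string; LinkingCacheSketch is a hypothetical class, whereas EDGQA keeps its cache as files (the README places them in EDGQA/cache).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of how use_cache and create_cache could interact.
class LinkingCacheSketch {
    private final Map<String, List<String>> store = new HashMap<>();
    private final boolean useCache;
    private final boolean createCache;

    LinkingCacheSketch(boolean useCache, boolean createCache) {
        this.useCache = useCache;
        this.createCache = createCache;
    }

    // Returns cached links when reading the cache is enabled and an entry exists;
    // otherwise calls the linker and records the result if cache creation is enabled.
    List<String> link(String question, Function<String, List<String>> linker) {
        if (useCache) {
            List<String> hit = store.get(question);
            if (hit != null) {
                return hit; // cache hit: skip the expensive linking call
            }
        }
        List<String> result = linker.apply(question); // slow call to the linking tools
        if (createCache) {
            store.put(question, result);
        }
        return result;
    }
}
```

With both flags on, the first run pays for linking and later lookups of the same question are free; with both off, every lookup calls the linking tools.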

2.2 Run PointerNetworkQA

Run cn/edu/nju/ws/edgqa/handler/PointerNetworkQA.java with the following CLI argument:

--dataset: 'lc-quad', 'qald-9'

3. Resources

4. Contact

Feel free to create a GitHub Issue or send an e-mail. We look forward to receiving your feedback.

edgqa's People

Contributors

hxx97, yhshu


edgqa's Issues

Use phrase "time for solving the current question" instead of "current time" to avoid ambiguity

In the file "src/main/java/cn/edu/nju/ws/edgqa/main/EDGQA.java", in the function "private static void computeMetrics(long startTime, CumulativeIRMetrics cumulativeIRMetrics, CumulativeIRMetrics QALDcumulativeIRMetrics, int quesIdx, long questionStartTime, int sparqlTemplateId, String goldenSparql, EDG edg, IRMetrics localIRMetrics, IRMetrics localQALDIRMetrics)", when showing how long the current question took to solve, it would be better to use the phrase "time for solving the current question" instead of "current time". Otherwise, some people may read "current time" as "time elapsed since the system started" or as the current clock time and date.
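The suggested wording can be paired with an explicit elapsed-time computation so that the message cannot be mistaken for a clock reading. This is a hypothetical sketch: QuestionTimer is not EDGQA code, and it assumes the question's start time was taken with System.nanoTime().

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: reports the elapsed time for one question,
// not a wall-clock "current time".
class QuestionTimer {
    static String describe(long questionStartNanos, long nowNanos) {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(nowNanos - questionStartNanos);
        return "Time for solving the current question: " + elapsedMs + " ms";
    }
}
```

A caller would record long start = System.nanoTime() before answering a question and log QuestionTimer.describe(start, System.nanoTime()) afterwards.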

Error when executing queries for LC-QuAD

I ran the system on the QALD-9 dataset as described in the README, and it worked correctly. Then I ran it on LC-QuAD with the parameters "-d lc-quad -uc true", but for every question both the golden answer and the predicted answer are empty.

This can be seen in the following example for question 5:

[DEBUG] Golden sparql: SELECT DISTINCT ?uri WHERE { <http://dbpedia.org/resource/Joe_Pass> <http://dbpedia.org/ontology/associatedBand> ?uri. <http://dbpedia.org/resource/Dream_Dancing_(album)> <http://dbpedia.org/property/artist> ?uri . }
[INFO] SparqlGenerator list: []
[INFO] Golden Answer: []
[INFO] Predicted Answer: []

However, the log you provide shows non-empty golden and predicted answer lists:

[DEBUG] Golden sparql: SELECT DISTINCT ?uri WHERE { <http://dbpedia.org/resource/Joe_Pass> <http://dbpedia.org/ontology/associatedBand> ?uri. <http://dbpedia.org/resource/Dream_Dancing_(album)> <http://dbpedia.org/property/artist> ?uri . }
[INFO] SparqlGenerator list: [SELECT DISTINCT ?e0 WHERE { {?e0 <http://dbpedia.org/ontology/associatedBand> <http://dbpedia.org/resource/Joe_Pass>} UNION {<http://dbpedia.org/resource/Joe_Pass> <http://dbpedia.org/ontology/associatedBand> ?e0} . {?e0 <http://dbpedia.org/property/artist> <http://dbpedia.org/resource/Dream_Dancing_(album)>} UNION {<http://dbpedia.org/resource/Dream_Dancing_(album)> <http://dbpedia.org/property/artist> ?e0} . }, SELECT DISTINCT ?e0 WHERE { {?e0 <http://dbpedia.org/ontology/associatedBand> <http://dbpedia.org/resource/Joe_Pass>} UNION {<http://dbpedia.org/resource/Joe_Pass> <http://dbpedia.org/ontology/associatedBand> ?e0} . {?e0 <http://dbpedia.org/ontology/artist> <http://dbpedia.org/resource/Dream_Dancing_(album)>} UNION {<http://dbpedia.org/resource/Dream_Dancing_(album)> <http://dbpedia.org/ontology/artist> ?e0} . }]
[INFO] Golden Answer: [http://dbpedia.org/resource/Ella_Fitzgerald]
[INFO] Predicted Answer: [http://dbpedia.org/resource/Ella_Fitzgerald]

I ran the golden SPARQL query directly against the Virtuoso endpoint to rule it out as the cause, and it returned the correct result. However, when I run the system, I still get the output shown above.

Would you please help me in solving this error?

Why do I need to rewrite the QALD SPARQL in order to run the QA system?

I have noticed that when EDGQA handles the QALD dataset, it calls "Preprocessor.rewriteQALDSparql" to rewrite the golden SPARQL before using QueryFactory to create goldenQuery. Why is this step needed?
goldenSparql = Preprocessor.rewriteQALDSparql(goldenSparql); // rewrite sparql for qald
Query goldenQuery = QueryFactory.create(goldenSparql, Syntax.syntaxARQ);
goldenAnswer.addAll(KBUtil.getQueryStringResult(goldenQuery)); // the golden answers

Virtuoso Install?

Which version of Virtuoso is required to run EDGQA? I get an error when KBUtil is initialized.


Does the IDE matter? Poor QALD-9 results in Eclipse but good results in IntelliJ IDEA

When I run the EDGQA system in IDEA, I can reproduce the results in the paper and even get slightly better results on both LC-QuAD and QALD-9. However, when I run the EDGQA system in Eclipse, the results are noticeably worse. I have run the QALD-9 test twice in Eclipse, with the following results:
[INFO] Cumulative metrics, sample: 150, P: 0.277, R: 0.367, macro F1: 0.286, macro F1*: 0.316
[INFO] QALD Cumulative metrics, sample: 150, P: 0.557, R: 0.367, macro F1: 0.286, QALD macro F1: 0.442
and
[INFO] Cumulative metrics, sample: 150, P: 0.298, R: 0.387, macro F1: 0.306, macro F1*: 0.337
[INFO] QALD Cumulative metrics, sample: 150, P: 0.591, R: 0.387, macro F1: 0.306, QALD macro F1: 0.468
The QALD-9 test results in IDEA are as follows:
[INFO] Cumulative metrics, sample: 150, P: 0.319, R: 0.409, macro F1: 0.326, macro F1*: 0.359
[INFO] QALD Cumulative metrics, sample: 150, P: 0.546, R: 0.409, macro F1: 0.326, QALD macro F1: 0.468
I applied the same settings in both Eclipse and IDEA, and as far as I can tell, the QALD-9 results ought to be independent of the IDE. Why does this difference between Eclipse and IDEA exist?
Meanwhile, I am running LC-QuAD experiments in Eclipse to investigate this further.

Confusion about the functions toString and toQALDString in CumulativeIRMetrics.java

I am confused about why the label "macro F1" is paired with getMicroF1() in both toString and toQALDString in CumulativeIRMetrics.java. Could this be a small mistake?
public String toString() { return "sample: " + numSample + ", P: " + decimalFormat.format(getPrecision()) + ", R: " + decimalFormat.format(getRecall()) + ", macro F1: " + decimalFormat.format(getMicroF1()) + ", macro F1*: " + decimalFormat.format(getMacroF1()); }
public String toQALDString() { return "sample: " + numSample + ", P: " + decimalFormat.format(getPrecision()) + ", R: " + decimalFormat.format(getRecall()) + ", macro F1: " + decimalFormat.format(getMicroF1()) + ", QALD macro F1: " + decimalFormat.format(getMacroF1()); } }
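The question hinges on two different ways of averaging F1 over a set of questions: (a) averaging the per-question F1 scores, and (b) computing F1 from the averaged precision and averaged recall. The sketch below computes both. F1Averaging is a hypothetical class, and which variant each CumulativeIRMetrics getter implements should be confirmed in the source.

```java
// Hypothetical sketch contrasting two ways of averaging F1 over questions.
class F1Averaging {
    // Harmonic mean of precision and recall; defined as 0 when both are 0.
    static double f1(double p, double r) {
        return (p + r == 0) ? 0.0 : 2 * p * r / (p + r);
    }

    // (a) Average of the per-question F1 scores.
    static double meanOfF1(double[] precision, double[] recall) {
        double sum = 0;
        for (int i = 0; i < precision.length; i++) {
            sum += f1(precision[i], recall[i]);
        }
        return sum / precision.length;
    }

    // (b) F1 of the averaged precision and the averaged recall.
    static double f1OfMeans(double[] precision, double[] recall) {
        double p = 0, r = 0;
        for (int i = 0; i < precision.length; i++) {
            p += precision[i];
            r += recall[i];
        }
        return f1(p / precision.length, r / recall.length);
    }
}
```

The two variants differ in general: for two questions with (P, R) = (1.0, 0.5) and (0.5, 1.0), variant (a) gives about 0.667 and variant (b) gives 0.75. As a consistency check, variant (b) applied to the P and R printed in the Eclipse/IDEA logs above reproduces the values labelled "macro F1*" and "QALD macro F1" up to rounding, e.g. 2 * 0.557 * 0.367 / (0.557 + 0.367) ≈ 0.442.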

Dump files for LC-QuAD and QALD-9

Could you please tell me which files from the DBpedia 1604 and DBpedia 1610 dumps you loaded into Virtuoso to achieve the benchmark results?

Add a check of withStdOutMap.get(writer) when deciding whether to execute System.out.print(str)

In the file "main/src/main/java/cn/edu/nju/ws/edgqa/utils/LogUtil.java", in the function "public static void print(String writer, String str, LogType logType)", it's better to use "withStdOutMap.containsKey(writer) && withStdOutMap.get(writer)" instead of only "withStdOutMap.containsKey(writer)" to decide whether to execute the code "System.out.print(str)". Otherwise, the variable "withStdOut" doesn't play its due role and seems redundant.
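The suggested check can be written as a single expression: Boolean.TRUE.equals(map.get(key)) is true only when the key is present and mapped to true, and it avoids unboxing a null. The sketch below is a hypothetical, simplified stand-in for LogUtil.print (without the LogType parameter or file output), with a StringBuilder in place of System.out so the behaviour can be checked.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the print logic discussed in the issue.
class LogEchoSketch {
    final Map<String, Boolean> withStdOutMap = new HashMap<>();
    final StringBuilder stdout = new StringBuilder(); // stands in for System.out

    void print(String writer, String str) {
        // False for a missing key (get returns null) and for a key mapped to false,
        // so this equals containsKey(writer) && withStdOutMap.get(writer).
        if (Boolean.TRUE.equals(withStdOutMap.get(writer))) {
            stdout.append(str);
        }
        // Writing str to the writer's log file is omitted in this sketch.
    }
}
```

With this check, a writer registered with false no longer echoes to standard output, which is the role the withStdOut flag appears intended to play.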
