konradhoeffner / cubeqa


CubeQA—Question Answering on Statistical Linked Data

Home Page: https://aksw.org/Projects/CubeQA.html

License: GNU General Public License v3.0

Java 100.00%
question-answering datacube rdf semantic-web

cubeqa's Issues

Add pre-processing

  • determine the wh-pronoun (POS tag WP) and remove it from the question
  • identify aggregates (here instead of in the detector step) and their maximal reference phrase, which is later used when there are multiple measure properties
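The wh-pronoun step could look roughly like the sketch below. It assumes the question has already been POS-tagged elsewhere; the `TaggedToken` and `Preprocessed` types are hypothetical and not part of CubeQA. Note that in the Penn Treebank tagset only "what", "who" and "whom" are tagged WP, while "how" is WRB and "which" is WDT, so "How much ..." questions would not be caught by WP alone.

```java
import java.util.ArrayList;
import java.util.List;

final class WhPronounRemoval {
    /** Hypothetical token type: a word with its Penn Treebank POS tag, e.g. ("What", "WP"). */
    record TaggedToken(String word, String tag) {}

    /** Hypothetical result: the first wh-pronoun found (or null) and the remaining tokens. */
    record Preprocessed(String whPronoun, List<TaggedToken> rest) {}

    static Preprocessed removeWhPronoun(List<TaggedToken> tokens) {
        String wh = null;
        List<TaggedToken> rest = new ArrayList<>();
        for (TaggedToken t : tokens) {
            if (t.tag().equals("WP")) {
                // Remember the first wh-pronoun, drop every WP token from the question.
                if (wh == null) wh = t.word();
            } else {
                rest.add(t);
            }
        }
        return new Preprocessed(wh, rest);
    }
}
```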

parts of the question may be used multiple times

In the question "How much did the Philippines receive in the year of 2007?", "Philippines" seems to be used twice, once for the recipient country and once for the geographical target area "Malaysia, Philippines":

D org.aksw.cubeqa.template.Fragment toTemplate: adding restriction Restriction on property (http://linkedspending.aksw.org/ontology/finland-aid-recipient-country, v1083550630) with where patterns: [?obs <http://linkedspending.aksw.org/ontology/finland-aid-recipient-country> <https://openspending.org/finland-aid/recipient-country/ph>.] and order limit patterns [] from score result ScoreResult(property=(http://linkedspending.aksw.org/ontology/finland-aid-recipient-country, v1083550630), value=https://openspending.org/finland-aid/recipient-country/ph, score=1.0)
D org.aksw.cubeqa.template.Fragment toTemplate: adding restriction Restriction on property (http://linkedspending.aksw.org/ontology/finland-aid-geographical-target-area, v860796570) with where patterns: [ ?obs  <http://linkedspending.aksw.org/ontology/finland-aid-geographical-target-area> "Malaysia, Philippines".] and order limit patterns [] from score result ScoreResult(property=(http://linkedspending.aksw.org/ontology/finland-aid-geographical-target-area, v860796570), value=Malaysia, Philippines, score=0.95)
W org.aksw.cubeqa.template.Fragment no answer property candidate found...
W org.aksw.cubeqa.template.Fragment ...using default answer property: (http://linkedspending.aksw.org/ontology/finland-aid-amount, v1908038420)
D org.aksw.cubeqa.AlgorithmTest select SUM(xsd:decimal(?v1908038420)) 
{
 ?obs  <http://linkedspending.aksw.org/ontology/finland-aid-geographical-target-area> "Malaysia, Philippines".
 ?obs  <http://linkedspending.aksw.org/ontology/refYear> ?v0.
filter(year(?v0)=2007).
?obs <http://linkedspending.aksw.org/ontology/finland-aid-recipient-country> <https://openspending.org/finland-aid/recipient-country/ph>.
?obs qb:dataSet <http://linkedspending.aksw.org/instance/finland-aid>.
?obs a qb:Observation.
?obs <http://linkedspending.aksw.org/ontology/finland-aid-amount> ?v1908038420.
}
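One possible way to prevent a question phrase from being consumed twice is sketched below, under the assumption that each candidate restriction remembers the token span of the question it was matched from (CubeQA's `ScoreResult` in the log above does not show such a span, so the `Candidate` record is hypothetical). It greedily keeps the highest-scoring candidates whose spans do not overlap an already kept one; this is a swapped-in policy for illustration, not the project's method.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class PhraseOverlap {
    /** Hypothetical candidate: matched property/value, question token span [start, end), and score. */
    record Candidate(String property, String value, int start, int end, double score) {
        boolean overlaps(Candidate o) { return start < o.end && o.start < end; }
    }

    /** Keeps the best-scoring candidates such that no two of them share a question token. */
    static List<Candidate> selectNonOverlapping(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(Candidate::score).reversed());
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : sorted) {
            if (kept.stream().noneMatch(c::overlaps)) kept.add(c);
        }
        return kept;
    }
}
```

With the scores from the log (1.0 for the recipient country, 0.95 for "Malaysia, Philippines"), the token "Philippines" would only yield the recipient-country restriction.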

Reject unanswerable questions and those with empty output to increase recall

For some types of questions it is possible to determine beforehand that the question cannot be answered.
In this case, there should be a mechanism to signal an invalid query so that no answer is attempted and the recall is higher.

Similarly, if a SPARQL query returns no results, there should be an option to return no answer as well. Whether that is sensible depends on whether empty answers are generally expected; in theory they should occur often, but that has to be analysed. The benchmark, for example, has no questions with an empty answer.

However, this should be made clear in the log output so that unreasonably high scores are not reported when a query fails.
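A minimal sketch of how abstentions could be kept visible in the benchmark output: every question gets an explicit outcome, and the summary reports rejected, empty and failed questions next to the answered ones. The `Outcome` categories and the summary format are hypothetical.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class AnswerOutcomes {
    /** Hypothetical outcome of answering one benchmark question. */
    enum Outcome { ANSWERED, REJECTED_UNANSWERABLE, EMPTY_RESULT, QUERY_FAILED }

    static Map<Outcome, Integer> tally(List<Outcome> outcomes) {
        Map<Outcome, Integer> counts = new EnumMap<>(Outcome.class);
        for (Outcome o : Outcome.values()) counts.put(o, 0);
        for (Outcome o : outcomes) counts.merge(o, 1, Integer::sum);
        return counts;
    }

    /** One log line, so that failures hidden behind "no answer" remain visible next to the scores. */
    static String summary(List<Outcome> outcomes) {
        Map<Outcome, Integer> c = tally(outcomes);
        return String.format(Locale.ROOT, "answered=%d rejected=%d empty=%d failed=%d of %d",
            c.get(Outcome.ANSWERED), c.get(Outcome.REJECTED_UNANSWERABLE),
            c.get(Outcome.EMPTY_RESULT), c.get(Outcome.QUERY_FAILED), outcomes.size());
    }
}
```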

fix linkedspending server

Needed for the unit tests. It may be more elegant to decouple the unit tests from any specific external dependency, but it is not clear whether that is easily done. For example, a local SPARQL endpoint containing the relevant files could be created, provided they are not too large.
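One way to uncouple the tests is to let the code under test depend on a small query interface, so that tests can substitute canned results for the remote endpoint. The sketch assumes a hypothetical `SparqlExecutor` interface; a production implementation would wrap the real endpoint or a local in-memory dataset.

```java
import java.util.List;
import java.util.Map;

final class EndpointDecoupling {
    /** Hypothetical abstraction: a SELECT query yields rows of variable -> value bindings. */
    interface SparqlExecutor {
        List<Map<String, String>> select(String query);
    }

    /** Test double that answers from canned results instead of contacting an endpoint. */
    static final class CannedExecutor implements SparqlExecutor {
        private final Map<String, List<Map<String, String>>> answers;

        CannedExecutor(Map<String, List<Map<String, String>>> answers) { this.answers = answers; }

        @Override
        public List<Map<String, String>> select(String query) {
            return answers.getOrDefault(query, List.of());
        }
    }

    /** Example of code under test that only sees the interface. */
    static long countRows(SparqlExecutor executor, String query) {
        return executor.select(query).size();
    }
}
```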

aggregate misdetected inside a word

Aggregate "min" is detected in "administrative" in the sentence "How much money Nepal receives for Environmental policy and administrative management?".
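A likely fix is to match aggregate keywords only as whole words. The sketch below uses regex word boundaries (`\b`); the method name is hypothetical and the actual detector may tokenize differently.

```java
import java.util.regex.Pattern;

final class AggregateKeywordMatch {
    /** True if the keyword occurs as a whole word (case-insensitive), so "min" does not match "administrative". */
    static boolean containsWord(String sentence, String keyword) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(keyword) + "\\b", Pattern.CASE_INSENSITIVE);
        return p.matcher(sentence).find();
    }
}
```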

Investigate Intui2, similar algorithm (not statistical)?

There seems to be an algorithm with a similar tree-based approach (however not applied to statistical data). TODO: Investigate, compare, add to related work, maybe contact them.

See QA survey draft:
Intui2 [18] is an sQA system on DBpedia based on synfragments, which map to a subtree of the syntactic parse tree. Semantically, a synfragment is a minimal span of text that can be interpreted as an RDF triple or a complex RDF query. Synfragments interoperate with their parent synfragment by combining all combinations of child synfragments, ordered by syntactic and semantic characteristics. The authors assume that an interpretation of a question in an RDF query language can be obtained by the recursive interpretation of its synfragments. With this approach, the authors were able to answer 30 out of 90 DBpedia questions correctly.

qbench2 question 55 sparql exception

Question Number 55: Answering How much did Armenia spent in 2009 on general public services?
correct query: select sum(xsd:decimal(?amount)) as ?sum from <http://linkedspending.aksw.org/618ac3ec98384f44a9ef142356ce476d>
{
?obs qb:dataSet ls:618ac3ec98384f44a9ef142356ce476d.
?obs lso:618ac3ec98384f44a9ef142356ce476d-cofog1 <https://openspending.org/618ac3ec98384f44a9ef142356ce476d/cofog1/01>.
?obs lso:618ac3ec98384f44a9ef142356ce476d-amount ?amount.
?obs lso:refYear ?year.
filter(year(?year)=2009).
}
correct answer: [{=113006558300}]
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/cofog1
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/cofog3
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/cofog2
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/economic
range http://linkedspending.aksw.org/ontology/fromClass unknown: creating NOP scorer for http://linkedspending.aksw.org/ontology/from
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/economicidlevel2
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/cofog
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/economiclevel3
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/datasetid
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/economiclevel2
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/economiclevel1
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/economicidlevel1
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/type
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/program
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/economicidlevel3
Exception in thread "main" java.lang.RuntimeException: error with sparql query select ?p {?spec ?p <http://linkedspending.aksw.org/ontology/618ac3ec98384f44a9ef142356ce476d-amount>. filter(contains(str(?p),"http://purl.org/linked-data/cube#"))} limit 1
at org.aksw.cubeqa.property.ComponentProperty.<init>(ComponentProperty.java:111)
at org.aksw.cubeqa.property.ComponentProperty.getInstance(ComponentProperty.java:201)
at org.aksw.cubeqa.Cube.getDefaultAnswerProperty(Cube.java:172)
at org.aksw.cubeqa.template.CubeTemplateFragment.toTemplate(CubeTemplateFragment.java:157)
at org.aksw.cubeqa.template.CubeTemplator.buildTemplate(CubeTemplator.java:52)
at org.aksw.cubeqa.Algorithm.answer(Algorithm.java:14)
at org.aksw.cubeqa.benchmark.Benchmark.evaluate(Benchmark.java:128)
at org.aksw.cubeqa.benchmark.Benchmark.evaluate(Benchmark.java:105)
at org.aksw.cubeqa.scripts.EvaluateQBench2.main(EvaluateQBench2.java:14)
Caused by: java.util.NoSuchElementException: QueryIterPlainWrapper
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.nextBinding(QueryIteratorBase.java:152)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.next(QueryIteratorBase.java:129)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.next(QueryIteratorBase.java:41)
at com.hp.hpl.jena.sparql.engine.ResultSetStream.nextBinding(ResultSetStream.java:87)
at com.hp.hpl.jena.sparql.engine.ResultSetStream.nextSolution(ResultSetStream.java:115)
at com.hp.hpl.jena.sparql.engine.ResultSetStream.next(ResultSetStream.java:124)
at com.hp.hpl.jena.sparql.engine.ResultSetCheckCondition.next(ResultSetCheckCondition.java:65)
at org.aksw.cubeqa.property.ComponentProperty.<init>(ComponentProperty.java:110)
... 8 more
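The `NoSuchElementException` means that the constructor of `ComponentProperty` called `next()` on a result set without any rows, i.e. the `limit 1` query found no cube property for the amount property. Since Jena's `ResultSet` is an `Iterator<QuerySolution>`, a `hasNext()` guard turns the crash into an empty case the caller can handle. The helper below is a hypothetical sketch of that guard on a plain `Iterator`.

```java
import java.util.Iterator;
import java.util.Optional;

final class SafeFirst {
    /** Returns the first element if present instead of letting next() throw NoSuchElementException. */
    static <T> Optional<T> first(Iterator<T> results) {
        return results.hasNext() ? Optional.of(results.next()) : Optional.empty();
    }
}
```

The caller can then decide whether a missing component property should fall back to a default, reject the question, or raise an error with a clearer message.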

compare to Wolfram Alpha

  • show that similar questions do not work there, e.g. "average height of American presidents"
  • add to related work

Years sometimes not selected, missing boost somewhere?

In "What is the total amount given over the World Vision Colombia channel in 2007?", 2007 is identified as an amount, not a year. Numerical amounts are negatively boosted in favour of years, though; maybe the amount is not scored correctly?
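To illustrate the suspected problem, here is a hypothetical sketch of a negative boost for amounts on year-like numbers. The year range and penalty factor are made-up values, not CubeQA's. The last case in the test shows that the penalty only helps if it is applied to the score that is actually compared and the year score is not too low.

```java
final class YearBoostSketch {
    static final int MIN_YEAR = 1900, MAX_YEAR = 2100; // hypothetical plausibility range for years
    static final double AMOUNT_PENALTY = 0.5;          // hypothetical negative boost for amounts

    static boolean looksLikeYear(String token) {
        if (!token.matches("\\d{4}")) return false;
        int n = Integer.parseInt(token);
        return n >= MIN_YEAR && n <= MAX_YEAR;
    }

    /** Chooses "year" or "amount" for a numeric token, given the raw scores of both properties. */
    static String choose(String token, double yearScore, double amountScore) {
        double adjustedAmount = looksLikeYear(token) ? amountScore * AMOUNT_PENALTY : amountScore;
        return yearScore >= adjustedAmount ? "year" : "amount";
    }
}
```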

load benchmark datasets in memory instead of using SPARQL endpoint

The benchmark dataset endpoints http://cubeqa.aksw.org/sparql and http://linkedspending.aksw.org/sparql are currently not online.
If it doesn't take much work, it would be better for future-proofing to load the data into memory instead.
The dataset is available as zipped N-Triples at https://github.com/KonradHoeffner/linkedspending/releases/download/data-qbench2datasets/qbench2datasets.zip
Consider publishing it as HDT instead and querying it with HDT Java and Jena.
See #46.
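A dependency-free sketch of reading the zipped N-Triples directly from the archive, without unpacking it. It only collects triple lines to stay self-contained; in practice each entry stream would rather be handed to an RDF parser (for instance Jena's `RDFDataMgr`) to fill an in-memory model. The class and method names are hypothetical, and `zipOf` only exists to build test archives without network access.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZippedNTriples {
    /** Collects the triple lines of every *.nt entry (N-Triples has one triple per line). */
    static List<String> readTripleLines(InputStream zipped) {
        List<String> triples = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory() || !entry.getName().endsWith(".nt")) continue;
                // Not closed on purpose: closing the reader would close the whole ZipInputStream.
                BufferedReader reader = new BufferedReader(new InputStreamReader(zip, StandardCharsets.UTF_8));
                for (String line = reader.readLine(); line != null; line = reader.readLine()) {
                    String t = line.trim();
                    if (!t.isEmpty() && !t.startsWith("#")) triples.add(t); // skip blanks and comments
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return triples;
    }

    /** Builds an in-memory zip (entry name -> content) for tests. */
    static byte[] zipOf(Map<String, String> entries) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> e : entries.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // read only after close(), which writes the zip directory
    }
}
```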

Can not run project

Hello
I was trying to run your project and got this problem:
"Failed to execute goal on project cubeqa: Could not resolve dependencies for project org.aksw.cubeqa:cubeqa:jar:0.0.1-SNAPSHOT: Could not find artifact org.aksw:openspending2rdf:jar:0.0.1-SNAPSHOT in maven.aksw.internal (http://maven.aksw.org/repository/internal) -> [Help 1]"
I don't know how to fix it. Can you help me resolve it? I'm a student from Viet Nam, so please forgive me if I made a mistake in this comment. Thank you.

Remove unused property values from the index?

Example: the dataset finland_aid contains the channels "Fida International" and "Finnish NGO, Fida International". The first one is not used, but it is textually closer to the query phrase "Fida International".
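A hypothetical sketch of the pruning step: per property, only the indexed values that actually occur in observations are kept. Both maps (property -> set of labels) are assumptions; the set of used values could, for example, come from a `SELECT DISTINCT` query over the observations of the dataset.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class IndexPruning {
    /** Keeps, for each property, only those indexed values that also appear in the used values. */
    static Map<String, Set<String>> pruneUnused(Map<String, Set<String>> indexed, Map<String, Set<String>> used) {
        Map<String, Set<String>> pruned = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : indexed.entrySet()) {
            Set<String> kept = new HashSet<>(e.getValue());
            kept.retainAll(used.getOrDefault(e.getKey(), Set.of()));
            pruned.put(e.getKey(), kept);
        }
        return pruned;
    }
}
```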

update dependencies

Including log4j if necessary. Wait for #44 so that the unit tests can be run again.
Unit tests should actually not depend on external services, see #46.

Investigate question 57

How much was committed in total for Namibia over Martinus-säätiö for Basic health infrastructure?

Somehow the channel "Indufor" gets detected here.

Aggregates as Detector

Aggregates don't seem to be detected correctly. Integrate them into the detector system, which presumably needs a rework (or the restrictions do).
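A rough sketch of aggregates as one detector among others: all detectors share a hypothetical `Detector` interface that returns matched character spans, and the aggregate detector maps whole-word keywords to SPARQL aggregate functions. The interface, the `Match` record and the keyword list are illustrative assumptions, not the project's actual detector system.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AggregateDetectorSketch {
    /** A detected phrase: character span [start, end) in the question and what it maps to. */
    record Match(int start, int end, String meaning) {}

    /** Hypothetical common interface for all detectors. */
    interface Detector {
        List<Match> detect(String question);
    }

    /** Detects aggregate keywords as whole words and maps them to SPARQL aggregate functions. */
    static final class AggregateDetector implements Detector {
        private final Map<String, String> keywords = new LinkedHashMap<>();

        AggregateDetector() {
            // Illustrative keyword list only.
            keywords.put("total", "SUM");
            keywords.put("average", "AVG");
            keywords.put("minimum", "MIN");
            keywords.put("min", "MIN");
            keywords.put("maximum", "MAX");
            keywords.put("how many", "COUNT");
        }

        @Override
        public List<Match> detect(String question) {
            List<Match> matches = new ArrayList<>();
            for (Map.Entry<String, String> k : keywords.entrySet()) {
                Pattern p = Pattern.compile("\\b" + Pattern.quote(k.getKey()) + "\\b", Pattern.CASE_INSENSITIVE);
                Matcher m = p.matcher(question);
                while (m.find()) matches.add(new Match(m.start(), m.end(), k.getValue()));
            }
            return matches;
        }
    }
}
```

Returning spans rather than booleans would also let later steps use the matched phrase, for example as the reference phrase when there are multiple measure properties.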