konradhoeffner / cubeqa



CubeQA—Question Answering on Statistical Linked Data

Home Page: https://aksw.org/Projects/CubeQA.html

License: GNU General Public License v3.0

Java 100.00%

question-answering datacube rdf semantic-web


cubeqa's Issues

Zeroes and empty strings count as no value (configurable)

By default, values of 0 and empty strings should count as absent values and be excluded by a filter in the SPARQL query. This behaviour should be changeable in a per-dataset configuration.
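
A minimal sketch of how such a per-dataset switch might produce the extra filter. The class `AbsentValueFilter` and its flag are hypothetical names for illustration, not part of CubeQA, and the string comparison is an assumption that only catches the literal forms `0` and the empty string.

```java
final class AbsentValueFilter {
    /** Hypothetical per-dataset switch: true means 0 and "" count as absent. */
    private final boolean treatZeroAndEmptyAsAbsent;

    AbsentValueFilter(boolean treatZeroAndEmptyAsAbsent) {
        this.treatZeroAndEmptyAsAbsent = treatZeroAndEmptyAsAbsent;
    }

    /** Returns a SPARQL filter for the given variable name, or "" if the switch is off. */
    String filterFor(String variable) {
        if (!treatZeroAndEmptyAsAbsent) {
            return "";
        }
        // str() makes the comparison independent of the literal's datatype
        return "filter(str(?" + variable + ") != \"\" && str(?" + variable + ") != \"0\").";
    }
}
```

Values such as `0.0` would slip through this string comparison; a numeric comparison would be needed to cover them.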

HalfInfiniteIntervalDetector with floats

Add pre-processing

determine the wh-pronoun (POS tag WP) and remove it from the question
identify aggregates here (instead of at the detector step) together with their maximum reference phrase, which is later used when there are multiple measure properties

parts of the question may be used multiple times

On the question "How much did the Philippines receive in the year of 2007?", the Philippines seem to be used twice:

D org.aksw.cubeqa.template.Fragment toTemplate: adding restriction Restriction on property (http://linkedspending.aksw.org/ontology/finland-aid-recipient-country, v1083550630) with where patterns: [?obs <http://linkedspending.aksw.org/ontology/finland-aid-recipient-country> <https://openspending.org/finland-aid/recipient-country/ph>.] and order limit patterns [] from score result ScoreResult(property=(http://linkedspending.aksw.org/ontology/finland-aid-recipient-country, v1083550630), value=https://openspending.org/finland-aid/recipient-country/ph, score=1.0)
D org.aksw.cubeqa.template.Fragment toTemplate: adding restriction Restriction on property (http://linkedspending.aksw.org/ontology/finland-aid-geographical-target-area, v860796570) with where patterns: [ ?obs  <http://linkedspending.aksw.org/ontology/finland-aid-geographical-target-area> "Malaysia, Philippines".] and order limit patterns [] from score result ScoreResult(property=(http://linkedspending.aksw.org/ontology/finland-aid-geographical-target-area, v860796570), value=Malaysia, Philippines, score=0.95)
W org.aksw.cubeqa.template.Fragment no answer property candidate found...
W org.aksw.cubeqa.template.Fragment ...using default answer property: (http://linkedspending.aksw.org/ontology/finland-aid-amount, v1908038420)
D org.aksw.cubeqa.AlgorithmTest select SUM(xsd:decimal(?v1908038420)) 
{
 ?obs  <http://linkedspending.aksw.org/ontology/finland-aid-geographical-target-area> "Malaysia, Philippines".
 ?obs  <http://linkedspending.aksw.org/ontology/refYear> ?v0.
filter(year(?v0)=2007).
?obs <http://linkedspending.aksw.org/ontology/finland-aid-recipient-country> <https://openspending.org/finland-aid/recipient-country/ph>.
?obs qb:dataSet <http://linkedspending.aksw.org/instance/finland-aid>.
?obs a qb:Observation.
?obs <http://linkedspending.aksw.org/ontology/finland-aid-amount> ?v1908038420.
}
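
One way to prevent a phrase from being consumed twice would be to record which token positions each accepted match covers and to reject later matches that overlap. The sketch below assumes matches are offered in descending score order and are identified by token ranges; `SpanTracker` is a hypothetical helper, not an existing CubeQA class.

```java
import java.util.BitSet;

final class SpanTracker {
    private final BitSet used = new BitSet();

    /** Tries to claim tokens [from, to); returns false if any of them is already taken. */
    boolean claim(int from, int to) {
        if (from < 0 || to <= from) {
            throw new IllegalArgumentException("empty or negative span");
        }
        // nextSetBit returns the first used index >= from, or -1 if there is none
        int firstUsed = used.nextSetBit(from);
        if (firstUsed != -1 && firstUsed < to) {
            return false;
        }
        used.set(from, to);
        return true;
    }
}
```

Under these assumptions, the recipient-country match (score 1.0) would claim the "Philippines" token first, and the lower-scored match on "Malaysia, Philippines" (score 0.95) would then be rejected.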

Reject unanswerable questions and those with empty output to increase recall

On some types of questions it is possible to say beforehand that the question cannot be answered.
In this case there should be some mechanism to signal an invalid query so that no attempt is made and the recall is higher.

Similarly, if a SPARQL query returns nothing, there should be an option to return no answer as well. Whether that is appropriate depends on whether empty answers are generally expected: in theory they should occur often, but in practice that has to be analysed. The benchmark, for example, has no questions with an empty answer.

This should however be made clear in the log output so that unreasonably high scores are not reported in case of some query failure.
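
The option described above could look roughly like the following sketch. `EmptyAnswerPolicy` and its flag are hypothetical names; the counter exists so that rejected questions can be reported in the log rather than silently inflating the score.

```java
import java.util.List;
import java.util.Optional;

final class EmptyAnswerPolicy {
    /** Hypothetical switch: true if an empty result is a legitimate answer for this benchmark. */
    private final boolean emptyAnswersExpected;
    private int rejected = 0;

    EmptyAnswerPolicy(boolean emptyAnswersExpected) {
        this.emptyAnswersExpected = emptyAnswersExpected;
    }

    /** Returns the rows, or Optional.empty() when an empty result is treated as "no answer". */
    <T> Optional<List<T>> apply(List<T> rows) {
        if (rows.isEmpty() && !emptyAnswersExpected) {
            rejected++; // keep count so the log can report how many questions were rejected
            return Optional.empty();
        }
        return Optional.of(rows);
    }

    int rejectedCount() {
        return rejected;
    }
}
```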

values detected for answer properties

For example in "How many sectors per recipient country?", there is a specific recipient country detected.

Verify EqualsAndHashCode of Restriction

fix linkedspending server

Needed for the unit tests. It may be more elegant to uncouple unit tests from any specific external dependency but it is not clear if that is easily done. For example, a local SPARQL endpoint could be created that contains the relevant files if they are not too large.

Do we need a boolean scorer?

Better Cube instance integration in PerTimeDetector

All the other detectors are singletons; turn PerTimeDetector into one as well or investigate another solution.

aggregate misdetected inside of word

Aggregate "min" is detected in "administrative" in the sentence "How much money Nepal receives for Environmental policy and administrative management?".
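
A possible fix is to match aggregate keywords only at word boundaries. The sketch below uses `java.util.regex` for that; the keyword list and the class name `AggregateKeywordFinder` are assumptions for illustration, since the detector's real keywords are not shown in this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AggregateKeywordFinder {
    // Hypothetical keyword list; \b restricts matches to whole words
    private static final Pattern AGGREGATE =
        Pattern.compile("\\b(min|max|sum|avg|average|total)\\b", Pattern.CASE_INSENSITIVE);

    /** Returns all aggregate keywords that occur as whole words, lowercased, in order. */
    static List<String> find(String question) {
        List<String> hits = new ArrayList<>();
        Matcher m = AGGREGATE.matcher(question);
        while (m.find()) {
            hits.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        return hits;
    }
}
```

With this pattern, "min" inside "administrative" is not reported because it is preceded and followed by word characters.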

Question 74 wrong performance measurement?

"How much did the top 10 aided countries get in 2008?" does not produce correct-looking output but gets a score of 1. Investigate.

Virtuoso doesn't process "2010"^^xsd:gYear anymore

Maybe that is a Virtuoso bug but I didn't perform an update, so investigate this.

Investigate Intui2, similar algorithm (not statistical)?

There seems to be an algorithm with a similar tree-based approach (however not applied to statistical data). TODO: Investigate, compare, add to related work, maybe contact them.

See QA survey draft:
Intui2 [18] is an sQA system on DBpedia based on synfragments, which map to a subtree of the syntactic parse tree. Semantically, a synfragment is a minimal span of text that can be interpreted as an RDF triple or complex RDF query. Synfragments interoperate with their parent synfragment by combining all combinations of child synfragments, ordered by syntactic and semantic characteristics. The authors assume that an interpretation of a question in an RDF query language can be obtained by the recursive interpretation of its synfragments. With this approach the authors were able to answer 30 out of 90 DBpedia questions correctly.

CubeTemplateFragment improve matchResult combination algorithm

greedy algorithm, does not work when highestNameRef has the only value Ref
check more pairs

Ensure right parsing of all xsd temporal types in ComponentProperty.java

qbench2 question 55 sparql exception

Question Number 55: Answering How much did Armenia spent in 2009 on general public services?
correct query: select sum(xsd:decimal(?amount)) as ?sum from <http://linkedspending.aksw.org/618ac3ec98384f44a9ef142356ce476d>
{
?obs qb:dataSet ls:618ac3ec98384f44a9ef142356ce476d.
?obs lso:618ac3ec98384f44a9ef142356ce476d-cofog1 <https://openspending.org/618ac3ec98384f44a9ef142356ce476d/cofog1/01>.
?obs lso:618ac3ec98384f44a9ef142356ce476d-amount ?amount.
?obs lso:refYear ?year.
filter(year(?year)=2009).
}
correct answer: [{=113006558300}]
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/cofog1
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/cofog3
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/cofog2
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/economic
range http://linkedspending.aksw.org/ontology/fromClass unknown: creating NOP scorer for http://linkedspending.aksw.org/ontology/from
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/economicidlevel2
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/cofog
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/economiclevel3
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/datasetid
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/economiclevel2
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/economiclevel1
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/economicidlevel1
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/type
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/program
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/economicidlevel3
Exception in thread "main" java.lang.RuntimeException: error with sparql query select ?p {?spec ?p <http://linkedspending.aksw.org/ontology/618ac3ec98384f44a9ef142356ce476d-amount>. filter(contains(str(?p),"http://purl.org/linked-data/cube#"))} limit 1
at org.aksw.cubeqa.property.ComponentProperty.<init>(ComponentProperty.java:111)
at org.aksw.cubeqa.property.ComponentProperty.getInstance(ComponentProperty.java:201)
at org.aksw.cubeqa.Cube.getDefaultAnswerProperty(Cube.java:172)
at org.aksw.cubeqa.template.CubeTemplateFragment.toTemplate(CubeTemplateFragment.java:157)
at org.aksw.cubeqa.template.CubeTemplator.buildTemplate(CubeTemplator.java:52)
at org.aksw.cubeqa.Algorithm.answer(Algorithm.java:14)
at org.aksw.cubeqa.benchmark.Benchmark.evaluate(Benchmark.java:128)
at org.aksw.cubeqa.benchmark.Benchmark.evaluate(Benchmark.java:105)
at org.aksw.cubeqa.scripts.EvaluateQBench2.main(EvaluateQBench2.java:14)
Caused by: java.util.NoSuchElementException: QueryIterPlainWrapper
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.nextBinding(QueryIteratorBase.java:152)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.next(QueryIteratorBase.java:129)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.next(QueryIteratorBase.java:41)
at com.hp.hpl.jena.sparql.engine.ResultSetStream.nextBinding(ResultSetStream.java:87)
at com.hp.hpl.jena.sparql.engine.ResultSetStream.nextSolution(ResultSetStream.java:115)
at com.hp.hpl.jena.sparql.engine.ResultSetStream.next(ResultSetStream.java:124)
at com.hp.hpl.jena.sparql.engine.ResultSetCheckCondition.next(ResultSetCheckCondition.java:65)
at org.aksw.cubeqa.property.ComponentProperty.<init>(ComponentProperty.java:110)
... 8 more

compare to wolfram alpha

show that similar questions do not work there, e.g. average height of american presidents
add to related work

Extend Benchmark

Benchmark needs 100 fully and correctly annotated questions.

Intersect multiple intervals for the same property
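
As a rough sketch of what the intersection could look like, assuming closed intervals over doubles where an infinite endpoint models a half-infinite interval. The `Interval` class here is hypothetical and not the project's own type.

```java
import java.util.Optional;

final class Interval {
    final double lower; // may be Double.NEGATIVE_INFINITY
    final double upper; // may be Double.POSITIVE_INFINITY

    Interval(double lower, double upper) {
        if (lower > upper) {
            throw new IllegalArgumentException("lower > upper");
        }
        this.lower = lower;
        this.upper = upper;
    }

    /** Intersection of two closed intervals, or empty if they do not overlap. */
    Optional<Interval> intersect(Interval other) {
        double lo = Math.max(lower, other.lower);
        double hi = Math.min(upper, other.upper);
        return lo <= hi ? Optional.of(new Interval(lo, hi)) : Optional.empty();
    }
}
```

Two half-infinite restrictions on the same property, such as "more than 1000" and "less than 5000", would then collapse into a single interval with two finite endpoints.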

Fuzzy Matching of labels

use lucene index for string and objectproperty scorer

Interval with two non-infinite endpoints

Years sometimes not selected, missing boost somewhere?

In "What is the total amount given over the World Vision Colombia channel in 2007?" 2007 is identified as amount, not year. Numerical amounts are negatively boosted in favour of years though, maybe amount is not scored correctly?

Egyptian doesn't get stemmed to egypt

See http://linguistics.stackexchange.com/questions/12547/how-to-map-egyptian-to-egypt
Stemming seems to be the right method, but we need a more aggressive stemmer such as the Lancaster stemmer, and it must be possible to integrate it into the Lucene index.

benchmarks should get versions

In case benchmarks get updated there should be a version number for them.

use slice labels of a dataset for identification along with label and comment

Dataset http://linkedspending.aksw.org/instance/uk-local-gloucestershirev1 has only the label "uk-local-gloucestershirev1", which is not descriptive. The labels of its slices are much more useful. Thus, the labels of slices should be used to identify the dataset for a question.

google drive table recategorize errors

move default aggregate from templator to cubetemplate or cubetemplatefragment

load benchmark datasets in memory instead of using SPARQL endpoint

The benchmark dataset endpoints http://cubeqa.aksw.org/sparql / http://linkedspending.aksw.org/sparql are currently not online.
If it doesn't take much work, loading the datasets in memory instead would be more future-proof.
The dataset is available as zipped N-Triples at https://github.com/KonradHoeffner/linkedspending/releases/download/data-qbench2datasets/qbench2datasets.zip
Consider publishing it and using it as HDT instead, using HDT Java & Jena.
See #46.

Add affirmative question support

Mock dependencies for unit tests

Right now, the unit tests only work if http://linkedspending.aksw.org/sparql is online but they should be independent.
Create a minimal serialization for the tests and query that, for example with Jena.

Make sure Index with same property does not conflict between data cubes

Can not run project

Hello,
I was trying to run your project and I got this problem:
"Failed to execute goal on project cubeqa: Could not resolve dependencies for project org.aksw.cubeqa:cubeqa:jar:0.0.1-SNAPSHOT: Could not find artifact org.aksw:openspending2rdf:jar:0.0.1-SNAPSHOT in maven.aksw.internal (http://maven.aksw.org/repository/internal) -> [Help 1]"
I don't know how to fix it. Can you help me resolve it? I'm just a student from Viet Nam, so if I made a mistake in this comment, please forgive me. Thank you.

sector is not found in AlgorithmTest

find out why the sector is not found in AlgorithmTest even when boostString is set to 0.1 (in ObjectPropertyScorerTest it works)

Remove unused property values from the index?

Example: Dataset finland_aid contains the channels "Fida International" and "Finnish NGO, Fida International". The first one isn't used, yet it is textually nearer to the query phrase "Fida International".

TopDetector problems

"highest" does not seem to be detected at all
