konradhoeffner / cubeqa
Home Page: https://aksw.org/Projects/CubeQA.html
License: GNU General Public License v3.0
CubeQA—Question Answering on Statistical Linked Data
By default, 0 or empty string values should be treated as absent values and excluded by a filter in the SPARQL query. This should be configurable per dataset.
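A minimal sketch of how such a filter could be generated. The class `AbsentValueFilter` and the per-dataset flag `excludeAbsent` are hypothetical and not part of CubeQA; the trailing dot follows the query style shown in the logs.

```java
/** Hypothetical helper, not part of CubeQA: builds a SPARQL filter that drops absent values. */
class AbsentValueFilter {
    /**
     * @param var           SPARQL variable name without the leading '?'
     * @param excludeAbsent assumed per-dataset setting; true is the proposed default
     * @return a filter clause, or an empty string when the dataset keeps 0 and "" values
     */
    static String build(String var, boolean excludeAbsent) {
        if (!excludeAbsent) {
            return "";
        }
        // str() lets the same comparison cover numeric and plain literals.
        return "filter(str(?" + var + ") != \"\" && str(?" + var + ") != \"0\").";
    }

    public static void main(String[] args) {
        System.out.println(build("v1908038420", true));
    }
}
```

The string comparison only catches the lexical form "0"; a value such as "0.0" would need a numeric comparison instead.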
On the question "How much did the Philippines receive in the year of 2007?", the Philippines seem to be used twice:
D org.aksw.cubeqa.template.Fragment toTemplate: adding restriction Restriction on property (http://linkedspending.aksw.org/ontology/finland-aid-recipient-country, v1083550630) with where patterns: [?obs <http://linkedspending.aksw.org/ontology/finland-aid-recipient-country> <https://openspending.org/finland-aid/recipient-country/ph>.] and order limit patterns [] from score result ScoreResult(property=(http://linkedspending.aksw.org/ontology/finland-aid-recipient-country, v1083550630), value=https://openspending.org/finland-aid/recipient-country/ph, score=1.0)
D org.aksw.cubeqa.template.Fragment toTemplate: adding restriction Restriction on property (http://linkedspending.aksw.org/ontology/finland-aid-geographical-target-area, v860796570) with where patterns: [ ?obs <http://linkedspending.aksw.org/ontology/finland-aid-geographical-target-area> "Malaysia, Philippines".] and order limit patterns [] from score result ScoreResult(property=(http://linkedspending.aksw.org/ontology/finland-aid-geographical-target-area, v860796570), value=Malaysia, Philippines, score=0.95)
W org.aksw.cubeqa.template.Fragment no answer property candidate found...
W org.aksw.cubeqa.template.Fragment ...using default answer property: (http://linkedspending.aksw.org/ontology/finland-aid-amount, v1908038420)
D org.aksw.cubeqa.AlgorithmTest select SUM(xsd:decimal(?v1908038420))
{
?obs <http://linkedspending.aksw.org/ontology/finland-aid-geographical-target-area> "Malaysia, Philippines".
?obs <http://linkedspending.aksw.org/ontology/refYear> ?v0.
filter(year(?v0)=2007).
?obs <http://linkedspending.aksw.org/ontology/finland-aid-recipient-country> <https://openspending.org/finland-aid/recipient-country/ph>.
?obs qb:dataSet <http://linkedspending.aksw.org/instance/finland-aid>.
?obs a qb:Observation.
?obs <http://linkedspending.aksw.org/ontology/finland-aid-amount> ?v1908038420.
}
For some types of questions it is possible to say beforehand that the question cannot be answered.
In this case there should be some mechanism to signal an invalid query so that no attempt is made and the recall is higher.
Similarly, if a SPARQL query returns nothing, there should be an option to return no answer as well. Whether that is appropriate depends on whether empty answers are generally expected: in theory they should occur often, but this has to be analysed. The benchmark, for example, has no questions with an empty answer.
This should, however, be made clear in the log output so that unreasonably high scores are not reported when a query fails.
For example, in "How many sectors per recipient country?", a specific recipient country is detected.
Needed for the unit tests. It may be more elegant to uncouple unit tests from any specific external dependency but it is not clear if that is easily done. For example, a local SPARQL endpoint could be created that contains the relevant files if they are not too large.
All the other detectors are singletons. Turn PerTimeDetector into one as well, or investigate another solution.
Aggregate "min" is detected in "administrative" in the sentence "How much money Nepal receives for Environmental policy and administrative management?".
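The false positive is consistent with substring matching, since "administrative" contains "min". A minimal sketch of whole-word matching follows; the class and method names are hypothetical and CubeQA's actual detector may work differently.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of CubeQA: matches aggregate keywords as whole words only. */
class AggregateKeywordMatcher {
    /** Returns true only if the keyword occurs as a whole word in the question. */
    static boolean containsWord(String question, String keyword) {
        // \b anchors the keyword at word boundaries; quote() guards against regex metacharacters.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(keyword) + "\\b", Pattern.CASE_INSENSITIVE);
        return p.matcher(question).find();
    }

    public static void main(String[] args) {
        String q = "How much money Nepal receives for Environmental policy and administrative management?";
        System.out.println(q.toLowerCase().contains("min")); // substring match: true
        System.out.println(containsWord(q, "min"));          // whole-word match: false
    }
}
```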
"How much did the top 10 aided countries get in 2008?" does not produce correct-looking output but gets a score of 1. Investigate.
Maybe that is a Virtuoso bug but I didn't perform an update, so investigate this.
There seems to be an algorithm with a similar tree-based approach (however not applied to statistical data). TODO: Investigate, compare, add to related work, maybe contact them.
See QA survey draft:
Intui2 [18] is an sQA system on DBpedia based on synfragments, which map to a subtree of the syntactic parse tree. Semantically, a synfragment is a minimal span of text that can be interpreted as an RDF triple or complex RDF query. Synfragments interoperate with their parent synfragment by combining all combinations of child synfragments, ordered by syntactic and semantic characteristics. The authors assume that an interpretation of a question in an RDF query language can be obtained by the recursive interpretation of its synfragments. With this approach the authors were able to answer 30 out of 90 DBpedia questions correctly.
Question Number 55: Answering How much did Armenia spent in 2009 on general public services?
correct query: select sum(xsd:decimal(?amount)) as ?sum from http://linkedspending.aksw.org/618ac3ec98384f44a9ef142356ce476d
{
?obs qb:dataSet ls:618ac3ec98384f44a9ef142356ce476d.
?obs lso:618ac3ec98384f44a9ef142356ce476d-cofog1 https://openspending.org/618ac3ec98384f44a9ef142356ce476d/cofog1/01.
?obs lso:618ac3ec98384f44a9ef142356ce476d-amount ?amount.
?obs lso:refYear ?year.
filter(year(?year)=2009).
}
correct answer: [{=113006558300}]
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/cofog1
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/cofog3
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/cofog2
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/economic
range http://linkedspending.aksw.org/ontology/fromClass unknown: creating NOP scorer for http://linkedspending.aksw.org/ontology/from
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/economicidlevel2
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/cofog
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/economiclevel3
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/datasetid
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/economiclevel2
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/economiclevel1
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/economicidlevel1
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/type
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/program
no range: creating NOP property scorer for http://linkedspending.aksw.org/ontology/economicidlevel3
Exception in thread "main" java.lang.RuntimeException: error with sparql query select ?p {?spec ?p http://linkedspending.aksw.org/ontology/618ac3ec98384f44a9ef142356ce476d-amount. filter(contains(str(?p),"http://purl.org/linked-data/cube#"))} limit 1
at org.aksw.cubeqa.property.ComponentProperty.<init>(ComponentProperty.java:111)
at org.aksw.cubeqa.property.ComponentProperty.getInstance(ComponentProperty.java:201)
at org.aksw.cubeqa.Cube.getDefaultAnswerProperty(Cube.java:172)
at org.aksw.cubeqa.template.CubeTemplateFragment.toTemplate(CubeTemplateFragment.java:157)
at org.aksw.cubeqa.template.CubeTemplator.buildTemplate(CubeTemplator.java:52)
at org.aksw.cubeqa.Algorithm.answer(Algorithm.java:14)
at org.aksw.cubeqa.benchmark.Benchmark.evaluate(Benchmark.java:128)
at org.aksw.cubeqa.benchmark.Benchmark.evaluate(Benchmark.java:105)
at org.aksw.cubeqa.scripts.EvaluateQBench2.main(EvaluateQBench2.java:14)
Caused by: java.util.NoSuchElementException: QueryIterPlainWrapper
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.nextBinding(QueryIteratorBase.java:152)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.next(QueryIteratorBase.java:129)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.next(QueryIteratorBase.java:41)
at com.hp.hpl.jena.sparql.engine.ResultSetStream.nextBinding(ResultSetStream.java:87)
at com.hp.hpl.jena.sparql.engine.ResultSetStream.nextSolution(ResultSetStream.java:115)
at com.hp.hpl.jena.sparql.engine.ResultSetStream.next(ResultSetStream.java:124)
at com.hp.hpl.jena.sparql.engine.ResultSetCheckCondition.next(ResultSetCheckCondition.java:65)
at org.aksw.cubeqa.property.ComponentProperty.<init>(ComponentProperty.java:110)
... 8 more
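The NoSuchElementException is consistent with next() being called on an empty result set. A minimal sketch of a guard follows, using a plain java.util.Iterator as a stand-in for the query result; the class and method names are hypothetical and not taken from CubeQA.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.Optional;

/** Hypothetical helper, not part of CubeQA: reads the first result without throwing on empty input. */
class FirstResult {
    /** Returns the first element, or Optional.empty() when the iterator has none. */
    static <T> Optional<T> first(Iterator<T> results) {
        return results.hasNext() ? Optional.of(results.next()) : Optional.empty();
    }

    public static void main(String[] args) {
        Iterator<String> empty = Collections.<String>emptyIterator();
        Iterator<String> one = Collections.singletonList("http://purl.org/linked-data/cube#measure").iterator();
        System.out.println(first(empty).isPresent());   // no exception, just an empty Optional
        System.out.println(first(one).orElse("none"));
    }
}
```

With such a guard the caller can decide whether a missing component type is an error or a reason to fall back to a default.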
Benchmark needs 100 fully and correctly annotated questions.
In "What is the total amount given over the World Vision Colombia channel in 2007?", 2007 is identified as an amount, not a year. Numerical amounts are negatively boosted in favour of years, though, so maybe the amount is not scored correctly?
See http://linguistics.stackexchange.com/questions/12547/how-to-map-egyptian-to-egypt
Stemming seems to be the right method, but we need a more aggressive stemmer such as the Lancaster stemmer, and it must be possible to integrate it into the Lucene index.
In case benchmarks get updated there should be a version number for them.
The dataset http://linkedspending.aksw.org/instance/uk-local-gloucestershirev1 only has the label "uk-local-gloucestershirev1", which is not descriptive. The labels of its slices are much more useful, so slice labels should be used to identify the dataset for a question.
The benchmark dataset endpoints http://cubeqa.aksw.org/sparql / http://linkedspending.aksw.org/sparql are currently not online.
If it doesn't take much work, it would be better for future proofing to load it in memory instead.
The dataset is available as zipped N-Triples at https://github.com/KonradHoeffner/linkedspending/releases/download/data-qbench2datasets/qbench2datasets.zip
Consider publishing it and using it as HDT instead, using HDT Java & Jena.
See #46.
Right now, the unit tests only work if http://linkedspending.aksw.org/sparql is online but they should be independent.
Create a minimal serialization for the tests and query that, for example with Jena.
Hello,
I was trying to run your project and I got this problem:
"Failed to execute goal on project cubeqa: Could not resolve dependencies for project org.aksw.cubeqa:cubeqa:jar:0.0.1-SNAPSHOT: Could not find artifact org.aksw:openspending2rdf:jar:0.0.1-SNAPSHOT in maven.aksw.internal (http://maven.aksw.org/repository/internal) -> [Help 1]"
I don't know how to fix it. Can you help me resolve it? I'm just a student from Viet Nam, so if I made a mistake in this comment, please forgive me. Thank you.
Find out why the sector is not found in AlgorithmTest even when boostString is set to 0.1 (it works in ObjectPropertyScorerTest).
Example: the dataset finland_aid contains the channels "Fida International" and "Finnish NGO, Fida International". The first one isn't used, although it is textually nearer to the query phrase "Fida International".
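To make "textually nearer" concrete, the following sketch compares both labels with the query phrase using Levenshtein edit distance. This measure is an assumption chosen for illustration; CubeQA's scorers rely on a Lucene index and may rank the labels differently. The class name is hypothetical.

```java
/** Illustrative only: Levenshtein distance between a query phrase and candidate labels. */
class LabelDistance {
    /** Classic dynamic-programming edit distance using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String phrase = "Fida International";
        System.out.println(levenshtein(phrase, "Fida International"));              // 0
        System.out.println(levenshtein(phrase, "Finnish NGO, Fida International")); // 13
    }
}
```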
Temporarily disabling the cube cache as a workaround.
The problem still occurs; getInstance is probably broken.
New workaround: var = "v"+Math.abs(uri.hashCode()). The underlying issue still needs to be fixed.
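A small sketch of the hash-based naming, with one caveat worth knowing: Math.abs(Integer.MIN_VALUE) is still negative, so a URI with that hash code would yield a variable name containing a minus sign. Distinct URIs can also share a hash code. The class and the unsigned variant are hypothetical and not part of CubeQA.

```java
/** Hypothetical helper, not part of CubeQA: derives SPARQL variable names from URIs. */
class SparqlVarNames {
    /** The workaround as described: "v" plus the absolute hash code. */
    static String fromHash(String uri) {
        return "v" + Math.abs(uri.hashCode());
    }

    /** Assumed alternative: the unsigned rendering never contains a minus sign. */
    static String fromUnsignedHash(String uri) {
        return "v" + Integer.toUnsignedString(uri.hashCode());
    }

    public static void main(String[] args) {
        String uri = "http://linkedspending.aksw.org/ontology/finland-aid-amount";
        System.out.println(fromHash(uri));
        System.out.println(fromUnsignedHash(uri));
        // Edge case: the absolute value of Integer.MIN_VALUE overflows and stays negative.
        System.out.println(Math.abs(Integer.MIN_VALUE) < 0);
    }
}
```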
How much was committed in total for Namibia over Martinus-säätiö for Basic health infrastructure?
Somehow the channel "Indufor" gets detected here.
Aggregates don't seem to be detected correctly. Integrate them into the detector system, which presumably needs a rework (or the restrictions do).
Include index in the interval detection so that only matched phrases are included in the interval.