Hi, we are glad you are interested in JedAI!
I didn't have the time to reproduce the error you mention. It is probably caused by a sameAs statement that connects an entity to itself. I guess you have modified TestGtRDFReader.java so that it reads both datasets. Which of the two datasets do you use as input for the GtRDFReader?
Kind regards,
George
from jedaitoolkit.
Hi, these datasets come from silkframework.org.
I have already checked that neither dataset contains any "sameAs" property.
I have debugged the code, and it seems that the issue stems from the following part of the GtRDFReader class (lines 229-234):
```java
final String sub = stmt.getSubject().toString();
final String obj = stmt.getObject().toString();
// add a new edge for every pair of duplicate entities
int entityId1 = urlToEntityId1.get(sub);
int entityId2 = urlToEntityId1.get(obj) + datasetLimit;
```
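A plausible mechanism for the failure in the snippet above, offered as an assumption since the exact exception is not shown in this thread: if `urlToEntityId1` is a `Map<String, Integer>`, then `get` returns `null` for any URL that is not one of its keys, and assigning that `null` to an `int` (auto-unboxing) throws a `NullPointerException`. The minimal sketch below reproduces this in isolation; the map contents and the `lookupOrSkip` helper are hypothetical and not part of JedAI.

```java
import java.util.HashMap;
import java.util.Map;

public class UnboxingDemo {

    // Hypothetical stand-in for urlToEntityId1: maps an entity URL to its id.
    static Map<String, Integer> sampleIndex() {
        Map<String, Integer> urlToEntityId = new HashMap<>();
        urlToEntityId.put("http://dbpedia.org/resource/Karma_%28film%29", 0);
        return urlToEntityId;
    }

    // Returns true if unboxing the lookup result throws a NullPointerException.
    static boolean unboxingFails(Map<String, Integer> index, String url) {
        try {
            int id = index.get(url); // auto-unboxing: NPE when get() returns null
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Hypothetical defensive variant: returns -1 instead of throwing for unknown URLs.
    static int lookupOrSkip(Map<String, Integer> index, String url) {
        Integer id = index.get(url);
        return id == null ? -1 : id;
    }

    public static void main(String[] args) {
        Map<String, Integer> index = sampleIndex();
        String known = "http://dbpedia.org/resource/Karma_%28film%29";
        String unknown = "http://data.linkedmdb.org/resource/film/7632";
        System.out.println(unboxingFails(index, known));
        System.out.println(unboxingFails(index, unknown));
        System.out.println(lookupOrSkip(index, unknown));
    }
}
```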
Thanks!
Hi,
for some reason, I see lots of sameAs statements in the datasets you have uploaded.
I have created a class that tries to reproduce the error you mention:
https://github.com/scify/JedAIToolkit/blob/mavenizedVersion/jedai-core/src/test/java/org/scify/jedai/datareader/TestSilkData.java
So my question is: which file does gtFilePath point to in your case (line 21)?
On my computer, TestSilkData.java runs without any exception.
The problem I see with setting
gtFilePath = mainDir + "source.nt";
is that I only get sameAs statements like the following:
[http://dbpedia.org/resource/Karma_%28film%29, http://www.w3.org/2002/07/owl#sameAs, http://data.linkedmdb.org/resource/film/7632]
where http://data.linkedmdb.org/resource/film/7632 is not included in any of the given datasets and causes problems.
I would be happy to help you if you clarified which dataset you use for groundtruth, provided of course that this groundtruth file contains correct links of the form
URL_from_Dataset_1 sameAs URL_from_Dataset_2.
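To check a ground-truth file against the requirement above before running the reader, one could keep only the sameAs triples whose subject belongs to Dataset 1 and whose object belongs to Dataset 2. This is a minimal sketch, assuming N-Triples input with one triple per line and URIs in angle brackets; it uses a naive whitespace split rather than a real RDF parser, and the `validLinks` method and the example.org URLs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class GroundTruthCheck {

    static final String SAME_AS = "<http://www.w3.org/2002/07/owl#sameAs>";

    // Keeps "s sameAs o" lines where s is in dataset 1 and o is in dataset 2.
    // Naive parsing: assumes "<s> <p> <o> ." with URIs only (no literals).
    static List<String[]> validLinks(List<String> ntLines, Set<String> d1Urls, Set<String> d2Urls) {
        List<String[]> links = new ArrayList<>();
        for (String line : ntLines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 3 || !parts[1].equals(SAME_AS)) {
                continue; // not a sameAs triple
            }
            String s = strip(parts[0]);
            String o = strip(parts[2]);
            if (d1Urls.contains(s) && d2Urls.contains(o)) {
                links.add(new String[] {s, o});
            }
        }
        return links;
    }

    // Removes the surrounding angle brackets from an N-Triples URI.
    static String strip(String uri) {
        return uri.startsWith("<") && uri.endsWith(">") ? uri.substring(1, uri.length() - 1) : uri;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "<http://example.org/d1/a> <http://www.w3.org/2002/07/owl#sameAs> <http://example.org/d2/a> .",
            "<http://dbpedia.org/resource/Karma_%28film%29> <http://www.w3.org/2002/07/owl#sameAs> <http://data.linkedmdb.org/resource/film/7632> .");
        Set<String> d1 = Set.of("http://example.org/d1/a", "http://dbpedia.org/resource/Karma_%28film%29");
        Set<String> d2 = Set.of("http://example.org/d2/a");
        // The second link is dropped because its object is not in dataset 2.
        System.out.println(validLinks(lines, d1, d2).size());
    }
}
```

A link like the linkedmdb.org one quoted above would be filtered out here, because its object does not appear in either dataset.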
Kind regards,
George
Many thanks for the reply!
I had written incorrect code.
My goal is to check how JedAI links the two datasets (source.nt and target.nt), so that I can replace the Silk tool!
kind regards,
Frank
You are welcome Frank! Let us know if we can assist you in any other way.
Hi George,
attached you can find the Java class you provided, which I extended with block management and similarity computation.
I don't fully understand the results produced by the class (see the attached result.txt). The similarity percentages seem low, but the links between the datasets are present!
What do you think? Any suggestions?
Thanks in advance,
Frank
Hi Frank,
I am sorry for the late response.
I updated the TestSilkData.java class (https://github.com/scify/JedAIToolkit/blob/mavenizedVersion/jedai-core/src/test/java/org/scify/jedai/datareader/TestSilkData.java) with a more complete version of the code. The code you sent me didn't perform Entity Clustering, which is necessary for yielding the final results. The absolute values of the similarities might be low, but what matters is their relative values. In the Clean-Clean ER scenario you are considering, Unique Mapping Clustering should be applied in the end so that for every entity, the best match is selected (i.e., the pair with the highest similarity), as long as this similarity exceeds a certain threshold.
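The greedy idea behind Unique Mapping Clustering, as described above, can be sketched as follows. This is a minimal illustration, not JedAI's own implementation: candidate pairs are sorted by decreasing similarity, and a pair is accepted only if its similarity exceeds the threshold and neither of its entities has already been matched, which yields one-to-one matches in Clean-Clean ER. The `Pair` class, the entity ids, and the threshold value are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqueMappingSketch {

    // A candidate match between entity d1Id of dataset 1 and entity d2Id of dataset 2.
    static final class Pair {
        final int d1Id;
        final int d2Id;
        final double sim;

        Pair(int d1Id, int d2Id, double sim) {
            this.d1Id = d1Id;
            this.d2Id = d2Id;
            this.sim = sim;
        }
    }

    // Greedy one-to-one matching: highest similarities first, each entity used at most once.
    static List<Pair> cluster(List<Pair> candidates, double threshold) {
        List<Pair> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((Pair p) -> p.sim).reversed());
        Set<Integer> matched1 = new HashSet<>();
        Set<Integer> matched2 = new HashSet<>();
        List<Pair> result = new ArrayList<>();
        for (Pair p : sorted) {
            if (p.sim <= threshold) {
                break; // all remaining pairs are at or below the threshold
            }
            if (matched1.contains(p.d1Id) || matched2.contains(p.d2Id)) {
                continue; // one of the two entities already has a better match
            }
            matched1.add(p.d1Id);
            matched2.add(p.d2Id);
            result.add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Pair> candidates = List.of(
            new Pair(0, 0, 0.12), new Pair(0, 1, 0.30),
            new Pair(1, 1, 0.25), new Pair(1, 0, 0.08));
        for (Pair p : cluster(candidates, 0.1)) {
            System.out.println(p.d1Id + " -> " + p.d2Id + " (" + p.sim + ")");
        }
    }
}
```

With the candidates in `main`, only the pair 0 -> 1 is kept: 1 -> 1 and 0 -> 0 lose because one of their entities is already matched, and 1 -> 0 falls below the threshold. The similarities are small in absolute terms, yet their relative order is what decides the matches.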
Note that the new code tests a large number of configurations in order to find the one with the highest performance. As a result, it will take some time to complete. I ran it, but no meaningful results were produced, because the ground-truth reader cannot extract any pair of duplicates from the source.nt file that is used as the source of the groundtruth.
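One way to see why an empty ground truth leaves the configuration search without meaningful results: precision, recall, and F-measure are computed by comparing the detected matches with the ground-truth pairs, and recall divides by the number of ground-truth pairs. The sketch below is a hypothetical helper (the `evaluate` method and the string encoding of pairs are assumptions, not JedAI's evaluation API) that reports undefined measures as `NaN`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class EvaluationSketch {

    // Returns {precision, recall, f1}; NaN marks a measure that is undefined.
    static double[] evaluate(Set<String> predicted, Set<String> groundTruth) {
        Set<String> truePositives = new HashSet<>(predicted);
        truePositives.retainAll(groundTruth);
        double tp = truePositives.size();
        double precision = predicted.isEmpty() ? Double.NaN : tp / predicted.size();
        double recall = groundTruth.isEmpty() ? Double.NaN : tp / groundTruth.size();
        double f1;
        if (Double.isNaN(precision) || Double.isNaN(recall)) {
            f1 = Double.NaN; // undefined if either component is undefined
        } else if (precision + recall == 0) {
            f1 = 0.0; // no true positives at all
        } else {
            f1 = 2 * precision * recall / (precision + recall);
        }
        return new double[] {precision, recall, f1};
    }

    public static void main(String[] args) {
        // Pairs are encoded as "idInDataset1-idInDataset2" (hypothetical encoding).
        Set<String> predicted = Set.of("0-1", "1-2");
        System.out.println(Arrays.toString(evaluate(predicted, Set.of("0-1"))));
        // With an empty ground truth, recall and F1 are undefined (NaN).
        System.out.println(Arrays.toString(evaluate(predicted, Set.of())));
    }
}
```

Because every comparison involving `NaN` evaluates to `false` in Java, a search that keeps the configuration with the highest F-measure (for example, with `if (f1 > bestF1)`) would never update its best candidate in this situation.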
Kind regards,
George
Thank you for your time!
Where can I find a simple RDF example to better understand your tool?
Kind regards,
Frank
Hi Frank,
You can find many relevant datasets at http://oaei.ontologymatching.org/2009/, which is also where many of our benchmarks come from.
You can also check the following datasets along with the expected mappings:
oaeiIMidentity.zip
They were used in the OAEI instance matching track (http://oaei.ontologymatching.org/2014/im/index.html).
Best regards,
Manos.
Many thanks for the indications and suggestions!
Kind regards,
Frank