
Error on TestGtRDFReader about jedaitoolkit (open)

scify commented on June 9, 2024
Error on TestGtRDFReader

from jedaitoolkit.

Comments (10)

gpapadis commented on June 9, 2024

Hi, we are glad you are interested in JedAI!

I haven't had the time to reproduce the error you mention. It is probably caused by a sameAs statement that connects an entity to itself. I guess you have modified TestGtRDFReader.java so that it reads both datasets. Which of the two datasets do you use as input for the GtRDFReader?

Kind regards,
George

franklarryx commented on June 9, 2024

Hi, these datasets come from silkframework.org.
I have already checked that neither dataset contains any "sameAs" property.
I have debugged the code, and it seems that the issue comes from the following part of the GtRDFReader class (lines 229-234):

```java
final String sub = stmt.getSubject().toString();
final String obj = stmt.getObject().toString();

// add a new edge for every pair of duplicate entities
int entityId1 = urlToEntityId1.get(sub);
int entityId2 = urlToEntityId1.get(obj) + datasetLimit;
```
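
For context, here is a minimal sketch of one way such a lookup can fail. It assumes the two maps are plain `java.util.Map<String, Integer>` instances; the thread does not show how GtRDFReader actually declares them. `Map.get` returns `null` for a URL that is not in the map, and assigning that result to an `int` auto-unboxes it and throws a `NullPointerException`. The class `UnboxingDemo` and the helper `lookupOrMinusOne` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class UnboxingDemo {
    // Hypothetical guarded lookup: returns -1 instead of throwing when the URL is unknown.
    static int lookupOrMinusOne(Map<String, Integer> urlToEntityId, String url) {
        Integer id = urlToEntityId.get(url); // null if the URL is not in the dataset
        return id == null ? -1 : id;
    }

    public static void main(String[] args) {
        Map<String, Integer> urlToEntityId1 = new HashMap<>();
        urlToEntityId1.put("http://dbpedia.org/resource/Karma_%28film%29", 0);

        String missing = "http://data.linkedmdb.org/resource/film/7632";
        try {
            // Auto-unboxing a null Integer into an int throws NullPointerException.
            int entityId = urlToEntityId1.get(missing);
            System.out.println("unexpected id: " + entityId);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException for " + missing);
        }
        System.out.println(lookupOrMinusOne(urlToEntityId1, missing));
    }
}
```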

Thanks !

gpapadis commented on June 9, 2024

Hi,

for some reason, I see lots of sameAs statements in the datasets you have uploaded.
I created a class that tries to reproduce the error you mention:
https://github.com/scify/JedAIToolkit/blob/mavenizedVersion/jedai-core/src/test/java/org/scify/jedai/datareader/TestSilkData.java
So my question is: which file does gtFilePath point to in your case (line 21)?
On my computer, TestSilkData.java runs without throwing any exception.
The problem I see with setting `gtFilePath = mainDir + "source.nt";` is that I only get sameAs statements like the following:
`[http://dbpedia.org/resource/Karma_%28film%29, http://www.w3.org/2002/07/owl#sameAs, http://data.linkedmdb.org/resource/film/7632]`
where http://data.linkedmdb.org/resource/film/7632 is not included in any of the given datasets and causes problems.
I would be happy to help if you could clarify which dataset you use as the ground truth, provided, of course, that this file contains correct links of the form
`URL_from_Dataset_1 sameAs URL_from_Dataset_2`.
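
The expected ground-truth form can be enforced with a filter along these lines. This is a hypothetical sketch, not JedAI's GtRDFReader: it assumes the triples have already been parsed into plain strings and that each dataset has its own URL-to-entity-id map. Statements that are not owl:sameAs, or whose subject is not in dataset 1 or whose object is not in dataset 2 (such as the linkedmdb URL above), are skipped instead of causing an error. The names `SameAsFilter`, `Triple` and `validPairs` are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SameAsFilter {
    static final String SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";

    // Hypothetical parsed triple: subject, predicate and object as plain strings.
    record Triple(String subject, String predicate, String object) {}

    // Keeps only links of the form URL_from_Dataset_1 sameAs URL_from_Dataset_2 and returns
    // them as entity-id pairs; the second id is offset by datasetLimit.
    static List<int[]> validPairs(List<Triple> triples,
                                  Map<String, Integer> urlToEntityId1,
                                  Map<String, Integer> urlToEntityId2,
                                  int datasetLimit) {
        List<int[]> pairs = new ArrayList<>();
        for (Triple t : triples) {
            if (!SAME_AS.equals(t.predicate())) {
                continue; // not a sameAs statement
            }
            Integer id1 = urlToEntityId1.get(t.subject());
            Integer id2 = urlToEntityId2.get(t.object());
            if (id1 == null || id2 == null) {
                continue; // a URL is missing from its dataset: skip instead of failing
            }
            pairs.add(new int[]{id1, id2 + datasetLimit});
        }
        return pairs;
    }
}
```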

Kind regards,
George

franklarryx commented on June 9, 2024

Many thanks for the reply!
My code was wrong.
My goal is to check how JedAI links the two datasets (source.nt and target.nt), in order to replace the Silk tool!

kind regards,
Frank

gpapadis commented on June 9, 2024

You are welcome Frank! Let us know if we can assist you in any other way.

franklarryx commented on June 9, 2024

Hi George,

Attached you can find the Java class you provided, modified to add block management and similarity computation.
I don't fully understand the results produced by the class (result.txt in the attachment). The similarity percentages seem low, but the links between the datasets are present!

What do you think? Any suggestions?

Thanks in advance,
Frank

classAndResult.zip

gpapadis commented on June 9, 2024

Hi Frank,

I am sorry for the late response.

I updated the TestSilkData.java class (https://github.com/scify/JedAIToolkit/blob/mavenizedVersion/jedai-core/src/test/java/org/scify/jedai/datareader/TestSilkData.java) with a more complete version of the code. The code you sent me didn't perform Entity Clustering, which is necessary for yielding the final results. The absolute values of the similarities might be low, but what matters is their relative values. In the Clean-Clean ER scenario you are considering, Unique Mapping Clustering should be applied in the end so that for every entity, the best match is selected (i.e., the pair with the highest similarity), as long as this similarity exceeds a certain threshold.
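
The selection step described above can be sketched as a greedy one-to-one matching. This is written for illustration under the assumption that each comparison carries a single similarity score; it is not taken from JedAI's Unique Mapping Clustering implementation, and the `Comparison` record and `uniqueMapping` method are hypothetical. Because pairs are ranked against each other, a match with a similarity of only 0.3 can still be accepted, which is why relative rather than absolute values matter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqueMappingSketch {
    // Hypothetical scored comparison between entity id1 (dataset 1) and id2 (dataset 2).
    record Comparison(int id1, int id2, double similarity) {}

    // Visits comparisons from most to least similar and accepts a pair only if neither
    // entity has been matched yet and the similarity reaches the threshold.
    static List<Comparison> uniqueMapping(List<Comparison> comparisons, double threshold) {
        List<Comparison> sorted = new ArrayList<>(comparisons);
        sorted.sort(Comparator.comparingDouble(Comparison::similarity).reversed());
        Set<Integer> matched1 = new HashSet<>();
        Set<Integer> matched2 = new HashSet<>();
        List<Comparison> matches = new ArrayList<>();
        for (Comparison c : sorted) {
            if (c.similarity() < threshold) {
                break; // all remaining pairs are even less similar
            }
            if (matched1.contains(c.id1()) || matched2.contains(c.id2())) {
                continue; // one of the entities already has a better match
            }
            matched1.add(c.id1());
            matched2.add(c.id2());
            matches.add(c);
        }
        return matches;
    }
}
```

For example, with a threshold of 0.1 and the pairs (0,0,0.30), (0,1,0.25), (1,0,0.20) and (1,1,0.15), the sketch keeps (0,0) and (1,1) and discards the other two.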

Note that the new code tests a large number of configurations in order to find the one with the highest performance, so it will take some time to complete. I ran it, but it produced no meaningful results, because the ground-truth reader cannot extract any pair of duplicates from source.nt, the file used as the ground truth.
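
To see why an empty ground truth leaves nothing to compare, consider how detected pairs are commonly scored. The hypothetical `evaluate` helper below (not a JedAI API) computes precision, recall and F1 over pairs encoded as strings such as "0-10". With no ground-truth pairs, recall and therefore F1 are undefined (returned as NaN), so no configuration can be ranked above another.

```java
import java.util.Set;

public class PairEvaluation {
    // Hypothetical helper: scores detected duplicate pairs against ground-truth pairs.
    // Returns {precision, recall, f1}; NaN marks a value that is undefined.
    static double[] evaluate(Set<String> detected, Set<String> groundTruth) {
        long truePositives = detected.stream().filter(groundTruth::contains).count();
        double precision = detected.isEmpty() ? Double.NaN : (double) truePositives / detected.size();
        double recall = groundTruth.isEmpty() ? Double.NaN : (double) truePositives / groundTruth.size();
        double f1;
        if (Double.isNaN(precision) || Double.isNaN(recall)) {
            f1 = Double.NaN; // undefined when either input is undefined
        } else if (precision + recall == 0) {
            f1 = 0.0; // no correct pairs at all
        } else {
            f1 = 2 * precision * recall / (precision + recall);
        }
        return new double[]{precision, recall, f1};
    }
}
```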

Kind regards,
George

franklarryx commented on June 9, 2024

Thank you for your time!
Where can I find a simple RDF example to better understand your tool?

Kind regards,
Frank

mthanos commented on June 9, 2024

Hi Frank,

You can find many relevant datasets at http://oaei.ontologymatching.org/2009/, which is also the source of many of our benchmarks.

You can also check the following datasets along with the expected mappings:
oaeiIMidentity.zip
They were used for the OAEI instance matching track (http://oaei.ontologymatching.org/2014/im/index.html).

Best regards,
Manos.

franklarryx commented on June 9, 2024

Many thanks for the indications and suggestions!

Kind regards,
Frank
