scify / jedaitoolkit

An open source, high scalability toolkit in Java for Entity Resolution.

Home Page: http://jedai.scify.org

License: Apache License 2.0

Java 100.00%
blocking entity-matching entity-resolution scalability

jedaitoolkit's Issues

Dirty ER examples input .csv

Hi, is it possible to have sample files in .csv format for

  • entity profile D1
  • ground truth

I ask because .csv files do not work, whatever formatting I use.
The error from JedAI-gui is the following:

[image: JedAI-gui error screenshot]

Thank you for the support.

UI and Docker's Web Application get stuck in Data Reading Phase

I get the following error after specifying input sources and then pressing "Next" button in Data Reading Step in JedAI UI:

The input files could not be read successfully.

Details: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Character
cannot be cast to java.lang.String (java.lang.Character cannot be cast to cast to java.lang.String)

In the terminal of Docker's Web Application I have the following:

java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Character
	at kr.di.uoa.gr.jedaiwebapp.models.Dataset.<init>(Dataset.java:86) ~[classes!/:0.0.1-SNAPSHOT]
	at kr.di.uoa.gr.jedaiwebapp.controllers.WorkflowController.validate_DataRead(WorkflowController.java:75) ~[classes!/:0.0.1-SNAPSHOT]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_212]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_212]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_212]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_212]
	at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:190) ~[spring-web-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
...

Unable to load csv

I am using the latest JedAI-gui: jedai-ui.7z. I tried loading DBLP-ACM .csv data:

  1. ACM.csv
  2. DBLP2.csv
  3. DBLP-ACM_perfectMapping.csv

and I get the following error (please see attached):
Screenshot 2020-05-25 at 15 20 31

Unable to build jedai-core - missing dependencies

Hi,

I'm unable to build the project.
The following dependencies can't be found :

  • com.esotericsoftware.minlog:minlog:jar:1.2-slf4j-jdanbrown-0
  • gr.demokritos:JInsect:jar:1.1
  • salvo.jesus:OpenJGraph:jar:1.1

The first one can't be found anywhere; the other two seem to be hosted on an unreachable repository: http://backend1.scify.org:60004/artifactory/pub-release-local

mvn clean install -U
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] jedai                                                              [pom]
[INFO] jedai-core                                                         [jar]
[INFO] jedai-ui                                                           [jar]
[INFO]
[INFO] ---------------------------< gr.scify:jedai >---------------------------
[INFO] Building jedai 1.3                                                 [1/3]
[INFO] --------------------------------[ pom ]---------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ jedai ---
[INFO]
[INFO] --- maven-install-plugin:2.4:install (default-install) @ jedai ---
[INFO] Installing C:\projet\JedAIToolkit\pom.xml to C:\Users\nicolas.lledo\.m2\repository\gr\scify\jedai\1.3\jedai-1.3.pom
[INFO]
[INFO] ------------------------< gr.scify:jedai-core >-------------------------
[INFO] Building jedai-core 1.3                                            [2/3]
[INFO] --------------------------------[ jar ]---------------------------------
Downloading from nexus.somecompany.com: http://nexus.somecompany.com/repository/maven-public/com/esotericsoftware/minlog/minlog/1.2-slf4j-jdanbrown-0/minlog-1.2-slf4j-jdanbrown-0.pom
[WARNING] The POM for com.esotericsoftware.minlog:minlog:jar:1.2-slf4j-jdanbrown-0 is missing, no dependency information available
Downloading from nexus.somecompany.com: http://nexus.somecompany.com/repository/maven-public/gr/demokritos/JInsect/1.1/JInsect-1.1.pom
[WARNING] The POM for gr.demokritos:JInsect:jar:1.1 is missing, no dependency information available
Downloading from nexus.somecompany.com: http://nexus.somecompany.com/repository/maven-public/salvo/jesus/OpenJGraph/1.1/OpenJGraph-1.1.pom
[WARNING] The POM for salvo.jesus:OpenJGraph:jar:1.1 is missing, no dependency information available
Downloading from nexus.somecompany.com: http://nexus.somecompany.com/repository/maven-public/com/esotericsoftware/minlog/minlog/1.2-slf4j-jdanbrown-0/minlog-1.2-slf4j-jdanbrown-0.jar
Downloading from nexus.somecompany.com: http://nexus.somecompany.com/repository/maven-public/salvo/jesus/OpenJGraph/1.1/OpenJGraph-1.1.jar
Downloading from nexus.somecompany.com: http://nexus.somecompany.com/repository/maven-public/gr/demokritos/JInsect/1.1/JInsect-1.1.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for jedai 1.3:
[INFO]
[INFO] jedai .............................................. SUCCESS [  0.452 s]
[INFO] jedai-core ......................................... FAILURE [  1.671 s]
[INFO] jedai-ui ........................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  2.393 s
[INFO] Finished at: 2019-02-27T17:50:24+01:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project jedai-core: Could not resolve dependencies for project gr.scify:jedai-core:jar:1.3: The following artifacts could not be resolved: com.esotericsoftware.minlog:minlog:jar:1.2-slf4j-jdanbrown-0, gr.demokritos:JInsect:jar:1.1, salvo.jesus:OpenJGraph:jar:1.1: Could not find artifact com.esotericsoftware.minlog:minlog:jar:1.2-slf4j-jdanbrown-0 in nexus.somecompany.com (http://nexus.somecompany.com/repository/maven-public/) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :jedai-core

SiGMa Similarity

I had a look at the code of the SiGMa Similarity in class CharacterNGramsWithGlobalWeights and it seems to be exactly the same code as in the Generalized Jaccard Similarity. Am I missing something or is SiGMa not really implemented?

Question about Data

Hi, I found that the dataset sizes in this repository do not seem to match the originals, and I would like to know whether the data has been processed. For example, the original Amazon-Google has 1363 and 3226 entities and 1300 matches respectively, but the numbers in this project are lower.

I also see a lot of dirty data that seems to just mix the two tables together. Is there any other processing?

Cannot read ground truth

There is a bug in the code that prevents ground truth in CSV format from being read. I tried the provided samples, and the web-based Docker image failed to load them. I downloaded the code and ran it step by step, and I think the problem is in GtCSVReader. The reading part takes strings like "thisisastring" (quotes included) where only thisisastring should be read. I tried adding nextLine[0] = nextLine[0].substring(1, nextLine[0].length()-1); on line 200 of that file, but with no success so far. I need this to work to test some CSV entity matchings, so does anybody have a fix for this issue?
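A minimal sketch of the quote stripping attempted above, assuming the field really arrives with literal surrounding double quotes. The class and method names are hypothetical and not part of JedAI, and the actual fix in GtCSVReader may lie elsewhere (for example in how the CSV parser is configured).

```java
final class QuoteStripper {

    // Removes one pair of surrounding double quotes, if present.
    // Unlike an unconditional substring(1, length - 1), this leaves
    // unquoted, empty, or single-character fields untouched.
    static String stripQuotes(String field) {
        if (field == null) {
            return null;
        }
        String trimmed = field.trim();
        if (trimmed.length() >= 2 && trimmed.startsWith("\"") && trimmed.endsWith("\"")) {
            return trimmed.substring(1, trimmed.length() - 1);
        }
        return trimmed;
    }

    public static void main(String[] args) {
        System.out.println(stripQuotes("\"thisisastring\"")); // prints thisisastring
        System.out.println(stripQuotes("plain"));             // prints plain
    }
}
```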

CSV headers with upper case don't work for PPJoin

On the Similarity Join page of the UI, when the values given for "Select attribute of Dataset 1:" and "Select attribute of Dataset 2:" contain uppercase letters, e.g. "INSTANCE ID", the algorithm fails to match any results. On further investigation I found that in the method getAttributeValue(String attributeName, EntityProfile profile) of class AbstractSimilarityJoin, on line 67, attributeName should be changed to attributeName.toLowerCase() so that attribute names are compared properly; otherwise the if condition is never satisfied and the attribute is simply ignored.
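The idea can be sketched independently of JedAI as a case-insensitive lookup. The Map-based profile and the names below are stand-ins chosen for illustration; they are not the actual EntityProfile API, and using equalsIgnoreCase instead of toLowerCase() is one possible variant of the suggested change.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AttributeLookup {

    // Returns the value of the first attribute whose name matches
    // attributeName ignoring case, or an empty string if none matches.
    static String getAttributeValue(String attributeName, Map<String, String> attributes) {
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            if (entry.getKey().equalsIgnoreCase(attributeName)) {
                return entry.getValue();
            }
        }
        return "";
    }

    public static void main(String[] args) {
        Map<String, String> profile = new LinkedHashMap<>();
        profile.put("instance id", "42");
        System.out.println(getAttributeValue("INSTANCE ID", profile)); // prints 42
    }
}
```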

Apply JedAI blocking programmatically - missing documentation

Hi!

I have successfully got the Web application working, and I also took my first successful steps using JedAI with Python.

Now I want to do it programmatically with Python and without the Web application, i.e. apply the full workflow using only the terminal and VS Code.

But I couldn't find any detailed documentation on how to do blocking, cleaning, etc. programmatically.

Dirty datasets in CSV format

Hi, I was wondering if you have the dirty datasets available in CSV format? Otherwise I can just write a quick script that reads the JSO files and converts them myself, but I figured there is no harm in asking first! Thanks in advance.

Converting the DBPedia dataset into non-Java format

Hello,
I'm working on converting the DBPedia dataset into a format accessible without Java.
I have already converted cleanDBPedia1/2.
However, I do not understand the ground truth format.
The profiles have attributes and a URI.
The pairs in the ground truth consist of numbers.
However, when I interpret these numbers as offsets into either file, I end up with non-matching pairs.
I wrote the entities into the files in the order they had in the deserialized Java list.
How can I find the matching pairs and understand the ground truth?
Kind regards

JedAI for Data matching

Hello,
I am trying to run the Web-based application for a data matching task. I have two tables in CSV format: the first table contains 1.2k rows and the second table contains 7k queries. I want to use JedAI to match each query with a row from the first table. When I run a "block-based workflow", the process gets stuck at table loading.
I am a bit lost about how to configure the model. So far I have tried the settings from the video tutorial and some other settings, but the application never generates any output. I am sharing the tables with this message; please let me know if there is anything wrong with the way I generated them.

Documentation or examples for the open source library

I cannot seem to find any documentation or examples of a standard workflow implemented in Python or Java in your repository. Do either of these exist? If so, where could I find them? If not, it would be very useful to have them, since a new user of your tool like me currently has to go through all of the Java classes to learn how to use it, which takes a lot of time.

data pairs shown as false negatives and as true positives

I found some cases where data pairs showed up in the end results as false negatives and true positives simultaneously.
The cause is in the class UnilateralDuplicatePropagation, in the following methods:

public boolean isSuperfluous(int entityId1, int entityId2) {
    final IdDuplicates duplicatePair1 = new IdDuplicates(entityId1, entityId2);
    final IdDuplicates duplicatePair2 = new IdDuplicates(entityId2, entityId1);
    if (duplicates.contains(duplicatePair1)
            || duplicates.contains(duplicatePair2)) {
        if (entityId1 < entityId2) {
            detectedDuplicates.add(duplicatePair1);
        } else {
            detectedDuplicates.add(duplicatePair2);
        }
    }

    return false;
}

public Set<IdDuplicates> getFalseNegatives() {
    final Set<IdDuplicates> falseNegatives = new HashSet<>(duplicates);
    falseNegatives.removeAll(detectedDuplicates);
    return falseNegatives;
}

Only one of two possible combinations of IDs is written to detectedDuplicates, but superfluous combinations still exist in duplicates. When removing detectedDuplicates from duplicates to create falseNegatives, those superfluous combinations remain and are exported as false negatives, while the combinations in detectedDuplicates are exported as true positives.
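One way to avoid the mismatch described above is to store every pair in a canonical order (smaller id first) in both sets, so that (a, b) and (b, a) compare equal. The following is a sketch under that assumption; Pair is a hypothetical stand-in for IdDuplicates, not the JedAI class, and this is not necessarily how the maintainers would fix it.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class CanonicalPairs {

    // Hypothetical stand-in for IdDuplicates: always stores the smaller id first,
    // so both orderings of the same two ids produce equal objects.
    static final class Pair {
        final int first;
        final int second;

        Pair(int a, int b) {
            this.first = Math.min(a, b);
            this.second = Math.max(a, b);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Pair)) {
                return false;
            }
            Pair other = (Pair) o;
            return first == other.first && second == other.second;
        }

        @Override
        public int hashCode() {
            return Objects.hash(first, second);
        }
    }

    // Ground-truth pairs that were never detected.
    static Set<Pair> falseNegatives(Set<Pair> groundTruth, Set<Pair> detected) {
        Set<Pair> result = new HashSet<>(groundTruth);
        result.removeAll(detected);
        return result;
    }

    public static void main(String[] args) {
        Set<Pair> groundTruth = new HashSet<>();
        groundTruth.add(new Pair(1, 2));
        groundTruth.add(new Pair(2, 1)); // reverse ordering collapses into (1, 2)
        groundTruth.add(new Pair(3, 4));

        Set<Pair> detected = new HashSet<>();
        detected.add(new Pair(2, 1));

        System.out.println(falseNegatives(groundTruth, detected).size()); // prints 1
    }
}
```

With canonical ordering, the detected pair (2, 1) removes its ground-truth counterpart, and only (3, 4) remains as a false negative.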

Reduce memory footprint of SimilarityPairs

We're using jedai-core (not jedai-ui) in our application and we ran into some Out of Memory errors and started profiling our application. The largest chunk of memory was from SimilarityPairs. We experimented with reducing the size of the similarities from double to float and that reduced the memory footprint by about 25% (630 MB -> 470 MB).

I'm assuming we don't need the extra precision afforded by double, is that correct?

Better structure for match results output file

The PrintToFile.toCSV() method should output the original entity URLs, in a format that is easier to import into a database, e.g. 3 columns: cluster_id, dataset, entity_url

GtCSVReader problems with jgrapht ConnectivityInspector

This issue arose when I attempted to reproduce the workflow in: org.scify.jedai.demoworkflows.CsvDblpAcm.java.

During the reading process of the ground truths in DBLP-ACM_perfectMapping.csv (specifically the GtCSVReader.getDuplicatePairs method), the detection of connected components by the jgrapht package seems to not work.

For some reason I obtain a single cluster of size 2225 and then 5375 more clusters of size 1, which is obviously incorrect since the csv contains about 2225 unique pairs (which should in turn produce 2225 clusters of size 2).

Have you seen this problem before? Maybe the jgrapht package expects a different format than it did previously?
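For reference, the expected result can be checked without jgrapht using a small union-find. This is a sketch that assumes every entity has a globally unique integer id in the range 0..n-1; it does not reproduce GtCSVReader and says nothing about where the reported behaviour comes from.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PairClusters {

    // Finds the representative of x, compressing the path along the way.
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // Groups ids 0..n-1 into the connected components induced by the pairs.
    static List<List<Integer>> clusters(int n, int[][] pairs) {
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        for (int[] pair : pairs) {
            int a = find(parent, pair[0]);
            int b = find(parent, pair[1]);
            if (a != b) {
                parent[a] = b;
            }
        }
        Map<Integer, List<Integer>> byRoot = new HashMap<>();
        for (int i = 0; i < n; i++) {
            byRoot.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(i);
        }
        return new ArrayList<>(byRoot.values());
    }

    public static void main(String[] args) {
        // Two disjoint pairs over six ids: two clusters of size 2, two singletons.
        List<List<Integer>> result = clusters(6, new int[][] {{0, 3}, {1, 4}});
        System.out.println(result.size()); // prints 4
    }
}
```

If the pairs in the ground truth are disjoint, each pair should come out as its own cluster of size 2, which is the behaviour the issue expects.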

Remove maven-assembly-plugin Configuration From jedai-core

If another project is going to depend on jedai-core, having the transitive dependencies assembled inside jedai-core has the potential to conflict if different versions of those same transitive dependencies are needed for the other project. Since jedai-ui is already assembling transitive dependencies, removing transitive dependencies from jedai-core should not have any effect on the UI.

Regarding JedAIToolkit sample csv file

I am looking at the source code of JedAIToolkit on GitHub.

I am not able to find the sample CSV files for testing.

Can I get the cd_gold.csv and cd.csv files that are used for testing in TestGtCSVReader.java and TestEntityCSVReader.java?

Unable to Read csv or json files

I have my own custom CSV files for both the dataset and the ground truth.
Can anyone help me use these files to get results?
Some errors are thrown when these files are used as input:

Exception in thread "main" java.lang.IllegalArgumentException: loops not allowed
	at org.jgrapht.graph.AbstractBaseGraph.addEdge(AbstractBaseGraph.java:218)
	at org.scify.jedai.datareader.groundtruthreader.GtCSVReader.getDuplicatePairs(GtCSVReader.java:206)
	at org.scify.jedai.datareader.groundtruthreader.AbstractGtReader.getDuplicatePairs(AbstractGtReader.java:58)
	at org.scify.jedai.workflowbuilder.Main.main(Main.java:254)

Can anyone help me?
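The "loops not allowed" message is raised by jgrapht when an edge is added from a vertex to itself on a graph that does not permit self-loops, which suggests that some ground-truth row resolves to the same entity on both sides. Whether that comes from the data or from how the reader maps ids cannot be told from the trace alone. A hypothetical pre-filter, written without jgrapht and with names that are not part of JedAI, could look like this:

```java
import java.util.ArrayList;
import java.util.List;

final class SelfPairFilter {

    // Keeps only pairs whose two ids differ; a pair (v, v) would become a self-loop.
    static List<int[]> withoutSelfPairs(List<int[]> pairs) {
        List<int[]> kept = new ArrayList<>();
        for (int[] pair : pairs) {
            if (pair[0] != pair[1]) {
                kept.add(pair);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<int[]> pairs = new ArrayList<>();
        pairs.add(new int[] {1, 2});
        pairs.add(new int[] {3, 3}); // would trigger "loops not allowed"
        System.out.println(withoutSelfPairs(pairs).size()); // prints 1
    }
}
```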

[WorkflowBuilder.Main] Error: can't locate dataset

Using the library from the CLI (Linux) raises this exception:

Please choose one of the available Clean-clean ER datasets:
1 - Abt-Buy
2 - DBLP-ACM
3 - DBLP-Scholar
4 - Amazon-Google Products
5 - IMDB-DBPedia Movies
1
Abt-Buy has been selected!
0 [main] ERROR com.esotericsoftware.minlog  - Error in data reading
java.io.FileNotFoundException: data/cleanCleanErDatasets/amazonProfiles (File o directory non esistente)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at java.io.FileInputStream.<init>(FileInputStream.java:93)
	at org.scify.jedai.datareader.AbstractReader.loadSerializedObject(AbstractReader.java:54)
	at org.scify.jedai.datareader.entityreader.EntitySerializationReader.getEntityProfiles(EntitySerializationReader.java:48)
	at org.scify.jedai.workflowbuilder.Main.main(Main.java:241)
Exception in thread "main" java.lang.NullPointerException
	at java.util.ArrayList.addAll(ArrayList.java:581)
	at org.scify.jedai.datareader.entityreader.EntitySerializationReader.getEntityProfiles(EntitySerializationReader.java:48)
	at org.scify.jedai.workflowbuilder.Main.main(Main.java:241)

Change comparison counts type to int

We're using jedai-core in our application and we ran into some issues where the number of executed comparisons in ComparisonIterator was going over the number of total comparisons. We identified that this was happening because executedComparisons and totalComparisons are floats and changing them to ints fixed the problem. In Java, comparing two floats for exact equality is generally discouraged.
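The precision limit behind this can be shown in isolation. A float has a 24-bit significand, so not every integer above 2^24 = 16,777,216 is representable, and incrementing a float counter at that value leaves it unchanged. The class below is an illustrative sketch and is not taken from ComparisonIterator.

```java
final class FloatCounterDemo {

    // Increments a float counter `steps` times, starting from `start`.
    static float countWithFloat(float start, int steps) {
        float counter = start;
        for (int i = 0; i < steps; i++) {
            counter++;
        }
        return counter;
    }

    // Same loop with an integral counter, which stays exact.
    static long countWithLong(long start, int steps) {
        long counter = start;
        for (int i = 0; i < steps; i++) {
            counter++;
        }
        return counter;
    }

    public static void main(String[] args) {
        // The float counter is stuck at 2^24: adding 1 rounds back to the same value.
        System.out.println(countWithFloat(16_777_216f, 10) == 16_777_216f); // prints true
        System.out.println(countWithLong(16_777_216L, 10));                 // prints 16777226
    }
}
```

If the number of comparisons can exceed the int range, long is the safer integral type for such counters.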

Error on TestGtRDFReader

Hi, I have tried some tests with the JedAI tool.
This tool is useful for my job and I think it has great potential.
I've downloaded the attached files in nt format: source.nt, target.nt.
In the first step I successfully executed the TestRdfReader class from the test package for both datasets. After that I tried to execute the TestGtRDFReader class with the same datasets, but I got the following error:
Exception in thread "main" java.lang.IllegalArgumentException: loops not allowed
	at org.jgrapht.graph.AbstractBaseGraph.addEdge(AbstractBaseGraph.java:203)
	at org.scify.jedai.datareader.groundtruthreader.GtRDFReader.performReading(GtRDFReader.java:236)
	at org.scify.jedai.datareader.groundtruthreader.GtRDFReader.getDuplicatePairs(GtRDFReader.java:92)
	at org.scify.jedai.datareader.groundtruthreader.AbstractGtReader.getDuplicatePairs(AbstractGtReader.java:57)
	at org.scify.jedai.datareader.TestGtRDFReader.main(TestGtRDFReader.java:39)

datasets.zip

Thanks in advance!

Dependency org.apache.httpcomponents:httpclient-cache, leading to CVE problem

Hi, in /maven-plugins/sitegen-maven-plugin there is a dependency, org.apache.httpcomponents:httpclient-cache:jar:4.2.6, that calls a vulnerable method.

CVE-2020-13956

The version range affected by this CVE is [,4.5.13)

After further analysis, the main API called in this project is org.apache.http.client.utils.URIUtils: extractHost(java.net.URI)Lorg.apache.http.HttpHost

Risk method repair link : GitHub

CVE Bug Invocation Path--

Path Length : 7

org.scify.jedai.datawriter.BlocksPerformanceWriter: printDetailedResultsToSPARQL(java.util.List,java.util.List,java.lang.String,java.lang.String)V /home/hjf/.m2/repository/org/apache/jena/jena-cmds/3.1.0/jena-cmds-3.1.0.jar
org.apache.jena.sparql.modify.UpdateProcessRemoteForm: execute()V /home/hjf/.m2/repository/org/apache/jena/jena-cmds/3.1.0/jena-cmds-3.1.0.jar
org.apache.jena.riot.web.HttpOp: execHttpPostForm(java.lang.String,org.apache.jena.sparql.engine.http.Params,java.lang.String,org.apache.jena.riot.web.HttpResponseHandler,org.apache.http.client.HttpClient,org.apache.http.protocol.HttpContext,org.apache.jena.atlas.web.auth.HttpAuthenticator)V /home/hjf/.m2/repository/org/apache/jena/jena-cmds/3.1.0/jena-cmds-3.1.0.jar
org.apache.jena.riot.web.HttpOp: exec(java.lang.String,org.apache.http.client.methods.HttpUriRequest,java.lang.String,org.apache.jena.riot.web.HttpResponseHandler,org.apache.http.client.HttpClient,org.apache.http.protocol.HttpContext,org.apache.jena.atlas.web.auth.HttpAuthenticator)V /home/hjf/.m2/repository/org/apache/jena/jena-cmds/3.1.0/jena-cmds-3.1.0.jar
org.apache.http.impl.client.AbstractHttpClient: execute(org.apache.http.client.methods.HttpUriRequest,org.apache.http.protocol.HttpContext)Lorg.apache.http.HttpResponse; /home/hjf/.m2/repository/org/apache/jena/jena-cmds/3.1.0/jena-cmds-3.1.0.jar
org.apache.http.impl.client.AbstractHttpClient: determineTarget(org.apache.http.client.methods.HttpUriRequest)Lorg.apache.http.HttpHost; /home/hjf/.m2/repository/org/apache/jena/jena-cmds/3.1.0/jena-cmds-3.1.0.jar
org.apache.http.client.utils.URIUtils: extractHost(java.net.URI)Lorg.apache.http.HttpHost;

Dependency tree--

[INFO] org.scify:jedai-core:jar:3.2.1
[INFO] +- org.jgrapht:jgrapht-core:jar:1.4.0:compile
[INFO] |  \- org.jheaps:jheaps:jar:0.11:compile
[INFO] +- net.sf.trove4j:trove4j:jar:3.0.3:compile
[INFO] +- com.esotericsoftware:minlog:jar:1.3.1:compile
[INFO] +- info.debatty:java-lsh:jar:0.11:compile
[INFO] |  \- info.debatty:java-string-similarity:jar:0.12:compile
[INFO] +- org.apache.commons:commons-lang3:jar:3.4:compile
[INFO] +- org.apache.commons:commons-math3:jar:3.1.1:compile
[INFO] +- org.apache.jena:jena-arq:jar:3.1.0:compile
[INFO] |  +- org.apache.jena:jena-core:jar:3.1.0:compile
[INFO] |  |  +- org.apache.jena:jena-iri:jar:3.1.0:compile
[INFO] |  |  +- xerces:xercesImpl:jar:2.11.0:compile
[INFO] |  |  |  \- xml-apis:xml-apis:jar:1.4.01:compile
[INFO] |  |  +- commons-cli:commons-cli:jar:1.3:compile
[INFO] |  |  \- org.apache.jena:jena-base:jar:3.1.0:compile
[INFO] |  |     \- com.github.andrewoma.dexx:collection:jar:0.6:compile
[INFO] |  +- org.apache.jena:jena-shaded-guava:jar:3.1.0:compile
[INFO] |  +- org.apache.httpcomponents:httpclient:jar:4.2.6:compile
[INFO] |  |  +- org.apache.httpcomponents:httpcore:jar:4.2.5:compile
[INFO] |  |  \- commons-codec:commons-codec:jar:1.6:compile
[INFO] |  +- com.github.jsonld-java:jsonld-java:jar:0.7.0:compile
[INFO] |  |  +- com.fasterxml.jackson.core:jackson-core:jar:2.3.3:compile
[INFO] |  |  +- com.fasterxml.jackson.core:jackson-databind:jar:2.3.3:compile
[INFO] |  |  |  \- com.fasterxml.jackson.core:jackson-annotations:jar:2.3.0:compile
[INFO] |  |  \- commons-io:commons-io:jar:2.4:compile
[INFO] |  +- org.apache.httpcomponents:httpclient-cache:jar:4.2.6:compile
[INFO] |  +- org.apache.thrift:libthrift:jar:0.9.2:compile
[INFO] |  +- org.slf4j:jcl-over-slf4j:jar:1.7.20:compile
[INFO] |  +- org.apache.commons:commons-csv:jar:1.0:compile
[INFO] |  \- org.slf4j:slf4j-api:jar:1.7.20:compile
[INFO] +- org.apache.jena:jena-cmds:jar:3.1.0:compile
[INFO] |  +- org.apache.jena:apache-jena-libs:pom:3.1.0:compile
[INFO] |  |  \- org.apache.jena:jena-tdb:jar:3.1.0:compile
[INFO] |  +- org.slf4j:slf4j-log4j12:jar:1.7.20:compile
[INFO] |  \- log4j:log4j:jar:1.2.17:compile
[INFO] +- com.opencsv:opencsv:jar:3.7:compile
[INFO] +- org.jdom:jdom2:jar:2.0.6:compile
[INFO] +- org.scify:JInsect:jar:1.1:compile
[INFO] |  \- org.scify:OpenJGraph:jar:1.1:compile
[INFO] +- org.rdfhdt:hdt-java-core:jar:1.1:compile
[INFO] |  +- com.beust:jcommander:jar:1.32:compile
[INFO] |  +- org.rdfhdt:hdt-api:jar:1.1:compile
[INFO] |  \- org.apache.commons:commons-compress:jar:1.6:compile
[INFO] |     \- org.tukaani:xz:jar:1.4:compile
[INFO] +- com.google.guava:guava-testlib:jar:30.1.1-jre:test
[INFO] |  +- com.google.code.findbugs:jsr305:jar:3.0.2:test
[INFO] |  +- org.checkerframework:checker-qual:jar:3.8.0:test
[INFO] |  +- com.google.errorprone:error_prone_annotations:jar:2.5.1:test
[INFO] |  +- com.google.j2objc:j2objc-annotations:jar:1.3:test
[INFO] |  +- com.google.guava:guava:jar:30.1.1-jre:test
[INFO] |  |  +- com.google.guava:failureaccess:jar:1.0.1:test
[INFO] |  |  \- com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:test
[INFO] |  \- junit:junit:jar:4.13.2:test
[INFO] |     \- org.hamcrest:hamcrest-core:jar:1.3:test
[INFO] +- org.hamcrest:hamcrest:jar:2.2:test
[INFO] +- org.junit.jupiter:junit-jupiter-api:jar:5.7.2:test
[INFO] |  +- org.apiguardian:apiguardian-api:jar:1.1.0:test
[INFO] |  +- org.opentest4j:opentest4j:jar:1.2.0:test
[INFO] |  \- org.junit.platform:junit-platform-commons:jar:1.7.2:test
[INFO] \- org.junit.jupiter:junit-jupiter-engine:jar:5.7.2:test
[INFO]    \- org.junit.platform:junit-platform-engine:jar:1.7.2:test

Suggested solutions:

Update dependency version

Thank you very much.

Unable to build

I cloned the project to my local machine and followed the steps listed in the readme, but it fails to build with the error below:

git clone https://github.com/scify/JedAIToolkit.git
cd JedAIToolkit
git submodule update --init
mvn clean package
[INFO] --- maven-assembly-plugin:2.2-beta-5:single (default) @ jedai-ui ---
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] jedai .............................................. SUCCESS [  0.259 s]
[INFO] jedai-core ......................................... SUCCESS [ 59.511 s]
[INFO] jedai-ui ........................................... FAILURE [  6.408 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:06 min
[INFO] Finished at: 2018-12-11T15:42:46-05:00
[INFO] Final Memory: 42M/406M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.2-beta-5:single (default) on project jedai-ui: Error reading assemblies: Error locating assembly descriptor: assembly.xml
[ERROR]
[ERROR] [1] [INFO] Searching for file location: C:\Users\Yeikel\Documents\JedAIToolkit\jedai-ui\assembly.xml
[ERROR]
[ERROR] [2] [INFO] File: C:\Users\Yeikel\Documents\JedAIToolkit\jedai-ui\assembly.xml does not exist.
[ERROR]
[ERROR] [3] [INFO] Invalid artifact specification: 'assembly.xml'. Must contain at least three fields, separated by ':'.
[ERROR]
[ERROR] [4] [INFO] Failed to resolve classpath resource: assemblies/assembly.xml from classloader: ClassRealm[plugin>org.apache.maven.plugins:maven-assembly-plugin:2.2-beta-5, parent: sun.misc.Launcher$AppClassLoader@33909752]
[ERROR]
[ERROR] [5] [INFO] Failed to resolve classpath resource: assembly.xml from classloader: ClassRealm[plugin>org.apache.maven.plugins:maven-assembly-plugin:2.2-beta-5, parent: sun.misc.Launcher$AppClassLoader@33909752]
[ERROR]
[ERROR] [6] [INFO] File: C:\Users\Yeikel\Documents\JedAIToolkit\assembly.xml does not exist.
[ERROR]
[ERROR] [7] [INFO] Building URL from location: assembly.xml
[ERROR] Error:
[ERROR] java.net.MalformedURLException: no protocol: assembly.xml
[ERROR]         at java.net.URL.<init>(URL.java:593)
[ERROR]         at java.net.URL.<init>(URL.java:490)
[ERROR]         at java.net.URL.<init>(URL.java:439)
[ERROR]         at org.apache.maven.shared.io.location.URLLocatorStrategy.resolve(URLLocatorStrategy.java:54)
[ERROR]         at org.apache.maven.shared.io.location.Locator.resolve(Locator.java:81)
[ERROR]         at org.apache.maven.plugin.assembly.io.DefaultAssemblyReader.addAssemblyFromDescriptor(DefaultAssemblyReader.java:309)
[ERROR]         at org.apache.maven.plugin.assembly.io.DefaultAssemblyReader.readAssemblies(DefaultAssemblyReader.java:125)
[ERROR]         at org.apache.maven.plugin.assembly.mojos.AbstractAssemblyMojo.execute(AbstractAssemblyMojo.java:352)
[ERROR]         at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
[ERROR]         at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[ERROR]         at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
[ERROR]         at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
[ERROR]         at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
[ERROR]         at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
[ERROR]         at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
[ERROR]         at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
[ERROR]         at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
[ERROR]         at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
[ERROR]         at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
[ERROR]         at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
[ERROR]         at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
[ERROR]         at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
[ERROR]         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR]         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR]         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR]         at java.lang.reflect.Method.invoke(Method.java:498)
[ERROR]         at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[ERROR]         at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[ERROR]         at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[ERROR]         at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :jedai-ui

ArrayIndexOutOfBoundsException when blocking with schema clusters

I got the following error when I tried blocking with schema clusters:
java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
	at org.scify.jedai.blockbuilding.AbstractBlockBuilding.lambda$parseIndex$10(AbstractBlockBuilding.java:167)
	at java.base/java.util.HashMap.forEach(HashMap.java:1336)
	at org.scify.jedai.blockbuilding.AbstractBlockBuilding.parseIndex(AbstractBlockBuilding.java:164)
	at org.scify.jedai.blockbuilding.AbstractBlockBuilding.readBlocks(AbstractBlockBuilding.java:196)
	at org.scify.jedai.blockbuilding.AbstractBlockBuilding.getBlocks(AbstractBlockBuilding.java:96)
	at org.scify.jedai.gui.utilities.WorkflowManager.runBlockBuilding(WorkflowManager.java:824)
	at org.scify.jedai.gui.utilities.WorkflowManager.runBlockingBasedWorkflow(WorkflowManager.java:896)
	at org.scify.jedai.gui.utilities.WorkflowManager.executeFullBlockingBasedWorkflow(WorkflowManager.java:393)
	at org.scify.jedai.gui.utilities.WorkflowManager.executeFullWorkflow(WorkflowManager.java:695)
	at org.scify.jedai.gui.controllers.steps.CompletedController.lambda$runAlgorithmBtnHandler$6(CompletedController.java:316)
	at java.base/java.lang.Thread.run(Thread.java:834)

The parseIndex function contains a String split operation that does not work properly:
final String[] entropyString = key.split(CLUSTER_SUFFIX);
The delimiters used in key are equivalent to CLUSTER_PREFIX, not CLUSTER_SUFFIX, and they contain a dollar sign that has to be escaped, because String.split treats its argument as a regular expression. I worked around the issue by changing the above line to
final String[] entropyString = key.split("#\\$!cl");

I'd suggest changing the values of the prefix and suffix to something that is compatible with regex, since the escaped workaround above is harder to read.
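
As an alternative to changing the constants, the delimiter could be quoted so that it is matched as literal text. The following is only a sketch: the value of CLUSTER_PREFIX is inferred from the workaround regex above rather than taken from the JedAI source, and the class name, helper method, and sample key are hypothetical.

```java
import java.util.regex.Pattern;

class ClusterKeySplit {
    // Assumed value, inferred from the workaround regex "#\\$!cl"; verify against the real constant.
    static final String CLUSTER_PREFIX = "#$!cl";

    // Splits on the delimiter as literal text rather than as a regular expression.
    static String[] splitLiteral(String key, String delimiter) {
        return key.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // Hypothetical key layout: token, delimiter, cluster id.
        String key = "name#$!cl3";

        // Unquoted, '$' acts as an end-of-input anchor, so the pattern never matches
        // and split returns the whole key as a single element.
        System.out.println(key.split(CLUSTER_PREFIX).length);

        // Quoted, the delimiter is found and the key splits into two parts.
        String[] parts = splitLiteral(key, CLUSTER_PREFIX);
        System.out.println(parts[0] + " / " + parts[1]);
    }
}
```

An unquoted split that returns a single-element array is consistent with the "Index 1 out of bounds for length 1" error in the stack trace. Pattern.quote keeps the constants readable, at the cost of one extra call at each split site.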

No URLs to Download

I am trying to download the pre-compiled version from the http://jedai.scify.org website.

When I click "Download desktop app" for both "Desktop application for Entity Resolution" and "Workbench tool," I get a "Page Not Found" error on GitHub.

Additionally, I created an issue for this, because the webpage doesn't have any contact information. :/

I tried compiling it on my machine, but the build slowed to a crawl and took over an hour, so I decided to download the precompiled JARs instead. That's why I need the download links.

Make jedai-core Extensible

Users of jedai-core are unable to extend the library to utilize a custom similarity metric or entity matching method due to the enums defined in the project (e.g. SimilarityMetric, EntityMatchingMethod, BlockCleaningMethod, etc.). Instead, if these features utilized an extension mechanism (for example, java.util.ServiceLoader or something equivalent), custom features would be possible.
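
To illustrate the idea, here is a minimal sketch of how a java.util.ServiceLoader lookup might look. Everything named here (SimilarityMetricProvider, ExactMatchMetric, MetricRegistry) is hypothetical and not part of jedai-core; it only shows the general shape of the mechanism.

```java
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical extension point; jedai-core does not define this interface.
interface SimilarityMetricProvider {
    String name();
    double similarity(String a, String b);
}

// Example of a custom metric that a user could ship in a separate JAR.
// A real classpath provider must be a public class with a public no-arg constructor.
class ExactMatchMetric implements SimilarityMetricProvider {
    public String name() { return "EXACT_MATCH"; }
    public double similarity(String a, String b) { return a.equals(b) ? 1.0 : 0.0; }
}

class MetricRegistry {
    // Looks up a metric by name among all providers that ServiceLoader can discover.
    static Optional<SimilarityMetricProvider> find(String name) {
        for (SimilarityMetricProvider provider : ServiceLoader.load(SimilarityMetricProvider.class)) {
            if (provider.name().equals(name)) {
                return Optional.of(provider);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // No provider is registered in this sketch, so fall back to a local instance.
        SimilarityMetricProvider metric = find("EXACT_MATCH").orElseGet(ExactMatchMetric::new);
        System.out.println(metric.name() + " " + metric.similarity("a", "a"));
    }
}
```

For discovery to work on the classpath, the provider JAR would need a META-INF/services file named after the fully qualified interface name and listing the implementation class. The existing enums could remain as built-in defaults, with discovered providers consulted alongside them.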
