rxp90 / jsymspell

Java 8+ zero-dependency port of SymSpell: 1 million times faster through Symmetric Delete spelling correction algorithm

Home Page: https://medium.com/@wolfgarbe/1000x-faster-spelling-correction-algorithm-2012-8701fcd87a5f

License: MIT License

Java 100.00%
java spellcheck spelling spelling-correction symspell

jsymspell's Issues

Race condition prevents library initialization ~1/100 times.

We've started using this at $WORK, and we've seen that sometimes initializing the library fails.

new SymSpellBuilder().setUnigramLexicon(unigrams)
                     .setBigramLexicon(bigrams)
                     .setMaxDictionaryEditDistance(2)
                     .createSymSpell();

Throws this exception:

        java.lang.ArrayIndexOutOfBoundsException
                at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
                at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
                at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
                at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
                at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
                at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
                at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
                at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
                at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
                at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
                at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
                at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:650)
                at io.gitlab.rxp90.jsymspell.SymSpellImpl.<init>(SymSpellImpl.java:43)
                at io.gitlab.rxp90.jsymspell.SymSpellBuilder.createSymSpell(SymSpellBuilder.java:64)
                ... 
        Caused by: java.lang.ArrayIndexOutOfBoundsException
                at java.lang.System.arraycopy(Native Method)
                at java.util.ArrayList.addAll(ArrayList.java:586)
                at io.gitlab.rxp90.jsymspell.SymSpellImpl.lambda$null$1(SymSpellImpl.java:45)
                at java.util.HashMap.forEach(HashMap.java:1289)
                at io.gitlab.rxp90.jsymspell.SymSpellImpl.lambda$new$2(SymSpellImpl.java:45)
                at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
                at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1556)
                at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
                at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
                at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
                at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
                at java.util.concurrent.ForkJoinPool$WorkQueue.execLocalTasks(ForkJoinPool.java:1040)
                at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1058)
                at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
                at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)

We've wrapped the initialization code in a retry loop as a stopgap, but the problem appears to be here (decompiled from SymSpellImpl):

this.unigramLexicon.keySet().parallelStream().forEach((word) -> {
    Map<String, Collection<String>> edits = this.generateEdits(word);
    edits.forEach((string, suggestions) -> {
        // addAll runs on a plain ArrayList that several threads may share
        ((Collection)this.deletes.computeIfAbsent(string, (ignored) -> {
            return new ArrayList();
        })).addAll(suggestions);
    });
});

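this.deletes is a ConcurrentHashMap, so the computeIfAbsent call itself is atomic, but the ArrayList it returns is not thread-safe, and addAll runs after computeIfAbsent has returned. Two threads that generate the same delete string can therefore call addAll on the same list at the same time, which matches the ArrayIndexOutOfBoundsException thrown from System.arraycopy in the trace above. I believe replacing computeIfAbsent with a compute call that also adds the values to the ArrayList inside the remapping function will fix it, but I'm not sure.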
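
A minimal sketch of the compute-based change suggested above, not the library's actual code. DeletesIndexSketch, buildDeletes, and the simplified generateEdits (single-character deletions only) are hypothetical stand-ins for the real SymSpellImpl internals. The only JDK behavior relied on is that ConcurrentHashMap.compute runs its remapping function atomically for a given key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the index-building loop in SymSpellImpl.
class DeletesIndexSketch {

    // Hypothetical, simplified edit generator: maps every single-character
    // deletion of the word back to the word itself.
    static Map<String, Collection<String>> generateEdits(String word) {
        Map<String, Collection<String>> edits = new HashMap<>();
        for (int i = 0; i < word.length(); i++) {
            String deleted = word.substring(0, i) + word.substring(i + 1);
            edits.computeIfAbsent(deleted, ignored -> new ArrayList<>()).add(word);
        }
        return edits;
    }

    static Map<String, Collection<String>> buildDeletes(Collection<String> words) {
        Map<String, Collection<String>> deletes = new ConcurrentHashMap<>();
        words.parallelStream().forEach(word ->
            generateEdits(word).forEach((edit, suggestions) ->
                // compute() is atomic per key, so creating the list and
                // appending to it happen without interference from other threads.
                deletes.compute(edit, (key, existing) -> {
                    Collection<String> list = existing;
                    if (list == null) {
                        list = new ArrayList<>();
                    }
                    list.addAll(suggestions);
                    return list;
                })));
        return deletes;
    }

    public static void main(String[] args) {
        Map<String, Collection<String>> deletes =
            buildDeletes(Arrays.asList("cat", "cot", "cut"));
        // All three words share the delete "ct"; the order of the list is not fixed.
        System.out.println(deletes.get("ct"));
    }
}
```

The lists stay plain ArrayLists because they are only mutated inside compute. Once the parallel stream has completed, the calling thread can read them safely.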

Error importing from Maven

To reproduce:

  1. Add implementation 'io.gitlab.rxp90:jsymspell:1.0' to the Gradle build file and run a build.

Result:

error: cannot access Bigram

import io.gitlab.rxp90.jsymspell.api.Bigram;
^
bad class file: /Users/home/.gradle/caches/modules-2/files-2.1/io.gitlab.rxp90/jsymspell/1.0/8367b65ce9301a734bb6368a7d7149299ccb964d/jsymspell-1.0.jar(io/gitlab/rxp90/jsymspell/api/Bigram.class)
class file has wrong version 55.0, should be 52.0
Please remove or make sure it appears in the correct subdirectory of the classpath.
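
For reference, class file major version 55 corresponds to Java 11 and 52 to Java 8, so the message suggests the published 1.0 jar was compiled for Java 11 while the compiler in use accepts at most Java 8 class files. Below is a small sketch for checking a class file's version directly. ClassFileVersionSketch and its method names are hypothetical. It reads only the standard class file header (4-byte magic number, 2-byte minor version, 2-byte major version).

```java
import java.nio.ByteBuffer;

// Hypothetical helper: reads the version fields from a class file header.
class ClassFileVersionSketch {

    // Returns the major version stored in bytes 6-7 of a class file.
    static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("too short to be a class file");
        }
        ByteBuffer header = ByteBuffer.wrap(classBytes); // big-endian by default
        if (header.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("missing 0xCAFEBABE magic number");
        }
        header.getShort();                 // minor version, unused here
        return header.getShort() & 0xFFFF; // major version as an unsigned value
    }

    // Maps a major version to a Java release number; meaningful for
    // major version 49 (Java 5) and later.
    static int javaRelease(int majorVersion) {
        return majorVersion - 44;
    }

    public static void main(String[] args) {
        // Header bytes only, with the major version from the error message.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 55};
        int major = majorVersion(header);
        System.out.println("major " + major + " = Java " + javaRelease(major));
    }
}
```

To check the real artifact, the bytes of io/gitlab/rxp90/jsymspell/api/Bigram.class extracted from the jar could be passed to majorVersion in place of the hand-built header.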
