yomguithereal / clj-fuzzy

A handy collection of algorithms dealing with fuzzy strings and phonetics.

Home Page: http://yomguithereal.github.io/clj-fuzzy/

License: MIT License

JavaScript 0.09% Clojure 99.88% Shell 0.03%

clj-fuzzy's Issues

issue using this project as a dependency in clojurescript

Hello! Thanks for writing this cool library :).

I am using ClojureScript version "0.0-3165" and Clojure version "1.7.0-beta1", and I am unable to depend on clj-fuzzy from ClojureScript. In case it makes a difference, I am using the Boot build tool.

Levenshtein Distance Error On Empty Sequence

If the first sequence passed to the Levenshtein distance function is empty, an exception is thrown:

(fuzzy/levenshtein "" "abc")
ClassCastException clojure.lang.LazySeq cannot be cast to clojure.lang.IPersistentStack  clojure.lang.RT.peek (RT.java:710)

Whereas:

(fuzzy/levenshtein "abc" "")
3

An empty sequence in the first position needs to be handled.
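
One way to handle this is to short-circuit before the main algorithm runs, since the distance between an empty sequence and any other sequence is simply the other sequence's length. The following is a minimal sketch of that idea, not the library's actual fix; `safe-distance` and its `dist-fn` parameter are hypothetical names introduced here for illustration.

```clojure
(defn safe-distance
  "Hypothetical wrapper: calls the two-argument distance function `dist-fn`
  only when both sequences are non-empty. If either one is empty, the edit
  distance is the length of the other, so `dist-fn` is never invoked."
  [dist-fn a b]
  (cond
    (empty? a) (count b)   ; "" vs "abc" needs 3 insertions
    (empty? b) (count a)   ; "abc" vs "" needs 3 deletions
    :else      (dist-fn a b)))
```

With this guard, an empty first argument yields 3 for "abc" without ever reaching the code path that throws, matching the result already returned when the empty sequence is passed second.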

Levenshtein distance performance

The levenshtein/distance function performs very poorly even on short strings. Is this a known issue? The call below takes about 10 seconds on a MacBook Pro (3 GHz Intel Core i7) running Java 8.

user=> (time (clj-fuzzy.levenshtein/distance "feature" "get-project-features"))
"Elapsed time: 10438.251547 msecs"
13
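
For comparison, here is a minimal sketch of the usual row-by-row dynamic-programming formulation, which takes time proportional to the product of the two lengths and keeps only one row of the table in memory. It is not the library's implementation; the names `next-row` and `levenshtein` are introduced here for illustration only.

```clojure
(defn next-row
  "Builds the next row of the edit-distance table from `prev-row` (a vector)
  for one element `ch` of the first sequence, compared against all of `b`."
  [prev-row ch b]
  (reduce
    (fn [row [j bj]]
      (conj row
            (min (inc (peek row))                          ; insert bj
                 (inc (nth prev-row (inc j)))              ; delete ch
                 (+ (nth prev-row j) (if (= ch bj) 0 1))))) ; substitute or match
    [(inc (first prev-row))]
    (map-indexed vector b)))

(defn levenshtein
  "Edit distance between sequences `a` and `b`, one row at a time.
  Rows are always vectors, so `peek` is safe even when `a` or `b` is empty."
  [a b]
  (peek (reduce (fn [row ch] (next-row row ch b))
                (vec (range (inc (count b))))
                a)))

(levenshtein "feature" "get-project-features") ; => 13
```

Because the starting row is built with `vec` and every later row is a vector, the same sketch also returns 3 for an empty first argument against "abc" rather than throwing.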

Documentation website outdated

Hi,

I kept running into the issue of the dependency not being found in ClojureScript until I found out I was using an outdated version of the lib, 0.1.8, after following the install steps documented here: http://yomguithereal.github.io/clj-fuzzy/clojure.html.

Looking more closely, the website indicates "Currently v0.3.2" in the sidebar and 0.1.8 on the Clojure install page, which are both wrong according to Clojars.

Is there anything I can do to help?

Thanks!

Big-O Performance

Great to see a library like this. I would love to see the Big-O performance of each fuzzy algorithm displayed, so I know what size of data I can use it for, and maybe some advice about pros and cons.

I'm doing some fuzzy matching for accounting purposes ("McDonalds": "is this a business expense? probably not") and I wouldn't know which algo to pick to save the most time.

dice algorithm NaN

I am just trying out this library, and it seems the Dice algorithm has some minor bugs (or I am not understanding it quite right):
[Screenshot dated 2014-11-13 09:16:01, not preserved]

These are the results I am getting with strings of length 0 and length 1. Could this have anything to do with the input being characters rather than actual strings? Is that as intended?
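
A likely source of NaN is that strings of length 0 or 1 contain no bigrams, so the coefficient 2|X ∩ Y| / (|X| + |Y|) becomes 0/0. The sketch below guards that case. It is an illustration under stated assumptions, not clj-fuzzy's code: the names `bigrams` and `dice` are introduced here, bigrams are treated as sets rather than multisets, inputs are coerced with `str` so a character behaves like a one-character string, and the convention "identical inputs score 1.0, otherwise no bigrams scores 0.0" is a choice made for this example.

```clojure
(require '[clojure.set :as set])

(defn bigrams
  "Returns the set of adjacent character pairs in string `s`.
  Strings shorter than two characters yield an empty set."
  [s]
  (set (map vector s (rest s))))

(defn dice
  "Sorensen-Dice coefficient on bigram sets, guarded against 0/0."
  [a b]
  (let [a     (str a)                      ; assumption: characters become strings
        b     (str b)
        x     (bigrams a)
        y     (bigrams b)
        total (+ (count x) (count y))]
    (cond
      (= a b)       1.0                    ; identical inputs, including "" and ""
      (zero? total) 0.0                    ; no bigrams on either side: avoid 0/0
      :else         (/ (* 2.0 (count (set/intersection x y))) total))))

(dice "night" "nacht") ; => 0.25
(dice "a" "b")         ; => 0.0
```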

Clojurescript should be a dev dependency

Onyx (https://github.com/onyx-platform/onyx) uses clj-fuzzy as a dependency. However, we have to exclude ClojureScript, as it is an unnecessary dependency for Clojure users and can cause conflicts in our users' projects. I think you would be best served by making it a dev dependency, as any of your cljs users will need ClojureScript as a dependency anyway.

Thanks for a great project!
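
As a rough sketch of what is being asked for, and assuming the library is built with Leiningen (the issue does not say), the ClojureScript dependency could move into the `:dev` profile so it is not passed on to consumers. Every "X.Y.Z" below is a placeholder, not a real version number.

```clojure
;; project.clj of the library (sketch; all "X.Y.Z" strings are placeholders)
(defproject clj-fuzzy "X.Y.Z"
  :dependencies [[org.clojure/clojure "X.Y.Z"]]
  ;; ClojureScript is only pulled in when working on the library itself.
  :profiles {:dev {:dependencies [[org.clojure/clojurescript "X.Y.Z"]]}})

;; Until then, a consumer's dependency vector can exclude it explicitly:
;; [clj-fuzzy "X.Y.Z" :exclusions [org.clojure/clojurescript]]
```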

Spanish support?

There is no mention whatsoever of language support.
The Schinke stemmer is supposed to be for Latin, but it doesn't work as expected.
Thanks.
