aceakash / string-similarity
Finds degree of similarity between two strings, based on Dice's Coefficient, which is mostly better than Levenshtein distance.
License: MIT License
I just tried the example:
stringSimilarity.compareTwoStrings("healed", "sealed");
//0.8
=> 80% for a 1 letter change.
stringSimilarity.compareTwoStrings("healed", "ehaled");
//0.6
=> 60% for 2 letter switching
OK, I get it, but then I tried another word that is one letter shorter (5 characters instead of 6):
stringSimilarity.compareTwoStrings("fuira", "fuia");
//0.57
=> 57% for a 1 letter change (just lost 23%)
stringSimilarity.compareTwoStrings("furia", "fuira");
//0.25
=> 25% for a 1 letter change (just lost 35%)
It seems to me that the shorter the string, the more severely a change is penalized.
Is there a way to make the score "average out" regardless of the length?
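For what it's worth, the numbers above are what a plain bigram-based Dice score gives. The sketch below is not the library's source: bigrams and diceSketch are hypothetical names, and the handling of strings shorter than two characters is an assumption of mine. A string of length n has n - 1 bigrams and one changed character can touch up to two of them, so a short string loses a larger share of its bigrams per edit.

```javascript
// Hypothetical helper: list every pair of adjacent characters.
function bigrams(text) {
  const pairs = [];
  for (let i = 0; i < text.length - 1; i++) {
    pairs.push(text.slice(i, i + 2));
  }
  return pairs;
}

// Sketch of a bigram Dice score that counts duplicate bigrams.
function diceSketch(first, second) {
  const a = bigrams(first);
  const b = bigrams(second);
  // Assumption: with no bigrams on either side, fall back to equality.
  if (a.length + b.length === 0) return first === second ? 1 : 0;

  // Count the bigrams of the first string.
  const counts = new Map();
  for (const pair of a) counts.set(pair, (counts.get(pair) || 0) + 1);

  // Each bigram of the second string can match one remaining copy.
  let shared = 0;
  for (const pair of b) {
    const left = counts.get(pair) || 0;
    if (left > 0) {
      counts.set(pair, left - 1);
      shared++;
    }
  }
  return (2 * shared) / (a.length + b.length);
}

console.log(diceSketch('healed', 'sealed')); // 0.8  (4 shared of 5 + 5)
console.log(diceSketch('furia', 'fuira'));   // 0.25 (1 shared of 4 + 4)
```

If a length-independent score is needed, one option would be to combine this with an edit-distance measure, but that is a design choice outside what the sketch shows.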
Finds degree of similarity between two strings, based on Dice's Coefficient, which is mostly better than Levenshtein distance.
What's the rationale behind this claim?
stringSimilarity.findBestMatch('wall e', ['wall·e', 'wall']);
stringSimilarity.findBestMatch('wall-e', ['wall·e', 'wall']);
stringSimilarity.findBestMatch('wall_e', ['wall-e', 'wall']);
These all return 1, as though "wall" is the best match. They should all return 0, since they differ by only 1 character and are more symbolically similar.
Hello, I'm getting this when building the production package:
....... from UglifyJs
Unexpected token: punc (,) ....
Perhaps there is a way to add build configuration to the package to fix this?
I've gone around by copying the code in my utilities library.
Thanks!
IE doesn't understand ES6 (const, let and arrow functions are the ES6 things that I see in the package sources) and to provide IE compatibility (curse him) we need to have our vendor bundles in ES5. And it's not easy to transpile a specific library during bundling...
The common way is to have ./dist/compare-strings.js in the npm package repo and an npm build script for ES6 -> ES5 transpilation process. If it's ok, I can provide a PR covering this situation. What do you think?
Hello, I found an issue when comparing a word against the same word plus a blank and another letter.
Eg:
"Iphone" compared with "Iphone X" gives me a match of 1, but the texts are not equal. It should be close to 1 but not 1.
I'm using version 2.0.0
findBestMatch('Iphone', ['Iphone 8', 'Iphone 10', 'Iphone X', 'Iphone XS'])
findBestMatch is really cool, but it could use one extra layer of convenience....
You see...I have an array of objects, and one of the attributes is the string that I am comparing.
I need to find the object from the array with the best matching string.
As it is, I have to extract the strings from all objects, find the best matching string, and then go back and find the object whose string is the best matching string.
It would be easier for me to pass-in the list of objects, along with the name of the attribute, and get back a reference to the best object. I suspect that this pattern of usage might be very common.
Again, this is not a bug, just a suggestion to make the API easier to use.
Thanks for this software -- I am going to use it to solve a tricky problem in converting some very old insurance data.
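Until something like this exists in the API, a small wrapper can do the bookkeeping. This is only a sketch: findBestObject is a hypothetical helper, not part of the package, and it takes the comparison function as a parameter so that it does not assume anything about the library's internals.

```javascript
// Hypothetical helper: return the object whose `key` property best
// matches `mainString`, according to the supplied `compare` function.
function findBestObject(mainString, items, key, compare) {
  let best = null;
  let bestRating = -Infinity;
  for (const item of items) {
    const rating = compare(mainString, String(item[key]));
    // Strictly greater, so the first of several equal ratings wins.
    if (rating > bestRating) {
      best = item;
      bestRating = rating;
    }
  }
  // For an empty list this returns { item: null, rating: -Infinity }.
  return { item: best, rating: bestRating };
}

// Possible usage, assuming the package is installed:
// const stringSimilarity = require('string-similarity');
// findBestObject('bar', records, 'name', stringSimilarity.compareTwoStrings);
```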
Code to replicate
var stringSimilarity = require('string-similarity');
console.log(stringSimilarity.compareTwoStrings('John Doe', 'john doe'));
Actual result
0.5
Expected result
1
Hi,
I'm getting a weird result when comparing these two strings.
It always returns a rating of 0 despite them having 2 letters in common.
stringSimilarity.compareTwoStrings('NOS', 'NPS')
//0
stringSimilarity.findBestMatch('NOS', ['NPS'])
//{ ratings: [ { target: 'NPS', rating: 0 } ], bestMatch: { target: 'NPS', rating: 0 } }
https://runkit.com/588655d7fb7a220014a01b47/5886577d0629220014e341d7
Thanks.
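A possible explanation, assuming the score is built from pairs of adjacent characters (bigrams) as in Dice's coefficient: "NOS" yields NO and OS, "NPS" yields NP and PS, and no pair appears in both. The sketch below lists the shared bigrams; sharedBigrams is a hypothetical helper, not a function of the package.

```javascript
// Hypothetical helper: list the bigrams two strings have in common,
// matching each bigram of `second` at most once.
function sharedBigrams(first, second) {
  const pairsOf = (text) => {
    const pairs = [];
    for (let i = 0; i < text.length - 1; i++) {
      pairs.push(text.slice(i, i + 2));
    }
    return pairs;
  };

  const remaining = pairsOf(second);
  const shared = [];
  for (const pair of pairsOf(first)) {
    const index = remaining.indexOf(pair);
    if (index !== -1) {
      shared.push(pair);
      remaining.splice(index, 1); // consume the matched bigram
    }
  }
  return shared;
}

console.log(sharedBigrams('NOS', 'NPS'));       // []
console.log(sharedBigrams('healed', 'sealed')); // [ 'ea', 'al', 'le', 'ed' ]
```

With three-letter strings, changing the middle letter replaces both bigrams, which is why the rating drops straight to 0.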
Hello everyone, I hope everyone is doing well. I found a case in which there is a difference (search for <_15>): the first string has <_15>FLL while the second one has <_15>ORD, yet the function returns 1 as if it were a perfect match. The version used for this comparison was 4.0.1. Below is an example ready to be run in node.js (system version 14.4.0):
const similarity = require("string-similarity");
const body1 = '<REQ><_0>MSG</_0><_1/><_2>55</_2><_3>ORG</_3><_4>F1</_4><_5>MIA</_5><_6>07560685</_6><_7>AC30</_7><_8>HFD</_8><_9>F1</_9><_10>T</_10><_11>US</_11><_12>USD</_12><_13>ZE</_13><_14>ODI</_14><_15>FLL</_15><_16>ORD</_16><_17>UNT</_17><_18>5</_18><_19>1</_19><_20>UNZ</_20><_21>1</_21><_22>000000</_22><_23/></REQ>';
const body2 = '<REQ><_0>MSG</_0><_1/><_2>55</_2><_3>ORG</_3><_4>F1</_4><_5>MIA</_5><_6>07560685</_6><_7>AC30</_7><_8>HFD</_8><_9>F1</_9><_10>T</_10><_11>US</_11><_12>USD</_12><_13>ZE</_13><_14>ODI</_14><_15>ORD</_15><_16>FLL</_16><_17>UNT</_17><_18>5</_18><_19>1</_19><_20>UNZ</_20><_21>1</_21><_22>000000</_22><_23/></REQ>';
console.log(similarity.compareTwoStrings(body1, body2));
Thanks!
Hey there!
I've noticed that string-similarity is having issues with game titles. Here's a little example:
var matches = stringSimilarity.findBestMatch('Portal 2', ['Portal', 'Portal 2']);
This example returns 'Portal' as bestMatch. However if I change the order of the targetStrings array, like this:
var matches = stringSimilarity.findBestMatch('Portal 2', ['Portal 2', 'Portal']);
Then bestMatch is Portal 2. While this sounds like the solution, searching Portal would lead to bestMatch = Portal 2.
While testing, I also found that compareTwoStrings('Portal', 'Portal 2') returns 1, even though those 2 strings are obviously not exactly the same.
Is there any way to make the comparison more strict?
I am using compareTwoStrings in my own projects and have found a lot of use in having an optional parameter where you can give regex/string to replace by. In my recent implementation I needed to exclude special characters. Having the parameter allowed me to just add that bit of regex in my call like so: compareTwoStrings('hello', 'hey', /[^\w\s]/gi). I also added option to remove case sensitivity. I would love to contribute these features.
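A rough sketch of what such an option could look like as a wrapper, without changing the library itself. compareNormalized and its strip and ignoreCase options are hypothetical names chosen for illustration; the comparison function is passed in rather than assumed.

```javascript
// Hypothetical wrapper: clean both inputs, then delegate to `compare`.
//   strip      - optional RegExp (or string) whose matches are removed
//   ignoreCase - when true, both strings are lower-cased first
function compareNormalized(first, second, compare, options = {}) {
  const { strip = null, ignoreCase = false } = options;
  const clean = (text) => {
    const stripped = strip ? text.replace(strip, '') : text;
    return ignoreCase ? stripped.toLowerCase() : stripped;
  };
  return compare(clean(first), clean(second));
}

// Possible usage, assuming the package is installed:
// const { compareTwoStrings } = require('string-similarity');
// compareNormalized('John Doe!', 'john doe', compareTwoStrings,
//   { strip: /[^\w\s]/gi, ignoreCase: true });
```

Keeping the normalization outside compareTwoStrings leaves the default behavior unchanged for existing callers.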
I notice that if we pass ababacac and abacabac to compareTwoStrings, it returns 1, which is wrong.
var stringSimilarity = require('string-similarity');
console.log(stringSimilarity.compareTwoStrings('ababacac', 'abacabac'));
Expected
Should not be 1
Output
1
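This result is consistent with a score that only looks at pairs of adjacent characters: both strings contain exactly the same bigrams the same number of times, just in a different order. The sketch below counts them; bigramCounts is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper: count how often each bigram occurs in a string.
function bigramCounts(text) {
  const counts = {};
  for (let i = 0; i < text.length - 1; i++) {
    const pair = text.slice(i, i + 2);
    counts[pair] = (counts[pair] || 0) + 1;
  }
  return counts;
}

console.log(bigramCounts('ababacac')); // { ab: 2, ba: 2, ac: 2, ca: 1 }
console.log(bigramCounts('abacabac')); // { ab: 2, ba: 2, ac: 2, ca: 1 }
```

Any measure based purely on bigram counts cannot tell these two strings apart; an order-sensitive measure such as an edit distance would.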
minor change, var instead of let and const.
PR:
#70
How is similarity calculated?
The package.json reports the license to be ISC, while the LICENSE file reports it to be MIT.
It's quite important that this is fixed as license reporting tools will rightfully report this as problematic.
For some reason, when both strings are at most 1 character long, compareTwoStrings returns Number.NaN instead of the expected 1 or 0.
Examples:
compareTwoStrings("", "") // NaN
compareTwoStrings("a", "a") // NaN
compareTwoStrings("a", "") // NaN
compareTwoStrings("aa", "aa") // 1
compareTwoStrings("aa", "") // 0
This is a problem because every comparison involving Number.NaN evaluates to false,
e.g. the following always returns false, even though true is expected:
compareTwoStrings("", "") > 0.9
I have temporarily got around this in my own project, by simply using:
if (a.length <= 1 && b.length <= 1) return a === b ? 1 : 0;
return compareTwoStrings(a, b);
Cheers,
Josh
I have the following two strings:
grid styling xs 1/12
grid styling xs 2/12
And my search input is
xs 2
Both strings get the exact same score, which doesn't seem right, because xs 2 has a longer "direct match", in word order, with the second string than with the first one.
The first string only gets the same score because there is also a "2" in the string. But it's in a location that shouldn't influence the score as much as the "2" in the "right spot".
I've tried to compare the strings as follows:
stringSimilarity.findBestMatch('bnp', ['BNP Paribas', absolutelyunrelated])
Both ratings are 0
stringSimilarity.findBestMatch('BNP', ['BNP Paribas', absolutelyunrelated])
BNP Paribas rating is 0.36363636363636365
I wouldn't expect 0 with bnp; I'd expect something around 0.2+, I guess. The difference is huge based on the string case.
Hello!
If I try to compare two strings like
compareTwoStrings("set", "st");
the result is always 0. Strings with length lower than 3 cannot be compared.
Is this normal?
Greetings!
Use case:
Instead of wanting to compare ["foo","bar","baz"], it can be useful to pass in an array of objects for which you want to compare one property, i.e.
[
{ name: "foo", otherProperty: 23 },
{ name: "bar", otherProperty: 27 },
{ name: "baz", otherProperty: 99 }
]
and instruct the function to compare based on the name property, but return the whole object in the response.
I have already created PR #124 for this. It just needs approval.
I need to compare two strings completely, that means it should also detect spaces and caps.
I have read the release notes, and I don't understand why you decided to disregard spaces from version 3.0.0, so after running npm install --save [email protected], it detects spaces, but it is not case sensitive.
Latest version:
stringSimilarity.compareTwoStrings("Te st", "Test"); //1.00
2.0 version
stringSimilarity.compareTwoStrings("TEST", "test"); //1.00
Please help, I need to get this done as soon as possible.
Thank you!
Hello, how can I pass a key:value array as targetStrings to
findBestMatch(mainString, targetStrings) {
}
The match target is the key in the key:value array.
Thanks
It will be good if you memoize the result, so if you run the function with the same arguments, it will give the result right away instead of making the calculation all over again
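In the meantime, memoization can be layered on top from the caller's side. A minimal sketch, assuming the comparison is a pure function of its two string arguments; memoizeCompare is a hypothetical helper and the cache here grows without bound.

```javascript
// Hypothetical helper: wrap a two-argument comparison so that repeated
// calls with the same pair of strings reuse the stored result.
function memoizeCompare(compare) {
  const cache = new Map();
  return function memoized(first, second) {
    // JSON.stringify keeps ('ab', 'c') and ('a', 'bc') as distinct keys.
    const key = JSON.stringify([first, second]);
    if (!cache.has(key)) {
      cache.set(key, compare(first, second));
    }
    return cache.get(key);
  };
}

// Possible usage, assuming the package is installed:
// const { compareTwoStrings } = require('string-similarity');
// const fastCompare = memoizeCompare(compareTwoStrings);
```

For long-running processes it may be worth bounding the cache size, which this sketch does not do.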
If we test on https://planetcalc.com with:
source : Olive-green table for sale, in extremely good condition.
target : For sale: table in very good condition, olive green in colour.
the number of moves is 47
source : Olive-green table for sale, in extremely good condition.
target : Wanted: mountain bike with at least 21 gears..
the number of moves is 47
The two look the same, lol :D, but that doesn't make sense. Sørensen–Dice is very accurate.
It'd be great if this library were also usable in the browser. It currently uses require, which makes it impossible to use on the client side. 😞
I have tested string-similarity with Chinese characters, but it does not seem to work; it returns 0:
var similarity = stringSimilarity.compareTwoStrings('布莱顿', '布赖顿');
Please advise. Thanks.
Warning: [some/module.ts] depends on 'string-similarity'. CommonJS or AMD dependencies can cause optimization bailouts.
Is it possible to have an es6 version? It would be nice to avoid this kind of thing: https://web.dev/commonjs-larger-bundles/
I'm trying to use the package to implement a fuzzy search in a React component.
I don't want to use UMD.
I'm trying to import the module like so:
import stringSimilarity from 'stringSimilarity'
Node throws: Cannot find module 'stringSimilarity'.
Is it possible with this package?
I need string similarity in Angular. How can I use string-similarity with Angular 4? Thanks.
I am pretty sure I did it correctly: the first argument is a string and the second is an array. The code is:
const pokemons = require('../../arrays/pokemons.js')
const poss = stringSimilarity.findBestMatch(args, pokemons)
The arrays file:
exports.pokemons = [
pokemon lists
]
Your algorithm is not the Dice coefficient. It counts all bigram duplicates, whereas the Dice coefficient only counts distinct bigrams (as defined in Wikipedia).
As an example, let's compare two versions of the main file of this repo (https://github.com/aceakash/string-similarity/blob/2718c82bbbf5190ebb8e9c54d4cbae6d1259527a/compare-strings.js and the latest https://github.com/aceakash/string-similarity/blob/eaeec5d74c98a6f6fcb1b06fad44ad7f3d8c2965/src/index.js). They have a Dice coefficient of 0.90, but this lib, string-similarity, outputs 0.74 when comparing these two files.
Please have a look at the implementations in Talisman, NLTK or in many languages in https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Dice%27s_coefficient
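To make the distinction concrete, here is a sketch of the set-based variant described above, where each distinct bigram counts once. diceOnSets and distinctBigrams are hypothetical names, this is not taken from any of the linked implementations, and the fallback for strings shorter than two characters is an assumption.

```javascript
// Hypothetical helper: the set of distinct bigrams in a string.
function distinctBigrams(text) {
  const pairs = new Set();
  for (let i = 0; i < text.length - 1; i++) {
    pairs.add(text.slice(i, i + 2));
  }
  return pairs;
}

// Dice coefficient on bigram sets: 2 * |A ∩ B| / (|A| + |B|).
function diceOnSets(first, second) {
  const a = distinctBigrams(first);
  const b = distinctBigrams(second);
  // Assumption: with no bigrams on either side, fall back to equality.
  if (a.size + b.size === 0) return first === second ? 1 : 0;

  let shared = 0;
  for (const pair of a) {
    if (b.has(pair)) shared++;
  }
  return (2 * shared) / (a.size + b.size);
}

console.log(diceOnSets('night', 'nacht')); // 0.25 (only "ht" is shared)
console.log(diceOnSets('aaaa', 'aa'));     // 1 (both sets are { "aa" })
```

The second call shows the difference: a variant that counts duplicates sees three bigrams against one and scores 0.5 for the same pair.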
All in the title.
This looks like expected behavior, but it could be useful to fall back to a simple algorithm if one of the inputs is a length 1 string.
if (first.length === 1 || second.length === 1) { // if either is a 1-letter string
let [smaller, larger] = (first.length === 1)
? [first, second]
: [second, first];
return larger.includes(smaller) ? 2.0 / (larger.length + 1) : 0;
}
This came up when I tried to use compareTwoStrings for a search ranking.
string-similarity/compare-strings.js
Line 14 in ccdb537
Hi :)
Just finished modifying this script so that I can use it in mongoDB (SpiderMonkey with some parts of ES6) without the lodash dependency, and noticed this unused method, isEdgeCaseWithOneOrZeroChars.
It was introduced here, but that's over a year ago, and it hasn't been used since then.
So I'm wondering if it's some unfinished work that should be there, or just a stab at some approach deemed unnecessary and then accidentally left behind?
Cheers! :)
Daniel