The library provides efficient implementations of various string metric
algorithms. It works with strict Text values.
The current version of the package implements:
- Levenshtein distance
- Normalized Levenshtein distance
- Damerau-Levenshtein distance
- Normalized Damerau-Levenshtein distance
- Hamming distance
- Jaro distance
- Jaro-Winkler distance
- Overlap coefficient
- Jaccard similarity coefficient
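
As a rough illustration of how a few of these metrics might be called, here is a minimal sketch. It assumes the functions are exported from the Data.Text.Metrics module under the names levenshtein, levenshteinNorm, hamming, and jaroWinkler, and that both the text and text-metrics packages are installed. Exact result types may differ between package versions, so the sketch only prints the results.

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main (main) where

import Data.Text (Text)
-- Assumed module and function names; check them against the installed version.
import Data.Text.Metrics (hamming, jaroWinkler, levenshtein, levenshteinNorm)

main :: IO ()
main = do
  let a = "kitten" :: Text
      b = "sitting" :: Text
  -- Number of single-character insertions, deletions, and substitutions
  -- needed to turn one input into the other (3 for this pair).
  print (levenshtein a b)
  -- Normalized variant: a ratio between 0 and 1, where 1 means equal inputs.
  print (levenshteinNorm a b)
  -- Hamming distance is only defined for inputs of equal length, so the
  -- result is wrapped in Maybe (Just 3 for this pair).
  print (hamming "karolin" "kathrin")
  -- Jaro-Winkler similarity, also a ratio between 0 and 1.
  print (jaroWinkler "martha" "marhta")
```

The normalized functions are convenient when inputs of different lengths have to be compared against a single threshold.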
There is an edit-distance package whose scope overlaps with the scope of
this package. The differences are:
- edit-distance allows you to specify costs for every operation when calculating Levenshtein distance (insertion, deletion, substitution, and transposition). This is rarely needed in real-world applications, IMO.
- edit-distance only provides Levenshtein distance, while text-metrics aims to provide implementations of most string metric algorithms.
- edit-distance works on Strings, while text-metrics works on strict Text values.
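
Code that still holds String values can convert them before calling into this package. The following is a minimal sketch under the same assumptions as any use of the library (the levenshtein function exported from Data.Text.Metrics). The helper name levenshteinString is hypothetical and not part of either package.

```haskell
module Main (main) where

import qualified Data.Text as T
-- Assumed module and function name; check against the installed version.
import Data.Text.Metrics (levenshtein)

-- Hypothetical adapter: packs both Strings into strict Text and delegates.
-- fromIntegral keeps the sketch independent of the exact integral type
-- that levenshtein returns in a given package version.
levenshteinString :: String -> String -> Int
levenshteinString a b = fromIntegral (levenshtein (T.pack a) (T.pack b))

main :: IO ()
main = print (levenshteinString "kitten" "sitting")
```

Packing costs a pass over each input, so for repeated comparisons it is usually better to keep the data as Text throughout.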
Although we originally used C for speed, currently all functions are pure Haskell tuned for performance. See this blog post for more info.
Copyright © 2016–2018 Mark Karpov
Distributed under the BSD 3-clause license.