Edits

A collection of edit distance algorithms in Ruby.

Includes Levenshtein, Restricted Edit (Optimal Alignment), Damerau-Levenshtein and Hamming distances, and Jaro and Jaro-Winkler similarity.

Installation

Add this line to your application's Gemfile:

gem 'edits'

And then execute:

$ bundle

Or install it yourself as:

$ gem install edits

Usage

Levenshtein variants

Calculate the edit distance between two sequences with variants of the Levenshtein distance algorithm.

Edits::Levenshtein.distance "raked", "bakers"
# => 3
Edits::RestrictedEdit.distance "iota", "atom"
# => 3
Edits::DamerauLevenshtein.distance "acer", "earn"
# => 3
  • Levenshtein edit distance, counting insertion, deletion and substitution.
  • Restricted Damerau-Levenshtein edit distance (aka Optimal Alignment), counting insertion, deletion, substitution and transposition (adjacent symbols swapped), under the restriction that no substring is edited more than once.
  • Damerau-Levenshtein edit distance, counting insertion, deletion, substitution and transposition (adjacent symbols swapped).

                       Levenshtein   Restricted Damerau-Levenshtein   Damerau-Levenshtein
"raked" vs. "bakers"   3             3                                3
"iota" vs. "atom"      4             3                                3
"acer" vs. "earn"      4             4                                3

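To make the difference between the first two columns concrete, here is a minimal pure-Ruby sketch of the classic dynamic-programming recurrence. The method name sketch_distance and its transpositions: keyword are hypothetical and are not part of the gem; the gem's own implementation may differ. The unrestricted Damerau-Levenshtein variant is not covered here.

```ruby
# Illustrative reference implementation; NOT the gem's own code.
# With transpositions: true it computes the restricted (optimal alignment)
# distance, otherwise the plain Levenshtein distance.
def sketch_distance(a, b, transpositions: false)
  a = a.chars
  b = b.chars
  # d[i][j] = distance between the first i symbols of a and the first j of b
  d = Array.new(a.length + 1) { |i| [i] + [0] * b.length }
  (0..b.length).each { |j| d[0][j] = j }

  (1..a.length).each do |i|
    (1..b.length).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      best = [
        d[i - 1][j] + 1,        # deletion
        d[i][j - 1] + 1,        # insertion
        d[i - 1][j - 1] + cost  # substitution (or match)
      ].min
      if transpositions && i > 1 && j > 1 &&
         a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]
        best = [best, d[i - 2][j - 2] + 1].min # adjacent symbols swapped
      end
      d[i][j] = best
    end
  end
  d[a.length][b.length]
end

sketch_distance("iota", "atom")                       # => 4
sketch_distance("iota", "atom", transpositions: true) # => 3
```

The only difference between the two modes is the extra transposition case, which is why "iota" vs. "atom" drops from 4 to 3 while "acer" vs. "earn", which contains no adjacent swap, stays at 4.
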
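The Levenshtein and Restricted Edit distances also have bounded versions, which take a maximum distance as a third argument. In the example below the result is capped at that maximum.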

# Max distance
Edits::Levenshtein.distance_with_max "fghijk", "abcde", 3
# => 3
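
One way a bound can save work is by abandoning the computation as soon as the distance can no longer fall below the maximum. The sketch below illustrates that idea in plain Ruby. The name bounded_levenshtein is hypothetical, and returning the cap when the bound is reached is an assumption based on the example above, not a description of the gem's internals.

```ruby
# Hypothetical helper for illustration; NOT the gem's implementation.
# Assumption: when the true distance is at least max, max is returned.
def bounded_levenshtein(a, b, max)
  a = a.chars
  b = b.chars
  # The difference in length is a lower bound on the distance.
  return max if (a.length - b.length).abs >= max

  prev = (0..b.length).to_a
  a.each_with_index do |x, i|
    curr = [i + 1]
    b.each_with_index do |y, j|
      cost = x == y ? 0 : 1
      curr << [prev[j + 1] + 1, curr[j] + 1, prev[j] + cost].min
    end
    # Row minima never decrease, so once a whole row has reached the
    # bound the final distance cannot be smaller than it.
    return max if curr.min >= max
    prev = curr
  end
  [prev.last, max].min
end

bounded_levenshtein("fghijk", "abcde", 3) # => 3 (stops after three rows)
```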

The convenience method most_similar returns the best match for a given sequence from a collection. It behaves like min_by, but leverages a maximum bound while comparing candidates.

Edits::RestrictedEdit.most_similar "atom", ["iota", "tome", "mown", "tame"]
# => "tome"
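
The following sketch shows how a search of this kind could use a shrinking bound: the best distance found so far becomes the maximum for every later comparison, so poor candidates are abandoned early. Both helper names are hypothetical, plain Levenshtein is used instead of the Restricted Edit distance to keep the example short, and none of this is taken from the gem's source.

```ruby
# Hypothetical helpers for illustration; NOT the gem's implementation.
def capped_levenshtein(a, b, max)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |x, i|
    curr = [i]
    b.each_char.with_index do |y, j|
      curr << [prev[j + 1] + 1, curr[j] + 1, prev[j] + (x == y ? 0 : 1)].min
    end
    return max if curr.min >= max # cannot end up below the bound
    prev = curr
  end
  [prev.last, max].min
end

def most_similar_sketch(target, candidates)
  best = nil
  best_distance = Float::INFINITY
  candidates.each do |candidate|
    # Start from the longer length, then tighten to the best distance seen.
    bound = [best_distance, [target.length, candidate.length].max].min
    d = capped_levenshtein(target, candidate, bound)
    next unless d < best_distance

    best = candidate
    best_distance = d
  end
  best
end

most_similar_sketch("atom", ["iota", "tome", "mown", "tame"]) # => "tome"
```

As with min_by, the first candidate wins a tie in this sketch.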

Jaro & Jaro-Winkler

Calculate the Jaro and Jaro-Winkler similarity/distance of two sequences.

Edits::Jaro.similarity "information", "informant"
# => 0.90235690235690236
Edits::Jaro.distance "information", "informant"
# => 0.097643097643097643

Edits::JaroWinkler.similarity "information", "informant"
# => 0.94141414141414137
Edits::JaroWinkler.distance "information", "informant"
# => 0.05858585858585863
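
For readers who want to see where these numbers come from, here is a minimal pure-Ruby sketch of the textbook Jaro and Jaro-Winkler definitions. The method names, the prefix_scale and max_prefix parameters, and the absence of a similarity threshold are assumptions made for illustration; the gem's options and implementation may differ.

```ruby
# Illustrative reference implementation; NOT the gem's own code.
def jaro_sketch(a, b)
  return 1.0 if a == b
  return 0.0 if a.empty? || b.empty?

  # Symbols match when equal and no further apart than this window.
  window = [[a.length, b.length].max / 2 - 1, 0].max
  a_hit = Array.new(a.length, false)
  b_hit = Array.new(b.length, false)

  a.each_char.with_index do |x, i|
    lo = [i - window, 0].max
    hi = [i + window, b.length - 1].min
    (lo..hi).each do |j|
      next if b_hit[j] || b[j] != x

      a_hit[i] = b_hit[j] = true
      break
    end
  end

  a_matched = a.chars.select.with_index { |_, i| a_hit[i] }
  b_matched = b.chars.select.with_index { |_, j| b_hit[j] }
  m = a_matched.length
  return 0.0 if m.zero?

  # Matched symbols that appear in a different order, counted in pairs.
  t = a_matched.zip(b_matched).count { |x, y| x != y } / 2
  (m.fdiv(a.length) + m.fdiv(b.length) + (m - t).fdiv(m)) / 3
end

# Jaro-Winkler boosts the Jaro score for a shared prefix (up to max_prefix).
def jaro_winkler_sketch(a, b, prefix_scale: 0.1, max_prefix: 4)
  sim = jaro_sketch(a, b)
  prefix = a.chars.zip(b.chars).take(max_prefix).take_while { |x, y| x == y }.length
  sim + prefix * prefix_scale * (1 - sim)
end
```

For "information" and "informant" there are 9 matching symbols and 1 transposition, giving (9/11 + 9/9 + 8/9) / 3 ≈ 0.9024. The shared four-symbol prefix "info" then lifts the Jaro-Winkler score to about 0.9414. In both cases the distance is 1 minus the similarity.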

Hamming

Calculate the Hamming distance between two sequences.

Edits::Hamming.distance("explorer", "exploded")
# => 2
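
The Hamming distance is simply the number of positions at which two equal-length sequences differ, as this short sketch shows. The name hamming_sketch is hypothetical, and rejecting sequences of unequal length is an assumption of this sketch; the gem may handle that case differently.

```ruby
# Illustrative sketch; NOT the gem's own code.
def hamming_sketch(a, b)
  # Assumption: unequal lengths are rejected here.
  raise ArgumentError, "sequences must have equal length" unless a.length == b.length

  # Pair up symbols by position and count the mismatches.
  a.chars.zip(b.chars).count { |x, y| x != y }
end

hamming_sketch("explorer", "exploded") # => 2 ("r" vs. "d", twice)
```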

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/tcrouch/edits.

License

The gem is available as open source under the terms of the MIT License.
