correcting-words-using-nltk's Introduction

Correcting Words using NLTK in Python

This repository explains and implements automatic spelling correction using NLP. We use two methods. Each takes a list of misspelled words and suggests a replacement for each one: it searches the list of correct spellings for the word that starts with the same letter as the misspelled word and is closest to it, then returns that word. The two methods differ only in the distance metric they use to find the closest word. The list of correct words comes from the words corpus in the nltk package.

Method 1: Using Jaccard distance Method

The Jaccard distance measures the dissimilarity between two sets and is the complement of the Jaccard coefficient. We obtain it by subtracting the Jaccard coefficient from 1 or, equivalently, by dividing the difference between the sizes of the union and the intersection of the two sets by the size of the union. Instead of word tokens, we compare Q-grams, that is, character-level N-grams of each word. The Jaccard distance is given by the following formula.

Dj(A,B) = 1 - J(A,B) = (|A ∪ B| - |A ∩ B|) / |A ∪ B|
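
As a worked example of the formula, here is a minimal pure-Python sketch that computes the Jaccard distance between the character bigrams of two words. The helper names char_ngrams and jaccard are illustrative assumptions, not part of this repository or of NLTK.

```python
def char_ngrams(word, n=2):
    """Return the set of character n-grams of a word (hypothetical helper)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def jaccard(a, b):
    """Jaccard distance (|A ∪ B| - |A ∩ B|) / |A ∪ B| for two non-empty sets."""
    union = a | b
    # Assumes at least one set is non-empty; an empty union would divide by zero.
    return (len(union) - len(a & b)) / len(union)


a = char_ngrams("night")  # {"ni", "ig", "gh", "ht"}
b = char_ngrams("nacht")  # {"na", "ac", "ch", "ht"}

# The sets share only "ht", and the union has 7 bigrams, so the distance is 6/7.
print(round(jaccard(a, b), 3))  # 0.857
```

Identical bigram sets give a distance of 0, and sets with nothing in common give a distance of 1.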

Importing and Downloading:

We import nltk, jaccard_distance (from nltk.metrics.distance) and ngrams (from nltk.util). We also download the words resource with nltk.download('words') and assign the resulting word list to crt_wrds.

Calculating:

We define a function auto_crt() that computes the Jaccard distance between the misspelled word and every correctly spelled word with the same initial letter, comparing the two words as sets of character bigrams. We then sort the candidates by distance in ascending order, so the closest word comes first, and return that word.
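
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not necessarily the repository's exact code: here auto_crt() takes the list of correct words as a parameter (so it can be tried on a small list), and the guard for very short words is an addition.

```python
from nltk.metrics.distance import jaccard_distance
from nltk.util import ngrams


def auto_crt(word, crt_wrds, n=2):
    """Return the correct word closest to `word` by Jaccard distance.

    Assumption: crt_wrds is any iterable of correctly spelled words.
    In practice it could be words.words() from nltk.corpus, after
    nltk.download('words') has been run once.
    """
    if len(word) < n:
        return None  # too short to form a single n-gram
    word_grams = set(ngrams(word, n))
    # Keep only candidates that share the first letter (case-sensitive)
    # and are long enough to produce at least one n-gram.
    candidates = [w for w in crt_wrds if len(w) >= n and w[0] == word[0]]
    # Sort (distance, word) pairs in ascending order; the closest comes first.
    scored = sorted(
        (jaccard_distance(word_grams, set(ngrams(w, n))), w) for w in candidates
    )
    return scored[0][1] if scored else None


sample_words = ["happy", "harpy", "hippo", "amazing", "intelligent"]
for wrong in ["happpy", "azmaing", "intelliengt"]:
    print(wrong, "->", auto_crt(wrong, sample_words))
```

Ties are broken alphabetically here because the pairs are sorted as tuples; that tie-breaking rule is a choice of this sketch.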

Executing:

We take a list of misspelled words as input and run auto_crt() on each word in it. Here you can see the power of n-grams: each misspelled word is mapped to its closest correct spelling.

Method 2: Using Edit distance Method

Edit Distance measures dissimilarity between two strings by finding the minimum number of operations needed to transform one string into the other. The transformations that can be performed are:

  • Inserting a new character.

bat -> bats (insertion of 's')

  • Deleting an existing character.

care -> car (deletion of 'e')

  • Substituting an existing character.

bin -> bit (substitution of 'n' with 't')

  • Transposition of two existing consecutive characters.

sing -> sign (transposition of 'ng' to 'gn')
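
To make the four operations concrete, here is a minimal pure-Python sketch of edit distance computed by dynamic programming. It is an illustration, not NLTK's implementation; the function name edit_dist is hypothetical. Transpositions are optional in this sketch, mirroring the fact that NLTK's edit_distance only counts them when called with transpositions=True.

```python
def edit_dist(s, t, transpositions=False):
    """Minimum number of single-character edits turning s into t."""
    rows, cols = len(s) + 1, len(t) + 1
    # d[i][j] holds the distance between s[:i] and t[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Optionally treat a swap of two adjacent characters as one edit.
            if (transpositions and i > 1 and j > 1
                    and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(edit_dist("bat", "bats"))                        # 1 (insertion)
print(edit_dist("care", "car"))                        # 1 (deletion)
print(edit_dist("bin", "bit"))                         # 1 (substitution)
print(edit_dist("sing", "sign"))                       # 2 (two substitutions)
print(edit_dist("sing", "sign", transpositions=True))  # 1 (transposition)
```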

Importing and Downloading:

We import nltk and edit_distance (from nltk.metrics.distance). We also download the words resource with nltk.download('words') and assign the resulting word list to crt_wrds.

Calculating:

We define a function auto_crt() that computes the edit distance between the misspelled word and every correctly spelled word with the same initial letter. We then sort the candidates by distance in ascending order, so the closest word comes first, and return that word.
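
A rough sketch of this function, using NLTK's edit_distance, might look like the following. As an assumption of this sketch, the list of correct words is passed in as a parameter rather than read from a global crt_wrds, and the transpositions flag is exposed so that swapped letters can be counted as a single edit.

```python
from nltk.metrics.distance import edit_distance


def auto_crt(word, crt_wrds, transpositions=False):
    """Return the correct word closest to `word` by edit distance.

    Assumption: crt_wrds is any iterable of correctly spelled words.
    In practice it could be words.words() from nltk.corpus, after
    nltk.download('words') has been run once.
    """
    if not word:
        return None
    # Keep only candidates that share the first letter (case-sensitive).
    candidates = [w for w in crt_wrds if w and w[0] == word[0]]
    # Sort (distance, word) pairs in ascending order; the closest comes first.
    scored = sorted(
        (edit_distance(word, w, transpositions=transpositions), w)
        for w in candidates
    )
    return scored[0][1] if scored else None


sample_words = ["happy", "harpy", "hippo", "amazing", "intelligent"]
for wrong in ["happpy", "azmaing", "intelliengt"]:
    print(wrong, "->", auto_crt(wrong, sample_words))
```

Several candidates often share the same smallest edit distance; this sketch breaks such ties alphabetically, which is a design choice rather than something the method requires.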

Executing:

We take a list of misspelled words as input and run auto_crt() on each word in it, just as in the method above. This method returns the candidate with the shortest edit distance. Because it ranks candidates by a different metric, it can produce different outputs from Method 1 for the same inputs.

correcting-words-using-nltk's People

Contributors

jitendracheripally2003
