Giter Site home page Giter Site logo

spellfix's Introduction

spellfix

A typo correction command-line program written in Python.

status

Master: codecov Build Status

Develop: codecov Build Status

installation

git clone https://github.com/mathematicalmichael/spellfix.git
cd spellfix
pip install .

optional Tf-Idf suggestions

pip install numpy cython
pip install .[recommend]

optional testing dependencies

pip install .[testing]

That should do it, and you can always use the syntax pip install -e ., which will install a development version (as opposed to linking to compiled binaries).

usage

Let's say you have a file named wordlist.txt with a list of names you need to sort through.

If you instead have a CSV with counts, called wordlist.csv, where the first column is name and second column has frequency counts (such as those from groupby, run

python convert csv_to_unknown_json.py -f wordlist.csv

as a pre-processing step. This sets up a dictionary based on those counts for use in spellfix.py. It also creates the expected wordlist.txt file by stripping the second column. Make sure that there is no header in the CSV

If you have the suggestions module, run

python suggest.py -f wordlist.txt

To generate a wordlist-suggestions.json file for use in spellfix.py, based on TF-iDF with ngrams.

If these files are missing, spellfix.py will do what it can starting from a bare .txt file, but will be less good at making suggestions because of its use of Levenshtein distance (of 2) as a threshold for corrections. Complicated changes won't be suggested, hence the suggest.py module.

Now you can run

python spellfix.py -f wordlist.txt

and use the interactive menu. Pressing Enter will perform the same function as S, saving.

When corrections are made or words are added, extra weight is assigned to them in order to aid automatic decision making. When probabilities are sufficiently high and possible choices sufficiently low, the algorithm will make the choice on your behalf.

The default action of the spellfix module is to run correct(), in an attempt to build the dictionary of corrections.

When you save, you'll be able to "hot start" from the files. You will see files prepended with the name of your original list.

docs

python spellfix.py --help

python make_names.py --help

usage

Generate a file of names (if you don't have your own), and call it wordlist.txt, which the program looks for by default.

python make_names.py

Then, you can run python spellfix.py

If you want to specify your own file: python spellfix.py -f FILE

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.