mnemonicode's Introduction

Oren Tirosh is no longer maintaining the original version of this project.
Stephen Paul Weber likes it and is making it accessible on GitHub.

These routines implement a method for encoding binary data into a sequence
of words which can be spoken over the phone, for example, and converted
back to data on the other side.

For more information see <http://web.archive.org/web/20101031205747/http://www.tothink.com/mnemonic/>

There are some other somewhat similar systems that seem less satisfactory:

- OTP was designed for easy typing and for minimizing length, but as
  a consequence its word list contains similar words ("AD" and "ADD")
  that are easy to confuse when dictated over the phone.

- PGPfone optimizes for "maximum phonetic distance" between words,
  which resolves the above problem but has some other drawbacks:

  - Low efficiency, as it encodes a little less than 1 bit per
    character;

  - Word quality issues, as some words are somewhat obscure to
    non-native speakers of English, or are awkward to use or type.
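
The efficiency comparison can be made concrete with a little arithmetic.
The Python sketch below assumes two things this text does not state
outright: that every word is drawn from a list of 1626 words (the size
given in the criteria below), and that the encoder packs 32 bits into
three words. The function names are hypothetical and are not part of
mnemonic.c.

```python
import math

# Assumption: list size and word lengths as stated in the mandatory criteria.
WORDLIST_SIZE = 1626
MIN_LEN, MAX_LEN = 4, 7


def bits_per_word(wordlist_size: int) -> float:
    """Information carried by one word chosen from the list."""
    return math.log2(wordlist_size)


def words_cover_bits(wordlist_size: int, n_words: int, n_bits: int) -> bool:
    """True if n_words words can represent every n_bits-bit value."""
    return wordlist_size ** n_words >= 2 ** n_bits


def bits_per_char_range(wordlist_size: int, min_len: int, max_len: int):
    """Lowest and highest bits per letter, ignoring separators."""
    bits = bits_per_word(wordlist_size)
    return bits / max_len, bits / min_len


if __name__ == "__main__":
    print(f"bits per word: {bits_per_word(WORDLIST_SIZE):.2f}")
    print("3 words cover 32 bits:", words_cover_bits(WORDLIST_SIZE, 3, 32))
    low, high = bits_per_char_range(WORDLIST_SIZE, MIN_LEN, MAX_LEN)
    print(f"bits per letter: between {low:.2f} and {high:.2f}")
```

Under these assumptions even a list made up entirely of seven-letter
words stays above 1 bit per letter, and 1626 is the smallest list size
whose cube reaches 2^32.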

Mnemonic tries to do better by being more selective about its word
list.  Its criteria are as follows:

Mandatory Criteria:

 - The wordlist contains 1626 words.

 - All words are between 4 and 7 letters long.

 - No word in the list is a prefix of another word (e.g. visit,
   visitor).

 - Five letter prefixes of words are sufficient to be unique. 
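
Because these four criteria are mechanical, they can be checked by a
script. The following is a minimal Python sketch, not part of the
project; the function name and its parameters are hypothetical, and it
expects the word list as a plain list of lowercase strings.

```python
from collections import Counter


def check_mandatory_criteria(words, expected_count=1626,
                             min_len=4, max_len=7, prefix_len=5):
    """Return a list of human-readable violations (empty if none)."""
    problems = []

    # Criterion 1: list size.
    if len(words) != expected_count:
        problems.append(f"expected {expected_count} words, found {len(words)}")

    # Criterion 2: word length.
    for word in words:
        if not min_len <= len(word) <= max_len:
            problems.append(f"{word!r} has {len(word)} letters")

    # Criterion 3: no word is a prefix of another. After sorting, a word
    # that is a prefix of any other word is also a prefix of its immediate
    # successor, so comparing neighbours reports each offender once.
    ordered = sorted(set(words))
    for first, second in zip(ordered, ordered[1:]):
        if second.startswith(first):
            problems.append(f"{first!r} is a prefix of {second!r}")

    # Criterion 4: the first prefix_len letters identify a word uniquely.
    counts = Counter(word[:prefix_len] for word in ordered)
    for prefix, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"prefix {prefix!r} is shared by {n} words")

    return problems


if __name__ == "__main__":
    # Placeholder input, not the real word list.
    sample = ["visit", "visitor", "hotel", "radio"]
    for problem in check_mandatory_criteria(sample, expected_count=4):
        print(problem)
```

An empty result means the list satisfies all four criteria as stated.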

Less Strict Criteria:

  - The words should be usable by people all over the world. The list
    is far from perfect in that respect. It is heavily biased towards
    western culture and English in particular. The international
    vocabulary is simply not big enough. One can argue that even words
    like "hotel" or "radio" are not truly international. You will find
    many English words in the list but I have tried to limit them to
    words that are part of a beginner's vocabulary or words that have
    close relatives in other European languages. In some cases a word
    has a different meaning in another language or is pronounced very
    differently but for the purpose of the encoding it is still ok - I
    assume that when the encoding is used for spoken communication
    both sides speak the same language.

  - The words should have more than one syllable. This makes them
    easier to recognize when spoken, especially over a phone
    line. Again, you will find many exceptions. For one syllable words
    I have tried to use words with 3 or more consonants or words with
    diphthongs, making for a longer and more distinct
    pronunciation. As a result of this requirement the average word
    length has increased. I do not consider this to be a problem since
    my goal in limiting the word length was not to reduce the average
    length of encoded data but to limit the maximum length to fit in
    fixed-size fields or a terminal line width.

  - No two words on the list should sound too much alike. Soundalikes
    such as "sweet" and "suite" are ruled out. One of the two is
    chosen and the other should be accepted by the decoder's
    soundalike matching code or using explicit aliases for some words.

  - No offensive words. The rule was to avoid words that I would not
    like to be printed on my business card. I have extended this to
    words that by themselves are not offensive but are too likely to
    create combinations that someone may find embarrassing or
    offensive. This includes words dealing with religion such as
    "church" or "jewish" and some words with negative meanings like
    "problem" or "fiasco". I am sure that a creative mind (or a random
    number generator) can find plenty of embarrassing or offensive word
    combinations using only words in the list but I have tried to
    avoid the more obvious ones. One of my tools for this was simply a
    generator of random word combinations - the problematic ones stick
    out like a sore thumb.

  - Avoid words with tricky spelling or pronunciation. Even if the
    receiver of the message can probably spell the word close enough
    for the soundalike matcher to recognize it correctly, I prefer
    avoiding such words. I believe this will help users feel more
    comfortable using the system, increase the level of confidence and
    decrease the overall error rate. Most words in the list can be
    spelled more or less correctly from hearing, even without knowing
    the word.

  - The word should feel right for the job. I know, this one is very
    subjective but some words would meet all the criteria and still
    not feel right for the purpose of mnemonic encoding. The word
    should feel like one of the words in the radio phonetic alphabets
    (alpha, bravo, charlie, delta etc).
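
The screening tool mentioned under the offensive-words criterion, a
generator of random word combinations, is easy to approximate. The
Python sketch below is an illustration only and is not the author's
original tool; the function name, the group size of three, and the
sample words (borrowed from the text above, not from the real list)
are all assumptions.

```python
import random


def random_combinations(words, group_size=3, count=10, rng=None):
    """Return `count` space-separated groups of distinct random words."""
    if rng is None:
        rng = random.Random()
    pool = list(words)  # random.sample needs a sequence
    return [" ".join(rng.sample(pool, group_size)) for _ in range(count)]


if __name__ == "__main__":
    # Placeholder words for illustration, not the actual word list.
    sample_words = ["hotel", "radio", "alpha", "bravo", "charlie", "delta"]
    for line in random_combinations(sample_words, count=5):
        print(line)
```

Reading through a few hundred such lines is a quick way to spot words
that combine badly with others.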

mnemonic.h	Header file
mnemonic.c	Encoding/decoding and associated routines
mn_wordlist.c	The word list itself
mnencode.c	Sample program - encode data from stdin to stdout
mndecode.c	Sample program - decode data from stdin to stdout

== Other Implementations ==

Elixir:     <https://github.com/mwmiller/mnemonex>
Go:         <https://bitbucket.org/dchapes/mnemonicode>
JavaScript: <https://github.com/mbrubeck/mnemonic.js>
Python:     <https://github.com/bwhmather/python-mnemonicode>
Rust:       <https://github.com/mbrubeck/rust-mnemonic>

mnemonicode's People

Contributors

arran4, dchapes, dj3vande, singpolyma, snej

mnemonicode's Issues

Word list doesn't fulfill stated criteria/requirements

I noticed this "Mandatory Criteria" section in the README of this project:

mnemonicode/README, lines 28 to 37 in 315aed6:

    Mandatory Criteria:

     - The wordlist contains 1626 words.

     - All words are between 4 and 7 letters long.

     - No word in the list is a prefix of another word (e.g. visit,
       visitor).

     - Five letter prefixes of words are sufficient to be unique.

Unfortunately, some of these promised qualities are no longer true of the current list.

I don't know how crucial these qualities of the list are for various purposes the list is used for, but I think it'd be good to update the information to reflect the current list.

  • The list has 1633 words on it, not 1626.
  • There are a handful of 3-letter words on the list, like "ego", "fax", "jet" and "ski". (Note that removing all 3-letter words leaves you with 1,626 words -- perhaps the original list size?)

Also, a number of words on the list share five-letter prefixes with other words on the list, so those prefixes are not unique. Here are the ones I was able to find:

  • capital and capitan
  • content and context
  • domingo, dominic and domino
  • formal and format
  • justice and justin
  • parade and paradox
  • patrol and patron
  • plaster and plastic
  • polite and politic
  • postage and postal
  • profile and profit
  • protect and protein
  • static and station
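
Collisions like these can be found automatically by grouping the words
on their first five letters. The Python sketch below shows one way to
do it; the function name is hypothetical and the sample input is only
a handful of words from the list above, not the full word list.

```python
from collections import defaultdict


def prefix_collisions(words, prefix_len=5):
    """Map each shared prefix to the sorted words that start with it."""
    groups = defaultdict(list)
    for word in words:
        groups[word[:prefix_len]].append(word)
    # Keep only prefixes that fail to identify a single word.
    return {prefix: sorted(group)
            for prefix, group in groups.items() if len(group) > 1}


if __name__ == "__main__":
    sample = ["capital", "capitan", "domingo", "dominic", "domino",
              "hotel", "radio"]
    for prefix, group in sorted(prefix_collisions(sample).items()):
        print(prefix, "->", ", ".join(group))
```

Running it over the complete list would show whether the pairs above
are the only collisions.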

Possible solutions

  1. Edit the README to match the current list: it contains 1633 words; all words are between 3 and 7 letters long; and no word in the list is a prefix of another word (e.g. visit, visitor). All of these are (still) true.
  2. Remove the 3-letter words from the list, making the 1st and 2nd criteria true. The statement that "Five letter prefixes of words are sufficient to be unique" would still need to be removed.
  3. Remove one word from each of these pairs (keeping all existing 3-letter words on the list), which brings the list down to 1,614 words. Then add 12 new words to get back to the originally promised 1626 words. You'd then only have to edit the criteria to read "All words are between 3 and 7 letters long."

Decoding an encoded string does not always return the original string

Per https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=718931 there are certain strings that, once encoded, cannot be decoded back to the original string.

I did a review of the commit history since that old code was imported into this project, but I didn't see anything in the list that is likely to have fixed this. I haven't tested the specific case described at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=718931 with the latest code here, but I can if needed.
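
A round-trip check is one way to test for this kind of problem. The
Python sketch below is generic: `find_round_trip_failures` is a
hypothetical helper that takes any pair of encode and decode
callables. The toy codec in it is a deliberately lossy stand-in used
only to show a failure being caught; it is not the mnemonic encoding
and does not reproduce the case from the Debian report.

```python
def find_round_trip_failures(samples, encode, decode):
    """Return (original, decoded) pairs where decoding lost information."""
    failures = []
    for data in samples:
        decoded = decode(encode(data))
        if decoded != data:
            failures.append((data, decoded))
    return failures


def toy_encode(data: bytes) -> str:
    """Stand-in encoder: plain hexadecimal."""
    return data.hex()


def toy_decode(text: str) -> bytes:
    """Stand-in decoder with a deliberate bug: drops trailing zero bytes."""
    return bytes.fromhex(text).rstrip(b"\x00")


if __name__ == "__main__":
    samples = [b"", b"abc", b"abc\x00", b"\x00\x01"]
    for original, decoded in find_round_trip_failures(
            samples, toy_encode, toy_decode):
        print(f"{original!r} came back as {decoded!r}")
```

To test the real code, the two callables could be replaced by wrappers
that pipe data through the mnencode and mndecode sample programs.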
