cmudict-parser's Introduction

cmudict-parser: ARPAbet and IPA for CMU Dictionary

Python parser for CMUDict files. It returns the ARPAbet and IPA transcriptions of dictionary words.

Installation

python3.8 -m pip install pipenv
python3.8 -m pipenv install --ignore-pipfile

Usage

from cmudict_parser import get_dict

cmudict = get_dict(
    download_folder="/tmp"
)

print(cmudict.get_all_arpa("to"))
# ['T UW1', 'T IH0', 'T AH0']

print(cmudict.get_all_ipa("to"))
# ['tˈu', 'tɪ', 'tʌ']

print(cmudict.get_first_ipa("to"))
# tˈu

Development

apt install python3-lib2to3
python3.8 -m pip install pipenv
python3.8 -m pipenv install --dev

Add to another project

In the destination project run:

# if not already done:
python3.8 -m pip install pipenv
# add reference
python3.8 -m pipenv install -e git+https://github.com/stefantaubert/cmudict-parser.git@master#egg=cmudict_parser

Notes

cmudict-parser's People

Contributors

jasminsternkopf, stefantaubert

cmudict-parser's Issues

AssertionError in SentenceToIPA

Please fix the error occurring on the words "together", "higgledy-piggledy", and "the".

Traceback:

Exception has occurred: AssertionError
  File "/home/mi/code/cmudict-parser/cmudict_parser/SentenceToIPA.py", line 115, in get_ipa_of_words_with_hyphen
    assert ipa is not None
  File "/home/mi/code/cmudict-parser/cmudict_parser/SentenceToIPA.py", line 89, in ipa_of_punctuation_and_words_combined
    ipa_of_word_without_punct = f"{get_ipa_of_words_with_hyphen(dict, word_without_punctuation, replace_unknown_with)}{char_at_end}"
  File "/home/mi/code/cmudict-parser/cmudict_parser/SentenceToIPA.py", line 54, in get_ipa_of_word_with_punctuation
    return ipa_of_punctuation_and_words_combined(dict, punctuation_before_word, word_without_punctuation, punctuation_after_word, replace_unknown_with)
  File "/home/mi/code/cmudict-parser/cmudict_parser/SentenceToIPA.py", line 42, in get_ipa_of_word_in_sentence
    ipa = get_ipa_of_word_with_punctuation(dict, word, replace_unknown_with)
  File "/home/mi/code/cmudict-parser/cmudict_parser/SentenceToIPA.py", line 34, in get_ipa_of_word_in_sentence_cache
    ipa = get_ipa_of_word_in_sentence(dict, word, replace_unknown_with)
  File "/home/mi/code/cmudict-parser/cmudict_parser/SentenceToIPA.py", line 21, in <listcomp>
    ipa_words = [get_ipa_of_word_in_sentence_cache(
  File "/home/mi/code/cmudict-parser/cmudict_parser/SentenceToIPA.py", line 21, in sentence_to_ipa
    ipa_words = [get_ipa_of_word_in_sentence_cache(
  File "/home/mi/code/cmudict-parser/cmudict_parser/CMUDict.py", line 51, in sentence_to_ipa
    return get_ipa_of_sentence(self._entries_first_ipa, sentence, replace_unknown_with, use_caching)
  File "/home/mi/code/cmudict-parser/cmudict_parser/playground.py", line 9, in <module>
    res = cmu.sentence_to_ipa(x, replace_unknown_with="_")

IndexError on sentence with comma

The following text raised an error: it is not a real gain, for the modern printer throws the gain away by putting inordinately wide spaces between his lines, which, probably,

Traceback:

Traceback (most recent call last):
  File "/home/mi/.local/share/virtualenvs/tacotron2-vfc1XCWN/src/text-utils/text_utils/text.py", line 86, in en_to_ipa_cmu_epitran
    result = CMU_CACHE.sentence_to_ipa(
  File "/home/mi/.local/share/virtualenvs/tacotron2-vfc1XCWN/src/cmudict-parser/cmudict_parser/CMUDict.py", line 63, in sentence_to_ipa
    ipa = self.get_ipa_of_word_in_sentence(word, replace_unknown_with)
  File "/home/mi/.local/share/virtualenvs/tacotron2-vfc1XCWN/src/cmudict-parser/cmudict_parser/CMUDict.py", line 70, in get_ipa_of_word_in_sentence
    ipa = self.get_ipa_of_words_with_punctuation(word, replace_unknown_with)
  File "/home/mi/.local/share/virtualenvs/tacotron2-vfc1XCWN/src/cmudict-parser/cmudict_parser/CMUDict.py", line 95, in get_ipa_of_words_with_punctuation
    if self.contains(word_with_apo_at_beginning) and punctuations_before_word[-1] == "'":
IndexError: string index out of range
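
The last frame indexes `punctuations_before_word[-1]`, which raises `IndexError` whenever that string is empty, i.e. when the word has no leading punctuation at all. A minimal sketch of one possible guard follows. The helper name `has_trailing_apostrophe` is hypothetical and not part of cmudict-parser; it only assumes that the value is a plain `str`, as the error message suggests.

```python
def has_trailing_apostrophe(punctuation_before_word: str) -> bool:
    """Return True if the leading punctuation ends with an apostrophe.

    str.endswith is safe on an empty string, whereas indexing with [-1]
    raises IndexError when there is no punctuation before the word.
    """
    return punctuation_before_word.endswith("'")


if __name__ == "__main__":
    for sample in ["", "\"'", "'("]:
        print(repr(sample), has_trailing_apostrophe(sample))
```

With such a check, the condition in the failing line could be written without direct indexing, so a word without leading punctuation would simply not match.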

Conversion failed

Conversion failed for Suckin'--I mean helpin' people an' fightin' an' all that.

Traceback:

Traceback (most recent call last):
  File "/home/mi/.local/share/virtualenvs/tacotron2-vfc1XCWN/src/text-utils/text_utils/text.py", line 86, in en_to_ipa_cmu_epitran
    result = CMU_CACHE.sentence_to_ipa(
  File "/home/mi/.local/share/virtualenvs/tacotron2-vfc1XCWN/src/cmudict-parser/cmudict_parser/CMUDict.py", line 51, in sentence_to_ipa
    return get_ipa_of_sentence(self._entries_first_ipa, sentence, replace_unknown_with, use_caching)
  File "/home/mi/.local/share/virtualenvs/tacotron2-vfc1XCWN/src/cmudict-parser/cmudict_parser/SentenceToIPA.py", line 17, in sentence_to_ipa
    ipa_words = [get_ipa_of_word_in_sentence_cache(
  File "/home/mi/.local/share/virtualenvs/tacotron2-vfc1XCWN/src/cmudict-parser/cmudict_parser/SentenceToIPA.py", line 17, in <listcomp>
    ipa_words = [get_ipa_of_word_in_sentence_cache(
  File "/home/mi/.local/share/virtualenvs/tacotron2-vfc1XCWN/src/cmudict-parser/cmudict_parser/SentenceToIPA.py", line 30, in get_ipa_of_word_in_sentence_cache
    ipa = get_ipa_of_word_in_sentence(dict, word, replace_unknown_with)
  File "/home/mi/.local/share/virtualenvs/tacotron2-vfc1XCWN/src/cmudict-parser/cmudict_parser/SentenceToIPA.py", line 38, in get_ipa_of_word_in_sentence
    ipa = get_ipa_of_word_with_punctuation(dict, word, replace_unknown_with)
  File "/home/mi/.local/share/virtualenvs/tacotron2-vfc1XCWN/src/cmudict-parser/cmudict_parser/SentenceToIPA.py", line 50, in get_ipa_of_word_with_punctuation
    return ipa_of_punctuation_and_words_combined(dict, punctuation_before_word, word_without_punctuation, punctuation_after_word, replace_unknown_with)
  File "/home/mi/.local/share/virtualenvs/tacotron2-vfc1XCWN/src/cmudict-parser/cmudict_parser/SentenceToIPA.py", line 85, in ipa_of_punctuation_and_words_combined
    ipa_of_word_without_punct = f"{get_ipa_of_words_with_hyphen(dict, word_without_punctuation, replace_unknown_with)}{char_at_end}"
  File "/home/mi/.local/share/virtualenvs/tacotron2-vfc1XCWN/src/cmudict-parser/cmudict_parser/SentenceToIPA.py", line 107, in get_ipa_of_words_with_hyphen
    ipa = find_combination_of_certain_length_in_dict(
  File "/home/mi/.local/share/virtualenvs/tacotron2-vfc1XCWN/src/cmudict-parser/cmudict_parser/SentenceToIPA.py", line 119, in find_combination_of_certain_length_in_dict
    word, apos_before, apos_after = strip_apos_at_beginning_and_end_if_they_do_not_belong_to_word(
  File "/home/mi/.local/share/virtualenvs/tacotron2-vfc1XCWN/src/cmudict-parser/cmudict_parser/SentenceToIPA.py", line 130, in strip_apos_at_beginning_and_end_if_they_do_not_belong_to_word
    word, apos_before = strip_apos(word, 0)
  File "/home/mi/.local/share/virtualenvs/tacotron2-vfc1XCWN/src/cmudict-parser/cmudict_parser/SentenceToIPA.py", line 144, in strip_apos
    while word[pos] == "'":
IndexError: string index out of range
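
The loop `while word[pos] == "'"` has no bounds check, so it fails once `word` is empty. That may happen here because the double hyphen in `Suckin'--I` could produce an empty part when the word is split at hyphens, although the traceback alone does not confirm this. The sketch below shows a bounds-checked variant. The name `strip_apos_safe`, the return order, and the meaning of `pos` (0 for the beginning, -1 for the end) are assumptions for illustration, not the library's actual implementation.

```python
from typing import Tuple


def strip_apos_safe(word: str, pos: int) -> Tuple[str, str]:
    """Strip apostrophes from one end of a word.

    pos == 0 strips from the beginning, pos == -1 from the end (assumed
    convention). Returns the stripped word and the removed apostrophes.
    """
    if pos not in (0, -1):
        raise ValueError("pos must be 0 (beginning) or -1 (end)")
    apos = ""
    # Checking `word` first keeps the index access from running on "".
    while word and word[pos] == "'":
        apos += "'"
        word = word[1:] if pos == 0 else word[:-1]
    return word, apos


if __name__ == "__main__":
    print(strip_apos_safe("", 0))
    print(strip_apos_safe("''tis", 0))
    print(strip_apos_safe("an'", -1))
```

An empty input, or a word consisting only of apostrophes, then yields an empty word instead of raising.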
