mphilli / english-to-ipa Goto Github PK

Converts English text to IPA notation

License: MIT License

Python 100.00%

english-to-ipa's Issues

Import issue, please help

Hello,

I am trying to set up English-to-IPA, but I get many error messages.
I would very much appreciate your help with the installation.

I tried:

sudo python setup.py install : in the English-to-IPA-master folder
sudo python setup.py build : in the English-to-IPA-master folder
sudo python setup.py install : in the English-to-IPA-master folder
No build or install at all, just working in the English-to-IPA-master folder

Both approaches give the same error message.

(In the English-to-IPA folder, I type python and do this...)
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import eng_to_ipa
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "eng_to_ipa/__init__.py", line 1, in <module>
from .transcribe import *
File "eng_to_ipa/transcribe.py", line 58
c.execute(f"SELECT word, phonemes FROM dictionary WHERE word IN ({quest[:-2]})", words_in)
^
SyntaxError: invalid syntax

I'm very new to Python, so I just commented out that line and put
c.execute(words_in)
in its place, then tried the same process again.

Then I got this error message.

>>> import eng_to_ipa as es
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "eng_to_ipa/__init__.py", line 1, in <module>
from .transcribe import *
File "eng_to_ipa/transcribe.py", line 4, in <module>
import eng_to_ipa.stress as stress
File "eng_to_ipa/stress.py", line 25
SyntaxError: Non-ASCII character '\xcb' in file eng_to_ipa/stress.py on line 25, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

I can see that the symbols look like " , " and " ' ",
so I just replaced them with a comma and quotes.

Then I got this error message.

>>> import eng_to_ipa
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "eng_to_ipa/__init__.py", line 1, in <module>
from .transcribe import *
File "eng_to_ipa/transcribe.py", line 4, in <module>
import eng_to_ipa.stress as stress
File "eng_to_ipa/stress.py", line 4, in <module>
import eng_to_ipa.syllables as syllables
File "eng_to_ipa/syllables.py", line 4, in <module>
from eng_to_ipa import transcribe
ImportError: cannot import name transcribe
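
The first traceback above is raised because f-strings only exist in Python 3.6 and later, so Python 2.7 cannot parse that line. The following is a minimal sketch of building the same kind of `IN (...)` query without an f-string. It assumes a table named `dictionary` with `word` and `phonemes` columns, as the traceback suggests; the `lookup` helper and the in-memory sample rows are hypothetical and are not the library's own code.

```python
import sqlite3


def lookup(conn, words):
    """Fetch (word, phonemes) rows for the given words without using f-strings."""
    if not words:
        return []
    # One "?" placeholder per word, e.g. "?, ?, ?"
    placeholders = ", ".join("?" for _ in words)
    query = "SELECT word, phonemes FROM dictionary WHERE word IN ({})".format(placeholders)
    return conn.execute(query, list(words)).fetchall()


# Hypothetical in-memory data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dictionary (word TEXT, phonemes TEXT)")
conn.executemany(
    "INSERT INTO dictionary VALUES (?, ?)",
    [("love", "L AH1 V"), ("to", "T UW1")],
)
rows = lookup(conn, ["love", "to"])
```

Replacing the call with `c.execute(words_in)` cannot work, because `execute` expects the SQL text as its first argument and the parameters as its second.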

"love" and "to" are both weird. Is the bug from CMU dict or from this module?

Please check and fix it if it comes from English-to-IPA.
If it comes from CMU dict, can someone email the CMU dict author?

US phoneset to IPA

I've seen you used a strategy to convert ARPABET to IPA symbols. Is there such a thing for US PHONESET as described here for Festival?

I created a project called Transcriber Wrapper and implemented a way to convert what Festival returns to IPA symbols, but I'm sure it is incorrect! I created a class called InternationalPhoneticAlphabet that holds all the logic. As my project is based on Phonemizer, I described it in an issue here.

@mphilli do you think there is a way for that? I'm asking you because I can use your project in order to assert many things, like the transcription! By the way, thank you for your hard work and dedication!

Nothing happens

Good day. I experienced no problems with the installation and saw no errors whatsoever. However, when I run the code, nothing happens: there is no output at all. What are the possible reasons for that?

Differences between here and ARPABET

According to http://www.speech.cs.cmu.edu/cgi-bin/cmudict#about, the CMU dictionary uses 2-letter ARPABET notation for representing sounds.
According to the Wikipedia page for ARPABET (pointed to as the reference for translating to IPA on the above CMU page), the IPA correspondences would be (lowercased):

CMU_TO_IPA = {
    'aa'       : 'ɑ',   # balm, bot
    'ae'       : 'æ',   # bat
    'ah'       : 'ʌ',   # butt
    'ao'       : 'ɔ',   # story
    'aw'       : 'aʊ',  # bout
    'ax'       : 'ə',   # comma
    'axr'      : 'ɚ',   # letter
    'ay'       : 'aɪ',  # bite
    'eh'       : 'ɛ',   # bet
    'er'       : 'ɝ',   # bird
    'ey'       : 'eɪ',  # bait
    'ih'       : 'ɪ',   # bit
    'ix'       : 'ɨ',   # roses, rabbit
    'iy'       : 'i',   # beat
    'ow'       : 'oʊ',  # boat
    'oy'       : 'ɔɪ',  # boy
    'uh'       : 'ʊ',   # book
    'uw'       : 'u',   # boot
    'ux'       : 'ʉ',   # dude

    'b'        : 'b',   # buy
    'ch'       : 'tʃ',  # China
    'd'        : 'd',   # die
    'dh'       : 'ð',   # thy
    'dx'       : 'ɾ',   # butter
    'el'       : 'l̩',   # bottle
    'em'       : 'm̩',   # rhythm
    'en'       : 'n̩',   # button
    'f'        : 'f',   # fight
    'g'        : 'ɡ',   # guy
    'hh'       : 'h',   # high
    'h'        : 'h',   # high
    'jh'       : 'dʒ',  # jive
    'k'        : 'k',   # kite
    'l'        : 'l',   # lie
    'm'        : 'm',   # my
    'n'        : 'n',   # nigh
    'ng'       : 'ŋ',   # sing
    'nx'       : 'ɾ̃',   # winner
    'p'        : 'p',   # pie
    'q'        : 'ʔ',   # uh-oh
    'r'        : 'ɹ',   # rye
    's'        : 's',   # sigh
    'sh'       : 'ʃ',   # shy
    't'        : 't',   # tie
    'th'       : 'θ',   # thigh
    'v'        : 'v',   # vie
    'w'        : 'w',   # wise
    'wh'       : 'ʍ',   # why
    'y'        : 'j',   # yacht
    'z'        : 'z',   # zoo
    'zh'       : 'ʒ',   # pleasure
}

Whereas (slightly reorganised) I see on https://github.com/mphilli/English-to-IPA/blob/master/eng_to_ipa/transcribe.py#L98:

{
    "a": "ə",  #
    "ey": "e", #
    "aa": "ɑ",
    "ae": "æ",
    "ah": "ə", #
    "ao": "ɔ",
    "aw": "aʊ",
    "ay": "aɪ",
    "eh": "ɛ",
    "er": "ər", #
    "ih": "ɪ",
    "iy": "i",
    "ow": "oʊ",
    "oy": "ɔɪ",
    "uh": "ʊ",
    "uw": "u",
    "ch": "ʧ",
    "dh": "ð",
    "hh": "h",
    "jh": "ʤ",
    "ng": "ŋ",
    "sh": "ʃ",
    "th": "θ",
    "y": "j",
    "zh": "ʒ",
}

I have put a # next to the differences (the initial list also includes identical graphs). Is there a reason for the differences? I can understand 'er' but most of the others strike me as curious choices.
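
A comparison like this can also be done mechanically. The sketch below uses small subsets of the two tables quoted above; the `diff_mappings` helper is hypothetical. Note that a plain string comparison also flags the ligature spellings (`ʧ` versus `tʃ`), which the list above treats as identical graphs.

```python
# Subsets of the two tables quoted above, for illustration only.
wikipedia = {"aa": "ɑ", "ah": "ʌ", "er": "ɝ", "ey": "eɪ", "ch": "tʃ", "jh": "dʒ"}
library = {"a": "ə", "aa": "ɑ", "ah": "ə", "er": "ər", "ey": "e", "ch": "ʧ", "jh": "ʤ"}


def diff_mappings(reference, other):
    """Return (keys whose values differ, keys present only in `other`)."""
    shared = reference.keys() & other.keys()
    differing = {
        key: (reference[key], other[key])
        for key in shared
        if reference[key] != other[key]
    }
    only_other = sorted(other.keys() - reference.keys())
    return differing, only_other


differing, only_in_library = diff_mappings(wikipedia, library)
```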

"and" is translated incorrectly

The call
print(ipa.convert("and"))
returns
ənd
which is incorrect. It should return
ænd

syllable mistakes

There are some mistakes in the syllables.
For example, for "amusement" the result is "əmˈjuzmənt", but the dictionary gives "əˈmjuːzmənt".

Pronunciation and Accents

I was wondering if there is a way to change the "accent" of the IPA produced. I would like my output to render a British accent, but also to render the different pronunciations of words in different accents within the same country. A lot of accent differences are due to vowel shifts, so I am looking for a way to change all instances of one vowel sound to another.

If you point me in the right direction, I can try to figure it out. I'm really only a beginner, but I'm getting there with Python.

Thanks for all this code. It's just what I was looking for :)

Edit: What I mean by British accent is RP or NRP. I'm referring to TRAP - BATH differences etc.. A summary can be found here: https://notendur.hi.is/peturk/KENNSLA/02/TOP/AmvowelsSum.html
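
One possible starting point is to post-process the IPA string with a substitution table. The sketch below is only an illustration: `shift_vowels` and `EXAMPLE_MAP` are hypothetical names, and the two-entry map is far from a full description of RP. A blanket replacement also cannot express lexically conditioned differences such as TRAP - BATH, which would need a per-word list.

```python
import re


def shift_vowels(ipa_text, vowel_map):
    """Replace each key of vowel_map with its value in a single pass."""
    if not vowel_map:
        return ipa_text
    # Try longer symbols first so that diphthongs are matched as a unit.
    symbols = sorted(vowel_map, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(symbol) for symbol in symbols))
    # A single regex pass avoids one replacement feeding into another.
    return pattern.sub(lambda match: vowel_map[match.group(0)], ipa_text)


# Hypothetical, deliberately tiny map; not a complete accent description.
EXAMPLE_MAP = {"oʊ": "əʊ", "ɑ": "ɒ"}
shifted = shift_vowels("boʊt hɑt", EXAMPLE_MAP)
```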

Outdated version on PyPI

There is an outdated version of this package on PyPI, which is the version that pip downloads by default.
https://pypi.org/project/eng-to-ipa/0.0.2/
It is maintained by someone named archiba.
mphilli: can you claim and update it?

Can I use British English?

I prefer the British pronunciation and symbols.

Misconversion of words like C.O.D.

Hi!

Thanks for your work. I am using the convert function in transcribe.py.

It seems that the word C.O.D. exists in CMUdict (around line 16930: C.O.D. S IY1 OW1 D IY1), but the convert function cannot convert C.O.D. because the preserve_punc function treats the word C.O.D. as ["", "C.O.D", "."].

Obviously, C.O.D is not in CMUdict, so the result is c.o.d*.

We could add some code to the preserve_punc function to deal with this special case.
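
A minimal sketch of such a special case, assuming the dictionary lookup can be expressed as membership in a set of lowercase words. `split_punct` and `known_words` are hypothetical names and this is not the library's `preserve_punc` implementation; the idea is simply to try the whole token before stripping punctuation.

```python
import string


def split_punct(token, known_words):
    """Return (leading, core, trailing), preferring a dictionary hit on the whole token."""
    # If the token including its punctuation is a dictionary entry, keep it intact.
    if token.lower() in known_words:
        return "", token, ""
    core = token.strip(string.punctuation)
    if not core:
        # The token is punctuation only; nothing to split.
        return "", token, ""
    start = token.index(core)
    return token[:start], core, token[start + len(core):]


with_entry = split_punct("C.O.D.", {"c.o.d."})
without_entry = split_punct("C.O.D.", set())
```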

Syllables

Great library.

Is it also possible to use the library for dividing words into syllables?
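
Counting syllables, at least, can be done directly from a CMUdict entry, because every vowel phoneme there carries a stress digit (0, 1 or 2) and each syllable has one vowel. The sketch below only counts; it does not decide where the syllable boundaries fall. `count_syllables` is a hypothetical helper, not a function of this library.

```python
def count_syllables(arpabet):
    """Count vowel phonemes (those ending in a stress digit) in an ARPABET string."""
    return sum(1 for phone in arpabet.split() if phone[-1].isdigit())


# The C.O.D. entry quoted elsewhere on this page: three vowels, three syllables.
cod_syllables = count_syllables("S IY1 OW1 D IY1")
```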

ipa.convert("upset") causes an error

Running this code

 import eng_to_ipa as ipa
 print(ipa.convert("upset"))

returns
KeyError: 'ə'

cmudict_preparer.py Error

Hello!
Thank you for your work!
I am trying to run the code from this repo and have just hit a problem with the dictionary preparer script.
I downloaded a dictionary from http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b and when I start the script I get a KeyError. Do you know how to solve this problem?

scripts username$ python cmudict_preparer.py

INFO:root:running cmudict_preparer...
INFO:root:reading source file...
Traceback (most recent call last):
File "cmudict_preparer.py", line 31, in <module>
unique_dict[re.findall(pattern, word)[0]] += "%" + ' '.join(line.replace("\n", "").split(" ")[1:])
KeyError: 'SEMI-COLON'
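
The traceback shows `+=` being applied to a key that has not been stored yet. A sketch of one way to group alternative pronunciations without that assumption is below. The regular expression, the `group_pronunciations` helper and the sample lines are hypothetical, and the `%` separator is taken from the line in the traceback; this is not the script's actual code.

```python
import re
from collections import defaultdict

# WORD or WORD(1), then whitespace, then the phonemes.
ENTRY = re.compile(r"^(\S+?)(?:\(\d+\))?\s+(.+)$")


def group_pronunciations(lines):
    """Map each headword to its pronunciations joined by '%'."""
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comment lines
        match = ENTRY.match(line)
        if match:
            word, phonemes = match.groups()
            # defaultdict creates the list on first use, so a variant such as
            # WORD(1) appearing before WORD does not raise KeyError.
            grouped[word].append(phonemes)
    return {word: "%".join(prons) for word, prons in grouped.items()}


# Made-up sample lines in CMUdict style, for illustration only.
sample = [";;; a comment", "AND(1)  AE1 N D", "AND  AH0 N D", "LOVE  L AH1 V"]
entries = group_pronunciations(sample)
```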

Support for non-sqlite dbs

I'm very interested in using this but my app is multi-threaded so can't really use sqlite. I am going to add support for Postgres for my own purposes but if you would be interested in merging generic DB support then I can try and make it work for other flavours too. If you are interested I'll submit a PR.

how can i get all the IPA symbols?

Hello, thanks for your great work! I'm a little confused about the symbols of the DJ, KK and IPA phonetic alphabets. I found that not all IPA symbols are used in English, so could you please list all the symbols used in this tool?

How do you install it?

There's no setup.py

porting this project to a browser-based JS version

I will port this project to develop a version that works in the browser.

AH should be ʌ not ə

The short u in cup, which ARPABET renders as AH, should be ʌ in IPA. You have it as ə

How to process out of bound words?

Thanks for the work.
I am not a native English speaker. I found that some out-of-bound words (words not in the dictionary) get no correct IPA output; instead, the word is returned with a star, e.g.:

e2ipa.convert("Strathclyde Police declined to comment") outputs:

Stranthclyde* pə¹lis dɪklaɪnd ...

I do not know how to handle this kind of exception.
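
Given the output quoted above, one way to handle these cases is to collect the starred words and deal with them separately, for example by looking them up by hand. The sketch assumes, based only on that output, that an untranscribed word is returned unchanged with a trailing `*`; `find_unknown_words` is a hypothetical helper.

```python
def find_unknown_words(transcription):
    """Return the words that came back marked with a trailing star."""
    unknown = []
    for token in transcription.split():
        core = token.rstrip(".,;:!?")  # ignore sentence punctuation after the star
        if core.endswith("*") and len(core) > 1:
            unknown.append(core[:-1])
    return unknown


# The beginning of the output quoted above.
missing = find_unknown_words("Stranthclyde* pə¹lis dɪklaɪnd")
```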

The code cannot be used!

Hi,

Your code cannot be installed; it fails with the following errors:

File "/usr/local/lib/python3.5/dist-packages/English_to_IPA-0.0.1-py3.5.egg/eng_to_ipa/transcribe.py", line 58
c.execute(f"SELECT word, phonemes FROM dictionary WHERE word IN ({quest[:-2]})", words_in)
^
SyntaxError: invalid syntax

File "/usr/local/lib/python3.5/dist-packages/English_to_IPA-0.0.1-py3.5.egg/eng_to_ipa/rhymes.py", line 17
c.execute(f"SELECT word, phonemes FROM dictionary WHERE phonemes "
^
SyntaxError: invalid syntax
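
Both errors point at f-strings, which were introduced in Python 3.6, while the paths in the traceback show Python 3.5. A small check like the following sketch can confirm whether an interpreter is new enough; `supports_fstrings` and `MIN_VERSION` are hypothetical names.

```python
import sys

MIN_VERSION = (3, 6)  # f-strings were introduced in Python 3.6


def supports_fstrings(version_info=sys.version_info):
    """Return True if the given interpreter version can parse f-strings."""
    return tuple(version_info[:2]) >= MIN_VERSION


# The interpreter from the traceback paths above (python3.5).
ok_on_35 = supports_fstrings((3, 5, 0))
```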

Conversion "ʌ"

The ARPABET "AH" (such as "love", "cut", "hut") should be converted to "ʌ".
But it seems to be mistakenly converted to "ə".
Could you check that?
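
A common convention for CMUdict data is to use the stress digit to choose between the two symbols: unstressed AH0 becomes ə, while stressed AH1 and AH2 become ʌ. The sketch below illustrates that convention only; `ah_to_ipa` is a hypothetical helper and does not describe how this library currently behaves.

```python
def ah_to_ipa(phone):
    """Map ARPABET AH (with optional stress digit) to an IPA symbol."""
    base, stress = phone[:2].upper(), phone[2:]
    if base != "AH":
        raise ValueError("expected an AH phone, got {!r}".format(phone))
    # Unstressed AH0 is schwa; stressed or unmarked AH is the vowel of "cut".
    return "ə" if stress == "0" else "ʌ"


love_vowel = ah_to_ipa("AH1")
```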

error

You did awesome work, but when I run the import, an error occurs. Do you know why? Thanks.

>>> import eng_to_ipa
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "eng_to_ipa\__init__.py", line 1, in <module>
from .transcribe import *
File "eng_to_ipa\transcribe.py", line 69
asset.execute(f"SELECT word, phonemes FROM dictionary WHERE word IN ({quest[:-2]})", words_in)
^
SyntaxError: invalid syntax

What's the license for this code?

Hey there. I'd like to use this code as a library, but am unsure of the license? MIT or Apache2 licenses would be great, if possible :)

Does it support Python 2.7?

TypeError: 'encoding' is an invalid keyword argument for this function
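
This TypeError is what Python 2's built-in `open` raises when given an `encoding` argument. `io.open` accepts `encoding` on Python 2.6+ and is the same function as `open` on Python 3, so a reader written as in the sketch below runs on both. `read_text` and the temporary file are hypothetical. Other lines quoted in tracebacks on this page use f-strings, which Python 2.7 cannot parse either, so this change alone would not be enough.

```python
import io
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a text file with an explicit encoding on Python 2.6+ and 3."""
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()


# Round-trip a non-ASCII string through a temporary file.
fd, tmp_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(tmp_path, "w", encoding="utf-8") as handle:
    handle.write(u"əˈmjuzmənt")
content = read_text(tmp_path)
os.remove(tmp_path)
```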

Thank you!

Hello!
Thanks for this library!

Get homophones?

Hi, is there a function similar to the rhyming function but for homophones?
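
Homophones can be found by grouping words whose phoneme strings are identical. The sketch below works on a plain word-to-phonemes dictionary; `find_homophones` and the sample entries are hypothetical and not part of the library's API.

```python
from collections import defaultdict


def find_homophones(pronunciations):
    """Map each word to the other words with exactly the same phonemes."""
    by_sound = defaultdict(list)
    for word, phonemes in pronunciations.items():
        by_sound[phonemes].append(word)
    return {
        word: sorted(other for other in by_sound[phonemes] if other != word)
        for word, phonemes in pronunciations.items()
    }


# Made-up sample entries in CMUdict style, for illustration only.
SAMPLE = {"to": "T UW1", "two": "T UW1", "too": "T UW1", "love": "L AH1 V"}
homophones = find_homophones(SAMPLE)
```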

stress_marks="none" not working

That's it. It just doesn't work.

new function (ruby-rt-rp-html) request

Could we add a new function that uses ruby-rt-rp tags to produce HTML code? Then we could insert the output into HTML directly.
The ruby-rt-rp tags align the phonetic transcription with its word (https://github.com/dohliam/rubify).
For example:

Let's listen to  Fox  tell the story. I love you. I   am eating a peach. Sheep is here. Ship is here. Once   upon   a time. 
 lɛts ˈlɪsən  tu  fɑks tɛl   ðə ˈstɔri. aɪ lʌv   ju.  aɪ æm  ˈitɪŋ   ə   piʧ.        ʃip   ɪz  hir.    ʃɪp  ɪz   hir.   wʌns  əˈpɑn  ə taɪm.
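
A sketch of what such a function might produce, assuming the words and their transcriptions are already available as pairs. `to_ruby` is a hypothetical helper, not an existing function of this library; the `<rp>` elements supply fallback parentheses for browsers that do not render ruby annotations.

```python
from html import escape


def to_ruby(pairs):
    """Build HTML ruby markup from (word, ipa) pairs."""
    template = "<ruby>{}<rp>(</rp><rt>{}</rt><rp>)</rp></ruby>"
    # Escape both parts so that characters like < or & cannot break the markup.
    return " ".join(template.format(escape(word), escape(ipa)) for word, ipa in pairs)


html_line = to_ruby([("I", "aɪ"), ("love", "lʌv"), ("you", "ju")])
```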

Should the English stress character be put before or after the IPA symbol?

Thanks for the good work. I am not a native English speaker and know little about IPA. But I saw in some TTS front-ends that the stress character ˈ is put just after the IPA syllable, e.g. "ðə kwɪk braʊn fɑks ʤəmpt oˈʊvər ðə lˈeɪzi dɔg.", while this project puts it before the IPA syllable. Which is better? Or which is the IPA standard?
