mphilli / english-to-ipa
Converts English text to IPA notation
License: MIT License
Hello,
I am trying to set up English-to-IPA, but I get many error messages.
I would very much appreciate your help with the installation.
Here is what I tried
(in the English-to-IPA folder, I start python and run this):
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import eng_to_ipa
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "eng_to_ipa/__init__.py", line 1, in <module>
    from .transcribe import *
  File "eng_to_ipa/transcribe.py", line 58
    c.execute(f"SELECT word, phonemes FROM dictionary WHERE word IN ({quest[:-2]})", words_in)
    ^
SyntaxError: invalid syntax
I'm very new to Python, so I just commented that line out and put
c.execute(words_in)
in its place, then tried the same process again.
>>> import eng_to_ipa as es
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "eng_to_ipa/__init__.py", line 1, in <module>
    from .transcribe import *
  File "eng_to_ipa/transcribe.py", line 4, in <module>
    import eng_to_ipa.stress as stress
  File "eng_to_ipa/stress.py", line 25
SyntaxError: Non-ASCII character '\xcb' in file eng_to_ipa/stress.py on line 25, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
The symbols look like " , " and " ' ", so I just replaced them with a comma and a quote.
>>> import eng_to_ipa
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "eng_to_ipa/__init__.py", line 1, in <module>
    from .transcribe import *
  File "eng_to_ipa/transcribe.py", line 4, in <module>
    import eng_to_ipa.stress as stress
  File "eng_to_ipa/stress.py", line 4, in <module>
    import eng_to_ipa.syllables as syllables
  File "eng_to_ipa/syllables.py", line 4, in <module>
    from eng_to_ipa import transcribe
ImportError: cannot import name transcribe
Please check and fix it if the problem is in English-to-IPA.
If it comes from the CMU dict, can someone email the CMU dict author?
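For what it's worth, the first traceback is what any interpreter older than Python 3.6 prints for an f-string, since f-strings were only added in 3.6. Running the package on Python 3.6 or later is the simpler route. If an older interpreter is unavoidable, the query can be built without an f-string. This is only a sketch: the table and column names come from the traceback above, while the in-memory database, the sample row, and the helper name fetch_phonemes are made up for illustration.

```python
import sqlite3
import sys


def fetch_phonemes(conn, words):
    """Look up several words with one parameterised IN (...) query."""
    if not words:
        return []
    # One "?" placeholder per word; str.format also works on Python 2.7.
    placeholders = ", ".join("?" for _ in words)
    query = "SELECT word, phonemes FROM dictionary WHERE word IN ({0})".format(placeholders)
    return conn.execute(query, list(words)).fetchall()


# Hypothetical stand-in for the package's database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dictionary (word TEXT, phonemes TEXT)")
conn.execute("INSERT INTO dictionary VALUES (?, ?)", ("hello", "HH AH0 L OW1"))

rows = fetch_phonemes(conn, ["hello", "missing"])
print(rows)
print("f-strings available:", sys.version_info >= (3, 6))
```

Replacing the line with c.execute(words_in) cannot work, because execute expects the SQL text as its first argument and the parameters as its second.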
I've seen you used a strategy to convert ARPABET to IPA symbols. Is there such a thing for US PHONESET as described here for Festival?
I created a project called Transcriber Wrapper and implemented a way to convert what Festival returns to IPA symbols, but I'm sure it is incorrect! I created a class called InternationalPhoneticAlphabet that has all the logic. As my project is based on Phonemizer, I described it in an issue here.
@mphilli do you think there is a way for that? I'm asking you because I can use your project in order to assert many things, like the transcription! By the way, thank you for your hard work and dedication!
Good day. I had no problems with the installation and got no errors. However, when I run the code, nothing happens: there is no output at all. What could be the reasons for that?
According to http://www.speech.cs.cmu.edu/cgi-bin/cmudict#about, the CMU dictionary uses 2-letter ARPABET notation for representing sounds.
According to the Wikipedia page for ARPABET (pointed to as the reference for translating to IPA on the above CMU page), the IPA correspondences would be (lowercased):
CMU_TO_IPA = {
'aa' : 'ɑ', # balm, bot
'ae' : 'æ', # bat
'ah' : 'ʌ', # butt
'ao' : 'ɔ', # story
'aw' : 'aʊ', # bout
'ax' : 'ə', # comma
'axr' : 'ɚ', # letter
'ay' : 'aɪ', # bite
'eh' : 'ɛ', # bet
'er' : 'ɝ', # bird
'ey' : 'eɪ', # bait
'ih' : 'ɪ', # bit
'ix' : 'ɨ', # roses, rabbit
'iy' : 'i', # beat
'ow' : 'oʊ', # boat
'oy' : 'ɔɪ', # boy
'uh' : 'ʊ', # book
'uw' : 'u', # boot
'ux' : 'ʉ', # dude
'b' : 'b', # buy
'ch' : 'tʃ', # China
'd' : 'd', # die
'dh' : 'ð', # thy
'dx' : 'ɾ', # butter
'el' : 'l̩', # bottle
'em' : 'm̩', # rhythm
'en' : 'n̩', # button
'f' : 'f', # fight
'g' : 'ɡ', # guy
'hh' : 'h', # high
'h' : 'h', # high
'jh' : 'dʒ', # jive
'k' : 'k', # kite
'l' : 'l', # lie
'm' : 'm', # my
'n' : 'n', # nigh
'ng' : 'ŋ', # sing
'nx' : 'ɾ̃', # winner
'p' : 'p', # pie
'q' : 'ʔ', # uh-oh
'r' : 'ɹ', # rye
's' : 's', # sigh
'sh' : 'ʃ', # shy
't' : 't', # tie
'th' : 'θ', # thigh
'v' : 'v', # vie
'w' : 'w', # wise
'wh' : 'ʍ', # why
'y' : 'j', # yacht
'z' : 'z', # zoo
'zh' : 'ʒ', # pleasure
}
Whereas (slightly reorganised) I see on https://github.com/mphilli/English-to-IPA/blob/master/eng_to_ipa/transcribe.py#L98:
{
"a": "ə", #
"ey": "e", #
"aa": "ɑ",
"ae": "æ",
"ah": "ə", #
"ao": "ɔ",
"aw": "aʊ",
"ay": "aɪ",
"eh": "ɛ",
"er": "ər", #
"ih": "ɪ",
"iy": "i",
"ow": "oʊ",
"oy": "ɔɪ",
"uh": "ʊ",
"uw": "u",
"ch": "ʧ",
"dh": "ð",
"hh": "h",
"jh": "ʤ",
"ng": "ŋ",
"sh": "ʃ",
"th": "θ",
"y": "j",
"zh": "ʒ",
}
I have put a # next to the differences (the initial list also includes identical graphs). Is there a reason for the differences? I can understand 'er' but most of the others strike me as curious choices.
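To make the comparison concrete, here is a minimal sketch of a converter built on a subset of the first table. One assumption is labeled in the code: CMUdict has no separate AX symbol, so an unstressed AH (written AH0) is read here as schwa and a stressed AH as ʌ, and similarly for ER. The function name arpabet_to_ipa is hypothetical and is not part of the library.

```python
import re

# Subset of the ARPABET-to-IPA table quoted above (lowercased keys).
ARPABET_TO_IPA = {
    "aa": "ɑ", "ae": "æ", "ah": "ʌ", "aw": "aʊ", "er": "ɝ", "ey": "eɪ",
    "iy": "i", "ow": "oʊ", "b": "b", "d": "d", "k": "k", "n": "n",
    "p": "p", "s": "s", "t": "t",
}

# Assumption: a vowel marked with stress 0 takes its reduced form.
UNSTRESSED_OVERRIDES = {"ah": "ə", "er": "ɚ"}


def arpabet_to_ipa(pronunciation):
    """Convert a CMUdict-style string such as 'K AH1 P' to IPA (no stress marks)."""
    out = []
    for token in pronunciation.lower().split():
        match = re.match(r"([a-z]+)([0-2]?)$", token)
        if match is None:
            raise ValueError("not an ARPABET token: {0!r}".format(token))
        symbol, stress = match.groups()
        if stress == "0" and symbol in UNSTRESSED_OVERRIDES:
            out.append(UNSTRESSED_OVERRIDES[symbol])
        else:
            out.append(ARPABET_TO_IPA[symbol])
    return "".join(out)


print(arpabet_to_ipa("K AH1 P"))
print(arpabet_to_ipa("AH0 B AW1 T"))
```

Keeping the stress digit until the lookup is what lets the same AH symbol come out as either ʌ or ə.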
The call
print(ipa.convert("and"))
returns
ənd
which is incorrect. It should return
ænd
There are some mistakes in the syllabification.
For example, for "amusement" the result is "əmˈjuzmənt", but the dictionary gives "əˈmjuːzmənt".
I was wondering if there is a way to change the "accent" of the IPA produced. I would like my output to render a British accent, but also to render different pronunciations of words from different accents in the same country. A lot of accent differences are due to vowel shifts, so what I need is a way to change every occurrence of one vowel sound to another.
If you point me in the right direction, I can try to figure it out. I'm really only a beginner, but I'm getting there with Python.
Thanks for all this code. It's just what I was looking for :)
Edit: What I mean by British accent is RP or NRP. I'm referring to TRAP-BATH differences etc. A summary can be found here: https://notendur.hi.is/peturk/KENNSLA/02/TOP/AmvowelsSum.html
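One possible direction, sketched under assumptions: post-process the IPA string with a substitution table. The function shift_vowels and the one-entry table are hypothetical and are not part of the library.

```python
import re


def shift_vowels(ipa_text, table):
    """Replace every occurrence of each key in `table` with its value."""
    if not table:
        return ipa_text
    # Try longer keys first so that "oʊ" wins over a shorter key such as "o".
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: table[match.group(0)], ipa_text)


# Hypothetical, deliberately tiny table: the GOAT vowel only.
SKETCH_TABLE = {"oʊ": "əʊ"}

print(shift_vowels("boʊt", SKETCH_TABLE))
```

A blanket substitution like this only covers shifts that apply to every word containing the vowel. The TRAP-BATH split affects only some words with æ, so it would need a per-word list rather than a symbol table.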
There is an outdated version of this package on PyPI, which is the version that pip downloads by default.
https://pypi.org/project/eng-to-ipa/0.0.2/
It is maintained by someone named archiba.
mphilli: can you claim and update it?
I prefer the British pronunciation and symbols.
Hi!
Thanks for your work. I am using the convert function in transcribe.py.
The word C.O.D. exists in CMUdict around line 16930 (C.O.D. S IY1 OW1 D IY1), but the convert function cannot convert C.O.D. because the preserve_punc function splits the word C.O.D. into ["", "C.O.D", "."].
Obviously, C.O.D is not in CMUdict, so the result is c.o.d*.
We could add some code to the preserve_punc function to handle this special case.
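A minimal sketch of the special case, assuming the splitter can see the set of dictionary headwords. The name split_punct and the known_words argument are hypothetical; this is not the library's preserve_punc.

```python
import string


def split_punct(token, known_words):
    """Split a token into [leading punctuation, word, trailing punctuation]."""
    # Keep tokens such as "C.O.D." whole when the dictionary lists them as-is.
    if token.lower() in known_words:
        return ["", token, ""]
    core = token.strip(string.punctuation)
    if not core:
        # The token is punctuation only.
        return [token, "", ""]
    start = token.index(core)
    return [token[:start], core, token[start + len(core):]]


known = {"c.o.d."}
print(split_punct("C.O.D.", known))
print(split_punct("hello,", known))
```

Checking the dictionary before stripping means ordinary words followed by a period are still split as before.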
Great library.
Is it also possible to use the library for dividing words into syllables?
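Counting syllables is the easy half of this, since in CMUdict every vowel phoneme carries a stress digit (0, 1 or 2) and consonants do not. Below is a sketch that counts them; the function name is hypothetical, and the second pronunciation string is given for illustration only. Placing the actual syllable boundaries needs extra rules about which consonant clusters can start a syllable, which this does not attempt.

```python
def count_syllables(pronunciation):
    """Count the vowel phonemes (tokens ending in a stress digit)."""
    return sum(1 for phoneme in pronunciation.split() if phoneme[-1].isdigit())


print(count_syllables("S IY1 OW1 D IY1"))
print(count_syllables("AH0 M Y UW1 Z M AH0 N T"))
```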
Running this code
import eng_to_ipa as ipa
print(ipa.convert("upset"))
returns
KeyError: 'ə'
Hello!
Thank you for your work!
I am trying to run code from this repo and I've just run into a problem with the dictionary preparer script.
I downloaded a dictionary from http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b and when I start the script I get a KeyError. Do you know how to solve this problem?
scripts username$ python cmudict_preparer.py
INFO:root:running cmudict_preparer...
INFO:root:reading source file...
Traceback (most recent call last):
  File "cmudict_preparer.py", line 31, in <module>
    unique_dict[re.findall(pattern, word)[0]] += "%" + ' '.join(line.replace("\n", "").split(" ")[1:])
KeyError: 'SEMI-COLON'
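The += on line 31 can only succeed if the key is already in unique_dict, so the error means an alternate entry was reached before its base entry was stored. Why that happens for this particular headword is not visible from the traceback. A sketch of a grouping step that does not depend on entry order follows; the function name and the sample lines are illustrative, and it assumes the usual cmudict-0.7b layout (headword, two spaces, phonemes, with ";;;" comment lines and "(1)"-style variant suffixes).

```python
import re
from collections import defaultdict


def group_pronunciations(lines):
    """Collect every pronunciation of a headword, whatever order the lines come in."""
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        head, _, phonemes = line.partition("  ")
        # Drop a variant suffix such as "(1)" so variants share one key.
        headword = re.sub(r"\(\d+\)$", "", head)
        grouped[headword].append(phonemes.strip())
    return dict(grouped)


sample = [
    ";;; illustrative lines only",
    "AND(1)  AE1 N D",
    "AND  AH0 N D",
]
grouped = group_pronunciations(sample)
print(grouped)
```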
I'm very interested in using this but my app is multi-threaded so can't really use sqlite. I am going to add support for Postgres for my own purposes but if you would be interested in merging generic DB support then I can try and make it work for other flavours too. If you are interested I'll submit a PR.
Hello, thanks for your great work! I'm a little confused about the DJ, KK and IPA phonetic alphabets. I also found that not all IPA symbols are used in English, so could you please list all the symbols used by this tool?
There's no setup.py
I will port this project to develop a version that works in the browser.
The short u in cup, which ARPABET renders as AH, should be ʌ in IPA. You have it as ə
Thanks for the work.
I am not a native English speaker. I found that some out-of-vocabulary words get no IPA output; instead the word is returned with a star appended. For example:
e2ipa.convert("Strathclyde Police declined to comment") outputs:
Strathclyde* pə¹lis dɪklaɪnd ...
I do not know how to handle this kind of case.
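Since the unknown words come back marked with a star, one way to handle them is to collect the starred tokens and treat them separately, for example by looking them up elsewhere. A sketch with a hypothetical helper name:

```python
import re


def unknown_words(transcription):
    """Return the words that came back marked with a trailing star."""
    return re.findall(r"([^\s*]+)\*", transcription)


print(unknown_words("Strathclyde* declined to comment"))
```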
Hi,
Your code cannot be installed; it fails with the following error:
File "/usr/local/lib/python3.5/dist-packages/English_to_IPA-0.0.1-py3.5.egg/eng_to_ipa/transcribe.py", line 58
c.execute(f"SELECT word, phonemes FROM dictionary WHERE word IN ({quest[:-2]})", words_in)
^
SyntaxError: invalid syntax
File "/usr/local/lib/python3.5/dist-packages/English_to_IPA-0.0.1-py3.5.egg/eng_to_ipa/rhymes.py", line 17
c.execute(f"SELECT word, phonemes FROM dictionary WHERE phonemes "
^
SyntaxError: invalid syntax
The ARPABET "AH" (as in "love", "cut", "hut") should be converted to "ʌ".
But it seems to be mistakenly converted to "ə".
Could you check that?
You did awesome work, but when I run the import, the following error occurs. Do you know why? Thanks.
>>> import eng_to_ipa
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "eng_to_ipa\__init__.py", line 1, in <module>
    from .transcribe import *
  File "eng_to_ipa\transcribe.py", line 69
    asset.execute(f"SELECT word, phonemes FROM dictionary WHERE word IN ({quest[:-2]})", words_in)
    ^
SyntaxError: invalid syntax
Hey there. I'd like to use this code as a library, but am unsure of the license? MIT or Apache2 licenses would be great, if possible :)
TypeError: 'encoding' is an invalid keyword argument for this function
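This message matches what Python 2's built-in open() raises when it is passed encoding=, which would suggest the code is being run on Python 2. If that is the case, io.open accepts the encoding argument on both Python 2 and Python 3. A small sketch with a made-up temporary file:

```python
import io
import os
import tempfile

# Hypothetical sample file, written and read back as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(u"ænd\n")

with io.open(path, "r", encoding="utf-8") as handle:
    text = handle.read()

print(repr(text))
```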
Hello!
Thanks for this library!
Hi, is there a function similar to the rhyming function but for homophones?
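One way this could work, sketched outside the library: group words by their pronunciation string and keep the groups with more than one member. The function name and the three-word sample dictionary are hypothetical.

```python
from collections import defaultdict


def find_homophones(pronunciations):
    """Map each pronunciation shared by several words to those words."""
    by_sound = defaultdict(set)
    for word, variants in pronunciations.items():
        for variant in variants:
            by_sound[variant].add(word)
    return {
        sound: sorted(words)
        for sound, words in by_sound.items()
        if len(words) > 1
    }


sample = {
    "there": ["DH EH1 R"],
    "their": ["DH EH1 R"],
    "then": ["DH EH1 N"],
}
homophones = find_homophones(sample)
print(homophones)
```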
That's it. It just doesn't work.
Could we add a new function that uses the ruby, rt and rp tags to produce HTML? Then we could insert the output into an HTML page directly.
The ruby, rt and rp tags align the phonetic transcription with its word (https://github.com/dohliam/rubify).
For example:
Let's listen to Fox tell the story. I love you. I am eating a peach. Sheep is here. Ship is here. Once upon a time.
lɛts ˈlɪsən tu fɑks tɛl ðə ˈstɔri. aɪ lʌv ju. aɪ æm ˈitɪŋ ə piʧ. ʃip ɪz hir. ʃɪp ɪz hir. wʌns əˈpɑn ə taɪm.
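A sketch of what such a function might look like, assuming the text and its transcription split into the same number of whitespace-separated tokens. The name to_ruby is hypothetical and is not part of the library.

```python
from html import escape


def to_ruby(words, ipa_tokens):
    """Pair each word with its transcription using ruby/rt/rp markup."""
    if len(words) != len(ipa_tokens):
        raise ValueError("words and transcriptions must line up one to one")
    parts = []
    for word, ipa in zip(words, ipa_tokens):
        parts.append(
            "<ruby>{0}<rp>(</rp><rt>{1}</rt><rp>)</rp></ruby>".format(
                escape(word), escape(ipa)
            )
        )
    return " ".join(parts)


markup = to_ruby("I love you.".split(), "aɪ lʌv ju.".split())
print(markup)
```

The rp elements only show up in browsers without ruby support, where the transcription falls back to being shown in parentheses after the word.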
Thanks for the good work. I am not a native English speaker and know little about IPA. But I saw in some TTS front ends that the stress character ˈ is placed inside the IPA syllable, e.g. "ðə kwɪk braʊn fɑks ʤəmpt oˈʊvər ðə lˈeɪzi dɔg.", while this project places it before the IPA syllable. Which is better? And which one is the IPA standard?