osori / korean-romanizer

93.0 3.0 7.0 175 KB

A Python library for Korean romanization

Home Page: https://korean-romanizer.ij.fyi

License: Other

Python 100.00%

korean transliteration romanization python

korean-romanizer's Issues

KeyError: 'ᆶ'

Hi, I'm processing some multilingual data, but I'm afraid I don't actually know Korean myself. I'm getting an error on this word. I don't know if the word is written incorrectly (if so, how should it be fixed?), or if it's a problem in the library. Any help would be appreciated.

korean_romanizer.Romanizer('뚫리고').romanize()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-40-769d5465ca1f> in <module>
----> 1 korean_romanizer.Romanizer('뚫리고').romanize()

c:\program files\python37\lib\site-packages\korean_romanizer\romanizer.py in romanize(self)
    121                 else:
    122                     # s is a full syllable
--> 123                     _romanized += onset[s.initial] + vowel[s.medial] + coda[s.final]
    124 
    125             else:

KeyError: 'ᆶ'
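
The key in the error is the conjoining final jamo U+11B6, which is the ㅀ cluster at the bottom of 뚫. That suggests the word itself is spelled correctly and the coda lookup table simply has no entry for that cluster. A minimal sketch of the standard Unicode decomposition arithmetic shows where the key comes from. The helper name `decompose` is made up for illustration and is not part of korean_romanizer.

```python
import unicodedata

# Constants from the Unicode Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def decompose(syllable: str):
    """Split one precomposed Hangul syllable into (initial, medial, final) jamo.

    Hypothetical helper: final is None when the syllable has no coda.
    """
    index = ord(syllable) - S_BASE
    if not 0 <= index < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError(f"{syllable!r} is not a precomposed Hangul syllable")
    initial = chr(L_BASE + index // (V_COUNT * T_COUNT))
    medial = chr(V_BASE + (index % (V_COUNT * T_COUNT)) // T_COUNT)
    t_index = index % T_COUNT
    final = chr(T_BASE + t_index) if t_index else None
    return initial, medial, final


initial, medial, final = decompose("뚫")
print(f"final jamo: U+{ord(final):04X}")

# Cross-check against the standard library's canonical decomposition.
assert initial + medial + final == unicodedata.normalize("NFD", "뚫")
```

Any coda table keyed on these conjoining jamo needs an entry for every one of the 27 possible finals, including clusters such as ㅀ.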

KeyError: 12640

Hi, apologies, I can't give much insight into why this throws an error, but:

from korean_romanizer.romanizer import Romanizer
Romanizer('경인로 34번길 79-2  ㅠ동 201호(숭의').romanize()

yields the error:

KeyError                                  Traceback (most recent call last)
<ipython-input-4-429dd49ff95b> in <module>
     12 
     13 # romanize_kr()
---> 14 Romanizer('경인로 34번길 79-2  ㅠ동 201호(숭의').romanize()

~/.python_virtualenvs/scripts-Ot3yg93O/lib/python3.7/site-packages/korean_romanizer/romanizer.py in romanize(self)
    110                 s = Syllable(char)
    111                 #try:
--> 112                 _romanized += onset[s.initial] + vowel[s.medial] + coda[s.final]
    113                 #except Exception as e:
    114                 #    _romanized += "[에러:" + str(e) + "]"

KeyError: 12640

Sorry I can't be of more help fixing this (I don't speak Korean). But good luck with the project, super useful.
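
For context, 12640 is the code point U+3160, the standalone compatibility jamo ㅠ in the input string. It is not a precomposed syllable, so decomposing it as one yields values the lookup tables do not contain. A minimal sketch of a guard, assuming a hypothetical helper `is_precomposed_syllable` (not part of korean_romanizer), could look like this:

```python
def is_precomposed_syllable(char: str) -> bool:
    """True only for characters in the Hangul Syllables block (U+AC00..U+D7A3)."""
    return 0xAC00 <= ord(char) <= 0xD7A3


text = "ㅠ동"
for char in text:
    # Only precomposed syllables can be split into initial/medial/final.
    # Standalone jamo, digits, and punctuation need a separate code path.
    kind = "syllable" if is_precomposed_syllable(char) else "not a syllable"
    print(char, ord(char), kind)
```

Whether standalone jamo should be passed through unchanged or romanized on their own is a design choice for the maintainer; the sketch only shows how to tell the two cases apart.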

Simulate phonological rules for pronouncing coda

According to the Standard Pronunciation rules from the National Institute of Korean Language (NIKL), several phonological rules should be applied before romanizing Korean text. For example, one rule states that only seven consonants, [ㄱ, ㄴ, ㄷ, ㄹ, ㅁ, ㅂ, ㅇ], can be pronounced in coda position. Any coda consonant outside that set must therefore be replaced by one of those seven.

The following is the full list of rules provided by NIKL, which should be implemented as soon as possible.

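1. The final consonants ‘ㄲ, ㅋ’, ‘ㅅ, ㅆ, ㅈ, ㅊ, ㅌ’, and ‘ㅍ’ are pronounced as their representative sounds [ㄱ, ㄷ, ㅂ], respectively, at the end of a word or before a consonant.
2. The final consonant clusters ‘ㄳ’, ‘ㄵ’, ‘ㄼ, ㄽ, ㄾ’, and ‘ㅄ’ are pronounced as [ㄱ, ㄴ, ㄹ, ㅂ], respectively, at the end of a word or before a consonant.
3. The final consonant clusters ‘ㄺ, ㄻ, ㄿ’ are pronounced as [ㄱ, ㅁ, ㅂ], respectively, at the end of a word or before a consonant.
4. The final consonant ‘ㅎ’ is pronounced as follows.

i. When ‘ㄱ, ㄷ, ㅈ’ follows ‘ㅎ(ㄶ, ㅀ)’, the ‘ㅎ’ merges with the first sound of the following syllable and is pronounced [ㅋ, ㅌ, ㅊ].
ii. When ‘ㅅ’ follows ‘ㅎ(ㄶ, ㅀ)’, the ‘ㅅ’ is pronounced [ㅆ].
iii. When ‘ㄴ’ follows ‘ㅎ’, it is pronounced [ㄴ].
iv. When an ending or suffix beginning with a vowel follows ‘ㅎ(ㄶ, ㅀ)’, the ‘ㅎ’ is not pronounced.

5. When a single or double final consonant is followed by a particle, ending, or suffix beginning with a vowel, it keeps its own sound value and is pronounced as the first sound of the following syllable.
6. When a final consonant cluster is followed by a particle, ending, or suffix beginning with a vowel, only the second consonant moves to the first sound of the following syllable. (In this case, ‘ㅅ’ is pronounced as a tense consonant.)
7. When a final consonant is followed by a lexical morpheme beginning with one of the vowels ‘ㅏ, ㅓ, ㅗ, ㅜ, ㅟ’, it is changed to its representative sound and then moved to the first sound of the following syllable.
8. The names of the Hangul letters link their final consonant to the following syllable, except that ‘ㄷ, ㅈ, ㅊ, ㅋ, ㅌ, ㅍ, ㅎ’ are specially pronounced as follows.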
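
Rules 1 to 3 above amount to a lookup from each written coda to its representative sound. The following is a minimal sketch of that lookup only, using compatibility jamo as keys. The names `REPRESENTATIVE_CODA` and `neutralize_coda` are hypothetical and not part of the library, and the sketch deliberately leaves out the ‘ㅎ’ rules and the liaison rules (4 to 8), which depend on the following syllable.

```python
# Hypothetical table built directly from rules 1-3 as listed above.
REPRESENTATIVE_CODA = {
    # Rule 1: single and double final consonants
    "ㄲ": "ㄱ", "ㅋ": "ㄱ",
    "ㅅ": "ㄷ", "ㅆ": "ㄷ", "ㅈ": "ㄷ", "ㅊ": "ㄷ", "ㅌ": "ㄷ",
    "ㅍ": "ㅂ",
    # Rule 2: clusters that keep their first consonant
    "ㄳ": "ㄱ", "ㄵ": "ㄴ", "ㄼ": "ㄹ", "ㄽ": "ㄹ", "ㄾ": "ㄹ", "ㅄ": "ㅂ",
    # Rule 3: clusters that keep their second consonant
    "ㄺ": "ㄱ", "ㄻ": "ㅁ", "ㄿ": "ㅂ",
}


def neutralize_coda(coda):
    """Return the representative sound of a coda at word end or before a consonant.

    Codas already in the seven-consonant set (and None) are returned unchanged.
    """
    return REPRESENTATIVE_CODA.get(coda, coda)


for written in ["ㅍ", "ㄺ", "ㄴ"]:
    print(written, "->", neutralize_coda(written))
```

Applying such a table before the romanization lookup means the romanization table itself only needs entries for the seven representative codas.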

A final consonant ㅇ followed by an initial consonant ㅇ deletes the final consonant

For example, '강약' produces 'gayak' (expected: 'gangyak').
'강원' produces 'gawon' (expected: 'gangwon').
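
The output looks as if the coda ㅇ were being carried over into the silent onset ㅇ of the next syllable and then lost. A minimal sketch of a liaison step that leaves a coda ㅇ in place is given below. The function `link_coda` is hypothetical, covers only single codas, and ignores the ‘ㅎ’ and cluster rules.

```python
def link_coda(coda, next_onset):
    """Return (coda, next_onset) after liaison between two adjacent syllables.

    A coda moves into the next syllable only when that syllable starts with
    the silent onset "ㅇ". A coda "ㅇ" (the 'ng' sound) never moves, because
    as an onset it would be silent and the sound would disappear.
    """
    if next_onset != "ㅇ" or coda is None or coda == "ㅇ":
        return coda, next_onset
    return None, coda


print(link_coda("ㅇ", "ㅇ"))  # coda of 강 before 약: stays in place
print(link_coda("ㄱ", "ㅇ"))  # an ordinary coda moves to the next onset
```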

Cannot romanize '앞만'

>>> from romanizer import Romanizer
>>> Romanizer('앞만').romanize()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "romanizer.py", line 292, in romanize
    _romanized += onset[s.initial] + vowel[s.medial] + coda[s.final]
KeyError: 'ᇁ'

Attempting to romanise '좋아하고 있어요' throws a ValueError

Specs:

OS: Windows 10 x86_64
Python Version: 3.8.2

Current output:

>>> from korean_romanizer.romanizer import Romanizer
>>> Romanizer('좋아하고 있어요').romanize()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Redacted\AppData\Local\Programs\Python\Python38\lib\site-packages\korean_romanizer\romanizer.py", line 105, in romanize
    pronounced = Pronouncer(self.text).pronounced
  File "C:\Users\Redacted\AppData\Local\Programs\Python\Python38\lib\site-packages\korean_romanizer\pronouncer.py", line 23, in __init__
    self.pronounced = ''.join([ str(c) for c in self.final_substitute()])
  File "C:\Users\Redacted\AppData\Local\Programs\Python\Python38\lib\site-packages\korean_romanizer\pronouncer.py", line 102, in final_substitute
    next_syllable.initial = next_syllable.final_to_initial(syllable.final)
  File "C:\Users\Redacted\AppData\Local\Programs\Python\Python38\lib\site-packages\korean_romanizer\syllable.py", line 55, in final_to_initial
    idx = unicode_compatible_finals.index(char)
ValueError: None is not in list
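
The last frame shows that `final_to_initial` received `None`, meaning a syllable without a final consonant reached the liaison step. A minimal sketch of a guard follows. `FINALS` is a shortened, hypothetical stand-in for the library's `unicode_compatible_finals` list, and `final_to_initial_index` is an illustrative name, so this is not the library's actual code.

```python
# Hypothetical, shortened stand-in for the library's list of finals.
FINALS = ["ㄱ", "ㄴ", "ㄹ"]


def final_to_initial_index(char):
    """Return the index of a final consonant, or None when there is no final."""
    if char is None:
        # No coda: nothing can be carried over to the next syllable.
        return None
    return FINALS.index(char)


# list.index(None) is what produces the message seen in the traceback.
try:
    FINALS.index(None)
except ValueError as exc:
    print("unguarded lookup fails:", exc)

print("guarded lookup returns:", final_to_initial_index(None))
```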

Cannot romanize string including uncombined hangul characters (e.g. ㅏㄹ)

KeyError                                  Traceback (most recent call last)
<ipython-input-9-a36b3cd68e86> in <module>
      1 r = Romanizer("ㅏㄹ")
----> 2 r.romanize()

<ipython-input-1-afb14c96da98> in romanize(self)
    290                 s = Syllable(char)
    291                 #try:
--> 292                 _romanized += onset[s.initial] + vowel[s.medial] + coda[s.final]
    293                 #except Exception as e:
    294                 #    _romanized += "[에러:" + str(e) + "]"

KeyError: 12623

Use of this repo

Hi Osori,
I am not sure if this is the correct way to use a GitHub issue, but I found this code really helpful. I am wondering if I could use your code directly in my project.
Thanks

Wrong romanization result: 설악: seolak (-> seorak)

[Note 2] ‘ㄹ’ is written as ‘r’ before a vowel and as ‘l’ before a consonant or at the end of a word. However, ‘ㄹㄹ’ is written as ‘ll’.
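
The note above can be sketched as two small helpers, one for ‘ㄹ’ as a coda and one for ‘ㄹ’ as an onset. Both function names are hypothetical, and the sketch covers only this note, not other sound changes that can affect ‘ㄹ’.

```python
def rieul_as_coda(next_onset):
    """Romanize a coda ㄹ given the onset of the next syllable.

    next_onset is None at the end of a word. The onset "ㅇ" is silent, so a
    coda ㄹ in front of it is directly followed by a vowel.
    """
    if next_onset == "ㅇ":
        return "r"
    return "l"


def rieul_as_onset(prev_coda):
    """Romanize an onset ㄹ; after a coda ㄹ the pair is written 'll'."""
    return "l" if prev_coda == "ㄹ" else "r"


# 설악: the coda ㄹ of 설 is followed by the vowel-initial syllable 악.
print("seo" + rieul_as_coda("ㅇ") + "ak")
```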
