
korean-romanizer's People

Contributors

jrroelle, min-ho-lim, osori, quyminh, srevinsaju


korean-romanizer's Issues

KeyError: 'ᆶ'

Hi, I'm processing some multilingual data, but I'm afraid I don't actually know Korean myself. I'm getting an error on this word. I don't know if the word is written incorrectly (if so, how should it be fixed?), or if it's a problem in the library. Any help would be appreciated.

korean_romanizer.Romanizer('뚫리고').romanize()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-40-769d5465ca1f> in <module>
----> 1 korean_romanizer.Romanizer('뚫리고').romanize()

c:\program files\python37\lib\site-packages\korean_romanizer\romanizer.py in romanize(self)
    121                 else:
    122                     # s is a full syllable
--> 123                     _romanized += onset[s.initial] + vowel[s.medial] + coda[s.final]
    124 
    125             else:

KeyError: 'ᆶ'
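The key in this error is the conjoining final jamo U+11B6 (the ㄹ+ㅎ cluster), which suggests the word itself is well formed and the lookup table simply lacks an entry for that final. A minimal sketch for checking which jamo a syllable decomposes into, using only the standard library's `unicodedata`; the helper name `jamo_of` is made up for illustration and is not part of korean_romanizer:

```python
import unicodedata

def jamo_of(syllable):
    """Return (code point, Unicode name) pairs for the conjoining jamo
    that a precomposed Hangul syllable decomposes into under NFD."""
    return [(f"U+{ord(j):04X}", unicodedata.name(j))
            for j in unicodedata.normalize("NFD", syllable)]

# Inspect the first syllable of the failing word.
for code_point, name in jamo_of("뚫"):
    print(code_point, name)
```

The last pair printed is the final consonant, so comparing it against the keys of the coda table shows whether the table or the input is at fault.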

KeyError: 12640

Hi, apologies, I can't give much insight into why this throws an error, but:

from korean_romanizer.romanizer import Romanizer
Romanizer('경인로 34번길 79-2  ㅠ동 201호(숭의').romanize()

yields the error:

KeyError                                  Traceback (most recent call last)
<ipython-input-4-429dd49ff95b> in <module>
     12 
     13 # romanize_kr()
---> 14 Romanizer('경인로 34번길 79-2  ㅠ동 201호(숭의').romanize()

~/.python_virtualenvs/scripts-Ot3yg93O/lib/python3.7/site-packages/korean_romanizer/romanizer.py in romanize(self)
    110                 s = Syllable(char)
    111                 #try:
--> 112                 _romanized += onset[s.initial] + vowel[s.medial] + coda[s.final]
    113                 #except Exception as e:
    114                 #    _romanized += "[에러:" + str(e) + "]"

KeyError: 12640

Sorry I can't be of more help fixing this (I don't speak Korean). But good luck with the project, super useful.
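The integer in the error looks like a code point: 12640 is U+3160, the standalone compatibility jamo ㅠ in the input, which is not a precomposed syllable and so has no initial, medial, and final to look up. A minimal sketch of a guard that tells the two apart; `is_hangul_syllable` is a hypothetical helper, not a function of korean_romanizer:

```python
def is_hangul_syllable(ch):
    """True only for precomposed Hangul syllables (U+AC00..U+D7A3)."""
    return 0xAC00 <= ord(ch) <= 0xD7A3

# The standalone vowel is the character whose code point shows up in the KeyError.
for ch in "ㅠ동":
    print(repr(ch), ord(ch), is_hangul_syllable(ch))
```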

Simulate phonological rules for pronouncing coda

According to the Standard Pronunciation rules from the National Institute of Korean Language (NIKL), several phonological rules should be applied before romanizing Korean text. For example, one rule states that only seven consonants, [ㄱ, ㄴ, ㄷ, ㄹ, ㅁ, ㅂ, ㅇ], can be pronounced in coda position. Any other coda consonant must therefore be replaced with one of those seven.

The following is the full list of rules provided by NIKL, which should be implemented as soon as possible.

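  • 1. The final consonants ‘ㄲ, ㅋ’, ‘ㅅ, ㅆ, ㅈ, ㅊ, ㅌ’, and ‘ㅍ’ are pronounced as their representative sounds [ㄱ, ㄷ, ㅂ], respectively, at the end of a word or before a consonant.

  • 2. The consonant-cluster finals ‘ㄳ’, ‘ㄵ’, ‘ㄼ, ㄽ, ㄾ’, and ‘ㅄ’ are pronounced [ㄱ, ㄴ, ㄹ, ㅂ], respectively, at the end of a word or before a consonant.

  • 3. The consonant-cluster finals ‘ㄺ, ㄻ, ㄿ’ are pronounced [ㄱ, ㅁ, ㅂ], respectively, at the end of a word or before a consonant.

  • 4. The final consonant ‘ㅎ’ is pronounced as follows.

i. When ‘ㄱ, ㄷ, ㅈ’ follows ‘ㅎ(ㄶ, ㅀ)’, the two merge with the initial sound of the following syllable and are pronounced [ㅋ, ㅌ, ㅊ].
ii. When ‘ㅅ’ follows ‘ㅎ(ㄶ, ㅀ)’, the ‘ㅅ’ is pronounced [ㅆ].
iii. When ‘ㄴ’ follows ‘ㅎ’, it is pronounced [ㄴ].
iv. When an ending or suffix that begins with a vowel follows ‘ㅎ(ㄶ, ㅀ)’, the ‘ㅎ’ is not pronounced.

  • 5. When a single or double final consonant is combined with a particle, ending, or suffix that begins with a vowel, it keeps its own sound value and is carried over as the initial sound of the following syllable.

  • 6. When a consonant-cluster final is combined with a particle, ending, or suffix that begins with a vowel, only the second consonant is carried over as the initial sound of the following syllable. (In this case, ‘ㅅ’ is pronounced as a tense consonant.)

  • 7. When a final consonant is followed by a lexical morpheme that begins with one of the vowels ‘ㅏ, ㅓ, ㅗ, ㅜ, ㅟ’, it is changed to its representative sound and then carried over as the initial sound of the following syllable.

  • 8. In the names of the Hangul letters, the final consonant is carried over to the next syllable, except that ‘ㄷ, ㅈ, ㅊ, ㅋ, ㅌ, ㅍ, ㅎ’ are specially pronounced as follows.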
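Rules 1 to 3 above can be sketched as a table lookup on the final consonant. The following is only an illustration under stated assumptions, not the library's implementation: the names `FINALS`, `REPRESENTATIVE`, and `neutralize` are made up here, a vowel-initial next syllable is detected purely by its ㅇ initial (morphology is ignored), and rules 4 to 8 are not handled.

```python
S_BASE, S_LAST = 0xAC00, 0xD7A3   # range of precomposed Hangul syllables
IEUNG_INITIAL = 11                # index of the silent initial ㅇ

# Final consonants in Unicode order; index 0 means "no final".
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ",
          "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ",
          "ㅋ", "ㅌ", "ㅍ", "ㅎ"]

# Rules 1-3: final consonant -> representative sound.
REPRESENTATIVE = {
    "ㄲ": "ㄱ", "ㅋ": "ㄱ",
    "ㅅ": "ㄷ", "ㅆ": "ㄷ", "ㅈ": "ㄷ", "ㅊ": "ㄷ", "ㅌ": "ㄷ",
    "ㅍ": "ㅂ",
    "ㄳ": "ㄱ", "ㄵ": "ㄴ", "ㄼ": "ㄹ", "ㄽ": "ㄹ", "ㄾ": "ㄹ", "ㅄ": "ㅂ",
    "ㄺ": "ㄱ", "ㄻ": "ㅁ", "ㄿ": "ㅂ",
}

def is_syllable(ch):
    return S_BASE <= ord(ch) <= S_LAST

def neutralize(text):
    """Replace finals with their representative sound at the end of a word
    or before a consonant; leave them alone before a vowel-initial syllable."""
    out = []
    for i, ch in enumerate(text):
        if is_syllable(ch):
            final = FINALS[(ord(ch) - S_BASE) % 28]
            nxt = text[i + 1] if i + 1 < len(text) else ""
            before_vowel = (nxt != "" and is_syllable(nxt)
                            and (ord(nxt) - S_BASE) // 588 == IEUNG_INITIAL)
            if final in REPRESENTATIVE and not before_vowel:
                new_final = FINALS.index(REPRESENTATIVE[final])
                ch = chr(ord(ch) - FINALS.index(final) + new_final)
        out.append(ch)
    return "".join(out)

print(neutralize("앞만"))  # the ㅍ final is replaced by ㅂ
```

Running this pass before the table lookup would leave only the seven permitted codas, plus the ㅎ finals covered by rule 4, for the romanization table to handle.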

Cannot romanize '앞만'

>>> from romanizer import Romanizer
>>> Romanizer('앞만').romanize()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "romanizer.py", line 292, in romanize
    _romanized += onset[s.initial] + vowel[s.medial] + coda[s.final]
KeyError: 'ᇁ'

Attempting to romanise '좋아하고 있어요' throws a ValueError

Specs:

  • OS: Windows 10 x86_64
  • Python Version: 3.8.2

Current output:

>>> from korean_romanizer.romanizer import Romanizer
>>> Romanizer('좋아하고 있어요').romanize()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Redacted\AppData\Local\Programs\Python\Python38\lib\site-packages\korean_romanizer\romanizer.py", line 105, in romanize
    pronounced = Pronouncer(self.text).pronounced
  File "C:\Users\Redacted\AppData\Local\Programs\Python\Python38\lib\site-packages\korean_romanizer\pronouncer.py", line 23, in __init__
    self.pronounced = ''.join([ str(c) for c in self.final_substitute()])
  File "C:\Users\Redacted\AppData\Local\Programs\Python\Python38\lib\site-packages\korean_romanizer\pronouncer.py", line 102, in final_substitute
    next_syllable.initial = next_syllable.final_to_initial(syllable.final)
  File "C:\Users\Redacted\AppData\Local\Programs\Python\Python38\lib\site-packages\korean_romanizer\syllable.py", line 55, in final_to_initial
    idx = unicode_compatible_finals.index(char)
ValueError: None is not in list
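The traceback ends with `final_to_initial` receiving `None`. One possible trigger, not confirmed by the traceback, is the first syllable 좋, whose final ㅎ sits before a vowel-initial syllable, the case where the NIKL rules say ㅎ is silent. A minimal sketch of that single rule, written independently of the library's `Pronouncer`; the names `DROP_H` and `drop_silent_h` are hypothetical, and the sketch ignores morphology by treating any following syllable with an ㅇ initial as vowel-initial:

```python
S_BASE, S_LAST = 0xAC00, 0xD7A3   # range of precomposed Hangul syllables
IEUNG_INITIAL = 11                # index of the silent initial ㅇ

# Final index -> final index once ㅎ is dropped (0 means "no final").
DROP_H = {27: 0,   # ㅎ -> no final
          6: 4,    # ㄶ -> ㄴ
          15: 8}   # ㅀ -> ㄹ

def drop_silent_h(text):
    """Remove ㅎ from a final when the next syllable starts with a vowel."""
    chars = list(text)
    for i in range(len(chars) - 1):
        cur, nxt = ord(chars[i]), ord(chars[i + 1])
        if not (S_BASE <= cur <= S_LAST and S_BASE <= nxt <= S_LAST):
            continue  # only pairs of precomposed syllables are considered
        final = (cur - S_BASE) % 28
        next_initial = (nxt - S_BASE) // 588
        if final in DROP_H and next_initial == IEUNG_INITIAL:
            chars[i] = chr(cur - final + DROP_H[final])
    return "".join(chars)

print(drop_silent_h("좋아하고 있어요"))
```

After this step a syllable may legitimately have no final at all, so any later liaison step has to treat "no final" as a normal case rather than look it up in a list.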

Cannot romanize string including uncombined hangul characters (e.g. ㅏㄹ)

KeyError                                  Traceback (most recent call last)
<ipython-input-9-a36b3cd68e86> in <module>
      1 r = Romanizer("ㅏㄹ")
----> 2 r.romanize()

<ipython-input-1-afb14c96da98> in romanize(self)
    290                 s = Syllable(char)
    291                 #try:
--> 292                 _romanized += onset[s.initial] + vowel[s.medial] + coda[s.final]
    293                 #except Exception as e:
    294                 #    _romanized += "[에러:" + str(e) + "]"

KeyError: 12623
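Here 12623 is U+314F, the standalone compatibility vowel ㅏ. Until uncombined jamo are supported, one workaround is to romanize only runs of precomposed syllables and pass everything else through unchanged. A minimal sketch, assuming the romanizer is supplied as a callable; `romanize_safely` and its `romanize_syllables` parameter are hypothetical names, and splitting into runs discards any sound changes that would apply across the boundaries:

```python
import re

# One or more precomposed Hangul syllables (U+AC00..U+D7A3).
SYLLABLE_RUN = re.compile(r"[\uAC00-\uD7A3]+")

def romanize_safely(text, romanize_syllables):
    """Apply romanize_syllables to each run of precomposed syllables and
    leave standalone jamo, digits, and punctuation untouched."""
    return SYLLABLE_RUN.sub(lambda m: romanize_syllables(m.group(0)), text)

# A stand-in callable that only marks which runs would be romanized.
print(romanize_safely("ㅏㄹ 한글", lambda run: f"<{run}>"))
```

In practice the callable could be something like `lambda run: Romanizer(run).romanize()`, provided the installed version handles every final consonant in the run.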

Use of this repo

Hi Osori,
I am not sure if this is the correct way to use a GitHub issue, but I found this code really helpful. I am wondering if I could use your code directly in my project!
Thanks
