emres / turkish-deasciifier

Turkish deasciifier in Python based on Deniz Yüret's turkish-mode for Emacs

Home Page: https://ileriseviye.wordpress.com/tag/turkish-deasciifier/

Languages: Python 99.71%, Roff 0.29%
Topics: deasciifier, python, nlp, nlp-library, turkish, turkish-nlp, diacritics, diacritics-reconstruction, diacritics-restoration

turkish-deasciifier's People

Contributors

emres, faraday, roktas

turkish-deasciifier's Issues

encode-decode error

$ echo "Opusmegi cagristiran catirtilar." | turkish-deasciify
Traceback (most recent call last):
  File "/usr/bin/turkish-deasciify", line 26, in <module>
    d.deasciify()
  File "/usr/bin/turkish-deasciify", line 22, in deasciify
    sys.stdout.write(result.encode("utf-8"))
TypeError: write() argument must be str, not bytes

Deleting .decode("utf-8") and .encode("utf-8") in /usr/bin/turkish-deasciify solves the issue.
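
The traceback is consistent with a script written for Python 2 running under Python 3, where `sys.stdout` is a text stream that accepts `str` and does its own encoding. A minimal sketch of a writer that copes with both text and binary streams follows; `write_result` is a hypothetical helper for illustration, not part of the package.

```python
import io
import sys


def write_result(result, stream=None):
    """Write text to a stream, encoding to UTF-8 only for binary streams.

    `write_result` is a hypothetical helper, not part of turkish-deasciifier.
    """
    stream = sys.stdout if stream is None else stream
    if isinstance(stream, (io.RawIOBase, io.BufferedIOBase)):
        # Binary streams (for example sys.stdout.buffer) need bytes.
        stream.write(result.encode("utf-8"))
    else:
        # Text streams (for example sys.stdout on Python 3) need str.
        stream.write(result)


sample = "Öpüşmeği çağrıştıran çatırtılar."

text_buf = io.StringIO()
write_result(sample, text_buf)

byte_buf = io.BytesIO()
write_result(sample, byte_buf)
```

On Python 3 alone, simply passing the `str` to `sys.stdout.write` is enough, which matches the fix described above.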

Applying it to the rows of a CSV file

Hello Mr. Emre,

My data is in CSV format. I wrote the following code on Google Colab:

from turkish.deasciifier import Deasciifier

import csv 

duzelt = []

with open('/GDrive/My Drive/API-satir/merge1k.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        my_ascii_turkish_txt = (row)
        deasciifier = Deasciifier(my_ascii_turkish_txt)
        my_deasciified_turkish_txt = deasciifier.convert_to_turkish()
        duzelt.append(my_deasciified_turkish_txt)
        print(my_deasciified_turkish_txt) 

However, when I run it I get the following error.

def set_char_at(self, mystr, pos, c):
    return mystr[0:pos] + c + mystr[pos+1:]
def convert_to_turkish(self):

TypeError: can only concatenate list (not "str") to list

How can I get around this problem? I would be very grateful for your help.
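
The error message points at a likely cause: `csv.reader` yields each row as a list of strings, so the constructor receives a list, and `set_char_at` then tries to concatenate a list slice with a string. One way around this, sketched below, is to convert each cell separately. `deasciify_rows` is a hypothetical helper that takes the conversion function as a parameter so the sketch runs without the library; with the library installed, `convert` could be `lambda s: Deasciifier(s).convert_to_turkish()`.

```python
import csv
import io


def deasciify_rows(csv_file, convert):
    """Apply `convert` (str -> str) to every cell of every CSV row.

    `deasciify_rows` is a hypothetical helper for illustration.
    """
    reader = csv.reader(csv_file, delimiter=',')
    # Each `row` is a list of strings, so convert the cells one by one.
    return [[convert(cell) for cell in row] for row in reader]


# Stand-in data and converter so the example is self-contained;
# str.upper only marks which cells were visited.
sample_csv = io.StringIO("bir,iki\nuc,dort\n")
rows = deasciify_rows(sample_csv, str.upper)
```

If only one column holds Turkish text, converting just that cell (for example `row[0]`) avoids touching IDs or numbers in the other columns.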

Problematic words

I have compiled some of the problematic words. They could be fixed by defining them in the turkish_pattern_table variable; the possible usages need to be taught to it.
Problematic words

  • Acar - Açar
  • Asık - Aşık
  • Oldu - Öldü
  • Sık - Sik - Şık
  • Tas - Taş
  • Su - Şu
  • Surat - Sürat
  • Koy - Köy
  • Turunçgiller

Let's use them in sentences

| ASCII input | Incorrect deasciifier output |
| --- | --- |
| COK SIKSINIZ | ÇOK SIKSINIZ |
| ASIK VEYSEL ASIK SURATLI MIYDI? | AŞIK VEYSEL AŞIK SÜRATLİ MİYDİ? |
| AL KIRDIN SIKTIN BIRAKTIN! | AL KIRDİN SIKTIN BIRAKTIN! |
| YEMEGI TASA KOY GETIR | YEMEĞİ TAŞA KÖY GETİR |
| TURUNCGILLER | TURUNÇĞİLLER |
| COK ACAR BIRI | ÇOK AÇAR BİRİ |
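
Examples like these could be kept as a small regression list and rerun whenever the pattern table changes. The sketch below assumes the conversion function is passed in as `convert`; `find_failures`, `CASES` and `REPORTED` are hypothetical names, and the expected strings are one reading of the intended results, not something the report confirms.

```python
def find_failures(convert, cases):
    """Return (source, expected, actual) for every case that converts wrongly."""
    failures = []
    for source, expected in cases:
        actual = convert(source)
        if actual != expected:
            failures.append((source, expected, actual))
    return failures


# Expected values are assumptions about what the reporter intended.
CASES = [
    ("TURUNCGILLER", "TURUNÇGİLLER"),
    ("COK ACAR BIRI", "ÇOK ACAR BİRİ"),
]

# Stand-in converter that replays the outputs listed in the report,
# so the example runs without the library installed.
REPORTED = {
    "TURUNCGILLER": "TURUNÇĞİLLER",
    "COK ACAR BIRI": "ÇOK AÇAR BİRİ",
}

failures = find_failures(REPORTED.get, CASES)
for source, expected, actual in failures:
    print("%s: expected %s, got %s" % (source, expected, actual))
```

Words whose correct form depends on meaning (Acar or Açar, Su or Şu) cannot be settled by a fixed list alone, so a check like this only records the behaviour; it does not resolve the ambiguity.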

Slow performance

Hardware

MacBookPro13,3
Quad-Core Intel Core i7, 2.7 GHz
Memory: 16 GB

Benchmark results

| Word count | Character count | Result (seconds) |
| --- | --- | --- |
| 10000 | 82236 | 5.3 |
| 20000 | 176226 | 23.1 |
| 40000 | 376746 | 94.3 |
| 80000 | 804532 | 438.6 |
| 100000 | 1025479 | 819.4 |

Summary

Converting a 1000-page book would take about 3 hours.
Converting a large, old ASCII-only website SQL database would take weeks.

So a progress bar and optimization are needed. Fast text-processing libraries could be used.
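
The timings above grow much faster than the input: each doubling of the word count costs roughly four times as long. One low-effort mitigation, sketched below under the assumption that converting line by line is acceptable, is to process the text in small pieces and report progress between them. `convert_in_chunks` and `on_progress` are hypothetical names, and `convert` stands for whatever conversion function is in use. Results near line boundaries may differ from a whole-text run if the converter looks at context across lines.

```python
def convert_in_chunks(text, convert, on_progress=None):
    """Convert `text` one line at a time, reporting progress after each line.

    `convert` is any str -> str function; `on_progress(done, total)` is an
    optional callback. Both names are assumptions made for this sketch.
    """
    lines = text.splitlines(keepends=True)  # keep newlines so joining is lossless
    total = len(lines)
    converted = []
    for done, line in enumerate(lines, start=1):
        converted.append(convert(line))
        if on_progress is not None:
            on_progress(done, total)
    return "".join(converted)


# Stand-in converter (str.upper) so the example runs without the library.
progress = []
result = convert_in_chunks(
    "ilk satir\nikinci satir\n",
    str.upper,
    on_progress=lambda done, total: progress.append((done, total)),
)
```

Because each piece is short, the per-call cost stays small and the total should grow roughly in proportion to the number of lines, provided the slowdown really comes from handling one very long string at a time.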

A few additions

Hello,

I would like to add a few small cases:

daha atık davranmaya
alana sigacak şekilde
perçeption (an English word, but it looked odd in this form)

I assume you do not want to update the table by hand; I at least wanted to put these on record.
Anyone who wants to use it can make their own changes.

This is a well-made piece of work, thank you.

pip install

Hello.

When I run pip install git+https://github.com/emres/turkish-deasciifier.git I get the following error:

ERROR: Complete output from command python setup.py egg_info:
    ERROR: Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\user\AppData\Local\Temp\pip-req-build-nl97thly\setup.py", line 62
        except OSError, e:
                      ^
    SyntaxError: invalid syntax

I think there is a problem in the setup file.
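
The caret points at `except OSError, e:`, which is Python 2 syntax and a `SyntaxError` on Python 3; the Python 3 spelling is `except OSError as e:`. A minimal illustration follows. `read_or_default` is a hypothetical helper, since the rest of `setup.py` is not shown here.

```python
def read_or_default(path, default=""):
    """Return the contents of `path`, or `default` if it cannot be read.

    Hypothetical helper; it only illustrates the exception syntax.
    """
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError as e:  # Python 2 spelling was: except OSError, e:
        print("could not read %s: %s" % (path, e))
        return default


description = read_or_default("no-such-file-for-this-example.txt", default="n/a")
```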
