Please note: the author of this repository has no knowledge of any languages that use the Cyrillic characters. Any issues related to linguistics should be reported directly to the Python repository, and this repository will follow up after it is updated.
A Dart package for bi-directional transliteration of Cyrillic script to Latin script and vice versa.
By default, transliterates for the Serbian language. A language flag can be set in order to transliterate to and from Bulgarian, Montenegrin, Macedonian, Mongolian, Russian, Serbian, Tajik, and Ukrainian.
Transliteration is the conversion of a text from one script to another. For instance, a Latin alphabet transliteration of the Serbian phrase "Мој ховеркрафт је пун јегуља" is "Moj hoverkraft je pun jegulja".
This package is based on the Python project cyrillic-transliteration which was originally authored by Open Data Kosovo.
A citation would be much appreciated if you use CyrTranslit in a research publication:
Georges Labrèche. (2023). CyrTranslit (v1.1.1). Zenodo. https://doi.org/10.5281/zenodo.7734906
BibTex entry:
@software{georges_labreche_2023_7734906,
author = {Georges Labrèche},
title = {CyrTranslit},
month = mar,
year = 2023,
note = {{A Python package for bi-directional
transliteration of Cyrillic script to Latin script
and vice versa. Supports transliteration for
Bulgarian, Montenegrin, Macedonian, Mongolian,
Russian, Serbian, Tajik, and Ukrainian.}},
publisher = {Zenodo},
version = {v1.1.1},
doi = {10.5281/zenodo.7734906},
url = {https://doi.org/10.5281/zenodo.7734906}
}
CyrTranslit is actively used as a reliable tool to advance research! Here's an incomplete list of publications for research projects that have relied on CyrTranslit:
- Ljajić, Adela & Prodanović, Nikola & Medvecki, Darija & Bašaragin, Bojana & Mitrović, Jelena. (2022). "Topic Modeling Technique on Covid19 Tweets in Serbian," in 12th International Conference on Information Society and Technology (ICIST), Kopaonik, Serbia.
- Mussylmanbay, Meiirgali. (2022). "Addresses Standardization and Geocoding using Natural Language Processing," Nazarbayev University, Kazakhstan.
- Jokic, Danka & Stanković, Ranka & Krstev, Cvetana & Šandrih Todorović, Branislava. (2021). "A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian," in 3rd Conference on Language, Data and Knowledge (LDK 2021). 10.4230/OASIcs.LDK.2021.13.
- Lakew, Surafel Melaku (2020). "Thesis Multilingual Neural Machine Translation for Low Resource Languages," University of Trento, Italy.
- Filo, Denis. (2020). "Neuronový strojový překlad pro jazykové páry s malým množstvím trénovacích dat: Low-Resource Neural Machine Translation," Brno University of Technology, Brno, Czechia.
- Batanović, Vuk & Nikolic, Bosko. (2019). "Using Language Technologies to Automate the UNDP Rapid Integrated Assessment Mechanism in Serbian," in International Conference on Language Technologies for All: Enabling Linguistic Diversity and Multilingualism Worldwide (LT4All), Paris, France.
- Brown, J. M. M. & Schmidt, Andreas & Wierzba, Marta (Eds.). (2019). "Of trees and birds: A Festschrift for Gisbert Fanselow," Universitätsverlag Potsdam, Potsdam.
- Lakew, Surafel Melaku & Erofeeva, Aliia & Federico, Marcello. (2018). "Neural Machine Translation into Language Varieties," in 3rd Conference on Machine Translation: Research Papers, Brussels, Belgium.
- Ljajić, Adela & Marovac, Ulfeta. (2018). "Improving sentiment analysis for twitter data by handling negation rules in the Serbian language," Computer Science and Information Systems. 16. 13-13. 10.2298/CSIS180122013L.
- Жабран, И., Кикоть, А., Гафияк, А., Бородина, Е., & Алёшин, С. (2017). "Developing Q-Orca site backend using various Python programming language libraries," Modern Engineering and Innovative Technologies, 3(07-03), 48–53.
CyrTranslit is Dart pub repository so it can be installed using pub add:
dart pub add cyrtranslit # latest version
dart pub add cyrtranslit: version # specific version
dart pub add cyrtranslit:'^version' # minimum version
or you can also add this package to your pubspec.yaml
file.
dependencies:
cyrtranslit: ^1.0.0
CyrTranslit currently supports bi-directional transliteration of Bulgarian, Montenegrin, Macedonian, Mongolian, Russian, Serbian, Tajik, and Ukrainian:
import 'package:cyrtranslit/cyrtranslit.dart' as cyrtranslit;
print(cyrtranslit.supported())
['bg', 'me', 'mk', 'mn', 'ru', 'sr', 'tj', 'ua']
import 'package:cyrtranslit/cyrtranslit.dart' as cyrtranslit;
print(cyrtranslit.cyr2Lat("Съединението прави силата!", langCode: "bg"))
"Săedinenieto pravi silata!"
print(cyrtranslit.lat2Cyr("Săedinenieto pravi silata!", langCode: "bg"))
"Съединението прави силата!"
import 'package:cyrtranslit/cyrtranslit.dart' as cyrtranslit;
print(cyrtranslit.cyr2Lat("Република", langCode: "me"))
"Republika"
print(cyrtranslit.lat2Cyr("Republika", langCode: "me"))
"Република"
import 'package:cyrtranslit/cyrtranslit.dart' as cyrtranslit;
print(cyrtranslit.cyr2Lat("Моето летачко возило е полно со јагули", langCode: "mk"))
"Moeto letačko vozilo e polno so jaguli"
print(cyrtranslit.lat2Cyr("Moeto letačko vozilo e polno so jaguli", langCode: "mk"))
"Моето летачко возило е полно со јагули"
import 'package:cyrtranslit/cyrtranslit.dart' as cyrtranslit;
print(cyrtranslit.cyr2Lat("Амрагаа Сүнжидмаагаа гэсээр ирлээ дээ хө-хө-хө", langCode: "mn"))
"Amragaa Sünjidmaagaa geseer irlee dee khö-khö-khö"
print(cyrtranslit.lat2Cyr("Amragaa Sünjidmaagaa geseer irlee dee khö-khö-khö", langCode: "mn"))
"Амрагаа Сүнжидмаагаа гэсээр ирлээ дээ хө-хө-хө"
import 'package:cyrtranslit/cyrtranslit.dart' as cyrtranslit;
print(cyrtranslit.cyr2Lat("Моё судно на воздушной подушке полно угрей", langCode: "ru"))
"Moyo sudno na vozdushnoj podushke polno ugrej"
print(cyrtranslit.lat2Cyr("Moyo sudno na vozdushnoj podushke polno ugrej", langCode: "ru"))
"Моё судно на воздушной подушке полно угрей"
import 'package:cyrtranslit/cyrtranslit.dart' as cyrtranslit;
print(cyrtranslit.cyr2Lat("Мој ховеркрафт је пун јегуља"))
"Moj hoverkraft je pun jegulja"
print(cyrtranslit.lat2Cyr("Moj hoverkraft je pun jegulja"))
"Мој ховеркрафт је пун јегуља"
import 'package:cyrtranslit/cyrtranslit.dart' as cyrtranslit;
print(cyrtranslit.cyr2Lat("Ман мактуб навишта истодам", langCode: "tj"))
"Man maktub navišta istodam"
print(cyrtranslit.lat2Cyr("Man maktub navišta istodam", langCode: "tj"))
"Ман мактуб навишта истодам"
import 'package:cyrtranslit/cyrtranslit.dart' as cyrtranslit;
print(cyrtranslit.cyr2Lat("Під лежачий камінь вода не тече", langCode: "ua"))
"Pid ležačyj kamin' voda ne teče"
print(cyrtranslit.lat2Cyr("Pid ležačyj kamin' voda ne teče", langCode: "ua"))
"Під лежачий камінь вода не тече"
You can include support for other Cyrillic script alphabets. Follow these steps in order to do so:
- Create a new transliteration dictionary in the mapping.dart_ dictionary.
- Watch out for cases where two consecutive Latin alphabet letters are meant to transliterate into a single Cyrillic script letter. These cases need to be explicitly checked for inside the lat2Cyr() function in transliterator.dart.
- Add test cases inside of cyrtranslit_test.dart.
- Update the documentation in the README.md.
- List yourself as one of the contributors.
Before tagging a release version and deploying to pub.dev:
- Update the
version
properties in pubspec.yaml.
A big thank you to everyone who contributed:
- Original python package: Members of @opendatakosovo.
- Bulgarian 🇧🇬: @Syndamia and @Sparkycz.
- Russian 🇷🇺: @ratijas and @rominf.
- Tajik 🇹🇯: @diejani.
- Ukrainian 🇺🇦: @AnonymousVoice1.
- Mongolian 🇲🇳: @Serbipunk.
- Command Line Interface (CLI): @ZJaume (Not implemented in dart package).