menny commented on June 10, 2024

@Sixthtry, you can use ask:shiftCodes to specify special uppercase codes in case the automatic guessing fails.
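As a rough illustration, a key can pair its lowercase code with an explicit uppercase code instead of relying on the automatic guess. The sketch below assumes a key element of the form `<Key android:codes="105" ask:shiftCodes="304"/>` with decimal code points (check an existing layout in the repo for the exact element and attributes); the helper `turkish_key` and the table `TURKISH_SHIFT_CODES` are hypothetical.

```python
# Hypothetical helper: emit layout <Key> entries whose shifted (uppercase)
# code is given explicitly via ask:shiftCodes. The element/attribute shape
# is an assumption based on this thread, not a verified schema.

# Decimal code points: lowercase letter -> the uppercase Turkish needs.
TURKISH_SHIFT_CODES = {
    ord("i"): ord("İ"),  # 105 -> 304 (dotted i -> dotted capital İ)
    ord("ı"): ord("I"),  # 305 -> 73  (dotless ı -> plain capital I)
}

def turkish_key(ch: str) -> str:
    code = ord(ch)
    shift = TURKISH_SHIFT_CODES.get(code)
    if shift is None:
        # Ordinary letters: let the keyboard guess the uppercase form.
        return f'<Key android:codes="{code}"/>'
    return f'<Key android:codes="{code}" ask:shiftCodes="{shift}"/>'

for letter in "ıoi":
    print(turkish_key(letter))
```

Only the two letters whose uppercase form differs from the language-neutral default get an explicit shift code; everything else is left to the automatic guess.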

from languagepack.

menny commented on June 10, 2024

Done: https://github.com/AnySoftKeyboard/LanguagePack/tree/Turkish
Please make sure your branch is rebased on top of the latest https://github.com/AnySoftKeyboard/LanguagePack/tree/Turkish, and that you point your PR to my Turkish branch.

Also, how are you creating the dictionary?

menny commented on June 10, 2024

I see. This is how to get the real gz file. First, clone the repo to your local machine:
git clone https://android.googlesource.com/platform/packages/inputmethods/LatinIME
Then cd to the dictionaries folder:
cd LatinIME/dictionaries
Find the Turkish dictionary, tr_wordlist.combined.gz (it's 926338 bytes), and unzip it. Use the uncompressed file as the input for parseAospForEnglishDictionary.
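The same unzip step can be done from Python if that is more convenient. The sketch below also reads word/frequency pairs, assuming the uncompressed file uses the LatinIME "combined" layout (a header line, then entries like ` word=merhaba,f=200,flags=`); both function names are hypothetical and are not part of the LanguagePack build, which reads the uncompressed file through parseAospForEnglishDictionary.

```python
import gzip
import shutil
from typing import Iterator, Tuple

def gunzip(src: str, dst: str) -> None:
    """Write the uncompressed contents of src (a .gz file) to dst."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)

def read_combined_wordlist(path: str) -> Iterator[Tuple[str, int]]:
    """Yield (word, frequency) pairs from an uncompressed combined word list.

    Assumes entry lines look like " word=merhaba,f=200,flags=";
    the header line and any shortcut/bigram lines are skipped.
    """
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line.startswith("word="):
                continue  # header, shortcut or bigram line
            fields = dict(p.split("=", 1) for p in line.split(",") if "=" in p)
            yield fields["word"], int(fields.get("f", "0"))

# Usage (paths are examples):
# gunzip("tr_wordlist.combined.gz", "tr_wordlist.combined")
# for word, freq in read_combined_wordlist("tr_wordlist.combined"): ...
```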

 avatar commented on June 10, 2024

Hello @Sixthtry,
I think that you have to focus on the Unicode first:
U+0130 = LATIN CAPITAL LETTER I WITH DOT ABOVE = İ
Its lowercase form is U+0069 = LATIN SMALL LETTER I = i
Now you have to convert both to the right decimal values in the keyboard layout, starting with the small letter i.
Waiting for an answer.
Regards,
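The conversion suggested above, from U+ hexadecimal notation to the decimal numbers used in the layout, can be checked with a few lines of Python. The function name `codepoint_to_decimal` is illustrative only.

```python
import unicodedata

def codepoint_to_decimal(u: str) -> int:
    """Convert 'U+0130'-style notation to its decimal value (here 304)."""
    digits = u[2:] if u.upper().startswith("U+") else u
    return int(digits, 16)

# Each Turkish i-variant: character, hex notation, decimal code, Unicode name.
for ch in "iİıI":
    print(ch, f"U+{ord(ch):04X}", ord(ch), unicodedata.name(ch))
```

This gives 105 for i, 304 for İ, 305 for ı and 73 for I, the same numbers that come up later in the thread.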

lem0n4de commented on June 10, 2024

Hi @BoFFire,
In the layout, I have both Latin Small Letter I, which is [ i ] with Unicode code 105, and Latin Small Letter Dotless I, which is [ ı ] with code 305. Both work well when writing in lowercase.

The problem is that when I turn on uppercase, the dotted small i (code 105) should turn into "LATIN CAPITAL LETTER I WITH DOT ABOVE", which is [ İ ] with code 304, but instead it becomes "LATIN CAPITAL LETTER I", which is [ I ] with code 73.

Edit: Apparently I explained this incorrectly at first. I have corrected the letters.

(Sorry for the late reply. I wrote an answer on my phone, but the Git client crashed.)
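The behavior described above is consistent with the default, language-neutral Unicode case mapping, which uppercases i to I and lowercases İ to i plus a combining dot; Turkish needs a different mapping for these two letters. Python's `str.upper()` and `str.lower()` use that language-neutral mapping, so they show the problem directly. `turkish_upper` and `turkish_lower` are hypothetical helpers written for this illustration, not part of AnySoftKeyboard.

```python
# Python's str.upper()/str.lower() apply the language-neutral Unicode
# mappings, which give the wrong result for Turkish i/ı.
print("i".upper())        # I  (U+0049), but Turkish expects İ (U+0130)
print(len("İ".lower()))   # 2: 'i' followed by U+0307 COMBINING DOT ABOVE

def turkish_upper(s: str) -> str:
    # Map the two special letters first, then uppercase the rest normally.
    return s.replace("i", "İ").replace("ı", "I").upper()

def turkish_lower(s: str) -> str:
    # Same idea in the other direction: I -> ı and İ -> i.
    return s.replace("I", "ı").replace("İ", "i").lower()

print(turkish_upper("istanbul ılık"))  # İSTANBUL ILIK
print(turkish_lower("IŞIK İZMİR"))     # ışık izmir
```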

lem0n4de commented on June 10, 2024

That was just what I needed. Thank you very much, sir. May I also ask you to open a Turkish branch? I am trying to build a good dictionary, and I will try to upload the files here somehow in the next few days (I am new to Git).

lem0n4de commented on June 10, 2024

Following a thread here (the "New Sicilian language pack" thread, to be exact), I am collecting Wikipedia pages. I also tried WordsfromAosp, but that is all I know about dictionaries.

menny commented on June 10, 2024

lem0n4de commented on June 10, 2024

Hey, umm,
Yesterday I was trying to collect as many Turkish pages as I could by hand. Today I realized that there is a dump of Wikipedia, as the build.gradle comments mention (https://dumps.wikimedia.org/other/static_html_dumps/current), so I downloaded the one for Turkish. It was around 300 MB, but it extracted to a 6.8 GB .tar file. It contains .html files, but they are spread across folders, and I don't know how to reference them in build.gradle.
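One option, rather than listing thousands of files in build.gradle, is to pre-process the archive into plain text first. A minimal sketch using Python's `tarfile` and `html.parser`; the function and class names are hypothetical, it assumes the archive is a .tar containing .html pages as described above, and the resulting text would still need to be fed into the dictionary build in whatever way build.gradle supports.

```python
import tarfile
from html.parser import HTMLParser
from typing import Iterator, List

class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def html_texts_from_tar(path: str) -> Iterator[str]:
    """Yield the whitespace-normalized text of every .html file in a tar."""
    with tarfile.open(path) as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith(".html")):
                continue
            raw = tar.extractfile(member).read().decode("utf-8", errors="replace")
            parser = _TextExtractor()
            parser.feed(raw)
            parser.close()
            yield " ".join(" ".join(parser.parts).split())
```

Streaming member by member avoids unpacking the 6.8 GB archive to disk.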

I also found another file in the Wikimedia dumps: https://dumps.wikimedia.org/trwikisource/20171201/ has an .xml file (trwikisource-20171201-pages-articles.xml.bz2) which, as its title says, contains "Articles, templates, media/file descriptions, and primary meta-pages"; it is 70 MB after extraction.
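For the .xml.bz2 dump, page text can be streamed without extracting the whole file first. Note that trwikisource is the Turkish Wikisource (a library of source texts), not Wikipedia, so its vocabulary will not be identical to that of the Wikipedia HTML dump. A minimal sketch using `bz2` and `xml.etree.ElementTree.iterparse`; `wikitext_from_dump` is a hypothetical name, and the yielded text is still raw wikitext.

```python
import bz2
import xml.etree.ElementTree as ET
from typing import Iterator

def _local(tag: str) -> str:
    # "{http://www.mediawiki.org/xml/export-0.10/}text" -> "text"
    return tag.rsplit("}", 1)[-1]

def wikitext_from_dump(path: str) -> Iterator[str]:
    """Stream the <text> of each page from a *-pages-articles.xml.bz2 dump.

    Tags are matched by local name, so the export schema version in the
    namespace URI does not matter. Markup such as [[links]] and
    {{templates}} is left in the yielded text.
    """
    with bz2.open(path, "rb") as f:
        for _event, elem in ET.iterparse(f, events=("end",)):
            name = _local(elem.tag)
            if name == "text" and elem.text:
                yield elem.text
            elif name == "page":
                elem.clear()  # free memory for pages already processed
```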

I am wondering: will I get the same result from these two files?

[I am sorry that I am asking a lot of questions, but I am doing this for the first time and I want it to be usable enough.]

menny commented on June 10, 2024

Wiki dumps are good, but there are two general problems with them:

  1. They include words in other languages, which sometimes get added to the generated dictionary
  2. They are not a "dialog" source, so the word frequencies might seem strange
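A rough sketch of how the first problem might be reduced when counting words: keep only tokens made entirely of letters from the 29-letter Turkish alphabet (which has no q, w or x), and lowercase with the Turkish I/İ rules so that forms like IŞIK and ışık are counted together. This heuristic is an assumption for illustration, not what the LanguagePack build does; it still lets through foreign words spelled only with Turkish letters, and it does nothing about the frequency bias of non-dialog text.

```python
import re
from collections import Counter

# The 29 letters of the Turkish alphabet (no q, w or x).
TURKISH_LETTERS = set("abcçdefgğhıijklmnoöprsştuüvyz")

def turkish_lower(s: str) -> str:
    # Handle I/İ before the language-neutral lower() would get them wrong.
    return s.replace("I", "ı").replace("İ", "i").lower()

def word_frequencies(text: str) -> Counter:
    counts: Counter = Counter()
    for token in re.findall(r"\w+", text):
        word = turkish_lower(token)
        # Heuristic: drop tokens with digits, underscores or non-Turkish letters.
        if set(word) <= TURKISH_LETTERS:
            counts[word] += 1
    return counts
```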

lem0n4de commented on June 10, 2024

I gave up on the AOSP word list because it just does not generate any word list. I can see in the editor that it is a long file; I can't read it, of course, but apparently the code can't either. The parseAospForEnglishDictionary task generates a two-line file from it that contains only: </wordlist>

menny commented on June 10, 2024

@Sixthtry can you send me a link to that aosp file?

menny commented on June 10, 2024

Also, you'll need to fix/complete the issues reported here: https://271-4270172-gh.circle-artifacts.com/0/lint_reports/app/checkstyle/checkstyle.html

lem0n4de commented on June 10, 2024

Here is the AOSP word list link: https://android.googlesource.com/platform/packages/inputmethods/LatinIME/+/master/dictionaries/tr_wordlist.combined.gz (I downloaded it from the [txt] link in the right corner.)
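A likely explanation for the empty output: the [txt] link on googlesource.com (Gitiles' `?format=TEXT` view) serves the file base64-encoded rather than as raw bytes, so the downloaded file is not actually gzip data. The sketch below checks for the gzip magic bytes (1f 8b) and, if they are missing, tries base64-decoding; `load_gzip_bytes` is a hypothetical helper name.

```python
import base64
import binascii
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes

def load_gzip_bytes(raw: bytes) -> bytes:
    """Return real gzip data, undoing base64 encoding if that was applied."""
    if raw.startswith(GZIP_MAGIC):
        return raw
    try:
        # Non-alphabet characters such as newlines are discarded by default.
        decoded = base64.b64decode(raw)
    except binascii.Error as err:
        raise ValueError("neither gzip nor base64-encoded gzip") from err
    if not decoded.startswith(GZIP_MAGIC):
        raise ValueError("decoded data is not gzip either")
    return decoded

# Usage (file name is an example):
# with open("tr_wordlist.combined.gz", "rb") as f:
#     text = gzip.decompress(load_gzip_bytes(f.read())).decode("utf-8")
```

Cloning the repository, as described earlier in the thread, avoids the issue entirely because git stores the raw .gz bytes.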

lem0n4de commented on June 10, 2024

Thank you for being so helpful; that solved the problem. I want to ask just one last question: is there a document where I can find all the ask: and android: codes?

menny commented on June 10, 2024

MK8T commented on June 10, 2024

I don't see ask:shiftCodes used to specify the special uppercase codes
in https://github.com/AnySoftKeyboard/LanguagePack/blob/Turkish/src/main/res/xml/qwerty.xml ... Perhaps I'm not reading the right file?

lem0n4de commented on June 10, 2024

@MK8T
Okay, sorry about that, but there seems to be a problem with the file. I will re-upload it to GitHub manually, then; it works on my keyboard.

lem0n4de commented on June 10, 2024

@MK8T
Oh, I understand now. You are looking at the 'Turkish' branch of the original AnySoftKeyboard repo. My repo has not been merged into it yet. You should look here: https://github.com/Sixthtry/LanguagePack/blob/master/src/main/res/xml/qwerty.xml
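To check quickly whether a particular copy of a layout file actually uses ask:shiftCodes, one could list each key's codes next to its shift codes. A sketch assuming keys carry `android:codes` and `ask:shiftCodes` attributes as discussed in this thread; attributes are matched by local name, so the exact namespace URIs declared in the file do not matter, and `shift_codes` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

def shift_codes(path: str) -> Dict[str, Optional[str]]:
    """Map each element's 'codes' value to its 'shiftCodes' value (or None)."""
    result: Dict[str, Optional[str]] = {}
    for elem in ET.parse(path).getroot().iter():
        # Strip "{namespace-uri}" prefixes so android:/ask: URIs are ignored.
        attrs = {k.rsplit("}", 1)[-1]: v for k, v in elem.attrib.items()}
        if "codes" in attrs:
            result[attrs["codes"]] = attrs.get("shiftCodes")
    return result
```

For example, `shift_codes("src/main/res/xml/qwerty.xml").get("105")` would return `"304"` for a layout that maps i to İ, and `None` if the key relies on automatic guessing.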

friesenkiwi commented on June 10, 2024

Closing this in favor of #116