shabados / gurmukhi-utils

Utilities library for converting, analyzing, and testing Gurmukhi strings.

License: MIT License

Python 54.52% Ruby 19.79% TypeScript 11.42% JavaScript 3.31% Dart 10.96%
ascii bani devanagari english gurbani gurmukhi hindi nastaliq punjabi sikh spanish unicode urdu

gurmukhi-utils's Introduction

Presenter

Software for searching, navigating, and presenting the Shabad OS Database

About

Shabad OS Presenter can be used to display bani & gurbani in the home or at the gurdwara. A server model allows multiple devices to act as a display or a controller. The same model enables live captions to be in sync with the projector / presentation device.

For more, please see:

Features

  • Multiple displays and controllers synced together
  • Live broadcast captioning / subtitling
  • Curated design
    • Text legibility/readability
    • Organized UI/UX functionality
  • Keyboard shortcuts
    • Search, History, Bookmarks
    • Jump to line N of shabad
    • Autoselect line based on context/position

Screenshots

Contribute

If you want to help, please get started with the CONTRIBUTING.md doc.

Community

The easiest way to communicate is via GitHub issues. Please search for similar issues regarding your concerns before opening a new issue.

Get organization updates for Shabad OS by following us on Instagram and Twitter. We also invite you to join us on our public chat server hosted on Slack.

Our intention is to signal a safe open-source community. Please help us foster an atmosphere of kindness, cooperation, and understanding. By participating, you agree to abide by the Contributor Covenant.

If you have a concern that doesn't warrant opening a GitHub issue, please reach out to us:

"Thank you!" to all the volunteers who've contributed to Presenter.

gurmukhi-utils's People

Contributors

akshdeep-singh, bhajneet, dsomel21

gurmukhi-utils's Issues

Rename transliterate to toEnglish

More consistent with the other transliteration functions. It should also accept unicode as its input, in line with the other functions.

Add GitPod integration for quickly playgrounding with the gurmukhi-utils repo

Is your feature request related to a problem? Please describe.
Feature request is not related to a problem.

Describe the solution you'd like

  • As a user, I should be able to access the latest GitPod

Click here:
Screen Shot 2020-11-02 at 9 09 59 PM

  • As a user, all dependencies should already be installed
  • As a user, once the workspace is set up, I should be able to playground with it.

Screen Shot 2020-11-02 at 9 16 57 PM

  • As a user, if any changes or dependencies are pushed, they should be installed automatically.

Describe alternatives you've considered
N/A

Additional context
I am creating an issue in order to keep track of this feature, as advised by the Contribution.md:

PROTIP If there is no issue related to the work being done, then create an issue for tracking purposes.

Add Pingal/Syllable Counter

This would be able to give a count of syllables in a line for poetry.

All 35 Akhars including bindi chars count as 1 syllable.

There are 2 types of lagamatras, Harasv and Dheerag. Harasv doesn't count as a syllable, and Dheerag is 1 syllable.

Pairin Akhars also do not count as a syllable.

All other characters (such as Dandian) should be skipped.
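
The counting rules above could be sketched roughly as follows for Unicode Gurmukhi input. This is only an illustration: `countSyllables` is a hypothetical name, not an existing gurmukhi-utils function, and the grouping of vowel signs into Harasv (sihari, aunkar) and Dheerag (kanna, bihari, dulainkar, laan, dulaavan, hora, kanaura) is an assumption based on the description in this issue. Independent vowel letters are treated as a carrier akhar plus their vowel sign.

```javascript
// Hypothetical sketch of the syllable counter described in this issue.
// Dheerag (long) vowel signs count as 1 each.
const DHEERAG_SIGNS = new Set([0x0a3e, 0x0a40, 0x0a42, 0x0a47, 0x0a48, 0x0a4b, 0x0a4c]);
// Independent vowels that act as carrier + Dheerag sign (count 2 in total).
const DHEERAG_INDEPENDENT = new Set([0x0a06, 0x0a08, 0x0a0a, 0x0a0f, 0x0a10, 0x0a13, 0x0a14]);
// Bare carriers and carrier + Harasv sign (count 1 in total).
const HARASV_INDEPENDENT = new Set([0x0a05, 0x0a07, 0x0a09, 0x0a72, 0x0a73]);
const VIRAMA = 0x0a4d; // joins a pairin (subjoined) akhar to the previous akhar

// Consonants, including the precomposed bindi (nukta) letters.
const isConsonant = (cp) =>
  (cp >= 0x0a15 && cp <= 0x0a39) || (cp >= 0x0a59 && cp <= 0x0a5c) || cp === 0x0a5e;

function countSyllables(line) {
  let count = 0;
  let afterVirama = false;
  for (const ch of line) {
    const cp = ch.codePointAt(0);
    if (cp === VIRAMA) {
      afterVirama = true;
      continue;
    }
    if (isConsonant(cp)) {
      // A consonant right after a virama is a pairin akhar and is not counted.
      if (!afterVirama) count += 1;
    } else if (HARASV_INDEPENDENT.has(cp)) {
      count += 1;
    } else if (DHEERAG_INDEPENDENT.has(cp)) {
      count += 2;
    } else if (DHEERAG_SIGNS.has(cp)) {
      count += 1;
    }
    // Everything else (Harasv signs, dandian, digits, spaces) is skipped.
    afterVirama = false;
  }
  return count;
}

console.log(countSyllables('ਨਾਨਕ')); // 4: three akhars plus one Dheerag sign
```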

Transliteration development - getting rid of extra a's

Describe the bug
Extra a's coming up in words of transliterations, have given a couple of examples below but if more needed let me know.

To Reproduce
Steps to reproduce the behavior:
Search: kmkp (kaljug meh keertan pardhaanaa)

Translit that you get:
kalajug meh keeratan paradhaanaa |

What I would like: Get rid of the extra a's

Change to:
kaljug meh keertan pardhaanaa |

Another example:

guramukh japeeai laae dhiaanaa |

change to:

gurmukh japeeai laae dhiaanaa |

Specs

  • OS 2.9.0
  • Database 4.7.0

Add Unicode Nukta Character combinations to toAscii

Describe the bug

Unicode allows two ways to type nukta characters.

The first is the defined code point (U+0A36), which is a single character. The other method is to add a nukta char (U+0A3C) to an existing char, such as ਸ਼ (U+0A38 + U+0A3C).

Since nukta is mapped to æ, the conversion results in sæ instead of S.

Expected behavior
A clear and concise description of what you expected to happen.

ਸ਼ (U+0A38 + U+0A3C) => S
...and so on for other Pair Bindi chars
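
One possible approach, sketched below, is to fold each base + nukta pair into its single code point before the existing ASCII mapping runs. `composeNukta` is a hypothetical helper, not part of the current library. The six pairs listed are the canonical Unicode decompositions of the precomposed Gurmukhi nukta letters; an explicit table is used because these letters are composition exclusions, so `String.prototype.normalize('NFC')` does not compose them.

```javascript
// Hypothetical pre-processing step for toAscii: base letter + nukta (U+0A3C)
// is replaced by the equivalent precomposed letter.
const NUKTA_PAIRS = {
  '\u0a38\u0a3c': '\u0a36', // sassa + nukta
  '\u0a32\u0a3c': '\u0a33', // lalla + nukta
  '\u0a16\u0a3c': '\u0a59', // khakha + nukta
  '\u0a17\u0a3c': '\u0a5a', // gagga + nukta
  '\u0a1c\u0a3c': '\u0a5b', // jajja + nukta
  '\u0a2b\u0a3c': '\u0a5e', // phapha + nukta
};

const NUKTA_PATTERN = /[\u0a16\u0a17\u0a1c\u0a2b\u0a32\u0a38]\u0a3c/g;

function composeNukta(text) {
  return text.replace(NUKTA_PATTERN, (pair) => NUKTA_PAIRS[pair]);
}

// After this step, both spellings of the letter reach the ASCII mapping
// as the single code point U+0A36.
console.log(composeNukta('\u0a38\u0a3c') === '\u0a36'); // true
```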

Render issues when converting Yakash with Sihari and other Vowel from ASCII to Unicode

Example 1

kul jn mDy imil´o swrg pwn ry ]

Should convert to:
ਕੁਲ ਜਨ ਮਧੇ ਮਿਲੵਿੋ ਸਾਰਗ ਪਾਨ ਰੇ ॥

Platform ਮਿਲਿੋੵ ਮਿਲੵੋਿ ਮਿਲੵਿੋ
iOS 11
iOS 12
iOS 13 Beta
MacOS High Sierra
MacOS Mojave
MacOS Catalina Beta
Windows 10 (Edge/Office)
Windows 10 (Chrome/FF)
Android (9, 8)

Example 2

Asmwn im´wny lhMg drIAw gusl krdn bUd ]

Should convert to:
ਅਸਮਾਨ ਮੵਿਾਨੇ ਲਹੰਗ ਦਰੀਆ ਗੁਸਲ ਕਰਦਨ ਬੂਦ ॥

Platform ਮਿਾੵਨੇ ਮਿੵਾਨੇ ਮੵਿਾਨੇ
iOS 11
iOS 12
iOS 13 Beta
MacOS High Sierra
MacOS Mojave
MacOS Catalina Beta
Windows 10 (Edge/Office)
Windows 10 (Chrome/FF)
Android (9, 8)

Hindi pronunciation should be written out instead of symbolized (ੴ)

Our english and urdu pronunciations do not symbolize ik oankaar. They are written out in their native scripts to help readers pronounce gurmukhi.

image

Hindi pronunciation is following a centuries old tradition of hindi-written gutkas using gurmukhi for ੴ. This would not help newcomers who speak/read exclusively hindi understand how to pronounce it.

Given that we have the gurmukhi always on display, the pronunciation would be pretty easy to connect with gurmukhi ੴ.

Instead I propose we show इक ओंकार or इक-ओंकार (उोंकार? needs proper confirmation) (the equivalent of gurmukhi's iek EMkwr) to help hindi readers get an insight into the important pronunciation of this.

I would imagine hindi speakers, who lived in punjab and have some exposure to sikhi, would maybe know how to pronounce ੴ (though some would likely be slightly off in their pronunciation), and others who didn't go to gurdwaras or had any real exposure to sikhi, I can't imagine they would know what the second symbol is. Surely, they'll get the "ik" part since it's the same in both languages.

Proposal will help both types of speakers in their correct pronunciation of ੴ (ik oankaar). I will talk to some people in person to help get a consensus of what is best for those hindi readers/speakers to better access the gurmukhi script.

add docs in panjabi language (ਪੰਜਾਬੀ ਦਸਤਾਵੇਜ਼)

Persona

ਪੜ੍ਹਾ ਵਾਲੇ

Goal

ਪੰਜਾਬੀ ਵਿੱਚ ਦਸਤਾਵੇਜ਼ ਵੀ ਦਿਓ

Motivation

ਪੰਜਾਬੀ ਵਿੱਚ ਤਫ਼ਸੀਲ ਚੰਗੀ ਹੋਏਗੀ

Acceptance Criteria

I wanted to suggest providing the documentation for this tool (possibly others) in Punjabi as well as English in the interest of fostering the use of the language on the internet. What do people here think?

Approach

No response

Mockup

No response

Update toShahmukhi with appropriate end of character transliteration

[ new RegExp( `(\\S[^ਹ])([ਿੁ])([\\s$${vishraams.join( '' )}])`, 'ug' ), '$1$3' ], // Remove trailing ੁ and ਿ except when on ਹ or on a standalone akhar

This would also need to be updated.

Perhaps best to open a new issue for you to work on, I don't know how to fix the tests on this

image

Please allow this PR to be merged in for the time being

Originally posted by @bhajneet in #177 (comment)

Change toAscii to accept ascii strings

This would allow toAscii to be used to sanitize ascii strings. It would also only convert unicode gurmukhi to its ascii counterpart for old font setups. And it would fix any ascii character combos.

Rename stripAccents to something more familiar?

I suggested "toBaseForm" or "toCoreConsonant" or such. However, Harjot pointed out that there is a convention to use "strip".

I wonder if we can simply use the same function as the "main letters" column? They would be to base character since we only use main letters for search purposes?

Originally posted by @bhajneet in https://github.com/ShabadOS/desktop/pull/579/files/610320d6eeeb02bfc8d5a57cd320ef1d1042dc42


Bhajneet those are mostly vowels, with the exception of the last 3/4. We can certainly look at renaming the function, but please FAI in gurmukhi-utils.

One preliminary note is that our convention so far has been to name anything that removes something to stripSomething, instead of toSomethingElse, so I'd recommend us trying to be consistent about it

Originally posted by @Harjot1Singh in https://github.com/ShabadOS/desktop/pull/579/files/610320d6eeeb02bfc8d5a57cd320ef1d1042dc42

"firstLetters" function is not aligned with 'first_letters' column of 'lines' table from Shabad OS database

Question

I am trying to query data from the 'lines' table of the Shabad OS database using the 'first_letters' column. Before fetching the data, I run the firstLetters function from 'gurmukhi-utils' on a line from the database. I found that the 'first_letters' column ignores lower case 'i', so no row in the database has a lower case 'i' in that column, but the firstLetters function does not ignore it. Is this behavior intentional?

Example from a line taken from ShabadOs database:

id shabad_id first_letters gurmukhi
"C5NR" "ZKH" "gkvvv" "gwvY ko; ividAw ivKmu vIcwru ]"

As you can see, the 'first_letters' value for the data in the 'gurmukhi' column is 'gkvvv', so it is apparently skipping every lower case 'i'. On the other hand, when I run the 'firstLetters' function from 'gurmukhi-utils', it does not skip lower case 'i'.

firstLetters("gwvY ko; ividAw ivKmu vIcwru ]"); // it gives me "gkiiv".

I just want to know whether this is intentional or a bug.
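
To make the difference concrete, here is a minimal sketch of a variant that reproduces the database value for the line above. `firstLettersSkippingSihari` is a hypothetical name, and the rule it applies (take the first ASCII letter of each word that is not a lower case 'i') is only inferred from the single row shown in this issue, not from the database build scripts.

```javascript
// Hypothetical variant of firstLetters for ASCII Gurmukhi input.
// Assumption: lower case 'i' is skipped because in ASCII Gurmukhi it is typed
// before the consonant it attaches to, so the following letter is the real first letter.
function firstLettersSkippingSihari(line) {
  return line
    .split(/\s+/)
    .map((word) => [...word].find((ch) => /[A-Za-z]/.test(ch) && ch !== 'i') || '')
    .join('');
}

console.log(firstLettersSkippingSihari('gwvY ko; ividAw ivKmu vIcwru ]')); // "gkvvv"
```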

Dhanwaad

Add ES6 Export

When using in Node.js, it should use index.js directly rather than the webpacked version.

Transliteration error

On the transliteration for 'pair' letter ShabadOS is producing the @ sign.

See image below:
image

Letter should be ignored in the transliteration

Mac OS Mojave
iMac
Shabad OS App Version 2.9.0, Database Version 4.7.0

Letters function

Similar to firstLetters() except should grab every letter in the string, excluding vowels by default.

Unicode vowel-letters do not produce correct firstLetter output

Describe the bug
Unicode vowel-letters do not produce correct firstLetters output. Instead, the conjoined vowel-letter is included.

To Reproduce
firstLetters('ਇਕਨਾ. ਹੁਕਮੀ ਬਖਸੀਸ; ਇਕਿ, ਹੁਕਮੀ ਸਦਾ ਭਵਾਈਅਹਿ ॥') => ਇ.ਹਬ;ਇ,ਹਸਭ॥

Expected behavior
ੲ.ਹਬ;ੲ,ਹਸਭ॥
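
A small lookup table is one way to get from a Unicode vowel-letter to its base letter, as the expected output requires. The sketch below assumes a hypothetical helper `toBaseLetter` that firstLetters could apply to each letter it extracts; the table follows the usual assignment of each independent vowel to one of the three vowel carriers.

```javascript
// Hypothetical helper: map each independent vowel letter to its vowel carrier.
const BASE_LETTER = {
  '\u0a06': '\u0a05', // aa  -> aira
  '\u0a10': '\u0a05', // ai  -> aira
  '\u0a14': '\u0a05', // au  -> aira
  '\u0a07': '\u0a72', // i   -> iri
  '\u0a08': '\u0a72', // ii  -> iri
  '\u0a0f': '\u0a72', // ee  -> iri
  '\u0a09': '\u0a73', // u   -> ura
  '\u0a0a': '\u0a73', // uu  -> ura
  '\u0a13': '\u0a73', // oo  -> ura
};

function toBaseLetter(letter) {
  // Letters without an entry (consonants, bare carriers) are returned unchanged.
  return BASE_LETTER[letter] || letter;
}

console.log(toBaseLetter('ਇ') === 'ੲ'); // true
```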

Add more Shahmukhi Rules

Rules gotten from feedback

  • For , اُ should be used if in the beginning of the word, و should be used if at the end of the word.
  • ے is never used in the middle of a word, only at the end, should use ی by default.
  • If there are 2 vowels, in the case of ਨਾਈ, a hamza ء needs to be added.

Separate transliteration vs pronunciation

Transliteration is a two way process. If you feed gurmukhi into a transliterator and then try to convert that back into gurmukhi it should match. 1 to 1. 0 loss of data.

I think today only our Hindi is close to actually being able to do this. And that's because all the characters of gurbani have a corresponding character in Devanagari.

This is also possible in English using accented characters. If you start to use 2 letters for translit in english then you almost absolutely must provide a letter-separator char to interpret it programmatically.

I would argue that we change our functions for translit into functions for pronunciation outside of Hindi. This should be reflected in the desktop frontend as well.

If we want a true english translit, I would recommend to start off the basis of what Sikh RI have done.

And any translit which converts the one in ੴ to ਇਕ is not a transliteration at all. Same with the second character. These cannot be converted back programmatically and thus are not a true 1-1 transliteration.

A transliteration need not necessarily be easy to read for pronunciation's sake. Any loss of sihari aunkurh etc which may be used for grammatical rules is a failure of transliteration.
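
The 1 to 1 requirement can be expressed as a round-trip check. The sketch below is illustrative only: the three-letter table and the names `toDevanagariSketch`, `toGurmukhiSketch`, `roundTrips`, and `dropSihari` are made up for this example and are not the library's functions.

```javascript
// Toy 1:1 table covering only three letters (ka, ma, la); not a full mapping.
const GURMUKHI_TO_DEVANAGARI = new Map([
  ['\u0a15', '\u0915'],
  ['\u0a2e', '\u092e'],
  ['\u0a32', '\u0932'],
]);
const DEVANAGARI_TO_GURMUKHI = new Map(
  [...GURMUKHI_TO_DEVANAGARI].map(([from, to]) => [to, from])
);

// Characters without a table entry pass through unchanged.
const mapChars = (text, table) => [...text].map((ch) => table.get(ch) || ch).join('');
const toDevanagariSketch = (text) => mapChars(text, GURMUKHI_TO_DEVANAGARI);
const toGurmukhiSketch = (text) => mapChars(text, DEVANAGARI_TO_GURMUKHI);

// A true transliteration satisfies backward(forward(text)) === text.
const roundTrips = (text, forward, backward) => backward(forward(text)) === text;

// A pronunciation-style conversion that drops sihari (U+0A3F) loses data.
const dropSihari = (text) => toDevanagariSketch(text.replace(/\u0a3f/g, ''));

console.log(roundTrips('ਕਮਲ', toDevanagariSketch, toGurmukhiSketch)); // true
console.log(roundTrips('\u0a15\u0a3f', dropSihari, toGurmukhiSketch)); // false
```

A check like this could sit in the test suite for any function that claims to be a transliteration, while pronunciation functions would be exempt from it.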

Excess spaces in `stripEndings`

Describe the bug
stripEndings() can leave behind leading spaces. These should be removed.

To Reproduce
stripEndings('] jpu]') => ' jpu'

Expected behavior
stripEndings('] jpu]') => 'jpu'

Add a function to sanitize ascii gurmukhi strings

Is your feature request related to a problem? Please describe.
Our database has words like Anµd, which a common user cannot search for. They may instead search for AnMd, which our backend must convert to the former for proper matching.

Describe the solution you'd like
Strings like ਅਨੰਦ should be converted with toAscii and then potentially sanitized to match the database.
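
One simple way to make the two spellings match is to fold variant characters to a single form on both the query and the stored value before comparing. The sketch below is an assumption-heavy illustration: `toSearchKey` is a hypothetical name, and the table contains only the one pair mentioned in this issue (µ and M). A real sanitizer would need the full list of variant characters and the rules for choosing between them.

```javascript
// Hypothetical folding table: variant character -> canonical character.
// Only the pair named in this issue is listed.
const VARIANT_FOLDS = { 'µ': 'M' };

function toSearchKey(ascii) {
  return [...ascii].map((ch) => VARIANT_FOLDS[ch] || ch).join('');
}

// Both spellings produce the same key, so they match each other.
console.log(toSearchKey('Anµd') === toSearchKey('AnMd')); // true
```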

stripEndings translation edge cases

Describe the bug
Some translations have numbers, such as 8.4 million mid-line. This causes it to be detected as a line ending and then everything including it onwards is removed, incorrectly.

To Reproduce
stripEndings( 'Through 8.4 million incarnations you have wandered' ) => Through

Expected behavior
stripEndings( 'Through 8.4 million incarnations you have wandered' ) => Through 8.4 million incarnations you have wandered
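
A sketch of one way to avoid the false match: treat only an explicit end marker as the start of the ending, rather than any digit. The marker format used here (a `||` pair such as `||1||`) is an assumption about how translation endings are written, and `stripTranslationEnding` is a hypothetical name.

```javascript
// Hypothetical: strip a translation's line ending without touching numbers mid-line.
// Assumption: endings start at the first "||" marker, e.g. "... ||1||Pause||".
function stripTranslationEnding(translation) {
  return translation.replace(/\s*\|\|.*$/, '').trim();
}

console.log(stripTranslationEnding('Through 8.4 million incarnations you have wandered'));
// "Through 8.4 million incarnations you have wandered"
```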

Conversion rules of ਹ (haha) for typical punjabi pronunciation

For the word jaahi and similar words that are basically 'aahi', Shabad OS currently makes it 'aeh', but it should be 'aahi', e.g.

oe ji aaveh aas kar jaeh niraase kit |

should be
jaahi not jaeh

To Reproduce
Steps to reproduce the behavior:
type in ajAAkjnk

Screenshots
If applicable, add screenshots to help explain your problem.
image

Specs

  • Device: iMac
  • OS: 10.14.6
  • Version 2.10.1, Database 4.7.0

This creates a bit of extra work when copying and pasting. If it could be rectified in the algorithm/rules, that would be hugely appreciated!

add conversion to IPA

Will allow for TTS and potential usage on Voice Assistants (such as Alexa or Google Assistant)

Spaces before endings and rahao variations are not removed with `stripEndings`

Describe the bug
stripEndings() does not remove variations of rahao, nor does it remove all preceding space characters.

To Reproduce

  1. stripEndings('] jpu ]') // => ' jpu'
  2. stripEndings('ਤਾ ਖਸਮੈ ਮਿਲਣਾ ॥੧॥ ਰਹਾਉ ਦੂਜਾ ॥') // => 'ਤਾ ਖਸਮੈ ਮਿਲਣਾ ਦੂਜਾ'

Expected behavior

  1. stripEndings('] jpu ]') // => 'jpu'
  2. stripEndings('ਤਾ ਖਸਮੈ ਮਿਲਣਾ ॥੧॥ ਰਹਾਉ ਦੂਜਾ ॥') // => 'ਤਾ ਖਸਮੈ ਮਿਲਣਾ'
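
The expected behavior for both cases could be sketched as below. This is not the library's implementation; `stripEndingsSketch` is a hypothetical name, and it assumes that everything from the first end marker (ASCII `]` or Unicode double danda U+0965) onwards belongs to the ending, which covers rahao variations without listing them.

```javascript
// Hypothetical sketch: remove leading markers, then cut from the first
// remaining marker to the end of the line, including the spaces before it.
function stripEndingsSketch(line) {
  return line
    .replace(/^[\s\]\u0965]+/, '') // leading markers and spaces
    .replace(/\s*[\]\u0965].*$/, '') // first marker onwards
    .trim();
}

console.log(stripEndingsSketch('] jpu ]')); // "jpu"
console.log(stripEndingsSketch('ਤਾ ਖਸਮੈ ਮਿਲਣਾ ॥੧॥ ਰਹਾਉ ਦੂਜਾ ॥')); // "ਤਾ ਖਸਮੈ ਮਿਲਣਾ"
```

Lines that contain a marker in the middle of the text would be cut short by this rule, so it would need refining before being used on such lines.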

Transliteration corrections

Referring to the pangti naanak kehat pukaar kai..., for three letter words that have 'haha' in the middle, the pronunciation rule is that the first letter gets a laav (if you're of that school of thought), so if possible can we develop the translit rules so it says 'kehat' not kahat. Similar word is like Sehaj.

In the next pangti sabh bheo paraaeo

the rule does follow what i suggested but it's not needed, so if we could simplify to bheo and paraaeo that'd be ideal.

IMG_4193
IMG_1216
