
Listening-based language learning

Home Page: http://omnilingo.github.io

License: GNU Affero General Public License v3.0

Python 2.55% HTML 0.78% JavaScript 92.26% CSS 4.42%

language-learning language-learning-game languages languages-spoken web-application

omnilingo's Introduction

OmniLingo

What is this?

The goal of the project is to help you practice listening comprehension.

It works by giving you random sentences in the language you're learning and asking you to fill in the gaps. The sentences were submitted by contributors to the Mozilla Common Voice platform.

The project aims to not require any knowledge of a meta language in order to start learning. If you are interested in a more traditional course creation project, check out LibreLingo.

The game orders the questions by difficulty and then gives you batches of five, with a random task for each question. When you successfully answer a batch of five in less time than the audio takes to play, you advance a level and get a new batch of five.
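The level-up rule above can be sketched in Python. This is only an illustration, not the project's actual code: the names `Attempt`, `batch_passed` and `next_level` are hypothetical, and comparing the total answering time with the total clip duration is an assumed reading of "in less time than the audio takes to play".

```python
from dataclasses import dataclass
from typing import List

BATCH_SIZE = 5  # the README describes batches of five questions


@dataclass
class Attempt:
    """One answered question (hypothetical structure)."""
    correct: bool
    seconds_taken: float  # time the learner needed to answer
    audio_seconds: float  # duration of the clip


def batch_passed(attempts: List[Attempt]) -> bool:
    """True when a full batch is correct and was answered faster than
    the combined playing time of its clips (assumed interpretation)."""
    if len(attempts) != BATCH_SIZE:
        return False
    if not all(a.correct for a in attempts):
        return False
    total_taken = sum(a.seconds_taken for a in attempts)
    total_audio = sum(a.audio_seconds for a in attempts)
    return total_taken < total_audio


def next_level(level: int, attempts: List[Attempt]) -> int:
    """Advance one level after a passed batch, otherwise stay put."""
    return level + 1 if batch_passed(attempts) else level
```

A per-question time limit would be an equally plausible reading; only the comparison inside `batch_passed` would change.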

Tasks

Fill in the blanks: A cloze-style task
Drag and drop: Get a set of tiles and click on them to build a word or sentence
Pick the right one: Get two options and choose the right one
Spot the word: Get a set of six tiles and click on the ones that appear in the audio
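As a rough illustration of the first task, a cloze item can be built by hiding one token of a transcript. The function name `make_cloze` is hypothetical, and splitting on whitespace is a simplifying assumption: it does not work for languages written without spaces, which need a segmenter.

```python
import random
from typing import Tuple


def make_cloze(sentence: str, rng: random.Random) -> Tuple[str, str]:
    """Blank out one randomly chosen whitespace-separated token.

    Returns the gapped sentence and the hidden word.
    """
    tokens = sentence.split()
    if not tokens:
        raise ValueError("sentence has no tokens")
    index = rng.randrange(len(tokens))
    answer = tokens[index]
    # Replace the chosen token with underscores of the same length.
    gapped = tokens[:index] + ["_" * len(answer)] + tokens[index + 1:]
    return " ".join(gapped), answer
```

Passing in the `random.Random` instance keeps the choice of gap reproducible when a seed is fixed.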

Keys

Space: Play the recording
Enter:
1. Submit and check if you got it right
2. If already submitted, move to the next recording

Data

The data comes from the Common Voice dataset releases.

Target audience

This system is designed with two main user groups in mind:

People who want to learn a new language
People who want to learn how to write their native language

The system endeavours to be audio first, with knowledge of writing built up by hearing.

Contact

Talk to us

IRC: irc.freenode.net #OmniLingo
Matrix: #OmniLingo:matrix.org (access via Element)
Telegram: OmniLingo

Available languages

All of the languages available in the Common Voice 6.1 dataset.

Abkhaz · Arabic · Assamese · Breton · Catalan · Hakha Chin · Czech · Chuvash · Welsh · German · Dhivehi · Greek · English · Esperanto · Spanish · Estonian · Basque · Persian · Finnish · French · Frisian · Irish · Hindi · Upper Sorbian · Hungarian · Interlingua · Indonesian · Italian · Japanese · Georgian · Kabyle · Kyrgyz · Luganda · Lithuanian · Latvian · Mongolian · Maltese · Dutch · Odia · Punjabi · Polish · Portuguese · Romansh Sursilvan · Romansh Vallader · Romanian · Russian · Kinyarwanda · Sakha · Slovenian · Swedish · Tamil · Thai · Turkish · Tatar · Ukrainian · Vietnamese · Votic · Chinese (China) · Chinese (Hong Kong) · Chinese (Taiwan)

If you want to work with a language not yet in Common Voice, we highly recommend that you get set up in Common Voice, but in the meantime, you can check out the format guidelines.

Releases

0.1.0 Functional proof of concept
0.2.0 Partial prototype with level progression

Deployment

For deployment information, check out our blog post on the IPFS blog.

To add more languages, download a dataset from Common Voice and put it in cv-corpus-6.1-2020-12-11/.

Happy hacking! :)

Dependencies

For those who prefer to install their dependencies through their package manager in Debian/Ubuntu, the following dependencies are available there:

python3-mutagen - audio metadata editing library (Python 3)
python3-jieba - Jieba Chinese text segmenter (Python 3)
python3-flask - micro web framework based on Werkzeug and Jinja2 - Python 3.x

Acknowledgements

Logo by Fabi Yamada! Licensed under CC-BY.
Funding generously provided by Protocol Labs.
Recording and mp3 encoding code based on https://github.com/welll/record-encode-audio-from-browser (MIT license)
libmp3lame port to JavaScript by Andreas Krennmair; libmp3lame is under LGPL


omnilingo's Issues

Add a pre-processing build step, so that it doesn't happen on startup

The collection of German sentences takes over 10 minutes to load, and that is under PyPy. This is way too long.

Maybe use SQLite as the output format - we could then have an index based on difficulty.
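A minimal sketch of what such a pre-processing step could write, using Python's standard `sqlite3` module. The table name `clips`, its columns, and the helpers `build_index` and `easiest` are assumptions made for illustration, not an existing schema.

```python
import sqlite3
from typing import Iterable, List, Tuple


def build_index(conn: sqlite3.Connection,
                rows: Iterable[Tuple[str, str, str, float]]) -> None:
    """Store (language, clip path, sentence, difficulty) rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clips ("
        "lang TEXT NOT NULL, path TEXT NOT NULL, "
        "sentence TEXT NOT NULL, difficulty REAL NOT NULL)"
    )
    # The index lets the server fetch clips by difficulty without a scan.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_lang_difficulty "
        "ON clips (lang, difficulty)"
    )
    conn.executemany("INSERT INTO clips VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def easiest(conn: sqlite3.Connection, lang: str,
            limit: int) -> List[Tuple[str, str]]:
    """Return the `limit` easiest (path, sentence) pairs for a language."""
    cursor = conn.execute(
        "SELECT path, sentence FROM clips WHERE lang = ? "
        "ORDER BY difficulty LIMIT ?",
        (lang, limit),
    )
    return cursor.fetchall()
```

With the index built once at build time, startup would only need to open the database file.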

Add alternative keybindings

Another option (thanks, @d33tah) is to have:

Up: Play
Down: Submit
Left/right: switch between blanks

Implement spaced repetition

The system should repeat stuff the user gets wrong.
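One simple way to do this is a Leitner-style scheme, sketched below. It is a stand-in chosen for brevity, not something the issue prescribes; the class name `LeitnerDeck`, the interval values and the notion of a numbered "session" are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LeitnerDeck:
    """Clips move up a box when answered correctly and back to box 0
    when answered wrongly; higher boxes are reviewed less often."""
    intervals: Tuple[int, ...] = (1, 2, 4, 8)  # sessions to wait, per box
    box: Dict[str, int] = field(default_factory=dict)
    due: Dict[str, int] = field(default_factory=dict)

    def record(self, clip_id: str, correct: bool, session: int) -> None:
        current = self.box.get(clip_id, 0)
        if correct:
            new_box = min(current + 1, len(self.intervals) - 1)
        else:
            new_box = 0
        self.box[clip_id] = new_box
        self.due[clip_id] = session + self.intervals[new_box]

    def due_clips(self, session: int) -> List[str]:
        """Clips whose review session has arrived, in a stable order."""
        return sorted(c for c, when in self.due.items() if when <= session)
```

A clip the user got wrong comes back after one session, while a clip answered correctly waits longer each time.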

Get user permission to play

At the moment the user has to explicitly use the play button in order to give permission, which is annoying when trying to add a keyboard-only mode.

Add keyboard-only mode

There should be a mode that only uses keyboard input.

Play the audio (or replay)
Type
Submit
Next clip

We will probably need to use some option key for play/next clip.

Implement edit distance using panphon

This will require a g2p model per language.

https://github.com/dmort27/panphon

Potentially can use wikipron for that.

Ignore case in comparison

temel = Temel
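A minimal sketch of such a comparison in Python, using `str.casefold` and Unicode normalisation from the standard library. The helper names are hypothetical, and locale-specific rules (for example the Turkish dotted and dotless i) are not handled by `casefold`.

```python
import unicodedata


def normalize_answer(text: str) -> str:
    """Trim whitespace, normalise to NFC and fold case."""
    return unicodedata.normalize("NFC", text.strip()).casefold()


def answers_match(expected: str, given: str) -> bool:
    """Case-insensitive comparison of the expected and typed answers."""
    return normalize_answer(expected) == normalize_answer(given)
```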

Focus entry box after playing audio

Gamify the thing

A memory task

Suggested by @brianrocca:

http://www.manythings.org/ac/

Basically:

Memory game for the audio. One idea would be to give people longer segments and get them to remember and pick out as many words as they can after hearing the audio once.

Add basic feedback

This is related to some other issues, e.g. #2 and #7. But it would be good to have some basic feedback while we work out what will actually work. I'll implement tick and cross for each batch of 10.

Add a system test, maybe Selenium-based?

Schedule regular dev meetings

IMO the date should be predictable.

Every week? Two weeks? A month?

Open or just both of us?

Deleting a single language should only invalidate that language

e.g. if we have hi/tr/fi, and we delete the cache for hi it should only regenerate that.
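A sketch of the idea, assuming one cache file per language. The file layout (`<lang>.cache` inside a cache directory) and the function names are hypothetical, not how the project stores its cache today.

```python
from pathlib import Path
from typing import List


def cache_path(cache_dir: Path, lang: str) -> Path:
    """Location of the cache file for one language (assumed layout)."""
    return cache_dir / f"{lang}.cache"


def languages_to_rebuild(cache_dir: Path, languages: List[str]) -> List[str]:
    """Only languages whose own cache file is missing need regenerating."""
    return [lang for lang in languages
            if not cache_path(cache_dir, lang).exists()]
```

With hi, tr and fi cached, deleting the hi file would make this return only `["hi"]`.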

Keep track of the ones the user got right/wrong

Don't present the ones the user already got right again.

Put the instructions behind a click

Maybe something like fa-question-circle from FontAwesome.

Throw all inline CSS into an external CSS file

Add good multiple language support

Currently it allows choosing a different language at startup, but it would be cool if the same server could serve multiple languages.

Tab to play stopped working

I think probably after adding the Blanks/Choices options. Probably onload should focus the player.

Theme the player

In multiple choices mode, sometimes both options are the same

As in the screenshot:

Improve text input

Make it more obvious where to type, maybe have an actual box
Allow pressing return to submit

Select sentences based on desired difficulty level

Set up a project-specific domain

This is blocked by #1, but I think it could prove useful in many ways. There are lots of domains that cost less than a typical coffee per year and the renewal is just as cheap.

Add a script that builds a random sample of sentences

When testing various algorithms, I don't need thousands of sentences, a few hundred would do.
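Such a script could be as small as the sketch below. The function name `sample_sentences` and the fixed default seed are assumptions; the seed is there so that repeated runs compare algorithms on the same sample.

```python
import random
from typing import List


def sample_sentences(sentences: List[str], k: int, seed: int = 0) -> List[str]:
    """Return up to k distinct sentences, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(sentences, min(k, len(sentences)))
```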

BiDi layout

Make sure we have a layout that works for bidirectional text.

Speed up question loading

@ftyers said it's slow because of the code behind distractors. It would be a good idea to optimize that.

A task for identifying N words in the audio

The task:

Play an audio
Get presented with e.g. 6 words (or Chinese characters)
- 3 are in the audio, 3 are not (distractors)
You have to click on the ones that are there, without the ones that aren't.
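The steps above can be sketched as follows. The function names and the `vocabulary` argument (a pool of candidate distractor words) are assumptions; drawing distractors only from words that are absent from the transcript guarantees they cannot collide with a correct tile.

```python
import random
from typing import Iterable, List, Set, Tuple


def spot_the_word_tiles(transcript_words: Iterable[str],
                        vocabulary: Iterable[str],
                        rng: random.Random,
                        n_present: int = 3,
                        n_distractors: int = 3) -> Tuple[List[str], Set[str]]:
    """Pick words from the audio plus distractors and shuffle them.

    Returns the shuffled tiles and the set of tiles that are correct.
    """
    in_audio = set(transcript_words)
    present_pool = sorted(in_audio)
    distractor_pool = sorted(set(vocabulary) - in_audio)
    if len(present_pool) < n_present or len(distractor_pool) < n_distractors:
        raise ValueError("not enough words to build the task")
    present = rng.sample(present_pool, n_present)
    distractors = rng.sample(distractor_pool, n_distractors)
    tiles = present + distractors
    rng.shuffle(tiles)
    return tiles, set(present)


def is_correct(clicked: Iterable[str], present: Set[str]) -> bool:
    """The answer is right only if exactly the present words were clicked."""
    return set(clicked) == present
```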

Thanks to @JacobSchmitt for the idea!

Add a way to set up reminders to keep users engaged (e-mail, chat?)

Come up with a better name

Add internationalization support

Let people upload a word list and only give sentences with those words

Improve difficulty ranking

Ideas:

Use LM perplexity
Use compression ratio

E.g. we need to think about: (1) short sentences with rare words and (2) someone speaking slowly but with a lot of background noise.

We could try with the linear interpolation of all three, or it could be configurable.
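A rough sketch of two of these pieces: a compression ratio computed with `zlib`, and a configurable weighted interpolation of several scores. The function names are hypothetical, the scores passed to `interpolate` are placeholders, and LM perplexity is not computed here. Note that for very short sentences the zlib header overhead can push the ratio above 1.

```python
import zlib
from typing import Sequence


def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size; lower means more repetitive."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw)) / len(raw)


def interpolate(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of difficulty scores; the weights are configurable."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total
```

The individual scores would need to be scaled to a comparable range before interpolating, otherwise one of them dominates.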

Have the audio
Take the transcript, split into characters
Scramble the characters and present them as tiles
Give the user slots to drag the characters into
Slots turn green when the user gets it right, otherwise red,
- If red then the user can drag it back or drag it to another slot
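The logic behind these steps, leaving out the actual drag-and-drop UI, might look like the sketch below. The function names and the "green"/"red"/"empty" labels are assumptions. Splitting with `list()` yields Unicode code points, so scripts that rely on combining marks would need grapheme-aware splitting instead.

```python
import random
from typing import List, Optional


def scramble_tiles(transcript: str, rng: random.Random) -> List[str]:
    """Split the transcript into characters and shuffle them."""
    tiles = list(transcript)
    rng.shuffle(tiles)
    return tiles


def slot_colours(transcript: str, slots: List[Optional[str]]) -> List[str]:
    """Colour each slot: green if the right character was dropped in it,
    red if a wrong one was, and empty if nothing has been placed yet."""
    colours = []
    for expected, placed in zip(transcript, slots):
        if placed is None:
            colours.append("empty")
        elif placed == expected:
            colours.append("green")
        else:
            colours.append("red")
    return colours
```

Comparing by character rather than by tile identity means either of two identical letters is accepted in a given slot.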

This could be useful info for the implementation: https://daily-dev-tips.com/posts/vanilla-javascript-drag-n-drop/

client_id, unixtime, clipid, correctword, userresponse

Add a licence

My preference:

AGPL
GPL v3
MIT/BSD