unfoldingword / wordmap



Multilingual Word Alignment Prediction

Home Page: https://wordmap.netlify.com

TypeScript 99.57% JavaScript 0.43%

wordmap scripture-open-components

wordmap's Introduction

wordMAP

Multi-Lingual Word Alignment Prediction

Word alignment prediction is the process of associating (mapping) words from some primary text with corresponding words in a secondary text. This tool uses statistical algorithms to determine which words or phrases in two texts are equivalent in meaning.

With wordMAP you can create amazing translation tools that:

Ensure all terms and phrases in the primary text have a proper translation in the secondary text.
Provide in-context vocabulary suggestions to the translator.
Help prevent inconsistencies in the translation.
Pre-translate text.

Installation

yarn add wordmap

Usage

Here's a minimal setup example.

// Assumes the class is the package's default export.
import WordMAP from "wordmap";

const map = new WordMAP();
map.appendAlignmentMemoryString("Tag", "day");
const source = "Guten Tag";
const target = "Good morning";
const suggestions = map.predict(source, target);
console.log(suggestions[0].toString());
// produces -> "0 [0|n:guten->n:good] [0|n:tag->n:morning]"

Use Cases

Aligning a primary text with a secondary text, e.g. when generating word maps for gateway languages.
Aligning a secondary text with a tertiary text.
Aligning a primary text to a tertiary text (using the secondary as a proxy).
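
The proxy use case can be pictured as composing two alignments. The sketch below is only an illustration of that idea, not wordMAP's implementation: `AlignmentPair` and `composeAlignments` are hypothetical names, and an alignment is assumed to be a plain list of token index pairs.

```typescript
// Hypothetical types and helper for illustration; not part of the wordMAP API.
// An alignment is assumed to be a list of [fromTokenIndex, toTokenIndex] pairs.
type AlignmentPair = [number, number];

// Chains primary->secondary and secondary->third-text alignments into a
// direct primary->third-text alignment, using the secondary text as a proxy.
function composeAlignments(
  primaryToSecondary: AlignmentPair[],
  secondaryToThird: AlignmentPair[]
): AlignmentPair[] {
  // Index the second alignment by its secondary-text token.
  const bySecondary = new Map<number, number[]>();
  for (const [secondary, third] of secondaryToThird) {
    const targets = bySecondary.get(secondary) ?? [];
    targets.push(third);
    bySecondary.set(secondary, targets);
  }

  const result: AlignmentPair[] = [];
  const seen = new Set<string>();
  for (const [primary, secondary] of primaryToSecondary) {
    for (const third of bySecondary.get(secondary) ?? []) {
      const key = `${primary}:${third}`;
      if (!seen.has(key)) {
        // Skip duplicates produced by many-to-many alignments.
        seen.add(key);
        result.push([primary, third]);
      }
    }
  }
  return result;
}
```

Primary tokens whose secondary counterpart has no alignment in the second list simply drop out, so the composed result is only as complete as the weaker of the two input alignments.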

The Need

Existing tools require large data sets and complex runtime environments, and are usually limited to running on a server.

We need a tool that:

runs on the client with minimal configuration.
works with existing web browser technology.
integrates with translationCore and related tools.
works without an Internet connection.
does not have a minimum corpus size.
requires minimal system resources.

Learn more

Want to learn more? Read WHITEPAPER.md.

Development

When publishing to npm, be sure to use the command yarn deploy. This will publish the proper module structure to npm.


wordmap's Issues

Word occurrence is suggested in the wrong order when aligned to dissimilar ngrams.

This is different from the issue fixed in #49.

In this case the source tokens are not similar.

EDIT: see better screenshot here.

Improvements for aligning Greek to Greek.

Some resources are aligned to different Greek texts. Since translationCore is tied to a specific Greek text, this renders those other resources useless. Aligning the different Greek sources together could provide a "key" that would enable resources based on a different Greek text to be used inside of translationCore.

Converting resources using these "keys" could be performed in a separate CLI (not wordMAP).

translationCore uses the UGNT Greek text.
We can start by testing with aligning the Westcott-Hort Greek text to the UGNT.

The first step will be to collect a sample of the WH text.
Potential resources:

Brainstorming Session

These are the results of our brainstorm in August.

wordMAP 2.0 Planning

This is a placeholder issue for planning out potential features we'd like to see in a 2.0 version. Later we can split things up into separate issues.

Improve suggestions when competing alignment memory is provided

wordMAP does not seem to handle competing alignment memory very well. One alignment memory seems to always win regardless of the alignment memory distribution.

From Larry:

It seems to me the problem is that in many places the Greek article is not present, so we need to add in the English article to make the grammar work. Therefore, the alignment memory has places where both the and God align to θεός, and places where the aligns to the Greek article ὁ. It seems the memory is trying to double up in this example, and is scavenging the text to find another the to align with ὁ since it has already aligned the closest one to θεός.
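
One way to think about "alignment memory distribution" is as the relative frequency of each target for a given source word. The sketch below is a hypothetical illustration of that idea only; `MemoryEntry` and `targetDistribution` are made-up names, and this is not how wordMAP currently scores competing memory.

```typescript
// Hypothetical shape for a saved alignment; not the wordMAP data model.
interface MemoryEntry {
  source: string;
  target: string;
}

// Returns, for one source word, the share of memory entries that point to
// each target. Competing targets keep a score proportional to how often
// they occur, instead of one target always winning outright.
function targetDistribution(
  memory: MemoryEntry[],
  source: string
): Map<string, number> {
  const counts = new Map<string, number>();
  let total = 0;
  for (const entry of memory) {
    if (entry.source !== source) continue;
    counts.set(entry.target, (counts.get(entry.target) ?? 0) + 1);
    total += 1;
  }

  const distribution = new Map<string, number>();
  counts.forEach((count, target) => {
    distribution.set(target, count / total);
  });
  return distribution;
}

// Example memory echoing the case described above: θεός is usually aligned
// to "God" alone, sometimes to "the God", and ὁ is aligned to "the".
const exampleMemory: MemoryEntry[] = [
  { source: "θεός", target: "God" },
  { source: "θεός", target: "God" },
  { source: "θεός", target: "God" },
  { source: "θεός", target: "the God" },
  { source: "ὁ", target: "the" },
];
const theosDistribution = targetDistribution(exampleMemory, "θεός");
```

A predictor could use such shares as one weighted signal among others, so that a minority alignment like "the God" stays available as a lower-ranked suggestion.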

Alignment output idea.

We need to support alignments across verses.
Here are three possible solutions.

{
  "confidence": 0.516905944153279,
  "sourceNgram": [0],
  "targetNgram": [10, { // add an object
    "position": 0,
    "verse": 2,
    "chapter": 3
  }],
  // or separate object
  "versification": {
    "target": {
      "nextVerseId": 1
    },
    "source": {}
  }
},

Or we can keep the numeric ids and encode the reference in them, e.g. 11001 for chapter 11, verse 1.
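
The numeric id option could look something like the sketch below. It assumes three digits are reserved for the verse (so fewer than 1000 verses per chapter), which matches the 11001 example but is otherwise a guess; `VerseRef`, `encodeVerseId`, and `decodeVerseId` are hypothetical names, not part of wordMAP.

```typescript
// Hypothetical helpers illustrating the numeric id idea; not part of wordMAP.
interface VerseRef {
  chapter: number;
  verse: number;
}

// Assumption: the last three digits hold the verse number.
const VERSE_SLOTS = 1000;

// Packs a chapter and verse into a single number, e.g. 11:1 -> 11001.
function encodeVerseId(ref: VerseRef): number {
  const valid =
    Number.isInteger(ref.chapter) &&
    Number.isInteger(ref.verse) &&
    ref.chapter >= 1 &&
    ref.verse >= 0 &&
    ref.verse < VERSE_SLOTS;
  if (!valid) {
    throw new RangeError(`Invalid reference ${ref.chapter}:${ref.verse}`);
  }
  return ref.chapter * VERSE_SLOTS + ref.verse;
}

// Unpacks a numeric id back into its chapter and verse.
function decodeVerseId(id: number): VerseRef {
  return {
    chapter: Math.floor(id / VERSE_SLOTS),
    verse: id % VERSE_SLOTS,
  };
}
```

Keeping the ids numeric leaves the existing ngram arrays unchanged in shape, at the cost of baking a fixed verse limit into the format.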