wordMAP

Multi-Lingual Word Alignment Prediction

Word alignment prediction is the process of associating (mapping) words from some primary text with corresponding words in a secondary text. This tool uses statistical algorithms to determine which words or phrases in two texts are equivalent in meaning.

With wordMAP you can create amazing translation tools that:

  • Ensure all terms and phrases in the primary text have a proper translation in the secondary text.
  • Provide in-context vocabulary suggestions to the translator.
  • Help prevent inconsistencies in the translation.
  • Pre-translate text.

Installation

yarn add wordmap

Usage

Here's a minimal setup example.

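import WordMAP from "wordmap"; // assumes the package's default export is the WordMAP class

const map = new WordMAP();
// Seed the alignment memory with a known source -> target pair.
map.appendAlignmentMemoryString("Tag", "day");

const source = "Guten Tag";
const target = "Good morning";
// Predict alignments between the source and target sentences.
const suggestions = map.predict(source, target);
console.log(suggestions[0].toString());
// produces -> "0 [0|n:guten->n:good] [0|n:tag->n:morning]"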

Use Cases

  • Aligning a primary text with a secondary text, e.g. when generating word maps for gateway languages.
  • Aligning a secondary text with a tertiary text.
  • Aligning a primary text to a tertiary text (using the secondary as a proxy).
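
The proxy use case amounts to composing two alignments. The sketch below is only an illustration of that idea, not wordMAP's API: the `Alignment` type and `composeAlignments` function are hypothetical, and an alignment is assumed to be a list where entry `i` holds the target token positions aligned to source token `i`.

```typescript
// Hypothetical representation: alignment[i] lists the target token
// positions aligned to source token i.
type Alignment = number[][];

// Chain primary -> secondary and secondary -> tertiary alignments
// into a primary -> tertiary alignment.
function composeAlignments(
  primaryToSecondary: Alignment,
  secondaryToTertiary: Alignment
): Alignment {
  return primaryToSecondary.map((secondaryIdxs) => {
    const tertiaryIdxs: number[] = [];
    for (const s of secondaryIdxs) {
      // A secondary token with no onward alignment contributes nothing.
      for (const t of secondaryToTertiary[s] || []) {
        if (tertiaryIdxs.indexOf(t) === -1) {
          tertiaryIdxs.push(t);
        }
      }
    }
    return tertiaryIdxs.sort((a, b) => a - b);
  });
}

// Primary token 0 aligns to secondary token 0; primary token 1 aligns
// to secondary tokens 1 and 2.
const primaryToSecondary: Alignment = [[0], [1, 2]];
// Secondary tokens 0, 1, 2 align to tertiary tokens 1, 0, 2.
const secondaryToTertiary: Alignment = [[1], [0], [2]];

const primaryToTertiary = composeAlignments(primaryToSecondary, secondaryToTertiary);
console.log(JSON.stringify(primaryToTertiary));
```

A primary token whose secondary counterparts have no alignment in the second step ends up with an empty list, which makes gaps introduced by the proxy easy to spot.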

The Need

Existing tools require large data sets and complex running environments, and are usually limited to running on a server.

We need a tool that:

  • runs on the client with minimal configuration.
  • works with existing web browser technology.
  • integrates with translationCore and related tools.
  • works without an Internet connection.
  • does not have a minimum corpus size.
  • requires minimal system resources.

Learn more

Want to learn more? Read WHITEPAPER.md.

Development

When publishing to npm, be sure to use the command yarn deploy. This will publish the proper module structure to npm.

Contributors

bspidel, da1nerd, dependabot-support, dependabot[bot], jag3773, klappy, mannycolon, photonomad0

wordmap's Issues

Improvements for aligning Greek to Greek

Some resources are aligned to different Greek texts. Since translationCore is tied to a specific Greek text, this renders those other resources useless. Aligning the different Greek sources together could provide a "key" that would enable resources based on a different Greek text to be used inside of translationCore.

Converting resources using these "keys" could be performed in a separate CLI (not wordMAP).

translationCore uses the UGNT Greek text.
We can start by testing with aligning the Westcott-Hort Greek text to the UGNT.

The first step will be to collect a sample of the WH text.
Potential resources:

wordMAP 2.0 Planning

This is a placeholder issue for planning out potential features we'd like to see in a 2.0 version. Later we can split things up into separate issues.


Improve suggestions when competing alignment memory is provided

wordMAP does not seem to handle competing alignment memory very well. One alignment memory seems to always win regardless of the alignment memory distribution.

From Larry:

It seems to me the problem is that in many places the Greek article is not present, so we need to add in the English article to make the grammar work. Therefore, the alignment memory has places where both "the" and "God" align to θεός, and places where "the" aligns to the Greek article. It seems the memory is trying to double up in this example, and is scavenging the text to find another "the" to align with since it has already aligned the closest one to θεός.
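
To make "alignment memory distribution" concrete, here is a minimal sketch that computes how often each target phrase appears for a given source word. It is not how wordMAP scores suggestions; `MemoryEntry` and `targetDistribution` are hypothetical names, and the sample entries are made up for illustration.

```typescript
// Hypothetical shape for one alignment memory entry.
interface MemoryEntry {
  source: string;
  target: string;
}

// Relative frequency of each target phrase aligned to `source`.
function targetDistribution(
  memory: MemoryEntry[],
  source: string
): Record<string, number> {
  const counts: Record<string, number> = {};
  let total = 0;
  for (const entry of memory) {
    if (entry.source === source) {
      counts[entry.target] = (counts[entry.target] || 0) + 1;
      total++;
    }
  }
  const distribution: Record<string, number> = {};
  for (const target of Object.keys(counts)) {
    distribution[target] = (counts[target] || 0) / total;
  }
  return distribution;
}

// Made-up memory: θεός aligns to "God" three times and to "the God" once.
const sampleMemory: MemoryEntry[] = [
  { source: "θεός", target: "God" },
  { source: "θεός", target: "God" },
  { source: "θεός", target: "the God" },
  { source: "θεός", target: "God" },
];

const theosDistribution = targetDistribution(sampleMemory, "θεός");
console.log(theosDistribution);
```

If competing memory were weighted this way, the less frequent alignment would still receive a proportional share instead of one entry always winning.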

Alignment output idea

We need to support alignments across verses.
Here are three possible solutions.

{
  "confidence": 0.516905944153279,
  "sourceNgram": [0],
  "targetNgram": [10, { // add an object
    "position": 0,
    "verse": 2,
    "chapter": 3
  }],
  // or separate object
  "versification": {
    "target": {
      "nextVerseId": 1
    },
    "source": {}
  }
},

Alternatively, we can keep the numeric ids and encode the reference in the number itself, e.g. 11001 for chapter 11, verse 1.
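
The inline-object and numeric-id options could be modelled roughly as follows. This is only a sketch: `VerseRef`, `CrossVerseToken`, `resolveToken`, `encodeVerseId`, and `decodeVerseId` are hypothetical names, not part of wordMAP, and the three-digit verse field is an assumption inferred from the 11001 example.

```typescript
// Hypothetical types; none of these names come from wordMAP.
interface VerseRef {
  chapter: number;
  verse: number;
}

interface CrossVerseToken extends VerseRef {
  position: number;
}

// An n-gram entry is either a plain position in the current verse
// or an object pointing into another verse.
type NgramToken = number | CrossVerseToken;

// Expand a token into a full reference, defaulting to the current verse.
function resolveToken(token: NgramToken, current: VerseRef): CrossVerseToken {
  if (typeof token === "number") {
    return { position: token, chapter: current.chapter, verse: current.verse };
  }
  return token;
}

// Pack chapter and verse into one number, assuming verse < 1000.
function encodeVerseId(ref: VerseRef): number {
  if (!Number.isInteger(ref.verse) || ref.verse < 0 || ref.verse > 999) {
    throw new RangeError("verse must be an integer between 0 and 999");
  }
  return ref.chapter * 1000 + ref.verse;
}

// Reverse of encodeVerseId.
function decodeVerseId(id: number): VerseRef {
  return { chapter: Math.floor(id / 1000), verse: id % 1000 };
}

const currentVerse: VerseRef = { chapter: 3, verse: 1 };
console.log(resolveToken(10, currentVerse));
console.log(resolveToken({ position: 0, verse: 2, chapter: 3 }, currentVerse));
console.log(encodeVerseId({ chapter: 11, verse: 1 }));
console.log(decodeVerseId(11001));
```

The numeric form keeps the n-gram arrays flat, at the cost of a fixed upper bound on verse numbers.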
