Giter Site home page Giter Site logo

bidi-js's Introduction

bidi-js

This is a pure JavaScript implementation of the Unicode Bidirectional Algorithm version 13.0.0. Its goals, in no particular order, are to be:

  • Correct
  • Small
  • Fast

Conformance

This implementation currently conforms to section UAX-C1 of the bidi spec, as verified by running all the provided conformance tests.

Compatibility

It has no external dependencies and therefore should run just fine in any relatively capable web browser, Node.js, etc. The provided distribution .js files are valid ES5.

Usage

Install it from npm:

npm install bidi-js

NPM

Import and initialize:

import bidiFactory from 'bidi-js'
// or: const bidiFactory = require('bidi-js')

const bidi = bidiFactory()

The bidi-js package's only export is a factory function which you must invoke to return a bidi object; that object exposes the methods for bidi processing.

(Why a factory function? The main reason is to ensure the entire module's code is wrapped within a single self-contained function with no closure dependencies. This enables that function to be stringified and passed into a web worker, for example.)

Now that you have the bidi object, you can:

Calculate bidi embedding levels

const embeddingLevels = bidi.getEmbeddingLevels(
  text, //the input string containing mixed-direction text
  explicitDirection //"ltr" or "rtl" if you don't want to auto-detect it
)

const { levels, paragraphs } = embeddingLevels

The result object embeddingLevels will usually be passed to other functions described below. Its contents, should you need to inspect them individually, are:

  • levels is a Uint8Array holding the calculated bidi embedding levels for each character in the string. The most important thing to know about these levels is that any given character is in a right-to-left scope if its embedding level is an odd number, and left-to-right if it's an even number.

  • paragraphs is an array of {start, end, level} objects, one for each paragraph in the text (paragraphs are separated by explicit breaking characters, not soft line wrapping). The start and end indices are inclusive, and level is the resolved base embedding level of that paragraph.

Calculate character reorderings

const flips = bidi.getReorderSegments(
  text, //the full input string
  embeddingLevels //the full result object from getEmbeddingLevels
)

// Process all reversal sequences, in order:
flips.forEach(range => {
  const [start, end] = range
  // Reverse this sequence of characters from start to end, inclusive
  for (let i = start; i <= end; i++) {
    //...
  }
})

Each "flip" is a range that should be reversed in place; they must all be applied in order.

Sometimes you don't want to process the whole string at once, but just a particular substring. A common example would be if you've applied line wrapping, in which case you need to process each line individually (in particular this does some special handling for trailing whitespace for each line). For this you can pass the extra start and end parameters:

yourWrappedLines.forEach(([lineStart, lineEnd]) => {
  const flips = bidi.getReorderSegments(
    text,
    embeddingLevels,
    lineStart,
    lineEnd //inclusive
  )
  // ...process flips for this line
})

Handle right-to-left mirrored characters

Some characters that resolve to right-to-left need to be swapped with their "mirrored" characters. Examples of this are opening/closing parentheses. You can determine all the characters that need to be mirrored like so:

const mirrored = bidi.getMirroredCharactersMap(
  text,
  embeddingLevels
)

This returns a Map of numeric character indices to replacement characters.

You can also process just a substring with extra start and end parameters:

const mirrored = bidi.getMirroredCharactersMap(
  text,
  embeddingLevels,
  start,
  end //inclusive
)

If you'd rather process mirrored characters individually, you can use the single getMirroredCharacter function, just make sure you only do it for right-to-left characters (those whose embedding level is an odd number.) It will return null if the character doesn't support mirroring.

const mirroredChar = (embeddingLevels.levels[charIndex] & 1) //odd number means RTL
    ? bidi.getMirroredCharacter(text[charIndex])
    : null

Get a character's bidi type

This is used internally, but you can also ask for the "bidi character type" of any character, should you need it:

const bidiType = bidi.getBidiCharTypeName(string[charIndex])
// e.g. "L", "R", "AL", "NSM", ...

bidi-js's People

Contributors

lojjic avatar talltyler avatar nmtigor avatar gztchan avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.