luabidi

Lua implementation of the Unicode Bidirectional Algorithm, as specified in UAX #9.

Installing

luarocks install luabidi

Documentation

Quickstart

local bidi = require('bidi')
local serpent  = require('serpent') -- luarocks install serpent

local text = {0x06CC, 0x06C1} -- "یہ" U+06CC U+06C1

local reordered_text = bidi.get_visual_reordering(text)

-- hex representation
for i,v in ipairs(reordered_text) do
  reordered_text[i] = string.format("U+%04X", v)
end

for i,v in ipairs(text) do
  text[i] = string.format("U+%04X", v)
end

print("Original codepoints (in logical order): " .. serpent.line(text,{comment = false}))
print("Visual reordering: " .. serpent.line(reordered_text,{comment = false})) -- should be { "U+06C1", "U+06CC"}

More sample code in the examples folder.
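The Quickstart passes code points as a Lua table. When the text starts out as a UTF-8 string, it has to be decoded first. The following is a minimal sketch and not part of luabidi: `utf8_to_codepoints` is a hypothetical helper that does only basic validation, and the call to `bidi.get_visual_reordering` is wrapped in `pcall` so the snippet also runs when luabidi is not installed.

```lua
-- Hypothetical helper (not part of luabidi): decode a UTF-8 string into a
-- table of code points. Uses plain arithmetic so it does not depend on the
-- Lua 5.3 utf8 library.
local function utf8_to_codepoints(s)
  local cps, i, n = {}, 1, #s
  while i <= n do
    local b = s:byte(i)
    local cp, len
    if b < 0x80 then
      cp, len = b, 1
    elseif b >= 0xC0 and b < 0xE0 then
      cp, len = b - 0xC0, 2
    elseif b >= 0xE0 and b < 0xF0 then
      cp, len = b - 0xE0, 3
    elseif b >= 0xF0 and b < 0xF8 then
      cp, len = b - 0xF0, 4
    else
      error("invalid UTF-8 lead byte at position " .. i)
    end
    -- Fold in the continuation bytes, six payload bits each.
    for j = i + 1, i + len - 1 do
      local c = s:byte(j)
      if not c or c < 0x80 or c >= 0xC0 then
        error("invalid UTF-8 continuation byte at position " .. j)
      end
      cp = cp * 64 + (c - 0x80)
    end
    cps[#cps + 1] = cp
    i = i + len
  end
  return cps
end

-- The two characters from the Quickstart (U+06CC U+06C1), written as UTF-8 bytes.
local cps = utf8_to_codepoints("\219\140\219\129")

-- Only call into luabidi if it is actually installed.
local ok, bidi = pcall(require, 'bidi')
if ok then
  local visual = bidi.get_visual_reordering(cps)
  for _, v in ipairs(visual) do
    io.write(string.format("U+%04X ", v))
  end
  io.write("\n")
end
```

The decoder does not reject overlong encodings or surrogates, so untrusted input should be validated separately.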

Development

Building

luarocks make

Testing and Linting

In order to make changes to the code and run the tests, the following dependencies need to be installed:

  • Busted: luarocks install busted
  • luacheck: luarocks install luacheck

Run the test suite:

make spec

Lint the codebase:

make lint

Contact

Open a GitHub issue, or email me at [email protected].

luabidi's People

Contributors

deepakjois

luabidi's Issues

Incorporate ucdn inside this repo

It might simplify things a bit to just have the ucdn code inside the repo itself. Optionally, we can continue to maintain luaucdn for others to consume as a dependency.

Update implementation to Unicode 8.0

The Java reference implementation has not been updated to Unicode 8.0 yet, but the C implementation has.

Here are the differences in the source file: https://gist.github.com/deepakjois/5a3ae81a105abd3523ed0efe2e52f52e/revisions

Here are the test cases that are currently failing:

################################################################################
# Test cases for the algorithm changes and clarifications made in Unicode 8.0
#
## Explicit directional overrides applied to isolates tightly flanked by embeddings
#202E 0061 202A 0062 202C 2066 0063 2069 202A 0064 202C 0065 202C;2;0;x 1 x 2 x 1 2 1 x 2 x 1 x;11 9 7 6 5 3 1
#202E 0061 202A 0062 202C 2066 0063 2069 202A 0064 202C 0065 202C;1;1;x 3 x 4 x 3 4 3 x 4 x 3 x;11 9 7 6 5 3 1
#202D 05D0 202B 05D1 202C 2068 05D2 2069 202B 05D3 202C 05D4 202C;2;1;x 2 x 3 x 2 3 2 x 3 x 2 x;1 3 5 6 7 9 11
#202D 0661 202B 0662 202C 2068 0663 2069 202B 0664 202C 0665 202C;0;0;x 2 x 4 x 2 6 2 x 4 x 2 x;1 3 5 6 7 9 11
#
## Explicit directional overrides applied to paired brackets
#202A 05D0 0028 05D1 202C 202D 0029;2;1;x 3 3 3 x x 2;3 2 1 6
#202A 05D0 0028 05D1 202C 202D 0029 202C;2;1;x 3 3 3 x x 2 x;3 2 1 6
#202B 0061 0028 0062 202C 202E 0029;2;0;x 2 2 2 x x 1;6 1 2 3
#202B 0061 0028 0062 202C 202E 0029 202C;2;0;x 2 2 2 x x 1 x;6 1 2 3
#202A 202E 0061 202C 0028 05D0 202C 202D 0029 202C;2;0;x x 3 x 3 3 x x 2 x;5 4 2 8
#202B 202D 05D0 202C 0028 0061 202C 202E 0029 202C;2;1;x x 4 x 4 4 x x 3 x;8 2 4 5
#202A 202E 0061 202C 0028 005B 05D0 202C 202D 005D 0029 202C;2;0;x x 3 x 3 3 3 x x 2 2 x;6 5 4 2 9 10
#202B 202D 05D0 202C 0028 005B 0061 202C 202E 005D 0029 202C;2;1;x x 4 x 4 4 4 x x 3 3 x;10 9 2 4 5 6
#202D 0028 202C 202A 05D0 0029 05D1;2;1;x 2 x x 3 3 3;1 6 5 4
#202D 0028 202C 202A 05D0 0029 05D1 202C;2;1;x 2 x x 3 3 3 x;1 6 5 4
#202E 0028 202C 202B 0061 0029 0062;2;0;x 1 x x 2 2 2;4 5 6 1
#202E 0028 202C 202B 0061 0029 0062 202C;2;0;x 1 x x 2 2 2 x;4 5 6 1
#202D 202E 0061 202C 0028 202C 202A 05D0 0029 05D1;2;0;x x 3 x 2 x x 3 3 3;2 4 9 8 7
#202E 202D 05D0 202C 0028 202C 202B 0061 0029 0062;2;1;x x 4 x 3 x x 4 4 4;7 8 9 4 2
#202D 202E 0061 202C 0028 005B 202C 202A 05D0 005D 0029 05D1;2;0;x x 3 x 2 2 x x 3 3 3 3;2 4 5 11 10 9 8
#202E 202D 05D0 202C 0028 005B 202C 202B 0061 005D 0029 0062;2;1;x x 4 x 3 3 x x 4 4 4 4;8 9 10 11 5 4 2
#
## Nonspacing marks applied to paired brackets
#0061 0028 0062 0029 0331;1;1;2 2 2 2 2;0 1 2 3 4
#0061 0028 0332 0062 0029 0333;1;1;2 2 2 2 2 2;0 1 2 3 4 5
#05D0 0028 05D1 0029 0331;0;0;1 1 1 1 1;4 3 2 1 0
#05D0 0028 0332 05D1 0029 0333;0;0;1 1 1 1 1 1;5 4 3 2 1 0
#0661 0028 0662 0029 0331;0;0;2 1 2 1 1;4 3 2 1 0
#0661 0028 0332 0662 0029 0333;0;0;2 1 1 2 1 1;5 4 3 2 1 0
#
## Nested bracket pairs that reach and exceed the fixed capacity of the bracket stack
## a ( ( ... ( b ) ) ... ) with 62, 63, and 64 nested bracket pairs
#0061 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0062 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029;1;1;2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2;0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125
#0061 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0062 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029;1;1;2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2;0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
#0061 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0028 0062 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029 0029;1;1;2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1;129 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65
#
################################################################################
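Each test case above is one line of BidiCharacterTest.txt with five semicolon-separated fields: the input code points, the paragraph direction (0 = LTR, 1 = RTL, 2 = auto-LTR), the resolved paragraph level, the resolved level for each character (`x` for characters removed by rule X9), and the zero-based logical indices in visual order. The following is a minimal parsing sketch; `parse_test_line` and the field names in the returned table are hypothetical and not taken from luabidi's test suite.

```lua
-- Hypothetical helper: parse one BidiCharacterTest.txt line into a table.
-- Returns nil for comment lines and blank lines.
local function parse_test_line(line)
  if line:match("^%s*#") or line:match("^%s*$") then
    return nil
  end

  -- Split on ';' (a trailing ';' is appended so the last field is captured too).
  local fields = {}
  for field in (line .. ";"):gmatch("([^;]*);") do
    fields[#fields + 1] = field
  end
  if #fields < 5 then
    return nil, "expected 5 fields, got " .. #fields
  end

  local case = { codepoints = {}, levels = {}, order = {} }
  for hex in fields[1]:gmatch("%x+") do
    case.codepoints[#case.codepoints + 1] = tonumber(hex, 16)
  end
  case.paragraph_direction = tonumber(fields[2])
  case.paragraph_level = tonumber(fields[3])
  for tok in fields[4]:gmatch("%S+") do
    -- "x" marks a character with no resolved level; store false for it.
    case.levels[#case.levels + 1] = tonumber(tok) or false
  end
  for idx in fields[5]:gmatch("%d+") do
    case.order[#case.order + 1] = tonumber(idx)
  end
  return case
end

-- One of the paired-bracket cases above, with the leading '#' removed.
local case = parse_test_line(
  "202A 05D0 0028 05D1 202C 202D 0029;2;1;x 3 3 3 x x 2;3 2 1 6")
print(#case.codepoints, case.paragraph_level, #case.order)
```

Storing `false` rather than `nil` for `x` keeps the `levels` table a proper sequence, so `#case.levels` still equals the number of input code points.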

Perf improvements

Currently, the tests against the UCD files BidiTest.txt and BidiCharacterTest.txt take a very long time (a few hours). In Go, these tests complete in under 30 seconds.

This is very likely a problem with the test suite rather than with the library itself.
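One way to check that suspicion is to time the test harness and the algorithm separately. The following is a rough sketch using `os.clock` from the standard library; `timed`, `parse_line` and `run_algorithm` are hypothetical names, and the two stand-in functions would need to be replaced with the real parsing step and the real call into the library.

```lua
local totals = {}

-- Run fn(...) and add its CPU time to the running total for `phase`.
-- Only the first return value of fn is passed back.
local function timed(phase, fn, ...)
  local start = os.clock()
  local result = fn(...)
  totals[phase] = (totals[phase] or 0) + (os.clock() - start)
  return result
end

-- Placeholders standing in for the real work.
local function parse_line(line) return #line end
local function run_algorithm(n) return n * 2 end

for _ = 1, 1000 do
  local parsed = timed("parse", parse_line, "0061 0028;1;1;2 2;0 1")
  timed("algorithm", run_algorithm, parsed)
end

for phase, seconds in pairs(totals) do
  print(string.format("%-10s %.3f s", phase, seconds))
end
```

If most of the time lands in the parsing phase, the bottleneck is in the harness and not in the reordering code.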

Update implementation to deal with sequences containing paired brackets that have canonical equivalents

The relevant test cases are in this section of BidiCharacterTest.txt:

# Sequences containing paired brackets that have canonical equivalents
0061 0020 2329 0062 002E 0031 232A;1;1;2 2 2 2 2 2 2;0 1 2 3 4 5 6
0061 0020 3008 0062 002E 0031 3009;1;1;2 2 2 2 2 2 2;0 1 2 3 4 5 6
0061 0020 2329 0062 002E 0031 3009;1;1;2 2 2 2 2 2 2;0 1 2 3 4 5 6
0061 0020 3008 0062 002E 0031 232A;1;1;2 2 2 2 2 2 2;0 1 2 3 4 5 6
05D0 0020 2329 05D1 002E 0031 232A;0;0;1 1 1 1 1 2 1;6 5 4 3 2 1 0
05D0 0020 3008 05D1 002E 0031 3009;0;0;1 1 1 1 1 2 1;6 5 4 3 2 1 0
05D0 0020 2329 05D1 002E 0031 3009;0;0;1 1 1 1 1 2 1;6 5 4 3 2 1 0
05D0 0020 3008 05D1 002E 0031 232A;0;0;1 1 1 1 1 2 1;6 5 4 3 2 1 0
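These cases mix U+2329/U+232A with their canonical equivalents U+3008/U+3009 and expect the brackets to pair up either way. One way to handle this is to map each bracket to a canonical form before comparing. The following is a minimal sketch with assumptions: the `PAIRED` table is a small hand-written subset, not the full Bidi_Paired_Bracket data, and `brackets_match` is a hypothetical helper, not a luabidi function.

```lua
-- Hand-written subset of opening -> closing bracket pairs (illustrative only).
local PAIRED = {
  [0x0028] = 0x0029, -- ( )
  [0x005B] = 0x005D, -- [ ]
  [0x2329] = 0x232A, -- angle brackets (technical symbols block)
  [0x3008] = 0x3009, -- angle brackets (CJK block)
}

-- U+2329 and U+232A canonically decompose to U+3008 and U+3009.
local CANONICAL = {
  [0x2329] = 0x3008,
  [0x232A] = 0x3009,
}

local function canonical(cp)
  return CANONICAL[cp] or cp
end

-- True when close_cp closes open_cp, treating canonical equivalents as equal.
local function brackets_match(open_cp, close_cp)
  local expected = PAIRED[open_cp]
  if not expected then
    return false
  end
  return canonical(expected) == canonical(close_cp)
end

print(brackets_match(0x2329, 0x3009)) -- mixed pair from the third case above
print(brackets_match(0x0028, 0x005D)) -- mismatched bracket types
```

Normalizing at comparison time leaves the input code points untouched, so the reordered output still contains the characters that were passed in.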
