kirandevraj / slimt

This project forked from jerinphilip/slimt


Inference slice of marian for bergamot's tiny11 models. Faster to compile and wield. Fewer model-archs than bergamot-translator.

License: GNU General Public License v2.0

Shell 2.42% C++ 77.89% Python 14.89% CMake 4.80%

slimt's Introduction

slimt

slimt (slɪm tiː) is an inference frontend for tiny models trained as part of the Bergamot project.

bergamot-translator builds on top of marian-dev and uses the inference code-path from marian-dev. While marian is a capable neural network library with a focus on machine translation, all the bells and whistles that come with it are not necessary to run inference on client machines (e.g. autograd, support for multiple sequence-to-sequence architectures, beam-search). For some use cases, like an input-method engine doing translation (see lemonade), single-threaded operation coexisting with other processes on the system suffices. This is the motivation for this transplant repository. There is not much novel here except ease of wielding. This repository is simply the tiny-model slice of marian. Code is reused where possible.

This effort is inspired by contemporary efforts like ggerganov/ggml and karpathy/llama2.c. The tiny models roughly follow the transformer architecture, with Simpler Simple Recurrent Units (SSRU) in the decoder. The same models are used in Mozilla Firefox's offline translation addon.

Both tiny and base models have 6 encoder-layers and 2 decoder-layers, and for most existing pairs a vocabulary size of 32000 (with tied embeddings). The following table briefly summarizes some architectural differences between tiny and base models:

Variant   emb   ffn    params   f32     i8
base      512   2048   39.0M    149MB   38MB
tiny      256   1536   15.7M    61MB    17MB

The i8 models, quantized to 8-bit and as small as 17MB, are used to provide translation for Mozilla Firefox's offline translation addon, among other things.
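As a rough sanity check, the sizes above track parameters × bytes-per-weight (4 bytes for f32, 1 byte for i8). The awk sketch below reproduces the ballpark; the on-disk files differ by a few MB because, for instance, quantized models carry scales and keep some tensors (such as biases) in higher precision.

# Back-of-the-envelope model sizes (in MiB) from the parameter counts above.
awk 'BEGIN {
    printf "base: f32 ~%.0f, i8 ~%.0f\n", 39.0e6 * 4 / 2^20, 39.0e6 / 2^20
    printf "tiny: f32 ~%.0f, i8 ~%.0f\n", 15.7e6 * 4 / 2^20, 15.7e6 / 2^20
}'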

More information on the models is described in the accompanying papers.

The large list of dependencies from bergamot-translator has currently been reduced to:

  • For int8_t matrix-multiply: intgemm (x86_64), ruy (aarch64), or xsimd via gemmology.
  • For vocabulary: sentencepiece.
  • For sentence-splitting using regular expressions: PCRE2.
  • For sgemm: whatever BLAS provider is found via CMake (openblas, intel-oneapimkl, cblas). Feel free to provide hints (an example follows the build steps below).
  • CLI11 (a dependency only for the command line).

Source code is made public at a stage where basic functionality (text translation) works for English-German tiny models. Parity in features and speed with marian and bergamot-translator (where relevant) is a work in progress. Eventual support for base models is planned. Contributions are welcome and appreciated.

Getting started

Clone with submodules.

git clone --recursive https://github.com/jerinphilip/slimt.git
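
If the repository was already cloned without --recursive, the submodules can be fetched afterwards:

git submodule update --init --recursive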

Configure and build. slimt is still experimenting with CMake and dependencies. The following, being prepared towards Linux distribution packaging, should work at the moment:

# Configure to use xsimd via gemmology
ARGS=(
    # Use gemmology
    -DWITH_GEMMOLOGY=ON

    # On x86_64 machines, use the following to enable faster matrix
    # multiplication backends using SIMD. All of these can co-exist; the best
    # one is dispatched to based on the CPU detected at runtime.
    -DUSE_AVX512=ON -DUSE_AVX2=ON -DUSE_SSSE3=ON -DUSE_SSE2=ON

    # For aarch64 or armv7+neon, uncomment the line below and comment out the
    # x86_64 flags above.
    # -DUSE_NEON=ON

    # Use sentencepiece installed via the system.
    -DUSE_BUILTIN_SENTENCEPIECE=OFF

    # Exports slimtConfig.cmake (cmake) and slimt.pc (pkg-config).
    -DSLIMT_PACKAGE=ON

    # Customize the installation prefix if need be.
    -DCMAKE_INSTALL_PREFIX=/usr/local
)

cmake -B build -S $PWD -DCMAKE_BUILD_TYPE=Release "${ARGS[@]}"
cmake --build build --target all

# Requires sudo since /usr/local is usually writable only by root.
sudo cmake --build build --target install
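
With -DSLIMT_PACKAGE=ON, the install step also places CMake and pkg-config metadata under the chosen prefix, so downstream builds can locate the library. A minimal check from the shell, assuming the generated pkg-config file is installed as slimt.pc under the /usr/local prefix used above:

# Query compile and link flags for the installed library.
pkg-config --cflags --libs slimt

# If the prefix's pkgconfig directory is not on pkg-config's default search
# path, point it there explicitly.
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig pkg-config --cflags --libs slimt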

The build expects the packages sentencepiece, xsimd, and a BLAS provider to come from the system's package manager. Examples in distributions include:

# Debian based systems
sudo apt-get install -y libxsimd-dev libsentencepiece-dev libopenblas-dev

# ArchLinux
pacman -S openblas xsimd
yay -S sentencepiece-git
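
If CMake picks up a different BLAS provider than intended, a vendor hint can be passed at configure time. A sketch, assuming slimt resolves BLAS through CMake's standard FindBLAS module (whose vendor hint is BLA_VENDOR):

# Re-run the configure step, preferring OpenBLAS explicitly.
cmake -B build -S $PWD -DCMAKE_BUILD_TYPE=Release -DBLA_VENDOR=OpenBLAS "${ARGS[@]}"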

A successful build generates two executables, slimt-cli and slimt-test, for command-line usage and testing respectively.

build/bin/slimt-cli                           \
    --root <path/to/folder>                   \
    --model </relative/path/to/model>         \
    --vocabulary </relative/path/to/vocab>    \
    --shortlist </relative/path/to/shortlist>

build/slimt-test <test-name>

This is still very much a work in progress towards making lemonade available in distributions. Help is much appreciated; please get in touch if you can contribute.

Python

Python bindings to the C++ code are available. They provide a layer to download models and use them via the command-line entrypoint slimt (the core slimt library contains only the inference code).

python3 -m venv env
source env/bin/activate
python3 -m pip install wheel
python3 setup.py bdist_wheel
python3 -m pip install dist/<wheel-name>.whl

# Download en-de-tiny and de-en-tiny models.
slimt download -m en-de-tiny
slimt download -m de-en-tiny

Find an example of the built wheel running on Colab below:

Open In Colab

You may pass customizing CMake variables via the CMAKE_ARGS environment variable.

CMAKE_ARGS='-D...' python3 setup.py bdist_wheel
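
For example, the x86_64 backend flags from the native build above can be forwarded to the wheel build the same way (swap in -DUSE_NEON=ON on aarch64):

CMAKE_ARGS='-DWITH_GEMMOLOGY=ON -DUSE_AVX512=ON -DUSE_AVX2=ON -DUSE_SSSE3=ON -DUSE_SSE2=ON' python3 setup.py bdist_wheel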

slimt's People

Contributors

georg3tom, jerinphilip, kirandevraj, sivaprasad2000, sphanit
