atul-ai08 / lingo-blend

This repository contains Python implementations for processing multilingual text data. It addresses two distinct tasks, language classification and English translation, which differ in the complexity of the text processing involved.

License: MIT License

Languages: Jupyter Notebook 100.00%
Topics: ai4bharat, muril, nllb200, translation

Lingo Blend


This repository contains Python implementations of language classification and English translation, designed specifically for Bengali, Hindi, Punjabi, Tamil, and Telugu.

Overview

  • Frameworks used: PyTorch, HuggingFace Transformers
  • Languages: Bengali, Hindi, Punjabi, Tamil, Telugu

Language Classification

Language classification uses MuRIL (Multilingual Representations for Indian Languages), a multilingual representation model developed specifically for Indian languages. The input text is passed through MuRIL to obtain embeddings, which are then fed to a dense layer that predicts a probability for each language. MuRIL is designed to handle code-mixed text, which makes it well suited to Romanized code-mixed sentences.
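The dense-layer step can be illustrated without any model download. The following is a minimal, dependency-free sketch, not the repository's code: it assumes a sentence embedding has already been produced by the encoder, and the names `dense`, `softmax`, `classify`, `TOY_WEIGHTS`, and `TOY_BIAS` are hypothetical. The weights are made up for illustration; in practice they would be learned, and the layer would typically be a PyTorch module on top of the encoder output.

```python
import math

# The five target languages, in the order of the dense layer's output units.
LANGUAGES = ["Bengali", "Hindi", "Punjabi", "Tamil", "Telugu"]


def dense(embedding, weights, bias):
    """One linear layer: logits[k] = sum_j weights[k][j] * embedding[j] + bias[k]."""
    return [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, bias)
    ]


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def classify(embedding, weights, bias):
    """Return the most probable language and the full probability vector."""
    probs = softmax(dense(embedding, weights, bias))
    best = max(range(len(probs)), key=probs.__getitem__)
    return LANGUAGES[best], probs


# Toy 3-dimensional embedding and made-up weights (assumptions for illustration;
# a real sentence embedding has far more dimensions and the weights are trained).
TOY_WEIGHTS = [
    [1.0, 0.0, 0.0],    # Bengali
    [0.0, 1.0, 0.0],    # Hindi
    [0.0, 0.0, 1.0],    # Punjabi
    [-1.0, -1.0, 0.0],  # Tamil
    [0.0, -1.0, -1.0],  # Telugu
]
TOY_BIAS = [0.0, 0.0, 0.0, 0.0, 0.0]

label, probs = classify([0.1, 2.0, 0.3], TOY_WEIGHTS, TOY_BIAS)
print(label, [round(p, 3) for p in probs])
```

The softmax output sums to one, so each entry can be read as the predicted probability of the corresponding language.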

English Translation

Translation proceeds in three steps: identifying the language, transliterating the text into its native script, and translating it into English. We identify the language with MuRIL, as described in the language classification task. We then transliterate the text into its native script using IndicXlit, a Python-based multilingual transliteration model developed by AI4Bharat. Finally, we translate the transliterated text into English using a distilled version of the NLLB-200 (No Language Left Behind) model, which is intended primarily for machine translation research, especially for low-resource languages.
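The three steps can be sketched as a small pipeline. This is an illustrative outline under stated assumptions, not the repository's implementation: `translate_to_english` and the `stub_*` functions are hypothetical names, and the stubs return hard-coded strings in place of the MuRIL classifier, IndicXlit, and NLLB-200. The language codes are the FLORES-200 codes that NLLB-200 uses to name source and target languages.

```python
# FLORES-200 language codes used by NLLB-200 for the five supported languages.
NLLB_CODES = {
    "Bengali": "ben_Beng",
    "Hindi": "hin_Deva",
    "Punjabi": "pan_Guru",
    "Tamil": "tam_Taml",
    "Telugu": "tel_Telu",
}
ENGLISH = "eng_Latn"


def translate_to_english(text, identify, transliterate, translate):
    """Run identify -> transliterate -> translate on Romanized input text.

    The three callables are injected so that real models can be swapped in:
      identify(text) -> language name
      transliterate(text, language) -> text in the native script
      translate(text, src_code, tgt_code) -> translated text
    """
    language = identify(text)
    if language not in NLLB_CODES:
        raise ValueError(f"Unsupported language: {language!r}")
    native = transliterate(text, language)
    return translate(native, NLLB_CODES[language], ENGLISH)


# Hard-coded stand-ins for the three models (assumptions for illustration only).
def stub_identify(text):
    return "Hindi"


def stub_transliterate(text, language):
    return "मेरा नाम अतुल है"


def stub_translate(text, src_code, tgt_code):
    return f"[{src_code}->{tgt_code}] My name is Atul"


result = translate_to_english(
    "mera naam Atul hai", stub_identify, stub_transliterate, stub_translate
)
print(result)
```

Passing the models in as callables keeps the control flow testable on its own; the identified language decides both the transliteration target script and the NLLB-200 source code.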

Acknowledgments

Feel free to contribute to the development and improvement of this project! Your contributions are valuable in advancing multilingual text processing techniques.
