Giter Site home page Giter Site logo

nmt-project's Introduction

NMT-Project

Neural Machine Translation (NMT) project for EECS 595 Course.

What is NMT (Neural Machine Translation)

Machine translation represents a cornerstone in the field of Natural Language Processing (NLP), embodying the sophisticated endeavor of translating text or speech from one language to another. This process not only requires accurate transposition of words but also demands a deep understanding of the contextual, idiomatic, and pragmatic nuances inherent to each language. Historically, machine translation has navigated through various phases, initially leveraging rule-based and probabilistic machine learning approaches to surmount the linguistic barriers. However, the recent surge in machine learning advancements has ushered in the era of neural networks, revolutionizing machine translation with their capacity to deliver remarkably superior performance. This project delves into the realm of neural machine translation (NMT), a testament to the strides made in NLP, showcasing its potential as both a formidable tool in future industrial opportunities and a foundational pilot study for expansive research endeavors. Through this exploration, we aim to not only highlight the technical prowess of current NMT systems but also to pave the way for further innovations in overcoming the challenges of cross-lingual communication.

About this Repo

The project is based on the python (3.9.15) and Pytorch (>=2.0.1).

Hugging Face is also used for text tokenization. To use the related function, one has to install the following packages

pip install transformers sentencepiece sacremoses evaluate

The encoder-decoder architecture is used for machine translation tasks.

Different ML algorithms are tried to achieve a better performance and also for us to understand their differences.

  • RNN-based models like LSTM, GRU
  • Transformer model

How to run the Repo

Just go to the jupyter notebook NMT.ipynb. follow the instructions inside the notebook. All used data for the en-fr and en-zh translation models are stored in the data folder. You can have your own dataset, just need to modify the dataset.py file to read them.

nmt-project's People

Contributors

umichyujl avatar zmangosix avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.