ryeeshudhurandhar / statistical-language-modeling-using-n-grams

The model predicts the most probable next word and outputs the correctness of an input English sentence using the trigram model.

License: MIT License

Jupyter Notebook 100.00%

Statistical-Language-Modeling-Using-N-Grams

Course Project for MA 202 [Probability and Statistics]


Abstract

Using Natural Language Processing, the model predicts the most probable next word and scores the correctness of an input English sentence. To maximize accuracy, a large, reliable corpus is extracted from Wikipedia, preprocessed, and analyzed before it is used to train the model. Analyzing and visualizing the dataset gives useful insight into the corpus before training. Choosing an appropriate model is a crucial step for any problem; in our case, a trigram model proved to be the best trade-off. The trained model is then used to predict the next word and to compute the perplexity of a given sentence.
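The two tasks described above, next-word prediction and sentence perplexity, can be illustrated with a minimal sketch. It is not the project's notebook code: the whitespace tokenizer, the add-one (Laplace) smoothing, the toy corpus, and the names `TrigramModel`, `predict_next`, and `perplexity` are all assumptions made for illustration, and the sketch uses only the standard library rather than nltk.

```python
import math
from collections import Counter, defaultdict


def tokenize(sentence):
    # Assumption: lowercase + whitespace split stands in for real preprocessing.
    return sentence.lower().split()


def pad(tokens):
    # Two start markers give the first real word a full trigram context.
    return ["<s>", "<s>"] + tokens + ["</s>"]


class TrigramModel:
    """Hypothetical count-based trigram model with add-one smoothing."""

    def __init__(self, sentences):
        self.trigrams = Counter()              # counts of (w1, w2, w3)
        self.contexts = Counter()              # counts of (w1, w2)
        self.followers = defaultdict(Counter)  # (w1, w2) -> Counter of w3
        self.vocab = set()
        for sentence in sentences:
            tokens = pad(tokenize(sentence))
            self.vocab.update(tokens[2:])      # real words plus "</s>"
            for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
                self.trigrams[(w1, w2, w3)] += 1
                self.contexts[(w1, w2)] += 1
                self.followers[(w1, w2)][w3] += 1

    def prob(self, w1, w2, w3):
        # Add-one smoothing keeps unseen trigrams from getting probability 0.
        numerator = self.trigrams[(w1, w2, w3)] + 1
        denominator = self.contexts[(w1, w2)] + len(self.vocab)
        return numerator / denominator

    def predict_next(self, text):
        # Most frequent word seen after the last two words, or None if the
        # two-word context never occurred in training.
        tokens = ["<s>", "<s>"] + tokenize(text)
        candidates = self.followers.get((tokens[-2], tokens[-1]))
        if not candidates:
            return None
        return candidates.most_common(1)[0][0]

    def perplexity(self, sentence):
        # exp of the average negative log-probability per predicted token.
        tokens = pad(tokenize(sentence))
        log_prob = 0.0
        count = 0
        for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
            log_prob += math.log(self.prob(w1, w2, w3))
            count += 1
        return math.exp(-log_prob / count)


# Toy corpus (an assumption; the project trains on text from Wikipedia).
corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the dog sat on the mat",
]
model = TrigramModel(corpus)

print(model.predict_next("the cat"))
print(model.perplexity("the cat sat on the mat"))
print(model.perplexity("mat the on sat cat the"))
```

A lower perplexity means the sentence looks more like the training text, so the well-formed sentence should score lower than the scrambled one. On a real corpus, a more careful smoothing scheme and explicit handling of out-of-vocabulary words would likely be needed.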

Problem Statement

Computers were once thought of as “dumb terminals,” and human interactions with them were based on the principle of “garbage in, garbage out.” Computers could communicate only through sophisticated hand-coded rules. Natural Language Processing bridges the gap between humans and computers by letting people interact with computers in natural human languages. Its use cases include voice assistants, speech recognition, computer-assisted coding, and word and sentence prediction. The many possibilities in NLP that are yet to be explored motivate us to work in this field.

Requirements

  • nltk

License

The code is licensed under the MIT License and is free for anyone to use, subject to the terms of that license.
