prithvi2226 / bilingual-sentiment-analysis

Bilingual Sentiment Analysis (On Two Regional Languages)

Start here: Analysis_OG.ipynb

💭 Background

This project applies concepts and techniques from natural language processing and opinion mining. The goal is to build an artificial intelligence system that classifies Hindi and Marathi text code-mixed with English by its overall polarity (i.e. positive, negative, or neutral).

Sentiment vs. Software

Sentiment analysis uses natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. It is widely applied to voice-of-the-customer materials such as reviews and survey responses, online and social media, and healthcare materials, for applications that range from marketing to customer service to clinical medicine.

🔧 Progress

Mining and Collecting the data

The main goal is to collect as many comments as possible for this model. We took comments on social and political topics from major social media websites such as Facebook and YouTube, drawing on many sources so that the data spans the different polarities. We collected about 5000 comments.

Tagging the data

The next step was to tag all the data according to polarity (i.e. positive, negative, neutral). The tagging scheme was:

  • Positive Comment : 3
  • Negative Comment : 1
  • Neutral Comment : 2
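The tagging scheme above can be captured as a small lookup table. This is only a sketch: the names `LABEL_TO_POLARITY`, `POLARITY_TO_LABEL` and `decode_labels` are illustrative and do not come from the notebook.

```python
# Numeric tags from the tagging scheme: 1 = negative, 2 = neutral, 3 = positive.
# The dictionary and helper names are hypothetical, chosen for illustration.
LABEL_TO_POLARITY = {1: "negative", 2: "neutral", 3: "positive"}
POLARITY_TO_LABEL = {name: tag for tag, name in LABEL_TO_POLARITY.items()}


def decode_labels(tags):
    """Translate a sequence of numeric tags into polarity names."""
    return [LABEL_TO_POLARITY[tag] for tag in tags]


print(decode_labels([3, 1, 2]))
```

Keeping the mapping in one place makes it easier to turn model predictions back into readable polarity names.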

Data Pre-Processing

Once the data is tagged, we pre-process it before feeding it to the model. The goal of preprocessing text data is to take the data from its raw, readable form to a format that the computer can more easily work with. Most text data, including the data used in this project, arrives as strings of text. Preprocessing is all the work that takes the raw input data and prepares it for insertion into a model.

While preprocessing for numerical data is dependent largely on the data, preprocessing of text data is actually a fairly straightforward process, although understanding each step and its purpose is less trivial. Our preprocessing method consists of two stages: preparation and vectorization. The preparation stage consists of steps that clean up the data and cut the fat. The steps are 1. removing URLs, 2. making all text lowercase, 3. removing numbers, 4. removing punctuation, 5. tokenization, 6. removing stopwords, and 7. lemmatization. Stopwords are words that typically add no meaning.
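The seven preparation steps listed above could be sketched roughly as follows, using only the Python standard library. The stopword set, the lemma lookup table and the function name `prepare` are hypothetical placeholders; the notebook may use different tools and word lists, and a real Hindi/Marathi/English code-mixed corpus would need curated resources for steps 6 and 7.

```python
import re
import string

# Hypothetical, deliberately tiny stopword set for illustration only.
STOPWORDS = {"the", "is", "a", "an", "and", "hai", "ka", "ki", "ko", "ahe"}

# Hypothetical lookup-table "lemmatizer"; a real one would be far larger.
LEMMAS = {"movies": "movie", "songs": "song", "loved": "love"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
DIGIT_PATTERN = re.compile(r"\d+")
# string.punctuation covers ASCII punctuation only; script-specific marks
# (for example the Devanagari danda) would need to be added separately.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def prepare(text):
    """Apply the seven preparation steps and return a list of tokens."""
    text = URL_PATTERN.sub(" ", text)                    # 1. remove URLs
    text = text.lower()                                  # 2. lowercase
    text = DIGIT_PATTERN.sub(" ", text)                  # 3. remove numbers
    text = text.translate(PUNCT_TABLE)                   # 4. remove punctuation
    tokens = text.split()                                # 5. whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]   # 6. remove stopwords
    return [LEMMAS.get(t, t) for t in tokens]            # 7. lemmatize via lookup


print(prepare("Yeh movie 2 baar dekhi, loved the songs! https://example.com/x"))
```

URLs are removed first so that stripping punctuation does not leave fragments of a link behind as tokens.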

Splitting data 75-25(train-test)

train_test_split returns four arrays: training data, test data, training labels and test labels. By default, train_test_split splits the data into 75% training data and 25% test data, which is a good rule of thumb.

The test_size keyword argument specifies what proportion of the original data is used for the test set. Here we set test_size=0.3, which gives 70% training data and 30% test data rather than the default 75-25 split.
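A minimal sketch of the split with scikit-learn's `train_test_split`, assuming the comments and their tags are held in two parallel lists. The toy data, the variable names and `random_state=42` are assumptions made for the example, not values taken from the notebook.

```python
from sklearn.model_selection import train_test_split

# Toy stand-in data: 20 placeholder comments with tags cycling through 1, 2, 3.
comments = [f"comment {i}" for i in range(20)]
labels = [(i % 3) + 1 for i in range(20)]

# test_size=0.3 holds out 30% of the samples; random_state fixes the shuffle
# so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    comments, labels, test_size=0.3, random_state=42
)
print(len(X_train), len(X_test))  # 14 6

# Omitting test_size falls back to the default 75-25 split.
X_train_d, X_test_d, y_train_d, y_test_d = train_test_split(
    comments, labels, random_state=42
)
print(len(X_train_d), len(X_test_d))  # 15 5
```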

Hyperparameter Tuning

Hyperparameter tuning was applied to several of the algorithms used in Analysis_OG.ipynb, such as linear regression and XGBoost.
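One common way to tune hyperparameters is a cross-validated grid search. The sketch below uses scikit-learn's `GridSearchCV` with `LogisticRegression` as a stand-in classifier on synthetic three-class data. The estimator, the parameter grid and the data are all assumptions for illustration; the estimators and grids actually used in the notebook are not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic three-class data standing in for the vectorized comments.
X, y = make_classification(
    n_samples=150, n_features=10, n_informative=5, n_classes=3, random_state=0
)

# Hypothetical grid over the regularization strength C.
param_grid = {"C": [0.1, 1.0, 10.0]}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=3,                # 3-fold cross-validation for each candidate
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

The same pattern applies to other estimators by swapping in the model and a grid over its own parameters.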

Accuracy and other Values

Accuracy, precision, recall and F-score for every algorithm used are given in Values. Overall accuracy was around 70%.
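These metrics can be computed with scikit-learn as sketched below. The label lists are toy values invented for the example and are not the project's results; macro averaging is likewise an assumption, since the averaging used in the notebook is not stated.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy true and predicted tags (1 = negative, 2 = neutral, 3 = positive).
y_true = [3, 1, 2, 3, 1, 2, 3, 1]
y_pred = [3, 1, 2, 3, 2, 2, 1, 1]

accuracy = accuracy_score(y_true, y_pred)

# Macro averaging gives each polarity class equal weight.
precision, recall, fscore, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)

print(f"accuracy={accuracy:.2f}")
print(f"precision={precision:.2f} recall={recall:.2f} fscore={fscore:.2f}")
```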

💡 Work to be done

  • Contextual understanding and tone
  • Sentiment analysis at Brandwatch?
  • The caveats of sentiment analysis
  • Predictions for the future of sentiment analysis

❓ Open questions

📚 Resources

Sentiment Analysis-related publications
