
jackshendrikov / sensus


NLP with LSTM for Sentiment Analysis of Ukrainian texts

License: MIT License

Jupyter Notebook 99.76% Python 0.24%
sentiment-analysis sentiment-classification naive-bayes-classifier random-forest-classifier knn-classifier keras logistic-regression nlp gensim pymorphy2


Sensus

This repository contains IPython notebooks in 3 parts that cover the whole process of developing a sentiment analysis model, from data processing to a comparative analysis of different LSTM models. Visualizations accompany every stage. The model was built to analyze Ukrainian text.

📥  Downloading Data

Before running notebooks, we first need to download all the data we will be using.

As always, the first step is to clone the repository:

>> git clone https://github.com/JackShen1/sensus.git

The training data currently consists of 1,000 positive and 1,000 negative book reviews. The reviews were originally taken from a large dataset of Amazon reviews, which you can download here. They were then translated into Ukrainian with Google Translate and lightly edited by me. The raw reviews can be found in the data/ folder.

Since the NLTK library has no support for the Ukrainian language, we take a different path. The most complete list of Ukrainian stop words was found here, and it is the one used in this project.
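As a minimal sketch of how a custom stop-word list can stand in for a built-in one, the snippet below filters tokens against a set of Ukrainian stop words. The tiny inline set, the one-word-per-line file format assumed by `load_stop_words`, and both helper names are illustrative assumptions, not the project's actual code.

```python
import re

# Illustrative subset only; in practice the set comes from the downloaded list.
UKRAINIAN_STOP_WORDS = {"і", "в", "на", "це", "що", "не", "та", "я"}


def load_stop_words(path):
    """Read one stop word per line from a UTF-8 text file (assumed format)."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def remove_stop_words(text, stop_words=UKRAINIAN_STOP_WORDS):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    # Letters only, optionally joined by an apostrophe (as in "пам'ять").
    tokens = re.findall(r"[^\W\d_]+(?:['’ʼ][^\W\d_]+)*", text.lower())
    return [token for token in tokens if token not in stop_words]


print(remove_stop_words("Це дуже цікава книга, і я не шкодую!"))
# ['дуже', 'цікава', 'книга', 'шкодую']
```

Keeping the stop words in a `set` makes each membership test constant time, which matters little for 2,000 reviews but costs nothing.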

A stemmer was also used for comparison at the processing stage (part 1). Ideally we would use PorterStemmer from nltk.stem, but it does not support Ukrainian. This is not a problem, because writing your own Porter-style stemmer is not that difficult, so we wrote one in Python based on this PHP code.
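To give a rough idea of what a rule-based stemmer does, here is a deliberately simplified longest-suffix-stripping sketch. It is not the Porter algorithm and not the port used in the notebooks; the suffix list, the minimum stem length, and the name `naive_stem` are all hypothetical choices made for illustration.

```python
# Hypothetical, very short suffix list; a real stemmer applies many more rules.
SUFFIXES = (
    "ами", "ями", "ого", "ому",
    "ий", "ій", "ів", "ах", "ях",
    "а", "я", "и", "і", "у", "ю", "е", "о",
)
MIN_STEM_LENGTH = 3  # never cut a word down to fewer than 3 characters


def naive_stem(word):
    """Strip the longest matching suffix, if enough of the word remains."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


print([naive_stem(w) for w in ["книга", "книги", "книгами"]])
# ['книг', 'книг', 'книг']
```

Trying longer suffixes first keeps "книгами" from losing only its final "и", and the length guard leaves short words such as "око" untouched.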

The last thing we need to download is a Word2Vec model. For simplicity, we use a pretrained Word2Vec model of Ukrainian word vectors, each with 300 dimensions. We chose the lemmatized version of this model because it fits the sample we already processed in part 1. The model can be found on this website. After downloading, unpack the bz2 archive (~1 GB), for example using this application.
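If you would rather not install a separate archiver, the archive can also be unpacked with Python's standard library. The sketch below streams the data in chunks so the large file never has to fit in memory; the function name `decompress_bz2` and the output naming rule are assumptions for illustration.

```python
import bz2
import shutil
from pathlib import Path


def decompress_bz2(archive_path, output_path=None, chunk_size=1024 * 1024):
    """Stream-decompress a .bz2 file chunk by chunk and return the output path."""
    archive_path = Path(archive_path)
    if output_path is None:
        # Drop only the final ".bz2" suffix, e.g. "vectors.txt.bz2" -> "vectors.txt".
        output_path = archive_path.with_suffix("")
    output_path = Path(output_path)
    with bz2.open(archive_path, "rb") as source, open(output_path, "wb") as target:
        shutil.copyfileobj(source, target, chunk_size)
    return output_path
```

The unpacked file can then typically be loaded with Gensim's `KeyedVectors.load_word2vec_format`, assuming the model is distributed in the word2vec format; check whether it is the text or binary variant and set `binary` accordingly.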

📝  Requirements

To run the IPython notebooks, you'll need Python (v3.6+) and the following libraries:

  • Keras (v2.4+)
  • Gensim (v3.8+)
  • Pandas (v1.2+)
  • NumPy (v1.19.5+)
  • NLTK (v3.5+)
  • python-decouple (v3.4+)
  • pymorphy2-dicts-uk (v2.4.1+)
  • pymorphy2 (v0.9+)
  • scikit-learn (v0.24.1)
  • SciPy (v0.19.1+)
  • Matplotlib (v2.1.1+)
  • Jupyter
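Once the libraries are installed, a short script can confirm that the versions meet the list above. This is a sketch under a few assumptions: it relies on `importlib.metadata`, which is in the standard library only from Python 3.8, it treats every listed version (including scikit-learn's) as a minimum, and its version parsing is simplified to the leading numeric parts.

```python
from importlib import metadata  # standard library in Python 3.8+

# Minimum versions copied from the list above (names as used by pip).
MINIMUM_VERSIONS = {
    "Keras": "2.4",
    "gensim": "3.8",
    "pandas": "1.2",
    "numpy": "1.19.5",
    "nltk": "3.5",
    "python-decouple": "3.4",
    "pymorphy2-dicts-uk": "2.4.1",
    "pymorphy2": "0.9",
    "scikit-learn": "0.24.1",
    "scipy": "0.19.1",
    "matplotlib": "2.1.1",
}


def parse_version(version):
    """Turn '1.19.5' into (1, 19, 5); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return {package: problem} for every missing or outdated package."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if parse_version(installed) < parse_version(minimum):
            problems[name] = f"found {installed}, need {minimum}+"
    return problems


if __name__ == "__main__":
    for package, problem in check_requirements().items():
        print(f"{package}: {problem}")
```

An empty result means every listed package is present at a sufficient version.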

The commands for installing these libraries will follow. First, let's create a virtual environment.

🐍  Creating a Virtual Environment

The easiest way to install Keras, Gensim, NumPy, Jupyter, matplotlib and our other libraries is to start with the Anaconda Python distribution.

  1. Select your OS and follow the installation instructions for Anaconda Python. We recommend using Python 3.6+ (64-bit).

  2. Install the Python development environment on your system:

    >> pip install -U pip virtualenv
  3. If you haven't done so already, clone this repository from GitHub:

    >> git clone https://github.com/JackShen1/sensus.git
  4. Use cd to navigate into the top directory of the repo on your machine.

  5. Open Anaconda Prompt, install JupyterLab, and create and activate the virtual environment with the following commands:

    >> conda install -c conda-forge jupyterlab    # install JupyterLab
    >> conda create -n sensus pip python=3.7  # choose the Python version
    >> source activate sensus                 # activate the virtual environment

    Alternatively, you can install Jupyter with pip: pip install jupyterlab

  6. Now we can install all the libraries we need:

    >> pip install Keras gensim pandas numpy nltk python-decouple scikit-learn scipy matplotlib pymorphy2
    >> pip install -U pymorphy2-dicts-uk # dictionary for the Ukrainian language
  7. Launch Jupyter by entering:

    >> jupyter notebook

Once everything is installed, reactivate the environment in later sessions as follows:

  1. Open Anaconda Prompt and enter the project folder with the cd command. Now enter the following commands:

    >> conda activate sensus
    >> jupyter notebook

📋  Overview

This project describes, in 3 parts, the whole process of data preparation and model training, and carries out a comparative analysis of classifiers and various models. Each stage is accompanied by data visualization. The results are good for such small datasets with a not very accurate translation. In the future, I will expand the datasets and correct the translation. Otherwise, the project works well and can easily be adapted to English or Russian. See the notebooks for a detailed description.

📫  Get in touch
