People tend to discuss and share opinions on social platforms, but such activities sometimes attract threats or harassment that discourage people from expressing themselves freely. Many social platforms therefore try to detect harassment and threats in conversations so that they can be stopped before they cause further damage. Toxicity detection in comments is one such methodology: it identifies the different types of conversations that can be classified as toxic in nature. To classify such comments effectively, we can use machine learning algorithms to determine their toxicity. In this project, a large set of toxic comments is used to train a Recurrent Neural Network (RNN) with a Bidirectional Long Short-Term Memory (LSTM) layer for this purpose.
- Python 3.7.0+
- TensorFlow 2.4.1+
- Keras 2.4.3+
- matplotlib 3.3.3+
- numpy 1.19.5+
- pandas 1.2.1+
- scikit-learn 0.24.1+
- nltk 3.5+
- spacy 3.0.3+
- textblob 0.15.3+
- gradio 1.5.3+
You can download the dataset from Kaggle using the download link on the dataset page.
- Navigate to the `Data` section
- In the `Data Explorer`, you will find four separate zip archives to download
- Download `test.csv.zip`, `test_labels.csv.zip` and `train.csv.zip`
- Extract the files
- Copy the CSV files to the `data` directory (you can verify the copy with the snippet below)
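Once the files are in place, you can quickly verify them with a short snippet like the following (an illustrative check, not part of the repository):

```python
import pandas as pd

# Load the extracted Kaggle CSVs from the data directory
train = pd.read_csv("data/train.csv")
test = pd.read_csv("data/test.csv")
test_labels = pd.read_csv("data/test_labels.csv")

print(train.shape, test.shape, test_labels.shape)
print(train.columns.tolist())   # inspect the comment text and label columns
```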
The following table enumerates the different classes (types) of comments:
| Toxic | Very Toxic | Obscene | Threat | Insult | Hate | Neutral |
|---|---|---|---|---|---|---|
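Note that these classes are not mutually exclusive: a single comment can, for example, be Toxic, Obscene and Insult at the same time, which makes this a multi-label classification problem. As a hypothetical illustration (the column order below is illustrative, not the dataset's actual layout):

```python
# Hypothetical multi-label target vector for one comment
#          Toxic  Very Toxic  Obscene  Threat  Insult  Hate
labels  = [1,     0,          1,       0,      1,      0]   # toxic, obscene and insulting
neutral = [0,     0,          0,       0,      0,      0]   # Neutral: no label set at all
```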
- Clone the repository

  ```
  git clone https://github.com/baishalidutta/Comments-Toxicity-Detection.git
  ```

- Install the required libraries

  ```
  pip3 install -r requirements.txt
  ```
- Clean text:
  - Lower-case all text
  - Remove uncommon signs
  - Expand abbreviations
  - Correct misspelled words
  - Remove punctuation
  - Remove emojis
  - Remove stop words
  - Apply lemmatisation
- Tokenize the text data
- Create an `Embedding Vector` using `GloVe.6B`
- Train a `Recurrent Neural Network (RNN)` with a `Bidirectional LSTM` layer (a sketch of the whole pipeline follows below)
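The following is a minimal sketch of how these steps could fit together using the libraries listed above; the vocabulary size, sequence length, layer sizes and the `clean_text` helper are illustrative assumptions, not the repository's actual code:

```python
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense, Dropout

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOP_WORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def clean_text(text):
    """Lower-case, strip non-letters (punctuation, digits, emojis), drop stop words, lemmatise."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [LEMMATIZER.lemmatize(t) for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)

# Two toy comments just to make the sketch runnable end to end
cleaned_comments = [clean_text(c) for c in ["You're awesome!", "I will hurt you"]]

# Tokenize the cleaned comments and pad them to a fixed length
MAX_WORDS, MAX_LEN = 20000, 200                        # illustrative sizes
tokenizer = Tokenizer(num_words=MAX_WORDS)
tokenizer.fit_on_texts(cleaned_comments)
sequences = pad_sequences(tokenizer.texts_to_sequences(cleaned_comments), maxlen=MAX_LEN)

# Bidirectional LSTM on top of an embedding layer; in the real pipeline the
# embedding weights would be initialised from the glove.6B vectors
model = Sequential([
    Embedding(MAX_WORDS, 100, input_length=MAX_LEN),
    Bidirectional(LSTM(64)),
    Dropout(0.3),
    Dense(6, activation="sigmoid"),                    # one sigmoid per toxicity class;
])                                                     # "Neutral" = all outputs low
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
```

Sigmoid outputs with binary cross-entropy (rather than a softmax) are used here because the classes are not mutually exclusive: each output independently estimates the probability of one toxicity type.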
Navigate to the `source` directory to execute the following scripts.
- To generate the model on your own, run

  ```
  python3 model_training.py
  ```

- You can also provide your own CSV data:

  ```
  python3 model_training.py --data=csv_file_location
  ```
- To evaluate any dataset using the pre-trained model (in the `model` directory), run

  ```
  python3 model_evaluation.py
  ```

Note that, for evaluation, `model_evaluation.py` will use `test.csv` and `test_labels.csv` (inside the `data` directory).
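If you are curious what such an evaluation roughly does, here is a hedged sketch; the model filename is hypothetical, the column names assume the Kaggle layout, and in practice the tokenizer fitted during training would be reused instead of refitted:

```python
import pandas as pd
from sklearn.metrics import roc_auc_score
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

model = load_model("model/toxicity_model.h5")          # hypothetical filename

test = pd.read_csv("data/test.csv")
labels = pd.read_csv("data/test_labels.csv")

# Kaggle marks rows that were never scored with -1 labels; keep only scored rows
label_cols = labels.columns.drop("id")
mask = (labels[label_cols] != -1).all(axis=1)

texts = test.loc[mask, "comment_text"].astype(str).tolist()
tokenizer = Tokenizer(num_words=20000)                 # in practice, reuse the tokenizer
tokenizer.fit_on_texts(texts)                          # fitted during training
X = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=200)

y_true = labels.loc[mask, label_cols].values
y_pred = model.predict(X)

print("macro ROC AUC:", roc_auc_score(y_true, y_pred, average="macro"))
```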
Alternatively, you can find the whole analysis in the notebook inside the `notebook` directory. To open the notebook, use either `jupyter notebook`, Google Colab, or any other IDE that supports notebooks, such as PyCharm Professional.
To run the web application locally, go to the `webapp` directory and execute:

```
python3 web_app.py
```

This will start a local server that you can access in your browser. You can type in any comment and see which toxicity classes the model assigns to it.
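Since `gradio` is among the requirements, an app of this kind can be as small as the sketch below; the model filename and the untrained placeholder tokenizer are illustrative assumptions, and the real `web_app.py` may differ:

```python
import gradio as gr
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

CLASSES = ["Toxic", "Very Toxic", "Obscene", "Threat", "Insult", "Hate"]
model = load_model("model/toxicity_model.h5")   # hypothetical filename

tokenizer = Tokenizer(num_words=20000)          # in practice, load the tokenizer
                                                # fitted during training

def classify(comment):
    # Turn the raw comment into a padded sequence and score it
    seq = pad_sequences(tokenizer.texts_to_sequences([comment]), maxlen=200)
    probs = model.predict(seq)[0]
    return {c: float(p) for c, p in zip(CLASSES, probs)}

# A text box in, a label widget with per-class confidences out
gr.Interface(fn=classify, inputs="text", outputs="label").launch()
```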
You can, alternatively, try out the hosted web application here.
Baishali Dutta ([email protected])
If you would like to contribute and improve the model further, check out the Contribution Guide.
This project is licensed under the Apache License, Version 2.0.