This project forked from baishalidutta/comments-toxicity-detection

A machine learning model to detect the toxicity of comments

License: Apache License 2.0

Python 2.31% Jupyter Notebook 97.69%

comments-toxicity-detection's Introduction

Motivation

People discuss and share opinions on social platforms, but these conversations sometimes attract threats or harassment, which discourages people from expressing themselves freely.

Many social platforms try to detect such harassment or threats in conversations so that they can be stopped before they cause further damage.

Toxicity detection in comments is one such method: it identifies the different types of conversation that can be classified as toxic.

To classify such comments more effectively, we can use machine learning algorithms to determine the toxicity of comments.

In this project, a large number of toxic comments are used to train a Recurrent Neural Network (RNN) with a Bidirectional Long Short-Term Memory (LSTM) layer for this purpose.

Requirements

  • Python 3.7.0+
  • TensorFlow 2.4.1+
  • Keras 2.4.3+
  • matplotlib 3.3.3+
  • numpy 1.19.5+
  • pandas 1.2.1+
  • scikit-learn 0.24.1+
  • nltk 3.5+
  • spacy 3.0.3+
  • textblob 0.15.3+
  • gradio 1.5.3+

Dataset

You can download the dataset from Kaggle using the download link.

Instructions

  • Navigate to the Data section
  • In the Data Explorer, you will find four separate zip archives to download
  • Download test.csv.zip, test_labels.csv.zip and train.csv.zip
  • Extract the files
  • Copy the CSV files to the data directory

The following list enumerates the different classes (types) of comments:

  • Toxic
  • Very Toxic
  • Obscene
  • Threat
  • Insult
  • Hate
  • Neutral
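
As a rough illustration of how per-class scores could be turned into the class names listed above, here is a minimal sketch. It assumes the model emits one score between 0 and 1 per toxic class, that a comment may belong to several classes at once, and that "Neutral" means no class was flagged. The class order, the 0.5 threshold, and the function name `labels_from_scores` are all assumptions made for illustration and are not taken from the project's code.

```python
# Class names from the list above; their order here is an assumption.
CLASSES = ["Toxic", "Very Toxic", "Obscene", "Threat", "Insult", "Hate"]


def labels_from_scores(scores, threshold=0.5):
    """Return the class names whose score reaches the threshold.

    Falls back to ["Neutral"] when no class is flagged. The threshold
    of 0.5 is an illustrative default, not a value from the project.
    """
    if len(scores) != len(CLASSES):
        raise ValueError("expected one score per class")
    flagged = [name for name, score in zip(CLASSES, scores) if score >= threshold]
    return flagged or ["Neutral"]


if __name__ == "__main__":
    print(labels_from_scores([0.9, 0.1, 0.7, 0.0, 0.2, 0.1]))
    print(labels_from_scores([0.1, 0.0, 0.2, 0.0, 0.1, 0.0]))
```

Treating "Neutral" as the absence of any flagged class keeps the output layer at six units; a model that predicts "Neutral" as its own class would need a different mapping.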

Installation

  • Clone the repository

git clone https://github.com/baishalidutta/Comments-Toxicity-Detection.git

  • Install the required libraries

pip3 install -r requirements.txt

Model Ideology

  • Clean text:
    • Lower all text
    • Remove uncommon signs
    • Expand abbreviations
    • Correct misspelled words
    • Remove punctuations
    • Remove emojis
    • Remove stop words
    • Apply lemmatisation
  • Tokenize text data
  • Create Embedding Vector using Glove.6B
  • Train a Recurrent Neural Network (RNN) with a Bidirectional LSTM layer
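
The cleaning and tokenizing steps above can be sketched with the standard library alone. This is a simplified illustration, not the project's implementation: spelling correction and lemmatisation are left out, the order of steps is simplified, and the stop-word list, the abbreviation table, and the helper names `clean_text`, `build_vocab`, and `encode` are hypothetical.

```python
import string

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "you"}

# Small hypothetical abbreviation table.
ABBREVIATIONS = {"u": "you", "r": "are", "dont": "do not"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text):
    """Lowercase, strip emojis and punctuation, expand abbreviations, drop stop words."""
    text = text.lower()
    # Dropping non-ASCII characters removes emojis and other uncommon signs.
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.translate(_PUNCT_TABLE)
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    # An expansion may contain several words, so split again.
    words = " ".join(words).split()
    return [word for word in words if word not in STOP_WORDS]


def build_vocab(token_lists):
    """Map each distinct token to an integer id, reserving 0 for padding."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab) + 1)
    return vocab


def encode(tokens, vocab, max_len):
    """Turn tokens into a fixed-length id list, truncating or zero-padding at the end."""
    ids = [vocab[token] for token in tokens if token in vocab][:max_len]
    return ids + [0] * (max_len - len(ids))
```

The fixed-length id lists produced by `encode` are the kind of input an embedding layer expects; each id would index a row of the embedding matrix built from the GloVe vectors.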

Usage

Navigate to the source directory to run the following scripts.

  • To generate the model on your own, run

python3 model_training.py

  • You can also provide your own CSV data:

python3 model_training.py --data=csv_file_location

  • To evaluate any dataset using the pre-trained model (in the model directory), run

python3 model_evaluation.py

Note that model_evaluation.py evaluates against test.csv and test_labels.csv in the data directory.

Alternatively, you can find the whole analysis in the notebook inside the notebook directory. To open the notebook, use Jupyter Notebook, Google Colab, or any IDE that supports notebooks, such as PyCharm Professional.
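
Because the comments and their labels live in two separate files, they have to be matched before evaluation. The sketch below shows one way this might be done. It assumes both files share an `id` column, that the comment text sits in a column named `comment_text`, and that rows marked -1 in the labels file are unscored and should be skipped. These column names and the -1 convention are assumptions about the data layout, and `load_scored_rows` is a hypothetical helper, not a function from this repository.

```python
import csv
import io


def load_scored_rows(test_file, labels_file, text_column="comment_text"):
    """Pair each comment with its labels, skipping rows that have no usable labels.

    Both arguments are open text files (or file-like objects) in CSV format.
    Every column of the labels file other than "id" is treated as a label.
    """
    labels_by_id = {row["id"]: row for row in csv.DictReader(labels_file)}
    scored = []
    for row in csv.DictReader(test_file):
        labels = labels_by_id.get(row["id"])
        if labels is None:
            continue  # no label row for this comment
        values = {key: int(value) for key, value in labels.items() if key != "id"}
        if any(value == -1 for value in values.values()):
            continue  # assumed marker for unscored rows
        scored.append((row[text_column], values))
    return scored


if __name__ == "__main__":
    # In-memory stand-ins for test.csv and test_labels.csv.
    comments = io.StringIO("id,comment_text\n1,hello\n2,bad words\n")
    labels = io.StringIO("id,toxic,insult\n1,0,0\n2,-1,-1\n")
    print(load_scored_rows(comments, labels))
```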
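
For readers curious how an option such as `--data` on model_training.py is typically handled, here is a minimal `argparse` sketch. It is an illustration under assumptions: the default path `../data/train.csv` and the function name `parse_args` are hypothetical, and the actual script may parse its arguments differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(description="Train the toxicity model")
    parser.add_argument(
        "--data",
        default="../data/train.csv",  # hypothetical default location
        help="path to the training CSV file",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print("Training data:", args.data)
```

`argparse` accepts both `--data=path.csv` and `--data path.csv`, so either spelling of the command works.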

Web Application

To run the web application locally, go to the webapp directory and execute:

python3 web_app.py

This will start a local server that you can access in your browser. You can type in any comment and see which toxicity classes the model assigns to it.

Alternatively, you can try out the hosted web application here.

Developer

Baishali Dutta ([email protected])

Contribution

If you would like to contribute and improve the model further, check out the Contribution Guide

License

This project is licensed under Apache License Version 2.0
