This project forked from adamwespiser/toxic-twitter

ToxicSense: A tool to analyze and visualize toxicity in social media

License: MIT License

ToxicSense

A tool to visualize toxicity in social media.

There are two parts:

  1. Web application + Chrome Extension.

  2. Machine Learning Model for toxicity classification.

Document Overview

In this document, we give instructions on how to run the web application locally, and on how to train, export, and make sample predictions with our machine learning toxic comment classifier.

Section 1: The Web Application

Quick Start

Create a Twitter developer account at https://developer.twitter.com/content/developer-twitter/en.html, get the required API keys, and fill them in ToxicSense/data_fetch_helpers/creds.py.
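
As a rough illustration of keeping the keys out of version control, the sketch below reads them from environment variables. The function name load_twitter_creds and the four variable names are hypothetical; check creds.py in the repository for the names it actually expects.

```python
import os

# Hypothetical variable names; the real creds.py may use different ones.
CRED_NAMES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_creds(env=os.environ):
    """Return the credentials as a dict, or fail loudly if any are missing."""
    missing = [name for name in CRED_NAMES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Twitter credentials: " + ", ".join(missing))
    return {name.lower(): env[name] for name in CRED_NAMES}
```

Failing early on a missing key makes a misconfigured environment obvious before the server starts fetching tweets.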

  1. $ cd ToxicSense
  2. Create a python3 virtual environment and activate it, following the instructions at https://docs.python.org/3/library/venv.html. Or, if you have virtualenvwrapper:
    $ mkvirtualenv --python=`which python3` toxicsense-venv
  3. $ pip install -r requirements.txt
  4. $ python manage.py migrate
  5. $ python manage.py runserver localhost:8000

Visit http://localhost:8000/

The Quick Start does not include the background processing that stores retrieved tweets in the database for future use. To set it up, follow the full installation instructions below.

Full Installation

  1. $ cd ToxicSense
  2. Create a python3 virtual environment and activate it. Follow instructions here https://docs.python.org/3/library/venv.html.
    • Or if you have virtualenvwrapper,
      $ mkvirtualenv --python=`which python3` toxicsense-venv-1
  3. Install the requirements by running
    $ pip install -r requirements.txt
  4. Run the migrations
    $ python manage.py migrate
  5. Install RabbitMQ by following the instructions at https://www.rabbitmq.com/download.html. If you are on a Mac:
    $ brew install rabbitmq
  6. Start the rabbitmq server in a separate terminal using
    $ rabbitmq-server
  7. In a separate tab/terminal, go to the ToxicSense folder (as in step 1) and start the celery worker and beat
    $ celery -A ToxicSense worker -l info -B
  8. Go to the ToxicSense folder and start the server. Ignore the warnings about migrations.
    $ python manage.py runserver localhost:8000 --settings=ToxicSense.local_celery_settings
  9. Visit http://localhost:8000/
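
If the celery worker cannot connect, it can help to confirm that RabbitMQ is actually listening before starting it. The sketch below is a minimal check, assuming RabbitMQ runs on localhost with its default AMQP port 5672; the helper name is_port_open is made up for this example.

```python
import socket


def is_port_open(host="localhost", port=5672, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

Calling is_port_open() after step 6 should return True once rabbitmq-server has finished starting.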

Note: If you are on a Mac and see errors mentioning Objective-C, you may need to run export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES in your terminal before starting the server. The problem appears to come from the twitter scraper not playing well with python3 on macOS.
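
An alternative to exporting the variable by hand is to launch the server from a small Python wrapper that sets it for the child process. This is only a sketch: server_env and run_server are hypothetical helpers, and the command is the one from step 8.

```python
import os
import subprocess
import sys


def server_env(base=None):
    """Copy the environment and add the macOS fork-safety workaround."""
    env = dict(os.environ if base is None else base)
    env["OBJC_DISABLE_INITIALIZE_FORK_SAFETY"] = "YES"
    return env


def run_server():
    """Start the Django dev server with the workaround applied."""
    cmd = [
        sys.executable, "manage.py", "runserver", "localhost:8000",
        "--settings=ToxicSense.local_celery_settings",
    ]
    # The variable must be present when the child process starts.
    subprocess.run(cmd, env=server_env(), check=True)
```

Run run_server() from inside the ToxicSense folder so that manage.py is found.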

Section 2: Machine Learning

Toxicity Prediction

We include instructions for two tasks related to predicting toxicity in tweets. The first is training the machine learning classifier we use in production and exporting it. The second is using those saved files to run toxicity prediction from the command line.

Quick Start

We have saved a trained model so that you can test the toxic classifier immediately. To run prediction against a tweet:

$ cd build-model
$ mkvirtualenv --python=`which python3` toxicsense-test-model
$ pip install -r test-requirements.txt
$ python make_prediction.py
For instance, you could run:
$ python make_prediction.py "Thats one small step for a man, one giant leap for mankind"

The toxicity score will be printed, along with scores for whether the tweet is obscene, a threat, an insult, or identity hate. For each category, the prediction is a number between 0 and 1, where 0 means the tweet is not a member of the category and 1 means it is likely to be a member.
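
As a rough illustration of reading these scores, the sketch below assumes the predictions are available as a dictionary keyed by category name. That shape, the key names, and the 0.5 cutoff are illustrative assumptions, not the output format of make_prediction.py.

```python
# Assumed category keys, following the categories listed above.
CATEGORIES = ("toxic", "obscene", "threat", "insult", "identity_hate")


def flag_categories(scores, threshold=0.5):
    """Return the categories whose score is at or above the threshold."""
    return [name for name in CATEGORIES if scores.get(name, 0.0) >= threshold]


example = {"toxic": 0.91, "obscene": 0.12, "threat": 0.03, "insult": 0.62}
print(flag_categories(example))  # → ['toxic', 'insult']
```

A lower threshold flags more tweets at the cost of more false positives, so the cutoff is worth tuning for the use case.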

Train our Char-NN Deep Learning Model

  1. Install virtualenv and install necessary libraries.
    $ cd build-model
    $ mkvirtualenv --python=`which python3` toxicsense-train-model
    $ pip install -r train-requirements.txt
  2. We took the data from https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data. Download the file and keep it in your home folder.
  3. Run jupyter notebook.
    $ jupyter notebook
  4. Run all the cells in the build-and-export.ipynb notebook.
  5. build-and-export.ipynb will export our model into the directory "ascii-3-model/".
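
For readers unfamiliar with character-level models, the sketch below shows one common way to turn text into fixed-length integer input. It is an assumption-laden illustration, not the preprocessing used in build-and-export.ipynb: the function name encode_ascii, the 140-character limit, and the use of 0 for both padding and non-ASCII characters are all choices made for this example.

```python
def encode_ascii(text, max_len=140):
    """Map each character to its ASCII code, truncating or padding to max_len.

    Non-ASCII characters and padding positions are both encoded as 0.
    """
    ids = [ord(ch) if ord(ch) < 128 else 0 for ch in text[:max_len]]
    return ids + [0] * (max_len - len(ids))


print(encode_ascii("Hi", max_len=4))  # → [72, 105, 0, 0]
```

Working on characters rather than words avoids a vocabulary lookup and tolerates the misspellings that are common in tweets.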

Follow Quick Start to run the model.

Deployment

We currently use AWS Elastic Beanstalk for deployment. We followed this tutorial to set it up: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create-deploy-python-django.html

First time:
$ eb init -p python-3.6 toxic-sense
$ eb create env

For successive deployments,
$ eb deploy

SSHing into machine

$ eb ssh

To get into the application folder (from https://stackoverflow.com/a/20070161):

  1. SSH login to Linux.
  2. (Optional) Run sudo su - if you need proper permissions.
  3. Run source /opt/python/run/venv/bin/activate
  4. Run source /opt/python/current/env
  5. Run cd /opt/python/current/app
  6. Run python manage.py <commands>

Contributors

adamwespiser, ajaybhargavb, dependabot[bot], mukund-murali