Reddit Flair Identification

A Flask app for flair identification on the r/india subreddit: it takes the URL of an r/india post and predicts the post's flair. The web application is hosted on Heroku at https://redditflairid.herokuapp.com/.

Python packages used

  • PRAW
  • Scikit-learn
  • NLTK
  • Numpy
  • Pandas
  • Flask
The requirements.txt file lists all the dependencies used in the notebooks and in the Flask app.

Directory Structure

  • model: Contains the trained ML model which makes the prediction.
  • notebooks: Contains the ipynb notebooks for data scraping, preprocessing, EDA and classification.
  • static: Contains the main.css file, used for the frontend.
  • support: Contains the scripts for prediction and preprocessing of the text data and config.json.
  • templates: Contains HTML files for the web-application.
  • app.py: File to run to start the web application.
  • requirements.txt: Lists the project dependencies.
Before running the app, edit config.json (in the support folder) and add your PRAW credentials.
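
As a rough sketch of how the credentials might be read, the snippet below loads config.json and checks that nothing is missing. The key names client_id, client_secret and user_agent are assumptions chosen to mirror the keyword arguments of praw.Reddit; the repository's config.json may use different names. Both helper functions are hypothetical and not part of the project.

```python
import json
from pathlib import Path

# Assumed key names; they mirror praw.Reddit's keyword arguments,
# but the repository's config.json may be laid out differently.
REQUIRED_KEYS = ("client_id", "client_secret", "user_agent")


def load_praw_credentials(path="support/config.json"):
    """Read PRAW credentials from a JSON file and check that none are missing."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"config.json is missing: {', '.join(missing)}")
    return {key: config[key] for key in REQUIRED_KEYS}


def make_reddit_client(path="support/config.json"):
    """Build a read-only PRAW client from the stored credentials."""
    import praw  # imported lazily so the loader works without PRAW installed

    return praw.Reddit(**load_praw_credentials(path))
```

Failing early on a missing key gives a clearer error than an authentication failure raised later by the Reddit API.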

Usage

Posts in r/india can cover many topics, and each post is tagged for filtering purposes. These tags are called flairs on Reddit. r/india has flairs such as Politics, AskIndia and Science/Technology. The web application lets the user enter the URL of an r/india post and displays the predicted flair for that post.
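
To illustrate the input side, here is a minimal sketch of how a submitted URL could be checked before any prediction is made. The function name extract_submission_id is hypothetical, and the app itself may instead hand the URL straight to PRAW (for example through reddit.submission(url=...)).

```python
from urllib.parse import urlparse


def extract_submission_id(url):
    """Return the post ID from an r/india post URL, or None if the URL does not match."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host != "reddit.com" and not host.endswith(".reddit.com"):
        return None
    parts = [part for part in parsed.path.split("/") if part]
    # Expected path shape: r/india/comments/<post_id>/<optional_slug>
    if (
        len(parts) >= 4
        and parts[0] == "r"
        and parts[1].lower() == "india"
        and parts[2] == "comments"
    ):
        return parts[3]
    return None
```

Rejecting URLs from other sites or other subreddits up front keeps the classifier from being asked about posts it was never trained on.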

To run on a local server:

  1. Clone the repository.
git clone https://github.com/gaurav104/Reddit-Flair-Identification-Flask.git
  2. Create a virtual environment.
python3 -m venv flair_detector
source flair_detector/bin/activate
cd Reddit-Flair-Identification-Flask/
  3. Install the project dependencies.
pip3 install -r requirements.txt
  4. To run the server locally, execute the following command.
python3 app.py

Approach

In the notebooks folder:

  • Data Scraping.ipynb: Documents the data scraping process using Pushshift.

  • Text Preprocessing.ipynb: Describes the data cleaning and preprocessing, including steps such as punctuation removal, stopword removal, lemmatization and tokenization.

  • EDA.ipynb: Performs an Exploratory Data Analysis on the cleaned data. It looks at average post lengths and word counts, and performs topic modelling using LDA (Latent Dirichlet Allocation) and NMF (Non-negative Matrix Factorization).

  • Classification.ipynb: Performs classification on the preprocessed data, evaluates the model's performance, and analyses the predicted and actual labels.
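
The cleaning steps listed for Text Preprocessing.ipynb can be sketched roughly as follows. This is a standard-library stand-in, not the notebook's code: the stopword set is a tiny illustrative sample (the notebook presumably uses a fuller list such as NLTK's), and lemmatization is left out because it needs NLTK's downloaded corpora. The function name preprocess is hypothetical.

```python
import string

# Tiny illustrative stopword list; an assumption standing in for a fuller
# list such as NLTK's English stopwords.
STOPWORDS = {"a", "an", "the", "is", "are", "in", "of", "to", "and", "for", "on"}

# Map every ASCII punctuation character to a space so that words joined by
# punctuation (e.g. "science/technology") are split rather than fused.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stopwords."""
    lowered = text.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    tokens = no_punct.split()
    return [token for token in tokens if token not in STOPWORDS]


print(preprocess("The budget is announced, in Delhi!"))
# ['budget', 'announced', 'delhi']
```

The resulting token lists can be joined back into strings and passed to a text vectorizer before classification.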

Future Additions

  1. Improving predictions by automatically updating the model parameters through continued training on new posts from r/india.
  2. Incorporating deep learning models such as LSTMs/GRUs and BERT.
