abalone23 / pulseoftheland

Tracking sentiment and topics in the USA

Home Page: https://www.pulseoftheland.com

License: GNU General Public License v3.0

Python 32.80% Jupyter Notebook 35.24% TSQL 1.42% HTML 29.87% Shell 0.67%

Pulse of the Land

Introduction

Pulse of the Land tracks geographic areas (states and cities) throughout the United States based on sentiment analysis and topic modeling using posts from location-based subreddits on Reddit as well as demographic characteristics such as income and population from the census.

screenprint

Tools

Python

Everything in this project is scripted using Python.

  • GeoPandas
  • Jupyter Notebooks

APIs

  • PRAW API
  • PSAW API
  • Google Maps API

Architecture

  • AWS
    • EC2
    • S3
    • Route 53

Data

Reddit

Data for the sentiment analysis and topic modeling is obtained from city and state location-based Reddit forums (subreddits) throughout the United States via the Pushshift.io API wrapper (PSAW). Only locations with populations over 50,000 and subreddits with over 1,000 subscribers are included. Metadata for the initial subreddit subscriber counts and selection is accessed via the Python Reddit API Wrapper (PRAW).
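
The inclusion rule above can be sketched as a simple filter. This is only a minimal illustration, not the project's actual code: the `Location` class, its field names, and the helper functions are hypothetical, and in practice the subscriber counts would come from PRAW and the populations from the census data.

```python
from dataclasses import dataclass

# Thresholds stated in the text above.
MIN_POPULATION = 50_000
MIN_SUBSCRIBERS = 1_000


@dataclass(frozen=True)
class Location:
    # Hypothetical record layout, for illustration only.
    name: str
    subreddit: str
    population: int
    subscribers: int


def is_included(loc: Location) -> bool:
    """Keep a location only if it is strictly over both thresholds."""
    return loc.population > MIN_POPULATION and loc.subscribers > MIN_SUBSCRIBERS


def select_locations(candidates):
    """Return the candidates that pass both thresholds, in input order."""
    return [loc for loc in candidates if is_included(loc)]
```

A location that sits exactly on a threshold (50,000 people or 1,000 subscribers) is excluded here, because the text says "over"; whether the project treats the boundary the same way is an assumption.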

Census

The demographic data comes from:

Mapping

Coordinates are retrieved using Google Maps API via the googlemaps Python client library. (notebook)

Maps are generated using the GeoPandas library. (notebook)

Analysis

Sentiment Analysis

Sentiment analysis is performed using CountVectorizer with VADER.

Topic Modeling

Topic modeling is performed using TextBlob.

Rating System

The rating system uses a proprietary score based on the following characteristics:

  • Sentiment
  • Income
  • Population

Databases

MongoDB

The Reddit JSON files are loaded into MongoDB.

PostgreSQL

The aggregated data, including sentiment, topic modeling results, and scores, is stored in PostgreSQL.

Tables

A total of ten tables are used in the PostgreSQL schema. (notebook)

  • states
  • cities
  • topics
  • keywords
  • topics_keywords
  • topics_geo
  • models
  • states_archive
  • cities_archive
  • topics_archive
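
The table names suggest that topics_keywords links topics to keywords in a many-to-many relationship. The sketch below illustrates that pattern only. Every column name is hypothetical, the real schema lives in the notebook, and SQLite's in-memory database stands in for PostgreSQL so the example runs without a server.

```python
import sqlite3

# Hypothetical columns; only the three table names come from the list above.
SCHEMA = """
CREATE TABLE topics (
    topic_id INTEGER PRIMARY KEY,
    label    TEXT NOT NULL
);
CREATE TABLE keywords (
    keyword_id INTEGER PRIMARY KEY,
    keyword    TEXT NOT NULL UNIQUE
);
CREATE TABLE topics_keywords (
    topic_id   INTEGER NOT NULL REFERENCES topics(topic_id),
    keyword_id INTEGER NOT NULL REFERENCES keywords(keyword_id),
    PRIMARY KEY (topic_id, keyword_id)
);
"""


def build_demo_db():
    """Create an in-memory database containing the three sketch tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def keywords_for_topic(conn, topic_id):
    """Return the keywords linked to one topic, sorted alphabetically."""
    rows = conn.execute(
        "SELECT k.keyword FROM keywords AS k "
        "JOIN topics_keywords AS tk ON tk.keyword_id = k.keyword_id "
        "WHERE tk.topic_id = ? ORDER BY k.keyword",
        (topic_id,),
    )
    return [keyword for (keyword,) in rows]
```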

Web App

pulseoftheland.com is built with the Flask web application framework. The running app is then scraped internally using wget, and the resulting static files are uploaded to a public AWS S3 bucket.
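
A rough sketch of this scrape-and-upload step follows. It rests on assumptions: the local URL, output directory, and bucket name are placeholders, and the text does not say how the files are uploaded, so the AWS CLI's `aws s3 sync` is used here as one plausible way to do it. The wget flags are standard mirroring options, not necessarily the ones the project uses.

```python
import subprocess


def mirror_command(site_url, output_dir):
    """Build a wget command that saves a static copy of the running site."""
    return [
        "wget",
        "--mirror",               # recursive download with timestamping
        "--page-requisites",      # also fetch CSS, JS, and images
        "--convert-links",        # rewrite links to work from static files
        "--adjust-extension",     # save HTML pages with an .html suffix
        "--no-host-directories",  # do not nest output under the host name
        f"--directory-prefix={output_dir}",
        site_url,
    ]


def upload_command(output_dir, bucket):
    """Build an AWS CLI command that syncs the static copy to a bucket."""
    return ["aws", "s3", "sync", output_dir, f"s3://{bucket}", "--delete"]


def publish(site_url, output_dir, bucket, run=subprocess.run):
    """Mirror the site, then upload it; stop at the first failing command."""
    for command in (
        mirror_command(site_url, output_dir),
        upload_command(output_dir, bucket),
    ):
        run(command, check=True)
```

Passing `run` as a parameter keeps the function easy to dry-run: a stand-in that only records the commands can be supplied in place of `subprocess.run`.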

Workflow

The process above is scheduled to run daily, using the previous three months' data:

  1. Retrieve latest Reddit data
  2. Load into MongoDB
  3. Run sentiment analysis
  4. Run topic modeling
  5. Generate maps
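
The daily run can be pictured as a small driver that computes the three-month window and then calls each step in order. The sketch below is illustrative only: the function names are hypothetical, the steps are placeholder callables, and "three months" is assumed to mean three calendar months back from the run date, which the text does not specify.

```python
import calendar
from datetime import date


def months_back(day, months):
    """Return the date `months` calendar months before `day`.

    The day of month is clamped, so May 31 minus three months is the
    last day of February.
    """
    index = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def data_window(today):
    """Return (start, end) covering the previous three calendar months."""
    return months_back(today, 3), today


def run_daily(today, steps):
    """Run each (name, callable) step in order over the same data window."""
    start, end = data_window(today)
    completed = []
    for name, step in steps:
        step(start, end)  # an exception here stops the remaining steps
        completed.append(name)
    return completed
```

Each step receives the same window, so retrieval, loading, analysis, and map generation all operate on one consistent date range.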

Architecture

The web app runs on AWS.

S3

  • Private S3 bucket stores the Reddit JSON files
  • Public S3 bucket hosts static HTML files

EC2

  • Python
    • scikit-learn
    • textblob
  • MongoDB
  • PostgreSQL
