The diverse-crowd from selwyth

diverse-crowd's Introduction

Installation

Run pipenv install if you use pipenv.

Otherwise, run pip install -r requirements.txt.

Run

Run python train.py --help to list the available options.

Example: python train.py --filename test --refresh

--filename sets the base name for the output files. All of them are saved to the static folder using the following convention:

<filename> : a binary file of tweets in raw-text form
<filename>.model : a gensim model pickle; it is saved but not used later (you can load it later if you want to feed more raw data into the model)
<filename>.kv : a gensim KeyedVectors instance, loaded in order to calculate vectors from input words
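As a rough illustration of "calculating vectors from input words", the sketch below averages the vectors of the words it recognizes. The function name mean_vector and the toy dictionary are hypothetical; a plain dict stands in for the loaded vectors, since a gensim KeyedVectors object (obtained with KeyedVectors.load) supports the same `word in kv` and `kv[word]` lookups. Whether train.py averages vectors in this way is an assumption.

```python
from typing import List, Mapping, Sequence


def mean_vector(words: Sequence[str], vectors: Mapping[str, Sequence[float]]) -> List[float]:
    """Average the vectors of the input words that exist in `vectors`."""
    # Skip out-of-vocabulary words instead of failing on them.
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        raise ValueError("none of the input words are in the vocabulary")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Toy stand-in for the loaded word vectors (hypothetical data).
toy_vectors = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
result = mean_vector(["cat", "dog", "unknown"], toy_vectors)
```

Here "unknown" is ignored because it is not in the vocabulary, so result is the mean of the two known vectors.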

--refresh / --no-refresh controls whether tweets come from the Twitter API or from the local cache. Reading from the local file is faster and saves money. With --refresh, the script always calls Twitter's API. With --no-refresh, it scans the static folder for the cached tweets file and loads it if available; if the file is not available, it behaves the same as --refresh.
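The refresh behaviour described above can be sketched as a small cache-or-fetch helper. This is a minimal sketch, not the code in train.py: the function load_tweets, the fetch callable, and the use of pickle as the binary format are all assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Callable, List


def load_tweets(cache_path: Path, refresh: bool, fetch: Callable[[], List[str]]) -> List[str]:
    """Return cached tweets unless a refresh is forced or no cache exists."""
    if not refresh and cache_path.exists():
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    # Either --refresh was given or the cache is missing: fetch and re-cache.
    tweets = fetch()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as fh:
        pickle.dump(tweets, fh)
    return tweets


calls = []


def fake_fetch() -> List[str]:
    """Hypothetical stand-in for the Twitter API call; records each invocation."""
    calls.append(1)
    return ["hello world", "second tweet"]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "static" / "test"
    first = load_tweets(cache, refresh=False, fetch=fake_fetch)   # no cache yet: fetches
    second = load_tweets(cache, refresh=False, fetch=fake_fetch)  # cache hit: no fetch
    third = load_tweets(cache, refresh=True, fetch=fake_fetch)    # forced fetch
```

Only the first and third calls reach fake_fetch; the second is served from the cached file.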

--word_vectors is an option that can be supplied in addition to --filename. Setting it to the name of a pretrained model causes that model to be downloaded through gensim, instead of training a model on tweets from the Twitter API or cached data. The tweet data is still used to calculate user similarity and so on, but not to train a model.

Example: python train.py --word_vectors glove-twitter-25 --filename test
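To summarize how the three options fit together, here is a sketch of an equivalent command-line parser built with argparse. It is an assumption that train.py is structured this way: the real script may use a different CLI library, and the defaults and required flags shown here are guesses. argparse.BooleanOptionalAction requires Python 3.9 or later.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options described in this README."""
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument(
        "--filename",
        required=True,  # assumption: the real script may provide a default
        help="base name for the files saved to the static folder",
    )
    parser.add_argument(
        "--refresh",
        action=argparse.BooleanOptionalAction,  # also creates --no-refresh
        default=False,  # assumption: the real default is not documented here
        help="call the Twitter API instead of reading the local cache",
    )
    parser.add_argument(
        "--word_vectors",
        default=None,
        help="name of a pretrained model to download instead of training",
    )
    return parser


args = build_parser().parse_args(
    ["--word_vectors", "glove-twitter-25", "--filename", "test"]
)
```

After parsing the example command, args.word_vectors holds the pretrained model name and args.refresh keeps its default.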

Model Development Ideas

Use more users and more tweets, beyond the most recent 20
Use bigrams
Tweak gensim hyperparameters, e.g. min_count
Clean out more stopwords like 'and', 'or', 'I'll'
Use representative words, phrases and tweets instead of representative users
Label dimensions like liberal/conservative, pro-life/pro-choice, crypto/anti-crypto
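Two of the ideas above, stopword cleaning and bigrams, can be tried out with a few lines of plain Python. This is a naive sketch under stated assumptions: the function names are hypothetical, the stopword set contains only the examples named in the list, and the bigrams are simple adjacent-token pairs rather than the statistical phrase detection that gensim offers in gensim.models.phrases.Phrases.

```python
import re
from typing import List

# The examples named in the list above, lowercased; extend as needed.
STOPWORDS = {"and", "or", "i'll"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def with_bigrams(tokens: List[str]) -> List[str]:
    """Append every pair of adjacent tokens, joined with an underscore."""
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


tokens = tokenize("I'll buy crypto and hold")
features = with_bigrams(tokens)
```

Note that stopwords are removed before pairing, so "crypto" and "hold" become adjacent and form a bigram even though "and" sat between them in the original tweet.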
