Debot

Twitter bot detection using deep learning. The project layout (utils, keras, models, vectorize) is described below, after the setup notes.

Running

python -i App.py (interactive mode to play around with the data)

Disclaimer: the repository is currently missing many of the files required to run it (e.g. datasets, database files, the word2vec model).

Requirements

  • Tweepy
  • Gensim
  • NLTK
  • Keras
  • TensorFlow

utils

This holds the utilities for the project, mainly the datasets and the scripts that extract them. Fetcher.py contains helper functions used by the various _reader.py scripts to read a dataset and store only the users' info in a SQLite3 database. After that, tweet_saver.py is run to download up to 1200 tweets per user. It has error handling built in to keep the download from failing.
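The flow above can be sketched roughly as follows. This is not the actual code from Fetcher.py or tweet_saver.py (those files are not shown here), just a minimal illustration under assumptions: the `users` table schema, the function names, and the `fetch_page` callable (a stand-in for the real Twitter API call) are all hypothetical.

```python
import logging
import sqlite3
from typing import Callable, Iterable, List, Optional, Tuple

MAX_TWEETS_PER_USER = 1200  # the cap mentioned above

Tweet = Tuple[int, str]  # (tweet_id, text); hypothetical shape
FetchPage = Callable[[int, Optional[int]], List[Tweet]]


def save_users(conn: sqlite3.Connection,
               users: Iterable[Tuple[int, str, int]]) -> None:
    """Store (user_id, screen_name, is_bot) rows. The schema is an assumption."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "user_id INTEGER PRIMARY KEY, screen_name TEXT, is_bot INTEGER)"
    )
    # INSERT OR IGNORE makes reruns safe when a user was already stored.
    conn.executemany("INSERT OR IGNORE INTO users VALUES (?, ?, ?)", users)
    conn.commit()


def download_tweets(fetch_page: FetchPage, user_id: int,
                    limit: int = MAX_TWEETS_PER_USER) -> List[Tweet]:
    """Collect up to `limit` tweets for one user, page by page.

    fetch_page(user_id, max_id) is a hypothetical callable that returns
    tweets newest first, restricted to ids <= max_id when max_id is given.
    """
    tweets: List[Tweet] = []
    max_id: Optional[int] = None
    while len(tweets) < limit:
        try:
            page = fetch_page(user_id, max_id)
        except Exception as exc:
            # E.g. a suspended or protected account: keep what was
            # collected so far and move on instead of crashing the run.
            logging.warning("Stopped fetching user %s: %s", user_id, exc)
            break
        if not page:
            break  # no older tweets left
        tweets.extend(page)
        max_id = page[-1][0] - 1  # continue below the oldest tweet seen
    return tweets[:limit]
```

With Tweepy, `fetch_page` would wrap whatever timeline call the project uses; passing it in as a callable keeps the capping and error handling testable without network access.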

keras

Placeholder

models

Currently, 3 models are defined:

  • Tweet
  • User
  • UserMaker

vectorize

This contains the pre-trained GloVe model that is used to get word-vector representations (https://nlp.stanford.edu/projects/glove/). The GloVe model was converted to a Word2Vec model (easier for processing) using Gensim.
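For reference, the text formats differ only in a header: a word2vec text file starts with a line holding the vocabulary size and the vector dimension, while a GloVe text file has no such line. Gensim ships a `gensim.scripts.glove2word2vec` helper for this conversion. The sketch below does the same thing with the standard library only, so it is an illustration of the idea and not the exact code used in this project; the function name and file paths are hypothetical.

```python
from typing import Tuple


def glove_to_word2vec(glove_path: str, word2vec_path: str) -> Tuple[int, int]:
    """Copy a GloVe text file, adding the word2vec header line.

    Returns (vocabulary size, vector dimension).
    """
    vocab_size, dims = 0, 0
    # First pass: count the vectors and read the dimension off the first
    # line, without loading the (potentially very large) file into memory.
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            if vocab_size == 0:
                dims = len(line.rstrip().split(" ")) - 1  # first token is the word
            vocab_size += 1
    # Second pass: write the header, then copy the vectors unchanged.
    with open(glove_path, encoding="utf-8") as src, \
            open(word2vec_path, "w", encoding="utf-8") as dst:
        dst.write(f"{vocab_size} {dims}\n")
        for line in src:
            dst.write(line)
    return vocab_size, dims
```

The resulting file can then be loaded in Gensim with `KeyedVectors.load_word2vec_format(word2vec_path, binary=False)`.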

Dataset

I had trouble finding suitable datasets for this project, so here is what I have found so far:

(Cresci 2017) - https://botometer.iuni.iu.edu/bot-repository/datasets.html

(ASONAM Honeypot 2015) - http://www.public.asu.edu/~fmorstat/bottutorial/ Many of the tweets in this dataset are in Arabic, which is a bit disappointing. Also, since the dataset is older, a lot of the accounts have already been suspended.

Contributors

dominoanty
