Giter Site home page Giter Site logo

clickbaits_revisited's Introduction

Clickbaits Revisited

This repository provides the code used for : https://www.linkedin.com/pulse/clickbaits-revisited-deep-learning-title-content-features-thakur

Data Collection

To run the code you must first collect the data:

Data Pre-Processing

After the data has been collected, you need to run the following files to obtain training and test data. The order is important!

- $ cd data_processing
- $ python create_data.py
- $ python html_scraper.py
- $ python feature_generation.py
- $ python merge_data.py
- $ python data_cleaning.py

After the steps above, you will end up with train.csv and test.csv in data/

Please note that the above steps will require a lot of memory. So, if you have anything less than 64GB, please modify the code according to your needs.

GloVe embeddings

Obtain GloVe embeddings from the following URL:

http://nlp.stanford.edu/data/glove.840B.300d.zip

Extract the zip and place the CSV in data/

Deepnets

After all the above steps, you are ready to go and play around with the deep neural networks to classify clickbaits

Change directory to deepnets/

cd deepnets/

The deepnets are as folllows:

LSTM_Title.py : LSTM on title text without GloVe embeddings
LSTM_Title_Content.py : LSTM on title text and content text without GloVe embeddings
LSTM_Title_Content_with_GloVe.py : LSTM on title and content text with GloVe emebeddings
TDD_Title_Content_with_Glove.py : Time distributed dense on title and content text with GloVe embeddings
LSTM_Title_Content_Numerical_with_GloVe.py : LSTM on title + content text with GloVe embeddings & dense net for numerical features.

Performance

The network with LSTM on title and content text with GloVe embeddings with numerical features achieves an accuracy of 0.996 during validation and 0.992 on the test set.

All models were trained on NVIDIA TitanX, Ubuntu 16.04 system with 64GB memory.

clickbaits_revisited's People

Contributors

abhishekkrthakur avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.