banluong / fake-news-classification-app

Contains exploratory data analysis and machine learning models for a real/fake news dataset found on Kaggle.

fake-news-classification-app

Fake news classifier web application now available!

Introduction

This is an end-to-end data science/machine learning project that explores a fake news dataset with exploratory data analysis and uses NLP tools and machine learning to classify fake and genuine news. The model is served in a Django web application: the user enters a news article URL, and the model predicts whether the article is genuine or fake.
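The README does not say how the app turns the entered URL into text for the model. As a minimal sketch, assuming the article body sits in `<p>` tags of the downloaded HTML, the standard library's `html.parser` can collect that text; `ParagraphExtractor` and `extract_article_text` are hypothetical names, not part of this repository.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text inside <p> elements (hypothetical helper, not from the repo)."""

    def __init__(self):
        super().__init__()
        self._in_p = False      # True while reading inside a <p> element
        self._current = []      # text fragments of the paragraph being read
        self.paragraphs = []    # finished, whitespace-normalised paragraphs

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()       # an unclosed <p> ends when the next one starts
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._in_p:
            self._current.append(data)

    def close(self):
        super().close()
        self._flush()           # keep text from a final <p> that was never closed

    def _flush(self):
        text = " ".join("".join(self._current).split())
        if text:
            self.paragraphs.append(text)
        self._current = []
        self._in_p = False


def extract_article_text(html: str) -> str:
    """Return the page's paragraph text, one paragraph per line, for the classifier."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)
```

Fetching the page (for example with `urllib.request.urlopen`) and vectorizing the text for the saved model are left out here, since the README gives no details about either step.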

Folders

"0. fake-news-analysis-training" contains the exploratory data analysis, model training and evaluation.

"1. hyperparameter-tuning" contains a notebook examining how to tune the hyperparameters of a Random Forest model. Because the dataset is large, only a sample (2,000 examples per class) is used for this investigation. The current model is NOT tuned, and it is up to the user whether to go down this route.

"2. app" contains the web application, built with Django and deployed with Heroku. Using a virtual environment is highly advisable; see "How to load virtual environment" below for details.

Data

Data used for this project can be found on Kaggle. Credit goes to Clément Bisaillon for creating the dataset.

The dataset contains over 23k examples of fake news and over 21k examples of genuine news.
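To work with the data in code, the two classes first need to be combined into one labeled dataset. A minimal stdlib sketch, assuming the download consists of two CSV files (commonly named Fake.csv and True.csv) that each have a header row with a `text` column; the function names, file paths and 0/1 label encoding are assumptions, not taken from the project's notebooks. `sample_per_class` shows one way to draw the balanced 2,000-examples-per-class sample that the hyperparameter-tuning notebook is described as using.

```python
import csv
import random
from collections import defaultdict


def load_labeled_news(fake_path, true_path):
    """Read the fake and genuine CSV files and tag each row with a label.

    Assumes both files have a header row with a 'text' column.
    Returns (text, label) pairs; 0 = genuine, 1 = fake (an arbitrary encoding).
    """
    rows = []
    for path, label in ((true_path, 0), (fake_path, 1)):
        with open(path, newline="", encoding="utf-8") as f:
            for record in csv.DictReader(f):
                rows.append((record["text"], label))
    return rows


def sample_per_class(rows, n_per_class, seed=0):
    """Draw up to n_per_class rows per label without replacement, reproducibly."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)
    rng = random.Random(seed)  # fixed seed so the same sample is drawn every run
    sample = []
    for label in sorted(by_label):
        group = by_label[label]
        sample.extend(rng.sample(group, min(n_per_class, len(group))))
    rng.shuffle(sample)  # mix the classes instead of leaving them in blocks
    return sample
```

For example, `collections.Counter(label for _, label in rows)` reports the class balance of the loaded data, and `sample_per_class(rows, 2000)` returns 2,000 rows of each class. If the data is in a pandas DataFrame instead, `df.groupby("label").sample(n=2000, random_state=0)` gives a comparable per-class sample.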

How to load virtual environment

When working with web applications, it is important to work within a virtual environment. A project often needs specific versions of certain modules/libraries, and a virtual environment keeps those versions boxed up inside the project so they don't affect the versions installed globally on your computer.

If virtualenv is not yet installed, run the following command:
pip install virtualenv

Next, in the directory you are working from, create a virtual environment:
virtualenv <ENVIRONMENT_NAME>

Once it is created, activate it from the command line in that directory. On Windows:
.\<ENVIRONMENT_NAME>\Scripts\activate

On Mac/Linux:
source <ENVIRONMENT_NAME>/bin/activate

You can tell you are in the virtual environment when its name appears in brackets, (<ENVIRONMENT_NAME>), at the start of the command prompt.
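Besides the prompt prefix, you can confirm from Python itself that the interpreter belongs to a virtual environment. A small sketch; `in_virtualenv` is a hypothetical helper, and it relies on the fact that venv and virtualenv 20+ point `sys.prefix` at the environment while `sys.base_prefix` keeps pointing at the base installation (older virtualenv releases set `sys.real_prefix` instead).

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and virtualenv 20+
    # make sys.prefix differ from sys.base_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix
```

As a one-liner, `python -c "import sys; print(sys.prefix != sys.base_prefix)"` should print True inside an activated venv or virtualenv 20+ environment.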

Once the virtual environment is up and running, install the dependencies by running the following command: pip install -r requirements.txt

You can see what's installed by running pip freeze or pip list.

When you have finished working within the environment, you can deactivate it by entering deactivate in the command line.

Running the Django application

From the root of the application directory (where manage.py is located), run the following in the command line while the virtual environment is active: python manage.py runserver

This starts the Django development server; you can view the application by entering localhost:8000 in the address bar of a web browser.

Room for improvement

There is room for improvement in the application. The model is by no means perfect and could be retrained on a newer dataset of current news. The application requires a valid news URL but breaks if a non-URL is entered, so further tests need to be written to prevent crashes and to cover other edge cases.
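The crash on non-URL input could be avoided by validating the input before fetching anything. A minimal sketch with the standard library's `urllib.parse`; `is_valid_news_url` is a hypothetical helper, and it only checks that the input has the shape of an absolute http(s) URL, not that the page exists or is a news article. Inside Django, `django.core.validators.URLValidator` is another option.

```python
from urllib.parse import urlparse


def is_valid_news_url(text) -> bool:
    """Return True if text looks like an absolute http(s) URL with a host.

    Only the shape of the input is checked; the page is not fetched.
    """
    if not isinstance(text, str):
        return False
    try:
        parsed = urlparse(text.strip())
    except ValueError:  # e.g. unbalanced IPv6 brackets such as "http://[::1"
        return False
    return (
        parsed.scheme in ("http", "https")
        and bool(parsed.hostname)
        and not any(ch.isspace() for ch in parsed.netloc)
    )
```

A view could call this first and re-render the form with an error message when it returns False, instead of passing the input on to the scraping and prediction code.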

Currently, the model is trained only on English-language articles, so additional models may be required to support other languages.
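Until other languages are supported, one stop-gap would be to reject non-English input before it reaches the model. A crude sketch based on the share of common English stopwords; this heuristic, its word list and the 0.15 threshold are illustrative assumptions rather than a real language identifier, and a dedicated language-detection library would be more reliable.

```python
import re

# A handful of very common English function words (illustrative, not exhaustive).
ENGLISH_STOPWORDS = {
    "the", "and", "of", "to", "a", "in", "is", "that", "it", "for",
    "on", "was", "with", "as", "are", "be", "this", "by", "at", "from",
}


def looks_english(text: str, threshold: float = 0.15) -> bool:
    """Guess whether text is English from the share of common English stopwords."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    if not words:
        return False
    ratio = sum(w in ENGLISH_STOPWORDS for w in words) / len(words)
    return ratio >= threshold
```

English news prose usually contains many of these function words, so the ratio separates it from, say, Spanish or German text reasonably often, but short or headline-style input can easily be misjudged.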

Updates

2020/10/02

Fake news classifier web application now available! The updated Django files and the necessary Heroku files are available in the "2. app" folder.

2020/09/20

Added the fake news notebook from Kaggle containing exploratory data analysis and machine learning model training, plus the saved model .pkl file.

2020/09/19

Added the Random Forest hyperparameter tuning notebook. It contains RandomizedSearchCV, GridSearchCV, training with the best hyperparameters, and a comparison of the best model with the base model.
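The two search strategies differ in which hyperparameter combinations they try. A minimal stdlib sketch of that difference, not scikit-learn's implementation: the parameter names match `RandomForestClassifier` arguments, but the candidate values and the helper names are hypothetical. When every parameter is given as a list, scikit-learn's randomized search samples combinations from the grid without replacement, which is what `random_candidates` imitates.

```python
import itertools
import random

# Hypothetical search space: names match RandomForestClassifier parameters,
# but the candidate values are illustrative only.
PARAM_GRID = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 10, 20, 40],
    "min_samples_split": [2, 5, 10],
}


def grid_candidates(grid):
    """Every combination, as an exhaustive grid search would try (3 * 4 * 3 = 36 here)."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in itertools.product(*(grid[k] for k in keys))]


def random_candidates(grid, n_iter, seed=0):
    """n_iter distinct combinations drawn at random, as a randomized search would try."""
    all_combos = grid_candidates(grid)
    return random.Random(seed).sample(all_combos, min(n_iter, len(all_combos)))
```

With 5-fold cross-validation, the full grid above means 36 × 5 = 180 model fits, whereas `random_candidates(PARAM_GRID, 10)` needs only 50. A common pattern is to run the randomized search first to narrow the ranges and then grid-search a smaller neighbourhood around the best result.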
