Giter Site home page Giter Site logo

newz-aware's Introduction

newz-aware

The growing problem in today's media is that almost every media outlet has some side of political bias to it. There is hardly any media website/channel that is 'neutral' in the true sense. Each one of them is split between either Right or Left. And this really creates an issue for the readers who are not, at the first glance, aware of the biases. This not only causes misunderstanding but also, on its extreme, can cause situations like riots.

Our idea was to help a reader understand and be aware of the kind of bias they are consuming. To do this, we thought of using predictive modeling to detect the bias of media sources. The data that we are using is centric to the political discourse in the USA. It is a crowdsourced dataset that has opinions of both sides (democrats and republicans) on a number of news articles. After building the model. We built a Flask App which fetches the latest news updates from News API (newsapi.org) and displays it along with the bias predicted.

For now, we don't believe it's perfect and we hope to improve and make it better in the future. Possible improvements could be, a more accurate model, a better UI that may also have separate categories for news portals, or different categories for according to the biases.

Model

The Colab notebook can also be seen here: https://colab.research.google.com/drive/1jJodHw2iRdnTw_c0obmLNZPg1fPbNlcn?usp=sharing

Dataset for bias detection: https://deepblue.lib.umich.edu/data/concern/data_sets/8w32r569d?locale=en

There were seven fields in the dataset:

1) Url
2) News Type: Has three possible values: other (remember that the articles are sampled from those that are predicted to be political based on our classifiers and so there are false positives we remove based on this label), News, or Opinion.
3) Perceived (whether the worker was looking at the blinded or unblinded version. perceived=1 means unblinded version)
4) Primary topic identified by the worker (If "None", the primary topic is not captured by our list of 14 topics)
5) Secondary topic (If "None", there is no secondary topic or the secondary topic is not captured by our list)
6) Democratic party  vote
7) Republican vote

Most of which were not of use to our purpose so we remove all columns except Url, democrat.vote, republican.vote

The votes have 5 possible values ['Negative', 'SomewhatNegative', 'Neutral', 'SomewhatPositive', 'Positive'] which we then label as [-2,-1,0,1,2]

Both of the votes are then merged into a single field in the following manner: bias = democrat.vote - republican.vote So a more positive value points to a democrat bias and vice versa. So now the new labels were like this: [-4,-3,-2,-1,0,1,2,3,4] We, then, again relabeled them as [0,0,0,1,1,1,2,2,2] where 0 corresponds to Right bias, 1 corresponds to Neutral and 2 corresponds to Left bias. The model is then trained using these three labels.

Setting up:

  1. We recommend creating a virtual environment using virtualenv

  2. Then all the requirements must be installed using pip3 install -r requirements.txt

  3. You may have to download nltk.corpus.stopwords seperately. Use the following code: import nltk nltk.download('stopwords')

  4. You may also have to download en_core_web_sm model seperately. Use the following code: python -m spacy download en_core_web_sm

  5. One must run the app.py to get the website running. The rest of the directory structure is as follows:

    static/ and template/ folders are for the Flask app.

    data.py contains code to preprocess and scrape the dataset.

    predicter.py is the prediction module that can be used to predict for new data.

    naiveBayesModel.pkl and tfidf_for_bias.pkl are the saved prediction model and TF-IDF vectorizer, respectively.

    TrainingData/ contains the data used for training, both raw and processed.

    NaiveBayesmodelPrep.ipynb is the notebook used for training the model on colab.

newz-aware's People

Contributors

actions-user avatar visheshks04 avatar ambuj-1211 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.