
South African Media Bias

This project calculates the over- or under-representation in the media of the three biggest South African political parties: the ANC, DA, and EFF.

It was prompted by a discussion between Helen Zille and Ferial Haffajee on Tea with Helen, in which Ms Haffajee claimed, in part, that the EFF is reported on in proportion to its vote share. I wanted to test that claim and do a bit of "political" data science.

The media outlets examined are the South African news outlets with the highest Alexa rank, which are taken here as a fair summary of the South African media: News24, IOL, and Times Live. Only their politics pages were examined. This research adheres to the terms of service (ToS) and robots.txt of the examined websites.
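Scrapy can enforce robots.txt itself through its ROBOTSTXT_OBEY setting. For a quick manual check outside Scrapy, Python's standard-library urllib.robotparser is enough. The sketch below parses a made-up robots.txt offline and asks whether a URL may be fetched; the rules, the user agent string, and the example.com URLs are hypothetical, not those of any examined site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. For a live site you would instead call
# parser.set_url("https://<site>/robots.txt") followed by parser.read().
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /search
"""


def allowed(url: str, robots_txt: str, user_agent: str = "media-bias-bot") -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("https://example.com/politics/article-1", SAMPLE_ROBOTS))
print(allowed("https://example.com/search?q=anc", SAMPLE_ROBOTS))
```

The first call is permitted because no rule covers /politics/, while the second falls under the Disallow: /search rule.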

The politics pages of these outlets were scraped, and the number of articles about each political party was recorded. An article's focus was determined from the number of times each party is mentioned on the page, not from the headline alone. This handles a case that seemed to occur often: the headline is about one party, but the body of the article is about a different party's reaction to the event.
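A minimal sketch of mention-based focus, assuming that an article's focus is simply the party mentioned most often and that a mention is the party's acronym or full name (case-sensitive). The PARTY_PATTERNS table and the tie rule are illustrative assumptions, not the project's actual code.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical matching rules: acronym or full party name, case-sensitive.
PARTY_PATTERNS = {
    "ANC": re.compile(r"\bANC\b|\bAfrican National Congress\b"),
    "DA": re.compile(r"\bDA\b|\bDemocratic Alliance\b"),
    "EFF": re.compile(r"\bEFF\b|\bEconomic Freedom Fighters\b"),
}


def count_mentions(text: str) -> Counter:
    """Count how often each party is mentioned in the page text."""
    return Counter({party: len(p.findall(text)) for party, p in PARTY_PATTERNS.items()})


def article_focus(text: str) -> Optional[str]:
    """Return the most-mentioned party, or None if no party is mentioned
    or the top count is tied between parties."""
    ranked = count_mentions(text).most_common()
    if ranked[0][1] == 0:
        return None
    if len(ranked) > 1 and ranked[1][1] == ranked[0][1]:
        return None
    return ranked[0][0]


# A headline about one party, a body about another party's reaction.
page = ("ANC announces new policy. The EFF rejected it, and the EFF plans to march. "
        "Economic Freedom Fighters leaders addressed supporters.")
print(count_mentions(page), article_focus(page))
```

Here the headline names the ANC once, but the EFF is mentioned three times, so the article is attributed to the EFF.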

Results

The results as of 23 December 2019 are below. In all three outlets the EFF are overfocused by a factor of nearly two relative to their share of the vote, the DA are proportionally focused, and the ANC are underfocused.
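One way to express over- or under-focus is as a ratio: a party's share of the articles divided by its share of the vote, so 1.0 is proportional and 2.0 is double. The sketch below assumes both shares are normalised over the parties being compared, which may differ from how the project's figures were computed. The party names X, Y, Z and all numbers are made up for illustration and are not the project's data.

```python
from typing import Dict


def representation_ratios(article_counts: Dict[str, int],
                          vote_shares: Dict[str, float]) -> Dict[str, float]:
    """Share of articles divided by share of the vote, per party.

    Both shares are normalised over the parties in article_counts (an
    assumption), so the parties are compared only with each other.
    """
    total_articles = sum(article_counts.values())
    total_votes = sum(vote_shares[party] for party in article_counts)
    return {
        party: (count / total_articles) / (vote_shares[party] / total_votes)
        for party, count in article_counts.items()
    }


# Made-up numbers for illustration only.
ratios = representation_ratios({"X": 60, "Y": 20, "Z": 20},
                               {"X": 60.0, "Y": 30.0, "Z": 10.0})
print(ratios)
```

With these inputs X comes out at 1.0 (proportional), Y at about 0.67 (underfocused), and Z at 2.0 (double its vote share).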

News24

[Three News24 result figures; images not reproduced]

Times Live

[Three Times Live result figures; images not reproduced]

IOL

[Three IOL result figures; images not reproduced]

Pages scraped

[Three figures of pages scraped; images not reproduced]

The raw data used to plot these figures is in the *.json files in the results/ directory.

These tools will not work indefinitely: if any of the websites changes its page structure, the corresponding tools will stop working.

Prerequisites

All instructions assume you are using Debian/Ubuntu. If you are running Windows or Mac, adjust them accordingly.

python3 -m venv venv
venv/bin/pip install -r requirements.txt
# sudo apt install linkchecker
source venv/bin/activate

Scraping

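First, run the initialisation tool to get a list of URLs from a website's politics page. The commands here are for Times Live; the other outlets have their own scraper/initialisation_* and scraper/scraper_* scripts.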

python scraper/initialisation_times_live.py

Then run the corresponding scraping tool:

python scraper/scraper_times_live.py

Plot the results with:

python visualisation/visualise.py

Output files

Files are written to the results/ directory:

  • *.urls: The initial URLs from which the scraper starts its search. Produced by the scraper/initialisation_* scripts.
  • *.json: The results files from scraping. Produced by the scraper/scraper_* scripts.
  • *.stats: The stats output by Scrapy. They aren't really of interest to this project, but are fun to look at. Produced by the scraper/scraper_* scripts.

Auxiliary tools

Some auxiliary tools are provided to help with scraping.

auxiliary/json_to_url.py converts a .json results file into a .urls file to prime the scraper tool. This is helpful if scraping failed midway and the .json file holds a list of known-good URLs.
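A minimal sketch of the same idea, not the project's script. It assumes the results file is a single JSON array of objects, each storing its address under a field called "url" (a hypothetical field name), and that a .urls file holds one URL per line.

```python
import json
from typing import List


def urls_from_results(json_text: str, url_key: str = "url") -> List[str]:
    """Extract the URL of every scraped item, skipping items without url_key."""
    items = json.loads(json_text)
    return [item[url_key] for item in items if url_key in item]


def write_urls_file(urls: List[str], path: str) -> None:
    """Write one URL per line (the assumed .urls format)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(urls) + "\n")


sample = '[{"url": "https://example.com/a"}, {"title": "no url"}]'
print(urls_from_results(sample))
```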

auxiliary/combine_urls.py joins together two or more .urls files and removes the duplicates. Useful if you ran auxiliary/json_to_url.py and want to combine the output with the previous .urls file.
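The combine step amounts to an order-preserving de-duplication. This sketch assumes .urls files contain one URL per line; read_urls_file and combine_urls are illustrative names, not the project's API. Keeping first-seen order means URLs from the earlier file are scraped first.

```python
from typing import Iterable, List


def read_urls_file(path: str) -> List[str]:
    """Read a .urls file, assumed to hold one URL per line."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


def combine_urls(*url_lists: Iterable[str]) -> List[str]:
    """Join several URL lists, dropping blank lines and duplicates
    while keeping the order in which URLs were first seen."""
    seen = set()
    combined = []
    for urls in url_lists:
        for url in urls:
            url = url.strip()
            if url and url not in seen:
                seen.add(url)
                combined.append(url)
    return combined


print(combine_urls(["https://a", "https://b"], ["https://b", "https://c", ""]))
```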

auxiliary/json_remove_duplicates.py removes duplicate entries from a .json file. Sometimes the .json file ends up with multiple output blocks in it. The cause is unknown; it might be a threading bug in Scrapy.
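One possible cause, not verified against this project: Scrapy's -o command-line option appends to an existing feed file, so a second run writes another JSON array after the first, producing [...][...], which json.load rejects (newer Scrapy versions also offer -O to overwrite instead). The sketch below reads such back-to-back blocks with json.JSONDecoder.raw_decode, flattens them, and drops exact duplicates. It is an illustration, not the project's script.

```python
import json
from typing import Any, List


def read_json_blocks(text: str) -> List[Any]:
    """Decode one or more JSON values written back to back, e.g. '[...][...]'."""
    decoder = json.JSONDecoder()
    values, pos = [], 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return values
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)


def remove_duplicates(text: str) -> List[Any]:
    """Flatten all blocks into one list and drop exact duplicate items,
    comparing items by their key-sorted JSON form."""
    seen, unique = set(), []
    for block in read_json_blocks(text):
        for item in block if isinstance(block, list) else [block]:
            key = json.dumps(item, sort_keys=True)
            if key not in seen:
                seen.add(key)
                unique.append(item)
    return unique


broken = '[{"url": "a"}, {"url": "b"}]\n[{"url": "b"}, {"url": "c"}]'
print(remove_duplicates(broken))
```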

auxiliary/json_diff.py compares two .json files and prints differences. Useful if you have two scrapes of the same site and want to compare them before merging.
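A minimal sketch of a diff over two result lists (already loaded with json.load), comparing whole items by their key-sorted JSON form. This is an illustrative approach rather than the project's script; note that an item whose contents changed between scrapes shows up on both sides.

```python
import json
from typing import Any, List, Tuple


def diff_items(old: List[Any], new: List[Any]) -> Tuple[List[Any], List[Any]]:
    """Return (only_in_old, only_in_new), comparing items by canonical JSON."""
    def canon(item: Any) -> str:
        return json.dumps(item, sort_keys=True)

    old_keys = {canon(item) for item in old}
    new_keys = {canon(item) for item in new}
    only_old = [item for item in old if canon(item) not in new_keys]
    only_new = [item for item in new if canon(item) not in old_keys]
    return only_old, only_new


removed, added = diff_items([{"url": "a"}, {"url": "b"}], [{"url": "b"}, {"url": "c"}])
print("only in old:", removed)
print("only in new:", added)
```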

auxiliary/json_merge.py merges two .json files and writes the result to a new file. Useful if you have a new scrape to merge with an older one. This shouldn't really be necessary.
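A sketch of one possible merge policy, assuming each item carries its page address under a "url" field (a hypothetical field name): items are keyed on URL, and where both scrapes contain the same page, the newer item replaces the older one. Python dicts keep insertion order, so older pages keep their position and new pages are appended.

```python
from typing import Any, Dict, List


def merge_by_url(older: List[Dict[str, Any]], newer: List[Dict[str, Any]],
                 url_key: str = "url") -> List[Dict[str, Any]]:
    """Merge two result lists keyed on url_key; the newer item wins on clashes."""
    merged = {item[url_key]: item for item in older}
    merged.update({item[url_key]: item for item in newer})
    return list(merged.values())


old_scrape = [{"url": "a", "n": 1}, {"url": "b", "n": 1}]
new_scrape = [{"url": "b", "n": 2}, {"url": "c", "n": 1}]
print(merge_by_url(old_scrape, new_scrape))
```

Writing the merged list to a new file is then a single json.dump call.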

Thoughts and Future Work

The discontinuous (step) vote share function I have used in the results may be the wrong approach; a continuous straight line or a spline may be better. Voter sentiment is not discrete but continuous, so a continuous function may model it better.
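The difference between the two models can be sketched for one party: the step model holds the most recent election result until the next election, while the straight-line model interpolates between the surrounding elections. The two data points below are made up purely for illustration; a spline would need more than two points to differ from a straight line.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Tuple

# Made-up (election date, vote share %) points for one party.
POINTS: List[Tuple[date, float]] = [
    (date(2000, 1, 1), 10.0),
    (date(2000, 1, 11), 20.0),
]


def step_share(day: date, points=POINTS) -> float:
    """Step model: share from the latest election on or before `day`
    (before the first election, fall back to the first share)."""
    i = bisect_right([d for d, _ in points], day) - 1
    return points[max(i, 0)][1]


def linear_share(day: date, points=POINTS) -> float:
    """Continuous model: linear interpolation between the surrounding
    elections, held flat outside the covered date range."""
    dates = [d for d, _ in points]
    i = bisect_right(dates, day)
    if i == 0:
        return points[0][1]
    if i == len(points):
        return points[-1][1]
    (d0, s0), (d1, s1) = points[i - 1], points[i]
    t = (day - d0).days / (d1 - d0).days
    return s0 + t * (s1 - s0)


mid = date(2000, 1, 6)
print(step_share(mid), linear_share(mid))
```

Halfway between the two elections, the step model still reports the old share while the linear model reports the midpoint.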

Contributors

igniparoustempest
