A3-hcds-hcc-bias

The goal of this project is to explore the concept of bias using data on Wikipedia articles. The project focuses on articles about political figures from a variety of countries. The analysis compares, across countries, how well politicians are covered on Wikipedia and how high the quality of the articles about them is.

Data sources

One API and two existing datasets serve as data sources.

1. The ORES API (documentation, endpoint)

The ORES API is a service that provides information about the quality of revisions of Wikipedia articles.

2. A dataset of Wikipedia articles (documentation, download)

This dataset contains data on most English-language Wikipedia articles within the category "Category:Politicians by nationality". It was published by Os Keyes and is licensed under CC-BY 4.0.

3. A dataset of country populations (documentation, download)

This dataset includes information about the population of countries at the end of the year 2019. Note: the downloaded file was edited before use. The edited file can be found here: src/_data/export_2019.csv.
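
As a rough illustration of how article quality could be requested from the ORES API, the sketch below builds a request URL for a batch of revision IDs and reads the predicted quality class out of a response. The endpoint path, the model name `articlequality`, and the response layout are assumptions based on the ORES v3 scores interface and are not taken from this repository; the helper names and the sample response are hypothetical. No network request is made.

```python
from urllib.parse import urlencode

# Assumed endpoint and model name; check the ORES documentation before use.
ORES_ENDPOINT = "https://ores.wikimedia.org/v3/scores/enwiki/"
MODEL = "articlequality"


def build_ores_url(rev_ids):
    """Build a request URL asking for quality scores of several revisions."""
    query = urlencode({
        "models": MODEL,
        "revids": "|".join(str(r) for r in rev_ids),
    })
    return f"{ORES_ENDPOINT}?{query}"


def extract_predictions(response):
    """Map each revision ID to its predicted quality class.

    Revisions without a score (for example, deleted ones) map to None.
    """
    scores = response.get("enwiki", {}).get("scores", {})
    predictions = {}
    for rev_id, models in scores.items():
        score = models.get(MODEL, {}).get("score")
        predictions[rev_id] = score["prediction"] if score else None
    return predictions


# Hand-written sample in the assumed response shape (not real ORES output).
sample_response = {
    "enwiki": {
        "scores": {
            "1001": {"articlequality": {"score": {"prediction": "Stub"}}},
            "1002": {"articlequality": {"error": {"type": "RevisionNotFound"}}},
        }
    }
}

url = build_ores_url([1001, 1002])
predictions = extract_predictions(sample_response)
print(url)
print(predictions)
```

Keeping URL construction and response parsing separate from the actual HTTP call makes both easy to check without network access.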

Licensing

No licensing information was found for the ORES API or the country population dataset, so please make sure you use these data sources appropriately. All resulting datasets follow the same licensing policy as the Wikipedia articles dataset (CC-BY 4.0).

Results

The results are six CSV-formatted data files in the folder results.

Content

  1. country_coverage_data_top_10.csv: The countries with the greatest coverage of politicians on Wikipedia compared to their population
  2. country_coverage_data_bottom_10.csv: The countries with the least coverage of politicians on Wikipedia compared to their population
  3. country_relative_quality_data_top_10.csv: The countries with the highest proportion of high-quality articles about politicians
  4. country_relative_quality_data_bottom_10.csv: The countries with the lowest proportion of high-quality articles about politicians
  5. region_coverage_data.csv: The ranking of geographic regions by coverage of politicians
  6. region_relative_quality_data.csv: The ranking of geographic regions by proportion of high-quality articles
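
A minimal sketch of how a coverage ranking like the ones in files 1 and 2 could be derived is shown below. It assumes that coverage means the number of politician articles divided by the population, expressed as a percentage; the exact formula used in the notebook is not stated here. The country names, the numbers, and the helper names `coverage_percent` and `ranked` are made up for illustration.

```python
# Hypothetical inputs: politician article counts and populations per country.
article_counts = {"Aland": 30, "Borduria": 5, "Cascara": 200}
populations = {"Aland": 10000, "Borduria": 1000000, "Cascara": 4000000}


def coverage_percent(article_counts, populations):
    """Return {country: articles per population, as a percentage}."""
    result = {}
    for country, count in article_counts.items():
        population = populations.get(country)
        if not population:  # skip countries with missing or zero population
            continue
        result[country] = 100.0 * count / population
    return result


def ranked(values, n=10, descending=True):
    """Return the n (name, value) pairs with the highest or lowest values."""
    return sorted(values.items(), key=lambda kv: kv[1], reverse=descending)[:n]


coverage = coverage_percent(article_counts, populations)
top = ranked(coverage, n=2)                      # greatest coverage first
bottom = ranked(coverage, n=2, descending=False)  # least coverage first
print(top)
print(bottom)
```

Countries without a population entry are skipped rather than given a coverage of zero, so that missing data does not show up in the bottom ranking.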

Formats

Files 1 & 2

| column name | column description |
|---|---|
| country | Country name |
| coverage | Coverage of politicians on Wikipedia relative to the country's population |

Files 3 & 4

| column name | column description |
|---|---|
| country | Country name |
| relative_quality | Percentage of high-quality articles among all articles |

File 5

| column name | column description |
|---|---|
| region | Region name |
| coverage | Coverage |

File 6

| column name | column description |
|---|---|
| region | Region name |
| relative_quality | Percentage of high-quality articles among all articles |
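
The relative_quality column can be illustrated with a short sketch that computes, per group, the percentage of articles predicted to be high quality. Which quality classes count as "high quality" is not defined in this README; the sketch assumes the classes FA and GA. The sample data and the function name `relative_quality` are hypothetical.

```python
from collections import Counter

# Assumption: "high quality" means a predicted class of FA or GA.
HIGH_QUALITY = {"FA", "GA"}


def relative_quality(articles):
    """Return {group: percentage of high-quality articles}.

    articles is an iterable of (group, predicted_class) pairs, where the
    group is a country or region name.
    """
    totals = Counter()
    high = Counter()
    for group, predicted_class in articles:
        totals[group] += 1
        if predicted_class in HIGH_QUALITY:
            high[group] += 1
    return {group: 100.0 * high[group] / totals[group] for group in totals}


# Made-up sample: four articles for one country, two for another.
sample = [
    ("Aland", "FA"),
    ("Aland", "Stub"),
    ("Aland", "GA"),
    ("Aland", "C"),
    ("Borduria", "Start"),
    ("Borduria", "Stub"),
]

quality = relative_quality(sample)
print(quality)
```

Because the grouping key is just a string, the same function works for the country files and the region file.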

Getting started

Prerequisites

To use this project (especially the Jupyter notebook), please ensure that you have Python 3.6.1 or later, a working installation of Poetry, and git installed.
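
As a quick check of the Python requirement, a sketch like the following compares the running interpreter against the minimum version named above. The helper name `python_is_supported` is hypothetical and not part of this repository.

```python
import sys

# Minimum version stated in the prerequisites.
MIN_PYTHON = (3, 6, 1)


def python_is_supported(version_info=None):
    """Return True if the given (or the running) Python meets the minimum."""
    current = tuple(version_info or sys.version_info)[:3]
    return current >= MIN_PYTHON


print(python_is_supported())
```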

Setup

  1. Clone this repository (via HTTPS as shown, or via SSH) and change into the repo root.

    git clone https://github.com/marisanest/A2-hcds-hcc.git
    cd A2-hcds-hcc

  2. Install the dependencies in the repo root.

    poetry install

  3. Create a subshell within the virtual environment by running:

    poetry shell

  4. Open the project with Jupyter in your browser.

    jupyter notebook

