Giter Site home page Giter Site logo

daconjam / detecting_gender_bias Goto Github PK

View Code? Open in Web Editor NEW
4.0 2.0 1.0 90 KB

The primary goal of this project is to detecting and examining gender bias in news articles

Home Page: https://www.cse.msu.edu/~daconjam/

License: MIT License

Jupyter Notebook 100.00%
gender-detection gender-bias gender-classification gender-prediction gender-equality gender-recognition dataset bias news-articles

detecting_gender_bias's Introduction

Does Gender Matter in the News? Detecting and Examining Gender Bias

This repository contains code to reproduce analysis in "Does Gender Matter in the News? Detecting and Examining Gender Bias" by Jamell Dacon and Haochen Liu, as part of the The World Wide Web Conference 2021 -- WWW'21.

If you publish material based on material and/ or information obtained from this repository, then, in your acknowledgements, please note the assistance you received from utilizing this repository. By citing our paper and dataset repository as follows below, feel free to star GitHub stars and/or fork Github forks both repositories so that academics i.e. university researchers, faculty and other scientists may have quicker access to the available datasets and large lists of gender-specific and gender-neutral possessive nouns, and career and family words. This will aid in directing others in obtaining the same datasets and files, thus allowing both replication and improvement of detecting, examining and mitigating gender bias.

Addition Information: Correspondence

Personal Page: Portfolio

Citation

Paper BiBTeX citation:

@inbook{10.1145/3442442.3452325, author = {Dacon, Jamell and Liu, Haochen}, title = {Does Gender Matter in the News? Detecting and Examining Gender Bias in News Articles}, year = {2021}, isbn = {9781450383134}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3442442.3452325}, abstract = {To attract unsuspecting readers, news article headlines and abstracts are often written with speculative sentences or clauses. Male dominance in the news is very evident, whereas females are seen as “eye candy” or “inferior”, and are underrepresented and under-examined within the same news categories as their male counterparts. In this paper, we present an initial study on gender bias in news abstracts in two large English news datasets used for news recommendation and news classification. We perform three large-scale, yet effective text-analysis fairness measurements on 296,965 news abstracts. In particular, to our knowledge we construct two of the largest benchmark datasets of possessive (gender-specific and gender-neutral) nouns and attribute (career-related and family-related) words datasets1 which we will release to foster both bias and fairness research aid in developing fair NLP models to eliminate the paradox of gender bias. Our studies demonstrate that females are immensely marginalized and suffer from socially-constructed biases in the news. This paper individually devises a methodology whereby news content can be analyzed on a large scale utilizing natural language processing (NLP) techniques from machine learning (ML) to discover both implicit and explicit gender biases. }, booktitle = {Companion Proceedings of the Web Conference 2021}, pages = {385–392}, numpages = {8} }

Datasets

We conduct experiments to detect and examine the bias across the two datasets. We will now detail the datasets as follows:

MIND

The MIND dataset was collected from the Microsoft News website, for more detailed information about the MIND dataset, you can refer to the following paper: MIND paper, (Wu et al., 2020). They randomly sampled news from from October 12 to November 22, 2019 for 6 weeks creating two datasets i.e., MIND and MIND-small both totalling in 161,013 news articles. Each news article contains a news ID, a category label, a title, and a body (url); however, not every article contains an abstract resulting in 96,112 abstracts. We used the training set (largest set of news articles) since both the validation and test sets are subsets of the training set. MIND is created to serve as a new news recommendation benchmark dataset.

NCD

The NCD dataset was collected from Huffpost. The news articles were sampled from news headlines from the year 2012 to 2018 totalling in 202,372 news articles. Each news article contains a category label, headline, authors, link, and date; however, not every article contains a short description (abstract) resulting in 200,853 abstracts. NCD serves as a news classification and recommendation benchmark dataset.

Centering Reasonance Analysis

Please refere to Sociology Hacks for a step by step tutorial on Centering Reasonance Analysis (CRA) and mentions the necessary python packages needed for implementation.

In addition, before you download the dataset(s), please read these terms and click below button to confirm that you agree to them for their respective usage licenses, acknowledgments and other details.

detecting_gender_bias's People

Contributors

daconjam avatar

Stargazers

 avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

vanessaschenkel

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.