
the-markup / investigation-google-keyword-planner


Materials to reproduce findings in our story, "Google Ad Portal Equated 'Black Girls' With Porn"

Home Page: https://themarkup.org/google-the-giant/2020/07/23/google-advertising-keywords-black-girls

HTML 99.94% Jupyter Notebook 0.06%
google-ads bias-detection google-adwords algorithm-auditing

Google Keyword Planner

This repository contains materials to reproduce the findings featured in our story, "Google Ad Portal Equated 'Black Girls' With Porn", from our series, Google the Giant.

Screenshots and figures from our story can be found in the data folder.

Jupyter notebooks used for data preprocessing and analysis are available in the notebooks folder.

💡 Disclaimer: This repository contains code and data with explicit and graphically sexual language.

Installation

pip install -r requirements.txt

Data

Where the raw inputs and intermediaries are stored.

data/
├── input
│   ├── browser
│   ├── raw-exports
│   └── screenshots
├── intermediary
│   ├── all-keywords.csv
│   ├── keywords-labelled-as-adult.json
│   ├── preprocessed
│   ├── websites-from-search.csv
│   └── websites-we-found-to-be-pornographic.csv
└── output
    ├── volume-of-adult-rec-keywords.csv
    └── volume-of-adult-rec-keywords.png

We have raw exports from Google Keyword Planner in data/input/raw-exports.
The same input is exported with and without the "exclude Adult ideas" filters.
The only column we use is the recommended Keywords column.
Collected July 8-12, 2020.

You can view screenshots from Keyword Planner in data/input/screenshots.
We have two screenshots of a search for "Black girls": one with and one without the adult filter.

We preprocess and merge these files in data/intermediary/preprocessed.
Here we add three boolean columns:
Google_Adult - True if Google filtered out the keyword when you "exclude adult ideas".
SERP_Adult - True if the recommended keyword's corresponding Google search is majority self-described pornographic sites.
All_Adult - True if either of the two previously mentioned columns is True.
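
The three flags can be sketched in plain Python. This is a minimal illustration rather than the code in the notebooks: the function `label_keywords`, its arguments, and the sample keywords are hypothetical, and the sketch assumes a keyword counts as Google_Adult when it appears in the unfiltered export but is missing from the filtered one.

```python
def label_keywords(unfiltered, filtered, serp_adult):
    """Return one dict of boolean flags per recommended keyword.

    unfiltered: keywords exported without the "exclude adult ideas" filter
    filtered:   keywords exported with the filter switched on
    serp_adult: keywords whose search results were mostly pornographic sites
    (All three arguments are hypothetical stand-ins for the real data.)
    """
    filtered = set(filtered)
    serp_adult = set(serp_adult)
    rows = []
    for keyword in unfiltered:
        google_adult = keyword not in filtered  # dropped by Google's filter
        serp = keyword in serp_adult
        rows.append({
            "Keyword": keyword,
            "Google_Adult": google_adult,
            "SERP_Adult": serp,
            "All_Adult": google_adult or serp,
        })
    return rows


# Made-up placeholder keywords, for illustration only.
rows = label_keywords(
    unfiltered=["keyword a", "keyword b", "keyword c"],
    filtered=["keyword a", "keyword c"],
    serp_adult=["keyword c"],
)
```

Here "keyword b" would be flagged Google_Adult because the filter removed it, while "keyword c" would be flagged SERP_Adult only; both end up with All_Adult set to True.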

We have the source code (HTML) of the Google search results pages (SERPs) for all 1.9K recommended keywords in data/input/browser.

We have the 200 most-shared web domains (from the SERPs above) in data/intermediary/websites-labelled-as-pornographic.csv.
We determine which of these sites self-identify as pornographic by looking for "porn" in the search listings for each website. We found 132 of these websites to be pornographic.
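
The domain ranking and the "porn" check described above could look roughly like the sketch below. It rests on several assumptions that are not taken from the notebooks: the `serps` structure (keyword mapped to a list of URL and listing-text pairs), the function names, counting a domain once per keyword, and reading "majority" as more than half of a keyword's results.

```python
from collections import Counter
from urllib.parse import urlparse


def top_domains(serps, n=200):
    """Rank domains by the number of keywords whose results link to them.

    serps maps each keyword to a list of (url, listing_text) pairs; this
    structure is an assumption made for the sketch.
    """
    counts = Counter()
    for results in serps.values():
        # Count each domain once per keyword, however often it appears.
        counts.update({urlparse(url).netloc for url, _ in results})
    return [domain for domain, _ in counts.most_common(n)]


def self_described_porn(serps, domains):
    """Return the domains whose search listings mention "porn"."""
    wanted = set(domains)
    flagged = set()
    for results in serps.values():
        for url, text in results:
            domain = urlparse(url).netloc
            if domain in wanted and "porn" in text.lower():
                flagged.add(domain)
    return flagged


def is_serp_adult(results, porn_domains):
    """True if more than half of a keyword's results are on flagged domains."""
    hits = sum(urlparse(url).netloc in porn_domains for url, _ in results)
    return hits > len(results) / 2


# Made-up placeholder data, for illustration only.
serps = {
    "keyword a": [("https://a.example/x", "Free porn videos"),
                  ("https://b.example/", "News site")],
    "keyword b": [("https://a.example/y", "Videos"),
                  ("https://a.example/z", "More videos"),
                  ("https://c.example/", "Recipes")],
}
porn_domains = self_described_porn(serps, top_domains(serps))
```

With this placeholder data only a.example is flagged, so "keyword b" (two of three results) would count as SERP_Adult and "keyword a" (one of two) would not.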

We have the aggregated tables and figures featured in our story in data/output. The table volume-of-adult-rec-keywords.csv contains counts and percentages of recommended keywords in three groups: keywords that Google identifies as "adult", keywords with majority self-described pornographic sites in their search results, and keywords that are neither adult nor pornographic.
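
As a rough illustration of how such counts and percentages can be derived from the boolean columns, here is a small sketch. The function `summarize`, the output keys, and the sample rows are hypothetical; the actual layout of volume-of-adult-rec-keywords.csv may differ.

```python
def summarize(rows):
    """Count and percentage of keywords in each category.

    rows: dicts with boolean "Google_Adult" and "SERP_Adult" values
    (a hypothetical in-memory form of the preprocessed data).
    """
    total = len(rows)
    counts = {
        "google_adult": sum(r["Google_Adult"] for r in rows),
        "serp_adult": sum(r["SERP_Adult"] for r in rows),
        "neither": sum(
            not (r["Google_Adult"] or r["SERP_Adult"]) for r in rows
        ),
    }
    return {
        name: {
            "count": count,
            # Guard against an empty input to avoid dividing by zero.
            "percent": round(100 * count / total, 1) if total else 0.0,
        }
        for name, count in counts.items()
    }


# Made-up flags for four keywords, for illustration only.
sample = [
    {"Google_Adult": True, "SERP_Adult": True},
    {"Google_Adult": False, "SERP_Adult": True},
    {"Google_Adult": False, "SERP_Adult": False},
    {"Google_Adult": False, "SERP_Adult": False},
]
summary = summarize(sample)
```

Because a keyword can be flagged by both Google and the search-result check, the first two categories overlap and the percentages need not sum to 100.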

Notebooks

If you want to reproduce our results, the notebooks should be run sequentially.

0-search-analysis.ipynb

Gets the top-shared domains from the 1.9K keywords recommended by Keyword Planner. Determines how many recommended keywords' search results contain links to self-identified pornographic sites.

Links: GitHub | nbviewer

1-analysis.ipynb

For each of our eight inputs, we get the count and percentage of recommended keywords that Google claims are "Adult" and of keywords that we found to be pornographic. This is also where the figure featured in our story is produced.

Links: GitHub | nbviewer

Licensing

Copyright 2020, The Markup News Inc.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Contributors

yinleon
