The Facebook scandal

This repository contains the code base for the Social Network Analysis course offered by the Master's degree program in Computer Science at the University of Pisa.

The case story

On Saturday, 17 March 2018, The New York Times and The Guardian/The Observer reported that the consulting firm Cambridge Analytica had harvested private information from the Facebook profiles of more than 50 million users without their permission, making it one of the largest data leaks in the social network’s history.

Cambridge Analytica described itself as a company providing consumer research, targeted advertising and other data-related services to both political and corporate clients. The whistleblower Christopher Wylie, a data scientist and former director of research at Cambridge Analytica, revealed to the Observer how the company used personal information taken without authorisation in early 2014 to build a system that could profile individual US voters and target them with personalised political advertisements.

Christopher Wylie, who worked with a Cambridge University academic to obtain the data, told the Observer:

"We exploited Facebook to harvest millions of people’s profiles. And built models to exploit what we knew about them and target their inner demons. That was the basis the entire company was built on."

The network

We considered a network composed of the authors of tweets about the case during the first period of the scandal's outbreak. The data were collected via the Twitter API, and we built the network through the following consecutive steps:

  1. Crawling all the available tweets over a period of more than 15 days, starting from the 17th of March, that contain at least one of the most popular hashtags regarding the case:
    • #cambridgeanalytica
    • #facebookgate
    • #deletefacebook
    • #zuckerberg
  2. Cleaning the crawled tweets, by selecting and storing in a MongoDB database only the user information of the tweet authors, excluding retweets and mentions.
  3. Selecting the outbreak time period by inspecting how tweet activity evolved over time. The selected period spans 8 days, from the 17th to the 24th of March inclusive (in the Italian time zone).
  4. Crawling the list of followed accounts (the "following" list) of each selected author, to extract the following/follower relationships.
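
Steps 2 to 4 can be sketched on in-memory data as follows. This is a minimal illustration, not the repository's actual code: the tweet record layout (`user`, `created_at`, `text`, `is_retweet`), the `following` mapping and the function names are all hypothetical, and no Twitter API or MongoDB call is made. It also assumes that "excluding retweets and mentions" means keeping only authors of original tweets, and that edges are kept only between selected authors.

```python
import re
from datetime import datetime, timedelta, timezone

# Hashtags listed in step 1 (compared case-insensitively).
HASHTAGS = {"cambridgeanalytica", "facebookgate", "deletefacebook", "zuckerberg"}

# Italy was on CET (UTC+1) until daylight saving time began on 25 March 2018,
# so a fixed offset covers the whole 17-24 March window.
CET = timezone(timedelta(hours=1))
WINDOW_START = datetime(2018, 3, 17, tzinfo=CET)
WINDOW_END = datetime(2018, 3, 25, tzinfo=CET)  # exclusive: 24 March is included


def has_case_hashtag(text):
    """Return True if the text contains at least one of the case hashtags."""
    return bool(HASHTAGS.intersection(re.findall(r"#(\w+)", text.lower())))


def select_authors(tweets):
    """Keep authors of original (non-retweet) tweets posted inside the window."""
    authors = set()
    for tweet in tweets:
        if tweet["is_retweet"]:
            continue
        if not (WINDOW_START <= tweet["created_at"] < WINDOW_END):
            continue
        if has_case_hashtag(tweet["text"]):
            authors.add(tweet["user"])
    return authors


def build_edges(authors, following):
    """Directed edges (u, v) meaning 'u follows v', restricted to selected authors."""
    return {
        (u, v)
        for u in authors
        for v in following.get(u, ())
        if v in authors and v != u
    }


# Hypothetical sample data; timestamps are timezone-aware (UTC here).
utc = timezone.utc
tweets = [
    {"user": "alice", "created_at": datetime(2018, 3, 17, 9, 0, tzinfo=utc),
     "text": "Wow #CambridgeAnalytica", "is_retweet": False},
    {"user": "bob", "created_at": datetime(2018, 3, 24, 22, 30, tzinfo=utc),
     "text": "#deletefacebook now", "is_retweet": False},    # 23:30 CET, inside
    {"user": "carol", "created_at": datetime(2018, 3, 24, 23, 30, tzinfo=utc),
     "text": "#facebookgate", "is_retweet": False},          # 00:30 CET on the 25th, outside
    {"user": "dave", "created_at": datetime(2018, 3, 18, 12, 0, tzinfo=utc),
     "text": "RT #zuckerberg", "is_retweet": True},          # retweet, skipped
    {"user": "erin", "created_at": datetime(2018, 3, 19, 12, 0, tzinfo=utc),
     "text": "Thoughts on #privacy", "is_retweet": False},   # no case hashtag
]
following = {"alice": ["bob", "carol"], "bob": ["alice", "zed"]}

authors = select_authors(tweets)
edges = build_edges(authors, following)
```

With the sample data, only `alice` and `bob` pass the filters, so the resulting edge set contains the two follow relationships between them. Restricting edges to selected authors keeps the network closed over the tweet authors; dropping that filter would instead pull in every followed account as a node.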

Contributors

gianmarcoricciarelli, stecarp
