
Sammlr

Python scripts to collect data from the Facebook API and calculate networks of page likes, mutually liked pages, user overlap, and content overlap

UPDATE February 2018: As the Facebook API no longer seems to allow the unique identification of users, Sammlr can no longer compute a network of user overlap. Reactions are no longer retrieved either. The networks that still work are content overlap, page likes, and common friends. The updated script file is still a bit rough, but should get the job done.

What the scripts do:

  • Sammlr uses the feed of a public Facebook page to collect data from each post (e.g. status update, photo, video), all(!) comments made on those posts, and all(!) reactions (e.g. like, love, angry). You can specify either a number of posts to collect or a date range. The data is stored as a comma-separated file on your drive. Each row contains the page's ID, the page's name, a unique user ID (the user name is not stored for privacy reasons), the type (post, comment, or reaction), a timestamp and, where applicable, a permanent link to the post and the message of the post or comment.
  • So Sammlr is just another Facebook data collection tool, yeah, you got it! But wait, there's more to it: if you choose to collect data on a network of pages, Sammlr provides the CSV files with raw data and, in addition, calculates four types of networks between these pages:
      1. A network of unique user overlap, in which the weighted edges (links) between each pair of nodes (pages) are the number of users that were active (i.e. posting, commenting, or reacting) on both pages.
      2. A network of unique content overlap, in which the weighted edges (links) between each pair of nodes (pages) are the number of pieces of content (photos, videos, newspaper articles, basically everything with a hyperlink) that were either posted or mentioned in a comment in both pages' feeds.
      3. Optional (as it takes some time): A network of page likes between the selected pages. This network is unweighted and directed.
      4. Optional (as it takes some time): A network of the overlap of any pages liked by the selected pages or pages that liked the selected pages. This is a network in which the number of common friends (pages) is the weight of the edge between each pair of nodes.
  • These networks are stored as simple edge lists that can be processed with any network analysis package of your choice, and as *.graphml files, which can be opened with popular free network visualization software like Gephi and read with common network analysis packages like igraph (R, Python) or networkx (Python).
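One way to picture the overlap networks described above is as a bipartite projection: pages are linked to the items they contain (links for content overlap, liked pages for common friends), and the projection counts how many items each pair of pages shares. The following is a minimal sketch under that assumption, not the code Sammlr itself runs. The toy data and the names overlap_network and save_network are hypothetical.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical toy data: which links appeared in which page's feed.
page_links = {
    "page_a": {"http://example.com/1", "http://example.com/2"},
    "page_b": {"http://example.com/2", "http://example.com/3"},
    "page_c": {"http://example.com/1", "http://example.com/2"},
}


def overlap_network(memberships):
    """Project a page-to-item bipartite graph onto the pages.

    The edge weight between two pages is the number of items they share.
    """
    B = nx.Graph()
    pages = list(memberships)
    B.add_nodes_from(pages, bipartite=0)
    for page, items in memberships.items():
        for item in items:
            # Wrap items in a tuple so they can never collide with a page name.
            item_node = ("item", item)
            B.add_node(item_node, bipartite=1)
            B.add_edge(page, item_node)
    return bipartite.weighted_projected_graph(B, pages)


def save_network(G, stem):
    """Write the network as a weighted edge list and as a GraphML file."""
    nx.write_weighted_edgelist(G, stem + ".edgelist")
    nx.write_graphml(G, stem + ".graphml")


network = overlap_network(page_links)
for u, v, weight in network.edges(data="weight"):
    print(u, v, weight)
```

Calling save_network(network, "content_overlap") would then produce the two file types mentioned above. The file stem is only an example.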

What you need before you can start:

  • You will need to register as a Facebook developer to gain access to the Facebook Graph API (https://developers.facebook.com/tools/explorer). From there, you can get the access token that Sammlr needs.
  • You will also need the Facebook ID of the Facebook page you want to collect data from. You can use sites like https://findmyfbid.com/ to find the numeric ID of a page.
  • Sammlr is written in Python 3, so make sure you have a Python 3 environment installed.
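With the access token and the numeric page ID in hand, a request for a page's feed can be assembled roughly as follows. This is a minimal sketch, not the code Sammlr itself runs. The helper names build_feed_url and fetch_json are made up for illustration, and the /{page-id}/feed path and the limit parameter are assumed to be available for your token and API version.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GRAPH_URL = "https://graph.facebook.com"


def build_feed_url(page_id, access_token, limit=100, api_version=None):
    """Build the URL for a page's feed (hypothetical helper)."""
    path = f"/{page_id}/feed"
    if api_version:
        # e.g. "v2.11"; which versions are accepted depends on your app.
        path = f"/{api_version}{path}"
    query = urlencode({"access_token": access_token, "limit": limit})
    return f"{GRAPH_URL}{path}?{query}"


def fetch_json(url):
    """Download a URL and decode the JSON response (performs a network call)."""
    with urlopen(url) as response:
        return json.load(response)


# Building the URL needs no network access; the values are placeholders.
print(build_feed_url("123456789", "YOUR_ACCESS_TOKEN", limit=25))
```

Keeping URL construction separate from the download makes it possible to inspect the request before any API quota is spent.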

Dependencies

(What libraries Sammlr uses:)

Our scripts are still a work in progress and depend on functions from a number of other libraries. Most of them are pretty common, but if your Python installation is missing any of these, install them before running our script:

  • urllib.request
  • json
  • csv
  • re
  • time
  • sys
  • networkx
  • from networkx.algorithms the bipartite functions
  • pandas
  • os

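Of these, only networkx and pandas are third-party packages; the others are part of the Python standard library and need no installation. Install the two packages via: $pip install networkx pandas

or $pip3 install networkx pandas

depending on your Python installation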
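To see which of the listed modules are missing before running the script, a quick check like the one below can help. It is a small sketch that is not part of Sammlr; the function name missing_modules is hypothetical.

```python
import importlib.util

# Module names taken from the dependency list above.
REQUIRED = [
    "urllib.request", "json", "csv", "re", "time",
    "sys", "os", "networkx", "pandas",
]


def missing_modules(names):
    """Return the names that cannot be found in this Python installation."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when the parent package of a dotted name is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


print("Missing modules:", missing_modules(REQUIRED))
```

An empty list means everything Sammlr imports is available.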

Usage

To use Sammlr,

  • get an access token for the Graph API from https://developers.facebook.com/tools/explorer/ (the temporary access_token from the Graph API Explorer is enough for most reasonably sized collections, but long-running tasks might need a permanent token)
  • open your terminal
  • clone this repository: $git clone https://github.com/walfaelschung/Sammlr
  • change into the downloaded directory: $cd Sammlr
  • install all requirements (see above)
  • run Sammlr: $python3 sammlr_script_30_11_17.py
  • follow the instructions in the command prompt

Copyrights

What licenses apply:

Contributors:

  • All code written and poorly tested by Matthias Hoffmann with help from machinaeXphilip. For the construction of a network of content overlap between the pages, we rely on a regular expression for detecting hyperlinks in strings that was written by Diego Perini (https://gist.github.com/dperini/729294).
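The idea of pulling hyperlinks out of post and comment text can be sketched as below. Note that the pattern here is a deliberately simplified stand-in and not Diego Perini's regular expression, which is far more thorough. The names URL_PATTERN and extract_links are hypothetical.

```python
import re

# Simplified stand-in pattern: "http://" or "https://" followed by any run of
# characters that are not whitespace, angle brackets, or quotes.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_links(text):
    """Return the set of hyperlinks found in a message."""
    links = set()
    for match in URL_PATTERN.findall(text or ""):
        # Drop punctuation that commonly trails a link inside a sentence.
        links.add(match.rstrip(".,;:!?)"))
    return links


message = "See https://example.com/a, and http://example.org/b."
print(extract_links(message))
```

Collecting the links per page as sets makes it straightforward to count how many links two pages have in common.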
