
gianscuri / italian_twitch_community_graph_database


Implementation of a graph database of the most popular Italian Twitch streamers, based on the number of viewers they share. The information is scraped from multiple sources.

License: MIT License

Python 0.12% Jupyter Notebook 13.53% Cypher 86.35%
gephi-visualizations graph-model mongodb neo4j twitch

italian_twitch_community_graph_database's Introduction

Twitch community graph

Abstract

Twitch.tv is a live streaming platform that allows streamers to broadcast and users to enjoy content in real time. The broadcasts cover various categories, mainly related to the world of video games, entertainment, and the arts. Thanks to the platform's great success, especially in the last few years, revenue opportunities have increased both for streamers and for companies operating in these sectors. Understanding the market and the platform, however, is crucial to discovering the interests of users. This project therefore aims to collect and analyze data about the different streams in order to create an explorable and queryable graph model of the communities present, thus enabling accurate market analysis.

The project consists of a series of scripts to collect, integrate, analyze and save data from different sources. It is thus a tool that can be run over any time frame to obtain an up-to-date graph of the situation. Data is collected from two distinct sources: Twitch, for live information, through the official Web APIs, and SteamDB, for video game information, through dynamic scraping techniques. The processing phase produces the datasets containing the streamers and the different video games streamed, together with the bridge tables that link them. The streamer-game relations were calculated by analyzing the broadcast categories, while the streamer-streamer relations were calculated by evaluating the percentage of common viewers between each pair of streamers.
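As a rough illustration of the streamer-streamer relation, the sketch below computes the percentage of common viewers for each pair of streamers. It is not taken from the project's notebooks: the input layout (a mapping from streamer name to a set of viewer names), the function name `shared_viewer_percentages`, and the choice of the combined audience as the denominator are all assumptions made for the example.

```python
from itertools import combinations


def shared_viewer_percentages(viewers_by_streamer, bots=frozenset()):
    """Return {(streamer_a, streamer_b): percentage} for pairs sharing viewers.

    Assumption: the percentage is shared viewers over the combined audience
    of the two streamers. The project may use a different denominator.
    """
    # Drop known bot accounts before comparing audiences.
    cleaned = {name: set(viewers) - set(bots)
               for name, viewers in viewers_by_streamer.items()}
    edges = {}
    for a, b in combinations(sorted(cleaned), 2):
        shared = cleaned[a] & cleaned[b]
        if shared:  # only pairs with at least one common viewer become edges
            combined = cleaned[a] | cleaned[b]
            edges[(a, b)] = 100.0 * len(shared) / len(combined)
    return edges


# Hypothetical sample data, for illustration only.
sample = {
    "streamer_a": {"u1", "u2", "u3", "bot1"},
    "streamer_b": {"u2", "u3", "u4", "bot1"},
    "streamer_c": {"u9"},
}
edges = shared_viewer_percentages(sample, bots={"bot1"})
```

Each key of `edges` would correspond to one streamer-streamer edge in the graph, with the percentage usable as an edge weight.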

This repository contains data collected over a two-week period in May 2022 regarding all Italian broadcasts on Twitch, and data from SteamDB regarding the most played video games. Approximately 2.5 GB of data was collected during this period; after a detailed analysis, it allowed the creation of a graph model on the Neo4j DBMS consisting of 4121 nodes and 54931 edges.

Graph visualization on Gephi May 2022

Execution scheme

Pipeline

1. Data Collection

  1. Follow this doc to obtain your Twitch API keys (ClientID and ClientSecret) and paste them in the Twitch_API_keys.txt file
  2. Create a repeated execution task for Twitch_stream_collection.py every xx minutes (Win: Task Scheduler, Linux: Crontab)
    • choose the details (e.g. language) of the desired streams
    • this script saves the collected streams in individual JSON files; upload to a local MongoDB server is already supported: uncomment the import function in the script (it requires MongoDB Community Server)
  3. Run steam_games_scraping.ipynb to scrape the SteamDB website (if the website asks for a CAPTCHA, try clearing your browser cookies)
  4. Download the bot-users dataset from Twitch Insights using a browser extension (e.g. Table Capture for Chrome) and save it as Twitch_bot_list.csv
  5. Run Twitch_social_link.py to obtain the streamers' social links (this can be run only after the collecting and processing phases because it requires the complete streamer list)
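To make steps 1 and 2 more concrete, here is a minimal sketch of the two HTTP requests that a collection run typically needs with the Twitch Helix API: one to exchange the ClientID and ClientSecret for an app access token, and one to list live streams filtered by language. It only builds the requests and does not send them. The helper names and the placeholder credentials are hypothetical, and this is not necessarily how Twitch_stream_collection.py is written.

```python
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_URL = "https://id.twitch.tv/oauth2/token"
STREAMS_URL = "https://api.twitch.tv/helix/streams"


def build_token_request(client_id, client_secret):
    """Build the client-credentials request for an app access token."""
    body = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "client_credentials",
    }).encode()
    return Request(TOKEN_URL, data=body, method="POST")


def build_streams_request(client_id, access_token,
                          language="it", first=100, after=None):
    """Build one page of a 'Get Streams' query filtered by language."""
    params = {"language": language, "first": first}
    if after:  # pagination cursor returned by the previous page
        params["after"] = after
    headers = {
        "Client-Id": client_id,
        "Authorization": f"Bearer {access_token}",
    }
    return Request(f"{STREAMS_URL}?{urlencode(params)}", headers=headers)


# Placeholder credentials, for illustration only.
token_req = build_token_request("my-client-id", "my-client-secret")
streams_req = build_streams_request("my-client-id", "my-token")
```

Passing either request to `urllib.request.urlopen` would perform the call; the JSON response of the streams request contains a `data` list of streams and, when more pages exist, a pagination cursor to pass back as `after`.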

2. Data Processing

  1. Run DataProcessing.ipynb, selecting the parameters for the analysis in the first block:
    • data source (JSON files or local MongoDB server)
    • acquisition time interval (xx minutes)
    • parameters and thresholds
  2. Run DataEnrichment.ipynb to add game info from SteamDB (verify the matches manually)
  3. Run DataExploration.ipynb and DataQuality.ipynb to obtain data insights
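One way to see why the acquisition interval matters in processing: if a snapshot of the live streams is taken at a fixed interval, counting the snapshots in which a streamer appears under a given category gives an estimate of the time spent on that game. The sketch below illustrates this idea only. It assumes each snapshot record carries the `user_name` and `game_name` fields returned by the Twitch streams endpoint; the function name and the interval value used in the demo are arbitrary and are not the project's settings.

```python
from collections import Counter


def streamer_game_minutes(snapshots, interval_minutes):
    """Estimate minutes streamed per (streamer, game) pair.

    Each snapshot record counts as one acquisition interval. Records
    without a category are skipped.
    """
    counts = Counter(
        (record["user_name"], record["game_name"])
        for record in snapshots
        if record.get("game_name")
    )
    return {pair: n * interval_minutes for pair, n in counts.items()}


# Hypothetical records collected over three acquisition runs.
snapshots = [
    {"user_name": "streamer_a", "game_name": "Chess"},
    {"user_name": "streamer_a", "game_name": "Chess"},
    {"user_name": "streamer_a", "game_name": "Just Chatting"},
    {"user_name": "streamer_b", "game_name": ""},
]
minutes = streamer_game_minutes(snapshots, interval_minutes=10)
```

The resulting pairs could feed a streamer-game bridge table, with a threshold on the estimated minutes deciding which relations are kept.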

3. Data Modelling

  1. Install Neo4j Community Server
  2. Copy the CSVs from the output_datasets folder to the Neo4j import folder (neo4j/import/)
  3. Run graph_neo4j.ipynb to load the data into Neo4j
  4. Execute the desired queries
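For readers unfamiliar with loading CSVs from the Neo4j import folder, the sketch below generates the kind of Cypher `LOAD CSV` statements such a load usually relies on. It is an illustration only and is not copied from graph_neo4j.ipynb: the helper names, the `Streamer` label, the `SHARES_VIEWERS` relationship type, and the column names are assumptions, so adapt them to the actual CSV headers.

```python
def load_nodes_query(csv_name, label, key_column):
    """Cypher that creates one node per CSV row, keyed on key_column."""
    return (
        f"LOAD CSV WITH HEADERS FROM 'file:///{csv_name}' AS row\n"
        f"MERGE (n:{label} {{{key_column}: row.{key_column}}})"
    )


def load_edges_query(csv_name, rel_type, source_col, target_col, weight_col):
    """Cypher that links existing Streamer nodes with a weighted relation."""
    return (
        f"LOAD CSV WITH HEADERS FROM 'file:///{csv_name}' AS row\n"
        f"MATCH (a:Streamer {{name: row.{source_col}}})\n"
        f"MATCH (b:Streamer {{name: row.{target_col}}})\n"
        f"MERGE (a)-[r:{rel_type}]->(b)\n"
        f"SET r.weight = toFloat(row.{weight_col})"
    )


# Label and column names below are hypothetical.
nodes_query = load_nodes_query("Streamer_dataset_short.csv", "Streamer", "name")
edges_query = load_edges_query(
    "Streamer-Streamer_dataset_short.csv",
    "SHARES_VIEWERS", "source", "target", "weight",
)
```

The generated strings can be pasted into the Neo4j Browser or run through a driver session. The `file:///` prefix resolves against the import folder mentioned in step 2.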

4. Data Visualization

  1. Install Gephi
  2. Import Streamer_dataset_short.csv and Streamer-Streamer_dataset_short.csv
  3. Run a layout algorithm (e.g. ForceAtlas), run statistical analyses to detect communities (e.g. Modularity), and edit node and edge colors (more details here)

For additional info on the project, read ProjectReport_ita.pdf (in Italian)

Contributors

gianscuri, paolaimpi, silviagrosso
