YT overblik

This folder contains code to create a tabular dataset from a downloaded YouTube (YT) watch history. It also includes a script that enriches this dataset with more information from YouTube's API, and scripts that create a bar chart race and datasets for tag co-occurrence networks.

# To update the list of packages (export the current conda environment):
conda list --export > package-list.txt

# Newcomers can create their own conda environment and install everything needed like this:
conda create -n myenv --file package-list.txt

# Put the HTML file of the watch history in raw_data/<folder_number>
# First, parse the HTML into a simple dataframe
python3 create_df/html_parser.py <folder_number> <OPTIONAL: True if the watch history is in Danish>
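The parser in create_df/html_parser.py is not reproduced here. As a rough illustration only, the sketch below uses Python's standard-library html.parser to collect the video ID and title from every <a> tag that links to a youtube.com/watch page. It assumes the exported HTML marks each watched video with such a link; WatchLinkParser and parse_watch_history are hypothetical names, and the real script (including its Danish-language handling) may work differently.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class WatchLinkParser(HTMLParser):
    """Hypothetical parser: collect (video_id, title) from <a> tags linking to /watch pages."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so "&amp;" arrives as "&"
        self.rows = []            # list of (video_id, title) tuples
        self._current_id = None   # video ID of the <a> tag we are currently inside
        self._text = []           # text chunks seen inside that <a> tag

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        parsed = urlparse(href)
        # Only watch links carry a "v=" video ID; channel links etc. are ignored.
        if parsed.path == "/watch":
            ids = parse_qs(parsed.query).get("v")
            if ids:
                self._current_id = ids[0]
                self._text = []

    def handle_data(self, data):
        if self._current_id is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_id is not None:
            self.rows.append((self._current_id, "".join(self._text).strip()))
            self._current_id = None


def parse_watch_history(html_text):
    """Return a list of (video_id, title) tuples in document order."""
    parser = WatchLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```

Channel links and other anchors are skipped because only /watch URLs carry a v= video ID, which is the key a later API lookup would need.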

# Then enrich the dataset with info from the YouTube API. Make sure you have enough credentials for the number of videos in the watch history;
# for most histories, 3-4 credentials at most should be enough
python3 create_df/api_info.py <folder_number>
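How many credentials are "enough" depends on how the lookups are made. The sketch below is a back-of-the-envelope helper, assuming each credential gets the YouTube Data API's default daily quota of 10,000 units and that each videos.list call costs 1 unit; these figures are assumptions here (check Google's current quota documentation), and api_info.py may use a different request pattern. credentials_needed, batches and credential_file are hypothetical helpers; credential_file only mirrors the credentials/credentialsN.json naming shown in the directory tree.

```python
import math


def credentials_needed(n_videos, ids_per_request=1, requests_per_credential=10_000):
    """Estimate how many credentials a watch history needs.

    Assumption: each credential can make `requests_per_credential` requests,
    and each request looks up `ids_per_request` videos.
    """
    n_requests = math.ceil(n_videos / ids_per_request)
    return math.ceil(n_requests / requests_per_credential)


def batches(video_ids, size=50):
    """Split video IDs into chunks of at most `size` (assumed 50-ID limit per videos.list call)."""
    return [video_ids[i:i + size] for i in range(0, len(video_ids), size)]


def credential_file(request_index, requests_per_credential=10_000):
    """Pick a credential file by request number (hypothetical credentialsN.json rotation)."""
    return f"credentials/credentials{request_index // requests_per_credential + 1}.json"
```

If lookups are made one video at a time, a 35,000-video history needs ceil(35,000 / 10,000) = 4 credentials. Sending up to 50 IDs per call, assuming that limit, would bring the same history down to 700 requests, i.e. one credential.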

# Now you can create a bar chart race with
python3 Analyse/bcr.py <folder_number> <OPTIONAL: n_bars> <OPTIONAL: cutoff>
# n_bars defines how many bars the graph shows
# cutoff is the minimum number of views a channel needs across the whole period to show up in the graph
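The internals of bcr.py are not shown in this README. One plausible way to turn the watch history into frames for a bar chart race is sketched below with the standard library. It assumes one (period, channel) record per watched video, cumulative view counts per frame, and that cutoff is an inclusive minimum over the whole history; bcr_frames is a hypothetical name.

```python
from collections import Counter, defaultdict


def bcr_frames(views, n_bars=10, cutoff=0):
    """Build bar-chart-race frames from (period, channel) view records.

    Returns {period: [(channel, cumulative_views), ...]} with at most n_bars
    entries per period, sorted by cumulative views (descending), then name.
    """
    views = list(views)
    totals = Counter(channel for _, channel in views)
    # Keep only channels with at least `cutoff` views over the whole history.
    keep = {ch for ch, n in totals.items() if n >= cutoff}

    per_period = defaultdict(Counter)
    for period, channel in views:
        if channel in keep:
            per_period[period][channel] += 1

    frames = {}
    running = Counter()
    for period in sorted({p for p, _ in views}):
        running.update(per_period[period])  # cumulative counts up to this period
        ranked = sorted(running.items(), key=lambda kv: (-kv[1], kv[0]))
        frames[period] = ranked[:n_bars]    # the top n_bars channels in this frame
    return frames
```

Each frame can then be drawn as one step of the animation: channels below cutoff never appear, and n_bars caps how many channels are shown at once.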

# And create yearly CSV matrices for visualizing tag co-occurrence networks
python3 Analyse/network.py <folder_number> <OPTIONAL: a period delimiter other than years, e.g. Q for quarters>

These CSV matrices need to be visualized in Gephi or other software.
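To illustrate what a tag co-occurrence matrix contains, the sketch below counts, per period, how often two tags appear on the same watched video and writes a labelled square matrix as CSV. The input shape ((date, tags) pairs), the "Y"/"Q" period codes (chosen to echo the README's Q option), the semicolon separator and all function names are assumptions; network.py may build its matrices differently. Gephi can import adjacency matrices from CSV, but check which separator and layout your Gephi version expects.

```python
import csv
import io
from collections import defaultdict
from datetime import date
from itertools import combinations


def period_key(d, freq="Y"):
    """Map a date to a period label: '2021' for years, '2021Q3' for quarters."""
    if freq == "Q":
        return f"{d.year}Q{(d.month - 1) // 3 + 1}"
    return str(d.year)


def cooccurrence_matrices(videos, freq="Y"):
    """videos: iterable of (date, [tags]). Returns {period: {(tag_a, tag_b): count}}."""
    result = defaultdict(lambda: defaultdict(int))
    for d, tags in videos:
        key = period_key(d, freq)
        # Count each unordered tag pair once per video, stored in both directions.
        for a, b in combinations(sorted(set(tags)), 2):
            result[key][(a, b)] += 1
            result[key][(b, a)] += 1
    return result


def matrix_csv(pairs, delimiter=";"):
    """Write a square adjacency matrix with tag labels in the first row and column."""
    tags = sorted({t for pair in pairs for t in pair})
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow([""] + tags)
    for a in tags:
        writer.writerow([a] + [pairs.get((a, b), 0) for b in tags])
    return out.getvalue()
```

Counting each unordered pair once per video keeps the matrix symmetric, which suits an undirected network; a tag that never shares a video with another tag does not appear as a node.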

Current directory structure

.
├── Analyse
│   ├── bcr.py
│   └── network.py
├── README.md
├── cleaned_data
│   ├── 01
│   │   ├── 2021.csv
│   │   ├── 2022.csv
│   │   ├── bcr.gif
│   │   ├── history_df.csv
│   │   ├── history_info_df.csv
│   │   └── missing_tags.csv
│   ├── 02
│   ├── 03
│   ├── 04
│   ├── 05
│   ├── 06
│   ├── 07
│   ├── 08
│   ├── 09
│   └── 10
├── create_df
│   ├── api_info.py
│   ├── clean_json.py
│   ├── html_parser.py
│   └── test.ipynb
├── credentials
│   ├── credentials1.json
│   ├── credentials2.json
│   ├── credentials3.json
│   ├── credentials4.json
│   ├── credentials5.json
│   └── credentials6.json
├── package-list.txt
├── raw_data
│   ├── 01
│   │   └── watch-history.json
│   ├── 02
│   ├── 03
│   ├── 04
│   ├── 05
│   ├── 06
│   ├── 07
│   ├── 08
│   ├── 09
│   └── 10
└── results
    ├── 01
    │   ├── 2021.pdf
    │   ├── 2022.pdf
    │   ├── bcr.gif
    │   └── bcr.mp4
    ├── 02
    ├── 03
    ├── 04
    ├── 05
    ├── 06
    ├── 07
    ├── 08
    ├── 09
    └── 10

Contributors

jeppefoldberg
