
paolobasso99 / polimi_recordings_downloader


Python application to download a batch of lesson recordings from Polimi.

License: MIT License



Polimi recordings downloader

This Python application downloads a batch of lesson recordings from Politecnico di Milano.

This app is intended to download a large number of recordings (e.g. entire courses) in an easy and fast way. If you want to download a single recording, consider using this browser extension.


Set up

System dependencies

  • Python
  • aria2: this needs to be in your $PATH (for example, put aria2c.exe inside C:\Program Files\aria2c and add this folder to $PATH)
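
To confirm that aria2 is reachable before starting, a quick check along these lines may help. This is a minimal sketch and not part of the application: the function name `find_aria2c` is hypothetical, and it only relies on `shutil.which` from the standard library.

```python
import shutil
from typing import Optional


def find_aria2c(executable: str = "aria2c") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None."""
    # shutil.which searches the directories listed in the PATH variable.
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_aria2c()
    if path is None:
        print("aria2c was not found on PATH")
    else:
        print(f"aria2c found at {path}")
```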

Python dependencies

  • (Optional) Create a virtual environment: inside the project folder run python -m venv .venv. Activate the environment with .venv\Scripts\activate.bat on Windows or source .venv/bin/activate on Unix/macOS. See here for more information about virtual environments. If you know how to use Poetry, you can use that instead.
  • Install libraries: pip install -r requirements.txt

Usage

Run python -m prd --help for information about usage and additional options.

This app can download the recordings from:

  1. A URL to the recordings archives page that lists all the links to the recordings you want to download
  2. A txt file with the links to the recordings
  3. A URL to a Webeep "Recordings" page where the professor links the recordings
  4. The URL to a public (no authentication) webpage where the professor links directly to the videos (for example, the personal site of the professor)
  5. An HTML file with the direct links to the videos. This is useful when the webpage is behind authentication.

GUIDE 1: Download from recording archives

This mode parses a page from the recordings archives to fetch the download links of the videos.

In order to download a batch of recordings some steps are required:

  1. With your browser open the recordings archives. From the browser copy the SSL_JSESSIONID (domain is www11.ceda.polimi.it) cookie value and set it using: python -m prd set-cookie SSL_JSESSIONID "{COOKIE_VALUE}". SSL_JSESSIONID must be taken from the recordings page, because its value can differ between pages of the web services.
  2. With your browser open Webex and login. From the browser copy the ticket cookie value and set it using: python -m prd set-cookie ticket "{COOKIE_VALUE}".
  3. With your browser navigate to the recordings archive and search for a course to download. Try to have all the recordings in a single page.
  4. Make sure all the recordings you want are on the page (open the "all" page size in a new tab).
  5. Copy the current URL and run: python -m prd archives "{URL}"
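
As background for steps 1 and 2, the sketch below shows how a cookie value copied from the browser is typically attached to an HTTP request. It is only an illustration of the general mechanism, not the application's actual code: the helper `build_request` is hypothetical, the URL is just the domain mentioned above, and no request is actually sent.

```python
from typing import Dict
from urllib.request import Request


def build_request(url: str, cookies: Dict[str, str]) -> Request:
    """Build a request that carries the given cookies (hypothetical helper)."""
    # Cookies travel in a single "Cookie" header as "name=value" pairs
    # separated by "; ".
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return Request(url, headers={"Cookie": cookie_header})


# Placeholder value: replace it with the one copied from the browser.
request = build_request(
    "https://www11.ceda.polimi.it/",
    {"SSL_JSESSIONID": "COOKIE_VALUE"},
)
print(request.get_header("Cookie"))
```

Each cookie belongs to its own domain, which is why the SSL_JSESSIONID and ticket values are copied from different sites.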

GUIDE 2: Download from a list of Webex urls or video ids

This mode parses a TXT file containing the urls or video ids of some recordings, each in one of these formats:

  • {VIDEO_ID}
  • https://politecnicomilano.webex.com/politecnicomilano/ldr.php?RCID={VIDEO_ID}
  • https://politecnicomilano.webex.com/recordingservice/sites/politecnicomilano/recording/playback/{VIDEO_ID}
  • https://politecnicomilano.webex.com/recordingservice/sites/politecnicomilano/recording/{VIDEO_ID}/playback
  • https://politecnicomilano.webex.com/webappng/sites/politecnicomilano/recording/{VIDEO_ID}/playback
  • https://politecnicomilano.webex.com/webappng/sites/politecnicomilano/recording/playback/{VIDEO_ID}
  • https://politecnicomilano.webex.com/webappng/sites/politecnicomilano/recording/{VIDEO_ID}
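
The formats above can all be reduced to a video id. The following is a rough sketch of one way to do that with the standard library. The function `extract_video_id` is hypothetical and does not necessarily match how the application parses these entries.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(entry: str) -> Optional[str]:
    """Return the video id from a bare id or one of the listed URL formats."""
    entry = entry.strip()
    if not entry:
        return None
    if "://" not in entry:
        # A line without a scheme is treated as a bare video id.
        return entry
    parsed = urlparse(entry)
    # Format: .../ldr.php?RCID={VIDEO_ID}
    rcid = parse_qs(parsed.query).get("RCID")
    if rcid:
        return rcid[0]
    # Formats: .../recording/{VIDEO_ID}, with an optional "playback" segment
    # either before or after the id.
    segments = [s for s in parsed.path.split("/") if s]
    if "recording" in segments:
        tail = segments[segments.index("recording") + 1:]
        candidates = [s for s in tail if s != "playback"]
        if candidates:
            return candidates[0]
    return None
```

For example, `extract_video_id("https://politecnicomilano.webex.com/politecnicomilano/ldr.php?RCID=abc123")` returns `"abc123"`, where `abc123` is a made-up id.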

This command supports downloading only one course at a time.

Some steps are required:

  1. With your browser open Webex and login. From the browser copy the ticket cookie value and set it using: python -m prd set-cookie ticket "{COOKIE_VALUE}".
  2. Run python -m prd txt --course="My beautiful course" --academic-year="2021-22" {TXT_FILE}.

GUIDE 3: Download from Webeep "Recordings" page

This mode parses a "Recordings" page where the professor links the recordings.

Some steps are required:

  1. With your browser open Webeep. From the browser copy the MoodleSession cookie value and set it using: python -m prd set-cookie MoodleSession "{COOKIE_VALUE}".
  2. With your browser open Webex and login. From the browser copy the ticket cookie value and set it using: python -m prd set-cookie ticket "{COOKIE_VALUE}".
  3. With your browser navigate to the Webeep recordings section and copy the url of the page.
  4. Run python -m prd webeep "{WEBEEP_URL}".

GUIDE 4: Download from webpage url

This mode parses a public (i.e. without authentication) HTML page, given its URL, where the professor links directly to the recordings.

Some steps are required:

  1. With your browser open Webex and login. From the browser copy the ticket cookie value and set it using: python -m prd set-cookie ticket "{COOKIE_VALUE}".
  2. With your browser navigate to the page where the direct links are placed.
  3. Copy the URL of the page.
  4. Run python -m prd webpage-url --course="{COURSE_NAME}" --academic-year="2021-22" "{URL}".

GUIDE 5: Download from webpage HTML

This mode parses an HTML file where the professor links directly to the recordings.

Some steps are required:

  1. With your browser open Webex and login. From the browser copy the ticket cookie value and set it using: python -m prd set-cookie ticket "{COOKIE_VALUE}".
  2. With your browser navigate to the page where the direct links are placed.
  3. Download the page HTML.
  4. Run python -m prd webpage-html --course="{COURSE_NAME}" --academic-year="2021-22" {FILE_PATH}.
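
To check that a saved page actually contains recording links before running the command, something like the sketch below can be used. It relies on the standard library `html.parser` rather than whatever parser the application uses. The names `WebexLinkCollector` and `collect_webex_links` are hypothetical, and matching on the `politecnicomilano.webex.com` host is an assumption based on the URL formats listed in GUIDE 2.

```python
from html.parser import HTMLParser
from typing import List


class WebexLinkCollector(HTMLParser):
    """Collect href values of <a> tags that point to the Polimi Webex site."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and "politecnicomilano.webex.com" in value:
                self.links.append(value)


def collect_webex_links(html_text: str) -> List[str]:
    """Return all Webex links found in the given HTML text."""
    collector = WebexLinkCollector()
    collector.feed(html_text)
    return collector.links
```

A saved page can be checked by reading the file as text and passing its content to `collect_webex_links`. An empty result suggests the page was not saved completely or the links are generated dynamically.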

Output

Inside the output folder there will be:

  • A dowaload_links.txt file which is the one fed to aria2. If the option --no-aria2c is used this file will contain a list of download links to be passed to another program (for example, Free Download Manager) to download the recordings.
  • One folder for each course parsed. Inside this folder there will be the recordings and an xlsx file with the recordings metadata (unless --no-create-xlsx is used).
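
If the links need to be handed to another program, they can be pulled out of dowaload_links.txt with a few lines of Python. The sketch below assumes the file follows the aria2 input-file format, in which URIs start at the beginning of a line (several URIs for the same file are separated by tabs), per-download options are on indented lines, and lines starting with # are comments. Whether the application writes option lines is an assumption, and `read_download_urls` is a hypothetical helper.

```python
from typing import List


def read_download_urls(text: str) -> List[str]:
    """Extract the URIs from the content of an aria2 input file."""
    urls: List[str] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        if line[0] in " \t":
            continue  # indented lines carry options for the previous URI
        # Several URIs for the same file may share one line, separated by tabs.
        urls.extend(part for part in line.split("\t") if part)
    return urls
```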

Tips

Retrying downloads without reparsing, directly from dowaload_links.txt

Use the command aria2c --input-file=output/dowaload_links.txt --auto-file-renaming=false --dir=output --max-concurrent-downloads=16 --max-connection-per-server=16.
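
The same retry can be scripted from Python. This is a minimal sketch that wraps the exact command above with `subprocess`. The helper names are hypothetical, and the `output` folder is assumed to be the default location shown in the command.

```python
import subprocess
from typing import List


def build_retry_command(output_dir: str = "output") -> List[str]:
    """Build the aria2c command line shown above for the given output folder."""
    return [
        "aria2c",
        f"--input-file={output_dir}/dowaload_links.txt",
        "--auto-file-renaming=false",
        f"--dir={output_dir}",
        "--max-concurrent-downloads=16",
        "--max-connection-per-server=16",
    ]


def retry_downloads(output_dir: str = "output") -> int:
    """Run aria2c and return its exit code (aria2c must be on PATH)."""
    return subprocess.run(build_retry_command(output_dir), check=False).returncode
```

Passing the arguments as a list avoids shell quoting problems when the output folder contains spaces.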

Contributors

ale-meacci, paolobasso99
