hugojf / twitch-clip-downloader

[DONE] NodeJS tool to download every clip (and its metadata) from a Twitch channel

Shell 0.32% TypeScript 99.68%
twitch clips twitch-api-v5 twitch-downloader nodejs youtube-dl

twitch-clip-downloader's Introduction

Twitch Clips (and VODs) Downloader

NodeJS tool to batch download clips and VODs (and their metadata) from a Twitch channel.

This tool can PROBABLY download ALL clips from a channel (not only the top 1000). So far, it has been tested on multiple big channels and seems to be able to get all clips (433k clips from hasanabi).

To maximize clip coverage, this tool will not accept a period for which the Twitch API reports more than 500 clips, because pagination beyond that point is unreliable (it caps at around 1k clips, but the exact number varies a lot). Instead, any period with more than 500 clips is split in two, and the process repeats until every period reports fewer than 500 clips.
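The splitting strategy described above can be sketched as a recursive halving of time periods. This is a minimal sketch, not the tool's actual code: `splitPeriod`, the `ClipCounter` callback (a stand-in for whatever request reports a period's clip count) and the one-minute floor are all hypothetical.

```typescript
// A time window [start, end) in epoch milliseconds.
interface Period {
  start: number;
  end: number;
}

// Hypothetical: asks the API how many clips it reports for one period.
type ClipCounter = (period: Period) => Promise<number>;

// Threshold from the README: periods reporting 500+ clips get split.
const MAX_CLIPS_PER_PERIOD = 500;

// Recursively halves a period until every piece reports fewer than
// MAX_CLIPS_PER_PERIOD clips. Pieces are returned in chronological order.
async function splitPeriod(
  period: Period,
  countClips: ClipCounter,
  minLengthMs = 60_000, // hypothetical floor so a very dense minute cannot recurse forever
): Promise<Period[]> {
  const count = await countClips(period);
  const length = period.end - period.start;
  if (count < MAX_CLIPS_PER_PERIOD || length <= minLengthMs) {
    return [period];
  }
  const middle = period.start + Math.floor(length / 2);
  const earlier = await splitPeriod({ start: period.start, end: middle }, countClips, minLengthMs);
  const later = await splitPeriod({ start: middle, end: period.end }, countClips, minLengthMs);
  return [...earlier, ...later];
}
```

With a steady 100 clips per hour, a 12-hour period reports 1200 clips, its 6-hour halves report 600 each, and the 3-hour quarters report 300, so the call ends with four 3-hour periods.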

State of the project

This project is not abandoned but at the same time not being actively developed because of my time constraints.

I realized the project grew beyond the scope of its name (a batch clip downloader) and that I needed to reorganize everything into more manageable pieces. I'm still figuring out the final plan of attack; for now, this is what I'm planning:

Export core functionalities into a separate package

This is mostly done by now. It was needed to separate user-facing code from developer-facing code, and it also lets me focus on keeping the core functionalities up to date and frequently tested, while sharing the most important code between all the tools.

Make this project more usable

Currently this tool will only download EVERYTHING from a channel, and this is not the most common use-case (even for me). I plan on adding things like: download single VOD/clip, download from list of URLs, filters, a better CLI, etc.

A GUI version

Since most users are scared of the CLI, I want to implement a GUI using Electron to make this project more accessible and user-friendly.

A VOD player

This tool is also capable of downloading the entire VOD chat from Twitch, which would allow a player to replay the chat alongside a downloaded VOD, just like Twitch does for VODs that are still available.

Proper documentation

The ultimate plan is to turn the core functionalities package into the Swiss Army knife of Twitch media backup tools, allowing any developer to easily write their own backup/download tool without having to worry about requests, multiple connections, API auth, fetching VOD .m3u8 playlists, etc.

Dependencies

  • NodeJS - used to run this tool;
  • Python - used to run youtube-dl;
  • ffmpeg - used to transcode VODs from .ts to .mp4;
  • NPM or Yarn - to install dependencies;
  • Twitch App Client-ID and Client Secret (explained below) - to access Twitch's API.

How to use

Create an app on Twitch Console

Register an application on the Twitch Console, click Manage, copy the Client ID and generate a Client Secret.

Install NodeJS dependencies

Run this command in your terminal:

npm install

Run via NPM

Run the script via NPM with the following command (this is needed to get dotenv loaded):

npm run start

Prompts

All required information will be prompted for at startup in the terminal.

Each time you run this script, it will ask you for a channel name, and then confirm whether you want to download everything.

Environment Variables

Here are the descriptions for each variable:

  • DEBUG: print a fuck-metric-ton of information, just keep it false for normal use;
  • CLIENT_ID: Twitch API Client ID;
  • CLIENT_SECRET: Twitch API Client Secret;
  • BASEPATH: where files (clips, VODs, fragments) should be stored;
  • YOUTUBE_DL_PATH: where youtube-dl executable is located;
  • VIDEOS_PARALLEL_DOWNLOADS: how many VOD fragments should be downloaded at the same time;
  • CLIPS_PARALLEL_DOWNLOADS: how many clips should be downloaded at the same time;
  • BIN_PATH: path where binaries will be stored;
  • DEFAULT_PERIOD_HOURS: default period size in hours (12 is a good number for big channels. Lower this to avoid period splitting; increase it to reduce API calls and speed up URL fetching).
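Because `npm run start` is what loads these variables through dotenv, the script ultimately reads them from `process.env`. A minimal sketch of validating them into a typed config, so a missing or malformed value fails early with a clear message; `loadConfig`, the field names and every default value are hypothetical, not the project's own code.

```typescript
// Parsed form of the environment variables listed above.
interface DownloaderConfig {
  debug: boolean;
  clientId: string;
  clientSecret: string;
  basePath: string;
  youtubeDlPath: string;
  videosParallelDownloads: number;
  clipsParallelDownloads: number;
  binPath: string;
  defaultPeriodHours: number;
}

type Env = Record<string, string | undefined>;

// Throws if a mandatory variable is missing or blank.
function requireVar(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) throw new Error(`Missing required environment variable ${name}`);
  return value;
}

// Reads a positive integer, falling back to a default when unset.
function positiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name]?.trim();
  if (!raw) return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

// Defaults below are illustrative assumptions, not the project's own values.
function loadConfig(env: Env): DownloaderConfig {
  return {
    debug: env["DEBUG"]?.trim().toLowerCase() === "true",
    clientId: requireVar(env, "CLIENT_ID"),
    clientSecret: requireVar(env, "CLIENT_SECRET"),
    basePath: env["BASEPATH"]?.trim() || "./storage",
    youtubeDlPath: env["YOUTUBE_DL_PATH"]?.trim() || "youtube-dl",
    videosParallelDownloads: positiveInt(env, "VIDEOS_PARALLEL_DOWNLOADS", 10),
    clipsParallelDownloads: positiveInt(env, "CLIPS_PARALLEL_DOWNLOADS", 4),
    binPath: env["BIN_PATH"]?.trim() || "./bin",
    defaultPeriodHours: positiveInt(env, "DEFAULT_PERIOD_HOURS", 12),
  };
}
```

Calling `loadConfig(process.env)` once at startup, after dotenv has run, keeps the rest of the code free of string parsing.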

twitch-clip-downloader's People

Contributors

hugojf, masterl

twitch-clip-downloader's Issues

Cache requests from older periods

After the update I released to work around Twitch's odd API pagination, which led to lots of clips missing from bigger channels, fetching every single clip from a big channel started taking a long time.

The idea is to locally cache requests for clips from periods older than the current day, that way the tool can be restarted without having to paginate the entire API again.
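A minimal sketch of that idea, assuming a period is only safe to cache once it has ended before the current UTC day and that results can be stored as one JSON file per channel and period. `fetchWithCache`, `PeriodFetcher` and the file naming scheme are hypothetical.

```typescript
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

interface Period {
  start: number; // epoch ms, inclusive
  end: number; // epoch ms, exclusive
}

// Hypothetical fetcher that paginates the API for one channel and period.
type PeriodFetcher<T> = (channel: string, period: Period) => Promise<T>;

// Midnight UTC of the given day, in epoch ms.
function startOfUtcDay(now: Date): number {
  return Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
}

// Only periods that ended before today are treated as stable enough to cache.
function isCacheable(period: Period, now: Date): boolean {
  return period.end <= startOfUtcDay(now);
}

// Serves old periods from disk; fetches everything else (caching it if old).
async function fetchWithCache<T>(
  cacheDir: string,
  channel: string,
  period: Period,
  fetcher: PeriodFetcher<T>,
  now: Date = new Date(),
): Promise<T> {
  const file = join(cacheDir, `${channel}-${period.start}-${period.end}.json`);
  const cacheable = isCacheable(period, now);
  if (cacheable && existsSync(file)) {
    return JSON.parse(readFileSync(file, "utf8")) as T;
  }
  const result = await fetcher(channel, period);
  if (cacheable) {
    mkdirSync(cacheDir, { recursive: true });
    writeFileSync(file, JSON.stringify(result));
  }
  return result;
}
```

Because the cache lives on disk, a restarted run only re-paginates today's periods; everything older is read back from the JSON files.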

Control Twitch API requests with rate-limit headers

There are a few channels (mostly popular streamers) for which the Twitch API responds very quickly, so the tool exhausts the 800-requests-per-minute limit and throws a bunch of errors.

This tool should be updated to either handle API rate-limiting errors and retry after the reset timer, or start throttling requests as it approaches 0 remaining requests.
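Twitch's Helix API reports its bucket in the `Ratelimit-Limit`, `Ratelimit-Remaining` and `Ratelimit-Reset` (Unix epoch seconds) response headers. A minimal sketch combining both options from this issue: wait for the reset when no points are left, and spread the last requests evenly once a reserve is reached. `readRateLimit`, `delayMs` and the 10% reserve are hypothetical choices.

```typescript
// Values from Twitch Helix's Ratelimit-* response headers.
interface RateLimit {
  limit: number; // Ratelimit-Limit: bucket size
  remaining: number; // Ratelimit-Remaining: points left
  resetAt: number; // Ratelimit-Reset: Unix epoch seconds when the bucket refills
}

// Header getter, e.g. `(name) => response.headers.get(name)` with fetch.
type HeaderGetter = (name: string) => string | null | undefined;

// Returns null when any of the three headers is missing or not numeric.
function readRateLimit(get: HeaderGetter): RateLimit | null {
  const num = (name: string): number => {
    const raw = get(name);
    return raw == null || raw === "" ? NaN : Number(raw);
  };
  const limit = num("ratelimit-limit");
  const remaining = num("ratelimit-remaining");
  const resetAt = num("ratelimit-reset");
  if (![limit, remaining, resetAt].every(Number.isFinite)) return null;
  return { limit, remaining, resetAt };
}

// Milliseconds to wait before the next request: none while plenty of points
// are left, an even spread inside the reserve, the full reset wait at zero.
function delayMs(rate: RateLimit, nowMs: number, reserveFraction = 0.1): number {
  const untilReset = Math.max(0, rate.resetAt * 1000 - nowMs);
  if (rate.remaining <= 0) return untilReset;
  if (rate.remaining > Math.ceil(rate.limit * reserveFraction)) return 0;
  return Math.ceil(untilReset / rate.remaining);
}
```

For example, with an 800-point bucket, 40 points left and 60 seconds until the reset, `delayMs` returns 1500 ms, so the remaining points last until the bucket refills.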

Download VODs, highlights and uploads

Since lots of streamers are trying to download VODs and stuff before deleting them to avoid DMCA, I've realized that this tool could be easily updated to also support VOD, highlights and upload downloads.

This would take a bit of work but should be pretty straightforward.

If you think VOD downloading could be useful, please +1 this issue as feedback.

Error downloading

I'm not sure what went wrong here. I was able to scrape the clips just fine in the first half but downloading part gave an error.

✔ Found 56940 clips to download, download now? … yes
 ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 0% | ETA: 0s | 0/56940(node:41941) UnhandledPromiseRejectionWa'
    at Object.openSync (fs.js:461:3)
    at Object.writeFileSync (fs.js:1387:35)
    at /home/user/clips/twitch-clip-downloader/clip-downloader.js:30:12
    at new Promise (<anonymous>)
    at downloadClip (/home/user/clips/twitch-clip-downloader/clip-downloader.js:10:12)
    at process (/home/user/clips/twitch-clip-downloader/clip-downloader.js:38:15)
    at /home/user/clips/twitch-clip-downloader/node_modules/tiny-async-pool/lib/es7.js:18:44
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
    at async asyncPool (/home/user/clips/twitch-clip-downloader/node_modules/tiny-async-pool/)
    at async startDownload (/home/user/clips/twitch-clip-downloader/clip-downloader.js:46:5)
(Use `node --trace-warnings ...` to show where the warning was created)
(node:41941) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either)
(node:41941) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, p.




Error: ESOCKETTIMEDOUT
    at ClientRequest.<anonymous> (/home/user/clips/twitch-clip-downloader/node_modules/reques)
    at Object.onceWrapper (events.js:421:28)
    at ClientRequest.emit (events.js:315:20)
    at TLSSocket.emitRequestTimeout (_http_client.js:768:9)
    at Object.onceWrapper (events.js:421:28)
    at TLSSocket.emit (events.js:327:22)
    at TLSSocket.Socket._onTimeout (net.js:480:8)
    at listOnTimeout (internal/timers.js:549:17)
    at processTimers (internal/timers.js:492:7) {
  code: 'ESOCKETTIMEDOUT',
  connect: false
}
Error: ESOCKETTIMEDOUT
    at ClientRequest.<anonymous> (/home/user/clips/twitch-clip-downloader/node_modules/reques)
    at Object.onceWrapper (events.js:421:28)
    at ClientRequest.emit (events.js:315:20)
    at TLSSocket.emitRequestTimeout (_http_client.js:768:9)
    at Object.onceWrapper (events.js:421:28)
    at TLSSocket.emit (events.js:327:22)
    at TLSSocket.Socket._onTimeout (net.js:480:8)
    at listOnTimeout (internal/timers.js:549:17)
    at processTimers (internal/timers.js:492:7) {
  code: 'ESOCKETTIMEDOUT',
  connect: false
}
Error: ESOCKETTIMEDOUT
    at ClientRequest.<anonymous> (/home/user/clips/twitch-clip-downloader/node_modules/reques)
    at Object.onceWrapper (events.js:421:28)
    at ClientRequest.emit (events.js:315:20)
    at TLSSocket.emitRequestTimeout (_http_client.js:768:9)
    at Object.onceWrapper (events.js:421:28)
    at TLSSocket.emit (events.js:327:22)
    at TLSSocket.Socket._onTimeout (net.js:480:8)
    at listOnTimeout (internal/timers.js:549:17)
    at processTimers (internal/timers.js:492:7) {
  code: 'ESOCKETTIMEDOUT',
  connect: false
}
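Both traces above end up as unhandled promise rejections inside the download pool, so a single failed clip (a file write error or an `ESOCKETTIMEDOUT`) takes down the whole batch. A minimal sketch of a bounded-concurrency pool that records failures instead of rejecting; `runPool` is a hypothetical replacement for the pool usage, not `tiny-async-pool`'s API.

```typescript
type Task<T> = () => Promise<T>;

interface PoolResult<T> {
  fulfilled: T[]; // in completion order, not input order
  failed: { index: number; error: unknown }[];
}

// Runs at most `concurrency` tasks at once. A rejection is recorded in
// `failed` instead of escaping, so one timed-out clip cannot crash the run.
async function runPool<T>(tasks: Task<T>[], concurrency: number): Promise<PoolResult<T>> {
  const result: PoolResult<T> = { fulfilled: [], failed: [] };
  let next = 0; // shared cursor; safe because all workers run on one thread

  const worker = async (): Promise<void> => {
    while (next < tasks.length) {
      const index = next++;
      try {
        result.fulfilled.push(await tasks[index]());
      } catch (error) {
        result.failed.push({ index, error });
      }
    }
  };

  const workerCount = Math.max(1, Math.min(concurrency, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, worker));
  return result;
}
```

The `failed` list keeps the index of each task, so failed clips can be logged or queued for a retry pass once the rest of the batch has finished.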

Unhandled YoutubeDL error forcing application to break

I've been playing around with the tool and have run into this problem multiple times. When downloading clips, I run into an unhandled error from YoutubeDL (see the attached image for the exact error). Whenever this happens, it forces the application to exit. After digging through the code, it seems that nothing is set up to catch the error thrown by YoutubeDL. Perhaps this is in the works?

I've checked the URL the error refers to, and I can watch the clip without issues.
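One way to keep a youtube-dl failure from exiting the application is to run the binary through a wrapper that always resolves with an outcome instead of throwing. A minimal sketch using Node's `child_process.spawn`; `runDownloader` and `DownloadOutcome` are hypothetical names, and how the project actually invokes youtube-dl is not shown here.

```typescript
import { spawn } from "node:child_process";

interface DownloadOutcome {
  ok: boolean;
  exitCode: number | null; // null when the process could not be started
  stderr: string;
}

// Runs an external downloader and always resolves: a non-zero exit code or a
// spawn failure (e.g. a wrong YOUTUBE_DL_PATH) is reported, never thrown.
function runDownloader(bin: string, args: string[]): Promise<DownloadOutcome> {
  return new Promise((resolve) => {
    let stderr = "";
    const child = spawn(bin, args, { stdio: ["ignore", "ignore", "pipe"] });
    child.stderr?.on("data", (chunk: Buffer) => {
      stderr += chunk.toString("utf8");
    });
    // "error" fires when the binary cannot be spawned; any later "close" is
    // harmless because a promise only settles once.
    child.on("error", (error) => resolve({ ok: false, exitCode: null, stderr: stderr + error.message }));
    child.on("close", (code) => resolve({ ok: code === 0, exitCode: code, stderr }));
  });
}
```

A call such as `runDownloader("youtube-dl", ["-o", "clip.mp4", clipUrl])` then yields `{ ok: false, ... }` for a clip youtube-dl cannot handle, and the captured `stderr` can be logged next to the clip URL.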

image

Downloading personal clips

Hello. I am very poor at programming, so do not scold me!

Is it technically possible to implement a similar script, but for clips made by a specific user (for example, by me)?
The thing is, I want to download my clips, but the 1000-clip limit prevents me from doing this.

don't work?

Hello, I have a problem.

I created a Twitch app, added a client ID like 6x13o4tm3h7vypndoxh******** and when I run the script:

✔ What channel do you want to download clips from? … nameofchannel
⠋ Paginating API, please wait...No client id specified

I ran it with DEBUG=TRUE and created more Twitch apps, but the problem persists.
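"No client id specified" suggests the request went out without a Client ID. One plausible cause, not confirmed in this issue, is that the value never reached the process, for example because the script was not started through `npm run start`, which is what loads dotenv. Twitch's Helix API expects a `Client-Id` header together with an `Authorization: Bearer` token on every request. A minimal sketch of building those headers defensively; `helixHeaders` is hypothetical.

```typescript
// Builds the headers Twitch's Helix API expects on every request:
// the app's Client-Id plus an OAuth bearer token.
function helixHeaders(clientId: string | undefined, accessToken: string | undefined): Record<string, string> {
  // Defensively trim whitespace and surrounding quotes from the configured value.
  const id = clientId?.trim().replace(/^["']|["']$/g, "");
  if (!id) {
    throw new Error("CLIENT_ID is empty: is the .env file being loaded (npm run start)?");
  }
  if (!accessToken) {
    throw new Error("No access token: request an app access token with CLIENT_ID and CLIENT_SECRET first");
  }
  return {
    "Client-Id": id,
    Authorization: `Bearer ${accessToken}`,
  };
}
```

Failing with an explicit message before any request is sent makes an empty CLIENT_ID visible immediately, instead of surfacing later as an API error.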

Only getting ~1k clips

I see your warning regarding only getting ~1,000 clips and this is the behavior of simply asking for the clips of a user. To mitigate this, what you can do is request between time frames using started_at and ended_at.

Seconds in a timestamp are ignored, and Twitch also rounds to the nearest 10 minutes when querying the range.

The default ended_at is 7 days after started_at; however, I found that the only way to avoid missing clips is to query 1-3 day ranges.

I've tested this technique with summit1g: the default 7-day range returned 4,116 clips, while a 3-day range returned 13,612 clips (a 1-day range also returned 13,612).
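The technique suggested here, querying fixed short ranges with `started_at` and `ended_at`, can be sketched as splitting the channel's history into windows and building one Helix Get Clips URL per window. `clipWindows` and `clipsUrl` are hypothetical helpers; `broadcaster_id`, `started_at`, `ended_at` and `first` are Get Clips query parameters, and pagination within each window is left out.

```typescript
// One query range, as RFC 3339 timestamps.
interface ClipWindow {
  startedAt: string;
  endedAt: string;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Splits [from, to) into consecutive windows of `windowDays` days
// (the last one may be shorter).
function clipWindows(from: Date, to: Date, windowDays = 1): ClipWindow[] {
  if (windowDays <= 0) throw new Error("windowDays must be positive");
  const step = windowDays * DAY_MS;
  const windows: ClipWindow[] = [];
  for (let start = from.getTime(); start < to.getTime(); start += step) {
    const end = Math.min(start + step, to.getTime());
    windows.push({
      startedAt: new Date(start).toISOString(),
      endedAt: new Date(end).toISOString(),
    });
  }
  return windows;
}

// First-page Get Clips URL for one window (100 is the maximum page size).
function clipsUrl(broadcasterId: string, w: ClipWindow): string {
  const params = new URLSearchParams({
    broadcaster_id: broadcasterId,
    started_at: w.startedAt,
    ended_at: w.endedAt,
    first: "100",
  });
  return `https://api.twitch.tv/helix/clips?${params.toString()}`;
}
```

Since Twitch rounds ranges to 10 minutes, windows aligned to whole days avoid gaps or overlaps caused by that rounding.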

I hope this information helps you in getting your tool capable of getting all clips!

Update README

After the conversion to TypeScript and the inclusion of Laravel Mix, the current documentation is very outdated and will not work.

Compress cache

If the default period length is too small, there's a chance that cache and log writes could slow down the entire process.
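A minimal sketch of compressing cache entries with Node's built-in `zlib`, assuming cache entries are JSON files. `writeCompressedJson` and `readCompressedJson` are hypothetical helpers. Whether this speeds things up depends on whether disk I/O or CPU is the bottleneck, since gzip trades smaller writes for extra CPU work.

```typescript
import { readFileSync, writeFileSync } from "node:fs";
import { gunzipSync, gzipSync } from "node:zlib";

// Serializes a value as gzip-compressed JSON.
function writeCompressedJson(file: string, value: unknown): void {
  writeFileSync(file, gzipSync(JSON.stringify(value)));
}

// Reads back a file written by writeCompressedJson.
function readCompressedJson<T>(file: string): T {
  return JSON.parse(gunzipSync(readFileSync(file)).toString("utf8")) as T;
}
```

Clip metadata is highly repetitive (same keys, similar URLs), so it usually compresses well; the synchronous calls keep the sketch short, and `zlib.gzip`/`zlib.gunzip` are the non-blocking equivalents.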

Output to file list

I want to archive a channel, but it shows 56,000 available clips and I'm worried they might take up more space than I physically have available. Would it be possible to export a file list instead of downloading directly through youtube-dl in the app? I was hoping to run youtube-dl manually so it can skip clips that already exist, but that would require crawling with twitch-clip-downloader every time.
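A minimal sketch of such an export, assuming each fetched clip carries its `url` (as Helix clip objects do): write one URL per line so the list can be fed to youtube-dl manually. `writeBatchFile` is hypothetical.

```typescript
import { writeFileSync } from "node:fs";

// Minimal clip shape; only the clip URL is needed for a batch file.
interface ClipSummary {
  url: string;
}

// Writes one clip URL per line, de-duplicated, and returns how many were written.
function writeBatchFile(file: string, clips: ClipSummary[]): number {
  const urls = Array.from(new Set(clips.map((clip) => clip.url)));
  writeFileSync(file, urls.length > 0 ? urls.join("\n") + "\n" : "", "utf8");
  return urls.length;
}
```

The file can then be passed to `youtube-dl --batch-file clips.txt --download-archive archive.txt`; the archive file records downloaded IDs, so later runs skip clips that were already fetched without re-crawling the channel.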

Add GitHub Actions tests

I'm not sure if there's a good way to simulate user interaction, but it shouldn't be very hard to test this tool.

Store VOD chat

Once VOD downloading is added, also support downloading the chat information.

This Gist should point in the right direction.

Default Export not found in Clip-Url-Fetcher.ts

After running npm install and npm run start, I get the following error:

image

Any suggestions on how to fix would be appreciated. It seems that it is not properly finding the default import for youtubedl.ts in clip-url-fetcher.ts

TypeScript update completely broke it

With your new commits into master, the repository is broken and it won't run. Looks like you just started looking into how to convert things from JavaScript to TypeScript, which is good, but it's definitely not tested as far as I can see.

In your package.json file, npm run start will run node build/index.js. However, that file does not exist, and there currently are no build steps (like npm run build).

Even if we add "outDir": "build" to tsconfig.json, it does compile to the build directory, but the entry point then ends up at build/src/index.js rather than build/index.js.

I've fixed the issues and created a pull request. Closing this issue.

API change broke app

I tried to use this software again today after having used it back in June/July but I think a twitch API change has broken the app from working.

I get this error when I try to export from a channel

⠹ Paginating API, please wait...(node:1355) UnhandledPromiseRejectionWarning: Error: Request failed with status code 401
    at createError (/path/twitch-clip-downloader/node_modules/axios/lib/core/createError.js:16:15)
    at settle (/path/twitch-clip-downloader/node_modules/axios/lib/core/settle.js:17:12)
    at IncomingMessage.handleStreamEnd (/path/twitch-clip-downloader/node_modules/axios/lib/adapters/http.js:236:11)
    at IncomingMessage.emit (events.js:326:22)
    at endReadableNT (_stream_readable.js:1252:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
(Use `node --trace-warnings ...` to show where the warning was created)
(node:1355) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 2)
(node:1355) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
⠏ Paginating API, please wait...
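A 401 means Twitch rejected the request as unauthorized. One possible cause, not confirmed in this issue, is a missing or expired OAuth token: Helix endpoints reject requests without a valid bearer token, and app access tokens expire. A minimal sketch of requesting an app access token with the OAuth client-credentials grant; `getAppAccessToken` and `tokenRequestBody` are hypothetical, and the global `fetch` assumes Node 18 or newer.

```typescript
// Subset of the fields returned by Twitch's client-credentials token endpoint.
interface AppAccessToken {
  access_token: string;
  expires_in: number; // seconds until the token expires
  token_type: string;
}

const TOKEN_URL = "https://id.twitch.tv/oauth2/token";

// Form body for the OAuth client-credentials grant.
function tokenRequestBody(clientId: string, clientSecret: string): URLSearchParams {
  return new URLSearchParams({
    client_id: clientId,
    client_secret: clientSecret,
    grant_type: "client_credentials",
  });
}

// Requests an app access token (global fetch requires Node 18+).
async function getAppAccessToken(clientId: string, clientSecret: string): Promise<AppAccessToken> {
  const response = await fetch(TOKEN_URL, { method: "POST", body: tokenRequestBody(clientId, clientSecret) });
  if (!response.ok) {
    throw new Error(`Token request failed with HTTP ${response.status}`);
  }
  return (await response.json()) as AppAccessToken;
}
```

Storing `expires_in` alongside the token makes it possible to request a new one before it expires, rather than discovering the expiry through a 401 halfway through pagination.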

JavaScript heap out of memory

I am having difficulty running a full channel archive of clips. I tried running it on two different instances (cloud VPSes with 4GB of RAM each), but it failed both times at around the same page number. I am trying the username hasanabi, and it doesn't seem to get past page 38. I tried Node 10, 12 and (I believe) 14, and all of them had similar issues.

Attached is (hopefully) the stack trace:

<--- Last few GCs --->

[48381:0x9819b0]  3065294 ms: Mark-sweep 1221.7 (1284.7) -> 1221.7 (1253.7) MB, 432.2 / 0.0 ms  (average mu = 0.400, current mu = 0.000) last resort GC in old space requested
[48381:0x9819b0]  3065727 ms: Mark-sweep 1221.7 (1253.7) -> 1221.7 (1253.7) MB, 432.7 / 0.0 ms  (average mu = 0.246, current mu = 0.000) last resort GC in old space requested
<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x3e3a60c53601]
Security context: 0x1408f4c2ee11 <JSObject>
    1: byteLength(aka byteLength) [0x16c33c197e89] [buffer.js:~508] [pc=0x3e3a612623f7](this=0x20a881e825d9 <undefined>,string=0x07cc5197a6a1 <Very long string[114600168]>,encoding=0x1408f4c72a51 <String[4]: utf8>)
    2: arguments adaptor frame: 3->2
    3: fromString(aka fromString) [0x16c33c197d49] [buffer.js:~333] [pc=0x3e3a61264890](this=0x20a881e825d9 ...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: 0x7f05f251746c node::Abort() [/lib/x86_64-linux-gnu/libnode.so.64]
 2: 0x7f05f25174b5  [/lib/x86_64-linux-gnu/libnode.so.64]
 3: 0x7f05f2743e6a v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/lib/x86_64-linux-gnu/libnode.so.64]
 4: 0x7f05f27440e1 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/lib/x86_64-linux-gnu/libnode.so.64]
 5: 0x7f05f2adec66  [/lib/x86_64-linux-gnu/libnode.so.64]
 6: 0x7f05f2af2a37 v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/lib/x86_64-linux-gnu/libnode.so.64]
 7: 0x7f05f2abea8d v8::internal::Factory::AllocateRawWithImmortalMap(int, v8::internal::PretenureFlag, v8::internal::Map*, v8::internal::AllocationAlignment) [/lib/x86_64-linux-gnu/libnode.so.64]
 8: 0x7f05f2ac7178 v8::internal::Factory::NewRawTwoByteString(int, v8::internal::PretenureFlag) [/lib/x86_64-linux-gnu/libnode.so.64]
 9: 0x7f05f2c072ad v8::internal::String::SlowFlatten(v8::internal::Handle<v8::internal::ConsString>, v8::internal::PretenureFlag) [/lib/x86_64-linux-gnu/libnode.so.64]
10: 0x7f05f274e2d8 v8::String::Utf8Length() const [/lib/x86_64-linux-gnu/libnode.so.64]
11: 0x7f05f2532115  [/lib/x86_64-linux-gnu/libnode.so.64]
12: 0x3e3a60c53601 
Aborted (core dumped)

full-log-twitch.txt
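The trace shows V8 failing while measuring a single string of over 114 million characters (`Very long string[114600168]` in `buffer.js` `byteLength`), which suggests the whole clip list is being serialized into one string before being written. Assuming that is the cause, one alternative is to append each fetched page as newline-delimited JSON through a write stream, so memory use stays proportional to one page. `appendNdjson` and `openNdjson` are hypothetical helpers. As a stopgap, Node's `--max-old-space-size` flag raises the heap limit, but for very large channels it only postpones the problem.

```typescript
import { createWriteStream } from "node:fs";
import type { WriteStream } from "node:fs";
import { once } from "node:events";

// Writes one JSON document per line, so only one record is stringified at a
// time instead of the whole clip list. Waits for "drain" when the buffer is full.
async function appendNdjson(stream: WriteStream, records: unknown[]): Promise<void> {
  for (const record of records) {
    if (!stream.write(JSON.stringify(record) + "\n")) {
      await once(stream, "drain");
    }
  }
}

// Opens (or creates) an NDJSON file in append mode, so pages written in
// separate calls accumulate in the same file.
function openNdjson(file: string): WriteStream {
  return createWriteStream(file, { flags: "a" });
}
```

Reading the file back line by line (for example with `readline`) keeps the download phase from ever materializing the full list either.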
