Comments (14)

jonathanp0 commented on July 20, 2024 2

Since some YouTube channels simply consist of people talking for long periods of time in front of a camera, it's useful to be able to convert the video to an audio file and listen to it using podcast software. Podsync already has this feature.

from podsync.

amcgregor commented on July 20, 2024 2

I found the internal architecture of podsync to be… rather excessively complicated. (A testament to the power of ingenuity, but too complicated for me to be comfortable setting up locally.) DynamoDB? Lambda? Golang! Docker. But also Node… (How many programming languages does one need at a single time? Node does not touch my machines.) I looked at all that, then just dove into the code looking for the ultimate youtube-dl invocation, extracted and isolated it, and automated it using GNU parallel. That Gist also describes the patches made to youtube-dl to avoid excessive numbers of HTTP requests. (E.g. actually calling sys.exit() when a video is rejected for being too old, since all subsequent videos will be older.)
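
To illustrate the early-exit idea behind that patch, here is a minimal Python sketch. It is not the actual youtube-dl patch: the function name, the (video id, upload date) tuple shape, and the assumption that the listing arrives newest first are all made up for illustration.

```python
from datetime import date
from typing import Iterable, Iterator, Tuple

# Hypothetical shape for one listing entry: (video id, upload date).
Video = Tuple[str, date]


def videos_newer_than(videos: Iterable[Video], cutoff: date) -> Iterator[Video]:
    """Yield videos until the first one older than cutoff, then stop.

    Assumes the listing is ordered newest first, so everything after the
    first too-old video is older still and never needs to be requested.
    """
    for video_id, uploaded in videos:
        if uploaded < cutoff:
            break  # stop outright instead of skipping and carrying on
        yield video_id, uploaded


if __name__ == "__main__":
    listing = [
        ("a", date(2020, 3, 1)),
        ("b", date(2020, 2, 1)),
        ("c", date(2019, 1, 1)),
        ("d", date(2018, 1, 1)),
    ]
    for video_id, uploaded in videos_newer_than(listing, date(2020, 1, 1)):
        print(video_id, uploaded.isoformat())
```

Because the loop breaks rather than continues, a lazily fetched listing is consumed only up to the first rejected entry, which is where the saving in HTTP requests would come from.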

I've resolved the issue with playlists being named after their origin channel instead of the actual name of the playlist and will continue to keep this tiny little shell script updated. Also added optional lines for rate limiting, randomized sleep periods, and SOCKS5 proxy configuration; that is, run ssh -D 8088 example.com and the proxy option would be --proxy "socks5://localhost:8088/". The only real remaining issue: feed thumbnails. With this setup, it took YouTube two weeks to begin throwing up Captcha challenges, after ingesting 5,895 episodes totalling 1.6 TiB across 141 channels / playlists. (Each run taking, on average, around 10 minutes, run via cron every few hours.)
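
For readers who would rather assemble those optional flags programmatically, the following is a rough Python sketch that builds a youtube-dl argument list. The flags themselves (--proxy, --limit-rate, --sleep-interval, --max-sleep-interval) are standard youtube-dl options; the function name, the playlist URL, and the example values (2M, 5 to 30 seconds) are assumptions and are not taken from the script discussed here.

```python
import shlex
from typing import List, Optional, Tuple


def polite_args(
    url: str,
    proxy: Optional[str] = None,
    rate_limit: Optional[str] = None,
    sleep_range: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Build a youtube-dl command line with optional throttling flags."""
    args = ["youtube-dl"]
    if proxy:
        args += ["--proxy", proxy]  # e.g. a local SOCKS5 tunnel from ssh -D
    if rate_limit:
        args += ["--limit-rate", rate_limit]  # maximum download rate, e.g. "2M"
    if sleep_range:
        low, high = sleep_range
        # youtube-dl sleeps a random number of seconds between low and high.
        args += ["--sleep-interval", str(low), "--max-sleep-interval", str(high)]
    args.append(url)
    return args


if __name__ == "__main__":
    command = polite_args(
        "https://example.invalid/playlist",  # placeholder URL
        proxy="socks5://localhost:8088/",
        rate_limit="2M",
        sleep_range=(5, 30),
    )
    # Print a shell-safe rendering; pass `command` to subprocess.run to execute.
    print(" ".join(shlex.quote(part) for part in command))
```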

amcgregor commented on July 20, 2024 1

Direct use of youtube-dl (the command-line program powering all of this media ingest) permits retrieval of just the audio. My little automation script wins again: it can already do this! ;P
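
As a hedged illustration of audio-only retrieval, here is a small Python sketch using the youtube_dl module's embedding interface rather than the command line. The option keys (format, postprocessors, FFmpegExtractAudio, preferredcodec, preferredquality) follow the embedding example in the youtube-dl README; the helper names and the default of 192 kbit/s MP3 are assumptions, and neither Podsync nor the shell script above is claimed to be configured this way.

```python
from typing import Any, Dict, List


def audio_only_options(codec: str = "mp3", quality: str = "192") -> Dict[str, Any]:
    """Return YoutubeDL options that fetch audio and convert it to `codec`."""
    return {
        # Prefer an audio-only stream; fall back to the best combined one.
        "format": "bestaudio/best",
        "postprocessors": [
            {
                "key": "FFmpegExtractAudio",  # requires ffmpeg to be installed
                "preferredcodec": codec,
                "preferredquality": quality,
            }
        ],
    }


def download_audio(urls: List[str], codec: str = "mp3") -> None:
    """Download each URL as audio only (needs youtube-dl and ffmpeg)."""
    import youtube_dl  # imported lazily so the helper above works without it

    with youtube_dl.YoutubeDL(audio_only_options(codec)) as ydl:
        ydl.download(urls)


if __name__ == "__main__":
    print(audio_only_options())
```

On the command line, the roughly equivalent invocation is youtube-dl --extract-audio --audio-format mp3 followed by the URL.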

Some of us want audio-only…

It really is a bit flabbergasting to be repeatedly asked for something the user already has the ability to do… and search for.

grafmik commented on July 20, 2024

Hello Max, and thanks for your work so far and for your effort to make a self-hosted version!

Just wondering, why the mp3 format? Isn't that for audio only? Do you mean mp4?

Keep it up!

amcgregor commented on July 20, 2024

Ah, adding a second comment as it's an important note: my shell script there (basically a text file containing a channel or playlist URL per line…) explicitly gives you control over per-channel quality settings (see line 41; split that up with multiple formats if needed, I do) as well as extended video selection criteria, such as title exclusions (see line 74). (Run youtube-dl -h to see the many, many options available.)

grafmik commented on July 20, 2024

Hello Alice,

Thanks for this work! I had started to dive into Max's code, starting with the early commits. Did you know podsync started as a .NET project? :)

I can understand why Max used a database, because he had to store every user's playlists. For a self-hosted, single-user version, generating and serving just one file may indeed be a simpler and better solution.
As for Node, I feel you.
As for Docker, it could be a nice feature to add to your code, as it could pin down the right environment, especially the versions of parallel and python.

Anyway, I'm grateful because you saved me some time and effort.

amcgregor commented on July 20, 2024

especially for versions of parallel and python.

Any version will do:

brew install parallel

Python 3 is already a pretty universal standard; the given code will work with any Python 3.3 or newer, that is, virtually any Python released in the last 10 years in that series. Including the version that comes pre-installed on macOS.

Edited to add: thus, in this particular case, Docker would simplify nothing, and complicate everything. Like a Spartan soldier taking everything and giving nothing.

store every user playlist

On-disk directories are the database, in my case. My Python script and template will transform any directory containing youtube-dl .info.json files into a podcast. (Future improvement: only regenerate the index.xml if there are actually new/updated episodes, but feed generation is so minor compared to content collection, that's a low priority.)
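
A minimal sketch of that directory-as-database idea follows. It is not the author's script or template: the function name, the base URL, and the assumption that each media file sits next to its metadata as <stem>.<ext> are illustrative. The fields read from each .info.json (id, title, description, upload_date, ext) are ones youtube-dl normally writes.

```python
import json
import mimetypes
from datetime import datetime, timezone
from email.utils import format_datetime
from pathlib import Path
from urllib.parse import quote
from xml.etree import ElementTree as ET

SUFFIX = ".info.json"


def build_feed(directory: Path, title: str, base_url: str) -> str:
    """Return RSS 2.0 XML describing every *.info.json found in directory."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = base_url
    ET.SubElement(channel, "description").text = title

    episodes = []
    for path in directory.glob("*" + SUFFIX):
        info = json.loads(path.read_text(encoding="utf-8"))
        stem = path.name[: -len(SUFFIX)]
        # Assumption: the media file is stored beside its metadata.
        media = directory / (stem + "." + info.get("ext", "mp4"))
        episodes.append((info.get("upload_date", ""), info, media))

    # upload_date is a YYYYMMDD string, so sorting the strings sorts by date.
    for upload_date, info, media in sorted(episodes, key=lambda e: e[0], reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = info.get("title", media.stem)
        ET.SubElement(item, "guid", isPermaLink="false").text = info.get("id", media.name)
        ET.SubElement(item, "description").text = info.get("description", "")
        if upload_date:
            when = datetime.strptime(upload_date, "%Y%m%d").replace(tzinfo=timezone.utc)
            ET.SubElement(item, "pubDate").text = format_datetime(when)
        ET.SubElement(
            item,
            "enclosure",
            url=base_url.rstrip("/") + "/" + quote(media.name),
            length=str(media.stat().st_size if media.exists() else 0),
            type=mimetypes.guess_type(media.name)[0] or "application/octet-stream",
        )
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Placeholder title and URL; point the first argument at a download directory.
    print(build_feed(Path("."), "Example feed", "https://example.invalid/feed/"))
```

Writing the returned string to index.xml in the same directory and serving that directory over HTTP would be one way to expose the feed; this sketch regenerates everything on each call.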

Edited to add: ingest (pull.sh invocations of youtube-dl) is one half of the problem: actually getting the content. The other half is tackled entirely separately: turning those collected media files into podcast feeds.

grafmik commented on July 20, 2024

I understand you want to keep things simple. Loved the 300 reference and can't help imagining Docker as a bare-torso warrior now.

As I already said, I haven't read the podsync code, but does it store every mp4 on their server?
I'm using your script right now (will also check these youtube channels of yours, just curious).
I see the content (mp4) is directly "youtube-dl-ed" right here on the machine.

I can see the dl.podsync.net/* URLs link to googlevideo.com. Is there an upload wrapper somewhere that could avoid using space on the podsync self-hosted server?

amcgregor commented on July 20, 2024

…does it store every mp4 on their server?

Yes, as part of the background "updater" process. That process is Python code, so it invokes youtube_dl directly, whereas my shell script invokes the youtube-dl command itself. One layer out. ;)

«googlevideo.com links» … Is there an upload wrapper somewhere that could avoid using space on the podsync self-hosted server?

Well, youtube-dl on the command line will download the video content by default. If you are careful to pick a video format that comes "pre-muxed" (that is, audio and video together), you can hypothetically avoid downloading the video and instead pull the actual origin links from the .info.json for use in the RSS feed. Or, in Podsync's case, after a 302 redirect, it is likely looking up the local cache status vs. the availability from YouTube of the pre-muxed version link.
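
Picking a pre-muxed link out of a youtube-dl .info.json could look roughly like the Python sketch below. It relies on youtube-dl's convention of marking a missing stream with the codec value "none" in each entry of the formats list; the helper names and the choice to rank by height are assumptions, and this is not a description of how Podsync resolves its links.

```python
from typing import Any, Dict, List, Optional


def premuxed_formats(info: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the formats that carry both an audio and a video stream."""
    return [
        fmt
        for fmt in info.get("formats", [])
        if fmt.get("vcodec") not in (None, "none")
        and fmt.get("acodec") not in (None, "none")
        and fmt.get("url")
    ]


def best_premuxed_url(info: Dict[str, Any]) -> Optional[str]:
    """Return the origin URL of the tallest pre-muxed format, if any."""
    candidates = premuxed_formats(info)
    if not candidates:
        return None
    return max(candidates, key=lambda fmt: fmt.get("height") or 0)["url"]


if __name__ == "__main__":
    # Made-up metadata in the general shape of a youtube-dl info dict.
    example = {
        "formats": [
            {"format_id": "audio", "vcodec": "none", "acodec": "mp4a.40.2",
             "url": "https://example.invalid/audio"},
            {"format_id": "video", "vcodec": "avc1.640028", "acodec": "none",
             "height": 1080, "url": "https://example.invalid/video1080"},
            {"format_id": "muxed", "vcodec": "avc1.64001F", "acodec": "mp4a.40.2",
             "height": 720, "url": "https://example.invalid/muxed720"},
        ]
    }
    print(best_premuxed_url(example))
```

Origin links of this kind are typically time-limited, so a feed built from them would probably need regular regeneration; treat that as a caveat to check rather than a guarantee.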

That's a key difference, I think. I get 1080p episodes, as I re-mux the independent streams. Hypothetically I could choose a 4K --format. (But ye gods, the storage space, then!)

leekillough commented on July 20, 2024

Just wondering, why the mp3 format? Isn't that for audio only? Do you mean mp4?

Some of us want audio-only, since we like listening to the audio of podcasts posted to YouTube, but we don't have the time to watch the video, since we're doing other things when we listen to the audio, such as driving, or we don't care to see the podcaster's studio, when what they say is more important than their studio.

I consider it a welcome addition.

Self-hosting seems like the way to go too, eliminating single points of failure.

davidAlittle commented on July 20, 2024

Just wondering, why the mp3 format? Isn't that for audio only? Do you mean mp4?

Some of us want audio-only, since we like listening to the audio of podcasts posted to YouTube, but we don't have the time to watch the video, since we're doing other things when we listen to the audio, such as driving, or we don't care to see the podcaster's studio, when what they say is more important than their studio.

I consider it a welcome addition.

Self-hosting seems like the way to go too, eliminating single points of failure.

Seconding the audio-only option. I don't know what APIs you're using, but I can tell you that as a YouTube Red subscriber (actually a Google Play Music subscriber, but that's the same thing now), there is a way to stream only the audio, since this is a premium feature specifically offered as part of Red.

mirth commented on July 20, 2024

Is it expected that docker-compose pull produces the following?

ERROR: for api  pull access denied for mxpv/podsync_api, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
ERROR: for updater  pull access denied for mxpv/updater, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
ERROR: for resolver  pull access denied for mxpv/podsync_lambda, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
ERROR: for nginx  pull access denied for mxpv/nginx, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

mxpv commented on July 20, 2024

CLI docker images are not yet published.

mxpv commented on July 20, 2024

New functionality, docs, and tutorials will be added in follow-up PRs.