podcast-db's People

Contributors

mitchdowney, scvnc

podcast-db's Issues

What are our options for podcast and episode unique ids?

Podverse's current podcast and episode unique id problem:

Today the app uses podcastFeedURLs as the unique ids for podcasts, and episodeMediaURLs (url to mp3/ogg/etc) as the unique ids for episodes.

After having the site deployed and parsing between 50 and 150 podcasts a day over the past 3 weeks, I've seen podcastFeedURLs and/or episodeMediaURLs change for ~5 podcasts. Whenever this happens, I have to manually fix / update the db.

I can live with that failure rate, and for now I can manually fix podcasts that fall out of sync while supporting ~2000 podcasts on the site, but hopefully we can find better ways to handle these things...


Potential options for unique ids that some podcasts currently use:

An example of an ideal guid is in #WeThePeople Live's RSS feed, where all episodes have what I believe to be a proper uuid:

<guid isPermaLink="false"><![CDATA[ c9aa7c12-334b-47fc-8d60-eb28f527c8d0 ]]></guid>

If every RSS feed used uuids like that we'd be in great shape, but as of today most do not. Many podcasts use a different guid format, like this one from the Joe Rogan Experience RSS feed:

<guid isPermaLink="false"><![CDATA[483a81100097301f38b7dc15427599ef]]></guid>

Have you ever seen a guid like this? What is this format called? Can we validate it?

Another guid format uses the gid:// scheme. This example is from the Duncan Trussell Family Hour feed (note that it is marked isPermaLink="false"):

<guid isPermaLink="false">gid://art19-episode-locator/V0/Gx0Krxiq-AwcKcvw8RE2g-uWf_9A-iQPRnjqlj_EosQ</guid>

I think I've seen https urls in there as well. I don't have an example right now, but it'd look something like:

<guid isPermaLink="true"><![CDATA[https://example.podcaster.com/unique123abc]]></guid>

Implementation of these two unique ids is spotty (sometimes people use integers for guids, sometimes people use non-permanent or non-unique urls as the permaLink). Still, it seems worthwhile to me to leverage each of them as unique ids wherever possible, in order to minimize maintenance / tech debt.

Proposed approach for handling unique ids for podcasts and episodes in Podverse

  1. check for a valid uuid in the guid field, if that's not available

  2. check for a valid one of those alternate guids without the hyphens, if that's not available

  3. check for a valid isPermaLink that uses the gid:// protocol, and if none of those are available

  4. check for a valid isPermaLink that uses the https:// protocol, and if none of those are available

  5. check for a valid podcastFeedURL / episodeMediaURL as a last resort.

NOTE: I have seen a feed that included multiple <guid> tags per episode, so we should probably store guids in an array, then loop over the values checking for the first match in the order listed above.
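To make the proposed order concrete, here is a rough sketch of how the selection could look. Everything in it is an assumption for illustration: pickUniqueId, the type labels, and the two regular expressions are hypothetical names and patterns, not existing Podverse code. The 32-character hex check only tests the shape of the value, since I don't know what that format is actually called.

```javascript
// Hypothetical sketch, not existing Podverse code.
// Assumed patterns: a hyphenated uuid, and 32 hex characters without hyphens.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const HEX32_RE = /^[0-9a-f]{32}$/i;

// True only for strings that parse as a URL with the https: protocol.
function isHttpsUrl(value) {
  try {
    return new URL(value).protocol === 'https:';
  } catch (err) {
    return false;
  }
}

// Checks in the priority order proposed above (steps 1 to 4).
const CHECKS = [
  { type: 'uuid', test: (g) => UUID_RE.test(g) },
  { type: 'hex32', test: (g) => HEX32_RE.test(g) },
  { type: 'gid', test: (g) => g.startsWith('gid://') && g.length > 'gid://'.length },
  { type: 'https', test: (g) => isHttpsUrl(g) },
];

// guids: array of raw <guid> values for one episode (or podcast).
// fallbackUrl: the podcastFeedURL / episodeMediaURL used as the last resort (step 5).
function pickUniqueId(guids, fallbackUrl) {
  // Trim whitespace, since some feeds pad the value inside CDATA.
  const candidates = (guids || []).map((g) => String(g).trim()).filter(Boolean);
  for (const check of CHECKS) {
    const match = candidates.find((g) => check.test(g));
    if (match) return { type: check.type, id: match };
  }
  return fallbackUrl ? { type: 'url', id: fallbackUrl } : null;
}
```

With this shape, an integer guid such as "12345" matches none of the checks and falls through to the feed / media URL. It may also be worth storing the returned type next to the id, so we can tell later which rule produced it.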


Any thoughts on this proposed direction?

Any other ideas for how these podcast feed unique id issues can be ameliorated?

How can we share Sequelize models between podverse-web and podverse-feedparser as separate apps?

First, a rough idea of how I imagine podverse-feedparser working:

  1. podverse-feedparser, podverse-web, and the podverse PostgreSQL database all listen on their separate ports, deployed on their separate servers.

  2. Every few hours or so, a cron job triggers podverse-feedparser to query for all podcast RSS feed URLs in the database.

  3. The parseFeeds method is called with the array of all the feed URLs currently in the db. parseFeeds adds each of these feeds to a queue to be parsed.

  4. The parseFeeds queue runs sequentially, calling the parseFeed method with each URL until finished. As it goes, parseFeed writes updated podcast and episode data to the PostgreSQL db. (This parseFeed function already exists in podverse-web.)
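Steps 3 and 4 could be sketched roughly like this. It is a minimal sketch under assumptions: the real parseFeed lives in podverse-web, so here it is passed in as a parameter and assumed to return a promise, and the result objects are hypothetical. Continuing after a failed feed is my own design choice, not something decided above.

```javascript
// Hypothetical sketch: runs parseFeed over the feed URLs one at a time.
// parseFeed is injected and assumed to be async (to return a promise).
async function parseFeeds(feedUrls, parseFeed) {
  const results = [];
  for (const url of feedUrls) {
    try {
      // Wait for this feed to finish before starting the next one.
      await parseFeed(url);
      results.push({ url, ok: true });
    } catch (err) {
      // Assumed policy: one broken feed should not stop the rest of the queue.
      results.push({ url, ok: false, error: err.message });
    }
  }
  return results;
}
```

The cron job would then only need to query the feed URLs and call parseFeeds(urls, parseFeed), logging any entries where ok is false.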

I feel confident I can write code to make each of these things happen, but I am not sure how to elegantly reuse the podverse-web repositories/sequelize/engineFactory.js and models in the separate podverse-feedparser app.

I considered using npm install git://podverse-web as a dependency in podverse-feedparser, then somehow loading the models within podverse-feedparser by loading podverse-web files available in node_modules...but I'm not quite sure how I'd do that yet, and I wonder if I'm heading down the wrong path.

Having two separate apps that share a PostgreSQL db is new territory for me. Any tips on how to architect this stuff?
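One direction that might work is to move the model definitions into their own small package that both apps install, and have that package export a function that takes each app's own Sequelize instance. The sketch below is only an illustration: defineModels and the column names (feedURL, mediaURL, title) are hypothetical, not the actual podverse-web models.

```javascript
// Hypothetical shared package: it holds only the model definitions.
// Each app creates its own Sequelize connection and passes it in.
function defineModels(sequelize, DataTypes) {
  // Column names are placeholders for illustration.
  const Podcast = sequelize.define('podcast', {
    feedURL: { type: DataTypes.TEXT, allowNull: false, unique: true },
    title: DataTypes.TEXT,
  });

  const Episode = sequelize.define('episode', {
    mediaURL: { type: DataTypes.TEXT, allowNull: false, unique: true },
    title: DataTypes.TEXT,
  });

  // A podcast has many episodes; each episode belongs to one podcast.
  Podcast.hasMany(Episode);
  Episode.belongsTo(Podcast);

  return { Podcast, Episode };
}

module.exports = { defineModels };
```

Both podverse-web and podverse-feedparser would then require this package and call defineModels with their own connection, so neither app has to reach into the other's files under node_modules.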
