Comments (9)

mitchdowney commented on September 28, 2024

I'm also considering whether there is a 4th app that needs to be here: podverse-api.

In that case I imagine I would remove all references to models and database stuff from podverse-web and podverse-feedparser, then whenever a database interaction needs to happen, make a request to podverse-api.

Decoupling all the db stuff from -web and -feedparser sounds like more work than I'd prefer right now, buuut if that is a good pattern then I am up for doing it.

scvnc commented on September 28, 2024

In the spirit of open source and reusable tools, here's how I'd take a stab at architecting it:

High Level

There is a podcast-db application. It is the authority for the Podcast and Episode models, the associated RSS links, and all that. It knows nothing about MediaRefs / clips / playlists etc. It exposes two APIs: one is a RESTful API, and the other is the Sequelize models that it owns. Focus on the latter, because it is more relevant for podverse-web.

Podverse-web consumes the podcast-db application. This means exposing an npm package... it could use a git:// url for now. Essentially, podcast-db exposes Sequelize models that know how to interact with a Postgres instance containing all the podcast information. Podverse-web also contains information about MediaRefs, Clips, Playlists, etc. Podverse-api could happen, but possibly later. Because of this architecture, a MediaRef wouldn't necessarily be directly linked to an Episode in the database, but that's fine for the sake of being more decentralized and decoupled.
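A rough sketch of what that npm package could look like, assuming hypothetical file names and field names (none of this exists yet):

    // podcast-db/index.js -- hypothetical entry point that exports the models
    const Sequelize = require('sequelize');

    module.exports = function connect(databaseUrl) {
      const sequelize = new Sequelize(databaseUrl);

      // Podcast is the authority for feed metadata and the RSS url
      const Podcast = sequelize.define('podcast', {
        title: Sequelize.STRING,
        feedUrl: { type: Sequelize.STRING, unique: true },
        lastUpdated: Sequelize.DATE
      });

      // Episodes belong to a Podcast; nothing about MediaRefs/clips lives here
      const Episode = sequelize.define('episode', {
        title: Sequelize.STRING,
        mediaUrl: { type: Sequelize.STRING, unique: true },
        pubDate: Sequelize.DATE
      });
      Episode.belongsTo(Podcast);

      return { sequelize, Podcast, Episode };
    };

podverse-web would then list it as a dependency (via a git:// url for now) and consume the models with something like require('podcast-db')(process.env.DATABASE_URL).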

podcast-db (lower level)

It doesn't need to run on a port or a server like with Express/Feathers. Really it is a set of routines plus some sort of job queue mechanism. These routines would probably be executed as a command-line application or something similar. No need to get HTTP involved to invoke these routines (beyond fetching the RSS feeds themselves).

Routine: update rss feed

Given an RSS feed, update the podcast-db.
Tonnes of logic and conditions would be required here to make it robust. What if the RSS feed is garbage, or too big? What if it's a dead link? How should this be reported? What should happen if an episode name is changed? What should happen if the media URL is changed? Plus many other cases we haven't thought of yet.
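A rough sketch of the shape that routine might take; parseFeed is an assumed helper that resolves to { podcast, episodes } or throws:

    // Hypothetical update routine living in podcast-db
    async function updateFeed(feedUrl, { Podcast, Episode }, parseFeed) {
      let parsed;
      try {
        parsed = await parseFeed(feedUrl); // garbage feed / dead link throws here
      } catch (err) {
        // report it and bail; the job queue decides whether to retry
        console.error('feed ' + feedUrl + ' failed to parse:', err.message);
        return { ok: false, error: err };
      }

      const podcast = await Podcast.findOne({ where: { feedUrl } });
      if (!podcast) {
        return { ok: false, error: new Error('unknown feed; use the add routine') };
      }

      for (const ep of parsed.episodes) {
        // Upserting on a unique mediaUrl means a renamed episode updates in
        // place, while a changed media URL shows up as a new row -- each of
        // the "what should happen" questions above needs a policy like this.
        await Episode.upsert({ ...ep, podcastId: podcast.id });
      }

      await podcast.update({ lastUpdated: new Date() });
      return { ok: true };
    }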

Routine: add rss feed

Given an RSS feed, add a podcast/episode to the podcast-db.

Routine: determine which rss feeds need to be updated

When executed (perhaps hourly) it should result in enqueueing the set of routines that need to be executed. It could be a query like "give me all podcast RSS urls whose last-updated date is older than 48 hrs".
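In Sequelize terms (using the operator syntax from Sequelize 4+, and the model fields assumed earlier), that query might look like:

    const { Op } = require('sequelize');

    // Hypothetical: find stale feeds and enqueue an update job for each;
    // enqueueUpdateJob is sketched under the job queue section below.
    async function enqueueStaleFeeds(Podcast, enqueueUpdateJob) {
      const cutoff = new Date(Date.now() - 48 * 60 * 60 * 1000); // 48 hrs ago
      const stale = await Podcast.findAll({
        where: { lastUpdated: { [Op.lt]: cutoff } },
        attributes: ['feedUrl']
      });
      for (const p of stale) {
        await enqueueUpdateJob(p.feedUrl);
      }
    }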

job queue mechanism

There are a lot of these. Some you can run yourself, and for some you can leverage a cloud service: Amazon has SQS (https://aws.amazon.com/sqs/), and Azure has one too. Fundamentally it is the orchestrator of a task, ensuring that it is queued up and completed in a robust way. It would take orders to execute routines and make sure they get executed.
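With SQS, for instance, enqueueing a job is just sending a small JSON message (the queue URL env var and message shape are made up here):

    const AWS = require('aws-sdk');
    const sqs = new AWS.SQS({ region: 'us-east-1' });

    // Hypothetical: one message per routine invocation
    function enqueueUpdateJob(feedUrl) {
      return sqs.sendMessage({
        QueueUrl: process.env.FEED_QUEUE_URL,
        MessageBody: JSON.stringify({ routine: 'updateFeed', feedUrl })
      }).promise();
    }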

But maybe we don't need to build podcast-db so quickly

Consider using the audiosear.ch API... You can send a term such as the podcast "Invisibilia" to its API, and it will return a JSON payload that has everything we need to get podcast/episode information based on a search. It is essentially the podcast-db piece, but via a RESTful API.
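Something like the following; the endpoint path and auth are assumptions about audiosear.ch's API (it requires OAuth credentials), not verified here:

    const request = require('request');

    // Hypothetical show search against audiosear.ch
    function searchShows(term, accessToken, cb) {
      request({
        url: 'https://www.audiosear.ch/api/search/shows/' + encodeURIComponent(term),
        headers: { Authorization: 'Bearer ' + accessToken },
        json: true
      }, (err, res, body) => cb(err, body));
    }

    searchShows('Invisibilia', process.env.AUDIOSEARCH_TOKEN, (err, results) => {
      if (err) return console.error(err);
      console.log(results); // podcast/episode metadata as JSON
    });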

mitchdowney commented on September 28, 2024

I just played around with the audiosear.ch API, and one issue I have is that a few podcasts I listen to a lot (Rubin Report, #WeThePeople Live, Peace Propaganda) are not in the system. It looks like this doesn't have to be a show-stopper though, as the website takes suggestions:

[screenshot: the audiosear.ch podcast suggestion form]

When I query for a podcast by ID (887 for "Waking Up with Sam Harris"), I see 39 episode IDs, but there are 61 episodes in Sam Harris's RSS feed, so Audiosear.ch apparently can return inaccurate data.

Furthermore, I do not see a way to request a podcast AND have its episode information included. It appears we need to individually query for each episode by its ID if we want to know anything about the episode, such as its title or lastPubDate. That seems like it would be unusable for the Podcast page in Podverse.
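In other words, roughly one request per episode, something like this (the episode endpoint path is a guess, and fetchEpisodeJson is an assumed helper):

    // Workable in a batch script, but far too chatty to render a Podcast page
    async function getAllEpisodes(episodeIds, accessToken) {
      return Promise.all(episodeIds.map(id =>
        fetchEpisodeJson('https://www.audiosear.ch/api/episodes/' + id, accessToken)
      ));
    }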

Unless there is a workaround for displaying all episodes that I am not thinking of...

If there are no workarounds for these issues, then I am leaning towards building podcast-db...

mitchdowney commented on September 28, 2024

Ooo Iiiii seee now. After rereading your proposal, audiosear.ch sounds very viable to me. Why run all these RSS parsing jobs ourselves, when someone else is already doing it? Audiosear.ch can take a huge load off our backs for the podcasts they support.

I'd still like to have our own RSS feed parser though, which we'd use just to fill in the gaps for feeds that audiosear.ch doesn't provide yet (like Waking Up with Sam Harris, and Peace Propaganda). By drastically limiting the number of feeds we parse ourselves, keeping our feed parser robust should be much more manageable...

Whooops didn't mean to press close

scvnc commented on September 28, 2024

There may be some dealbreakers for sure. Having a podcast/episode in one request is not one of them; it's not a big deal in the grand scheme of things to have that as two requests at the moment. Not having control over which feeds show up might be one, though.

It could be that audiosear.ch has strict standards to keep their database clean, and so they gatekeep RSS feeds.

It is an illustration that this piece really is a whole other app, sort of unrelated to the core Podverse MVP. I'd be happier with a micro-service running on Lambda or something that took an RSS feed, converted it into JSON, and shoved it back to the client to store in localStorage. No need to maintain a database of podcasts/episodes.
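A sketch of that micro-service, assuming an API Gateway proxy event and the same assumed parseFeed helper from the update-routine sketch above:

    // Hypothetical Lambda handler: fetch the feed, parse it, hand back JSON
    exports.handler = async (event) => {
      const feedUrl = event.queryStringParameters.url;
      try {
        const { podcast, episodes } = await parseFeed(feedUrl);
        return { statusCode: 200, body: JSON.stringify({ podcast, episodes }) };
      } catch (err) {
        // garbage feed, dead link, etc.
        return { statusCode: 422, body: JSON.stringify({ error: err.message }) };
      }
    };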

mitchdowney commented on September 28, 2024

I hear ya on how not storing podcasts/episodes in the db would simplify things greatly.

The only thing is I cannot imagine a UX where I would want to track down an RSS feed link every time I want to use the web clipper. More so, I can't imagine most of my friends/family ever doing that. I can, however, imagine myself and others clicking a Search icon, typing in the name of the podcast we're looking for, then listening to an episode and making clips that way.

Also I want a web clipper with a good UX because I don't want users to be totally limited to iOS (if we were to go solely the mobile app route).

Sooo the reason I'm not going the localStorage route is 100% UX related. If I'm misunderstanding and there is a way to accomplish this UX without a podcast-db, then I'm interested in simplifying things.

In the meantime, I'll be working on getting podcast-db to work with podverse-web locally today.

mitchdowney commented on September 28, 2024

Basically what I am planning on doing today is:

  1. Move the podcast and episode models, services, and tests into podcast-db.
  2. Make podcast-db a node_module dependency of podverse-web.
  3. Write a podcast-db script that extracts podcast data from audiosear.ch and stores it in the db (see the sketch after this list).
  4. Write a podcast-db script that extracts episode data from audiosear.ch and stores it in the db.
  5. Make sure the previous feedparser.js still works for outlier podcasts that audiosear.ch does not support.
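For steps 3 and 4, the scripts might boil down to something like this, reusing the models and the assumed audiosear.ch helpers from earlier sketches (the title / rss_url field names are guesses at audiosear.ch's response shape):

    // Hypothetical import script: pull shows from audiosear.ch, upsert locally
    async function importShows(term, { Podcast }, accessToken) {
      const results = await searchShowsAsync(term, accessToken); // assumed promisified searchShows
      for (const show of results) {
        await Podcast.upsert({ title: show.title, feedUrl: show.rss_url });
        // step 4: episode import would follow the same per-episode pattern
      }
    }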

mitchdowney commented on September 28, 2024

@scvnc podverse-web and podcast-db have been decoupled. podverse-web fires up for me when I run npm start, and all the features seem to be working.

I can populate the db with feeds / podcasts / episodes with this CLI command:

node -e 'require("./src/tasks/feedParser.js").parseFullFeedIfFeedHasBeenUpdated("http://joeroganexp.joerogan.libsynpro.com/rss")'

There's some hardcoded db stuff going on in podverse-web and podcast-db that I assume will have to get cleaned up for deployment.

I'm thinking I'll work on the shell scripts and queue stuff next. That's newer territory for me. I'll look up tutorials on SQS and see what I can do...

scvnc commented on September 28, 2024

With SQS and WebFaction we would need to have a cron job on WebFaction run every 5 minutes (or another considered interval). It would connect to SQS, retrieve one message (which is a task for feed parsing), and then do the feed-parsing job (add or update a feed).

After it's done (or errored!!) it would have to interact with SQS again. If it's done, it has to tell SQS to delete the message from the queue because it was successfully parsed. If it errored, then we need to log that somewhere nice and then delete the message from SQS.
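That worker could be as small as this (same assumed queue URL and job shape as the enqueue sketch above):

    const AWS = require('aws-sdk');
    const sqs = new AWS.SQS({ region: 'us-east-1' });

    // Hypothetical cron-invoked worker: pull one message, run the job,
    // then delete it -- logging first if the job errored.
    async function workOnce() {
      const { Messages } = await sqs.receiveMessage({
        QueueUrl: process.env.FEED_QUEUE_URL,
        MaxNumberOfMessages: 1
      }).promise();
      if (!Messages || !Messages.length) return; // nothing queued

      const msg = Messages[0];
      const job = JSON.parse(msg.Body);
      try {
        await runRoutine(job); // assumed dispatcher over the routines above
      } catch (err) {
        console.error('feed job failed:', job, err); // "log that somewhere nice"
      }
      await sqs.deleteMessage({
        QueueUrl: process.env.FEED_QUEUE_URL,
        ReceiptHandle: msg.ReceiptHandle
      }).promise();
    }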

The other task is "determine which rss feeds need to be updated"... which should probably be a daily script that combs through each podcast's lastUpdated and adds the appropriate update jobs to the SQS queue. They will later get picked up by the previously illustrated cron job.
