readtomyshoe's Introduction

ReadToMyShoe logo: A sneaker wearing a headset with a microphone

ReadToMyShoe

Video Demo

ReadToMyShoe (RTMS) is a web app that lets you upload articles (via URL or by pasting the text directly) and listen to them later. Some features:

  • High-quality text-to-speech: RTMS uses the Google Cloud Text to Speech WaveNet voices. It's not quite human yet, but it's pretty nice.
  • Listen as a podcast: To listen to your articles from your favorite podcast app, just add INSTANCE/api/feed.xml.
  • Web version features:
    • Offline-first: All the articles in your queue are available offline. The web version of RTMS is usable even in airplane mode.
    • Saves your progress: Don't lose your place in your reading material. RTMS will save where you are. So next time you play an article, it'll resume right where you left off.
    • Lockscreen controls: Play, pause, jump 10 seconds. It's all available from the lock screen or notification bar of your mobile device.
    • Runs anywhere: Since RTMS is a web app, it runs everywhere a (modern) web browser runs.
    • Add to Homescreen: RTMS can be added to your homescreen and behave just like a native app.

RTMS is written in Rust, using yew for the frontend (compiles to WASM) and axum for the backend.

Usage

To access the web interface, simply navigate to your instance URL in your web browser. From there, you can add articles or listen to them in-browser.

You can also use the podcast interface to listen to articles. Simply add INSTANCE/api/feed.xml to your favorite podcast app (where INSTANCE is your instance's URL).

Limitations

ReadToMyShoe uses some browser features that are new and/or buggy. Some limitations of the web app are:

  • Does not work in private mode. In Firefox and Safari, RTMS will not let you Add to Queue. This is because you cannot touch local storage from a private browsing window.
  • Lockscreen controls are broken in Firefox for Android. You can still play audio in Firefox for Android, but play/pause, seek, and jump buttons are all missing.
  • Add to Homescreen is not very functional in iOS. This is a documented Safari bug (see the linked issue). For now, just use the website from within Safari.

Accessibility

It is important that ReadToMyShoe be accessible to the visually impaired and others who rely on text-to-speech for reading. If you have an accessibility issue while using ReadToMyShoe, please open a GitHub Issue at this link. If you don't have a GitHub account, please email me at [email protected]

Running your own instance

To set up your own instance of ReadToMyShoe, check out the Getting Started page in the wiki.

Licenses

All code is licensed under either of

at your option.

Images are licensed by Michael Rosenberg under the CC BY 4.0.

Thanks

A lot of the ideas and code in this crate started with Robert Krahn's fantastic template. Thanks!

Also, big thanks to my friend Sharon Ye for her immense help in the design of the logo.

readtomyshoe's People

Contributors

rozbb

readtomyshoe's Issues

Add Article creates corrupt file on error

Browser: All browsers

Description: If you add an article and an error occurs during TTS, the corrupt file will remain in the library.

Note: Simple fix is to do everything using a temp file and to move it to the library under the appropriate filename after everything succeeds.
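A minimal sketch of the temp-file approach, using only the standard library. The function name `write_then_publish`, the `.tmp` naming scheme, and the `produce` callback are hypothetical and not taken from the RTMS codebase; the callback stands in for whatever TTS step writes the audio.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Runs `produce` against a temporary file inside `library_dir`, and only
/// renames it to `file_name` if every step succeeded. On any error the
/// temporary file is removed, so nothing partial appears under the final name.
pub fn write_then_publish<F>(library_dir: &Path, file_name: &str, produce: F) -> io::Result<()>
where
    F: FnOnce(&mut fs::File) -> io::Result<()>,
{
    // Hypothetical naming scheme: a hidden sibling file in the same directory.
    let tmp_path = library_dir.join(format!(".{file_name}.tmp"));
    let final_path = library_dir.join(file_name);

    let result = (|| -> io::Result<()> {
        let mut tmp = fs::File::create(&tmp_path)?;
        produce(&mut tmp)?;
        tmp.sync_all()?;
        // Both paths are in the same directory, so this is a plain rename.
        fs::rename(&tmp_path, &final_path)
    })();

    if result.is_err() {
        // Best-effort cleanup; the original error is what gets reported.
        let _ = fs::remove_file(&tmp_path);
    }
    result
}
```

Keeping the temporary file in the library directory itself (rather than a system temp directory) avoids a rename across filesystems.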

Retry requests when Google TTS fails

Currently, if any TTS request fails, the whole article fails. Re-submitting the article is extremely wasteful. Make it so that individual TTS requests are auto-retried with exponential backoff by the backend.
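A rough sketch of per-request retry with exponential backoff. It is synchronous and uses only the standard library to stay self-contained; the real backend is async, so it would presumably use an async timer instead of `thread::sleep`. The name `retry_with_backoff` and its parameters are hypothetical.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Calls `request` until it succeeds or `max_attempts` is reached.
/// The wait between attempts starts at `initial_delay` and doubles each time.
pub fn retry_with_backoff<T, E, F>(
    max_attempts: u32,
    initial_delay: Duration,
    mut request: F,
) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    assert!(max_attempts > 0, "need at least one attempt");
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match request() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```

Wrapping each individual TTS request this way means a single transient failure costs one repeated request rather than the whole article.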

Support articles that are PDF files

It would be nice if I could upload a PDF file and RTMS would extract and TTS the text for me. This is a fundamentally hard problem, but even a simple solution like pdf2txt would be fine as a first step.

UI improvements for Add to Queue

Add to Queue is pretty hard to use right now. There are a few things that should be done:

  1. Some UI feedback that acknowledges you pressed the button to add the article to the queue
  2. Adding an article to the queue might take a long time if you have a poor internet connection, so a progress indicator for the download would be helpful
  3. Something should prevent you from adding the same article to the queue multiple times (or at least optimize it so that it doesn't have to download anything twice). On the flip side, deleting from the queue should remove all instances of the deleted article.

Improve error in Safari private mode

IndexedDB is not available in Safari's private mode, so RTMS fundamentally does not work there. Make it fail on add-to-queue, with error text nicer than the one in the attached screenshot.
[Screenshot: current error message, 2022-09-12]

Support more languages and voices

Currently, text breaking is only written to work for English text (though it probably works on anything with newlines), and TTS is fixed to a single (American English) voice. It should be possible for a user to select the language of the article they downloaded (or have it autodetected), and select a TTS voice to go with it.

Add Article terminates too early

Browser: All browsers

Description: If you submit an article and click the back button, the article will not be added to the library. This is because the Add Article backend will terminate early if the connection terminates early.

Notes: A fix for this would be to spawn a separate task to do the Add Article operation.

RSS doesn't work on actual deployments

The RSS feed in server/src/list_articles.rs has a localhost URL. This makes it unusable in deployments.

The solution should be to have a new server flag --hostname that gives the hostname of the server (default localhost). Then the RSS feed can use that as the base of the URL path.

Add note about nginx URL normalization

I ran into this issue. A filename had a > in the title, and the server would return a 400 (Bad Request) error when I attempted to Add to Queue. The reason: nginx was normalizing the percent-encoded URL before it got passed to readtomyshoe-server. It put > directly in the URL, making the URL invalid, and forcing hyper to choke and return a 400.

Solution: make sure your proxy_pass line in nginx is specified WITHOUT a URI. In my case, that means making sure there is no trailing slash in the proxy_pass line below:

location / {
        proxy_pass http://localhost:32148;
        ...
}

More info here

Truncate spoken URLs

Sometimes articles have full URLs in them. Reading them out isn't particularly helpful. It'd be better to replace URLs in articles with "Link to [domain]" and remove the rest of the path.
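A possible sketch of the replacement, using plain string scanning from the standard library rather than a URL parser. The function names are hypothetical, and it assumes a URL starts with http:// or https:// and runs until the next whitespace character.

```rust
/// Replaces each http(s) URL in `text` with "Link to <domain>".
/// Sentence punctuation directly after a URL is kept in place.
pub fn shorten_urls(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(start) = find_url_start(rest) {
        out.push_str(&rest[..start]);
        let after = &rest[start..];
        // Assumption: a URL ends at the next whitespace (or end of text).
        let raw_len = after.find(char::is_whitespace).unwrap_or(after.len());
        let raw = &after[..raw_len];
        let url = raw.trim_end_matches(|c: char| matches!(c, '.' | ',' | ';' | ')' | '!' | '?'));
        let trailing = &raw[url.len()..];
        match domain_of(url) {
            Some(domain) => {
                out.push_str("Link to ");
                out.push_str(domain);
            }
            // Nothing after the scheme: leave the text untouched.
            None => out.push_str(url),
        }
        out.push_str(trailing);
        rest = &after[raw_len..];
    }
    out.push_str(rest);
    out
}

fn find_url_start(text: &str) -> Option<usize> {
    match (text.find("http://"), text.find("https://")) {
        (Some(a), Some(b)) => Some(a.min(b)),
        (a, b) => a.or(b),
    }
}

fn domain_of(url: &str) -> Option<&str> {
    let after_scheme = url.split_once("://")?.1;
    // The domain ends at the first path, query, fragment, or port separator.
    let end = after_scheme
        .find(|c: char| matches!(c, '/' | '?' | '#' | ':'))
        .unwrap_or(after_scheme.len());
    let domain = &after_scheme[..end];
    if domain.is_empty() { None } else { Some(domain) }
}
```

For example, `shorten_urls("See https://example.com/a/b?c=d for details.")` yields "See Link to example.com for details."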

Keyboard bindings

This really needs keyboard bindings. You shouldn't need to use a screen reader just to find the pause button for the audio that's currently playing. Copy popular bindings, maybe from YouTube.

Add note about "secure contexts" for offline mode

Currently, if you run scripts/prod.sh on your local network, offline mode will fail. This is because service workers can only be registered in "secure contexts", meaning localhost or https://..., and nothing else. This might surprise a developer trying out the server for the first time and finding that one of the core features doesn't work.

Make a prominent note somewhere saying that this is the expected behavior, and that you can only sample offline mode when you're on localhost or running a real HTTPS server.

Make a distinct outro for each article

Problem

It's not always clear when an article is over. If you are listening in a podcast app, you might jump from the end of one article to the middle of another unfinished article. This can be jarring.

Solution

Add an outro to every article. It should be something distinct, not loud, and not language-specific. Methods:

  1. Concat an outro.mp3 to every single article MP3 before saving. Since the concatenation of two MP3 files is a valid MP3 file containing the audio of both in sequence, this solves the problem. This is also undoable: if you don't want the outro anymore, and you know the bytelength of outro.mp3, then you can just truncate every file by that amount. Downside of this is that this wastes space, copying outro.mp3 to every single article.
  2. Do as above, but instead of copying outro.mp3 to the file on the filesystem, simply have the webserver do the concatenation for requests. This requires a little more engineering effort, but it's less wasteful. The downside here is that the files on the disk no longer represent the files you hear in the app/website.
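Method 1 and its undo step can be sketched on raw bytes. This is only an illustration: the function names are hypothetical, and the byte slices stand in for real MP3 data read from disk.

```rust
/// Appends the outro bytes to the end of an article's MP3 bytes.
pub fn append_outro(article: &mut Vec<u8>, outro: &[u8]) {
    article.extend_from_slice(outro);
}

/// Undoes `append_outro`. Returns false (and changes nothing) if the
/// article does not actually end with the given outro bytes.
pub fn strip_outro(article: &mut Vec<u8>, outro: &[u8]) -> bool {
    if article.ends_with(outro) {
        article.truncate(article.len() - outro.len());
        true
    } else {
        false
    }
}
```

Checking that the file really ends with the outro before truncating is slightly safer than truncating by byte length alone, since it won't cut into an article that never had the outro appended.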

Let users use their own Google Cloud account for TTS

It should be possible to make RTMS use a service account that can bill to subordinate accounts. This would allow me to run a persistent RTMS server and only pay for storage, rather than the expensive TTS bills too.

The flow would roughly be:

  1. User makes a Google Cloud account just like in Getting Started
  2. User logs into RTMS and is presented with a special authentication link
  3. User clicks the link, which will take them to GCP and ask if they want to give access to their TTS API key. User clicks OK
  4. User returns to RTMS and can use it on their own dime.

Insert sentence breaks to titles

Currently, the TTS voice speeds through the entire title, byline, and first sentence of the article. These should all be spoken as separate sentences. They aren't because the text extraction separates them only by newlines. The TTS for web-derived articles should replace newlines with periods, so that the TTS breaks at the appropriate places.
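One way this could look, as a sketch: end every non-empty line with a period unless it already ends in sentence punctuation, then join the lines with spaces. The function name is hypothetical, and the set of characters treated as sentence-ending is an assumption.

```rust
/// Turns newline-separated text into period-separated sentences so the
/// TTS voice pauses between title, byline, and body.
pub fn newlines_to_sentence_breaks(text: &str) -> String {
    let mut out = String::new();
    for line in text.lines().map(str::trim).filter(|l| !l.is_empty()) {
        if !out.is_empty() {
            out.push(' ');
        }
        out.push_str(line);
        // Assumption: these characters already force a spoken pause.
        if !line.ends_with(|c: char| matches!(c, '.' | '!' | '?' | ':' | ';')) {
            out.push('.');
        }
    }
    out
}
```

For example, "My Title\nBy Jane Doe\n\nFirst sentence here." becomes "My Title. By Jane Doe. First sentence here."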

Blocked by reverse proxies

RTMS instances are being detected as bots, and receiving 403 (Forbidden) on pages hosted by Cloudflare and AWS CloudFront. The traffic volume of the instance is irrelevant to whether it gets banned.

Possible solutions:

  1. Register with Cloudflare as a friendly bot
  2. Write a bookmarklet that will submit articles directly to the server, rather than having the server make the request. This is a nice solution because it also solves #28 without any extra work.

Clicking Add to Queue should shift focus to the progress indicator

Browser: Chrome
OS: macOS

This is probably the case on other browsers too. Whenever you select Add To Queue with VoiceOver, it leaves some unnamed group focused, and you hear the progress ticks. But it never says you're focused on a progress indicator, and it never says the percentage.

The example app in this Vue page does focusing correctly, at least for the button -> progress part. Maybe try to emulate that.

Play/Pause <button> does not always act like a play/pause button

Browser: All browsers

Description: If you pause, then seek, then hit the play/pause button to resume, it will play from the position before the seek. This is because the play/pause button loads the last known state when playing. It seems the play button and the play/pause button have two mutually exclusive, and both useful, functions.

Notes: One possible solution to make play/pause respect seeking/jumping is to trigger a save on every seek/jump. A better solution I think is to remove the play/pause button altogether. The reason it exists is because clicking the play button on the <audio> element will sometimes not play from the correct time, because the tab may have previously been unloaded and lost its place. This would be fixed if we had callbacks from the Page Lifecycle API that reload the player state whenever the player becomes visible again and was not playing something already. See visibilitychange

Make query-based subfeeds for RSS

It would be nice to be able to subscribe to https://RTMS_URL/api/feed?title-filter=Money%20Stuff, and have that return a feed containing only the articles with "Money Stuff" in the title.
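The filtering step itself could be as small as the sketch below. The `Article` struct and `filter_by_title` are hypothetical stand-ins, not the types used in server/src/list_articles.rs. It assumes the query value has already been percent-decoded by the web framework, and it matches case-insensitively, which is a design choice the issue doesn't specify.

```rust
/// Hypothetical, simplified article record for illustration.
pub struct Article {
    pub title: String,
    pub file_name: String,
}

/// Returns the articles whose title contains `title_filter`.
/// With no filter, every article is returned (the full feed).
pub fn filter_by_title<'a>(articles: &'a [Article], title_filter: Option<&str>) -> Vec<&'a Article> {
    match title_filter {
        None => articles.iter().collect(),
        Some(needle) => {
            // Assumption: matching ignores case.
            let needle = needle.to_lowercase();
            articles
                .iter()
                .filter(|a| a.title.to_lowercase().contains(&needle))
                .collect()
        }
    }
}
```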

Images are broken in dev mode

Running scripts/dev.sh (which calls trunk) runs a version of RTMS with broken images, broken content scripts, broken manifests, etc. This is because dev mode serves the index from /, while production serves all the static assets from /assets/ (aside from index.html, which is special-cased). So if I set the favicon URL to /assets/favicon.ico, then it'll load in prod mode but fail in dev mode. And if I set it to /favicon.ico it'll load in dev mode and fail in prod mode.

An obvious thing to try is to put everything inside an /assets folder in dev mode. But this doesn't work because 1) it needs to serve index.html from /, and 2) if you make index.html separate, and put everything else in /assets/, then the URLs in prod will be /assets/assets/, because Trunk doesn't distinguish dev and prod builds.

Another solution is to do the above, but also make a copy of every asset in /. But 1) Trunk doesn't have a notion of making 2 copies of everything (though maybe you could do this in post-build hooks), and 2) this is very ugly.

Make user accounts

It's an obviously terrible idea to have every article be public. Let users make accounts and optionally share their library with other users.

Also be sure to implement robust access controls, and prevent enumeration attacks where a malicious user might be able to discover the contents of another user's library.

Article extraction errors are bad

Currently, if trafilatura fails at extracting an article, the error presented in the Add Article view is a parsing error. This is because trafilatura exits with a 0 error code, even on failure.

Note: this is a trafilatura bug that was reported and fixed. Once the fix is upstreamed, this will be closed.

Stop speech if other media plays

Currently, on every platform, if you get a notification while listening to an article, or you start to play something else, the playback will not stop. In the best case, the audio will duck and continue playing, and in the worst case it will appear to be playing from the controls, but no audio will come out. It would be nice if RTMS could detect when it lost audio focus and pause playback.

This may be impossible at the moment. The only API that seems to address this is the seemingly defunct Audio Focus API.

Warning due to "fake play"

Browser: All browsers

Description: Whenever playing an article from the queue, a warning shows up in the console, saying something like

HTTP “Content-Type” of “application/x-unknown-content-type” is not supported. Load of media resource blob:<base_url>/75fe7bcf-6205-4786-9c1a-9725da54a14a failed.

This is because the fake_play() function attempts to play an empty blob. It does so because Safari requires an immediate play() action after a button click in order to permit future play()s without user interaction. So the button click handler calls play() immediately on invalid audio before loading the real audio.

Notes: A fix for this might be to play a valid, but empty MP3 file. I've tried this and couldn't get it to work for the life of me.

Make a Wiki

There's too much info in the README. The Wiki should have, at least:

  • A high-level overview
  • Design goals and non-goals
  • Dev notes
  • Deployment notes. Cover Fly.io, Caddy config, nginx config, and GCP setup and cost
  • A quickstart guide

Implement a readalong UI

It would be nice to have a way of reading the text along with the voice. This would be possible using the current API by adding an SSML <mark> tag to every sentence. The TTS will return timepoints that can tell the client which sentence they're reading at a given timestamp.
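A sketch of the SSML-building half of this idea. The function names and the `s0`, `s1`, ... mark naming are hypothetical, and the sentence splitting is assumed to have happened already. Requesting and parsing the timepoints from the TTS API is not shown.

```rust
/// Builds an SSML document with one <mark> before each sentence, so that
/// returned timepoints can be mapped back to sentence indices.
pub fn ssml_with_marks(sentences: &[&str]) -> String {
    let mut ssml = String::from("<speak>");
    for (i, sentence) in sentences.iter().enumerate() {
        // Hypothetical naming scheme: mark "s<i>" precedes sentence i.
        ssml.push_str(&format!("<mark name=\"s{i}\"/>{} ", escape_xml(sentence)));
    }
    ssml.push_str("</speak>");
    ssml
}

/// Escapes the characters that would otherwise be read as SSML markup.
fn escape_xml(text: &str) -> String {
    text.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
}
```

The client would then highlight sentence i while the playback position lies between the timepoints for marks `s<i>` and `s<i+1>`.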

Add total queue length

It would be nice to be able to see the total length of the downloaded queue. Since you can't see the length of articles without opening them (including before you download them), the only way to know how much time you have queued up is to manually open every article. Total queue length solves this problem with less clutter than displaying every article length individually.
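Assuming each queued article's duration is already known to the client, the total is a simple sum. The sketch below is illustrative only; the function names and the H:MM:SS display format are hypothetical.

```rust
use std::time::Duration;

/// Sums the durations of every article in the queue.
pub fn total_queue_length(durations: &[Duration]) -> Duration {
    durations.iter().sum()
}

/// Formats a duration as H:MM:SS for display above the queue.
pub fn format_hms(total: Duration) -> String {
    let secs = total.as_secs();
    format!("{}:{:02}:{:02}", secs / 3600, (secs % 3600) / 60, secs % 60)
}
```

For example, articles of 754, 3000, and 61 seconds total 3815 seconds, which is displayed as 1:03:35.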

Support other ways to add to library

Currently you need to open RTMS in order to add an article to the library. It would be nice if there were a bookmarklet or share target that let you add it directly from the page. Push To Kindle is a good example of such a bookmarklet.

This has other benefits too. With a bookmarklet, you could possibly do text extraction client-side. This lets users with, e.g., Bloomberg accounts add Bloomberg articles to their library without having to give RTMS their login.

Make an autoplay option

Make a selection box which, when selected, will autoplay the next article once the current one is done.

Let users enter credentials for paywalled content

Currently, there's no way to read a paywalled New York Times article. If the person running the RTMS server has an NYT account, they should be able to use the login to fetch that article.

The same goes for Bloomberg, WSJ, and SEC EDGAR (needs user agent).

Consider storage limits and what happens when those are hit

Currently, RTMS assumes it has infinite space for audio files. It would be wise to consider a way to limit space usage (either globally or per-user), and consider what happens when it's hit.

For example, a user who hits their space limit might not be able to add any articles until they "archive" some. Archiving might amount to deleting the MP3 file but saving the metadata somewhere. In the library, there might be an extra section for archived articles. It shows their title and source, but no Add to Queue button.

Make it so the Docker image can put the `audio_blobs/` directory on an external volume

Currently, audio_blobs/ lives inside the Docker image. That means 1) it is limited by whatever space constraints the docker runner is using, and 2) whenever the image is torn down, so is all the data.

It would be preferable to have the Dockerfile (or docker-compose file?) specify an external location to persistently keep the audio_blobs/ data.
