lodestone / transcriber

This project forked from voxmedia/transcriber

NWJS-based OS X desktop application that, given a video or audio file, returns a transcription using the IBM Watson Speech to Text API.

Home Page: https://voxmedia.github.io/Transcriber

License: BSD 3-Clause "New" or "Revised" License

JavaScript 43.38% HTML 56.62%

transcriber's Introduction

Lightweight speech-to-text desktop app for OS X using the IBM Watson API

IBM Speech to text API

To use this app, you need IBM Watson API keys for the Speech to Text service, which you can get by creating a Bluemix account.

Usage - Development

If you clone the repo you can start the app with npm start.

Usage - User

Alternatively, you can get the latest release, packaged and ready for use.

This is a Tray Menu app.

Transcriber menu

First, choose Select Media and pick the audio or video file you'd like to transcribe.

Notifications show when a transcription has started and when it has finished.

On completion, an editable text area shows you the transcription.

By default, the transcription is also saved to the clipboard.

You can disable Autosave to clipboard if you are editing text or using the system clipboard in another program, so that the transcription does not overwrite whatever you have copied.

Setting IBM Watson API keys

The first time you start the application, you will be prompted to set the API keys.

Should you need to change them, use the shortcut cmd + shift + a.

The keys are saved inside the app as a JSON file, wttskeys.json, at the root of the application.

This file is listed in .gitignore so that it is not accidentally added to git in development mode.
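As a rough illustration of storing the keys this way, here is a minimal Node.js sketch. Only the file name wttskeys.json comes from this README; the helper names saveKeys and loadKeys and the username/password fields in the usage note are hypothetical and may not match what the app actually does.

```javascript
const fs = require('fs');
const path = require('path');

// File name taken from this README; everything else here is illustrative.
const KEYS_FILE = 'wttskeys.json';

// Save the credentials object as JSON at the given application root.
function saveKeys(appRoot, keys) {
  const file = path.join(appRoot, KEYS_FILE);
  fs.writeFileSync(file, JSON.stringify(keys, null, 2), 'utf8');
  return file;
}

// Load the credentials, or return null when the file does not exist yet
// (for example on first launch, when the app would prompt for the keys).
function loadKeys(appRoot) {
  const file = path.join(appRoot, KEYS_FILE);
  if (!fs.existsSync(file)) return null;
  return JSON.parse(fs.readFileSync(file, 'utf8'));
}
```

A caller might run saveKeys(appRoot, { username: '...', password: '...' }) after the prompt, and treat a null result from loadKeys(appRoot) as the signal to show the prompt again.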

Overview of project

  • Once you select a video, the app converts it into audio and sends it to the IBM Speech to text API.
  • When the transcription comes back it's copied to clipboard, unless you un-tick the option in the menu.
  • Paste the transcription wherever you want and take it from there.

Technical overview

Convert video to audio

The video_to_audio module converts video or audio into the audio format the IBM API expects. It was initially adapted from Sam Lavine's gist.

Audio files are saved in the ./tmp/audio folder.
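The conversion step could look roughly like the sketch below, which shells out to ffmpeg. This is not the actual video_to_audio module: the function names are hypothetical, and the output settings (mono, 16 kHz WAV) are an assumption chosen for illustration. It also assumes ffmpeg is installed and on the PATH.

```javascript
const { spawn } = require('child_process');
const fs = require('fs');
const path = require('path');

// Build ffmpeg arguments for an audio-only, mono, 16 kHz output file.
// These encoding settings are an assumption, not the module's real ones.
function buildFfmpegArgs(inputPath, outputPath) {
  return [
    '-y',            // overwrite the output file if it already exists
    '-i', inputPath, // input video or audio file
    '-vn',           // drop any video stream
    '-ac', '1',      // one audio channel (mono)
    '-ar', '16000',  // 16 kHz sample rate
    outputPath,
  ];
}

// Derive the output path inside the audio folder named in this README.
function audioPathFor(inputPath, audioDir = './tmp/audio') {
  const base = path.basename(inputPath, path.extname(inputPath));
  return path.join(audioDir, base + '.wav');
}

// Run ffmpeg and resolve with the path of the converted audio file.
function convertToAudio(inputPath, audioDir) {
  const outputPath = audioPathFor(inputPath, audioDir);
  // ffmpeg does not create missing output folders, so create it first.
  fs.mkdirSync(path.dirname(outputPath), { recursive: true });
  return new Promise((resolve, reject) => {
    const proc = spawn('ffmpeg', buildFfmpegArgs(inputPath, outputPath));
    proc.on('error', reject); // e.g. ffmpeg is not installed
    proc.on('close', (code) =>
      code === 0
        ? resolve(outputPath)
        : reject(new Error('ffmpeg exited with code ' + code)));
  });
}
```

Keeping the argument list in its own function makes the encoding settings easy to change and to check without running ffmpeg.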

IBM Speech to text API

The stt folder contains the module that interacts with the IBM Speech to Text API. If you want to dig deeper, IBM's documentation on how to interact with the API is pretty good.
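To give a feel for what comes back from the service, the sketch below flattens a recognition response into plain text. It assumes the JSON shape of the Speech to Text recognize response (a results array whose items each carry an alternatives array with a transcript string). The function name transcriptFromResponse is hypothetical and is not taken from the stt module.

```javascript
// Join the top alternative of every result into one plain-text transcript.
// Missing or empty results are skipped rather than treated as errors.
function transcriptFromResponse(response) {
  const results = (response && response.results) || [];
  return results
    .map((result) => {
      const best = result.alternatives && result.alternatives[0];
      return best && typeof best.transcript === 'string'
        ? best.transcript.trim()
        : '';
    })
    .filter((text) => text.length > 0)
    .join(' ');
}

// Example with a hand-written response in the assumed shape.
const sample = {
  results: [
    { alternatives: [{ transcript: 'hello world ', confidence: 0.9 }], final: true },
    { alternatives: [{ transcript: 'how are you ', confidence: 0.8 }], final: true },
  ],
};
console.log(transcriptFromResponse(sample)); // prints "hello world how are you"
```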

Transcribing video

transcribe.js requires both modules described above and brings it all together.

It converts video into audio and then sends the audio to Watson for transcription. Transcriptions are saved to a text file in the ./tmp/text folder.

The module returns the path to the text file.

index.js abstracts transcribe.js in case the interface needs to change at a later stage.
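The way transcribe.js ties the two modules together can be sketched as follows. This is an illustration under stated assumptions, not the real code: convertToAudio and recognize are hypothetical stand-ins for the two modules and are passed in as parameters, and the .txt naming scheme is made up for the example.

```javascript
const fs = require('fs');
const path = require('path');

// Convert the media file to audio, transcribe it, save the text,
// and return the path of the saved text file.
// `convertToAudio(mediaPath)` must resolve with an audio file path and
// `recognize(audioPath)` must resolve with the transcript string; both
// are hypothetical stand-ins for the project's own modules.
async function transcribe(mediaPath, { convertToAudio, recognize, textDir = './tmp/text' }) {
  const audioPath = await convertToAudio(mediaPath);
  const text = await recognize(audioPath);

  // Make sure the output folder exists before writing into it.
  fs.mkdirSync(textDir, { recursive: true });
  const base = path.basename(mediaPath, path.extname(mediaPath));
  const textPath = path.join(textDir, base + '.txt');
  fs.writeFileSync(textPath, text, 'utf8');
  return textPath;
}
```

Passing the two steps in as functions keeps the sketch independent of how the conversion and the API call are implemented, which is also the reason a thin index.js wrapper is convenient.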

NWJS

index.html contains the implementation of the NWJS app, including the code that adds the tray menu to the application.

See the comments in the code in ./index.html, the nwjs wiki, and the nwjs documentation for more on this.

User flow

When a user selects a video, it is transcribed, and system notifications are triggered at the start and at the end.

When the transcription is done, it is saved to the clipboard unless that option is un-ticked, in which case the user can click Copy transcriptions to clipboard to get the transcription.
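The flow above can be sketched as a single handler. All names here (handleMediaSelected, the autosaveToClipboard option, and the notify, transcribeToText and copyToClipboard callbacks) are hypothetical; in the real app these would be wired to the NWJS notification and clipboard facilities in index.html.

```javascript
// Run one transcription and apply the clipboard option.
// `deps` carries the side-effecting pieces so the flow itself stays plain:
//   notify(message)             shows a system notification
//   transcribeToText(mediaPath) resolves with the transcript string
//   copyToClipboard(text)       writes text to the system clipboard
async function handleMediaSelected(mediaPath, options, deps) {
  deps.notify('Transcription started');
  const text = await deps.transcribeToText(mediaPath);
  deps.notify('Transcription finished');

  // Only touch the clipboard when the autosave option is ticked.
  if (options.autosaveToClipboard) {
    deps.copyToClipboard(text);
  }
  return text;
}
```

When the option is un-ticked, the returned text can be kept around so that a manual copy menu item can call copyToClipboard later.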

Build NWJS app

Option 1

Use the deploy script:

node deploy.js

This creates a build folder inside the repo. The build folder is also in .gitignore to avoid accidentally pushing it to remote.

Option 2

To rebuild the app in NWJS, refer to the documentation.

Install nw-builder

npm install -g nw-builder

From one level above the application folder (cd .. from root of repo)

nwbuild -p osx64 ./transcriber

This creates a build folder that contains the app.

Todo

  • Write proper tests using a testing framework.
  • Add some proper form of error handling.
  • IBM has a size limit of 100 MB per audio POST request. Double-check whether there is a use case where converting video to audio exceeds that size. A rough test converting a 54gb video to audio with that module ended up at 50 to 70 MB, so it would seem OK for now?
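One way to guard against the size limit mentioned in the last item is to check the converted audio file before posting it. The sketch below is only a suggestion: the function name isWithinSizeLimit is hypothetical, and treating 100 MB as 100 * 1024 * 1024 bytes is an assumption about how the limit is counted.

```javascript
const fs = require('fs');

// Limit taken from this README; the exact byte count is an assumption.
const MAX_AUDIO_BYTES = 100 * 1024 * 1024;

// Return true when the file is small enough to send in one request.
function isWithinSizeLimit(filePath, maxBytes = MAX_AUDIO_BYTES) {
  return fs.statSync(filePath).size <= maxBytes;
}
```

A caller could reject the file, or split the audio into smaller chunks, when this returns false.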

transcriber's People

Contributors

pietrop
