carsonthemonkey / gist

App to summarize audio files for the LC ACM spring 2023 hackathon

License: MIT License

HTML 0.51% CSS 11.43% TypeScript 67.77% SCSS 19.10% JavaScript 0.60% Python 0.59%

gist's Introduction

G.I.S.T

(GPT Interpreted Speech to Text)

Created by Kat Berge & Carson Reader

This is a website that transcribes audio, such as lectures, and then summarizes it using ChatGPT's API. It is still a work in progress, but it is functional right now.

It requires an OpenAI API key to work. This will charge you, but the cost is negligible (usually less than a cent per summary). I would highly recommend setting up a monthly spending limit in your OpenAI account settings under billing (5 dollars should be plenty) so that you do not have to worry about excess charges. Make sure you put your API key in the top left input box before attempting to transcribe or summarize. The site does not collect or store your API key in any way.

There is also currently a word limit for summaries (roughly 4000 words, give or take), so make sure to check the word counter before you summarize. (We have yet to add good error handling and messages for when the summary fails lol.) Please share any bugs you find or features you think would be good with us, if they are not already on the issues list. When summarizing, note that selecting the relevant topic from the topic dropdown will greatly improve summary results. If none of the topics match your audio, just select auto.
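The word limit mentioned above could be checked before a summary request is sent. The following is only a minimal sketch: `MAX_SUMMARY_WORDS`, `countWords`, and `canSummarize` are hypothetical names, the limit is just the "roughly 4000 words" figure from this README, and the app's real word counter may count differently.

```typescript
// Hypothetical limit, taken from the README's "roughly 4000 words" figure.
const MAX_SUMMARY_WORDS = 4000;

// Count whitespace-separated words; an empty or blank string counts as 0.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// True when the transcript is short enough to attempt a summary.
function canSummarize(
  transcript: string,
  limit: number = MAX_SUMMARY_WORDS
): boolean {
  return countWords(transcript) <= limit;
}
```

A caller could disable the summarize button, or show a message, whenever `canSummarize` returns `false`.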

The site can be found here.

Dev rules:

React component files should start with a capital letter
React components go into components folder, any regular js or ts files go into utils
Always write stylesheets in Sass and transpile to CSS in order to stay more organized
Write Sass stylesheets per component

gist's Issues

Add info popup

The info button currently doesn't do anything.

Add support for dragging in a transcript instead of an audio file.

This would be helpful if you only wanted summaries.

Add styling/animation to loading

Add dark mode

🌒 🌠

Add auto linting to predeploy scripts

This will allow us to automatically and consistently format the code every time it is built.

Setup Electron on MacOS

We need to make sure we have it set up to build for Mac as well as test that styling and functionality is consistent.

Change the link to the GitHub page to something better

Fix empty first bullet point in summary panel bullet points

Add a nicer Key Icon

All the good ones I can find are paid, so I'll probably just end up making one myself.

Add support for markdown syntax

Would improve readability and allow for more formatting in the summary panel.

Create logo

We need a logo for the app icon and favicon as well as the title in the app possibly. I have a few ideas I am going to look into.

Port to Electron

Would be fun to have this as a desktop app

Port to mobile with React Native

Haha no, I'm just kidding that would probably be too much work... unless? No no, of course I am joking that would probably take a while and be difficult... unless you wanted to? but no we shouldn't (even though it would look super cool on an iPad and could provide us (especially me) with a valuable learning experience with mobile development). Unless...

Filter out filler words in transcript before summary

To optimize the number of tokens we are sending to GPT API, we should filter out filler words in the transcript before sending it to ChatGPT. This could also be an option for the user on the transcript. We might want to look at the npm package natural to accomplish this easily
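As a rough illustration of the idea, here is a sketch that strips filler words with a plain regular expression instead of the `natural` package. The `FILLER_WORDS` list and the `removeFillerWords` name are assumptions made for this example, and the list entries are assumed to be plain words with no regex special characters.

```typescript
// Illustrative filler list; an assumption, not the project's actual list.
const FILLER_WORDS: readonly string[] = ["um", "uh", "er", "ah", "hmm"];

// Remove whole-word fillers (plus one trailing comma or period),
// then collapse the doubled spaces that the removal leaves behind.
function removeFillerWords(
  transcript: string,
  fillers: readonly string[] = FILLER_WORDS
): string {
  if (fillers.length === 0) return transcript;
  const pattern = new RegExp(`\\b(?:${fillers.join("|")})\\b[,.]?`, "gi");
  return transcript.replace(pattern, "").replace(/ {2,}/g, " ").trim();
}
```

Matching on word boundaries keeps words such as "umbrella" or "other" intact. Whether this actually saves a meaningful number of tokens would need to be measured on real transcripts.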

Get direct transcriptions in other languages working

So it turns out the translate checkbox does nothing, and whisper will translate the audio regardless of whether it is checked. I am not sure if this is even possible with our whisper implementation, but there is a large variety of whisper forks that we may be able to use.

Allow easier file saving and reading

On the web, we can easily add a download button for the transcript and summary. In Electron, we could create a project file structure to easily store whole projects. Maybe even add multiple files in one project with tabs or a hamburger menu?

Support a more feature rich transcription API

Speaker diarization would be pretty cool, so this could be something we look into. I think there is a fork of OpenAI's whisper API (which we currently use) that could be used to do this, but it may only work in Python. If so, we might have to do some backend stuff, which I am scared of. Although we may be able to get it running locally instead of through an API call, which would be pretty cool. There is also Google Cloud Speech-to-Text, which has diarization but would require a separate API key, and I'm not sure if it supports translation as well as whisper does.

Improve summary formatter robustness

It really needs to stop splitting up math equations and other things with - signs
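One possible direction, sketched below under the assumption that the model returns one bullet per line starting with `-`, `*`, or `•`: treat a marker as a bullet only at the start of a line, so minus signs inside an equation are left alone. `parseBullets` is a hypothetical name, not the project's current formatter.

```typescript
// Split a summary into bullets using only line-leading markers.
// Lines without a marker are treated as continuations of the previous bullet.
function parseBullets(summary: string): string[] {
  const bullets: string[] = [];
  let current: string | null = null;

  for (const rawLine of summary.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;

    // A marker must be followed by whitespace or stand alone on the line,
    // so "-5x" is not mistaken for a bullet.
    const match = /^[-*•](?:\s+(.*))?$/.exec(line);
    if (match) {
      if (current !== null) bullets.push(current);
      current = match[1] ?? "";
    } else {
      current = current === null ? line : `${current} ${line}`;
    }
  }
  if (current !== null) bullets.push(current);

  // Drop empty bullets, such as a stray leading "-" with no text.
  return bullets.map((b) => b.trim()).filter((b) => b.length > 0);
}
```

Filtering out empty entries at the end would also avoid an empty first bullet when the response begins with a bare marker.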

Implement local whisper support

I think this package allows whisper to run locally on the CPU and (I think) supports timestamp sync, which would be really neat for the audio panel.

Add cookie storage for text files and API key

Would be convenient to save user options and files on their machine for when they return to the page
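A minimal sketch of saving user options, swapping cookies for a `localStorage`-style key/value store (the browser's `window.localStorage` satisfies the small interface used here). The `UserSettings` shape, the `gist.settings` key, and the function names are all assumptions for illustration; storing the API key itself is deliberately left out of this example.

```typescript
// Minimal subset of the Web Storage interface, so the code can be tested
// without a browser. window.localStorage matches this shape.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical settings shape based on the topic dropdown and translate checkbox.
interface UserSettings {
  topic: string;
  translate: boolean;
}

// Hypothetical storage key.
const SETTINGS_KEY = "gist.settings";

function saveSettings(store: KeyValueStore, settings: UserSettings): void {
  store.setItem(SETTINGS_KEY, JSON.stringify(settings));
}

// Fall back to defaults when nothing is stored or the stored value is corrupt.
function loadSettings(store: KeyValueStore, fallback: UserSettings): UserSettings {
  const raw = store.getItem(SETTINGS_KEY);
  if (raw === null) return fallback;
  try {
    const parsed = JSON.parse(raw) as Partial<UserSettings>;
    return { ...fallback, ...parsed };
  } catch {
    return fallback;
  }
}
```

In the browser this could be called as `saveSettings(window.localStorage, settings)` when an option changes and `loadSettings(window.localStorage, defaults)` on page load.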

Add Browse Files button for the Transcript panel

Support streaming from ChatGPT API

The GPT API supports streaming chat completions before the model has fully finished generating them (like the ChatGPT website). This would effectively speed up the summarizing process, since users will be able to read the summary before it has finished being generated.
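When streaming is enabled, the chat completions endpoint sends server-sent events: lines of the form `data: {json}`, where each JSON chunk carries a piece of text in `choices[0].delta.content`, followed by a final `data: [DONE]`. The sketch below only covers the parsing step. `extractDeltas` is a hypothetical helper, and it assumes the caller has already buffered the stream into complete lines, since a network chunk can end in the middle of a JSON payload.

```typescript
// Shape of the fields this sketch reads from each streamed chunk.
interface StreamChunk {
  choices?: { delta?: { content?: string } }[];
}

// Extract the text pieces from a block of complete SSE lines.
// Throws if a "data:" line holds invalid JSON (for example a partial line).
function extractDeltas(sseText: string): string[] {
  const deltas: string[] = [];
  for (const line of sseText.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "" || payload === "[DONE]") continue;

    const parsed = JSON.parse(payload) as StreamChunk;
    const content = parsed.choices?.[0]?.delta?.content;
    if (typeof content === "string") deltas.push(content);
  }
  return deltas;
}
```

Each returned piece could be appended to the summary panel as it arrives, so the text appears progressively instead of all at once.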

Add audio panel

Create project site

We could use GitHub Pages to create a small site to download the app if we wanted. (Could we even host it on GitHub Pages? We'd have to look into it.)

Add GPT-4 support (when we get access to the API)

I am on the waitlist, so either once I am accepted into it, or when it is released publicly. GPT-4 should allow for drastically larger input documents, and when GPT-4's image understanding model gets released, we can add support for including images to summarize as well. This would also mean that we could maybe do some things with video processing as well, like for lecture slides. This may be beyond our scope, but it would be really cool so we can keep it in mind.