
gist's Introduction

G.I.S.T

(GPT Interpreted Speech to Text)

App to summarize audio files for the LC ACM spring 2023 hackathon

Created by Kat Berge & Carson Reader

This is a website that can transcribe audio such as lectures and then summarize it using ChatGPT's API. It is still a work in progress, but it is functional right now.

It requires an OpenAI API key to work. This will charge you, but it is very cheap (usually less than a cent per summary, which is negligible). I would highly recommend setting up a monthly spending limit in your OpenAI account settings under billing (5 dollars should be plenty) so that you do not have to worry about excess charges. Make sure you put your API key in the top left input box before attempting to transcribe or summarize. The site does not collect or store your API key in any way.

There is also currently a word limit for summaries (roughly 4000 words), so make sure to check the word counter before you summarize. (We have yet to add good error handling and messages for when the summary fails, lol.) Please share any bugs you find or features you think would be good with us, if they are not already on the issues list. When summarizing, note that selecting the relevant topic from the topic dropdown will greatly improve summary results. If none of the topics match your audio, just select auto.

The site can be found here.

Dev rules:

  • React component files should start with a capital letter
  • React components go into components folder, any regular js or ts files go into utils
  • Always write stylesheets in Sass and transpile to CSS in order to stay more organized
  • Write Sass stylesheets per component

gist's People

Contributors

carsonthemonkey, katberge, maximumcold


gist's Issues

Setup Electron on MacOS

We need to make sure it is set up to build for Mac, and test that styling and functionality are consistent there.

Create logo

We need a logo for the app icon and favicon as well as the title in the app possibly. I have a few ideas I am going to look into.

Port to mobile with React Native

Haha no, I'm just kidding that would probably be too much work... unless? No no, of course I am joking that would probably take a while and be difficult... unless you wanted to? but no we shouldn't (even though it would look super cool on an iPad and could provide us (especially me) with a valuable learning experience with mobile development). Unless...

Filter out filler words in transcript before summary

To reduce the number of tokens we send to the GPT API, we should filter filler words out of the transcript before sending it to ChatGPT. This could also be a user-facing option on the transcript. We might want to look at the npm package natural to accomplish this easily.
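As a rough starting point, the filtering could be done with a few regular expressions before reaching for a library. This is only a sketch: the filler list and the `removeFillerWords` name are made up for illustration, and it does not use natural at all.

```typescript
// Hypothetical filler list; it would need tuning against real transcripts.
const FILLER_PATTERNS: RegExp[] = [
  /\b(?:um+|uh+|erm|hmm+)\b/gi,
  /\byou know\b/gi,
  /\bi mean\b/gi,
];

// Removes filler words, then tidies the punctuation and spacing left behind.
function removeFillerWords(transcript: string): string {
  let cleaned = transcript;
  for (const pattern of FILLER_PATTERNS) {
    cleaned = cleaned.replace(pattern, "");
  }
  return cleaned
    .replace(/\s+([,.!?;:])/g, "$1") // drop spaces before punctuation
    .replace(/([,;:])(?:\s*[,;:])+/g, "$1") // collapse doubled commas
    .replace(/\s{2,}/g, " ") // collapse runs of whitespace
    .trim();
}

console.log(removeFillerWords("So, um, the cell uh divides, you know, twice."));
```

Phrases like "you know" can also carry meaning ("you know the answer"), so this probably works best as an opt-in setting.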

Get direct transcriptions in other languages working

So it turns out the translate checkbox does nothing, and Whisper will translate the audio regardless of whether it is checked or not. I am not sure if this is even possible with our Whisper implementation, but there is a large variety of Whisper forks that we may be able to use.

Allow easier file saving and reading

On web, we can easily add a download button for the transcript and summary. In Electron, we could create a project file structure to easily store whole projects. Maybe even add multiple files in one project with tabs or a hamburger menu?

Support a more feature rich transcription API

Speaker diarization would be pretty cool, so this could be something we look into. I think there is a fork of OpenAI's Whisper API (which we currently use) that could be used to do this, but it may only work in Python. If so, we might have to do some backend stuff, which I am scared of. Although we may be able to get it running locally instead of through an API call, which would be pretty cool. There is also Google Cloud Speech-to-Text, which has diarization but would require a separate API key, and I'm not sure if it supports translation as well as Whisper does.

Implement local whisper support

I think this package allows Whisper to run locally on the CPU and (I think) supports timestamp sync, which would be really neat for the audio panel.

Support streaming from ChatGPT API

The GPT API supports streaming chat completions before the model has fully finished generating them (like the ChatGPT website). This would effectively speed up the summarizing process, since users would be able to read a summary while it is still being generated.
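When a request sets `stream: true`, the API sends the completion as server-sent events: lines of the form `data: {json}`, ending with `data: [DONE]`. A minimal sketch of turning those raw lines into summary text could look like the following. The `SummaryStreamParser` class is a hypothetical helper, not something in the codebase, and it only reads the `choices[0].delta.content` field.

```typescript
// The part of a streamed chunk this sketch reads.
interface StreamChunk {
  choices: { delta: { content?: string } }[];
}

// Hypothetical helper: feed it raw response text as it arrives and it
// returns whatever new summary text that piece contained.
class SummaryStreamParser {
  private buffer = "";
  done = false;

  push(rawText: string): string {
    this.buffer += rawText;
    const lines = this.buffer.split("\n");
    // The last element may be an incomplete line; keep it for the next push.
    this.buffer = lines.pop() ?? "";
    let text = "";
    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith("data:")) continue;
      const payload = trimmed.slice("data:".length).trim();
      if (payload === "[DONE]") {
        this.done = true;
        break;
      }
      const chunk = JSON.parse(payload) as StreamChunk;
      text += chunk.choices[0]?.delta?.content ?? "";
    }
    return text;
  }
}

const parser = new SummaryStreamParser();
let summary = "";
// Network chunks can split an event in half, so the parser buffers partial lines.
summary += parser.push('data: {"choices":[{"delta":{"content":"Hel"}}]}\n\ndata: {"choi');
summary += parser.push('ces":[{"delta":{"content":"lo"}}]}\n\ndata: [DONE]\n\n');
console.log(summary, parser.done);
```

In the browser, the raw text would probably come from reading `response.body` with a stream reader and a `TextDecoder`, appending each returned piece to the summary panel as it arrives.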

Create project site

We could use GitHub Pages to create a small site for downloading the app if we wanted. (Could we even host it on GitHub Pages? We'd have to look into it.)

Add GPT-4 support (when we get access to the API)

I am on the waitlist, so we can add this either once I am accepted or when the API is released publicly. GPT-4 should allow for drastically larger input documents, and when GPT-4's image understanding model gets released, we can add support for including images to summarize as well. This would also mean that we could maybe do some things with video processing, like for lecture slides. This may be beyond our scope, but it would be really cool, so we can keep it in mind.

Make Website work on mobile

Formatting gets really messed up on mobile, so we may want to do something about that. Making it work on smartphones may take a lot of UI rearranging for that screen size specifically, though.

Add dark mode toggle button

We may want to have a hamburger menu at some point since we are getting a lot of buttons and it could make the desktop version a little cleaner. I also don't think our Sass is really set up to manually toggle at this point, but I am not sure.

Fix/check word count function

The counter reports over 700 words when only about 300 are shown on screen (however, this was in debug mode, so it might work correctly with real API calls).
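We do not know the cause yet, but as a reference to compare the current function against, here is a minimal whitespace-based counter. The function names are hypothetical, and the 4000-word default is only the rough limit mentioned in the introduction.

```typescript
// Counts words by splitting on runs of whitespace, ignoring leading/trailing space.
function countWords(text: string): number {
  const trimmed = text.trim();
  if (trimmed === "") return 0;
  return trimmed.split(/\s+/).length;
}

// Rough pre-check before summarizing; 4000 is the approximate limit, not an exact one.
function isWithinSummaryLimit(text: string, maxWords = 4000): boolean {
  return countWords(text) <= maxWords;
}

console.log(countWords("  hello   world\nfoo "));
```

If the two counters disagree on the same transcript, the extra words are probably coming from text that is counted but never displayed.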

Add token counter and price estimator

I believe the tiktoken library can help with counting tokens, which should allow us to come up with an accurate price estimate for summaries. For the transcription itself, simply looking at the length of the audio file is enough to calculate an accurate price.
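The arithmetic itself is simple once we have a token count and the audio duration. The sketch below assumes per-minute billing for transcription and per-1000-token billing for the summary. The `Rates` type and `estimateCostUsd` name are hypothetical, the rates in the example call are placeholders rather than real prices, and the token count is assumed to come from a tokenizer such as tiktoken.

```typescript
// Prices are passed in rather than hard-coded, since they change over time.
interface Rates {
  usdPerAudioMinute: number; // transcription price (assumed per-minute billing)
  usdPer1kTokens: number; // summary price (assumed per-1000-token billing)
}

// Estimates the total cost of transcribing an audio file and summarizing it.
function estimateCostUsd(audioSeconds: number, tokenCount: number, rates: Rates): number {
  const transcriptionCost = (audioSeconds / 60) * rates.usdPerAudioMinute;
  const summaryCost = (tokenCount / 1000) * rates.usdPer1kTokens;
  return transcriptionCost + summaryCost;
}

// Placeholder rates for illustration only; check the current pricing page.
const exampleRates: Rates = { usdPerAudioMinute: 0.006, usdPer1kTokens: 0.002 };
console.log(estimateCostUsd(600, 3000, exampleRates).toFixed(3));
```

Note that the summary's token count should include both the prompt and the expected response, so the real number will be somewhat higher than the transcript alone.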

Add a subject dropdown for the summary panel

If the user specifies the subject, we can improve the ChatGPT output by giving it more specific instructions based on that subject. We can also have an option for any subject, or a custom input.
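One way this could work is a lookup from subject to extra instructions, falling back to a generic line for custom input. The subjects and wording below are invented for illustration and are not the prompts we actually use; only the "auto" option comes from the current site.

```typescript
// Hypothetical per-subject guidance; the real list would match the dropdown.
const SUBJECT_INSTRUCTIONS: Record<string, string> = {
  math: "State each definition, theorem, and worked example explicitly.",
  history: "Keep dates, names, and cause-and-effect in chronological order.",
};

// Builds the prompt text; "auto" (or an empty subject) adds no extra guidance,
// and an unknown subject is treated as custom input.
function buildSummaryPrompt(transcript: string, subject = "auto"): string {
  const key = subject.trim().toLowerCase();
  let guidance = "";
  if (key !== "" && key !== "auto") {
    guidance = SUBJECT_INSTRUCTIONS[key] ?? `The transcript is about ${subject.trim()}.`;
  }
  return ["Summarize the following transcript.", guidance, "Transcript:", transcript]
    .filter((part) => part !== "")
    .join("\n");
}

console.log(buildSummaryPrompt("Today we cover limits...", "Math"));
```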

Add confirmation step to npm run build

It is a little too easy to push your changes to the live page right now, so I will make a script that requires an extra confirmation step before doing so.
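The core of such a script might look like the sketch below. The prompt function is passed in so the logic can be tried without a terminal. All names here are hypothetical, and requiring the word "deploy" is just one possible confirmation rule.

```typescript
// A function that shows a prompt and resolves with what the user typed.
type Ask = (prompt: string) => Promise<string>;

// Requires the user to type the word "deploy" rather than just pressing Enter.
function isConfirmed(answer: string): boolean {
  return answer.trim().toLowerCase() === "deploy";
}

// Runs the deploy step only if the user confirms; resolves to whether it ran.
async function confirmAndRun(ask: Ask, deploy: () => void): Promise<boolean> {
  const answer = await ask('Type "deploy" to publish to the live page: ');
  if (!isConfirmed(answer)) return false;
  deploy();
  return true;
}

// Example with a canned answer standing in for real keyboard input.
confirmAndRun(async () => "no thanks", () => console.log("deploying...")).then((ran) =>
  console.log(ran ? "Deployed." : "Cancelled.")
);
```

In the real script, `ask` could wrap Node's `readline` question prompt, and `deploy` would run whatever the current build script does to publish.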

Surface API errors

A missing API key or a failed API request is currently not shown to users in any way.
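A small helper that turns the common failure cases into readable messages could be a first step. This is a sketch with a hypothetical function name and message wording; it relies on the API responding with HTTP 401 for a rejected key and 429 for rate or quota limits.

```typescript
// Returns a user-facing message for a problem, or null when there is nothing to report.
function describeApiProblem(apiKey: string, status?: number): string | null {
  if (apiKey.trim() === "") {
    return "Enter your OpenAI API key in the top left input box first.";
  }
  if (status === undefined || status < 400) return null;
  if (status === 401) {
    return "OpenAI rejected the API key. Check that it was pasted correctly.";
  }
  if (status === 429) {
    return "Rate limit or spending limit reached. Wait a bit or check your billing settings.";
  }
  return `The request failed (HTTP ${status}). Please try again.`;
}

console.log(describeApiProblem("", undefined));
console.log(describeApiProblem("sk-example", 429));
```

The returned string could then be rendered in the transcript or summary panel instead of failing silently.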
