chengsokdara / use-whisper
React hook for OpenAI Whisper with speech recorder, real-time transcription, and silence removal built-in
License: MIT License
Hi,
Apologies for not explaining myself properly. Currently the hook returns the transcribed text automatically translated into English. I was hoping the user could control this feature by passing a prop, e.g. "translation: false". Based on the Whisper API documentation, the relevant input is called "translation".
Great job with this hook.
Anyone getting this? This is on a fresh install of this package. @chengsokdara
When I replay the recorded speech, the pitch is lower (or, when I use headphones, higher, resulting in a Donald Duck-like sound).
I have no idea why this happens. When I compare by using the Web API to record and replay the same sound (pronounced by myself with the same headphones), the Web API recording sounds normal.
Has anyone experienced the same issue?
In 5 minutes useWhisper sent hundreds of requests, uploaded 74,684 seconds of audio (over 20 hours), and cost over $17!
It looks like the hook uploads the entire cumulative recording every second.
Luckily I had billing limits set.
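If the hook really does re-send the whole recording on every slice (an assumption based on the report above), total uploaded audio grows quadratically with recording length. A minimal sketch of the arithmetic, with an illustrative function name:

```javascript
// Sketch: why cumulative re-uploads explode. Assuming the whole recording is
// re-sent on every timeSlice, total audio uploaded after n slices is the
// triangular number n(n+1)/2 slices, i.e. quadratic in recording length.
function totalUploadedSeconds(recordingSeconds, timeSliceSeconds = 1) {
  const n = Math.floor(recordingSeconds / timeSliceSeconds);
  return ((n * (n + 1)) / 2) * timeSliceSeconds;
}

console.log(totalUploadedSeconds(300)); // 5 min of streaming -> 45150 s uploaded
```

With a 1-second timeSlice, 5 minutes of streaming already re-uploads roughly 12.5 hours of audio, the same order of magnitude as the 20 hours reported.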
const {
  recording,
  speaking,
  transcribing,
  transcript,
  pauseRecording,
  startRecording,
  stopRecording,
} = useWhisper({
  apiKey: getApiKey(),
  streaming: true,
  timeSlice: 1_000, // 1 second
  whisperConfig: {
    language: 'he',
  },
});
Failed to compile.
./node_modules/@chengsokdara/use-whisper/dist/chunk-3CCW4YJS.js 185:29
Module parse failed: Unexpected token (185:29)
File was processed with these loaders:
if (f && t.current && (H?.(e), l.current.push(e), (await t.current.getState()) === "recording")) {
|   let n = new Blob(l.current, {
|     type: "audio/webm;codecs=opus"
(The unexpected token appears to be the optional-chaining call `H?.(e)`, which webpack 4-era toolchains such as react-scripts 4 cannot parse.)
Please help me resolve this issue. I am trying to save five 10-second transcripts in an array and display them.
import React, { useState } from 'react';
import { useWhisper } from '@chengsokdara/use-whisper';
import './App.css';

const App = () => {
  const [transcriptions, setTranscriptions] = useState([]);
  const { startRecording, stopRecording } = useWhisper({
    apiKey: 'API_KEY', // Replace with your actual OpenAI API token
    streaming: true,
    removeSilence: true,
    timeSlice: 1000, // 1 second
    whisperConfig: {
      language: 'en',
    },
    onTranscribe: (blob) => {
      return new Promise((resolve) => {
        const reader = new FileReader();
        reader.onloadend = () => {
          const text = reader.result;
          resolve({ text });
        };
        reader.readAsText(blob);
      });
    },
  });

  const recordAndSave = async () => {
    const recordingResult = await startRecording();
    setTimeout(async () => {
      const transcription = await stopRecording();
      setTranscriptions((prevTranscriptions) => [...prevTranscriptions, recordingResult.transcript.text]);
    }, 10000); // Record for 10 seconds
  };

  const repeatRecording = async () => {
    for (let i = 0; i < 5; i++) {
      await recordAndSave();
    }
  };

  return (
    <div className="App">
      <header className="App-header">
        <h1>Real-Time Audio Transcription</h1>
      </header>
      <main>
        <div>
          <p>Transcribed Texts:</p>
          <ul>
            {transcriptions.map((text, index) => (
              <li key={index}>{text}</li>
            ))}
          </ul>
        </div>
        <div>
          <button onClick={repeatRecording}>Start Recording 5 Times</button>
        </div>
      </main>
    </div>
  );
};

export default App;
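One likely problem in the snippet above is that `startRecording()` does not return a transcript. A framework-free sketch of the intended control flow, with the start/stop steps injected so it can run without a microphone (in the real app, `stopAndGetText` would wrap `stopRecording` plus a read of the hook's `transcript` state or the `onTranscribe` result; all names here are illustrative):

```javascript
// Sketch: record for a fixed window, collect the text, repeat. start,
// stopAndGetText and sleep are injected so the loop is testable; they are
// stand-ins for the real hook calls, not part of the library's API.
async function recordRepeatedly(start, stopAndGetText, times = 5, windowMs = 10000,
                                sleep = (ms) => new Promise((r) => setTimeout(r, ms))) {
  const texts = [];
  for (let i = 0; i < times; i++) {
    await start();                       // begin recording
    await sleep(windowMs);               // record for the window
    texts.push(await stopAndGetText());  // stop, then collect the transcript
  }
  return texts;
}
```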
Even if it is stored in an env variable, using your api key in the client can still expose it. Do you have any suggestions to fix this issue? Maybe moving part of the architecture to the server?
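One common approach is exactly that: keep the key on a server and use the hook's `onTranscribe` callback to post audio to your own endpoint. A minimal sketch, assuming a hypothetical `/api/whisper` route that you would implement server-side (the `fetchImpl` parameter is injectable only so the flow can be exercised without a network):

```javascript
// Sketch: keep the OpenAI key off the client. The browser posts audio to your
// own backend; the backend calls OpenAI with the secret key. '/api/whisper'
// is a hypothetical route name, not part of the library.
function makeOnTranscribe(endpoint, fetchImpl = fetch) {
  return async function onTranscribe(blob) {
    const body = new FormData();
    body.append('file', blob, 'speech.webm');
    const res = await fetchImpl(endpoint, { method: 'POST', body });
    const { text } = await res.json();
    return { blob, text }; // the shape useWhisper expects from onTranscribe
  };
}
```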
Hi @chengsokdara, I'm loving this project, and I especially like the custom server functionality.
I am working on a project where I am recording multiple speakers talking in turns. My hope is to use useWhisper to transcribe the entire conversation, but I'd like to do so one piece at a time.
I was wondering if there is a way to configure useWhisper to trigger an onTranscribe event when a speaker stops talking for a brief period (basically when the `speaking` variable goes from true to false) and then reset the audio file to set up for recording the next speaking block.
I saw your examples using the customizable callback functions but I wasn't sure how to properly configure them for this case.
Thanks,
Kyle
I don't know much about AI, but I learned that OpenAI charges money for using their API's GPU compute. There is a Docker web-service API for Whisper that doesn't cost money, so I would prefer that route. My question: can I use that Docker service in my React project with this hook?
I'm currently using your library in my project and everything works fine. However, I can't store the transcript text as soon as the response comes back from the API. I could use setTimeout and store the textContent of the output DOM element in a variable, but that is neither flexible nor efficient. It would be better if I could store the transcript text right after getting the response.
This is how I'm currently using it:
const data = useWhisper({ apiKey: '<MY API KEY>' });
const handleInput = async () => {
  await data.stopRecording();
  setTimeout(() => {
    const text = document.getElementById('myText').textContent;
    console.log(text);
    handleAnswer(text);
  }, 500);
};
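A framework-free sketch of an alternative pattern: instead of a timeout plus a DOM read, react to the hook's state. In React this guard would live inside a `useEffect` keyed on `[transcribing, transcript.text]`; `handleAnswer` is the user's own function from the snippet above:

```javascript
// Sketch: fire the handler exactly once, when transcription has just finished
// and produced new, non-empty text. prev/next model two consecutive states of
// { transcribing, text } as returned by the hook.
function onWhisperState(prev, next, handleAnswer) {
  if (prev.transcribing && !next.transcribing && next.text && next.text !== prev.text) {
    handleAnswer(next.text);
    return true;
  }
  return false;
}
```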
For typical backend streaming, how do you call whisper.decode()? That would be helpful for noobs.
Hi, I am running into a 401 error while I am speaking.
I am using React with Next.js.
xhr.js:251 POST https://api.openai.com/v1/audio/transcriptions 401
dispatchXhrRequest @ xhr.js:251
xhr @ xhr.js:49
dispatchRequest @ dispatchRequest.js:51
request @ Axios.js:146
httpMethod @ Axios.js:185
wrap @ bind.js:5
eval @ chunk-32KRFHOA.js:5
await in eval (async)
eval @ chunk-YORICPLC.js:1
Z @ chunk-32KRFHOA.js:5
await in Z (async)
eval @ RecordRTC.js:3201
webWorker.onmessage @ RecordRTC.js:2810
client.js:1 useMemo AxiosError {message: 'Request failed with status code 401', name: 'AxiosError', code: 'ERR_BAD_REQUEST', config: {…}, request: XMLHttpRequest, …}
The transcribing status was true while I was recording, but it produced this error.
The code I am using is the same as you provided, except for my API token. (A 401 from the API usually means the key the browser actually sent was missing or invalid, e.g. the env var was not inlined at build time.)
Any suggestions or help, please?
`import { useWhisper } from "@chengsokdara/use-whisper";

const LiveWhisper = () => {
  const {
    recording,
    speaking,
    transcribing,
    transcript,
    pauseRecording,
    startRecording,
    stopRecording,
  } = useWhisper({
    apiKey: process.env.NEXT_PUBLIC_OPENAI_API_TOKEN, // YOUR_OPEN_AI_TOKEN
    streaming: true,
    timeSlice: 1_000, // 1 second
    whisperConfig: {
      language: "en",
    },
  });

  return (
    <div>
      <p>Recording: {recording}</p>
      <p>Speaking: {speaking}</p>
      <p>Transcribing: {transcribing}</p>
      <p>Transcribed Text: {transcript.text}</p>
    </div>
  );
};

export default LiveWhisper;
`
Is there any way we can specify which microphone useWhisper should use?
When I tried the custom server code:
const { transcript } = useWhisper({
  // callback to handle transcription with custom server
  onTranscribe,
})
it breaks before the page loads, and I get these errors:
Uncaught TypeError: Cannot read properties of null (reading 'useRef')
at useRef (react.development.js:1630:1)
at ue (chunk-32KRFHOA.js:5:1)
at App (App.tsx:38:1)
at renderWithHooks (react-dom.development.js:16305:1)
Warning: Invalid hook call. Hooks can only be called inside of the body of a function component. This could happen for one of the following reasons:
First off, thanks for your work. I find this very useful. The base application is working fine. I can use all my buttons and I can get the transcript ok.
But where I am facing issues is when I want to chain the actions. For example, I want to:
My functions:
My issue is that transcript.text is undefined after stopRecording ends, so I can't feed it into sendMessage. I've tried a few different approaches and got close, but not quite there yet.
So for now it's all manual: Start, Stop, Send Message, Listen to the response.
Any clues on making better use of the API and getting the transcript on demand?
When using your standard configuration:
const App = () => {
  const {
    recording,
    speaking,
    transcribing,
    transcript,
    pauseRecording,
    startRecording,
    stopRecording,
  } = useWhisper({
    apiKey: import.meta.env.VITE_OPENAI_API_KEY, // YOUR_OPEN_AI_TOKEN
  })

  return (
    Recording: {recording}
    Speaking: {speaking}
    Transcribing: {transcribing}
    Transcribed Text: {transcript.text}
Error: Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']
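One common cause of this error (an assumption; it may not be the cause here) is uploading a blob without a recognizable filename, so the API cannot identify the format. Deriving a filename from the blob's mime type and passing it as the third argument to `FormData.append` often satisfies the format check:

```javascript
// Sketch: build a filename Whisper can recognize from a recorder mime type.
// Usage (illustrative): formData.append('file', blob, fileNameFor(blob.type))
function fileNameFor(mimeType) {
  const subtype = mimeType.split('/')[1] || 'webm'; // fall back to webm
  const ext = subtype.split(';')[0];                // 'webm;codecs=opus' -> 'webm'
  return `speech.${ext}`;
}
```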
First of all, thank you for creating this awesome hook. The hook automatically translates the transcribed text to English. Would it be possible to add a prop so we can toggle this translation on or off? Additionally, instead of passing a server link, could we also pass a function? The function would connect to the API directly and return the response object.
Thank you once again.
So I tried to get this component to work for a couple of hours, and finally realized that the vite dev server (in a fresh project initialized with pnpm create vite) does in fact not play well with it.
I have not yet figured out what exactly stops it from working, but I want to leave this issue here in the meantime so others are aware.
If you're using vite, the awkward but workable workaround is to run vite build --watch and vite preview; then this component works as expected.
Is there interest in a mode for the following, or is it already possible?
Start listening automatically to voice until a break (already possible using the config below), but later allow the user to restart the same flow at any point by talking again.
nonStop: true, // keep recording as long as the user is speaking
stopTimeout: 2000, // auto stop after 2 seconds of silence
At the moment the only way to achieve this seems to be the streaming option, but the problem I found is that the transcript becomes one continuous message, whereas I would like it broken up into separate chunks as they are spoken.
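An illustration of one way to get per-utterance chunks from a cumulative streaming transcript (not part of the library's API): snapshot the text each time the speaker goes quiet, e.g. whenever the hook's `speaking` flag flips to false, and diff against the previous snapshot.

```javascript
// Sketch: split a cumulative transcript into chunks at speech pauses.
// Call onPause(transcript.text) whenever `speaking` goes true -> false.
function makeSegmenter() {
  let consumed = '';
  return function onPause(cumulativeText) {
    const chunk = cumulativeText.slice(consumed.length).trim();
    consumed = cumulativeText;
    return chunk; // the text spoken since the previous pause
  };
}
```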
It is not giving any error, though the output for transcript is:
{blob: undefined, text: undefined}
I am using the very first example in the git repo.
`import React, { useState, useEffect } from 'react'
import { useWhisper } from '@chengsokdara/use-whisper'

export default function OpenAIDialog() {
  const {
    recording,
    speaking,
    transcribing,
    transcript,
    pauseRecording,
    startRecording,
    stopRecording,
  } = useWhisper({
    apiKey: 'Key', // YOUR_OPEN_AI_TOKEN
  })

  useEffect(() => {
    console.log('transcribing', transcribing)
    console.log('transcript', transcript)
    console.log('recording', recording)
    console.log('speaking', speaking)
  }, [recording, speaking, transcribing, transcript])

  return (
    <div>
      <p>Recording: {recording}</p>
      <p>Speaking: {speaking}</p>
      <p>Transcribing: {transcribing}</p>
      <p>Transcribed Text: {transcript.text}</p>
      <button onClick={() => startRecording()}>Start</button>
      <button onClick={() => pauseRecording()}>Pause</button>
      <button onClick={() => stopRecording()}>Stop</button>
    </div>
  )
}
`
When running the first code snippet provided in the Readme, I always get the transcript text and blob as undefined.
Can you help or advise, please?
Hi @chengsokdara this is a very good project to explore real time streaming use case.
We released a highly optimised Whisper large-v2 API on MonsterAPI which reduces the cost of accessing the Whisper model by up to 6x compared to the OpenAI API. Our API also scales on demand.
I am raising a request to integrate the MonsterAPI Whisper API into your project, making it very cost-effective for developers who simply want access to powerful Whisper ASR.
Please find below links to our API docs and free playground:
All that a developer needs is an API token to get started with accessing the APIs.
Let me know your thoughts.
Currently, the streaming feature works perfectly fine, but on every timeSlice it re-sends the entire audio stream from the beginning. This makes costs grow quadratically with recording length: recording around 15 minutes can cost up to $10 with a timeSlice of 1 second.
To avoid such high costs, I suggest implementing a new feature that re-sends only the last n seconds of the audio stream. This would preserve some context while reducing the amount of audio sent and thus lowering the cost.
I believe this improvement would not only make the streaming feature more cost-effective but also enhance its overall performance.
In the attached screenshot you can see the API usage from a 15 minutes streaming transcription:
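An illustration of the proposed feature (not in the hook today): with one recorder chunk per timeSlice, keeping only the most recent n entries approximates re-sending just the last n seconds of audio instead of the whole recording.

```javascript
// Sketch: cap upload size by keeping only the most recent chunks. Assumes
// chunks arrive once per timeSlice, so with timeSlice = 1s the last n entries
// approximate the last n seconds of audio.
function lastNChunks(chunks, n) {
  return chunks.slice(Math.max(0, chunks.length - n));
}
```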
The package does not seem to work for me for no apparent reason. I added a bunch of extra packages like @ffmpeg/ffmpeg, hark, openai, and recordrtc just to be absolutely sure.
I have a pretty simple setup just for demo purposes, and when I start recording I get the following logs in the console.
These suggest that everything is working fine, but when I log the transcript, I always get {blob: undefined, text: undefined} as the output.
My recording status is also true while I am speaking, but for some reason the output blob and text are always undefined.
My env:
Following is a snippet of what I have done:
import React from "react";
import { useWhisper } from "@chengsokdara/use-whisper";

const App: React.FC = () => {
  const { startRecording, stopRecording, transcript, recording } = useWhisper({
    apiKey: key,
  });

  console.log("transcript", transcript);
  console.log("recording", recording);
  console.log("...........");

  return (
    <>
      <button onClick={() => startRecording()}>start</button>
    </>
  );
};

export default App;
Thanks in advance
Cheers
Thank you so much for this great project!
I am using your library to develop a chat application with voice input. However, I have encountered an issue where the transcript variable retains its previous value after sending a message.
It would be great if you provided a method that resets the transcript variable to its default state.
Here is my PR #29 where I propose my solution.
Alternatively, if you have any other suggestions for resolving this issue, please let me know.
First, thank you for making this hook!
It seems like there's no way to capture errors? We'd like to get insight into the failures, as we've been getting complaints from users about transcription failing but we have no visibility into errors, for example:
Is there a workaround to get access to internal errors?
If I use secrets for my OpenAI key in Replit, I get this:
Unhandled Runtime Error
Error: apiKey is required if onTranscribe is not provided
Here is where I call the key:
const {
  recording,
  speaking,
  transcribing,
  transcript,
  pauseRecording,
  startRecording,
  stopRecording,
} = useWhisper({
  apiKey: process.env['OPEN_API_KEY'],
})
If I just pass the key as a string it works, but obviously this is not something you want to do :)
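A likely explanation (an assumption, based on how frontend bundlers work): env vars are only inlined into client code when the bundler can see them statically, and most frameworks only expose prefixed ones to the browser (NEXT_PUBLIC_ in Next.js, VITE_ in Vite), while Replit secrets are server-side by default. A small runtime guard makes the empty-key failure explicit:

```javascript
// Sketch: fail loudly when the key never reached the client bundle, instead
// of surfacing as the hook's generic "apiKey is required" error.
function requireApiKey(value) {
  if (!value) {
    throw new Error('apiKey is empty: expose the key to the client build ' +
      '(e.g. a NEXT_PUBLIC_/VITE_ prefix), or better, proxy through a server');
  }
  return value;
}
```

Proxying through your own server (so the key never ships to the browser at all) is the safer fix.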
npm ERR! Error while executing:
npm ERR! /usr/bin/git ls-remote -h -t ssh://[email protected]/zhuker/lamejs.git
npm ERR!
npm ERR! Host key verification failed.
Very impressed by this project, thank you so much for it!
Is there some way to stream the audio to a server endpoint (as in the examples) but also have it iteratively return results? Right now it seems that if streaming: true is set, the hook will only hit the Whisper API directly from the frontend (e.g. https://api.openai.com/v1/audio/transcriptions).
That means there is quite a long pause between the end of recording and getting the result (since ffmpeg has to run at the end, and then a fairly large file must be uploaded before transcription). I'm curious whether there's a way to avoid that with the current design?
Current behavior:
Desired behavior:
Benefits:
Thank you for considering this feature request.
I've had an issue running this and had to add @ffmpeg/core as a dependency to fix it. Should it be added as a library dependency?
I think Safari may not support the codec currently used, but I'm not sure if there's a way to detect at runtime which codecs a browser supports.
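Browsers do expose codec support at runtime via the standard static method `MediaRecorder.isTypeSupported`. A fallback picker, with the predicate injected so the logic also runs outside a browser:

```javascript
// Sketch: choose the first mime type the current browser can record.
// In a browser you would pass (t) => MediaRecorder.isTypeSupported(t).
function pickMimeType(candidates, isSupported) {
  return candidates.find(isSupported) || '';
}

// Illustrative candidate list, ordered by preference:
//   pickMimeType(['audio/webm;codecs=opus', 'audio/mp4', 'audio/wav'],
//                (t) => MediaRecorder.isTypeSupported(t))
```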
When streaming mode is enabled, the onWhispered function is always called instead of the onTranscribe function.
This is easily fixed. I'd open a pull request, but I made a few other commits which shouldn't be integrated, so here's a diff of the fix:
Thanks for the great component!
I've tried to use it, but I wasn't able to get any info about why it wasn't working. I wanted to investigate, but I couldn't, because it failed silently.
Is it possible for the real-time transcription stream to come from an RTMP stream, for example?
Hey @chengsokdara! Awesome job writing this hook. I would absolutely love to try it out, but I wasn't sure whether this library is still maintained, as there are a couple of critical requests pending in the issues with no response.
Are you still planning to maintain it, or will you give maintainer rights to anybody?
Thanks a lot for the great work!
I want to start recording automatically as soon as the browser detects that the user is speaking; if I can get that state in the return object, that would be really awesome.
Thanks a lot once again.