hahahumble / speechgpt

2.7K 20.0 404.0 2.2 MB

💬 SpeechGPT is a web application that enables you to converse with ChatGPT.

Home Page: https://speechgpt.app

License: MIT License

HTML 0.57% JavaScript 0.37% TypeScript 96.46% CSS 2.15% Dockerfile 0.44%

chatbot chatgpt language-learning speech chat conversation

speechgpt's Introduction

Website • [中文]

🌟 Introduction

SpeechGPT is a web application that enables you to converse with ChatGPT.
You can utilize this app to improve your language speaking skills or simply have fun chatting with ChatGPT.

🚀 Features

📖 Open source and free: Anyone can use and modify it at no cost.
🔒 Privacy First: All data is stored locally.
📱 Mobile friendly: Designed to be accessible and usable on mobile devices.
📚 Support for multiple languages: Supports over 100 languages.
🎙 Speech Recognition: Includes both built-in speech recognition and integration with Azure Speech Services.
🔊 Speech Synthesis: Includes built-in speech synthesis, as well as integration with Amazon Polly and Azure Speech Services.

📸 Screenshots

📖 Tutorial

Set the OpenAI API Key
- Go to Settings and navigate to the Chat section.
- Set the OpenAI API Key.
- If you don't have an OpenAI API Key, follow this tutorial on how to get an OpenAI API Key.
Set up Azure Speech Services (optional)
- Go to Settings and navigate to the Synthesis section.
- Change the Speech Synthesis Service to Azure TTS.
- Set the Azure Region and Azure Access Key.
Set up Amazon Polly (optional)
- Go to Settings and navigate to the Synthesis section.
- Change the Speech Synthesis Service to Amazon Polly.
- Set the AWS Region, AWS Access Key ID, and Secret Access Key (the Access Key should have the AmazonPollyFullAccess policy).
- If you don't have an AWS Access Key, follow this tutorial on how to create an IAM user in AWS.

💻 Development Guide and Changelog

For more information on setting up your development environment, please see our Development Guide.
To view the project's history of notable changes, please check the Changelog.

🚢 Deployment

Deploying with Vercel

Deploying with Docker

Pull the Docker image.

docker pull hahahumble/speechgpt

Run the Docker container.

docker run -d -p 8080:8080 --name speechgpt hahahumble/speechgpt

Visit http://localhost:8080/ to access the application.

Building and running the Docker image

Build the Docker image.

docker build -t speechgpt -f Dockerfile .

Run the Docker container.

docker run -d -p 8080:8080 --name=speechgpt speechgpt

Visit http://localhost:8080/ to access the application.

📄 License

This project is licensed under the terms of the MIT license.

speechgpt's Issues

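How can it be used normally in mainland China? It stops working as soon as I turn off my VPN T_T

How can quick deployment on Vercel and Cloudflare be set up, so that users in mainland China no longer need a VPN?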

Is your feature request related to a problem?

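Is a VPN required while using the voice features, or can they work without one?

Describe the solution you'd like

Is a VPN required while using the voice features, or can they work without one?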

Additional context

No response

Is your feature request related to a problem?

provide ARM64 docker version in dockerhub

Describe the solution you'd like

build arm64 docker and upload it to dockerhub.

Additional context

No response

Feature Request: A few suggestions to enhance speechgpt user experience

Is your feature request related to a problem?

My feature request is related to several problems I am experiencing while using the current version of the speechgpt. I am frustrated when:

The keyboard remains visible even after completing my input, which takes up unnecessary screen space and makes it harder to read the chat.
The keyboard still shows up while I interact with the assistant using speech recognition, which is unnecessary in that scenario and can be distracting.
Many average users need clarification on setting the speech recognition/synthesis language and language ID. I would prefer an easier way to do this through environment variables, so that average users can rely on default configurations.
When the assistant generates a lengthy response, I have to wait for the whole answer to be generated before I can listen to or read it. Streaming output for both text and TTS would make this process smoother and more enjoyable.
I often want to replay the assistant's response or my input via TTS but currently cannot do so, which is inconvenient when I need to review previous interactions.

Describe the solution you'd like

Hide the keyboard after the user completes input and show it again after ChatGPT completes the response. This repo: ddiu8081/chatgpt-demo achieved this well. You can look around it if you like.
Do not show the keyboard when the user interacts with the assistant via speech recognition
Ability to set default speech recognition/synthesis language & language ID via environment variables. (Many average users find setting these a bit confusing at first)
Assistant response streaming output, if it is possible, + streaming TTS output (This is very helpful when the assistant generates a long response)
Ability to replay the assistant response or the user's input via the TTS engine
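
One way the streaming TTS request could be approached is to hand text to the speech engine sentence by sentence as chunks arrive, instead of waiting for the full reply. The following is a minimal sketch under that assumption; SentenceBuffer and its methods are hypothetical names and are not part of SpeechGPT.

```typescript
// Hypothetical helper: accumulates streamed text chunks and releases
// complete sentences as soon as a sentence-ending mark is seen.
class SentenceBuffer {
  private pending = "";

  // Add a chunk; returns every sentence completed by this chunk.
  push(chunk: string): string[] {
    this.pending += chunk;
    const sentences: string[] = [];
    // Sentence boundary: Latin or CJK end punctuation plus trailing spaces.
    const boundary = /[.!?。！？]+\s*/g;
    let lastEnd = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.pending.slice(lastEnd, end).trim();
      if (sentence.length > 0) {
        sentences.push(sentence);
      }
      lastEnd = end;
    }
    // Keep the unfinished tail for the next chunk.
    this.pending = this.pending.slice(lastEnd);
    return sentences;
  }

  // Call once the stream ends to obtain any remaining text.
  flush(): string {
    const rest = this.pending.trim();
    this.pending = "";
    return rest;
  }
}
```

Each sentence returned by push could be queued for synthesis while later chunks are still arriving. The boundary rule is deliberately naive: it also splits on decimal points such as "3.14", so a real implementation would likely need a smarter segmenter.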

Additional context

No response

Is your feature request related to a problem?

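1. Allow setting a site access password through Vercel environment variables
2. Allow users to enter their own OpenAI API key in the front end of the deployed site

Describe the solution you'd like

1. Allow setting a site access password through Vercel environment variables
2. Allow users to enter their own OpenAI API key in the front end of the deployed site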

Additional context

No response

Describe the bug

Azure recognition service doesn't work (even with an Azure key), but Azure synthesis works well.

Steps to reproduce

1. Press the record button;
2. "there was an error with Azure recognition service."

Screenshots or additional context

No response

How to fill in the parameter when using vercel deployment?

Is your feature request related to a problem?

How to fill in the parameters for deploying services on vercel

Describe the solution you'd like

I don't have any except the VITE_OPENAI_API_KEY parameter

Additional context

No response

Allow changes to the OpenAI Host

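Could the OpenAI and Azure requests be sent from a back end, so that the keys are protected?

Is your feature request related to a problem?

Could the OpenAI and Azure requests be sent from a back end, so that the keys are protected?

Describe the solution you'd like

Could the OpenAI and Azure requests be sent from a back end, so that the keys are protected?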

Additional context

No response

Describe the bug

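I pulled the image with Docker on a Synology NAS and added the URL and key. When I submit a question in the web UI, it says the API key cannot be empty.

I then entered the key and API address in the web page, but still got no response. This may be because I use Alibaba Cloud as a proxy (following the project Ice-Hazymoon / openai-scf-proxy), although other similar ChatGPT text-input web pages work through it without any problem.

I would appreciate some debugging suggestions from the author.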

Steps to reproduce

Screenshots or additional context

No response

Is your feature request related to a problem?

For Azure TTS, the payload is built by string concatenation. I'm not sure whether speech recognition could generate text that is not valid XML.

Describe the solution you'd like

It would be more robust to generate the SSML/XML with a tool such as xmlbuilder.
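
To illustrate the concern, here is a minimal sketch of building an SSML payload with every dynamic value escaped. It uses manual escaping instead of xmlbuilder to stay dependency-free; escapeXml and buildSsml are hypothetical helpers, and the voice name in the usage note is only an example, not a value taken from the project.

```typescript
// Hypothetical helper: escape the five XML special characters so that
// arbitrary recognized text cannot break the SSML document.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Builds a minimal SSML payload; all three inputs are escaped before use.
function buildSsml(text: string, voice: string, lang: string): string {
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${escapeXml(lang)}">` +
    `<voice name="${escapeXml(voice)}">${escapeXml(text)}</voice>` +
    `</speak>`
  );
}
```

For example, buildSsml("Tom & Jerry <3", "en-US-JennyNeural", "en-US") yields a payload in which the text appears as "Tom &amp; Jerry &lt;3", so the ampersand and angle bracket no longer produce malformed XML.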

Additional context

No response

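Messages stay stuck on "Waiting" after being sent

Hello,
My API address is set to https://api.openai.com/v1/chat/completions
Whether the network is in mainland China or not, sent messages always show "Waiting".
I even tried purely non-mainland network environments such as hyperbeam, and it still shows "Waiting".
I deployed with Vercel and configured DNS, and also tried deploying on a VPS in San Francisco,
and both also show "Waiting". The speech recognition and speech synthesis endpoints work fine.
Please help.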

Describe the bug

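Two mobile devices at home, an iPad Air 5 and an iPhone 13, both show the same behavior

Network environment: a normal Wi-Fi connection, routed through a transparent gateway that handles the VPN

Symptom: after tapping the record button, it stays on "Waiting" or "Connecting" with no other error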

Steps to reproduce

Open SpeechGPT on Safari
Fill in token for OpenAI & Azure
Click "Record"

Screenshots or additional context

Is your feature request related to a problem?

Make the system more secure and limit token abuse

Describe the solution you'd like

Add a site access password via environment variables

Additional context

No response

Is your feature request related to a problem?

One of the main functions of SpeechGPT is to simulate oral conversations. It is even better than voice control for chatGPT and other similar plugins. However when I use it, I have to press the send or enter button frequently. It interrupts the natural flow of conversations, especially in Continuous Recognition mode.

Describe the solution you'd like

Add vocal triggers for sending the message, such as "send it" or other personalized phrases. SpeechGPT could then create vivid conversation scenarios.
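
A vocal trigger of this kind could be detected by checking whether the recognized transcript ends with one of the configured phrases. The sketch below is only an illustration of that idea; checkSendTrigger and TriggerResult are hypothetical names, and the default phrase list is an assumption.

```typescript
interface TriggerResult {
  shouldSend: boolean; // true when a trigger phrase ended the transcript
  message: string;     // transcript with the trigger phrase removed
}

// Hypothetical helper: looks for a trigger phrase at the end of a transcript.
function checkSendTrigger(
  transcript: string,
  triggers: string[] = ["send it"],
): TriggerResult {
  // Recognizers often append punctuation, so drop it before matching.
  const cleaned = transcript.trim().replace(/[.!?,\s]+$/, "");
  const lower = cleaned.toLowerCase();
  for (const trigger of triggers) {
    const t = trigger.trim().toLowerCase();
    if (t.length === 0 || !lower.endsWith(t)) {
      continue;
    }
    const idx = lower.length - t.length;
    // Require a word boundary so "resend it" does not match "send it".
    const atBoundary = idx === 0 || /[\s.!?,]/.test(lower.charAt(idx - 1));
    if (atBoundary) {
      const message = cleaned.slice(0, idx).replace(/[,\s]+$/, "");
      return { shouldSend: true, message };
    }
  }
  return { shouldSend: false, message: transcript };
}
```

In Continuous Recognition mode this check could run on each final recognition result, sending the returned message when shouldSend is true. Matching only at the end of the transcript reduces accidental sends when the phrase occurs mid-sentence.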

Additional context

No response

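Support for sub-path configuration

Is your feature request related to a problem?

I want to use a single nginx domain to proxy multiple services, so the app needs to be served from a sub-path (for example localhost/speechgpt). After configuring homepage and basename, the app still requests /assets/xxx. How should I change this?

Describe the solution you'd like

Could sub-path configuration be supported through a configuration file or environment variables (Docker)?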
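
Since the project is built with Vite (the build step runs vite build), asset URLs such as /assets/xxx are governed by Vite's base option, which is fixed at build time. A minimal sketch of deriving that value from an environment variable follows; normalizeBasePath and the BASE_PATH variable name are hypothetical and not something the project defines.

```typescript
// Hypothetical helper: turns inputs such as "speechgpt" or "/speechgpt"
// into the "/speechgpt/" form that Vite's `base` option expects.
function normalizeBasePath(raw: string | undefined): string {
  // Strip surrounding whitespace and any leading or trailing slashes.
  const trimmed = (raw ?? "").trim().replace(/^\/+|\/+$/g, "");
  return trimmed === "" ? "/" : `/${trimmed}/`;
}

// In vite.config.ts this could be wired up roughly as:
//   export default defineConfig({
//     base: normalizeBasePath(process.env.BASE_PATH), // BASE_PATH is an assumed name
//   });
```

Because base is applied when the bundle is built, setting the variable only when the Docker container starts would likely not be enough; the image would need to be rebuilt with the desired sub-path.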

Additional context

No response

There was an error with your request

Describe the bug

It used to work, but today it doesn't, and it shows either "There was an error with your request" or "OpenAI error..". How can I debug it?

Steps to reproduce

It used to work, but today it doesn't, and it shows either "There was an error with your request" or "OpenAI error..". How can I debug it?

Screenshots or additional context

It used to work, but today it doesn't, and it shows either "There was an error with your request" or "OpenAI error..". How can I debug it?

Does it currently work in Android mobile browsers? Does it work on HarmonyOS? I tried several browsers and none of them worked

Is your feature request related to a problem?

When I open the site in an Android browser, TTS is not supported

Describe the solution you'd like

Could you provide a full version for Android?

Additional context

No response

Does anyone have an Azure key I could borrow?

Is your feature request related to a problem?

Does anyone have an Azure key I could borrow?

Describe the solution you'd like

Does anyone have an Azure key I could borrow?

Additional context

No response

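Proxy support

Is your feature request related to a problem?

Could local proxy support be added? Otherwise there is a risk of the account being banned.

Describe the solution you'd like

Could local proxy support be added? Otherwise there is a risk of the account being banned.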

Additional context

No response

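Does it support reading ChatGPT's reply aloud as it streams?

Is your feature request related to a problem?

Setting up a speech synthesis account is rather tedious, so I have not tried your solution yet.
ChatGPT currently returns text as a stream. Could the streamed text be read aloud in real time, rather than waiting until a whole sentence has been emitted or the entire reply has been generated before sending it to speech synthesis?

Describe the solution you'd like

Read the text as it appears on screen, the way a human would.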

Additional context

No response

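Where exactly is the file for configuring the API key inside the Docker container? I would like to configure the API key directly on the back end. Thanks!

Describe the bug

Where exactly is the file for configuring the API key inside the Docker container? I would like to configure the API key directly on the back end. Thanks!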

Steps to reproduce

Screenshots or additional context

No response

error

Describe the bug

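Both the OpenAI key and the Azure key have been verified to work, but every input results in "there was an error with your request"

Steps to reproduce

1. Open speechgpt.app in Chrome
2. Enter the OpenAI key and the Azure key
3. Enter text or speak
4. The browser shows a pop-up saying "there was an error with your request"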

Screenshots or additional context

No response

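Should recognized text be inserted at the cursor position after recording?

Is your feature request related to a problem?

Based on my experience using the app, my idea is this: if a sentence was spoken incorrectly and can be edited, I should be able to delete the wrong part, move the cursor to that spot, and have the newly recognized text inserted there.

Describe the solution you'd like

The desired effect is as follows: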

2023-04-18.11.16.18.mov
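
The requested behavior amounts to splicing the recognized text into the input value at the current selection. Here is a minimal sketch, assuming the caller supplies the selection offsets from the text field; insertAtCursor and InsertResult are hypothetical names.

```typescript
interface InsertResult {
  value: string;  // new content of the input field
  cursor: number; // where the caret should be placed afterwards
}

// Hypothetical helper: replaces the selected range (or inserts at the caret
// when the range is empty) with the newly recognized text.
function insertAtCursor(
  value: string,
  selectionStart: number,
  selectionEnd: number,
  insert: string,
): InsertResult {
  // Clamp and order the offsets so out-of-range input cannot throw.
  const lo = Math.min(selectionStart, selectionEnd);
  const hi = Math.max(selectionStart, selectionEnd);
  const start = Math.max(0, Math.min(lo, value.length));
  const end = Math.max(start, Math.min(hi, value.length));
  return {
    value: value.slice(0, start) + insert + value.slice(end),
    cursor: start + insert.length,
  };
}
```

With an empty selection the text is inserted at the caret; with a non-empty selection the highlighted mistake is replaced, which matches the "delete the wrong part and re-record" flow described above.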

Additional context

No response

Fail to deploy with environment variables

Describe the bug

I have deployed the project on Vercel and set the corresponding keys. However, the app reports that I have not provided an OpenAI API Key.

Steps to reproduce

On Vercel's dashboard, I've set 4 keys as follows:

OPENAI_API_KEY: [my own key]
OPENAI_HOST: [my own address]
AZURE_REGION: eastus
AZURE_KEY: [my own key]
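
One possible explanation, offered as an assumption and not a confirmed diagnosis: the project is built with Vite, and Vite by default exposes only variables whose names start with VITE_ to client code. A separate issue in this list mentions VITE_OPENAI_API_KEY, which fits that pattern. The sketch below checks a list of names against the prefix; findUnexposedVars is a hypothetical helper, and the suggested names other than VITE_OPENAI_API_KEY are guesses.

```typescript
// Vite's default `envPrefix`: only variables starting with this prefix
// are made available to browser code through import.meta.env.
const CLIENT_PREFIX = "VITE_";

// Hypothetical helper: lists the variables a Vite-built front end could not
// read, paired with a prefixed name that might be expected instead.
function findUnexposedVars(
  names: string[],
): Array<{ name: string; suggestion: string }> {
  return names
    .filter((name) => !name.startsWith(CLIENT_PREFIX))
    .map((name) => ({ name, suggestion: CLIENT_PREFIX + name }));
}

// The four names from the report above; none carries the prefix.
const configured = ["OPENAI_API_KEY", "OPENAI_HOST", "AZURE_REGION", "AZURE_KEY"];
const hidden = findUnexposedVars(configured);
```

Here hidden contains all four names, so none of them would reach the browser bundle under Vite's default settings. The exact variable names the app reads should be taken from the project's own documentation.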

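Some enhancement and optimization suggestions

Feature enhancements:
The current state is quite good overall, but I would like to see some practical features added, for example:

Automatically detect the language of speech input, and automatically send the recognized text to GPT after the user pauses for a specified duration;
When reading aloud, do not skip any language: read everything in multilingual text, including languages not specified in the settings;
Add a "read again" button next to each message (even better if it can also pause and stop);
Support PWA.

Needs improvement:
In addition, the stability of speech output needs improvement; sometimes it fails to play.

These are my personal suggestions. I hope you will consider them. Thank you!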

OpenAI API key is set, but the app reports it as not provided

Describe the bug

Settings show that it has been set

Steps to reproduce

Deployed on Vercel and provided environment vars

Screenshots or additional context

No response

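SpeechRecognition frequently throws errors

The first Record works, but errors start after about 2 to 3 attempts
My guess is that new SpeechRecognition() cannot be called an unlimited number of times, probably because of a Chrome limitation
I stored the first instance in state and reused it every time afterwards, which solved the problem. I cannot create a branch to submit a PR, though, so I have to ask the author to fix it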
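
The workaround described here, creating the recognizer once and reusing it, can be sketched as a lazy singleton. This is an illustration only: createSingleton is a hypothetical helper, and the sketch says nothing about whether Chrome actually imposes such a limit.

```typescript
// Hypothetical helper: wraps a factory so that the underlying object is
// created on first use and the same instance is returned afterwards.
function createSingleton<T>(factory: () => T): () => T {
  let instance: T | undefined;
  let created = false;
  return (): T => {
    if (!created) {
      instance = factory();
      created = true;
    }
    return instance as T;
  };
}
```

In the browser, the factory passed in would construct the speech recognizer (for example via the SpeechRecognition or webkitSpeechRecognition constructor), and every press of Record would call the returned getter instead of constructing a new object. In a React component, keeping the instance in a ref or in state, as the reporter did, achieves the same effect.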

Can you add an access password feature?

Is your feature request related to a problem?

I hope you can set the access password, otherwise the token amount will be easily consumed once the page is discovered.

Describe the solution you'd like

I hope you can set the access password, otherwise the token amount will be easily consumed once the page is discovered.

Additional context

No response

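Features and improvements: speech recognition could use the official Whisper, and the browser's built-in speech synthesis could be added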

Is your feature request related to a problem?

YES

Describe the solution you'd like

Below is a demo of another similar project, shown running in the Edge browser on an Android phone

Additional context

No response

built-in functions

Is your feature request related to a problem?

I don't get it. Why have all the built-in functions been removed from the mobile version? And the only other option, Azure Speech Recognition Service, doesn't work.

Describe the solution you'd like

Please restore the built-in functions in the mobile version

Additional context

No response

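A very simple request: add a keyboard shortcut for the record/stop button, so that talking and sending no longer require a mouse

Is your feature request related to a problem?

Not related to a problem, just a feature that would make the app easier to use

Describe the solution you'd like

Add a keyboard shortcut for the record/stop button, so that talking and sending no longer require a mouse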

Additional context

No response

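Installation error: is the tippy.js dependency missing from package.json?

I use pnpm instead of yarn, but it should behave the same. tippy.js appears to be missing from dependencies
Installing it myself solved the problem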

add caption

Is your feature request related to a problem?

please add so that there is an explanation result as drawn

Describe the solution you'd like

Additional context

No response

Could a Docker version be provided?

Is your feature request related to a problem?

Could you make a Docker version available?

Describe the solution you'd like

A Docker version is needed for easier deployment.

Additional context

No response

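Long conversation history causes input lag.

Describe the bug

As the conversation history grows, input becomes laggy; it only returns to normal after the history is cleared

Steps to reproduce

As the conversation history grows, input becomes laggy; it only returns to normal after the history is cleared

Screenshots or additional context

As the conversation history grows, input becomes laggy; it only returns to normal after the history is cleared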

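Why doesn't it work?

I entered the key, but it doesn't work.

Request: add a feature to replay speech

Is your feature request related to a problem?

After all, beginners practicing spoken language can rarely catch everything on the first listen 😘

Describe the solution you'd like

Add a replay button to each message in the conversation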

Additional context

No response

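On-screen keyboard blocks the view

Is your feature request related to a problem?

When using speechgpt on a phone, every time I tap record or type text, the on-screen keyboard pops up and covers the text box and the record button. I cannot see the text I entered, which makes it hard to edit. I don't know whether there is a good solution. There seems to be no setting on Xiaomi phones that fixes this.

Describe the solution you'd like

Rearrange the UI so that the on-screen keyboard does not cover the text box and the record button.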

Additional context

No response

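Very well done. Would you consider releasing a Mac client?

It does exactly what I wanted to build.

A small suggestion while I'm here: could multiple conversations be supported?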

Make docker image public

Describe the bug

Seems like the docker image requires login

Steps to reproduce

logout from your user
run docker pull speechgpt
Observe the error message 'pull access denied for speechgpt'

Screenshots or additional context

No response

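Does the speech synthesis service require access to networks outside mainland China?

Is your feature request related to a problem?

Does the speech synthesis service require access to networks outside mainland China?

Describe the solution you'd like

Can the speech synthesis service be used without a VPN?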

Additional context

No response

API key not loaded at startup

Is your feature request related to a problem?

I have made Settings the landing page so that the API key can be entered before the chat starts. Otherwise a new user cannot find where to enter the API key and does not know that one is required first.

Describe the solution you'd like

Made Settings the landing page instead of the chat home page

Additional context

No response

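Docker build reports no error, but no image is produced

Describe the bug

I changed the nginx listening port in the Dockerfile and then ran docker build. After the output below, I checked and found no image.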

docker build . -t speechgpt --network host
[+] Building 1213.7s (13/15)
=> [internal] load .dockerignore 0.0s
=> => transferring context: 206B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 932B 0.0s
=> [internal] load metadata for docker.io/library/nginx:alpine 1.7s
=> [internal] load metadata for docker.io/library/node:alpine 1.7s
=> [builder 1/6] FROM docker.io/library/node:alpine@sha256:53741c7511b1836b5eb7e788a7b399c058b0b549f205d2c6af831ec1a9a81c31 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 9.23kB 0.0s
=> [stage-1 1/4] FROM docker.io/library/nginx:alpine@sha256:dd2a9179765849767b10e2adde7e10c4ad6b7e4d4846e6b77ec93f080cd2db27 0.0s
=> CACHED [builder 2/6] WORKDIR /app 0.0s
=> CACHED [builder 3/6] COPY package.json yarn.lock ./ 0.0s
=> CACHED [builder 4/6] RUN yarn install 0.0s
=> CACHED [builder 5/6] COPY . . 0.0s
=> CACHED [stage-1 2/4] COPY nginx.conf /etc/nginx/nginx.conf 0.0s
=> CACHED [stage-1 3/4] WORKDIR /usr/share/nginx/html 0.0s
=> [builder 6/6] RUN yarn build 1364.3s
=> => # yarn run v1.22.19
=> => # $ tsc && vite build

Steps to reproduce

Same steps as above

Screenshots or additional context

No response

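Could a page for configuring conversation scenarios be added?

Is your feature request related to a problem?

Describe the solution you'd like

Could a page for configuring conversation scenarios be added, with a UI of built-in prompt cards? Users could add their own cards or import them from outside.
A more complex version would need further categorization, with the detailed prompts listed under each category. For an immersive experience, something like chain-of-thought prompting could be considered.
For prompts, see: https://myenglishdomain.com/chatgpt-prompts-for-language-learners/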
https://github.com/f/awesome-chatgpt-prompts

Additional context

No response

Markdown support in returned results

Rendered Markdown is not displayed correctly

One more thing: this is a very good quality project : )