
hahahumble / speechgpt

2.7K 20.0 403.0 2.2 MB

💬 SpeechGPT is a web application that enables you to converse with ChatGPT.

Home Page: https://speechgpt.app

License: MIT License

HTML 0.57% JavaScript 0.37% TypeScript 96.46% CSS 2.15% Dockerfile 0.44%
chatbot chatgpt language-learning speech chat conversation

speechgpt's People

Contributors

ankitrout2903, belm, chengfengfengwang, davidramos-om, hahahumble, misaka-9982-coder, rammiah, roiiiu, shfc, tryinggit, xamast


speechgpt's Issues

Does anyone have an Azure key I could borrow?

Is your feature request related to a problem?

Does anyone have an Azure key I could borrow?

Describe the solution you'd like

Does anyone have an Azure key I could borrow?

Additional context

No response

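Should recognized text be inserted at the cursor once recording finishes?

Is your feature request related to a problem?

Based on my experience using the app, my idea is this: if a sentence was spoken incorrectly and the text is editable, the user should be able to delete the wrong part, move the cursor to that spot, and have the newly recognized text inserted there.

Describe the solution you'd like

The intended effect looks like this: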

2023-04-18.11.16.18.mov

Additional context

No response
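
The behavior requested above can be sketched as a small pure function. This is a minimal illustration, not code from speechgpt: `insertAtCursor` and `InsertResult` are hypothetical names, and the selection offsets are assumed to come from the text box (for example its `selectionStart` and `selectionEnd` properties).

```typescript
// Hypothetical helper: replaces the selected range [start, end) of `current`
// with `recognized` and reports where the caret should land afterwards.
interface InsertResult {
  text: string;
  caret: number;
}

function insertAtCursor(
  current: string,
  recognized: string,
  selectionStart: number,
  selectionEnd: number = selectionStart,
): InsertResult {
  // Clamp both offsets so an out-of-range selection cannot corrupt the text.
  const start = Math.max(0, Math.min(selectionStart, current.length));
  const end = Math.max(start, Math.min(selectionEnd, current.length));
  const text = current.slice(0, start) + recognized + current.slice(end);
  return { text, caret: start + recognized.length };
}
```

With an empty selection the recognized text is inserted at the caret; with a non-empty selection it replaces the selected (wrong) words, which matches the "delete, move the cursor, re-record" flow described in the issue.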

Docker for ARM64 arch

Is your feature request related to a problem?

Please provide an ARM64 Docker image on Docker Hub.

Describe the solution you'd like

Build an ARM64 Docker image and upload it to Docker Hub.

Additional context

No response

Make docker image public

Describe the bug

Seems like the docker image requires login

Steps to reproduce

  1. log out of your Docker account
  2. run docker pull speechgpt
  3. Observe the error message 'pull access denied for speechgpt'

Screenshots or additional context

No response

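Support for sub-path configuration

Is your feature request related to a problem?

I want to use a single nginx domain to proxy several services, so the app has to be reachable through a sub-path (for example localhost/speechgpt). After configuring homepage and basename, the app still requests /assets/xxx. How should I change it?

Describe the solution you'd like

Could sub-path configuration be supported through a config file or an environment variable (for Docker)?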

Additional context

No response
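
A possible direction, offered as an assumption rather than the project's actual setup: the build log further down this page shows the app is built with Vite, and Vite's `base` option controls the prefix used for asset URLs such as /assets/xxx. The sketch below only normalizes a sub-path into the shape `base` expects; `normalizeBasePath` is a hypothetical helper.

```typescript
// Hypothetical helper: turns a user-supplied sub-path into the form
// expected by Vite's `base` option (leading and trailing slash).
function normalizeBasePath(raw: string | undefined): string {
  // Drop surrounding whitespace and any leading or trailing slashes.
  const trimmed = (raw ?? "").trim().replace(/^\/+|\/+$/g, "");
  // An empty value means the app is served from the domain root.
  return trimmed === "" ? "/" : `/${trimmed}/`;
}
```

In a vite.config.ts the result could be passed as `base: normalizeBasePath(...)`, with the value read from an environment variable of your choosing. Because `base` is applied at build time, a Docker image would presumably need to be rebuilt for each sub-path rather than configured when the container starts.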

There was an error with your request

Describe the bug

It used to work, but today it does not, and it shows either "There was an error with your request" or "OpenAI error..". How can I debug it?

Steps to reproduce

It used to work, but today it does not, and it shows either "There was an error with your request" or "OpenAI error..". How can I debug it?

Screenshots or additional context

It used to work, but today it does not, and it shows either "There was an error with your request" or "OpenAI error..". How can I debug it?

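Safari stays on "Connecting"

Describe the bug

I have two mobile devices at home, an iPad Air 5 and an iPhone 13, and both show the same behavior.

Network environment: a normal Wi-Fi connection, routed through a transparent gateway used to bypass network restrictions.

Symptom: after tapping the record button, it stays on "Waiting" or "Connecting" indefinitely, with no other error.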

Steps to reproduce

  1. Open SpeechGPT on Safari
  2. Fill in token for OpenAI & Azure
  3. Click "Record"

Screenshots or additional context

IMG_7669

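Could a page for configuring conversation scenarios be added?

Is your feature request related to a problem?

No

Describe the solution you'd like

Could a page for configuring conversation scenarios be added, with a UI of built-in prompt cards? Users could add their own cards or import them from elsewhere.
In a more elaborate version, the cards would be further categorized, with detailed prompts under each category. For an immersive experience, something like chain of thought prompting could be considered.
For prompts, see: https://myenglishdomain.com/chatgpt-prompts-for-language-learners/
https://github.com/f/awesome-chatgpt-prompts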

Additional context

No response

API key not loaded at startup

Is your feature request related to a problem?

I have made the settings page the landing page so that the API key can be entered before the chat starts. Currently it is not obvious where to enter the API key, and a new user does not know that one has to be entered first.

Describe the solution you'd like

Made the settings page the landing page instead of the home chat page

Additional context

No response

add caption

Is your feature request related to a problem?

Please add a caption so that the result is shown with an explanation, as drawn in the screenshot.
Screenshot (6)

Describe the solution you'd like

.

Additional context

No response

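SpeechRecognition frequently throws errors

The first Record works fine, but after about 2 to 3 recordings it starts failing.
My guess is that new SpeechRecognition() cannot be called an unlimited number of times, which is probably a Chrome limitation.
I put the first instance into state and reused the same one every time afterwards, which solved the problem. However, I cannot create a branch to open a PR, so I have to ask the author to fix it.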
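
The workaround described above (create the recognizer once and reuse it) can be sketched independently of any UI framework. This is an illustration only: `createRecognizerProvider` and `RecognizerLike` are hypothetical names, and whether Chrome really limits the number of instances is the reporter's guess, not something this sketch verifies.

```typescript
// Minimal shape of the recognizer this sketch needs; the real browser
// object has many more members.
interface RecognizerLike {
  start(): void;
  stop(): void;
}

// Hypothetical helper: wraps a factory so the recognizer is constructed
// only once and the same instance is handed out on every later call.
function createRecognizerProvider<T extends RecognizerLike>(factory: () => T): () => T {
  let instance: T | undefined;
  return () => {
    if (instance === undefined) {
      instance = factory();
    }
    return instance;
  };
}
```

In a browser, the factory would be the place to call `new SpeechRecognition()` (or the `webkit`-prefixed constructor in Chrome); every press of Record would then go through the provider instead of constructing a new object.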

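About messages staying on "Waiting" after sending

Hello,
My API address is set to https://api.openai.com/v1/chat/completions
Whether the network is in mainland China or not, it always shows "Waiting" after a message is sent.
I even tried a purely non-mainland network environment such as hyperbeam, and it still shows "Waiting".
I deployed with Vercel and set up DNS, and also tried deploying on a VPS in San Francisco,
and both show "Waiting" as well. The speech recognition and speech synthesis endpoints work fine.
Please help.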

Please add an access password

Is your feature request related to a problem?

Make the system more secure and prevent token abuse

Describe the solution you'd like

Add a site access password via an environment variable

Additional context

No response

How do I fill in the parameters when deploying on Vercel?

Is your feature request related to a problem?

How do I fill in the parameters for deploying the service on Vercel?

Describe the solution you'd like

I don't have any parameters except VITE_OPENAI_API_KEY

image

Additional context

No response

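Some enhancement and optimization suggestions

Feature enhancements:
Overall the current state is quite good, but I would like some practical features added, for example:

  • Automatically detect the language of speech input, and automatically send the recognized text to GPT after the user pauses for a specified time;
  • In read-aloud, do not skip any language: read everything in multilingual content rather than skipping languages not specified in the settings;
  • Add a "read again" button next to each message (even better if it can also pause and stop);
  • Support PWA.

Needs improvement:
In addition, the stability of speech output needs improvement; sometimes nothing is played.

These are my personal suggestions. I hope you will consider them. Thank you!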

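[Optimization] Add an access password or let users enter the API key in the front end

Is your feature request related to a problem?

1. Allow a page access password to be set in Vercel's environment variables
2. Allow the OpenAI API key to be entered in the front end of the deployed site

Describe the solution you'd like

1. Allow a page access password to be set in Vercel's environment variables
2. Allow the OpenAI API key to be entered in the front end of the deployed site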

Additional context

No response

Can you add an access password feature?

Is your feature request related to a problem?

I hope you can set the access password, otherwise the token amount will be easily consumed once the page is discovered.

Describe the solution you'd like

I hope you can set the access password, otherwise the token amount will be easily consumed once the page is discovered.

Additional context

No response

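Does the speech synthesis service require access to networks outside mainland China?

Is your feature request related to a problem?

Does the speech synthesis service require access to networks outside mainland China?

Describe the solution you'd like

Can the speech synthesis service be used without a VPN?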

Additional context

No response

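On-screen keyboard covers the input area

Is your feature request related to a problem?

When using speechgpt on a phone, every time I tap record or type text, the on-screen keyboard pops up and covers the text box and the record button. I cannot see the text being entered, which makes it hard to edit. I don't know whether there is a good solution; Xiaomi phones do not seem to have a setting that fixes this.

Describe the solution you'd like

Readjust the UI so that the on-screen keyboard does not cover the text box and the record button.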

Additional context

No response

Vocal trigger

Is your feature request related to a problem?

One of the main functions of SpeechGPT is to simulate oral conversations. It is even better than voice control for ChatGPT and other similar plugins. However, when I use it, I have to press the send or enter button frequently, which interrupts the natural flow of conversation, especially in Continuous Recognition mode.

Describe the solution you'd like

Add vocal triggers for sending the message, such as "send it" or other personalized phrases. SpeechGPT could then create vivid conversation scenarios.

Additional context

No response
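
One way the vocal trigger requested above might work is to check whether the recognized transcript ends with a trigger phrase and, if so, strip the phrase before sending. The sketch below is a hypothetical illustration (`detectSendTrigger` and `TriggerResult` are invented names), and it assumes the recognizer delivers the transcript as plain text.

```typescript
interface TriggerResult {
  shouldSend: boolean;
  message: string;
}

// Hypothetical helper: reports whether `transcript` ends with one of the
// trigger phrases and returns the message with that phrase removed.
// Assumes lowercasing does not change the string length, which holds for
// typical Latin-script text.
function detectSendTrigger(transcript: string, triggers: string[] = ["send it"]): TriggerResult {
  const trimmed = transcript.trim();
  // Ignore case and the trailing punctuation recognizers often append.
  const comparable = trimmed.toLowerCase().replace(/[.!?,\s]+$/, "");
  for (const trigger of triggers) {
    const phrase = trigger.trim().toLowerCase();
    if (phrase === "") continue;
    const cut = comparable.length - phrase.length;
    if (cut < 0 || comparable.slice(cut) !== phrase) continue;
    // Require a word boundary so "resend it" does not match "send it".
    if (cut > 0 && !/\s/.test(comparable.charAt(cut - 1))) continue;
    // Drop the separator left in front of the removed phrase.
    const message = trimmed.slice(0, cut).replace(/[,\s]+$/, "");
    return { shouldSend: true, message };
  }
  return { shouldSend: false, message: trimmed };
}
```

Matching only at the end of the transcript is a deliberate choice here: it avoids sending a message early when the phrase happens to occur in the middle of a sentence.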

Fail to deploy with environment variables

Describe the bug

I have deployed the project on Vercel and set the corresponding keys. However, the app reports: "You don't have provided an OpenAI API Key."

Steps to reproduce

On Vercel's dashboard, I've set 4 keys as follows:

OPENAI_API_KEY: [my own key]
OPENAI_HOST: [my own address]
AZURE_REGION: eastus
AZURE_KEY: [my own key]
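
One possible cause, stated as an assumption rather than a diagnosis: the project appears to be built with Vite (another issue on this page mentions VITE_OPENAI_API_KEY), and by default Vite exposes only environment variables whose names start with VITE_ to browser code. The hypothetical helper below lists configured names that would not be exposed under that rule; the exact variable names speechgpt expects are not confirmed here.

```typescript
// Hypothetical check: given the variable names configured on the hosting
// dashboard, list the ones Vite would NOT expose to browser code under its
// default prefix rule.
function findUnexposedNames(names: string[], prefix: string = "VITE_"): string[] {
  return names.filter((name) => name.slice(0, prefix.length) !== prefix);
}

// The four names from the report above all lack the prefix.
const reported = ["OPENAI_API_KEY", "OPENAI_HOST", "AZURE_REGION", "AZURE_KEY"];
const unexposed = findUnexposedNames(reported);
console.log(unexposed.length);
```

If this is the cause, renaming the variables with the prefix and redeploying would be the thing to try, since Vite reads these values at build time.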

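Environment variables configured in Docker have no effect

Describe the bug

Screenshot 2023-04-08 at 20 55 10

I pulled the image with Docker on a Synology NAS and added the URL and key. When I submit a question in the web UI, it reports that the API key cannot be empty.

I then entered the key and API address in the web page and still got no response. This may be because I use Alibaba Cloud as a proxy (following the project Ice-Hazymoon / openai-scf-proxy), but other similar ChatGPT text-input web apps work without any problem.

I would appreciate some debugging suggestions from the author.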

Steps to reproduce

Screenshot 2023-04-08 at 21 00 37

Screenshots or additional context

No response

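Is streaming read-aloud of ChatGPT's replies supported?

Is your feature request related to a problem?

Setting up a speech synthesis account is rather troublesome, so I have not tried your solution yet.
ChatGPT currently returns text as a stream. Could the streamed text be read aloud in real time, rather than waiting until a sentence is finished or the whole reply is complete before sending it to speech synthesis?

Describe the solution you'd like

Read the text as it appears on screen, the way a human would.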

Additional context

No response

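Docker build reports no error, but no image is produced

Describe the bug

I changed the nginx listening port in the Dockerfile and then ran docker build. After the output below, I checked and found no image.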

docker build . -t speechgpt --network host
[+] Building 1213.7s (13/15)
=> [internal] load .dockerignore 0.0s
=> => transferring context: 206B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 932B 0.0s
=> [internal] load metadata for docker.io/library/nginx:alpine 1.7s
=> [internal] load metadata for docker.io/library/node:alpine 1.7s
=> [builder 1/6] FROM docker.io/library/node:alpine@sha256:53741c7511b1836b5eb7e788a7b399c058b0b549f205d2c6af831ec1a9a81c31 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 9.23kB 0.0s
=> [stage-1 1/4] FROM docker.io/library/nginx:alpine@sha256:dd2a9179765849767b10e2adde7e10c4ad6b7e4d4846e6b77ec93f080cd2db27 0.0s
=> CACHED [builder 2/6] WORKDIR /app 0.0s
=> CACHED [builder 3/6] COPY package.json yarn.lock ./ 0.0s
=> CACHED [builder 4/6] RUN yarn install 0.0s
=> CACHED [builder 5/6] COPY . . 0.0s
=> CACHED [stage-1 2/4] COPY nginx.conf /etc/nginx/nginx.conf 0.0s
=> CACHED [stage-1 3/4] WORKDIR /usr/share/nginx/html 0.0s
=> [builder 6/6] RUN yarn build 1364.3s
=> => # yarn run v1.22.19
=> => # $ tsc && vite build

Steps to reproduce

Same steps as above

Screenshots or additional context

No response

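Please add a feature to replay speech

Is your feature request related to a problem?

After all, beginners practicing speaking find it hard to catch everything the first time 😘

Describe the solution you'd like

Add a replay button to each message in the conversation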

Additional context

No response

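Proxy support

Is your feature request related to a problem?

Please support a local proxy; otherwise there is a risk of the account being banned.

Describe the solution you'd like

Please support a local proxy; otherwise there is a risk of the account being banned.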

Additional context

No response

built-in functions

Is your feature request related to a problem?

I don't get it. Why have all the built-in functions of the mobile version been removed? And the only other option, the Azure Speech Recognition Service, doesn't work.

Describe the solution you'd like

Please restore the built-in functions of the mobile version

Additional context

No response

error

Describe the bug

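Both the OpenAI key and the Azure key have been verified to work, but every input results in "there was an error with your request"

Steps to reproduce

1. Open speechgpt.app in Chrome
2. Enter the OpenAI key and the Azure key
3. Enter text or speech
4. A browser popup reports "there was an error with your request"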

Screenshots or additional context

No response

Azure

Describe the bug

The Azure recognition service doesn't work (even with an Azure key), but Azure synthesis works well.

Steps to reproduce

1. Press the record button;
2. "there was an error with Azure recognition service."

Screenshots or additional context

No response

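Long conversation history makes input laggy

Describe the bug

As the conversation history grows, typing becomes laggy; it only returns to normal after the history is cleared

Steps to reproduce

As the conversation history grows, typing becomes laggy; it only returns to normal after the history is cleared

Screenshots or additional context

As the conversation history grows, typing becomes laggy; it only returns to normal after the history is cleared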

Could a Docker version be provided?

Is your feature request related to a problem?

Could you make a Docker version available?

Describe the solution you'd like

A Docker version is needed for easier deployment.

Additional context

No response

Feature Request: A few suggestions to enhance speechgpt user experience

Is your feature request related to a problem?

My feature request is related to several problems I am experiencing with the current version of speechgpt. I am frustrated when:

  1. The keyboard remains visible even after I complete my input, which takes up unnecessary screen space and makes it harder to read the chat.
  2. The keyboard still shows up while I interact with the assistant using speech recognition, which is unnecessary in that scenario and can be distracting.
  3. Many average users are unsure how to set the speech recognition/synthesis language and language ID. I would prefer an easier way to do this through environment variables, so that average users can rely on default configurations.
  4. When the assistant generates a lengthy response, I have to wait for the entire answer to be generated before I can listen to or read it. Streaming output for both text and TTS would make this process smoother and more enjoyable.
  5. I often want to replay the assistant's response or my own input via TTS but currently cannot do so, which is inconvenient when I need to review previous interactions.

Describe the solution you'd like

  • Hide the keyboard after the user completes input and show it again after ChatGPT completes the response. The repo ddiu8081/chatgpt-demo does this well; you can take a look at it if you like.
  • Do not show the keyboard when the user interacts with the assistant via speech recognition
  • Ability to set the default speech recognition/synthesis language and language ID via environment variables (many average users find setting these a bit confusing at first)
  • Streaming output for the assistant response, if possible, plus streaming TTS output (this is very helpful when the assistant generates a long response)
  • Ability to replay the assistant's response or the user's input via the TTS engine

Additional context

No response
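
The streaming TTS output requested above could be approximated by speaking each sentence as soon as it is complete, instead of waiting for the whole reply. The sketch below is a hypothetical illustration, not speechgpt's implementation: `SentenceBuffer` is an invented name, and it assumes the reply arrives as a sequence of text fragments.

```typescript
// Hypothetical buffer: collects streamed text fragments and releases
// complete sentences so each one can be sent to speech synthesis early.
class SentenceBuffer {
  private pending = "";

  // Adds a fragment and returns every sentence completed so far.
  push(fragment: string): string[] {
    this.pending += fragment;
    const sentences: string[] = [];
    // A sentence ends at ., !, ? or their full-width CJK counterparts.
    const boundary = /[.!?。!?]+\s*/g;
    let lastEnd = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.pending.slice(lastEnd, end).trim();
      if (sentence !== "") sentences.push(sentence);
      lastEnd = end;
    }
    // Keep the unfinished tail for the next fragment.
    this.pending = this.pending.slice(lastEnd);
    return sentences;
  }

  // Returns whatever is left when the stream ends.
  flush(): string {
    const rest = this.pending.trim();
    this.pending = "";
    return rest;
  }
}
```

Sentence-level chunking is a compromise between latency and natural prosody. This simple splitter also breaks at any period, so decimals and abbreviations such as "3.14" or "e.g." would be cut; a real implementation would need a smarter boundary rule.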

ssml builder

Is your feature request related to a problem?

For Azure TTS, the payload is built by string concatenation. I'm not sure whether speech recognition could produce text that makes the payload invalid.

Describe the solution you'd like

It could be made more robust by using something like xmlbuilder to generate the SSML/XML.

Additional context

No response
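
The concern above is that recognized text containing characters such as & or < would produce invalid XML. The issue suggests a builder library; as a lighter alternative, the sketch below escapes the text before interpolating it. `escapeXml` and `buildSsml` are hypothetical names, and the element layout follows the commonly documented Azure SSML shape, so it should be checked against the service documentation before use.

```typescript
// Escapes the five characters that have special meaning in XML.
// The ampersand must be replaced first so later entities are not re-escaped.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical builder: wraps escaped text in a speak/voice envelope.
function buildSsml(text: string, voice: string, lang: string): string {
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${escapeXml(lang)}">` +
    `<voice name="${escapeXml(voice)}">${escapeXml(text)}</voice>` +
    `</speak>`
  );
}
```

For example, `buildSsml("Tom & Jerry", "en-US-JennyNeural", "en-US")` would carry the text as `Tom &amp; Jerry`, so the payload stays well-formed whatever the recognizer returns.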
