kedreamix / paddleavatar

paddleavatar's Introduction

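"Digital human interaction: talk with your virtual self". Build a digital double with PaddleAvatar and explore the future of human-computer interaction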

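Have you ever imagined interacting with a virtual version of yourself? With PaddleAvatar, you can turn your own images, audio, and video into a realistic digital human video and interact with it.

PaddleAvatar is a digital human generation tool built on the PaddlePaddle deep learning framework and several of the Paddle toolkits. It synthesizes your images, audio, and video into a realistic digital human video. It also supports further development: for example, with natural language processing the digital human video can be extended into a complete human-computer interaction system, so that you can hold real conversations with your virtual self.

Digital human videos made with PaddleAvatar can be used in many settings, such as games, education, and virtual reality. PaddleAvatar gives you a digital world for free creation and lets your imagination run.

So try PaddleAvatar now, build your own digital double, and explore the future of human-computer interaction!

Updates

🔥🔥🔥🔥🔥🔥 2023.12: Speech recognition and a GPT-like dialogue system have now been added. This part lives in a separate GitHub repository (linked below) that you are welcome to follow. It uses PyTorch, which should be more stable and give better results.

There is also a short walkthrough video on Bilibili. The code and tutorial are free to use.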

https://github.com/Kedreamix/Linly-Talker

🪀 Environment

  • Anaconda
  • Python 3.8
  • paddlepaddle

⚙️ 1. Installing the environment

First install PaddlePaddle; see the PaddlePaddle installation guide for details. I installed version 2.3.2, and 2.4 should also work.

conda install paddlepaddle-gpu==2.3.2 cudatoolkit=11.2 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge

Once PaddlePaddle is installed, install the remaining dependencies, which are listed in requirements.txt:

pip install -r requirements.txt
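After installing, it can help to confirm which package versions actually ended up in the environment before launching the app. The following is a minimal sketch using only the Python standard library. The helper name `installed_versions` and the package list are assumptions made for illustration and are not taken from requirements.txt.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Hypothetical package list for illustration; adjust it to requirements.txt.
    wanted = ["paddlepaddle-gpu", "paddlespeech", "ppgan", "streamlit"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

Running the script prints one line per package, which makes it easy to compare the environment against the versions mentioned above.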

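😀 2. How PaddleAvatar works

[Figure: PaddleAvatar]

🔮 3. Web deployment (Streamlit)

The web front end can be deployed with Streamlit, which gives a visual way to try the tool. Streamlit reports port 8501 when it starts; open that address in a browser to see the page.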

streamlit run avatar.streamlit.py

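For TTS (text-to-speech), I set up two options:

  • PaddleSpeech speech synthesis: offers a choice of voices and intonations, with adjustable language and speaker
  • Azure (Microsoft) speech synthesis: calls Microsoft's API, so you need to supply your own key (none is provided here)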
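The two options above could sit behind a single function that picks a backend by name. The following is a rough sketch under stated assumptions, not the code in avatar.streamlit.py: the names `synthesize`, `TTS_BACKENDS`, `paddlespeech_tts`, and `azure_tts` are hypothetical, and so are the environment variable names used for the Azure key and region. Both third-party libraries are imported lazily, so the module itself loads without them.

```python
import os
from typing import Callable, Dict


def paddlespeech_tts(text: str, out_path: str) -> str:
    # Imported lazily so this module loads even when paddlespeech is absent.
    from paddlespeech.cli.tts import TTSExecutor

    TTSExecutor()(text=text, output=out_path)
    return out_path


def azure_tts(text: str, out_path: str) -> str:
    # Requires the azure-cognitiveservices-speech package and your own key.
    import azure.cognitiveservices.speech as speechsdk

    # AZURE_SPEECH_KEY and AZURE_SPEECH_REGION are hypothetical variable names.
    config = speechsdk.SpeechConfig(
        subscription=os.environ["AZURE_SPEECH_KEY"],
        region=os.environ["AZURE_SPEECH_REGION"],
    )
    audio = speechsdk.audio.AudioOutputConfig(filename=out_path)
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=config, audio_config=audio)
    synthesizer.speak_text_async(text).get()  # block until synthesis finishes
    return out_path


# Registry of available backends: name -> function(text, out_path) -> out_path.
TTS_BACKENDS: Dict[str, Callable[[str, str], str]] = {
    "paddlespeech": paddlespeech_tts,
    "azure": azure_tts,
}


def synthesize(text: str, backend: str = "paddlespeech", out_path: str = "output.wav") -> str:
    """Dispatch to the chosen TTS backend and return the audio file path."""
    if not text.strip():
        raise ValueError("text must not be empty")
    try:
        run = TTS_BACKENDS[backend]
    except KeyError:
        raise ValueError(f"unknown TTS backend: {backend!r}") from None
    return run(text, out_path)
```

A registry keeps the UI code independent of any one provider: adding a third engine means adding one function and one dictionary entry.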

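🔥 4. Outlook (human-computer interaction)

I designed a human-computer interaction system built on AI techniques including natural language processing, speech processing, and image processing. The system aims for highly realistic multimodal interaction with a digital human, to give users a more natural and personal experience. As the figure shows, it consists of four core modules:

(1) An automatic speech recognition (ASR) module, which converts the user's speech input into text.

(2) A dialogue system (DS), which receives the text output by the ASR module and handles the conversation.

(3) A text-to-speech (TTS) module, which converts the text output by the DS module into highly realistic speech.

(4) A digital human generation module, which preprocesses the input images and videos to extract facial features. The model then maps the low-dimensional speech signal produced by the TTS module to a high-dimensional video signal covering mouth shape, expression, and motion. Finally, it uses a neural network to fuse the features, produces the multimodal output video, and displays it on the client.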

[Figure: HCI system]
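The data flow between the four modules can be sketched as a chain of callables. This is only an illustration of the control flow described above, not the repository's implementation: the class `AvatarPipeline`, its field names, and the stub functions are all hypothetical, and real modules would wrap actual ASR, dialogue, TTS, and video generation models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AvatarPipeline:
    """Chains the four modules; each field is a plain callable (hypothetical API)."""

    asr: Callable[[str], str]          # audio path -> recognized text
    dialogue: Callable[[str], str]     # user text -> reply text
    tts: Callable[[str], str]          # reply text -> path of synthesized audio
    render: Callable[[str, str], str]  # (face image path, audio path) -> video path

    def respond(self, user_audio: str, face_image: str) -> str:
        text = self.asr(user_audio)             # (1) speech to text
        reply = self.dialogue(text)             # (2) generate a reply
        speech = self.tts(reply)                # (3) text to speech
        return self.render(face_image, speech)  # (4) drive the digital human


if __name__ == "__main__":
    # Stub modules so the control flow can be run without any models.
    pipeline = AvatarPipeline(
        asr=lambda audio: "hello",
        dialogue=lambda text: f"you said: {text}",
        tts=lambda reply: "reply.wav",
        render=lambda face, audio: f"{face}+{audio}.mp4",
    )
    print(pipeline.respond("question.wav", "face.png"))
```

Keeping each module behind a plain callable means any one of them can be swapped, for example replacing the TTS engine, without touching the others.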

🎯 TO DO LIST

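This repository, https://github.com/Kedreamix/PaddleAvatar, already implements modules 3 and 4, but it is still some way from a complete human-computer interaction system, so there is more work to do:

  • Add expression transfer (richer head motion)
  • Real-time speech recognition (so that a person and the digital human can talk by voice)
  • Voice cloning (synthesize speech in your own voice, making the digital double more realistic and interactive)
  • A GPT-like dialogue system (makes the digital human more interactive, realistic, and intelligent)

✨ Try it online on AI Studio

The AI Studio app deployment seems to take a long time to load, though, and I don't know of a faster way.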

https://aistudio.baidu.com/aistudio/projectdetail/6154230

paddleavatar's People

Contributors

kedreamix

paddleavatar's Issues

error: (-215:Assertion failed) !ssize.empty() in function 'resize'

2023-06-22 08:39:51.174 Uncaught app exception
Traceback (most recent call last):
  File "/data/conda/avatar/lib/python3.8/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
    exec(code, module.__dict__)
  File "/data/conda/avatar/PaddleAvatar/avatar.streamlit.py", line 169, in <module>
    fom(face.name,'zimeng.mp4')
  File "/data/conda/avatar/PaddleAvatar/avatar.streamlit.py", line 125, in fom
    fom_predictor.run(input_face, driving_video)
  File "/data/conda/avatar/lib/python3.8/site-packages/ppgan/apps/first_order_predictor.py", line 190, in run
    face_image = cv2.resize(face_image, (self.image_size, self.image_size)) / 255.0
cv2.error: OpenCV(4.7.0) /io/opencv/modules/imgproc/src/resize.cpp:4062: error: (-215:Assertion failed) !ssize.empty() in function 'resize'

Got a problem, "The registered buffer should be a Paddle.Tensor, but received Variable. ", how to fix it

[12/18 16:15:09] ppgan INFO: Found /home/wmao/.cache/ppgan/vox-cpk.pdparams
2023-12-18 16:15:09.347 Uncaught app exception
Traceback (most recent call last):
  File "/home/wmao/.local/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 534, in _run_script
    exec(code, module.__dict__)
  File "/data/PaddleAvatar/avatar.streamlit.py", line 169, in <module>
    fom(face.name,'zimeng.mp4')
  File "/data/PaddleAvatar/avatar.streamlit.py", line 119, in fom
    fom_predictor = FirstOrderPredictor(filename = output,
  File "/home/wmao/.local/lib/python3.9/site-packages/ppgan/apps/first_order_predictor.py", line 114, in __init__
    self.generator, self.kp_detector = self.load_checkpoints(
  File "/home/wmao/.local/lib/python3.9/site-packages/ppgan/apps/first_order_predictor.py", line 225, in load_checkpoints
    generator = OcclusionAwareGenerator(
  File "/home/wmao/.local/lib/python3.9/site-packages/ppgan/models/generators/occlusion_aware.py", line 35, in __init__
    self.dense_motion_network = DenseMotionNetwork(
  File "/home/wmao/.local/lib/python3.9/site-packages/ppgan/modules/dense_motion.py", line 93, in __init__
    self.down = AntiAliasInterpolation2d(num_channels,
  File "/home/wmao/.local/lib/python3.9/site-packages/ppgan/modules/first_order.py", line 457, in __init__
    self.register_buffer('weight', kernel)
  File "/home/wmao/.local/lib/python3.9/site-packages/paddle/nn/layer/layers.py", line 1075, in register_buffer
    raise TypeError(
TypeError: The registered buffer should be a Paddle.Tensor, but received Variable.

Could someone more experienced tell me how to fix this problem?

ModuleNotFoundError: No module named 'paddle.nn.layer.layers'
Traceback:
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 562, in _run_script
    exec(code, module.__dict__)
  File "/home/aistudio/PaddleAvatar/PaddleAvatar/avatar.streamlit.py", line 10, in <module>
    from paddlespeech.cli.tts import TTSExecutor
  File "/home/aistudio/.data/webide/pip/lib/python3.7/site-packages/paddlespeech/cli/tts/__init__.py", line 14, in <module>
    from .infer import TTSExecutor
  File "/home/aistudio/.data/webide/pip/lib/python3.7/site-packages/paddlespeech/cli/tts/infer.py", line 33, in <module>
    from paddlespeech.t2s.exps.syn_utils import get_am_inference
  File "/home/aistudio/.data/webide/pip/lib/python3.7/site-packages/paddlespeech/t2s/exps/syn_utils.py", line 35, in <module>
    from paddlespeech.t2s.frontend.mix_frontend import MixFrontend
  File "/home/aistudio/.data/webide/pip/lib/python3.7/site-packages/paddlespeech/t2s/frontend/mix_frontend.py", line 22, in <module>
    from paddlespeech.t2s.frontend.zh_frontend import Frontend
  File "/home/aistudio/.data/webide/pip/lib/python3.7/site-packages/paddlespeech/t2s/frontend/zh_frontend.py", line 31, in <module>
    from paddlespeech.t2s.frontend.g2pw import G2PWOnnxConverter
  File "/home/aistudio/.data/webide/pip/lib/python3.7/site-packages/paddlespeech/t2s/frontend/g2pw/__init__.py", line 1, in <module>
    from .onnx_api import G2PWOnnxConverter
  File "/home/aistudio/.data/webide/pip/lib/python3.7/site-packages/paddlespeech/t2s/frontend/g2pw/onnx_api.py", line 28, in <module>
    from paddlenlp.transformers import BertTokenizer
  File "/home/aistudio/.data/webide/pip/lib/python3.7/site-packages/paddlenlp/__init__.py", line 35, in <module>
    from . import (
  File "/home/aistudio/.data/webide/pip/lib/python3.7/site-packages/paddlenlp/data/__init__.py", line 18, in <module>
    from .data_collator import *
  File "/home/aistudio/.data/webide/pip/lib/python3.7/site-packages/paddlenlp/data/data_collator.py", line 26, in <module>
    from ..transformers import BertTokenizer
  File "/home/aistudio/.data/webide/pip/lib/python3.7/site-packages/paddlenlp/transformers/__init__.py", line 17, in <module>
    from .model_utils import PretrainedModel, register_base_model
  File "/home/aistudio/.data/webide/pip/lib/python3.7/site-packages/paddlenlp/transformers/model_utils.py", line 68, in <module>
    from ..generation import GenerationConfig, GenerationMixin
  File "/home/aistudio/.data/webide/pip/lib/python3.7/site-packages/paddlenlp/generation/__init__.py", line 15, in <module>
    from .logits_process import (
  File "/home/aistudio/.data/webide/pip/lib/python3.7/site-packages/paddlenlp/generation/logits_process.py", line 23, in <module>
    from paddle.nn.layer.layers import in_declarative_mode

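PaddleSpeech speech synthesis

When PWGan is selected for speech synthesis, the input text has no effect: the synthesized audio still says "你好,我是数字人分身,很高兴认识大家!" ("Hello, I am a digital double, nice to meet you all!"). With the high-quality male voice, custom input text does work. Is this a bug, or is custom text only supported for that voice?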
