
text2video's Introduction

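A text-to-video tool

This tool converts a piece of text into a video and saves it to a specified local path. It was originally built to make novels readable in a visual form.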

Sample output:

(sample output image)

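How it works

  • Split the text into segments. For lack of a better method, the text is currently split on the full stop (。) into individual sentences.
  • Generate an image and audio for each sentence. Many open-source image generators exist; this project uses stable-diffusion, and text-to-speech uses edge-tts.
  • Use a large language model to produce Midjourney-style prompts, then generate the images with a model hosted on Hugging Face.
  • Combine the images into a video with opencv. The output is currently an mp4 file, with each sentence overlaid as a subtitle along the bottom of the frame.
  • Audio has an inherent duration, which is used to control how long each image stays on screen.
  • Merge the audio into the original video with ffmpeg.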
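The first step, splitting on sentence-ending punctuation, might look roughly like the sketch below. The function name `split_sentences` is hypothetical, and treating `!` and `?` as sentence ends in addition to `。` is an assumption for illustration; the project itself is only described as splitting on the full stop.

```python
import re

# Zero-width split point right after a Chinese sentence-ending mark.
# Including "!" and "?" is an assumption; the README only mentions "。".
_SENTENCE_END = re.compile(r"(?<=[。!?])")


def split_sentences(text: str) -> list[str]:
    """Split text after each sentence-ending mark and drop empty pieces."""
    parts = _SENTENCE_END.split(text)
    return [part.strip() for part in parts if part.strip()]
```

Each returned sentence keeps its trailing punctuation, so it can be used both as the text-to-speech input and as the subtitle text.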

The result is a video with images, subtitles, and sound: a working text-to-video conversion.
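The timing and merge steps can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the helper names, the default of 24 frames per second, and the choice of AAC audio are all hypothetical. The ffmpeg options used (`-y`, `-i`, `-c:v copy`, `-c:a aac`, `-shortest`) are standard ffmpeg flags.

```python
import math


def frames_for_audio(duration_seconds: float, fps: int = 24) -> int:
    """How many times to write a still image so it lasts as long as its audio clip."""
    return max(1, math.ceil(duration_seconds * fps))


def ffmpeg_mux_command(video_path: str, audio_path: str, output_path: str) -> list[str]:
    """Build an ffmpeg command that keeps the video stream and adds the audio track."""
    return [
        "ffmpeg", "-y",       # overwrite the output file if it exists
        "-i", video_path,     # silent video written by opencv
        "-i", audio_path,     # narration audio
        "-c:v", "copy",       # do not re-encode the video stream
        "-c:a", "aac",        # encode the audio as AAC for the mp4 container
        "-shortest",          # stop at the end of the shorter stream
        output_path,
    ]
```

The returned list can be passed to `subprocess.run(command, check=True)` once ffmpeg is installed.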

One-command start with Docker

docker-compose up --build

Local development

The development environment is macOS with Python 3.10.12; other environments may have compatibility issues. ffmpeg must also be installed.

ffmpeg -version
ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
built with Apple clang version 14.0.3 (clang-1403.0.22.14.1)

pip install -r requirements.txt

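Generating drawing prompts to improve image quality

This requires an OpenAI API key. A proxy or compatible endpoint can be set through the base URL.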

OPEN_AI_API_KEY="your open ai api key"
OPEN_AI_BASE_URL="https://api.moonshot.cn/v1" # for moonshot demo
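As a rough sketch of how these two variables could be read at startup: the function name `load_llm_config` and the fallback to `https://api.openai.com/v1` are assumptions for illustration, not the project's actual code.

```python
import os


def load_llm_config(env=None) -> dict:
    """Read the API key and base URL from the environment (or a dict, for testing)."""
    env = os.environ if env is None else env
    api_key = env.get("OPEN_AI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPEN_AI_API_KEY is not set")
    # Assumed default: the public OpenAI endpoint when no base URL is configured.
    base_url = env.get("OPEN_AI_BASE_URL", "https://api.openai.com/v1").strip()
    return {"api_key": api_key, "base_url": base_url.rstrip("/")}
```

Failing early on a missing key gives a clearer message than an authentication error from the first request.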

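Generating a Hugging Face API key

Request a token at: https://huggingface.co/settings/tokens

This project generates images with open-source text-to-image models hosted on Hugging Face, and these models give poor results for Chinese prompts. The project therefore translates Chinese text first, using Youdao Translate for convenience (thanks, Youdao). Image quality improves somewhat after translation.

The token can be written to the .env file: API_TOKEN="your huggingface api token"

If you use pollinations-ai, the token can be left empty. This model reportedly uses OpenAI's DALL-E 2 model.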
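To show where the token ends up, here is a minimal sketch that builds (but does not send) an image request. The endpoint URL pattern, the `{"inputs": ...}` payload shape, and the helper name `build_image_request` are assumptions for illustration; the project may call Hugging Face differently.

```python
import json
import urllib.request


def build_image_request(model_id: str, prompt: str, api_token: str = "") -> urllib.request.Request:
    """Build a POST request for a hosted text-to-image model; the token is optional."""
    # Assumed URL pattern for the hosted inference endpoint.
    url = f"https://api-inference.huggingface.co/models/{model_id}"
    headers = {"Content-Type": "application/json"}
    if api_token:
        # The token is sent as a bearer token in the Authorization header.
        headers["Authorization"] = f"Bearer {api_token}"
    body = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Sending the request with `urllib.request.urlopen(request)` would return the response body, which for image models is typically the raw image bytes.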

Installing ffmpeg

ffmpeg is required to merge the audio into the video.

Getting started

python3.10 app.py
http://127.0.0.1:5001/

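Sponsorship

Tips are welcome; please include your GitHub username in the note. (QR code image)

Follow the author's WeChat official account, 老码沉思录, to get in touch with the author. (QR code image)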

License: MIT

This project is licensed under the MIT License.

text2video's People

Contributors

bravekingzhang, brzhang666


text2video's Issues

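Startup error: 'Not Found', 'path': '/translate'

After installation the dotenv module was missing, so I installed it manually. On startup I then got the following message: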
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.

From what I found, this is only a warning, but when I continued I got the following error:
127.0.0.1 - - [31/Mar/2024 10:48:11] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [31/Mar/2024 10:48:11] "GET /models HTTP/1.1" 200 -
127.0.0.1 - - [31/Mar/2024 10:48:11] "GET /favicon.ico HTTP/1.1" 404 -
zh-cn
{'timestamp': '2024-03-31T02:48:17.078+00:00', 'status': 404, 'error': 'Not Found', 'path': '/translate'}

Error on Windows

The error log is as follows:
zh-cn
{'type': 'ZH_CN2EN', 'errorCode': 0, 'elapsedTime': 0, 'translateResult': [[{'src': '白云朵朵', 'tgt': 'White clouds'}]]}
[2023-06-14 11:21:43,288] ERROR in app: Exception on /convert [POST]
Traceback (most recent call last):
File "C:\Users\17892\AppData\Roaming\Python\Python311\site-packages\flask\app.py", line 2190, in wsgi_app
response = self.full_dispatch_request()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\17892\AppData\Roaming\Python\Python311\site-packages\flask\app.py", line 1486, in full_dispatch_request
rv = self.handle_user_exception(e)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\17892\AppData\Roaming\Python\Python311\site-packages\flask\app.py", line 1484, in full_dispatch_request
rv = self.dispatch_request()
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\17892\AppData\Roaming\Python\Python311\site-packages\flask\app.py", line 1469, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\text2video\app.py", line 21, in convert_text_to_video
video_path = convertTextToVideo(validate_model(model), text)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\text2video\text_to_video.py", line 102, in convertTextToVideo
generateImage(model, sentence.strip())
File "D:\text2video\text_to_video.py", line 48, in generateImage
convert_text_to_speech(
File "D:\text2video\text_to_video.py", line 61, in convert_text_to_speech
result = subprocess.run(command, cwd=current_directory, timeout=10)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\subprocess.py", line 548, in run
with Popen(*popenargs, **kwargs) as process:
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\subprocess.py", line 1024, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Python311\Lib\subprocess.py", line 1509, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified.

It looks like the text-to-speech step failed.
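The traceback shows `subprocess.run` raising `FileNotFoundError`, which usually means the executable being launched is not on PATH. Which command is missing is not shown in the log, so the sketch below is only one way to surface that more clearly; the wrapper name `run_cli` is hypothetical.

```python
import shutil
import subprocess


def run_cli(command: list[str], timeout: float = 10):
    """Run an external command, failing with a clear message if it is not installed."""
    executable = shutil.which(command[0])
    if executable is None:
        raise RuntimeError(
            f"'{command[0]}' was not found on PATH; install it or add its folder to PATH"
        )
    # Use the resolved path so Windows finds .exe/.cmd launchers reliably.
    return subprocess.run([executable, *command[1:]], timeout=timeout, check=True)
```

For example, `run_cli(["edge-tts", "--version"])` would either run the tool or report that it is not installed.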

Error during conversion after starting with Docker

192.168.65.1 - - [24/Apr/2024 02:11:47] "POST /convert HTTP/1.1" 500 -
app-1 | Traceback (most recent call last):
app-1 | File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 1498, in __call__
app-1 | return self.wsgi_app(environ, start_response)
app-1 | File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 1476, in wsgi_app
app-1 | response = self.handle_exception(e)
app-1 | File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 1473, in wsgi_app
app-1 | response = self.full_dispatch_request()
app-1 | File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 882, in full_dispatch_request
app-1 | rv = self.handle_user_exception(e)
app-1 | File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 880, in full_dispatch_request
app-1 | rv = self.dispatch_request()
app-1 | File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 865, in dispatch_request
app-1 | return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
app-1 | File "/app/app.py", line 21, in convert_text_to_video
app-1 | video_path = convertTextToVideo(validate_model(model), text)
app-1 | File "/app/text_to_video.py", line 129, in convertTextToVideo
app-1 | clear_folder("images")
app-1 | File "/app/text_to_video.py", line 106, in clear_folder
app-1 | for filename in os.listdir(folder_path):
app-1 | FileNotFoundError: [Errno 2] No such file or directory: 'images'
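The failure comes from `os.listdir` being called on an `images` folder that does not exist in the container. A minimal sketch of a more tolerant `clear_folder` follows; the body is an assumption, since only the function name and the failing line appear in the traceback.

```python
import os


def clear_folder(folder_path: str) -> None:
    """Create the folder if needed, then delete the files directly inside it."""
    os.makedirs(folder_path, exist_ok=True)  # avoids FileNotFoundError on first run
    for filename in os.listdir(folder_path):
        file_path = os.path.join(folder_path, filename)
        if os.path.isfile(file_path):
            os.remove(file_path)
```

Creating the folder ahead of time, for example in the Docker image or a mounted volume, would also avoid the error.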

ModuleNotFoundError: No module named 'dotenv'

python3.10 app.py
Traceback (most recent call last):
File "/Users/wangwenjie/Documents/text2video/app.py", line 3, in <module>
from text_to_video import convertTextToVideo
File "/Users/wangwenjie/Documents/text2video/text_to_video.py", line 7, in <module>
from dotenv import load_dotenv
ModuleNotFoundError: No module named 'dotenv'

I then ran: pip install python-dotenv
It still does not work.
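A common cause of this symptom is that `pip` installs into a different interpreter than the one running `app.py`. That is an assumption here, since the report does not show which `pip` was used. The hypothetical helpers below check whether the running interpreter can see a module and print an install command bound to that same interpreter.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the running interpreter can import the named top-level module."""
    return importlib.util.find_spec(name) is not None


def install_hint(package: str) -> str:
    """Build a pip command that targets the interpreter running this script."""
    return f'"{sys.executable}" -m pip install {package}'


if __name__ == "__main__":
    if not module_available("dotenv"):
        print("dotenv is missing for this interpreter; try:")
        print(install_hint("python-dotenv"))
```

Run with the same launcher used for the app, for example `python3.10 check_env.py` (a hypothetical file name), so the hint points at the right interpreter.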

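Proxy settings need to be supported

A VPN is required, but there is nowhere to configure a proxy, and it has to be Python 3.10.12. I tried several other versions and they all failed; it is still failing now.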
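One possible workaround, sketched under assumptions: many Python HTTP clients read the `http_proxy` and `https_proxy` environment variables, so setting them before the app makes any request may be enough. Whether every library this project uses honors these variables is not confirmed by the source, and the helper name and proxy address are hypothetical.

```python
import os
import urllib.request


def configure_proxy(proxy_url: str) -> dict:
    """Set proxy environment variables and return what urllib now detects."""
    for name in ("http_proxy", "https_proxy"):
        os.environ[name] = proxy_url
    # getproxies() reports the proxy mapping derived from the environment.
    return urllib.request.getproxies()


if __name__ == "__main__":
    # Hypothetical local proxy address; replace with your own.
    print(configure_proxy("http://127.0.0.1:7890"))
```

The same variables can be exported in the shell or added to the Docker environment instead of being set in code.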

Got an error

CompletedProcess(args=['edge-tts', '--voice', 'zh-CN-XiaoyiNeural', '--text', '天空蔚蓝', '--write-media', 'voices/1695236088-pollinations-ai.mp3', '--write-subtitles', 'voices/1695236088-pollinations-ai.mp3.vtt'], returncode=0)
voices/1695236088-pollinations-ai.mp3.vtt
Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
Loading model cost 1.706 seconds.
Prefix dict has been built successfully.
[2023-09-21 02:54:54,993] ERROR in app: Exception on /convert [POST]
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2190, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1486, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1484, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1469, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "app.py", line 21, in convert_text_to_video
video_path = convertTextToVideo(validate_model(model), text)
File "/root/tts/text2video/text_to_video.py", line 126, in convertTextToVideo
add_text_to_image(draw_text, image_path,
File "/root/tts/text2video/add_text_to_image.py", line 30, in add_text_to_image
if draw.textsize(current_line + word, font=font)[0] <= target_size[0] - 2 * padding:
AttributeError: 'ImageDraw' object has no attribute 'textsize'
127.0.0.1 - - [21/Sep/2023 02:54:55] "POST /convert HTTP/1.1" 500 -

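FEATURE: new feature notes

  • Supports using OpenAI to generate image prompts, which greatly improves image quality. Those with the budget can deploy ByteDance/SDXL-Lightning themselves.
  • Removed translate, since images generated that way were not very good.
  • Supports Kimi, configured in much the same way as the OpenAI key and base URL.
  • Supports one-command start with Docker; remember to configure the relevant keys in .env.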

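Generated images

It would be great if the generated images were consistent. As it is, the style varies from one image to the next, which looks odd.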

textsize was removed in Pillow 10.0.0

The textsize method of ImageDraw objects was removed in Pillow 10.0.0, so I replaced it with the textbbox method:

from PIL import Image, ImageDraw, ImageFont
import jieba

def add_text_to_image(text, image_path, text_color, background, padding=10, target_size=(640, 480), font_path="fonts/Hiragino Sans GB.ttc", font_size=16):
    # Open the image
    image = Image.open(image_path)

    # Resize the image to the target size (the aspect ratio is not preserved)
    image = image.resize(target_size)

    # Convert the image to RGBA mode
    image = image.convert("RGBA")

    # Create a new transparent image with the same size as the original image
    overlay = Image.new("RGBA", image.size, (0, 0, 0, 0))

    # Create a draw object for the overlay image
    draw = ImageDraw.Draw(overlay)

    # Load the font
    font = ImageFont.truetype(font_path, font_size)

    # Split the text into lines if it exceeds the available width
    lines = []

    # Segment the text into words with jieba
    words = [char for char in jieba.cut(text)]
    current_line = words[0]
    for word in words[1:]:
        if draw.textbbox((0, 0), current_line + word, font=font)[2] <= target_size[0] - 2 * padding:  # changed here
            current_line += word
        else:
            lines.append(current_line)
            current_line = word
    lines.append(current_line)

    # Calculate the height of the background rectangle based on the number of lines
    text_height = draw.textbbox((0, 0), lines[0], font=font)[3]  # changed here

    if len(lines) == 1:
        box_height = text_height + padding * 2
    else:
        box_height = (text_height + padding) * len(lines) +  padding

    # Calculate the position of the background rectangle
    box_position = ((target_size[0] - draw.textbbox((0, 0), lines[0], font=font)
                    [2]) // 2 - padding, target_size[1] - box_height - padding)  # changed here

    # Calculate the width of the background rectangle
    box_width = draw.textbbox((0, 0), lines[0], font=font)[2] + 2 * padding  # changed here

    # Draw a rectangle with the specified background color and alpha
    draw.rectangle(
        (box_position, (box_position[0] + box_width, box_position[1] + box_height)), fill=background)

    # Calculate the starting y-position for drawing the text
    start_y = box_position[1] + padding

    # Draw the text on the overlay image
    for line in lines:
        text_width, text_height = draw.textbbox((0, 0), line, font=font)[2:]  # changed here
        text_position = ((target_size[0] - text_width) // 2, start_y)
        print(line, text_position)
        draw.text(text_position, line, font=font, fill=text_color)
        start_y += text_height + padding

    # Paste the overlay image onto the original image using alpha composite
    image = Image.alpha_composite(image, overlay)

    # Convert the image back to RGB mode
    image = image.convert("RGB")

    # Save the resulting image
    image.save(image_path)

API

Hello! Do you have an API interface?
