videocr

Extract hardcoded (burned-in) subtitles from videos using the PaddleOCR OCR engine with Python.

# example.py

from videocr import save_subtitles_to_file

if __name__ == '__main__':
    save_subtitles_to_file(
        'example_cropped.mp4', 'example.srt', lang='ch',
        time_start='7:10', time_end='7:34',
        sim_threshold=80, conf_threshold=75, use_fullframe=True,
        brightness_threshold=210, similar_image_threshold=1000, frames_to_skip=1)

$ python3 example.py

example.srt:

0
00:07:10,000 --> 00:07:10,083
商城......现在没什么东西

1
00:07:10,416 --> 00:07:12,000
这边是战斗辅助系统

2
00:07:13,083 --> 00:07:14,500
要进去才能了解了

3
00:07:15,083 --> 00:07:15,916
没问题了吧

4
00:07:16,333 --> 00:07:17,166
我们准备登录

5
00:07:18,416 --> 00:07:21,083
啊对了, 登录没有服务器的选择么

6
00:07:21,333 --> 00:07:25,000
没有本游戏所有玩家, 都在个服务器内

7
00:07:25,833 --> 00:07:28,833
刺激了, 这么多玩家居然都不分流的么

8
00:07:29,500 --> 00:07:31,083
那......现在登录吗?

9
00:07:31,166 --> 00:07:32,416
好,登录吧!

Install prerequisites

  1. Python 3.7
  2. PaddleOCR
    • 2.0+ (Recommended): download the latest release from https://github.com/PaddlePaddle/PaddleOCR/releases, unzip it, and run python -m pip install -e . from the root project directory (at the time of writing, pip does not appear to offer the latest version)
    • or 1.1: python -m pip install paddleocr==1.1.1
  3. PaddlePaddle: run python -m pip install paddlepaddle, or, to run OCR on a CUDA 9 or CUDA 10 GPU, run python -m pip install paddlepaddle-gpu

Installation

  1. Clone or download and extract this repo
  2. From the root directory of this repository run python -m pip install -e .

Performance

The OCR process can be very slow on CPU. Running with paddlepaddle-gpu is recommended if you have a CUDA 9 or CUDA 10 GPU.

Tips

To reduce the time spent on OCR for each frame, you can use a tool such as ffmpeg to crop the video down to the area where the subtitles appear. When cropping, leave some buffer space above and below the text to ensure accurate readings.
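
As a rough sketch of the cropping step, the snippet below builds an ffmpeg command that keeps only the bottom third of each frame. The helper name build_crop_command, the file names, and the bottom-third region are assumptions for illustration, not part of videocr; adjust the crop expression to wherever your subtitles actually appear. It relies on ffmpeg's crop filter, whose arguments are width:height:x:y.

```python
import shlex


def build_crop_command(src, dst, region="in_w:in_h/3:0:2*in_h/3"):
    """Return an ffmpeg argument list that crops `src` into `dst`.

    `region` is an ffmpeg crop expression (width:height:x:y). The default
    keeps the full width and the bottom third of the frame, which is only
    an assumption about where the subtitles sit.
    """
    return [
        "ffmpeg",
        "-i", src,
        "-vf", f"crop={region}",  # video filter: crop to the subtitle band
        "-c:a", "copy",           # leave the audio stream untouched
        dst,
    ]


if __name__ == "__main__":
    cmd = build_crop_command("example.mp4", "example_cropped.mp4")
    # Print a shell-ready command; pass `cmd` to subprocess.run to execute it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Because the cropped video contains only the subtitle band, it makes sense to pass use_fullframe=True when running OCR on it, as example.py does.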

Quick Configuration Cheatsheet

| Setting | More Speed | More Accuracy | Notes |
| --- | --- | --- | --- |
| Prebuilt PaddleOCR Models | Use default 'mobile' models | Use 'server' models | Running on CPU, 'server' models take significantly more time to run. |
| Input Video Quality | Use lower quality | Use higher quality | The performance impact of using higher resolution video can be reduced with cropping. |
| frames_to_skip | Higher number | Lower number | |
| brightness_threshold | Higher threshold | N/A | A brightness threshold can help speed up the OCR process by filtering out dark frames. In certain circumstances, such as when subtitles are white and against a bright background, it may also help with accuracy. |

API

  1. Return subtitle string in SRT format

    get_subtitles(
        video_path: str, lang='ch', time_start='0:00', time_end='',
        conf_threshold=75, sim_threshold=80, use_fullframe=False,
        det_model_dir=None, rec_model_dir=None,
        brightness_threshold=None, similar_image_threshold=100, similar_pixel_threshold=25, frames_to_skip=1)
  2. Write subtitles to file_path

    save_subtitles_to_file(
        video_path: str, file_path='subtitle.srt', lang='ch', time_start='0:00', time_end='', 
        conf_threshold=75, sim_threshold=80, use_fullframe=False,
        det_model_dir=None, rec_model_dir=None,
        brightness_threshold=None, similar_image_threshold=100, similar_pixel_threshold=25, frames_to_skip=1)

Parameters

  • lang

    The language of the subtitles, given as a PaddleOCR language code. The default is 'ch' (Chinese).

  • conf_threshold

    Confidence threshold for word predictions. Words with lower confidence than this value will be discarded. The default value 75 is fine for most cases.

    Make it closer to 0 if you get too few words in each line, or make it closer to 100 if there are too many excess words in each line.

  • sim_threshold

    Similarity threshold for subtitle lines. Subtitle lines with larger Levenshtein ratios than this threshold will be merged together. The default value 80 is fine for most cases.

    Make it closer to 0 if you get too many duplicated subtitle lines, or make it closer to 100 if you get too few subtitle lines.

  • time_start and time_end

    Extract subtitles from only a clip of the video. The subtitle timestamps are still calculated according to the full video length.

  • use_fullframe

    By default, only the bottom third of each frame is used for OCR. You can explicitly use the full frame if your subtitles are not within the bottom third of each frame.

  • det_model_dir

    The folder containing the text detection inference model. Pass None to automatically download the built-in model to ~/.paddleocr/det, or pass the path to a specific inference model. The folder at that path must contain the model and params files.

    Prebuilt detection models (including bigger/slower ones with better accuracy than the default mobile models) can be found here: https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/models_list_en.md#1-text-detection-model.

  • rec_model_dir

    The folder containing the text recognition inference model. Pass None to automatically download the built-in model to ~/.paddleocr/rec, or pass the path to a specific inference model. The folder at that path must contain the model and params files.

    Prebuilt recognition models (including bigger/slower ones with better accuracy than the default mobile models) can be found here: https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/models_list_en.md#2-text-recognition-model.

  • brightness_threshold

    If set, pixels whose brightness is less than the threshold will be blackened out. Valid brightness values range from 0 (black) to 255 (white). This can help improve accuracy when performing OCR on videos with white subtitles.

  • similar_image_threshold

    The number of non-similar pixels there can be before the program considers 2 consecutive frames to be different. If a frame is not different from the previous frame, the OCR result from the previous frame is reused, which can save a lot of time depending on how long each OCR inference takes.

  • similar_pixel_threshold

    Brightness threshold from 0-255 used with the similar_image_threshold to determine if 2 consecutive frames are different. If the difference between 2 pixels exceeds the threshold, then they will be considered non-similar.

  • frames_to_skip

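    The number of frames to skip before sampling a frame for OCR. Check the fps of the input video before increasing this value: the more frames are skipped, the coarser the subtitle timestamps become, and subtitles that are on screen only briefly may be missed.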
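
To illustrate how conf_threshold and sim_threshold described above might act on OCR results, here is a minimal sketch in plain Python. The function names, the (text, confidence) input shape, and the similarity formula (edit distance normalised by the longer string, on a 0-100 scale) are assumptions for illustration; videocr's own Levenshtein ratio may be computed differently.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Similarity on a 0-100 scale; 100 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def filter_words(words, conf_threshold=75):
    """Keep the text of (text, confidence) pairs that meet the threshold."""
    return [text for text, conf in words if conf >= conf_threshold]


def merge_similar(lines, sim_threshold=80):
    """Drop a line when it is too similar to the last line that was kept."""
    merged = []
    for text in lines:
        if merged and similarity(merged[-1], text) > sim_threshold:
            continue  # treated as the same subtitle seen in a later frame
        merged.append(text)
    return merged


if __name__ == "__main__":
    print(filter_words([("hello", 92), ("wor1d", 40)]))
    print(merge_similar(["hello world", "hello w0rld", "goodbye"]))
```

With this formulation, lowering sim_threshold merges more aggressively (fewer duplicated lines), while raising it keeps more near-duplicates as separate lines, matching the tuning advice for that parameter.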
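
The frame-level parameters brightness_threshold, similar_pixel_threshold, and similar_image_threshold can be sketched in the same way. This is not videocr's implementation: it assumes each frame is a grayscale image stored as a list of rows of 0-255 integers, that pixels are compared at the same position in both frames, and that a frame counts as different once the number of changed pixels exceeds similar_image_threshold. All function names are hypothetical.

```python
def apply_brightness_threshold(frame, brightness_threshold):
    """Blacken pixels darker than the threshold (frame: rows of 0-255 values)."""
    return [[p if p >= brightness_threshold else 0 for p in row] for row in frame]


def count_non_similar(frame_a, frame_b, similar_pixel_threshold=25):
    """Count positions whose brightness differs by more than the threshold."""
    return sum(
        1
        for row_a, row_b in zip(frame_a, frame_b)
        for pa, pb in zip(row_a, row_b)
        if abs(pa - pb) > similar_pixel_threshold
    )


def frames_differ(frame_a, frame_b,
                  similar_image_threshold=100, similar_pixel_threshold=25):
    """True when enough pixels changed that OCR should be run again."""
    changed = count_non_similar(frame_a, frame_b, similar_pixel_threshold)
    return changed > similar_image_threshold


if __name__ == "__main__":
    prev = [[255, 10], [200, 30]]
    curr = [[255, 10], [90, 30]]
    print(apply_brightness_threshold(curr, 210))  # [[255, 0], [0, 0]]
    print(count_non_similar(prev, curr))          # 1
    print(frames_differ(prev, curr, similar_image_threshold=0))  # True
```

When frames_differ returns False, the previous frame's OCR result can simply be reused, which is where the time saving comes from.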

TODO

  • parallel processing
  • handle multiple lines of text in the same frame
  • publish to PyPI
  • command-line interface
