ocrvid

CLI tool to extract text from videos using OCR on macOS.

Note

Currently, this tool is only tested on, and only works on, macOS 13 or later.

Caution

This tool is still in an early stage of development. Current v0.x releases are not stable and may introduce breaking changes.

Installation

Install this tool using pip:

pip install ocrvid

Usage

Usage: ocrvid [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  detect  Run OCR on a single picture, and print the results as json
  langs   Show supported recognition languages
  props   Show properties of video file
  run     Run OCR on a video, and save result as a json file

Run OCR on a video

Use the ocrvid run subcommand to run OCR on a video file:

Usage: ocrvid run [OPTIONS] INPUT_VIDEO

  Run OCR on a video, and save result as a json file

Options:
  -o, --output FILE            Path to output json file. By default, if you run
                               `ocrvid run some/video.mp4` then the output file
                               will be `./video.json`
  -fd, --frames-dir DIRECTORY  If passed, then save video frames to this
                               directory. By default, frames are not saved.
  -fs, --frame-step INTEGER    Number of frames to skip between each frame to be
                               processed. By default, 100 which means every 100
                               frames, 1 frame will be processed.
  -bs, --by-second FLOAT       If passed, then process 1 frame every N seconds.
                               This option relies on fps metadata of the video.
  -l, --langs TEXT             Prefered languages to detect, ordered by
                               priority. See avalable languages run by `ocrvid
                               langs`. If not passed, language is auto detected.
  --help                       Show this message and exit.
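The relationship between --frame-step and --by-second comes down to a little arithmetic. The sketch below is not taken from ocrvid's source: the helper names `step_from_seconds` and `frame_indices` are hypothetical, and it assumes that sampling starts at frame 0 and that an interval in seconds is converted to a frame step by multiplying it by the video's fps.

```python
from typing import Iterator


def step_from_seconds(fps: float, seconds: float) -> int:
    """Convert an interval in seconds to a frame step (assumed conversion)."""
    # Clamp to at least 1 so that a very small interval still advances.
    return max(1, round(fps * seconds))


def frame_indices(total_frames: int, frame_step: int = 100) -> Iterator[int]:
    """Yield the indices of the frames that would be processed (assumes start at 0)."""
    return iter(range(0, total_frames, frame_step))


if __name__ == "__main__":
    # A 250-frame video sampled with the default step of 100.
    print(list(frame_indices(250)))
    # One frame every 2 seconds of a 30 fps video.
    print(step_from_seconds(30.0, 2.0))
```

Under these assumptions, a longer step trades coverage for speed: text that stays on screen for less time than the step spans may never be sampled.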

For example, run against the test video file at tests/video/pexels-eva-elijas.mp4 in this repo:

ocrvid run tests/video/pexels-eva-elijas.mp4

Then pexels-eva-elijas.json is generated in the current directory, and looks like this:

{
    "video_file":"tests/video/pexels-eva-elijas.mp4",
    "frames":[
        {
            "frame_index":0,
            "results":[
                {
                    "text":"INSPIRING WORDS",
                    "confidence":1.0,
                    "bbox":[
                        0.17844826551211515,
                        0.7961793736859821,
                        0.3419540405273438,
                        0.10085802570754931
                    ]
                },
                {
                    "text":"\"Foar kills more dre",
                    "confidence":1.0,
                    "bbox":[
                        0.0724226723609706,
                        0.6839455987759758,
                        0.4780927975972494,
                        0.14592710683043575
                    ]
                },
                {
                    "text":"than failure ever",
                    "confidence":1.0,
                    "bbox":[
                        0.018455287246445035,
                        0.6549868414269003,
                        0.45329265594482426,
                        0.14363905857426462
                    ]
                },
                {
                    "text":"IZY KASSEM",
                    "confidence":0.5,
                    "bbox":[
                        -0.015967150208537523,
                        0.6675747977206025,
                        0.23065692583719888,
                        0.08114868486431293
                    ]
                },
                {
                    "text":"Entrepreneur",
                    "confidence":1.0,
                    "bbox":[
                        0.01941176222542875,
                        0.1353812367971159,
                        0.9058370590209961,
                        0.26137274083956863
                    ]
                }
            ]
        },
...
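To work with the generated file, a short script can load it and pull out the recognized text. The sketch below relies only on the fields visible in the sample above (frames, frame_index, results, text, confidence). The function name `texts_by_frame` and the `min_confidence` threshold are hypothetical and not part of ocrvid.

```python
import json
from pathlib import Path


def texts_by_frame(data: dict, min_confidence: float = 0.9) -> dict[int, list[str]]:
    """Map each frame index to the texts whose confidence meets the threshold."""
    out: dict[int, list[str]] = {}
    for frame in data.get("frames", []):
        out[frame["frame_index"]] = [
            result["text"]
            for result in frame.get("results", [])
            if result["confidence"] >= min_confidence
        ]
    return out


if __name__ == "__main__":
    # Path assumes the default output name from the example run above.
    path = Path("pexels-eva-elijas.json")
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
        for index, texts in texts_by_frame(data).items():
            print(index, texts)
```

Filtering on confidence is one simple way to drop partial reads such as the 0.5-confidence entry in the sample; the right threshold depends on the video.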

Show supported languages

Run ocrvid langs to list the languages supported for recognition. Results may vary depending on the macOS version you are running.

On macOS version:

platform.mac_ver()[0]='14.2.1'

Result of ocrvid langs:

en-US
fr-FR
it-IT
de-DE
es-ES
pt-BR
zh-Hans
zh-Hant
yue-Hans
yue-Hant
ko-KR
ja-JP
ru-RU
uk-UA
th-TH
vi-VT
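Because the tool targets macOS 13 or later and the language list depends on the macOS version, a script that wraps ocrvid may want to check the version first. This is a minimal sketch using the standard library's platform.mac_ver(); the helper name `is_supported_macos` is hypothetical.

```python
import platform


def is_supported_macos(version: str, minimum_major: int = 13) -> bool:
    """Return True if a version string such as '14.2.1' is at least minimum_major."""
    major = version.split(".")[0]
    # platform.mac_ver() returns an empty version string on non-macOS systems,
    # in which case isdigit() is False and the check fails.
    return major.isdigit() and int(major) >= minimum_major


if __name__ == "__main__":
    version = platform.mac_ver()[0]
    print(f"{version=}", is_supported_macos(version))
```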

How can I run OCR on YouTube videos?

Take a look at yt-dlp.

Development

To contribute to this tool, first check out the code. Then create a new virtual environment:

cd ocrvid
python -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

pip install -e '.[test]'

To run the tests:

make test

ocrvid's People

Contributors

kj-9

ocrvid's Issues

custom path option errors

ocrvid run data/videos/i0o52_x_oFU/video.mp4 -o data/videos/i0o52_x_oFU/video.json
⚠️  ocrvid is already on your PATH and installed at /Users/kh03/.pyenv/shims/ocrvid. Downloading and running anyway.
Traceback (most recent call last):
  File "/Users/kh03/.local/pipx/.cache/025c4983c76e437/bin/ocrvid", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/Users/kh03/.local/pipx/.cache/025c4983c76e437/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kh03/.local/pipx/.cache/025c4983c76e437/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/kh03/.local/pipx/.cache/025c4983c76e437/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kh03/.local/pipx/.cache/025c4983c76e437/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kh03/.local/pipx/.cache/025c4983c76e437/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kh03/.local/pipx/.cache/025c4983c76e437/lib/python3.12/site-packages/ocrvid/cli.py", line 56, in run_ocr
    frames_dir = Path.cwd() / ".ocrvid/frames" / output_file.stem
                                                 ^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'output_file' where it is not associated with a value
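
The traceback suggests that output_file is assigned on only one code path (presumably the default, when -o is not passed) but is read unconditionally afterwards. The sketch below is a guess at the shape of a fix, not the actual ocrvid code: the function name `resolve_paths` and its parameters are hypothetical, and only the frames_dir expression is taken from the traceback.

```python
from __future__ import annotations

from pathlib import Path


def resolve_paths(input_video: Path, output: Path | None = None) -> tuple[Path, Path]:
    """Return (output_file, frames_dir), binding output_file on every code path."""
    if output is None:
        # Default described in the help text: some/video.mp4 -> ./video.json
        output_file = Path.cwd() / f"{input_video.stem}.json"
    else:
        # Explicit -o value; without this branch output_file would be unbound.
        output_file = output
    # Same expression as in the traceback, now safe on both paths.
    frames_dir = Path.cwd() / ".ocrvid/frames" / output_file.stem
    return output_file, frames_dir
```

Assigning the variable in both branches before it is read is enough to avoid this kind of UnboundLocalError.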

`--frame-rate` should be renamed

Currently, --frame-rate is described as "Number of frames to skip between each frame to be processed".

However, frame rate commonly refers to the frequency at which consecutive frames are displayed or captured in a video. For example, 120 fps describes the quality of a video: 120 frames per second.

So the option should be renamed.

change default behavior

If I run:

ocrvid some.mp4

it should yield:

  • frame pictures at ./.ocrvid/frames/*.png
  • result video json as ./some.json

delete youtube related features

I found that yt-dlp already covers a rich set of features around YouTube.

You can also fetch playlist metadata:

yt-dlp --flat-playlist -J "$PLAYLIST_ID" | jq . > data/playlist.json

So these features should be deleted from ocrvid.
