Giter Site home page Giter Site logo

clip-video-encode's Introduction

clip-video-encode

pypi Open In Colab Try it on gitpod

Easily compute clip embeddings from video frames.

Install

Using pip:

pip install clip-video-encode

Or build from source:

python setup.py install

Usage

NAME
    clip-video-encode - Encode frames using CLIP image encoder

SYNOPSIS
    clip-video-encode SRC <flags>

DESCRIPTION
    Input:
      src:
        str: path to mp4 file
        str: youtube link
        str: path to txt file with multiple mp4's or youtube links
        list: list with multiple mp4's or youtube links
      dest:
        str: directory where to save embeddings to
        None: dest = src + .npy
      output_format:
        str: "files" or "webdataset"
      take_every_nth:
        int: only take every nth frame
      frame_workers:
        int: number of Processes to distribute video reading to.
      frame_memory_size:
        int: GB of memory for FrameReader.
      metadata_columns:
        str: a comma separated list of metadata column names to look for in src
      use_dst_name:
        bool: use the save name suggested by video2numpy
      distribute:
        str: distribution strategy, currently either slurm or none
      oc_model_name:
        str: open_clip model name, used for selecting CLIP architecture
      pretrained:
        str: open_clip pretrained weights name

POSITIONAL ARGUMENTS
    SRC

FLAGS
    --dest=DEST
        Default: ''
    --output_format=OUTPUT_FORMAT
        Default: 'files'
    --take_every_nth=TAKE_EVERY_NTH
        Default: 1
    --frame_workers=FRAME_WORKERS
        Default: 1
    --frame_memory_size=FRAME_MEMORY_SIZE
        Default: 4
    --metadata_columns=METADATA_COLUMNS
        Default: ''
    --use_dst_name=USE_DST_NAME
        Default: False
    --distribute=DISTRIBUTE
        Default: 'none'
    --oc_model_name=OC_MODEL_NAME
        Default: 'ViT-B-32'
    --pretrained=PRETRAINED
        Default: 'laion2b_s34b_b79k'

API

This module exposes a single function clip_video_encode which takes the same arguments as the command line tool:

import glob
from clip_video_encode import clip_video_encode

VIDS = glob.glob("some/path/my_videos/*.mp4")
EMBEDDING_DIR = "some/path/my_embeddings"
take_every_5 = 5

clip_video_encode(VIDS, EMBEDDING_DIR, take_every_5)

Who is using clip-video-encode?

  • CLIP-Kinetics700 - The Kinetics700 dataset (700GB) can be compressed to ~8GB using clip-video-encode at 1 FPS
  • CLIP-WebVid - The WebVid dataset (10M videos) encoded as CLIP ViT-B/32 embeddings at 1 FPS.

Examples

Check out some cool clip-video-encode examples:

  • Thing detector - Look for things in videos using clip-video-encode generated embeddings.
  • Large dataset processing - If you want to process a large dataset (like WebVid) into CLIP embeddings see the example at the bottom of the linked README.md.

Setup a virtualenv:

python3 -m venv .env
source .env/bin/activate
pip install -e .

to run tests:

pip install -r requirements-test.txt

then

make lint
make test

You can use make black to reformat the code

python -m pytest -x -s -v tests -k "dummy" to run a specific test

clip-video-encode's People

Contributors

danielmend avatar iejmac avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.