transpeller's Introduction

Weakly-supervised Fingerspelling Recognition in British Sign Language Videos

This is the official implementation of the paper. The code has been tested with Python 3.6.8. A pre-trained checkpoint for fingerspelling recognition is also released below.

Environment & checkpoints

  • pip install -r requirements.txt
  • Download the pre-trained Transpeller checkpoint (the test command below expects it at data/transpeller.pth).
  • Get the video features:
    • Follow the instructions on the BOBSL page to get the username and password to access parts of the BOBSL dataset.
    • cd features
    • sh download_features.sh username password
  • Get the annotations:
    • cd data/
    • sh download.sh username password. This quick download fetches the manually verified test annotations and the automatically obtained annotations for the BOBSL episodes. In the automatic annotations, a ? in the word column means the automatic pseudo-labeling method could not determine the fingerspelled word.
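
As a rough illustration of handling the ? placeholder, the sketch below drops rows whose word could not be determined. The file layout is an assumption: only a word column is described above, and the episode column and the sample rows here are hypothetical.

```python
import csv
import io

# Hypothetical sample; the real annotation files may use different columns.
SAMPLE = """episode,word
ep1,london
ep1,?
ep2,bbc
"""


def load_known_words(handle):
    """Return the rows whose `word` column is not the `?` placeholder."""
    reader = csv.DictReader(handle)
    return [row for row in reader if row["word"].strip() != "?"]


rows = load_known_words(io.StringIO(SAMPLE))
print([row["word"] for row in rows])
```

To use it on a downloaded file, pass an open file handle instead of the in-memory sample and adjust the column name if it differs.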

Reproducing the scores on the test set

python test.py --ckpt_path data/transpeller.pth --builder localizer_ctc --test_csv data/fingerspelling-data-bmvc2022/transpeller-test.csv --feat_root features/video-swin-s_c8697_16f_bs32/

The above run should give a character error rate (CER) of 53.1. Adding the --full_word_test flag computes the CER against the full words instead, which should give 59.9.
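
For readers unfamiliar with the metric, here is a minimal sketch of CER under its usual definition: character-level edit distance divided by the number of reference characters, expressed as a percentage. This is not taken from test.py, which may aggregate or normalise differently; the function names are illustrative, and the percentage scale is an assumption based on the scores quoted above.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER in percent: total edits / total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return 100.0 * total_edits / total_chars


print(cer(["ab", "cd"], ["ab", "xx"]))
```

Note that the same quantity is sometimes reported as a fraction rather than a percentage, so it is worth checking the scale when comparing numbers.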

Using Video-Swin as a feature extractor

We also release the pre-trained Video-Swin model used to extract the features mentioned above. The model was trained on person crops from the BOBSL dataset, so it will work best when your signer crops resemble the BOBSL signer crops. You can get the pre-trained checkpoint here. Below is a small example of how to use it:

from videoswin import SwinTransformer3D, VideoPreprocessing
from utils import load

# BOBSL person-crop video input
model = SwinTransformer3D()
model = load("video-swin-s.pth")[0]

vp = VideoPreprocessing()

clip = ...  # placeholder: supply an *RGB* video clip with size (batch_size, 3, 16, 256, 256), e.g. read with OpenCV

clip = vp(clip) # (batch_size, 3, 16, 224, 224)

features = model(clip) # (batch_size, 768)
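
The clip layout above can be assembled from individual frames roughly as follows. This is a sketch with NumPy only: it assumes the frames were already read (for example with cv2.VideoCapture, which returns frames in BGR order) and resized to 256x256. The helper name frames_to_clip is hypothetical, and whether VideoPreprocessing expects a NumPy array or a tensor, and which value range, is not stated here, so a conversion step may still be needed.

```python
import numpy as np


def frames_to_clip(frames_bgr):
    """Stack T frames of shape (H, W, 3) in BGR order into a (1, 3, T, H, W) RGB array."""
    frames = np.stack(frames_bgr, axis=0)    # (T, H, W, 3), BGR
    frames = frames[..., ::-1]               # BGR -> RGB
    clip = frames.transpose(3, 0, 1, 2)      # (3, T, H, W)
    return np.ascontiguousarray(clip[None])  # (1, 3, T, H, W)


# Dummy frames stand in for 16 decoded frames resized to 256x256.
dummy = [np.zeros((256, 256, 3), dtype=np.uint8) for _ in range(16)]
clip = frames_to_clip(dummy)
print(clip.shape)
```

Several such clips can be concatenated along the first axis to form a larger batch.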

License and Citation

The code, models, and the released annotations are bound by the exact same licensing terms stated on the official BOBSL page.

Please cite the following paper if you use this repository:

@InProceedings{Prajwal22a,
  author       = "K R Prajwal and Hannah Bull and Liliane Momeni and Samuel Albanie and G{\"u}l Varol and Andrew Zisserman",
  title        = "Weakly-supervised Fingerspelling Recognition in British Sign Language Videos",
  booktitle    = "British Machine Vision Conference",
  year         = "2022",
  keywords     = "sign language, fingerspelling, bsl, bobsl",
}

transpeller's Issues

Username and Password

Hi,
Thanks for the nice work. I was wondering how we can get a username and password to download the data.

Unexpected CER Reproducing the scores on the test set

After following the steps in the README and downloading the provided data, I am obtaining a Character Error Rate (CER) of 0.92 on the test set instead of the expected 53.1.

I used the provided command:

python test.py --ckpt_path data/transpeller.pth --builder localizer_ctc --test_csv data/fingerspelling-data-bmvc2022/transpeller-test.csv --feat_root features/video-swin-s_c8697_16f_bs32/
