CorrNet_CSLR

This repo holds the code for the paper: Continuous Sign Language Recognition with Correlation Network. (CVPR 2023) [paper]

This repo is based on VAC (ICCV 2021). Many thanks for their great work!

Prerequisites

  • This project is implemented in PyTorch (preferably >=1.13 for compatibility with ctcdecode; otherwise errors may occur), so please install PyTorch first.

  • ctcdecode==0.4 [parlance/ctcdecode], for beam search decoding.

  • [Optional] sclite [kaldi-asr/kaldi]: install the Kaldi toolkit to get sclite for evaluation. After installation, create a soft link to sclite:
    mkdir ./software
    ln -s PATH_TO_KALDI/tools/sctk-2.4.10/bin/sclite ./software/sclite

    For convenience, you may use the Python evaluation tool instead (set 'evaluate_tool' to 'python' in line 16 of ./configs/baseline.yaml), but sclite provides more detailed statistics.

  • Install the other required modules by running pip install -r requirements.txt
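As a quick sanity check on the PyTorch requirement above, the following is a minimal sketch of a version comparison. The helper names `parse_version` and `meets_minimum` are hypothetical and not part of this repo; the only PyTorch attribute used is `torch.__version__`.

```python
import re


def parse_version(version: str) -> tuple:
    """Return the leading numeric components of a version string as ints."""
    # Drop a local build suffix such as "+cu117" before parsing.
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version: str, minimum: tuple = (1, 13)) -> bool:
    """True if `version` is at least `minimum` (tuple comparison)."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    try:
        import torch  # only needed for the real check
    except ImportError:
        print("PyTorch is not installed")
    else:
        status = "OK" if meets_minimum(torch.__version__) else "older than 1.13"
        print(torch.__version__, status)
```

The comparison ignores pre-release tags, which is usually sufficient for a coarse "is it at least 1.13" check.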

Implementation

CorrNet is implemented in ./modules/resnet.py (line 18).

It is then combined with the ResNet BasicBlock at line 58 of ./modules/resnet.py.

We later found that the Identification Module with only spatial decomposition performs on par with the spatial-temporal decomposition reported in the paper and is slightly faster, so we implement it that way.

Data Preparation

You can choose any one of the following datasets to verify the effectiveness of CorrNet.

PHOENIX2014 dataset

  1. Download the RWTH-PHOENIX-Weather 2014 Dataset [download link]. Our experiments are based on phoenix-2014.v3.tar.gz.

  2. After the download finishes, extract the dataset. We suggest creating a soft link to the downloaded dataset:
    ln -s PATH_TO_DATASET/phoenix2014-release ./dataset/phoenix2014

  3. The original image sequences are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences.

    cd ./preprocess
    python data_preprocess.py --process-image --multiprocessing
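If you prefer to create the dataset soft link from Python rather than with ln -s, the sketch below shows one way to do it using only the standard library. The function name `link_dataset` is hypothetical and not part of this repo; the paths in the usage line are the placeholders from the steps above.

```python
import os
from pathlib import Path


def link_dataset(source: str, link: str) -> Path:
    """Create a symbolic link `link` that points at the extracted dataset `source`."""
    source_path = Path(source).resolve()
    link_path = Path(link)
    if not source_path.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {source_path}")
    # Make sure the parent folder (e.g. ./dataset) exists before linking.
    link_path.parent.mkdir(parents=True, exist_ok=True)
    if link_path.is_symlink():
        link_path.unlink()  # replace a stale link from an earlier run
    os.symlink(source_path, link_path, target_is_directory=True)
    return link_path


# Usage (placeholder paths):
# link_dataset("PATH_TO_DATASET/phoenix2014-release", "./dataset/phoenix2014")
```

The function refuses to overwrite a real directory at the link location (os.symlink raises FileExistsError), so an existing copy of the data is never silently replaced.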

PHOENIX2014-T dataset

  1. Download the RWTH-PHOENIX-Weather 2014 T Dataset [download link]

  2. After the download finishes, extract the dataset. We suggest creating a soft link to the downloaded dataset:
    ln -s PATH_TO_DATASET/PHOENIX-2014-T-release-v3/PHOENIX-2014-T ./dataset/phoenix2014-T

  3. The original image sequences are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences.

    cd ./preprocess
    python data_preprocess-T.py --process-image --multiprocessing

CSL dataset

  1. Request the CSL Dataset from this website [download link]

  2. After the download finishes, extract the dataset. We suggest creating a soft link to the downloaded dataset:
    ln -s PATH_TO_DATASET ./dataset/CSL

  3. The original image sequences are 1280x720; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences.

    cd ./preprocess
    python data_preprocess-CSL.py --process-image --multiprocessing

CSL-Daily dataset

  1. Request the CSL-Daily Dataset from this website [download link]

  2. After the download finishes, extract the dataset. We suggest creating a soft link to the downloaded dataset:
    ln -s PATH_TO_DATASET ./dataset/CSL-Daily

  3. The original image sequences are 1280x720; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences.

    cd ./preprocess
    python data_preprocess-CSL-Daily.py --process-image --multiprocessing

Inference

PHOENIX2014 dataset

Backbone | Dev WER | Test WER | Pretrained model
ResNet18 | 18.8% | 19.4% | [Baidu] (passwd: skd3), [Google Drive]

We accidentally deleted the original checkpoint and retrained the model, which reaches similar accuracy (Dev: 18.9%, Test: 19.7%).

PHOENIX2014-T dataset

Backbone | Dev WER | Test WER | Pretrained model
ResNet18 | 18.9% | 20.5% | [Baidu] (passwd: deuq), [Google Drive]

CSL-Daily dataset

To evaluate on CSL-Daily with this checkpoint, remove the CorrNet block after layer2: comment out lines 102 and 145 in resnet.py, change num from 3 to 2 in line 105, and change self.alpha[1] and self.alpha[2] to self.alpha[0] and self.alpha[1] in lines 147 and 149, respectively.

Backbone | Dev WER | Test WER | Pretrained model
ResNet18 | 30.6% | 30.1% | [Baidu] (passwd: u2iv), [Google Drive]

To evaluate a pretrained model, first choose the dataset (phoenix2014, phoenix2014-T, CSL, or CSL-Daily) in line 3 of ./config/baseline.yaml, then run the command below:
python main.py --device your_device --load-weights path_to_weight.pt --phase test
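The Dev and Test figures in the tables above are word error rates (WER). As an illustration of the metric only, here is a minimal sketch that computes WER for a single sentence as the word-level edit distance divided by the reference length. It is not the repo's evaluation code, the function name `word_error_rate` is hypothetical, and the glosses in the usage lines are made up.

```python
def word_error_rate(reference: list, hypothesis: list) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference must contain at least one gloss")
    # previous[j] is the edit distance between the reference prefix processed
    # so far and hypothesis[:j]; it starts as the cost of j insertions.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]  # i deletions turn reference[:i] into an empty hypothesis
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(reference)


# Made-up glosses: one substitution and one insertion against three reference words.
reference = "MORGEN REGEN NORD".split()
hypothesis = "MORGEN SONNE NORD WIND".split()
print(f"{word_error_rate(reference, hypothesis):.3f}")
```

Evaluation tools usually sum the errors over the whole set and divide by the total number of reference words, rather than averaging per-sentence rates, so a per-sentence value like this is for intuition only.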

Training

Configuration priority is: command line > config file > argparse defaults. To train the SLR model, run the command below:

python main.py --device your_device

Note that you can choose the target dataset (phoenix2014, phoenix2014-T, CSL, or CSL-Daily) in line 3 of ./config/baseline.yaml.
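The priority order stated above (command line > config file > argparse defaults) can be reproduced with a common argparse pattern, sketched below. This is an illustration, not the repo's actual parser: the config is a plain dict standing in for the parsed YAML file, the `--dataset` flag and all default values are assumptions, and `build_args` is a hypothetical name.

```python
import argparse


def build_args(config: dict, argv: list) -> argparse.Namespace:
    """Merge argparse defaults, config-file values and command-line flags."""
    parser = argparse.ArgumentParser()
    # Lowest priority: defaults declared on the arguments (values are assumptions).
    parser.add_argument("--device", default="0")
    parser.add_argument("--phase", default="train")
    parser.add_argument("--dataset", default="phoenix2014")
    # Middle priority: values from the config file replace the argparse defaults.
    parser.set_defaults(**config)
    # Highest priority: anything passed on the command line replaces both.
    return parser.parse_args(argv)


config = {"phase": "test", "dataset": "CSL-Daily"}  # stand-in for the YAML file
args = build_args(config, ["--phase", "train"])
print(args.device, args.phase, args.dataset)
```

Here `--phase` comes from the command line, `dataset` from the config dict, and `device` falls back to its argparse default.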

For the CSL-Daily dataset, you may halve the lr from 0.0001 to 0.00005, change the lr decay rate (gamma in 'optimizer.py') from 0.2 to 0.5, and disable the temporal resampling strategy (comment out line 121 in dataloader_video.py).
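To see what these lr and gamma values mean in practice, the sketch below assumes a step schedule in which the learning rate is multiplied by gamma each time a milestone epoch is reached. The milestone epochs [20, 35] and the function name `stepped_lr` are hypothetical and chosen for illustration; check 'optimizer.py' and the config file for the schedule actually used.

```python
def stepped_lr(base_lr: float, gamma: float, milestones: list, epoch: int) -> float:
    """Learning rate at `epoch` when lr is multiplied by `gamma` at each milestone."""
    passed = sum(1 for milestone in milestones if epoch >= milestone)
    return base_lr * gamma ** passed


milestones = [20, 35]  # hypothetical decay epochs, for illustration only
for epoch in (0, 20, 35):
    default = stepped_lr(1e-4, 0.2, milestones, epoch)    # default setting
    csl_daily = stepped_lr(5e-5, 0.5, milestones, epoch)  # suggested CSL-Daily setting
    print(epoch, default, csl_daily)
```

Under these assumptions the CSL-Daily setting starts lower but decays more gently, so it ends at a higher learning rate than the default after both milestones.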

Visualizations

For Grad-CAM visualization, replace the resnet.py under "./modules" with the one under "./weight_map_generation", then run python generate_cam.py with your own hyperparameters.
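Because this step overwrites ./modules/resnet.py, it can be worth keeping a copy of the original so that training and inference can be restored later. The following is a minimal sketch using only the standard library; the function name `swap_in_cam_resnet` and the backup file name resnet.py.bak are hypothetical.

```python
import shutil
from pathlib import Path


def swap_in_cam_resnet(repo_root: str) -> Path:
    """Replace modules/resnet.py with the Grad-CAM variant and return the backup path."""
    root = Path(repo_root)
    target = root / "modules" / "resnet.py"
    replacement = root / "weight_map_generation" / "resnet.py"
    backup = target.with_name("resnet.py.bak")  # hypothetical backup name
    if not backup.exists():
        # Back up only once, so a second run does not overwrite the original.
        shutil.copy2(target, backup)
    shutil.copy2(replacement, target)
    return backup


# Usage from the repository root:
# swap_in_cam_resnet(".")
```

To restore the original model definition, copy the backup file back over ./modules/resnet.py.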

Citation

If you find this repo useful in your research, please consider citing:

@inproceedings{hu2023continuous,
  title={Continuous Sign Language Recognition with Correlation Network},
  author={Hu, Lianyu and Gao, Liqing and Liu, Zekang and Feng, Wei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023},
}
