vipl-slp / vac_cslr

Visual Alignment Constraint for Continuous Sign Language Recognition. (ICCV 2021)

Home Page: https://openaccess.thecvf.com/content/ICCV2021/html/Min_Visual_Alignment_Constraint_for_Continuous_Sign_Language_Recognition_ICCV_2021_paper.html

License: Apache License 2.0

Languages: Python 98.32%, Shell 1.68%
Topics: continuous-sign-language, sign-language-recognition, sequence-learning

vac_cslr's Introduction

VAC_CSLR


This repo holds the code of the paper: Visual Alignment Constraint for Continuous Sign Language Recognition (ICCV 2021) [paper].

(Figure: framework overview)


Update (2022.05.14)

In recent experiments, we found an implementation detail that improves the proposed method. In our early experiments, we adopted nn.DataParallel to parallelize the visual feature extractor over multiple GPUs. However, only the BatchNorm statistics updated on device 0 are kept during training (DataParallel), which leads to unstable training results (results may differ with different numbers of GPUs and batch sizes). Therefore, we adopt synchronized BatchNorm (syncBN) in this update; the training schedule can be shortened to 40 epochs, and the relevant results are also provided. Experimental results on other datasets will be provided in our future journal version.

import torch.nn as nn

from modules.sync_batchnorm import convert_model

def model_to_device(self, model):
    model = model.to(self.device.output_device)
    if len(self.device.gpu_list) > 1:
        # Parallelize only the 2D CNN feature extractor across GPUs.
        model.conv2d = nn.DataParallel(
            model.conv2d,
            device_ids=self.device.gpu_list,
            output_device=self.device.output_device)
    # Replace BatchNorm layers with synchronized BatchNorm so that
    # statistics are aggregated across all devices, not just device 0.
    model = convert_model(model)
    model.cuda()
    return model
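
Here convert_model comes from the synchronized BatchNorm implementation bundled under modules/sync_batchnorm; it swaps the ordinary BatchNorm layers inside the (DataParallel-wrapped) extractor for synchronized variants, so the running statistics no longer depend on the number of GPUs or the per-device batch size.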

With the provided code, the updated results are expected as:

| Backbone | WER on Dev | WER on Test | Pretrained model |
| --- | --- | --- | --- |
| ResNet18 (baseline) | 23.8 | 25.4 | [Baidu] [GoogleDrive] |
| ResNet18+VAC (CTC only) | 21.5 | 22.1 | [Baidu] [GoogleDrive] |
| ResNet18+VAC+SMKD | 19.8 | 20.5 | [Baidu] [GoogleDrive] |

The VAC result corresponds to the setting loss_weights: SeqCTC: 1.0, ConvCTC: 1.0. In addition, VAC+SMKD adopts the setting model_args: share_classifier: True, weight_norm: True.
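
For reference, a hypothetical config excerpt with these settings (the key names follow the parameter dumps printed by main.py; the exact layout of the provided YAML files may differ):

    loss_weights:
      SeqCTC: 1.0     # CTC loss on the temporal (sequence) classifier
      ConvCTC: 1.0    # auxiliary CTC loss on the visual features (the VAC setting)
    model_args:
      share_classifier: True   # VAC+SMKD only
      weight_norm: True        # VAC+SMKD only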

If you find this repo useful in your research, please consider citing our papers VAC and SMKD.


Prerequisites

  • This project is implemented in PyTorch (>1.8), so please install PyTorch first.

  • ctcdecode==0.4 [parlance/ctcdecode], for beam search decoding.

  • [Optional] sclite [kaldi-asr/kaldi]: install the Kaldi toolkit to obtain sclite for evaluation. After installation, create a soft link to sclite:
    ln -s PATH_TO_KALDI/tools/sctk-2.4.10/bin/sclite ./software/sclite
    We also provide a Python evaluation tool for convenience, but sclite provides more detailed statistics.

  • [Optional] SeanNaren/warp-ctc: at the beginning of this research, we adopted warp-ctc for supervision; we have since found that the PyTorch CTC implementation reaches similar results (a minimal usage sketch follows this list).
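
As a reference for the last point, a minimal sketch of the built-in PyTorch CTC loss; the shapes and the class count below are illustrative assumptions, not this repo's exact code:

    import torch
    import torch.nn as nn

    # Illustrative sizes: T time steps, N batch, C classes (blank = 0), L label length.
    T, N, C, L = 100, 2, 1296, 20  # 1296 matches the phoenix14 gloss dictionary size
    log_probs = torch.randn(T, N, C).log_softmax(-1)         # (T, N, C) log-probabilities
    targets = torch.randint(1, C, (N, L), dtype=torch.long)  # gloss indices, excluding blank
    input_lengths = torch.full((N,), T, dtype=torch.long)
    target_lengths = torch.full((N,), L, dtype=torch.long)

    ctc = nn.CTCLoss(blank=0, reduction='mean', zero_infinity=True)
    loss = ctc(log_probs, targets, input_lengths, target_lengths)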

Data Preparation

  1. Download the RWTH-PHOENIX-Weather 2014 dataset [download link]. Our experiments are based on phoenix-2014.v3.tar.gz.

  2. After downloading, extract the dataset to ./dataset/phoenix2014; it is suggested to make a soft link to the downloaded dataset:
    ln -s PATH_TO_DATASET/phoenix2014-release ./dataset/phoenix2014

  3. The original image sequences are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences (a sketch of the resize step appears after the commands):

    cd ./preprocess
    python data_preprocess.py --process-image --multiprocessing
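
For intuition, a hypothetical sketch of the resize step that data_preprocess.py performs (the actual script's interface, file layout, and interpolation mode may differ):

    import glob
    import os
    from PIL import Image

    def resize_frames(src_dir, dst_dir, size=(256, 256)):
        # Resize every 210x260 frame under src_dir to 256x256 and save it to dst_dir.
        os.makedirs(dst_dir, exist_ok=True)
        for path in sorted(glob.glob(os.path.join(src_dir, "*.png"))):
            img = Image.open(path).resize(size, Image.BILINEAR)
            img.save(os.path.join(dst_dir, os.path.basename(path)))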

Inference

We provide pretrained models for inference; you can download them from:

| Backbone | WER on Dev | WER on Test | Pretrained model |
| --- | --- | --- | --- |
| ResNet18 | 21.2% | 22.3% | [Baidu] (passwd: qi83) [Dropbox] |

To evaluate the pretrained model, run the command below:
python main.py --load-weights resnet18_slr_pretrained.pt --phase test

(When evaluating the SMKD pretrained model, please set weight_norm and share_classifier to True in the config files, as in the excerpt above.)

Training

The configuration priority is: command line > config file > argparse defaults. To train the SLR model on phoenix14, run the command below:

python main.py --work-dir PATH_TO_SAVE_RESULTS --config PATH_TO_CONFIG_FILE --device AVAILABLE_GPUS

Feature Extraction

We also provide a feature extraction function to extract frame-wise features for other research purposes, which can be invoked with:

python main.py --load-weights PATH_TO_PRETRAINED_MODEL --phase features
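
The saved features can then be consumed like any array file; a hypothetical sketch, assuming one .npy file of frame-wise features per video (the file name and the exact output format written by the features phase are assumptions, so check the actual output directory):

    import numpy as np

    # Hypothetical path and layout: one (num_frames, feature_dim) array per video.
    feats = np.load("PATH_TO_FEATURES/some_video.npy")
    print(feats.shape)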

To Do List

  • Pure-Python evaluation tools.
  • WAR and WER calculation scripts.

Citation

If you find this repo useful in your research works, please consider citing:

@InProceedings{Min_2021_ICCV,
    author    = {Min, Yuecong and Hao, Aiming and Chai, Xiujuan and Chen, Xilin},
    title     = {Visual Alignment Constraint for Continuous Sign Language Recognition},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {11542-11551}
}

Self-Mutual Distillation Learning for Continuous Sign Language Recognition [paper]

@InProceedings{Hao_2021_ICCV,
    author    = {Hao, Aiming and Min, Yuecong and Chen, Xilin},
    title     = {Self-Mutual Distillation Learning for Continuous Sign Language Recognition},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {11303-11312}
}

Acknowledgments

We appreciate the help from Runpeng Cui, Hao Zhou@Rhythmblue and Xinzhe Han@GeraldHan :)

vac_cslr's People

Contributors: ycmin95, yulv-git

vac_cslr's Issues

ctcdecode

Hello, I have a question: can ctcdecode be used on a Windows system?
Thank you.

Finetuning and continue training

Hello, thank you for the awesome work. I am trying to use the model on another dataset, so I figure I should structure my data according to the format of phoenix2014. Is there anything else I should worry about, or is running the preprocessing with the same structure going to be alright?

Also, since I am training on Google Colab, I won't be able to train for 80 epochs consecutively and plan to split the training into several runs. Is there a built-in function to load the previous model and continue training (or finetuning, if I want to finetune the pretrained model), or how should I begin to tackle this problem? I am not sure whether the --load-weights flag is enough. Thank you so much.

Hardware and Software Specifications for this research.

I appreciate the work you have done. Can you tell us about the hardware and software specifications used to carry out this research? What Python libraries do you use? Please provide detailed requirements.

I would also be very grateful if you could share them.

Thank you.

Issue about alignment between label and frames.

Thanks for your great work. I'm wondering how to draw a picture like Fig. 5 in your paper. The key point lies in how to align labels with frames. Could you provide some advice? Thanks in advance!

Training the baseline (without VAC, SMKD)

The VAC result corresponds to the setting loss_weights: SeqCTC: 1.0, ConvCTC: 1.0. In addition, VAC+SMKD adopts the setting model_args: share_classifier: True, weight_norm: True.

With the default setting, is it training the baseline?

Pseudo Label

I'm wondering how to assign labels to frames with the CTC loss. It seems the CTC loss can be viewed as sequential softmax losses, but the key point is how to obtain the pseudo labels for frames via back-propagation. Thanks in advance!

Question about CPU or GPU error

I ran your code and found the following error. Where are the parameters put onto the GPU?

Traceback (most recent call last):
  File "main.py", line 218, in <module>
    processor.start()
  File "main.py", line 46, in start
    seq_train(self.data_loader['train'], self.model, self.optimizer, self.device, epoch, self.recoder)
  File "/home/quchunguang/sunday/CSLR/seq_scripts.py", line 24, in seq_train
    loss = model.criterion_calculation(ret_dict, label, label_lgt)
  File "/home/quchunguang/sunday/CSLR/slr_network.py", line 96, in criterion_calculation
    label_lgt.cpu().int()).mean()
  File "/home/quchunguang/anaconda3/envs/tf/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/quchunguang/anaconda3/envs/tf/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 1295, in forward
    self.zero_infinity)
  File "/home/quchunguang/anaconda3/envs/tf/lib/python3.6/site-packages/torch/nn/functional.py", line 1767, in ctc_loss
    zero_infinity)
RuntimeError: Tensor for argument #2 'targets' is on CPU, but expected it to be on GPU (while checking arguments for ctc_loss_gpu)

Detailed code running steps

Hello, author. Could you please provide the specific code execution process, such as which code to run in the first step? I really want to reproduce your code, but my ability is insufficient. I sincerely hope you can help me.

Visualizing the predicted alignments

Thank you so much for releasing your code!

I'm trying to visualize the predicted alignments, but the timings in the out.output-hypothesis-dev.ctm.sgml and out.output-hypothesis-test.ctm.sgml files don't make sense. Do the timings t1+t2 indicate the start time + end time? That would mean the glosses overlap. Also, I get the same timings for all the samples, e.g.:
0.000+0.010
0.010+0.030
0.020+0.050
0.030+0.07

An example from the out.output-hypothesis-test.ctm.sgml file:

<SYSTEM title="./work_dir/baseline_res18_SD_VAC_Phoenix/out.output-hypothesis-test.ctm" ref_fname="./work_dir/baseline_res18_SD_VAC_Phoenix/tmp.stm" hyp_fname="./work_dir/baseline_res18_SD_VAC_Phoenix/out.output-hypothesis-test.ctm" creation_date="Sun Dec 11 22:13:30 2022" format="2.4" frag_corr="FALSE" opt_del="FALSE" weight_ali="FALSE" weight_filename="">
<SPEAKER id="signer04">
<PATH id="(signer04-000)" word_cnt="6" file="01april_2010_thursday_heute_default-5" channel="1" sequence="0" R_T1="0.000" R_T2="inf" word_aux="h_t1+t2">
C,"aber","aber",0.000+0.010:S,"freuen","woche",0.010+0.030:C,"morgen","morgen",0.020+0.050:C,"sonne","sonne",0.030+0.070:S,"selten","kaum",0.040+0.090:C,"regen","regen",0.050+0.110
</PATH>
<PATH id="(signer04-001)" word_cnt="7" file="01april_2010_thursday_tagesschau_default-7" channel="1" sequence="1" R_T1="0.000" R_T2="inf" word_aux="h_t1+t2">
C,"samstag","samstag",0.000+0.010:C,"wechselhaft","wechselhaft",0.010+0.030:C,"besonders","besonders",0.020+0.050:C,"freundlich","freundlich",0.030+0.070:D,"nordost",,:S,"bisschen","nord",0.040+0.090:S,"bereich","ix",0.060+0.130
</PATH>
<PATH id="(signer04-002)" word_cnt="7" file="01april_2010_thursday_tagesschau_default-8" channel="1" sequence="2" R_T1="0.000" R_T2="inf" word_aux="h_t1+t2">
C,"sonntag","sonntag",0.000+0.010:C,"regen","regen",0.010+0.030:C,"teil","teil",0.020+0.050:C,"gewitter","gewitter",0.030+0.070:C,"suedost","suedost",0.040+0.090:D,"durch",,:C,"regen","regen",0.050+0.110
</PATH>

Time to train

Hello, great work with this paper and repo!
I would like to ask how much time you spent training the model (on the Phoenix12 dataset) and what kind of GPU you used for the training. I am trying to replicate it with another dataset (specifically Phoenix14-T), and in my first test I spent around 14 hours training 10 epochs, using a Titan XP with 12 GB and a batch size of 1.

Thank you again for your work and congratulations on this repo.

Unable to recognize any word, but the loss is decreasing

Hello, I get an error in the training phase: the loss is decreasing, but when I evaluate the model it doesn't recognize any word; I always get a WER of 100.
My environment: PyTorch 1.13.0, Python 3.10.13, ctcdecode 1.0.3.

This is my log file:

[ Sat Jan 27 01:36:04 2024 ] Parameters:

{'work_dir': 'PATH_TO_SAVE_RESULTS', 'config': './configs/baseline.yaml', 'random_fix': True, 'device': '0', 'phase': 'train', 'save_interval': 5, 'random_seed': 0, 'eval_interval': 1, 'print_log': True, 'log_interval': 50, 'evaluate_tool': 'python', 'feeder': 'dataset.dataloader_video.BaseFeeder', 'dataset': 'phoenix14', 'dataset_info': {'dataset_root': './dataset/phoenix2014/phoenix-2014-multisigner', 'dict_path': './preprocess/phoenix2014/gloss_dict.npy', 'evaluation_dir': './evaluation/slr_eval', 'evaluation_prefix': 'phoenix2014-groundtruth'}, 'num_worker': 10, 'feeder_args': {'mode': 'test', 'datatype': 'video', 'num_gloss': -1, 'drop_ratio': 1.0, 'prefix': './dataset/phoenix2014/phoenix-2014-multisigner', 'transform_mode': False}, 'model': 'slr_network.SLRModel', 'model_args': {'num_classes': 65, 'c2d_type': 'resnet18', 'conv_type': 2, 'use_bn': 1, 'share_classifier': False, 'weight_norm': False}, 'load_weights': None, 'load_checkpoints': None, 'decode_mode': 'beam', 'ignore_weights': [], 'batch_size': 8, 'test_batch_size': 8, 'loss_weights': {'SeqCTC': 1.0}, 'optimizer_args': {'optimizer': 'Adam', 'base_lr': 0.0001, 'step': [20, 35], 'learning_ratio': 1, 'weight_decay': 0.0001, 'start_epoch': 0, 'nesterov': False}, 'num_epoch': 20}

[ Sat Jan 27 01:36:31 2024 ] Epoch: 0, Batch(0/122) done. Loss: 110.28868103 lr:0.000100
[ Sat Jan 27 01:38:26 2024 ] Epoch: 0, Batch(50/122) done. Loss: 13.18387794 lr:0.000100
[ Sat Jan 27 01:40:25 2024 ] Epoch: 0, Batch(100/122) done. Loss: 12.18678570 lr:0.000100
[ Sat Jan 27 01:41:07 2024 ] Mean training loss: 18.2596124587.
[ Sat Jan 27 01:41:58 2024 ] Dev WER: 100.00%
[ Sat Jan 27 01:42:24 2024 ] Epoch: 1, Batch(0/122) done. Loss: 12.15300369 lr:0.000100
[ Sat Jan 27 01:44:21 2024 ] Epoch: 1, Batch(50/122) done. Loss: 11.67739010 lr:0.000100
[ Sat Jan 27 01:46:22 2024 ] Epoch: 1, Batch(100/122) done. Loss: 13.26895523 lr:0.000100
[ Sat Jan 27 01:47:08 2024 ] Mean training loss: 12.1612764968.
[ Sat Jan 27 01:47:58 2024 ] Dev WER: 100.00%
[ Sat Jan 27 01:48:27 2024 ] Epoch: 2, Batch(0/122) done. Loss: 12.09643936 lr:0.000100
[ Sat Jan 27 01:50:20 2024 ] Epoch: 2, Batch(50/122) done. Loss: 11.06025696 lr:0.000100
[ Sat Jan 27 01:52:13 2024 ] Epoch: 2, Batch(100/122) done. Loss: 9.84243107 lr:0.000100
[ Sat Jan 27 01:53:01 2024 ] Mean training loss: 10.5143460211.
[ Sat Jan 27 01:53:52 2024 ] Dev WER: 100.00%
[ Sat Jan 27 01:54:22 2024 ] Epoch: 3, Batch(0/122) done. Loss: 9.38849068 lr:0.000100
[ Sat Jan 27 01:56:19 2024 ] Epoch: 3, Batch(50/122) done. Loss: 9.07399940 lr:0.000100
[ Sat Jan 27 01:58:09 2024 ] Epoch: 3, Batch(100/122) done. Loss: 8.66645050 lr:0.000100
[ Sat Jan 27 01:58:55 2024 ] Mean training loss: 9.0431265127.
[ Sat Jan 27 01:59:45 2024 ] Dev WER: 100.00%
[ Sat Jan 27 02:00:12 2024 ] Epoch: 4, Batch(0/122) done. Loss: 8.63507748 lr:0.000100
[ Sat Jan 27 02:02:05 2024 ] Epoch: 4, Batch(50/122) done. Loss: 7.65232229 lr:0.000100
[ Sat Jan 27 02:04:04 2024 ] Epoch: 4, Batch(100/122) done. Loss: 7.27032137 lr:0.000100
[ Sat Jan 27 02:04:47 2024 ] Mean training loss: 7.6128989556.
[ Sat Jan 27 02:05:38 2024 ] Dev WER: 100.00%
[ Sat Jan 27 02:06:09 2024 ] Epoch: 5, Batch(0/122) done. Loss: 6.52053165 lr:0.000100
[ Sat Jan 27 02:07:59 2024 ] Epoch: 5, Batch(50/122) done. Loss: 4.85380507 lr:0.000100
[ Sat Jan 27 02:10:03 2024 ] Epoch: 5, Batch(100/122) done. Loss: 7.19156647 lr:0.000100
[ Sat Jan 27 02:10:44 2024 ] Mean training loss: 5.7774419706.
[ Sat Jan 27 02:11:35 2024 ] Dev WER: 100.00%
[ Sat Jan 27 02:12:00 2024 ] Epoch: 6, Batch(0/122) done. Loss: 3.87025928 lr:0.000100
[ Sat Jan 27 02:14:02 2024 ] Epoch: 6, Batch(50/122) done. Loss: 3.52518511 lr:0.000100
[ Sat Jan 27 02:16:07 2024 ] Epoch: 6, Batch(100/122) done. Loss: 3.84364915 lr:0.000100
[ Sat Jan 27 02:16:45 2024 ] Mean training loss: 3.9095683430.
[ Sat Jan 27 02:17:36 2024 ] Dev WER: 100.00%
[ Sat Jan 27 02:18:05 2024 ] Epoch: 7, Batch(0/122) done. Loss: 3.43237042 lr:0.000100
[ Sat Jan 27 02:20:00 2024 ] Epoch: 7, Batch(50/122) done. Loss: 2.54930735 lr:0.000100
[ Sat Jan 27 02:21:59 2024 ] Epoch: 7, Batch(100/122) done. Loss: 2.43364787 lr:0.000100
[ Sat Jan 27 02:22:40 2024 ] Mean training loss: 2.6058940282.
[ Sat Jan 27 02:23:30 2024 ] Dev WER: 100.00%

What is the problem, and how do I solve it? I am trying this on an Ethiopian sign language dataset (Amharic characters).

about preprocess.sh

Can you explain why preprocess.sh is needed to process the predicted results?

Unable to successfully install ctcdecode

Hello, I am unable to successfully install ctcdecode on a Windows machine.

Is there another method to install and use the CTC decoder used in this project? Please advise.

Weird glosses in the annotation of phoenix dataset

Hi @ycmin95, recently I checked the annotation of the phoenix dataset and the gloss dictionary generated during data preparation.
There are many weird glosses, such as "ON", "OFF", "LEFTHAND", ...
I wonder whether we should keep these weird glosses in the labels.
Any advice?

Does your PHOENIX-2014-T dataset have a /1/ folder?

After downloading the dataset, I got manual annotations in the PHOENIX-2014-T.dev.corpus file like this:

name|video|start|end|speaker|orth|translation
11August_2010_Wednesday_tagesschau-2|11August_2010_Wednesday_tagesschau-2/1/*.png|-1|-1|Signer08|DRUCK TIEF KOMMEN|tiefer luftdruck bestimmt in den nächsten tagen unser wetter

But in the features folder, I can't see a /1/ folder.

About the resnet18 backbone

Hi, I was wondering whether the ResNet18 backbone you posted here is from the non-iterative or the iterative approach. I suppose it is from the non-iterative one, but I'm not quite sure.

Finally, can I use the feature extractor function you provided on another dataset? If so, what should the setup be?

Thanks in advance.

Error when I run the training part (main.py)

I run on the CPU:
python main.py --work-dir PATH_TO_SAVE_RESULTS --config F:\code\VAC_CSLR-main1\configs\baseline.yaml --device 'cpu'

and I got this error:
.git does not exist in current dir
<main.Processor object at 0x0000017E7A49DD50>
[ Sun Mar 12 22:24:52 2023 ] Parameters:
{'work_dir': 'PATH_TO_SAVE_RESULTS', 'config': 'F:\code\VAC_CSLR-main1\configs\baseline.yaml', 'random_fix': True, 'device': 'cpu', 'phase': 'train', 'save_interval': 5, 'random_seed': 0, 'eval_interval': 1, 'print_log': True, 'log_interval': 50, 'evaluate_tool': 'sclite', 'feeder': 'dataset.dataloader_video.BaseFeeder', 'dataset': 'phoenix14', 'dataset_info': {'dataset_root': './dataset/phoenix2014/phoenix-2014-multisigner', 'dict_path': './preprocess/phoenix2014/gloss_dict.npy', 'evaluation_dir': './evaluation/slr_eval', 'evaluation_prefix': 'phoenix2014-groundtruth'}, 'num_worker': 10, 'feeder_args': {'mode': 'train', 'datatype': 'video', 'num_gloss': -1, 'drop_ratio': 1.0}, 'model': 'slr_network.SLRModel', 'model_args': {'num_classes': 508, 'c2d_type': 'resnet18', 'conv_type': 2, 'use_bn': 1, 'share_classifier': False, 'weight_norm': False}, 'load_weights': None, 'load_checkpoints': None, 'decode_mode': 'beam', 'ignore_weights': [], 'batch_size': 2, 'test_batch_size': 8, 'loss_weights': {'SeqCTC': 1.0}, 'optimizer_args': {'optimizer': 'Adam', 'base_lr': 0.0001, 'step': [20, 35], 'learning_ratio': 1, 'weight_decay': 0.0001, 'start_epoch': 0, 'nesterov': False}, 'num_epoch': 3}

Traceback (most recent call last):
  File "F:\code\VAC_CSLR-main1\main.py", line 220, in <module>
    processor.start()
  File "F:\code\VAC_CSLR-main1\main.py", line 49, in start
    seq_train(self.data_loader['train'], self.model, self.optimizer,
KeyError: 'train'

I got the error when I run python main.py.
Can you help me, please?

About start index

Hi, thank you for your work.

I'm not sure whether the start index should be False or True for the KD loss calculation.

Could you let me know?

about the feature extractor architecture

Hi, I am really sorry to ask a question here, but this is important for my research. I really want to know more about the delta t of the frame-wise features. There are intersections between the delta ts. Could you explain how long the intersections between those delta ts are? Or maybe you could point to the code for the delta t, so I can check it? Thank you.

Getting IndexError while Training or Inference

This is the command I use: !python main.py --work-dir results --device 0 --num-worker 4 --batch-size 1

Here is the error that I get:
Traceback (most recent call last):
  File "main.py", line 219, in <module>
    processor.start()
  File "main.py", line 46, in start
    self.device, epoch, self.recoder)
  File "/home/jupyter/VAC_CSLR/seq_scripts.py", line 21, in seq_train
    for batch_idx, data in enumerate(loader):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.7/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/jupyter/VAC_CSLR/dataset/dataloader_video.py", line 47, in __getitem__
    input_data, label = self.normalize(input_data, label)
  File "/home/jupyter/VAC_CSLR/dataset/dataloader_video.py", line 78, in normalize
    video, label = self.data_aug(video, label, file_id)
  File "/home/jupyter/VAC_CSLR/utils/video_augmentation.py", line 24, in __call__
    image = t(image)
  File "/home/jupyter/VAC_CSLR/utils/video_augmentation.py", line 119, in __call__
    if isinstance(clip[0], np.ndarray):
IndexError: list index out of range

Can you tell me what am I doing wrong?

How to solve this error in the training model? I look forward to your answer.

Traceback (most recent call last):
  File "main.py", line 211, in <module>
    processor.start()
  File "main.py", line 44, in start
    seq_train(self.data_loader['train'], self.model, self.optimizer,
  File "/home/linux/data2/sun/VAC_CSLR/seq_scripts.py", line 18, in seq_train
    for batch_idx, data in enumerate(tqdm(loader)):
  File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/tqdm/std.py", line 1180, in __iter__
    for obj in iterable:
  File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
    return self._process_data(data)
  File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/torch/_utils.py", line 457, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/linux/data2/sun/VAC_CSLR/dataset/dataloader_video.py", line 47, in __getitem__
    input_data, label = self.normalize(input_data, label)
  File "/home/linux/data2/sun/VAC_CSLR/dataset/dataloader_video.py", line 78, in normalize
    video, label = self.data_aug(video, label, file_id)
  File "/home/linux/data2/sun/VAC_CSLR/utils/video_augmentation.py", line 24, in __call__
    image = t(image)
  File "/home/linux/data2/sun/VAC_CSLR/utils/video_augmentation.py", line 119, in __call__
    if isinstance(clip[0], np.ndarray):
IndexError: list index out of range

Inconsistent reproduction results

Hello, author. We downloaded your code and re-ran your proposed VAC for 50 epochs (without BN), and the best result was only a 35.1% word error rate. In addition, we adjusted the loss weights in the code to run the baseline (without BN), and found that its results also differ considerably from those in the paper. Could this be a code version mismatch, or is our training time simply too short?

Question about inconsistent baseline reproduction results

Hello, I have some questions about the experiment code.
In Table 3 of your paper, the baseline result on DEV is 25.4. I tried to reproduce it by removing the ConvCTC and Dist terms from the loss, but I got WER = 24.8% only at epoch 40, and the final result differs considerably from Table 3. Could this be because I overlooked some parts that should also be removed?

log.txt
config.txt

Unable to run this repository on Google Colab

Hello, I ran this repository on my local machine on CPU, but the run did not finish and I did not get results.

So I want to run this repository on a GPU, and the free GPU I found is Google Colab. Can you please give me details on how to run this repository on Google Colab?

Data augmentation error

Hi, author! I'm doing a data augmentation task, and I run into problems every time I reach batch 777 (the training set is twice as large as before). I've seen your previous answer about "IndexError: list index out of range", but I have verified that the frame paths are correct. (By the way, batch_size is 4 and device is 4.)

Are there plans to supplement the code on the CSL dataset?

Thank you very much for your contribution to the community.
In the paper, I saw that experiments were carried out on both the PHOENIX14 and CSL datasets. I would like to ask whether there are plans to release the data-processing and training code for the CSL dataset.

CSL Dataset

Hello, could you provide the processing and evaluation scripts for the CSL dataset?

Final accuracy

I want to make sure: you report 22.1 Dev WER and 23.0 Test WER, while the released pretrained model achieves 21.2 Dev WER and 22.3 Test WER? Thanks in advance for your response!

Unable to run main.py successfully

I got an error when I run main.py.

Has anyone else had the same problem as this?

Traceback (most recent call last):
  File "F:\code\VAC_CSLR-main1\main.py", line 219, in <module>
    processor = Processor(args)
  File "F:\code\VAC_CSLR-main1\main.py", line 33, in __init__
    self.model, self.optimizer = self.loading()
  File "F:\code\VAC_CSLR-main1\main.py", line 99, in loading
    model = model_class(
  File "F:\code\VAC_CSLR-main1\slr_network.py", line 52, in __init__
    self.decoder = utils.Decode(gloss_dict, num_classes, 'beam')
  File "F:\code\VAC_CSLR-main1\utils\decode.py", line 19, in __init__
    self.ctc_decoder = ctcdecode.CTCBeamDecoder(vocab, beam_width=10, blank_id=blank_id,
AttributeError: module 'ctcdecode' has no attribute 'CTCBeamDecoder'

Torch not compiled with CUDA enabled

Hello, how are you?

I tried this code, but my machine does not have a GPU, so please tell me how to run it without CUDA.

The error is:

File "F:\code\VAC_CSLR-main\main.py", line 209, in
processor = Processor(args)
File "F:\code\VAC_CSLR-main\main.py", line 34, in init
self.model, self.optimizer = self.loading()
File "C:\Users\ANTENEH\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\torch\cuda_init_.py", line 221, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

This is the error I get. How can I test on CPU?

Why can the vocab used to initialize ctcdecode be generated with chr(20000–21296)?

Your work is excellent!
According to the ctcdecode documentation, the vocab should be initialized with the dictionary to be decoded. Why does the code implementation work with chr(20000 + (0~1296))? Is the number 20000 special?
Also, Figure 5 of your paper shows the alignment between the model's generated labels, the ground truth, and the video, but with ctcdecode I can only generate labels and cannot use them to annotate the alignment. Does this part require extra code?
Looking forward to your reply!

Successfully inference, but unable to train

Hello, author. I encountered this issue while training the model. Could you kindly give me some advice? Thank you very much.

(vac) user2@com:~/data/VAC_CSLR-main$ python main.py --work-dir ./work_dir/vac/ --config ./configs/baseline.yaml --device 0
Loading model
/opt/anaconda3/envs/vac/lib/python3.7/site-packages/torchvision/models/_utils.py:209: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
f"The parameter '{pretrained_param}' is deprecated since 0.13 and may be removed in the future, "
/opt/anaconda3/envs/vac/lib/python3.7/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=ResNet18_Weights.IMAGENET1K_V1. You can also use weights=ResNet18_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
Loading model finished.
Loading data
train 5671
Apply training transform.

train 5671
Apply testing transform.

dev 540
Apply testing transform.

test 629
Apply testing transform.

Loading data finished.
.git does not exist in current dir
[ Wed Jul 19 21:43:36 2023 ] Parameters:
{'work_dir': './work_dir/vac/', 'config': './configs/baseline.yaml', 'random_fix': True, 'device': '0', 'phase': 'train', 'save_interval': 5, 'random_seed': 0, 'eval_interval': 1, 'print_log': True, 'log_interval': 50, 'evaluate_tool': 'sclite', 'feeder': 'dataset.dataloader_video.BaseFeeder', 'dataset': 'phoenix14', 'dataset_info': {'dataset_root': './dataset/phoenix2014/phoenix-2014-multisigner', 'dict_path': './preprocess/phoenix2014/gloss_dict.npy', 'evaluation_dir': './evaluation/slr_eval', 'evaluation_prefix': 'phoenix2014-groundtruth'}, 'num_worker': 0, 'feeder_args': {'mode': 'test', 'datatype': 'video', 'num_gloss': -1, 'drop_ratio': 1.0, 'prefix': './dataset/phoenix2014/phoenix-2014-multisigner', 'transform_mode': False}, 'model': 'slr_network.SLRModel', 'model_args': {'num_classes': 1296, 'c2d_type': 'resnet18', 'conv_type': 2, 'use_bn': 1, 'share_classifier': False, 'weight_norm': False}, 'load_weights': None, 'load_checkpoints': None, 'decode_mode': 'beam', 'ignore_weights': [], 'batch_size': 2, 'test_batch_size': 4, 'loss_weights': {'SeqCTC': 1.0}, 'optimizer_args': {'optimizer': 'Adam', 'base_lr': 0.0001, 'step': [20, 35], 'learning_ratio': 1, 'weight_decay': 0.0001, 'start_epoch': 0, 'nesterov': False}, 'num_epoch': 40}

0%| | 0/2835 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 211, in <module>
    processor.start()
  File "main.py", line 45, in start
    self.device, epoch, self.recoder)
  File "/home/user2/data/VAC_CSLR-main/seq_scripts.py", line 18, in seq_train
    for batch_idx, data in enumerate(tqdm(loader)):
  File "/opt/anaconda3/envs/vac/lib/python3.7/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/opt/anaconda3/envs/vac/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/opt/anaconda3/envs/vac/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 671, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/opt/anaconda3/envs/vac/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/anaconda3/envs/vac/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/user2/data/VAC_CSLR-main/dataset/dataloader_video.py", line 47, in __getitem__
    input_data, label = self.normalize(input_data, label)
  File "/home/user2/data/VAC_CSLR-main/dataset/dataloader_video.py", line 78, in normalize
    video, label = self.data_aug(video, label, file_id)
  File "/home/user2/data/VAC_CSLR-main/utils/video_augmentation.py", line 24, in __call__
    image = t(image)
  File "/home/user2/data/VAC_CSLR-main/utils/video_augmentation.py", line 119, in __call__
    if isinstance(clip[0], np.ndarray):
IndexError: list index out of range

Error when I try to do the inference

Hello, I'm replicating this model, but when I execute the inference command, an unknown error appears, and I don't know why.
My setup is:

  • RTX 3060ti
  • 16GB RAM
  • Ryzen 7 5800X

The complete error is:

Traceback (most recent call last):
  File "main.py", line 209, in <module>
    processor.start()
  File "main.py", line 61, in start
    dev_wer = seq_eval(self.arg, self.data_loader["dev"], self.model, self.device,
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/seq_scripts.py", line 56, in seq_eval
    ret_dict = model(vid, vid_lgt, label=label, label_lgt=label_lgt)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/slr_network.py", line 63, in forward
    framewise = self.masked_bn(inputs, len_x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/slr_network.py", line 53, in masked_bn
    x = self.conv2d(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torchvision/models/resnet.py", line 249, in forward
    return self._forward_impl(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torchvision/models/resnet.py", line 233, in _forward_impl
    x = self.bn1(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 135, in forward
    return F.batch_norm(
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/functional.py", line 2149, in batch_norm
    return torch.batch_norm(
RuntimeError: CUDA error: unknown error

And I have changed the config file:
-batch_size: 2
+batch_size: 1
-test_batch_size: 8
-num_worker: 10
-device: 0,1,2
+test_batch_size: 1
+num_worker: 1
+device: 0

Also, my torch version is 1.8.1+cu111.

Thank you for the help!

UPDATE

I also found this error:

Traceback (most recent call last):
  File "main.py", line 209, in <module>
    processor.start()
  File "main.py", line 61, in start
    dev_wer = seq_eval(self.arg, self.data_loader["dev"], self.model, self.device,
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/seq_scripts.py", line 56, in seq_eval
    ret_dict = model(vid, vid_lgt, label=label, label_lgt=label_lgt)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/slr_network.py", line 63, in forward
    framewise = self.masked_bn(inputs, len_x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/slr_network.py", line 53, in masked_bn
    x = self.conv2d(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torchvision/models/resnet.py", line 249, in forward
    return self._forward_impl(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torchvision/models/resnet.py", line 232, in _forward_impl
    x = self.conv1(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 395, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: CUDA error: unknown error

with the following config:
-batch_size: 2
+batch_size: 1
random_seed: 0
-test_batch_size: 8
-num_worker: 10
-device: 0,1,2
+test_batch_size: 2
+num_worker: 2
+device: 0
