vipl-slp / vac_cslr

Visual Alignment Constraint for Continuous Sign Language Recognition. (ICCV 2021)

Home Page: https://openaccess.thecvf.com/content/ICCV2021/html/Min_Visual_Alignment_Constraint_for_Continuous_Sign_Language_Recognition_ICCV_2021_paper.html

License: Apache License 2.0

Languages: Python 98.32%, Shell 1.68%
Topics: continuous-sign-language, sign-language-recognition, sequence-learning

vac_cslr's Introduction

VAC_CSLR


This repo holds the code of the paper: Visual Alignment Constraint for Continuous Sign Language Recognition (ICCV 2021) [paper].

(Figure: framework overview)


Update (2022.05.14)

In recent experiments, we found an implementation detail that improves the proposed method. In our early experiments, we adopted nn.DataParallel to parallelize the visual feature extractor over multiple GPUs. However, only the BatchNorm statistics updated on device 0 are kept during training (DataParallel), which leads to unstable training results (results may differ with different numbers of GPUs and batch sizes). Therefore, we adopt synchronized BatchNorm (syncBN) in this update; the training schedule can be shortened to 40 epochs, and the relevant results are also provided. Experimental results on other datasets will be provided in our future journal version.

import torch.nn as nn

from modules.sync_batchnorm import convert_model

def model_to_device(self, model):
    model = model.to(self.device.output_device)
    if len(self.device.gpu_list) > 1:
        # Parallelize only the 2D CNN feature extractor across GPUs.
        model.conv2d = nn.DataParallel(
            model.conv2d,
            device_ids=self.device.gpu_list,
            output_device=self.device.output_device)
    # Replace BatchNorm layers with synchronized BatchNorm so that
    # statistics are aggregated across all devices, not just device 0.
    model = convert_model(model)
    model.cuda()
    return model
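
Here convert_model comes from the synchronized BatchNorm implementation bundled under modules/sync_batchnorm; it swaps the ordinary BatchNorm layers inside the (DataParallel-wrapped) extractor for synchronized variants, so the running statistics no longer depend on the number of GPUs or the per-device batch size.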

With the provided code, the updated results are expected as:

| Backbone | WER on Dev | WER on Test | Pretrained model |
| --- | --- | --- | --- |
| ResNet18 (baseline) | 23.8 | 25.4 | [Baidu] [GoogleDrive] |
| ResNet18+VAC (CTC only) | 21.5 | 22.1 | [Baidu] [GoogleDrive] |
| ResNet18+VAC+SMKD | 19.8 | 20.5 | [Baidu] [GoogleDrive] |

The VAC result corresponds to the setting loss_weights: SeqCTC: 1.0, ConvCTC: 1.0. In addition, VAC+SMKD adopts the setting model_args: share_classifier: True, weight_norm: True.
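
For reference, a hypothetical config excerpt with these settings (the key names follow the parameter dumps printed by main.py; the exact layout of the provided YAML files may differ):

    loss_weights:
      SeqCTC: 1.0     # CTC loss on the temporal (sequence) classifier
      ConvCTC: 1.0    # auxiliary CTC loss on the visual features (the VAC setting)
    model_args:
      share_classifier: True   # VAC+SMKD only
      weight_norm: True        # VAC+SMKD only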

If you find this repo useful in your research, please consider citing our papers VAC and SMKD.


Prerequisites

  • This project is implemented in PyTorch (>1.8), so please install PyTorch first.

  • ctcdecode==0.4 [parlance/ctcdecode], for beam search decoding.

  • [Optional] sclite [kaldi-asr/kaldi]: install the Kaldi toolkit to obtain sclite for evaluation. After installation, create a soft link to sclite:
    ln -s PATH_TO_KALDI/tools/sctk-2.4.10/bin/sclite ./software/sclite
    We also provide a Python evaluation tool for convenience, but sclite provides more detailed statistics.

  • [Optional] SeanNaren/warp-ctc: at the beginning of this research, we adopted warp-ctc for supervision; we have since found that the PyTorch CTC implementation reaches similar results (a minimal usage sketch follows this list).
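
As a reference for the last point, a minimal sketch of the built-in PyTorch CTC loss; the shapes and the class count below are illustrative assumptions, not this repo's exact code:

    import torch
    import torch.nn as nn

    # Illustrative sizes: T time steps, N batch, C classes (blank = 0), L label length.
    T, N, C, L = 100, 2, 1296, 20  # 1296 matches the phoenix14 gloss dictionary size
    log_probs = torch.randn(T, N, C).log_softmax(-1)         # (T, N, C) log-probabilities
    targets = torch.randint(1, C, (N, L), dtype=torch.long)  # gloss indices, excluding blank
    input_lengths = torch.full((N,), T, dtype=torch.long)
    target_lengths = torch.full((N,), L, dtype=torch.long)

    ctc = nn.CTCLoss(blank=0, reduction='mean', zero_infinity=True)
    loss = ctc(log_probs, targets, input_lengths, target_lengths)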

Data Preparation

  1. Download the RWTH-PHOENIX-Weather 2014 dataset [download link]. Our experiments are based on phoenix-2014.v3.tar.gz.

  2. After downloading, extract the dataset to ./dataset/phoenix2014; it is suggested to make a soft link to the downloaded dataset:
    ln -s PATH_TO_DATASET/phoenix2014-release ./dataset/phoenix2014

  3. The original image sequences are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences (a sketch of the resize step appears after the commands):

    cd ./preprocess
    python data_preprocess.py --process-image --multiprocessing
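
For intuition, a hypothetical sketch of the resize step that data_preprocess.py performs (the actual script's interface, file layout, and interpolation mode may differ):

    import glob
    import os
    from PIL import Image

    def resize_frames(src_dir, dst_dir, size=(256, 256)):
        # Resize every 210x260 frame under src_dir to 256x256 and save it to dst_dir.
        os.makedirs(dst_dir, exist_ok=True)
        for path in sorted(glob.glob(os.path.join(src_dir, "*.png"))):
            img = Image.open(path).resize(size, Image.BILINEAR)
            img.save(os.path.join(dst_dir, os.path.basename(path)))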

Inference

We provide pretrained models for inference; you can download them from:

| Backbone | WER on Dev | WER on Test | Pretrained model |
| --- | --- | --- | --- |
| ResNet18 | 21.2% | 22.3% | [Baidu] (passwd: qi83) [Dropbox] |

To evaluate the pretrained model, run the command below:
python main.py --load-weights resnet18_slr_pretrained.pt --phase test

(When evaluating the SMKD pretrained model, please set weight_norm and share_classifier to True in the config files, as in the excerpt above.)

Training

The configuration priority is: command line > config file > argparse defaults. To train the SLR model on phoenix14, run the command below:

python main.py --work-dir PATH_TO_SAVE_RESULTS --config PATH_TO_CONFIG_FILE --device AVAILABLE_GPUS

Feature Extraction

We also provide a feature extraction function to extract frame-wise features for other research purposes, which can be invoked with:

python main.py --load-weights PATH_TO_PRETRAINED_MODEL --phase features
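
The saved features can then be consumed like any array file; a hypothetical sketch, assuming one .npy file of frame-wise features per video (the file name and the exact output format written by the features phase are assumptions, so check the actual output directory):

    import numpy as np

    # Hypothetical path and layout: one (num_frames, feature_dim) array per video.
    feats = np.load("PATH_TO_FEATURES/some_video.npy")
    print(feats.shape)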

To Do List

  • Pure-Python evaluation tools.
  • WAR and WER calculation scripts.

Citation

If you find this repo useful in your research works, please consider citing:

@InProceedings{Min_2021_ICCV,
    author    = {Min, Yuecong and Hao, Aiming and Chai, Xiujuan and Chen, Xilin},
    title     = {Visual Alignment Constraint for Continuous Sign Language Recognition},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {11542-11551}
}

Self-Mutual Distillation Learning for Continuous Sign Language Recognition [paper]

@InProceedings{Hao_2021_ICCV,
    author    = {Hao, Aiming and Min, Yuecong and Chen, Xilin},
    title     = {Self-Mutual Distillation Learning for Continuous Sign Language Recognition},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {11303-11312}
}

Acknowledgments

We appreciate the help from Runpeng Cui, Hao Zhou@Rhythmblue and Xinzhe Han@GeraldHan :)

vac_cslr's People

Contributors: ycmin95, yulv-git

vac_cslr's Issues

ctcdecode

Hello, I have a question: can ctcdecode be used on a Windows system?
Thank you.

Finetuning and continue training

Hello, thank you for the awesome work. I am trying to use the model on another dataset, so I figure I should structure my data according to the format of phoenix2014. Is there anything else I should worry about, or is running the preprocessing with the same structure going to be alright?

Also, since I am training on Google Colab, I won't be able to train for 80 epochs consecutively and plan to split the training into several runs. Is there a built-in function to load the previous model and continue training (or finetuning, if I want to finetune the pretrained model), or how should I begin to tackle this problem? I am not sure whether the --load-weights flag is enough. Thank you so much.

Hardware and Software Specifications for this research.

I appreciate the work you have done. Can you tell us about the hardware and software specifications used to carry out this research? What Python libraries do you use? Please provide detailed requirements.

I would also be very grateful if you could share them.

Thank you.

Issue about alignment between label and frames.

Thanks for your great work. I'm wondering how to draw a picture like Fig. 5 in your paper. The key point lies in how to align labels with frames. Could you provide some advice? Thanks in advance!

Training the baseline (without VAC, SMKD)

The VAC result corresponds to the setting loss_weights: SeqCTC: 1.0, ConvCTC: 1.0. In addition, VAC+SMKD adopts the setting model_args: share_classifier: True, weight_norm: True.

With the default setting, is it training the baseline?

Pseudo Label

I'm wondering how to assign labels to frames with the CTC loss. It seems the CTC loss can be viewed as sequential softmax losses, but the key point is how to obtain the pseudo labels for frames via back-propagation. Thanks in advance!

Question about CPU or GPU error

I ran your code and found the following error. Where are the parameters put onto the GPU?

Traceback (most recent call last):
  File "main.py", line 218, in <module>
    processor.start()
  File "main.py", line 46, in start
    seq_train(self.data_loader['train'], self.model, self.optimizer, self.device, epoch, self.recoder)
  File "/home/quchunguang/sunday/CSLR/seq_scripts.py", line 24, in seq_train
    loss = model.criterion_calculation(ret_dict, label, label_lgt)
  File "/home/quchunguang/sunday/CSLR/slr_network.py", line 96, in criterion_calculation
    label_lgt.cpu().int()).mean()
  File "/home/quchunguang/anaconda3/envs/tf/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/quchunguang/anaconda3/envs/tf/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 1295, in forward
    self.zero_infinity)
  File "/home/quchunguang/anaconda3/envs/tf/lib/python3.6/site-packages/torch/nn/functional.py", line 1767, in ctc_loss
    zero_infinity)
RuntimeError: Tensor for argument #2 'targets' is on CPU, but expected it to be on GPU (while checking arguments for ctc_loss_gpu)

Detailed code running steps

Hello, author. Could you please provide the specific code execution process, such as which code to run in the first step? I really want to reproduce your code, but my ability is insufficient. I sincerely hope you can help me.

Visualizing the predicted alignments

Thank you so much for releasing your code!

I'm trying to visualize the predicted alignments, but the timings in the out.output-hypothesis-dev.ctm.sgml and out.output-hypothesis-test.ctm.sgml files don't make sense. Do the timings t1+t2 indicate the start time + end time? That would mean the glosses overlap. Also, I get the same timings for all the samples, e.g.:
0.000+0.010
0.010+0.030
0.020+0.050
0.030+0.07

An example from the out.output-hypothesis-test.ctm.sgml file:

<SYSTEM title="./work_dir/baseline_res18_SD_VAC_Phoenix/out.output-hypothesis-test.ctm" ref_fname="./work_dir/baseline_res18_SD_VAC_Phoenix/tmp.stm" hyp_fname="./work_dir/baseline_res18_SD_VAC_Phoenix/out.output-hypothesis-test.ctm" creation_date="Sun Dec 11 22:13:30 2022" format="2.4" frag_corr="FALSE" opt_del="FALSE" weight_ali="FALSE" weight_filename="">
<SPEAKER id="signer04">
<PATH id="(signer04-000)" word_cnt="6" file="01april_2010_thursday_heute_default-5" channel="1" sequence="0" R_T1="0.000" R_T2="inf" word_aux="h_t1+t2">
C,"aber","aber",0.000+0.010:S,"freuen","woche",0.010+0.030:C,"morgen","morgen",0.020+0.050:C,"sonne","sonne",0.030+0.070:S,"selten","kaum",0.040+0.090:C,"regen","regen",0.050+0.110
</PATH>
<PATH id="(signer04-001)" word_cnt="7" file="01april_2010_thursday_tagesschau_default-7" channel="1" sequence="1" R_T1="0.000" R_T2="inf" word_aux="h_t1+t2">
C,"samstag","samstag",0.000+0.010:C,"wechselhaft","wechselhaft",0.010+0.030:C,"besonders","besonders",0.020+0.050:C,"freundlich","freundlich",0.030+0.070:D,"nordost",,:S,"bisschen","nord",0.040+0.090:S,"bereich","ix",0.060+0.130
</PATH>
<PATH id="(signer04-002)" word_cnt="7" file="01april_2010_thursday_tagesschau_default-8" channel="1" sequence="2" R_T1="0.000" R_T2="inf" word_aux="h_t1+t2">
C,"sonntag","sonntag",0.000+0.010:C,"regen","regen",0.010+0.030:C,"teil","teil",0.020+0.050:C,"gewitter","gewitter",0.030+0.070:C,"suedost","suedost",0.040+0.090:D,"durch",,:C,"regen","regen",0.050+0.110
</PATH>

Time to train

Hello, great work with this paper and repo!
I would like to ask how much time you spent training the model (on the Phoenix12 dataset) and what kind of GPU you used for the training. I am trying to replicate it with another dataset (specifically Phoenix14-T), and in my first test I spent around 14 hours training 10 epochs, using a Titan XP with 12 GB and a batch size of 1.

Thank you again for your work and congratulations on this repo.

Unable to recognize any word, but the loss is decreasing

Hello, I get an error in the training phase: the loss is decreasing, but when I evaluate the model it doesn't recognize any word; I always get a WER of 100.
My environment: PyTorch 1.13.0, Python 3.10.13, ctcdecode 1.0.3.

This is my log file:

[ Sat Jan 27 01:36:04 2024 ] Parameters:

{'work_dir': 'PATH_TO_SAVE_RESULTS', 'config': './configs/baseline.yaml', 'random_fix': True, 'device': '0', 'phase': 'train', 'save_interval': 5, 'random_seed': 0, 'eval_interval': 1, 'print_log': True, 'log_interval': 50, 'evaluate_tool': 'python', 'feeder': 'dataset.dataloader_video.BaseFeeder', 'dataset': 'phoenix14', 'dataset_info': {'dataset_root': './dataset/phoenix2014/phoenix-2014-multisigner', 'dict_path': './preprocess/phoenix2014/gloss_dict.npy', 'evaluation_dir': './evaluation/slr_eval', 'evaluation_prefix': 'phoenix2014-groundtruth'}, 'num_worker': 10, 'feeder_args': {'mode': 'test', 'datatype': 'video', 'num_gloss': -1, 'drop_ratio': 1.0, 'prefix': './dataset/phoenix2014/phoenix-2014-multisigner', 'transform_mode': False}, 'model': 'slr_network.SLRModel', 'model_args': {'num_classes': 65, 'c2d_type': 'resnet18', 'conv_type': 2, 'use_bn': 1, 'share_classifier': False, 'weight_norm': False}, 'load_weights': None, 'load_checkpoints': None, 'decode_mode': 'beam', 'ignore_weights': [], 'batch_size': 8, 'test_batch_size': 8, 'loss_weights': {'SeqCTC': 1.0}, 'optimizer_args': {'optimizer': 'Adam', 'base_lr': 0.0001, 'step': [20, 35], 'learning_ratio': 1, 'weight_decay': 0.0001, 'start_epoch': 0, 'nesterov': False}, 'num_epoch': 20}

[ Sat Jan 27 01:36:31 2024 ] Epoch: 0, Batch(0/122) done. Loss: 110.28868103 lr:0.000100
[ Sat Jan 27 01:38:26 2024 ] Epoch: 0, Batch(50/122) done. Loss: 13.18387794 lr:0.000100
[ Sat Jan 27 01:40:25 2024 ] Epoch: 0, Batch(100/122) done. Loss: 12.18678570 lr:0.000100
[ Sat Jan 27 01:41:07 2024 ] Mean training loss: 18.2596124587.
[ Sat Jan 27 01:41:58 2024 ] Dev WER: 100.00%
[ Sat Jan 27 01:42:24 2024 ] Epoch: 1, Batch(0/122) done. Loss: 12.15300369 lr:0.000100
[ Sat Jan 27 01:44:21 2024 ] Epoch: 1, Batch(50/122) done. Loss: 11.67739010 lr:0.000100
[ Sat Jan 27 01:46:22 2024 ] Epoch: 1, Batch(100/122) done. Loss: 13.26895523 lr:0.000100
[ Sat Jan 27 01:47:08 2024 ] Mean training loss: 12.1612764968.
[ Sat Jan 27 01:47:58 2024 ] Dev WER: 100.00%
[ Sat Jan 27 01:48:27 2024 ] Epoch: 2, Batch(0/122) done. Loss: 12.09643936 lr:0.000100
[ Sat Jan 27 01:50:20 2024 ] Epoch: 2, Batch(50/122) done. Loss: 11.06025696 lr:0.000100
[ Sat Jan 27 01:52:13 2024 ] Epoch: 2, Batch(100/122) done. Loss: 9.84243107 lr:0.000100
[ Sat Jan 27 01:53:01 2024 ] Mean training loss: 10.5143460211.
[ Sat Jan 27 01:53:52 2024 ] Dev WER: 100.00%
[ Sat Jan 27 01:54:22 2024 ] Epoch: 3, Batch(0/122) done. Loss: 9.38849068 lr:0.000100
[ Sat Jan 27 01:56:19 2024 ] Epoch: 3, Batch(50/122) done. Loss: 9.07399940 lr:0.000100
[ Sat Jan 27 01:58:09 2024 ] Epoch: 3, Batch(100/122) done. Loss: 8.66645050 lr:0.000100
[ Sat Jan 27 01:58:55 2024 ] Mean training loss: 9.0431265127.
[ Sat Jan 27 01:59:45 2024 ] Dev WER: 100.00%
[ Sat Jan 27 02:00:12 2024 ] Epoch: 4, Batch(0/122) done. Loss: 8.63507748 lr:0.000100
[ Sat Jan 27 02:02:05 2024 ] Epoch: 4, Batch(50/122) done. Loss: 7.65232229 lr:0.000100
[ Sat Jan 27 02:04:04 2024 ] Epoch: 4, Batch(100/122) done. Loss: 7.27032137 lr:0.000100
[ Sat Jan 27 02:04:47 2024 ] Mean training loss: 7.6128989556.
[ Sat Jan 27 02:05:38 2024 ] Dev WER: 100.00%
[ Sat Jan 27 02:06:09 2024 ] Epoch: 5, Batch(0/122) done. Loss: 6.52053165 lr:0.000100
[ Sat Jan 27 02:07:59 2024 ] Epoch: 5, Batch(50/122) done. Loss: 4.85380507 lr:0.000100
[ Sat Jan 27 02:10:03 2024 ] Epoch: 5, Batch(100/122) done. Loss: 7.19156647 lr:0.000100
[ Sat Jan 27 02:10:44 2024 ] Mean training loss: 5.7774419706.
[ Sat Jan 27 02:11:35 2024 ] Dev WER: 100.00%
[ Sat Jan 27 02:12:00 2024 ] Epoch: 6, Batch(0/122) done. Loss: 3.87025928 lr:0.000100
[ Sat Jan 27 02:14:02 2024 ] Epoch: 6, Batch(50/122) done. Loss: 3.52518511 lr:0.000100
[ Sat Jan 27 02:16:07 2024 ] Epoch: 6, Batch(100/122) done. Loss: 3.84364915 lr:0.000100
[ Sat Jan 27 02:16:45 2024 ] Mean training loss: 3.9095683430.
[ Sat Jan 27 02:17:36 2024 ] Dev WER: 100.00%
[ Sat Jan 27 02:18:05 2024 ] Epoch: 7, Batch(0/122) done. Loss: 3.43237042 lr:0.000100
[ Sat Jan 27 02:20:00 2024 ] Epoch: 7, Batch(50/122) done. Loss: 2.54930735 lr:0.000100
[ Sat Jan 27 02:21:59 2024 ] Epoch: 7, Batch(100/122) done. Loss: 2.43364787 lr:0.000100
[ Sat Jan 27 02:22:40 2024 ] Mean training loss: 2.6058940282.
[ Sat Jan 27 02:23:30 2024 ] Dev WER: 100.00%

What is the problem, and how do I solve it? I am trying this on an Ethiopian sign language dataset (Amharic characters).

about preprocess.sh

Can you explain why preprocess.sh is needed to process the predicted results?

Unable to successfully install ctcdecode

Hello, I am unable to successfully install ctcdecode on a Windows machine.

Is there another method to install and use the CTC decoder used in this project? Please advise.

Weird glosses in the annotation of phoenix dataset

Hi @ycmin95, recently I checked the annotation of the phoenix dataset and the gloss dictionary generated during data preparation.
There are many weird glosses, such as "ON", "OFF", "LEFTHAND", ...
I wonder whether we should keep these weird glosses in the labels.
Any advice?

Does your PHOENIX-2014-T dataset have a /1/ folder?

After downloading the dataset, I got manual annotations in the PHOENIX-2014-T.dev.corpus file like this:

name|video|start|end|speaker|orth|translation
11August_2010_Wednesday_tagesschau-2|11August_2010_Wednesday_tagesschau-2/1/*.png|-1|-1|Signer08|DRUCK TIEF KOMMEN|tiefer luftdruck bestimmt in den nächsten tagen unser wetter

But in the features folder, I can't see a /1/ folder.

About the resnet18 backbone

Hi, I was wondering whether the ResNet18 backbone you posted here is from the non-iterative or the iterative approach. I suppose it is from the non-iterative one, but I'm not quite sure.

Finally, can I use the feature extractor function you provided on another dataset? If so, what should the setup be?

Thanks in advance.

Error when I run the training part (main.py)

I run on the CPU:
python main.py --work-dir PATH_TO_SAVE_RESULTS --config F:\code\VAC_CSLR-main1\configs\baseline.yaml --device 'cpu'

and I got this error:
.git does not exist in current dir
<main.Processor object at 0x0000017E7A49DD50>
[ Sun Mar 12 22:24:52 2023 ] Parameters:
{'work_dir': 'PATH_TO_SAVE_RESULTS', 'config': 'F:\code\VAC_CSLR-main1\configs\baseline.yaml', 'random_fix': True, 'device': 'cpu', 'phase': 'train', 'save_interval': 5, 'random_seed': 0, 'eval_interval': 1, 'print_log': True, 'log_interval': 50, 'evaluate_tool': 'sclite', 'feeder': 'dataset.dataloader_video.BaseFeeder', 'dataset': 'phoenix14', 'dataset_info': {'dataset_root': './dataset/phoenix2014/phoenix-2014-multisigner', 'dict_path': './preprocess/phoenix2014/gloss_dict.npy', 'evaluation_dir': './evaluation/slr_eval', 'evaluation_prefix': 'phoenix2014-groundtruth'}, 'num_worker': 10, 'feeder_args': {'mode': 'train', 'datatype': 'video', 'num_gloss': -1, 'drop_ratio': 1.0}, 'model': 'slr_network.SLRModel', 'model_args': {'num_classes': 508, 'c2d_type': 'resnet18', 'conv_type': 2, 'use_bn': 1, 'share_classifier': False, 'weight_norm': False}, 'load_weights': None, 'load_checkpoints': None, 'decode_mode': 'beam', 'ignore_weights': [], 'batch_size': 2, 'test_batch_size': 8, 'loss_weights': {'SeqCTC': 1.0}, 'optimizer_args': {'optimizer': 'Adam', 'base_lr': 0.0001, 'step': [20, 35], 'learning_ratio': 1, 'weight_decay': 0.0001, 'start_epoch': 0, 'nesterov': False}, 'num_epoch': 3}

Traceback (most recent call last):
  File "F:\code\VAC_CSLR-main1\main.py", line 220, in <module>
    processor.start()
  File "F:\code\VAC_CSLR-main1\main.py", line 49, in start
    seq_train(self.data_loader['train'], self.model, self.optimizer,
KeyError: 'train'

I got the error when I run python main.py.
Can you help me, please?

About start index

Hi, thank you for your work.

I'm not sure whether the start index should be False or True for the KD loss calculation.

Could you let me know?

about the feature extractor architecture

Hi, I am really sorry to ask a question here, but this is important for my research. I really want to know more about the delta t of the frame-wise features. There are intersections between the delta ts. Could you explain how long the intersections between those delta ts are? Or maybe you could point to the code for the delta t, so I can check it? Thank you.

Getting IndexError while Training or Inference

This is the command I use: !python main.py --work-dir results --device 0 --num-worker 4 --batch-size 1

Here is the error that I get:
Traceback (most recent call last):
  File "main.py", line 219, in <module>
    processor.start()
  File "main.py", line 46, in start
    self.device, epoch, self.recoder)
  File "/home/jupyter/VAC_CSLR/seq_scripts.py", line 21, in seq_train
    for batch_idx, data in enumerate(loader):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.7/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/jupyter/VAC_CSLR/dataset/dataloader_video.py", line 47, in __getitem__
    input_data, label = self.normalize(input_data, label)
  File "/home/jupyter/VAC_CSLR/dataset/dataloader_video.py", line 78, in normalize
    video, label = self.data_aug(video, label, file_id)
  File "/home/jupyter/VAC_CSLR/utils/video_augmentation.py", line 24, in __call__
    image = t(image)
  File "/home/jupyter/VAC_CSLR/utils/video_augmentation.py", line 119, in __call__
    if isinstance(clip[0], np.ndarray):
IndexError: list index out of range

Can you tell me what am I doing wrong?

How to solve this error in the training model? I look forward to your answer.

Traceback (most recent call last):
  File "main.py", line 211, in <module>
    processor.start()
  File "main.py", line 44, in start
    seq_train(self.data_loader['train'], self.model, self.optimizer,
  File "/home/linux/data2/sun/VAC_CSLR/seq_scripts.py", line 18, in seq_train
    for batch_idx, data in enumerate(tqdm(loader)):
  File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/tqdm/std.py", line 1180, in __iter__
    for obj in iterable:
  File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
    return self._process_data(data)
  File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/torch/_utils.py", line 457, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/linux/data2/sun/VAC_CSLR/dataset/dataloader_video.py", line 47, in __getitem__
    input_data, label = self.normalize(input_data, label)
  File "/home/linux/data2/sun/VAC_CSLR/dataset/dataloader_video.py", line 78, in normalize
    video, label = self.data_aug(video, label, file_id)
  File "/home/linux/data2/sun/VAC_CSLR/utils/video_augmentation.py", line 24, in __call__
    image = t(image)
  File "/home/linux/data2/sun/VAC_CSLR/utils/video_augmentation.py", line 119, in __call__
    if isinstance(clip[0], np.ndarray):
IndexError: list index out of range

Inconsistent reproduction results

Hello, author. We downloaded your code and re-ran your proposed VAC for 50 epochs (without BN), and the best result was only a 35.1% word error rate. In addition, we adjusted the loss weights in the code to run the baseline (without BN), and found that its results also differ considerably from those in the paper. Could this be a code version mismatch, or is our training time simply too short?

Question about inconsistent baseline reproduction results

Hello, I have some questions about the experiment code.
In Table 3 of your paper, the baseline result on DEV is 25.4. I tried to reproduce it by removing the ConvCTC and Dist terms from the loss, but I got WER = 24.8% only at epoch 40, and the final result differs considerably from Table 3. Could this be because I overlooked some parts that should also be removed?

log.txt
config.txt

Unable to run this repository on Google Colab

Hello, I ran this repository on my local machine on CPU, but the run did not finish and I did not get results.

So I want to run this repository on a GPU, and the free GPU I found is Google Colab. Can you please give me details on how to run this repository on Google Colab?

Data augmentation error

Hi, author! I'm doing a data augmentation task, and I run into problems every time I reach batch 777 (the training set is twice as large as before). I've seen your previous answer about "IndexError: list index out of range", but I have verified that the frame paths are correct. (By the way, batch_size is 4 and device is 4.)

Are there plans to supplement the code on the CSL dataset?

Thank you very much for your contribution to the community.
In the paper, I saw that experiments were carried out on both the PHOENIX14 and CSL datasets. I would like to ask whether there are plans to release the data-processing and training code for the CSL dataset.

CSL Dataset

Hello, could you provide the processing and evaluation scripts for the CSL dataset?

Final accuracy

I want to make sure: you report 22.1 Dev WER and 23.0 Test WER, while the released pretrained model achieves 21.2 Dev WER and 22.3 Test WER? Thanks in advance for your response!

Unable to run main.py successfully

I got an error when I run main.py.

Has anyone else had the same problem as this?

Traceback (most recent call last):
  File "F:\code\VAC_CSLR-main1\main.py", line 219, in <module>
    processor = Processor(args)
  File "F:\code\VAC_CSLR-main1\main.py", line 33, in __init__
    self.model, self.optimizer = self.loading()
  File "F:\code\VAC_CSLR-main1\main.py", line 99, in loading
    model = model_class(
  File "F:\code\VAC_CSLR-main1\slr_network.py", line 52, in __init__
    self.decoder = utils.Decode(gloss_dict, num_classes, 'beam')
  File "F:\code\VAC_CSLR-main1\utils\decode.py", line 19, in __init__
    self.ctc_decoder = ctcdecode.CTCBeamDecoder(vocab, beam_width=10, blank_id=blank_id,
AttributeError: module 'ctcdecode' has no attribute 'CTCBeamDecoder'

Torch not compiled with CUDA enabled

Hello, how are you?

I tried this code, but my machine does not have a GPU, so please tell me how to run it without CUDA.

The error is:

File "F:\code\VAC_CSLR-main\main.py", line 209, in
processor = Processor(args)
File "F:\code\VAC_CSLR-main\main.py", line 34, in init
self.model, self.optimizer = self.loading()
File "C:\Users\ANTENEH\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\torch\cuda_init_.py", line 221, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

This is the error I get. How can I test on CPU?

Why can the vocab used to initialize ctcdecode be generated with chr(20000–21296)?

Your work is excellent!
According to the ctcdecode documentation, the vocab should be initialized with the dictionary to be decoded. Why does the code implementation work with chr(20000 + (0~1296))? Is the number 20000 special?
Also, Figure 5 of your paper shows the alignment between the model's generated labels, the ground truth, and the video, but with ctcdecode I can only generate labels and cannot use them to annotate the alignment. Does this part require extra code?
Looking forward to your reply!

Successfully inference, but unable to train

Hello, author. I encountered this issue while training the model. Could you kindly give me some advice? Thank you very much.

(vac) user2@com:~/data/VAC_CSLR-main$ python main.py --work-dir ./work_dir/vac/ --config ./configs/baseline.yaml --device 0
Loading model
/opt/anaconda3/envs/vac/lib/python3.7/site-packages/torchvision/models/_utils.py:209: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
f"The parameter '{pretrained_param}' is deprecated since 0.13 and may be removed in the future, "
/opt/anaconda3/envs/vac/lib/python3.7/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=ResNet18_Weights.IMAGENET1K_V1. You can also use weights=ResNet18_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
Loading model finished.
Loading data
train 5671
Apply training transform.

train 5671
Apply testing transform.

dev 540
Apply testing transform.

test 629
Apply testing transform.

Loading data finished.
.git does not exist in current dir
[ Wed Jul 19 21:43:36 2023 ] Parameters:
{'work_dir': './work_dir/vac/', 'config': './configs/baseline.yaml', 'random_fix': True, 'device': '0', 'phase': 'train', 'save_interval': 5, 'random_seed': 0, 'eval_interval': 1, 'print_log': True, 'log_interval': 50, 'evaluate_tool': 'sclite', 'feeder': 'dataset.dataloader_video.BaseFeeder', 'dataset': 'phoenix14', 'dataset_info': {'dataset_root': './dataset/phoenix2014/phoenix-2014-multisigner', 'dict_path': './preprocess/phoenix2014/gloss_dict.npy', 'evaluation_dir': './evaluation/slr_eval', 'evaluation_prefix': 'phoenix2014-groundtruth'}, 'num_worker': 0, 'feeder_args': {'mode': 'test', 'datatype': 'video', 'num_gloss': -1, 'drop_ratio': 1.0, 'prefix': './dataset/phoenix2014/phoenix-2014-multisigner', 'transform_mode': False}, 'model': 'slr_network.SLRModel', 'model_args': {'num_classes': 1296, 'c2d_type': 'resnet18', 'conv_type': 2, 'use_bn': 1, 'share_classifier': False, 'weight_norm': False}, 'load_weights': None, 'load_checkpoints': None, 'decode_mode': 'beam', 'ignore_weights': [], 'batch_size': 2, 'test_batch_size': 4, 'loss_weights': {'SeqCTC': 1.0}, 'optimizer_args': {'optimizer': 'Adam', 'base_lr': 0.0001, 'step': [20, 35], 'learning_ratio': 1, 'weight_decay': 0.0001, 'start_epoch': 0, 'nesterov': False}, 'num_epoch': 40}

0%| | 0/2835 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 211, in <module>
    processor.start()
  File "main.py", line 45, in start
    self.device, epoch, self.recoder)
  File "/home/user2/data/VAC_CSLR-main/seq_scripts.py", line 18, in seq_train
    for batch_idx, data in enumerate(tqdm(loader)):
  File "/opt/anaconda3/envs/vac/lib/python3.7/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/opt/anaconda3/envs/vac/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/opt/anaconda3/envs/vac/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 671, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/opt/anaconda3/envs/vac/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/anaconda3/envs/vac/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/user2/data/VAC_CSLR-main/dataset/dataloader_video.py", line 47, in __getitem__
    input_data, label = self.normalize(input_data, label)
  File "/home/user2/data/VAC_CSLR-main/dataset/dataloader_video.py", line 78, in normalize
    video, label = self.data_aug(video, label, file_id)
  File "/home/user2/data/VAC_CSLR-main/utils/video_augmentation.py", line 24, in __call__
    image = t(image)
  File "/home/user2/data/VAC_CSLR-main/utils/video_augmentation.py", line 119, in __call__
    if isinstance(clip[0], np.ndarray):
IndexError: list index out of range

Error when I try to do the inference

Hello, I'm replicating this model, but when I execute the inference command, an unknown error appears, and I don't know why.
My setup is:

  • RTX 3060ti
  • 16GB RAM
  • Ryzen 7 5800X

The complete error is:

Traceback (most recent call last):
  File "main.py", line 209, in <module>
    processor.start()
  File "main.py", line 61, in start
    dev_wer = seq_eval(self.arg, self.data_loader["dev"], self.model, self.device,
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/seq_scripts.py", line 56, in seq_eval
    ret_dict = model(vid, vid_lgt, label=label, label_lgt=label_lgt)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/slr_network.py", line 63, in forward
    framewise = self.masked_bn(inputs, len_x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/slr_network.py", line 53, in masked_bn
    x = self.conv2d(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torchvision/models/resnet.py", line 249, in forward
    return self._forward_impl(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torchvision/models/resnet.py", line 233, in _forward_impl
    x = self.bn1(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 135, in forward
    return F.batch_norm(
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/functional.py", line 2149, in batch_norm
    return torch.batch_norm(
RuntimeError: CUDA error: unknown error

And I have changed the config file:
-batch_size: 2
+batch_size: 1
-test_batch_size: 8
-num_worker: 10
-device: 0,1,2
+test_batch_size: 1
+num_worker: 1
+device: 0

Also, my torch version is 1.8.1+cu111.

Thank you for the help!

UPDATE

I also found this error:

Traceback (most recent call last):
  File "main.py", line 209, in <module>
    processor.start()
  File "main.py", line 61, in start
    dev_wer = seq_eval(self.arg, self.data_loader["dev"], self.model, self.device,
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/seq_scripts.py", line 56, in seq_eval
    ret_dict = model(vid, vid_lgt, label=label, label_lgt=label_lgt)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/slr_network.py", line 63, in forward
    framewise = self.masked_bn(inputs, len_x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/slr_network.py", line 53, in masked_bn
    x = self.conv2d(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torchvision/models/resnet.py", line 249, in forward
    return self._forward_impl(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torchvision/models/resnet.py", line 232, in _forward_impl
    x = self.conv1(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 395, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: CUDA error: unknown error

with the following config:
-batch_size: 2
+batch_size: 1
random_seed: 0
-test_batch_size: 8
-num_worker: 10
-device: 0,1,2
+test_batch_size: 2
+num_worker: 2
+device: 0
