vitae-transformer / deepsolo

The official repo for [CVPR'23] "DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting" & [ArXiv'23] "DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting"

License: Other

Languages: Python 82.86%, C++ 1.68%, Cuda 15.45%
Topics: chinese-text-spotting, detection-transformer, explicit-point-query, multilingual-text-spotting, scene-text-spotting

deepsolo's Introduction

ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond

Updates | Introduction | Statement

Current applications

Image Classification: Please see ViTAE-Transformer for image classification;

Object Detection: Please see ViTAE-Transformer for object detection;

Semantic Segmentation: Please see ViTAE-Transformer for semantic segmentation;

Animal Pose Estimation: Please see ViTAE-Transformer for animal pose estimation;

Matting: Please see ViTAE-Transformer for matting;

Remote Sensing: Please see ViTAE-Transformer for Remote Sensing;

Updates

09/04/2021

24/03/2021

  • The pretrained models for both ViTAE and ViTAEv2 are released. The code for downstream tasks is also provided for reference.

07/12/2021

  • The code is released!

19/10/2021

  • The paper is accepted by NeurIPS 2021! The code will be released soon!

06/08/2021

  • The paper is posted on arXiv! The code will be made publicly available once cleaned up.

Introduction

This repository contains the code, models, and test results for the paper ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias. ViTAE introduces reduction cells (RCs) and normal cells (NCs) to bring scale invariance and locality into vision transformers. In ViTAEv2, we explore window attention without shift operations to obtain a better balance between memory footprint, speed, and performance, and we stack the proposed RCs and NCs in a multi-stage manner to facilitate learning on other vision tasks, including detection, segmentation, and pose estimation.

Fig.1 - The details of RC and NC design in ViTAE.

Fig.2 - The multi-stage design of ViTAEv2.

Statement

This project is for research purposes only. For any other questions, please contact yufei.xu at outlook.com or qmzhangzz at hotmail.com.

Citing ViTAE and ViTAEv2

@article{xu2021vitae,
  title={Vitae: Vision transformer advanced by exploring intrinsic inductive bias},
  author={Xu, Yufei and Zhang, Qiming and Zhang, Jing and Tao, Dacheng},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}
@article{zhang2022vitaev2,
  title={ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond},
  author={Zhang, Qiming and Xu, Yufei and Zhang, Jing and Tao, Dacheng},
  journal={arXiv preprint arXiv:2202.10108},
  year={2022}
}

Other Links

Image Classification: See ViTAE for Image Classification

Object Detection: See ViTAE for Object Detection.

Semantic Segmentation: See ViTAE for Semantic Segmentation.

Animal Pose Estimation: See ViTAE for Animal Pose Estimation.

Matting: See ViTAE for Matting.

Remote Sensing: See ViTAE for Remote Sensing.

deepsolo's People

Contributors

ymy-k


deepsolo's Issues

nan value

When I try to train a Thai model, NaN/Inf values appear in DeformableTransformer.
I checked the output of self.detection_transformer: it is [nan, nan, nan, ...]. Digging deeper, I found that the problem is in the forward pass of DeformableTransformer, but I don't know how to fix it.

My config file is below:

_BASE_: "../Base_det.yaml"

MODEL:
  WEIGHTS: ""
  TRANSFORMER:
    VOC_SIZE: 230
    NUM_POINTS: 50

DATASETS:
  TRAIN: ("thaicard_train",)
  TEST: ("thaicard_eval",)

SOLVER:
  IMS_PER_BATCH: 4
  BASE_LR: 1e-4
  LR_BACKBONE: 1e-4
  WARMUP_ITERS: 0
  STEPS: (320000,)
  MAX_ITER: 375000
  CHECKPOINT_PERIOD: 100000

TEST:
  EVAL_PERIOD: 10000

OUTPUT_DIR: "output/vitaev2_s/thai_pre"

How can I fix it? Thank you!
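With WARMUP_ITERS set to 0 in the config above, an early loss spike is one plausible trigger, so adding warmup or lowering BASE_LR is worth trying first. To localize where the NaN first appears, here is a generic PyTorch debugging sketch (not part of the DeepSolo codebase):

import torch

def install_nan_hooks(model):
    # raise at the first submodule whose forward emits NaN/Inf, instead of
    # discovering [nan, nan, ...] only at the final output
    def hook(module, inputs, output):
        outs = output if isinstance(output, (tuple, list)) else (output,)
        for o in outs:
            if torch.is_tensor(o) and not torch.isfinite(o).all():
                raise RuntimeError(f"non-finite output from {module.__class__.__name__}")
    for m in model.modules():
        m.register_forward_hook(hook)

# usage: install_nan_hooks(model) before training starts;
# torch.autograd.set_detect_anomaly(True) does the analogous check for backward passes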

Custom dictionary visualization expects ints instead of strings

The adet/utils/visualizer.py text decoding method _ctc_decode_recognition() includes the following code:

c = int(c)
if c < self.voc_size - 1:
    if last_char != c:
        if self.voc_size == 37 or self.voc_size == 96:
            s += self.CTLABELS[c]
            last_char = c
        else:
            s += str(chr(self.CTLABELS[c]))
            last_char = c

This seems to assume that a custom character dictionary is a list containing the Unicode ordinals of the characters, rather than the characters themselves as in the built-in dictionaries (from adet.utils.visualizer.TextVisualizer.__init__()):

if self.voc_size == 96:
    self.CTLABELS = [' ','!','"','#','$','%','&','\'','(',')','*','+',',','-','.','/','0','1','2','3','4','5','6','7','8','9',':',';','<','=','>','?','@','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','[','\\',']','^','_','`','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','{','|','}','~']
elif self.voc_size == 37:
    self.CTLABELS = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','0','1','2','3','4','5','6','7','8','9']
else:
    with open(self.use_customer_dictionary, 'rb') as fp:
        self.CTLABELS = pickle.load(fp)

Trying to use a list of characters results in:

  TypeError: 'str' object cannot be interpreted as an integer

Not sure if this was intended to be this way or was just an oversight.
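The TypeError comes from the else branch: chr() needs an integer ordinal, but a character-list dictionary hands it a str. Until that changes, the workaround on the user side is to pickle ordinals; a minimal sketch (the file name is hypothetical):

import pickle

# chr() requires an integer ordinal:
#   chr('a')      -> TypeError: 'str' object cannot be interpreted as an integer
#   chr(ord('a')) -> 'a', which is what the custom-dict branch effectively does
# workaround until the code handles plain characters: pickle ordinals
with open("my_dict.txt", "wb") as fp:  # hypothetical file name
    pickle.dump([ord(ch) for ch in "abc123"], fp)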

Better backbone of Chinese model

Hello author, will a Chinese model with ViTAEv2-S as the backbone be released in this project in the future? If not, could you tell me the training speed of the Chinese model? Thank you very much!

importing an executed module

adet/data/__init__.py contains an import:

from . import builtin

while adet/data/builtin.py contains code at global scope (starting on line 81) that is meant to run when the module is executed from the shell. This causes ipykernel to crash when running any code that imports anything from adet.data in a notebook or Google Colab. The simple solution is to hide the executable code in an if __name__ == '__main__' block, as sketched below.
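A sketch of the suggested guard; main() stands in for whatever the module-level code in builtin.py actually does:

def main():
    # code formerly at module scope (line 81 onward) moves here
    ...

if __name__ == "__main__":
    # runs only for "python adet/data/builtin.py",
    # not for "from . import builtin"
    main()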

Issue with Evaluation Metrics and Missing CSV File for Total-Text Dataset

I am encountering an issue with the chosen model's evaluation on the Total-Text dataset. When I evaluated the model on the 300 test images, the prediction results were saved to a JSON file called "text_results.json". However, upon inspecting the evaluation metrics (precision, recall, hmean), I noticed that all of them are 0, indicating that the evaluation did not produce valid results.

Furthermore, I am unable to locate the CSV file that should contain the evaluation information.
[screenshots: evaluation results and chosen model]

Get text

Hello author, how can I get the text recognized by the network?

Not implemented on the CPU

I have tried to run the code on the CPU by adding these lines:

cfg = setup_cfg(config_file)
cfg.defrost()
cfg.MODEL.DEVICE='cpu'

but I got this error:

DeepSolo/adet/layers/ms_deform_attn.py:24, in _MSDeformAttnFunction.forward(ctx, value, value_spatial_shapes, value_level_start_index, sampling_locations, attention_weights, im2col_step)
     21 @staticmethod
     22 def forward(ctx, value, value_spatial_shapes, value_level_start_index, sampling_locations, attention_weights, im2col_step):
     23     ctx.im2col_step = im2col_step
---> 24     output = _C.ms_deform_attn_forward(
     25         value, value_spatial_shapes, value_level_start_index, sampling_locations, attention_weights, ctx.im2col_step)
     26     ctx.save_for_backward(value, value_spatial_shapes, value_level_start_index, sampling_locations, attention_weights)
     27     return output
RuntimeError: Not implemented on the CPU

Is there any way to run this code on the CPU?
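A hedged note: the compiled _C.ms_deform_attn_forward kernel is CUDA-only. The upstream Deformable-DETR codebase ships an equivalent pure-PyTorch routine for debugging, sketched below; one could call it on CPU in place of the autograd function (much slower, and wiring it into DeepSolo is not an officially supported path):

import torch
import torch.nn.functional as F

def ms_deform_attn_core_pytorch(value, value_spatial_shapes,
                                sampling_locations, attention_weights):
    # value: (N, S, M, D) with S = sum of H*W over levels, M heads, D head dim
    # sampling_locations: (N, Lq, M, L, P, 2), normalized to [0, 1]
    # attention_weights:  (N, Lq, M, L, P)
    N_, S_, M_, D_ = value.shape
    _, Lq_, M_, L_, P_, _ = sampling_locations.shape
    value_list = value.split([H_ * W_ for H_, W_ in value_spatial_shapes], dim=1)
    sampling_grids = 2 * sampling_locations - 1  # grid_sample expects [-1, 1]
    sampling_value_list = []
    for lid_, (H_, W_) in enumerate(value_spatial_shapes):
        # (N, H*W, M, D) -> (N*M, D, H, W)
        value_l_ = value_list[lid_].flatten(2).transpose(1, 2).reshape(N_ * M_, D_, H_, W_)
        # (N, Lq, M, P, 2) -> (N*M, Lq, P, 2)
        sampling_grid_l_ = sampling_grids[:, :, :, lid_].transpose(1, 2).flatten(0, 1)
        # bilinear sampling of P points per query at this level: (N*M, D, Lq, P)
        sampling_value_list.append(F.grid_sample(
            value_l_, sampling_grid_l_, mode='bilinear',
            padding_mode='zeros', align_corners=False))
    # weight and sum over levels and points: (N, Lq, M*D)
    attention_weights = attention_weights.transpose(1, 2).reshape(N_ * M_, 1, Lq_, L_ * P_)
    output = (torch.stack(sampling_value_list, dim=-2).flatten(-2)
              * attention_weights).sum(-1).view(N_, M_ * D_, Lq_)
    return output.transpose(1, 2).contiguous()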

Chinese text recognition scene

Hello! Thanks for your good work. I want to ask whether the DeepSolo framework is suitable for Chinese text recognition. Compared to English in natural scenes, Chinese text recognition involves many more character categories and longer text lines. Moreover, the visual features of Chinese characters are more complex than those of Latin characters. Have you tried Chinese character scenes? If I want to use DeepSolo for them, do you have any suggestions?

question about evaluation

Hi,

I used the provided config configs/R_50/pretrain/150k_tt.yaml to pretrain on SynText150K and Total-Text. During pretraining, there is an evaluation every 10000 iters, which gives an evaluation result A. But when pretraining finishes and I evaluate the saved checkpoint with your provided evaluation command, I get a different evaluation result B.

Why is there a difference between A and B? Which result is valid?

load_zip_file Error loading the ZIP archive

When I evaluated the ViTAE model on IC15, an error occurred even though det.zip exists at the path:

[12/22 07:47:44 d2.evaluation.evaluator]: Total inference time: 0:01:48.998131 (0.220198 s / iter per device, on 1 devices)
[12/22 07:47:44 d2.evaluation.evaluator]: Total inference pure compute time: 0:01:47 (0.217200 s / iter per device, on 1 devices)
[12/22 07:47:44 adet.evaluation.text_evaluation_all]: Saving results to /data_local/deepsolo/output/vitaev2_s/150k_tt_mlt_13_15/finetune/totaltext/inference/text_results.json
['text_results.json', 'det.zip', 'det_full.zip']
/data_local/deepsolo/output/vitaev2_s/150k_tt_mlt_13_15/finetune/totaltext/inference/det.zip
Error!
load_zip_file Error loading the ZIP archive

Traceback (most recent call last):
  File "train_net.py", line 304, in <module>
    launch(
  File "/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/detectron2/engine/launch.py", line 84, in launch
    main_func(*args)
  File "train_net.py", line 281, in main
    res = Trainer.test(cfg, model) # d2 defaults.py
  File "/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 619, in test
    results_i = inference_on_dataset(model, data_loader, evaluator)
  File "/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/detectron2/evaluation/evaluator.py", line 213, in inference_on_dataset
    results = evaluator.evaluate()
  File "/data_local/deepsolo/adet/evaluation/text_evaluation_all.py", line 438, in evaluate
    text_result["e2e_method"] = "None-" + text_result["e2e_method"]
KeyError: 'e2e_method'

why the decode result is messy

The command is:
python demo.py --config-file ../configs/R_50/TotalText/finetune_150k_tt_mlt_13_15.yaml --input ../testpics/ --output ../testresult/ --opts MODEL.WEIGHTS ../weights/res50_pretrain_synch-art-lsvt-rects.pth


Cannot use ViT as a backbone

Hello there, I'm facing the issue below while trying to finetune on Total-Text with the ViTAE backbone:

Traceback (most recent call last):
  File "tools/train_net.py", line 368, in <module>
    launch(
  File "/home/isi_cvpr/anaconda3/envs/scorpio/lib/python3.8/site-packages/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "tools/train_net.py", line 293, in main
    trainer = Trainer(cfg)
  File "/home/isi_cvpr/anaconda3/envs/scorpio/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 376, in __init__
    model = self.build_model(cfg)
  File "/home/isi_cvpr/anaconda3/envs/scorpio/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 505, in build_model
    model = build_model(cfg)
  File "/home/isi_cvpr/anaconda3/envs/scorpio/lib/python3.8/site-packages/detectron2/modeling/meta_arch/build.py", line 22, in build_model
    model = META_ARCH_REGISTRY.get(meta_arch)(cfg)
  File "/home/isi_cvpr/Documents/Kunal/DeepSolo/adet/modeling/text_spotter.py", line 124, in __init__
    d2_backbone = MaskedBackbone(cfg)
  File "/home/isi_cvpr/Documents/Kunal/DeepSolo/adet/modeling/text_spotter.py", line 36, in __init__
    self.backbone = build_backbone(cfg)
  File "/home/isi_cvpr/anaconda3/envs/scorpio/lib/python3.8/site-packages/detectron2/modeling/backbone/build.py", line 32, in build_backbone
    backbone = BACKBONE_REGISTRY.get(backbone_name)(cfg, input_shape)
  File "/home/isi_cvpr/anaconda3/envs/scorpio/lib/python3.8/site-packages/fvcore/common/registry.py", line 71, in get
    raise KeyError(
KeyError: "No object named 'build_vitaev2_backbone' found in 'BACKBONE' registry!"

I've run convert-vitae.py from pretrained_backbone and followed all the steps, such as updating the weight path. I'm still getting this error over and over again.
Please help me fix this issue.
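A hedged note, since the exact repo state isn't shown: BACKBONE_REGISTRY only contains builders whose defining module has actually been imported, so this KeyError typically means the file defining build_vitaev2_backbone never ran. The registration pattern in detectron2 looks like this:

from detectron2.modeling import BACKBONE_REGISTRY, Backbone

@BACKBONE_REGISTRY.register()  # executes at import time only
def build_vitaev2_backbone(cfg, input_shape) -> Backbone:
    ...

If the module containing this decorator is never imported (for example, a missing "from . import vitae_v2" in adet/modeling/__init__.py; the module name here is an assumption), build_backbone() raises exactly the KeyError above. Checking that the imported adet package is the one from this repo rather than a stale installed copy is another common fix.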

Export the model for deployment in a C++ environment

Has anyone tried exporting the model for inference in a C++ environment? I have encountered some problems at this step and would like your help. I had trouble converting and exporting the model; I finally obtained a model through the trace method, but there is a problem with it:

RuntimeError: The size of tensor a (32) must match the size of tensor b (237) at non-singleton dimension 1
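A plausible cause, hedged since the export script isn't shown: torch.jit.trace freezes shape-dependent values at the sizes of the example input, so the traced graph can fail on inputs of any other size. A toy reproduction of the same error pattern:

import torch
import torch.nn as nn

class Toy(nn.Module):
    def forward(self, x):
        # tracing bakes x.shape[1] (here 32) into the graph as a constant
        return x + torch.arange(x.shape[1])

traced = torch.jit.trace(Toy(), torch.zeros(1, 32))
traced(torch.zeros(1, 237))  # RuntimeError: size of tensor a must match size of tensor b

Scripting the shape-dependent parts, or resizing/padding inputs to the traced size, are the usual workarounds.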

Deepsolo++ model/code release

Hi all,

Thank you for the great work on your model.
I wonder when the model/code of DeepSolo++ will be released?

question about custom data set

Hello. When training on a custom dataset, I manually created a txt file similar to chn_cls_list (containing a dozen or so Chinese characters, English letters, and digits). During training, an error is raised at self.CTLABELS = pickle.load(fp): the load call fails. I think the reason is that my txt file was not converted into a binary file like chn_cls_list. Do you know how to convert a self-defined txt file into the binary form the system expects?
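For reference, a hedged sketch of such a conversion, assuming the custom dictionary uses the same format as the built-in chn_cls_list (a pickled list of Unicode ordinals; both file names below are placeholders):

import pickle

# read one character per line from a plain-text list
with open("my_chars.txt", encoding="utf-8") as f:
    chars = [line.strip() for line in f if line.strip()]

# write the pickled ordinals that self.CTLABELS = pickle.load(fp) expects
with open("my_cls_list.txt", "wb") as f:
    pickle.dump([ord(ch) for ch in chars], f)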

Training Log

Hi, I am using 4 V100 GPUs for pretraining to reproduce your results, but the training speed seems slow: the estimated time is about 8 days. Can you release your training log for reference?

issue

raise Exception("The sample %s not present in GT" %k)
Exception: The sample 0000116 not present in GT

The errors on multiprocessing may be related to the dataset "syntext1_96voc"

When I run python tools/train_net.py --config-file configs/R_50/CTW1500/finetune_96voc_50maxlen.yaml --num-gpus 4,
some errors occur. However, it runs correctly if I set --num-gpus 1 or change the dataset config to

DATASETS:
  TRAIN: ("ic13_train_96voc","totaltext_train_96voc")
  TEST: ("ctw1500_test",)

in the config file configs/R_50/CTW1500/pretrain_96voc_50maxlen.yaml.
The error comes back when I set TRAIN: ("syntext1_96voc","ic13_train_96voc","totaltext_train_96voc").

[10/11 08:19:10 adet.data.dataset_mapper]: Cropping used in training: RandomCropWithInstance(crop_type='relative_range', crop_size=[0.1, 0.1], crop_instance=False)
[10/11 08:19:11 adet.data.datasets.text]: Loaded 229 images in COCO format from /dataset/ic13/train_96voc.json
[10/11 08:19:46 adet.data.datasets.text]: Loading /dataset/syntext1/annotations/train_96voc.json takes 35.33 seconds.
[10/11 08:19:47 adet.data.datasets.text]: Loaded 94723 images in COCO format from /dataset/syntext1/annotations/train_96voc.json
[10/11 08:24:02 d2.data.build]: Removed 0 images with no usable annotations. 94950 images left.
[10/11 08:24:02 d2.data.build]: Using training sampler TrainingSampler
[10/11 08:24:03 d2.data.common]: Serializing 94950 elements to byte tensors and concatenating them all ...
Traceback (most recent call last):
  File "train_net.py", line 304, in <module>
    launch(
  File "/usr/local/lib/python3.8/dist-packages/detectron2/engine/launch.py", line 67, in launch
    mp.spawn(
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 130, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGKILL

The structure tree of my dataset is as follows:

.
├── ArT
│   ├── art_train.json
│   └── rename_artimg_train
├── CTW1500
│   ├── test.json
│   ├── test_images
│   ├── train_96voc.json
│   ├── train_images
│   ├── weak_voc_new.txt
│   └── weak_voc_pair_list.txt
├── ChnSyntext
│   ├── chn_syntext.json
│   └── syn_130k_images
├── LSVT
│   ├── annotations
│   ├── lsvt_train.json
│   └── rename_lsvtimg_train
├── ReCTS
│   ├── ReCTS_test_images
│   ├── ReCTS_train_images
│   ├── ReCTS_val_images
│   ├── rects_test.json
│   ├── rects_train.json
│   └── rects_val.json
├── evaluation
│   ├── gt_ctw1500.zip
│   ├── gt_icdar2015.zip
│   ├── gt_inversetext.zip
│   └── gt_totaltext.zip
├── ic13
│   ├── train_37voc.json
│   ├── train_96voc.json
│   └── train_images
├── ic15
│   ├── GenericVocabulary.txt
│   ├── GenericVocabulary_new.txt
│   ├── GenericVocabulary_pair_list.txt
│   ├── ch4_test_vocabulary.txt
│   ├── ch4_test_vocabulary_new.txt
│   ├── ch4_test_vocabulary_pair_list.txt
│   ├── ic15_test.json
│   ├── ic15_train.json
│   ├── new_strong_lexicon
│   ├── strong_lexicon
│   ├── test.json
│   ├── test_images
│   ├── train_37voc.json
│   ├── train_96voc.json
│   └── train_images
├── inversetext
│   ├── inversetext_lexicon.txt
│   ├── inversetext_pair_list.txt
│   ├── test.json
│   └── test_images
├── mlt2017
│   ├── train_37voc.json
│   ├── train_96voc.json
│   └── train_images
├── syntext1
│   ├── annotations
│   ├── train.json
│   └── train_images
├── syntext2
│   ├── annotations
│   ├── train.json
│   ├── train_37voc.json
│   ├── train_96voc.json
│   └── train_images
├── textocr
│   ├── train_37voc_1.json
│   ├── train_37voc_2.json
│   └── train_images
└── totaltext
    ├── test.json
    ├── test_images
    ├── train.json
    ├── train_37voc.json
    ├── train_96voc.json
    ├── train_images
    ├── weak_voc_new.txt
    └── weak_voc_pair_list.txt

Help me I'm stuck in this bug.

[06/16 08:28:16 fvcore.common.checkpoint]: [Checkpointer] Loading from model_ctw_1500/pretrain_ctw_96voc.pth ...
WARNING [06/16 08:28:17 fvcore.common.checkpoint]: Skip loading parameter 'detection_transformer.ctrl_point_text.0.weight' to the model due to incompatible shapes: (97, 256) in the checkpoint but (185, 256) in the model! You might want to double check if this is expected.
WARNING [06/16 08:28:17 fvcore.common.checkpoint]: Skip loading parameter 'detection_transformer.ctrl_point_text.0.bias' to the model due to incompatible shapes: (97,) in the checkpoint but (185,) in the model! You might want to double check if this is expected.
WARNING [06/16 08:28:17 fvcore.common.checkpoint]: Skip loading parameter 'detection_transformer.ctrl_point_text.1.weight' to the model due to incompatible shapes: (97, 256) in the checkpoint but (185, 256) in the model! You might want to double check if this is expected.
WARNING [06/16 08:28:17 fvcore.common.checkpoint]: Skip loading parameter 'detection_transformer.ctrl_point_text.1.bias' to the model due to incompatible shapes: (97,) in the checkpoint but (185,) in the model! You might want to double check if this is expected.
WARNING [06/16 08:28:17 fvcore.common.checkpoint]: Skip loading parameter 'detection_transformer.ctrl_point_text.2.weight' to the model due to incompatible shapes: (97, 256) in the checkpoint but (185, 256) in the model! You might want to double check if this is expected.
WARNING [06/16 08:28:17 fvcore.common.checkpoint]: Skip loading parameter 'detection_transformer.ctrl_point_text.2.bias' to the model due to incompatible shapes: (97,) in the checkpoint but (185,) in the model! You might want to double check if this is expected.
WARNING [06/16 08:28:17 fvcore.common.checkpoint]: Skip loading parameter 'detection_transformer.ctrl_point_text.3.weight' to the model due to incompatible shapes: (97, 256) in the checkpoint but (185, 256) in the model! You might want to double check if this is expected.
WARNING [06/16 08:28:17 fvcore.common.checkpoint]: Skip loading parameter 'detection_transformer.ctrl_point_text.3.bias' to the model due to incompatible shapes: (97,) in the checkpoint but (185,) in the model! You might want to double check if this is expected.
WARNING [06/16 08:28:17 fvcore.common.checkpoint]: Skip loading parameter 'detection_transformer.ctrl_point_text.4.weight' to the model due to incompatible shapes: (97, 256) in the checkpoint but (185, 256) in the model! You might want to double check if this is expected.
WARNING [06/16 08:28:17 fvcore.common.checkpoint]: Skip loading parameter 'detection_transformer.ctrl_point_text.4.bias' to the model due to incompatible shapes: (97,) in the checkpoint but (185,) in the model! You might want to double check if this is expected.
WARNING [06/16 08:28:17 fvcore.common.checkpoint]: Skip loading parameter 'detection_transformer.ctrl_point_text.5.weight' to the model due to incompatible shapes: (97, 256) in the checkpoint but (185, 256) in the model! You might want to double check if this is expected.
WARNING [06/16 08:28:17 fvcore.common.checkpoint]: Skip loading parameter 'detection_transformer.ctrl_point_text.5.bias' to the model due to incompatible shapes: (97,) in the checkpoint but (185,) in the model! You might want to double check if this is expected.
WARNING [06/16 08:28:17 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint:
detection_transformer.ctrl_point_text.0.{bias, weight}
detection_transformer.ctrl_point_text.1.{bias, weight}
detection_transformer.ctrl_point_text.2.{bias, weight}
detection_transformer.ctrl_point_text.3.{bias, weight}
detection_transformer.ctrl_point_text.4.{bias, weight}
detection_transformer.ctrl_point_text.5.{bias, weight}
[06/16 08:28:17 adet.trainer]: Starting training from iteration 0
/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
[06/16 08:28:34 d2.utils.events]: eta: 2:01:25 iter: 19 total_loss: 108.8 loss_ce: 0.3455 loss_texts: 15.88 loss_ctrl_points: 0.3433 loss_bd_points: 0.486 loss_ce_0: 0.3538 loss_texts_0: 15.44 loss_ctrl_points_0: 0.4311 loss_bd_points_0: 0.633 loss_ce_1: 0.3417 loss_texts_1: 15.9 loss_ctrl_points_1: 0.2806 loss_bd_points_1: 0.4443 loss_ce_2: 0.3082 loss_texts_2: 15.85 loss_ctrl_points_2: 0.3673 loss_bd_points_2: 0.5558 loss_ce_3: 0.3106 loss_texts_3: 16.09 loss_ctrl_points_3: 0.3821 loss_bd_points_3: 0.5867 loss_ce_4: 0.3525 loss_texts_4: 16.02 loss_ctrl_points_4: 0.3054 loss_bd_points_4: 0.4894 loss_ce_enc: 0.3253 loss_bezier_enc: 0.2709 time: 0.6396 data_time: 0.0838 lr: 5e-05 max_mem: 6364M
[06/16 08:28:46 d2.utils.events]: eta: 2:00:09 iter: 39 total_loss: 74.76 loss_ce: 0.196 loss_texts: 7.714 loss_ctrl_points: 0.3833 loss_bd_points: 0.5231 loss_ce_0: 0.1592 loss_texts_0: 16.03 loss_ctrl_points_0: 0.3541 loss_bd_points_0: 0.5504 loss_ce_1: 0.1697 loss_texts_1: 15.64 loss_ctrl_points_1: 0.3386 loss_bd_points_1: 0.5445 loss_ce_2: 0.1722 loss_texts_2: 11.14 loss_ctrl_points_2: 0.4048 loss_bd_points_2: 0.6204 loss_ce_3: 0.1806 loss_texts_3: 10.09 loss_ctrl_points_3: 0.3388 loss_bd_points_3: 0.5707 loss_ce_4: 0.1305 loss_texts_4: 8.643 loss_ctrl_points_4: 0.3578 loss_bd_points_4: 0.5557 loss_ce_enc: 0.1454 loss_bezier_enc: 0.2904 time: 0.6404 data_time: 0.0090 lr: 5e-05 max_mem: 6364M
[06/16 08:28:58 d2.utils.events]: eta: 1:57:58 iter: 59 total_loss: 42.5 loss_ce: 0.2289 loss_texts: 5.277 loss_ctrl_points: 0.3165 loss_bd_points: 0.5684 loss_ce_0: 0.1707 loss_texts_0: 6.32 loss_ctrl_points_0: 0.3324 loss_bd_points_0: 0.5427 loss_ce_1: 0.2002 loss_texts_1: 5.629 loss_ctrl_points_1: 0.2784 loss_bd_points_1: 0.505 loss_ce_2: 0.1872 loss_texts_2: 5.354 loss_ctrl_points_2: 0.3171 loss_bd_points_2: 0.5474 loss_ce_3: 0.1977 loss_texts_3: 5.386 loss_ctrl_points_3: 0.2946 loss_bd_points_3: 0.6001 loss_ce_4: 0.2031 loss_texts_4: 5.267 loss_ctrl_points_4: 0.3017 loss_bd_points_4: 0.5565 loss_ce_enc: 0.1571 loss_bezier_enc: 0.3466 time: 0.6252 data_time: 0.0080 lr: 5e-05 max_mem: 6364M
[06/16 08:29:13 d2.utils.events]: eta: 2:00:46 iter: 79 total_loss: 35.53 loss_ce: 0.1849 loss_texts: 4.97 loss_ctrl_points: 0.2122 loss_bd_points: 0.3615 loss_ce_0: 0.1567 loss_texts_0: 5.136 loss_ctrl_points_0: 0.2158 loss_bd_points_0: 0.3878 loss_ce_1: 0.1236 loss_texts_1: 5.048 loss_ctrl_points_1: 0.2266 loss_bd_points_1: 0.3623 loss_ce_2: 0.1315 loss_texts_2: 5.036 loss_ctrl_points_2: 0.2335 loss_bd_points_2: 0.3605 loss_ce_3: 0.1555 loss_texts_3: 5.046 loss_ctrl_points_3: 0.2266 loss_bd_points_3: 0.3617 loss_ce_4: 0.1578 loss_texts_4: 4.934 loss_ctrl_points_4: 0.2117 loss_bd_points_4: 0.3596 loss_ce_enc: 0.1506 loss_bezier_enc: 0.2414 time: 0.6458 data_time: 0.0113 lr: 5e-05 max_mem: 6364M
Traceback (most recent call last):
  File "tools/train_net.py", line 304, in <module>
    launch(
  File "/usr/local/lib/python3.8/dist-packages/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "tools/train_net.py", line 298, in main
    return trainer.train()
  File "tools/train_net.py", line 107, in train
    self.train_loop(self.start_iter, self.max_iter)
  File "tools/train_net.py", line 96, in train_loop
    self.run_step()
  File "/usr/local/lib/python3.8/dist-packages/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/usr/local/lib/python3.8/dist-packages/detectron2/engine/train_loop.py", line 273, in run_step
    loss_dict = self.model(data)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/DeepSolo/adet/modeling/text_spotter.py", line 213, in forward
    loss_dict = self.criterion(output, targets)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/DeepSolo/adet/modeling/model/losses.py", line 228, in forward
    indices = self.dec_matcher(outputs_without_aux, targets)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/DeepSolo/adet/modeling/model/matcher.py", line 45, in forward
    targe_texts_batch_temp = torch.cat([
NotImplementedError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors, or that you (the operator writer) forgot to register a fallback function. Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Named, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, UNKNOWN_TENSOR_TYPE_ID, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /pytorch/build/aten/src/ATen/RegisterCPU.cpp:16286 [kernel]
CUDA: registered at /pytorch/build/aten/src/ATen/RegisterCUDA.cpp:20674 [kernel]
QuantizedCPU: registered at /pytorch/build/aten/src/ATen/RegisterQuantizedCPU.cpp:1025 [kernel]
BackendSelect: fallthrough registered at /pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
ADInplaceOrView: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:60 [backend fallback]
AutogradOther: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9928 [autograd kernel]
AutogradCPU: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9928 [autograd kernel]
AutogradCUDA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9928 [autograd kernel]
AutogradXLA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9928 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9928 [autograd kernel]
AutogradMLC: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9928 [autograd kernel]
AutogradHPU: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9928 [autograd kernel]
AutogradNestedTensor: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9928 [autograd kernel]
AutogradPrivateUse1: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9928 [autograd kernel]
AutogradPrivateUse2: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9928 [autograd kernel]
AutogradPrivateUse3: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9928 [autograd kernel]
Tracer: registered at /pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:9621 [kernel]
Autocast: registered at /pytorch/aten/src/ATen/autocast_mode.cpp:259 [kernel]
Batched: registered at /pytorch/aten/src/ATen/BatchingRegistrations.cpp:1019 [backend fallback]
VmapMode: fallthrough registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
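The NotImplementedError above means torch.cat in the matcher received an empty list, which can happen when a training batch ends up with zero text instances (for example after aggressive random cropping). A hypothetical guard, with the matcher internals paraphrased rather than quoted:

import torch

def cat_target_texts(targets):
    # "texts" as the per-image ground-truth key is an assumption
    texts = [t["texts"] for t in targets if len(t["texts"]) > 0]
    if not texts:
        # empty placeholder instead of torch.cat on an empty list
        return torch.empty(0, dtype=torch.long)
    return torch.cat(texts)

Filtering out training images with no usable annotations is the simpler dataset-side workaround.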

Recognition of combinations of numbers and uppercase letters

Hello, I reproduced the DeepSolo algorithm on a Chinese document dataset. The overall accuracy is decent, but recognition of fields combining digits and uppercase letters is poor, and characters are often dropped. These texts are about 20 characters long. I tried increasing NUM_POINTS from 25 to 50, but there is still not much improvement, and increasing it further makes inference slow. Have you encountered a similar situation? Thank you!

How much GPU VRAM is required to fine-tune DeepSolo?

I've tried to finetune DeepSolo on the CTW1500 dataset on two systems:
i) RTX 3050, 4 GB
ii) RTX 2080, 8 GB

On both I got the same runtime error: CUDA out of memory. I also tried adjusting the batch size in a custom config.yaml, but I noticed no improvement and continued to get the same error.
Please let me know what I can do.

Cannot implement DeepSolo on local environment or on Kaggle or any other cloud

When trying to finetune or pretrain DeepSolo, I get the following error on Kaggle:

Traceback (most recent call last):
  File "tools/train_net.py", line 27, in <module>
    from detectron2.data import MetadataCatalog, build_detection_train_loader, build_detection_test_loader
  File "/opt/conda/lib/python3.7/site-packages/detectron2/data/__init__.py", line 4, in <module>
    from .build import (
  File "/opt/conda/lib/python3.7/site-packages/detectron2/data/build.py", line 14, in <module>
    from detectron2.structures import BoxMode
  File "/opt/conda/lib/python3.7/site-packages/detectron2/structures/__init__.py", line 6, in <module>
    from .keypoints import Keypoints, heatmaps_to_keypoints
  File "/opt/conda/lib/python3.7/site-packages/detectron2/structures/keypoints.py", line 6, in <module>
    from detectron2.layers import interpolate
  File "/opt/conda/lib/python3.7/site-packages/detectron2/layers/__init__.py", line 3, in <module>
    from .deform_conv import DeformConv, ModulatedDeformConv
  File "/opt/conda/lib/python3.7/site-packages/detectron2/layers/deform_conv.py", line 10, in <module>
    from detectron2 import _C
ImportError: /opt/conda/lib/python3.7/site-packages/detectron2/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe28TypeMeta21_typeMetaDataInstanceISt7complexIdEEEPKNS_6detail12TypeMetaDataEv

Also, how can I use DeepSolo on my Ubuntu 22.04 machine? (Hardware: Ryzen 7 5600H CPU, 4 GB Nvidia RTX 3050 GPU, 16 GB RAM.)

On my local machine I always face either an AdelaiDet issue or some CUDA error.
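A hedged note on the ImportError above: an undefined caffe2 TypeMeta symbol in detectron2's _C extension usually means the installed detectron2 build does not match the installed PyTorch version. Rebuilding detectron2 against the current torch is the commonly suggested remedy (generic pip commands, adjust to your environment):

pip uninstall -y detectron2
pip install --no-build-isolation 'git+https://github.com/facebookresearch/detectron2.git'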

setup.py build develop generates a lot of warnings

First off, thank you for the good work.

I'm following the installation instructions, and the last command, python setup.py build develop, generated a lot of warnings. After inspection, I didn't find them to be significant, but decided to ask anyway. The full log was redirected to sh.log. May I proceed with my exploration? Thanks.

OS: Ubuntu 20.04.

weak_voc_new.txt question

When I run train_net.py, I get FileNotFoundError: [Errno 2] No such file or directory: 'datasets/totaltext/weak_voc_new.txt'. What exactly is the file 'datasets/totaltext/weak_voc_new.txt'? I did not see it described in the dataset layout section of README.md.

Loading the dataset seems slower than expected

I'm facing a significant delay when loading my dataset, and I'm unsure of the underlying cause. I would greatly appreciate any insights or suggestions.

License

Hello! Thank you for the excellent release, and congratulations on CVPR :D

Will this repository be licensed under Apache 2.0 like the other repositories under ViTAE-Transformer? A license file would be very helpful.

Thank you!

demo

I ran into the following problem while running the demo; how should I solve it?
[06/27 10:30:24 detectron2]: Arguments: Namespace(confidence_threshold=0.3, config_file='/home/hhq/data/code/DeepSolo/configs/R_50/IC15/finetune_150k_tt_mlt_13_15.yaml', input=['/home/hhq/data/code/DeepSolo/imputimg/IMG_20230626_204410.jpg'], opts=['MODEL.WEIGHTS', '/home/hhq/data/code/DeepSolo/weight/ic15_res50_finetune_synth-tt-mlt-13-15.pth'], output='/home/hhq/data/code/DeepSolo/outputimg', video_input=None, webcam=False)
Segmentation fault (core dumped)

how to reduce gpu memory usage?

Hello, I'm trying to train the model on a custom toy dataset using Google Colab. The free tier usually offers a T4 GPU (16 GB memory). A batch size of 8 images produces a CUDA out-of-memory error, and even with only 2 images per batch I still hit that error after ~200 iters. What can I do to reduce memory usage? 2 images per batch will make the loss fluctuate too much, I guess.
Edit: this happens even when using ResNet-50.
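For reference, a hedged sketch of common detectron2-level memory levers (these are real detectron2 config keys, but the values are illustrative and not tuned for DeepSolo; whether AMP helps depends on the trainer honoring SOLVER.AMP):

# after loading the config, before building the trainer
cfg.defrost()
cfg.SOLVER.IMS_PER_BATCH = 2        # fewer images per step
cfg.SOLVER.AMP.ENABLED = True       # mixed-precision training, if supported
cfg.INPUT.MIN_SIZE_TRAIN = (640,)   # smaller training resolution
cfg.INPUT.MAX_SIZE_TRAIN = 1066
cfg.freeze()

Gradient accumulation (calling optimizer.step() every N iterations) can recover the effective batch size that a small IMS_PER_BATCH loses, at the cost of custom trainer code.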

About demo question

Hey, I'm using python demo/demo.py --config-file configs/R_50/ReCTS/pretrain.yaml --input ~/test_imgs/camera/IMG_1892.jpg --output output --opts MODEL.WEIGHTS checkpoints/res50_pretrain_synch-art-lsvt-rects.pth to run the demo on an image, but it gives me FileNotFoundError: [Errno 2] No such file or directory: ''; the code is at DeepSolo/adet/utils/visualizer.py line 24. My question is: if I only want to run the demo, do I need a CUSTOM_DICT?

`numpy` and `polygon` are not compatible

Hello, thanks for the great work! When I set up the env, I found that:

>>> from Polygon.cPolygon import *
RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xe . Check the section C-API incompatibility at the Troubleshooting ImportError section at https://numpy.org/devdocs/user/troubleshooting-importerror.html#c-api-incompatibility for indications on how to solve this problem .
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/qbw/anaconda3/envs/deepsolo/lib/python3.8/site-packages/Polygon/__init__.py", line 5, in <module>
    from Polygon.cPolygon import *
ImportError: numpy.core.multiarray failed to import

This error ruined my evaluation during pretraining. How can I fix it? Thanks sincerely for your answer.
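A hedged note on the likely cause: the numpy C-API version mismatch means the installed Polygon3 wheel was compiled against a different numpy than the one in the environment. Two commonly suggested remedies (generic pip commands, not repo-verified):

pip install --upgrade numpy
pip install --force-reinstall --no-binary :all: Polygon3   # rebuild against the installed numpy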

When?

When will the project be released?
