byeonghu-na / matrn

65 stars, 9 forks, 1.52 MB

Official PyTorch implementation for Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features (MATRN) in ECCV 2022.

License: MIT License

Languages: Python 99.23%, Dockerfile 0.77%

matrn's People

Contributors

byeonghu-na


matrn's Issues

Predict More Characters

Hello there!

  • Great work! I'd like to ask how to train the alignment model with more characters. The current implementation recognizes only 36 characters (0-9, a-z), but I want to recognize 90 characters (0-9, a-z, A-Z, and some symbols).
  • I modified some code and can now train on 90 characters. However, I cannot load the pre-trained language and vision models, since they were trained on 36 characters. Is there any way to modify the code so that I can still load the pre-trained weights? (See the loading sketch after this list.)
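
One common workaround (a minimal sketch, not from the authors; build_model and the checkpoint filename below are placeholders): load the old checkpoint non-strictly and keep only the tensors whose shapes still match, so just the charset-dependent layers are re-initialized.

    import torch

    # Hypothetical sketch: load a 36-charset checkpoint into a 90-charset
    # model, keeping only the tensors whose shapes still match. Charset-
    # dependent layers (character embedding, final classifier) then train
    # from scratch.
    model = build_model(config)  # placeholder for the repo's model constructor

    ckpt = torch.load('pretrained_vision_model.pth', map_location='cpu')
    state = ckpt.get('model', ckpt)  # some checkpoints nest weights under 'model'

    model_state = model.state_dict()
    compatible = {k: v for k, v in state.items()
                  if k in model_state and v.shape == model_state[k].shape}
    skipped = sorted(set(state) - set(compatible))
    print(f'loading {len(compatible)} tensors, skipping {len(skipped)}: {skipped}')

    model.load_state_dict(compatible, strict=False)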

LMDB Dataset & Loss Function Calculation

Hey there,

I've got a couple of questions I'd like to throw your way:

  • So, I've been tinkering with MATRN, trying to train it on my own custom dataset using PyTorch Lightning. I've tried to mimic your fastai-based approach, but my model doesn't seem to be learning anything. I suspect it has something to do with how I'm calculating the loss with multiloss; even though I borrowed the multiloss code from your repo, I'm still hitting the same snag. Any chance you could help me figure out the right way to calculate the loss during training?
  • Also, I'm hitting a bit of a roadblock converting my own dataset into an LMDB dataset for training with your code. Any tips on how the input path structure should look when converting to LMDB? (See the layout sketch after this list.)
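
For what it's worth, here is a minimal sketch of the LMDB layout that ABINet-style dataset readers (which MATRN builds on) expect; the image paths and labels are placeholders, and the key convention is worth verifying against the repo's dataset code.

    import lmdb

    # Hedged sketch of the expected LMDB layout: encoded image bytes and
    # labels stored under indexed keys, plus a 'num-samples' entry.
    samples = [('images/word_000001.png', 'hello'),
               ('images/word_000002.png', 'world')]

    env = lmdb.open('data/training/my_dataset', map_size=1 << 40)
    with env.begin(write=True) as txn:
        for idx, (img_path, label) in enumerate(samples, start=1):
            with open(img_path, 'rb') as f:
                img_bytes = f.read()  # raw encoded image (png/jpg), not decoded pixels
            txn.put(f'image-{idx:09d}'.encode(), img_bytes)
            txn.put(f'label-{idx:09d}'.encode(), label.encode())
        txn.put(b'num-samples', str(len(samples)).encode())
    env.close()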

Thanks a bunch for your help; your MATRN architecture is wicked cool!

handwriting

How does it perform on a handwritten dataset?

Question about evaluation on each dataset

Thanks for your nice work.
I noticed the evaluation results on each dataset in your paper, as in this image:
[screenshot: per-dataset results table from the paper]
When I run the evaluation, I only get the total result like this instead of a result per dataset:
[screenshot: aggregate evaluation log]
I wonder how to evaluate each dataset separately.
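
One approach (a hedged sketch: --config, --phase, and --checkpoint follow the ABINet-style CLI that MATRN builds on, while --test_root and the paths below are assumptions; if that flag doesn't exist, point dataset.test.roots in the YAML at one dataset per run instead):

    import subprocess

    # Hedged sketch: rerun the test phase once per benchmark so each
    # dataset is scored on its own instead of in the merged evaluation set.
    benchmarks = ['IIIT5k_3000', 'SVT', 'SVTP', 'IC13_857', 'IC15_1811', 'CUTE80']

    for name in benchmarks:
        subprocess.run([
            'python', 'main.py',
            '--config', 'configs/train_matrn.yaml',
            '--phase', 'test',
            '--checkpoint', 'workdir/best-train-matrn.pth',  # placeholder path
            '--test_root', f'data/evaluation/{name}',        # assumed flag
        ], check=True)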

Thanks!

Training always ends

[screenshot: training log showing the run ending]
Excuse me, why does the process still end when training reaches epoch 9, even after the configuration file is modified?

Question about pretraining on language model

Hi, thank you for your nice work.
When I try to pretrain the language model, I hit a problem like this:
[screenshot: error message]
Here is my pretrain-language YAML config; I only changed the epoch-related values.

global:
  name: my-pretrain-language
  phase: train
  stage: pretrain-language
  workdir: results
  seed: ~

dataset:
  train: {
    roots: ['data/WikiText-103.csv'],
    batch_size: 1024
  }
  test: {
    roots: ['data/WikiText-103_eval_d1.csv'],
    batch_size: 1024
  }
  valid: {
    roots: [ 'data/validation' ],
    batch_size: 384
  }

training:
  epochs: 80
  show_iters: 50
  eval_iters: 100
  save_iters: 3000

optimizer:
  type: Adam
  true_wd: False
  wd: 0.0
  bn_wd: False
  clip_grad: 20
  lr: 0.0001
  args: {
    betas: !!python/tuple [0.9, 0.999], # for default Adam
  }
  scheduler: {
    periods: [70, 10],
    gamma: 0.1,
  }

model:
  name: 'modules.model_language.BCNLanguage'
  language: {
    num_layers: 4,
    loss_weight: 1.,
    use_self_attn: False
  }

May I ask whether you have encountered anything similar?
Thank you!

Question about reproducing.

Thanks for your great work.

Can you tell me how long the model needs to train on four NVIDIA GeForce RTX 3090 GPUs to converge to the results in the paper? If convenient, could you provide your training logs?

I'm in the process of reproducing it now, but I found that the loss becomes jittery after a period of training. I don't know whether I configured something wrong or whether this is inherent, with the loss slowly converging over a long period of jitter. So I hope you can provide a training log, if possible (thanks a lot).

Thank you very much!

Question about the performance of the pre-trained model from the provided link.

First, thank you very much for your work; it is very impressive. However, when I evaluate with the pre-trained model provided by the link, I get results that are lower than the reported performance. What could be the reason for this? Thank you very much for your answer.
My results on the six datasets IIIT5k_3000, SVT, SVTP, IC13_857, IC15_1811, and CUTE80 are as follows:

[2022-03-03 23:28:19,374 main.py:276 INFO train-matrn] validation time: 62.44528245925903 / batch size: 384
[2022-03-03 23:28:19,374 main.py:281 INFO train-matrn] eval loss = 1.435, ccr = 0.957, cwr = 0.904, ted = 1542.000, ned = 297, ted/w = 0.213.

Your reported average cwr over the same six datasets is 93.450.

Thank you very much again!

Error in the model when using ResNet as the backbone.

Hello author, the following error occurred when I used ResNet as the model backbone instead of ResTransformer. No such error occurred when I ran ABINet with ResNet as the backbone.

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation:
 [torch.FloatTensor [10, 256, 512]], which is output 0 of ViewBackward, is at version 28; expected version 0 instead. 
Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

I found that the backbone must include a transformer in your model; if there is no transformer after the CNN, this error occurs, but I cannot find a more specific reason.
I only changed the backbone in the train_matrn.yaml configuration file.
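
Not an official fix, but a standard way to locate the offending operation is PyTorch's autograd anomaly detection, then rewriting the in-place op it points to:

    import torch

    # Enable anomaly mode to get a backtrace pointing at the in-place
    # operation that modified a tensor needed for the backward pass.
    torch.autograd.set_detect_anomaly(True)

    # Typical culprits and fixes once the op is located:
    #   x += residual              ->  x = x + residual
    #   nn.ReLU(inplace=True)      ->  nn.ReLU(inplace=False)
    #   out.masked_fill_(mask, v)  ->  out = out.masked_fill(mask, v)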

Thanks for your reply!

Error when changing the input image size

Is it possible to train with a different image size? When I changed the default 128x32 to 256x32, an error occurred.

Question about metrics

Hi, while training the vision model I noticed several metrics that I don't understand: ccr, cwr, ted, ned, and ted/w.
[screenshot: training log showing the metrics]
I guess the metrics come from fastai, but I cannot find them in the docs.
Thank you!

Question about code

Thank you for sharing the code!

Line 34 of modules/model_matrn_iter.py: self.semantic_visual has no pe attribute, so I get the error "torch.nn.modules.module.ModuleAttributeError: 'BaseSemanticVisual_backbone_feature' object has no attribute 'pe'".

The same applies to lines 39 and 44.

Is there something wrong here?

About Code in Line 81 of main.py

Hi!
Thanks for your great work. While debugging your code, I found that lines 81 and 82 of main.py appear to be swapped; I'm not sure whether this is intentional.
Thanks :D

MATRN/main.py, line 81 (commit f4d43a9):

    valid_ds = _get_dataset(ImageDataset, config.dataset_test_roots, False, config)

MATRN/main.py, line 82 (commit f4d43a9):

    test_ds = _get_dataset(ImageDataset, config.dataset_valid_roots, False, config)
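
If the two roots are indeed swapped, the intended assignments would presumably be:

    # Presumed intent: each split reads from its own roots.
    valid_ds = _get_dataset(ImageDataset, config.dataset_valid_roots, False, config)
    test_ds = _get_dataset(ImageDataset, config.dataset_test_roots, False, config)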

Question about the usage of text input

I noticed that texts (index encodings) are passed to the forward function but not used anywhere. Just curious: were you planning to add the text embedding to the final output? My guess is that you tried it but saw limited performance improvements. I've been thinking about how to use text embeddings for a while, but it's hard to convince myself to add them to the training pipeline, since no text information is available at inference time. Please correct me if my guess is wrong. Thanks.
