byeonghu-na / matrn

Official PyTorch implementation for Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features (MATRN) in ECCV 2022.

License: MIT License
Could you please explain how to train vision and language models for Korean and Japanese languages?
Hello there!
Hey there,
I've got a couple of questions I'd like to throw your way:
Thanks a bunch for your help, your MATRN architecture is wicked cool!
How does it perform on handwritten datasets?
editdistance should be added to requirements.txt.
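For context, the editdistance package computes Levenshtein distance, which the evaluation logs appear to use for the "ted" (total edit distance) metric. A pure-Python reference sketch of what the package computes (the real package is a fast C extension; the prediction strings here are made-up examples):

```python
def levenshtein(a: str, b: str) -> int:
    """Reference Levenshtein distance via the standard two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# Toy predictions vs. ground truths, just to illustrate the metric.
preds = ["hello", "wrld"]
gts   = ["hello", "world"]
ted = sum(levenshtein(p, g) for p, g in zip(preds, gts))
print(ted)  # 1: "wrld" needs one insertion to become "world"
```

In practice `editdistance.eval(p, g)` replaces the `levenshtein` call above.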
Hello! I have trained the model following your setup, and I would like to ask how to test the accuracy on IIIT5k, SVT, IC15, SVTP, CUTE, and the other datasets. I hope you can answer; thank you!
Hi, thank you for your nice work.
When I try to pretrain the language model, I have a problem like this:
Here is my pretrain-language config YAML; I only changed the epoch-related values.
global:
  name: my-pretrain-language
  phase: train
  stage: pretrain-language
  workdir: results
  seed: ~

dataset:
  train: {
    roots: ['data/WikiText-103.csv'],
    batch_size: 1024
  }
  test: {
    roots: ['data/WikiText-103_eval_d1.csv'],
    batch_size: 1024
  }
  valid: {
    roots: ['data/validation'],
    batch_size: 384
  }

training:
  epochs: 80
  show_iters: 50
  eval_iters: 100
  save_iters: 3000

optimizer:
  type: Adam
  true_wd: False
  wd: 0.0
  bn_wd: False
  clip_grad: 20
  lr: 0.0001
  args: {
    betas: !!python/tuple [0.9, 0.999], # for default Adam
  }
  scheduler: {
    periods: [70, 10],
    gamma: 0.1,
  }

model:
  name: 'modules.model_language.BCNLanguage'
  language: {
    num_layers: 4,
    loss_weight: 1.,
    use_self_attn: False
  }
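One thing worth noting about configs in this style: the `!!python/tuple` tag is a Python-specific YAML extension, so `yaml.safe_load` rejects it and a non-safe PyYAML loader is needed. A minimal sketch (the config snippet inside is a shortened stand-in, not the full file):

```python
import yaml

cfg_text = """
optimizer:
  type: Adam
  lr: 0.0001
  args: {
    betas: !!python/tuple [0.9, 0.999]
  }
"""

# safe_load raises ConstructorError on !!python/tuple; unsafe_load
# constructs the actual Python tuple. Only use it on trusted configs.
cfg = yaml.unsafe_load(cfg_text)
print(cfg["optimizer"]["args"]["betas"])  # (0.9, 0.999)
```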
Have you encountered a similar issue?
Thank you!
Nice paper! Do you plan to add a demo?
Thanks for your great work.
Could you tell me how long the model needs to train on 4 NVIDIA GeForce RTX 3090 GPUs to converge to the results in the paper? If it is convenient, could you provide your training logs?
I'm reproducing it now, but I found that the loss becomes jittery after a period of training. I don't know whether I misconfigured something or whether it is inherently like this, slowly converging through a long stretch of jitter. So I hope you can provide a training log, if possible (thanks a lot).
Thank you very much!
First, thank you very much for your work; it is very impressive. However, when I evaluate with the pre-trained model provided by the link, I get results lower than the reported performance. What could be the reason for this? Thank you very much for your answer.
My results on the six datasets IIIT5k_3000, SVT, SVTP, IC13_857, IC15_1811, and CUTE80 are as follows:
[2022-03-03 23:28:19,374 main.py:276 INFO train-matrn] validation time: 62.44528245925903 / batch size: 384
[2022-03-03 23:28:19,374 main.py:281 INFO train-matrn] eval loss = 1.435, ccr = 0.957, cwr = 0.904, ted = 1542.000, ned = 297, ted/w = 0.213.
Your reported average cwr over the same six datasets is 93.450.
Thank you very much again!
Hello author, the following error occurred when I used ResNet instead of ResTransformer as the model backbone. No such error occurred when I ran ABINet with ResNet as the backbone.
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation:
[torch.FloatTensor [10, 256, 512]], which is output 0 of ViewBackward, is at version 28; expected version 0 instead.
Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
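Errors like this can be reproduced in isolation, independently of the MATRN code: any op that saves its output for the backward pass (e.g. `torch.exp`) fails if that output is later modified in place. A minimal sketch of the failure and the out-of-place fix:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = torch.exp(x)   # exp() saves its output for the backward pass
y += 1             # in-place edit bumps y's version counter

try:
    y.sum().backward()   # autograd detects the saved tensor was modified
    raised = False
except RuntimeError:
    raised = True        # "... modified by an inplace operation"

# Fix: replace the in-place op with an out-of-place one, so the tensor
# autograd saved is left untouched.
x2 = torch.ones(3, requires_grad=True)
y2 = torch.exp(x2)
y2 = y2 + 1              # creates a new tensor instead of mutating y2
y2.sum().backward()      # works; x2.grad == exp(x2)
```

The usual workarounds are exactly this pattern: swap `+=`/`relu_()`-style in-place ops for their out-of-place versions, or `.clone()` the tensor before mutating it.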
I found that the backbone must include a transformer in your model; if there is no transformer after the CNN, errors occur, but I can't find a more specific reason.
I only changed the backbone in the train_matrn.yaml configuration file.
Thanks for your reply!
Is it possible to train with a different image size? When I changed the default 128x32 to 256x32, an error occurred.
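One common cause of such errors (an assumption for illustration, not confirmed from the MATRN code) is a positional embedding pre-built for the flattened feature-map length of the default input size; doubling the image width doubles the sequence length, and the addition no longer broadcasts:

```python
import torch

# Assumption: the backbone downsamples a 32x128 image to an 8x32 feature
# map, and a positional embedding is pre-built for that flattened length.
d_model = 512
pos = torch.zeros(1, 8 * 32, d_model)        # fixed-length positional embedding

feat_128 = torch.zeros(1, 8 * 32, d_model)   # default 128-wide input: fits
out = feat_128 + pos

feat_256 = torch.zeros(1, 8 * 64, d_model)   # 256-wide input: sequence doubles
try:
    _ = feat_256 + pos                       # shape mismatch along dim 1
    mismatch = False
except RuntimeError:
    mismatch = True
```

If this is the cause, the positional encoding (and any other length-dependent buffers) would need to be rebuilt or interpolated for the new input size.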
Thank you for sharing the code!
Line 34 of modules/model_matrn_iter.py: self.semantic_visual has no pe attribute,
so I get the error "torch.nn.modules.module.ModuleAttributeError: 'BaseSemanticVisual_backbone_feature' object has no attribute 'pe'".
The same applies to lines 39 and 44.
Is there something wrong here?
I noticed that texts (index encoding) is passed to the forward function but not used anywhere. Just curious: are you planning to "add" the text embedding to the final output? My guess is that you probably tried it but got limited performance improvements. I've been thinking about using text embeddings for a while, but it's hard to convince myself to add them to the training pipeline, since no text information is available at inference time. Please correct me if my guess is wrong. Thanks.