lbh1024 / can

When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition (ECCV’2022 Poster).

License: MIT License

Language: Python 100.00%
Topics: counting, hmer, ocr, recognition

can's Introduction

When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition

This is the official PyTorch implementation of CAN (ECCV'2022).

Bohan Li, Ye Yuan, Dingkang Liang, Xiao Liu, Zhilong Ji, Jinfeng Bai, Wenyu Liu, Xiang Bai

Abstract

Recently, most handwritten mathematical expression recognition (HMER) methods adopt the encoder-decoder networks, which directly predict the markup sequences from formula images with the attention mechanism. However, such methods may fail to accurately read formulas with complicated structure or generate long markup sequences, as the attention results are often inaccurate due to the large variance of writing styles or spatial layouts. To alleviate this problem, we propose an unconventional network for HMER named Counting-Aware Network (CAN), which jointly optimizes two tasks: HMER and symbol counting. Specifically, we design a weakly-supervised counting module that can predict the number of each symbol class without the symbol-level position annotations, and then plug it into a typical attention-based encoder-decoder model for HMER. Experiments on the benchmark datasets for HMER validate that both joint optimization and counting results are beneficial for correcting the prediction errors of encoder-decoder models, and CAN consistently outperforms the state-of-the-art methods. In particular, compared with an encoder-decoder model for HMER, the extra time cost caused by the proposed counting module is marginal.

Pipeline

Counting Module
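For intuition, the following is a minimal sketch of a weakly-supervised counting head in the spirit of the module described in the abstract: it maps encoder features to per-class pseudo density maps and sum-pools them into symbol counts, so the only supervision needed is how often each symbol occurs in the LaTeX label. The layer sizes and names are illustrative (in_channel 684 and out_channel 247 appear in the HME100K config quoted in the issues below), not the repository's exact implementation.

import torch
import torch.nn as nn

class CountingHead(nn.Module):
    # Weakly-supervised counting head: encoder features -> per-class pseudo
    # density maps -> spatial sum = predicted count for each symbol class.
    # Illustrative sketch only, not the repository's exact counting module.
    def __init__(self, in_channels=684, num_classes=247):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, features, mask=None):
        density = torch.sigmoid(self.conv(features))   # (B, C, H, W)
        if mask is not None:
            density = density * mask                   # ignore padded regions
        counts = density.sum(dim=(2, 3))               # (B, C) predicted counts
        return counts, density

# The training target is simply the number of occurrences of each symbol in
# the ground-truth LaTeX string, so no symbol-level position labels are needed.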

Datasets

Download the CROHME dataset from BaiduYun (downloading code: 1234) and put it in datasets/.

The HME100K dataset can be downloaded from the official website HME100K.

Training

Check the config file config.yaml and train with the CROHME dataset:

python train.py --dataset CROHME

By default the batch size is set to 8, so you may need a GPU with 32 GB of memory to train the model (see the config excerpt sketched below).
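For orientation, the fields most relevant to this command look roughly like the excerpt below. The values are taken from this README and the issues further down; verify them against your local config.yaml:

# excerpt of config.yaml (verify against your local copy)
batch_size: 8        # default; lower this if your GPU has less than 32 GB of memory
workers: 8
optimizer: Adadelta
lr: 1
lr_decay: cosine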

Testing

Fill in the checkpoint (pretrained model path) in the config file config.yaml and test with the CROHME dataset:

python inference.py --dataset CROHME

Note that the testing dataset path is set in inference.py.
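For example, the relevant line in config.yaml might look like the following. The path is purely illustrative, so point it at wherever your training run saved its weights:

# in config.yaml -- illustrative path, not a file shipped with the repository
checkpoint: "checkpoints/CAN_best.pth"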

Citation

@inproceedings{CAN,
  title={When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition},
  author={Li, Bohan and Yuan, Ye and Liang, Dingkang and Liu, Xiao and Ji, Zhilong and Bai, Jinfeng and Liu, Wenyu and Bai, Xiang},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  pages={197--214},
  year={2022}
}

Recommendation

Some other excellent open-sourced HMER algorithms can be found here:

WAP [PR'2017], DWAP-TD [ICML'2020], BTTR [ICDAR'2021], ABM [AAAI'2022], SAN [CVPR'2022], CoMER [ECCV'2022]

can's People

Contributors

lbh1024

can's Issues

ExpRate on HME100K is about 5% lower than in the paper

Hello, first of all thank you for such an inspiring piece of work and for open-sourcing the code.

I ran into two problems while running the code, as follows:

Problem 1: On HME100K, training for 90 epochs gave a best test-set ExpRate of 53.58, reached at epoch 90. I then restarted and trained for 240 epochs; the best ExpRate on the test set was 62.03, reached at epoch 198, which is 5.28 below the 67.31 reported in the paper. Is there some step where my reproduction differs from yours, and how can I reach the result reported in the paper? (In the 240-epoch run I forgot to adjust init_epoch, so it actually trained for 215 epochs.)

Details:

  1. I noticed that in #19 you provided the main parameters for training on HME100K. The main parameters I used are listed below; all other parameters are the same as in config.yaml:
seed: 20211024

epochs: 90
batch_size: 8
workers: 8
train_parts: 3
valid_parts: 1
valid_start: 0
save_start: 0

optimizer: Adadelta
lr: 1
lr_decay: cosine
step_ratio: 10
step_decay: 5
eps: 1e-6
weight_decay: 2e-5
beta: 0.9

counting_decoder:
  in_channel: 684
  out_channel: 247
  2. Images were not resized during training.
  3. Training was done on a single Nvidia Tesla V100 (32 GB) GPU.
  4. Train loss: [plot: train_loss]
  5. Train ExpRate: [plot: epoch_train_ExpRate]
  6. Eval loss: [plot: eval_loss]
  7. Eval ExpRate: [plot: epoch_eval_ExpRate]

Problem 2: The paper states that HME100K contains 249 symbol classes (including 'eos' and 'sos'), but after deduplication I only get 247 classes (including 'eos' and 'sos'), and I am not sure which two are missing.

That is all. Thanks!

Model prediction

Hello Prof. Li. After training finished, I ran prediction on my own photos (jpg format), but the model could not produce correct output. Why might that be? I am an undergraduate currently studying from your paper, and I really look forward to your reply.

Predicting long formulas

Hi, thanks a lot for open-sourcing this. I tested it on our own data, whose formulas can be longer than those in CROHME or HME100K. I basically used your training parameters, but during inference the model only predicts part of each long formula, even though the image sizes stay within 1600 x 320. Are there any other factors that might affect the predicted output?

About a problem when reproducing the code

Hello Prof. Li. When reproducing the data loading I got the following error: use_avg is not defined, so the subsequent paths cannot be read. What could be causing this?

Is a trained model released?

Hello, a few questions:
Is there an open-sourced trained model?
I could not find the data augmentation code.
Which model in Table 1 of the paper does config.yaml correspond to?
It would be great if you could open-source as much as possible so that we can study and reproduce the work. Thanks!

Request HME100K dataset

Hi, thank you for your great work!
Because I'm not from China, I can't create an account and download the HME100K dataset from the official website. Could you send me the data via email: [email protected]?
Thank you so much!


About the image settings in the code

What exactly is the mask used for?
Also, the network does not restrict the image size: the width, height, and label length are all taken as the maximum within each batch. What is the purpose of doing it this way?
The number of channels, however, is fixed to 684.
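For context, here is a minimal sketch of how per-batch padding plus a mask usually works in this kind of pipeline; the function and variable names (pad_collate and friends) are illustrative, not the repository's actual API:

import torch

def pad_collate(batch):
    # Pad every (image, label) pair to the batch-wide maximum size and build
    # masks marking valid (non-padded) positions.
    # Illustrative sketch only; not the repository's actual collate function.
    images, labels = zip(*batch)        # image: (1, H, W) tensor; label: list of int ids
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    max_len = max(len(lab) for lab in labels)

    batch_imgs = torch.zeros(len(batch), 1, max_h, max_w)
    img_masks = torch.zeros(len(batch), 1, max_h, max_w)   # 1 = real pixel, 0 = padding
    batch_labels = torch.zeros(len(batch), max_len, dtype=torch.long)
    label_masks = torch.zeros(len(batch), max_len)

    for i, (img, lab) in enumerate(zip(images, labels)):
        _, h, w = img.shape
        batch_imgs[i, :, :h, :w] = img
        img_masks[i, :, :h, :w] = 1
        batch_labels[i, :len(lab)] = torch.as_tensor(lab)
        label_masks[i, :len(lab)] = 1

    return batch_imgs, img_masks, batch_labels, label_masks

The mask lets the attention and the losses ignore padded regions, which is why images of different sizes can share a batch without resizing.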

Cannot Find HME100K dataset

Thanks for sharing such a great resource.
When I tried to access the website to get the HME100K dataset, I only got an error saying that the system is in use. How can I get the data and run the code?

Why can't I reproduce the results in the paper?

I used the code from GitHub without any changes: 240 epochs, batch size 8, no data augmentation, and the random seed left untouched. Why is my accuracy on the CROHME 2014 dataset 56.09%, which is 0.91% lower than the 57% reported in the paper? How should the hyperparameters be set to reproduce the paper's results?

CPU usage is too high during inference

As the title says: during multi-GPU training each GPU uses about 1.5 CPU cores, but during inference a single process uses 5 cores. Is this normal?

Is it appropriate to set initial_lr to 1 ?

in config.yaml:

optimizer: Adadelta
lr: 1
lr_decay: cosine
step_ratio: 10
step_decay: 5
eps: 1e-6
weight_decay: 1e-4
beta: 0.9

in train.py:

new_lr = 0.5 * (1 + math.cos((current_step + 1 + (current_epoch - 1) * steps) * math.pi / (200 * steps))) * initial_lr

Did you set initial_lr to 1?
On my own dataset, the "eval_ExpRate" fluctuates greatly.
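For reference, a minimal sketch of how that schedule behaves, assuming steps is the number of batches per epoch and the 200-epoch horizon from the expression above (the cosine_lr helper is illustrative, not part of the repository):

import math

def cosine_lr(current_epoch, current_step, steps, initial_lr=1.0, total_epochs=200):
    # Cosine-annealed learning rate matching the train.py expression quoted above.
    # Illustrative helper; the repository computes this inline.
    progress = (current_step + 1 + (current_epoch - 1) * steps) / (total_epochs * steps)
    return 0.5 * (1 + math.cos(progress * math.pi)) * initial_lr

# With initial_lr = 1 the schedule starts near 1.0 and decays smoothly towards 0:
print(cosine_lr(current_epoch=1,   current_step=0,   steps=1000))   # ~1.0
print(cosine_lr(current_epoch=100, current_step=0,   steps=1000))   # ~0.5
print(cosine_lr(current_epoch=200, current_step=999, steps=1000))   # ~0.0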

About the optimizer and learning rate

The Adadelta optimizer already adapts the learning rate by itself; why do you additionally apply a cosine schedule to the learning rate? Sorry for the interruption, and happy Mid-Autumn Festival!

Training on a larger dataset

Hello, if I want to train this model on a dataset of a size similar to HME100K, do any hyperparameters need to be adjusted?

About the CAN model

Hi, is the CAN model provided in this project CAN-DWAP or CAN-ABM?

Problems encountered during reproduction

Training runs normally, but I don't know where the checkpoints saved during training end up. I also found that the checkpoint entry that inference.py reads from the config file is empty. How and when should the checkpoint field in the config file be filled in, and where are the training weights saved?

Inference on a single image

In inference.py I would like to run inference on a single image: load the trained weights and predict.
However, the output prediction is almost always empty. What could cause this?
with torch.no_grad():
    for line in tqdm(lines):
        name, *labels = line.split()
        name = name.split('.')[0] if name.endswith('jpg') else name
        input_labels = labels
        print("input_labels:", input_labels)
        labels = ' '.join(labels)
        img = images[name]
        img = torch.Tensor(255 - img) / 255
        img = img.unsqueeze(0).unsqueeze(0)
        img = img.to(device)
        a = time.time()
        input_labels = words.encode(input_labels)
        input_labels = torch.LongTensor(input_labels)
        input_labels = input_labels.unsqueeze(0).to(device)

        probs, _, mae, mse = model(img, input_labels, os.path.join(params['decoder']['net']))
        mae_sum += mae
        mse_sum += mse
        model_time += (time.time() - a)

        prediction = words.decode(probs)
        print("pre:", prediction)

About data augmentation

Hi @LBH1024, it seems that the code for data augmentation is missing, and it is essential for getting better performance. Could you share this code? Thanks!

About reproduction performance

Greetings, I reproduced the model following your code and parameters with almost no adjustment, and my results on the three CROHME test sets are 10% lower than the results in your paper. Would you share your experiment log?

About counting loss

In the first few batches of the experiment, counting_loss drops rapidly from over 100 to just above 0.1. What could be the reason? The code has not been modified.

Some questions about the parameters

Hello, I don't quite understand what the workers, train_parts, valid_start, save_start, step_ratio, step_decay, eps, and similar parameters in config.yaml mean. Would you mind explaining them?

A problem encountered while reproducing the experiment

When I run python train.py --dataset CROHME, an error is raised at line 72 of train.py. The error is quite strange and I don't really understand it. Have you run into it before?

params['word_num'] in yaml

What is the exact value of params['word_num']? I could not find the corresponding setting in config.yaml.

CUDA out of memory

GPU memory usage keeps growing as training goes on, and it eventually runs out of memory.

A question about prediction

Hi, I wanted to test the model on printed formula recognition. The ExpRate on my test set reaches 0.8, but when I run prediction on other external images the result is always a string of identical characters. What could cause this?

Input image: [screenshot 00000495]

Program output: [screenshot 92959e6c1ddf7e6c0be4ba116882d78]

About requirements

Hi @LBH1024, could you tell me the requirements (such as the Python and PyTorch versions) needed to run the code? Thanks!
