jianshuzhang / TAP
Track, Attend and Parse for Online Handwritten Mathematical Expression Recognition
A question about the .ascii files in data: are they features extracted from the .inkml files (e.g. by a pretrained model), or something else?
Hello, I tried to reproduce the v2 model in PyTorch. After building the full model, I fed the same inputs and weights into both frameworks and verified that the Tracker output ctx, the Attention output ctxs, the Parser output proj_h, the model's class probabilities probs, and the loss cost are all equal. The optimizer is Adadelta with gradient clipping, and I trained on the same dataset with the same configuration (the only difference being that I did not add noise), yet after more than 400 epochs the model still has not converged.
Is it simply that I have not trained for enough epochs? How many epochs did it take for your model to converge?
Or is some other strategy needed to make the model converge?
Any helpful reply would be appreciated.
When I run test.sh, it raises an error: KeyError: 'dim_attention'.
I loaded the pkl option files for WAP and TAP:
WAP:
{'dim_ConvBlock': [32, 64, 64, 128], 'decay_c': 0.0001, 'patience': 15, 'max_epochs': 5000, 'dispFreq': 100, 'batch_Imagesize': 500000, 'alpha_c': 0.0, 'bn_saveto': './models/bn_params.npz', 'saveto': './models/attention_maxlen[200]dimWord256_dim256.npz', 'clip_c': 100.0, 'kernel_Convenc': [3, 3], 'dim_coverage': 128, 'valid_result': ['./result/valid.wer'], 'valid_batch_size': 8, 'maxImagesize': 500000, 'dim_dec': 256, 'validFreq': -1, 'kernel_coverage': [5, 5], 'optimizer': 'adam', 'input_channels': 1, 'use_dropout': True, 'batch_size': 8, 'encoder': 'gru', 'dim_target': 111, 'finish_after': 10000000, 'lrate': 0.0002, 'valid_datasets': ['../data/offline-test.pkl', '../data/test_caption.txt'], 'layersNum_block': [4, 4, 4, 4], 'valid_output': ['./result/valid_decode_result.txt'], 'datasets': ['../data/offline-train.pkl', '../data/train_caption.txt'], 'dim_word': 256, 'sampleFreq': -1, 'dim_attention': 128, 'dictionaries': ['../data/dictionary.txt'], 'reload': False, 'maxlen': 200, 'decoder': 'gru_cond', 'saveFreq': -1, 'valid_batch_Imagesize': 500000}
TAP:
{'lrate': 1e-08, 'decay_c': 0.0, 'patience': 15, 'max_epochs': 5000, 'dispFreq': 100, 'alpha_c': 0.0, 'clip_c': 1000.0, 'saveto': './models/attention_maxlen[2000]dimWord256_dim256.npz', 'dim_coverage': 121, 'valid_batch_size': 8, 'dim_dec': 256, 'optimizer': 'adadelta', 'validFreq': -1, 'norm_file': ['norm.pkl'], 'batch_size': 8, 'encoder': 'gru', 'dim_target': 111, 'decoder': 'gru_cond', 'valid_datasets': ['../../prepare_data/data/9feature-valid-dis-0.005-revise.pkl', '../../prepare_data/data/valid_data_v3.txt'], 'dim_feature': 9, 'use_dropout': False, 'datasets': ['../../prepare_data/data/9feature-train-dis-0.005-revise.pkl', '../../prepare_data/data/train_data_v3.txt'], 'dim_word': 256, 'sampleFreq': -1, 'dim_enc': [250, 250, 250, 250], 'dictionaries': ['../../prepare_data/data/dictionary.txt'], 'reload': False, 'maxlen': 2000, 'finish_after': 10000000, 'down_sample': [0, 0, 1, 1], 'saveFreq': -1}
I notice some options are missing from TAP's model file. How can I solve this problem? Thanks.
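One possible workaround for this kind of KeyError, sketched below under assumptions (the default values and the helper name load_options are hypothetical, not from the repo): merge the loaded options dict over a dict of defaults, so any key missing from the saved TAP options falls back to a sensible value instead of raising.

```python
import pickle

# Assumed defaults for keys absent from the TAP options file; the
# values here mirror the WAP config quoted above and may need tuning.
DEFAULTS = {
    'dim_attention': 128,
    'dim_coverage': 121,
}

def load_options(path):
    """Load a saved options dict, filling missing keys from DEFAULTS."""
    with open(path, 'rb') as f:
        saved = pickle.load(f)
    options = dict(DEFAULTS)
    options.update(saved)  # saved values win; defaults only fill gaps
    return options
```

With this, options['dim_attention'] always exists, whether or not the pkl stored it.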
Hello, I have recently been trying to reproduce your model in PyTorch and have run into some difficulties I would like to ask about.
The forward pass already matches: the feature maps of every layer, including the post-softmax prob output, differ by an average per-element error of about 1e-7, and I can load the weights trained in Theano into the PyTorch model and reproduce the same test metrics as Theano.
During training, however, also with the Adadelta optimizer, PyTorch needs a learning rate of about 3e-1 for the loss to decrease, and when the training-set ExpRate reaches 0.99 the test ExpRate is only 0.13.
It feels like I still need to align the gradient propagation. How does Theano's scan handle gradients with respect to the loop variables? Does it propagate the gradient all the way back to step 0?
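For reference on the question above: theano.scan performs full backpropagation through time by default (its truncate_gradient argument defaults to -1, i.e. no truncation), so the gradient does reach step 0. A minimal framework-free sketch, with a toy scalar recurrence h_t = w * h_{t-1} + x_t (not the TAP model), illustrates that the gradient of the final state with respect to the initial state is w**T and survives all T steps:

```python
# Toy recurrence h_t = w * h_{t-1} + x_t, loss L = h_T.
# Full BPTT gives dL/dh_0 = w ** T, nonzero for any finite T.
def unroll(h0, w, xs):
    h = h0
    for x in xs:
        h = w * h + x
    return h

w, h0 = 0.9, 1.0
xs = [0.1, 0.2, 0.3]
T = len(xs)

analytic = w ** T  # gradient carried back to step 0

# Finite-difference check of dh_T / dh_0
eps = 1e-6
numeric = (unroll(h0 + eps, w, xs) - unroll(h0 - eps, w, xs)) / (2 * eps)
```

If a PyTorch reimplementation detaches hidden states between steps, this path back to step 0 is cut, which is one place a training mismatch like the above could come from.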
I spent two days setting up a GPU Theano environment on Win10 + CUDA 10.
After that, I ported your code from Python 2 to Python 3.
But when I ran your code on the 986-sample online-test.pkl, it took 43 minutes in total,
while in your paper you state it takes 70 s.
I don't know where my mistake is.
Hello, I reproduced your code in PyTorch, and the base model works correctly. However, whenever I reproduce the weight noise, gradient explosion occurs. I noticed that the noise scale of adaptive weight noise grows from small to large; once my noise scale exceeds about 0.01, the gradients on the training set explode and the loss keeps increasing, yet your code does not have this problem. Have you encountered this? What might the cause be?
Hello Mr. Zhang!
When I run the script with sh test.sh in git bash, I hit this error:
IOError: [Errno 2] No such file or directory: '../data/online-test.pkl'
The data directory indeed does not contain this file. Does it need to be generated by ourselves?
Hello, I would like to ask: have you ever experimented with using an LSTM instead of a GRU as the language model, and how did it perform?