
tap's Issues

Generating the data

Are the .ascii files under data features extracted from the .inkml files via pretraining, or are they something else?
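For context, a hedged sketch of one plausible way to turn an .inkml trace file into per-point trajectory features (coordinates, first- and second-order differences, and pen-state flags). The exact recipe behind the .ascii files is not confirmed here; note that this sketch yields 8 columns, while the repo's config lists dim_feature: 9, so at least one channel is unaccounted for.

```python
# Hypothetical .inkml -> trajectory-feature extraction; the exact
# feature layout used by TAP's .ascii files is an assumption.
import numpy as np
import xml.etree.ElementTree as ET

def inkml_to_features(path):
    ns = {'ink': 'http://www.w3.org/2003/InkML'}
    root = ET.parse(path).getroot()
    points, pen_up = [], []
    for trace in root.findall('ink:trace', ns):
        coords = [list(map(float, p.split()[:2]))
                  for p in trace.text.strip().split(',')]
        points.extend(coords)
        pen_up.extend([0.0] * (len(coords) - 1) + [1.0])  # 1 at stroke end
    xy = np.asarray(points, dtype=np.float32)
    d1 = np.diff(xy, axis=0, prepend=xy[:1])   # first-order differences
    d2 = np.diff(d1, axis=0, prepend=d1[:1])   # second-order differences
    pen = np.asarray(pen_up, dtype=np.float32)[:, None]
    return np.hstack([xy, d1, d2, pen, 1.0 - pen])  # 8 columns
```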

Model evaluation

Dear author, I git cloned this project and evaluated the model with the test.sh script of the v2 model. The results were:

test set decode done, cost time ... 374.329162049
Valid WER: 2011.69%, ExpRate: 1.22%

These results are far from what the paper reports; a screenshot of the paper's test results is below:

[image: test results reported in the paper]

What could be the reason for this?

Looking forward to your answer.

Reproducing the model

Hello, I tried to reproduce the v2 model in PyTorch. After building the whole model, I fed both implementations the same inputs and weights and verified that the Tracker output ctx, the Attention output ctxs, the Parser output proj_h, the output class probabilities probs, and the loss cost are all equal. The optimizer is Adadelta with gradient clipping, and training uses the same dataset and configuration (the only difference being that no noise is added). Yet after more than 400 epochs the model still has not converged.

Is it just that I have not trained for enough epochs? How many epochs did it take for your model to converge?
Or is some other strategy required for convergence?

Looking forward to any helpful reply.
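For reference, a minimal sketch of the training step described above, assuming a PyTorch model with Theano-style global gradient-norm clipping. The stand-in model and the clip_c value are illustrative, not the author's code:

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the real Tracker/Attention/Parser stack.
model = nn.GRU(input_size=9, hidden_size=256, batch_first=True)
head = nn.Linear(256, 111)                    # 111 = dim_target in the config
params = list(model.parameters()) + list(head.parameters())
optimizer = torch.optim.Adadelta(params, rho=0.95, eps=1e-8)

def train_step(x, y, clip_c=100.0):
    """One Adadelta update with global gradient-norm clipping."""
    optimizer.zero_grad()
    h, _ = model(x)                           # (batch, seq_len, 256)
    loss = nn.functional.cross_entropy(head(h[:, -1]), y)
    loss.backward()
    # Analogue of Theano's clip_c option: rescale the global gradient norm.
    torch.nn.utils.clip_grad_norm_(params, max_norm=clip_c)
    optimizer.step()
    return loss.item()

print(train_step(torch.randn(8, 50, 9), torch.randint(0, 111, (8,))))
```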

Some questions about testing

When I run test.sh, I get an error: KeyError: 'dim_attention'.
I loaded the saved option pkl files for both WAP and TAP.
WAP:
{'dim_ConvBlock': [32, 64, 64, 128], 'decay_c': 0.0001, 'patience': 15, 'max_epochs': 5000, 'dispFreq': 100, 'batch_Imagesize': 500000, 'alpha_c': 0.0, 'bn_saveto': './models/bn_params.npz', 'saveto': './models/attention_maxlen[200]dimWord256_dim256.npz', 'clip_c': 100.0, 'kernel_Convenc': [3, 3], 'dim_coverage': 128, 'valid_result': ['./result/valid.wer'], 'valid_batch_size': 8, 'maxImagesize': 500000, 'dim_dec': 256, 'validFreq': -1, 'kernel_coverage': [5, 5], 'optimizer': 'adam', 'input_channels': 1, 'use_dropout': True, 'batch_size': 8, 'encoder': 'gru', 'dim_target': 111, 'finish_after': 10000000, 'lrate': 0.0002, 'valid_datasets': ['../data/offline-test.pkl', '../data/test_caption.txt'], 'layersNum_block': [4, 4, 4, 4], 'valid_output': ['./result/valid_decode_result.txt'], 'datasets': ['../data/offline-train.pkl', '../data/train_caption.txt'], 'dim_word': 256, 'sampleFreq': -1, 'dim_attention': 128, 'dictionaries': ['../data/dictionary.txt'], 'reload': False, 'maxlen': 200, 'decoder': 'gru_cond', 'saveFreq': -1, 'valid_batch_Imagesize': 500000}

TAP:
{'lrate': 1e-08, 'decay_c': 0.0, 'patience': 15, 'max_epochs': 5000, 'dispFreq': 100, 'alpha_c': 0.0, 'clip_c': 1000.0, 'saveto': './models/attention_maxlen[2000]dimWord256_dim256.npz', 'dim_coverage': 121, 'valid_batch_size': 8, 'dim_dec': 256, 'optimizer': 'adadelta', 'validFreq': -1, 'norm_file': ['norm.pkl'], 'batch_size': 8, 'encoder': 'gru', 'dim_target': 111, 'decoder': 'gru_cond', 'valid_datasets': ['../../prepare_data/data/9feature-valid-dis-0.005-revise.pkl', '../../prepare_data/data/valid_data_v3.txt'], 'dim_feature': 9, 'use_dropout': False, 'datasets': ['../../prepare_data/data/9feature-train-dis-0.005-revise.pkl', '../../prepare_data/data/train_data_v3.txt'], 'dim_word': 256, 'sampleFreq': -1, 'dim_enc': [250, 250, 250, 250], 'dictionaries': ['../../prepare_data/data/dictionary.txt'], 'reload': False, 'maxlen': 2000, 'finish_after': 10000000, 'down_sample': [0, 0, 1, 1], 'saveFreq': -1}

I notice that some options present in WAP's file are missing from TAP's. How can I solve this problem? Thanks.
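One workaround (an assumption, not a confirmed fix) is to backfill the keys missing from TAP's saved options with defaults before the model is built, e.g. borrowing 'dim_attention' from WAP's config:

```python
# Hedged workaround: backfill option keys missing from TAP's saved options.
# 'options.pkl' is a placeholder path, and the default values are
# assumptions borrowed from WAP's config above.
import pickle

with open('options.pkl', 'rb') as f:
    options = pickle.load(f)

defaults = {'dim_attention': 128, 'kernel_coverage': [5, 5]}
for key, value in defaults.items():
    options.setdefault(key, value)  # keep existing values, fill only gaps
```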

Reproducing the model in PyTorch

Hello, I have recently been trying to reproduce your model in PyTorch and ran into some difficulties I would like to ask about:
At present, the mean per-element error of every layer's forward feature map, including the post-softmax prob output, is around 1e-7, and I can load weights trained in Theano into the PyTorch model and obtain the same test metrics as Theano.
However, during training, also with the Adadelta optimizer, PyTorch needs a learning rate of roughly 3e-1 for the loss to decrease, and when the training-set ExpRate reaches 0.99 the test ExpRate only reaches 0.13.
It seems I still need to align the gradient propagation. Could you tell me how Theano's scan handles gradients for loop variables? Does it keep propagating gradients all the way back to step 0?
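For reference, a minimal Theano check (a sketch, not code from this repo) showing that theano.scan does backpropagate through every iteration, so gradients reach step 0:

```python
import theano
import theano.tensor as T

x = T.vector('x')
h0 = T.scalar('h0')

def step(x_t, h_prev):
    return 0.9 * h_prev + x_t

h, _ = theano.scan(step, sequences=x, outputs_info=h0)
# Gradient of the final state w.r.t. the initial state: scan performs
# full backpropagation-through-time, so this is 0.9 ** len(x), not zero.
g = theano.grad(h[-1], h0)
f = theano.function([x, h0], g)
print(f([0.0, 0.0, 0.0], 1.0))  # ~0.729 == 0.9 ** 3
```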

Processing is very slow with GPU Theano on Win10 + CUDA 10 + Python 3.7.9

I spent two days setting up a GPU Theano environment on Win10 + CUDA 10.
After that, I ported your code from Python 2 to Python 3.
But when I ran your code on the 986-sample "online-test.pkl", it took 43 minutes in total,
while in your paper you state it takes 70 s.
I don't know where my mistake is.
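One common culprit (a guess, not a confirmed diagnosis) is Theano silently falling back to the CPU or to float64. A quick sanity check:

```python
# Verify Theano is actually using the GPU and float32. If device prints
# 'cpu' or floatX prints 'float64', set THEANO_FLAGS=device=cuda,floatX=float32
# (device=gpu on the old backend) before launching Python.
import theano
print(theano.config.device)   # expect 'cuda' / 'cuda0' for GPU execution
print(theano.config.floatX)   # expect 'float32'; float64 is much slower on GPU
```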

Guided hybrid attention model

Hi Jianshu,
In v2 I can't find the code for the 'guided hybrid attention model'. Is this version consistent with the model described in your paper?

[image: excerpt from the paper]

Weight noise issue

Hello, I reproduced your code in PyTorch, and the base code is fine. But when I reproduce weight noise, gradient explosions keep occurring. I noticed that the adaptive weight noise ratio grows from small to large; once my noise ratio exceeds about 0.01, training explodes on the training set, with the loss growing larger and larger, yet your code does not have this problem. Have you encountered this? What could the reason be?
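For context, a minimal sketch of the kind of weight-noise injection being described, with a noise ratio ramped up over training. This is illustrative only and simpler than true adaptive (variational) weight noise:

```python
import torch

@torch.no_grad()
def add_weight_noise(model, ratio):
    """Perturb each weight with Gaussian noise scaled by its own magnitude.
    Returns the noise tensors so the perturbation can be undone after the
    forward/backward pass; `ratio` is ramped up from ~0 during training."""
    noises = []
    for p in model.parameters():
        noise = torch.randn_like(p) * ratio * p.abs()
        p.add_(noise)
        noises.append(noise)
    return noises

@torch.no_grad()
def remove_weight_noise(model, noises):
    """Restore the clean weights before the optimizer step."""
    for p, noise in zip(model.parameters(), noises):
        p.sub_(noise)
```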

Language model question

Hi, I would like to ask: have you experimented with an LSTM instead of a GRU as the language model, and how did it perform?
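Not an answer, but for anyone trying the swap in PyTorch the change is mechanical; the LSTM just carries an extra cell state (illustrative sketch, dimensions taken from the config above):

```python
import torch
import torch.nn as nn

# GRU language-model layer and its LSTM counterpart; the LSTM returns an
# additional cell state c_n alongside the hidden state h_n.
gru = nn.GRU(input_size=256, hidden_size=256, batch_first=True)
lstm = nn.LSTM(input_size=256, hidden_size=256, batch_first=True)

x = torch.randn(8, 20, 256)        # (batch, seq_len, dim_word)
out_g, h_n = gru(x)
out_l, (h_n, c_n) = lstm(x)
```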
