text_infilling's Issues

About the overall process of this model

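The template is a single sentence, and that sentence may contain two blanks. The two answers are concatenated and passed through a masked self-attention (each position can only see information before the word currently being predicted), and the result is used as the query. On the other side, the sentence itself (with two mask tokens) serves as the key and value for a second attention, and a feed-forward (FF) layer then produces the output of one layer. The query for the next layer comes from the previous layer's output passed through another self-attention, while the key and value stay the same: still the template. This process is then repeated. Is that the training process?
Is this roughly how the model works? My main question is whether, during training, the decoder input is only the concatenation of the ground-truth answers for the two blanks, or whether the original sentence is concatenated with them as well.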
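
The flow described in this question can be sketched in NumPy as follows. This is only an illustration of the process as the question states it, not the repository's confirmed implementation: the names `attention` and `decoder_layer`, the sizes, and the random inputs are all hypothetical, and the sketch assumes a single head with no learned projections, residual connections, or layer normalization, and reuses the same feed-forward weights in every layer.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention(query, key, value, causal=False):
    """Scaled dot-product attention.

    With causal=True, position i may only attend to positions <= i.
    """
    scores = query @ key.T / np.sqrt(query.shape[-1])
    if causal:
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -1e9, scores)
    weights = softmax(scores)
    return weights @ value, weights

def decoder_layer(answers, template, w1, w2):
    # 1) Masked self-attention over the concatenated answers.
    h, self_w = attention(answers, answers, answers, causal=True)
    # 2) Attention with the template (including blank tokens) as key and value.
    h, cross_w = attention(h, template, template)
    # 3) Position-wise feed-forward network with ReLU.
    out = np.maximum(h @ w1, 0.0) @ w2
    return out, self_w, cross_w

rng = np.random.default_rng(0)
d = 8                                 # hypothetical embedding size
template = rng.normal(size=(6, d))    # sentence with two blank tokens, 6 positions
answers = rng.normal(size=(4, d))     # concatenated answers for the two blanks
w1 = rng.normal(size=(d, 16))
w2 = rng.normal(size=(16, d))

x = answers
for _ in range(2):                    # key and value stay the template in every layer
    x, self_w, cross_w = decoder_layer(x, template, w1, w2)
```

In this sketch only the answer tokens are fed to the decoder side; the original sentence reaches them solely through the second attention step.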

Regarding introduction of MaskGAN in your paper

I have read your paper, Text Infilling. Its description of MaskGAN is a little confusing.

The Introduction states: "For example, the recent MaskGAN work (Fedus et al., 2018) and the sentence completion task (Zweig and Burges, 2011) have assumed each missing portion of a sentence contains only a single word."

But the MaskGAN paper states:
"For a discrete sequence x = (x_1, ..., x_T), a binary mask is generated (deterministically or stochastically) of the same length m = (m_1, ..., m_T) where each m_t ∈ {0, 1}, selects which tokens will remain."

So I think a missing portion of a sentence in MaskGAN is not limited to a single word.
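
A small sketch of the point being made: under a per-token binary mask, adjacent masked positions form a missing span longer than one word. The helper `masked_spans` and the example sentence are hypothetical, and the sketch assumes that m_t = 1 means the token remains and m_t = 0 means it is masked out.

```python
def masked_spans(mask):
    """Return (start, end) index pairs of contiguous runs where mask == 0."""
    spans = []
    start = None
    for i, keep in enumerate(mask):
        if keep == 0 and start is None:
            start = i                  # a masked run begins here
        elif keep != 0 and start is not None:
            spans.append((start, i))   # the run ended just before position i
            start = None
    if start is not None:
        spans.append((start, len(mask)))
    return spans

tokens = ["the", "food", "was", "really", "good", "today"]
mask = [1, 0, 0, 0, 1, 0]              # assumed convention: 1 = remains, 0 = masked
spans = masked_spans(mask)
span_lengths = [end - start for start, end in spans]
missing = [tokens[start:end] for start, end in spans]
```

Here the first missing portion covers three consecutive tokens and the second covers one.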

I ran into problems training the model on a 2.4-million-sample Chinese dataset and don't know how to solve them

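Hello! I am a beginner and am very interested in your paper. I wanted to see how the model would perform when trained on a Chinese dataset, but I ran into a problem while training on a 2.4-million-sample Chinese dataset. I may not have the skills to solve it myself, so I am asking you for help. The problem is that training keeps failing with the error "tensorflow.python.framework.errors_impl.UnknownError: IndexError: too many indices for array". I have only managed to get the pos dataset to run; the neg dataset produces the same error. I run the program on a server. With the Chinese dataset, training only gets as far as "epoch:0 test_bleu:30.07800579071045 template_bleu:79.62971329689026 test_loss:6.98167085647583 test_ppl:1187.98974609375", and then the following error appears: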
root@a8f8e2b9891d:/notebooks# python self_attn.py --mask_rate 0.2 --blank_num 2 --filename_prefix 'data.' --data_dir './yelp_data/data/'
/usr/local/lib/python3.5/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
train_file:/notebooks/yelp_data/data/data.train.txt
valid_file:/notebooks/yelp_data/data/data.valid.txt
logdir:./log_dir/data.bsize150.epoch120.seqlen64.dynamic_lr.present0.8.partition2.hidden256.self_attn/
WARNING:tensorflow:From /notebooks/texar/utils/beam_search.py:87: calling reduce_logsumexp (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
epoch:0 test_bleu:30.07800579071045 template_bleu:79.62971329689026 test_loss:6.98167085647583 test_ppl:1187.98974609375
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1361, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
target_list, status, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnknownError: IndexError: too many indices for array
[[Node: PyFunc_6 = PyFunc[Tin=[DT_INT64, DT_INT64], Tout=[DT_INT64, DT_INT64], token="pyfunc_6", _device="/job:localhost/replica:0/task:0/device:CPU:0"](PyFunc_5, Variable_7/read)]]
[[Node: decoder_2/layer_5/ffn_1/ffn/conv1/Tensordot/Gather/_1765 = _HostRecvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_12352_decoder_2/layer_5/ffn_1/ffn/conv1/Tensordot/Gather", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
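
For reference, NumPy raises this kind of IndexError when an array is indexed with more indices than it has dimensions. The sketch below reproduces that class of error outside TensorFlow; it does not identify the actual cause in this run. The helper `first_column` and the sample arrays are hypothetical, and the sketch assumes the function wrapped by the PyFunc node expects a 2-D array.

```python
import numpy as np

def first_column(batch):
    """Hypothetical helper that expects a 2-D [batch, time] array."""
    batch = np.asarray(batch)
    if batch.ndim != 2:
        # Failing early with the shape makes the bad input easier to find.
        raise ValueError("expected a 2-D array, got shape %s" % (batch.shape,))
    return batch[:, 0]

good = np.array([[4, 8, 15], [16, 23, 42]])   # 2-D, as expected
bad = np.array([4, 8, 15])                    # 1-D: one dimension is missing

try:
    bad[:, 0]                                 # two indices on a 1-D array
    message = None
except IndexError as err:
    message = str(err)                        # begins with "too many indices for array"
```

Checking the shape of each array passed into the Python function, as `first_column` does, is one way to find which input has fewer dimensions than expected.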
