sogou / sogoumrctoolkit

This toolkit was designed for the fast and efficient development of modern machine comprehension models, including both published models and original prototypes.

License: Apache License 2.0

Python 100.00%

sogoumrctoolkit's People

Contributors

dengchao007, henryfriedlander, jgkimi, nonewait, shihanqunnie, sunnymarkliu, yxk9810, yylun

sogoumrctoolkit's Issues

Improvement to the 'load' and 'save' methods

The 'load' and 'save' methods in BaseModel handle only the model; related data such as the vocabulary is not processed. To load a model from file, the vocabulary therefore has to be rebuilt or loaded separately, which is not ideal. It would be better to save and load everything the model depends on.

The vocabulary class has 'load' and 'save' methods, but all data is saved as JSON, which is larger and easier to corrupt. I don't see the benefit of using the JSON format here.

No module named 'sogou_mrc'

Hello, when I run the files in examples I get the error "No module named 'sogou_mrc'", even though the sogou_mrc directory contains an __init__.py file. I don't know what causes this. Could you explain? Thanks!
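One common cause of this error is that the example script is started from its own folder, so the repository root (the directory that contains sogou_mrc) is not on Python's import path. A minimal sketch of a workaround follows. The helper name add_repo_root_to_path and the assumption that the script sits two directories below the repository root (as in examples/run_coqa/run_bert_coqa.py) are illustrative and not part of the toolkit.

```python
import os
import sys


def add_repo_root_to_path(script_path, levels_up=2):
    """Put the directory `levels_up` levels above the script's folder on sys.path.

    Hypothetical helper: it assumes the package directory (sogou_mrc) lives
    directly under that root directory.
    """
    root = os.path.dirname(os.path.abspath(script_path))
    for _ in range(levels_up):
        root = os.path.dirname(root)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# Intended use at the top of an example script, before "import sogou_mrc":
# add_repo_root_to_path(__file__)
```

Installing the package into the active environment would be the other usual route, if the repository provides an install script.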

NotFoundError: ./vocab.txt

When I run run_bert_coqa.py, I get an error:
tensorflow.python.framework.errors_impl.NotFoundError: ./vocab.txt; No such file or directory
Where does this file come from?
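Google's released BERT checkpoints typically ship vocab.txt together with bert_config.json and the bert_model.ckpt.* files, so one guess is that the BERT directory used by the script does not point at an unpacked checkpoint. The sketch below only checks for those files. The helper name and the exact file list are assumptions, not something the toolkit provides.

```python
import os

# File names found in Google's released BERT checkpoints (assumed list).
EXPECTED_BERT_FILES = ("vocab.txt", "bert_config.json", "bert_model.ckpt.index")


def missing_bert_files(bert_dir, exists=os.path.exists):
    """Return the expected checkpoint files that are absent from bert_dir."""
    return [name for name in EXPECTED_BERT_FILES
            if not exists(os.path.join(bert_dir, name))]


if __name__ == "__main__":
    # "." mirrors the "./vocab.txt" path from the error message.
    missing = missing_bert_files(".")
    if missing:
        print("Missing from bert_dir:", ", ".join(missing))
```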

Shuffle buffer filled

command: python run_bert_squadv2.py
GPU: P40
CPU memory: 126G

log:
2019-07-28 01:28:56.654470: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-07-28 01:28:56.866822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 21139 MB memory) -> physical GPU (device: 0, name: Tesla P40, pci bus id: 0000:05:00.0, compute capability: 6.1)
2019-07-28 01:29:01.610923: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-28 01:29:01.611087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-28 01:29:01.611125: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-07-28 01:29:01.611143: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-07-28 01:29:01.792284: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 21139 MB memory) -> physical GPU (device: 0, name: Tesla P40, pci bus id: 0000:05:00.0, compute capability: 6.1)
2019-07-28 01:29:04.056203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-28 01:29:04.056312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-28 01:29:04.056346: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-07-28 01:29:04.056365: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-07-28 01:29:04.057344: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 21139 MB memory) -> physical GPU (device: 0, name: Tesla P40, pci bus id: 0000:05:00.0, compute capability: 6.1)
2019-07-28 01:29:49.138936: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 7971 of 130497
2019-07-28 01:29:59.139110: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 15820 of 130497
2019-07-28 01:30:09.139348: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 23963 of 130497
2019-07-28 01:30:19.138907: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 31963 of 130497
2019-07-28 01:30:29.139424: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 39830 of 130497
2019-07-28 01:30:39.139672: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 48245 of 130497
2019-07-28 01:30:49.138921: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 56364 of 130497
2019-07-28 01:30:59.139422: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 64376 of 130497
2019-07-28 01:31:09.139234: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 72092 of 130497
2019-07-28 01:31:19.139490: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 79574 of 130497
2019-07-28 01:31:29.139453: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 87148 of 130497
2019-07-28 01:31:39.139663: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 94795 of 130497
2019-07-28 01:31:49.139455: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 102515 of 130497
2019-07-28 01:31:59.140305: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 110001 of 130497
2019-07-28 01:32:09.138877: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 117752 of 130497
2019-07-28 01:32:19.139864: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 125678 of 130497
2019-07-28 01:32:25.347410: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:136] Shuffle buffer filled.

Can I handle Chinese using other models?

In the examples, Chinese is handled only with the BiDAF model, by choosing CMRCReader as the reader. I guess I can handle Chinese with other models by using CMRCReader too, but I am not sure. Could you confirm?

Chinese ELMo support

To implement ELMo + BiDAF with Chinese support, do I need to wrap the tf.hub interface myself, or is there a better solution?

ran out of memory

TF: tensorflow-gpu==1.12
GPU: Tesla P4 8G
I tried running run_bidafplus_squad.py and got GPU memory allocation messages. I don't know whether this affects the results.

2019-04-07 05:11:40.657538: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-07 05:11:41.446788: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-07 05:11:41.447151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: Tesla P4 major: 6 minor: 1 memoryClockRate(GHz): 1.1135
pciBusID: 0000:00:06.0
totalMemory: 7.43GiB freeMemory: 7.31GiB
2019-04-07 05:11:41.447178: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-04-07 05:11:41.882084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-07 05:11:41.882132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-04-07 05:11:41.882141: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-04-07 05:11:41.882363: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7051 MB memory) -> physical GPU (device: 0, name: Tesla P4, pci bus id: 0000:00:06.0, compute capability: 6.1)
2019-04-07 05:11:42,321 - root - INFO - Reading file at train-v1.1.json
2019-04-07 05:11:42,322 - root - INFO - Processing the dataset.
87599it [07:43, 189.13it/s]
2019-04-07 05:19:25,497 - root - INFO - Reading file at dev-v1.1.json
2019-04-07 05:19:25,497 - root - INFO - Processing the dataset.
10570it [00:53, 196.53it/s]
2019-04-07 05:20:19,349 - root - INFO - Building vocabulary.
100%|███████████████████████████████████| 98169/98169 [00:30<00:00, 3218.07it/s]
2019-04-07 05:21:05.747563: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-04-07 05:21:05.747695: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-07 05:21:05.747711: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-04-07 05:21:05.747718: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-04-07 05:21:05.747925: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7051 MB memory) -> physical GPU (device: 0, name: Tesla P4, pci bus id: 0000:00:06.0, compute capability: 6.1)
2019-04-07 05:21:06.489069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-04-07 05:21:06.489145: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-07 05:21:06.489156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-04-07 05:21:06.489162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-04-07 05:21:06.489389: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7051 MB memory) -> physical GPU (device: 0, name: Tesla P4, pci bus id: 0000:00:06.0, compute capability: 6.1)
2019-04-07 05:21:07.117979: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-04-07 05:21:07.118055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-07 05:21:07.118066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-04-07 05:21:07.118072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-04-07 05:21:07.118278: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7051 MB memory) -> physical GPU (device: 0, name: Tesla P4, pci bus id: 0000:00:06.0, compute capability: 6.1)
2019-04-07 05:21:13,046 - root - INFO - Epoch 1/15
2019-04-07 05:21:13,351 - root - INFO - Eposide 1/2
2019-04-07 05:21:23.422390: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 10494 of 87599
2019-04-07 05:21:33.422566: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 21931 of 87599
2019-04-07 05:21:43.422157: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 32210 of 87599
2019-04-07 05:21:53.422415: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 42018 of 87599
2019-04-07 05:22:03.422089: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 52336 of 87599
2019-04-07 05:22:13.422587: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 62125 of 87599
2019-04-07 05:22:23.422099: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 72157 of 87599
2019-04-07 05:22:33.421957: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 82242 of 87599
2019-04-07 05:22:38.605655: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:136] Shuffle buffer filled.
2019-04-07 05:22:57.952087: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 2.88G (3091968768 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-04-07 05:23:27.134938: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.96GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-07 05:24:09.911666: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.28GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-07 05:28:01.375542: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.23GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-07 05:28:01.673176: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.94GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-07 05:28:33.173192: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.92GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-07 05:28:33.490319: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.93GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-07 05:28:33.502105: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.52GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-07 05:28:44,872 - root - INFO - - Train metrics: loss: 5.875
2019-04-07 05:28:46.141381: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.27GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-07 05:28:46.477394: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.64GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-07 05:28:47.501813: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.09GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-07 05:29:05,078 - root - INFO - - Eval metrics: loss: 3.759
2019-04-07 05:29:21,705 - root - INFO - - Eval metrics: exact_match: 51.325 ; f1: 63.040
2019-04-07 05:29:21,705 - root - INFO - - epoch 1 eposide 1: Found new best score: 63.039909
2019-04-07 05:29:21,705 - root - INFO - Eposide 2/2
2019-04-07 05:34:47,135 - root - INFO - - Train metrics: loss: 4.882
2019-04-07 05:35:02,895 - root - INFO - - Eval metrics: loss: 3.376
2019-04-07 05:35:19,210 - root - INFO - - Eval metrics: exact_match: 57.313 ; f1: 68.490
2019-04-07 05:35:19,210 - root - INFO - - epoch 1 eposide 2: Found new best score: 68.490210
2019-04-07 05:35:19,210 - root - INFO - Epoch 2/15
2019-04-07 05:35:19,213 - root - INFO - Eposide 1/2

Can't find model 'en'

Traceback (most recent call last):
File "E:/HDL/SMRCToolkit/run_bidafplus.py", line 21, in <module>
reader = CoQAReader(history=-1)
File "E:\HDL\SMRCToolkit\sogou_mrc\dataset\coqa.py", line 16, in __init__
self.tokenizer = SpacyTokenizer()
File "E:\HDL\SMRCToolkit\sogou_mrc\utils\tokenizer.py", line 9, in __init__
self.nlp = spacy.load('en', disable=['parser','tagger','entity'])
File "D:\Program Files (x86)\Anaconda\lib\site-packages\spacy\__init__.py", line 27, in load
return util.load_model(name, **overrides)
File "D:\Program Files (x86)\Anaconda\lib\site-packages\spacy\util.py", line 136, in load_model
raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

TypeError: sparse_to_dense() missing 2 required positional arguments

I got the following error when running bidaf_squadv2.py:
File "G:\SMRCToolkit-master\sogou_mrc\data\batch_generator.py", line 121, in extract_char
out = tf.sparse.to_dense(out, default_value=default_value)
AttributeError: module 'tensorflow' has no attribute 'sparse'

I changed tf.sparse.to_dense to tf.sparse_to_dense in the file, but it still fails:
File "G:\SMRCToolkit-master\sogou_mrc\data\batch_generator.py", line 145, in transform_new_instance
context_char = extract_char(context_tokens)
File "G:\SMRCToolkit-master\sogou_mrc\data\batch_generator.py", line 121, in extract_char
out = tf.sparse_to_dense(out, default_value=default_value)
TypeError: sparse_to_dense() missing 2 required positional arguments: 'output_shape' and 'sparse_values'
How can I solve this problem?
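The two names refer to different functions. tf.sparse_to_dense builds a dense tensor from raw indices and values, which is why it asks for output_shape and sparse_values. In older TensorFlow 1.x releases, the function that takes a SparseTensor (the counterpart of tf.sparse.to_dense) is tf.sparse_tensor_to_dense. A small compatibility sketch follows; the helper name and the choice to pass the TensorFlow module in as an argument are assumptions made for illustration.

```python
def sparse_to_dense_compat(tf_module, sparse_tensor, default_value):
    """Densify a SparseTensor on both older and newer TensorFlow 1.x releases.

    Hypothetical helper: `tf_module` is the imported tensorflow module.
    """
    sparse_namespace = getattr(tf_module, "sparse", None)
    if sparse_namespace is not None and hasattr(sparse_namespace, "to_dense"):
        # Newer releases expose the function as tf.sparse.to_dense.
        return sparse_namespace.to_dense(sparse_tensor, default_value=default_value)
    # Older 1.x releases only provide tf.sparse_tensor_to_dense.
    return tf_module.sparse_tensor_to_dense(sparse_tensor, default_value=default_value)


# Possible use inside extract_char in batch_generator.py:
# import tensorflow as tf
# out = sparse_to_dense_compat(tf, out, default_value)
```

Upgrading TensorFlow to a release that includes tf.sparse.to_dense would avoid the patch altogether.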

Need help regarding the installation

Hey @litao-buptsse, @yukyang, @wujindou

I am getting the following error while running any examples:

ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

It seems that either my CUDA version or my cuDNN version is incompatible.
My current versions are as follows:
CUDA: 9.0
cudnn: 7.5.1

Please help

Several minor errors in Trainer

This framework is handy and I like it. Here are several problems I found:

  1. Trainer._evaluate requires 'model_path', but BaseModel doesn't provide that parameter;

  2. Trainer._inference calls Trainer.inference (with 3 parameters), which doesn't exist.

Also, the default logging level disables all training information, which is inconvenient: at the end of training, there is no information about the training at all.
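As a stopgap, the log level can be raised from the calling script with the standard logging module. This is a minimal sketch, assuming the toolkit logs through the root logger (the log excerpts quoted in other issues on this page show the logger name root). The helper name is hypothetical.

```python
import logging


def enable_training_logs(level=logging.INFO):
    """Make INFO-level records visible on the root logger."""
    root = logging.getLogger()
    root.setLevel(level)
    if not root.handlers:
        # Attach a console handler only if nothing is configured yet.
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
        root.addHandler(handler)
    return root


enable_training_logs()
logging.info("Epoch 1/15")
```

Calling this before training starts should be enough if nothing else resets the level afterwards.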

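Some questions about applying the toolkit

Hello, what exactly does SMRCToolkit do? After training on the CMRC dataset, we need to use the model in our own AI chatbot dialogue. Does that mean that once training on the cmrc2018_train dataset is finished, the saved model can be used for other Chinese reading comprehension tasks? If I input my own passage and question, can I also get an answer? If so, where should I input my own passage and question, and how do I get the answer? (A reply in Chinese would be ideal.)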

FailedPreconditionError: Error Loading Models

Hi,

Thank you very much for your code. I have been able to replicate your results on many datasets using the model.train_and_evaluate() method. However, when I tried to save and load a model, I got an error. Initially I tried to save and evaluate using the BertCoQA model, but I even get errors when running the code from the model_save_load.md tutorial.

Below is the error thrown (here is a pastebin with the full error if that would be helpful).

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value eval_metrics/mean/count [[node eval_metrics/mean/AssignAdd_1 (defined at /juicier/scr126/scr/hnf035/fresh/SMRCToolkit/sogou_mrc/model/bert_coqa.py:199) = AssignAdd[T=DT_FLOAT, use_locking=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](eval_metrics/mean/count, eval_metrics/mean/ToFloat, ^add_8)]]

Thank you very much for the help!

no 'session' and TypeError

Two problems come up when I run run_bert_coqa.py.

Traceback (most recent call last):
File "/data2/wangfuyu/NQ/ycl/SMRCToolkit-master/sogou_mrc/model/base_model.py", line 21, in __del__
self.session.close()
AttributeError: 'BertCoQA' object has no attribute 'session'

File "/data2/wangfuyu/NQ/ycl/SMRCToolkit-master/examples/run_coqa/run_bert_coqa.py", line 37, in <module>
model = BertCoQA(bert_dir=bert_dir,answer_verificatioin=True)
TypeError: __init__() got an unexpected keyword argument 'answer_verificatioin'

My Python version is 3.6.8.
Could anyone give me some advice?
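The two errors look related: the TypeError is raised before the body of __init__ runs, so self.session is never created, and __del__ then fails on the half-built object. The keyword answer_verificatioin in the example script also looks like a misspelling, although the parameter name the model actually expects is not shown in this report. Below is a sketch of a defensive __del__ using stand-in classes. ModelSketch and FakeSession are hypothetical and are not the toolkit's BaseModel or a TensorFlow session.

```python
class FakeSession:
    """Stand-in for a session object; only tracks whether close() ran."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class ModelSketch:
    def __init__(self, bert_dir):
        self.bert_dir = bert_dir
        self.session = FakeSession()

    def __del__(self):
        # The attribute may be missing if __init__ never ran to completion.
        session = getattr(self, "session", None)
        if session is not None:
            session.close()


try:
    ModelSketch(bert_dir="bert", answer_verificatioin=True)
except TypeError as err:
    # Only the real cause is reported; __del__ no longer adds an AttributeError.
    print("constructor rejected the call:", err)
```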

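CMRC2018 dataset support

I see that examples includes a test of BiDAF on the CMRC2018 dataset. cmrc_bidaf.py uses word vectors from embedding_folder. Is a download link for these word vectors available? Thanks!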

Missing import sys and other issues

flake8 testing of https://github.com/sogou/SMRCToolkit on Python 3.7.1

$ flake8 . --count --select=E9,F63,F72,F82 --show-source --statistics

./sogou_mrc/dataset/squadv2.py:88:55: F632 use ==/!= to compare str, bytes, and int literals
            "answer_start": answer_token_starts[0] if len(answer_token_starts) > 0 is not None else None,
                                                      ^
./sogou_mrc/dataset/squadv2.py:89:51: F632 use ==/!= to compare str, bytes, and int literals
            "answer_end": answer_token_ends[0] if len(answer_token_ends) > 0 is not None else None,
                                                  ^
./sogou_mrc/dataset/coqa.py:363:21: F821 undefined name 'sys'
                    sys.stderr.write("Turn id should match index {}: {}\n".format(i + 1, qa))
                    ^
./sogou_mrc/dataset/coqa.py:368:25: F821 undefined name 'sys'
                        sys.stderr.write("Question turn id does match answer: {} {}\n".format(qa, answer))
                        ^
./sogou_mrc/dataset/coqa.py:372:21: F821 undefined name 'sys'
                    sys.stderr.write("Gold file has duplicate stories: {}".format(source))
                    ^
./sogou_mrc/libraries/tokenization.py:39:27: F821 undefined name 'unicode'
    elif isinstance(text, unicode):
                          ^
./sogou_mrc/libraries/tokenization.py:62:27: F821 undefined name 'unicode'
    elif isinstance(text, unicode):
                          ^
./sogou_mrc/libraries/modeling.py:364:10: F821 undefined name 'output'
  return output
         ^
2     F632 use ==/!= to compare str, bytes, and int literals
6     F821 undefined name 'sys'
8
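For the F632 findings, `len(x) > 0 is not None` is a chained comparison, so the `is not None` part compares the literal 0 with None and adds nothing to the test. The F821 findings for sys go away once the module imports sys. Below is a sketch of the simplified logic; the helper names are hypothetical and the sample values are made up for illustration.

```python
import sys


def first_or_none(values):
    """Return the first element, or None when the sequence is empty."""
    return values[0] if len(values) > 0 else None


def warn(message):
    """Write a diagnostic to stderr; this is what needs the `import sys` above."""
    sys.stderr.write(message + "\n")


# Illustrative values standing in for the token offsets in squadv2.py.
answer_token_starts = [17, 42]
answer_token_ends = []
record = {
    "answer_start": first_or_none(answer_token_starts),
    "answer_end": first_or_none(answer_token_ends),
}
```

The remaining findings (unicode in tokenization.py and output in modeling.py) need a look at the surrounding code and are not covered by this sketch.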

No OpKernel was registered to support Op 'CudnnRNN'

/home/purabi/anaconda3/envs/smrc/bin/python /home/purabi/SMRCToolkit-master/examples/run_bidaf/main.py
WARNING: Logging before flag parsing goes to stderr.
W0415 17:05:55.514122 139645281789760 init.py:56] Some hub symbols are not available because TensorFlow version is less than 1.14
87599it [30:24, 48.01it/s]
10570it [03:26, 51.23it/s]
100%|██████████| 98169/98169 [01:22<00:00, 1194.49it/s]

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:

2019-04-15 17:41:39.343160: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-04-15 17:41:39.370044: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2712000000 Hz
2019-04-15 17:41:39.370318: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55756a467530 executing computations on platform Host. Devices:
2019-04-15 17:41:39.370343: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
Traceback (most recent call last):
File "/home/purabi/anaconda3/envs/smrc/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/home/purabi/anaconda3/envs/smrc/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1317, in _run_fn
self._extend_graph()
File "/home/purabi/anaconda3/envs/smrc/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1352, in _extend_graph
tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNN' used by {{node cu_dnnlstm/CudnnRNN}}with these attrs: [is_training=true, seed2=0, dropout=0, seed=0, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm"]
Registered devices: [CPU, XLA_CPU]
Registered kernels:

 [[{{node cu_dnnlstm/CudnnRNN}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/purabi/SMRCToolkit-master/examples/run_bidaf/main.py", line 31, in <module>
model.train_and_evaluate(train_batch_generator, eval_batch_generator, evaluator, epochs=15, eposides=2)
File "/home/purabi/SMRCToolkit-master/sogou_mrc/model/base_model.py", line 47, in train_and_evaluate
self.session.run(tf.global_variables_initializer())
File "/home/purabi/anaconda3/envs/smrc/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/purabi/anaconda3/envs/smrc/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/purabi/anaconda3/envs/smrc/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/purabi/anaconda3/envs/smrc/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNN' used by node cu_dnnlstm/CudnnRNN (defined at /home/purabi/SMRCToolkit-master/sogou_mrc/nn/recurrent.py:41) with these attrs: [is_training=true, seed2=0, dropout=0, seed=0, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm"]
Registered devices: [CPU, XLA_CPU]
Registered kernels:

 [[node cu_dnnlstm/CudnnRNN (defined at /home/purabi/SMRCToolkit-master/sogou_mrc/nn/recurrent.py:41) ]]

Caused by op 'cu_dnnlstm/CudnnRNN', defined at:
File "/home/purabi/SMRCToolkit-master/examples/run_bidaf/main.py", line 29, in <module>
model = BiDAF(vocab, pretrained_word_embedding=word_embedding)
File "/home/purabi/SMRCToolkit-master/sogou_mrc/model/bidaf.py", line 34, in __init__
self._build_graph()
File "/home/purabi/SMRCToolkit-master/sogou_mrc/model/bidaf.py", line 93, in _build_graph
context_repr, _ = phrase_lstm(dropout(context_repr, self.training), self.context_len)
File "/home/purabi/SMRCToolkit-master/sogou_mrc/nn/recurrent.py", line 41, in call
fw = self.fw_layer(seq)
File "/home/purabi/anaconda3/envs/smrc/lib/python3.6/site-packages/tensorflow/python/keras/layers/recurrent.py", line 701, in call
return super(RNN, self).call(inputs, **kwargs)
File "/home/purabi/anaconda3/envs/smrc/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in call
outputs = self.call(inputs, *args, **kwargs)
File "/home/purabi/anaconda3/envs/smrc/lib/python3.6/site-packages/tensorflow/python/keras/layers/cudnn_recurrent.py", line 111, in call
output, states = self._process_batch(inputs, initial_state)
File "/home/purabi/anaconda3/envs/smrc/lib/python3.6/site-packages/tensorflow/python/keras/layers/cudnn_recurrent.py", line 501, in _process_batch
is_training=True)
File "/home/purabi/anaconda3/envs/smrc/lib/python3.6/site-packages/tensorflow/python/ops/gen_cudnn_rnn_ops.py", line 142, in cudnn_rnn
seed2=seed2, is_training=is_training, name=name)
File "/home/purabi/anaconda3/envs/smrc/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/home/purabi/anaconda3/envs/smrc/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/purabi/anaconda3/envs/smrc/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
op_def=op_def)
File "/home/purabi/anaconda3/envs/smrc/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in init
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'CudnnRNN' used by node cu_dnnlstm/CudnnRNN (defined at /home/purabi/SMRCToolkit-master/sogou_mrc/nn/recurrent.py:41) with these attrs: [is_training=true, seed2=0, dropout=0, seed=0, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm"]
Registered devices: [CPU, XLA_CPU]
Registered kernels:

 [[node cu_dnnlstm/CudnnRNN (defined at /home/purabi/SMRCToolkit-master/sogou_mrc/nn/recurrent.py:41) ]]

Process finished with exit code 1

run run_bert_coqa.py OOM

I used 3 GPUs to run this program, but it still runs out of memory (OOM).
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[12,12,512,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[node bert/encoder/layer_6/attention/self/Softmax (defined at /data2/wangfuyu/NQ/ycl/SMRCToolkit-master/sogou_mrc/libraries/modeling.py:728) = Softmax[T=DT_FLOAT, _class=["loc:@bert/encoder/layer_6/attention/self/cond/Switch_1"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](bert/encoder/layer_6/attention/self/add)]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[{{node truediv/_771}} = _Recv[client_terminated=false,recv_device="/job:localhost/replica:0/task:0/device:CPU:0",send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1,tensor_name="edge_5803_truediv", tensor_type=DT_FLOAT,_device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Input text and Question and get Answer

I have successfully passed the data through the model, but the inference in trainer.py returns tensors. How should I decode these tensors into a string that contains the answer?
Thanks.
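For span-extraction models, a common way to turn outputs into text is to pick the start and end positions with the highest combined score and join the context tokens in between. The sketch below assumes the model yields one start score and one end score per context token. That assumption, the function name and the max_answer_len parameter are illustrative; the toolkit's actual output format is not described in this report.

```python
def decode_span(tokens, start_scores, end_scores, max_answer_len=30):
    """Return the highest-scoring span (start <= end) as a string."""
    best_score, best_start, best_end = float("-inf"), 0, 0
    for i, start_score in enumerate(start_scores):
        # Only consider ends at or after the start, within the length limit.
        for j in range(i, min(i + max_answer_len, len(end_scores))):
            score = start_score + end_scores[j]
            if score > best_score:
                best_score, best_start, best_end = score, i, j
    return " ".join(tokens[best_start:best_end + 1])


tokens = ["The", "cat", "sat", "on", "the", "mat"]
start_scores = [0.1, 2.0, 0.3, 0.1, 0.2, 0.1]
end_scores = [0.1, 0.2, 1.5, 0.1, 0.1, 0.3]
print(decode_span(tokens, start_scores, end_scores))  # prints "cat sat"
```

Joining with spaces is a simplification. Mapping token positions back to character offsets in the original passage would preserve the exact surface text.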

Question about CoQA data processing

I think the 'skip' answer type is actually answerable (although no span exists).
Why is the 'skip' type skipped at training time? Is there any performance gain?
I'm afraid that skipping turns breaks the context of the conversation history.
By the way, thanks for the good code.

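Problem with preprocessing the CoQA dataset

When preprocessing the CoQA dataset, I saw that history_question_tokens is the concatenation of all previous questions and answers:
question answer.
This concatenation seems correct for the training set, but using the same approach on the eval (validation) set does not seem appropriate.
It introduces data that should be unknown: answers in the validation set should be unknown and should not be concatenated with the questions.
I hope you can clear up my confusion. Thank you.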

CoQA in Google Colab - OOM (BertWrapper?)

Hello, I'm currently trying to get the CoQA example running in Google Colab. Unfortunately, I get an OOM at "train_data = bert_data_helper.convert(train_data,data='coqa')". The Colab machines only have 12.7 GB of RAM. When I run the toolkit on my local machine, I can see that this process takes up to 14 GB of RAM.
My question is: is it possible to reduce the memory usage of the BERT data helper (BERT wrapper)? If so, could you tell me where exactly?

Thank you in advance
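One general way to cap peak memory is to convert the data in chunks and hand each converted chunk off (for example, write it to disk) before starting the next. Whether bert_data_helper.convert can be applied chunk by chunk is an assumption that would need checking against the toolkit. In the sketch below, convert_fn and handle_chunk are placeholders.

```python
def iter_chunks(items, chunk_size):
    """Yield consecutive slices of `items` with at most `chunk_size` elements."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def convert_in_chunks(items, convert_fn, handle_chunk, chunk_size=1000):
    """Convert `items` piece by piece so only one converted chunk is held at a time.

    convert_fn: placeholder for the conversion step (takes and returns a list).
    handle_chunk: placeholder for what happens to each result, e.g. saving it.
    """
    total = 0
    for chunk in iter_chunks(items, chunk_size):
        converted = convert_fn(chunk)
        handle_chunk(converted)
        total += len(converted)
    return total


saved = []
count = convert_in_chunks(list(range(5)), lambda c: [x * 2 for x in c],
                          saved.append, chunk_size=2)
```

In this toy run the results are kept in a list, which defeats the purpose; in practice handle_chunk would persist each chunk and release it.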

cuDNN error

When running bidaf_squadv2.py, the following error is reported:
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1d_1/conv1d/Conv2D (defined at G:\SMRCToolkit-master\sogou_mrc\nn\layers.py:115) = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](gradients/conv1d_1/conv1d/Conv2D_grad/Conv2DBackpropFilter-0-TransposeNHWCToNCHW-LayoutOptimizer, conv1d_1/conv1d/ExpandDims_1)]]
[[{{node add_18/_227}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2587_add_18", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'conv1d_1/conv1d/Conv2D', defined at:
File "F:\Users\ylwang\Anaconda3\envs\SMRCToolkit-master\lib\runpy.py", line 183, in _run_module_as_main
mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
I am using tensorflow-gpu==1.12, CUDA 9.0, and cuDNN 7.0. Suspecting that the cuDNN version was too old, I reinstalled cuDNN 7.5.0, but the problem persists. How can I solve this? Thanks!
