gaoq1 / rasa_nlu_gq

Turn natural language into structured data (supports Chinese; includes a number of custom models for different scenarios and tasks)

License: Apache License 2.0

Shell 0.07% Python 99.93%

natural-language bilstm-idcnn nlp nlu jieba rasa-nlu rasa-nlu-gao bert tensorflow rasa

rasa_nlu_gq's Introduction

Rasa NLU GQ

Rasa NLU (Natural Language Understanding) is a tool for understanding natural language. To borrow an example from the official site, it takes a sentence such as:

"I'm looking for a Mexican restaurant in the center of town"

and returns structured data like:

  intent: search_restaurant
  entities: 
    - cuisine : Mexican
    - location : center

Introduction

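The original project lives on branch 0.2.7, and you can switch to it freely. This version is based on the latest Rasa release: the components from the original rasa_nlu_gao have been adapted, but no new ones were added. The earlier approach was also somewhat cumbersome, since there is no need to modify the Rasa source code. The original components can be loaded directly as add-ons that inherit from the latest Rasa, so they stay up to date with it.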

New features

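The features added so far are listed below (please install the latest rasa-nlu-gao release) (edit at 2019.06.24):

Added two entity recognition models: bilstm+crf and idcnn+crf (a dilated convolution model). The corresponding yml configuration is: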

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "CountVectorsFeaturizer"
    token_pattern: '(?u)\b\w+\b'
  - name: "EmbeddingIntentClassifier"
  - name: "rasa_nlu_gao.extractors.bilstm_crf_entity_extractor.BilstmCRFEntityExtractor"
    lr: 0.001
    char_dim: 100
    lstm_dim: 100
    batches_per_epoch: 10
    seg_dim: 20
    num_segs: 4
    batch_size: 200
    tag_schema: "iobes"
    model_type: "bilstm" # two model types are supported: idcnn (dilated convolution) or bilstm (bidirectional LSTM)
    clip: 5
    optimizer: "adam"
    dropout_keep: 0.5
    steps_check: 100
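
To make the `tag_schema: "iobes"` setting more concrete, here is a minimal sketch of how entity spans can be turned into IOBES tags (Begin, Inside, End, Single, Outside). It assumes one tag per character and uses a made-up sentence and a hypothetical helper name, `spans_to_iobes`; the extractor's own preprocessing may differ.

```python
def spans_to_iobes(text, entities):
    """Return one IOBES tag per character of `text`.

    `entities` is a list of dicts with `start`, `end` (exclusive) and `entity`.
    """
    tags = ["O"] * len(text)
    for ent in entities:
        start, end, label = ent["start"], ent["end"], ent["entity"]
        if end - start == 1:
            # A one-character entity gets the Single tag.
            tags[start] = "S-" + label
            continue
        tags[start] = "B-" + label
        for i in range(start + 1, end - 1):
            tags[i] = "I-" + label
        tags[end - 1] = "E-" + label
    return tags


example = spans_to_iobes(
    "查询系统ISMX", [{"start": 4, "end": 8, "entity": "sysCode"}]
)
print(example)
```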

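Added a jieba part-of-speech tagging module, which makes it easy to recognize person names, place names, organization names, and any other part of speech that jieba supports. The corresponding yml configuration is: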

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "CRFEntityExtractor"
  - name: "rasa_nlu_gao.extractors.jieba_pseg_extractor.JiebaPsegExtractor"
    part_of_speech: ["nr", "ns", "nt"]
  - name: "CountVectorsFeaturizer"
    OOV_token: oov
    token_pattern: '(?u)\b\w+\b'
  - name: "EmbeddingIntentClassifier"
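
The `part_of_speech` list selects which jieba tags become entities (`nr` for person names, `ns` for place names, `nt` for organization names). A minimal sketch of that filtering step follows. The function name `filter_by_pos` and the hard-coded tagger output are assumptions for illustration; in real use the (word, flag) pairs would come from `jieba.posseg`.

```python
def filter_by_pos(tagged_words, part_of_speech):
    """Turn (word, flag) pairs into entity dicts, keeping only wanted POS tags."""
    wanted = set(part_of_speech)
    entities, offset = [], 0
    for word, flag in tagged_words:
        if flag in wanted:
            entities.append({
                "start": offset,
                "end": offset + len(word),
                "value": word,
                "entity": flag,
            })
        # Advance the character offset whether or not the word was kept.
        offset += len(word)
    return entities


# Hypothetical tagger output for "李白去北京".
tagged = [("李白", "nr"), ("去", "v"), ("北京", "ns")]
print(filter_by_pos(tagged, ["nr", "ns", "nt"]))
```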

Added a component that revises the intent based on the extracted entities. The corresponding configuration is:

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "CRFEntityExtractor"
  - name: "JiebaPsegExtractor"
  - name: "CountVectorsFeaturizer"
    OOV_token: oov
    token_pattern: '(?u)\b\w+\b'
  - name: "EmbeddingIntentClassifier"
  - name: "rasa_nlu_gao.classifiers.entity_edit_intent.EntityEditIntent"
    entity: ["nr"]
    intent: ["enter_data"]
    min_confidence: 0

Added a BERT model for extracting word-vector features. The corresponding configuration is:

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "rasa_nlu_gao.featurizers.bert_vectors_featurizer.BertVectorsFeaturizer"
    ip: '127.0.0.1'
    port: 5555
    port_out: 5556
    show_server_config: True
    timeout: 10000
  - name: "EmbeddingIntentClassifier"
  - name: "CRFEntityExtractor"

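Added configuration for CPU and GPU utilization. It applies mainly to EmbeddingIntentClassifier and ner_bilstm_crf, the two components that use TensorFlow. The configuration is shown below (config_proto is optional; by default all available resources are used):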

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "CountVectorsFeaturizer"
    token_pattern: '(?u)\b\w+\b'
  - name: "EmbeddingIntentClassifier"
    config_proto: {
      "device_count": 4,
      "inter_op_parallelism_threads": 0,
      "intra_op_parallelism_threads": 0,
      "allow_growth": True
    }
  - name: "rasa_nlu_gao.extractors.bilstm_crf_entity_extractor.BilstmCRFEntityExtractor"
    config_proto: {
      "device_count": 4,
      "inter_op_parallelism_threads": 0,
      "intra_op_parallelism_threads": 0,
      "allow_growth": True
    }

Added the embedding_bert_intent_classifier classifier. The corresponding configuration is:

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "rasa_nlu_gao.featurizers.bert_vectors_featurizer.BertVectorsFeaturizer"
    ip: '127.0.0.1'
    port: 5555
    port_out: 5556
    show_server_config: True
    timeout: 10000
  - name: "rasa_nlu_gao.classifiers.embedding_bert_intent_classifier.EmbeddingBertIntentClassifier"
  - name: "CRFEntityExtractor"

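Added the intent_estimator_classifier_tensorflow_embedding_bert classifier: with BERT providing the base word vectors, the downstream classifier is built with TensorFlow's high-level APIs (tf.estimator, tf.data, tf.example, tf.saved_model). The corresponding configuration is: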

language: "zh"

pipeline:
- name: "JiebaTokenizer"
- name: "rasa_nlu_gao.featurizers.bert_vectors_featurizer.BertVectorsFeaturizer"
  ip: '127.0.0.1'
  port: 5555
  port_out: 5556
  show_server_config: True
  timeout: 10000
- name: "rasa_nlu_gao.classifiers.embedding_bert_intent_estimator_classifier.EmbeddingBertIntentEstimatorClassifier"
- name: "SpacyNLP"
- name: "CRFEntityExtractor"

The ultimate form of rasa-nlu (edit at 2019.10.01); for the corresponding configuration, see the article above.

Quick Install

pip install rasa-nlu-gao

Some Examples

For concrete examples, see rasa_chatbot_cn.

rasa_nlu_gq's Issues

Package installation problem

Hello, installing from source fails with FileNotFoundError: [Errno 2] No such file or directory: 'version.py'. How can this be resolved?

discuss: in practice, rule-based dialog policy/management vs. e2e (NN)-based?

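Duplicate entities extracted by JiebaPsegExtractor and CRFEntityExtractor

I ran a test. Suppose the nlu file contains an example text that includes name="李白". When that text is given as input, both JiebaPsegExtractor and CRFEntityExtractor extract the entity name="李白". If the slot type of name is list, the slot ends up as name=["李白", "李白"]. In other words, the entities extracted by JiebaPsegExtractor and CRFEntityExtractor are duplicated and never deduplicated; deduplication should be done in JiebaPsegExtractor.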
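
A minimal sketch of the deduplication the report asks for, assuming entities are dicts carrying `start`, `end`, `entity` and `value` keys. The function name `dedupe_entities` and the sample data are hypothetical, and this is not code from the project.

```python
def dedupe_entities(entities):
    """Drop entities that repeat an earlier (start, end, entity, value) key."""
    seen, unique = set(), []
    for ent in entities:
        key = (ent.get("start"), ent.get("end"), ent.get("entity"), ent.get("value"))
        if key not in seen:
            seen.add(key)
            unique.append(ent)
    return unique


# Two extractors reporting the same span; only the first one is kept.
extracted = [
    {"start": 0, "end": 2, "entity": "name", "value": "李白",
     "extractor": "CRFEntityExtractor"},
    {"start": 0, "end": 2, "entity": "name", "value": "李白",
     "extractor": "JiebaPsegExtractor"},
]
print(dedupe_entities(extracted))
```

Keying on the span as well as the value keeps two genuine mentions of the same name at different positions.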

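Where is the _combine_with_existing_text_features function used in bert_vectors_featurizer.py defined?

A question for the maintainer: intent recognition works, but entity recognition does not. What should I adjust?

For example, for "查询系统医药系统" the intent is "query system" (查询系统) and the entity is 医药系统. The intent is detected, but the returned entities are empty. The system names have already been added to jieba's custom dictionary, so I am not sure what else to tune.

Do you have any suggestions? I want to build an assistant for looking up systems.

Configuration: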
language: "zh"

pipeline:
- name: "tokenizer_jieba"
- name: "bert_vectors_featurizer"
  ip: '127.0.0.1'
  port: 5555
  port_out: 5556
  show_server_config: True
  timeout: 10000
- name: "intent_classifier_tensorflow_embedding_bert"
- name: "ner_crf"
- name: "jieba_pseg_extractor"

==================================================

Training data:

{
"text": "查询系统KPTV-ISMX",
"intent": "search_system",
"entities": [
{
"start": 4,
"end": 13,
"value": "KPTV-ISMX",
"entity": "sysCode"
}
]
},
{
"text": "查询系统ISMX",
"intent": "search_system",
"entities": [
{
"start": 4,
"end": 8,
"value": "ISMX",
"entity": "sysCode"
}
]
},
{
"text": "查询系统医药系统",
"intent": "search_system",
"entities": [
{
"start": 4,
"end": 8,
"value": "医药系统",
"entity": "sysName"
}
]
},
{
"text": "查询系统即时通讯系统",
"intent": "search_system",
"entities": [
{
"start": 4,
"end": 10,
"value": "即时通讯系统",
"entity": "sysName"
}
]
}
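
One check that is cheap to run on training data like this is whether each entity's `start`/`end` offsets really slice out its `value`. The sketch below does only that; it is an illustrative helper (`misaligned_entities` is a made-up name) and does not by itself explain why the entities come back empty.

```python
def misaligned_entities(examples):
    """Return (text, entity) pairs where text[start:end] differs from value."""
    problems = []
    for ex in examples:
        for ent in ex.get("entities", []):
            if ex["text"][ent["start"]:ent["end"]] != ent["value"]:
                problems.append((ex["text"], ent))
    return problems


examples = [
    {"text": "查询系统KPTV-ISMX", "intent": "search_system",
     "entities": [{"start": 4, "end": 13, "value": "KPTV-ISMX", "entity": "sysCode"}]},
    {"text": "查询系统医药系统", "intent": "search_system",
     "entities": [{"start": 4, "end": 8, "value": "医药系统", "entity": "sysName"}]},
]
print(misaligned_entities(examples))
```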

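PyLTPEntityExtractor component cannot be imported

I recently tried entity extraction with pyltp. Referencing the component rasa_nlu_gao.extractors.pyltp_extractor.PyLTPEntityExtractor fails with an error saying the component could not be imported, yet importing rasa_nlu_gao.extractors.bilstm_crf_entity_extractor.BilstmCRFEntityExtractor the same way works. Which step am I missing?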

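How do I use my own pre-trained word vectors?

Hi Gao, to use my own pre-trained word vectors, do I need to train them first, convert them into an xxx.dat word-vector file with the mitie tool, and then configure that in the pipeline file? The official site says mitie will be deprecated soon, so how should self-trained word vectors be used after that?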

ModuleNotFoundError: No module named 'pyltp'

pip install pyltp fails with an error. Please help!

Is rasa 1.3.9 or later supported?

As the title says: is rasa 1.3.9 or any later version supported?

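jieba_pseg_extractor pipeline: problem using a custom dictionary

Hello, I am using the jieba part-of-speech tagging component and want to use my own custom dictionary, but I do not know how to add it. Calling jieba.load_userdict() inside the posseg method does not work. Any hint would be appreciated.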

Is there a version of the code upgraded to tensorflow 2.x?

Number of iterations for BilstmCRFEntityExtractor

While training the BilstmCRFEntityExtractor model, why can I only see the iteration count, with no way to set the number of epochs myself?

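Training SeqGAN on the Xiaohuangji (小黄鸡) corpus

I read your article on generating the Xiaohuangji corpus with SeqGAN, https://www.ctolib.com/GaoQ1-seqgan.html, and found it very interesting. Could you share the code? It will not be used commercially, only for learning and discussion. Thanks! O(∩_∩)O My email is [email protected]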

Failed to find component class for 'rasa_nlu_gao.extractors.entity_synonyms.EntitySynonymMapper'

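I found that the official EntitySynonymMapper does not seem to work for Chinese, so I wanted to try rasa_nlu_gao's EntitySynonymMapper, but importing it fails. I would also like to know whether my nlu file is written incorrectly or whether the official synonym feature really does not work for Chinese. The regex feature does not seem to work either.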

config.yml
pipeline:
- name: "JiebaTokenizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "rasa_nlu_gao.extractors.jieba_pseg_extractor.JiebaPsegExtractor"
  part_of_speech: ["nr","ns","nt"]
- name: "rasa_nlu_gao.featurizers.bert_vectors_featurizer.BertVectorsFeaturizer"
  ip: '127.0.0.1'
  port: 5555
  port_out: 5556
  show_server_config: False
  timeout: 10000
- name: "rasa_nlu_gao.classifiers.embedding_bert_intent_classifier.EmbeddingBertIntentClassifier"

nlu.md excerpt

## intent:check_report

## synonym:今天
- 昨天
- 前天

## synonym:本周
- 这周
- 上周

## regex:month
- [上这本]{1,2}月
- [1十]?[一二三四五六七八九十0-9]月
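
Before suspecting the featurizer, the two month patterns can be tried directly with Python's `re` module. This sketch only checks that the expressions themselves match Chinese text; it says nothing about whether the pipeline picks them up. The test sentence and the helper name `find_months` are made up for illustration.

```python
import re

# The two month patterns from the nlu.md excerpt.
MONTH_PATTERNS = [
    r"[上这本]{1,2}月",
    r"[1十]?[一二三四五六七八九十0-9]月",
]


def find_months(text):
    """Return every substring matched by any of the month patterns."""
    found = []
    for pattern in MONTH_PATTERNS:
        found.extend(m.group(0) for m in re.finditer(pattern, text))
    return found


print(find_months("查看上月和12月的报表"))
```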

KeyError: 'ner_bilstm'

Hello, after installing your repo I ran training and got a KeyError.
The rasa_nlu version I am using is v0.13.8.

config.yml:



language: "zh"

pipeline:
- name: "tokenizer_jieba"
- name: "intent_featurizer_count_vectors"
  OOV_token: oov
  token_pattern: '(?u)\b\w+\b'
- name: "ner_bilstm_crf"
  lr: 0.001
  char_dim: 100
  lstm_dim: 100
  batches_per_epoch: 10
  seg_dim: 20
  num_segs: 4
  batch_size: 200
  tag_schema: "iobes"
  model_type: "bilstm" # two model types are supported: idcnn (dilated convolution) or bilstm (bidirectional LSTM)
  clip: 5
  optimizer: "adam"
  dropout_keep: 0.5
  steps_check: 100
- name: "ner_synonyms"
- name: "jieba_pseg_extractor"
  part_of_speech: ["nr", "ns", "nt"]

- name: "intent_classifier_tensorflow_embedding"
path: "./models/inquiry_nlu"
data: "./data/data.json"

Is there any other configuration I need to do?

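BERT classification accuracy

embedding_bert_intent_classifier
During training the accuracy reached 98%. When testing, I tried a few samples, some of which appear in the training set and some of which do not, but the accuracy was low. Is this a problem with the corpus? If so, what is the likely cause?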

Configuration for recognizing OOV words

During training, how can out-of-vocabulary words be recognized? My configuration is as follows:

pipeline:
  - name: "MitieNLP"
    model: "total_word_feature_extractor_zh.dat"
  - name: "JiebaTokenizer"
  - name: "rasa_nlu_gao.extractors.bilstm_crf_entity_extractor.BilstmCRFEntityExtractor"
    lr: 0.001
    char_dim: 100
    lstm_dim: 100
    batches_per_epoch: 10
    seg_dim: 20
    num_segs: 4
    batch_size: 200
    tag_schema: "iobes"
    model_type: "idcnn"
    clip: 5
    optimizer: "adam"
    dropout_keep: 0.5
    steps_check: 300
  - name: "EntitySynonymMapper"
  - name: "RegexFeaturizer"
  - name: "CountVectorsFeaturizer"
  - name: "EmbeddingIntentClassifier"

I checked RegexFeaturizer in the official documentation and found no adjustable parameters. When I set token_pattern on CountVectorsFeaturizer, I got the following error:
Cannot feed value of shape (64,) for Tensor 'a:0', which has shape '(?, 4232)'
Could you help with this?
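
For context on what `token_pattern` changes, the sketch below compares scikit-learn's default pattern with the one used in the README configs, using plain `re.findall`. It assumes the featurizer hands whitespace-joined tokens to a scikit-learn `CountVectorizer`; the sample string is hypothetical, and the sketch does not address the shape error itself.

```python
import re

DEFAULT_PATTERN = r"(?u)\b\w\w+\b"  # scikit-learn's default token_pattern
README_PATTERN = r"(?u)\b\w+\b"     # pattern used in the README configs


def tokens(pattern, text):
    """Return the tokens a given token_pattern would extract from text."""
    return re.findall(pattern, text)


# Hypothetical whitespace-joined tokenizer output.
segmented = "我 想 查询 医药 系统"
print(tokens(DEFAULT_PATTERN, segmented))  # single-character words are dropped
print(tokens(README_PATTERN, segmented))   # single-character words are kept
```

Since many Chinese words are a single character, the default pattern would discard them, which is presumably why the README overrides it.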

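What is the difference between the bert_as_service component and BERT offline?

What is the difference between the bert_as_service component integrated into rasa and BERT offline? Don't the two achieve the same result? Or did you reach 194 concurrent requests on CPU only, while using BERT, because you used BERT offline? With bert_as_service a GPU is needed, and CPU alone cannot reach 194 concurrent requests.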

The number of samples is too large to be trained

Rasa NLU version:
rasa-core 0.12.4
rasa-core-sdk 0.12.2
rasa-nlu 0.13.8
rasa-nlu-gao 0.2.7
sklearn-crfsuite 0.3.6
sklearn-pandas 1.8.0
tensorflow 1.10.0

Operating system (windows, osx, ...):
ubuntu 18.04.1 LTS

Content of model configuration file:

language: "zh"
  
pipeline:
- name: "tokenizer_jieba"
- name: "bert_vectors_featurizer"
  #ip: '172.16.10.46'
  port: 5555
  port_out: 5556
  show_server_config: True
  timeout: 10000

- name: "ner_crf"
- name: "jieba_pseg_extractor"
  part_of_speech: ["nr", "ns", "nt"]
- name: "intent_featurizer_count_vectors"
  OOV_token: oov
  token_pattern: '(?u)\b\w+\b'
- name: "intent_classifier_tensorflow_embedding_bert"
  intent_tokenization_flag: True
  intent_split_symbol: "+"

Issue:
With 1860 samples, training succeeded. With 3065 samples, however, it failed. The output was as follows:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/site-packages/rasa_nlu_gao/train.py", line 175, in <module>
    num_threads=cmdline_args.num_threads)
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/site-packages/rasa_nlu_gao/train.py", line 150, in do_train
    interpreter = trainer.train(training_data, **kwargs)
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/site-packages/rasa_nlu_gao/model.py", line 196, in train
    **context)
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/site-packages/rasa_nlu_gao/classifiers/embedding_bert_intent_classifier.py", line 339, in train
    self.drop_out:self.droprate}
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 877, in run
    run_metadata_ptr)
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1100, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
    run_metadata)
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must be broadcastable: logits_size=[256,28] labels_size=[256,22]
     [[Node: softmax_cross_entropy_with_logits = SoftmaxCrossEntropyWithLogits[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ConvNet/dense_layer_dense/BiasAdd, softmax_cross_entropy_with_logits/Reshape_1)]]


Caused by op 'softmax_cross_entropy_with_logits', defined at:
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/site-packages/rasa_nlu_gao/train.py", line 175, in <module>
    num_threads=cmdline_args.num_threads)
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/site-packages/rasa_nlu_gao/train.py", line 150, in do_train
    interpreter = trainer.train(training_data, **kwargs)
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/site-packages/rasa_nlu_gao/model.py", line 196, in train
    **context)
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/site-packages/rasa_nlu_gao/classifiers/embedding_bert_intent_classifier.py", line 307, in train
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits_train, labels=self.b_in)) + tf.losses.get_regularization_loss()
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1879, in softmax_cross_entropy_with_logits_v2
    precise_logits, labels, name=name)
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 7209, in softmax_cross_entropy_with_logits
    name=name)
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
    op_def=op_def)
  File "/home/miniconda3/envs/rasa_nlu_gao/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
    self._traceback = tf_stack.extract_stack()


InvalidArgumentError (see above for traceback): logits and labels must be broadcastable: logits_size=[256,28] labels_size=[256,22]
     [[Node: softmax_cross_entropy_with_logits = SoftmaxCrossEntropyWithLogits[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ConvNet/dense_layer_dense/BiasAdd, softmax_cross_entropy_with_logits/Reshape_1)]]

Training on the additional samples alone also succeeds.
Looking forward to your response. Thank you.
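
The two shapes in the error differ only in their second dimension (28 vs. 22), so the network output and the encoded labels disagree on the number of classes. As a diagnostic sketch only, not a confirmed cause, one could count the distinct intent labels in the training data both as whole strings and after splitting on the configured `intent_split_symbol` ("+"), and compare the counts with those two numbers. The helper name `intent_label_counts` and the sample labels are made up.

```python
def intent_label_counts(intents, split_symbol="+"):
    """Count distinct full intent labels and distinct sub-intent tokens."""
    full = set(intents)
    parts = {tok for label in intents for tok in label.split(split_symbol)}
    return len(full), len(parts)


# Hypothetical labels: three distinct intents built from two sub-intents.
labels = ["greet", "greet+ask", "ask", "greet"]
print(intent_label_counts(labels))
```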

Which file should be used for data/vectors.txt?

Why do I still get a timeout even though the server is running locally?

TimeoutError: no response from the server (with "timeout"=10000 ms), please check the following: is the server still online? is the network broken? are "port" and "port_out" correct? are you encoding a huge amount of data whereas the timeout is too small for that?
File "src/gevent/__greenlet_primitives.pxd", line 35, in gevent.__greenlet_primitives._greenlet_switch

How should embedding_intent_classifier be configured so that classification is trained on GPU only?

Rasa NLU version:

Operating system (windows, osx, ...):

Content of model configuration file:

Issue:

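BertVectorsFeaturizer seems to have a bug

BertVectorsFeaturizer seems to have a bug: when the number of intent training examples is exactly batch_size*n+1 (n>=1), for example 129 or 257 examples with a batch_size of 128, the last example is not converted into a BERT vector correctly. In that case EmbeddingBertIntentClassifier raises: all input arrays must have the same shape.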
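
What the failing sample counts have in common is that slicing them by batch_size leaves a final batch holding a single sentence. The sketch below shows only that arithmetic; why a one-sentence batch would break the conversion is not stated in the report, and `batch_sizes` is a hypothetical helper rather than the featurizer's code.

```python
def batch_sizes(n_samples, batch_size):
    """Sizes of consecutive batches when slicing n_samples by batch_size."""
    return [
        min(batch_size, n_samples - start)
        for start in range(0, n_samples, batch_size)
    ]


for n in (128, 129, 257):
    print(n, batch_sizes(n, 128))
```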

bilstm+crf pipeline: overlapping entities during evaluation

Rasa NLU version:

Operating system (windows, osx, ...):

Content of model configuration file:

Issue:

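The problem above occurred when evaluating with the test data.

However, after printing the test data, the annotations turned out to be fine. While debugging, I found that the error is raised in the source code by the second block underlined in red below, which uses the predicted entities. Could this be a bug in entity extraction with ner_bilstm_crf at prediction time?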

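Cannot install

When installing with pip3, some dependencies have to be resolved to a compatible version among many candidates; after seven or eight hours the installation still had not finished.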

After training a model with a custom component, the following error occurs at runtime

Exception in thread Thread-11:
Traceback (most recent call last):
File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib64/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.6/site-packages/rasa_core/channels/channel.py", line 324, in on_message_wrapper
on_new_message(message)
File "/usr/local/lib/python3.6/site-packages/rasa_core/agent.py", line 322, in handle_message
return processor.handle_message(message)
File "/usr/local/lib/python3.6/site-packages/rasa_core/processor.py", line 73, in handle_message
tracker = self.log_message(message)
File "/usr/local/lib/python3.6/site-packages/rasa_core/processor.py", line 119, in log_message
self._handle_message_with_tracker(message, tracker)
File "/usr/local/lib/python3.6/site-packages/rasa_core/processor.py", line 260, in _handle_message_with_tracker
parse_data = self._parse_message(message)
File "/usr/local/lib/python3.6/site-packages/rasa_core/processor.py", line 245, in _parse_message
parse_data = self.interpreter.parse(message.text)
File "/usr/local/lib/python3.6/site-packages/rasa_core/interpreter.py", line 245, in parse
result = self.interpreter.parse(text)
File "/usr/local/lib/python3.6/site-packages/rasa_nlu/model.py", line 370, in parse
component.process(message, **self.context)
File "/home/rasa_nlu_core_use/custom_components/extractors/entity_extractor.py", line 138, in process
extracted = self.add_extractor_name(self.extract_entities(message))
File "/home/rasa_nlu_core_use/custom_components/extractors/entity_extractor.py", line 146, in extract_entities
entities, result = self.model.predict_entities([list(message.text)], join_chunk=''), []
File "/home/rasa_nlu_core_use/core/tasks/labeling/base_model.py", line 55, in predict_entities
res = self.predict(x_data, batch_size, debug_info, predict_kwargs)
File "/home/rasa_nlu_core_use/core/tasks/base_model.py", line 419, in predict
pred = self.tf_model.predict(tensor, batch_size=batch_size, **predict_kwargs)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1078, in predict
callbacks=callbacks)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 157, in model_iteration
f = _make_execution_function(model, mode)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 532, in _make_execution_function
return model._make_execution_function(mode)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 2282, in _make_execution_function
self._make_predict_function()
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 2272, in _make_predict_function
**kwargs)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3479, in function
return GraphExecutionFunction(inputs, outputs, updates=updates, **kwargs)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3142, in __init__
with ops.control_dependencies([self.outputs[0]]):
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 5426, in control_dependencies
return get_default_graph().control_dependencies(control_inputs)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 4867, in control_dependencies
c = self.as_graph_element(c)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3796, in as_graph_element
return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3875, in _as_graph_element_locked
raise ValueError("Tensor %s is not an element of this graph." % obj)
ValueError: Tensor Tensor("layer_crf/cond/Merge:0", shape=(?, 128, 6), dtype=float32) is not an element of this graph.

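Which Rasa version does this project use?

First of all, thank you for your work on Rasa; it has solved quite a few problems. Much appreciated!

While reading "rasa对话系统踩坑记（二）" (Pitfalls of the Rasa dialogue system, part 2), I installed this library:

pip install rasa-nlu-gao

and trained with the following config.yml: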

language: "zh"

 pipeline:
   - name: "tokenizer_jieba"

   - name: "intent_featurizer_count_vectors"
     token_pattern: '(?u)\b\w+\b'
   - name: "intent_classifier_tensorflow_embedding"

   - name: "ner_bilstm_crf"
     lr: 0.001
     char_dim: 100
     lstm_dim: 100
     batches_per_epoch: 10
     seg_dim: 20
     num_segs: 4
     batch_size: 200
     tag_schema: "iobes"
     model_type: "bilstm" # two model types are supported: idcnn (dilated convolution) or bilstm (bidirectional LSTM)
     clip: 5
     optimizer: "adam"
     dropout_keep: 0.5
     steps_check: 100

Error:

  File "_ruamel_yaml.pyx", line 706, in _ruamel_yaml.CParser.get_single_node
  File "_ruamel_yaml.pyx", line 724, in _ruamel_yaml.CParser._compose_document
  File "_ruamel_yaml.pyx", line 775, in _ruamel_yaml.CParser._compose_node
  File "_ruamel_yaml.pyx", line 891, in _ruamel_yaml.CParser._compose_mapping_node
  File "_ruamel_yaml.pyx", line 904, in _ruamel_yaml.CParser._parse_next_event
ruamel.yaml.parser.ParserError: while parsing a block mapping
  in "<unicode string>", line 1, column 1
did not find expected key
  in "<unicode string>", line 3, column 2

I am not experienced enough to know where to start. Which Rasa version does this project use? I look forward to your reply!
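
The parser points at line 3, column 2, which is where `pipeline:` sits in the config as posted: it is indented by one space while `language:` starts at column 1. That indentation appears to be what the parser rejects, rather than a Rasa version mismatch. The sketch below is a rough heuristic (a hypothetical helper, not a YAML parser) that flags a line nested under a preceding `key: value` line.

```python
def find_orphan_indents(yaml_text):
    """Flag lines indented deeper than a preceding complete `key: value` line.

    Heuristic only: a line may nest under the previous one only if that line
    opened a block (ended with ':') or was a list item.
    """
    problems = []
    prev_indent, prev_opens_block = 0, True
    for number, raw in enumerate(yaml_text.splitlines(), start=1):
        line = raw.split(" #")[0].rstrip()  # naive comment stripping
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        if indent > prev_indent and not prev_opens_block:
            problems.append((number, raw))
        stripped = line.strip()
        prev_opens_block = stripped.endswith(":") or stripped.startswith("- ")
        prev_indent = indent
    return problems


config = "\n".join([
    'language: "zh"',
    "",
    " pipeline:",
    '   - name: "tokenizer_jieba"',
    '   - name: "intent_featurizer_count_vectors"',
    "     token_pattern: '(?u)\\b\\w+\\b'",
])
print(find_orphan_indents(config))
```

Moving `pipeline:` back to the first column, in line with `language:`, makes the heuristic report nothing.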

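"list assignment index out of range" error

The following error occurred while training the nlu model:


I also checked the JSON format and encoding of the training file, but the cause of the problem shown in the screenshot is still unclear.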

Is GPU support required?

Collecting tensorflow-gpu==1.14.0 (from rasa-nlu-gao)
ERROR: Could not find a version that satisfies the requirement tensorflow-gpu==1.14.0 (from rasa-nlu-gao) (from versions: 0.12.1, 1.0.0, 1.0.1, 1.1.0rc0, 1.1.0rc1, 1.1.0rc2, 1.1.0)
ERROR: No matching distribution found for tensorflow-gpu==1.14.0 (from rasa-nlu-gao)

I ran into the problem above after running pip3 install rasa-nlu-gao.
Machine: MBP13, OS version: 10.15

How is KashgariEntityExtractor used in a pipeline?

Could you please provide a demo pipeline configuration?

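Configuration for out-of-vocabulary (OOV) words

While using rasa, I want to recognize out-of-vocabulary words. My configuration is as follows: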

pipeline:
  - name: "MitieNLP"
    model: "total_word_feature_extractor_zh.dat"
  - name: "JiebaTokenizer"
  - name: "rasa_nlu_gao.extractors.bilstm_crf_entity_extractor.BilstmCRFEntityExtractor"
    lr: 0.001
    char_dim: 100
    lstm_dim: 100
    batches_per_epoch: 10
    seg_dim: 20
    num_segs: 4
    batch_size: 200
    tag_schema: "iobes"
    model_type: "idcnn"
    clip: 5
    optimizer: "adam"
    dropout_keep: 0.5
    steps_check: 300
  - name: "EntitySynonymMapper"
  - name: "RegexFeaturizer"
  - name: "CountVectorsFeaturizer"
  - name: "EmbeddingIntentClassifier"

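I could not find any related parameters in the RegexFeaturizer source. In your code, CountVectorsFeaturizer uses the token_pattern parameter. Without this parameter training runs normally, but after adding a custom token_pattern it fails with: Cannot feed value of shape (64,) for Tensor 'a:0', which has shape '(?, 52000)'. Do you have a solution, or could you suggest an approach for handling out-of-vocabulary (OOV) words?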

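Has the project stopped being updated, or what is going on?

Support for multiple models

rasa's built-in project parameter can be used to switch models: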
curl 'ip:port/parse?q=123&project=project1'
curl 'ip:port/parse?q=123&project=project2'
However, this project's bilstm entity extractor does not support switching between multiple models.

AttributeError: 'BertVectorsFeaturizer' object has no attribute '_combine_with_existing_text_features'

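BertVectorsFeaturizer calls the _combine_with_existing_text_features method. Did the author mistype the method name? Should it be the _combine_with_existing_features method inherited from Featurizer?

Can it be installed on Windows? There seem to be many problems, and it fails even inside a virtualenv.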

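Jieba tokenizer reloads the dictionary repeatedly, degrading performance

https://github.com/GaoQ1/rasa_nlu_gq/blob/master/rasa_nlu_gao/tokenizers/jieba_tokenizer.py#L78

The tokenize method of the jieba tokenizer here reloads the custom dictionary on every call. As a result, once a custom dictionary is configured, nlu processing becomes much slower.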
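
One common way to avoid this kind of repeated loading is to remember which dictionary paths have already been loaded. The sketch below illustrates the idea with a hypothetical `DictionaryLoader` class and a stand-in loader function; in real use the loader would be `jieba.load_userdict`, and this is not the project's actual fix.

```python
class DictionaryLoader:
    """Load each user-dictionary path at most once."""

    def __init__(self, load_fn):
        self._load_fn = load_fn  # e.g. jieba.load_userdict in real use
        self._loaded = set()

    def ensure_loaded(self, path):
        """Call the loader for `path` only the first time it is seen."""
        if path not in self._loaded:
            self._load_fn(path)
            self._loaded.add(path)


calls = []
loader = DictionaryLoader(calls.append)  # stand-in loader that records the path
for _ in range(3):                       # simulate three tokenize calls
    loader.ensure_loaded("user_dict.txt")
print(calls)
```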

module 'tensorflow.contrib.estimator' has no attribute 'LinearEstimator'

Using the rasa_nlu_gao\classifiers\embedding_bert_intent_estimator_classifier.py component with the unmodified source of rasa_nlu_gao==0.13.4 produces this error:

Traceback (most recent call last):
File "F:\Program Files\Python36\Lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "F:\Program Files\Python36\Lib\runpy.py", line 85, in _run_code
File "F:\rasa_nlu_gq\rasa_nlu_gao\train.py", line 175, in <module>
num_threads=cmdline_args.num_threads)
File "F:\rasa_nlu_gq\rasa_nlu_gao\train.py", line 150, in do_train
interpreter = trainer.train(training_data, **kwargs)
File "F:\rasa_nlu_gq\rasa_nlu_gao\model.py", line 196, in train
**context)
File "F:\rasa_nlu_gq\rasa_nlu_gao\classifiers\embedding_bert_intent_estimator_classifier.py", line 238, in train
self.estimator = tf.contrib.estimator.LinearEstimator(
AttributeError: module 'tensorflow.contrib.estimator' has no attribute 'LinearEstimator'

Operating system: Windows 10 64-bit Home (Chinese edition) 1809
TensorFlow version: 1.13.1

Name: tensorflow
Version: 1.13.1
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: [email protected]
License: Apache 2.0
Location: d:\python\3.6\chatbot\lib\site-packages
Requires: keras-applications, tensorflow-estimator, tensorboard, keras-preprocessing, six, absl-py, grpcio, wheel, numpy, gast, astor, termcolor, protobuf
Required-by: rasa-core
