NER training error (deepnlp issue, OPEN)

rockingdingo commented on August 24, 2024

NER training error

from deepnlp.

Comments (13)

rockingdingo commented on August 24, 2024

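Hello, I looked at your log. A new model needs an updated ModelLargeConfig class: the target output size has to be changed to the number of your labels. You can change it in the get_config() function in ner_model.py.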
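
As a minimal sketch of that advice, the snippet below reads a tag-to-id mapping and copies the required output size into a config object. The class `ExampleNerConfig`, its `target_num` attribute, the `load_tag_to_id` helper, and the sample tags are hypothetical stand-ins for illustration; check the actual attribute name used by get_config() in ner_model.py before changing anything.

```python
# Hypothetical stand-in for the config class returned by get_config();
# the attribute names here are assumptions, not deepnlp's actual API.
class ExampleNerConfig(object):
    vocab_size = 60000   # size of the word vocabulary
    target_num = 8       # number of output labels (assumed name)


def load_tag_to_id(lines):
    """Parse 'tag id' pairs, one per line, into a dict."""
    tag_to_id = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        tag, idx = parts
        tag_to_id[tag] = int(idx)
    return tag_to_id


# Made-up label file contents for illustration.
sample_lines = ["nt 0", "n 1", "p 2", "company 3"]
tag_to_id = load_tag_to_id(sample_lines)

config = ExampleNerConfig()
# The output layer needs one unit per label id, so size it by the largest id.
config.target_num = max(tag_to_id.values()) + 1
print(config.target_num)
```

If the label file grows (for example, when new entity types are added), the output size has to be updated again to match.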


onep2p commented on August 24, 2024

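Yes, I've seen that and have cleaned up the data. There is also a problem with the number of epochs; my test data set is fairly small.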


onep2p commented on August 24, 2024

Actually, it would help if you provided a sample of the training corpus.


onep2p commented on August 24, 2024

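The error still occurs. Does the corpus not support certain tokens, such as "/" or punctuation? Which ones exactly are unsupported? My first run, with 10 news articles as the corpus, worked, but now with 5005 articles it fails.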


onep2p commented on August 24, 2024

Could it be that tag_to_id cannot exceed 76 lines? @rockingdingo


onep2p commented on August 24, 2024

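I've processed the first version of the financial data (so far it only recognizes company entities). However, training only runs to completion when num_steps is set to 5, and I don't know why.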


onep2p commented on August 24, 2024

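POS training often fails. Can POS training data include punctuation, spaces, and so on? If not, please tell me which tokens cannot be included. At the moment I keep getting: indices[x,x] = xxxx not in [0,60000]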


rockingdingo commented on August 24, 2024

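@onep2p Hello, sorry, I only had time to look at this after getting back from the New Year holiday. POS training can include punctuation. Without a complete bug trace it is hard to find the cause; could you post a screenshot? Also, would you be interested in contributing the company entity work? Merge requests are welcome, for word segmentation or anything else.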


onep2p commented on August 24, 2024

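OK, I'll submit a merge request once it's ready. The bug the other day seems to have been a wrong index range: I have more than 100,000 words but was looking them up with a size of 60,000, so some of them do not exist in the tensor.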
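
The range problem described here can be illustrated with a small check: any word id at or above the embedding size produces an out-of-range lookup. The sketch below uses plain Python and made-up data; `UNK_ID`, `find_out_of_range`, `clip_to_vocab`, and the sample ids are assumptions for illustration, not part of deepnlp.

```python
VOCAB_SIZE = 60000  # embedding rows available to the model (from the error message)
UNK_ID = 0          # assumed id reserved for unknown words


def find_out_of_range(word_ids, vocab_size):
    """Return the ids that an embedding table of vocab_size rows cannot hold."""
    return [i for i in word_ids if i < 0 or i >= vocab_size]


def clip_to_vocab(word_ids, vocab_size, unk_id=UNK_ID):
    """Map every out-of-range id to the unknown-word id."""
    return [i if 0 <= i < vocab_size else unk_id for i in word_ids]


# Made-up ids from a corpus with more than 100,000 distinct words.
sentence_ids = [12, 59999, 60000, 104211]

bad = find_out_of_range(sentence_ids, VOCAB_SIZE)
print(bad)    # ids that would fail the lookup
safe = clip_to_vocab(sentence_ids, VOCAB_SIZE)
print(safe)   # same sentence with rare words mapped to UNK_ID
```

The alternative is to raise the vocabulary size in the config so that it covers every id in the word-to-id file.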


onep2p commented on August 24, 2024

@rockingdingo Why can't I set num_steps to 30 when training NER? Is there too little data?
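
The thread does not answer this question directly. One possible explanation, assuming the data reader follows the batching scheme of TensorFlow's PTB tutorial (an assumption, not confirmed here), is that a small corpus yields zero batches per epoch once num_steps grows. The function `epoch_size` below is a hypothetical re-creation of that arithmetic, with a made-up corpus size.

```python
def epoch_size(num_tokens, batch_size, num_steps):
    """Number of batches per epoch under PTB-style batching (assumed scheme)."""
    batch_len = num_tokens // batch_size   # tokens available to each row of the batch
    return (batch_len - 1) // num_steps    # full windows of num_steps tokens


# Made-up corpus size for illustration.
tokens = 500
for steps in (5, 30):
    print(steps, epoch_size(tokens, batch_size=20, num_steps=steps))
```

Under this assumption, more data, a smaller batch size, or a smaller num_steps would each bring the batch count above zero.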


onep2p commented on August 24, 2024

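@rockingdingo It feels like context is not taken into account at all; entities are all matched through the dictionary. What if the same word has several meanings? For example, KODA can be a company name, a product name, or a person's name.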


rockingdingo commented on August 24, 2024

Hello, the ner_tagger.py module contains several functions. predict() essentially merges the tags from the dictionary with the tags predicted by the NER LSTM model:
model_tagging = self._predict_ner_tags_model(self.session, self.model, words, self.data_path)
dict_tagging = self._predict_ner_tags_dict(words, merge = True, tagset = tagset, udfs = [udf_default])
merge_tagging = self._merge_tagging(model_tagging, dict_tagging)
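
To make the merge step concrete, here is a rough sketch of one way to combine the two tag sequences. It assumes both are lists of (word, tag) pairs over the same words and that a dictionary hit overrides the model's tag; the function `merge_tagging_sketch`, the `NIL` placeholder, and the sample tags are hypothetical, and the real `_merge_tagging()` may behave differently.

```python
NIL = "nil"  # assumed placeholder tag meaning "no entity"


def merge_tagging_sketch(model_tagging, dict_tagging, nil_tag=NIL):
    """Prefer the dictionary tag when it names an entity, else keep the model tag."""
    merged = []
    for (word, model_tag), (_, dict_tag) in zip(model_tagging, dict_tagging):
        tag = dict_tag if dict_tag != nil_tag else model_tag
        merged.append((word, tag))
    return merged


# Made-up tag sequences for illustration.
model_tagging = [("我", "nil"), ("看", "nil"), ("琅琊榜", "nil")]
dict_tagging = [("我", "nil"), ("看", "nil"), ("琅琊榜", "teleplay")]
merged = merge_tagging_sketch(model_tagging, dict_tagging)
print(merged)
```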

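Entity disambiguation:
Pass custom functions through the udfs=[] parameter of tagger._predict_ner_tags_dict().
A co-occurrence frequency function is already implemented. It requires tag_feat_dict to be set in advance: a dict that maps each tag to a list of its feature words.
udfs = [udf_disambiguation_cooccur]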

Then define the feature-word dictionary for the tags, which is domain-specific, and call:
tag_feat_dict={}
tag_feat_dict[XXX]=[A,B,C]
tagger.set_tag_feat_dict(tag_feat_dict)
tagger._predict_ner_tags_dict(words, merge = True, tagset = ['list_name', 'teleplay'], udfs = [udf_disambiguation_cooccur])

This disambiguates between the two common entity types.

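See the example in test/test_ner_dict_udf.py: different words are extracted as strong features for each tag category, and they can be passed in through a custom UDF.
There is currently a UDF based on the common co-occurring words (cooccur_word) of each category tag, for example: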

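To disambiguate between album (list_name) and TV series (teleplay), you need to mine some domain feature words in advance, such as:
list_name common co-occurring words: ['听', '专辑', '音乐'] (listen, album, music)
teleplay common co-occurring words: ['看', '电视', '影视'] (watch, TV, film and television)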

'琅琊榜' has two categories: 'list_name' and 'teleplay'
Disambiguation

#!/usr/bin/python
# -*- coding:utf-8 -*-

from __future__ import unicode_literals # compatible with python3 unicode
from deepnlp.ner_tagger import udf_disambiguation_cooccur
from deepnlp.ner_tagger import udf_default
from deepnlp import ner_tagger
tagger = ner_tagger.load_model(name = 'zh_entertainment')    # Base LSTM Based Model
tagger.load_dict("zh_entertainment")

# input sentence
text = "今天 我 看 了 琅琊榜"
words = text.split(" ")

tags = ['list_name', 'teleplay']

# Set the most common co-occurring words for the two tags; duplicates are allowed
tag_feat_dict = {}
tag_feat_dict['list_name'] = ['听', '专辑', '音乐']
tag_feat_dict['teleplay'] = ['看', '电视', '影视']
tagger.set_tag_feat_dict(tag_feat_dict)

# Combine the results and load the udfs
tagging = tagger._predict_ner_tags_dict(words, merge = True, tagset = ['list_name', 'teleplay'], udfs = [udf_disambiguation_cooccur])
for (w,t) in tagging:
    pair = w + "/" + t
    print (pair)


## This computes the co-occurrence between context words and feature words

from deepnlp.ner_tagger import udf_disambiguation_cooccur

tag_feat_dict = {}
# Most Freq Word Feature of two tags
tag_feat_dict['list_name'] = ['听', '专辑', '音乐']
tag_feat_dict['teleplay'] = ['看', '电视', '影视']

# Disambiguation probability
word="琅琊榜"
tags = ['list_name', 'teleplay']    # candidate tags for the word
context = ["今天", "我", "看", "了","电视", "音乐", "很", "好听"]
tag, prob = udf_disambiguation_cooccur(word, tags, context, tag_feat_dict)
print ("DEBUG: NER tagger zh_entertainment with user defined function for disambiguation")
print ("Word:%s, Tag:%s, Prob:%f" % (word, tag, prob))
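
For intuition about what a co-occurrence UDF might compute, the following is a simplified re-implementation, not the library's code. It scores each candidate tag by how many context words appear in that tag's feature list and normalizes the counts into a probability; the function `cooccur_sketch`, its fallback for empty evidence, and its tie-breaking rule are assumptions.

```python
def cooccur_sketch(word, tags, context, tag_feat_dict):
    """Pick the tag whose feature words overlap most with the context."""
    counts = {}
    for tag in tags:
        feats = set(tag_feat_dict.get(tag, []))
        counts[tag] = sum(1 for w in context if w in feats)
    total = sum(counts.values())
    if total == 0:
        # No evidence either way: fall back to the first tag, uniform probability.
        return tags[0], 1.0 / len(tags)
    best = max(tags, key=lambda t: counts[t])  # the first tag wins ties
    return best, counts[best] / float(total)


tag_feat_dict = {
    'list_name': ['听', '专辑', '音乐'],
    'teleplay': ['看', '电视', '影视'],
}
context = ["今天", "我", "看", "了", "电视", "音乐", "很", "好听"]
tag, prob = cooccur_sketch("琅琊榜", ['list_name', 'teleplay'], context, tag_feat_dict)
print("Tag:%s, Prob:%f" % (tag, prob))
```

Exact token matching is used here, so "好听" does not count as a hit for the feature word '听'; a real implementation may match differently.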

@onep2p If you have any questions, feel free to get in touch by email.


onep2p commented on August 24, 2024

Haha, got it working. Thanks, boss!
