
NER training error (deepnlp, OPEN)

rockingdingo avatar rockingdingo commented on August 24, 2024
NER training error

from deepnlp.

Comments (13)

rockingdingo avatar rockingdingo commented on August 24, 2024

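Hello, I looked at your log. A new model requires updating the ModelLargeConfig class: the target output size must be changed to the number of labels in your data. You can make the change in the get_config() function in ner_model.py.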
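
A rough sketch of that change follows. It assumes the config class exposes the output size as an attribute (called `target_num` here, a hypothetical name; check ner_model.py for the real one) and that the tag map is a text file with one `tag id` pair per line. The class below is a stand-in, not the library's code.

```python
import io


class ModelLargeConfig(object):
    """Stand-in for the config class in ner_model.py (attribute name is an assumption)."""
    target_num = 8  # hypothetical default: number of output labels


def count_tags(lines):
    """Count distinct tags in 'tag id' lines, skipping blank lines."""
    tags = set()
    for line in lines:
        parts = line.split()
        if parts:
            tags.add(parts[0])
    return len(tags)


def get_config(tag_lines):
    """Return a config whose output size matches the label set."""
    config = ModelLargeConfig()
    config.target_num = count_tags(tag_lines)
    return config


# Hypothetical tag map with three labels.
sample = io.StringIO(u"nt 0\ncompany 1\nperson 2\n")
config = get_config(sample)
print(config.target_num)
```

Deriving the size from the tag file rather than hard-coding it keeps the model and the corpus in step when labels are added later.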

onep2p avatar onep2p commented on August 24, 2024

OK, I have seen that and cleaned up the data. There is still the question of the number of epochs, since my test dataset is small.

onep2p avatar onep2p commented on August 24, 2024

It would actually help to provide a sample training corpus.

onep2p avatar onep2p commented on August 24, 2024

image

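This error still occurs. Does the corpus not support certain tokens, such as "/" or punctuation marks? Which ones exactly are unsupported? My first run, with 10 news articles as the corpus, worked, but it now fails with 5005 articles.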

onep2p avatar onep2p commented on August 24, 2024

Could it be that tag_to_id cannot exceed 76 lines? @rockingdingo

onep2p avatar onep2p commented on August 24, 2024

image

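I have processed a first version of the financial data (so far it only recognizes company entities). However, training only runs to completion when num_steps is set to 5, and I do not know why.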

onep2p avatar onep2p commented on August 24, 2024

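POS training frequently fails. Can the POS training data include punctuation, spaces, and the like? If not, please tell me which tokens cannot be included. I keep getting: indices[x,x] = xxxx not in [0,60000]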

rockingdingo avatar rockingdingo commented on August 24, 2024

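@onep2p Hello, sorry, I only had time to look at this after getting back from the New Year holiday. POS training data can include punctuation. Without a complete bug trace it is hard to find the cause; could you post a screenshot? Also, would you be interested in contributing your company-entity work? Merge requests are welcome, whether for word segmentation or anything else.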

onep2p avatar onep2p commented on August 24, 2024

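OK, I will submit a merge request once it is ready. The bug from the other day seems to have been a wrong lookup range: my vocabulary has more than 100,000 words, but I was indexing with a size of 60,000, so some ids did not exist in the tensor.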
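
That diagnosis fits the error text: an embedding lookup fails when a word id is not smaller than the number of rows in the embedding table. The sketch below shows one library-independent way to catch this before training. `find_out_of_range_ids` and `required_vocab_size` are hypothetical helpers, not part of deepnlp, and the tiny vocabulary is made up for illustration.

```python
def find_out_of_range_ids(id_rows, vocab_size):
    """Return (row, col, id) for every id outside [0, vocab_size)."""
    bad = []
    for r, row in enumerate(id_rows):
        for c, idx in enumerate(row):
            if idx < 0 or idx >= vocab_size:
                bad.append((r, c, idx))
    return bad


def required_vocab_size(word_to_id):
    """Smallest vocab_size that covers every id in the mapping."""
    return max(word_to_id.values()) + 1


# Made-up vocabulary and batch of word ids.
word_to_id = {"<unk>": 0, "公司": 1, "股票": 2, "上涨": 3}
batch = [[1, 2, 3], [0, 3, 1]]

bad_ids = find_out_of_range_ids(batch, vocab_size=3)
needed = required_vocab_size(word_to_id)
print(bad_ids)
print(needed)
```

If `bad_ids` is non-empty, either raise the configured vocabulary size to at least `needed` or map rare words to an unknown-word id when building the vocabulary.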

onep2p avatar onep2p commented on August 24, 2024

@rockingdingo Why can't I set num_steps to 30 when training NER? Is there too little data?
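
One possible explanation, not confirmed anywhere in this thread: if the data reader batches tokens the way the TensorFlow PTB tutorial reader does, the number of batches per epoch is `((data_len // batch_size) - 1) // num_steps`, and a small corpus with a large num_steps makes that zero. The sketch below only reproduces that arithmetic; the corpus size and batch size are made-up numbers.

```python
def epoch_size(data_len, batch_size, num_steps):
    """Batches per epoch under PTB-style batching (assumed, not verified for deepnlp)."""
    batch_len = data_len // batch_size
    return (batch_len - 1) // num_steps


def max_num_steps(data_len, batch_size):
    """Largest num_steps that still yields at least one batch per epoch."""
    return max((data_len // batch_size) - 1, 0)


# Made-up example: 500 tokens, batch size 20.
with_30 = epoch_size(500, 20, 30)
with_5 = epoch_size(500, 20, 5)
limit = max_num_steps(500, 20)
print(with_30, with_5, limit)
```

Under that assumption, either more training data or a smaller batch size would be needed before num_steps can be raised to 30.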

onep2p avatar onep2p commented on August 24, 2024

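@rockingdingo It looks like there is no handling of context; entities are all matched through the dictionary. What if the same word has several meanings? For example, KODA can be a company name, a product name, or a person's name.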

rockingdingo avatar rockingdingo commented on August 24, 2024

Hello, the ner_tagger.py module contains several functions. predict() essentially merges the tags from the dictionary with the tags predicted by the NER LSTM model:
model_tagging = self._predict_ner_tags_model(self.session, self.model, words, self.data_path)
dict_tagging = self._predict_ner_tags_dict(words, merge = True, tagset = tagset, udfs = [udf_default])
merge_tagging = self._merge_tagging(model_tagging, dict_tagging)
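
To illustrate what such a merge could look like, here is a minimal sketch. It is not the library's `_merge_tagging`; it assumes both taggings cover the same word sequence, that a dictionary hit should override the model's tag, and that non-entities carry a default tag (`"nt"` here, a guess).

```python
DEFAULT_TAG = "nt"  # assumption: tag used for words that are not entities


def merge_tagging(model_tagging, dict_tagging, default_tag=DEFAULT_TAG):
    """Prefer the dictionary tag when it is an entity tag, else keep the model tag."""
    merged = []
    for (word, model_tag), (_, dict_tag) in zip(model_tagging, dict_tagging):
        tag = dict_tag if dict_tag != default_tag else model_tag
        merged.append((word, tag))
    return merged


# Made-up taggings for the same three words.
model_tagging = [("我", "nt"), ("看", "nt"), ("琅琊榜", "nt")]
dict_tagging = [("我", "nt"), ("看", "nt"), ("琅琊榜", "teleplay")]

merged = merge_tagging(model_tagging, dict_tagging)
print(merged)
```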

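Entity disambiguation:
Pass your custom functions in the udfs=[] argument list of tagger._predict_ner_tags_dict().
A co-occurrence frequency function is currently implemented. It requires tag_feat_dict to be set beforehand: a dict {} that maps each tag to its list of feature words.
udfs = [udf_disambiguation_cooccur]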

Then define the tag feature dictionary, which is domain-specific, and make the call:
tag_feat_dict={}
tag_feat_dict[XXX]=[A,B,C]
tagger.set_tag_feat_dict(tag_feat_dict)
tagger._predict_ner_tags_dict(words, merge = True, tagset = ['list_name', 'teleplay'], udfs = [udf_disambiguation_cooccur])

This disambiguates between the two common entity types.

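See the example in test/test_ner_dict_udf.py: different words are extracted as strong features for each tag category and can be passed in through a custom UDF.
There is currently a UDF based on the common co-occurring words (cooccur_word) of each category tag, for example:

To disambiguate between album (list_name) and TV series (teleplay), some domain feature words have to be mined beforehand, such as:
list_name common co-occurring words: ['听', '专辑', '音乐'] (listen, album, music)
teleplay common co-occurring words: ['看', '电视', '影视'] (watch, TV, film and television)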

'琅琊榜' has two categories: 'list_name' and 'teleplay'
Disambiguation

#!/usr/bin/python
# -*- coding:utf-8 -*-

from __future__ import unicode_literals # compatible with python3 unicode
from deepnlp.ner_tagger import udf_disambiguation_cooccur
from deepnlp.ner_tagger import udf_default
from deepnlp import ner_tagger
tagger = ner_tagger.load_model(name = 'zh_entertainment')    # Base LSTM Based Model
tagger.load_dict("zh_entertainment")

# input sentence
text = "今天 我 看 了 琅琊榜"
words = text.split(" ")

tags = ['list_name', 'teleplay']

# Set the most common co-occurring words for the two tags; duplicates are allowed
tag_feat_dict = {}
tag_feat_dict['list_name'] = ['听', '专辑', '音乐']
tag_feat_dict['teleplay'] = ['看', '电视', '影视']
tagger.set_tag_feat_dict(tag_feat_dict)

# Combine the results and load the udfs
tagging = tagger._predict_ner_tags_dict(words, merge = True, tagset = ['list_name', 'teleplay'], udfs = [udf_disambiguation_cooccur])
for (w,t) in tagging:
    pair = w + "/" + t
    print (pair)


## Computes the co-occurrence between context words and feature words

from deepnlp.ner_tagger import udf_disambiguation_cooccur

tag_feat_dict = {}
# Most Freq Word Feature of two tags
tag_feat_dict['list_name'] = ['听', '专辑', '音乐']
tag_feat_dict['teleplay'] = ['看', '电视', '影视']

# Disambiguation probability
word="琅琊榜"
context = ["今天", "我", "看", "了","电视", "音乐", "很", "好听"]
tag, prob = udf_disambiguation_cooccur(word, tags, context, tag_feat_dict)
print ("DEBUG: NER tagger zh_entertainment with user defined function for disambiguation")
print ("Word:%s, Tag:%s, Prob:%f" % (word, tag, prob))
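
For readers who want to see the idea without the library, here is a minimal standalone sketch of co-occurrence disambiguation. It is not the implementation of udf_disambiguation_cooccur; it simply counts how many context words appear in each tag's feature list and normalizes the counts, which is one plausible reading of the description above. The function name is hypothetical.

```python
def disambiguate_by_cooccurrence(word, tags, context, tag_feat_dict):
    """Pick the tag whose feature words occur most often in the context.

    `word` is unused here; it is kept only to mirror the call shape in the thread.
    Returns (tag, share of feature hits), or (None, 0.0) if nothing matches.
    """
    counts = {}
    for tag in tags:
        features = set(tag_feat_dict.get(tag, []))
        counts[tag] = sum(1 for w in context if w in features)
    total = sum(counts.values())
    if total == 0:
        return None, 0.0
    best = max(tags, key=lambda t: counts[t])
    return best, counts[best] / float(total)


tag_feat_dict = {
    "list_name": ["听", "专辑", "音乐"],
    "teleplay": ["看", "电视", "影视"],
}
context = ["今天", "我", "看", "了", "电视", "音乐", "很", "好听"]

tag, prob = disambiguate_by_cooccurrence(
    "琅琊榜", ["list_name", "teleplay"], context, tag_feat_dict)
print(tag, prob)
```

In this context, '看' and '电视' match teleplay while only '音乐' matches list_name ('好听' is a different token from '听'), so the sketch favors teleplay.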

@onep2p If you have any questions, feel free to reach out by email.

onep2p avatar onep2p commented on August 24, 2024

Haha, got it working. Thanks, boss!
