Could we add new words? (chinesener, OPEN)

hgjt8989 commented on August 20, 2024
Could we add new words?

from chinesener.

Comments (16)

buppt commented on August 20, 2024

> E.g., if a word (北大) is not recognized as an organisation, could we add it so that the model learns it?

Of course. You can add 北/B_org 大/E_org to the training set.
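
The tagging suggested in this reply can be sketched with a small helper that turns a word and an entity type into character-level tags such as 北/B_org 大/E_org. This is only an illustration: the function name tag_entity is hypothetical, the M_ prefix for middle characters is an assumption that this thread does not confirm, and single-character entities are rejected because their convention is not shown here.

```python
def tag_entity(word, label):
    """Return character-level tags for an entity, e.g. 北/B_org 大/E_org."""
    if len(word) < 2:
        # The thread only shows a two-character example, so do not guess.
        raise ValueError("single-character tagging convention is not shown in the thread")
    tagged = []
    for i, ch in enumerate(word):
        if i == 0:
            prefix = "B"   # first character, as in the reply
        elif i == len(word) - 1:
            prefix = "E"   # last character, as in the reply
        else:
            prefix = "M"   # middle character: an assumption
        tagged.append("{}/{}_{}".format(ch, prefix, label))
    return " ".join(tagged)


print(tag_entity("北大", "org"))
```

The resulting string would still have to be placed inside a full training sentence in whatever layout the project's data scripts expect.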

chenbaicheng commented on August 20, 2024

@buppt Thanks. One question: for the TensorFlow version, do I run python train.py first and then python train.py pretrained?

buppt commented on August 20, 2024

> @buppt Thanks. One question: for the TensorFlow version, do I run python train.py first and then python train.py pretrained?

No need. python train.py trains without pretrained word embeddings; python train.py pretrained trains with pretrained word embeddings.

chenbaicheng commented on August 20, 2024

OK, thanks O(∩_∩)O. Another question: how do I train incrementally? Do I have to append each batch of new sentences to the existing training set and then rerun training?

chenbaicheng commented on August 20, 2024

@buppt Does this branch in train.py ever get executed?
elif len(sys.argv)==3:
I looked for a long time and could not find any case that passes three arguments. Is it redundant code? Thanks.

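buppt commented on August 20, 2024

> OK, thanks O(∩_∩)O. Another question: how do I train incrementally? Do I have to append each batch of new sentences to the existing training set and then rerun training?

What do you mean? Do you want to add your own example sentences for some entities? You can either put them in the training set or continue training from the already-trained model.
Isn't the three-argument case the one that takes file names? It is in the README.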
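
The command-line modes discussed so far can be summarised with a small dispatch sketch. It is not the project's actual train.py: the function select_mode and the mode names it returns are hypothetical, and the three-argument branch only reflects the description in this thread that the two extra arguments are file names.

```python
def select_mode(argv):
    """Map a sys.argv-style list to a mode name (all mode names are hypothetical)."""
    if len(argv) == 1:
        # python train.py
        return "train"
    if len(argv) == 2 and argv[1] == "pretrained":
        # python train.py pretrained
        return "train_with_pretrained"
    if len(argv) == 3:
        # Two extra arguments, described in the thread as file names.
        return "process_files"
    return "unknown"


print(select_mode(["train.py"]))
print(select_mode(["train.py", "pretrained"]))
print(select_mode(["train.py", "input.txt", "output.txt"]))
```

The README remains the reference for what each invocation really does.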

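chenbaicheng commented on August 20, 2024

@buppt Right, I overlooked that; there is also a batch file-processing mode. Thanks. Are there more detailed steps anywhere? I have now finished running train.py. If I want to add new corpus data and continue training from the current model, which command should I run? Thanks.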

badbabys commented on August 20, 2024

Could anyone share a model trained with the TensorFlow version?

chenbaicheng commented on August 20, 2024

Can't you train it yourself? It takes about 3 hours on a GPU.

bobkentt commented on August 20, 2024

Here is the problem I ran into. I ran:
cd data/renMinRiBao/
python data_renmin_word.py
then:
cd tensorflow/
python train.py pretrained
and got the following error:
train len: 24271
test len: 7585
word2id len 3917
Creating the data generator ...
Finished creating the data generator.
use pretrained embedding
begin to train...
Traceback (most recent call last):
  File "train.py", line 107, in <module>
    model = Model(config,embedding_pre,dropout_keep=0.5)
  File "/home/liyang22/github/ChineseNER/tensorflow/bilstm_crf.py", line 20, in __init__
    self._build_net()
  File "/home/liyang22/github/ChineseNER/tensorflow/bilstm_crf.py", line 56, in _build_net
    self.viterbi_sequence, viterbi_score = tf.contrib.crf.crf_decode(bilstm_out, self.transition_params,tf.tile(np.array([self.sen_len]),np.array([self.batch_size])))
  File "/home/liyang22/tensorflow/local/lib/python2.7/site-packages/tensorflow/contrib/crf/python/ops/crf.py", line 537, in crf_decode
    false_fn=_multi_seq_fn)
  File "/home/liyang22/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/layers/utils.py", line 206, in smart_cond
    pred, true_fn=true_fn, false_fn=false_fn, name=name)
  File "/home/liyang22/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/smart_cond.py", line 56, in smart_cond
    return false_fn()
  File "/home/liyang22/tensorflow/local/lib/python2.7/site-packages/tensorflow/contrib/crf/python/ops/crf.py", line 501, in _multi_seq_fn
    sequence_length_less_one = math_ops.maximum(0, sequence_length - 1)
  File "/home/liyang22/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4602, in maximum
    "Maximum", x=x, y=y, name=name)
  File "/home/liyang22/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 546, in _apply_op_helper
    inferred_from[input_arg.type_attr]))
TypeError: Input 'y' of 'Maximum' Op has type int64 that does not match type int32 of argument 'x'.

chenbaicheng commented on August 20, 2024

@bobkentt Check whether something is wrong with your corpus. Did you write it yourself?

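bobkentt commented on August 20, 2024

> @bobkentt Check whether something is wrong with your corpus. Did you write it yourself?

I just cloned the project as is and did not use my own corpus. Could it be my TensorFlow version? Which version are you on? I have TensorFlow environments on two virtual machines, versions 1.10.0 and 1.12.0, and neither works.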

bobkentt commented on August 20, 2024

Changing it to int64 in train.py did not help either, and I also tried casting the data labels to int32.

chenbaicheng commented on August 20, 2024

Did you delete the previously trained model files before retraining? I am using tensorflow-gpu==1.10.0.

chenbaicheng commented on August 20, 2024

@bobkentt

bubblewu commented on August 20, 2024

@bobkentt Casting the types to int32 fixes it:
self.viterbi_sequence, viterbi_score = tf.contrib.crf.crf_decode(tf.cast(bilstm_out, dtype=tf.int32), tf.cast(self.transition_params, dtype=tf.int32), tf.cast(sequence_length, dtype=tf.int32))
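
The TypeError in the traceback reports that the sequence-length value is int64 while the constant 0 it is compared with is int32. One plausible cause (an assumption, not confirmed in this thread) is that np.array([self.sen_len]) takes NumPy's default integer type, which is int64 on most 64-bit Linux builds. The sketch below shows that default next to an explicit int32 alternative; the sen_len and batch_size values are hypothetical. Because crf_decode takes floating-point potentials and transition parameters, it is probably enough to cast only the sequence-length argument to int32 (for example with tf.cast) and leave the first two arguments as floats, since casting them to int32 would truncate the scores.

```python
import numpy as np

sen_len = 60      # hypothetical maximum sentence length
batch_size = 32   # hypothetical batch size

# Default integer dtype is platform dependent (int64 on most 64-bit Linux builds).
default_lengths = np.tile(np.array([sen_len]), batch_size)
print(default_lengths.dtype)

# Requesting int32 explicitly gives one int32 length per batch element.
int32_lengths = np.tile(np.array([sen_len], dtype=np.int32), batch_size)
print(int32_lengths.dtype, int32_lengths.shape)
```

This would also explain why casting the labels, as tried earlier in the thread, made no difference: the mismatch concerns the lengths, not the labels.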
