dsxiangli / chinesener

Everything about Chinese NER

Python 99.66% Shell 0.10% PureBasic 0.24%

bert-bilstm-crf crf bert-fine-tuning ner bilstm-crf adversarial-transfer-learning multitask-learning chinese-ner msra msr

chinesener's Issues

ValueError: Couldn't find 'checkpoint' file or checkpoints in given directory ./pretrain_model/ch_google\bert_model.ckpt
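
A TensorFlow checkpoint prefix such as bert_model.ckpt is not a single file; it is backed by a bert_model.ckpt.index file plus one or more bert_model.ckpt.data-* shards in the same directory. The sketch below only checks whether those files are present for a given prefix. The helper name describe_checkpoint_prefix is hypothetical, and the example path assumes the BERT files were unpacked under ./pretrain_model/ch_google.

```python
import glob
import os


def describe_checkpoint_prefix(prefix):
    """Report which on-disk files exist for a TensorFlow checkpoint prefix."""
    prefix = os.path.normpath(prefix)  # also unifies mixed / and \ on Windows
    return {
        "dir_exists": os.path.isdir(os.path.dirname(prefix) or "."),
        "index": os.path.isfile(prefix + ".index"),
        "data_shards": sorted(glob.glob(glob.escape(prefix) + ".data-*")),
    }
```

Calling describe_checkpoint_prefix("./pretrain_model/ch_google/bert_model.ckpt") from the repository root shows at a glance whether the directory, the index file, or the data shards are missing.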

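Which TensorFlow version does this project use? requirements.txt pins tensorflow 1.14.0, but it does not run for me and fails with ModuleNotFoundError: No module named 'tensorflow.python.platform'. I searched and found others who hit something similar and blamed mismatched versions, so I installed 1.13.1 as they did, but that does not work either. Does all of the code use the same TensorFlow version? Which exact TensorFlow and CUDA versions are used?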
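
Before comparing versions by hand, it can help to confirm which version the active interpreter actually imports. This is a minimal sketch, not part of the repository: installed_version and matches_pin are hypothetical helpers, and the pin 1.14.0 is the value quoted from requirements.txt above.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__, "unknown" if it has none, or None if the import fails."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return None
    return getattr(module, "__version__", "unknown")


def matches_pin(module_name, pinned):
    """True only when the importable version equals the pinned version string."""
    return installed_version(module_name) == pinned
```

If matches_pin("tensorflow", "1.14.0") is False while pip reports 1.14.0, the interpreter running the script is probably not the one pip installed into, or the installation is incomplete.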

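GloVe word frequency

Hello, I am new to deep learning and not yet familiar with many things. Where in the SoftLexicon code is the GloVe word frequency used to compute the weights?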
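
For orientation only: one common way to use word frequency as a weight is a frequency-weighted average of the embeddings of the words matched for a character. The sketch below shows that idea in plain Python. It is not taken from this repository, and frequency_weighted_average is a hypothetical name.

```python
def frequency_weighted_average(vectors, frequencies):
    """Average word vectors, weighting each one by its corpus frequency."""
    total = float(sum(frequencies))
    if not vectors or total == 0:
        raise ValueError("need at least one vector with a non-zero frequency")
    dim = len(vectors[0])
    pooled = [0.0] * dim
    for vector, freq in zip(vectors, frequencies):
        for i in range(dim):
            pooled[i] += freq * vector[i] / total
    return pooled
```

Words that occur more often pull the pooled vector toward their own embedding, while rare matches contribute little.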

Data preprocessing: why do I get ModuleNotFoundError: No module named 'data'?
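
A likely cause, assuming the script was started from inside data/msra rather than from the repository root, is that the top-level data package is not on sys.path. The sketch below puts the repository root on the path, and add_repo_root_to_path is a hypothetical helper. Running the script as a module from the root (python -m data.msra.preprocess) is an alternative, if the packages are laid out for it.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(script_path, levels_up=2):
    """Insert the directory `levels_up` levels above the script's folder into sys.path.

    For <root>/data/msra/preprocess.py, levels_up=2 resolves to <root>.
    """
    root = Path(script_path).resolve().parents[levels_up]
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root
```

Calling add_repo_root_to_path(__file__) at the top of preprocess.py, before the line that imports from data, should make the import resolve regardless of the working directory.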

tensorboard --logdir ./checkpoint/ner_msra_bert_bilstm_crf

After the events.out files were generated, I opened the URL printed by tensorboard --logdir ./checkpoint/ner_msra_bert_bilstm_crf, but no charts are displayed.

No scalar data was found.
Probable causes:

You haven’t written any scalar data to your event files.
TensorBoard can’t find your event files.
If you’re new to using TensorBoard, and want to find out how to add data and set up your event files, check out the README and perhaps the TensorBoard tutorial.

If you think TensorBoard is configured properly, please see the section of the README devoted to missing data problems and consider filing an issue on GitHub.
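
To rule out the second cause in the message above, one can list the event files that actually exist under the log directory. This is a minimal sketch with a hypothetical helper name, find_event_files. It relies only on the fact that TensorBoard event files contain "tfevents" in their file names.

```python
import os


def find_event_files(logdir):
    """Return (path, size_in_bytes) for every event file below logdir."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(logdir):
        for name in filenames:
            if "tfevents" in name:
                path = os.path.join(dirpath, name)
                found.append((path, os.path.getsize(path)))
    return sorted(found)
```

An empty result for find_event_files("./checkpoint/ner_msra_bert_bilstm_crf") means TensorBoard is pointed at the wrong directory. Files that exist but stay very small suggest that no scalar summaries have been written yet.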

bert_bilstm_crf_adv: ValueError: Shape must be rank 2 but is rank 1 for 'task1_msra/crf_layer/Slice_2' (op: 'Slice') with input shapes: [?], [2], [2].

FileNotFoundError: [Errno 2] No such file or directory: './data/msra\\train\\sentences.txt'

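Hello, I get this error when running the preprocess file, but the path looks correct.

Running main.py --model bert_bilstm_crf_adv --data msra,msr fails with an error:

No download link found for pretrain_model/lattice/word_char_mix_50d.vec

Hello, I could not find a download link for word_char_mix_50d.vec in the README under pretrain_model, so I cannot run pretrain_model/lattice/preprocess.py. Could you share the download link for this file? Thanks.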
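
When a path "looks correct" but still raises FileNotFoundError, it helps to find the first component that does not exist, since a relative path depends on the current working directory. The sketch below is a hypothetical helper, first_missing_component. Treating backslashes as separators is an assumption made here because the reported path mixes / and \\.

```python
import os


def first_missing_component(path):
    """Return the shallowest part of `path` that does not exist, or None if all of it exists."""
    current = os.path.abspath(path.replace("\\", "/"))
    missing = None
    while not os.path.exists(current):
        missing = current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return missing
```

For example, first_missing_component('./data/msra\\train\\sentences.txt') shows whether the data/msra/train folder itself is absent or only sentences.txt is missing.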

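Dimension problem

Running the “bilstm_softlexicon_cry” model fails with an error.
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[28,0,1] = 704369 is not in [0, 704369)
It is probably a problem with the size of the word-embedding vectors. What should I change?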
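
The message says that an embedding lookup received the id 704369 while the table has 704369 rows, so the valid ids are 0 through 704368 and the offending id is exactly one past the end. That pattern usually points to a vocabulary that is one entry larger than the embedding matrix, although the cause in this repository is not confirmed here. The sketch below, with a hypothetical helper out_of_range_indices, finds such ids in a nested list before they reach the lookup.

```python
def out_of_range_indices(indices, num_rows):
    """Return (position, value) for every id outside [0, num_rows) in a nested list."""
    bad = []

    def visit(node, position):
        if isinstance(node, (list, tuple)):
            for i, child in enumerate(node):
                visit(child, position + (i,))
        elif not 0 <= node < num_rows:
            bad.append((position, node))

    visit(indices, ())
    return bad
```

If the check reports ids equal to num_rows, either the embedding matrix needs one more row (for example for a padding or unknown token) or the ids were built from a different vocabulary file than the embeddings.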

HELP

Hello, your coding skills are impressive. Is there by any chance a PyTorch version?

ctb.50d.vec file not found. data/msra/preprocess.py

Traceback (most recent call last):
  File "/XXXXX/ChineseNER-main/data/msra/preprocess.py", line 46, in <module>
    prep = get_instance(tokenizer, MAX_SEQ_LEN, TAG2IDX, MAPPING, word_enhance)
  File "/XXXXX/ChineseNER-main/data/base_preprocess.py", line 90, in get_instance
    instance = cls(tokenizer_type, max_seq_len, tag2idx, mapping, **kwargs)
  File "/XXXXX/ChineseNER-main/data/base_preprocess.py", line 342, in __init__
    ctb50_handler.init()  # init vocabulary
  File "/XXXXX/ChineseNER-main/data/word_enhance.py", line 50, in init
    self.model = getattr(importlib.import_module(self.model_dir), 'model')
  File "/anaconda/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 978, in _gcd_import
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
  File "/XXXXX/ChineseNER-main/pretrain_model/ctb50/__init__.py", line 5, in <module>
    model = convert('ctb50/ctb.50d.vec')
  File "/XXXXX/ChineseNER-main/pretrain_model/glove_2_wv.py", line 18, in convert
    _ = glove2word2vec(glove_file, tmp_file)
  File "/anaconda/lib/python3.6/site-packages/gensim/scripts/glove2word2vec.py", line 104, in glove2word2vec
    num_lines, num_dims = get_glove_info(glove_input_file)
  File "/anaconda/lib/python3.6/site-packages/gensim/scripts/glove2word2vec.py", line 81, in get_glove_info
    with utils.open(glove_file_name, 'rb') as f:
  File "/anaconda/lib/python3.6/site-packages/smart_open/smart_open_lib.py", line 195, in open
    newline=newline,
  File "/anaconda/lib/python3.6/site-packages/smart_open/smart_open_lib.py", line 361, in _shortcut_open
    return _builtin_open(local_path, mode, buffering=buffering, **open_kwargs)
FileNotFoundError: [Errno 2] No such file or directory: '/XXXXX/ChineseNER-main/pretrain_model/ctb50/ctb.50d.vec'
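
The traceback ends inside gensim's glove2word2vec, which fails because the input file ctb.50d.vec is not on disk. The conversion only prepends a "vocab_size dimensions" header line to the GloVe text file. The sketch below is a plain-Python stand-in for that step, not gensim's implementation, and it adds an explicit check with a clearer message. glove_to_word2vec is a hypothetical name, and the whole file is read into memory for simplicity.

```python
import os


def glove_to_word2vec(glove_path, output_path):
    """Copy a GloVe text file to word2vec text format by adding the header line."""
    if not os.path.isfile(glove_path):
        raise FileNotFoundError(
            "embedding file not found, download it first: {}".format(glove_path))
    with open(glove_path, encoding="utf-8") as src:
        lines = [line.rstrip("\n") for line in src if line.strip()]
    if not lines:
        raise ValueError("embedding file is empty: {}".format(glove_path))
    num_dims = len(lines[0].split(" ")) - 1  # first token is the word itself
    with open(output_path, "w", encoding="utf-8") as dst:
        dst.write("{} {}\n".format(len(lines), num_dims))
        for line in lines:
            dst.write(line + "\n")
    return len(lines), num_dims
```

Either way, the embedding file has to be obtained and placed at pretrain_model/ctb50/ctb.50d.vec before preprocessing can get past this point.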

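TensorFlow version problem

A question for the author: when I run main.py I hit the version issue below and the run then exits on its own. The environment was set up from your requirements file, and I am not very familiar with TensorFlow -_-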

WARNING:tensorflow:From /home/boyoi/anaconda3/envs/adv/lib/python3.6/site-packages/tensorflow/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
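
The message above is a deprecation WARNING rather than an error, so on its own it should not end the run. The actual reason for the exit is more likely elsewhere in the output. To make real errors easier to spot, the warnings can be silenced. The sketch below assumes TensorFlow 1.x, which logs through the standard Python logger named "tensorflow", and quiet_tensorflow_logs is a hypothetical helper.

```python
import logging


def quiet_tensorflow_logs(level=logging.ERROR):
    """Raise the threshold of the "tensorflow" logger so that only errors are shown."""
    logger = logging.getLogger("tensorflow")
    logger.setLevel(level)
    return logger
```

Calling quiet_tensorflow_logs() after importing TensorFlow hides deprecation notices only; it does not change how the program behaves.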

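How to load the saved model for inference

First of all, thank you for sharing the code!
Should the saved model be loaded with .restore(), or does the estimator offer a higher-level API for this? Any pointers from the author would be appreciated.

How to add overall F1, precision, recall, and similar metrics to TensorBoard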
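
One option in TensorFlow 1.x, assuming the model was exported as a SavedModel by the Estimator, is tf.contrib.predictor.from_saved_model. The Estimator writes each export into a timestamp-named subdirectory, so the sketch below first picks the newest one. Both helper names are hypothetical, where the export is written depends on how the repository configures it, and tf.contrib does not exist in TensorFlow 2.x.

```python
import os


def latest_export_dir(export_base):
    """Return the newest timestamp-named SavedModel directory under export_base."""
    candidates = [
        name for name in os.listdir(export_base)
        if name.isdigit() and os.path.isdir(os.path.join(export_base, name))
    ]
    if not candidates:
        raise FileNotFoundError("no exported model under {}".format(export_base))
    return os.path.join(export_base, max(candidates, key=int))


def load_predictor(export_base):
    """Build a callable predictor from the newest export (TensorFlow 1.x only)."""
    from tensorflow.contrib import predictor  # imported lazily; requires TF 1.x
    return predictor.from_saved_model(latest_export_dir(export_base))
```

The returned predictor is called with a dict that maps the exported input names to arrays. Without an export, estimator.predict(input_fn) restores the latest checkpoint from model_dir and is the other high-level route.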

def get_eval_metrics(label_ids, pred_ids, idx2tag, task_name=''):
    """
    Overall accuracy, and accuracy per tag
    """
    real_length = tf.reduce_sum(tf.sign(label_ids), axis=1) - 2
    max_length = label_ids.shape[-1].value
    mask = tf.sequence_mask(real_length, maxlen=max_length)
    pred_ids = tf.cast(pred_ids, tf.int32)
    if task_name:
        metric_op = {
            'metric_{}/overall_accuracy'.format(task_name): tf.metrics.accuracy(labels=label_ids, predictions=pred_ids, weights=mask)
        }
    else:
        metric_op = {
            'metric/overall_accuracy': tf.metrics.accuracy(labels=label_ids, predictions=pred_ids, weights=mask),
            'metric/overall_precision': tf.metrics.precision(labels=label_ids, predictions=pred_ids, weights=mask),
            'metric/overall_recall': tf.metrics.recall(labels=label_ids, predictions=pred_ids, weights=mask),
        }
    # add accuracy metric per NER tag
    for id, tag in idx2tag.items():
        id = tf.cast(id, tf.int32)
        metric_op.update(calc_metrics(tf.equal(label_ids, id), tf.equal(pred_ids, id), mask, tag, task_name))

    return metric_op

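I added tf.metrics.precision and tf.metrics.recall to the source code. Why are the precision and recall curves always 1?

Hello, I would like to reproduce the results with my own data. Which steps are required?

About data augmentation with BERT

Hello, judging from the figures recorded under people_daily_augment, data augmentation has essentially no effect on the BERT-based models. Is that right?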
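
A probable explanation, based on how tf.metrics.precision and tf.metrics.recall work in TensorFlow 1.x: both are binary metrics that cast labels and predictions to bool, so every non-zero tag id counts as "positive". Once padding is masked out, almost every label and prediction is non-zero and both metrics sit at 1. The plain-Python sketch below reproduces the effect with made-up tag ids and contrasts it with a per-tag comparison, which is what the tf.equal(label_ids, id) calls in the loop above feed into calc_metrics.

```python
def binary_precision_recall(labels, preds):
    """Precision and recall for two equally long sequences of booleans."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for l, p in pairs if l and p)
    fp = sum(1 for l, p in pairs if not l and p)
    fn = sum(1 for l, p in pairs if l and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


label_ids = [1, 2, 3, 1, 2]  # made-up tag ids, padding (0) already masked out
pred_ids = [1, 3, 3, 2, 1]   # three of five predictions are wrong

# Casting ids to bool: every position is "positive", so the result is (1.0, 1.0).
as_bool = binary_precision_recall([bool(x) for x in label_ids],
                                  [bool(x) for x in pred_ids])

# Comparing against one tag id at a time gives a meaningful result: (0.5, 1.0).
tag_3 = binary_precision_recall([x == 3 for x in label_ids],
                                [x == 3 for x in pred_ids])
```

An overall precision or recall for NER therefore has to be aggregated from per-tag (or per-entity) counts rather than computed by passing raw tag ids to the binary metrics.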

1

Hello, for step 3 (running the single-task NER model) I used the command "python main.py --model bert_bilstm_crf --data people_daily".
It keeps printing the following warning in a loop:
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 9 vs previous value: 9. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
What is going wrong, and how can I fix it?

Hello, I ran preprocess.py for the corresponding dataset, but it did not generate the tfrecord and data_params files. What is going wrong?

docker

Hello, I have not used Docker before. Is the idea here to set up a private image registry?

def init_params(self)

Traceback (most recent call last):
  File "/XXXX/ChineseNER-main/dataset.py", line 131, in <module>
    prep = NerDataset('./data/msra', 100, 10, model_name='bilstm_crf_softlexicon')
  File "/XXXX/ChineseNER-main/dataset.py", line 18, in __init__
    self.init_params()
  File "/XXXX/ChineseNER-main/dataset.py", line 61, in init_params
    with open(os.path.join(self.data_dir, '_'.join(filter(None, [self.prefix, self.surfix, 'data_params.pkl']))), 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: './data/msra/giga_softlexicon_data_params.pkl'
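
The open() call in the traceback builds the file name by joining the non-empty parts of prefix, surfix, and 'data_params.pkl' with underscores. The sketch below reproduces that naming so the expected file can be checked before the dataset is constructed. params_path is a hypothetical helper, and the values 'giga' and 'softlexicon' are read off the path in the error message rather than taken from the repository.

```python
import os


def params_path(data_dir, prefix="", surfix=""):
    """Build the data_params path the same way the traceback's open() call does."""
    name = "_".join(filter(None, [prefix, surfix, "data_params.pkl"]))
    return os.path.join(data_dir, name)


# Path expected for the softlexicon model, as named in the error above.
expected = params_path("./data/msra", "giga", "softlexicon")
```

If os.path.isfile(expected) is False, the preprocessing step that writes the softlexicon variant of the MSRA data has presumably not been run, or did not finish.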