dsxiangli / chinesener

All about Chinese NER

Python 99.66% Shell 0.10% PureBasic 0.24%
bert-bilstm-crf crf bert-fine-tuning ner bilstm-crf adversarial-transfer-learning multitask-learning chinese-ner msra msr

chinesener's People

Contributors

dsxiangli, fengxuefx

chinesener's Issues

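Reproduction problem

Which TensorFlow version does this project use? requirements.txt pins tensorflow 1.14.0, but it does not run and fails with ModuleNotFoundError: No module named 'tensorflow.python.platform'. I searched and found others who hit something similar and attributed it to a version mismatch, so I installed 1.13.1 as they did, but that does not work either. Does all of the code use the same TensorFlow version? Which exact TensorFlow and CUDA versions are required?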
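Before switching versions again, it may help to confirm whether the submodule named in the error can be located at all in the active environment. The sketch below is only a diagnostic, not a fix: module_available is a hypothetical helper, and the idea that a partial or broken install is behind the error is an assumption.

```python
import importlib.util


def module_available(name):
    """Return True if the module `name` can be located in this environment.

    Note: for a dotted name, find_spec imports the parent packages first,
    so a missing or broken parent package is reported as "not available".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError.
        return False


if __name__ == "__main__":
    for name in ("tensorflow", "tensorflow.python.platform"):
        status = "found" if module_available(name) else "missing"
        print(name, "->", status)
```

If "tensorflow" is found but "tensorflow.python.platform" is not, the installed package itself is probably incomplete, and reinstalling it in a clean environment is a reasonable next step.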

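GloVe word frequency

Hello, I am new to deep learning and not yet familiar with many things. Where in the SoftLexicon code are the GloVe word frequencies used to compute the weights?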

tensorboard --logdir ./checkpoint/ner_msra_bert_bilstm_crf

After the events.out files have been generated, I open the URL printed by tensorboard --logdir ./checkpoint/ner_msra_bert_bilstm_crf, but no charts are displayed:

No scalar data was found.
Probable causes:

You haven’t written any scalar data to your event files.
TensorBoard can’t find your event files.
If you’re new to using TensorBoard, and want to find out how to add data and set up your event files, check out the README and perhaps the TensorBoard tutorial.

If you think TensorBoard is configured properly, please see the section of the README devoted to missing data problems and consider filing an issue on GitHub.
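To check the second cause listed in that message, one option is to list the event files that actually exist under the log directory. This is a minimal sketch: find_event_files is a hypothetical helper, and it only shows whether files are present and non-empty, not whether they contain scalar summaries.

```python
import os


def find_event_files(logdir):
    """Return a sorted list of (path, size_in_bytes) for event files under logdir."""
    found = []
    for root, _dirs, files in os.walk(logdir):
        for name in files:
            # TensorBoard event files are named events.out.tfevents.<...>
            if "tfevents" in name:
                path = os.path.join(root, name)
                found.append((path, os.path.getsize(path)))
    return sorted(found)


if __name__ == "__main__":
    for path, size in find_event_files("./checkpoint/ner_msra_bert_bilstm_crf"):
        print(size, path)
```

If the list is empty, the --logdir path is probably wrong relative to the directory the command was started from. If files are listed but stay very small, it is likely that no scalar summaries have been written yet.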

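Dimension problem

Running the "bilstm_softlexicon_cry" model raises an error.
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[28,0,1] = 704369 is not in [0, 704369)
It is probably a problem with the word embedding vector length. Where should I change it?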
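The message says the lookup table has 704369 rows, so the valid ids are 0 to 704368, and the id 704369 is exactly one past the end. That points at the number of rows (the vocabulary size) and not at the vector length. The sketch below shows one way to find such ids in a batch before they reach the embedding lookup; out_of_range_ids and the sample batch are hypothetical, and where the extra id comes from in this repository is not known.

```python
def iter_ids(nested):
    """Yield every integer id from arbitrarily nested lists or tuples."""
    for item in nested:
        if isinstance(item, (list, tuple)):
            yield from iter_ids(item)
        else:
            yield item


def out_of_range_ids(token_ids, vocab_size):
    """Return the sorted distinct ids that fall outside [0, vocab_size)."""
    return sorted({i for i in iter_ids(token_ids) if not 0 <= i < vocab_size})


if __name__ == "__main__":
    vocab_size = 704369  # number of rows reported in the error message
    batch = [[[3, 704369], [12, 0]], [[704368, 7], [1, 2]]]  # made-up ids
    print(out_of_range_ids(batch, vocab_size))
```

A common cause of an off-by-one like this is a vocabulary that gained one extra entry (for example a padding or unknown token) after the embedding matrix size was fixed, but that is an assumption to verify against the preprocessing code.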

HELP

Hello, your coding skills are really strong. May I ask whether there is a PyTorch version?

ctb.50d.vec file not found. data/msra/preprocess.py

Traceback (most recent call last):
  File "/XXXXX/ChineseNER-main/data/msra/preprocess.py", line 46, in <module>
    prep = get_instance(tokenizer, MAX_SEQ_LEN, TAG2IDX, MAPPING, word_enhance)
  File "/XXXXX/ChineseNER-main/data/base_preprocess.py", line 90, in get_instance
    instance = cls(tokenizer_type, max_seq_len, tag2idx, mapping, **kwargs)
  File "/XXXXX/ChineseNER-main/data/base_preprocess.py", line 342, in __init__
    ctb50_handler.init()  # init vocabulary
  File "/XXXXX/ChineseNER-main/data/word_enhance.py", line 50, in init
    self.model = getattr(importlib.import_module(self.model_dir), 'model')
  File "/anaconda/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 978, in _gcd_import
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
  File "/XXXXX/ChineseNER-main/pretrain_model/ctb50/__init__.py", line 5, in <module>
    model = convert('ctb50/ctb.50d.vec')
  File "/XXXXX/ChineseNER-main/pretrain_model/glove_2_wv.py", line 18, in convert
    _ = glove2word2vec(glove_file, tmp_file)
  File "/anaconda/lib/python3.6/site-packages/gensim/scripts/glove2word2vec.py", line 104, in glove2word2vec
    num_lines, num_dims = get_glove_info(glove_input_file)
  File "/anaconda/lib/python3.6/site-packages/gensim/scripts/glove2word2vec.py", line 81, in get_glove_info
    with utils.open(glove_file_name, 'rb') as f:
  File "/anaconda/lib/python3.6/site-packages/smart_open/smart_open_lib.py", line 195, in open
    newline=newline,
  File "/anaconda/lib/python3.6/site-packages/smart_open/smart_open_lib.py", line 361, in _shortcut_open
    return _builtin_open(local_path, mode, buffering=buffering, **open_kwargs)
FileNotFoundError: [Errno 2] No such file or directory: '/XXXXX/ChineseNER-main/pretrain_model/ctb50/ctb.50d.vec'
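The traceback shows that the conversion step expects the embedding file at pretrain_model/ctb50/ctb.50d.vec and fails deep inside gensim when it is absent. A small guard run before preprocessing can report this earlier and more readably. This is only a sketch: require_file is a hypothetical helper, not part of the repository, and it does not say where the file should be obtained.

```python
from pathlib import Path


def require_file(path, hint=""):
    """Return `path` as a Path if it is an existing file, else raise a clear error."""
    path = Path(path)
    if not path.is_file():
        message = "{} is missing. {}".format(path, hint).strip()
        raise FileNotFoundError(message)
    return path


if __name__ == "__main__":
    try:
        require_file(
            "pretrain_model/ctb50/ctb.50d.vec",
            hint="Place the embedding file here before running data/msra/preprocess.py.",
        )
    except FileNotFoundError as err:
        print(err)
```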

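TF version problem

A question for the author: when running main.py I hit this version problem and the run then ends on its own. The environment was set up according to your requirements; I am not very familiar with TF -_-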

WARNING:tensorflow:From /home/boyoi/anaconda3/envs/adv/lib/python3.6/site-packages/tensorflow/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.

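How to load the saved model for inference

First of all, thank you to the author for sharing the code!
Is a saved model loaded with .restore(), or does estimator offer a higher-level API for this? I hope the author can give some pointers, orz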

How to add overall F1, precision, recall and similar metrics to TensorBoard

def get_eval_metrics(label_ids, pred_ids, idx2tag, task_name=''):
    """
    Overall accuracy, and accuracy per tag
    """
    real_length = tf.reduce_sum(tf.sign(label_ids), axis=1) - 2
    max_length = label_ids.shape[-1].value
    mask = tf.sequence_mask(real_length, maxlen=max_length)
    pred_ids = tf.cast(pred_ids, tf.int32)
    if task_name:
        metric_op = {
            'metric_{}/overall_accuracy'.format(task_name): tf.metrics.accuracy(labels=label_ids, predictions=pred_ids, weights=mask)
        }
    else:
        metric_op = {
            'metric/overall_accuracy': tf.metrics.accuracy(labels=label_ids, predictions=pred_ids, weights=mask),
            'metric/overall_precision': tf.metrics.precision(labels=label_ids, predictions=pred_ids, weights=mask),
            'metric/overall_recall': tf.metrics.recall(labels=label_ids, predictions=pred_ids, weights=mask),
        }
    # add accuracy metric per NER tag
    for id, tag in idx2tag.items():
        id = tf.cast(id, tf.int32)
        metric_op.update(calc_metrics(tf.equal(label_ids, id), tf.equal(pred_ids, id), mask, tag, task_name))

    return metric_op

I added tf.metrics.precision and tf.metrics.recall to the source code. Why are the precision and recall curves both at 1?
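A likely explanation is that tf.metrics.precision and tf.metrics.recall are binary metrics: they cast labels and predictions to bool, so every non-zero tag id counts as "positive". The function above derives real_length from tf.sign(label_ids), which suggests that real tokens carry non-zero label ids, so inside the mask almost everything is positive on both sides. The pure-Python sketch below imitates that behaviour on made-up ids; binary_precision_recall is a hypothetical stand-in written for illustration, not TensorFlow code.

```python
def binary_precision_recall(labels, preds, mask):
    """Precision and recall after casting ids to bool, counting only masked-in positions."""
    tp = fp = fn = 0
    for y, p, m in zip(labels, preds, mask):
        if not m:
            continue  # position is masked out (padding)
        y, p = bool(y), bool(p)  # any non-zero tag id becomes True
        tp += int(y and p)
        fp += int((not y) and p)
        fn += int(y and (not p))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up tag ids; 0 is assumed to be padding and is masked out.
    labels = [5, 1, 2, 3, 1, 0, 0]
    preds = [5, 2, 2, 1, 1, 0, 0]
    mask = [1, 1, 1, 1, 1, 0, 0]
    exact = sum(y == p for y, p, m in zip(labels, preds, mask) if m)
    print("exact tag matches:", exact, "of", sum(mask))
    print("binary precision, recall:", binary_precision_recall(labels, preds, mask))
```

Here only 3 of the 5 tags match, yet both binary metrics come out as 1.0. Per-tag comparisons such as tf.equal(label_ids, id), which the loop at the end of get_eval_metrics already uses, produce genuine boolean inputs and avoid this effect.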

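Data augmentation with BERT

Hello, judging from the figures recorded under people_daily_augment, data augmentation has basically no effect on the BERT-based models. Is that right?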

1

Hello, in step 3 (running the single-task NER model) the command I entered was "python main.py --model bert_bilstm_crf --data people_daily".
It produces a warning that repeats endlessly, as follows:
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 9 vs previous value: 9. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
What went wrong, and how can I fix it?

docker

Hello author, I have not used Docker before. Is the idea here to set up a private image registry?

def init_params(self)

Traceback (most recent call last):
  File "/XXXX/ChineseNER-main/dataset.py", line 131, in <module>
    prep = NerDataset('./data/msra', 100, 10, model_name='bilstm_crf_softlexicon')
  File "/XXXX/ChineseNER-main/dataset.py", line 18, in __init__
    self.init_params()
  File "/XXXX/ChineseNER-main/dataset.py", line 61, in init_params
    with open(os.path.join(self.data_dir, '_'.join(filter(None, [self.prefix, self.surfix, 'data_params.pkl']))), 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: './data/msra/giga_softlexicon_data_params.pkl'
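The file name in the error is assembled on line 61 from a prefix, a suffix (spelled surfix in the code) and the fixed name data_params.pkl, with empty parts dropped. The sketch below reproduces that expression so the expected file name can be checked against what is actually in the data directory. params_path is a hypothetical helper, and the split into prefix "giga" and suffix "softlexicon" is inferred from the file name, not confirmed by the source.

```python
import os


def params_path(data_dir, prefix, surfix):
    """Rebuild the pickle path the same way dataset.py line 61 does."""
    # filter(None, ...) drops empty strings, so unused parts leave no stray "_".
    name = "_".join(filter(None, [prefix, surfix, "data_params.pkl"]))
    return os.path.join(data_dir, name)


if __name__ == "__main__":
    expected = params_path("./data/msra", "giga", "softlexicon")
    print("expected:", expected, "| exists:", os.path.isfile(expected))
    # List the parameter files that are actually present, if any.
    if os.path.isdir("./data/msra"):
        print([f for f in os.listdir("./data/msra") if f.endswith("data_params.pkl")])
```

If only a differently named data_params.pkl is present, the preprocessing for this model variant has probably not been run yet. That is an assumption based on the earlier traceback, which shows a data/msra/preprocess.py step.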
