
deepnlp's Introduction

Deep Learning NLP pipeline implemented on TensorFlow. Following the 'simplicity' principle, this project uses the TensorFlow deep learning library to implement a new NLP pipeline. You can extend the project to train models with your own corpora/languages. Pre-trained models for Chinese corpora are distributed, and a free RESTful NLP API is also provided. Visit http://www.deepnlp.org/api/v1.0/pipeline for details.

Brief Introduction

Modules

  • NLP Pipeline Modules:

    • Word Segmentation/Tokenization
    • Part-of-Speech tagging (POS)
    • Named-Entity Recognition (NER)
    • Dependency Parsing (Parse)
    • Textsum: automatic summarization with Seq2Seq-Attention models
    • Textrank: extraction of the most important sentences
    • Textcnn: document classification
    • Web API: free TensorFlow-powered web API
    • Planned: Automatic Summarization
  • Algorithms (closely following the state of the art)

    • Word Segmentation: linear-chain CRF (conditional random field), based on the CRF++ Python module
    • POS: LSTM/BI-LSTM/LSTM-CRF networks, based on TensorFlow
    • NER: LSTM/BI-LSTM/LSTM-CRF networks, based on TensorFlow
    • Parse: arc-standard system with a feed-forward neural network
    • Textsum: Seq2Seq with an attention mechanism
    • Textcnn: CNN
  • Pre-trained Models

    • Chinese: Segmentation, POS, NER, Parse (1998 China Daily corpus)
    • Domain-specific NER models are also provided: general, entertainment, O2O, etc. Contributions are welcome
    • English: POS (Brown corpus)
    • For other languages, you can easily use the scripts to train models on a corpus of your choice.

Installation

  • Requirements

    • CRF++ (>=0.54)
    • TensorFlow (1.4)
    • Python (Python 2.7 and Python 3.6 are tested). This project is kept up to date with the latest TensorFlow release: for TensorFlow <=0.12.0 use deepnlp <=0.1.5; for TensorFlow 1.0-1.3 use deepnlp 0.1.6; for TensorFlow 1.4 use deepnlp 0.1.7. See RELEASE.md for more details.
  • Pip

    # linux, run the script:
    pip install deepnlp
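
If you are running an older TensorFlow, pin the matching deepnlp release instead (version pairs taken from the Requirements above):

    # pick the deepnlp release that matches your installed tensorflow:
    pip install deepnlp==0.1.5   # tensorflow <= 0.12.0
    pip install deepnlp==0.1.6   # tensorflow 1.0 - 1.3
    pip install deepnlp==0.1.7   # tensorflow 1.4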

Due to package size restrictions, the English POS model and the domain-specific NER model files are not distributed on PyPI. You can download the pre-trained model files from GitHub and put them in your installation directory .../site-packages/.../deepnlp/... (model files: ../pos/ckpt/en/pos.ckpt ; ../ner/ckpt/zh/ner.ckpt).
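
A minimal sketch for locating where those files go (deepnlp.__file__ is standard Python introspection, not a deepnlp-specific API; the ckpt sub-paths are the ones quoted above):

    # locate the deepnlp installation directory, then place the downloaded
    # model files under the matching ckpt sub-directories
    import os
    import deepnlp

    install_dir = os.path.dirname(deepnlp.__file__)
    print(install_dir)                                     # .../site-packages/deepnlp
    print(os.path.join(install_dir, 'pos', 'ckpt', 'en'))  # English POS model dir
    print(os.path.join(install_dir, 'ner', 'ckpt', 'zh'))  # Chinese NER model dir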

    # or install from source (linux):
    tar zxvf deepnlp-0.1.7.tar.gz
    cd deepnlp-0.1.7
    python setup.py install
  • Initial setup
    # install crf++0.58 package using the script
    sh ./deepnlp/segment/install_crfpp.sh
    # Download all the pre-trained models
    python ./test/test_install.py
    
    # Or download specific pre-trained models from Python:
    import deepnlp
    deepnlp.download('segment')
    deepnlp.download('pos')
    deepnlp.download('ner')
    deepnlp.download('parse')
  • Running Examples
    # ./deepnlp/test folder
    cd test
    python test_segment.py    # segmentation
    python test_pos_en.py       # POS tag
    python test_ner_zh.py       # NER Zh
    python test_ner_domain.py   # NER domain-specific models
    python test_ner_dict_udf.py # NER load user dict and UDF for disambiguation
    python test_nn_parser.py    # dependency parsing
    python test_api_v1_module.py
    python test_api_v1_pipeline.py

Tutorial

Set Coding

For Python 2, the default source encoding is ASCII rather than Unicode; use the __future__ module to make the code compatible with Python 3:

#coding=utf-8
from __future__ import unicode_literals # compatible with python3 unicode

Download pretrained models

If you install deepnlp via pip, the pre-trained models are not distributed due to size restrictions. You can download the full models for 'Segment', 'POS' (en and zh), 'NER' (zh, zh_entertainment, zh_o2o), and 'Textsum' by calling the download function.

import deepnlp
# Download all the modules
deepnlp.download()

# Download specific module
deepnlp.download('segment')
deepnlp.download('pos')
deepnlp.download('ner')
deepnlp.download('parse')

# Download module and domain-specific model
deepnlp.download(module = 'pos', name = 'en') 
deepnlp.download(module = 'ner', name = 'zh_entertainment')

Segmentation


#coding=utf-8
from __future__ import unicode_literals
from deepnlp import segmenter

tokenizer = segmenter.load_model(name = 'zh_entertainment')
text = "我刚刚在浙江卫视看了电视剧老九门,觉得陈伟霆很帅"
segList = tokenizer.seg(text)
text_seg = " ".join(segList)

#Results
# 我 刚刚 在 浙江卫视 看 了 电视剧 老九门 , 觉得 陈伟霆 很 帅

POS


#coding:utf-8
from __future__ import unicode_literals

import deepnlp
deepnlp.download('pos')

## English Model
from deepnlp import pos_tagger
tagger = pos_tagger.load_model(name = 'en')  # Loading English model, lang code 'en', English Model Brown Corpus

text = "I want to see a funny movie"
words = text.split(" ")     # unicode
print (" ".join(words))

tagging = tagger.predict(words)
for (w,t) in tagging:
    pair = w + "/" + t
    print (pair)
    
#Results
#I/nn want/vb to/to see/vb a/at funny/jj movie/nn

## Chinese Model
from deepnlp import segmenter
from deepnlp import pos_tagger
tagger = pos_tagger.load_model(name = 'zh') # Loading Chinese model, lang code 'zh', China Daily Corpus

text = "我爱吃北京烤鸭"
words = segmenter.seg(text) # words in unicode coding
print (" ".join(words))

tagging = tagger.predict(words)  # input: unicode coding
for (w,t) in tagging:
    pair = w + "/" + t
    print (pair)

#Results
#我/r 爱/v 吃/v 北京/ns 烤鸭/n

NER


from __future__ import unicode_literals   # compatible with python3 unicode

import deepnlp
deepnlp.download('ner')  # download the NER pretrained models from github if installed from pip

from deepnlp import ner_tagger

# Example: Entertainment Model
tagger = ner_tagger.load_model(name = 'zh_entertainment')   # base LSTM model
#Load Entertainment Dict
tagger.load_dict("zh_entertainment")
text = "你 最近 在 看 胡歌 演的 猎场 吗 ?"
words = text.split(" ")
tagset_entertainment = ['actor', 'role_name', 'teleplay', 'teleplay_tag']
tagging = tagger.predict(words, tagset = tagset_entertainment)
for (w,t) in tagging:
    pair = w + "/" + t
    print (pair)

#Result
#你/nt
#最近/nt
#在/nt
#看/nt
#胡歌/actor
#演的/nt
#猎场/teleplay
#吗/nt
#?/nt

Parsing


from __future__ import unicode_literals # compatible with python3 unicode coding

from deepnlp import nn_parser
parser = nn_parser.load_model(name = 'zh')

#Example 1, Input Words and Tags Both
words = ['它', '熟悉', '一个', '民族', '的', '历史']
tags = ['r', 'v', 'm', 'n', 'u', 'n']

#Parsing
dep_tree = parser.predict(words, tags)

#Fetch result from Transition Namedtuple
num_token = dep_tree.count()
print ("id\tword\tpos\thead\tlabel")
for i in range(num_token):
    cur_id = int(dep_tree.tree[i+1].id)
    cur_form = str(dep_tree.tree[i+1].form)
    cur_pos = str(dep_tree.tree[i+1].pos)
    cur_head = str(dep_tree.tree[i+1].head)
    cur_label = str(dep_tree.tree[i+1].deprel)
    print ("%d\t%s\t%s\t%s\t%s" % (cur_id, cur_form, cur_pos, cur_head, cur_label))

# Result
id	word	pos	head	label
1	它	r	2	SBV
2	熟悉	v	0	HED
3	一个	m	4	QUN
4	民族	n	5	DE
5	的	u	6	ATT
6	历史	n	2	VOB

Pipeline

#coding:utf-8
from __future__ import unicode_literals

from deepnlp import pipeline
p = pipeline.load_model('zh')

#Run the full pipeline on raw text
text = "我爱吃北京烤鸭"
res = p.analyze(text)

print (res[0].encode('utf-8'))
print (res[1].encode('utf-8'))
print (res[2].encode('utf-8'))

words = p.segment(text)
pos_tagging = p.tag_pos(words)
ner_tagging = p.tag_ner(words)

print (pos_tagging.encode('utf-8'))
print (ner_tagging.encode('utf-8'))

Textsum


See details: README

Textrank


See details: README

TextCNN (WIP)


Train your model

Train models with your own corpus:

  • Segment model: see instructions in README

  • POS model: see instructions in README

  • NER model: see instructions in README

  • Parsing model: see instructions in README

  • Textsum model: see instructions in README

Web API Service

www.deepnlp.org provides a free web API service for common NLP modules, applied to sentences and paragraphs. The APIs are RESTful and backed by pre-trained TensorFlow models. Chinese is currently supported.

Testing the API from a browser (you need to log in first)


Calling the API from Python

See ./deepnlp/test/test_api_v1_module.py for more details.

#coding:utf-8
from __future__ import unicode_literals

import json, requests, sys, os
if sys.version_info > (3, 0):
    from urllib.parse import quote
else:
    from urllib import quote

from deepnlp import api_service
login = api_service.init()          # register; if registration fails, a default empty login {} with limited access is used
conn = api_service.connect(login)   # save the connection with login cookies

# Sample URL
# http://www.deepnlp.org/api/v1.0/pipeline/?lang=zh&annotators=segment,pos,ner&text=我爱吃上海小笼包

# Define text and language
text = ("我爱吃上海小笼包").encode("utf-8")  # convert text from unicode to utf-8 bytes

# Set up URL for POS tagging
url_pos = 'http://www.deepnlp.org/api/v1.0/pos/?' + 'lang=' + quote('zh') + '&text=' + quote(text)
web = requests.get(url_pos, cookies = conn)
tuples = json.loads(web.text)
print (tuples['pos_str'].encode('utf-8'))    # POS result JSON, e.g. {'pos_str': 'w1/t1 w2/t2'}
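
The pipeline endpoint from the sample URL above can be called the same way. A minimal sketch; the exact JSON keys of the pipeline response are not documented here, so the raw body is printed:

# Set up URL for the full pipeline (segment, pos, ner), as in the sample URL
url_pipeline = 'http://www.deepnlp.org/api/v1.0/pipeline/?' + 'lang=' + quote('zh') + '&annotators=' + quote('segment,pos,ner') + '&text=' + quote(text)
web = requests.get(url_pipeline, cookies = conn)
print (web.text)   # raw JSON response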

Chinese Introduction

The deepnlp project is a Python NLP toolkit built on the TensorFlow platform. It combines TensorFlow deep learning modules with recent algorithms to provide the basic NLP modules, and it supports extension to more complex tasks such as abstractive summarization.

  • NLP pipeline modules

    • Word Segmentation/Tokenization
    • Part-of-Speech tagging (POS)
    • Named-Entity Recognition (NER)
    • Dependency Parsing (Parse)
    • Abstractive summarization: Textsum (Seq2Seq-Attention)
    • Key sentence extraction: Textrank
    • Text classification: Textcnn (WIP)
    • Callable RESTful Web API
    • Planned: syntactic parsing (Parsing)
  • Algorithm implementations

    • Segmentation: linear-chain CRF, implemented with the CRF++ package
    • POS: unidirectional LSTM / bidirectional BI-LSTM, implemented on TensorFlow
    • NER: unidirectional LSTM / bidirectional BI-LSTM / LSTM-CRF networks, implemented on TensorFlow
    • Dependency parsing: a neural-network parser based on the arc-standard system
  • Pre-trained models

    • Chinese: segmentation, POS tagging, and named-entity recognition, trained on mixed People's Daily and Weibo corpora

API Service

For the purpose of technical exchange, http://www.deepnlp.org provides a free API for deep-learning NLP analysis of text and documents; it can be used after a simple registration. The API is RESTful and backed by pre-trained TensorFlow deep learning models. For usage details, see the blog post: http://www.deepnlp.org/blog/tutorial-deepnlp-api/

The API currently provides support for the following modules: segmentation, POS tagging, NER, and the combined pipeline.

Installation

  • Requirements

    • CRF++ (>=0.54), which can be downloaded and installed from https://taku910.github.io/crfpp/
    • TensorFlow (1.0): the TensorFlow functions in this project are updated to follow the latest release; TensorFlow 1.0 is currently supported. For older TensorFlow (<=0.12.0), please use deepnlp <=0.1.5. See RELEASE.md for more information.
  • Pip install

    pip install deepnlp
    # or install from source (linux):
    tar zxvf deepnlp-0.1.7.tar.gz
    cd deepnlp-0.1.7
    python setup.py install
  • Initial setup
    # install the crf++ 0.58 package using the script
    sh ./deepnlp/segment/install_crfpp.sh
    # run the script to download the pre-trained models and test the installation
    python ./test/test_install.py


deepnlp's People

Contributors

rockingdingo

deepnlp's Issues

[text_sum] Choosing between the several models

Hello, author:
While reading the text_sum module, I saw that headline.py uses seq2seq_model.py for training. May I ask how it differs from seq2seq_attn.py and seq2seq_model_attn.py?

[text sum] _UNK

Hi,

When I use the provided pre-trained model for prediction, the output frequently contains "_UNK". I am not sure what the reason is.

Is it because the input segmentation differs from the one used at training time, i.e. the segmented tokens are not in the vocab?

The two tests in the README both return normal results.

textsum results are all _UNK

The files unpacked under textsum's ckpt directory seem to be missing a .meta file.
Whether I train on the data under news/train or on my own corpus, the results are all _UNK.

[textsum]Attention heatmap error

Compared with the heatmap shown in "A Neural Attention Model for Abstractive Sentence Summarization" (A. M. Rush), there are some problems here.

A heatmap should show the attention relationship between input and output, which means attention should be higher when an output word is copied from the input sentence. However, the heatmap in this project is just a blur, and it is hard to tell which input word each output word attends to.

Corpus preprocessing (times and numbers)

Hello, I am currently studying your textsum source code for automatic text summarization. I saw that in corpus preprocessing you replace numbers and times with corresponding tags, and I would like to know the motivation (is it because number combinations make the vocab very large?). After all, news places high demands on times and places; replacing them directly would reduce the applicability a lot. Or perhaps there is some follow-up processing that I overlooked? Please advise, thank you!

Using TensorFlow 1.0 causes an error

I am using the deepnlp 0.1.5 branch, and this version does not support TensorFlow 1.0:

AttributeError: 'module' object has no attribute 'rnn_cell'.
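
For reference, a sketch of the API change behind this error (illustrative variable names, not deepnlp's code): code written for TensorFlow <= 0.12 uses tf.nn.rnn_cell, which TensorFlow 1.0 removed in favor of tf.contrib.rnn.

import tensorflow as tf

hidden_size = 128  # illustrative
# tensorflow <= 0.12 style; raises the AttributeError above on 1.0:
#   cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_size)
# tensorflow 1.0 style:
cell = tf.contrib.rnn.BasicLSTMCell(hidden_size)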

Model results (pos_model_bilstm.py)

Hello, while running this model I computed the cost and accuracy, training for thirty epochs.
The final training results were:
Epoch: 30 Train cost: 0.080 accuracy: 0.984
Epoch: 30 Valid cost: 0.622 accuracy: 0.948
Test cost: 0.172 accuracy: 0.952
Is this a reasonable result? My training data is about 10 MB, and the test and validation sets are only 5 MB each.
Could you share the final results you got when running this model? Thanks.
Regarding my results above, I personally feel the accuracy is too high. Could you advise where things might have gone wrong?

Running the demo outputs _UNK

Running the demo predict.py outputs _UNK, and the output is produced very slowly. Could the author explain why? The TF version is 0.12, CPU build.

TypeError: sampled_loss() got an unexpected keyword argument 'logits'

The exception raised when running predict and headline on Python 3.5:
Traceback (most recent call last):
File "E:\Users\gzlixiaowei\workspace2\webtest\src\textsumch\predict.py", line 171, in
tf.app.run()
File "E:\python35\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "E:\Users\gzlixiaowei\workspace2\webtest\src\textsumch\predict.py", line 153, in main
decode()
File "E:\Users\gzlixiaowei\workspace2\webtest\src\textsumch\predict.py", line 40, in decode
model = create_model(sess, True)
File "E:\Users\gzlixiaowei\workspace2\webtest\src\textsumch\headline.py", line 142, in create_model
forward_only=forward_only)
File "E:\Users\gzlixiaowei\workspace2\webtest\src\textsumch\seq2seq_model.py", line 169, in init
softmax_loss_function=softmax_loss_function)
File "E:\python35\lib\site-packages\tensorflow\contrib\legacy_seq2seq\python\ops\seq2seq.py", line 1221, in model_with_buckets
softmax_loss_function=softmax_loss_function))
File "E:\python35\lib\site-packages\tensorflow\contrib\legacy_seq2seq\python\ops\seq2seq.py", line 1134, in sequence_loss
softmax_loss_function=softmax_loss_function))
File "E:\python35\lib\site-packages\tensorflow\contrib\legacy_seq2seq\python\ops\seq2seq.py", line 1089, in sequence_loss_by_example
crossent = softmax_loss_function(labels=target, logits=logit)

Problems when running on Python 3.5.2

Running python3 test/test_segment.py raises an exception:
Traceback (most recent call last):
File "test/test_segment.py", line 7, in
segList = segmenter.seg(text)
File "/usr/local/lib/python3.5/dist-packages/deepnlp/segmenter.py", line 28, in seg
model.add((char + "\to\tB").encode('utf-8'))
File "/usr/local/lib/python3.5/dist-packages/CRFPP.py", line 101, in add
def add(self, *args): return _CRFPP.Tagger_add(self, *args)
TypeError: in method 'Tagger_add', argument 2 of type 'char const *'

But with the same steps under py27, it works fine.

train.txt and other corpus files

Hello, could you provide the 'train.txt', 'dev.txt', and 'test.txt' files? Thanks.

First time using TensorFlow; error: tensorflow.python.framework.errors_impl.NotFoundError: /mnt/python/pypi/deepnlp/deepnlp/textsum/ckpt

Reading model parameters from /mnt/python/pypi/deepnlp/deepnlp/textsum/ckpt/headline_large.ckpt-48000
Traceback (most recent call last):
File "predict.py", line 171, in
tf.app.run()
File "/home/a/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "predict.py", line 153, in main
decode()
File "predict.py", line 40, in decode
model = create_model(sess, True)
File "/home/a/下载/deepnlp-master/deepnlp/textsum/headline.py", line 149, in create_model
saver.restore(session, tf.train.latest_checkpoint(FLAGS.train_dir))
File "/home/a/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1482, in latest_checkpoint
if file_io.get_matching_files(v2_path) or file_io.get_matching_files(
File "/home/a/anaconda2/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 269, in get_matching_files
compat.as_bytes(filename), status)]
File "/home/a/anaconda2/lib/python2.7/contextlib.py", line 24, in exit
self.gen.next()
File "/home/a/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.NotFoundError: /mnt/python/pypi/deepnlp/deepnlp/textsum/ckpt

Can anyone help me figure out what this error means? I was running python predict.py.

[text_sum] Preprocessing question

Hello, author:
I read your text_sum code; it is roughly the same as the translate example. The blog post describes some data preprocessing, but I could not find it in the project. I would like to know which Sogou dataset you actually used. If convenient, could you also release the preprocessing for the Sogou dataset? Many thanks!

AttributeError: module 'deepnlp.segmenter' has no attribute 'seg'

Installing via pip and installing from source both give the same result:

from deepnlp import segmenter
text = "我爱吃北京烤鸭"
words = segmenter.seg(text)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-786664ee950e> in <module>()
      1 text = "我爱吃北京烤鸭"
----> 2 words = segmenter.seg(text)

AttributeError: module 'deepnlp.segmenter' has no attribute 'seg'

Ubuntu 16.04
Anaconda4.4.0 - Python3.6
~/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/deepnlp
deepnlp.version: '0.1.7'
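
A possible workaround sketched from the Segmentation tutorial above, assuming a 'zh' model name by analogy with 'zh_entertainment' (whether segmenter.seg exists directly depends on the deepnlp version):

#coding=utf-8
from __future__ import unicode_literals
from deepnlp import segmenter

# 0.1.7-style API from the tutorial: load a model object, then call .seg()
tokenizer = segmenter.load_model(name = 'zh')
words = tokenizer.seg("我爱吃北京烤鸭")
print (" ".join(words))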

UnicodeDecodeError

I got this error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc2 in position 23: invalid continuation byte
when I ran python ner_model.py.
I tried it on Ubuntu 16.04 and Windows 10, in both Python 2 and Python 3, and got the same error each time.
Can anyone help me?

POS tagging

Hello! I want to use your code to train a POS tagger for another language, and the following error occurred.
Epoch: 1 Learning rate: 0.100
Traceback (most recent call last):
File "pos_model.py", line 285, in
tf.app.run()
File "/Users/Altangadas/.pyenv/versions/anaconda3-4.3.0/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "pos_model.py", line 276, in main
verbose=True)
File "pos_model.py", line 225, in run_epoch
if verbose and step % (epoch_size // 10) == 10:
ZeroDivisionError: integer division or modulo by zero

Is it because the training corpus (train.txt) is too small?
Please advise.
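
For context, a sketch of why this division fails; the guard is a suggestion, not the project's code:

# failing line in pos_model.py, as reported above:
#   if verbose and step % (epoch_size // 10) == 10:
epoch_size = 0            # what a very small train.txt produces (illustrative)
step, verbose = 10, True  # illustrative loop state

log_every = max(1, epoch_size // 10)   # defensive guard: never divide by zero
if verbose and step % log_every == 10:
    print("log progress")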

Can the textsum model be used for classification?

Hello, I recently wanted to use the textsum model for a classification problem, putting the class label in the title and keeping the tag replacement as before. Is there anything I should pay attention to? My understanding of the seq2seq model is not especially deep.

Couldn't match files

Using TF 1.0.0 and the latest deepnlp, with the textsum summarization model:

ERROR:tensorflow:Couldn't match files for checkpoint /mnt/python/pypi/deepnlp/deepnlp/textsum/ckpt/headline_large.ckpt-48000

Do I need to manually put a copy of the unpacked data from ckpt/ into this location?

Thanks

Training error: *** Error in `python': double free or corruption (!prev): 0x000000000094d880 ***

Hello, running the model training code raises an error. I would like to ask what the content-dev.txt and title-dev.txt files inside the news/dev folder are for. Also, running headline.py fails:
Creating 4 layers of 256 units.
Reading model parameters from /mnt/python/pypi/deepnlp/deepnlp/textsum/ckpt/headline_large.ckpt-48000
*** Error in `python': double free or corruption (!prev): 0x000000000094d880 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7d19d)[0x7f66e78f719d]
/lib64/ld-linux-x86-64.so.2(_dl_deallocate_tls+0x58)[0x7f66e8938038]
/lib64/libpthread.so.0(+0x6e07)[0x7f66e834ae07]
/lib64/libpthread.so.0(+0x6f1f)[0x7f66e834af1f]
/lib64/libpthread.so.0(pthread_join+0xe3)[0x7f66e834cf73]
/lib64/libstdc++.so.6(_ZNSt6thread4joinEv+0x27)[0x7f66b0126b67]
/usr/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so(+0x23b3af0)[0x7f66b2934af0]
/usr/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so(_ZN10tensorflow6thread10ThreadPool4ImplD0Ev+0xb3)[0x7f66b2911693]
/usr/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so(_ZN10tensorflow6thread10ThreadPoolD1Ev+0x1a)[0x7f66b2911cfa]
/usr/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so(_ZN10tensorflow10FileSystem16GetMatchingPathsERKSsPSt6vectorISsSaISsEE+0x599)[0x7f66b2931729]
/usr/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so(_ZN10tensorflow3Env16GetMatchingPathsERKSsPSt6vectorISsSaISsEE+0x9b)[0x7f66b292daab]
/usr/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so(+0xad561b)[0x7f66b105661b]
/usr/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so(+0xad7800)[0x7f66b1058800]
/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4594)[0x7f66e8640aa4]
/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed)[0x7f66e86420bd]
/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x425f)[0x7f66e864076f]
/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed)[0x7f66e86420bd]
/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x425f)[0x7f66e864076f]
/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed)[0x7f66e86420bd]
/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x425f)[0x7f66e864076f]
/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed)[0x7f66e86420bd]
/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x425f)[0x7f66e864076f]
/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed)[0x7f66e86420bd]
/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x425f)[0x7f66e864076f]
/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed)[0x7f66e86420bd]
/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x425f)[0x7f66e864076f]
/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed)[0x7f66e86420bd]
/lib64/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7f66e86421c2]
/lib64/libpython2.7.so.1.0(+0xfb5ff)[0x7f66e865b5ff]
/lib64/libpython2.7.so.1.0(PyRun_FileExFlags+0x7e)[0x7f66e865c7be]
/lib64/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xe9)[0x7f66e865da49]
/lib64/libpython2.7.so.1.0(Py_Main+0xc9f)[0x7f66e866eb9f]
Please advise, thanks!

Training process

Hi, I would like to know what the ending condition is during training in the textsum task.

On choosing TensorFlow and Python versions

As of August 3, 2017, based on online comments and my own tests, at least for TextSum the configuration with the highest success rate is currently TensorFlow 1.0 + Python 2.7. TensorFlow 1.0 with the other current major Python versions, or Python 2.7 with any other TensorFlow version, cannot run the model successfully. One combination still to be checked is TensorFlow 1.2 + Python 3.4; I will test it today. Feel free to exchange notes with me: kinlon666 at 163.com

textsum

Hi, still waiting for your textsum update; just a friendly nudge.

deepnlp

Hello, I have followed you and started using deepnlp, but I ran into some problems while using it. Could you let me know your QQ so I can ask for advice? Much appreciated @rockingdingo
My QQ is 438942304

AttributeError: module 'tensorflow' has no attribute 'AUTO_REUSE'

Installed via pip under Python 3; using pos_tagger raises an error:

~/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/deepnlp/pos_tagger.py in _init_pos_model(self, session)
60 # Check if self.model already exist
61 if self.model is None:
---> 62 with tf.variable_scope(model_var_scope, tf.AUTO_REUSE):
63 self.model = pos_model.POSTagger(is_training=False, config=config) # save object after is_training
64 # Load Specific .data* ckpt file
AttributeError: module 'tensorflow' has no attribute 'AUTO_REUSE'

A question about the implementation

I had a rough look at the sequence labeling implementation. If I understand correctly, during training you feed 30 words at a time rather than whole sentences. That should not be correct, right?

Question about the test file used in NER

As in the title:
1. Do the test sentences in this file need to be in some specific format? Right now I have prepared an empty file.
2. Is the format the same as the one used for training?
3. In total there are also test files for segmentation and for POS tagging.

Is correct_prediction written incorrectly?

In ner_model_bilstm.py under ner: correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(targets, 0)). I think the trailing tf.argmax(targets, 0) should be changed to tf.reshape(targets, [-1]).
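
A sketch of the suggested change (placeholder shapes are illustrative; the tf.cast is an addition here, since tf.argmax returns int64 while integer targets are commonly int32):

import tensorflow as tf

# illustrative shapes: logits [batch * steps, num_classes], targets [batch, steps]
logits = tf.placeholder(tf.float32, [None, 10])
targets = tf.placeholder(tf.int32, [None, None])

# as reported: tf.argmax(targets, 0) compares across the batch axis,
# it does not recover the label ids
# correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(targets, 0))

# suggested fix from this issue: flatten the integer label ids instead
correct_prediction = tf.equal(tf.argmax(logits, 1),
                              tf.cast(tf.reshape(targets, [-1]), tf.int64))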

NotFoundError (see above for traceback): Key ner_var_scope/ner_lstm/multi_rnn_cell/cell_0/basic_lstm_cell/bias not found in checkpoint

tagger = ner_tagger.load_model(lang = 'zh')
Loading the model with the named-entity recognition module raises:
NotFoundError (see above for traceback): Key ner_var_scope/ner_lstm/multi_rnn_cell/cell_0/basic_lstm_cell/bias not found in checkpoint
i.e. the bias parameter is missing.
When TensorFlow loads, it prints:
Not found: Key ner_var_scope/ner_lstm/multi_rnn_cell/cell_0/basic_lstm_cell/bias not found in checkpoint
Not found: Key ner_var_scope/ner_lstm/multi_rnn_cell/cell_0/basic_lstm_cell/kernel not found in checkpoint
There is no ner.ckpt file at anaconda3\lib\site-packages\deepnlp\ner\ckpt\zh\ner.ckpt.
The paths recorded in the checkpoint file do not exist:
model_checkpoint_path: "/mnt/python/pypi/deepnlp/deepnlp/ner/ckpt/zh/ner.ckpt"
all_model_checkpoint_paths: "/mnt/python/pypi/deepnlp/deepnlp/ner/ckpt/zh/ner.ckpt"
Does this need to be downloaded separately, and where can I find the file?

Questions about the textsum summarization code

1. Will you upgrade the code from TensorFlow 1.0 to TensorFlow 1.2? I tried to modify your code myself but did not succeed.
2. I am clearly using the GPU build of TensorFlow, so why does the CPU max out and basically freeze my machine while the program runs? What machine configuration did you train on? Mine is an i7-6700HQ with a GTX 1070 and it is nowhere near enough. Which part of the program uses the CPU so heavily?
3. Is your training data 1 MB or 1 GB in size? 1 MB cannot possibly hold one million news articles.
