Comments (7)
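BERT tokenizes the input automatically, and the resulting tokens can look like ["do", "ing"]. To guard against one word being split into several pieces in the token list, the X tag is added as a safeguard so that the labels and the inputs do not end up with different lengths. For Chinese text it can be regarded as unused.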
from bert-chinese-ner.
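@ProHiryu In principle, BERT's tokenizer should output individual characters, so how could there be extra pieces in the token list? Could you explain? The training text in the demo does not print any such case either.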
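It is mainly used for English. For example, doing is split into do ##ing; the tag is assigned only to the first piece, and the piece that follows gets the tag X.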
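The alignment described in this comment can be sketched roughly as follows. This is only an illustration, not the repository's actual code: `TOY_VOCAB`, `wordpiece`, and `align_labels` are hypothetical names, and the tiny vocabulary stands in for a real BERT `vocab.txt`.

```python
from typing import List, Tuple

# Hypothetical toy vocabulary; a real BERT vocab.txt is far larger.
TOY_VOCAB = {"do", "##ing", "我", "在", "北", "京", "[UNK]"}


def wordpiece(word: str, vocab=TOY_VOCAB) -> List[str]:
    """Greedy longest-match-first split of one word into WordPiece-style tokens."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matched: treat the whole word as unknown.
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces


def align_labels(words: List[str], labels: List[str]) -> Tuple[List[str], List[str]]:
    """Keep each word's tag on its first piece and give later pieces the tag X."""
    tokens: List[str] = []
    aligned: List[str] = []
    for word, label in zip(words, labels):
        for i, piece in enumerate(wordpiece(word)):
            tokens.append(piece)
            aligned.append(label if i == 0 else "X")
    return tokens, aligned


# Mixed Chinese/English input: Chinese characters stay one token each,
# while the English word is split into two pieces.
words = ["我", "在", "北", "京", "doing"]
labels = ["O", "O", "B-LOC", "I-LOC", "O"]
tokens, aligned = align_labels(words, labels)
print(tokens)
print(aligned)
```

With this toy vocabulary, each Chinese character maps to exactly one token and never receives X, while `doing` becomes two tokens and the second one is tagged X, so the token and label lists stay the same length.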
@roberts-sh So the X tag is actually useless in a Chinese setting?
X has no practical meaning for Chinese, @icecity96. The code has been updated.
Chinese text is usually mixed with English, so removing X carries a risk of latent bugs. I suggest adding it back.
@shisi2015 Thanks for the reminder.