Comments (2)
For English, the tokens are the same.
But for Chinese, the tokens are different when I use the same run_classifier.py.
Using https://github.com/google-research/bert:
INFO:tensorflow:*** Example ***
INFO:tensorflow:guid: train-5
INFO:tensorflow:tokens: [CLS] 1 。 我 住 的 是 靠 马 路 的 标 准 间 。 房 间 内 设 施 简 陋 , 并 且 的 房 间 玻 璃 窗 户 外 还 有 一 层 幕 墙 玻 璃 , 而 且 不 能 打 开 , 导 致 房 间 不 能 自 然 通 风 , 采 光 不 好 。 [SEP]
Using your project:
INFO:tensorflow:*** Example ***
INFO:tensorflow:guid: train-5
INFO:tensorflow:tokens: [CLS] 1 。 我 ##住 ##的 ##是 ##靠 ##马 ##路 ##的 ##标 ##准 ##间 。 房 ##间 ##内 ##设 ##施 ##简 ##陋 , 并 ##且 ##的 ##房 ##间 ##玻 ##璃 ##窗 ##户 ##外 ##还 ##有 ##一 ##层 ##幕 ##墙 ##玻 ##璃 , 而 ##且 ##不 ##能 ##打 ##开 , 导 ##致 ##房 ##间 ##不 ##能 ##自 ##然 ##通 ##风 , 采 ##光 ##不 ##好 。 [SEP]
Hi, this tutorial was written for English projects, which use a different model and vocab. If you are working with Chinese data, please see BERT Chinese and BERT Multilingual; BERT Chinese is the better choice if you only have Chinese data. Chinese tokenisation is quite different.
from bert-classification-tutorial.
Yeah, I am using BERT Chinese. It seems they changed the tokenize function on Nov. 1, using ' ' instead of '#'.
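To illustrate the difference between the two logs, here is a rough sketch of what the change mentioned above appears to do: put whitespace around each Chinese character before WordPiece runs, so every character becomes its own token instead of a "##" continuation. This is not the code from either repository. The helper names (`is_cjk_char`, `space_out_cjk`, `wordpiece`), the toy vocab, and the single Unicode range checked are all assumptions made for the example.

```python
def is_cjk_char(ch: str) -> bool:
    # Assumption: only the basic CJK Unified Ideographs block is checked here;
    # a real tokenizer would likely cover more ranges.
    return 0x4E00 <= ord(ch) <= 0x9FFF


def space_out_cjk(text: str) -> str:
    # Surround every CJK character with spaces so that a later
    # whitespace split treats each character as a separate word.
    return "".join(f" {ch} " if is_cjk_char(ch) else ch for ch in text)


def wordpiece(token: str, vocab: set) -> list:
    # Toy greedy longest-match-first WordPiece: pieces that do not start
    # the word are looked up with a "##" prefix.
    pieces, start = [], 0
    while start < len(token):
        end, match = len(token), None
        while start < end:
            sub = token[start:end]
            if start > 0:
                sub = "##" + sub
            if sub in vocab:
                match = sub
                break
            end -= 1
        if match is None:
            return ["[UNK]"]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text: str, vocab: set, split_cjk: bool) -> list:
    if split_cjk:
        text = space_out_cjk(text)
    tokens = []
    for word in text.split():
        tokens.extend(wordpiece(word, vocab))
    return tokens


# Hypothetical toy vocab containing both whole-character and "##" entries.
toy_vocab = {"我", "住", "的", "##住", "##的"}

print(tokenize("我住的", toy_vocab, split_cjk=False))  # ['我', '##住', '##的']
print(tokenize("我住的", toy_vocab, split_cjk=True))   # ['我', '住', '的']
```

Without the extra spacing, a run of Chinese characters is treated as one word and split into "##" pieces, which matches the second log. With it, each character stands alone, which matches the first log.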