WoBERT

A Chinese BERT that uses words as its basic unit (Word-based BERT).

Details

https://kexue.fm/archives/7758

Training

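The currently released WoBERT is the Base version. It continues pre-training from RoBERTa-wwm-ext, released by HIT (Harbin Institute of Technology), with MLM as the pre-training task. At initialization, each word is split into characters using BERT's own Tokenizer, and the average of the character embeddings is used as the initial word embedding. The model was trained on a single 24 GB RTX GPU for 1 million steps (about 10 days), with sequence length 512, learning rate 5e-6, batch_size 16, and 16 gradient-accumulation steps, which is equivalent to roughly 60,000 steps at batch_size=256. The training corpus is about 30+ GB of general-domain text.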
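The word-embedding initialization described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual code: the function name `init_word_embeddings`, the toy vocabulary, and the toy embedding matrix are all hypothetical, and splitting a word into single characters stands in for BERT's own Tokenizer.

```python
import numpy as np

def init_word_embeddings(words, char_vocab, char_embeddings, unk_token="[UNK]"):
    """Return one row per word: the mean of its characters' embeddings."""
    unk_id = char_vocab[unk_token]
    rows = []
    for word in words:
        # Assumption: per-character splitting approximates BERT's Tokenizer
        # for Chinese words; unknown characters map to [UNK].
        char_ids = [char_vocab.get(ch, unk_id) for ch in word]
        rows.append(char_embeddings[char_ids].mean(axis=0))
    return np.stack(rows)

# Toy character vocabulary and a 5 x 4 embedding matrix (illustrative values only).
char_vocab = {"[UNK]": 0, "北": 1, "京": 2, "大": 3, "学": 4}
char_embeddings = np.arange(20, dtype=np.float32).reshape(5, 4)

word_embeddings = init_word_embeddings(
    ["北京", "大学", "北京大学"], char_vocab, char_embeddings
)
print(word_embeddings.shape)  # one 4-dimensional vector per word
```

In a real setup the resulting rows would be appended to (or replace part of) the pretrained embedding table before MLM training continues; how the released checkpoint does this exactly is not stated here.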

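In addition, we provide WoNEZHA, which is further pre-trained from Huawei's open-source NEZHA; the training details are essentially the same as WoBERT's. NEZHA's architecture is similar to BERT's, except that it uses relative position encoding whereas BERT uses absolute position encoding, so in theory there is no upper limit on the text length NEZHA can handle. The word-based WoNEZHA is offered here simply to give everyone one more option.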
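To see why relative position encoding does not impose a length limit, consider a generic sketch of clipped relative-position indexing. This is not NEZHA's exact implementation: the function name and the `max_distance` value are assumptions chosen for illustration. The point is that the index depends only on the offset `j - i`, so the same small table covers sequences of any length, whereas an absolute-position table needs one row per position.

```python
import numpy as np

def relative_position_ids(seq_len, max_distance=64):
    """Index matrix where entry [i, j] encodes the clipped offset j - i."""
    positions = np.arange(seq_len)
    rel = positions[None, :] - positions[:, None]     # rel[i, j] = j - i
    rel = np.clip(rel, -max_distance, max_distance)   # far offsets share one bucket
    return rel + max_distance                         # shift into [0, 2 * max_distance]

small = relative_position_ids(4, max_distance=2)
long_ids = relative_position_ids(1000)  # longer than a 512-row absolute table
print(small)
print(long_ids.min(), long_ids.max())
```

With `max_distance=64`, only 129 distinct indices are ever produced, no matter how long the input is.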

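March 3, 2021: Added the WoBERT Plus model. It starts from RoBERTa-wwm-ext and continues Chinese MLM pre-training with a rebuilt vocabulary (more complete than that of the previously released WoBERT), on 30+ GB of corpus with maxlen=512, batch_size=256, and lr=1e-5, for 250,000 steps (4 * TITAN RTX, 4 gradient-accumulation steps, 4 times the training of the earlier WoBERT). Every 1000 steps took about 1580 s, for 18 days of training in total; training accuracy was about 64% and training loss about 1.80.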
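The training figures quoted for the two models can be cross-checked with a little arithmetic. The sketch below uses only the numbers given in the text, plus one labeled assumption: that the 1580 s timing is per 1000 forward/backward passes (before gradient accumulation), which is the reading that reproduces the stated 18 days.

```python
# Original WoBERT: batch_size 16, 16 accumulation steps, 1,000,000 passes.
wobert_effective_batch = 16 * 16             # samples per parameter update
wobert_updates = 1_000_000 // 16             # updates at the effective batch size

# WoBERT Plus: 250,000 updates at batch_size=256, 4 accumulation steps.
plus_updates = 250_000
plus_accumulation = 4
seconds_per_1000 = 1580

# Assumption: the timing refers to passes, not parameter updates.
plus_passes = plus_updates * plus_accumulation
plus_days = plus_passes / 1000 * seconds_per_1000 / 86_400

ratio = plus_updates / wobert_updates        # training amount relative to WoBERT

print(wobert_effective_batch, wobert_updates)
print(round(plus_days, 1), ratio)
```

Under that assumption the numbers are consistent with one another: the original WoBERT ran about 62,500 updates at an effective batch size of 256 ("roughly 60,000"), and WoBERT Plus ran four times as many over about 18.3 days.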

Dependencies

pip install bert4keras==0.8.8

Download

Evaluation

Model         IFLYTEK   TNEWS
BERT          60.31     56.94
WoBERT        61.15     57.05
WoBERT Plus   61.92     58.20

Citation

Bibtex:

@techreport{zhuiyiwobert,
  title={WoBERT: Word-based Chinese BERT model - ZhuiyiAI},
  author={Jianlin Su},
  year={2020},
  url="https://github.com/ZhuiyiTechnology/WoBERT",
}

Contact

Email: [email protected] Zhuiyi Technology: https://zhuiyi.ai
