Comments (10)
Same question here. Being able to expand the dictionary is important.
from sego.
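The dictionary is copied directly from jieba. You can simply add new words and their frequencies to it; the frequencies can be obtained by counting occurrences in your own corpus.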
@huichen Could you explain what each column of the dictionary means? I know the first one is the word, but I'm not sure about the others.
The three columns are: the word, its frequency in the training corpus, and its part of speech.
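As a rough illustration of that three-column layout, here is a minimal Go sketch that parses one dictionary line. It assumes the columns are separated by whitespace, as in jieba's dictionary file. The `Entry` type and `ParseEntry` function are hypothetical names made up for this example; they are not part of sego's API.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Entry is a hypothetical type for one dictionary line (not part of sego).
type Entry struct {
	Word string // the word itself
	Freq int    // number of occurrences in the training corpus
	POS  string // part-of-speech tag
}

// ParseEntry splits a line into word, frequency and part-of-speech tag.
// Assumption: the three columns are separated by whitespace.
func ParseEntry(line string) (Entry, error) {
	fields := strings.Fields(line)
	if len(fields) != 3 {
		return Entry{}, fmt.Errorf("expected 3 columns, got %d", len(fields))
	}
	freq, err := strconv.Atoi(fields[1])
	if err != nil {
		return Entry{}, fmt.Errorf("bad frequency %q: %w", fields[1], err)
	}
	return Entry{Word: fields[0], Freq: freq, POS: fields[2]}, nil
}
```

Calling `ParseEntry` on a line holding a word, an integer count and a tag fills the three fields in that order; a line with a missing column or a non-numeric count returns an error.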
Is there a formula for the word frequency? How do I obtain it?
@phproot It is simply a count of how many times the word occurs in the corpus.
@huichen Where is the corpus? Can I build my own corpus from a large amount of data? And does sego have a feature for adding new words, like jieba does?
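You can take documents similar to the ones you are indexing and use them as a corpus, then merge the dictionary you generate from them with the one provided here.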
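The thread does not say how the two dictionaries should be merged, so the following Go sketch is only one possible reading. It assumes that frequencies for words present in both dictionaries are summed and that words found only in the custom corpus are added as new entries. `MergeFreqs` is a hypothetical helper, not a sego function.

```go
package main

// MergeFreqs combines word counts from a base dictionary with counts
// gathered from a custom corpus and returns a new map.
// Assumption: counts for words present in both maps are summed; the
// thread does not specify a merge rule, so adjust this to your needs.
func MergeFreqs(base, custom map[string]int) map[string]int {
	merged := make(map[string]int, len(base)+len(custom))
	for word, freq := range base {
		merged[word] = freq
	}
	for word, freq := range custom {
		merged[word] += freq // adds a new entry when word is not in base
	}
	return merged
}
```

Summing is reasonable only if the two corpora are of comparable size; otherwise you may prefer to scale the custom counts first or let them override the base values.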
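So, say my article database has 100,000 entries. I treat those articles as the corpus and generate a dictionary from them, right? Do I use the mlf you developed to generate it?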
@phproot No, not mlf. Just run simple text matching over your corpus and count the matches.
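The counting by text matching described above could be sketched in Go roughly as follows. This assumes the corpus is already loaded as a slice of strings and that you have a list of candidate words to count. `CountWords` is a hypothetical helper, not part of sego.

```go
package main

import "strings"

// CountWords returns, for each candidate word, the total number of
// non-overlapping occurrences across all corpus documents.
// Assumption: plain substring matching is good enough for your corpus.
func CountWords(docs []string, words []string) map[string]int {
	counts := make(map[string]int, len(words))
	for _, w := range words {
		if w == "" {
			continue // strings.Count treats "" specially, so skip it
		}
		for _, d := range docs {
			counts[w] += strings.Count(d, w)
		}
	}
	return counts
}
```

The resulting counts can serve as the frequency column when adding new words to the dictionary. Note that plain substring matching also counts a word when it appears inside a longer word.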