xiaohuangniu / analysis

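The motivation for this plugin is simple: I had been using the PHPAnalysis plugin, whose accuracy was poor and whose performance and efficiency left much to be desired, so I wrote this mini word-segmentation plugin myself.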

PHP 100.00%

analysis's Introduction

A mini word-segmentation plugin implemented in PHP 5.4

小黄牛

Requirements

  • Tested only on Apache 2.4 and PHP 5.4.

  • Lexicon files are read with file_get_contents(). Keeping each lexicon file under 500 KB is recommended for the best performance.
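
As a rough illustration of the size recommendation above, the sketch below reads one lexicon file with file_get_contents() and flags it when it exceeds 500 KB. The helper name read_lexicon_checked and its return shape are hypothetical and not part of Minppl; 500 KB is assumed to mean 512000 bytes.

```php
<?php
// Hypothetical helper, not part of Minppl: read one lexicon file and
// report whether it stays under the suggested 500 KB limit.
function read_lexicon_checked($path, $maxBytes = 512000)
{
    // Refuse paths that are missing or unreadable instead of raising a warning.
    if (!is_file($path) || !is_readable($path)) {
        return false;
    }

    $raw = file_get_contents($path);
    if ($raw === false) {
        return false;
    }

    $bytes = strlen($raw); // strlen() counts bytes, which is what matters here

    return array(
        'bytes'     => $bytes,
        'oversized' => $bytes > $maxBytes,
        'content'   => $raw,
    );
}
```

A caller could log a warning, or split the lexicon into several smaller files, whenever 'oversized' is true.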

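Plugin details

  • 1. The plugin relies mainly on lexicon (dictionary) lookups, and several lexicons can be configured at once. Lexicon files are stored in the 'minppl/lexicon/' folder as '.txt' text files, with words separated by the '|' character, and they must be saved without a BOM header.

  • 2. When no keyword from the lexicons is found, the plugin can fall back to cutting the string by position into pieces of a fixed length.

  • 3. Usage demo: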

require 'minppl/Minppl.class.php';

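# Instantiate the segmentation class
$obj  = new Minppl();
/**
 * Run the segmentation
 * @param string $key     Target string to segment
 * @param array  $lexicon Lexicon files to use, as a one-dimensional array
 * @param bool   $sort    Sort results by word length: true = longest first, false = shortest first; default false
 * @param int    $num     Maximum number of matched words to return, default 5
 * @param bool   $mode    Whether to run the fallback splitting algorithm when the lexicons match no keyword, default true
 * @param int    $words   Keyword length used by the fallback splitting algorithm, default 2
 * @return array|bool     Segmentation results, or false
 */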
$data = $obj->__Initialize('阿杜最爱快乐大本营:快乐家族', [
	'1-mingxing.txt',
	'2-mingxing.txt',
], false, 5, true, 2);
echo '<pre>';
var_dump($data);

# Enable debug mode
$obj->De_bug();
# Print debug output: error messages, run time, and memory usage
$obj->Log_echo();
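
The lexicon format and the fallback described in the plugin details can be sketched roughly as follows. This is a minimal illustration under assumptions, not Minppl's actual implementation: parse_lexicon, match_words, and slice_by_length are hypothetical names, matching is a plain substring search in lexicon order, and the fallback is assumed to cut the string into consecutive fixed-length chunks.

```php
<?php
// Hypothetical helpers for illustration only; Minppl's internals may differ.

// Split raw lexicon text ("word1|word2|...") into a list of words,
// dropping a UTF-8 BOM if the file was saved with one.
function parse_lexicon($raw)
{
    if (substr($raw, 0, 3) === "\xEF\xBB\xBF") {
        $raw = substr($raw, 3);
    }
    $words = array();
    foreach (explode('|', $raw) as $word) {
        $word = trim($word);
        if ($word !== '') {
            $words[] = $word;
        }
    }
    return $words;
}

// Return the lexicon words that occur in $text, in lexicon order,
// stopping once $num distinct matches have been collected.
function match_words($text, array $words, $num = 5)
{
    $hits = array();
    foreach ($words as $word) {
        if (strpos($text, $word) !== false && !in_array($word, $hits, true)) {
            $hits[] = $word;
            if (count($hits) >= $num) {
                break;
            }
        }
    }
    return $hits;
}

// Assumed fallback: cut $text into consecutive chunks of $len characters.
// preg_split with the /u modifier splits on UTF-8 character boundaries.
function slice_by_length($text, $len = 2)
{
    $chars  = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $chunks = array();
    foreach (array_chunk($chars, $len) as $chunk) {
        $chunks[] = implode('', $chunk);
    }
    return $chunks;
}
```

A caller would try match_words() first and use slice_by_length() only when it returns an empty array, which mirrors the role of the $mode and $words parameters in the demo.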

analysis's Issues

Try JieBa

JieBa is the best Chinese word separating tool!

(Running away...)
