codeplea / ahocorasickphp

Aho-Corasick multi-keyword string searching library in PHP.

License: zlib License

Aho Corasick in PHP

This is a small library which implements the Aho-Corasick string search algorithm.

It's coded in pure PHP and self-contained in a single file, ahocorasick.php.

It's useful when you want to search for many keywords all at once. It's faster than simply calling strpos many times, and it's much faster than calling preg_match_all with something like /keyword1|keyword2|...|keywordn/.
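
For comparison, here is a minimal sketch of the "call strpos many times" baseline. The helper name find_all_strpos() is hypothetical and is not part of ahocorasick.php. It rescans the whole text once per keyword, so the work grows roughly with the number of keywords times the length of the text.

```php
<?php
// Naive baseline: scan the text once per keyword with strpos().
// find_all_strpos() is a hypothetical helper, not part of ahocorasick.php.
function find_all_strpos(array $needles, string $haystack): array
{
    $found = [];
    foreach ($needles as $needle) {
        if ($needle === '') {
            continue; // skip empty keywords
        }
        $offset = 0;
        // Advance one byte past each hit so overlapping matches are reported.
        while (($pos = strpos($haystack, $needle, $offset)) !== false) {
            $found[] = [$needle, $pos];
            $offset = $pos + 1;
        }
    }
    // Order the hits by position in the text.
    usort($found, function ($a, $b) {
        return $a[1] <=> $b[1];
    });
    return $found;
}

$found = find_all_strpos(['art', 'cart', 'ted'], 'a carted mart lot one blue ted');
print_r($found);
```

With three keywords this is perfectly adequate; the cost only becomes noticeable when the keyword list grows into the thousands.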

I originally wrote this to use with F5Bot, since it's searching for the same set of a few thousand keywords over and over again.

Usage

It's designed to be really easy to use. You create the ahocorasick object, add your keywords, call finalize() to finish setup, and then search your text. It'll return an array of the keywords found and their positions in the search text.

Create, add keywords, and finalize():

require('ahocorasick.php');

$ac = new ahocorasick();

$ac->add_needle('art');
$ac->add_needle('cart');
$ac->add_needle('ted');

$ac->finalize();

Call search() to perform the actual search. It'll return an array of matches.

$found = $ac->search('a carted mart lot one blue ted');
print_r($found);

$found will be an array with these elements, each holding a keyword and the zero-based offset where it starts in the text:

[0] => Array
    (
        [0] => cart
        [1] => 2
    )
[1] => Array
    (
        [0] => art
        [1] => 3
    )
[2] => Array
    (
        [0] => ted
        [1] => 5
    )
[3] => Array
    (
        [0] => art
        [1] => 10
    )
[4] => Array
    (
        [0] => ted
        [1] => 27
    )

See example.php for a complete example.
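
Since each match is a plain [keyword, offset] pair, post-processing is ordinary array handling. As a small sketch, the snippet below groups the offsets by keyword. It hard-codes the sample result shown above so that it runs without the library, and $by_keyword is just an illustrative name.

```php
<?php
// Sample result copied from the usage example: [keyword, offset] pairs.
$found = [
    ['cart', 2],
    ['art', 3],
    ['ted', 5],
    ['art', 10],
    ['ted', 27],
];

// Group the offsets by keyword. $by_keyword is an illustrative name.
$by_keyword = [];
foreach ($found as list($keyword, $offset)) {
    $by_keyword[$keyword][] = $offset;
}

foreach ($by_keyword as $keyword => $offsets) {
    echo $keyword, ': ', implode(', ', $offsets), "\n";
}
// prints:
// cart: 2
// art: 3, 10
// ted: 5, 27
```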

Speed

A simple benchmarking program is included which compares various alternatives.

$ php benchmark.php
Loaded 3000 keywords to search on a text of 19377 characters.

Searching with strpos...
time: 0.38440799713135

Searching with preg_match...
time: 5.6817619800568

Searching with preg_match_all...
time: 5.0735609531403

Searching with aho corasick...
time: 0.054709911346436

Note: the regex solutions are actually slightly broken. They won't work if you have a keyword that is a prefix or suffix of another. But hey, who really uses regex when it's not slightly broken?
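
To make that limitation concrete, here is a short sketch that runs the three keywords from the usage example through preg_match_all. The regex engine resumes scanning after the end of each match, so a keyword that sits inside or overlaps an earlier match is never reported.

```php
<?php
$text = 'a carted mart lot one blue ted';

// Alternation of the three keywords from the usage example.
preg_match_all('/art|cart|ted/', $text, $m, PREG_OFFSET_CAPTURE);

// Each entry of $m[0] is [matched text, byte offset].
foreach ($m[0] as $hit) {
    echo $hit[0], ' @ ', $hit[1], "\n";
}
// prints:
// cart @ 2
// art @ 10
// ted @ 27
```

Only three of the five matches are reported: "art" at offset 3 lies inside "cart", and "ted" at offset 5 overlaps its final "t".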

Also keep in mind that building the search tree (the add_needle() and finalize() calls) takes time. So you'll get the best speed-up if you're reusing the same keywords and calling search() many times.
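
To illustrate where the setup time goes and why reuse pays off, here is a minimal sketch of an Aho-Corasick automaton. It is not the code in ahocorasick.php; the class MiniAC and its methods add(), build(), and search() are hypothetical names chosen for this example. Building the trie and its failure links is done once, after which each search is a single pass over the text.

```php
<?php
// Minimal Aho-Corasick sketch. MiniAC and its method names are hypothetical;
// this is not the implementation found in ahocorasick.php.
class MiniAC
{
    private $next = [[]]; // $next[state][char] => state (trie edges)
    private $fail = [0];  // failure link for each state
    private $out  = [[]]; // keywords that end at each state

    // Insert one keyword into the trie.
    public function add(string $word): void
    {
        $s = 0;
        for ($i = 0, $n = strlen($word); $i < $n; $i++) {
            $c = $word[$i];
            if (!isset($this->next[$s][$c])) {
                $this->next[] = [];
                $this->fail[] = 0;
                $this->out[]  = [];
                $this->next[$s][$c] = count($this->next) - 1;
            }
            $s = $this->next[$s][$c];
        }
        $this->out[$s][] = $word;
    }

    // Breadth-first pass that computes failure links. This is the setup cost.
    public function build(): void
    {
        $queue = array_values($this->next[0]); // depth-1 states fail to the root
        for ($head = 0; $head < count($queue); $head++) {
            $s = $queue[$head];
            foreach ($this->next[$s] as $c => $t) {
                $f = $this->fail[$s];
                while ($f !== 0 && !isset($this->next[$f][$c])) {
                    $f = $this->fail[$f];
                }
                $this->fail[$t] = $this->next[$f][$c] ?? 0;
                // Inherit keywords that end at the failure target.
                $this->out[$t] = array_merge($this->out[$t], $this->out[$this->fail[$t]]);
                $queue[] = $t;
            }
        }
    }

    // Single pass over the text, returning [keyword, start offset] pairs.
    public function search(string $text): array
    {
        $found = [];
        $s = 0;
        for ($i = 0, $n = strlen($text); $i < $n; $i++) {
            $c = $text[$i];
            while ($s !== 0 && !isset($this->next[$s][$c])) {
                $s = $this->fail[$s];
            }
            $s = $this->next[$s][$c] ?? 0;
            foreach ($this->out[$s] as $word) {
                $found[] = [$word, $i - strlen($word) + 1];
            }
        }
        return $found;
    }
}

$ac = new MiniAC();
foreach (['art', 'cart', 'ted'] as $word) {
    $ac->add($word);
}
$ac->build(); // pay the setup cost once

// Reuse the same automaton for every text.
$texts = ['a carted mart lot one blue ted', 'started'];
$results = array_map([$ac, 'search'], $texts);
print_r($results);
```

The second text costs nothing extra in setup, which is why a fixed keyword list searched over many texts is the best case for this approach.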

ahocorasickphp's Issues

Update your utility

Update the utility to follow the PSR standards, use PHPUnit for tests, and add Composer support.
