mannieg / phpflashtext

This project forked from shdev/phpflashtext

Extract keywords from sentences or replace keywords in sentences. Based on https://github.com/vi3k6i5/flashtext

License: MIT License

Flashtext for PHP

This is a port of the wonderful Python project https://github.com/vi3k6i5/flashtext; see that project for the internals of the algorithm.

The algorithm lets you extract or replace many keywords at once. With 300 keywords that have 5 variants each, a regex approach is already slower than the FlashText approach. With 1000 keywords that have 5 variants each, the regex can't even be built.

In PHP 5.6, the regex approach is really slow. In newer versions it performs better.
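To illustrate why the number of keywords matters so little here, the following is a minimal sketch of trie-based matching. It is not the library's implementation. The function names (`buildTrie`, `extractWithTrie`, `replaceWithTrie`, `isWordChar`) are hypothetical, and the word-boundary rule (ASCII letters, digits and underscore), the case-sensitive comparison and the byte-based scanning are simplifying assumptions. All variants are stored in one character trie, and the sentence is walked from each word start, keeping the longest match, so the work per position is bounded by the longest keyword rather than by the number of keywords.

```php
<?php

// Illustrative sketch only; these helpers are hypothetical and not part of phpflashtext.

/** Build a character trie; '_end_' marks a complete variant and holds its clean name. */
function buildTrie(array $keywords): array
{
    $trie = [];
    foreach ($keywords as $cleanName => $variants) {
        foreach ($variants as $variant) {
            $node = &$trie;
            foreach (str_split($variant) as $char) {
                if (!isset($node[$char])) {
                    $node[$char] = [];
                }
                $node = &$node[$char];
            }
            $node['_end_'] = $cleanName;
            unset($node); // break the reference before the next variant
        }
    }
    return $trie;
}

/** Assumed boundary rule: ASCII letters, digits and underscore belong to a word. */
function isWordChar(string $char): bool
{
    return ctype_alnum($char) || $char === '_';
}

/** Return [cleanName, start, end] for the longest match at each word start. */
function extractWithTrie(array $trie, string $sentence): array
{
    $found = [];
    $length = strlen($sentence);
    $i = 0;
    while ($i < $length) {
        $matchName = null;
        $matchEnd = null;
        // Only try to match where a word begins.
        if ($i === 0 || !isWordChar($sentence[$i - 1])) {
            $node = $trie;
            for ($j = $i; $j < $length && isset($node[$sentence[$j]]); $j++) {
                $node = $node[$sentence[$j]];
                $next = $j + 1;
                // Accept a variant only if it ends on a word boundary.
                if (isset($node['_end_']) && ($next === $length || !isWordChar($sentence[$next]))) {
                    $matchName = $node['_end_'];
                    $matchEnd = $next;
                }
            }
        }
        if ($matchEnd !== null) {
            $found[] = [$matchName, $i, $matchEnd];
            $i = $matchEnd; // continue after the matched span
        } else {
            $i++;
        }
    }
    return $found;
}

/** Rebuild the sentence with every matched span replaced by its clean name. */
function replaceWithTrie(array $trie, string $sentence): string
{
    $result = '';
    $cursor = 0;
    foreach (extractWithTrie($trie, $sentence) as [$name, $start, $end]) {
        $result .= substr($sentence, $cursor, $start - $cursor) . $name;
        $cursor = $end;
    }
    return $result . substr($sentence, $cursor);
}

$trie = buildTrie([
    'java'               => ['java_2e', 'java programing'],
    'product management' => ['product management techniques', 'product management'],
]);
$sentence = 'I know java_2e and product management techniques';

$spans = extractWithTrie($trie, $sentence);
// $spans = [['java', 7, 14], ['product management', 19, 48]]

$replaced = replaceWithTrie($trie, $sentence);
// $replaced = 'I know java and product management'
```

Adding more keywords only grows the trie; the scanning loop itself stays the same, which is the property a large regex alternation lacks.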

Install

composer require shdev/phpflashtext

Usage

<?php

use Shdev\FlashText\KeywordProcessor;

$keywordProcessor = new KeywordProcessor();

$keywords = [
	'java'               => ['java_2e', 'java programing'],
	'product management' => ['product management techniques', 'product management'],
];

$keywordProcessor->addKeywordsFromAssocArray($keywords);

$sentence = 'I know java_2e and product management techniques';

$keywordsExtracted = $keywordProcessor->extractKeywords($sentence);
// $keywordsExtracted = ['java', 'product management']

$keywordsExtractedWithSpanInfo = $keywordProcessor->extractKeywords($sentence, true);
// $keywordsExtractedWithSpanInfo = [
//     ['java', 7, 14],
//     ['product management', 19, 48],
// ]


$sentenceNew = $keywordProcessor->replaceKeywords($sentence);
// $sentenceNew = 'I know java and product management';

Citation

The original paper on the FlashText algorithm:

    @ARTICLE{2017arXiv171100046S,
       author = {{Singh}, V.},
        title = "{Replace or Retrieve Keywords In Documents at Scale}",
      journal = {ArXiv e-prints},
    archivePrefix = "arXiv",
       eprint = {1711.00046},
     primaryClass = "cs.DS",
     keywords = {Computer Science - Data Structures and Algorithms},
         year = 2017,
        month = oct,
       adsurl = {http://adsabs.harvard.edu/abs/2017arXiv171100046S},
      adsnote = {Provided by the SAO/NASA Astrophysics Data System}
    }

An article on the algorithm was also published on Medium by freeCodeCamp.
