dekusi2018 / phrasesearch

This project forked from cloudxlab/phrasesearch

You can use phrasesearch to search for a huge list of phrases in a huge set of documents.

Scala 17.15% Python 25.08% CSS 9.95% XSLT 47.81%

phrasesearch's Introduction

Say you have a huge list of phrases and you want to find every occurrence of each phrase in a huge dataset.

Phrases:

id phrase
1 pot boiler
2 hot cake
3 cold turkey
4 to be or not to be
5 ice ice baby

Content:

id text
1 Earlier the work he did was a hot cake but once he quit cold turkey he never received a call from the publishers boiler
2 The question to be or not to for him was to be or not to be on look out for pot boiler cold
3 ice baby ice
4 ice ice ice baby baby ice ice baby

Result:

(Phrase Id, Content Id) Result (0-based word positions in Content where the phrase starts)
(4, 2) [10]
(5, 4) [1, 5]
(2, 1) [7]
(3, 1) [13]
(1, 2) [20]
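
The example above can be reproduced with a small, single-machine sketch. This is not the code from the python or scala folders; it is a minimal illustration that assumes whitespace tokenization, case-sensitive matching, and 0-based word positions. The names `find_phrases`, `phrases`, and `contents` are hypothetical.

```python
from collections import defaultdict


def find_phrases(phrases, contents):
    """Return {(phrase_id, content_id): [0-based start positions]}.

    Assumptions (not taken from the repository): words are split on
    whitespace and compared case-sensitively.
    """
    # Index phrases by their first word so that each content position
    # is only compared against phrases that could start there.
    by_first_word = defaultdict(list)
    for phrase_id, phrase in phrases.items():
        words = phrase.split()
        if words:
            by_first_word[words[0]].append((phrase_id, words))

    results = defaultdict(list)
    for content_id, text in contents.items():
        tokens = text.split()
        for pos, token in enumerate(tokens):
            for phrase_id, words in by_first_word.get(token, ()):
                # A match requires the whole phrase to appear contiguously.
                if tokens[pos:pos + len(words)] == words:
                    results[(phrase_id, content_id)].append(pos)
    return dict(results)


phrases = {
    1: "pot boiler",
    2: "hot cake",
    3: "cold turkey",
    4: "to be or not to be",
    5: "ice ice baby",
}

contents = {
    1: "Earlier the work he did was a hot cake but once he quit cold turkey "
       "he never received a call from the publishers boiler",
    2: "The question to be or not to for him was to be or not to be "
       "on look out for pot boiler cold",
    3: "ice baby ice",
    4: "ice ice ice baby baby ice ice baby",
}

matches = find_phrases(phrases, contents)
for key in sorted(matches):
    print(key, matches[key])
```

Content 3 produces no entry because "ice baby ice" never contains the words of phrase 5 in order, and overlapping occurrences (as in content 4) are each reported.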

The code is available in both Python and Scala, in their respective folders.

Note:

The work is still under heavy development. The code is not yet modular. If you want to use it right now, you might have to fiddle with the code.

TODO:

  • Make Modular
  • Write comments
  • Remove the actual phrase from key

phrasesearch's People

Contributors

sandeepcxl
