Giter Site home page Giter Site logo

sadit / textsearch.jl Goto Github PK

View Code? Open in Web Editor NEW
8.0 2.0 5.0 2.68 MB

Searching methods and models for textual data; it was designed to work with SimilaritySearch.jl

License: MIT License

Julia 100.00%
text-search similarity-search textmodel term-weighting-schemes vector-representations

textsearch.jl's Introduction

Dev Build Status Coverage

TextSearch

TextSearch.jl is a package to create vector representations of text, mostly, independently of the language. It is intended to be used with SimilaritySearch.jl, but can be used independetly if needed. TextSearch.jl was renamed from TextModel.jl to reflect its capabilities and mission.

For generic text analysis you should use other packages like TextAnalysis.jl.

It supports a number of simple text preprocessing functions, and three different kinds of tokenizers, i.e., word n-grams, character q-grams, and skip-grams. It supports creating multisets of tokens, commonly named bag of words (BOW). TextSearch.jl can produce sparse vector representations based on term-weighting schemes like TF, IDF, and TFIDF. It also supports term-weighting schemes designed to cope text classification tasks, mostly based on distributional representations.

Installing

You may install the package as follows

] add TextSearch

also, you can run the set of tests as follows

] test TextSearch

Using the library

The directory examples contains a few examples of how to use it, based on Pluto.jl

After cloning the repository, you must intantiate the directory.

using Pkg
pkg"instantiate"

once you instantiated your environment, just run Pluto notebook and explore the examples

using Pluto
Pluto.run()

textsearch.jl's People

Contributors

github-actions[bot] avatar juliatagbot avatar kyriox avatar sadit avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

textsearch.jl's Issues

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Info about upcoming removal of packages in the General registry

As described in https://discourse.julialang.org/t/ann-plans-for-removing-packages-that-do-not-yet-support-1-0-from-the-general-registry/ we are planning on removing packages that do not support 1.0 from the General registry. This package has been detected to not support 1.0 and is thus slated to be removed. The removal of packages from the registry will happen approximately a month after this issue is open.

To transition to the new Pkg system using Project.toml, see https://github.com/JuliaRegistries/Registrator.jl#transitioning-from-require-to-projecttoml.
To then tag a new version of the package, see https://github.com/JuliaRegistries/Registrator.jl#via-the-github-app.

If you believe this package has erroneously been detected as not supporting 1.0 or have any other questions, don't hesitate to discuss it here or in the thread linked at the top of this post.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.