# Tfidf

An Elixir implementation of tf-idf.

Based on the blog post by Steven Loria

## What is tf-idf?

tf–idf, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval and text mining.

tf-idf on Wikipedia

## Installation

Add `:tfidf` to the dependencies in your `mix.exs`:

defp deps do
  [{:tfidf, "~> 0.1.0"}]
end

## Usage

Tfidf.calculate(word, text, corpus, tokenize_fn \\ &tokenize(&1))

Calculates the tf-idf score of a given word in a text, relative to a corpus given as a list of texts.

iex> Tfidf.calculate("dog", "nice dog dog", ["dog hat", "dog", "cat mat", "duck"])
0.19178804830118723

An optional tokenizer function can be passed as the last argument to replace the default tokenizer:

iex> Tfidf.calculate("dog", "nice,dog,dog", ["dog,hat", "dog", "cat,mat", "duck"], &String.split(&1, ","))
0.19178804830118723
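For text with mixed case or punctuation, a custom tokenizer could normalise the input before splitting. The sketch below is only illustrative: the `tokenize` name and the regular expression are assumptions, not part of the library, and it makes no claim about how the default tokenizer behaves.

```elixir
# Hypothetical tokenizer, not part of the Tfidf library: lowercases the text,
# then splits on any run of characters that are not letters or digits.
tokenize = fn text ->
  text
  |> String.downcase()
  |> String.split(~r/[^\p{L}\p{N}]+/u, trim: true)
end

tokenize.("Nice dog, DOG!")
# returns ["nice", "dog", "dog"]
```

Because it takes one string and returns a list of tokens, a function like this should be usable as the last argument, for example `Tfidf.calculate("dog", "Nice dog, DOG!", corpus, tokenize)`.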

=====

Tfidf.calculate(word, tokenized_text, corpus)

Calculates the tf-idf score of a given word in a pre-tokenized text (a list of tokens), relative to a corpus of pre-tokenized texts.

iex> Tfidf.calculate("dog", ["nice", "dog", "dog"], [["dog", "hat"], ["dog"], ["cat", "mat"], ["duck"]])
0.19178804830118723
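The pre-tokenized form makes the arithmetic easy to trace by hand. The documented outputs are consistent with a term frequency of `count / token_count` and an inverse document frequency of `ln(corpus_size / (1 + docs_containing_word))`. That formulation is inferred from the example values rather than taken from the library's source, so treat the following as a minimal sketch under that assumption. The `TfidfSketch` module and its function names are hypothetical.

```elixir
defmodule TfidfSketch do
  # Hypothetical re-derivation of the scores shown in this README;
  # not the library's actual implementation.

  # Share of tokens in the text that equal the word.
  def tf(word, tokens) do
    Enum.count(tokens, &(&1 == word)) / length(tokens)
  end

  # Natural log of corpus size over (1 + number of documents containing the word).
  def idf(word, corpus) do
    containing = Enum.count(corpus, &(word in &1))
    :math.log(length(corpus) / (1 + containing))
  end

  def tfidf(word, tokens, corpus), do: tf(word, tokens) * idf(word, corpus)
end

corpus = [["dog", "hat"], ["dog"], ["cat", "mat"], ["duck"]]

# tf = 2/3, idf = ln(4 / (1 + 2)), so the product is approximately 0.1918.
TfidfSketch.tfidf("dog", ["nice", "dog", "dog"], corpus)
```

Under this formulation, a word that appears in no corpus document gets the largest idf, which is why "nice" outscores "dog" in the `calculate_all` examples.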

=====

Tfidf.calculate_all(text, corpus, tokenize_fn \\ &tokenize(&1))

Calculates the tf-idf for every word in a given text and returns a list of {word, score} tuples.

iex> Tfidf.calculate_all("nice dog", ["dog hat", "dog", "cat mat", "duck"])
[{"nice", 0.6931471805599453}, {"dog", 0.14384103622589042}]

As with Tfidf.calculate/4, an optional tokenizer function can be passed as the last argument; it is used in place of the default tokenizer.

iex> Tfidf.calculate_all("nice,dog", ["dog,hat", "dog", "cat,mat", "duck"], &String.split(&1, ","))
[{"nice", 0.6931471805599453}, {"dog", 0.14384103622589042}]
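A common next step is to rank the words of a text by score. The sketch below works on the list of `{word, score}` tuples exactly as documented above. The variable names are illustrative, and `Enum.sort_by/3` with `:desc` assumes Elixir 1.10 or later.

```elixir
# Scores copied from the documented output of Tfidf.calculate_all/2.
scores = [{"nice", 0.6931471805599453}, {"dog", 0.14384103622589042}]

# Sort by score, highest first, and keep only the top word.
top_words =
  scores
  |> Enum.sort_by(fn {_word, score} -> score end, :desc)
  |> Enum.take(1)
  |> Enum.map(fn {word, _score} -> word end)

# top_words is ["nice"]
```

Changing the argument to `Enum.take/2` controls how many of the highest-scoring words are kept.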

## Contributors

lowks, ocannings
