nlp's Introduction

Cosine - This module makes it easy to evaluate cosine similarity across a set of text messages. It is a helper that invokes the NLTK library for cosine similarity and simplifies its use by:

  1. Accepting a list of text messages as a Python list and performing vectorization internally
  2. Allowing some preprocessing such as stemming and lemmatization

To compute cosine similarity for a set of text messages, instantiate the class Cosine and invoke the compute_similarity method with the list of text messages as input.
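As a rough illustration of what vectorization followed by pairwise cosine similarity involves, here is a minimal standard-library sketch. It is not the module's implementation: the function names (bag_of_words, cosine_similarity, pairwise_similarity) are hypothetical, it skips stemming and lemmatization, and it returns plain cosine values rather than the tuples produced by compute_similarity.

```python
import math
from collections import Counter


def bag_of_words(message):
    """Turn a message into a word-count vector (lowercased, whitespace-split)."""
    return Counter(message.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    # Only words present in both messages contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch when a message is empty
    return dot / (norm_a * norm_b)


def pairwise_similarity(messages):
    """Return an n x n list of lists of cosine similarities."""
    vectors = [bag_of_words(m) for m in messages]
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


sample = ["the cat sat", "the cat ran", "dogs bark loudly"]
for row in pairwise_similarity(sample):
    print("\t".join("%.3f" % x for x in row))
```

Messages that share no words score 0, and each message scores 1 against itself, which mirrors the diagonal of the matrix described under Outputs.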

API:

values = compute_similarity(messages, stem=True, lemm=True)

Inputs:

messages - a Python list where each element is a text message

stem - whether to apply stemming to the input; defaults to True

lemm - whether to apply lemmatization to the input; defaults to True

Outputs:

values - this is an n x n matrix of similarity measures, implemented as a list of Python lists. Each element of an inner list is a 2-element Python tuple of the form (angle, cosine value), so values[i][j] describes the similarity between messages i and j.

Example of values for input containing 4 messages:

[
  [(-2.2204460492503131e-16, 1.0), (1.0, 0.5403023058681398), (0.71132486540518713, 0.7574976186657894), (1.0, 0.5403023058681398)],
  [(1.0, 0.5403023058681398), (0.0, 1.0), (0.29289321881345254, 0.9574125437190454), (0.0, 1.0)],
  [(0.71132486540518713, 0.7574976186657894), (0.29289321881345254, 0.9574125437190454), (2.2204460492503131e-16, 1.0), (0.29289321881345254, 0.9574125437190454)],
  [(1.0, 0.5403023058681398), (0.0, 1.0), (0.29289321881345254, 0.9574125437190454), (0.0, 1.0)]
]
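To make the output structure more concrete, the sketch below takes the example values above, extracts just the cosine values, and finds the most similar pair of distinct messages. The helpers cosine_only and most_similar_pair are hypothetical names introduced for illustration and are not part of the module. The final loop relies on an observation about this example only: each second element matches math.cos applied to the first.

```python
import math

# The example output above for 4 messages: each entry is (angle, cosine value).
values = [
    [(-2.2204460492503131e-16, 1.0), (1.0, 0.5403023058681398),
     (0.71132486540518713, 0.7574976186657894), (1.0, 0.5403023058681398)],
    [(1.0, 0.5403023058681398), (0.0, 1.0),
     (0.29289321881345254, 0.9574125437190454), (0.0, 1.0)],
    [(0.71132486540518713, 0.7574976186657894), (0.29289321881345254, 0.9574125437190454),
     (2.2204460492503131e-16, 1.0), (0.29289321881345254, 0.9574125437190454)],
    [(1.0, 0.5403023058681398), (0.0, 1.0),
     (0.29289321881345254, 0.9574125437190454), (0.0, 1.0)],
]


def cosine_only(values):
    """Drop the first tuple element, leaving an n x n matrix of cosine values."""
    return [[cos for (_angle, cos) in row] for row in values]


def most_similar_pair(values):
    """Return (i, j, cosine) for the pair i < j with the highest cosine value."""
    best = None
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            cos = values[i][j][1]
            if best is None or cos > best[2]:
                best = (i, j, cos)
    return best


for row in cosine_only(values):
    print("\t".join("%.3f" % c for c in row))

print(most_similar_pair(values))

# In this example, the second element equals the cosine of the first.
for row in values:
    for angle, cos in row:
        assert abs(math.cos(angle) - cos) < 1e-6
```

Skipping the diagonal in most_similar_pair avoids reporting a message as most similar to itself.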

Usage Example:

if __name__ == '__main__':
    cosine = Cosine()
    messages = []

    # Collect messages until the user asks to quit
    while True:
        t = input("Enter a text or q to quit: ")
        if t in ('q', 'quit', 'qui'):
            break
        messages.append(t)

    values = cosine.compute_similarity(messages)

    # Print only the cosine value of each (angle, cosine value) tuple
    for val in values:
        for v in val:
            print('%.3f\t' % v[1], end=' ')
        print('\n')

nlp's People

Contributors

ananthpn, mmynk, mohit-surana
