clust

Cluster ngrams in Python. Clustering is done with SciPy's hierarchical clustering.

Usage

cluster_ngrams(ngrams, compute_distance, max_dist, method)

Returns a list of clusters, where each cluster is a list of the ngrams assigned to it.

  • ngrams: [list] List of ngrams to cluster. Ex: [['my', 'cat', 'ran'], ['i', 'like', 'trigrams']]
  • compute_distance: [func] Distance function that takes two ngrams as input and returns the distance between them. This package includes one such function, dl_ngram_dist, which sums the Damerau–Levenshtein distances between the words of the two ngrams.
  • max_dist: [float] Distance threshold for merging. Two clusters whose distance is greater than max_dist are not merged.
  • method: [string] Linkage method used for clustering: 'single', 'complete', 'average', 'centroid', 'median', 'ward', or 'weighted'. See the SciPy docs for details.
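
Any callable that takes two ngrams and returns a number can serve as compute_distance. The sketch below is not the package's dl_ngram_dist; it is a minimal illustration of the same idea, under two assumptions: words are paired by position, and the optimal string alignment variant of Damerau–Levenshtein is used. The names osa_distance and summed_ngram_distance are hypothetical.

```python
def osa_distance(a, b):
    """Optimal string alignment distance (a restricted Damerau-Levenshtein variant)."""
    # d[i][j] holds the distance between the prefixes a[:i] and b[:j].
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def summed_ngram_distance(ngram1, ngram2):
    """Sum the word-level distances, pairing words by position (an assumption)."""
    if len(ngram1) != len(ngram2):
        raise ValueError("this sketch assumes ngrams of equal length")
    return sum(osa_distance(w1, w2) for w1, w2 in zip(ngram1, ngram2))
```

With this sketch, ['he', 'was', 'eating'] and ['she', 'was', 'eating'] are at distance 1, since only the first word differs, by one insertion.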

Example

>>> from clust import cluster_ngrams, dl_ngram_dist
>>> ngrams = [['from', 'my', 'house'],
...           ['from', 'my', 'hose'],
...           ['he', 'was', 'eating'],
...           ['she', 'was', 'eating'],
...           ['fell', 'asleep', 'on'],
...           ['moved', 'to', 'a'],
...           ['rom', 'my', 'house'],
...           ['from', 'my', 'house']]
>>> cluster_ngrams(ngrams, dl_ngram_dist, max_dist=3, method='single')
[[['fell', 'asleep', 'on']], 
[['moved', 'to', 'a']], 
[['he', 'was', 'eating'], ['she', 'was', 'eating']], 
[['rom', 'my', 'house'], ['from', 'my', 'hose'], ['from', 'my', 'house'], ['from', 'my', 'house']]]
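
To show how the parameters might map onto SciPy, here is a rough sketch of threshold-based hierarchical clustering with a custom distance function. It is not the package's source code. The helper name cluster_items is hypothetical, and it is an assumption that max_dist corresponds to cutting the dendrogram with fcluster's 'distance' criterion.

```python
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_items(items, compute_distance, max_dist, method="single"):
    """Hypothetical helper: group items by cutting the linkage tree at max_dist."""
    if len(items) < 2:
        # linkage needs at least two observations.
        return [list(items)] if items else []
    # Condensed distance matrix: one entry per pair, in the order
    # (0, 1), (0, 2), ..., (n-2, n-1) that SciPy expects.
    condensed = np.asarray(
        [compute_distance(a, b) for a, b in combinations(items, 2)],
        dtype=float,
    )
    tree = linkage(condensed, method=method)
    # Items whose cophenetic distance is at most max_dist share a label.
    labels = fcluster(tree, t=max_dist, criterion="distance")
    clusters = {}
    for item, label in zip(items, labels):
        clusters.setdefault(label, []).append(item)
    return list(clusters.values())


# Toy usage with numbers and absolute difference as the distance.
groups = cluster_items([1, 2, 10, 11, 50], lambda a, b: abs(a - b), max_dist=2)
```

In the toy call, 1 and 2 end up together, as do 10 and 11, while 50 stays alone. Note that the SciPy documentation describes 'centroid', 'median', and 'ward' as correctly defined only for Euclidean distances, so with an edit-distance function the other methods are the safer choice.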

Contributors

smilli
