A simple command-line tool for comparing text files using the MinHash algorithm and contrasting the result with the Jaccard index.
npm install
npm link
Or, if you would like to install globally:
npm install https://github.com/sjhorn/node-minhash -g
Using node
minhash file1.txt file2.txt
minhash https://file.com/page1.html https://file.com/page2.html
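var minhash = require('node-minhash');
var string1 = 'the quick brown fox'; // example input
var string2 = 'the quick brown dog'; // example input
minhash.summary(string1, string2);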
Compare two text strings using both MinHash and the Jaccard index, and print a summary.
Compare two text strings using both MinHash and the Jaccard index.
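To illustrate the MinHash half of the comparison, here is a minimal sketch that works on sets of integer hashes. It is not this package's implementation: the names `makeHashFunctions`, `signature` and `estimateSimilarity`, the signature length of 128, and the XOR-and-multiply hash family are all assumptions chosen to keep the example short.

```javascript
// Hypothetical helper: build `count` reproducible 32-bit hash functions.
// Each one XORs with a mask and multiplies by an odd constant, which is a
// bijection on 32-bit integers. This is a simple stand-in, not a true
// min-wise independent family.
function makeHashFunctions(count, seed = 1) {
  let state = (seed >>> 0) || 1; // xorshift32 must not start at zero
  const next = () => {
    state ^= state << 13;
    state ^= state >>> 17;
    state ^= state << 5;
    state >>>= 0;
    return state;
  };
  const fns = [];
  for (let i = 0; i < count; i++) {
    const mask = next();
    const mult = next() | 1; // force odd so the multiplication is invertible
    fns.push((x) => Math.imul(x ^ mask, mult) >>> 0);
  }
  return fns;
}

// The signature keeps, for each hash function, the minimum value over the set.
function signature(hashes, fns) {
  return fns.map((fn) => {
    let min = Infinity;
    for (const h of hashes) min = Math.min(min, fn(h));
    return min;
  });
}

// The fraction of matching signature positions estimates the Jaccard index.
function estimateSimilarity(sigA, sigB) {
  let matches = 0;
  for (let i = 0; i < sigA.length; i++) {
    if (sigA[i] === sigB[i]) matches++;
  }
  return matches / sigA.length;
}

const fns = makeHashFunctions(128);
const sigA = signature([1, 2, 3, 4], fns);
const sigB = signature([3, 4, 5, 6], fns);
// An estimate of the true Jaccard index of these two sets, which is 2/6.
console.log(estimateSimilarity(sigA, sigB));
```

Longer signatures generally give a tighter estimate, at the cost of more hashing per document.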
Convert a string to a set of shingles: tokenise it with the natural library's default tokeniser, then group the tokens into shingles of 2 words each (the default).
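As a rough illustration of shingling, the sketch below builds 2-word shingles from a string. The function names `tokenise` and `shingles` are hypothetical, and a plain regular expression stands in for the natural library's tokeniser so that the example has no dependencies; the real tokeniser may split text differently.

```javascript
// Hypothetical helper: split text into lower-case word tokens.
// A regex stand-in for the natural library's default tokeniser.
function tokenise(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Hypothetical helper: build the set of overlapping shingles of `size` words.
function shingles(text, size = 2) {
  const tokens = tokenise(text);
  const result = new Set();
  for (let i = 0; i + size <= tokens.length; i++) {
    result.add(tokens.slice(i, i + size).join(' '));
  }
  return result;
}

console.log([...shingles('the quick brown fox')]);
// [ 'the quick', 'quick brown', 'brown fox' ]
```

A text with fewer words than the shingle size produces an empty set in this sketch.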
Compare two strings by tokenising them and then taking the ratio of the size of the shingle intersection to the size of the shingle union (the Jaccard index).
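The intersection-over-union comparison can be sketched as follows. The names `wordShingles` and `jaccard` are hypothetical, the tokeniser is a simple regex stand-in rather than the natural library, and returning 0 for two empty sets is a choice made for this example only.

```javascript
// Hypothetical helper: 2-word shingles using a simple regex tokeniser.
function wordShingles(text, size = 2) {
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  const set = new Set();
  for (let i = 0; i + size <= tokens.length; i++) {
    set.add(tokens.slice(i, i + size).join(' '));
  }
  return set;
}

// Jaccard index: |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  let intersection = 0;
  for (const item of a) {
    if (b.has(item)) intersection++;
  }
  const union = a.size + b.size - intersection;
  return union === 0 ? 0 : intersection / union; // empty sets: defined as 0 here
}

const left = wordShingles('the quick brown fox');
const right = wordShingles('the quick brown dog');
console.log(jaccard(left, right)); // 2 shared shingles out of 4 distinct: 0.5
```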
Convert a set of shingles to a set of CRC-32 hashes.
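For reference, hashing shingles with CRC-32 might look like the sketch below. It uses a hand-rolled table-driven CRC-32 so that it runs without dependencies; the package may well rely on a library instead, and the names `crc32` and `hashShingles` are hypothetical.

```javascript
// Precompute the CRC-32 lookup table (reflected polynomial 0xEDB88320).
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = (c & 1) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1);
  }
  CRC_TABLE[n] = c >>> 0;
}

// Hypothetical helper: CRC-32 of a string's UTF-8 bytes, as an unsigned 32-bit integer.
function crc32(text) {
  const bytes = Buffer.from(text, 'utf8');
  let crc = 0xFFFFFFFF;
  for (const byte of bytes) {
    crc = CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >>> 8);
  }
  return (crc ^ 0xFFFFFFFF) >>> 0;
}

// Hypothetical helper: map each shingle to its hash, collapsing duplicates.
function hashShingles(shingleSet) {
  return new Set([...shingleSet].map(crc32));
}

console.log(crc32('123456789').toString(16)); // cbf43926, the standard CRC-32 check value
console.log([...hashShingles(new Set(['the quick', 'quick brown']))]);
```

Working with 32-bit integers instead of strings keeps the later MinHash step cheap, since each shingle becomes a fixed-size number.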