Counting word collocations in natural language corpora. This project benchmarks naive implementations of a collocation counter in C++ and Haskell, compiled with G++ and GHC, respectively.
License: Other
Languages: Haskell 34.96%, C++ 56.85%, Shell 8.19%
INTRODUCTION
Collocations-Benchmark is the GitHub repository for the blog post
http://blog.mpacula.com/2011/12/18/counting-collocations-ghc-and-g-benchmarked.
Collocations-Benchmark provides Haskell and C++ sources of a
collocation counter: an algorithm that counts how often words occur
close to one another in a natural language corpus.
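To make the description above concrete, here is a minimal C++ sketch of a naive collocation counter. It is not the repository's code. Several details are assumptions made for illustration: the function name `countCollocations`, the `Pair` and `Counts` aliases, whitespace tokenization, and the definition of a collocation as an ordered word pair that falls within `window` words on the same input line.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration, not the repository's implementation.
// Assumption: a "collocation" is an ordered pair (a, b) where b appears
// within `window` words after a on the same input line.
using Pair = std::pair<std::string, std::string>;
using Counts = std::map<Pair, std::size_t>;

Counts countCollocations(std::istream& in, std::size_t window) {
    assert(window > 0);  // a zero-width window would never count anything
    Counts counts;
    std::string line;
    while (std::getline(in, line)) {
        // Split the line into words on whitespace.
        std::istringstream tokens(line);
        std::vector<std::string> words;
        for (std::string w; tokens >> w;) {
            words.push_back(w);
        }

        // Pair each word with each of the next `window` words on the line.
        for (std::size_t i = 0; i < words.size(); ++i) {
            for (std::size_t j = i + 1; j < words.size() && j <= i + window; ++j) {
                ++counts[Pair(words[i], words[j])];
            }
        }
    }
    return counts;
}
```

Calling `countCollocations(std::cin, 2)` from a `main` function matches the stdin-based usage described under RUNNING. A `std::map` keeps pairs sorted for printing. A hash map would likely be faster on large corpora, at the cost of unordered output.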
COMPILING
To compile the benchmark binaries, run 'make'. To compile profiling
binaries, run 'make prof'. To generate plots, run 'make all-plots'.
RUNNING
To run the benchmark, run the 'run-benchmark.sh' script without any
arguments. The results are written to files in the 'data' directory.
If you wish to run the binaries directly, use pipes/redirection for
input and output. For example:
cat data/input.txt | ./colocations-cpp > output.txt
AUTHOR
Collocations-Benchmark was originally written by Maciej Pacula
(https://github.com/mpacula). The new Haskell counter was contributed
by Bas van Dijk (https://github.com/basvandijk).