Hadoop MapReduce to compute n-gram counts
The submission is written in Python and was tested on the NYU Dataproc Hadoop cluster.
To run the first job:
mapred streaming -input hw1.txt -output -mapper "python mapper.py" -reducer "python reducer.py" -file mapper.py -file reducer.py
This runs the job and stores its output as <outputfile>.
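The contents of mapper.py and reducer.py are not reproduced in this README. As a rough illustration only, a streaming mapper and reducer for n-gram counting might look like the sketch below. The function names map_ngrams and reduce_counts, the default n=2, and the lowercase whitespace tokenization are all assumptions made for this example, not the submitted code. N-grams are formed within a line and do not span line breaks.

```python
from itertools import groupby


def map_ngrams(lines, n=2):
    """Yield 'ngram<TAB>1' records for every n-gram in each input line.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    for line in lines:
        tokens = line.lower().split()
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n]) + "\t1"


def reduce_counts(sorted_records):
    """Sum the counts for each n-gram.

    The records must already be sorted by key, which is how Hadoop
    streaming delivers mapper output to a reducer.
    """
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for ngram, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield ngram + "\t" + str(total)
```

In a streaming script, each function would be fed sys.stdin and its records printed one per line, for example: for record in map_ngrams(sys.stdin): print(record).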
Use this output as the input to the second job:
mapred streaming -input -output -mapper "python mapper2.py" -reducer "python reducer2.py" -file mapper2.py -file reducer2.py
The output is stored in HDFS under a .txt path, as one or more part files that we can print to check the result.
To print it, use the command:
hdfs dfs -cat .txt/par*
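To check the printed output programmatically, it can be loaded into a dictionary. The sketch below assumes each line has the form key<TAB>count, which is the default key/value layout for Hadoop streaming output. The actual format written by reducer2.py is not stated here, and parse_counts is a hypothetical helper name.

```python
def parse_counts(lines):
    """Parse 'key<TAB>count' lines into a dict of key -> int count.

    Assumption: the count is the last tab-separated field on each line.
    """
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, _, value = line.rpartition("\t")
        counts[key] = int(value)
    return counts
```

One way to use it is to pipe the output of the hdfs dfs -cat command above into a small script that calls parse_counts(sys.stdin).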