Author: Rahmat Peter I. Dabalos
README:
Setup Hadoop:
- Create 4 instances in P2C (http://srg.ics.uplb.edu.ph/projects/peak-two-cloud/peak-two-cloud-resources/user-guide): 1 master node and 3 slave nodes
- Edit /etc/hosts on every instance so it lists the IP addresses and hostnames of all four nodes
- Extract hadoop-2.7.2 into /usr/local/hadoop on the master node and on every slave node
- Set up passwordless SSH between the instances (the master must be able to reach every slave without a password)
- Add Hadoop's bin path to PATH in .bashrc
- Configure the Hadoop config files, following the instructions at: http://chaalpritam.blogspot.com/2015/05/hadoop-270-single-node-cluster-setup-on.html
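The setup steps above can be sketched as shell commands. This is a minimal sketch, not the exact commands used: the hostnames, IP addresses, and the ubuntu user are placeholders, and the slave-node steps must be repeated on each node.

```shell
# On every instance: map all node IPs to hostnames (example addresses)
sudo tee -a /etc/hosts <<'EOF'
10.0.0.1 masternode
10.0.0.2 slavenode1
10.0.0.3 slavenode2
10.0.0.4 slavenode3
EOF

# Extract Hadoop 2.7.2 into /usr/local/hadoop
sudo tar -xzf hadoop-2.7.2.tar.gz -C /usr/local
sudo mv /usr/local/hadoop-2.7.2 /usr/local/hadoop

# Passwordless SSH: generate a key on the master, copy it to each node
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
ssh-copy-id ubuntu@slavenode1   # repeat for slavenode2 and slavenode3

# Put Hadoop's bin (and sbin) directories on PATH via .bashrc
echo 'export HADOOP_HOME=/usr/local/hadoop' >> ~/.bashrc
echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> ~/.bashrc
source ~/.bashrc
```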
Running Naive Bayes / Dictionary Based:
- Run start-all.sh
- Make the HDFS directories (hadoop fs -mkdir -p also creates missing parent directories):
  hadoop fs -mkdir -p /user/ubuntu/dictionary
  hadoop fs -mkdir -p /user/ubuntu/dataset
  hadoop fs -mkdir -p /user/ubuntu/Final/Final
- Put the input files into HDFS (hadoop fs -put file path):
  hadoop fs -put negative-words-combined.text /user/ubuntu/dictionary/
  hadoop fs -put positive-words-combined.text /user/ubuntu/dictionary/
  hadoop fs -put stop-words.text /user/ubuntu/dictionary/
  hadoop fs -put positive-text negative-text /user/ubuntu/dataset
  hadoop fs -put finalData /user/ubuntu/Final/Final
- Run ./compileandrun.sh from either the bayes (Naive Bayes) or the wordcountprof (Dictionary Based) folder
- Results will be written to HDFS: /user/ubuntu/output/ for Dictionary Based and /user/ubuntu/output2/ for Naive Bayes
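Once a run finishes, the results above can be inspected with standard HDFS shell commands. A sketch, assuming the default part-* output file naming that Hadoop MapReduce uses:

```shell
# List and print the Dictionary Based results
hadoop fs -ls /user/ubuntu/output/
hadoop fs -cat /user/ubuntu/output/part-*

# List and print the Naive Bayes results
hadoop fs -ls /user/ubuntu/output2/
hadoop fs -cat /user/ubuntu/output2/part-*

# Optionally copy the results to the local filesystem
hadoop fs -get /user/ubuntu/output2 ./naive-bayes-results
```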
Gathering Data:
- Rfacebook: follow the instructions at http://pablobarbera.com/blog/archives/3.html
- twitteR: follow the instructions at http://www.r-bloggers.com/getting-started-with-twitter-in-r/
- Apache Flume and Apache Hive: follow the instructions at http://www.thecloudavenue.com/2013/03/analyse-tweets-using-flume-hadoop-and.html
WordCloud tool: tagul.com
Bell Curve Plot: Microsoft Excel