Files:
- /input/football.txt : Tweets on the topic of soccer
- /output1/part-r-00000 : Output of MapReduce
- /Tweets.ipynb : R code to obtain the tweets
- /WordCloud.ipynb : R code to obtain the wordcloud
- /WordCloudPhoto.png : Wordcloud
- /WordCount.java : MapReduce program to obtain the wordcount
- /WordCountTweet.jar : Jar corresponding to WordCount.java
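WordCount.java itself is not reproduced in this README. As a rough illustration of what a word-count MapReduce job computes, here is a minimal sketch that simulates the map, shuffle and reduce steps in plain Java with no Hadoop dependency. The class and method names are hypothetical, and splitting tweets on whitespace with StringTokenizer (as the classic Hadoop WordCount example does) is an assumption about how the real mapper tokenizes text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical plain-Java simulation of a word-count MapReduce job (not the project's WordCount.java).
public class WordCountSketch {

    // Map step: emit (word, 1) for every whitespace-separated token in one line.
    // Assumption: tokenization matches StringTokenizer's default delimiters.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(Map.entry(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle + reduce step: group pairs by word and sum their counts.
    // TreeMap keeps keys sorted, similar to the key order a single reducer writes to part-r-00000.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    // Run the whole pipeline over a list of input lines (e.g. one tweet per line).
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<String> tweets = List.of("goal goal messi", "messi scores");
        // Print in the same "word<TAB>count" layout Hadoop's default text output uses.
        wordCount(tweets).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Running main prints goal 2, messi 2 and scores 1, one per line and tab-separated. In the real job, Hadoop performs the grouping between the map and reduce steps and distributes the work across the cluster.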
How to Run:
- Copy the local 'input' folder into HDFS using the command: hdfs dfs -put ~/input/ ~/
- Run the jar 'WordCountTweets.jar' using the command: hadoop jar WordCountTweets.jar WordCount ~/input ~/output1
- The output is written to ~/output1 in HDFS.
- To copy the output1 folder into the current local directory: hdfs dfs -get ~/output1
- To view the output on the terminal: hdfs dfs -cat ~/output1/part-r-00000
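The WordCloud notebook reads 'MR_Output.csv', but this README does not say how that file was produced from part-r-00000. One possible conversion is sketched below in Java, assuming part-r-00000 uses Hadoop's default text output layout of one "word<TAB>count" pair per line. The class name and the "word,freq" header are hypothetical; the header must match whatever column names WordCloud.ipynb actually expects.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical converter from MapReduce text output to CSV (not part of the original project).
public class PartFileToCsv {

    // Convert one "word<TAB>count" line into a CSV row; return null for blank or malformed lines.
    static String toCsvRow(String line) {
        String[] parts = line.split("\t");
        if (parts.length != 2) {
            return null;
        }
        String word = parts[0];
        String count = parts[1].trim();
        if (!count.matches("\\d+")) {
            return null;
        }
        // Quote words containing commas or quotes, doubling embedded quotes (RFC 4180 style).
        if (word.contains(",") || word.contains("\"")) {
            word = "\"" + word.replace("\"", "\"\"") + "\"";
        }
        return word + "," + count;
    }

    static List<String> toCsv(List<String> lines) {
        List<String> out = new ArrayList<>();
        out.add("word,freq"); // hypothetical header; adjust to what the notebook reads
        for (String line : lines) {
            String row = toCsvRow(line);
            if (row != null) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PartFileToCsv <part-r-00000> <MR_Output.csv>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), toCsv(lines));
    }
}
```

After fetching the output with hdfs dfs -get, something like java PartFileToCsv output1/part-r-00000 MR_Output.csv would write the CSV next to the notebook, provided the header and column order suit it.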
Run the WordCloud visualization code:
- Open Jupyter and open 'WordCloud.ipynb' from /Ancillary/Part1/
- Make sure the input file 'MR_Output.csv' is in the same folder as 'WordCloud.ipynb'
- Run the notebook cell by cell to generate and view the word cloud.