- Version 1.0 12/01/2014
- Author: Lei Xia
- Email: [email protected]
- Files:
- main.java
- hive.sql
- mysql.sql
- most_common.java
- most_common_by_frequency.java
- Node.java
- Node_for_avg.java
- word_length.java
- year.java
- The program can be built with the default build configuration in Eclipse or IntelliJ IDEA.
- For the Java program, locate the data file path and compile the program, then run it from the shell:
- java main [input_data_file_path]
- Alternatively, run the program in Eclipse or IntelliJ IDEA by clicking the Run button, then follow the program's prompts.
- For the Hive query, first locate the data file in HDFS, then run the Hive script:
- hive -f hive.sql
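- For the MySQL script, the steps are the same: locate the data file, then run the script with the mysql client, e.g.:
- mysql -u [user] -p [database] < mysql.sql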
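The Java usage above takes the data file path as its only argument. As a minimal sketch (not the project's actual main.java), the following reads such a file and accumulates, for each word, its total frequency and the number of years in which it appears. It assumes the tab-separated layout of the Google Books Ngram 1-gram files, where the first three columns are ngram, year and match_count, with one line per (word, year) pair; the class `NgramAggregator` and its members are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical aggregator. Assumes tab-separated lines whose first three columns are
// ngram, year and match_count, with one line per (word, year) pair.
class NgramAggregator {

    // Per-word totals accumulated while scanning the file.
    static final class WordTotals {
        long totalFrequency; // sum of match_count over all years
        int yearCount;       // number of lines, i.e. distinct years, for the word
    }

    static Map<String, WordTotals> aggregate(Reader source) throws IOException {
        Map<String, WordTotals> totals = new HashMap<>();
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            String[] fields = line.split("\t");
            if (fields.length < 3) {
                continue; // skip blank or malformed lines
            }
            WordTotals t = totals.computeIfAbsent(fields[0], k -> new WordTotals());
            t.totalFrequency += Long.parseLong(fields[2].trim());
            t.yearCount++;
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java NgramAggregator [input_data_file_path]");
            return;
        }
        try (Reader r = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            Map<String, WordTotals> totals = aggregate(r);
            System.out.println("distinct words: " + totals.size());
        }
    }
}
```

Run it as `java NgramAggregator ./file_path`. The per-word totals it builds are the raw inputs for the frequency and year-count statistics in the Description.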
- Project Name: Google Ngram
- Description: Analyzes the Google Ngram data and reports summary statistics, including:
- word length (min, max, median, average, standard deviation, etc.)
- word frequency (min, max, median, average, standard deviation, etc.)
- total number of years in which each word appears (min, max, median, average, standard deviation, etc.)
- the most common words, ranked by the number of years in which they appear
- the most common words, ranked by total frequency
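Each statistic listed above reduces to the same computation over a list of numbers (word lengths, total frequencies, or year counts). A minimal sketch with a hypothetical `SummaryStats` class follows; it uses the population standard deviation (dividing by n), which may differ from what the project itself computes.

```java
import java.util.Arrays;

// Hypothetical helper: min, max, median, average and population standard deviation.
class SummaryStats {
    final long min;
    final long max;
    final double median;
    final double average;
    final double stdDev;

    SummaryStats(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        long[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int n = sorted.length;
        min = sorted[0];
        max = sorted[n - 1];
        // Median: the middle element, or the mean of the two middle elements when n is even.
        median = (n % 2 == 1)
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
        double sum = 0;
        for (long v : sorted) sum += v;
        average = sum / n;
        double squares = 0;
        for (long v : sorted) squares += (v - average) * (v - average);
        stdDev = Math.sqrt(squares / n); // population form; use n - 1 for the sample form
    }

    public static void main(String[] args) {
        // Example: statistics over word lengths.
        String[] words = {"cat", "horse", "a", "tree"};
        long[] lengths = new long[words.length];
        for (int i = 0; i < words.length; i++) lengths[i] = words[i].length();
        SummaryStats s = new SummaryStats(lengths);
        System.out.printf("min=%d max=%d median=%.2f avg=%.2f std=%.3f%n",
                s.min, s.max, s.median, s.average, s.stdDev);
    }
}
```

Sorting a copy keeps the median simple at O(n log n); for very large inputs a selection algorithm could replace the full sort.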
- Input/Output:
- Input: the file path of the data.
- E.g. $ java main ./file_path
- Output: displays the results for the option the user chooses at the program's prompt
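The two "most common words" results need a top-k selection over per-word counts. One common approach, sketched here with a hypothetical `MostCommon.topK` helper (not necessarily what most_common.java or most_common_by_frequency.java do), keeps a min-heap of size k so the whole vocabulary never has to be sorted; ties are broken alphabetically.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: the k words with the highest count, where the count can be
// a word's total frequency or the number of years in which it appears.
class MostCommon {

    static List<String> topK(Map<String, Long> counts, int k) {
        // Heap order runs from weakest to strongest: lower count first, and for equal
        // counts the alphabetically later word first, so it is evicted first.
        Comparator<Map.Entry<String, Long>> weakestFirst =
                Map.Entry.<String, Long>comparingByValue()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey().reversed());
        PriorityQueue<Map.Entry<String, Long>> heap = new PriorityQueue<>(weakestFirst);
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // drop the weakest entry so at most k remain
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result); // the heap yields weakest first; report strongest first
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> totals = new HashMap<>();
        totals.put("the", 100L);
        totals.put("of", 80L);
        totals.put("cat", 5L);
        totals.put("dog", 5L);
        System.out.println(topK(totals, 3));
    }
}
```

The same helper serves both rankings: pass total frequencies for one and year counts for the other. Selection costs O(n log k) instead of O(n log n) for a full sort.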