Hadoop Hive Project
This project forked from simalince/hadoop-hive-project
This file contains instructions on how to run the 2 Hive programs provided in this project.

An input file is needed to run both scripts. To generate one:
- Run the Python script randomDataGenerator.py. File input.tsv will be generated.
  $ python randomDataGenerator.py > input.tsv

An initialization script must be executed to create the underlying database and table structure for the 2 Hive programs below.
- Run the Hive script init.hql, passing your HDFS directory as the hiveconf parameter HDFS_LOC and the data input file name as the hiveconf parameter INPUT_FILE. Here, HDFS_LOC is set to /user/ince/input and INPUT_FILE is set to input.tsv.
  $ hive -hiveconf HDFS_LOC=/user/ince/input -hiveconf INPUT_FILE=input.tsv -f init.hql

-- Hive Programs in this Project --

1. Most Popular Courses
- If not already done, execute the initialization step described above. Then run the Hive program mostPopularCourses.hql, passing N as the hiveconf variable N. Here, N is set to 3. Output file mostPopularCoursesOutput.tsv will be generated.
  $ hive -hiveconf N=3 -f mostPopularCourses.hql > mostPopularCoursesOutput.tsv

2. Consumption Summary per Course
- If not already done, execute the initialization step described above. Then run the Hive program consumptionSummary.hql. Output file consumptionSummaryOutput.tsv will be generated in the second hive call. See consumptionSummary.hql for details on why the output file generation is executed as an additional step.
  $ hive -f consumptionSummary.hql > consumptionSummaryOutput.tsv
  $ hive -e 'SELECT * FROM videoDB.course_activity' > consumptionSummaryOutput.tsv
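The commands above must run in a fixed order (generate input, initialize, then run either program). As a rough sketch of that order, the following Python snippet assembles the same command lines without executing them. The helper names hive_script_command and render, and the STEPS list, are hypothetical and not part of this project; the script names, hiveconf parameters, and output files are the ones given above.

```python
import shlex


def hive_script_command(script, **hiveconf):
    """Build the argv list for `hive -hiveconf KEY=VALUE ... -f script`.

    Hypothetical helper for illustration; not part of this project.
    """
    argv = ["hive"]
    for key, value in hiveconf.items():
        argv += ["-hiveconf", f"{key}={value}"]
    argv += ["-f", script]
    return argv


def render(argv, output=None):
    """Render an argv list as a shell line, optionally redirected to a file."""
    line = " ".join(shlex.quote(arg) for arg in argv)
    return f"{line} > {output}" if output else line


# Each step is (argv, output file or None), in the order described above.
STEPS = [
    (["python", "randomDataGenerator.py"], "input.tsv"),
    (hive_script_command("init.hql",
                         HDFS_LOC="/user/ince/input",
                         INPUT_FILE="input.tsv"), None),
    (hive_script_command("mostPopularCourses.hql", N=3),
     "mostPopularCoursesOutput.tsv"),
    (hive_script_command("consumptionSummary.hql"),
     "consumptionSummaryOutput.tsv"),
    (["hive", "-e", "SELECT * FROM videoDB.course_activity"],
     "consumptionSummaryOutput.tsv"),
]

if __name__ == "__main__":
    # Print the commands only; nothing is executed.
    for argv, output in STEPS:
        print("$ " + render(argv, output))
```

Running the snippet only prints the command lines, so they can be reviewed or pasted into a shell where hive is available.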