Using Hive on the MovieLens Dataset
-
Download the MovieLens 20M Dataset (190 MB compressed) from http://grouplens.org/datasets/movielens/20m/ and extract it onto the local filesystem of a machine in your Hadoop environment.
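A minimal shell sketch of this step, assuming wget and unzip are installed. The download URL below and the helper name fetch_movielens are assumptions, not part of the original instructions, so confirm the link on the dataset page first. The archive unpacks into an ml-20m subdirectory containing the CSV files.

```shell
# Assumed location of the 20M archive (check the GroupLens page if it has moved).
ML20M_URL="http://files.grouplens.org/datasets/movielens/ml-20m.zip"

# Hypothetical helper: download the archive into a local working directory and unzip it there.
fetch_movielens() {
  workdir="$1"
  mkdir -p "$workdir" || return 1
  wget -O "$workdir/ml-20m.zip" "$ML20M_URL" || return 1
  # -o overwrites existing files without prompting; -d sets the extraction directory.
  unzip -o "$workdir/ml-20m.zip" -d "$workdir"
}

# Example: fetch_movielens "$HOME/movielens"
```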
-
Strip the header line from each CSV file [3], for example with sed -i 1d ratings.csv (and likewise for the other files).
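The same command can be applied to every CSV file in a loop. This sketch assumes GNU sed (BSD/macOS sed needs sed -i '' 1d), and the helper name strip_csv_headers is hypothetical. Because sed -i 1d always drops the first line, running it twice deletes a data row; an alternative is to keep the headers and declare the Hive tables with TBLPROPERTIES ("skip.header.line.count"="1").

```shell
# Hypothetical helper: delete the first (header) line of every CSV file in a directory.
strip_csv_headers() {
  dir="$1"
  for f in "$dir"/*.csv; do
    [ -e "$f" ] || continue   # the glob stays literal when no CSV files exist
    sed -i 1d "$f" || return 1
  done
}

# Example: strip_csv_headers "$HOME/movielens/ml-20m"
```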
-
Copy the files into HDFS: create a target directory with hadoop fs -mkdir /movieLens, then upload the CSV files into it with hadoop fs -put.
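A sketch of the whole copy step, assuming the hadoop client is on the PATH and the header-stripped CSV files sit in one local directory. The helper name put_movielens_to_hdfs is hypothetical; mkdir -p avoids an error when /movieLens already exists, and put -f overwrites files from an earlier upload.

```shell
# Hypothetical helper: create the HDFS target directory and upload every CSV file into it.
put_movielens_to_hdfs() {
  src="$1"                 # local directory holding the header-stripped CSV files
  dest="${2:-/movieLens}"  # HDFS target directory
  hadoop fs -mkdir -p "$dest" || return 1
  for f in "$src"/*.csv; do
    [ -e "$f" ] || continue
    hadoop fs -put -f "$f" "$dest/" || return 1
  done
}

# Example: put_movielens_to_hdfs "$HOME/movielens/ml-20m"
```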
-
Create a new database in Hive, either by invoking hive from the shell or by using the web-based Hive query editor [4]: CREATE DATABASE movieLens; USE movieLens;
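From the shell, the statement can also be run non-interactively with hive -e. This sketch (the helper name create_movielens_db is hypothetical) adds IF NOT EXISTS so it can be re-run safely. USE only lasts for the current session, so later scripts have to select the database again, for example with hive --database movieLens. Hive stores database names in lowercase, so it will be listed as movielens.

```shell
# Hypothetical helper: create the database without opening an interactive Hive shell.
create_movielens_db() {
  hive -e 'CREATE DATABASE IF NOT EXISTS movieLens;'
}
```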
-
Create the tables by running the createMovieLensTables.hql script.
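The contents of createMovieLensTables.hql are not reproduced here, so the following is only an illustrative guess at one table: a ratings table matching the column order of ratings.csv (userId, movieId, rating, timestamp), plus a hypothetical run_hql helper that executes a script with movieLens selected via hive --database. The column name ts is an assumption, chosen because timestamp is a Hive type keyword.

```shell
# Hypothetical helper: write an example DDL script for ratings.csv to the given path.
# The last column holds the Unix timestamp; it is named ts to avoid the TIMESTAMP keyword.
write_ratings_ddl() {
  cat > "$1" <<'EOF'
CREATE TABLE IF NOT EXISTS ratings (
  userId  INT,
  movieId INT,
  rating  DOUBLE,
  ts      BIGINT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
EOF
}

# Hypothetical helper: run an HQL script with movieLens as the session's database.
run_hql() {
  hive --database movieLens -f "$1"
}

# Example: run_hql createMovieLensTables.hql
```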
-
Populate the tables by running the loadMovieLens.hql script.
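If each table is named after its CSV file, the load statements all follow one pattern. The hypothetical generator below prints one LOAD DATA INPATH statement per table name; it is not the contents of loadMovieLens.hql, and the table names and the OVERWRITE choice are assumptions. Note that LOAD DATA INPATH moves (rather than copies) the files out of /movieLens into the tables' warehouse directories, so the upload step has to be repeated before reloading.

```shell
# Hypothetical generator: print a LOAD DATA statement for each table name given as an argument,
# assuming the file for table T was uploaded to HDFS as /movieLens/T.csv.
load_statements() {
  for t in "$@"; do
    printf "LOAD DATA INPATH '/movieLens/%s.csv' OVERWRITE INTO TABLE %s;\n" "$t" "$t"
  done
}

# Example:
#   load_statements ratings movies tags links > load.hql
#   hive --database movieLens -f load.hql
```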
-
Enjoy playing with the data.
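As one possible starting point, the hypothetical helper below prints a HiveQL query for the ten best-rated movies with at least a given number of ratings. It assumes ratings and movies tables with movieId, rating and title columns, which may not match the names used in createMovieLensTables.hql.

```shell
# Hypothetical helper: print a HiveQL query for the top-rated movies.
# $1 = minimum number of ratings a movie needs to be listed (default 1000).
top_movies_query() {
  min_ratings="${1:-1000}"
  cat <<EOF
SELECT m.title, AVG(r.rating) AS avg_rating, COUNT(*) AS num_ratings
FROM ratings r
JOIN movies m ON r.movieId = m.movieId
GROUP BY m.movieId, m.title
HAVING COUNT(*) >= ${min_ratings}
ORDER BY avg_rating DESC
LIMIT 10;
EOF
}

# Example: hive --database movieLens -e "$(top_movies_query 1000)"
```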