Requirements:
1. zip, tar, and git commands
2. Internet access to download files from preview.unraveldata.com and hive-testbench from GitHub
3. Python 2.6 or newer (Python 2.6 requires the argparse module)
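
The requirements above can be checked before running anything. This is a minimal sketch, not part of playground-examples.py: the helper names (missing_tools, python_ok, argparse_available) are hypothetical, and it only checks that the commands are on PATH and that the interpreter is recent enough. It does not test Internet access.

```python
import sys

try:
    from shutil import which  # available on Python 3.3+
except ImportError:
    # Fallback for Python 2, where shutil.which does not exist
    from distutils.spawn import find_executable as which

# Command-line tools listed in the requirements
REQUIRED_TOOLS = ("zip", "tar", "git")


def missing_tools(tools=REQUIRED_TOOLS, finder=which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if finder(tool) is None]


def python_ok(version_info=sys.version_info):
    """True when the interpreter is Python 2.6 or newer."""
    return tuple(version_info[:2]) >= (2, 6)


def argparse_available():
    """True when argparse can be imported (an extra module on Python 2.6)."""
    try:
        import argparse  # noqa: F401
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    print("missing tools: %s" % (missing_tools() or "none"))
    print("python version ok: %s" % python_ok())
    print("argparse available: %s" % argparse_available())
```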
To run the examples:
python playground-examples.py
To run only part of the examples, use one of the following arguments:
[-spark] [-hive] [-workflow] [--spark-steaming] [-impala] [-autoaction]
By default, all server addresses are localhost. Override them with:
[--hive-host] [--hdfs-master] [--impala-server]
These three arguments control which query or example runs: [--dateset-size] [--impala-query] [--spark-example]
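
As an illustration of how the arguments fit together, here is a small sketch that assembles a command line for one part of the examples with non-default server addresses. The build_command helper and the example host name are hypothetical. The sketch also assumes that each server flag takes the address as its value, which is not stated above. The three selection arguments are left out because their value formats are not documented here.

```python
SCRIPT = "playground-examples.py"

# Flags copied verbatim from the argument lists above
PART_FLAGS = ("-spark", "-hive", "-workflow", "--spark-steaming",
              "-impala", "-autoaction")
SERVER_FLAGS = ("--hive-host", "--hdfs-master", "--impala-server")


def build_command(part=None, servers=None, python="python"):
    """Return the argument list for running all examples or a single part.

    part    -- one of PART_FLAGS, or None to run everything
    servers -- dict mapping a flag from SERVER_FLAGS to an address
               (assumption: each flag takes the address as its value)
    """
    cmd = [python, SCRIPT]
    if part is not None:
        if part not in PART_FLAGS:
            raise ValueError("unknown part flag: %s" % part)
        cmd.append(part)
    for flag in SERVER_FLAGS:  # fixed order keeps the output predictable
        if servers and flag in servers:
            cmd.extend([flag, servers[flag]])
    unknown = set(servers or {}) - set(SERVER_FLAGS)
    if unknown:
        raise ValueError("unknown server flag(s): %s" % sorted(unknown))
    return cmd


if __name__ == "__main__":
    # hive01.example.com is a placeholder host name
    print(" ".join(build_command("-hive", {"--hive-host": "hive01.example.com"})))
```

The resulting list can be handed to subprocess.call to start the run.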
To get the Hive before-and-after results, set these properties in Cloudera Manager (CM) and restart the services:
- yarn.scheduler.minimum-allocation-mb to 256 MiB
- yarn.scheduler.increment-allocation-mb to 128 MiB
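
To show what these two values mean for container sizes, the sketch below models how a memory request might be normalized. It assumes the scheduler raises a request to the minimum, rounds it up to a multiple of the increment, and caps it at a maximum. The function name is hypothetical, and the 8192 MiB maximum is an assumption, not a value from this document.

```python
def normalize_request_mb(requested_mb, minimum_mb=256, increment_mb=128,
                         maximum_mb=8192):
    """Model of a container memory request after scheduler normalization.

    minimum_mb and increment_mb default to the values set above;
    maximum_mb is an assumed cap and is not taken from this document.
    """
    # Requests below the minimum are raised to the minimum
    floor = max(requested_mb, minimum_mb)
    # Round up to the next multiple of the increment
    rounded = ((floor + increment_mb - 1) // increment_mb) * increment_mb
    # Never exceed the configured maximum
    return min(rounded, maximum_mb)


if __name__ == "__main__":
    for request in (100, 300, 1024):
        print("%d MiB requested -> %d MiB allocated"
              % (request, normalize_request_mb(request)))
```

With small minimum and increment values, small requests stay close to what was asked for, so changes in requested memory show up in the allocated container size.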