# Big Data Analytics

This is the code repository for Big Data Analytics, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish.

## Instructions and Navigations

All of the code is organized into folders. Each folder starts with a number followed by the application name. For example, Chapter02.
The code will look like the following:
```python
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("spark://masterhostname:7077")
        .setAppName("My Analytical Application")
        .set("spark.executor.memory", "2g"))
sc = SparkContext(conf=conf)
```
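When working through the exercises on a single laptop or VM rather than a cluster, the same configuration pattern can point Spark at a local master instead of a standalone master URL. This is a minimal sketch; the `local[*]` master string and the memory value are illustrative choices, not settings prescribed by the book:

```python
from pyspark import SparkConf, SparkContext

# Local-mode sketch: run Spark inside a single JVM on the laptop/VM,
# using all available cores ("local[*]"). The executor memory value
# here is an illustrative assumption -- adjust it to your VM's RAM.
conf = (SparkConf()
        .setMaster("local[*]")
        .setAppName("My Analytical Application")
        .set("spark.executor.memory", "1g"))
sc = SparkContext(conf=conf)
```

Local mode is convenient for the VM-based exercises below because it needs no running cluster; the identical application code can later be resubmitted against a real master URL.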
Practical exercises in this book are demonstrated on virtual machines (VMs) from Cloudera, Hortonworks, or MapR, or on a prebuilt Spark-for-Hadoop distribution, to make getting started easy. The same exercises can also be run on a larger cluster. Prerequisites for running the virtual machines on your laptop:
- RAM: 8 GB and above
- CPU: At least two virtual CPUs
- The latest VMware Player or Oracle VirtualBox installed, for Windows or Linux
- The latest Oracle VirtualBox or VMware Fusion, for Mac
- Virtualization enabled in BIOS
- Browser: Chrome 25+, IE 9+, Safari 6+, or Firefox 18+ recommended (HDP Sandbox will not run on IE 10)
- PuTTY
- WinSCP
The Python and Scala programming languages are used throughout the chapters, with a greater focus on Python. Readers are assumed to have a basic programming background in Java, Scala, Python, SQL, or R, along with basic Linux experience. Working experience with Big Data environments on Hadoop platforms will provide a quick start for building Spark applications.
## Related Products

- Big Data Forensics - Learning Hadoop Investigations

### Suggestions and Feedback

Click here if you have any feedback or suggestions.