# Big Data Analytics

This is the code repository for Big Data Analytics, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish.

## Instructions and Navigations

All of the code is organized into folders. Each folder starts with a number followed by the application name. For example, Chapter02.
The code will look like the following:
```python
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("spark://masterhostname:7077")
        .setAppName("My Analytical Application")
        .set("spark.executor.memory", "2g"))
sc = SparkContext(conf=conf)
```
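When working through the exercises on a single laptop or VM rather than a cluster, the same configuration pattern can point Spark at a local master instead of a standalone master URL. This is a minimal sketch; the `local[*]` master string and the memory value are illustrative choices, not settings prescribed by the book:

```python
from pyspark import SparkConf, SparkContext

# Local-mode sketch: run Spark inside a single JVM on the laptop/VM,
# using all available cores ("local[*]"). The executor memory value
# here is an illustrative assumption -- adjust it to your VM's RAM.
conf = (SparkConf()
        .setMaster("local[*]")
        .setAppName("My Analytical Application")
        .set("spark.executor.memory", "1g"))
sc = SparkContext(conf=conf)
```

Local mode is convenient for the VM-based exercises below because it needs no running cluster; the identical application code can later be resubmitted against a real master URL.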
Practical exercises in this book are demonstrated on virtual machines (VMs) from Cloudera, Hortonworks, or MapR, or on a prebuilt Spark-for-Hadoop distribution, to make getting started easy. The same exercises can also be run on a larger cluster. Prerequisites for running the virtual machines on your laptop:
- RAM: 8 GB and above
- CPU: At least two virtual CPUs
- The latest VMware Player or Oracle VirtualBox installed, for Windows or Linux
- The latest Oracle VirtualBox or VMware Fusion, for Mac
- Virtualization enabled in BIOS
- Browser: Chrome 25+, IE 9+, Safari 6+, or Firefox 18+ recommended (HDP Sandbox will not run on IE 10)
- PuTTY
- WinSCP
The Python and Scala programming languages are used throughout the chapters, with a greater focus on Python. Readers are assumed to have a basic programming background in Java, Scala, Python, SQL, or R, along with basic Linux experience. Working experience with Big Data environments on Hadoop platforms will provide a quick start for building Spark applications.
## Related Products

- Big Data Forensics - Learning Hadoop Investigations

### Suggestions and Feedback

Click here if you have any feedback or suggestions.