zdx / data-algorithms-book

This project forked from mahmoudparsian/data-algorithms-book

MapReduce and Spark Source Code and Scripts for Data Algorithms Book

License: Other

Shell 2.59% Java 97.30% Python 0.11%

data-algorithms-book's Introduction

Data Algorithms: Recipes for Scaling up with Hadoop and Spark

An author book signing for "Data Algorithms" will be held in the O'Reilly booth on Thursday, Feb. 19, 2015. Complimentary copies of the book will be provided to the first 25 attendees.

I have started adding bonus chapters.

Repository

This repository hosts all source code and scripts for the Data Algorithms book. The book provides a set of distributed MapReduce algorithms, which are implemented using:

  • Java (JDK7)
  • Spark 1.2.0
  • MapReduce/Hadoop 2.6.0

Work in Progress...

Please note that this is a work in progress...

URL To Data Algorithms Book

Source Code

  • All source code, libraries, and build scripts are posted here
  • Shell scripts are posted for running Spark and MapReduce/Hadoop programs (in progress...)

Software Used

Software  Version
Java      JDK7
Hadoop    2.6.0
Spark     1.2.0
Ant       1.9.4

Structure of Repository

Name           Description
README.md      The file you are reading now
README_lib.md  Must read before you build with Ant
src            Source files for MapReduce/Hadoop/Spark
scripts        Shell scripts to run MapReduce/Hadoop and Spark programs
lib            Required JAR files for compiling source code
build.xml      The Ant build script
dist           The Ant build's output directory (creates a single JAR file)
LICENSE        License for using this repository (Apache License, Version 2.0)
misc           Miscellaneous files for this repository
setenv         Example of how to set your environment variables before building
data           Sample data files (such as FASTQ and FASTA) for basic testing purposes

Structure of src Directory

Each chapter has two subfolders:

org.dataalgorithms.chapNN.spark      (for Spark programs)
org.dataalgorithms.chapNN.mapreduce  (for MapReduce/Hadoop programs)

where NN = 00, 01, ..., 31
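
As a small illustration of the naming convention above, the following shell sketch prints the two package names for every chapter number from 00 to 31. The function name chapter_packages is hypothetical and is not part of this repository. The sketch only restates the pattern; it does not check which packages actually exist under src.

```shell
#!/bin/sh
# Print the Spark and MapReduce package names for chapters 00..31.
# chapter_packages is a hypothetical helper, not a script from this repository.
chapter_packages() {
    n=0
    while [ "$n" -le 31 ]; do
        # %02d zero-pads the chapter number: 0 -> 00, 7 -> 07
        printf 'org.dataalgorithms.chap%02d.spark\n' "$n"
        printf 'org.dataalgorithms.chap%02d.mapreduce\n' "$n"
        n=$((n + 1))
    done
}

chapter_packages
```

Piping the output through grep (for example, `chapter_packages | grep spark`) narrows the list to one of the two families.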

How To Build using Apache's Ant

How To Build by Ant

Sample Builds by Ant

How To Run Spark/Hadoop Programs

How To Run Python Programs

To run Python programs, call them with spark-submit, followed by the program's arguments.
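
A minimal sketch of such an invocation, assuming spark-submit is on your PATH. The script name word_count.py and the two paths are hypothetical placeholders, not files from this repository. The sketch only builds and prints the command line; remove the echo to actually launch the job.

```shell
#!/bin/sh
# Hypothetical placeholders: substitute your own script and arguments.
SCRIPT="word_count.py"
INPUT_PATH="/tmp/input.txt"
OUTPUT_PATH="/tmp/output"

# spark-submit takes the Python file first; everything after it is passed
# to the program as its own command-line arguments.
build_command() {
    echo spark-submit "$SCRIPT" "$INPUT_PATH" "$OUTPUT_PATH"
}

build_command
```

Inside the Python program, the arguments that follow the script name are available as sys.argv[1:], in the order given on the command line.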

Questions/Comments

Thank you!

best regards,
Mahmoud Parsian
