hadoop

Hadoop Cluster Configurations

This repository is intended to help Hadoop users, specifically those with a system administration background, set up Hadoop quickly and efficiently.

The config files are from a running cluster. Feel free to use them, but please drop an email with your feedback.

I have uploaded a 64-bit build of the latest stable release, hadoop-2.9.2, to Google Drive.

For any help you can reach me at: [email protected]

I provide Advanced Hadoop Administration, DevOps, HBase, Kafka and other trainings.

Advanced Hadoop Training: I will be covering topics such as detailed Kerberos, encryption, centralized caching, storage policies, Ranger, Knox, Hadoop performance tuning and production use cases. Contact me for details.

"Doing a course is not a guarantee for a job, but having a solid foundation surely is"

Course 1: Hadoop Administration

This is the Hadoop Administration course; all of its configs are in this GitHub repository.

Demo: https://www.youtube.com/channel/UC6vfYICj0azZkuc5sVw71PA

Course 2: Advanced Hadoop: Performance Tuning and Security

Duration: 24 hours

Module 1: Hadoop High Availability for HDFS and Resource Manager.

  • Using both QJM (Quorum Journal Manager) and shared storage.
  • ZooKeeper details.

Module 2: Hadoop queuing and pool details.

  • Fair and Capacity Scheduler details.
  • Dynamic pool configuration.
  • User management and LDAP integration.
  • Dynamic shares and scheduling policies.

Module 3: HDFS Advanced Features

  • Hadoop Centralised Caching.
  • Hadoop Storage Policy and Archive Storage.
  • Hadoop memory as storage tier.
  • HDFS Extended Attributes.
  • HDFS Short-circuit Read.
  • Quotas per storage type.
  • Snapshots and HDFS over NFS.
  • YARN Labels

Module 4: In-depth Performance Tuning and Cluster Sizing.

  • JVM tuning for Hadoop.
  • HDFS and MapReduce tuning.
  • Network tuning.
  • YARN performance tuning and details on parameters.

Module 5: Hadoop Security.

  • Hadoop Knox or any other security tool.
  • Detailed Kerberos setup for securing Hadoop.
  • Hadoop encryption at rest.

Module 6: Hadoop Upgrade and Production Use Cases.

  • Hadoop rolling upgrade.
  • Phoenix details and setup.
  • HDFS configuration for multihoming.
  • NameNode recovery scenarios.
  • Common production issues.

Module 7: HBase and Hive.

  • HBase administration and troubleshooting.
  • Hive and HBase recovery and upgrades.
  • HBase and Hive production use cases and common issues.

Course 3: Advanced Hadoop: Data Lake and Streaming

Duration: 24 hours

Module 1: Using Hadoop as a warehouse.

  • Data policies.
  • Various ingestion and extraction methods.
  • Archiving policies

Module 2: Flume Configuration.

  • Flume installation and configuration.
  • Flume channels and various formats.
  • Flume Twitter use case.

Module 3: Data Ingestion using Sqoop and Hive

  • Sqoop details.
  • MySQL imports and exports
  • Tuning Sqoop
  • Hive details and integration with Sqoop
  • HBase integration

Module 4: Spark Installation and Configuration.

  • Spark architecture
  • Spark standalone mode setup.
  • Spark in YARN mode.
  • Spark use cases and programs.

Module 5: Data Pipeline

  • Understand Kafka architecture and configuration.
  • Building a Kafka data pipeline.
  • Example and common issues.
  • Integrating Spark with Kafka.

Module 6: Storm Architecture.

  • Storm cluster setup.
  • Storm use cases.
  • Adding Storm to the data pipeline.
  • Storm performance tuning.

Module 7: Project.

Course 4: Advanced Hadoop: HBase performance optimization and Row Key Design

This is an advanced course. Participants are expected to have a good grasp of the Hadoop platform, including HDFS, as well as OS knowledge.

  • HBase Architecture Details
  • HBase TroubleShooting
  • HBase use cases.
  • HBase Row key Design
  • HBase coprocessors
  • HBase replica
  • HBase Kerberos Setup
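
As a small illustration of the row key design topic listed above, here is a minimal sketch of key salting, a common way to spread sequential keys (such as timestamps) across regions so that writes do not all land on one region server. This is not taken from the course material; the function name `salted_row_key`, the bucket count and the `|` separator are all hypothetical choices made for the example.

```python
import hashlib

# Hypothetical number of salt buckets; in practice this is often chosen
# to match the number of pre-split regions.
NUM_BUCKETS = 16


def salted_row_key(natural_key: str, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the natural key with a stable, hash-derived bucket id.

    The same natural key always maps to the same bucket, so point reads
    can recompute the full row key without a lookup.
    """
    digest = hashlib.md5(natural_key.encode("utf-8")).digest()
    bucket = digest[0] % num_buckets
    # Zero-padded bucket id keeps keys in the same bucket adjacent
    # under byte-wise sorting.
    return f"{bucket:02d}|{natural_key}".encode("utf-8")


if __name__ == "__main__":
    # Sequential timestamps would otherwise sort next to each other.
    for ts in ("20190101120000", "20190101120001", "20190101120002"):
        print(salted_row_key(f"{ts}-sensor42"))
```

The trade-off is that a range scan over the natural key now has to fan out across every bucket, so salting suits write-heavy tables more than scan-heavy ones.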

Contributors

netxillon, navdeepkooner, gdhillon

