amit1nayak / bigdata-ecosystem-architecture

This project forked from jayvardhan-reddy/bigdata-ecosystem-architecture

Life-cycle: Internal working of HDFS, SQOOP, HIVE, SPARK, HBASE, KAFKA with code.

License: MIT License

Shell 69.95% Scala 14.56% TSQL 15.49%

BigData Ecosystem Architecture

Internal workings of Big Data and its ecosystem, such as:

  • The background processes behind resource allocation and database connections.
  • How the data is distributed across the nodes.
  • The execution life-cycle when a job is submitted.

** Note: Refer to the links mentioned below under each ecosystem for a detailed explanation. **

1. HDFS 🐘

The various underlying processes that take place while a file is stored in HDFS, and the details recorded about it, such as:

  • Type of scheduler

  • Block & Rack information

  • File size

  • File location

  • Replication information about the file (over-replicated blocks, under-replicated blocks, ...)

  • Health status of the file

Please click on the link below to learn about the execution and flow process.

🔗 HDFS Architecture in Depth
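The block and replication details listed above can be illustrated with a small Python sketch. It is not HDFS code: the 128 MiB block size and the replication factor of 3 are assumed values (they match common Hadoop defaults but are configurable), and the names `block_count` and `replication_status` are hypothetical helpers introduced only for this example.

```python
from collections import Counter

# Assumed settings for illustration; real clusters configure these values.
BLOCK_SIZE = 128 * 1024 * 1024   # 128 MiB per block
TARGET_REPLICATION = 3           # desired replicas per block


def block_count(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks needed for a file of file_size bytes (ceiling division)."""
    return -(-file_size // block_size)


def replication_status(live_replicas: int, target: int = TARGET_REPLICATION) -> str:
    """Classify one block by comparing its live replicas with the target."""
    if live_replicas == 0:
        return "missing"
    if live_replicas < target:
        return "under-replicated"
    if live_replicas > target:
        return "over-replicated"
    return "ok"


if __name__ == "__main__":
    size = 300 * 1024 * 1024             # a 300 MiB file
    print("blocks:", block_count(size))  # two full blocks plus one partial block

    # Hypothetical live-replica counts, one entry per block of that file.
    replicas = [3, 2, 4]
    print(Counter(replication_status(n) for n in replicas))
```

A report of this kind (block count plus per-block replication state) is roughly the information the list above refers to; the actual figures for a real file come from the cluster itself.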

2. SQOOP :octocat:

Sqoop is used to perform two main operations:

  • Sqoop Import:

    • Ingests data from a source such as a traditional relational database into the Hadoop file system (HDFS).
  • Sqoop Export:

    • Exports data from the Hadoop file system (HDFS) to a traditional relational database.

To support the two operations above, Sqoop uses CodeGen internally.

  • Sqoop CodeGen:

    • Compiles metadata and other related information into a Java class file and packages it as a JAR.

Please click on the link below to learn about the execution and flow process.

🔗 SQOOP Architecture in Depth
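As a rough illustration of the three operations described above, the Python sketch below only assembles the command lines for `sqoop import`, `sqoop export`, and `sqoop codegen`; it does not run them. The JDBC URL, table names, and HDFS paths are hypothetical placeholders, and the helper function names are assumptions made for this example.

```python
import shlex
from typing import List


def sqoop_import_args(jdbc_url: str, table: str, target_dir: str, mappers: int = 4) -> List[str]:
    """Command line for copying a database table into an HDFS directory."""
    return ["sqoop", "import", "--connect", jdbc_url, "--table", table,
            "--target-dir", target_dir, "--num-mappers", str(mappers)]


def sqoop_export_args(jdbc_url: str, table: str, export_dir: str) -> List[str]:
    """Command line for copying files from an HDFS directory into a database table."""
    return ["sqoop", "export", "--connect", jdbc_url, "--table", table,
            "--export-dir", export_dir]


def sqoop_codegen_args(jdbc_url: str, table: str) -> List[str]:
    """Command line for generating the Java class that maps the table's records."""
    return ["sqoop", "codegen", "--connect", jdbc_url, "--table", table]


if __name__ == "__main__":
    url = "jdbc:mysql://db.example.com/sales"   # hypothetical database
    for args in (sqoop_import_args(url, "orders", "/data/orders"),
                 sqoop_export_args(url, "orders_summary", "/data/orders_summary"),
                 sqoop_codegen_args(url, "orders")):
        print(shlex.join(args))
```

Credentials are left out on purpose; a real invocation would also need authentication options suited to the target database.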

3. HIVE 🐝

Hive has four main components:

  • Hadoop core components (HDFS, MapReduce)

  • Metastore

  • Driver

  • Hive Clients

Please click on the link below to learn about the execution and flow process.

🔗 HIVE Architecture in Depth

4. SPARK 💥

The various phases involved before and during the execution of a Spark job.

  • Spark Context

    • It is the heart of a Spark application.
  • YARN Resource Manager, Application Master & launching of executors (containers).

  • Setting up environment variables, job resources.

  • CoarseGrainedExecutorBackend & Netty-based RPC.

  • SparkListeners.

    • LiveListenerBus
    • StatsReportListener
    • EventLoggingListener
  • Execution of a job

    • Logical Plan (Lineage)
    • Physical Plan (DAG)
  • Spark-WebUI.

Please click on the link below to learn about the execution and flow process.

🔗 SPARK Architecture in Depth
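The step from a logical plan (lineage) to a physical plan (DAG of stages) can be sketched with a toy model in plain Python. This is not Spark's API or its scheduler: it only mimics the idea that a new stage begins at each shuffle (wide) transformation. The `WIDE_OPS` set is a simplified assumption (for instance, a join between co-partitioned datasets need not shuffle), and `split_into_stages` is a hypothetical helper.

```python
from typing import List

# Simplified assumption: these transformations always introduce a shuffle.
WIDE_OPS = {"reduceByKey", "groupByKey", "sortByKey", "repartition", "join"}


def split_into_stages(lineage: List[str]) -> List[List[str]]:
    """Group an ordered lineage into stages, starting a new stage at each wide op."""
    stages: List[List[str]] = []
    for op in lineage:
        # The first operation opens stage 0; every wide operation opens a new stage.
        if not stages or op in WIDE_OPS:
            stages.append([])
        stages[-1].append(op)
    return stages


if __name__ == "__main__":
    # Lineage of a word count followed by a sort.
    lineage = ["textFile", "flatMap", "map", "reduceByKey", "sortByKey"]
    for stage_id, ops in enumerate(split_into_stages(lineage)):
        print(f"stage {stage_id}: {' -> '.join(ops)}")
```

Narrow transformations such as `flatMap` and `map` stay in the same stage because each output partition depends on a single input partition, so they can be pipelined without moving data between executors.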

4.1 SPARK Abstraction Layers & Internal Optimization Techniques used 💥

Spark provides three abstraction layers:

  • RDD (Resilient Distributed Datasets)

    • Lineage Graph
    • DAG Scheduler
  • DataFrames

    • Catalyst Optimizer
    • Tungsten Engine
    • Default source or Base relation
  • Datasets

    • Optimized Tungsten Engine - V2
    • Whole Stage Code Generation

5. HBASE 🐋
