keroloswilliam / apache-hadoop-helm

This project forked from pfisterer/apache-hadoop-helm

Helm chart for Apache Hadoop using multi-arch docker images

License: Apache License 2.0

apache-hadoop-helm's Introduction

Hadoop Chart

This chart is modified from stable/hadoop and mgit-at/helm-hadoop-3 and has been updated to:

  • use a multi-architecture Docker image and
  • use the latest version of Hadoop available at the time of release.

This chart is primarily intended for running YARN and MapReduce jobs. HDFS serves only to transport small artifacts within the framework, not as a general-purpose distributed filesystem. Data should be read from cloud-based datastores such as Google Cloud Storage, S3, or Swift.

Chart Details

Installing the Chart

To install the chart with the release name hadoop:

helm repo add pfisterer-hadoop https://pfisterer.github.io/apache-hadoop-helm/
# Helm 2 syntax; with Helm 3, use: helm install hadoop pfisterer-hadoop/hadoop
helm install --name hadoop pfisterer-hadoop/hadoop

Configuration

The following table lists the configurable parameters of the Hadoop chart and their default values.

| Parameter | Description | Default |
|---|---|---|
| image.repository | Hadoop image | farberg/apache-hadoop |
| image.tag | Hadoop image tag | 3.3.2 |
| image.pullPolicy | Pull policy for the images | IfNotPresent |
| hadoopVersion | Version of the Hadoop libraries being used | 3.3.2 |
| antiAffinity | Pod anti-affinity, hard or soft | hard |
| hdfs.nameNode.pdbMinAvailable | PDB for the HDFS NameNode | 1 |
| hdfs.nameNode.resources | Resources for the HDFS NameNode | requests:memory=256Mi,cpu=10m,limits:memory=2048Mi,cpu=1000m |
| hdfs.dataNode.replicas | Number of HDFS DataNode replicas | 1 |
| hdfs.dataNode.pdbMinAvailable | PDB for the HDFS DataNode | 1 |
| hdfs.dataNode.resources | Resources for the HDFS DataNode | requests:memory=256Mi,cpu=10m,limits:memory=2048Mi,cpu=1000m |
| hdfs.webhdfs.enabled | Enable the WebHDFS REST API | true |
| yarn.resourceManager.pdbMinAvailable | PDB for the YARN ResourceManager | 1 |
| yarn.resourceManager.resources | Resources for the YARN ResourceManager | requests:memory=256Mi,cpu=10m,limits:memory=2048Mi,cpu=1000m |
| yarn.nodeManager.pdbMinAvailable | PDB for the YARN NodeManager | 1 |
| yarn.nodeManager.replicas | Number of YARN NodeManager replicas | 1 |
| yarn.nodeManager.parallelCreate | Create all NodeManager StatefulSet pods in parallel (K8S 1.7+) | false |
| yarn.nodeManager.resources | Resource limits and requests for YARN NodeManager pods | requests:memory=2048Mi,cpu=1000m,limits:memory=2048Mi,cpu=1000m |
| persistence.nameNode.enabled | Enable/disable persistent volume | false |
| persistence.nameNode.storageClass | Name of the StorageClass to use per your volume provider | - |
| persistence.nameNode.accessMode | Access mode for the volume | ReadWriteOnce |
| persistence.nameNode.size | Size of the volume | 50Gi |
| persistence.dataNode.enabled | Enable/disable persistent volume | false |
| persistence.dataNode.storageClass | Name of the StorageClass to use per your volume provider | - |
| persistence.dataNode.accessMode | Access mode for the volume | ReadWriteOnce |
| persistence.dataNode.size | Size of the volume | 200Gi |
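
The parameters above can be overridden at install time. The following is a minimal sketch, not taken from the chart itself: it assumes the dotted parameter names map to nested YAML keys in the usual Helm way, and the file name hadoop-overrides.yaml, the chosen values, and the RUN_HELM switch are all illustrative assumptions. The helm command uses Helm 3 syntax.

```shell
#!/bin/sh
# Write an override file. Keys come from the parameter table;
# the values (3 replicas, persistence on) are illustrative assumptions.
cat > hadoop-overrides.yaml <<'EOF'
yarn:
  nodeManager:
    replicas: 3
persistence:
  nameNode:
    enabled: true
    size: 50Gi
EOF

# RUN_HELM is a hypothetical switch for this sketch: by default the
# command is only printed, so the script is safe to run without a cluster.
if [ "${RUN_HELM:-0}" = "1" ]; then
  helm upgrade --install hadoop pfisterer-hadoop/hadoop -f hadoop-overrides.yaml
else
  echo "Dry run: helm upgrade --install hadoop pfisterer-hadoop/hadoop -f hadoop-overrides.yaml"
fi
```

Keeping overrides in a file rather than in a long list of --set flags makes them easier to review and to keep under version control.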

Customized Hadoop Base Docker Image

This image is modified from comcast/kube-yarn and mgit-at/helm-hadoop-3. Native libraries are currently not included.

Build and Push the Docker Image

# Set version
HADOOP_VERSION=3.3.2

# Build
docker buildx build --push --platform "linux/arm64,linux/amd64" -t farberg/apache-hadoop:latest -t farberg/apache-hadoop:$HADOOP_VERSION .

Testing with minikube

If you are running locally with minikube and want to try your images without pushing them to a registry, build the images on the minikube VM first:

eval $(minikube docker-env)
# use the build command from above
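
The build command above pushes a multi-platform image to a registry, which is not needed for a purely local test. One possible adaptation, sketched here as an assumption rather than the author's documented workflow, is a plain single-platform docker build against minikube's Docker daemon. The IMAGE variable and the fallback message are illustrative.

```shell
#!/bin/sh
# Tag the local image with the same name the chart expects by default.
HADOOP_VERSION=3.3.2
IMAGE="farberg/apache-hadoop:${HADOOP_VERSION}"

# Only build when minikube is installed and running; otherwise just report.
if command -v minikube >/dev/null 2>&1 && minikube status >/dev/null 2>&1; then
  # Point the docker CLI at the Docker daemon inside minikube.
  eval "$(minikube docker-env)"
  # Single-platform build, no push: the image stays in minikube's daemon.
  docker build -t "$IMAGE" .
else
  echo "minikube is not running; would build $IMAGE"
fi
```

Because image.pullPolicy defaults to IfNotPresent, a locally built image with a matching tag should be used without a registry pull.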

Development

Help is always appreciated. Please create pull requests.

Open Issues

Upload a new version of the chart

helm lint
helm package .
mv hadoop*.tgz docs/
helm repo index docs/ --url https://pfisterer.github.io/apache-hadoop-helm/
git add docs/
git commit -a -m "Updated helm repository"
git push origin master

Changes

Version 1.2.0

  • Initial release of this chart
  • Use multi-architecture base image
  • Apache Hadoop 3.3.2
