
hwc-cluster-creator-tool's Introduction

HDInsight Cluster Creation Tool for Hive Warehouse Connector (HWC)


This tool helps spin up HWC-enabled Azure HDInsight clusters in a given customer subscription, storage account, and/or custom VNet, with minimal manual steps.

Features

This tool provides the following features:

  • Creates both HDI Spark and HDI LLAP clusters under the same VNet and adds the required Health and Management inbound rules (SSH access is not allowed in the inbound rules by default).
  • Creates only a Spark cluster for a given VNet and storage account if the LLAP cluster already exists, and vice versa.
  • Supports WASB and ADLS_GEN2 storage types. However, the user must provide an existing storage account before running this tool, i.e. the tool does not create a new storage account.
  • Supports creation of secure HDInsight clusters, i.e. clusters with the Enterprise Security Package (ESP).
  • A custom VNet is configured with minimal inbound rules; if the VNet already exists, it can be reused.
  • This tool requires Azure Active Directory (AAD) credentials for creating the HDI clusters.

Getting Started

Prerequisites

  • An active Azure account for creating HDI clusters.
  • Azure CLI should be installed.
    • Run the commands below to create a new service principal, or use one that already exists.
    az account set -s YOUR_SUBSCRIPTION
    az ad sp create-for-rbac --name YOUR_SERVICE_PRINCIPAL_NAME --sdk-auth
    • Store the JSON returned by the above command in a file so that the clientId, tenantId, and clientSecret values can be used while configuring the tool.
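The saved JSON can be parsed with `jq` to pull out the fields the tool needs. A minimal sketch, assuming `jq` is installed and the output was stored in a file named `sp-auth.json` (the file name and all values below are placeholders, not real credentials):

```shell
# Placeholder JSON in the shape returned by `az ad sp create-for-rbac --sdk-auth`
# (example values only, not real credentials).
cat > sp-auth.json <<'EOF'
{
  "clientId": "11111111-2222-3333-4444-555555555555",
  "clientSecret": "example-secret",
  "tenantId": "99999999-8888-7777-6666-555555555555",
  "subscriptionId": "00000000-0000-0000-0000-000000000000"
}
EOF

# Extract the values needed for the activeDirectory section of the config.
jq -r '.clientId'     sp-auth.json
jq -r '.tenantId'     sp-auth.json
jq -r '.clientSecret' sp-auth.json
```

The three printed values map directly onto the clientId, tenantId, and clientSecret fields of the activeDirectory section described below.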

Configuring the Tool

Users can set configurations in a YAML file (as shown below) and pass it to the tool. The list below covers all the configurations supported by this tool. Additionally, the conf folder in this repo has templates for standard and secure cluster config files.

type: SPARK_AND_LLAP # One of SPARK_AND_LLAP, SPARK_ONLY, or LLAP_ONLY; default is SPARK_AND_LLAP. For example, a user with an existing LLAP cluster can create just a Spark cluster by setting type to SPARK_ONLY and reusing the LLAP cluster's VNet by setting the network create field to false.
clusterNamePrefix: Foo # Prefix used in the cluster name; only the first three characters of this string are used as the prefix
resourceGroup: <RESOURCE_GROUP> # Resource group where the cluster needs to be created
region: <REGION> # Region where the cluster needs to be created; must be lowercase with no spaces, e.g. eastus2
headNodeVMSize: STANDARD_D13_V2 # Any Standard VM Size supported for Head Nodes in HDInsight, https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-supported-node-configuration
workerNodeVMSize: STANDARD_D13_V2 # Any Standard VM Size supported for Worker Nodes in HDInsight, https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-supported-node-configuration
workerNodeSize: 3 # Number of worker nodes
subscription: <YOUR_SUBSCRIPTION> # Subscription ID

activeDirectory:
  azureEnv: AZURE # Azure environment: AZURE, AZURE_CHINA, AZURE_GERMANY, or AZURE_US_GOVERNMENT; default AZURE
  clientId: <YOUR_CLIENT_ID> # Client ID of the service principal
  tenantId: <YOUR_TENANT_ID> # Tenant ID of the service principal
  clientSecret: <YOUR_CLIENT_SECRET> # Client Secret for the service principal

clusterCredentials:
  clusterLoginUsername: <YOUR_USER_NAME> # Ambari username
  clusterLoginPassword: <YOUR_PASSWORD> # Ambari password
  sshCredentials:
    type: keys
    publicKeypaths: [<SSH_KEY1>, <SSH_KEY2>] # Paths to public SSH keys
    sshUsername: <SSH_USER> # SSH username

storage:
  type: WASB # Default is WASB; ADLS_GEN2 is also supported
  endpoint: <YOUR_STORAGE_ACCOUNT>.blob.core.windows.net # For WASB use <YOUR_STORAGE_ACCOUNT>.blob.core.windows.net; for ADLS_GEN2 use <YOUR_STORAGE_ACCOUNT>.dfs.core.windows.net
  key: <YOUR_STORAGE_KEY> # [Required for WASB] Storage key
  resourceGroup: <RESOURCE_GROUP> # [Required for ADLS_GEN2] Resource group where the ADLS_GEN2 account exists
  managedIdentityName: <IDENTITY_NAME> # [Required for ADLS_GEN2] Managed identity name for ADLS_GEN2
  mangedIdentityResourceGroup: <IDENTITY_RESOURCE_GROUP> # [Required for ADLS_GEN2] Resource group where the managed identity exists

network:
  vnetName: <YOUR_VNET> # VNet Name to be used
  resourceGroup: <VNET_RESOURCE_GROUP> # Resource group in which the VNet exists
  subnetName: <SUBNET_NAME> # Subnet name to be used within a VNet
  create: false # If true, creates a new VNet and subnet (resourceGroup is not required here); otherwise configures the existing VNet and subnet from the resourceGroup mentioned

security: # [Optional] Required only for secure (ESP-enabled) clusters; not needed for standard clusters
  ldapUrl: <YOUR_LDAP_URL> # LDAP URL of the AAD-DS
  domainUserName: <YOUR_DOMAIN_USERNAME> # eg : [email protected]
  aaddsDnsDomainName: <YOUR_AADDS_DNS_DOMAIN_NAME> # eg: securehadoop.onmicrosoft.com
  clusterAccessGroup: <YOUR_ACCESS_GROUP> # eg: clusterusers
  aaddsResourceId: <YOUR_AADDS_RESOURCE_ID> # eg: /subscriptions/<YOUR_SUBSCRIPTION_ID>/resourceGroups/<YOUR_RESOURCE_GROUP>/providers/Microsoft.AAD/domainServices/<YOUR_AADDS_DNS_DOMAIN_NAME>
  msiResourceId: <YOUR_MANAGED_IDENTITY> # /subscriptions/<YOUR_SUBSCRIPTION>/resourceGroups/<YOUR_RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<YOUR_IDENTITY>
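For the key-based sshCredentials above, the public key files referenced by publicKeypaths can be generated with ssh-keygen. A minimal sketch; the key path, size, and comment below are arbitrary choices, not requirements of the tool:

```shell
# Work in a scratch directory so no existing key is overwritten.
cd "$(mktemp -d)"

# Generate an RSA key pair with no passphrase; the .pub file is what
# publicKeypaths in the config should point to.
ssh-keygen -q -t rsa -b 2048 -N "" -f ./hwc_ssh_key -C "hwc-cluster-key"

# The public key path to list under publicKeypaths:
ls ./hwc_ssh_key.pub
```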

clusterCredentials can be configured to use a username and password instead of SSH keys for SSH access to the cluster, like:

clusterCredentials:
  clusterLoginUsername: <YOUR_USER_NAME> # Ambari username
  clusterLoginPassword: <YOUR_PASSWORD> # Ambari password
  sshCredentials:
    type: password
    sshUsername: <SSH_USER> # SSH username
    sshPassword: <SSH_PASSWORD> # SSH password
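Similarly, for ADLS_GEN2 the storage section uses the dfs endpoint and a managed identity instead of a storage key. A sketch based on the field descriptions above; all values are placeholders:

```yaml
storage:
  type: ADLS_GEN2
  endpoint: <YOUR_STORAGE_ACCOUNT>.dfs.core.windows.net # ADLS_GEN2 uses the dfs endpoint
  resourceGroup: <RESOURCE_GROUP> # Resource group where the ADLS_GEN2 account exists
  managedIdentityName: <IDENTITY_NAME> # Managed identity with access to the account
  mangedIdentityResourceGroup: <IDENTITY_RESOURCE_GROUP> # Resource group of the managed identity
```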

Running the Tool

  • Clone the repository

    git clone https://github.com/Azure-Samples/HWC-Cluster-Creator-Tool.git
  • Build the repository

    mvn clean install
  • Launch the tool for creating HWC Cluster

    java -cp target/HWC-ClusterCreator-1.0-SNAPSHOT-jar-with-dependencies.jar com.microsoft.hdinsight.HWCClusterCreator YAML_CONFIG_PATH

Verify the HWC Cluster Setup

Additional Resources

hwc-cluster-creator-tool's People

Contributors

adesh-rao, dependabot[bot], microsoft-github-operations[bot], microsoftopensource, sushil-k-s

hwc-cluster-creator-tool's Issues

[Action Needed] This repo is inactive

This GitHub repository has been identified as a candidate for archival

This repository has had no activity in more than [x amount of time]. Long periods of inactivity present security and code hygiene risks. Archiving will not prevent users from viewing or forking the code. A banner will appear on the repository alerting users that the repository is archived.

Please see https://aka.ms/sunsetting-faq to learn more about this process.

Action

✍️

❗ **If this repository is still actively maintained, please simply close this issue.** Closing an issue on a repository is considered activity and the repository will not be archived. 🔒

If you take no action and this repository is still inactive 30 days from today, it will be automatically archived.

Need more help? 🖐️

Support different VM Types and Worker Nodes Size while creating LLAP and Spark Clusters

Currently, the tool reuses the same head-node and worker-node VM sizes, and the same default worker-node count, for creating both the Spark and LLAP clusters.

Current Config

headNodeVMSize: STANDARD_D13_V2
workerNodeVMSize: STANDARD_D14_V2
workerNodeSize: 3

Creating this issue to support different worker-node counts and VM types for the LLAP and Spark clusters.

New Config

Spark:
  headNodeVMSize: STANDARD_D13_V2
  workerNodeVMSize: STANDARD_D14_V2
  workerNodeSize: 2
LLAP:
  headNodeVMSize: STANDARD_D14_V2
  workerNodeVMSize: STANDARD_D15_V2
  workerNodeSize: 3
