AWS Analytics Automation Toolkit

As analytics solutions have moved away from the one-size-fits-all model toward choosing the right tool for each function, architectures have become more optimized and performant, but also more complex. Solutions built on Amazon Redshift are often used alongside services including AWS DMS, AWS AppSync, AWS Glue, AWS SCT, Amazon SageMaker, Amazon QuickSight, and more. One of the core challenges of building these solutions is integrating these services.

This solution takes advantage of the integrations that recur between services in common use cases, and uses the AWS CDK to automate the provisioning of AWS analytics services, primarily Amazon Redshift. Deployment now consists of customizing a JSON configuration file that indicates the resources to be used; the solution takes those inputs and provisions the required infrastructure dynamically.

PLEASE NOTE: This solution is meant for proof of concept or demo use cases, and not for production workloads.

Table of Contents

  1. Overview of Deployment
  2. Prerequisites
  3. Deployment Steps
    1. Configure the config file
    2. Launch the infrastructure
    3. Post deployment
  4. Clean up
  5. Troubleshooting
  6. Feedback

Overview of Deployment

This project leverages CloudShell, a browser-based shell service, to programmatically initiate the deployment through the AWS console. To achieve this, a JSON-formatted config file specifying the desired service configurations needs to be uploaded to CloudShell. Then, a series of steps need to be run to clone this repository and initiate the CDK scripts.

The following sections give further details of how to complete these steps.

Prerequisites

Prior to deployment, some resources need to be preconfigured:

  • Please verify that you will be deploying this solution in a region that supports CloudShell

  • Execute the deployment with an IAM user with permissions to use:

    • AWSCloudShellFullAccess
    • IAMFullAccess
    • AWSCloudFormationFullAccess
    • AmazonSSMFullAccess
    • AmazonRedshiftFullAccess
    • AmazonS3FullAccess
    • SecretsManagerReadWrite
    • AmazonEC2FullAccess
    • A custom DMS policy called AmazonDMSRoleCustom -- select Create policy and use the following permissions:
       {
         "Version": "2012-10-17",
         "Statement": [
           {
             "Effect": "Allow",
             "Action": "dms:*",
             "Resource": "*"
           }
         ]
       }

  • [OPTIONAL] If using SCT, create a key pair that can be accessed (see the documentation on how to create a new one)

  • [OPTIONAL] If using an external database, open the source firewalls/security groups to allow traffic from AWS
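
The custom DMS policy from the list above can also be written out programmatically. The following is a minimal sketch, not part of this repo: it builds the same policy document and saves it to a local file. The file name dms-policy.json and the function names are assumptions chosen for illustration.

```python
import json

# Policy name taken from the prerequisites above.
POLICY_NAME = "AmazonDMSRoleCustom"


def build_dms_policy() -> dict:
    """Return the custom DMS policy document listed in the prerequisites."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "dms:*", "Resource": "*"},
        ],
    }


def write_policy(path: str = "dms-policy.json") -> str:
    """Write the policy to `path` (hypothetical file name) and return the path."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_dms_policy(), fh, indent=2)
    return path
```

The resulting file could then be passed to the AWS CLI, for example with `aws iam create-policy --policy-name AmazonDMSRoleCustom --policy-document file://dms-policy.json`, instead of pasting the JSON into the console.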

Once these are complete, continue to the deployment steps. If you come across errors, refer to the troubleshooting section; if the error isn't addressed there, please submit feedback using the Issues tab of this repo.

Deployment Steps

In order to launch the staging and target infrastructures, download the user-config-template.json file in this repo.

Configure the config file

The config file has two parts: (1) a list of key-value pairs, which map each service to whether it should be launched in the target infrastructure, and (2) configurations for the services that are launched in the target infrastructure. Open the user-config-template.json file and replace the values for the Service Keys in the first section with the appropriate Launch Value defined in the table below. If you're looking to create a resource, define the corresponding Configuration fields in the second section.

Service Key: vpc_id
  Launch Values: CREATE, existing VPC ID
  Configuration: In case of CREATE, configure vpc:
    • on_prem_cidr: CIDR block used to connect to VPC (for security groups)
    • vpc_cidr: The CIDR block used for the VPC private IPs and size
    • number_of_az: Number of Availability Zones the VPC should cover
    • cidr_mask: The size of the public and private subnet to be launched in the VPC
  Description: [REQUIRED] The VPC to launch the target resources in -- can either be an existing VPC or created from scratch.

Service Key: redshift_endpoint
  Launch Values: CREATE, N/A, existing Redshift endpoint
  Configuration: In case of CREATE, configure redshift:
    • cluster_identifier: Name to be used in the cluster ID
    • database_name: Name of the database
    • node_type: ds2.xlarge, ds2.8xlarge, dc1.large, dc1.8xlarge, dc2.large, dc2.8xlarge, ra3.xlplus, ra3.4xlarge, or ra3.16xlarge
    • number_of_nodes: Number of compute nodes
    • master_user_name: Username to be used for Redshift database
    • subnet_type: Subnet type the cluster should be launched in -- PUBLIC or PRIVATE (note: need at least 2 subnets in separate AZs)
    • encryption: Whether the cluster should be encrypted -- y/Y or n/N
  Description: Launches a Redshift cluster.

Service Key: dms_on_prem_to_redshift_target
  Launch Values: CREATE, N/A
  Configuration: Can only CREATE if you are also creating a Redshift cluster. In case of CREATE:
    1. Configure dms_migration:
      • migration_type: full-load, cdc, or full-load-and-cdc
      • subnet_type: Subnet type the cluster should be launched in -- PUBLIC or PRIVATE (note: need at least 2 subnets in separate AZs)
    2. Configure external_database:
      • source_db: Name of source database to migrate
      • source_engine: Engine type of the source
      • source_schema: Name of source schema to migrate
      • source_host: DNS endpoint of the source
      • source_user: Username of the database to migrate
      • source_port: [INT] Port to connect on
  Description: Creates a migration instance, task, and endpoints between a source and the Redshift cluster configured above.

Service Key: sct_on_prem_to_redshift_target
  Launch Values: CREATE, N/A
  Configuration: Can only CREATE if you are also creating a Redshift cluster. In case of CREATE:
    1. Configure sct_on_prem_to_redshift:
      • key_name: EC2 key pair name to be used for EC2 running SCT
    2. Configure external_database:
      • source_db: Name of source database to migrate
      • source_engine: Engine type of the source
      • source_schema: Name of source schema to migrate
      • source_host: DNS endpoint of the source
      • source_user: Username of the database to migrate
      • source_port: [INT] Port to connect on
  Description: Launches an EC2 instance and installs SCT to be used for schema conversion.
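
To make the relationship between vpc_cidr, cidr_mask, and number_of_az concrete, here is a small sketch of the arithmetic using Python's standard ipaddress module. It assumes one public and one private subnet per Availability Zone; that layout, and the function name plan_subnets, are assumptions for illustration. The sketch only checks whether the requested subnets fit in the VPC range and does not claim to reproduce the allocation the CDK performs.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, cidr_mask: int, number_of_az: int) -> list[str]:
    """Return the first subnets of size /cidr_mask carved from vpc_cidr.

    Assumes one public and one private subnet per Availability Zone
    (an illustrative assumption, not confirmed by the toolkit).
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    if cidr_mask < vpc.prefixlen:
        raise ValueError("cidr_mask must not be larger than the VPC range itself")
    needed = 2 * number_of_az
    available = 2 ** (cidr_mask - vpc.prefixlen)
    if needed > available:
        raise ValueError(
            f"{needed} subnets of size /{cidr_mask} do not fit in {vpc_cidr}"
        )
    subnets = vpc.subnets(new_prefix=cidr_mask)
    return [str(next(subnets)) for _ in range(needed)]
```

For example, a vpc_cidr of 10.0.0.0/16 with a cidr_mask of 24 and two Availability Zones needs four /24 subnets, which fit easily; a vpc_cidr of 10.0.0.0/24 with a cidr_mask of 25 holds only two subnets and would be rejected.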

You can see an example of a completed config file under user-config-sample.json.

Once all appropriate Launch Values and Configurations have been defined, save the file as user-config.json.
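
Before uploading, the rules from the table above can be checked locally. The following is a minimal sketch, not part of this repo: it assumes the service keys and the configuration sections (vpc, redshift, dms_migration, and so on) all sit at the top level of the JSON object, which may differ from the actual template. The function names are hypothetical.

```python
import json

EXTERNAL_DB_FIELDS = [
    "source_db", "source_engine", "source_schema",
    "source_host", "source_user", "source_port",
]

# Service key -> configuration sections (and fields) needed when it is CREATE.
REQUIRED_SECTIONS = {
    "vpc_id": {
        "vpc": ["on_prem_cidr", "vpc_cidr", "number_of_az", "cidr_mask"],
    },
    "redshift_endpoint": {
        "redshift": [
            "cluster_identifier", "database_name", "node_type",
            "number_of_nodes", "master_user_name", "subnet_type", "encryption",
        ],
    },
    "dms_on_prem_to_redshift_target": {
        "dms_migration": ["migration_type", "subnet_type"],
        "external_database": EXTERNAL_DB_FIELDS,
    },
    "sct_on_prem_to_redshift_target": {
        "sct_on_prem_to_redshift": ["key_name"],
        "external_database": EXTERNAL_DB_FIELDS,
    },
}


def validate_config(config: dict) -> list[str]:
    """Return a list of problems found in the config; empty means none found."""
    problems = []
    if "vpc_id" not in config:
        problems.append("vpc_id is required")
    for service_key, sections in REQUIRED_SECTIONS.items():
        if config.get(service_key) != "CREATE":
            continue
        for section, fields in sections.items():
            present = config.get(section, {})
            for field in fields:
                if field not in present:
                    problems.append(
                        f"{section}.{field} is missing "
                        f"(needed because {service_key} is CREATE)"
                    )
    # DMS and SCT can only be created together with a new Redshift cluster.
    for dependent in ("dms_on_prem_to_redshift_target",
                      "sct_on_prem_to_redshift_target"):
        if (config.get(dependent) == "CREATE"
                and config.get("redshift_endpoint") != "CREATE"):
            problems.append(f"{dependent} requires redshift_endpoint to be CREATE")
    return problems


def load_and_validate(path: str = "user-config.json") -> list[str]:
    """Load the config file from `path` and validate it."""
    with open(path, encoding="utf-8") as fh:
        return validate_config(json.load(fh))
```

Calling load_and_validate() next to a saved user-config.json would return an empty list when none of these checks fail; it does not verify that values such as node_type or migration_type are themselves valid.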

Launch the infrastructure

  1. Open CloudShell

  2. Clone the Git repository

    git clone https://github.com/aws-samples/amazon-redshift-infrastructure-automation.git

  3. Run the deployment script

    ~/amazon-redshift-infrastructure-automation/scripts/deploy.sh

  4. When prompted, upload the completed user-config.json file

  5. When the upload is complete, press the Enter key

  6. When prompted, input a unique stack name to be used to identify this deployment, then press the Enter key

  7. Depending on your resource configuration, you may receive some input prompts:

    Prompt: Input Database Password
      Input: Password of external database
      Description: If you are using an external database, a Secrets Manager secret is created with the password value

    Prompt: Input Redshift Password
      Input: Password of existing Redshift cluster
      Description: If you are giving a Redshift endpoint in the user-config.json file, a Secrets Manager secret is created with the password for the cluster database
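
The stack name entered during the steps above has to be accepted by CloudFormation, which allows only alphanumeric characters and hyphens, requires an alphabetic first character, and limits names to 128 characters. A quick local check could look like the sketch below; the function name is hypothetical, and since the deployment may append its own suffixes to the name, a shorter name is the safer choice.

```python
import re

# CloudFormation stack name rule: a letter first, then letters, digits,
# or hyphens, 128 characters at most.
STACK_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9-]{0,127}")


def is_valid_stack_name(name: str) -> bool:
    """Return True if `name` satisfies the CloudFormation naming rule."""
    return STACK_NAME_PATTERN.fullmatch(name) is not None
```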

Post deployment

Once the script has been run, you can monitor the deployment of CloudFormation stacks through the CloudShell terminal, or with the CloudFormation console.
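
As an alternative to watching the console, stack status can be polled from CloudShell with boto3. The following is a rough sketch under two assumptions: that boto3 is available in the environment, and that the stacks created by the deployment contain the chosen stack name in their names (as the filtering advice in the clean-up section suggests). The function names are hypothetical.

```python
def filter_stacks(stacks: list[dict], stack_name: str) -> dict[str, str]:
    """Map stack name -> status for stacks whose name contains `stack_name`."""
    return {
        stack["StackName"]: stack["StackStatus"]
        for stack in stacks
        if stack_name in stack["StackName"]
    }


def fetch_stacks() -> list[dict]:
    """Fetch all stacks in the current region via the CloudFormation API."""
    import boto3  # third-party; imported here so the helper above has no dependencies

    client = boto3.client("cloudformation")
    stacks = []
    for page in client.get_paginator("describe_stacks").paginate():
        stacks.extend(page["Stacks"])
    return stacks
```

Running `filter_stacks(fetch_stacks(), "my-stack")` with the stack name used for the deployment would return statuses such as CREATE_IN_PROGRESS or CREATE_COMPLETE for the matching stacks.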

Clean up

  1. Open the CloudFormation console, and select Stacks in the left panel:

    1. Filter by the stack name used for the deployment
    2. Select the stacks to be deleted, and select Delete at the top
  2. To remove secrets produced by the deployment, you can either

    • Open the Secrets Manager console, and select Secrets in the left panel
      1. Filter by the stack name used for the deployment
      2. Select each secret, and under Actions, select Delete secret
    • Replace [STACK NAME] in the commands below with the stack name used for the deployment and run them in CloudShell:

    aws secretsmanager delete-secret --secret-id [STACK NAME]-SourceDBPassword --force-delete-without-recovery

    aws secretsmanager delete-secret --secret-id [STACK NAME]-RedshiftPassword --force-delete-without-recovery

    aws secretsmanager delete-secret --secret-id [STACK NAME]-RedshiftClusterSecretAA --force-delete-without-recovery
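
To avoid editing the placeholder by hand, the three commands above can be generated for a given stack name. This is a small sketch with a hypothetical function name; it only builds the command strings and does not run them.

```python
import shlex

# Secret name suffixes taken from the commands above.
SECRET_SUFFIXES = ("SourceDBPassword", "RedshiftPassword", "RedshiftClusterSecretAA")


def delete_secret_commands(stack_name: str) -> list[str]:
    """Build the delete-secret command for each secret of a deployment."""
    return [
        shlex.join([
            "aws", "secretsmanager", "delete-secret",
            "--secret-id", f"{stack_name}-{suffix}",
            "--force-delete-without-recovery",
        ])
        for suffix in SECRET_SUFFIXES
    ]
```

Printing each entry of `delete_secret_commands("my-stack")` gives commands that can be pasted into CloudShell. Depending on the resource configuration, some of these secrets may not exist, in which case the corresponding command reports that the secret cannot be found.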

Troubleshooting

  • Error: User: [IAM-USER-ARN] is not authorized to perform: [ACTION] on resource: [RESOURCE-ARN]

    The user running CloudShell doesn't have the required permissions. You can use a separate IAM user with the appropriate permissions:

    NOTE: The user running the deployment (logged into the console) still needs AWSCloudShellFullAccess permissions

    1. Open the IAM console
    2. Under Users, select Add users
    3. Create a new user
    4. Select Next: Permissions
    5. Add the following policies:
      • IAMFullAccess
      • AWSCloudFormationFullAccess
      • AmazonSSMFullAccess
      • AmazonRedshiftFullAccess
      • AmazonS3FullAccess
      • SecretsManagerReadWrite
      • AmazonEC2FullAccess
      • A custom DMS policy called AmazonDMSRoleCustom -- select Create policy and use the following permissions:
         {
           "Version": "2012-10-17",
           "Statement": [
             {
               "Effect": "Allow",
               "Action": "dms:*",
               "Resource": "*"
             }
           ]
         }

    6. Download the CSV containing the Access Key and Secret Access Key for this user -- these will be used with CloudShell
    7. When you first open CloudShell, run

      aws configure

    8. Enter the Access Key and Secret Access Key downloaded for the IAM user created above

  • Error: An error occurred (InvalidRequestException) when calling the CreateSecret operation: You can't create this secret because a secret with this name is already scheduled for deletion.

    This occurs when you use a repeated stack name for the deployment, which results in a repeat of a secret name in Secrets Manager. Either use a new stack name when prompted for it, or delete the secrets by replacing [STACK NAME] with the stack name used for the deployment in the following commands and running them in CloudShell:

    aws secretsmanager delete-secret --secret-id [STACK NAME]-SourceDBPassword --force-delete-without-recovery

    aws secretsmanager delete-secret --secret-id [STACK NAME]-RedshiftPassword --force-delete-without-recovery

    aws secretsmanager delete-secret --secret-id [STACK NAME]-RedshiftClusterSecretAA --force-delete-without-recovery

    Then rerun:

    ~/amazon-redshift-infrastructure-automation/scripts/deploy.sh

Feedback

Our aim is to make this tool as dynamic and comprehensive as possible, so we’d love to hear your feedback. Let us know your experience deploying the solution, and share any other use cases that the automation solution doesn’t yet support. Please use the Issues tab under this repo, and we’ll use that to guide our roadmap.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

Contributors

amazon-auto, julbeck, kaklisamir, manashdeb, pgvillena
