AWS Analytics Automation Toolkit

As analytics solutions have moved away from the one-size-fits-all model toward choosing the right tool for each function, architectures have become more optimized and performant, but also more complex. Solutions built on Amazon Redshift are often used alongside services such as AWS DMS, AWS AppSync, AWS Glue, AWS SCT, Amazon SageMaker, and Amazon QuickSight. One of the core challenges of building these solutions is integrating these services.

This solution takes advantage of the integrations that recur across common use cases, and uses the AWS CDK to automate the provisioning of AWS analytics services, primarily Amazon Redshift. Deployment consists of customizing a JSON configuration file that indicates the resources to be used; the solution takes those inputs and provisions the required infrastructure dynamically.

PLEASE NOTE: This solution is meant for proof of concept or demo use cases, and not for production workloads.

Overview of Deployment
Prerequisites
Deployment Steps
Clean up
Troubleshooting
Feedback

Overview of Deployment

This project leverages CloudShell, a browser-based shell service, to programmatically initiate the deployment through the AWS console. To achieve this, a JSON-formatted config file specifying the desired service configurations needs to be uploaded to CloudShell. Then, a series of steps need to be run to clone this repository and initiate the CDK scripts.

The following sections give further details of how to complete these steps.

Prerequisites

Prior to deployment, some resources need to be preconfigured:

Please verify that you will be deploying this solution in a region that supports CloudShell
Execute the deployment with an IAM user with permissions to use:
- AWSCloudShellFullAccess
- IAMFullAccess
- AWSCloudFormationFullAccess
- AmazonSSMFullAccess
- AmazonRedshiftFullAccess
- AmazonS3FullAccess
- SecretsManagerReadWrite
- AmazonEC2FullAccess
- Create custom DMS policy called AmazonDMSRoleCustom -- select Create policy with the following permissions:
```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "dms:*",
      "Resource": "*"
    }
  ]
}
```
[OPTIONAL] If using SCT, create an EC2 key pair that you can access (see the documentation on how to create a new one)
[OPTIONAL] If using an external database, open the source firewalls/security groups to allow traffic from AWS
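
The custom DMS policy can also be created from the command line instead of the console. The following is a minimal sketch of that alternative, not a step the toolkit requires: it prints the policy document and the `aws iam create-policy` command that would register it. The file name `dms-policy.json` and both function names are hypothetical.

```python
import json
import shlex

# Policy document copied from the prerequisite above.
DMS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "dms:*", "Resource": "*"}
    ],
}


def policy_json():
    """Serialize the policy document for saving to a file."""
    return json.dumps(DMS_POLICY, indent=2)


def create_policy_command(path="dms-policy.json"):
    """Build the CLI command that creates the policy from a saved file.

    The default path is a hypothetical file name.
    """
    return shlex.join([
        "aws", "iam", "create-policy",
        "--policy-name", "AmazonDMSRoleCustom",
        "--policy-document", f"file://{path}",
    ])


print(policy_json())
print(create_policy_command())
```

Save the printed JSON under the chosen file name before running the printed command. The policy still has to be attached to the IAM user afterwards.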

If these are complete, continue to the deployment steps. If you come across errors, refer to the Troubleshooting section; if the error isn't addressed there, please submit feedback using the Issues tab of this repo.

Deployment Steps

In order to launch the staging and target infrastructures, download the user-config-template.json file in this repo.

Configure the config file

The config file has two parts: (1) a list of key-value pairs, which map each service to whether it should be launched in the target infrastructure, and (2) configurations for the services that are launched in the target infrastructure. Open the user-config-template.json file and replace the values for the Service Keys in the first section with the appropriate Launch Value defined in the table below. If you're creating a resource, define the corresponding Configuration fields in the second section.

| Service Key | Launch Values | Configuration | Description |
| --- | --- | --- | --- |
| `vpc_id` | `CREATE`, existing VPC ID | In case of `CREATE`, configure `vpc`: `on_prem_cidr`: CIDR block used to connect to the VPC (for security groups); `vpc_cidr`: CIDR block used for the VPC private IPs and size; `number_of_az`: number of Availability Zones the VPC should cover; `cidr_mask`: size of the public and private subnets to be launched in the VPC | [REQUIRED] The VPC to launch the target resources in. Can either be an existing VPC or created from scratch. |
| `redshift_endpoint` | `CREATE`, `N/A`, existing Redshift endpoint | In case of `CREATE`, configure `redshift`: `cluster_identifier`: name to be used in the cluster ID; `database_name`: name of the database; `node_type`: `ds2.xlarge`, `ds2.8xlarge`, `dc1.large`, `dc1.8xlarge`, `dc2.large`, `dc2.8xlarge`, `ra3.xlplus`, `ra3.4xlarge`, or `ra3.16xlarge`; `number_of_nodes`: number of compute nodes; `master_user_name`: username to be used for the Redshift database; `subnet_type`: subnet type the cluster should be launched in, `PUBLIC` or `PRIVATE` (note: needs at least 2 subnets in separate AZs); `encryption`: whether the cluster should be encrypted, `y`/`Y` or `n`/`N` | Launches a Redshift cluster. |
| `dms_on_prem_to_redshift_target` | `CREATE`, `N/A` | Can only be `CREATE` if you are also creating a Redshift cluster. In case of `CREATE`: 1. Configure `dms_migration`: `migration_type`: `full-load`, `cdc`, or `full-load-and-cdc`; `subnet_type`: subnet type the cluster should be launched in, `PUBLIC` or `PRIVATE` (note: needs at least 2 subnets in separate AZs). 2. Configure `external_database`: `source_db`: name of the source database to migrate; `source_engine`: engine type of the source; `source_schema`: name of the source schema to migrate; `source_host`: DNS endpoint of the source; `source_user`: username of the database to migrate; `source_port`: [INT] port to connect on | Creates a migration instance, task, and endpoints between a source and the Redshift cluster configured above. |
| `sct_on_prem_to_redshift_target` | `CREATE`, `N/A` | Can only be `CREATE` if you are also creating a Redshift cluster. In case of `CREATE`: 1. Configure `sct_on_prem_to_redshift`: `key_name`: EC2 key pair name to be used for the EC2 instance running SCT. 2. Configure `external_database`: `source_db`: name of the source database to migrate; `source_engine`: engine type of the source; `source_schema`: name of the source schema to migrate; `source_host`: DNS endpoint of the source; `source_user`: username of the database to migrate; `source_port`: [INT] port to connect on | Launches an EC2 instance and installs SCT to be used for schema conversion. |
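
To make the two-part structure concrete, here is a sketch of a config that creates a VPC and a Redshift cluster, built in Python and serialized to JSON. The key names come from the table above. Everything else is an assumption for illustration: the flat top-level nesting, every value (CIDR ranges, cluster name, user name), and writing numeric fields as strings (inferred only from `source_port` being the one field marked [INT]). Treat user-config-template.json in the repo as the authority on the actual layout.

```python
import json

# Part 1: service keys mapped to launch values.
launch_values = {
    "vpc_id": "CREATE",
    "redshift_endpoint": "CREATE",
    "dms_on_prem_to_redshift_target": "N/A",
    "sct_on_prem_to_redshift_target": "N/A",
}

# Part 2: configuration for each service that is being created.
# All values below are illustrative assumptions.
configurations = {
    "vpc": {
        "on_prem_cidr": "192.168.0.0/16",
        "vpc_cidr": "10.0.0.0/16",
        "number_of_az": "2",
        "cidr_mask": "24",
    },
    "redshift": {
        "cluster_identifier": "demo-cluster",
        "database_name": "dev",
        "node_type": "ra3.xlplus",
        "number_of_nodes": "2",
        "master_user_name": "awsuser",
        "subnet_type": "PRIVATE",
        "encryption": "y",
    },
}

# Assumed layout: both parts share one top-level JSON object.
config = {**launch_values, **configurations}
config_json = json.dumps(config, indent=2)
print(config_json)
```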

You can see an example of a completed config file under user-config-sample.json.

Once all appropriate Launch Values and Configurations have been defined, save the file as user-config.json.
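
Before uploading the file, it can help to check it against the rules stated in the table. The following is a sketch of such a check and is not part of the toolkit: `check_config` is a hypothetical name, and the code assumes that service keys and configuration sections sit at the top level of the JSON object.

```python
import ipaddress
import json

MIGRATION_TYPES = {"full-load", "cdc", "full-load-and-cdc"}
SUBNET_TYPES = {"PUBLIC", "PRIVATE"}
ENCRYPTION_VALUES = {"y", "Y", "n", "N"}
MIGRATION_KEYS = ("dms_on_prem_to_redshift_target", "sct_on_prem_to_redshift_target")


def check_config(config):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []

    # vpc_id is the one required service key.
    vpc_id = config.get("vpc_id")
    if not vpc_id or vpc_id == "N/A":
        problems.append("vpc_id is required: use CREATE or an existing VPC ID")
    elif vpc_id == "CREATE":
        vpc = config.get("vpc", {})
        try:
            network = ipaddress.ip_network(vpc.get("vpc_cidr", ""))
            mask = int(vpc.get("cidr_mask", ""))
        except (TypeError, ValueError):
            problems.append("vpc.vpc_cidr or vpc.cidr_mask is not valid")
        else:
            # A subnet cannot be larger than the VPC that contains it.
            if mask < network.prefixlen:
                problems.append("vpc.cidr_mask is shorter than the vpc_cidr prefix")

    creating_redshift = config.get("redshift_endpoint") == "CREATE"
    if creating_redshift:
        redshift = config.get("redshift", {})
        if redshift.get("subnet_type") not in SUBNET_TYPES:
            problems.append("redshift.subnet_type must be PUBLIC or PRIVATE")
        if redshift.get("encryption") not in ENCRYPTION_VALUES:
            problems.append("redshift.encryption must be y/Y or n/N")

    # DMS and SCT can only be created together with a new Redshift cluster.
    creating_migration = [k for k in MIGRATION_KEYS if config.get(k) == "CREATE"]
    for key in creating_migration:
        if not creating_redshift:
            problems.append(f"{key} can only be CREATE when redshift_endpoint is CREATE")
    if creating_migration:
        port = config.get("external_database", {}).get("source_port")
        if not isinstance(port, int):
            problems.append("external_database.source_port must be an integer")

    if config.get("dms_on_prem_to_redshift_target") == "CREATE":
        migration_type = config.get("dms_migration", {}).get("migration_type")
        if migration_type not in MIGRATION_TYPES:
            problems.append("dms_migration.migration_type is not a supported value")

    return problems


# Illustrative input: DMS is requested without creating a Redshift cluster.
example = json.loads("""
{
  "vpc_id": "CREATE",
  "vpc": {"vpc_cidr": "10.0.0.0/16", "cidr_mask": "24"},
  "redshift_endpoint": "N/A",
  "dms_on_prem_to_redshift_target": "CREATE",
  "sct_on_prem_to_redshift_target": "N/A",
  "dms_migration": {"migration_type": "cdc"},
  "external_database": {"source_port": 5432}
}
""")
for problem in check_config(example):
    print(problem)
```

To check a real file, load it with `json.load` and pass the result to `check_config`. The check covers only rules spelled out in the table; the deployment script may enforce others.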

Launch the infrastructure

1. Open CloudShell
2. Clone the Git repository: `git clone https://github.com/aws-samples/amazon-redshift-infrastructure-automation.git`
3. Run the deployment script: `~/amazon-redshift-infrastructure-automation/scripts/deploy.sh`
4. When prompted, upload the completed user-config.json file
5. When the upload is complete, press the Enter key
6. When prompted, input a unique stack name to be used to identify this deployment, then press the Enter key
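
Reusing a stack name can collide with secrets left over from an earlier deployment, so one option is to derive the name from a timestamp. This is a sketch under stated assumptions: the function name and the `aaa-demo` prefix are hypothetical, and the pattern reflects CloudFormation's naming rule (letters, digits, and hyphens, starting with a letter, at most 128 characters). The toolkit may impose further limits that are not checked here.

```python
import re
from datetime import datetime, timezone

# CloudFormation stack names: start with a letter, then letters, digits, hyphens.
STACK_NAME_RULE = re.compile(r"[A-Za-z][A-Za-z0-9-]{0,127}")


def unique_stack_name(prefix="aaa-demo", now=None):
    """Append a UTC timestamp to the prefix so repeated deployments differ."""
    now = now or datetime.now(timezone.utc)
    name = f"{prefix}-{now:%Y%m%d-%H%M%S}"
    if not STACK_NAME_RULE.fullmatch(name):
        raise ValueError(f"not a valid CloudFormation stack name: {name!r}")
    return name


print(unique_stack_name())
```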

Depending on your resource configuration, you may receive some input prompts:

| Prompt | Input | Description |
| --- | --- | --- |
| | Password of external database | If you are using an external database, a Secrets Manager secret is created with the password value |
| | Password of existing Redshift cluster | If you give an existing Redshift endpoint in the user-config.json file, a Secrets Manager secret is created with the password for the cluster database |

Post deployment

Once the script has been run, you can monitor the deployment of CloudFormation stacks through the CloudShell terminal, or with the CloudFormation console.

Clean up

Open the CloudFormation console, and select Stacks in the left panel:
1. Filter by the stack name used for the deployment
2. Select the stacks to be deleted, and select Delete at the top
To remove secrets produced by the deployment, you can either:
- Open the Secrets Manager console, and select Secrets in the left panel
  1. Filter by the stack name used for the deployment
  2. Select each secret, and under Actions, select Delete secret
- Replace [STACK NAME] in the commands below with the stack name used for the deployment, and run them in CloudShell:
aws secretsmanager delete-secret --secret-id [STACK NAME]-SourceDBPassword --force-delete-without-recovery

aws secretsmanager delete-secret --secret-id [STACK NAME]-RedshiftPassword --force-delete-without-recovery

aws secretsmanager delete-secret --secret-id [STACK NAME]-RedshiftClusterSecretAA --force-delete-without-recovery
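
To avoid editing the placeholder by hand, the three commands can be generated from the stack name. A minimal sketch: the function name and the `my-stack` value are hypothetical, and the secret suffixes are the ones listed above.

```python
import shlex

# Suffixes of the secrets named in the clean-up commands above.
SECRET_SUFFIXES = ("SourceDBPassword", "RedshiftPassword", "RedshiftClusterSecretAA")


def delete_secret_commands(stack_name):
    """Build one delete-secret command per secret created for the stack."""
    return [
        shlex.join([
            "aws", "secretsmanager", "delete-secret",
            "--secret-id", f"{stack_name}-{suffix}",
            "--force-delete-without-recovery",
        ])
        for suffix in SECRET_SUFFIXES
    ]


for command in delete_secret_commands("my-stack"):
    print(command)
```

The script only prints the commands; paste them into CloudShell to run them. Depending on the configuration, some of these secrets may not exist, in which case the corresponding command reports that the secret was not found.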

Troubleshooting

Error: User: [IAM-USER-ARN] is not authorized to perform: [ACTION] on resource: [RESOURCE-ARN]

The user running CloudShell doesn't have the required permissions. You can use a separate IAM user with the appropriate permissions:

NOTE: The user running the deployment (logged in to the console) still needs AWSCloudShellFullAccess permissions.
1. Open the IAM console
2. Under Users, select Add users
3. Create a new user
4. Select Next: Permissions
5. Add the following policies:
  - IAMFullAccess
  - AWSCloudFormationFullAccess
  - AmazonSSMFullAccess
  - AmazonRedshiftFullAccess
  - AmazonS3FullAccess
  - SecretsManagerReadWrite
  - AmazonEC2FullAccess
  - Create custom DMS policy called AmazonDMSRoleCustom -- select Create policy with the following permissions:
```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "dms:*",
      "Resource": "*"
    }
  ]
}
```
6. Get and download the CSV containing the Access Key and Secret Access Key for this user -- these will be used with CloudShell
7. When you first open CloudShell, run `aws configure`
8. Enter the Access Key and Secret Access Key downloaded for the IAM user created in the steps above

Error: An error occurred (InvalidRequestException) when calling the CreateSecret operation: You can't create this secret because a secret with this name is already scheduled for deletion.

This occurs when you use a repeated stack name for the deployment, which results in a repeat of a secret name in Secrets Manager. Either use a new stack name when prompted for it, or delete the secrets by replacing [STACK NAME] with the stack name used for the deployment in the following commands and running them in CloudShell:

aws secretsmanager delete-secret --secret-id [STACK NAME]-SourceDBPassword --force-delete-without-recovery

aws secretsmanager delete-secret --secret-id [STACK NAME]-RedshiftPassword --force-delete-without-recovery

aws secretsmanager delete-secret --secret-id [STACK NAME]-RedshiftClusterSecretAA --force-delete-without-recovery

Then rerun:

~/amazon-redshift-infrastructure-automation/scripts/deploy.sh

Feedback

Our aim is to make this tool as dynamic and comprehensive as possible, so we’d love to hear your feedback. Let us know your experience deploying the solution, and share any other use cases that the automation solution doesn’t yet support. Please use the Issues tab under this repo, and we’ll use that to guide our roadmap.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
