In this course you will learn the value and process of moving from self-managed databases (on premises or in the cloud) into fully managed Amazon Web Services (AWS) database solutions.

MIGRATING YOUR MONGODB DATABASES TO DOCUMENTDB

Mystique Unicorn App backend is hosted on MongoDB. Recently, one of their devs discovered that AWS released Amazon DocumentDB (with MongoDB compatibility), a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads.

Can you help them migrate from MongoDB to DocumentDB?

🎯 Solutions

We will follow a multi-stage process to accomplish our goal. We need the following components to get this right:

  1. Source Database - MongoDB
    • If in AWS: EC2 instance in a VPC, Security Group, SSH Keypair(Optional)
    • Some dummy data inside the database
  2. Destination Database - DocumentDB
    • Subnet Groups
    • VPC Security Groups
  3. Database Migration Service(DMS) - Replication Instance
    • DMS IAM Roles
    • Endpoints
    • Database Migration Tasks

Miztiik Automation: Database Migration

In this article, we will build an architecture similar to the one shown above: a source MongoDB database on an EC2 instance, a target Amazon DocumentDB cluster, and a DMS replication instance that copies data between them, all hosted in a single custom VPC.

In this Workshop you will practice how to migrate your MongoDB databases to Amazon DocumentDB using different strategies.

  1. 🧰 Prerequisites

    This demo, along with its instructions, scripts, and CloudFormation template, is designed to be run in us-east-1. With a few modifications you can try it out in other regions as well (not covered here).

    • 🛠 AWS CLI Installed & Configured - Get help here
    • 🛠 AWS CDK Installed & Configured - Get help here
    • 🛠 Python packages. Change the commands below to suit your OS; the following are written for Amazon Linux 2
      • Python3 - yum install -y python3
      • Python Pip - yum install -y python3-pip
      • Virtualenv - pip3 install virtualenv
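
Before moving on, it can help to confirm that the tools above are on your PATH. The following is a minimal sketch, not part of the repository; the command names (aws, cdk, python3, pip3, virtualenv) are assumptions based on the list above and may differ on your OS.

```python
import shutil

# Command names assumed from the prerequisites above; adjust for your OS.
REQUIRED_TOOLS = ("aws", "cdk", "python3", "pip3", "virtualenv")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites found on PATH.")
```

The `which` parameter exists only so the lookup can be swapped out when testing; in normal use, call find_missing_tools() with no arguments.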

    As a number of components need to be set up, we will use a combination of CloudFormation (generated from CDK), the CLI, and the GUI.

  2. ⚙️ Setting up the environment

    • Get the application code

      git clone https://github.com/miztiik/dms-mongodb-to-documentdb
      cd dms-mongodb-to-documentdb
  3. 🚀 Prepare the environment

    We need cdk installed to make our deployments easier. Let's go ahead and install the necessary components.

    # If you do not have cdk installed
    npm install -g aws-cdk
    
    # Make sure you are in the root directory
    python3 -m venv .env
    source .env/bin/activate
    pip3 install -r requirements.txt

    The very first time you deploy an AWS CDK app into an environment (account/region), you'll need to install a bootstrap stack. Otherwise, just go ahead and deploy using cdk deploy.

    cdk bootstrap
    cdk ls
    # Follow the on-screen prompts

    You should see the available stacks listed:

    vpc-stack
    dms-prerequisite-stack
    mongodb-on-ec2
  4. 🚀 Deploying the Source Database

    Let us walk through each of the stacks:

    • Stack: vpc-stack This stack will do the following:

      1. Create a custom VPC miztiikVpc (we will use this VPC to host our source MongoDB, DocumentDB, and the DMS replication instance)

      Initiate the deployment with the following command:

      cdk deploy vpc-stack
    • Stack: dms-prerequisite-stack This stack will create the following:

      1. DocumentDB & DMS security groups
        • Port 27017, accessible only from within the VPC
      2. DMS IAM roles (this stack will FAIL if these roles already exist in your account)
        • AmazonDMSVPCManagementRole
        • AmazonDMSCloudWatchLogsRole
      3. SSH key pair, using a custom cfn resource
        • This resource is currently not used. The initial idea was to use the SSH key pair to administer the source MongoDB on EC2, but SSM Session Manager does the same job admirably.
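
Since the deployment fails when the DMS roles already exist, you may want to check for them first. Here is a minimal sketch, assuming a boto3 IAM client is passed in (for example boto3.client("iam")). The helper name find_existing_roles is hypothetical, and the role names are copied from the list above.

```python
# Role names as listed above. `iam_client` is assumed to be a boto3 IAM
# client; it is passed in so the check can be exercised without AWS
# credentials.
DMS_ROLE_NAMES = ("AmazonDMSVPCManagementRole", "AmazonDMSCloudWatchLogsRole")


def find_existing_roles(iam_client, role_names=DMS_ROLE_NAMES):
    """Return the role names that already exist in the account."""
    existing = []
    for name in role_names:
        try:
            iam_client.get_role(RoleName=name)
        except iam_client.exceptions.NoSuchEntityException:
            continue  # role is absent, so the stack can create it
        existing.append(name)
    return existing
```

An empty list means none of the listed roles were found, so the stack should be free to create them.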

      Initiate the deployment with the following command,

      cdk deploy dms-prerequisite-stack

      After successful completion, take a look at all the resources and familiarize yourself with them. We will use them in later steps.

    • Stack: mongodb-on-ec2 (Source Database - MongoDB) This stack will do the following:

      1. Create an EC2 instance inside our custom VPC (created by vpc-stack)
      2. Attach a security group with the mongo port (27017) open to the world (for any use case other than sandbox testing, you should restrict it)
      3. Configure the instance IAM role to allow SSM Session Manager connections (no more SSH key pairs)
      4. Bootstrap the instance with a user_data script that installs MongoDB 4.x
      5. Create the user mongodbadmin with a password (we will need this later for inserts and DMS)
      6. Create a database miztiik_db (later we will add a collection, customers)

      Initiate the deployment with the following command,

      cdk deploy mongodb-on-ec2

      As our database is a fresh installation, it does not have any data in it. We need some data to migrate. This git repo also includes a script, insert_records_to_mongodb.py, that generates some dummy data and inserts it into the database. After the stack launches successfully:

      • Connect to the EC2 instance using SSM Session Manager - Get help here

      • Switch to a privileged user using sudo su

      • Navigate to /var/log

      • Run the following commands

        git clone https://github.com/miztiik/dms-mongodb-to-documentdb
        cd dms-mongodb-to-documentdb/dms_mongodb_to_documentdb/stacks/back_end/bootstrap_scripts
        python3 insert_records_to_mongodb.py
      • You should see some ids printed out and a summary at the end. Expected output:

        {"no_of_records_inserted":114}
        {"total_coll_count":343}
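
The contents of insert_records_to_mongodb.py are not reproduced here. As a rough sketch of what such a loader could look like, assuming a pymongo-style collection object: the field names and helper names below are hypothetical, and only the shape of the summary output is taken from above.

```python
import random
import uuid


def make_customers(count, rng=random):
    """Build `count` dummy customer documents (field names are hypothetical)."""
    return [
        {
            "customer_id": str(uuid.uuid4()),
            "name": f"customer-{i}",
            "loyalty_points": rng.randint(0, 1000),
        }
        for i in range(count)
    ]


def insert_customers(collection, count):
    """Insert dummy documents and return summaries shaped like the output above.

    `collection` is assumed to behave like a pymongo Collection, e.g.
    MongoClient(...)["miztiik_db"]["customers"].
    """
    result = collection.insert_many(make_customers(count))
    return (
        {"no_of_records_inserted": len(result.inserted_ids)},
        {"total_coll_count": collection.count_documents({})},
    )
```

The total count can exceed the number inserted in a single run because it also counts documents left by earlier runs.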

        If you want to interact with MongoDB, you can try out the following commands:

        # Open Mongo shell
        mongo
        # List all databases
        show dbs
        # Use one of the databases
        use miztiik_db
        db.stats()
        # List all collections
        show collections
        # List some documents in the customers collection
        db.customers.find()
        # List indexes
        db.customers.getIndexes()
        # Quit
        quit()

        Now we are all done with our source database.

  5. 🚀 Deploying the Target Database

    We could automate the creation of DocumentDB & DMS using CDK, but since this is the first time we are using these services, let us use the Console/GUI to set them up. We can leverage the excellent AWS documentation on how to set up DocumentDB. You should only need to do Step 3 - Create an Amazon DocumentDB cluster (use your own judgement, as docs tend to change over time).

    A couple of things to note:

    • For VPC - Use our custom VPC miztiikVpc
    • For Security Group - Use docsdb_sg_dms-prerequisite-stack

    Download the public key for Amazon DocumentDB. We will need this to connect to the DocumentDB cluster from your machine and also from the DMS replication instance.

    wget https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem
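
To connect with a MongoDB driver such as pymongo, the CA bundle is referenced in the connection string. A minimal sketch of building such a URI follows. The helper name, host, and credentials are placeholders (assumptions), while tls, tlsCAFile, and retryWrites are standard MongoDB connection-string options.

```python
from urllib.parse import quote_plus, urlencode


def build_docdb_uri(host, username, password,
                    ca_file="rds-combined-ca-bundle.pem", port=27017):
    """Build a TLS-enabled MongoDB connection URI for a DocumentDB cluster."""
    options = urlencode({
        "tls": "true",
        "tlsCAFile": ca_file,
        "retryWrites": "false",  # DocumentDB does not support retryable writes
    })
    # Credentials are percent-escaped so characters like '@' or ':' are safe.
    return (f"mongodb://{quote_plus(username)}:{quote_plus(password)}"
            f"@{host}:{port}/?{options}")
```

The resulting string can be handed to pymongo.MongoClient(uri), provided the machine can reach the cluster on port 27017 from inside the VPC.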
  6. 🚀 Deploying the DMS Replication Instance

    We can leverage the excellent AWS documentation on how to set up our DMS replication instance.

    A couple of things to note:

    • For VPC - Use our custom VPC miztiikVpc
    • For Security Group - Use dms_sg_dms-prerequisite-stack

    After creating the replication instance, we need to create a few more resources to begin our replication. We will mostly use the defaults.

    • Endpoint for source MongoDB (custom values listed below)
      • For source engine, choose mongodb
      • For server address, use the private DNS name of the EC2 instance
      • Auth mode should be password
      • Set the user to mongodbadmin and the password to Som3thingSh0uldBe1nVault
      • Authentication source should be admin
      • Database name is miztiik_db
      • For endpoint-specific attributes, choose the DMS replication instance we created in the previous step
    • Endpoint for destination database - DocumentDB (custom values listed below)
      • Choose docsdb as the target
      • For server name, use the DNS name of the DocumentDB cluster; here is my example:
        • docsdb.cluster-konstone.us-weast-2.docdb.amazonaws.com
      • Ensure you choose the SSL mode verify-full and upload the CA certificate (the Amazon DocumentDB public key we downloaded earlier)
      • Database name is miztiik_db
      • For endpoint-specific attributes, choose the DMS replication instance we created in the previous step
    • Database Migration Task
      • Choose our replication instance and the source & destination endpoints
      • For Migration Type, choose Migrate existing data
      • For Table Mappings, add a new selection rule; you can enter a custom schema name, leave % for the table name, and set Action to Include
      • Create Task
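
The selection rule above corresponds to a small JSON document, which DMS also accepts in the console's JSON editor or as the TableMappings string in the API. Here is a sketch that generates it. The helper name is hypothetical; for a MongoDB source, the schema name is the database and the table name is the collection.

```python
import json


def build_table_mappings(schema_name, table_name="%"):
    """Return DMS table-mappings JSON with one 'include' selection rule."""
    rules = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "1",
                "object-locator": {
                    "schema-name": schema_name,
                    "table-name": table_name,  # % matches every collection
                },
                "rule-action": "include",
            }
        ]
    }
    return json.dumps(rules, indent=2)


if __name__ == "__main__":
    print(build_table_mappings("miztiik_db"))
```

Keeping the mappings in code makes it easy to narrow the rule later, for example by passing table_name="customers" to migrate a single collection.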
  7. 🔬 Testing the solution

    Navigate to the DMS task. Under Table Statistics, you should be able to observe that DMS has copied the data from MongoDB to DocumentDB. You can connect to DocumentDB and check the records using the same commands that we used with MongoDB earlier.
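
One simple check is to compare document counts per collection on both sides. Here is a minimal sketch, assuming source_db and target_db are pymongo Database objects (or anything indexable by collection name). The helper name is hypothetical.

```python
def compare_collection_counts(source_db, target_db, collection_names):
    """Return {name: (source_count, target_count)} for collections that differ."""
    mismatches = {}
    for name in collection_names:
        source_count = source_db[name].count_documents({})
        target_count = target_db[name].count_documents({})
        if source_count != target_count:
            mismatches[name] = (source_count, target_count)
    return mismatches
```

An empty result means the counts agree. Matching counts do not prove the documents are identical, so spot-checking a few records is still worthwhile.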

    Additional Learnings: You can check the logs in CloudWatch for more information, or increase the logging level of the database migration task.

  8. 📒 Conclusion

    Here we have demonstrated how to use AWS Database Migration Service (DMS) to migrate data from MongoDB to DocumentDB.

  9. 🎯 Additional Exercises

    We have shown how to migrate existing data using DMS. It is also possible to use DMS to replicate ongoing changes (Change Data Capture - CDC). For this, you need to set up your MongoDB (on EC2 or on-premises) as a replica set and set the migration type of the database migration task to Migrate existing data and replicate ongoing changes.
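
In the DMS API, the console's migration type labels map to the MigrationType values full-load, full-load-and-cdc, and cdc. Here is a small sketch that picks the value and guards against requesting CDC when the source is not a replica set. The helper is hypothetical and not part of the repository.

```python
# Console label -> value of the MigrationType parameter in the DMS API.
MIGRATION_TYPES = {
    "Migrate existing data": "full-load",
    "Migrate existing data and replicate ongoing changes": "full-load-and-cdc",
    "Replicate data changes only": "cdc",
}


def pick_migration_type(replicate_ongoing_changes, source_is_replica_set):
    """Choose a DMS migration type, refusing CDC without a replica set."""
    if not replicate_ongoing_changes:
        return MIGRATION_TYPES["Migrate existing data"]
    if not source_is_replica_set:
        raise ValueError("CDC requires the source MongoDB to run as a replica set")
    return MIGRATION_TYPES["Migrate existing data and replicate ongoing changes"]
```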

  10. 🧹 CleanUp

    If you want to destroy all the resources created by the stacks, execute the commands below to delete them, or delete the stacks from the console. Remember to also clean up:

    • Resources created during Deploying The Application
    • CloudWatch Lambda LogGroups
    • Any other custom resources you have created for this demo
    # Delete from cdk
    cdk destroy
    
    # Follow any on-screen prompts
    
    # Delete the CF Stack, If you used cloudformation to deploy the stack.
    aws cloudformation delete-stack \
        --stack-name "MiztiikAutomationStack" \
        --region "${AWS_REGION}"

    This is not an exhaustive list; please carry out any other steps applicable to your needs.
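
If you prefer to destroy the stacks one at a time, a reasonable order is the reverse of deployment. Here is a minimal sketch that only builds the commands; the helper name is hypothetical, and --force skips the confirmation prompt. Resources created from the console, such as the DocumentDB cluster and the DMS replication instance, are assumed to be deleted first, since they live in the VPC.

```python
# Stack names as printed by `cdk ls`, in the order they were deployed above.
DEPLOY_ORDER = ("vpc-stack", "dms-prerequisite-stack", "mongodb-on-ec2")


def destroy_commands(deploy_order=DEPLOY_ORDER):
    """Return one `cdk destroy` command per stack, last-deployed first."""
    return [["cdk", "destroy", "--force", name] for name in reversed(deploy_order)]


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing is deleted
    # by accident; pass each list to subprocess.run(cmd, check=True) to execute.
    for cmd in destroy_commands():
        print(" ".join(cmd))
```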

📌 Who is using this

This repository aims to teach API best practices to new developers, Solution Architects & Ops Engineers in AWS. Building on that knowledge, these Udemy courses (course #1, course #2) help you build complete architectures in AWS.

💡 Help/Suggestions or 🐛 Bugs

Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional documentation or solutions, we greatly value feedback and contributions from our community. Start here

👋 Buy me a coffee

ko-fi Buy me a coffee ☕.

🏷️ Metadata

Level: 300
