cdk-qldb-streaming's Introduction

CDK for QLDB Streaming

This is a TypeScript CDK project that sets up QLDB streaming to keep a source and a destination QLDB ledger in sync in terms of data contents and history. It mitigates the current QLDB limitation of not supporting typical database backup and restore.

Architecture Layout

At a high level, this CDK app has two stacks: QldbBlogDbStack and QldbBlogStreamStack.

Architecture

QLDB Streaming

QldbBlogDbStack creates two QLDB ledgers: the source ledger, QldbBlog, and the destination ledger, QldbBlogStreaming.

QldbBlogStreamStack creates the following components:

  1. A Kinesis Data Stream with KMS CMK encryption. Please note that, to guarantee record ordering, its shard count must be set to 1.
  2. A QLDB stream that feeds all journal changes of the QldbBlog ledger into the Kinesis Data Stream above. The QLDB stream's inclusiveStartTime is set to "2020-01-01" to make sure all data is replicated to the destination QLDB ledger from the very beginning.
  3. A Kinesis Firehose delivery stream connected to the Kinesis Data Stream, which stores all QLDB journal change records in an S3 bucket.
  4. A Lambda function triggered by the Kinesis Data Stream (KDS). It extracts PartiQL statements from the events received from the KDS and executes them against the destination QLDB ledger, keeping the data contents of the source and destination ledgers in sync in near real time.
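
As an illustration of item 4, here is a minimal sketch of how the replay logic might look. It is not the project's actual Lambda code. It assumes the stream records have already been decoded (real QLDB stream records arrive in Kinesis as base64-encoded Amazon Ion), and the `StreamRecord` shape, `extractStatements`, `replay` and the `execute` callback are hypothetical names introduced for this example. Skipping SELECT statements is also an assumption, made here because reads do not change the destination ledger.

```typescript
// Simplified, hypothetical shape of an already-decoded QLDB stream record.
// Real records are Amazon Ion documents and must be decoded first.
interface StreamStatement {
  statement: string;
  startTime: string;
}

interface StreamRecord {
  recordType: "CONTROL" | "BLOCK_SUMMARY" | "REVISION_DETAILS";
  payload?: {
    transactionInfo?: { statements?: StreamStatement[] };
  };
}

// Collect PartiQL statements from block summary records, in arrival order.
function extractStatements(records: StreamRecord[]): string[] {
  const statements: string[] = [];
  for (const record of records) {
    if (record.recordType !== "BLOCK_SUMMARY") continue;
    for (const s of record.payload?.transactionInfo?.statements ?? []) {
      // Assumption: read-only statements do not need to be replayed.
      if (/^\s*select\b/i.test(s.statement)) continue;
      statements.push(s.statement);
    }
  }
  return statements;
}

// Replay statements strictly one after another so their order is preserved.
// `execute` stands in for a call to the destination ledger (hypothetical).
async function replay(
  statements: string[],
  execute: (statement: string) => Promise<void>,
): Promise<number> {
  for (const statement of statements) {
    await execute(statement);
  }
  return statements.length;
}
```

Replaying sequentially only preserves the source order if the records themselves arrive in order, which is why the Kinesis Data Stream in item 1 uses a single shard.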

Deployment Steps

At a high level, deploying the CDK app involves the following steps:

  1. First, compile the code of the Lambda function required by the CDK custom resource, which is used to create tables in the source QLDB ledger.
  2. Then compile the code of the Lambda function that extracts PartiQL statements from KDS events and executes them against the destination ledger.
  3. Then compile the CDK code.
  4. Then deploy the QldbBlogDbStack CDK stack.
  5. Lastly, deploy the QldbBlogStreamStack CDK stack.

Please note that you need to run cdk bootstrap first, as described at https://docs.aws.amazon.com/cdk/latest/guide/bootstrapping.html, if you have not done so yet.

The detailed commands of the steps are listed below:

  • $ cd lib/lambda/createQldbTables
  • $ npm install
  • $ npm run publish
  • $ cd ../replayQldbPartiQL
  • $ npm install
  • $ npm run publish
  • $ cd ../../..
  • $ npm install
  • $ npm run build
  • $ cdk deploy QldbBlogDbStack
  • $ cdk deploy QldbBlogStreamStack

Expected Results

After deploying the CDK app with the steps described above, you will find that the CDK custom resource has automatically created four tables in the source ledger QldbBlog. The destination ledger QldbBlogStreaming also has the same four tables, which are replicated by the PartiQL replay Lambda created during the CDK deployment.

Beyond this point, if you follow the steps described in the "Manual Option" section of QLDB Get Started (https://docs.aws.amazon.com/qldb/latest/developerguide/getting-started-step-2.html) to create indexes and populate data in the tables of the source ledger, you will find that the same changes are automatically replicated to the destination ledger by the PartiQL replay Lambda.

Please note that this solution has a limitation in keeping data in sync between the source and destination ledgers. When correlating rows across different tables, e.g. with foreign keys, we have to use fields visible in the QLDB user view and cannot use the document ID from the QLDB committed view, which is generated automatically by QLDB. The reason is that a document ID value appearing in a PartiQL statement for the source ledger does not point to a valid row in the destination ledger.
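
The limitation can be illustrated with a toy in-memory model. This is only a sketch and does not use the QLDB API: `ToyLedger`, its methods, the id format and the sample row are all made up for the example. The one behaviour it borrows from the text above is that each ledger generates its own document IDs on insert.

```typescript
type Row = { [field: string]: string };

// Toy stand-in for a ledger table; NOT the QLDB API.
class ToyLedger {
  private nextId = 0;
  private docs: { id: string; row: Row }[] = [];

  constructor(private readonly idPrefix: string) {}

  // Each ledger assigns its own document id when a row is inserted.
  insert(row: Row): string {
    const id = `${this.idPrefix}-${this.nextId++}`;
    this.docs.push({ id, row });
    return id;
  }

  // Look up a row by the ledger-generated document id.
  byDocumentId(id: string): Row | undefined {
    const hit = this.docs.filter((d) => d.id === id)[0];
    return hit ? hit.row : undefined;
  }

  // Look up a row by a user-visible field.
  byField(field: string, value: string): Row | undefined {
    const hit = this.docs.filter((d) => d.row[field] === value)[0];
    return hit ? hit.row : undefined;
  }
}

const source = new ToyLedger("src");
const destination = new ToyLedger("dst");

// Hypothetical sample row; the same insert is replayed on the destination.
const person: Row = { GovId: "ABC123", FirstName: "Raul" };
const sourceId = source.insert(person);
destination.insert(person);

// A reference based on the source document id finds nothing downstream,
// while a reference based on a user-visible field still resolves.
const viaDocumentId = destination.byDocumentId(sourceId);
const viaUserField = destination.byField("GovId", "ABC123");
```

In this model `viaDocumentId` is `undefined` while `viaUserField` resolves to the replayed row, which mirrors why cross-table references should rely on user-view fields.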

In addition, the journal change events received from the QLDB stream are also stored in an S3 bucket by the Kinesis Firehose delivery stream.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
