Giter Site home page Giter Site logo

ankithooda / aws-kinesis-beanstalk-workers Goto Github PK

View Code? Open in Web Editor NEW

This project forked from awslabs/aws-kinesis-beanstalk-workers

0.0 2.0 0.0 119 KB

Elastic Beanstalk Managed Consumer for Amazon Kinesis

License: Apache License 2.0

HTML 0.93% CSS 9.51% Java 89.56%

aws-kinesis-beanstalk-workers's Introduction

Amazon Kinesis Elastic Beanstalk Managed Consumer

Amazon Kinesis provides a scalable and highly available platform for ingesting data from thousands of clients. Once data is available on a Kinesis stream, you can build applications to process the data using the Amazon Kinesis Client Library (KCL). KCL provides a framework for managing many of the complexities that accompany designing stream-processing applications. For example, the KCL will automatically distribute workers to process each shard in a Kinesis stream. It will manage this in a single JVM or across a fleet of instances. Using the KCL, you can build elastic, fault-tolerant, scalable stream-processing applications. Once you’ve built such an application, you’ll want a simple way to deploy it.

AWS Elastic Beanstalk (Tomcat) is an easy-to-use service for deploying and scaling java based web applications and services. Simply upload your application archive, and AWS Elastic Beanstalk automatically deploys it across multiple Availability Zones with configuration of AutoScaling. It also provides load balancing and monitors application health. These features make AWS Elastic Beanstalk a great platform for running Amazon Kinesis Applications.

This module provides a java codebase which will significantly decrease the complexity of building an application based on the KCL. With KCL, you simply implement an interface, and then start a Worker. KCL ensures that your code is run on a dedicated Thread that is consuming data from a single Shard. Furthermore, by managing leases in Amazon DynamoDB, it allows you to run a large number of Workers on one or more machines, while ensuring that only 1 Thread in the fleet is processing Shard data at a given time.

Building KCL Applications typically requires that you build an IRecordProcessor instance, and an IRecordProcessorFactory instance which creates new record processors. You then have to build a consumer module that provides credentials so it can connect to Kinesis, configures the Worker, and then starts processing.

With the Amazon Kinesis Elastic Beanstalk Managed Consumer, you only have 1 job to do as a programmer. You must effectively implement the IRecordProcessor by extending a ManagedClientProcessor class. This class has all the smarts about how to commit changes to the Application's progress, how to startup and shutdown, and so on. This allows you to focus on wiring in the business logic for your application, and not worry how it gets started up and maintained over time. Furthermore, rather than you having to figure out how to run the Worker instance on multiple machines, this module provides integration with Elastic Beanstalk. By including this module as a dependency, and the building with Maven, this module will generate a deployable WAR for Apache Tomcat that can be dropped into Elastic Beanstalk. AWS will then handle starting your Kinesis Workers and ensuring that they are automatically scaled based on processing demand.

This project also gives you the ability to easily work with Kinesis records that are encrypted with the AWS Key Management Services (KMS). By providing a KMS Key ARN and optionally an Encryption Context, the ManagedConsumer will automatically decrypt data before it is supplied to your record processor.

Getting Started

To get started with this project, just create a fork from Github. You'll see a com.amazonaws.services.kinesis package which contains all the managed code, which you are welcome to look at but are not required to change. You'll also see a default package which contains the MyRecordProcessor class. This can be renamed and moved as you like. If nothing else, you need to add your code to this bit of the processRecords() method:

Image

There's also a really important detail - when the KCL runs your application, in runs a separate ManagedClientProcessor instance in a Thread per Shard of the Stream. Because of this, we need a simple way to create new instances of MyRecordProcessor, and we've decided to use a copy() constructor. This is a method within MyRecordProcessor that knows how to create new instances of itself. The provided implementation just calls the MyClientProcessor constructor:

@Override
public ManagedClientProcessor copy() throws Exception {
	return new MyRecordProcessor();
}

However, if your MyRecordProcessor class has local state variables which must be configured on instance initialisation, then it is recommended that you provide these as Constructor arguments. For (hopefully) obvious reasons, using static variables is probably a really bad idea here, but if you know what you are doing then go ahead.

Running the application

Once you have wired in your business logic and want to start testing, simply install Apache Maven, the tool we've decided to use to build this module. Once done, you can run the command mvn war:war, and Maven will build a distribution artefact that you can deploy to Apache Tomcat. This project will automatically have exposed a listener-class in the provided web.xml which uses a ServletInitiator thread to start your app. Elastic Beanstalk will do all the clever stuff to keep your app running, and to restart hosts should they crash, etc.

aws-kinesis-beanstalk-workers's People

Contributors

ianmeyers avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.