Giter Site home page Giter Site logo

hazelcast-jet-kinesis's People

Contributors

ncherkas avatar

Stargazers

 avatar  avatar

Watchers

 avatar

Forkers

googlielmo

hazelcast-jet-kinesis's Issues

Calling sleep() when there is no data

Even though this is done by the Jet internally, we still may need to sleep more when there is no data - in order to avoid the Kinesis rate limiting. Also, keep returning false until the Kinesis Stream is not closed.

Implement Fault Tolerance

Kinesis provides a record sequence number to make the checkpoints while you're processing the stream. The checkpoints itself are the record sequence numbers which you periodically save into separate storage, so that you can go back to them in case of any kind stream processing failure. The Jet supports this scenario out-of-box by providing a corresponding Processor API to save/restore any processing state like checkpoints etc.

We need to implement this for the Amazon Kinesis Jet Source. It's suggested to learn how it's done in the Kafka Connector.

Embed AWS SDK dependency

It would be good to embed AWS SDK and maybe other dependencies so that it's safer and easier to use our connector in big projects with possible version conflicts.

Retry errors inside the Kinesis Source and Sink

When implementing the Jet Connector we should keep in mind that all the retriable faults and errors should be handled inside the Source or Sink. In our case, there are some various cases when Kinesis API may return error which should be properly handled e.g. it can be just InternalFailure or something more specific like ThrottlingException. Here, we should analyze possible cases and handle properly. We can consider using https://github.com/jhalterman/failsafe

Choose a proper naming for ISet

We use the Hazelcast ISet to sync about which shards are already processed and closed. It's very important since two parent shards can be taken by the parallel processors and we can proceed with a parent shard later only after both of them are handled. This is why we coordinate using ISet. The current ISet naming doesn't seem correct because it's bound only to the stream name though it's possible that multiple jobs process a single stream in parallel and need its own context for the coordination. It's suggested to consider using a vertex name of the Source/Sink (should be unique) maybe adding Job id to it (in newer versions it can ve different after restart). Let's double check that.

Also, we need to check how this works if we store ISet data into a snapshot for the Fault-Tolerance. Maybe this will also solve the naming problem. Suggested to work on this after we implement the Fault-Tolerance.

Setup initial CI pipeline

It should automatically trigger on PR creation or merge. We should look around for some solution integrated with GH.

Alert on falling behind in the stream processing

When processing the Kinesis Stream, we may fall behind the retention period and at the end lose some data. The Kinesis API provides a response attribute MillisBehindLatest to track this, so that you can alert via error log message about possible falling behind. Also, there is a CloudWatch metric GetRecords.IteratorAgeMilliseconds related to that but I'm not sure we can integrate it somehow.

Let's print error or warn message when MillisBehindLatest > stream retention period.

Research edge cases

We need to investigate and research the following edge cases:

  • upscale and downscale
  • fault tolerance and error handling
  • shards assignment scenarios

Downscale handling

So far, the downscale (when we merge shards) is not properly handled, or at least we're not sure if it's handled somehow under the hood of Jet. The main problem here is with the WatermarkSourceUtil, which is used to manage watermarks. Its API doesn't allow to decrease a partition count so we're not able to explicitly propagate this down to the Jet internals. It was suggested that the Jet itself internally decreases a partition count when there is no data for the partition during more than 60 sec. Here, by no data we mean there was no events reported via the WatermarkSourceUtil API for the given duration. We need to do the following:

  • verify and test the behavior of Jet and WatermarkSourceUtil when there is no data for the partition
  • check for other possible issues when Kinesis downscales
  • communicate with Jet Team about hazelcast/hazelcast-jet#763 (update: seems already fixed)

Migrate from 0.7

The current draft implementation is based on the Jet ver. 0.7 We need to migrate to a recent one.

Messages order per processor instance

There was some issue with the order of the messages read from multiple shards in the same processor. It was the internal Jet problem, let's clarify that.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.