amient / affinity Goto Github PK

View Code? Open in Web Editor NEW

26.0 26.0 4.0 5.08 MB

Library and a Framework for building fast, scalable, fault-tolerant Data APIs based on Akka, Avro, ZooKeeper and Kafka

License: Apache License 2.0

Scala 68.15% Java 29.46% HTML 0.19% JavaScript 0.75% Shell 1.44% Dockerfile 0.03%

akka-http avro distributed-systems high-availability kafka restapi scalability stream-processing

affinity's People

Contributors

Stargazers

Watchers

Forkers

containerz hadoop835 marwahaha alexrogalskiy

affinity's Issues

TLS Keystore uses resource loader even when keystore.file is specified

TransactionSpec

Have noticed inconsistent state when encountering failures while experimenting with websockets. The example graph is using transactions so this shouldn't happen so the transactions needs extensive unit-testing.

Intermittent failing test in the example app

Looks like the test runs before the partitions are registered. Sometimes 'partition 0 is not represented in the cluster' is seen but the test passes, other times this message repeats and fails which means it is the ack retrying some message.

The TestRegionNode should block until the partitions are online.

Stream iterator for MemStore

instead of Iterator[BB, BB] we need Stream[BB,BB] that works with the asynchronous get

Upgrade to latest akka 2.4.x and akka http 10.0.x

enforce the Reply[T] on the ack reply

When kafka is not reachable the bootstrap prevents shutdown

Should throw exceptions

support for websocket push scenario

Test method for schema compatibility

either compatibility check between schemas or empirical test using generated data and trying to read it with the current compile time schemas

Test concurrent state manipulation

Build a more complex test transaction that say creates a state record and updates it after it is created and execute 100x in parallel....

HTTP Request Matcher support for route-specific timeout

Currently either the gobal akka.http.server.request-timeout has to be used or the Timeout-Access header has to be manipulated in the handler - this can be provided by the HTTP unapply olverload

http.request.header[headers.`Timeout-Access`] foreach { t ⇒
    t.timeoutAccess.updateTimeout(30 minutes)
}

api module for java extensions

refactor low-level interfaces like MemStore and Storage to java api and separate in a separate module

Master-Standby swap has to be robust

There is a possibility under current implementation that when partition becomes a Master there are still messages in the change log from the previous Master that have not been consumed. This is because under the current prototype, the master naively stops tailing the topic on receiving BecomeMaster message but this needs to be analysed statically and proven that it can't happen or put in place mechanism that guarantees all messages by previous Master have been Consumed.

non-replicated symmetrically divided cluster doesn't resume handling after all partitions are online

8 partitions in total
[0, 2, 4, 6] assigned to the first instance
[1, 3, 5, 6] assigned to the second instance

also 2 other services with a single partition [0]

after both instances are initialized (and zookeeper coordinator holds all partitions) handling doesn't resume from the suspended state

Controller doesn't receive gateway termination message on shutdown

This causes some ugly exceptions on shutdown. Need to implement some lightweight terminator pattern to guarantee services are shutdown before containers and containers before controller.

Gradle artefacts ids lose scala classifier when gradle install used for local maven

Use maven-publish plugin instead with support for gradle publishToMavenLocal

Clean-up system test shutdown exceptions

FIXED - .pattern.AskTimeoutException: Recipient[Actor[akka://test/user/subscriber1#-468677115]] had already been terminated. Sender[null] sent the message of type "io.amient.affinity.core.cluster.Coordinator$MasterStatusUpdate". -> io.amient.affinity.core.util.AckableActorRef$.io$amient$affinity$core$util$AckableActorRef$$attempt$1(AckSupport.scala:78)
FIXED - java.lang.IllegalStateException: cannot enqueue after timer shutdown -> io.amient.affinity.core.cluster.Coordinator$$anonfun$updateGroup$1.apply(Coordinator.scala:143)

Review Exit Codes

The exit codes also need thinking about .. at the moment there are some situations which should call System.exit because they may be launched with other components that do not need to be taken down if one component fails.

new module `testutil`

The SystemTestBase currently under test sources of the core module should be a moved into main sources of new module testing. This module can than be the basis for testing core system tests as well as application data api tests.

Document Design Goals and Decisions

Goal 1. horizontal scalability
Goal 2. high-availability

Graph Model Template

Look at the example graph model and try to imagine how it would work if the model was provided by the library and the example app only configures an instance of it.

Basically the files instrumental in defining the graph model behaviour are:

example.conf - defines the state store name and implementation classes as well nominates the avro serde registry
ExampleDomain.scala - Defines all the graph message types
Graph.scala - Defines the Data API and the orchestrated logic
GraphPartition.scala - Defines the distributed data processing logic

Multiplex WebSocket Support

Advanced web socket client can subscribe to any number of key-value streams with a single connection. This will complicate the protocol but we need affinity-ws-client module anyway which may contain java client too.

After having a functioning prototype, try if the same could apply to plain-text json websocket.

CF SchemaRegistry kafka support

under affinity-kafka module, create serializer and deserializer for AvroSerde, completely reusing confluent schema registry deserializer and wrapping around their serailzer

Configurable partitioner

Via blackbox partitioner provider

RocksDB implementation of the MemStore

First refactor storage provider via configuration so that for example NoopStorage can be used in tests and KafkaStorage for runtime, then add RocksDBStorage as a separate module affinity-rocksdb

WebSocket closed because of deadletters encountered on ActorPublisher

"Haven't heard from ZooKeeper" causes removal of regions

This can be only development setup issue but when a mac goes to sleep coordinator removes all regions and services. Still has to be investigated why this happens and handle it correctly to recover the state after connection is re-established.

Create SystemTestBase

start zookeeper
start kafka
start node with region and gateway
provide method for rest calls

Use TypeSafe Config instead of java.util.Properties

This is available from the system context so easier to manage

Consider first-class support for Delta Stream Capture

Currently only the full state is recorded in the memstores and optionally in the storage.
There is already a good API that supports streaming incremental changes to websockets.
This API can be used by the state itself to record all changes in a custom delta capture implementation, e.g. plain kafka topic. This API of the DeltaStreamCapture needs to enable both simple and complex implementations. Simple is simply writing all messages to a file or kafka topic, complex could mean partially defined function which sends each different type of message into a different topic or which transforms all incremental messages to a new type.

Move kafka-specific system test into kafka module

global system tests shouldn't depend a specific kafka version, rather kafka module should extend SystemTestBase and implement usecases..

However, SystemTestBaseWithKafka should be available for application system testing.

AvroRecord support for specific implementations of Map and Set

Analytics Support on top of the compacted kafka topics

bring over the CompactedKafkaRDD

Kafka Serializer and Deserializer for ZkAvroSchemaProvider

WebSocket Java Client and WSSystemTestSupport

AvroSchemaProvider

At the moment the internal Serde interface and its implementation AvroSerde manage the schemas in prototype fashion.

We need a provider interface
There will be the default implementation which is embedded
There will be second built-in implementation that connects to Confluent Schema Registry
The default implementation must support declarative schema evolution

TLS SystemTest shutdown error

test name: TlsGatewaySpec / HTTPS Requests / should be handled correctly as TLS

a) [ERROR] [12/20/2016 15:34:09.788] [SystemTests-akka.actor.default-dispatcher-14] [akka.actor.ActorSystemImpl(SystemTests)] Outgoing request stream error (akka.stream.AbruptTerminationException)

b) [ERROR] [12/10/2016 18:27:02.953] [SystemTests-akka.actor.default-dispatcher-9] [akka.actor.ActorSystemImpl(SystemTests)] Outgoing response stream error (akka.stream.AbruptTerminationException)
[ERROR] [12/10/2016 18:27:02.955] [SystemTests-akka.actor.default-dispatcher-2] [akka.actor.ActorSystemImpl(SystemTests)] Outgoing request stream error (akka.stream.AbruptTerminationException)

c) cannot enqueue after termination in coordinator - DONE

Add LICENSE, NOTICE and source file headers

Minimal production version

Presuming running on a single node, zookeeper coordinator is not required so TestCoordinator can be converted to EmbeddedCoordinator since it uses real synchronization and instead of KafkaStorage a simple FileStorage can be used with append only at runtime and custom compaction on open.

Support TLS WebSocket

TLS transport has been already added and it works fine for https:// requests but currently wss:// requests return 400 which looks like we just need to add a route

Built-in response gzip

Serving traffic out from the cloud providers is typically the most expensive bandwidth component.

Dynamic partition assignment

Before anything, Kafka Consumer already contains coordinator that satisfies all the requirement on the coordination of most-of-the-time-stable group members including resonance with previous known state of the whole consumer groups - it would be interesting to explore tapping into the ConsumerRebalanceListener interface of Kafka and simply follow the rebalance result in the akka layer. The consumer is designed to first wait for last known group members, probably with some timeout. It would be limited to kafka as the only, implicit storage, but there is no need for other storage, so that's not a problem.

https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Detailed+Consumer+Coordinator+Design#KafkaDetailedConsumerCoordinatorDesign-Co-ordinator

Another thing that still needs to be explored that kafka streams queryable state api connected with ksql and the applications running in between as affinity actor system groups with direct, co-located access to both statestores via ksql as well as programmatically.

At the moment partitions are assigned statically by configuration which is far from ideal. What would help is rebalancing/re-assigning partitions whenever a new member is added/removed to the service group but with a certain degree of stickiness or resonance to the existing assignment.

In the ideal scenario, the operation should also involve potentially re-partitioning the underlying topic which in the light of zero-downtime guarantee would mean essentially first migrating the existing changelog into a topic with target number of partitions and then re-balancing the state partitions against this new state topic.

This is not trivial for several reasons:

the mem store persitence requires stickiness in the assignment which in turn requires coordinator to create persistent structure about the cluster that is watched and assignment and rebalance actions are taken if new nodes join > also CoordinatorEmbedded is not compatible with this idea and will have to be refactored into CoordinatorLocal which will use memory for ephemeral and disk for persistent structures
any service partition may contain multiple state stores each, all of which at the moment must have identical number of partitions
it may be desirable to have a larger pool for Masters and a smaller Standby so that machines are not wasted, but in such case we need some sort of Master-Preference flag
it would be nice for the assignment algorithm to have certain resonance with the previous shape, i.e. not a fixed stickiness just a certain tendency to keep the partitions where they are (if we go with master preference then the preferred masters would resonate with each other)
make provisions for automatic repartitioning
there is a certain minimum number of nodes expected to achieve desired parallelism and redundancy so there must be a hint in the configuration about this. E.g. when the first member of a service group it shouldn't load all partitions if there are more members expected to join.
the minimum number of nodes expected can be viewed also as a maximum partitions per container combined with a master:standby ratio, .e.g say a service-keyspace has 4 partitions and we want to achive 100% redundancy. Say the throughput of the application requires at least 2 containers. Launching the first service group member, will cause 2 partitions to load, expecting another container to join. When the second container joins it gets the remaining 2 master partitions assigned. At this point the cluster is available but with 0% redundancy so each of the containers is now assigned the other's pair of partitions to load as standbys. At this point the minimum requirements have been met and the cluster is in the balanced state. However nothing stops us adding another member to increase the throughput of the application. Since cluster is in the balanced state, the new container will have to become a master of at least one partition and standby for at least one other partition both of which are actively served by one the first 2 containers. Once the node finishes bootstrap of the master partition it takes over and the pervious master is shut down.

Once there is an identical config that can be launched multiple times, essentially we can have docker images/aws images/... that can be deployed ad-hoc automatically scaling the service groups.

Kafka Avro Formatter for console consumer debugging

The first POST request to the akka http connection flow has zero length

even if a non-zero length entity was POSTed.
any consequent requests behave correctly
also if any GET request is made first then it's ok

This looks like a quirk of akka-http but it may be that the connection async handler needs to be initialized programatically.

service with single-partition doesn't form master-slave under replication

service with 1 partition

deployed on 2 instances with [0] and [0] should result with one of them being selected as master and the other as replica. instead when doing load-balanced requests after one write, one request returns what was written and the next nothing.

TLS/SSL HttpInterface

http://doc.akka.io/docs/akka/2.4.10/scala/http/server-side-https-support.html

Controller children restart on failure issue

when region shuts down due to partition exception CreateRegion is never called again and the promise will be 'already completed'
when gateway gets restarted CreateGateway is never called again and the promise will be 'already completed'

Coordinator doesn't register singleton service sometimes

Can be reproduced by running the ExampleApp. The regions are propagated correctly but the singleton service is sometimes missing on the 4th node.

WebSocket downstream scenario and AvroJS

State update push direction is solved. Now the update messages could be done also through the websocket. However the update logic may be more application-specific than simply listening on state change which is well handled by REST endpoints so this needs thinking whether it is a good idea.

The scenario that addresses this dilemma is already in the example connected graph where the POST to a connect endpoint triggers several operations that result in updates of multiple vertices. The same logic could be abstracted into a method which can be called by both REST and WebSocket.

Unify KeySpace and Service notion

Each KeySpace should be a separate concept, not everything deployed in co-located regions.
Service and KeySpace are equals, Service can also be partitioned and have standbys