buildersoftio / andyx

Buildersoft Andy X Project

License: Apache License 2.0

C# 99.83% Dockerfile 0.17%
andy streaming distributed-systems event-sourcing event-driven data-pipeline reactive-architecture

andyx's Introduction

What is Andy X?

Andy X is an open-source distributed streaming platform designed to deliver the best performance possible for high-performance data pipelines, streaming analytics, streaming between microservices and data integrations.

Get Started

Follow the Getting Started instructions to run Andy X.

For local development and testing, you can run Andy X within a Docker container. For more info, click here.

How to Engage, Contribute, and Give Feedback

Some of the best ways to contribute are to try things out, file issues, join in design conversations, and make pull requests.

Reporting security issues and bugs

Security issues and bugs should be reported privately, via email, [email protected]. You should receive a response within 24 hours.

Related projects

These are some other repos for related projects:

Deploying Andy X with docker-compose

Andy X can be easily deployed in a Docker container using docker-compose. For more info, click here.

Code of conduct

This project has adopted the code of conduct defined by the Contributor Covenant to clarify expected behavior in our community.

For more information, see the .NET Foundation Code of Conduct.

Support

Let's do it together! You can support us by clicking on the link below!

andyx's People

Contributors

eneshoxha

andyx's Issues

Expose endpoints for Andy CLI, Dashboard and 3rd party

As a developer, I want to use REST endpoints to read Tenants, Products, Components, Topics, Consumers, Producers and Storages.
Also, as a developer, I want to be able to create new tenants and manage settings for Tenants, Components and Topics.

Consumers with same name can not consume messages on different topics

As a developer, when I consume different events from different topics and use the same name for the consumers, consumption fails as shown in the image below.

If the products, components or topics are different, consumption should be allowed, because the consumers are reading from different topics.

image

Implement Metrics

Implement Metrics across the Andy X Cluster for Tenants, Products, Components, Topics, Consumers and Producers.

Implement Message Consumption

Implement the Message Consumption logic for when Consumers connect to the node.

Create the acknowledgement_log db to store the history of message acknowledgements, and the cursor_log db to store the current online position for a given consumer.

The pointer_log should store the unacked messages, but this depends on the Subscription mode.
The modes are Durable and NonDurable.
If the mode is Durable and a message is not acked, the Andy X Node will not send the next message.
If the mode is NonDurable, unacked messages are stored and the node re-delivers them in the future; the cursor position moves ahead to currentMessage+1.

Subscription Types for the Consumer will continue to be Exclusive, Failover and Shared.

Shared Subscriptions by default will be NonDurable.

Message acknowledgements will have three statuses: Acked, UnAcked and Skipped.

If a message is Skipped, the current position moves ahead. If a message is UnAcked, the current position stays in the same place and the Node re-delivers the same message, but this depends on the Consumer Mode.

The position_log db will persist the current position of the cursor, with one record for each connection.

| position_id | cursor | mark_delete_position | read_position | is_connected | pending_read_op | entries_since_last_unacked |
| --- | --- | --- | --- | --- | --- | --- |
| 10001 | consumer_name | ledger_id:-1 | {ledger_id}:0 | true | 0 | 55 |
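
The acknowledgement rules above could be sketched roughly as follows. This is only an illustration under assumptions: the `Cursor` class, its fields and the `on_ack` method are hypothetical names, not part of Andy X, and the sketch assumes that an Acked message moves the position ahead in the same way as a Skipped one.

```python
from dataclasses import dataclass, field
from enum import Enum


class Ack(Enum):
    ACKED = "Acked"
    UNACKED = "UnAcked"
    SKIPPED = "Skipped"


class Mode(Enum):
    DURABLE = "Durable"
    NON_DURABLE = "NonDurable"


@dataclass
class Cursor:
    """Hypothetical in-memory stand-in for one position_log record."""
    consumer_name: str
    mode: Mode
    read_position: int = 0
    unacked: list = field(default_factory=list)  # entries to re-deliver later

    def on_ack(self, status: Ack) -> int:
        """Apply one acknowledgement and return the next entry to deliver."""
        if status in (Ack.ACKED, Ack.SKIPPED):
            self.read_position += 1
        elif self.mode is Mode.NON_DURABLE:
            # Remember the entry for later re-delivery, then move ahead.
            self.unacked.append(self.read_position)
            self.read_position += 1
        # Durable + UnAcked: the position stays, so the same entry is sent again.
        return self.read_position
```

With a Durable cursor, an UnAcked status leaves `read_position` unchanged; with a NonDurable cursor, the entry is recorded in `unacked` and the position still advances by one.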

Add Weight on Storage

Implement StorageCalculatorService

As a user of Andy X, I want Storages that are connected to one or more nodes as shards to have approximately the same storage size.

A service will check the size of each storage and send that size to the node. The node will keep sending messages to the storages that report the smallest size.
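
A minimal sketch of the selection step, assuming the node keeps a mapping from storage name to the last reported size. The function name `pick_storage` and the storage names are hypothetical; the issue does not define how StorageCalculatorService reports sizes.

```python
def pick_storage(sizes_in_bytes: dict) -> str:
    """Return the name of the storage that currently reports the smallest size."""
    if not sizes_in_bytes:
        raise ValueError("no storages connected")
    return min(sizes_in_bytes, key=sizes_in_bytes.get)


# Hypothetical sizes reported by three storages connected as one shard.
reported = {"storage_a": 120_000, "storage_b": 95_000, "storage_c": 110_000}

target = pick_storage(reported)        # smallest storage receives the message
reported[target] += 30_000             # its reported size grows afterwards
next_target = pick_storage(reported)   # a different storage is now the smallest
```

Choosing the smallest storage for every message keeps the sizes close to each other over time, which is the balancing goal described above.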

Implementation of Retention Policy TTL

This issue is about implementing the background service for Retention, as we would like to have retention policies based on tenant, product and component.

The retention policy with the highest priority is the one nearest to the Topic.
Level 0 Component Retention Policies;
Level 1 Product Retention Policies;
Level 2 Tenant Retention Policies;

In this issue the logic should be implemented for all 3 levels for HARD_TTL and SOFT_TTL.

SOFT_TTL: a message that has not been consumed by a subscription will not be deleted, even if its TTL has expired.
HARD_TTL: the message will be deleted even if it has not been consumed by a subscription; such messages will be skipped by the subscription.
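
The priority levels and the two TTL kinds can be illustrated with a short sketch. The names `RetentionPolicy`, `effective_policy` and `should_delete` are hypothetical, and the single `consumed` flag is a simplifying assumption standing in for the real subscription state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetentionPolicy:
    kind: str          # "SOFT_TTL" or "HARD_TTL"
    ttl_seconds: int


def effective_policy(component: Optional[RetentionPolicy],
                     product: Optional[RetentionPolicy],
                     tenant: Optional[RetentionPolicy]) -> Optional[RetentionPolicy]:
    """Pick the policy nearest to the topic: component, then product, then tenant."""
    for policy in (component, product, tenant):
        if policy is not None:
            return policy
    return None


def should_delete(policy: RetentionPolicy, age_seconds: int, consumed: bool) -> bool:
    """Decide whether the retention service may delete a message."""
    if age_seconds < policy.ttl_seconds:
        return False   # TTL has not been reached yet
    if policy.kind == "HARD_TTL":
        return True    # delete even if no subscription consumed it
    return consumed    # SOFT_TTL keeps expired messages that were never consumed
```

For example, a topic whose component has no policy but whose product defines a HARD_TTL would use the product policy and ignore the tenant one.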

Implementation of Cluster Rest Endpoints

As a user, I want to use Cluster Endpoints to read the cluster configuration from Andy X Portal and Andy X CLI.
Also, in this issue we will develop an in-memory cluster manager and temp directories for async communication.

Implementation of Andy X Replicated Async Cluster

As a User of Andy X I would like to deploy more than one instance of Andy X and connect them as a Cluster.

In Andy X there will be three different types of node link within an Andy X cluster:

  • Distributed Sync & Async Nodes,
  • Replicated Nodes
  • Production Cluster (connection between distributed and replicas)

Replicated Async Cluster

Nodes in a Replicated Async Cluster are configured following the Master/Slave model; in Andy X terminology these are Main and Worker Nodes.

v1-replicated-async-cluster-1

As the diagram above describes, a Producer is connected to Node 1, which is the MAIN NODE. Once messages are accepted and stored in that node, they are mirrored asynchronously to the other WORKER NODES. If a consumer is connected to Node 3, as in the diagram, it consumes the messages already stored in Node 3.

Producers connected in different Nodes in the same topic

v1-replicated-async-cluster-different-producers

In the case shown in the diagram above, Producer_2 produces message 4 into Node 2 (which is a worker node). The message is stored in a temp storage for the main node, because processing and storing happen on the Main Node. As soon as Node 1 (the main node) has processed the message, it is replicated to the other worker nodes.

If the Main Node goes down, one of the Worker Nodes will act as Main Node.
v1-replicated-async-cluster-master-down

The switch between Main and Worker Nodes can be done via Andy X CLI and Andy X Portal, by promoting nodes.

    andyx cluster "default_01" promote "node_2"

The cluster is configured in cluster_config.json, for example:

{
	"Name": "default_01",
	"Shards": [
		{
			"replicas":[
				{
					"NodeId": "01"
				},
				{
					"NodeId": "02"
				}
			]
		},
		{
			"replicas":[
				{
					"NodeId": "03"
				},
				{
					"NodeId": "04"
				}
			]
		},
		{
			"replicas":[
				{
					"NodeId": "05"
				},
				{
					"NodeId": "06"
				}
			]
		}
	]
}
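
As a rough illustration of how a tool might read this file, the sketch below parses the same structure and lists the node IDs per shard. The helper name `nodes_by_shard` is hypothetical, and the configuration is embedded as a string only to keep the example self-contained.

```python
import json

# Same structure as the cluster_config.json example, embedded for illustration.
CLUSTER_CONFIG = """
{
  "Name": "default_01",
  "Shards": [
    {"replicas": [{"NodeId": "01"}, {"NodeId": "02"}]},
    {"replicas": [{"NodeId": "03"}, {"NodeId": "04"}]},
    {"replicas": [{"NodeId": "05"}, {"NodeId": "06"}]}
  ]
}
"""


def nodes_by_shard(raw: str) -> list:
    """Return one list of node IDs per shard, in the order they are configured."""
    config = json.loads(raw)
    return [[replica["NodeId"] for replica in shard["replicas"]]
            for shard in config["Shards"]]


shards = nodes_by_shard(CLUSTER_CONFIG)
for index, node_ids in enumerate(shards):
    print(f"shard {index}: {', '.join(node_ids)}")
```

Note that the keys are case-sensitive here: the example uses "Name" and "Shards" but lowercase "replicas".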

Implementation of Andy X Distributed Async Cluster

As a User of Andy X I would like to deploy more than one instance of Andy X and connect them as a Cluster.

In Andy X there will be three different types of node link within an Andy X cluster:

  • Distributed Sync & Async Nodes,
  • Replicated Nodes
  • Production Cluster (connection between distributed and replicas)

Distributed Async Cluster

v1-distributed-async-cluster

As the diagram above describes, a Producer is connected to Node 1 and a Consumer is connected to Node 3. In the async distributed cluster, if the Producer produces three messages, the first message is stored in Node 1 Storage, and the second message is stored temporarily in Node 1 as a message dedicated to Node 2. The same happens with message 3. Node 1 then sends the messages asynchronously from temp storage to the node allocated to each one. As soon as a message is accepted by its node, it is deleted from that temp storage, as described in the diagram below.

v1-distributed-async-cluster-messagesync

The same rules apply when one of the nodes is down. The difference is that the messages dedicated to that node are stored in the node where the producer is connected. As soon as that node is working again, synchronization between the nodes in the cluster takes place. Consumers will not work while one of the nodes is not active.
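
The flow above, including the case where a target node is down, might be modelled as in the sketch below. All names (`LocalNode`, `produce`, `flush`, the `send` callback) are hypothetical, and the round-robin allocation of messages to nodes is an assumption; the issue only states that messages 1, 2 and 3 end up on Nodes 1, 2 and 3.

```python
from collections import defaultdict, deque


class LocalNode:
    """Hypothetical model of the node a producer is connected to."""

    def __init__(self, node_id, cluster_nodes):
        self.node_id = node_id
        self.cluster_nodes = cluster_nodes   # e.g. ["node_1", "node_2", "node_3"]
        self.storage = []                    # messages owned by this node
        self.temp = defaultdict(deque)       # target node -> messages waiting to be sent
        self._next = 0

    def produce(self, message):
        # Round-robin allocation is an assumption made for this sketch.
        target = self.cluster_nodes[self._next % len(self.cluster_nodes)]
        self._next += 1
        if target == self.node_id:
            self.storage.append(message)
        else:
            self.temp[target].append(message)

    def flush(self, send):
        """Try to deliver temp messages; send(target, message) returns True when accepted."""
        for target, queue in self.temp.items():
            while queue and send(target, queue[0]):
                queue.popleft()  # delete from temp storage only after acceptance
```

A message for a node that is down simply stays in `temp` until a later `flush` call succeeds, which mirrors the behaviour described for inactive nodes.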

Implement Initial Cluster Infrastructure

This issue is being used to implement the core abstraction of clustering logic for Andy X Cluster.

Implementing

  • NodeConfiguration for NodeId, to help store the messages into topics with id {node_id}:{entryId}
  • Rename TenantMemoryRepository to TenantMemoryService
  • Create Topic Temp Directories for Clustering (create temporary rocksdb)

Implementation of Andy X Distributed Sync Cluster

As a User of Andy X I would like to deploy more than one instance of Andy X and connect them as a Cluster.

In Andy X there will be three different types of node link within an Andy X cluster:

  • Distributed Sync & Async Nodes,
  • Replicated Nodes
  • Production Cluster (connection between distributed and replicas)

Distributed Sync Cluster

v1-distributed-sync-cluster

As the diagram above describes, if a producer is connected to Node 1, and Node 1 is connected to Node 2 and Node 3 in a Distributed Cluster, the data is synchronized as follows. If three messages are accepted by Node 1, Node 1 stores the first message and delegates it to the other nodes as already stored. When the second message arrives at Node 1, Node 1 delegates it to Node 2 to store, and also sends it to Node 3 as already stored.

Nodes will use an in-memory PriorityQueue to store messages from different nodes, as described in the diagram below. Messages are stored indexed by the node in the cluster and by the entry of the message in that node: {node_id}:{entry_id}.
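
One way to picture such a queue is the sketch below, which uses Python's `heapq` as a stand-in for the PriorityQueue. The class and function names are hypothetical, and the ordering (lower entry_id first, node_id as tie-breaker) is an assumption; the issue does not specify how priorities are computed.

```python
import heapq


def parse_message_id(message_id: str):
    """Split an id of the form {node_id}:{entry_id} into its two parts."""
    node_id, entry_id = message_id.split(":")
    return node_id, int(entry_id)


class NodeMessageQueue:
    """Hypothetical in-memory queue of messages received from several nodes."""

    def __init__(self):
        self._heap = []

    def push(self, message_id: str, payload: bytes):
        node_id, entry_id = parse_message_id(message_id)
        # Assumed priority: lower entry_id first, node_id breaks ties.
        heapq.heappush(self._heap, (entry_id, node_id, payload))

    def pop(self):
        entry_id, node_id, payload = heapq.heappop(self._heap)
        return f"{node_id}:{entry_id}", payload

    def __len__(self):
        return len(self._heap)
```

Because every {node_id}:{entry_id} pair is expected to be unique, the payload itself is never used to break ties between two queue entries.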

In a distributed cluster, topics are also known as distributed topics.

v1-distributed-sync-cluster-on-connect

In a Distributed Cluster, if one of the nodes is down, the production of messages continues working properly, but the consumption of messages will stop.

v1-distributed-sync-cluster-not-working

Implement Storage Synchronizer

Storage Synchronizer is a standalone process that will consume .bin files written by the Producer Node and store them into topics.

The storage will be done into Ledgers, each ledger will have around 5000 messages stored.

The payload of the message will be stored as binary.

Discussion: Should a new Ledger be created every hour, or should it be created based on the number of messages written into the Ledger?

  • For each Ledger, Storage Synchronizer will store the status in ledger_logs and will take snapshots.
  • Should ledger_logs be stored as a .json configuration file, or should we use SQLite to store the configuration?
    • The recommendation is to use a SQLite db.

ledger_logs should provide a ledgers table for each topic:

| ledger_id | ledger_location | entries | createdDate | status | size |
| --- | --- | --- | --- | --- | --- |
| 10001 | root/data/topic/msg_10001_date.andx | 5000 | 2022-05-23 | Closed | 100MB |
| 10002 | root/data/topic/msg_10002_date.andx | 6500 | 2022-05-23 | Closed | 100MB |
| 10003 | root/data/topic/msg_10003_date.andx | 75 | 2022-05-23 | Opened | 100MB |
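
If the SQLite recommendation is followed, the ledgers table could look roughly like the sketch below. The column types, the use of an in-memory database and the query at the end are assumptions for illustration; only the column names and sample rows come from the table above.

```python
import sqlite3

# In-memory database for illustration; a real ledger_logs db would live on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ledgers (
        ledger_id       INTEGER PRIMARY KEY,
        ledger_location TEXT    NOT NULL,
        entries         INTEGER NOT NULL,
        createdDate     TEXT    NOT NULL,
        status          TEXT    NOT NULL,
        size            TEXT    NOT NULL
    )
    """
)

rows = [
    (10001, "root/data/topic/msg_10001_date.andx", 5000, "2022-05-23", "Closed", "100MB"),
    (10002, "root/data/topic/msg_10002_date.andx", 6500, "2022-05-23", "Closed", "100MB"),
    (10003, "root/data/topic/msg_10003_date.andx", 75, "2022-05-23", "Opened", "100MB"),
]
conn.executemany("INSERT INTO ledgers VALUES (?, ?, ?, ?, ?, ?)", rows)
conn.commit()

# Find the ledger that Storage Synchronizer would currently be writing to.
open_ledgers = conn.execute(
    "SELECT ledger_id FROM ledgers WHERE status = 'Opened'"
).fetchall()
```

Compared with a .json file, a table like this lets the synchronizer update a single ledger's status or entry count without rewriting the whole log.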
