
DistributedTaskScheduler

Prerequisites:

server.port=30000

Pub/sub (Kafka) properties

pubsub.namespace=localhost:9092
pubsub.topic=distributed_task
pubsub.task-shard.topic=distributed_task_shard
pubsub.result.topic=distributed_task_completed
pubsub.consumergroup=task-subscribers
pubsub.consumercount=2

Redis caching properties

spring.cache.type=redis
spring.redis.host=localhost
spring.redis.port=6379
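
These map onto standard Spring Boot / Spring Kafka / Spring Data Redis properties. Purely as a sketch, the pubsub.* values could be bound into a configuration bean like the one below; the class and field names are assumptions, since the README only lists the property keys.

```java
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

// Hypothetical holder for the pubsub.* properties listed above. The class and field
// names are assumptions; only the property keys and example values come from this README.
@Component
@ConfigurationProperties(prefix = "pubsub")
public class PubSubProperties {

    private String namespace;     // Kafka bootstrap servers, e.g. localhost:9092
    private String topic;         // topic for submitted MasterTasks
    private String consumergroup; // consumer group used by scheduler subscribers
    private int consumercount;    // number of concurrent Kafka consumers

    public String getNamespace() { return namespace; }
    public void setNamespace(String namespace) { this.namespace = namespace; }
    public String getTopic() { return topic; }
    public void setTopic(String topic) { this.topic = topic; }
    public String getConsumergroup() { return consumergroup; }
    public void setConsumergroup(String consumergroup) { this.consumergroup = consumergroup; }
    public int getConsumercount() { return consumercount; }
    public void setConsumercount(int consumercount) { this.consumercount = consumercount; }
}
```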

Assumptions:

  1. If multiple schedulers work concurrently, they will have to run behind a publicly exposed load balancer that routes each client request to one of the working server instances.

  2. The implementation's supported scope is limited to Bash scripts. This can easily be enhanced to support and run other languages as well, given more time and effort.

Workflow

a) One or more schedulers work behind an NginX load balancer. [Not part of this assignment, but an assumption.]

a1) The scheduler can also run on multiple nodes, since it is stateless.

a2) When configured to run on only one node, there is no need for NginX or any other load balancer.

b) The client sends a serialized MasterTask object over a REST (POST) call.

b1) The MasterTask class is available in the code (a hedged sketch of its shape follows this list).

b2) It is simple in structure, which reduces client-side complexity when submitting tasks to the scheduler.

b3) The scheduler has been made to attempt at most 10 times to submit the tasks.
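
The actual class lives in the repository; purely as an illustration, a MasterTask payload along the lines described here might look like the sketch below. templateFunction, variableJson, the input list, and the chunks-executed counter are referenced elsewhere in this README; the remaining field names are assumptions.

```java
import java.util.List;
import java.util.Map;

// Hypothetical shape of the MasterTask submitted by the client. The real class in the
// repository may differ; see the comments for which fields are assumptions.
public class MasterTask {
    public String taskId;                     // assumed: assigned by the scheduler on submission
    public String templateFunction;           // bash script template containing placeholder variables
    public Map<String, String> variableJson;  // values substituted into the template placeholders
    public List<String> inputList;            // work items that will be sharded across worker nodes
    public String status;                     // assumed values, e.g. SUBMITTED / RUNNING / FINISHED
    public int totalChunks;                   // assumed: number of shards published to Kafka
    public int chunksExecuted;                // incremented by workers as shards finish
}
```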

c) The client polls the execution status of the submitted task. A status API is already available in the scheduler implementation.
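
The README does not show the controller itself, so the endpoints below are only a sketch of how submission and status polling could be exposed in Spring; the paths and the TaskService abstraction are assumptions, not the repository's actual API.

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical REST surface for (b) task submission and (c) status polling.
// Only the MasterTask-over-POST and status-polling behaviour comes from this README.
@RestController
@RequestMapping("/tasks")
public class TaskController {

    private final TaskService taskService; // assumed facade over Redis + Kafka

    public TaskController(TaskService taskService) {
        this.taskService = taskService;
    }

    // b) client submits a serialized MasterTask over a POST call
    @PostMapping
    public String submit(@RequestBody MasterTask task) {
        return taskService.submit(task); // returns a task id the client can poll with
    }

    // c) client polls the execution status of the submitted task
    @GetMapping("/{taskId}/status")
    public String status(@PathVariable String taskId) {
        return taskService.status(taskId); // e.g. SUBMITTED / RUNNING / FINISHED
    }
}

// Assumed service interface; the real implementation wires in Redis and Kafka.
interface TaskService {
    String submit(MasterTask task);
    String status(String taskId);
}
```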

d) Once the client submits the task:

d1) The scheduler saves it to Redis with the appropriate status.

d2) It then publishes the task to Kafka to initiate execution (a sketch of d1/d2 follows this list).

d3) Any one of the running scheduler subscribers receives it.

d4) That subscriber prepares the executable function after replacing the variables in the templateFunction.

d5) It divides the task's input load depending on the available worker nodes.

d6) It publishes each individual task shard to Kafka for the worker consumers to execute.
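
A minimal sketch of d1/d2, assuming the scheduler uses Spring's RedisTemplate and KafkaTemplate and the pubsub.topic value from the prerequisites; the key layout, status values, and wiring are assumptions, and the repository's actual service may differ.

```java
import java.util.UUID;

import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

// Hypothetical implementation of d1/d2: persist the submitted task in Redis with a
// status, then publish it to Kafka so one of the scheduler subscribers picks it up.
@Service
public class TaskSubmissionService {

    private final RedisTemplate<String, MasterTask> redisTemplate;
    private final KafkaTemplate<String, MasterTask> kafkaTemplate;

    public TaskSubmissionService(RedisTemplate<String, MasterTask> redisTemplate,
                                 KafkaTemplate<String, MasterTask> kafkaTemplate) {
        this.redisTemplate = redisTemplate;
        this.kafkaTemplate = kafkaTemplate;
    }

    public String submit(MasterTask task) {
        task.taskId = UUID.randomUUID().toString();
        task.status = "SUBMITTED";
        // d1) save to Redis with the appropriate status
        redisTemplate.opsForValue().set("task:" + task.taskId, task);
        // d2) publish to Kafka to initiate execution (topic taken from pubsub.topic)
        kafkaTemplate.send("distributed_task", task.taskId, task);
        return task.taskId;
    }
}
```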

f) The scheduler consumer instance has heartbeats available from all worker nodes that are up and running.

f1) The scheduler consumer instance replaces all the variables in the template function, using the values provided in the variableJson field of the MasterTask.

f2) Based on the number of available worker nodes, the input list to be processed by the command is divided and published to Kafka again in chunked format (a sketch of f1/f2 follows this list).
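
A sketch of f1/f2 under stated assumptions: template placeholders look like ${name}, and the input list is split evenly across the live workers before each shard is published to pubsub.task-shard.topic. The real substitution syntax and shard payload are not described in this README and may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sharding helper for f1/f2. The ${var} placeholder syntax and the even
// split across workers are assumptions made for illustration.
public class TaskSharder {

    // f1) replace each ${key} in the template with its value from variableJson
    public static String resolveTemplate(String templateFunction, Map<String, String> variableJson) {
        String resolved = templateFunction;
        for (Map.Entry<String, String> entry : variableJson.entrySet()) {
            resolved = resolved.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        return resolved;
    }

    // f2) divide the input list into roughly one chunk per live worker node
    public static List<List<String>> shard(List<String> inputList, int workerCount) {
        List<List<String>> shards = new ArrayList<>();
        int chunkSize = (int) Math.ceil((double) inputList.size() / workerCount);
        for (int i = 0; i < inputList.size(); i += chunkSize) {
            shards.add(inputList.subList(i, Math.min(i + chunkSize, inputList.size())));
        }
        return shards; // each shard is then published to pubsub.task-shard.topic
    }
}
```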

g) A worker node consumes the shard of the task it has to execute (a sketch of g1–g4 follows this list).

g1) It divides the shard even further, based on the number of parallel cores available at its disposal,

g2) so as to get the maximum possible concurrency in executing the shard.

g3) Each thread counts down the latch supplied to it.

g4) Once the latch reaches 0 on the worker consumer, that worker consumer increments the chunks-executed counter of the MasterTask and updates Redis.

g5) The client automatically sees the task status as finished once all worker consumers have executed their shards.

g6) Each worker consumer initiates a Redis transaction to update its part of the work done (see the transaction sketch under item k).
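
A sketch of the worker side (g1–g4), assuming each shard item is a prepared Bash command run via ProcessBuilder, a thread pool sized to the available cores, and a CountDownLatch that each thread counts down; the actual worker implementation may be structured differently.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical worker-side execution of one shard (g1-g4). Running each item through
// "bash -c" and the thread-pool/latch structure are assumptions for illustration.
public class ShardExecutor {

    public void executeShard(List<String> shardCommands) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();   // g1) split by available cores
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        CountDownLatch latch = new CountDownLatch(shardCommands.size());

        for (String command : shardCommands) {
            pool.submit(() -> {
                try {
                    // g) run the prepared bash command for this input item
                    new ProcessBuilder("bash", "-c", command).inheritIO().start().waitFor();
                } catch (Exception e) {
                    e.printStackTrace(); // a real worker would record the failure
                } finally {
                    latch.countDown();   // g3) each thread counts down the latch supplied to it
                }
            });
        }

        latch.await();  // g4) once the latch reaches 0, report completion
        pool.shutdown();
        // ...then increment the MasterTask's chunksExecuted counter in Redis inside a
        // transaction (see the sketch under item k below).
    }
}
```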

k) All Redis updates are supposed to happen in atomic transactions, to avoid stale reads.
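
One way such an atomic update could be written with Spring Data Redis is a MULTI/EXEC block via SessionCallback, as sketched below; the key layout and status values are assumptions, and the repository's actual transaction code may differ.

```java
import java.util.List;

import org.springframework.data.redis.core.RedisOperations;
import org.springframework.data.redis.core.SessionCallback;
import org.springframework.data.redis.core.StringRedisTemplate;

// Hypothetical atomic progress update for item k. Key names and status values are
// assumptions for illustration.
public class TaskProgressUpdater {

    private final StringRedisTemplate redis;

    public TaskProgressUpdater(StringRedisTemplate redis) {
        this.redis = redis;
    }

    public void markChunkDone(String taskId, long totalChunks) {
        // Run the counter update inside MULTI/EXEC so concurrent workers cannot
        // interleave partial updates (avoids the stale reads item k warns about).
        List<Object> results = redis.execute(new SessionCallback<List<Object>>() {
            @Override
            public <K, V> List<Object> execute(RedisOperations<K, V> operations) {
                @SuppressWarnings("unchecked")
                RedisOperations<String, String> ops = (RedisOperations<String, String>) operations;
                ops.multi();
                ops.opsForValue().increment("task:" + taskId + ":chunksExecuted");
                return ops.exec(); // EXEC returns the queued command results
            }
        });

        if (results != null && !results.isEmpty()) {
            Long done = (Long) results.get(0); // new counter value after the increment
            if (done != null && done == totalChunks) {
                // All shards reported in: the client's next status poll sees FINISHED (g5).
                redis.opsForValue().set("task:" + taskId + ":status", "FINISHED");
            }
        }
    }
}
```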
