
susamn / rio

A lightweight job scheduler based on a priority queue, with timeout, retry, replica, context cancellation, and easy semantics for job chaining. Built for Golang web apps.

License: MIT License

golang go job scheduler scheduling goroutine priority-queue heap minheap worker

rio's Introduction

Table of Contents

  1. Introduction
    1. What is RIO?
    2. Concern
      1. An asynchronous job processor
      2. Easy management of these goroutines and chaining them

Introduction


What is RIO?

Rio is a lightweight job scheduler and job chaining library. It is mainly built for Golang web apps, but it can easily be molded to serve any application that needs job scheduling. The library is an asynchronous job processor which makes all the backend calls asynchronously, with retry, timeout, and context cancellation functionality. It also provides very easy semantics to join multiple data sources based on their output and input types, while keeping the data sources completely decoupled from each other. This makes creating new APIs, or resolvers for GraphQL APIs, a breeze.

Concern

We often write web apps that connect to different data sources, combine the data obtained from those sources, and then do some more work. In the process, we write a lot of boilerplate to transform one data type into another. Also, in the absence of a proper job scheduler, we create goroutines haphazardly and without proper management. This leads to unmanageable code, which becomes even harder to update later, for instance when a new member joins the team.

Rio tries to solve this problem by introducing two concepts.

An asynchronous job processor

This is the piece that runs multiple jobs asynchronously (based on Rob Pike's Google I/O 2010 talk). It has a priority queue (balancer.go and pool.go) that hands off incoming requests to a set of managed workers. The balancer hands each new job to the most lightly loaded worker.

Easy management of these goroutines and chaining them

How many times do we do this:

call service 1 in goroutine 1
wait and get the response from goroutine 1
call service 2 in goroutine 2, taking a piece of data from the service 1 response
wait and get the response from goroutine 2
call service 3 in goroutine 3, taking a piece of data from the service 2 response
wait and get the response from goroutine 3

You get the idea; this only delays things further and does a lot of context switching. Rio helps here by chaining multiple calls together using closures and function types, and running them in a single goroutine.

Many may wonder whether this is slower than making multiple goroutine calls. I think not; it will be faster. Consider the previous example: if you do not get a response from service 1, can you invoke service 2? If service 2 fails, can you call service 3? No, because there is a data dependency between these calls, so the separate goroutines would have to wait on each other anyway.

Rio chains dependent jobs together by introducing this pattern.

request := context,
          (<callback of service 1>.WithTimeOut(100 ms).WithRetry(3))
          .FollowedBy(<function which can transform data from service 1 response to request or partial request of 2>,
                      <callback of service 2>)
          .FollowedBy(<function which can transform data from service 2 response to request or partial request of 3>,
                      <callback of service 3>)

In the example in examples/web.go the chaining pattern looks like this:

request := rio.BuildRequests(context.Background(),
      rio.NewFutureTask(callback1).WithMilliSecondTimeout(10).WithRetry(3), 2).
      FollowedBy(Call1ToCall2, rio.NewFutureTask(callback2).WithMilliSecondTimeout(20))

Once the chaining is done, post the job to the load balancer:

balancer.PostJob(request)
<-request.CompletedChannel

Once the call chain completes, the request comes back with the responses for all these calls in a slice, and you can do this:

  1. Only one job response

    request.GetOnlyResponse()
    

    or

  2. Multiple job responses

    request.GetResponse(index)   // index = 0, 1, 2
    

    If any job fails, its response will be the empty response, specifically rio.EMPTY_CALLBACK_RESPONSE.

rio's People

Contributors

codacy-badger, susamn

rio's Issues

Add replica calls to services and pick the correct one based on some condition

Think about a backend having 3 data centers, where the data centers are not responding uniformly: some are slow and some are fast.

or

There are multiple data centers with sync issues: some data centers are updated first and some later.

During this time, if a backend call is made to the slow-responding DC or to the DC holding stale data, the response obtained will have a higher response time or be faulty, respectively.

Replica calls can help in this context. If the backend allows it, Rio should be able to select one response based on some condition, maybe response-time based or data-quality based.

Job posting to balancer and waiting for response should be abstracted in a single method call

Currently, we post a new job to the balancer and wait on the request's CompletedChannel from the user side, like this:

balancer.PostJob(request)
<-request.CompletedChannel

We should merge this together and have one method, which will either:

  • Fail immediately if the request validation fails
  • Block for the whole duration of the job, meaning the method itself should wait on the CompletedChannel. The reason: if any user of the API forgets to wait on the channel, the worker for that request will block indefinitely and the balancer will have fewer workers. The balancer, unknowingly, will keep queuing new work for that worker, and when its buffer fills up, it will block as well.
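A minimal sketch of the proposed merged method, using stand-in types rather than rio's real Balancer and Request; the name PostJobAndWait and the validation rule are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
)

// Minimal stand-ins for rio's Request and balancer, just to sketch the idea.
type Request struct {
	Tasks            []string
	CompletedChannel chan bool
}

type Balancer struct{}

// PostJob hands the request to a worker; here it just completes immediately.
func (b *Balancer) PostJob(r *Request) {
	go func() {
		r.CompletedChannel <- true
	}()
}

// PostJobAndWait is the proposed single entry point: it fails fast on
// invalid requests and otherwise blocks until the job completes, so a
// caller can never forget to drain CompletedChannel.
func (b *Balancer) PostJobAndWait(r *Request) error {
	if len(r.Tasks) == 0 || r.CompletedChannel == nil {
		return errors.New("invalid request") // fail immediately on validation
	}
	b.PostJob(r)
	<-r.CompletedChannel // block for the whole duration of the job
	return nil
}

func main() {
	b := &Balancer{}
	r := &Request{Tasks: []string{"task-1"}, CompletedChannel: make(chan bool)}
	fmt.Println(b.PostJobAndWait(r)) // <nil>
}
```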

Write proper test case for the pool

We have to write proper test cases for the pool.go file. It is a priority queue based on the heap interface. The existing pool_test.go file does not have enough test cases to exercise it under load.

The BridgeConnection type has to be a struct with error

Currently, the BridgeConnection type is like this:

type BridgeConnection chan interface{}

It would have to be:

type BridgeConnection struct {
	Data  chan interface{}
	Error error
}

This will help in managing errors well between chained tasks.

BuildSingleRequest has nil CompletedChannel

func BuildRequests(context context.Context, task *FutureTask, size int) *Request {

	tasks := make([]*FutureTask, 0, size)
	tasks = append(tasks, task)
	return &Request{Ctx: context, Tasks: tasks, TaskCount: size, CompletedChannel: make(chan bool)}
}

compared to

func BuildSingleRequest(context context.Context, task *FutureTask) *Request {
	tasks := make([]*FutureTask, 1)
	tasks = append(tasks, task)
	return &Request{Ctx: context, Tasks: tasks}
}

Change the worker implementation to be buffered

Currently, the workers are unbuffered. When a new task comes in, it is started in a separate goroutine, so we are not really using the priority queue properly and there are stray goroutines in the background. If the system crashes, many goroutines will fail per worker. If the worker's requests channel were buffered, we would have fewer failed tasks.

w := &Worker{
	requests: make(chan *Request),
	pending:  0,
	index:    i,
	Name:     fmt.Sprintf("Worker-%d", i),
	done:     b.done,
}
