
susamn / rio

A lightweight job scheduler based on a priority queue, with timeout, retry, replica, context cancellation, and easy semantics for job chaining. Built for Golang web apps.

License: MIT License

golang go job scheduler scheduling goroutine priority-queue heap minheap worker

rio's Introduction

Table of Contents

  1. Introduction
    1. What is RIO?
    2. Concern
      1. An asynchronous job processor
      2. Easy management of these goroutines and chaining them

Introduction


What is RIO?

Rio is a lightweight job scheduler and job chaining library. It is mainly built for Golang web apps, but it can easily be molded to serve any application that needs job scheduling. The library is an asynchronous job processor which makes all the backend calls asynchronously, with retry, timeout, and context cancellation functionality. It also provides very easy semantics to join multiple data sources based on their output and input types, while keeping the data sources completely decoupled from each other. This makes creating new APIs, or resolvers for GraphQL APIs, a breeze.

Concern

We often write web apps that connect to different data sources, combine the data obtained from those sources, and then do some more work. In the process, we write a lot of boilerplate to transform one data type into another. Also, in the absence of a proper job scheduler, we create goroutines haphazardly and without proper management. This leads to unmanageable code, which becomes even harder to update later, for instance when a new member joins the team.

Rio tries to solve this problem by introducing two concepts.

An asynchronous job processor

This is the piece that runs multiple jobs asynchronously (based on Rob Pike's Google I/O 2010 talk). It has a priority queue (balancer.go and pool.go) that hands off incoming requests to a set of managed workers. The balancer hands each new job to the most lightly loaded worker.

Easy management of these goroutines and chaining them

How many times do we do this:

call service 1 in goroutine 1
wait and get the response from goroutine 1
call service 2 in goroutine 2, taking a piece of data from the service 1 response
wait and get the response from goroutine 2
call service 3 in goroutine 3, taking a piece of data from the service 2 response
wait and get the response from goroutine 3

You get the idea; this only delays things further and does a lot of context switching. Rio helps here by chaining multiple calls together using closures and function types, and running them in a single goroutine.

Many may wonder whether this is slower than making multiple goroutine calls. I think not; it will be faster. Consider the previous example: if you do not get a response from service 1, can you invoke service 2? If service 2 fails, can you call service 3? No, because there is a data dependency between these calls, so the separate goroutines would have to wait on each other anyway.

Rio chains dependent jobs together by introducing this pattern.

request := context,
          (<callback of service 1>.WithTimeOut(100 ms).WithRetry(3))
          .FollowedBy(<function which can transform data from service 1 response to request or partial request of 2>,
                      <callback of service 2>)
          .FollowedBy(<function which can transform data from service 2 response to request or partial request of 3>,
                      <callback of service 3>)

In the example in examples/web.go the chaining pattern looks like this:

request := rio.BuildRequests(context.Background(),
      rio.NewFutureTask(callback1).WithMilliSecondTimeout(10).WithRetry(3), 2).
      FollowedBy(Call1ToCall2, rio.NewFutureTask(callback2).WithMilliSecondTimeout(20))

Once the chaining is done, post the job to the load balancer:

balancer.PostJob(request)
<-request.CompletedChannel

Once the call chain completes, the request comes back with the responses for all these calls in a slice, and you can do this:

  1. Only one job response

    request.GetOnlyResponse()
    

    or

  2. Multiple job responses

    request.GetResponse(index)   // index = 0, 1, 2
    

    If any job fails, its response will be the empty response, specifically rio.EMPTY_CALLBACK_RESPONSE.

rio's People

Contributors

codacy-badger, susamn

rio's Issues

Add replica calls to services and pick the correct one based on some condition

Think about a backend having 3 data centers, where the data centers are not responding uniformly: some are slow and some are fast.

or

There are multiple data centers with sync issues: some data centers are updated first and some later.

During this time, if a backend call is made to the slow-responding DC or to the DC holding stale data, the response obtained will have a higher response time or be faulty, respectively.

Replica calls can help in this context. If the backend allows it, Rio should be able to select one response based on some condition, maybe response-time based or data-quality based.

Job posting to balancer and waiting for response should be abstracted in a single method call

Currently, we post a new job to the balancer and wait on the request's CompletedChannel from the user side, like this:

balancer.PostJob(request)
<-request.CompletedChannel

We should merge this together and have one method, which will either:

  • Fail immediately if the request validation fails
  • Block for the whole duration of the job, meaning the method itself should wait on the CompletedChannel. The reason: if any user of the API forgets to wait on the channel, the worker for that request will block indefinitely and the balancer will have fewer workers. The balancer, unknowingly, will keep queuing new work for that worker, and when its buffer fills up, it will block as well.
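A minimal sketch of the proposed merged method, using stand-in types rather than rio's real Balancer and Request; the name PostJobAndWait and the validation rule are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
)

// Minimal stand-ins for rio's Request and balancer, just to sketch the idea.
type Request struct {
	Tasks            []string
	CompletedChannel chan bool
}

type Balancer struct{}

// PostJob hands the request to a worker; here it just completes immediately.
func (b *Balancer) PostJob(r *Request) {
	go func() {
		r.CompletedChannel <- true
	}()
}

// PostJobAndWait is the proposed single entry point: it fails fast on
// invalid requests and otherwise blocks until the job completes, so a
// caller can never forget to drain CompletedChannel.
func (b *Balancer) PostJobAndWait(r *Request) error {
	if len(r.Tasks) == 0 || r.CompletedChannel == nil {
		return errors.New("invalid request") // fail immediately on validation
	}
	b.PostJob(r)
	<-r.CompletedChannel // block for the whole duration of the job
	return nil
}

func main() {
	b := &Balancer{}
	r := &Request{Tasks: []string{"task-1"}, CompletedChannel: make(chan bool)}
	fmt.Println(b.PostJobAndWait(r)) // <nil>
}
```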

Write proper test case for the pool

We have to write proper test cases for the pool.go file. It is a priority queue based on the heap interface. The existing pool_test.go file does not have enough test cases to exercise it under load.

The BridgeConnection type has to be a struct with error

Currently, the BridgeConnection type is like this:

type BridgeConnection chan interface{}

It would have to be:

type BridgeConnection struct {
	Data  chan interface{}
	Error error
}

This will help in managing errors well between chained tasks.

BuildSingleRequest has nil CompletedChannel

func BuildRequests(context context.Context, task *FutureTask, size int) *Request {

	tasks := make([]*FutureTask, 0, size)
	tasks = append(tasks, task)
	return &Request{Ctx: context, Tasks: tasks, TaskCount: size, CompletedChannel: make(chan bool)}
}

compared to

func BuildSingleRequest(context context.Context, task *FutureTask) *Request {
	tasks := make([]*FutureTask, 1)
	tasks = append(tasks, task)
	return &Request{Ctx: context, Tasks: tasks}
}

Change the worker implementation to be buffered

Currently, the workers are unbuffered. When a new task comes in, it is started in a separate goroutine, so we are not really using the priority queue properly and there are stray goroutines in the background. If the system crashes, many goroutines will fail per worker. If the worker's requests channel were buffered, we would have fewer failed tasks.

w := &Worker{
	requests: make(chan *Request),
	pending:  0,
	index:    i,
	Name:     fmt.Sprintf("Worker-%d", i),
	done:     b.done,
}
