Comments (7)
I might be missing something, but why not use a simple load balancer such as nginx/haproxy, or Amazon's Elastic LB in case you're on AWS, for this?
If it's the same model on all machines, a simple round robin should work, and you could expose a simple /status endpoint on your server which the load balancer knows how to read so it doesn't direct requests to a busy machine. In this case, there would be no predictor layer (since that seems to act as a bottleneck); instead, simply have
client <---> nginx/LB <---> [m1, m2, m3]
and m1/status returns OK if < n threads are being used, else ERROR - and the LB will know not to direct requests to it till it does become OK (it's constantly polling it).
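A minimal sketch of what such a /status endpoint could look like; the active-request counter and the MAX_ACTIVE threshold are assumptions for illustration, not anything from an existing server:

```python
# Hypothetical /status endpoint on each model server. The load balancer
# polls it and stops routing traffic while it returns a non-200 status.
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading

MAX_ACTIVE = 4            # assumed per-machine concurrency limit ("n")
active_requests = 0       # incremented/decremented around each model call
lock = threading.Lock()

class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/status":
            self.send_error(404)
            return
        with lock:
            healthy = active_requests < MAX_ACTIVE
        # 200/OK -> keep routing here; 503/ERROR -> back off until healthy.
        self.send_response(200 if healthy else 503)
        self.end_headers()
        self.wfile.write(b"OK" if healthy else b"ERROR")

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), StatusHandler).serve_forever()
```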
Not missing anything! The load balancer would totally work; the only issue is the expense (they're priced based on the number [and size!?] of the requests) and it would be cheaper to just implement it myself--as you say, I'm just using a round robin. But the nginx suggestion is interesting; I'll look into it (coming from academia I realize I have very limited experience in the practicalities of these kinds of things). The round robin has nice properties in terms of expected load value, but it could still potentially dispatch many requests to the same machine. Hence it would be nice to query the servers for their queue sizes or number of pending batches, for instance.
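A rough sketch of that "query the servers for their queue sizes" idea; the /queue_size endpoint and the host list are hypothetical, and each server is assumed to report its number of pending batches as plain text:

```python
# Pick the machine with the fewest pending batches instead of blind
# round robin. The hosts and the /queue_size endpoint are assumptions.
import urllib.request

SERVERS = ["http://m1:8080", "http://m2:8080", "http://m3:8080"]

def pick_least_loaded():
    loads = []
    for host in SERVERS:
        try:
            with urllib.request.urlopen(host + "/queue_size", timeout=0.5) as resp:
                loads.append((int(resp.read().decode().strip()), host))
        except OSError:
            continue  # skip machines that are down or slow to answer
    if not loads:
        raise RuntimeError("no model servers available")
    return min(loads)[1]  # host with the smallest queue
```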
Ah - I would say though, not knowing how large your requests are, that Amazon ELBs and Google Cloud charge $0.025 an hour, which for 30 days is $18, plus $0.008 per GB, which for 100 GB would be $0.80 :) [And less messing around with nginx config settings]
@eriophora Have you looked at the "Serving Inception Model with TensorFlow Serving and Kubernetes" tutorial on TensorFlow? They show how to "pack" the model into a Docker container and later run several containers with Kubernetes, which also provides a load balancer out of the box (see here); it's even included in their example setup.
I use Nginx to implement load balancing of multiple dockerized TensorFlow Serving instances, because Nginx has supported gRPC since 1.13.10.
I run multiple dockerized TensorFlow Serving servers to serve the same model, and proxy them through an nginx server. I ran a stress test using JMeter and then tried restarting the containers. The result shows that clients get a continuous, uninterrupted service.
But there is still one problem. Sometimes when I restart the containers, one request may fail (only one, because my model server handles one request at a time). My clients report as below:
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.CANCELLED
details = "Received http2 header with status: 502"
debug_error_string = "{"created":"@1543454947.014000000","description":"Received http2 :status header with non-200 OK status","file":"src/core/ext/filters/http/client/http_client_filter.cc","file_line":100,"grpc_message":"Received http2 header with status: 502","grpc_status":1,"value":"502"}"
I guess that is because when I restarted the docker container, the server was just handling a request, and when its process received the stop signal it exited immediately without finishing that request.
In my HTTP web server, I always stop accepting new requests immediately, but stop the process only after all in-flight requests have been handled. I hope TensorFlow Serving can support this by default.
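Not a fix for the server-side shutdown behaviour, but as a client-side workaround one could retry the call when the proxy returns a 502 or the stream is torn down during a restart. A rough sketch assuming a Python gRPC client behind the nginx front end (the target address and retry budget are made up):

```python
import time
import grpc
from tensorflow_serving.apis import prediction_service_pb2_grpc

# Status codes typically seen when a backend is bounced mid-request.
RETRYABLE = {grpc.StatusCode.CANCELLED,
             grpc.StatusCode.UNAVAILABLE,
             grpc.StatusCode.UNKNOWN}

def predict_with_retry(request, target="localhost:9000", retries=3, timeout=5.0):
    channel = grpc.insecure_channel(target)  # assumed nginx front-end address
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    for attempt in range(retries):
        try:
            return stub.Predict(request, timeout)
        except grpc.RpcError as err:
            if err.code() not in RETRYABLE or attempt == retries - 1:
                raise
            time.sleep(0.5 * (attempt + 1))  # brief back-off, then retry
```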
@eriophora Have you looked at the "Serving Inception Model with TensorFlow Serving and Kubernetes" tutorial on TensorFlow? They show how to "pack" the model into a Docker container and later run several containers with Kubernetes, which also provides a load balancer out of the box (see here); it's even included in their example setup.
The URL has been removed. Have you read anything that explains how to solve the problem below?
When I restart a dockerized TensorFlow Serving server while it is handling a request, that request fails. The client receives an error:
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.UNKNOWN
details = "Stream removed"
debug_error_string = "{"created":"@1543458849.986000000","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1017,"grpc_message":"Stream removed","grpc_status":2}"
If this problem cannot be solved, the service is not stable even with load balancing.
I may be missing something here, but this is correct: if TF Serving receives a stop signal it will be stopped - I'm not aware of any way to get around that. Please feel free to elaborate if I'm misunderstanding.