
Faux load balancing? (closed)

tensorflow commented on May 21, 2024
Faux load balancing?


Comments (7)

viksit commented on May 21, 2024

I might be missing something, but why not use a simple load balancer for this, such as nginx/HAProxy, or Amazon's Elastic Load Balancer if you're on AWS?

If it's the same model on all machines, a simple round robin should work, and you could expose a simple /status endpoint on each server which the load balancer knows how to read, so it avoids directing requests to overloaded machines. In that case there would be no predictor layer (since that seems to act as a bottleneck); instead, you'd simply have:

client <---> nginx/LB <---> [m1, m2, m3]

Here m1/status returns OK if fewer than n threads are in use, and ERROR otherwise; the LB, which is constantly polling it, then stops directing requests to that machine until it becomes OK again.
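
A minimal sketch of that /status idea, assuming a hypothetical Flask-based model server; the MAX_ACTIVE threshold, the in-flight counter, and the run_model() stub are all illustrative, not part of TensorFlow Serving:

    # Hypothetical health endpoint for a load balancer to poll.
    import threading
    from flask import Flask

    app = Flask(__name__)
    MAX_ACTIVE = 8            # illustrative capacity threshold
    _active = 0               # requests currently being handled
    _lock = threading.Lock()

    def run_model():
        return "prediction"   # stand-in for the actual model call

    @app.route("/status")
    def status():
        # Most LBs (nginx, HAProxy, ELB) take a backend out of rotation
        # when the health check returns a non-2xx status.
        with _lock:
            busy = _active >= MAX_ACTIVE
        return ("ERROR", 503) if busy else ("OK", 200)

    @app.route("/predict", methods=["POST"])
    def predict():
        global _active
        with _lock:
            _active += 1
        try:
            return run_model()
        finally:
            with _lock:
                _active -= 1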


eriophora commented on May 21, 2024

Not missing anything! The load balancer would totally work; the only issue is the expense (they're priced based on the number, and possibly the size, of requests), and it would be cheaper to just implement it myself. As you say, I'm just using a round robin. But the nginx suggestion is interesting; I'll look into it (coming from academia, I realize I have very limited experience in the practicalities of these kinds of things). Round robin has nice properties in terms of expected load, but it could still dispatch many requests to the same machine. Hence it would be nice to query the servers for their queue sizes or number of pending batches, for instance.
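
For what it's worth, here is a sketch of that queue-size idea: a client-side dispatcher that polls a hypothetical /queue_size endpoint on each server and sends the request to the least-loaded one (the endpoint, URLs, and payload are illustrative, not anything TF Serving exposes):

    # Least-loaded dispatch over a pool of model servers.
    import requests

    SERVERS = ["http://m1:8000", "http://m2:8000", "http://m3:8000"]

    def queue_size(server):
        try:
            return int(requests.get(server + "/queue_size", timeout=0.1).text)
        except (requests.RequestException, ValueError):
            return float("inf")   # treat unreachable servers as fully loaded

    def dispatch(payload):
        # Unlike pure round robin, this can't pile a burst of requests
        # onto one machine, at the cost of one extra poll per dispatch.
        target = min(SERVERS, key=queue_size)
        return requests.post(target + "/predict", json=payload, timeout=10)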


viksit commented on May 21, 2024

Ah, I would say though that, not knowing how large your requests are: Amazon ELB and Google Cloud charge about $0.025 an hour, which over 30 days is $18, plus $0.008 per GB, which for 100 GB would be $0.80 :) [And less messing around with nginx config settings.]
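
Spelling out that arithmetic (at the quoted rates, which may have changed since):

    \[
      \$0.025/\text{hr} \times 24\,\text{hr} \times 30\,\text{days} = \$18.00,
      \qquad
      \$0.008/\text{GB} \times 100\,\text{GB} = \$0.80
    \]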


yeputons commented on May 21, 2024

@eriophora Have you looked at the "Serving Inception Model with TensorFlow Serving and Kubernetes" tutorial on TensorFlow? It shows how to pack the model into a Docker container and then run several containers with Kubernetes, which also provides a load balancer out of the box (see here); it's even included in their example setup.


Pingze-github commented on May 21, 2024

I use nginx to implement load balancing across multiple dockerized TensorFlow Serving instances, since nginx supports gRPC from 1.13.10.

I run multiple dockerized TensorFlow Serving servers that serve the same model and proxy them behind an nginx server. I ran a stress test with JMeter and then tried restarting the containers. The result shows that clients get continuous, uninterrupted service.
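
A minimal sketch of that nginx setup, assuming the serving containers are reachable as m1 and m2 on gRPC port 8500 (hostnames and ports are illustrative; grpc_pass needs nginx 1.13.10 or later):

    # Illustrative nginx.conf fragment for round-robin gRPC proxying.
    http {
        upstream tfserving {
            server m1:8500;   # dockerized TF Serving backends
            server m2:8500;
        }

        server {
            listen 80 http2;              # gRPC rides on HTTP/2

            location / {
                grpc_pass grpc://tfserving;
            }
        }
    }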

But there is still one problem. Sometimes when I restart the containers, one request may fail (only one, because my model server handles one request at a time). My clients report the error below:

grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
	status = StatusCode.CANCELLED
	details = "Received http2 header with status: 502"
	debug_error_string = "{"created":"@1543454947.014000000","description":"Received http2 :status header with non-200 OK status","file":"src/core/ext/filters/http/client/http_client_filter.cc","file_line":100,"grpc_message":"Received http2 header with status: 502","grpc_status":1,"value":"502"}"

I guess that's because when I restarted the container, the server inside was in the middle of handling a request, and when its process received the stop signal it exited immediately without finishing that request.

In my own HTTP web servers, I always stop accepting new requests immediately but only stop the process after all in-flight requests have been handled. I hope TensorFlow Serving can support this behavior by default.
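
Until the server drains requests itself, one client-side workaround (a sketch, not built-in TF Serving behavior) is to retry predict calls that fail with a transient status like the 502 above. The stub setup below follows the standard tensorflow_serving gRPC API, but the retry policy, addresses, and model name are illustrative:

    # Retry transient gRPC failures seen during backend restarts.
    import time
    import grpc
    from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

    RETRYABLE = {grpc.StatusCode.UNAVAILABLE, grpc.StatusCode.CANCELLED,
                 grpc.StatusCode.UNKNOWN}

    def predict_with_retry(stub, request, attempts=3, backoff=0.2):
        for i in range(attempts):
            try:
                return stub.Predict(request, timeout=10.0)
            except grpc.RpcError as e:
                if e.code() not in RETRYABLE or i == attempts - 1:
                    raise
                time.sleep(backoff * (2 ** i))   # exponential backoff

    channel = grpc.insecure_channel("nginx-lb:80")   # illustrative LB address
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    request = predict_pb2.PredictRequest()
    request.model_spec.name = "my_model"             # illustrative model name
    # ... fill request.inputs, then call predict_with_retry(stub, request)

Because prediction requests are stateless, a retry through the load balancer simply lands on a healthy backend.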


Pingze-github commented on May 21, 2024

@eriophora Have you looked at the "Serving Inception Model with TensorFlow Serving and Kubernetes" tutorial on TensorFlow? It shows how to pack the model into a Docker container and then run several containers with Kubernetes, which also provides a load balancer out of the box (see here); it's even included in their example setup.

The URL has been removed. Have you read anything that explains how to solve the problem below? When restarting a dockerized TensorFlow Serving server, if the server is in the middle of handling a request, that request will fail. The client receives this error:

grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "Stream removed"
	debug_error_string = "{"created":"@1543458849.986000000","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1017,"grpc_message":"Stream removed","grpc_status":2}"

If this problem cannot be solved, the service is not stable even with load balancing.


misterpeddy commented on May 21, 2024

I may be missing something here, but this is correct: if TF Serving receives a stop signal it will be stopped; I'm not aware of any way to get around that. Please feel free to elaborate if I'm misunderstanding.


