Comments (7)
I might be missing something, but why not use a simple load balancer such as nginx/haproxy, or Amazon's Elastic LB in case you're on AWS, for this?
If it's the same model on all machines, a simple round robin should work, and you could expose a simple /status endpoint on your server which the load balancer knows how to read so it doesn't direct requests to a busy machine. In this case, there would be no predictor layer (since that seems to act as a bottleneck); instead, simply have
client <---> nginx/LB <---> [m1, m2, m3]
and m1/status returns OK if < n threads are being used, else ERROR - and the LB will know not to direct requests to it till it does become OK (it's constantly polling it).
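A minimal sketch of what such a /status endpoint could look like; the active-request counter and the MAX_ACTIVE threshold are assumptions for illustration, not anything from an existing server:

```python
# Hypothetical /status endpoint on each model server. The load balancer
# polls it and stops routing traffic while it returns a non-200 status.
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading

MAX_ACTIVE = 4            # assumed per-machine concurrency limit ("n")
active_requests = 0       # incremented/decremented around each model call
lock = threading.Lock()

class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/status":
            self.send_error(404)
            return
        with lock:
            healthy = active_requests < MAX_ACTIVE
        # 200/OK -> keep routing here; 503/ERROR -> back off until healthy.
        self.send_response(200 if healthy else 503)
        self.end_headers()
        self.wfile.write(b"OK" if healthy else b"ERROR")

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), StatusHandler).serve_forever()
```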
Not missing anything! The load balancer would totally work; the only issue is the expense (they're priced based on the number [and size!?] of the requests) and it would be cheaper to just implement it myself--as you say, I'm just using a round robin. But the nginx suggestion is interesting; I'll look into it (coming from academia I realize I have very limited experience in the practicalities of these kinds of things). The round robin has nice properties in terms of expected load value, but it could still potentially dispatch many requests to the same machine. Hence it would be nice to query the servers for their queue sizes or number of pending batches, for instance.
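A rough sketch of that "query the servers for their queue sizes" idea; the /queue_size endpoint and the host list are hypothetical, and each server is assumed to report its number of pending batches as plain text:

```python
# Pick the machine with the fewest pending batches instead of blind
# round robin. The hosts and the /queue_size endpoint are assumptions.
import urllib.request

SERVERS = ["http://m1:8080", "http://m2:8080", "http://m3:8080"]

def pick_least_loaded():
    loads = []
    for host in SERVERS:
        try:
            with urllib.request.urlopen(host + "/queue_size", timeout=0.5) as resp:
                loads.append((int(resp.read().decode().strip()), host))
        except OSError:
            continue  # skip machines that are down or slow to answer
    if not loads:
        raise RuntimeError("no model servers available")
    return min(loads)[1]  # host with the smallest queue
```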
Ah - I would say though, not knowing how large your requests are, that Amazon ELBs and Google Cloud charge $0.025 an hour, which for 30 days is $18, plus $0.008 per GB, which for 100 GB would be $0.80 :) [And less messing around with nginx config settings]
@eriophora Have you looked at the "Serving Inception Model with TensorFlow Serving and Kubernetes" tutorial on TensorFlow? They show how to "pack" the model into a Docker container and later run several containers with Kubernetes, which also provides a load balancer out of the box (see here); it's even included in their example setup.
I use Nginx to implement load balancing of multiple dockerized TensorFlow Serving instances, because Nginx has supported gRPC since 1.13.10.
I run multiple dockerized TensorFlow Serving servers to serve the same model, and proxy them through an nginx server. I ran a stress test using JMeter and then tried restarting the containers. The result shows that clients get a continuous, uninterrupted service.
But there is still one problem. Sometimes when I restart the containers, one request may fail (only one, because my model server handles one request at a time). My clients report as below:
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.CANCELLED
details = "Received http2 header with status: 502"
debug_error_string = "{"created":"@1543454947.014000000","description":"Received http2 :status header with non-200 OK status","file":"src/core/ext/filters/http/client/http_client_filter.cc","file_line":100,"grpc_message":"Received http2 header with status: 502","grpc_status":1,"value":"502"}"
I guess that is because when I restarted the docker container, the server was just handling a request, and when its process received the stop signal it exited immediately without finishing that request.
In my HTTP web server, I always stop accepting new requests immediately, but stop the process only after all in-flight requests have been handled. I hope TensorFlow Serving can support this by default.
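Not a fix for the server-side shutdown behaviour, but as a client-side workaround one could retry the call when the proxy returns a 502 or the stream is torn down during a restart. A rough sketch assuming a Python gRPC client behind the nginx front end (the target address and retry budget are made up):

```python
import time
import grpc
from tensorflow_serving.apis import prediction_service_pb2_grpc

# Status codes typically seen when a backend is bounced mid-request.
RETRYABLE = {grpc.StatusCode.CANCELLED,
             grpc.StatusCode.UNAVAILABLE,
             grpc.StatusCode.UNKNOWN}

def predict_with_retry(request, target="localhost:9000", retries=3, timeout=5.0):
    channel = grpc.insecure_channel(target)  # assumed nginx front-end address
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    for attempt in range(retries):
        try:
            return stub.Predict(request, timeout)
        except grpc.RpcError as err:
            if err.code() not in RETRYABLE or attempt == retries - 1:
                raise
            time.sleep(0.5 * (attempt + 1))  # brief back-off, then retry
```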
@eriophora Have you looked at the "Serving Inception Model with TensorFlow Serving and Kubernetes" tutorial on TensorFlow? They show how to "pack" the model into a Docker container and later run several containers with Kubernetes, which also provides a load balancer out of the box (see here); it's even included in their example setup.
The URL has been removed. Have you read anything that explains how to solve the problem below?
When I restart a dockerized TensorFlow Serving server while it is handling a request, that request fails. The client receives an error:
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.UNKNOWN
details = "Stream removed"
debug_error_string = "{"created":"@1543458849.986000000","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1017,"grpc_message":"Stream removed","grpc_status":2}"
If this problem cannot be solved, the service is not stable even with load balancing.
I may be missing something here, but this is correct: if TF Serving receives a stop signal it will be stopped - I'm not aware of any way to get around that. Please feel free to elaborate if I'm misunderstanding.