Comments (18)
Just finished a part of the adaptation here.
from asio-grpc.
I ran the cpp_grpc_mt_bench, cpp_asio_grpc_bench and a modified version of cpp_asio_grpc_bench that uses C++20 coroutines with GRPC_SERVER_CPUS=1
on a Windows machine:
Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz, Boost 1.77, gRPC 1.37.0, asio-grpc v1.2.0
with the command:
ghz --proto=asio-grpc\example\protos\helloworld.proto --call=helloworld.Greeter.SayHello --cpus 7 --insecure --concurrency=1000 --connections=50 --duration 20s --data-file 100B.txt 127.0.0.1:50051
Boost.Coroutine:
Summary:
Count: 423578
Total: 20.03 s
Slowest: 582.23 ms
Fastest: 0 ns
Average: 26.45 ms
Requests/sec: 21151.68
Response time histogram:
0.000 [635] |
58.223 [407360] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
116.446 [12220] |∎
174.670 [1388] |
232.893 [21] |
291.116 [0] |
349.339 [0] |
407.563 [0] |
465.786 [247] |
524.009 [565] |
582.232 [188] |
Latency distribution:
10 % in 6.40 ms
25 % in 14.00 ms
50 % in 24.00 ms
75 % in 32.45 ms
90 % in 44.49 ms
95 % in 53.04 ms
99 % in 93.99 ms
Status code distribution:
[OK] 422624 responses
[Canceled] 517 responses
[Unavailable] 437 responses
Error distribution:
[517] rpc error: code = Canceled desc = grpc: the client connection is closing
[437] rpc error: code = Unavailable desc = transport is closing
C++20 coroutines:
Summary:
Count: 415305
Total: 20.02 s
Slowest: 1.07 s
Fastest: 0 ns
Average: 27.42 ms
Requests/sec: 20746.18
Response time histogram:
0.000 [885] |
107.035 [411518] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
214.069 [1553] |
321.104 [1] |
428.139 [0] |
535.173 [0] |
642.208 [0] |
749.243 [0] |
856.278 [0] |
963.312 [144] |
1070.347 [856] |
Latency distribution:
10 % in 5.99 ms
25 % in 13.00 ms
50 % in 23.31 ms
75 % in 32.23 ms
90 % in 45.08 ms
95 % in 56.09 ms
99 % in 93.16 ms
Status code distribution:
[OK] 414957 responses
[Canceled] 131 responses
[Unavailable] 217 responses
Error distribution:
[131] rpc error: code = Canceled desc = grpc: the client connection is closing
[217] rpc error: code = Unavailable desc = transport is closing
gRPC multi-threaded example:
Summary:
Count: 413776
Total: 20.01 s
Slowest: 1.18 s
Fastest: 0 ns
Average: 27.10 ms
Requests/sec: 20673.49
Response time histogram:
0.000 [986] |
118.369 [409658] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
236.738 [1266] |
355.107 [0] |
473.477 [0] |
591.846 [0] |
710.215 [0] |
828.584 [0] |
946.953 [163] |
1065.322 [721] |
1183.692 [116] |
Latency distribution:
10 % in 5.89 ms
25 % in 13.00 ms
50 % in 23.02 ms
75 % in 32.00 ms
90 % in 44.00 ms
95 % in 53.70 ms
99 % in 93.75 ms
Status code distribution:
[OK] 412910 responses
[Canceled] 220 responses
[Unavailable] 646 responses
Error distribution:
[646] rpc error: code = Unavailable desc = transport is closing
[220] rpc error: code = Canceled desc = grpc: the client connection is closing
I would say there is no significant difference.
Results on Linux might vary, as seen in the README. It is slightly disappointing, in my opinion, to see such a difference between the gRPC multi-threaded example and my library on Linux, but then again, the example is hardly scalable to real-world applications.
As for the difference between Boost.Coroutine and C++20 coroutines, I cannot say either. Boost.Asio's built-in memory recycling for C++20 coroutines is definitely active and working in my library, so I think that is all I can do. We might just have to wait until compilers get better at optimizing C++20 coroutines. I still recommend using them in place of Boost.Coroutine.
I also ran the benchmarks through Intel VTune and cannot see anything immediately wrong; at least nothing seems to indicate that my implementation is causing a noticeable slowdown.
Oh, I left out some information. I can get comparable results with a single thread, but the issue I mentioned occurs when using multiple threads, maybe four.
Could you also provide a benchmark using multiple threads?
Sure, although the maximum my machine can saturate is 2 threads :)
Anyway, thanks for your work. I'm currently trying to adapt your work to libunifex, since it's a more lightweight solution. 😀
You're welcome. I might as well hack on libunifex a bit myself now :). It doesn't look much different from Boost.Asio, just lacking in documentation.
wow that is so cool!
I have updated the benchmarks in the README. They actually show better performance with C++20 coroutines compared to Boost.Coroutine on my Linux machine.
Also note that if you are running benchmarks for 4-CPU servers then you really need to ensure that the server is fully exhausted. E.g. on my 12-core machine, using 8 cores for the client and 4 for the server, I am unable to do so:
name | req/s | avg. latency | 90 % in | 95 % in | 99 % in | avg. cpu | avg. memory |
---|---|---|---|---|---|---|---|
go_grpc | 86840 | 8.74 ms | 16.41 ms | 19.92 ms | 28.71 ms | 247.3% | 28.08 MiB |
cpp_grpc_callback | 85723 | 9.97 ms | 15.16 ms | 17.95 ms | 24.42 ms | 215.32% | 158.87 MiB |
cpp_asio_grpc_cpp20_coroutine | 84986 | 8.52 ms | 16.46 ms | 22.15 ms | 34.92 ms | 230.18% | 66.54 MiB |
rust_tonic_mt | 84591 | 9.04 ms | 17.39 ms | 22.55 ms | 34.38 ms | 281.41% | 18.87 MiB |
cpp_grpc_mt | 83895 | 8.55 ms | 16.86 ms | 23.15 ms | 37.01 ms | 227.57% | 67.24 MiB |
cpp_asio_grpc_boost_coroutine | 83631 | 8.66 ms | 16.96 ms | 22.62 ms | 36.11 ms | 231.3% | 66.75 MiB |
rust_grpcio | 82151 | 9.31 ms | 17.01 ms | 22.90 ms | 36.57 ms | 274.72% | 34.45 MiB |
rust_thruster_mt | 79143 | 9.96 ms | 19.58 ms | 25.58 ms | 37.57 ms | 288.98% | 15.47 MiB |
Notice that the avg. cpu column should read 400% for the server to be exhausted. Since it does not, these results are completely useless.
For the upcoming version of asio-grpc I am planning:
- a Boost-less version that uses only standalone Asio (already done on master)
- initial support for unified executors through libunifex
Thanks for the info. I'd like to reproduce this on my machine. By the way, will repeatedly_request improve the performance? What if there are not enough request calls to match the incoming requests?
Yes, good question. I couldn't find any information on how many outstanding request calls there should be at a time. Actually, I just tested it, and it seems that if there are multiple outstanding calls to RequestXXX, then all of them will handle the incoming RPC simultaneously. So I suppose that having exactly one is correct, and that is what repeatedly_request does. E.g. the code in the benchmark could be rewritten to avoid coroutines entirely and rely only on callbacks, which might yield better performance, but I would say that such code is hardly scalable to real-world applications:
struct ProcessRPC
{
    using executor_type = agrpc::GrpcContext::executor_type;

    agrpc::GrpcContext& grpc_context;

    auto get_executor() const noexcept { return grpc_context.get_executor(); }

    template <class RPCHandler>
    void operator()(RPCHandler&& rpc_handler, bool ok)
    {
        if (!ok)
        {
            return;
        }
        auto args = rpc_handler.args();
        // Keep the response alive until finish completes.
        auto response = std::allocate_shared<test::v1::Response>(grpc_context.get_allocator());
        response->set_integer(21);
        auto& response_ref = *response;
        agrpc::finish(std::get<2>(args), response_ref, grpc::Status::OK,
                      asio::bind_executor(this->get_executor(),
                                          [rpc_handler = std::move(rpc_handler),
                                           response = std::move(response)](bool) {}));
    }
};
agrpc::repeatedly_request(&test::v1::Test::AsyncService::RequestUnary, service, ProcessRPC{grpc_context});
Actually, I don't know whether it's necessary to align with these lines to handle async requests:
} else if (status_ == PROCESS) {
    // Spawn a new CallData instance to serve new clients while we process
    // the one for this CallData. The instance will deallocate itself as
    // part of its FINISH state.
    new CallData(service_, cq_);
    // The actual processing.
    ...
That means the coroutine version may look something like this:
awaitable<void> handle_rpc(agrpc::GrpcContext& grpc_context,
helloworld::Greeter::AsyncService& service) {
auto executor = co_await this_coro::executor;
auto context = std::allocate_shared<UnaryRPCContext>(
grpc_context.get_allocator());
bool request_ok{true};
request_ok = co_await agrpc::request(
&helloworld::Greeter::AsyncService::RequestSayHello, service,
context->server_context, context->request, context->writer);
if (!request_ok) {
co_return;
}
// This line
co_spawn(executor, handle_rpc(grpc_context, service), detached);
helloworld::HelloReply response;
response.set_message(context->request.name());
auto &writer = context->writer;
co_await agrpc::finish(writer, response, grpc::Status::OK);
}
boost::asio::co_spawn(
grpc_context,
[&]() -> boost::asio::awaitable<void> {
while (true) {
co_await handle_rpc(grpc_context, service);
}
},
boost::asio::detached);
Even though the actual processing time is long, there are still enough calls to RequestXXX, each with a separate coroutine, to match the incoming requests.
I reran the benchmark on my machine and the results are consistent with yours.
However, it's strange that after changing this line to co_await agrpc::finish(writer, response, grpc::Status::OK);, the performance suffers significantly. I think it's also related to this #3 (comment).
In my experiment, after switching to co_spawn immediately after receiving the request, the co_await agrpc::finish version gets comparable benchmark results.
Seems legitimate to me. I mean, that is exactly what repeatedly_request helps with: ensuring that the next call to Request is made immediately, even before handling the particular request. If you look at https://github.com/Tradias/asio-grpc#snippet-repeatedly-request-spawner you will see an example that does exactly what your change is doing, i.e. spawning another coroutine to handle the Finish while another call to Request has already been made in the background.
The performance with unifex might be slightly better since they can avoid the extra dynamic memory allocation for the OperationState and store it in the coroutine frame directly. Although, if they do not have memory recycling for the task itself, like Boost.Asio does for its awaitable, it might even out.
I am very glad for your input and your efforts on adapting asio-grpc to libunifex. I think we should combine our efforts. I am open to pull requests, issues and ideas in general. Currently I am thinking about how to design an API for the Sender concept that works with different "backends" like Asio's set_value and libunifex's set_value.
There is a CONTRIBUTING guideline that should get you started, as well as a CMakePresets.json that your IDE might find helpful.
Actually, I have been following the recent proposals on how executors/networking will land in the C++ standard. I put some effort into adapting to libunifex for a more lightweight solution, since Asio is not only about executors (maybe because I'm not so familiar with Asio 😀).
Now that you have already extended asio-grpc with libunifex support, combining our efforts may be a good way forward, and I am also willing to contribute to an easier-to-use async API. I will take a look at the recent updates and think about it.
For the rest: since you mentioned before that we need some codegen to make things easier for users, my first thought is to add a plugin like this to provide some dummy code. Then I'd like to refer to the C# gRPC API for inspiration on the design.
Closing, as the issue is addressed now. Hope we can have further discussion in the future.