
Comments (84)

SamSaffron avatar SamSaffron commented on May 18, 2024 19

Just putting this out here, in case anyone is still facing the problem

https://samsaffron.com/archive/2018/02/02/instrumenting-rails-with-prometheus

from client_ruby.

dmagliola avatar dmagliola commented on May 18, 2024 12

We're happy to report, this is now fixed! 🎉
(Sorry for the delay, we should've updated this a while ago)

We haven't yet launched a 1.0 (coming soon!), but you can start using it with our recent v0.10.0.pre.alpha.2 release.

NOTE: We expect there might still be a couple of small breaking changes in the API between now and the 1.0, but we've been using this code in production for months and it works great, so if you are OK with potentially having to do minor adjustments once we release 1.0, you can already use this.
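For anyone wanting to try the pre-release: the multi-process support works by swapping the default in-memory metric store for a file-backed one. A minimal sketch, assuming the 0.10-era API (class name, require path and directory are taken from the pre-release docs as I understand them, and may still shift before 1.0):

require 'prometheus/client'
require 'prometheus/client/data_stores/direct_file_store'

# Each process writes its metric values to per-process files in this
# directory; they are merged at scrape time, so pre-fork servers report
# combined totals.
Prometheus::Client.config.data_store =
  Prometheus::Client::DataStores::DirectFileStore.new(dir: '/tmp/prometheus_metrics')

registry = Prometheus::Client.registry
counter  = registry.counter(:http_requests_total,
                            docstring: 'Total HTTP requests',
                            labels: [:code])
counter.increment(labels: { code: '200' })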

from client_ruby.

SamSaffron avatar SamSaffron commented on May 18, 2024 8

I started my Discourse extraction here:

https://github.com/discourse/prometheus_exporter

Supports multiple processes with optional custom transport, recovery and chunked encoding. Should be properly released in a few weeks.

Works fine with Passenger, Unicorn and Puma in clustered mode. You can also aggregate sidekiq.
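For a Rails app, wiring it up looks roughly like this (a sketch based on the gem's README; the collector runs as a separate prometheus_exporter process that the web and Sidekiq workers push to, which is how it sidesteps the multi-process problem):

# Gemfile
#   gem 'prometheus_exporter'
#
# Run the collector in its own process:
#   bundle exec prometheus_exporter

# config/initializers/prometheus.rb
require 'prometheus_exporter/middleware'

# Reports per-request counts and timings from every worker process to the
# external collector, which serves the aggregated /metrics endpoint.
Rails.application.middleware.unshift PrometheusExporter::Middleware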

from client_ruby.

dmagliola avatar dmagliola commented on May 18, 2024 5

An update on this issue: We (GoCardless) are currently working on this. In the process, we're also planning to bring the Prometheus Client more in sync with the current best practices as recommended here.

We've opened an issue with an RFC explaining what we're planning to do.
And we have a PR with the first steps.

This PR doesn't solve everything, and specifically it doesn't deal with the multi-process issue, but it's the first step in allowing that to happen, by introducing "pluggable backends" where the metric data is stored, allowing consumers of the library to pick which is best for their needs, and to write their own if necessary.
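To illustrate the "pluggable backends" idea with a deliberately naive example (this is hypothetical and not the interface proposed in the RFC; a real backend also has to handle labels, concurrency and multiple processes):

# Hypothetical in-memory backend. The registry would only talk to this small
# interface, so a file-, mmap- or Redis-backed store could be swapped in
# without touching instrumentation code.
class HashBackend
  def initialize
    @values = Hash.new(0.0)
  end

  def increment(key, by = 1)
    @values[key] += by
  end

  def get(key)
    @values[key]
  end

  def all_values
    @values.dup
  end
end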

We appreciate comments and feedback on these proposals and look forward to working with you on this!

from client_ruby.

grobie avatar grobie commented on May 18, 2024 3

Another note: I'll work on a complete rewrite of client_ruby, likely starting in the next few weeks. It will address several of the library's architectural shortcomings and include the option to change the value handling (lock-free for single-process programs, multi-threaded, multi-process). client_ruby is the oldest client library in its current state; we've learned a lot in the meantime, and similar rewrites have been done in client_golang and client_java.

from client_ruby.

lyda avatar lyda commented on May 18, 2024 3

We've released this, which is what we're using for GitLab. Hopefully it will be of some use for @grobie. It's multiprocess mode only, so not ideal for the general case, but hopefully it will be of use for future work.

from client_ruby.

SamSaffron avatar SamSaffron commented on May 18, 2024 2

note... unicorn provides cross process metrics using raindrops ... see:

https://bogomips.org/raindrops/

This uses shared memory and is already built, can be extended to add more metrics and provide more stats as needed.

I am looking at writing an exporter based off raindrops for discourse

from client_ruby.

dmagliola avatar dmagliola commented on May 18, 2024 2

Another update: we've updated PR #95 to (among a lot of other things) add a "backend" that does work with pre-fork servers, sharing data between processes on a scrape.
We've also done plenty of benchmarking and performance improvements to it, and would love everyone's comments on it!

The new backend we're proposing is pretty simple. It basically uses files to keep the data, but thanks to the wonders of file system caching we're not really blocking on disk, and it's actually surprisingly fast: a counter increment with this backend takes about 9μs.
For context:

  • the MMap Store that @juliusv was proposing in the multiprocess branch is about 6μs per increment
  • a Hash with a Mutex around it is about 4μs
  • a bare Hash, with nothing around it, is about 1μs

We still haven't abandoned the idea of using MMaps in the near future, but we've found some stability issues with that solution, whereas the File-based backend is very stable and reliable, so we're proposing to go with that for the time being.

NOTE: It's a big PR, it was a big refactoring, and there's a lot going on. However, the commits are coherent and have good descriptions, so we recommend reading it commit-by-commit rather than the full diff all at once, which will not be very parseable.

from client_ruby.

SamSaffron avatar SamSaffron commented on May 18, 2024 1

wanted to share some pics so you can see what you get with raindrops:

[image]

Requests are very evenly distributed between the 5 boxes (which is to be expected with HAProxy). We run about 8 workers per machine, and everything is counted.

We see some queueing when there are spikes, which indicates we may need an extra process or two here.

[image]

from client_ruby.

philandstuff avatar philandstuff commented on May 18, 2024 1

Is this fixed by #95 now?

from client_ruby.

wshihadeh avatar wshihadeh commented on May 18, 2024 1

@dmagliola does the fix also cover deploying the service in HA mode? In that case, the running processes do not have access to the shared memory :(

from client_ruby.

brian-brazil avatar brian-brazil commented on May 18, 2024

I believe this problem came up with some users of the (experimental) Python client, and some proposed work on the HAProxy exporter.

What you really want to do is separately scrape each worker, and then aggregate it up in the Prometheus server. This gives the most useful/correct results, and also gives you per-process statistics for things like memory and cpu. I'm not familiar with Unicorn or Rack, do they offer anything that'd allow for this?

Shared memory is an option, it has disadvantages though in that it'd be a source of contention and you can't do per-process analysis.

from client_ruby.

juliusv avatar juliusv commented on May 18, 2024

From the Unicorn Design doc:

"Unicorn is a traditional UNIX prefork web server. [...] Load balancing between worker processes is done by the OS kernel. All workers share a common set of listener sockets and does non-blocking accept() on them. The kernel will decide which worker process to give a socket to and workers will sleep if there is nothing to accept()."

Given that, I'm not sure how doable it is to scrape individual workers. Even if it was possible, the benefit of having individual insight and avoiding contention has to be weighed against potentially having an order of magnitude more time series.

I'd expect that on average, the workers on a single machine will behave similarly, except admittedly in pathological situations like this one pointed out in the Unicorn docs: "When your application goes awry, a BOFH can just "kill -9" the runaway worker process without worrying about tearing all clients down, just one".

It also states that "Unicorn deployments are typically only 2-4 processes per-core", which would mean multiplying the number of exposed time series on a typical large machine with e.g. 32 cores by 64x to 128x.

@grobie Did you have any plans wrt. supporting preforked servers already?

from client_ruby.

atombender avatar atombender commented on May 18, 2024

@brian-brazil: It's possible to collect per-process information with shared memory: Just organize the memory by PID. You'll have to reap dead PIDs, though. If you're clever you can avoid locking altogether, by having each process write to its own dedicated segment of the shared memory.

Scraping individual workers, as @juliusv points out, would mostly be useful to the extent that it would capture misbehaving workers that consume too much memory or CPU.

Having run Unicorn in production for years, I would agree that this doesn't happen often enough to justify the overhead of monitoring individual workers. Unicorn's master process will "kill -9" processes that don't respond within a per-process request timeout. In my experience, this tends to cover the pathological cases.

In terms of scaling, one of our clusters has 300 Unicorn workers on each box (anything from 4 to 25 per app). I wouldn't want Prometheus to poll that many workers every few seconds.

from client_ruby.

juliusv avatar juliusv commented on May 18, 2024

👍 Sounds like we need a shared-memory solution or similar.

from client_ruby.

brian-brazil avatar brian-brazil commented on May 18, 2024

It's possible to collect per-process information with shared memory: Just organize the memory by PID. You'll have to reap dead PIDs, though. If you're clever you can avoid locking altogether, by having each process write to its own dedicated segment of the shared memory.

You need to be a little careful here, as you can't simply throw away data from old PIDs. You'd need to fold their counters in to ensure the exposed values stay monotonically increasing. Organizing on >128 byte boundaries (the size of a cache line) should avoid locking issues during collection.
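For example, the scrape-time merge could look something like this (a hypothetical sketch; the per-PID snapshots stand in for whatever the shared memory actually holds):

# Counter snapshots per PID, including PIDs that have since died.
per_pid = {
  101 => { 'http_requests_total' => 42 },  # dead worker, data retained
  207 => { 'http_requests_total' => 17 },  # live worker
}

# Summing across every PID ever seen keeps the exposed counter monotonically
# increasing even though individual workers come and go.
merged = Hash.new(0)
per_pid.each_value do |counters|
  counters.each { |name, value| merged[name] += value }
end

merged.each { |name, value| puts "#{name} #{value}" }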

How would we deal with gauges? Both the set and inc/dec styles are problematic. The set style doesn't make sense when there are multiple processes (you really need it per-process), and with the inc/dec pattern to count in-progress requests you probably want to reset to 0 for a PID when that PID dies.

I wouldn't want to have Prometheus poll that many workers every few seconds.

60s is a more than sufficient monitoring interval for most purposes. Benchmarking will determine the actual impact.

This seems like it'll get very complicated to implement and use, as most uses of instrumentation would need to take the process model into account to get useful and correct results out the other end.
How do other instrumentation systems solve this problem?

from client_ruby.

atombender avatar atombender commented on May 18, 2024

You need to be a little careful here, as you can't simply throw away data from old PIDs.

How would this differ from the hypothetical non-shared case where each worker exposes a metrics port? If the worker dies and starts again, all its counters will restart at zero. In a hypothetical shared-memory implementation where each worker has a dedicated segment that is reported separately, it would be the same thing, no?

Anyway, the problem is moot, because I just realized that Unicorn uniquely identifies workers by a sequential number, which is stable. So if worker 3 dies, the next worker spawned is also given the number 3. That means no reaping is needed, nor is it necessary to do any kind of PID mapping, which I was worried about.

I did a little digging into the Unicorn sources and realized that Unicorn uses a gem called raindrops to accomplish exactly what I have been describing so far. Raindrops uses shared memory to collect and aggregate worker metrics (Unicorn also seems to use Raindrops directly to detect idle workers), but it looks like it only cares about socket metrics, not anything related to HTTP. Worst case, if it cannot be used as-is, I could look into extracting the relevant code from Raindrops into this gem.

from client_ruby.

brian-brazil avatar brian-brazil commented on May 18, 2024

How would this differ from the hypothetical non-shared case where each worker exposes a metrics port?

When the data comes from each worker, there's no persistence of data beyond a worker's lifetime and the rate() function works as expected. With shared memory, where the client is smart and doing aggregation, you need to not lose increments from dead workers, as counters need to be kept monotonic.

Prometheus has the paradigm that instrumentation is dumb, and all the logic is done in the prometheus server. You need to be careful when adding logic to clients that it maintains the same semantics as a dumb client.

So if worker 3 dies, the next worker spawned is also given the number 3.

That makes things easier alright. And as I presume everything is single-threaded, we don't need locking for metric updates.

Thinking on this over the past bit, I think we can make this work but only for Counter and Counter-like metrics (Summary without percentile and Histogram).

from client_ruby.

juliusv avatar juliusv commented on May 18, 2024

@brian-brazil Gauges are indeed a problem. I see four possibilities in principle for them:

  1. Allow the user to configure a merging strategy (averaging, summing, ...) for each gauge. The actual merging work wouldn't need to happen continuously, but only upon scrape.
  2. Make an exception for gauges and distinguish them by a "worker" label.
  3. Use "last write wins". But that's not useful at all for any common gauge use cases (queue lengths, etc.).
  4. Don't support gauges in this kind of environment, expecting that the bulk of relevant metrics for this use case would be counters and summaries anyways.

1 is probably what we'd want if we don't split metrics by worker otherwise.
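For reference, option 1 is roughly what the file-backed store that eventually shipped in client_ruby 0.10 does: the merge strategy is declared per gauge and applied at scrape time. A sketch against that later API (to the best of my knowledge; none of this existed at the time of this discussion):

require 'prometheus/client'

registry = Prometheus::Client.registry

# Counters are summed across processes automatically; for gauges you choose
# how per-process values are merged when the metric is declared.
queue_depth = registry.gauge(:job_queue_depth,
                             docstring: 'Jobs currently waiting in the queue',
                             store_settings: { aggregation: :max })

queue_depth.set(42)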

@atombender If indeed one Unicorn server is always expected to have the same number of stably identified worker processes (at least between complete Unicorn restarts, which would also wipe the shared memory), your approach sounds sensible to me. If I understand the raindrops approach correctly, it uses a single counter for all workers ("counters are shared across all forked children and lock-free"), so there wouldn't even be any counter resets anywhere in that approach if only a single worker is killed?

from client_ruby.

juliusv avatar juliusv commented on May 18, 2024

(if you're only reading by mail - I edited my last comment because Markdown made a 5 out of my 1...)

from client_ruby.

atombender avatar atombender commented on May 18, 2024

Note that I am not suggesting that the client should aggregate anything. I am suggesting that each worker has a slot in the shared memory segment that it is responsible for updating. When a worker gets a /metrics request, it simply returns the contents of all slots, separately labeled by worker ID.

Shouldn't that completely work around the gauge problem? I was initially thinking that the client would aggregate workers, because, as I said earlier, I think per-worker metrics aren't very useful. But if it poses a problem for gauges, I think it's better to "over-report" and do aggregation on the server.
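A rough sketch of that layout using raindrops-style shared counters (the metric name and slot scheme are made up for illustration; the counter array must be allocated before the workers fork):

require 'raindrops'

WORKER_COUNT = 8
# One lock-free shared-memory slot per worker, allocated pre-fork so every
# worker inherits the same segment.
REQUESTS = Raindrops.new(WORKER_COUNT)

# Inside worker `worker_id`, after each request:
#   REQUESTS.incr(worker_id)

# In the /metrics handler: report every slot with a worker label and let the
# Prometheus server do any aggregation.
def metrics_text
  lines = ['# TYPE http_requests_total counter']
  WORKER_COUNT.times do |i|
    lines << %(http_requests_total{worker="#{i}"} #{REQUESTS[i]})
  end
  lines.join("\n") + "\n"
end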

(Pardon my obtuseness, but I'm not too familiar with how Prometheus works yet.)

from client_ruby.

brian-brazil avatar brian-brazil commented on May 18, 2024

Don't support gauges in this kind of environment, expecting that the bulk of relevant metrics for this use case would be counters and summaries anyways.

This is my expectation. It sounds like entirely an online-serving use case, with no thread pools or the like where you'd usually want gauges. I can see some use cases for gauges (e.g. if you were batching up requests to send on elsewhere), but the counters uses are the primary ones.

from client_ruby.

juliusv avatar juliusv commented on May 18, 2024

@atombender Ah, ok. That'd definitely work and require fewer scrapes, though you'll still end up with a lot of dimensionality in your time series that might not prove to be of much value vs. their cost in transfer, storage, and queries.

@brian-brazil Yep.

from client_ruby.

brian-brazil avatar brian-brazil commented on May 18, 2024

When a worker gets a /metrics request, it simply returns the contents of all slots, separately labeled by worker ID.

That'd work; I thought we were trying to avoid that due to all the additional time series.

Shouldn't that completely work around the gauge problem? I was initially thinking that the client would aggregate workers, because, as I said earlier, I think per-worker metrics aren't very useful. But if it poses a problem for gauges, I think it's better to "over-report" and do aggregation on the server.

Reporting everything is what I generally favour, assuming it works out resource wise.

That resolves the counter problem, but it doesn't solve the gauge problem, as gauge use cases tend to work under the assumption that the gauge is reset when the process dies/starts (think of a gauge that represents in-progress requests). If Unicorn can clear them when a process dies, that should handle it.

from client_ruby.

brian-brazil avatar brian-brazil commented on May 18, 2024

The other limit is that callback-like collectors won't work with a shared-memory model. This would affect things like runtime and process statistics that you want to gather at scrape time.

from client_ruby.

grobie avatar grobie commented on May 18, 2024

@grobie Did you have any plans wrt. supporting preforked servers already?

When we had these use cases at SoundCloud, we discussed three options: a) scraping each worker independently, b) letting the master process handle metrics (the current dumb lock implementation in the Ruby client probably needs some improvement), or c) letting each worker push metrics to a pushgateway.

I believe we run all our Ruby services on JRuby by now using the java client. Going forward with b) sounds good to me.

from client_ruby.

jeffutter avatar jeffutter commented on May 18, 2024

I am very interested in this issue. I am wondering if it is also seen with threaded servers (like Puma)? I think they work in a way where memory isn't shared, depending on where/when things get initialized.

If there is a demo app in the repo, I'll play around with it today.

Is anyone aware of any other rack middlewares that share global state that we could look to for solutions?

from client_ruby.

jeffutter avatar jeffutter commented on May 18, 2024

If raindrops is the right path to look at for this, they have a sample middleware. Their code isn't on GitHub, but here is a clone of that middleware: https://github.com/tmm1/raindrops/blob/master/lib/raindrops/middleware.rb

from client_ruby.

grobie avatar grobie commented on May 18, 2024

Thanks, I'll have a look.

from client_ruby.

jeffutter avatar jeffutter commented on May 18, 2024

I have been playing around with raindrops, but can't seem to get the processes writing to the same counters. I'll keep at it. In the meantime I found this: https://github.com/TheClimateCorporation/unicorn-metrics which might show a similar use case.

from client_ruby.

jeffutter avatar jeffutter commented on May 18, 2024

For those interested in this, I have made a little progress. It seems that when using raindrops, the Raindrops counters need to be declared before the workers fork. Otherwise each worker will end up instantiating separate instances of the counters. If you declare them beforehand, it actually works as expected.

This makes the prometheus middleware a little inconvenient, as you would need to know all possible endpoints and other counters beforehand. This is the approach that the unicorn-metrics project above takes.
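Concretely, "declared before the workers fork" means allocating the counters in the Unicorn config, or anywhere else that runs in the master process, e.g. (a sketch; the constant name is made up):

# unicorn.rb -- evaluated in the master process before workers are forked
require 'raindrops'

# Allocated pre-fork, so every worker inherits the same shared-memory slot
# instead of instantiating its own private copy.
APP_REQUESTS = Raindrops.new(1)

# Later, from Rack middleware inside any worker:
#   APP_REQUESTS.incr(0)
#   APP_REQUESTS[0]   # => combined count across all workers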

I have begun looking at a different approach, where each worker exposes its own metrics at /_metrics and a middleware at /metrics queries the others and combines the results. I currently can't find a way to figure out what the other workers' TCP/Unix sockets are from inside the worker. There is a Unicorn::HttpServer::LISTENERS constant, but it seems to only contain a reference to the current worker.

Hopefully this information can be helpful to someone, I would really love to see prometheus being useful on ruby servers.

from client_ruby.

brian-brazil avatar brian-brazil commented on May 18, 2024

The approach I'm taking over in Python is to have each process keep a store of its metrics in a per-process file on disk, and then read them all in at scrape time.

I can see a challenge in distributing the scrape to workers if a worker is single-threaded. You want scrapes to be independent of other processing.

from client_ruby.

jeffutter avatar jeffutter commented on May 18, 2024

I implemented backing the metrics with PStores. This seems to have the desired effect, albeit causing a slowdown, since it writes stats to disk on every request. My branch is here: https://github.com/jeffutter/prometheus-client_ruby/tree/multi-worker I tested it out with Apache Bench doing 2000 requests with 10 concurrent requests and 4 unicorn workers. After running that, the /metrics endpoint reported the correct number of hits; I'm not sure about the other types of stats.

On this branch, with the above setup, I'm getting ~350 requests/sec, while on the master branch I'm getting over 1,700. Now, while this sounds like a big difference, the duration of the request that returns OK/200 is only about 13ms. I'm guessing that under real-world load, with real request times, this difference will be much more negligible.

That being said, there may be better backing stores to use than PStore or better ways to store it. Also, it should probably be an option to back with a hash when not running in a prefork environment.
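For anyone unfamiliar with PStore: it's a stdlib file-backed store where every update runs inside a transaction that is written back to disk, which is where the slowdown comes from. Roughly (a simplified sketch, not the branch's actual code):

require 'pstore'

store = PStore.new('/tmp/prometheus_metrics.pstore')  # example path

# Every increment opens a transaction and rewrites the file.
store.transaction do
  store[:http_requests_total] = (store[:http_requests_total] || 0) + 1
end

# Read-only transaction at scrape time.
puts store.transaction(true) { store[:http_requests_total] }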

Let me know what you think.

from client_ruby.

brian-brazil avatar brian-brazil commented on May 18, 2024

That's pretty much the same approach as I'm using in Python; however, I'm only letting this behaviour kick in when explicitly asked for, as it has a number of limitations, including performance (see prometheus/client_python#66).

Scrape-time performance doesn't matter too much. What's really important is that metric increments etc. be fast. Optimally that'd mean something where writes hit the kernel, but not the disk.

from client_ruby.

jeffutter avatar jeffutter commented on May 18, 2024

On my branch: https://github.com/jeffutter/prometheus-client_ruby/tree/multi-worker I have updated it so there are backing "stores". I have included one with the original behavior (Hash) and one with the new behavior (PStore). If you pass the prefork: true argument to the middlewares it will use the PStore backend, otherwise it uses the original. I imagine this could be extended in the future to support other on-disk or even DB (redis?) backends.

Also (I'm not sure why) the benchmarks have improved a little: the Hash one is giving me 2,100 requests/sec while the PStore one is up to 1,400. Perhaps it's just a system resource difference from before; I need a more scientific benchmark. However, at 1,400 requests/sec I doubt this will be a bottleneck under any real web load.

Currently there are a bunch of broken tests. If this solution seems like something that might be considered for merging I will fixup the tests, update the docs and clean everything up.

from client_ruby.

jeffutter avatar jeffutter commented on May 18, 2024

For continuity's sake: discussion of this topic has continued over on the mailing list here: https://groups.google.com/d/topic/prometheus-developers/oI3qPhaRY7I/discussion

from client_ruby.

mortenlj avatar mortenlj commented on May 18, 2024

Hi!

It seems discussion about this issue died out on the mailing list sometime before Christmas last year. Is there any movement towards a solution to this problem?

from client_ruby.

grobie avatar grobie commented on May 18, 2024

I'm not aware of anyone working on a solution for this problem at the moment. Ideas and contributions are very welcome.

from client_ruby.

zevarito avatar zevarito commented on May 18, 2024

Has anyone found a way to make it work with Puma?

from client_ruby.

zevarito avatar zevarito commented on May 18, 2024

Well, it seems to work with @jeffutter branch.

Here is how I did it, please take a look and tell me what needs to be improved.

in config/puma.rb

workers N

on_worker_boot do |index|
  $puma_worker = index
end

in config.ru

$registry = Prometheus::Client.registry(Prometheus::Client::Stores::PStore)
$registry.counter(:api_enqueued_jobs_total, 'A counter of API enqueued jobs')
....

use Prometheus::Client::Rack::Exporter, prefork: true, registry: $registry

in SomeController.rb

$registry.get(:api_enqueued_jobs_total).increment(..., worker: $puma_worker)

@jeffutter why is it returning JSON by default on the '/metrics' endpoint?

from client_ruby.

zevarito avatar zevarito commented on May 18, 2024

Here is a quick Benchmark I've done in my laptop:

ApacheBench/2.3
-n 1000 -c 10

Puma
workers 4
threads 16,32

Request
POST to API endpoint that hit database and return JSON.

No Prometheus at all.

Requests per second:    27.46 [#/sec] (mean)
Time per request:       364.217 [ms] (mean)
Time per request:       36.422 [ms] (mean, across all concurrent requests)
Transfer rate:          44.86 [Kbytes/sec] received
                        5.55 kb/s sent
                        50.41 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.6      0      11
Processing:   113  362 217.7    311    1582
Waiting:      113  360 217.5    309    1580
Total:        113  362 217.8    311    1582

Percentage of the requests served within a certain time (ms)
  50%    311
  66%    411
  75%    469
  80%    522
  90%    638
  95%    724
  98%    910
  99%   1231
 100%   1582 (longest request)

Taking one measurement (a counter increment) per request

Requests per second:    27.11 [#/sec] (mean)
Time per request:       368.899 [ms] (mean)
Time per request:       36.890 [ms] (mean, across all concurrent requests)
Transfer rate:          44.29 [Kbytes/sec] received
                        5.48 kb/s sent
                        49.77 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0       7
Processing:   110  366 233.1    282    1318
Waiting:      109  364 232.6    281    1318
Total:        110  366 233.2    283    1318

Percentage of the requests served within a certain time (ms)
  50%    283
  66%    404
  75%    507
  80%    545
  90%    691
  95%    848
  98%   1003
  99%   1111
 100%   1318 (longest request)

Request count by worker

1 => 253
3 => 251
2 => 245
0 => 251

Taking one measurement per request and hitting the /metrics path once every 5s

Requests per second:    27.76 [#/sec] (mean)
Time per request:       360.260 [ms] (mean)
Time per request:       36.026 [ms] (mean, across all concurrent requests)
Transfer rate:          45.35 [Kbytes/sec] received
                        5.61 kb/s sent
                        50.96 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0       5
Processing:   113  357 218.7    281    1292
Waiting:      113  356 218.3    281    1292
Total:        113  358 218.7    281    1292

Percentage of the requests served within a certain time (ms)
  50%    281
  66%    403
  75%    486
  80%    540
  90%    688
  95%    792
  98%    913
  99%   1001
 100%   1292 (longest request)

Request count by worker

2 => 252
1 => 251
3 => 248
0 => 249

from client_ruby.

grobie avatar grobie commented on May 18, 2024

The JSON format has been fully removed in Prometheus server v1.0.0 and in client_ruby v0.5.0. We should only use the text format.

I'm happy to review pull requests, but as SoundCloud doesn't use Ruby much any more, I won't have time to work on something on my own before December.

from client_ruby.

jeffutter avatar jeffutter commented on May 18, 2024

@zevarito Yeah, sorry, my branch uses JSON; it is over a year old, and I think back then JSON was still supported.

As much as I dig Prometheus, work decided they didn't want to run their own monitoring servers, so we've gone with New Relic. I would love to see Prometheus's Ruby client support multi-threaded and forking servers, but I'm afraid I don't have any time to contribute at the moment.

Feel free to take the branches I had and build off of them or go in an entirely different direction.

from client_ruby.

brian-brazil avatar brian-brazil commented on May 18, 2024

The python client now has multi-process support as of prometheus/client_python#66

A similar method should work for Ruby.

from client_ruby.

zevarito avatar zevarito commented on May 18, 2024

@grobie I've fetched the latest code and merged in @jeffutter's branch, and it seems to work fine.
Besides the PStore addition, the code also provides a separation at the store layer that makes it easier to add new storage mechanisms in the future.

An important thing to note is that PStore will not work with nested transactions, which could be a problem in some cases, but another storage mechanism could be added to address that.

I will give it a try in a staging environment and see how it works. I'll open a PR if all goes smoothly.

@brian-brazil do you think the Python way should be the way to do it for the Ruby lib, or could it just be another storage mechanism?

from client_ruby.

brian-brazil avatar brian-brazil commented on May 18, 2024

The Python way should be the way to do it in Ruby as well, as I expect you'll run into the same issues with Ruby that I did with other approaches in Python.

36ms is not at all workable for instrumentation. The goal should be sub-microsecond.

from client_ruby.

zevarito avatar zevarito commented on May 18, 2024

@brian-brazil I am lost about "36ms is not at all workable for instrumentation"; where does that come from?

from client_ruby.

brian-brazil avatar brian-brazil commented on May 18, 2024

The above benchmarks indicate 36ms for an operation.

from client_ruby.

zevarito avatar zevarito commented on May 18, 2024

@brian-brazil The benchmarks are based on an operation that involves a POST, DB access, and JSON serialization. Unless I am getting it wrong, the benchmark shows that not using Prometheus is about 0.468ms faster per request than using Prometheus.

Bn 1) No Prometheus   - Time per request: 36.422ms
Bn 2) With Prometheus - Time per request: 36.890ms

You could point out that Bn 3, which uses Prometheus and hypothetically has more load from being hit on /metrics, actually shows even better performance than the other benchmarks, but in my opinion the overhead is acceptable, since it is less than 1ms across the three samples. Do you think it is not? Indeed, more tests should be done.

from client_ruby.

brian-brazil avatar brian-brazil commented on May 18, 2024

.468ms is still way too slow. Instrumentation should cost less than a microsecond.

from client_ruby.

zevarito avatar zevarito commented on May 18, 2024

@brian-brazil is there any benchmarking I can see of the Python lib, or any other lib?
"Instrumentation should cost less than a microsecond." I guess ab is not the right tool to measure that; should it be measured from the code itself, or with some profiling?

from client_ruby.

brian-brazil avatar brian-brazil commented on May 18, 2024

The Python multi-proc solution is at about 1.2us. This is due to Python's mutexes being surprisingly slow (should be 1-2 orders of magnitude faster).

I usually use a quick microbenchmark of .inc().
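In Ruby terms that's something like the following (a sketch against the 0.9-era client_ruby API; signatures differ in later versions):

require 'benchmark'
require 'prometheus/client'

registry = Prometheus::Client.registry
counter  = registry.counter(:bench_total, 'microbenchmark counter')

n = 100_000
seconds = Benchmark.realtime { n.times { counter.increment } }
puts format('%.0f ns per increment', seconds / n * 1_000_000_000)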

from client_ruby.

grobie avatar grobie commented on May 18, 2024

As already indicated in #11, the implementation might benefit from further performance work.

.468ms is still way to slow. Instrumentation should cost less than a microsecond.

This is pretty much an impossible goal for Ruby. A hash access and counter increase alone already take close to one microsecond, and that's without any Mutex involved.

So far we've also traded label validation for execution speed. This should probably be made opt-in, so that one can run with a strict registry in tests but use a faster registry in production.

from client_ruby.

brian-brazil avatar brian-brazil commented on May 18, 2024

A hash access and counter increase alone already takes close to one microsecond.

This should be measured in nanoseconds. I know Ruby isn't the fastest, but I have difficulty believing it's that slow.

from client_ruby.

zevarito avatar zevarito commented on May 18, 2024

@grobie #11 is actually about scraping time; I think what @brian-brazil is concerned about is the measuring time. Worth noting that I am not using Rack::Collector here, just measuring other non-HTTP activity. Also, the measurement I am taking is just a counter increment.

Here is the benchmark at the method-call level.

I've used ab as in the previous tests, wrapped the measure call in a Benchmark.bm block, and then logged the results.

These are the first 5 and the last 5 results for 1000 requests.

[#<Benchmark::Tms:0x007f8c03124378 @label="", @real=0.00038335899444064125, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c021ee430 @label="", @real=0.0004944999964209273, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c03177820 @label="", @real=0.0005114129962748848, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c0220d1f0 @label="", @real=0.0005079250040580519, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c035e5678 @label="", @real=0.00026892300229519606, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
...
[#<Benchmark::Tms:0x007f8c0663a7d8 @label="", @real=0.0002955899981316179, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c0665a128 @label="", @real=0.00022226799774216488, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c0686f558 @label="", @real=0.000291318996460177, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c065586d0 @label="", @real=0.00021019799896748737, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c04e8e4e0 @label="", @real=0.00018655300664249808, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]

from client_ruby.

grobie avatar grobie commented on May 18, 2024

@zevarito The scrape example mentioned in #11 was just one example; the issue was about investigating and removing any kind of performance bottleneck.

@brian-brazil I guess there is a reason that writing C-extensions is so popular in Ruby.

require 'benchmark'
h = {}
Benchmark.realtime { 1000.times { c = h[:foo_total] || 0; c += 1; h[:foo_total] = c } } / 1000
# => 5.882129771634937e-07

m = Mutex.new
Benchmark.realtime { 1000.times { m.synchronize { c = h[:foo_total] || 0; c += 1; h[:foo_total] = c } } } / 1000
# => 1.4377119950950147e-06

from client_ruby.

zevarito avatar zevarito commented on May 18, 2024

@grobie right, and I remember the conversation we had here (#33 (comment)) about Rack::Collector and scraping time; worth mentioning that the cardinality is not a Ruby issue itself.

I wonder if you find the last benchmark I posted acceptable.

from client_ruby.

christoph-buente avatar christoph-buente commented on May 18, 2024

Hi, we are experiencing the mentioned behaviour with Ruby + Rails + Unicorn in an environment with more than one worker. As you can see in the screenshot, the workers' load does not seem to be distributed evenly. The upper and lower boundaries are basically the scraped values from either worker 1 or worker 2. The drop in the graph is the time of deployment.

[screenshot]

I'm actually wondering if someone could check out other libraries that measure Rack/Rails stats as middleware, for example https://github.com/librato/librato-rack or https://github.com/newrelic/rpm.

We are using both, and I did not experience these kinds of problems there.

Thanks

from client_ruby.

jeffutter avatar jeffutter commented on May 18, 2024

@christoph-buente I poked around a bit in the newrelic gem. I believe neither of these libraries aggregates metrics between processes. Both of them push metrics to a remote server, which handles the aggregation. They probably buffer the events so that they don't send one event on every single request, but they still get around the problem we face here by aggregating the events on the server.

from client_ruby.

christoph-buente avatar christoph-buente commented on May 18, 2024

Thanks @jeffutter for looking into it. Even though it's highly discouraged for long-running processes, I feel the push gateway seems to be a legit way to get around this too.

from client_ruby.

brian-brazil avatar brian-brazil commented on May 18, 2024

The push gateway is not appropriate for this use case at all. Statsd exporter would be the architecture for that.

from client_ruby.

christoph-buente avatar christoph-buente commented on May 18, 2024

Thanks @brian-brazil, will have a look into that.

from client_ruby.

mcfilib avatar mcfilib commented on May 18, 2024

@SamSaffron did you ever get round to looking into writing that exporter based off raindrops?

from client_ruby.

SamSaffron avatar SamSaffron commented on May 18, 2024

@Filib I did, but it's running on internal code. It's a tiny amount of code actually:

something along the lines of:

net_stats = Raindrops::Linux::tcp_listener_stats("0.0.0.0:3000")["0.0.0.0:3000"]
if net_stats
  puts "web_active:#{net_stats.active}"
  puts "web_queued:#{net_stats.queued}"
end

Then we also have monitors that alert us if we are sitting with a sustained queue for longer than a minute or so.
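If anyone wants that in the Prometheus text exposition format rather than plain output, it's only a few more lines (a sketch reusing the same call; the listener address is just an example):

require 'raindrops'

addr  = "0.0.0.0:3000"
stats = Raindrops::Linux::tcp_listener_stats(addr)[addr]

if stats
  puts "# TYPE web_active gauge"
  puts "web_active #{stats.active}"
  puts "# TYPE web_queued gauge"
  puts "web_queued #{stats.queued}"
end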

from client_ruby.

lyda avatar lyda commented on May 18, 2024

Note that @juliusv has pushed a branch that solves this. I think it needs more work to support all metric types.

from client_ruby.

lyda avatar lyda commented on May 18, 2024

When doing this, note that the exchangeable-value-types branch does multiprocess but has some flaws:

  • Not all metric types are supported. Specifically the summary type isn't supported.
  • This means that the Collector for rack isn't supported either.

It would be nice to have fixes for those as well.

from client_ruby.

grobie avatar grobie commented on May 18, 2024

from client_ruby.

brian-brazil avatar brian-brazil commented on May 18, 2024

What doesn't work about Summaries? From a stats standpoint, it should be possible to get something out of them.

from client_ruby.

grobie avatar grobie commented on May 18, 2024

from client_ruby.

juliusv avatar juliusv commented on May 18, 2024

This is the branch where I put up (really hacky and experimental!) multiprocess support. It's in large part a transliteration of the Python client's multiprocess support, with tests hacked up to "pass". It's alpha-usable, but I'm hoping for @grobie's rewrite to do this all properly :) See https://github.com/prometheus/client_ruby/blob/multiprocess/example.rb

from client_ruby.

brian-brazil avatar brian-brazil commented on May 18, 2024

How do you aggregate per-process quantiles?

You can't. But you can report them all.

The _sum and _count work fine in any case.

from client_ruby.

grobie avatar grobie commented on May 18, 2024

from client_ruby.

mrgordon avatar mrgordon commented on May 18, 2024

I'm guessing there is no progress on this, which is sad, as it basically makes metrics collection useless in any Rails environment with a forking web server.

I'm trying to switch to the commits that @juliusv and @lyda linked but the examples don't show how to use their changes with the Rack collector / exporter.

Is there anything needed other than:

Prometheus::Client::MmapedValue.set_pid(1)
use Prometheus::Middleware::Collector
use Prometheus::Middleware::Exporter

I tried doing that but hit errors where it was trying to call total on a hash at lib/prometheus/client/formats/text.rb:171:in 'histogram'

Full trace:

app error: undefined method `total' for #<Hash:0x0055d849b5d520> (NoMethodError)
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/client/formats/text.rb:171:in `histogram'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/client/formats/text.rb:150:in `representation'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/client/formats/text.rb:33:in `block (2 levels) in marshal'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/client/formats/text.rb:32:in `each'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/client/formats/text.rb:32:in `block in marshal'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/client/formats/text.rb:28:in `each'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/client/formats/text.rb:28:in `marshal'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/middleware/exporter.rb:69:in `respond_with'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/middleware/exporter.rb:30:in `call'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/middleware/collector.rb:37:in `block in call'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/middleware/collector.rb:77:in `trace'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/middleware/collector.rb:37:in `call'
/usr/local/bundle/gems/rack-1.4.7/lib/rack/content_length.rb:14:in `call'
/usr/local/bundle/gems/railties-3.2.22.5/lib/rails/rack/log_tailer.rb:17:in `call'
/usr/local/bundle/gems/unicorn-4.8.2/lib/unicorn/http_server.rb:572:in `process_client'

I reverted some value.get.total calls to value.total and I can get the app to load now, but it looks like the metrics are still per-fork: if I hit /metrics repeatedly I get different values.

mrgordon@68ca030

I'm guessing due to my changes I also get some metrics back like this:

http_server_request_duration_seconds_count{method="get",path="/metrics"} #<Prometheus::Client::SimpleValue:0x005555afd28278>

I'll look more into that issue but I'm guessing the SimpleValue indicates the mmap stuff isn't working.

from client_ruby.

mrgordon avatar mrgordon commented on May 18, 2024

Quick update: I added prometheus_multiproc_dir=/tmp to the environment, as I see the code checks for this even though none of the examples set it. My output from /metrics now shows mmap stuff, but http_server_requests_total still varies depending on which Unicorn worker I talk to.

e.g.

$ curl http://myapp/metrics
http_server_requests_total{code="200",method="get",path="/metrics"} 4.0
...
http_server_request_duration_seconds_bucket{method="get",path="/metrics",le="10"} 4.0
http_server_request_duration_seconds_bucket{method="get",path="/metrics",le="+Inf"} #<Prometheus::Client::MmapedValue:0x0055ba313dfc70>

a few seconds later

$ curl http://myapp/metrics
http_server_requests_total{code="200",method="get",path="/metrics"} 3.0
...
http_server_request_duration_seconds_bucket{method="get",path="/metrics",le="10"} 3.0
http_server_request_duration_seconds_bucket{method="get",path="/metrics",le="+Inf"} #<Prometheus::Client::MmapedValue:0x0055ba313da720>

from client_ruby.

lzap avatar lzap commented on May 18, 2024

What would be the best workaround for a Rails app running in Passenger? I am considering statsd_exporter and the pushgateway. I expect the latter to be easier to integrate with, but slower.

from client_ruby.

SamSaffron avatar SamSaffron commented on May 18, 2024

from client_ruby.

kvitajakub avatar kvitajakub commented on May 18, 2024

Hi, is there any progress on this please?

Looks like it should have an issue or feature label in addition to the question label.

from client_ruby.

lzap avatar lzap commented on May 18, 2024

Thanks. How many metrics do you send for each request in production? I am a little bit concerned about that HTTP/JSON transport.

from client_ruby.

SamSaffron avatar SamSaffron commented on May 18, 2024

@lzap I will update the readme. I can send about 10k strings in 200ms over a single request ... if you add a round trip of to_json and JSON.parse, it's around 600ms for 10k messages.

I persist the HTTP connection for 30 seconds (configurable).

from client_ruby.

fajpunk avatar fajpunk commented on May 18, 2024

Hi! I notice the Help wanted tag on this issue, but it's unclear from reading the comments exactly what help is wanted; all I see is a few proposed solutions and alternate libraries. Has a solution been chosen, and if so, is there a branch where work is being done? I'd like to help if I can.

from client_ruby.

dmagliola avatar dmagliola commented on May 18, 2024

@philandstuff In short, yes, but the latest release still doesn't reflect that. There are still a few changes planned before cutting a new release, some of which are breaking changes, so we'd like to get them done before that.
But... Soon :-)

from client_ruby.

dmagliola avatar dmagliola commented on May 18, 2024

@wshihadeh I'm not sure I understand the question fully...
HA mode would be different servers, correct?
In that case, each server would have separate time series in Prometheus anyway, and the aggregation happens when querying Prometheus.

Or are you talking about something different?

from client_ruby.

wshihadeh avatar wshihadeh commented on May 18, 2024

Yes, I am talking about that. To explain it more, suppose that the application is a web server that is deployed in multiple containers or servers and exposes the metrics endpoint. I don't see why we need two Prometheus tasks for such a setup (one that scrapes metrics from the load balancer or service IP should be enough). In that case, the metrics will not be totally accurate, and that is why I think there is a need for shared storage between the containers or servers, something like Redis.

from client_ruby.

philandstuff avatar philandstuff commented on May 18, 2024

I don't see why we need two Prometheus tasks for such setup

Prometheus really really wants to scrape individual instances. There are a number of reasons for this but the main one is that this is the overwhelming convention in the Prometheus world. If you want to break this convention you will find yourself fighting the tooling instead of working with it.

You should use some sort of service discovery to ensure that Prometheus automatically detects new instances when they are created. That way, you only need a single scrape_config but Prometheus will still scrape each instance separately.

from client_ruby.
