Comments (84)
Just putting this out here, in case anyone is still facing the problem
https://samsaffron.com/archive/2018/02/02/instrumenting-rails-with-prometheus
from client_ruby.
We're happy to report, this is now fixed! 🎉
(Sorry for the delay, we should've updated this a while ago)
We haven't yet launched a 1.0
(coming soon!), but you can start using it with our recent v0.10.0.pre.alpha.2
release.
NOTE: We expect there might still be a couple of small breaking changes in the API between now and the 1.0
, but we've been using this code in production for months and it works great, so if you are OK with potentially having to do minor adjustments once we release 1.0
, you can already use this.
from client_ruby.
I started my Discourse extraction here:
https://github.com/discourse/prometheus_exporter
Supports multiple processes with optional custom transport, recovery and chunked encoding. Should be released proper in a few weeks.
Works fine with Passenger, Unicorn and Puma in clustered mode. You can also aggregate sidekiq.
from client_ruby.
An update on this issue: We (GoCardless) are currently working on this. In the process, we're also planning to bring the Prometheus Client more in sync with the current best practices as recommended here.
We've opened an issue with an RFC explaining what we're planning to do.
And we have a PR with the first steps.
This PR doesn't solve everything, and specifically it doesn't deal with the multi-process issue, but it's the first step in allowing that to happen, by introducing "pluggable backends" where the metric data is stored, allowing consumers of the library to pick which is best for their needs, and to write their own if necessary.
We appreciate comments and feedback on these proposals and look forward to working with you on this!
from client_ruby.
Another note: I'll work on a complete rewrite of client_ruby likely to start in the next few weeks. It will address several of the library's architectural short comings and include the option to change the value handling (lock-free for single process programs, multi-threaded, multi-process). client_ruby is the oldest client library in its current state, we've learned a lot in the meantime and similar rewrites have been done in client_golang and client_java.
from client_ruby.
We've released this which is what we're using for Gitlab. Hopefully it will be of some use for @grobie. It's multiproc mode only so not ideal for the general case but hopefully will be of use for future work.
from client_ruby.
note... unicorn provides cross process metrics using raindrops ... see:
https://bogomips.org/raindrops/
This uses shared memory and is already built, can be extended to add more metrics and provide more stats as needed.
I am looking at writing an exporter based off raindrops for discourse
from client_ruby.
Another update: we've updated PR #95 to (among a lot of other things) add a "backend" that does work with pre-fork servers, sharing data between processes on a scrape.
We've also done plenty of benchmarking and performance improvements to it, and would love the everyone's comments on it!
The new backend we're proposing it's pretty simple. It basically uses files to keep the data, but thanks to the wonders of file system caching, we're not really blocking on disks, and it's actually surprisingly fast. A counter increment with this backend is about 9μs.
For context:
- the MMap Store that @juliusv was proposing in the
multiprocess
branch is about 6μs per increment - a Hash with a Mutex around is about 4μs
- and a bare Hash, with nothing around is about 1μs.
We still haven't abandoned the idea of using MMaps in the near future, but we've found some stability issues with that solution, whereas the File-based backend is very stable and reliable, so we're proposing to go with that for the time being.
NOTE: It's a big PR, it was a big refactoring, and there's a lot going on. However, the commits are coherent and have good descriptions, so we recommend reading it commit-by-commit rather than the full diff all at once, which will not be very parseable.
from client_ruby.
wanted to share some pics so you can see what you get with raindrops:
Reqs are very evenly distributed between the 5 boxes (which is to be expected with haproxy) - we run about 8 workers per machine, everything is counted.
We see some queueing when there are spikes, which indicates we may need an extra process or two here
from client_ruby.
Is this fixed by #95 now?
from client_ruby.
@dmagliola does the fix also consider deploying the service in a HA mode. in that case, the running process do not have access to the shared memory :(
from client_ruby.
I believe this problem came up with some users of the (experimental) Python client, and some proposed work on the HAProxy exporter.
What you really want to do is separately scrape each worker, and then aggregate it up in the Prometheus server. This gives the most useful/correct results, and also gives you per-process statistics for things like memory and cpu. I'm not familiar with Unicorn or Rack, do they offer anything that'd allow for this?
Shared memory is an option, it has disadvantages though in that it'd be a source of contention and you can't do per-process analysis.
from client_ruby.
From the Unicorn Design doc:
"Unicorn is a traditional UNIX prefork web server. [...] Load balancing between worker processes is done by the OS kernel. All workers share a common set of listener sockets and does non-blocking accept() on them. The kernel will decide which worker process to give a socket to and workers will sleep if there is nothing to accept()."
Given that, I'm not sure how doable it is to scrape individual workers. Even if it was possible, the benefit of having individual insight and avoiding contention has to be weighed against potentially having an order of magnitude more time series.
I'd expect that on average, the workers on a single machine will behave similarly, except admittedly in pathological situations like this one pointed out in the Unicorn docs: "When your application goes awry, a BOFH can just "kill -9" the runaway worker process without worrying about tearing all clients down, just one".
It also states that "Unicorn deployments are typically only 2-4 processes per-core", which would mean multiplying the number of exposed time series on a typical large machine with e.g. 32 cores by 64x to 128x.
@grobie Did you have any plans wrt. supporting preforked servers already?
from client_ruby.
@brian-brazil: It's possible to collect per-process information with shared memory: Just organize the memory by PID. You'll have to reap dead PIDs, though. If you're clever you can avoid locking altogether, by having each process write to its own dedicated segment of the shared memory.
Scraping individual workers, as @juliusv points out, would mostly be useful to the extent that it would capture misbehaving workers that consume too much memory or CPU.
Having run Unicorn in production for years, I would agree that this doesn't happen often enough to justify the overhead of monitoring individual worker. Unicorn's master process will "kill -9" processes that don't respond within a per-process request timeout. In my experience, this tends to cover the pathological cases.
In terms of scaling, One of our clusters has 300 Unicorn workers on each box (anything from 4-25 per app). I wouldn't want to have Prometheus poll that many workers every few seconds.
from client_ruby.
👍 Sounds like we need a shared-memory solution or similar.
from client_ruby.
It's possible to collect per-process information with shared memory: Just organize the memory by PID. You'll have to reap dead PIDs, though. If you're clever you can avoid locking altogether, by having each process write to its own dedicated segment of the shared memory.
You need to be a little careful here, as you can't simply throw away data from old PIDs. You'd need to combine in the counters to ensure they stay monotonically increasing. Organizing on >128 byte boundaries (size of a cache line) should avoid locking issues during collection.
How would we deal with gauges? Both the set and inc/dec styles are problematic. The set style doesn't make sense when there's multiple processes (you really need it per-process), and with the inc/dec pattern to count in progress requests you probably want to reset to 0 for a pid when that pid dies.
I wouldn't want to have Prometheus poll that many workers every few seconds.
60s is a more than sufficient monitoring interval for most purposes. Benchmarking will determine the actual impact.
This seems like it'll get very complicated to implement and use, as most uses of instrumentation would need to take the process model into account to get useful and correct results out the other end.
How do other instrumentation systems solve this problem?
from client_ruby.
You need to be a little careful here, as you can't simply throw away data from old PIDs.
How would this differ from the hypothetical non-shared case where each worker exposes a metrics port? If the worker dies and starts again, all its counters will restart at zero. In a hypothetical shared-memory implementation where each worker has a dedicated segment that is reported separately, it would be the same thing, no?
Anyway, the problem is moot because I just realized that Unicorn uniquely identifies workers by a sequential number, which is stable. So if worker 3 dies, the next worker spawned is also given the number 3. That means no reaping is needed, nor is it necessary to any kind of PID mapping, which I was worried about.
I did a little digging into the Unicorn sources and just realized that Unicorn uses a gem called raindrops
to accomplish exactly what I have been describing so far. Raindrops uses shared memory to collect and aggregate worker metrics (Unicorn also seems to use Raindrops directly to detect idle workers), but it looks like it only cares about socket metrics, not anything related to HTTP. Worst case, if it's cannot be used as-is, I could look into extracting the relevant code from Raindrops into this gem.
from client_ruby.
How would this differ from the hypothetical non-shared case where each worker exposes a metrics port?
Where the data comes from each worker, there's no persistence of data beyond a worker's lifetime and the rate()
function works as expected. With shared memory where the client is smart and doing aggregation, you need to not lose increments from dead workers as counters need to be kept monotonic.
Prometheus has the paradigm that instrumentation is dumb, and all the logic is done in the prometheus server. You need to be careful when adding logic to clients that it maintains the same semantics as a dumb client.
So if worker 3 dies, the next worker spawned is also given the number 3.
That makes things easier alright. And as I presume everything is single-threaded, we don't need locking for metric updates.
Thinking on this over the past bit, I think we can make this work but only for Counter and Counter-like metrics (Summary without percentile and Histogram).
from client_ruby.
@brian-brazil Gauges are indeed a problem. I see four possibilities in principle for them:
- Allow the user to configure a merging strategy (averaging, summing, ...) for each gauge. The actual merging work wouldn't need to happen continuously, but only upon scrape.
- Make an exception for gauges and distinguish them by a "worker" label.
- Use "last write wins". But that's not useful at all for any common gauge use cases (queue lengths, etc.).
- Don't support gauges in this kind of environment, expecting that the bulk of relevant metrics for this use case would be counters and summaries anyways.
1 is probably what we'd want if we don't split metrics by worker otherwise.
@atombender If indeed one Unicorn server is always expected to have the same number of stably identified worker processes (at least between complete Unicorn restarts, which would also wipe the shared memory), your approach sounds sensible to me. If I understand the raindrops approach correctly, it uses a single counter for all workers ("counters are shared across all forked children and lock-free"), so there wouldn't even be any counter resets anywhere in that approach if only a single worker is killed?
from client_ruby.
(if you're only reading by mail - I edited my last comment because Markdown made a 5 out of my 1...)
from client_ruby.
Note that I am not suggesting that the client should aggregate anything. I am suggesting that each worker has a slot in the shared memory segment that it is responsible for updating. When a worker gets a /metrics
request, it simply returns the contents of all slots, separately labeled by worker ID.
Shouldn't that completely work around the gauge problem? I was initially thinking that the client would aggregate workers, because, as I said earlier, I think per-worker metrics aren't very useful. But if it poses a problem for gauges, I think it's better to "over-report" and do aggregation on the server.
(Pardon my obtuseness, but I'm not too familiar with how Prometheus works yet.)
from client_ruby.
Don't support gauges in this kind of environment, expecting that the bulk of relevant metrics for this use case would be counters and summaries anyways.
This is my expectation. It sounds like entirely an online-serving use case, with no thread pools or the like where you'd usually want gauges. I can see some use cases for gauges (e.g. if you were batching up requests to send on elsewhere), but the counters uses are the primary ones.
from client_ruby.
@atombender Ah, ok. That'd definitely work and require fewer scrapes, though you'll still end up with a lot of dimensionality in your time series that might not prove to be of much value vs. their cost in transfer, storage, and queries.
@brian-brazil Yep.
from client_ruby.
When a worker gets a /metrics request, it simply returns the contents of all slots, separately labeled by worker ID.
That'd work, I thought we were trying to avoid that due to all the additional timeseries.
Shouldn't that completely work around the gauge problem? I was initially thinking that the client would aggregate workers, because, as I said earlier, I think per-worker metrics aren't very useful. But if it poses a problem for gauges, I think it's better to "over-report" and do aggregation on the server.
Reporting everything is what I generally favour, assuming it works out resource wise.
That resolves the counter problem, it doesn't solve the gauge problem as gauge use cases tend to work under the assumption that the gauge is reset when the process dies/starts (think of a gauge that represented in-progress requests). If unicorn can clear them when a process dies, that should handle that.
from client_ruby.
The other limit, is that callback-like collectors won't work with a shared memory model. This would affect things like runtime and process statistics, that you want to gather at scrape time.
from client_ruby.
@grobie Did you have any plans wrt. supporting preforked servers already?
When we had the use cases at SoundCloud, we discussed three options: a) scraping each worker independently b) letting the master process handle metrics (the current dumb lock implementation in the ruby client probably needs some improvement) c) let each worker push metrics to a pushgateway.
I believe we run all our Ruby services on JRuby by now using the java client. Going forward with b) sounds good to me.
from client_ruby.
I am very interested in this issue. I am wondering if it is seen with threaded servers (like Puma)? I think they work in a way that memory isn't shared? Depending on where/when things get initialized?
Is there there is a demo app in the repo I'll play around with it today.
Is anyone aware of any other rack middlewares that share global state that we could look to for solutions?
from client_ruby.
If raindrops is the right path to look at for this... They have a sample middleware. Their code isn't on github but here is a clone of that middleware: https://github.com/tmm1/raindrops/blob/master/lib/raindrops/middleware.rb
from client_ruby.
Thanks, I'll have a look.
from client_ruby.
I have been playing around with raindrops, but can't seem to get the processes writing to the same counters. I'll keep at it. In the meantime I found this: https://github.com/TheClimateCorporation/unicorn-metrics which might show a similar use case.
from client_ruby.
For those interested in this I have made a little progress. It seems that when using raindrops the Raindrop counters need to be declared before the workers fork. Otherwise each worker will end up instantiating separate instances of the counters. If you declare them before hand, it actually works as expected.
This makes the prometheus middleware a little inconvenient, as you would need to know all possible endpoints and other counters beforehand. This is the approach that the unicorn-metrics project above takes.
I have began looking at a different approach - where each worker exposes it's own metrics at /_metrics and then there would be a middleware at /metrics that queries the others and combines the results. I currently can't find a way to figure out what the other workers TCP/unix sockets are from inside the worker. There is a Unicorn::HttpServer::LISTENERS
constant, but it seems to only contain reference to the current worker.
Hopefully this information can be helpful to someone, I would really love to see prometheus being useful on ruby servers.
from client_ruby.
The approach I'm taking over in python is to have each process keep a store of it's metrics on a file per process on disk, and then read them in at scrape time.
I can see a challenge in distributing the scrape to workers if a worker is single-threaded. You want scapes to be independent of other processing.
from client_ruby.
I implemented backing the metrics with PStores. This seems to have the desired effect albeit causing a slowdown since it writes stats to disk on every request. My branch is here: https://github.com/jeffutter/prometheus-client_ruby/tree/multi-worker I tested it out with apache bench doing 2000 requests with 10 concurrent requests and 4 unicorn workers. After running that the /metrics endpoint reported the correct number of hits, not sure about the other types of stats.
On this branch, with the above setup, I'm getting ~ 350 requests /sec. While on the master branch. I'm getting over 1,700. Now, while this sounds like a big difference, the duration of the request that returns OK/200 is about 13ms.. I'm guessing in a real-world load, with real request times, this difference will be much more negligible.
That being said, there may be better backing stores to use than PStore or better ways to store it. Also, it should probably be an option to back with a hash when not running in a prefork environment.
Let me know what you think.
from client_ruby.
That's pretty much the same approach as I'm using in Python, however I'm only letting this behaviour kick in when explicitly asked for as it has a number of limitations including performance (see prometheus/client_python#66).
Scrape-time performance doesn't matter too much. what's really important is that metric increments etc. be fast. Optimally that'd mean something where writes hit the kernel, but not the disk.
from client_ruby.
On my branch: https://github.com/jeffutter/prometheus-client_ruby/tree/multi-worker I have updated it so there are backing "stores". I have included one with the original behavior (Hash) and one with the new behavior (PStore). If you pass the prefork: true
argument to the middlewares it will use the PStore backend, otherwise it uses the original. I imagine this could be extended in the future to support other on-disk or even DB (redis?) backends.
Also (I'm not sure why) but the benchmarks have improved a little. The Hash one is giving me 2,100 requests/sec while the PStore one is up to 1,400. Perhaps may just be a system resource difference from before, I need some more scientific benchmark. However at 1,400 requests/sec I doubt that will cause a bottleneck in any real web load.
Currently there are a bunch of broken tests. If this solution seems like something that might be considered for merging I will fixup the tests, update the docs and clean everything up.
from client_ruby.
For continuity sake. Discussion for this topic has continued over on the mailing list here: https://groups.google.com/d/topic/prometheus-developers/oI3qPhaRY7I/discussion
from client_ruby.
Hi!
It seems discussion about this issue died out on the mailing list sometime before Christmas last year. Is there any movement towards a solution to this problem?
from client_ruby.
I'm not aware of anyone working on a solution for this problem at the moment. Ideas and contributions are very welcome.
from client_ruby.
Does anyone found a way to make it work with Puma?
from client_ruby.
Well, it seems to work with @jeffutter branch.
Here is how I did it, please take a look and tell me what needs to be improved.
in config/puma.rb
workers N
on_worker_boot do |index|
$puma_worker = index
end
in config.ru
$registry = Prometheus::Client.registry(Prometheus::Client::Stores::PStore)
$registry.counter(:api_enqueued_jobs_total, 'A counter of API enqueued jobs')
....
use Prometheus::Client::Rack::Exporter, prefork: true, registry: $registry
in SomeController.rb
$registry.get(:api_enqueued_jobs_total).increment(..., worker: $puma_worker)
@jeffutter why it is returning JSON by default on '/metrics' endpoint?
from client_ruby.
Here is a quick Benchmark I've done in my laptop:
ApacheBench/2.3
-n 1000 -c 10
Puma
workers 4
threads 16,32
Request
POST to API endpoint that hit database and return JSON.
No Prometheus at all.
Requests per second: 27.46 [#/sec] (mean)
Time per request: 364.217 [ms] (mean)
Time per request: 36.422 [ms] (mean, across all concurrent requests)
Transfer rate: 44.86 [Kbytes/sec] received
5.55 kb/s sent
50.41 kb/s total
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.6 0 11
Processing: 113 362 217.7 311 1582
Waiting: 113 360 217.5 309 1580
Total: 113 362 217.8 311 1582
Percentage of the requests served within a certain time (ms)
50% 311
66% 411
75% 469
80% 522
90% 638
95% 724
98% 910
99% 1231
100% 1582 (longest request)
Taking one measure
Requests per second: 27.11 [#/sec] (mean)
Time per request: 368.899 [ms] (mean)
Time per request: 36.890 [ms] (mean, across all concurrent requests)
Transfer rate: 44.29 [Kbytes/sec] received
5.48 kb/s sent
49.77 kb/s total
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.5 0 7
Processing: 110 366 233.1 282 1318
Waiting: 109 364 232.6 281 1318
Total: 110 366 233.2 283 1318
Percentage of the requests served within a certain time (ms)
50% 283
66% 404
75% 507
80% 545
90% 691
95% 848
98% 1003
99% 1111
100% 1318 (longest request)
Request count by worker
1 => 253
3 => 251
2 => 245
0 => 251
Taking one measure and hitting /metrics
path once every 5s
Requests per second: 27.76 [#/sec] (mean)
Time per request: 360.260 [ms] (mean)
Time per request: 36.026 [ms] (mean, across all concurrent requests)
Transfer rate: 45.35 [Kbytes/sec] received
5.61 kb/s sent
50.96 kb/s total
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.5 0 5
Processing: 113 357 218.7 281 1292
Waiting: 113 356 218.3 281 1292
Total: 113 358 218.7 281 1292
Percentage of the requests served within a certain time (ms)
50% 281
66% 403
75% 486
80% 540
90% 688
95% 792
98% 913
99% 1001
100% 1292 (longest request)
Request count by worker
2 => 252
1 => 251
3 => 248
0 => 249
from client_ruby.
The JSON format has been fully removed in Prometheus server v1.0.0 and in client_ruby v0.5.0. We should only use the text format.
I'm happy to review pull requests, but as SoundCloud doesn't use Ruby much any more, I won't have time to work on something on my own before December.
from client_ruby.
@zevarito Yeah, sorry my branch uses JSON, it is over a year old, I think back then JSON was supported.
As much as I dig Prometheus, work decided they didn't want to run their own monitoring servers, so we've gone with NewRelic. I would love to see Prometheus' ruby client support multi-threaded and forking servers but I'm afraid I don't have any time to contribute anymore at the moment.
Feel free to take the branches I had and build off of them or go in an entirely different direction.
from client_ruby.
The python client now has multi-process support as of prometheus/client_python#66
A similar method should work for Ruby.
from client_ruby.
@grobie I've fetched latest code and merged in @jeffutter branch and it seems to work fine.
Besides PStore addition, the code also provides a separation in the Store layer that benefits the addition of new storage mechanisms in the future.
An important thing to notice is that PStore will not work with nested transactions, that could be a problem in some cases, but another storage mechanism can be added to address that.
I will give it a try in staging environment and see how it works. I'll do a PR if all goes smooth.
@brian-brazil do you think -the python way- should be the way to do it for the Ruby lib or can be it just another storage mechanism?
from client_ruby.
The Python way should be the way to do it in Ruby, as I expect you'll run into all the same issues I did with other approaches in Python with Ruby too.
36ms is not at all workable for instrumentation. The goal should be sub-microsecond.
from client_ruby.
@brian-brazil I am lost about "36ms is not at all workable for instrumentation" from where it comes?
from client_ruby.
The above benchmarks indicate 36ms for an operation.
from client_ruby.
@brian-brazil The Benchmarks are based in an operation that involves POST, DB access and JSON serialization. Unless I am getting it wrong, Benchmark shows 1 not using Prometheus with a .468ms
faster response than using Prometheus.
Bn 1) Not Prometheus - Time per request: 36.422ms
Bn 2) Use Prometheus - Time per request: 36.890ms
You could point out that Bn 3 using Prometheus and having hypothetically more load by being hit with /metrics it actually shows even better performance than the rest of the Bn, but the offset is in my opinion short of acceptable since it is less than 1 ms between the three samples. Do you think it is not? Indeed more test should be done.
from client_ruby.
.468ms is still way to slow. Instrumentation should cost less than a microsecond.
from client_ruby.
@brian-brazil there are any benchmarking I can see of the Python lib or any other lib?
"Instrumentation should cost less than a microsecond." I guess ab
is not the right tool to measure that, should it be measured from code itself or some profiling?
from client_ruby.
The Python multi-proc solution is at about 1.2us. This is due to Python's mutexes being surprisingly slow (should be 1-2 orders of magnitude faster).
I usually use a quick microbenchmark of .inc()
.
from client_ruby.
As already indicated in #11, the implementation might benefit from
.468ms is still way to slow. Instrumentation should cost less than a microsecond.
This is pretty much an impossible goal for ruby. A hash access and counter increase alone already takes close to one microsecond. This is without any Mutex involved.
So far we've also traded labels validation for execution speed. This should probably be made opt-in so that one can run with a strict Registry in tests, but use faster registry in production.
from client_ruby.
A hash access and counter increase alone already takes close to one microsecond.
This should be measured in nanoseconds. I know Ruby isn't the fastest, but I have difficulty believing it's that slow.
from client_ruby.
@grobie #11 actually is about scraping time, I think what @brian-brazil is concern is about the measuring time. Worth to notice that I am not using Rack::Collector here, just measuring other non-http activity. Also the measure I am taking is just counter increment.
Here is the benchmarking at method level call.
I've used AB as the previous tests and wrapped measure call around Benchmark.bm block and then log.
These are the first 5 and the tail 5 results for 1000 requests.
[#<Benchmark::Tms:0x007f8c03124378 @label="", @real=0.00038335899444064125, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c021ee430 @label="", @real=0.0004944999964209273, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c03177820 @label="", @real=0.0005114129962748848, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c0220d1f0 @label="", @real=0.0005079250040580519, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c035e5678 @label="", @real=0.00026892300229519606, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
...
[#<Benchmark::Tms:0x007f8c0663a7d8 @label="", @real=0.0002955899981316179, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c0665a128 @label="", @real=0.00022226799774216488, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c0686f558 @label="", @real=0.000291318996460177, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c065586d0 @label="", @real=0.00021019799896748737, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c04e8e4e0 @label="", @real=0.00018655300664249808, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
from client_ruby.
@zevarito The scrape example mentioned in #11 was just one example, the issue was about investigating and removing any kind of performance bottleneck.
@brian-brazil I guess there is a reason that writing C-extensions is so popular in Ruby.
require 'benchmark'
h = {}
Benchmark.realtime { 1000.times { c = h[:foo_total] || 0; c += 1; h[:foo_total] = c } } / 1000
# => 5.882129771634937e-07
m = Mutex.new
Benchmark.realtime { 1000.times { m.synchronize { c = h[:foo_total] || 0; c += 1; h[:foo_total] = c } } } / 1000
# => 1.4377119950950147e-06
from client_ruby.
@grobie right, and I remember the conversation we have here #33 (comment) about Rack::Collector and scrapping time, worth to mention that the cardinality is not a Ruby issue itself.
I wonder if you find acceptable the last Benchmark I've posted.
from client_ruby.
Hi, we are experiencing the mentioned behaviour with ruby+rails+unicorn in an environment with more than one worker. As you can see on the screenshots, the workers load does not seem to be distributed evenly. The upper and lower boundary is basically the scraped values from either worker 1 or 2. The drop in the graph is the time of deployment.
I'm actually wondering if someone could check out other libraries that are measuring rack/rails stats as a middleware? For example https://github.com/librato/librato-rack or https://github.com/newrelic/rpm
We are using both and i did not experience these kind of problems there?
Thanks
from client_ruby.
@christoph-buente I poked around a bit in the newrelic gem. I believe neither of these aggregate metrics between processes. Both of them push metrics to a remote server which handles the aggregation. I believe they probably buffer the events, so that it doesn't send one event on every single request, but it still gets around the problem we face here by aggregating the events on the server.
from client_ruby.
Thanks @jeffutter for looking into it. Even though it's highly discouraged for long running processes, but i feel the push gateway seems to be a legit way to get around this too.
from client_ruby.
The push gateway is not appropriate for this use case at all. Statsd exporter would be the architecture for that.
from client_ruby.
Thanks @brian-brazil, will have a look into that.
from client_ruby.
@SamSaffron did you ever get round to looking into writing that exporter based off raindrops?
from client_ruby.
@Filib I did but its running on internal code. Its a tiny amount of code actually:
something along the lines of:
net_stats = Raindrops::Linux::tcp_listener_stats("0.0.0.0:3000")["0.0.0.0:3000"]
if net_stats
puts "web_active:#{net_stats.active}"
puts "web_queued:#{net_stats.queued}"
end
Then we also have monitors that alert us if we are sitting at sustained queued for longer than a minute or so.
from client_ruby.
Note that @juliusv has pushed a branch that solves this. I think it needs more work to support all metric types.
from client_ruby.
When doing this note that the exchangeable-value-types branch does multiprocess but has some flaws:
- Not all metric types are supported. Specifically the summary type isn't supported.
- This means that the Collector for rack isn't supported either.
Be nice to have fixes for that as well.
from client_ruby.
from client_ruby.
What doesn't work about Summaries? From a stats standpoint, it should be possible to get something out of them.
from client_ruby.
from client_ruby.
This is the branch where I put up (really hacky and experimental!) multiprocess support. It's in large part a transliteration of the Python client's multiprocess support, with tests hacked up to "pass". It's alpha-usable, but I'm hoping for @grobie's rewrite to do this all properly :) See https://github.com/prometheus/client_ruby/blob/multiprocess/example.rb
from client_ruby.
How do you aggregate per-process quantiles?
You can't. But you can report them all.
The _sum and _count work fine in any case.
from client_ruby.
from client_ruby.
I'm guessing there is no progress on this which is sad as it basically makes metrics collections useless in any Rails environment with a forking web server.
I'm trying to switch to the commits that @juliusv and @lyda linked but the examples don't show how to use their changes with the Rack collector / exporter.
Is there anything needed other than:
Prometheus::Client::MmapedValue.set_pid(1)
use Prometheus::Middleware::Collector
use Prometheus::Middleware::Exporter
I tried doing that but hit errors where it was trying to call total
on a hash at lib/prometheus/client/formats/text.rb:171:in 'histogram'
Full trace:
app error: undefined method `total' for #<Hash:0x0055d849b5d520> (NoMethodError)
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/client/formats/text.rb:171:in `histogram'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/client/formats/text.rb:150:in `representation'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/client/formats/text.rb:33:in `block (2 levels) in marshal'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/client/formats/text.rb:32:in `each'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/client/formats/text.rb:32:in `block in marshal'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/client/formats/text.rb:28:in `each'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/client/formats/text.rb:28:in `marshal'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/middleware/exporter.rb:69:in `respond_with'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/middleware/exporter.rb:30:in `call'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/middleware/collector.rb:37:in `block in call'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/middleware/collector.rb:77:in `trace'
/usr/local/bundle/bundler/gems/client_ruby-b4adef5a39fd/lib/prometheus/middleware/collector.rb:37:in `call'
/usr/local/bundle/gems/rack-1.4.7/lib/rack/content_length.rb:14:in `call'
/usr/local/bundle/gems/railties-3.2.22.5/lib/rails/rack/log_tailer.rb:17:in `call'
/usr/local/bundle/gems/unicorn-4.8.2/lib/unicorn/http_server.rb:572:in `process_client'
I reverted some value.get.total
calls to value.total
and I can get the app to load now but it looks like the metrics are still per fork as if I hit /metrics
repeatedly I get different values.
I'm guessing due to my changes I also get some metrics back like this:
http_server_request_duration_seconds_count{method="get",path="/metrics"} #<Prometheus::Client::SimpleValue:0x005555afd28278>
I'll look more into that issue but I'm guessing the SimpleValue
indicates the mmap stuff isn't working.
from client_ruby.
Quick update: I added prometheus_multiproc_dir=/tmp
to the environment as I see the code checks for this even though none of the examples set it. My output from /metrics
now shows mmap stuff but the http_server_requests_total
still varies depending on which Unicorn thread I talk to.
e.g.
$ curl http://myapp/metrics
http_server_requests_total{code="200",method="get",path="/metrics"} 4.0
...
http_server_request_duration_seconds_bucket{method="get",path="/metrics",le="10"} 4.0
http_server_request_duration_seconds_bucket{method="get",path="/metrics",le="+Inf"} #<Prometheus::Client::MmapedValue:0x0055ba313dfc70>
a few seconds later
$ curl http://myapp/metrics
http_server_requests_total{code="200",method="get",path="/metrics"} 3.0
...
http_server_request_duration_seconds_bucket{method="get",path="/metrics",le="10"} 3.0
http_server_request_duration_seconds_bucket{method="get",path="/metrics",le="+Inf"} #<Prometheus::Client::MmapedValue:0x0055ba313da720>
from client_ruby.
What would be the best workaround for Rails app running in Passenger? I am considering statsd_exporter and pushgateway. I expect the latter to be easier to integrate with but slower.
from client_ruby.
from client_ruby.
Hi, is there any progress on this please?
Looks like it should have issue or feature label in addition to the question.
from client_ruby.
Thanks. How many metrics do you send for each request in production? I am little bit concerned about that HTTP/JSON transport.
from client_ruby.
@lzap will update the readme, I can send about 10k strings over in 200ms over a single request ... if you add a round trip of to_json and JSON.parse its around 600ms for 10k messages.
I persist the HTTP connection for 30 seconds, configurable
from client_ruby.
Hi! I notice the Help wanted
tag on this issue, but it's unclear from reading the comments exactly what help is wanted; all I see is a few proposed solutions and alternate libraries. Has a solution been chosen, and if so, is there a branch where work is being done? I'd like to help if I can.
from client_ruby.
@philandstuff In short, yes, but the latest release still doesn't reflect that. There's still a few changes planned before cutting a new release, some of which are breaking changes, so we'd like to get them done before that.
But... Soon :-)
from client_ruby.
@wshihadeh I'm not sure I understand the question fully...
HA mode would be different servers, correct?
In that case, each server would have separate time series in Prometheus anyway, and the aggregation happens when querying Prometheus.
Or are you talking about something different?
from client_ruby.
Yes, I am talking about that. To explain it more , suppose that the application is a web server that is deployed in multiple containers or servers and it exposes the metrics endpoint. I don't see why we need two Prometheus tasks for such setup (one is enough that scrape metrics from the load balancer or service IP). In that case, the metrics will not be totally accurate and that is why I think there is a need for using shared storage between the containers or servers something like Redis
from client_ruby.
I don't see why we need two Prometheus tasks for such setup
Prometheus really really wants to scrape individual instances. There are a number of reasons for this but the main one is that this is the overwhelming convention in the Prometheus world. If you want to break this convention you will find yourself fighting the tooling instead of working with it.
You should use some sort of service discovery to ensure that Prometheus automatically detects new instances when they are created. That way, you only need a single scrape_config but Prometheus will still scrape each instance separately.
from client_ruby.
Related Issues (20)
- specifying ip-address and portnumber HOT 6
- CPU usage growing over time with DirectFileStore HOT 1
- better path handling HOT 14
- Update README.md with changes to pushgateway client
- Update CHANGELOG.md for 3.0.0 release HOT 1
- `generate_path` yields incorrect values for Sinatra+Rack::Build apps HOT 6
- Exporter port option doesn't work HOT 2
- Counter metric is reset when it has no labels HOT 8
- ERROR: Permission to prometheus/client_ruby.git denied HOT 2
- The rack example does not work because of `Rack::Lint::LintError` HOT 1
- Aaup HOT 1
- How to metrify ActiveRecord sql queries and Redis jobs (sidekiq/resque)? HOT 1
- pushing metrics with instance grouping_key would result: instance is reserved HOT 6
- Fix `Rack` deprecation warning
- Improving DirectFileStore for performance and scalability HOT 18
- Implement native histograms support HOT 4
- cannot load such file -- prometheus/middleware/collector (LoadError) HOT 2
- Job name for pushgateway can't be a symbol HOT 1
- UTF-8: Implement support in Ruby client library HOT 5
- How can I change how paths are labelled? HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from client_ruby.