Comments (9)
Hello @MillsyBot,
Thanks a lot. It is really great to receive a kind comment about our work, and a real pleasure to share it with people who appreciate it.
We use this tool daily with signals to regenerate, on the fly, dozens of very large configurations across many teams and many Prometheus instances (Prometheus being the process that receives the signal). This has worked for a very long time, so I suspect a major bug would have been reported by now. It is still possible, though, especially since we do not use very high --wait values (our maximum is around 10s).
I also added the signal reload for your exact usage some time ago, so it is intended to do exactly that. But there are a few caveats. They are supposed to be explained in the README.md, but I will take the time to explain them again (and feel free to modify the README in a PR if it is not clear).
The most common problem
First, in your example, you are using --wait 60. This means that templates are evaluated once per minute, which has several consequences: the template is generated at most once per minute, so signals are sent at most once per minute, whatever the number of changes per second. I suspect this is what you want: even in a very noisy cluster, haproxy will reload its config at most every 60s. I have never used such high values in production (it should not have any impact, but it is still possible).
Second, the order of --sig-reload and --exec is very important.
Consider this example:
consul-templaterb --exec /my/monitoring --sig-reload NONE --exec "python -m http.server 8080" [somefiles]
/my/monitoring and the python server would be launched once all files are properly generated, and would receive signals each time one of those files changes. However:
- /my/monitoring is supposed to receive SIGHUP (the default value), because --sig-reload is after --exec /my/monitoring
- python -m http.server 8080 will receive no signal, because --sig-reload NONE is before the --exec of the python command.
This means that, for instance, appending --sig-reload TERM at the end of your command line has no effect, because no --exec flag follows it.
This behavior, while powerful, is not very common (and I have been bitten by it a few times myself). Several times, when looking at it, I did not understand why my python command received SIGHUP when I wanted NONE, which I had appended at the end of my command line.
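The ordering rule can be illustrated with a small sketch. This is a toy model built on Ruby's standard OptionParser, not the actual consul-templaterb parser; the function name assign_signals is hypothetical, and the only assumption is the rule described above: each --exec picks up whatever --sig-reload value was seen before it, starting from the default HUP.

```ruby
require 'optparse'

# Toy model (hypothetical, NOT the real consul-templaterb code):
# each --exec is paired with the most recent --sig-reload seen BEFORE it.
def assign_signals(argv)
  current_signal = 'HUP' # default signal, as stated in the thread
  commands = []
  parser = OptionParser.new do |opts|
    # A --sig-reload only affects the --exec flags that come after it.
    opts.on('--sig-reload SIGNAL') { |sig| current_signal = sig }
    # An --exec captures the signal that is current at this position.
    opts.on('--exec COMMAND') { |cmd| commands << [cmd, current_signal] }
  end
  parser.parse(argv) # callbacks fire in command-line order
  commands
end

assign_signals(
  ['--exec', '/my/monitoring',
   '--sig-reload', 'NONE',
   '--exec', 'python -m http.server 8080']
).each { |cmd, sig| puts "#{cmd} => #{sig}" }
```

In this model, a trailing --sig-reload TERM changes nothing, because no --exec is parsed after it.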
Another possible issue
From the doc:
If you generate several files at the same time, the signal will be sent only once the rendering of all templates is completed, so if your process is using several configuration files, all files will be modified and consistent before the signal is sent to process.
In other words, the signal is supposed to be sent only when none of the files is considered dirty, that is, when all the data for all the files has been retrieved and properly computed.
Consider this example:
<%
  # Count the instances of every service known to Consul
  counts = {}
  services.each do |service_name, _tags|
    counts[service_name] = service(service_name).count
  end
%><%= JSON.pretty_generate(counts) %>
If you have a cluster where 1 service is renamed every 30s and you use --wait 60, for instance:
service-30, then service-60, service-90 [...] service-3600
At startup, consul-templaterb ignores --wait and tries to fetch data as fast as possible. But once the template has been rendered, it watches for changes: every 60s, the template is evaluated, then it waits again.
- At t+60, consul-templaterb finds the new services service-30 and service-60. It schedules the retrieval of those 2 additional services, marks the template as dirty (incomplete), does not render it and waits for 60s.
- At t+120, consul-templaterb has the values for service-30 and service-60, but in the meantime service-90 and service-120 have been added. The earlier values are there, but 2 new retrievals have been scheduled for service-90 and service-120, so the template is still dirty and is rescheduled in 60s. Still no signal is sent.
If the list of services changes fast enough with --wait 60, your templates might be fully rendered only rarely, so the signal would be sent very rarely or never. Note that consul-templaterb sends the signal ONLY if all templates are non-dirty. Since you are using several templates, nothing happens as long as at least one of them is not rendered.
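The timeline above can be reproduced with a toy simulation. This is only a sketch under simplifying assumptions that are mine, not the tool's actual scheduler: a new service appears every new_service_every seconds and is never removed, the template is evaluated every wait seconds, and data scheduled at one evaluation is available at the next one. The function name clean_evaluations is hypothetical.

```ruby
# Toy model (hypothetical): returns the evaluation times, in seconds,
# at which the template is NOT dirty and a signal could be sent.
def clean_evaluations(wait:, new_service_every:, duration:)
  fetched = []  # services whose data has already been retrieved
  pending = []  # services scheduled for retrieval at the previous evaluation
  clean_at = []
  (wait..duration).step(wait) do |t|
    # Retrievals scheduled at the previous evaluation have completed by now.
    fetched.concat(pending)
    # Services existing at time t: service-30, service-60, ... (assumption).
    existing = (new_service_every..t).step(new_service_every).map { |s| "service-#{s}" }
    # Anything listed but not yet fetched keeps the template dirty.
    pending = existing - fetched
    clean_at << t if pending.empty?
  end
  clean_at
end

fast = clean_evaluations(wait: 60, new_service_every: 30, duration: 3600)
slow = clean_evaluations(wait: 60, new_service_every: 180, duration: 3600)
puts "clean evaluations with a new service every 30s:  #{fast.size}"
puts "clean evaluations with a new service every 180s: #{slow.size}"
```

In this model, a service list that changes faster than the evaluation interval never yields a clean evaluation, while a slower-changing list does, which matches the reasoning above.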
This might be fixed in at least 3 ways:
- Hack the file consul-templaterb and change https://github.com/criteo/consul-templaterb/blob/master/bin/consul-templaterb#L69 to use min-value = 180 (refresh the list of services at most every 3 minutes). This would work for my example: since the value would be greater than 60s, new services would no longer pop up every 30s and the template would converge within 60s at most. If this is your case, a clean fix would be to allow overriding those values from the command line, which would work in any case. Another possible fix would be to have values here that are always at least 1.5 times the --wait value.
- Do not use signals and use a binary instead; in this binary, make sure to reload the command ONLY if enough time has elapsed. That way you could keep --wait low BUT reload haproxy only every 60s. Example: --template haproxy.conf.erb:haproxy.conf::./reload_haproxy_when_needed.sh with ./reload_haproxy_when_needed.sh being (not tested):
#!/bin/bash
# Reload if .last_reload is missing (first run) or older than 1 min
if [ ! -e .last_reload ] || [ -n "$(find .last_reload -mmin +1)" ]
then
  kill -USR2 $(pidof haproxy)
  touch .last_reload
fi
=> Thus, each time haproxy.conf is modified, the script is launched, but it only reloads haproxy every 60s at most, meaning that you could use --wait 1 to pick up changes faster.
If it is something else
You might consider running it with debug flags (which display whether some templates are dirty), and if you still cannot find the cause, try posting the debug messages here.
About --wait
--wait is a bit tricky, as it controls both the handling of file changes and the evaluation of templates. Linking the two was clearly not my best idea. Initially, it was meant either to reduce CPU usage a bit on very complex templates (whose data does not change all the time) OR to avoid reloading the process all the time (but more with 10s than 60s).
This might be fixed by having an option to delay --sig-reload so that it fires only every x seconds (which might not be that hard) and to decouple it from --wait.
I would be glad to accept such a PR if you are interested.
Best
Pierre
from consul-templaterb.
Hello @MillsyBot,
I added support for what I described in my previous comment in 70808c6.
This will let you use -w 1 (render every second) but send signals to the process only every 60s, using the new flag -W 60. It will be available in the next release (in a few minutes). This might solve your issues if my theory is correct.
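To illustrate the idea of decoupling rendering from signalling, here is a minimal sketch. It is not the code from commit 70808c6; the class SignalThrottle and its methods are hypothetical names, and it assumes the simplest possible policy: a signal is sent only if something changed and at least min_interval seconds have passed since the previous signal.

```ruby
# Hypothetical helper (NOT the implementation behind -W):
# renders may happen every second, signals at most every min_interval seconds.
class SignalThrottle
  def initialize(min_interval)
    @min_interval = min_interval # seconds between two signals
    @last_sent = nil             # time of the last signal, nil if none yet
    @pending = false             # true when a change has not been signalled
  end

  # Record that a render modified at least one file.
  def mark_changed
    @pending = true
  end

  # Returns true when a signal should be sent at time `now` (in seconds).
  def due?(now)
    return false unless @pending
    return false if @last_sent && (now - @last_sent) < @min_interval

    @last_sent = now
    @pending = false
    true
  end
end

# Render every second for 3 minutes, with a change at every render.
throttle = SignalThrottle.new(60)
sent = (0...180).select do |t|
  throttle.mark_changed
  throttle.due?(t)
end
puts "signals sent at: #{sent.inspect}"
```

With this policy the files on disk stay fresh at the render rate, while the receiving process is signalled at a much lower rate.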
@robloxrob glad it helps
@chuckyz says thanks as well.
Thanks for all the love here! This really makes my day.
@pierresouchay WOW, thank you so much for the detailed response, and for releasing a new version addressing my issues! So cool, I will buy you a beer at the next HAProxy Conf (we met there last year).
Out of curiosity: what type of performance impact are you seeing with HAProxy reloading every 1sec? Are the impacts significant to connection reuse when the FD are constantly passing from one process to another?
@MillsyBot It really depends on the version. In my tests a few years ago (version 1.7.x), it used to break many things (including open connections; while it is OK to lose a connection or 2 every minute, it is not acceptable every second). More recently, HAProxy has done a lot of work to avoid this (especially in the 2.x releases), so I suspect it is not such a big issue with recent kernels and recent HAProxy versions.
On our side, we now use systems interconnected with Consul to pilot HAProxy (most notably https://github.com/haproxytech/haproxy-consul-connect, which uses recent dataplane additions to change the HAProxy config on the fly).
But anyway, we also had this exact same issue with our Prometheus instances (we used --wait 10s as well), so I think it is a welcome addition anyway, avoiding lots of crappy workarounds.
@MillsyBot Don't forget to tell me whether it fixes your issue. If it does, please close the issue!
Thank you!
@pierresouchay Super helpful.