cep21 / healthcheck_nginx_upstreams
Health checks upstreams for nginx
Home Page: http://wiki.nginx.org/NginxHttpHealthcheckModule
# Update

This module is no longer maintained. I recommend using https://github.com/yaoweibin/nginx_upstream_check_module instead. If you're curious about how this module used to work, read ahead:

Healthcheck plugin for nginx. It polls backends, and if they respond with HTTP 200 plus an optional request body, they are marked good. Otherwise, they are marked bad. Similar to haproxy/varnish health checks. For help on all the options, see the docblocks inside the .c file where each option is defined. Note this also gives you access to a health status page that lets you see how well your healthchecks are doing.

==Important==

Nginx gives you full freedom over which server peer to pick when you write an upstream. This means that the healthchecking plugin is only a tool that other upstreams must know about to use. So your upstream code MUST SUPPORT HEALTHCHECKS. It's actually pretty easy to modify the code to support them. See the .h file for how, as well as the upstream_hash patch, which shows how to modify upstream_hash to support healthchecks. For an example plugin modified to support healthchecks, see my modifications to the upstream_hash plugin here: http://github.com/cep21/nginx_upstream_hash/tree/support_http_healthchecks

==Limitations==

The module only supports HTTP 1.0, not 1.1. What that really means is it doesn't understand chunked encoding. You should ask for a 1.0 response with your healthcheck, unless you're sure the upstream won't send back chunked encoding. See the sample config for an example.

==INSTALL==

    # Similar to the upstream_hash module
    cd nginx-0.7.62   # or whatever
    patch -p1 < /path/to/this/directory/nginx.patch
    ./configure --add-module=/path/to/this/directory
    make
    make install

==How the module works==

My first attempt was to spawn a pthread inside the master process, but nginx freaks out on all kinds of levels when you try to have multiple threads running at the same time. Then I thought, fine, I'll just fork my own child.
But that caused lots of issues when I tried to HUP the master process, because my own child wasn't getting signals. These just didn't feel like the nginx way of doing things, so I figured I would work directly with the worker process model.

When each worker process starts, it adds a repeating event to the event tree asking for ownership of a server's healthcheck. When that ownership event comes up, it locks the server's healthcheck and tries to claim it with its pid. If the process can't claim it, it retries later, because the worker that owns it may have died. The worker that does own it inserts a healthcheck event into nginx's event tree. When that triggers, it starts a peer connection to the server and sends and receives data. When the healthcheck finishes, or times out, it updates the shared memory structure and schedules another healthcheck.

A few random issues I had:

1) When nginx tries to shut down, it waits for the event tree to empty out. To get around this, I check for ngx_quit and all kinds of other variables. This means that when you HUP nginx, your worker needs to sit around doing nothing until *something* in the healthcheck event tree comes up, after which it can clear all the healthcheck events and move on. I could fix this if nginx added a per-module callback on HUP, maybe a 'cleanup' or something. The current exit_process callback is called after the event tree is empty, not after a request to shut down a worker.

==Extending==

It should be very easy to extend this module to work with fastcgi or even generic TCP backends. You would just need to change, or abstract out, ngx_http_healthcheck_process_recv. Patches that do that are welcome, and I'm happy to help out with any questions. I'm also happy to help out with extending your upstream-picking modules to work with healthchecks as well.
Your code can even stay compatible with builds that have no healthcheck support by surrounding the changes with #if (NGX_HTTP_HEALTHCHECK).

==Config==

See sample_ngx_config.conf

Author: Jack Lindamood <[email protected]>

==License==

Apache License, Version 2.0
Hello, I want to install this module, but the patch cannot be applied and I can't continue. I hope you can help me, thank you!
Here's my console info and environment:
centos7
nginx1.8
patching file –p1
Hunk #1 FAILED at 4267.
1 out of 1 hunk FAILED -- saving rejects to file –p1.rej
patching file –p1
Hunk #1 FAILED at 106.
1 out of 1 hunk FAILED -- saving rejects to file –p1.rej
patching file –p1
Hunk #1 FAILED at 4.
Hunk #2 FAILED at 12.
Hunk #3 FAILED at 23.
Hunk #4 FAILED at 57.
Hunk #5 FAILED at 365.
Hunk #6 FAILED at 410.
Hunk #7 FAILED at 471.
Hunk #8 FAILED at 576.
Hunk #9 FAILED at 601.
Hunk #10 FAILED at 617.
10 out of 10 hunks FAILED -- saving rejects to file –p1.rej
patching file –p1
Hunk #1 FAILED at 26.
1 out of 1 hunk FAILED -- saving rejects to file –p1.rej
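For anyone hitting the same output: the file name in the log above is literally "–p1" with an en-dash, which suggests the option was pasted with a typographic dash. patch then takes "–p1" as the name of the file to patch rather than as the -p1 option, and every hunk fails against that nonexistent file. A tiny reproduction (the throwaway diff below is invented; this needs GNU patch installed):

```shell
# Feed patch a throwaway diff while passing an en-dash "option";
# patch treats the non-option word as a file-name operand.
printf -- '--- f\n+++ f\n@@ -1 +1 @@\n-a\n+b\n' > demo.patch
patch –p1 < demo.patch > out.txt 2>&1 || true
head -n 1 out.txt
# First line: patching file –p1  (the same symptom as the log above)
```

The README's install step wants an ASCII hyphen: `patch -p1 < nginx.patch`. Even with the right option, note the patch was written against nginx 0.7.x, so some hunks may still fail on 1.8.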
Hi
I'm trying to use this module on Nginx 1.7.10 and getting compile-time errors (sorry, stupidly I didn't capture them). I just wondered if you have any plans to support 1.7.10 (it looked to me like some nginx functionality had changed and broken compatibility).
I've compiled in other modules successfully BTW.
Cheers
Neil
In round-robin upstream mode, the healthcheck doesn't work. Any plan to support this?
Hi,
I'm new to nginx and load balancing in general. I have set up nginx 0.8.54 with your healthcheck module. Is there any way to automatically remove servers from an upstream when they are marked as bad?
Cheers,
Mike
With the hash module included and the following in the config:
upstream myproject {
    server 10.9.8.181:80; # weight=1 max_fails=1;
    server 10.9.8.193:80; # weight=1 max_fails=1;
    hash $filename;
    hash_again 2;
    healthcheck_enabled;
    healthcheck_delay 5000;
    healthcheck_timeout 20000;
    healthcheck_failcount 1;
    #healthcheck_expected 'I_AM_ALIVE';
    healthcheck_send "GET /health HTTP/1.0" 'Host: $http_host';
}
server {
    listen 80;
    server_name "";
    access_log log/localhost.access.log main;
    location / {
        set $filename $request_uri;
        if ($request_uri ~* ".*/(.*)") {
            set $filename $1;
        }
        proxy_set_header Host $http_host;
        proxy_pass http://myproject;
        proxy_connect_timeout 3;
    }
    location /stat {
        healthcheck_status;
    }
}
If the health check file is removed from one of the servers, requests are still sent to that server. The stat page confirms the server is "down".
Perhaps this is not an issue with the health checking but an issue with http://github.com/cep21/nginx_upstream_hash/tree/support_http_healthchecks
Either way, why is this module so difficult to use?
Hey guys,
I'm working on a chef recipe to install this module from source. Is it possible to have a release for this so we can download a tarball from github directly?
Cheers!
Sorry, I posted it in the wrong place. How can I close this issue?
$ patch -p1 < nginx.patch
patching file src/http/ngx_http_upstream.c
Hunk #1 succeeded at 4403 (offset 110 lines).
patching file src/http/ngx_http_upstream.h
Hunk #1 succeeded at 110 (offset 1 line).
patching file src/http/ngx_http_upstream_round_robin.c
Hunk #1 succeeded at 5 (offset 1 line).
Hunk #2 FAILED at 15.
Hunk #4 succeeded at 78 (offset 8 lines).
Hunk #5 succeeded at 392 (offset 3 lines).
Hunk #6 FAILED at 438.
Hunk #7 FAILED at 499.
Hunk #8 FAILED at 608.
Hunk #9 FAILED at 635.
Hunk #10 FAILED at 660.
6 out of 10 hunks FAILED -- saving rejects to file src/http/ngx_http_upstream_round_robin.c.rej
patching file src/http/ngx_http_upstream_round_robin.h
Hunk #1 succeeded at 29 (offset 3 lines).
$
$ uname -rop
2.6.18-194.el5PAE i686 GNU/Linux
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
Nginx version nginx-1.2.1
This is more like a feature request. I've started down the road of making this work, but I wondered if you had already tackled it or could give me some pointers :)
I tried to use the latest stable: nginx/0.8.54
I installed both this healthcheck_nginx_upstreams module and cep21/nginx_upstream_hash. Both patches succeeded.
The configure looked good:
./configure --add-module=../nginx_upstream_hash/ --add-module=../healthcheck_nginx_upstreams/ --with-http_stub_status_module --prefix=/opt/nginx --with-http_ssl_module --error_log=logs/error.log --user=root --group=root
I used the sample nginx configs and have this:
upstream thin_system {
    hash $request_uri;
    hash_again 0;
    healthcheck_enabled;
    healthcheck_delay 250;
    healthcheck_timeout 100;
    healthcheck_failcount 1;
    ## We will just check for HTTP 200
    # healthcheck_expected 'HEALTH_OK';
    healthcheck_send "GET /ajax/health HTTP/1.0" 'Host: healthcheck';
    server ch1-app03:3000;
}
I have the servers pointing to the backend (it was working before; I'm just trying to add the healthcheck), but it always succeeds and I don't see any traffic going out on the healthchecks. (The /ajax/health endpoint returns a 404 right now.)
So I check this:
location /stat {
    healthcheck_status;
}
But it is blank with only the header:
Index Name Owner PID Last action time Concurrent status values Time of concurrent values Last response down Last health status Is down?
It doesn't have a list of servers and all requests are going through.
Unfortunately it is not possible to use it without a new, working upstream_roundrobin or upstream_hash patch.
I tried to run the healthcheck on CentOS 4.6 and nginx 0.8.48.
After starting nginx with the sample config, I accessed "http://localhost:81/stat/".
But I can't get data from localhost:11114.
I have no idea...
In /var/log/messages there are lots of
kernel: nginx[28164]: segfault at 0000000000000018 rip 0000000000410a8f rsp 00007fffce7ff7d0 error 4
messages. When I reload nginx, pid 28164 belongs to the shutting-down nginx worker process.
Every time I reload nginx there are segfaults, unless I delete all the healthcheck directives from its configuration file.
core dump shows this:
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000410a8f in time ()
(gdb) bt
#0 0x0000000000410a8f in time ()
#1 0x0000000000417879 in time ()
#2 0x00000000004177a5 in time ()
#3 0x000000000041c49e in time ()
#4 0x000000000040424b in time ()
#5 0x0000003abac1d994 in __libc_start_main () from /lib64/libc.so.6
#6 0x0000000000402a59 in time ()
#7 0x00007fffce7ffb38 in ?? ()
#8 0x0000000000000000 in ?? ()
I found a core dump file when I used './nginx -s reload':
Reading symbols from /usr/lib64/libcrypto.so.10...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libcrypto.so.10
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libfreebl3.so...(no debugging symbols found)...done.
Loaded symbols for /lib64/libfreebl3.so
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libz.so.1
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /lib64/libnss_dns.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_dns.so.2
Reading symbols from /lib64/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libresolv.so.2
Core was generated by `nginx: w'.
Program terminated with signal 11, Segmentation fault.
#0 ngx_rbtree_min (tree=0x6bde20, node=0x1326598) at src/core/ngx_rbtree.h:76
76 while (node->left != sentinel) {
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64 nss-softokn-freebl-3.12.9-11.el6.x86_64 openssl-1.0.0-27.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 ngx_rbtree_min (tree=0x6bde20, node=0x1326598) at src/core/ngx_rbtree.h:76
#1 ngx_rbtree_delete (tree=0x6bde20, node=0x1326598) at src/core/ngx_rbtree.c:178
#2 0x000000000046c002 in ngx_event_del_timer (log=) at src/event/ngx_event_timer.h:44
#3 ngx_http_healthcheck_clear_events (log=)
at /home/hongwei/CloudListProxy/src/addon/ngx-http-healthcheck-module-0.10/ngx_http_healthcheck_module.c:669
#4 0x000000000046c1ec in ngx_http_healthcheck_mark_finished (stat=0x1326540)
at /home/hongwei/CloudListProxy/src/addon/ngx-http-healthcheck-module-0.10/ngx_http_healthcheck_module.c:318
#5 0x000000000046c57a in ngx_http_healthcheck_read_handler (rev=0x1346720)
at /home/hongwei/CloudListProxy/src/addon/ngx-http-healthcheck-module-0.10/ngx_http_healthcheck_module.c:441
#6 0x0000000000422781 in ngx_epoll_process_events (cycle=0x1303360, timer=, flags=)
at src/event/modules/ngx_epoll_module.c:683
#7 0x000000000041a1f3 in ngx_process_events_and_timers (cycle=0x1303360) at src/event/ngx_event.c:249
#8 0x0000000000421040 in ngx_worker_process_cycle (cycle=0x1303360, data=) at src/os/unix/ngx_process_cycle.c:807
#9 0x000000000041f7cc in ngx_spawn_process (cycle=0x1303360, proc=0x420f4a <ngx_worker_process_cycle>, data=0x0,
name=0x489bdb "worker process", respawn=-4) at src/os/unix/ngx_process.c:198
#10 0x0000000000420498 in ngx_start_worker_processes (cycle=0x1303360, n=1, type=-4) at src/os/unix/ngx_process_cycle.c:362
#11 0x0000000000421db1 in ngx_master_process_cycle (cycle=0x1303360) at src/os/unix/ngx_process_cycle.c:249
#12 0x00000000004049fd in main (argc=, argv=) at src/core/nginx.c:412
(gdb) quit
I tried to run the healthcheck on FreeBSD 8.0 and nginx 0.7.67, but after nginx starts it consumes 100% CPU. Using truss on FreeBSD, I found this:
sched_yield(0x8005f9070,0x25f0,0x400,0xfffffffffffffffa,0x200,0x7fffffffe660) = 0 (0x0)
sched_yield(0x8005f9070,0x25f0,0x400,0xfffffffffffffffa,0x200,0x7fffffffe660) = 0 (0x0)
sched_yield(0x8005f9070,0x25f0,0x400,0xfffffffffffffffa,0x200,0x7fffffffe660) = 0 (0x0)
sched_yield(0x8005f9070,0x25f0,0x400,0xfffffffffffffffa,0x200,0x7fffffffe660) = 0 (0x0)
sched_yield(0x8005f9070,0x25f0,0x400,0xfffffffffffffffa,0x200,0x7fffffffe660) = 0 (0x0)
sched_yield(0x8005f9070,0x25f0,0x400,0xfffffffffffffffa,0x200,0x7fffffffe660) = 0 (0x0)
sched_yield(0x8005f9070,0x25f0,0x400,0xfffffffffffffffa,0x200,0x7fffffffe660) = 0 (0x0)
sched_yield(0x8005f9070,0x25f0,0x400,0xfffffffffffffffa,0x200,0x7fffffffe660) = 0 (0x0)
And below is the debug log:
2010/08/05 09:43:51 [debug] 9804#0: healthcheck: Init zone
2010/08/05 09:43:51 [debug] 9804#0: bind() 0.0.0.0:80 #6
2010/08/05 09:43:51 [debug] 9804#0: counter: 000000080060B080, 1
2010/08/05 09:43:54 [alert] 9787#0: worker process 9788 exited on signal 9
2010/08/05 09:43:55 [debug] 9818#0: healthcheck: Init zone
2010/08/05 09:43:55 [debug] 9818#0: bind() 0.0.0.0:80 #6
2010/08/05 09:43:55 [debug] 9818#0: counter: 000000080060B080, 1
2010/08/05 09:43:55 [debug] 9819#0: healthcheck: Init zone
2010/08/05 09:43:55 [debug] 9819#0: bind() 0.0.0.0:80 #6
2010/08/05 09:43:55 [notice] 9819#0: using the "kqueue" event method
2010/08/05 09:43:55 [debug] 9819#0: counter: 000000080060B080, 1
2010/08/05 09:43:55 [notice] 9819#0: nginx/0.7.67
2010/08/05 09:43:55 [notice] 9819#0: OS: FreeBSD 8.0.3-RELEASE
2010/08/05 09:43:55 [notice] 9819#0: kern.osreldate: 800107, built on 800107
2010/08/05 09:43:55 [notice] 9819#0: hw.ncpu: 16
2010/08/05 09:43:55 [notice] 9819#0: net.inet.tcp.sendspace: 32768
2010/08/05 09:43:55 [notice] 9819#0: kern.ipc.somaxconn: 128
2010/08/05 09:43:55 [notice] 9819#0: getrlimit(RLIMIT_NOFILE): 11095:11095
2010/08/05 09:43:55 [debug] 9820#0: write: 7, 00007FFFFFFFE7B0, 5, 0
2010/08/05 09:43:55 [notice] 9820#0: start worker processes
2010/08/05 09:43:55 [debug] 9820#0: channel 3:7
2010/08/05 09:43:55 [notice] 9820#0: start worker process 9821
2010/08/05 09:43:55 [debug] 9820#0: sigsuspend
2010/08/05 09:43:55 [debug] 9821#0: malloc: 0000000801E54000:16384
2010/08/05 09:43:55 [debug] 9821#0: malloc: 0000000801E58000:16384
2010/08/05 09:43:55 [debug] 9821#0: malloc: 0000000801EAA000:16384
2010/08/05 09:43:55 [debug] 9821#0: malloc: 0000000801EAE000:188416
2010/08/05 09:43:55 [debug] 9821#0: malloc: 0000000801EDC000:114688
2010/08/05 09:43:55 [debug] 9821#0: malloc: 0000000801EF8000:114688
2010/08/05 09:43:55 [debug] 9821#0: kevent set event: 6: ft:-1 fl:0005
2010/08/05 09:43:55 [debug] 9821#0: healthcheck: Adding events to worker process 9821
2010/08/05 09:43:55 [debug] 9821#0: event timer add: 0: 5332:1280972640834
2010/08/05 09:43:55 [debug] 9821#0: event timer add: 0: 1927:1280972637429
2010/08/05 09:43:55 [debug] 9821#0: kevent set event: 7: ft:-1 fl:0005
2010/08/05 09:43:55 [debug] 9821#0: worker cycle
2010/08/05 09:43:55 [debug] 9821#0: kevent timer: 1927, changes: 2
2010/08/05 09:43:57 [debug] 9821#0: kevent events: 0
2010/08/05 09:43:57 [debug] 9821#0: timer delta: 1938
2010/08/05 09:43:57 [debug] 9821#0: event timer del: 0: 1280972637429
2010/08/05 09:44:46 [notice] 9820#0: signal 15 (SIGTERM) received, exiting
I have no idea about that yet.
Hello,
Unfortunately, this healthcheck module only works with upstream_round_robin.
Is there a possibility to use it with upstream_ip_hash as well?
We cannot change this on our side because we strongly depend on the upstream_ip_hash module.
How can we solve this issue?
Thanks,
Hans
Hey,
I'm trying to apply this to the latest stable release of nginx and having issues getting the patch to apply cleanly.
nginx-0.7.65 $ patch -p1 --dry-run < modules/healthcheck_nginx_upstreams/nginx.patch
patching file src/http/ngx_http_upstream.c
Hunk #1 succeeded at 4153 (offset 7 lines).
patching file src/http/ngx_http_upstream.h
Hunk #1 succeeded at 105 with fuzz 2 (offset -6 lines).
can't find file to patch at input line 54
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--- src/http/ngx_http_upstream_round_robin.c.orig Mon Mar 8 20:55:32 2010
+++ src/http/ngx_http_upstream_round_robin.c Mon Mar 8 21:03:17 2010
File to patch:
For what it's worth, I get the same error trying to run this against 0.7.62, so maybe I'm doing something wrong. I'm not familiar enough with patch to patch this myself quickly.
Thanks,
Adam
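Since `patch` couldn't find the round-robin files, it's worth seeing exactly what -p (strip) does. A self-contained toy follows (every path and file name in it is invented; nothing touches real nginx sources): -pN removes N leading components from the paths in the patch headers before patch looks for the file, so a header like `--- x/src/file.txt` needs -p1 when run from the directory containing src/.

```shell
# Build a miniature tree and a patch whose header paths carry one
# extra leading component, then apply it with -p1.
mkdir -p demo/src
printf 'old line\n' > demo/src/file.txt

cat > demo/change.patch <<'EOF'
--- x/src/file.txt
+++ x/src/file.txt
@@ -1 +1 @@
-old line
+new line
EOF

# -p1 strips the leading "x/", leaving src/file.txt; -d runs in demo/.
patch -d demo -p1 --dry-run < demo/change.patch   # checks without changing
patch -d demo -p1 < demo/change.patch
cat demo/src/file.txt
```

The "can't find file to patch" prompt above means patch stripped the wrong number of components and ended up with a path that doesn't exist relative to the current directory.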
if (peer->state != NGX_HTTP_CHECK_SEND_DONE) {
    if (ngx_handle_read_event(c->read, 0) != NGX_OK) {
        goto check_recv_fail;
    }
    return;
}
Why does this code need to be added?