cep21 / healthcheck_nginx_upstreams
Health checks upstreams for nginx
Home Page: http://wiki.nginx.org/NginxHttpHealthcheckModule
# Update

This module is no longer maintained. I recommend using https://github.com/yaoweibin/nginx_upstream_check_module instead. If you're curious about how this module used to work, read ahead:

Healthcheck plugin for nginx. It polls backends, and if they respond with HTTP 200 plus an optional request body, they are marked good. Otherwise, they are marked bad. Similar to haproxy/varnish health checks. For help on all the options, see the docblocks inside the .c file where each option is defined. Note this also gives you access to a health status page that lets you see how well your healthchecks are doing.

==Important==

Nginx gives you full freedom over which server peer to pick when you write an upstream. This means that the healthchecking plugin is only a tool that other upstreams must know about to use. So your upstream code MUST SUPPORT HEALTHCHECKS. It's actually pretty easy to modify the code to support them. See the .h file for how, as well as the upstream_hash patch, which shows how to modify upstream_hash to support healthchecks. For an example plugin modified to support healthchecks, see my modifications to the upstream_hash plugin here: http://github.com/cep21/nginx_upstream_hash/tree/support_http_healthchecks

==Limitations==

The module only supports HTTP 1.0, not 1.1. What that really means is it doesn't understand chunked encoding. You should ask for a 1.0 response with your healthcheck, unless you're sure the upstream won't send back chunked encoding. See the sample config for an example.

==INSTALL==

    # Similar to the upstream_hash module
    cd nginx-0.7.62   # or whatever
    patch -p1 < /path/to/this/directory/nginx.patch
    ./configure --add-module=/path/to/this/directory
    make
    make install

==How the module works==

My first attempt was to spawn a pthread inside the master process, but nginx freaks out on all kinds of levels when you try to have multiple threads running at the same time. Then I thought, fine, I'll just fork my own child.
But that caused lots of issues when I tried to HUP the master process, because my own child wasn't getting signals. These just didn't feel like the nginx way of doing things, so I figured I would work directly with the worker process model.

When each worker process starts, it adds a repeating event to the event tree asking for ownership of a server's healthcheck. When that ownership event comes up, it locks the server's healthcheck and tries to claim it with its pid. If the process can't claim it, it retries later, because the worker that owns it may have died. The worker that does own it inserts a healthcheck event into nginx's event tree. When that triggers, it starts a peer connection to the server and sends and receives data. When the healthcheck finishes, or times out, it updates the shared memory structure and schedules another healthcheck.

A few random issues I had:

1) When nginx tries to shut down, it waits for the event tree to empty out. To get around this, I check for ngx_quit and all kinds of other variables. This means that when you HUP nginx, your worker needs to sit around doing nothing until *something* in the healthcheck event tree comes up, after which it can clear all the healthcheck events and move on. I could fix this if nginx added a per-module callback on HUP, maybe a 'cleanup' or something. The current exit_process callback is called after the event tree is empty, not after a request to shut down a worker.

==Extending==

It should be very easy to extend this module to work with fastcgi or even generic TCP backends. You would just need to change, or abstract out, ngx_http_healthcheck_process_recv. Patches that do that are welcome, and I'm happy to help out with any questions. I'm also happy to help out with extending your upstream-picking modules to work with healthchecks as well.
Your code can even stay compatible with builds that have no healthcheck support by surrounding the changes with #if (NGX_HTTP_HEALTHCHECK).

==Config==

See sample_ngx_config.conf

Author: Jack Lindamood <[email protected]>

==License==

Apache License, Version 2.0
Hello, I want to install this module, but the patch cannot be applied and I can't continue. I hope you can help me, thank you!
Here's my console info and environment:
centos7
nginx1.8
patching file –p1
Hunk #1 FAILED at 4267.
1 out of 1 hunk FAILED -- saving rejects to file –p1.rej
patching file –p1
Hunk #1 FAILED at 106.
1 out of 1 hunk FAILED -- saving rejects to file –p1.rej
patching file –p1
Hunk #1 FAILED at 4.
Hunk #2 FAILED at 12.
Hunk #3 FAILED at 23.
Hunk #4 FAILED at 57.
Hunk #5 FAILED at 365.
Hunk #6 FAILED at 410.
Hunk #7 FAILED at 471.
Hunk #8 FAILED at 576.
Hunk #9 FAILED at 601.
Hunk #10 FAILED at 617.
10 out of 10 hunks FAILED -- saving rejects to file –p1.rej
patching file –p1
Hunk #1 FAILED at 26.
1 out of 1 hunk FAILED -- saving rejects to file –p1.rej
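For anyone hitting the same output: the file name in the log above is literally "–p1" with an en-dash, which suggests the option was pasted with a typographic dash. patch then takes "–p1" as the name of the file to patch rather than as the -p1 option, and every hunk fails against that nonexistent file. A tiny reproduction (the throwaway diff below is invented; this needs GNU patch installed):

```shell
# Feed patch a throwaway diff while passing an en-dash "option";
# patch treats the non-option word as a file-name operand.
printf -- '--- f\n+++ f\n@@ -1 +1 @@\n-a\n+b\n' > demo.patch
patch –p1 < demo.patch > out.txt 2>&1 || true
head -n 1 out.txt
# First line: patching file –p1  (the same symptom as the log above)
```

The README's install step wants an ASCII hyphen: `patch -p1 < nginx.patch`. Even with the right option, note the patch was written against nginx 0.7.x, so some hunks may still fail on 1.8.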
Hi
I'm trying to use this module on Nginx 1.7.10 and getting compile-time errors (sorry, stupidly I didn't capture them). I just wondered if you have any plans to support 1.7.10 (it looked to me like some nginx functionality had changed and broken compatibility).
I've compiled in other modules successfully BTW.
Cheers
Neil
In round-robin upstream mode, the healthcheck doesn't work. Any plan to support this?
Hi,
I'm new to nginx and load balancing in general. I have set up nginx 0.8.54 with your healthcheck module. Is there any way to automatically remove servers from an upstream when they are marked as bad?
Cheers,
Mike
With the hash module included and the following in the config:
upstream myproject {
    server 10.9.8.181:80; # weight=1 max_fails=1;
    server 10.9.8.193:80; # weight=1 max_fails=1;
    hash $filename;
    hash_again 2;
    healthcheck_enabled;
    healthcheck_delay 5000;
    healthcheck_timeout 20000;
    healthcheck_failcount 1;
    #healthcheck_expected 'I_AM_ALIVE';
    healthcheck_send "GET /health HTTP/1.0" 'Host: $http_host';
}
server {
    listen 80;
    server_name "";
    access_log log/localhost.access.log main;
    location / {
        set $filename $request_uri;
        if ($request_uri ~* ".*/(.*)") {
            set $filename $1;
        }
        proxy_set_header Host $http_host;
        proxy_pass http://myproject;
        proxy_connect_timeout 3;
    }
    location /stat {
        healthcheck_status;
    }
}
If the health check file is removed from one of the servers, requests are still sent to that server. The stat page confirms the server is "down".
Perhaps this is not an issue with the health checking but an issue with http://github.com/cep21/nginx_upstream_hash/tree/support_http_healthchecks
Either way, why is this module so difficult to use?
Hey guys,
I'm working on a chef recipe to install this module from source. Is it possible to have a release for this so we can download a tarball from github directly?
Cheers!
Sorry, I posted it in the wrong place. How can I close this issue?
$ patch -p1 < nginx.patch
patching file src/http/ngx_http_upstream.c
Hunk #1 succeeded at 4403 (offset 110 lines).
patching file src/http/ngx_http_upstream.h
Hunk #1 succeeded at 110 (offset 1 line).
patching file src/http/ngx_http_upstream_round_robin.c
Hunk #1 succeeded at 5 (offset 1 line).
Hunk #2 FAILED at 15.
Hunk #4 succeeded at 78 (offset 8 lines).
Hunk #5 succeeded at 392 (offset 3 lines).
Hunk #6 FAILED at 438.
Hunk #7 FAILED at 499.
Hunk #8 FAILED at 608.
Hunk #9 FAILED at 635.
Hunk #10 FAILED at 660.
6 out of 10 hunks FAILED -- saving rejects to file src/http/ngx_http_upstream_round_robin.c.rej
patching file src/http/ngx_http_upstream_round_robin.h
Hunk #1 succeeded at 29 (offset 3 lines).
$
$ uname -rop
2.6.18-194.el5PAE i686 GNU/Linux
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
Nginx version nginx-1.2.1
This is more like a feature request. I've started down the road of making this work, but I wondered if you had already tackled it or could give me some pointers :)
I tried to use the latest stable: nginx/0.8.54
I installed both this healthcheck_nginx_upstreams module and cep21/nginx_upstream_hash. Both patches succeeded.
The configure looked good:
./configure --add-module=../nginx_upstream_hash/ --add-module=../healthcheck_nginx_upstreams/ --with-http_stub_status_module --prefix=/opt/nginx --with-http_ssl_module --error_log=logs/error.log --user=root --group=root
I used the sample nginx configs and have this:
upstream thin_system {
    hash $request_uri;
    hash_again 0;
    healthcheck_enabled;
    healthcheck_delay 250;
    healthcheck_timeout 100;
    healthcheck_failcount 1;
    ## We will just check for HTTP 200
    # healthcheck_expected 'HEALTH_OK';
    healthcheck_send "GET /ajax/health HTTP/1.0" 'Host: healthcheck';
    server ch1-app03:3000;
}
I have the servers pointing to the backend (it was working before; I'm just trying to add the healthcheck), but it always succeeds and I don't see any traffic going out on the healthchecks. (The /ajax/health endpoint returns a 404 right now.)
So I check this:
location /stat {
    healthcheck_status;
}
But it is blank with only the header:
Index Name Owner PID Last action time Concurrent status values Time of concurrent values Last response down Last health status Is down?
It doesn't have a list of servers and all requests are going through.
Unfortunately it is not possible to use it without a new, working upstream_roundrobin or upstream_hash patch.
I tried to run the healthcheck on CentOS 4.6 and nginx 0.8.48.
After starting nginx with the sample config, I accessed "http://localhost:81/stat/".
But I can't get data from localhost:11114.
I have no idea...
In /var/log/messages there are lots of
kernel: nginx[28164]: segfault at 0000000000000018 rip 0000000000410a8f rsp 00007fffce7ff7d0 error 4
messages. When I reload nginx, pid 28164 belongs to the shutting-down nginx worker process.
Every time I reload nginx there are segfaults, unless I delete all the healthcheck directives from its configuration file.
core dump shows this:
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000410a8f in time ()
(gdb) bt
#0 0x0000000000410a8f in time ()
#1 0x0000000000417879 in time ()
#2 0x00000000004177a5 in time ()
#3 0x000000000041c49e in time ()
#4 0x000000000040424b in time ()
#5 0x0000003abac1d994 in __libc_start_main () from /lib64/libc.so.6
#6 0x0000000000402a59 in time ()
#7 0x00007fffce7ffb38 in ?? ()
#8 0x0000000000000000 in ?? ()
I found a core dump file when I used './nginx -s reload':
Reading symbols from /usr/lib64/libcrypto.so.10...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libcrypto.so.10
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libfreebl3.so...(no debugging symbols found)...done.
Loaded symbols for /lib64/libfreebl3.so
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libz.so.1
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /lib64/libnss_dns.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_dns.so.2
Reading symbols from /lib64/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libresolv.so.2
Core was generated by `nginx: w'.
Program terminated with signal 11, Segmentation fault.
#0 ngx_rbtree_min (tree=0x6bde20, node=0x1326598) at src/core/ngx_rbtree.h:76
76 while (node->left != sentinel) {
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64 nss-softokn-freebl-3.12.9-11.el6.x86_64 openssl-1.0.0-27.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 ngx_rbtree_min (tree=0x6bde20, node=0x1326598) at src/core/ngx_rbtree.h:76
#1 ngx_rbtree_delete (tree=0x6bde20, node=0x1326598) at src/core/ngx_rbtree.c:178
#2 0x000000000046c002 in ngx_event_del_timer (log=) at src/event/ngx_event_timer.h:44
#3 ngx_http_healthcheck_clear_events (log=)
at /home/hongwei/CloudListProxy/src/addon/ngx-http-healthcheck-module-0.10/ngx_http_healthcheck_module.c:669
#4 0x000000000046c1ec in ngx_http_healthcheck_mark_finished (stat=0x1326540)
at /home/hongwei/CloudListProxy/src/addon/ngx-http-healthcheck-module-0.10/ngx_http_healthcheck_module.c:318
#5 0x000000000046c57a in ngx_http_healthcheck_read_handler (rev=0x1346720)
at /home/hongwei/CloudListProxy/src/addon/ngx-http-healthcheck-module-0.10/ngx_http_healthcheck_module.c:441
#6 0x0000000000422781 in ngx_epoll_process_events (cycle=0x1303360, timer=, flags=)
at src/event/modules/ngx_epoll_module.c:683
#7 0x000000000041a1f3 in ngx_process_events_and_timers (cycle=0x1303360) at src/event/ngx_event.c:249
#8 0x0000000000421040 in ngx_worker_process_cycle (cycle=0x1303360, data=) at src/os/unix/ngx_process_cycle.c:807
#9 0x000000000041f7cc in ngx_spawn_process (cycle=0x1303360, proc=0x420f4a <ngx_worker_process_cycle>, data=0x0,
name=0x489bdb "worker process", respawn=-4) at src/os/unix/ngx_process.c:198
#10 0x0000000000420498 in ngx_start_worker_processes (cycle=0x1303360, n=1, type=-4) at src/os/unix/ngx_process_cycle.c:362
#11 0x0000000000421db1 in ngx_master_process_cycle (cycle=0x1303360) at src/os/unix/ngx_process_cycle.c:249
#12 0x00000000004049fd in main (argc=, argv=) at src/core/nginx.c:412
(gdb) quit
I tried to run the healthcheck on FreeBSD 8.0 and nginx 0.7.67, but after nginx starts it consumes 100% CPU. Using truss on FreeBSD, I found this:
sched_yield(0x8005f9070,0x25f0,0x400,0xfffffffffffffffa,0x200,0x7fffffffe660) = 0 (0x0)
sched_yield(0x8005f9070,0x25f0,0x400,0xfffffffffffffffa,0x200,0x7fffffffe660) = 0 (0x0)
sched_yield(0x8005f9070,0x25f0,0x400,0xfffffffffffffffa,0x200,0x7fffffffe660) = 0 (0x0)
sched_yield(0x8005f9070,0x25f0,0x400,0xfffffffffffffffa,0x200,0x7fffffffe660) = 0 (0x0)
sched_yield(0x8005f9070,0x25f0,0x400,0xfffffffffffffffa,0x200,0x7fffffffe660) = 0 (0x0)
sched_yield(0x8005f9070,0x25f0,0x400,0xfffffffffffffffa,0x200,0x7fffffffe660) = 0 (0x0)
sched_yield(0x8005f9070,0x25f0,0x400,0xfffffffffffffffa,0x200,0x7fffffffe660) = 0 (0x0)
sched_yield(0x8005f9070,0x25f0,0x400,0xfffffffffffffffa,0x200,0x7fffffffe660) = 0 (0x0)
And below is the debug log:
2010/08/05 09:43:51 [debug] 9804#0: healthcheck: Init zone
2010/08/05 09:43:51 [debug] 9804#0: bind() 0.0.0.0:80 #6
2010/08/05 09:43:51 [debug] 9804#0: counter: 000000080060B080, 1
2010/08/05 09:43:54 [alert] 9787#0: worker process 9788 exited on signal 9
2010/08/05 09:43:55 [debug] 9818#0: healthcheck: Init zone
2010/08/05 09:43:55 [debug] 9818#0: bind() 0.0.0.0:80 #6
2010/08/05 09:43:55 [debug] 9818#0: counter: 000000080060B080, 1
2010/08/05 09:43:55 [debug] 9819#0: healthcheck: Init zone
2010/08/05 09:43:55 [debug] 9819#0: bind() 0.0.0.0:80 #6
2010/08/05 09:43:55 [notice] 9819#0: using the "kqueue" event method
2010/08/05 09:43:55 [debug] 9819#0: counter: 000000080060B080, 1
2010/08/05 09:43:55 [notice] 9819#0: nginx/0.7.67
2010/08/05 09:43:55 [notice] 9819#0: OS: FreeBSD 8.0.3-RELEASE
2010/08/05 09:43:55 [notice] 9819#0: kern.osreldate: 800107, built on 800107
2010/08/05 09:43:55 [notice] 9819#0: hw.ncpu: 16
2010/08/05 09:43:55 [notice] 9819#0: net.inet.tcp.sendspace: 32768
2010/08/05 09:43:55 [notice] 9819#0: kern.ipc.somaxconn: 128
2010/08/05 09:43:55 [notice] 9819#0: getrlimit(RLIMIT_NOFILE): 11095:11095
2010/08/05 09:43:55 [debug] 9820#0: write: 7, 00007FFFFFFFE7B0, 5, 0
2010/08/05 09:43:55 [notice] 9820#0: start worker processes
2010/08/05 09:43:55 [debug] 9820#0: channel 3:7
2010/08/05 09:43:55 [notice] 9820#0: start worker process 9821
2010/08/05 09:43:55 [debug] 9820#0: sigsuspend
2010/08/05 09:43:55 [debug] 9821#0: malloc: 0000000801E54000:16384
2010/08/05 09:43:55 [debug] 9821#0: malloc: 0000000801E58000:16384
2010/08/05 09:43:55 [debug] 9821#0: malloc: 0000000801EAA000:16384
2010/08/05 09:43:55 [debug] 9821#0: malloc: 0000000801EAE000:188416
2010/08/05 09:43:55 [debug] 9821#0: malloc: 0000000801EDC000:114688
2010/08/05 09:43:55 [debug] 9821#0: malloc: 0000000801EF8000:114688
2010/08/05 09:43:55 [debug] 9821#0: kevent set event: 6: ft:-1 fl:0005
2010/08/05 09:43:55 [debug] 9821#0: healthcheck: Adding events to worker process 9821
2010/08/05 09:43:55 [debug] 9821#0: event timer add: 0: 5332:1280972640834
2010/08/05 09:43:55 [debug] 9821#0: event timer add: 0: 1927:1280972637429
2010/08/05 09:43:55 [debug] 9821#0: kevent set event: 7: ft:-1 fl:0005
2010/08/05 09:43:55 [debug] 9821#0: worker cycle
2010/08/05 09:43:55 [debug] 9821#0: kevent timer: 1927, changes: 2
2010/08/05 09:43:57 [debug] 9821#0: kevent events: 0
2010/08/05 09:43:57 [debug] 9821#0: timer delta: 1938
2010/08/05 09:43:57 [debug] 9821#0: event timer del: 0: 1280972637429
2010/08/05 09:44:46 [notice] 9820#0: signal 15 (SIGTERM) received, exiting
I have no idea about that yet.
Hello,
Unfortunately, this healthcheck module only works with upstream_round_robin.
Is there a possibility to use it with upstream_ip_hash as well?
We cannot change this on our side because we strongly depend on the upstream_ip_hash module.
How can we solve this issue?
Thanks,
Hans
Hey,
I'm trying to apply this to the latest stable release of nginx and having issues getting the patch to apply cleanly.
nginx-0.7.65 $ patch -p1 --dry-run < modules/healthcheck_nginx_upstreams/nginx.patch
patching file src/http/ngx_http_upstream.c
Hunk #1 succeeded at 4153 (offset 7 lines).
patching file src/http/ngx_http_upstream.h
Hunk #1 succeeded at 105 with fuzz 2 (offset -6 lines).
can't find file to patch at input line 54
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--- src/http/ngx_http_upstream_round_robin.c.orig Mon Mar 8 20:55:32 2010
+++ src/http/ngx_http_upstream_round_robin.c Mon Mar 8 21:03:17 2010
File to patch:
For what it's worth, I get the same error trying to run this against 0.7.62, so maybe I'm doing something wrong. I'm not familiar enough with patch to patch this myself quickly.
Thanks,
Adam
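Since `patch` couldn't find the round-robin files, it's worth seeing exactly what -p (strip) does. A self-contained toy follows (every path and file name in it is invented; nothing touches real nginx sources): -pN removes N leading components from the paths in the patch headers before patch looks for the file, so a header like `--- x/src/file.txt` needs -p1 when run from the directory containing src/.

```shell
# Build a miniature tree and a patch whose header paths carry one
# extra leading component, then apply it with -p1.
mkdir -p demo/src
printf 'old line\n' > demo/src/file.txt

cat > demo/change.patch <<'EOF'
--- x/src/file.txt
+++ x/src/file.txt
@@ -1 +1 @@
-old line
+new line
EOF

# -p1 strips the leading "x/", leaving src/file.txt; -d runs in demo/.
patch -d demo -p1 --dry-run < demo/change.patch   # checks without changing
patch -d demo -p1 < demo/change.patch
cat demo/src/file.txt
```

The "can't find file to patch" prompt above means patch stripped the wrong number of components and ended up with a path that doesn't exist relative to the current directory.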
if (peer->state != NGX_HTTP_CHECK_SEND_DONE) {
    if (ngx_handle_read_event(c->read, 0) != NGX_OK) {
        goto check_recv_fail;
    }
    return;
}
Why does this code need to be added?