
Nginx module for upstream server health checks; supports both stream and http upstreams. This module provides Nginx with active back-end health checking (for both layer-4 and layer-7 back-end servers).


ngx_healthcheck_module's Introduction

ngx-healthcheck-module


(For the Chinese version of this document, see here.)

Health checker for Nginx upstream servers (supports both http and stream upstreams).
This module provides NGINX with active back-end server health checking (it supports health checks for both layer-4 and layer-7 back-end servers).

html status output

Table of Contents

Status

This nginx module is still under development; you can help improve it.

The project is under active development, and you are welcome to contribute code or report bugs. Let's make it better together.

If you have any questions, please contact me:

Description

When you use nginx as a load balancer, nginx natively provides only basic retries to ensure that a request reaches a working back-end server.

In contrast, this third-party nginx module performs proactive health checks of the back-end servers.
It maintains a list of healthy back-end servers and guarantees that new requests are sent directly to a healthy back end.

Key features:

  • Supports health checks for both layer-4 and layer-7 back-end servers
  • Layer-4 (stream) supported check types: tcp / udp / http
  • Layer-7 (http) supported check types: http / fastcgi
  • Provides a unified HTTP status query interface with output formats: html / json / csv / prometheus
  • Supports judging status by HTTP response code or by response body, e.g. check_http_expect_body ~ ".+OK.+"; (a sketch follows this list)
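
As a rough illustration of the body-matching feature above, a check could be configured roughly as follows. This is a sketch only; the upstream name and the /health URI are placeholders, not part of the module's documentation:

upstream api-cluster {
    server 127.0.0.1:9000;

    check interval=3000 rise=2 fall=5 timeout=5000 type=http;
    check_http_send "GET /health HTTP/1.0\r\n\r\n";
    # mark the server up only when the response body matches the regular expression
    check_http_expect_body ~ ".+OK.+";
}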

Installation

git clone https://github.com/nginx/nginx.git
git clone https://github.com/zhouchangxun/ngx_healthcheck_module.git

cd nginx/;
git checkout branches/stable-1.12
git apply ../ngx_healthcheck_module/nginx_healthcheck_for_nginx_1.12+.patch

./auto/configure --with-stream --add-module=../ngx_healthcheck_module/
make && make install

Back to TOC

Usage

nginx.conf example

user  root;
worker_processes  1;
error_log  logs/error.log  info;
#pid        logs/nginx.pid;

events {
    worker_connections  1024;
}

http {
    server {
        listen 80;
        # status interface
        location /status {
            healthcheck_status json;
        }
        # http front
        location / { 
          proxy_pass http://http-cluster;
        }   
    }
    # as a backend server.
    server {
        listen 8080;
        location / {
          root html;
        }
    }
    
    upstream http-cluster {
        # simple round-robin
        server 127.0.0.1:8080;
        server 127.0.0.2:81;

        check interval=3000 rise=2 fall=5 timeout=5000 type=http;
        check_http_send "GET / HTTP/1.0\r\n\r\n";
        check_http_expect_alive http_2xx http_3xx;
    }
}

stream {
    upstream tcp-cluster {
        # simple round-robin
        server 127.0.0.1:22;
        server 192.168.0.2:22;
        check interval=3000 rise=2 fall=5 timeout=5000 default_down=true type=tcp;
    }
    server {
        listen 522;
        proxy_pass tcp-cluster;
    }
    
    upstream udp-cluster {
        # simple round-robin
        server 127.0.0.1:53;
        server 8.8.8.8:53;
        check interval=3000 rise=2 fall=5 timeout=5000 default_down=true type=udp;
    }
    server {
        listen 53 udp;
        proxy_pass udp-cluster;
    }
    
}

status interface

One typical output (json format):

root@changxun-PC:~/nginx-dev/ngx_healthcheck_module# curl localhost/status
{"servers": {
  "total": 6,
  "generation": 3,
  "http": [
    {"index": 0, "upstream": "http-cluster", "name": "127.0.0.1:8080", "status": "up", "rise": 119, "fall": 0, "type": "http", "port": 0},
    {"index": 1, "upstream": "http-cluster", "name": "127.0.0.2:81", "status": "down", "rise": 0, "fall": 120, "type": "http", "port": 0}
  ],
  "stream": [
    {"index": 0, "upstream": "tcp-cluster", "name": "127.0.0.1:22", "status": "up", "rise": 22, "fall": 0, "type": "tcp", "port": 0},
    {"index": 1, "upstream": "tcp-cluster", "name": "192.168.0.2:22", "status": "down", "rise": 0, "fall": 7, "type": "tcp", "port": 0},
    {"index": 2, "upstream": "udp-cluster", "name": "127.0.0.1:53", "status": "down", "rise": 0, "fall": 120, "type": "udp", "port": 0},
    {"index": 3, "upstream": "udp-cluster", "name": "8.8.8.8:53", "status": "up", "rise": 3, "fall": 0, "type": "udp", "port": 0}
  ]
}}
root@changxun-PC:~/nginx-dev/ngx_healthcheck_module# 

or (prometheus format)

root@changxun-PC:~/nginx-dev/ngx_healthcheck_module# curl localhost/status
# HELP nginx_upstream_count_total Nginx total number of servers
# TYPE nginx_upstream_count_total gauge
nginx_upstream_count_total 6
# HELP nginx_upstream_count_up Nginx total number of servers that are UP
# TYPE nginx_upstream_count_up gauge
nginx_upstream_count_up 0
# HELP nginx_upstream_count_down Nginx total number of servers that are DOWN
# TYPE nginx_upstream_count_down gauge
nginx_upstream_count_down 6
# HELP nginx_upstream_count_generation Nginx generation
# TYPE nginx_upstream_count_generation gauge
nginx_upstream_count_generation 1
# HELP nginx_upstream_server_rise Nginx rise counter
# TYPE nginx_upstream_server_rise counter
nginx_upstream_server_rise{index="0",upstream_type="http",upstream="http-cluster",name="127.0.0.1:8082",status="down",type="http",port="0"} 0
nginx_upstream_server_rise{index="1",upstream_type="http",upstream="http-cluster",name="127.0.0.2:8082",status="down",type="http",port="0"} 0
nginx_upstream_server_rise{index="1",upstream_type="stream",upstream="tcp-cluster",name="192.168.0.2:22",status="down",type="tcp",port="0"} 0
nginx_upstream_server_rise{index="2",upstream_type="stream",upstream="udp-cluster",name="127.0.0.1:5432",status="down",type="udp",port="0"} 0
nginx_upstream_server_rise{index="4",upstream_type="stream",upstream="http-cluster2",name="127.0.0.1:8082",status="down",type="http",port="0"} 0
nginx_upstream_server_rise{index="5",upstream_type="stream",upstream="http-cluster2",name="127.0.0.2:8082",status="down",type="http",port="0"} 0
# HELP nginx_upstream_server_fall Nginx fall counter
# TYPE nginx_upstream_server_fall counter
nginx_upstream_server_fall{index="0",upstream_type="http",upstream="http-cluster",name="127.0.0.1:8082",status="down",type="http",port="0"} 41
nginx_upstream_server_fall{index="1",upstream_type="http",upstream="http-cluster",name="127.0.0.2:8082",status="down",type="http",port="0"} 42
nginx_upstream_server_fall{index="1",upstream_type="stream",upstream="tcp-cluster",name="192.168.0.2:22",status="down",type="tcp",port="0"} 14
nginx_upstream_server_fall{index="2",upstream_type="stream",upstream="udp-cluster",name="127.0.0.1:5432",status="down",type="udp",port="0"} 40
nginx_upstream_server_fall{index="4",upstream_type="stream",upstream="http-cluster2",name="127.0.0.1:8082",status="down",type="http",port="0"} 40
nginx_upstream_server_fall{index="5",upstream_type="stream",upstream="http-cluster2",name="127.0.0.2:8082",status="down",type="http",port="0"} 43
# HELP nginx_upstream_server_active Nginx active 1 for UP / 0 for DOWN
# TYPE nginx_upstream_server_active gauge
nginx_upstream_server_active{index="0",upstream_type="http",upstream="http-cluster",name="127.0.0.1:8082",type="http",port="0"} 0
nginx_upstream_server_active{index="1",upstream_type="http",upstream="http-cluster",name="127.0.0.2:8082",type="http",port="0"} 0
nginx_upstream_server_active{index="1",upstream_type="stream",upstream="tcp-cluster",name="192.168.0.2:22",type="tcp",port="0"} 0
nginx_upstream_server_active{index="2",upstream_type="stream",upstream="udp-cluster",name="127.0.0.1:5432",type="udp",port="0"} 0
nginx_upstream_server_active{index="4",upstream_type="stream",upstream="http-cluster2",name="127.0.0.1:8082",type="http",port="0"} 0
nginx_upstream_server_active{index="5",upstream_type="stream",upstream="http-cluster2",name="127.0.0.2:8082",type="http",port="0"} 0
root@changxun-PC:~/nginx-dev/ngx_healthcheck_module# 

Back to TOC

Synopsis

check

Syntax

check interval=milliseconds [fall=count] [rise=count] [timeout=milliseconds] [default_down=true|false] [type=tcp|udp|http] [port=check_port]

Default: interval=30000 fall=5 rise=2 timeout=1000 default_down=true type=tcp

Context: http/upstream || stream/upstream

This directive enables active health checking of the back-end servers in the enclosing upstream block.

Detail

  • interval: the interval, in milliseconds, between health-check probes sent to the backend.
  • fall (fall_count): the server is considered down after fall_count consecutive failures.
  • rise (rise_count): the server is considered up after rise_count consecutive successes.
  • timeout: timeout, in milliseconds, for the health-check request.
  • default_down: the initial state of the server. If true, the server starts as down and is only considered healthy after the check has succeeded rise_count times; if false, it starts as up. The default is true.
  • type: type of health-check probe. The following types are supported:
    • tcp: a plain TCP connection; the backend is considered healthy if the connection succeeds.
    • udp: a UDP packet is sent; the backend is considered unhealthy if an ICMP error (host or port unreachable) is received. (UDP checks are only supported in the stream configuration block.)
    • http: an HTTP request is sent; the backend's health is determined from the status of the reply.

An example:

stream {
    upstream tcp-cluster {
        # simple round-robin
        server 127.0.0.1:22;
        server 192.168.0.2:22;
        check interval=3000 rise=2 fall=5 timeout=5000 default_down=true type=tcp;
    }
    server {
        listen 522;
        proxy_pass tcp-cluster;
    }
    ...
}
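
For a layer-7 style probe on a dedicated health-check port, something along the following lines could be used. This is a sketch only; the port= override, the upstream name and the /healthz URI are illustrative assumptions, not taken from the module's documentation:

http {
    upstream web-cluster {
        server 10.0.0.11:8080;
        server 10.0.0.12:8080;

        # probe a separate management port instead of the traffic port
        check interval=3000 rise=2 fall=5 timeout=5000 type=http port=9000;
        check_http_send "GET /healthz HTTP/1.0\r\n\r\n";
        check_http_expect_alive http_2xx;
    }
    ...
}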

healthcheck

Syntax: healthcheck_status [html|csv|json|prometheus]

Default: healthcheck_status html

Context: http/server/location

An example:

http {
    server {
        listen 80;
        
        # status interface
        location /status {
            healthcheck_status;
        }
     ...
}

You can specify the default display format: html, csv, json or prometheus; the default is html. The format can also be selected per request with the format argument. Assuming your healthcheck_status location is '/status', the format argument changes the format of the returned page. You can do this:

/status?format=html

/status?format=csv

/status?format=json

/status?format=prometheus

You can also fetch only the servers with a given status by using the status argument. For example:

/status?format=json&status=down

/status?format=html&status=down

/status?format=csv&status=up

/status?format=prometheus&status=up
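
A common pattern is a dedicated, access-restricted status location that defaults to the prometheus format for a metrics scraper. A minimal sketch, assuming a separate listen port and a placeholder monitoring network:

server {
    listen 8081;
    location /status {
        healthcheck_status prometheus;
        # restrict the status endpoint to the monitoring network (placeholder range)
        allow 10.0.0.0/8;
        deny all;
    }
}

The format argument described above still overrides the default on a per-request basis.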

Back to TOC

Todo List

  • add test cases.
  • improve code style.
  • feature enhancements.

Back to TOC

Bugs and Patches

Please report bugs

or submit patches by

Back to TOC

Author

Chance Chou (周长勋) [email protected].

Back to TOC

Copyright and License

The health check part is based on Yaoweibin's healthcheck module nginx_upstream_check_module (http://github.com/yaoweibin/nginx_upstream_check_module);

This module is licensed under the BSD license.

Copyright (C) 2017-, by Changxun Zhou [email protected]

Copyright (C) 2014 by Weibin Yao [email protected]

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Back to TOC

See Also

Back to TOC

ngx_healthcheck_module's People

Contributors

denji, glk123, josephmilla, stoffus, taomaree, zhanw15, zhouchangxun


ngx_healthcheck_module's Issues

Log files are generated twice

ngx_healthcheck generates two copies of its log file: one at the log path defined in the configuration file and another at the default path.

Can it be used together with tengine's ngx_http_upstream_check_module?

What is the relationship between this module and tengine's ngx_http_upstream_check_module? Can it replace ngx_http_upstream_check_module?
Since ngx_http_upstream_check_module only provides layer-7 health checks, can the two modules be used together? The directive names are identical; will they conflict?

Integration with dynamic upstream modules

Can you add some functions for integration with dynamic upstream modules?

For adding an upstream peer:
ngx_uint_t ngx_http_upstream_check_add_dynamic_peer(ngx_pool_t *pool, ngx_http_upstream_srv_conf_t *us, ngx_addr_t *peer_addr);

For deleting an upstream peer:
void ngx_http_upstream_check_delete_dynamic_peer(ngx_str_t *name, ngx_addr_t *peer_addr);

For example, I've tried to build ngx_healthcheck_module together with these modules:

https://github.com/xiaokai-wang/nginx-stream-upsync-module
https://github.com/weibocom/nginx-upsync-module

and got an error :-(
A reference health-check project that works with dynamic upstreams is https://github.com/xiaokai-wang/nginx_upstream_check_module, but it can't check TCP upstreams :-(

Support for random load balancing

We recently switched to the "random two" load-balancing method in our nginx setup. Since then we have seen cases where an upstream that is considered down still receives requests. It seems that random load balancing is not supported by this module?

How long is a server marked as unavailable?

Two questions:
1. When a server is marked as unavailable, how long does that state last? Is it the timeout value of the check?

2. When the period for which a server is marked unavailable ends, and a request happens to arrive while a new check is in progress, will the request be forwarded to that server?

With rise>1 and upstream keepalive, the http health check only probes once

After the first successful check, when the backend connection is kept alive, peer->pc.connection != NULL.
The second check then returns early in ngx_http_upstream_check_begin_handler(), so shm->rise_count is never updated.

Below is the patch I made; could you review whether this change is appropriate, or suggest a better fix? Thanks.

--- ngx_healthcheck_module-master/ngx_http_upstream_check_module.c 2020-04-23 10:48:26.000000000 +0800
+++ ngx_http_upstream_check_module.c 2020-04-23 10:50:05.000000000 +0800
@@ -931,6 +931,13 @@

     ngx_add_timer(event, ucscf->check_interval / 2);

+    // wohaiaini
+    if (peer->pc.connection && peer->shm->rise_count < ucscf->rise_count)
+    {
+        ngx_http_upstream_check_connect_handler(event);
+        return;
+    }
+
     /* This process is processing this peer now. */
     if ((peer->shm->owner == ngx_pid ||
         (peer->pc.connection != NULL) ||

@zhouchangxun

Dynamic add/delete of health-checked nodes is not supported, and some bugs were found

Build error on Windows 10 (MSYS 1.0, VC2019 C/C++, Windows 10 SDK)

The build fails on Windows.
Build environment:
msys 1.0
git
perl
Windows 10 SDK
VC2019 C/C++ build tools
nginx 1.16

Steps:
1. Download MSYS-1.0.11:
https://nchc.dl.sourceforge.net/project/mingw/MSYS/Base/msys-core/msys-1.0.11/MSYS-1.0.11.exe
2. Download nginx and ngx_healthcheck_module:
git clone https://github.com/nginx/nginx.git
git clone https://github.com/zhouchangxun/ngx_healthcheck_module.git

  3. Create build and lib directories, and unpack the zlib, PCRE and OpenSSL library sources into the lib directory:
    mkdir objs
    mkdir objs/lib
    cd objs/lib
    tar -xzf ../../pcre-8.44.tar.gz
    tar -xzf ../../zlib-1.2.11.tar.gz
    tar -xzf ../../openssl-1.1.1g.tar.gz

  4. Apply the patch:
    cd nginx/;
    patch -p1 < ../ngx_healthcheck_module/nginx_healthcheck_for_nginx_1.16+.patch

5. Run configure:
auto/configure
--with-cc=cl
--with-debug
--prefix=
--conf-path=conf/nginx.conf
--pid-path=logs/nginx.pid
--http-log-path=logs/access.log
--error-log-path=logs/error.log
--sbin-path=nginx.exe
--http-client-body-temp-path=temp/client_body_temp
--http-proxy-temp-path=temp/proxy_temp
--http-fastcgi-temp-path=temp/fastcgi_temp
--http-scgi-temp-path=temp/scgi_temp
--http-uwsgi-temp-path=temp/uwsgi_temp
--with-cc-opt=-DFD_SETSIZE=1024
--with-pcre=objs/lib/pcre-8.44
--with-zlib=objs/lib/zlib-1.2.11
--with-openssl=objs/lib/openssl-OpenSSL_1_1_1g
--with-openssl-opt=no-asm
--with-http_ssl_module
--with-stream --add-module=../ngx_healthcheck_module/
6. On Windows, run:
nmake

nmake fails with an error (see the screenshot in the original issue).

After installing the module on OpenResty 1.17.8.2, the status page shows 0 servers

OpenResty version 1.17.8.2. Building and installing from source works fine, and the health check itself runs correctly; servers that are down are removed as expected. However, after starting with this configuration, the HTML status page always shows "0 down, 0 total" servers.
After an openresty reload, the error log shows: "2021/12/17 15:21:12 [notice] 14270#14270: [ngx_healthcheck:stream] when init main conf. upstreams num:3"
Configuration:

upstream test{
    server 192.168.10.134:8905;
    server 192.168.10.135:8900;
    server 192.168.10.136:8905;
    check interval=20000 rise=1 fall=3 timeout=5000 default_down=false type=http;
    check_http_expect_alive http_2xx http_3xx;
    check_http_send "POST /getPolicyInfo HTTP/1.0\r\n\r\n";
}
server {
    listen 80;
    server_name 192.168.11.43;
    location /status {
        healthcheck_status;
    }
    location / {
        proxy_set_header X-Forwarded-Host $host:$server_port;
        proxy_set_header X-Forwarded-Server $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $host;
        proxy_cookie_path / "/; httponly; secure; SameSite=Lax";
        proxy_next_upstream error timeout http_502 http_503 http_504 http_500 non_idempotent;
        proxy_connect_timeout 5s;
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
        proxy_pass http://test;
    }
}


Build failure

Hi! I've encountered a build issue while building the v1.0 branch.

make[1]: *** Waiting for unfinished jobs....
/root/nginx_build/ngx_healthcheck_module/ngx_stream_upstream_check_module.c: In function ‘ngx_stream_upstream_check_init_shm_zone’:
/root/nginx_build/ngx_healthcheck_module/ngx_stream_upstream_check_module.c:2158:31: error: ‘ngx_upstream_check_peer_shm_t’ undeclared (first use in this function)
     (number ) * sizeof(ngx_upstream_check_peer_shm_t);//last item not use :)
/root/nginx_build/ngx_healthcheck_module/ngx_stream_upstream_check_module.c:2158:31: note: each undeclared identifier is reported only once for each function it appears in
objs/Makefile:1804: recipe for target 'objs/addon/ngx_healthcheck_module/ngx_stream_upstream_check_module.o' failed

I found that in ngx_stream_upstream_check_module.c and ngx_http_upstream_check_module.c there is a piece of code that causes the build failure because ngx_upstream_check_peer_shm_t is undefined:

https://github.com/zhouchangxun/ngx_healthcheck_module/blob/v1.0/ngx_stream_upstream_check_module.c#L2158

size = sizeof(*peers_shm) + (number ) * sizeof(ngx_upstream_check_peer_shm_t);//last item not use :)

Should I just delete these two lines, or use the source from the master branch?
Thanks.

Is nginx 1.15 supported?

I handed this project to the ops colleagues on our team for build testing. It builds successfully against 1.14 but fails against 1.15.

Segmentation fault in certain cases

Known trigger conditions:

  1. Start with no http upstream configured
  2. Start nginx
  3. Add an http upstream configuration (with a check setting)
  4. Reload nginx: segmentation fault

The backtrace is as follows (please take a look when you have time):

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
__memcmp_sse4_1 () at ../sysdeps/x86_64/multiarch/memcmp-sse4.S:794
794	../sysdeps/x86_64/multiarch/memcmp-sse4.S: 没有那个文件或目录.
(gdb) bt
#0  __memcmp_sse4_1 () at ../sysdeps/x86_64/multiarch/memcmp-sse4.S:794
#1  0x000055b6da068798 in ngx_http_upstream_check_find_shm_peer (addr=0x55b6da44cf30, p=0x55b6da406430)
    at ../ngx_healthcheck_module/ngx_http_upstream_check_module.c:3882
#2  ngx_http_upstream_check_init_shm_zone (shm_zone=0x55b6da40a7d8, data=<optimized out>)
    at ../ngx_healthcheck_module/ngx_http_upstream_check_module.c:3787
#3  0x000055b6d9fe35ec in ngx_init_cycle (old_cycle=old_cycle@entry=0x55b6da406480) at src/core/ngx_cycle.c:484
#4  0x000055b6d9ffae35 in ngx_master_process_cycle (cycle=0x55b6da406480) at src/os/unix/ngx_process_cycle.c:235
#5  0x000055b6d9fd18ca in main (argc=1, argv=<optimized out>) at src/core/nginx.c:382
(gdb) quit

Dynamic add/delete of upstream nodes is not supported, and several bugs were found

nginx-upsync-module

The nginx-upsync-module cannot be used because ngx_http_upstream_check_add_dynamic_peer and ngx_http_upstream_check_delete_dynamic_peer are missing.

git apply fails with errors

As per the title:

PS D:\workspace-mine\nginx> git checkout
Your branch is up-to-date with 'origin/branches/stable-1.14'.
PS D:\workspace-mine\nginx> git apply ..\ngx_healthcheck_module\nginx_healthcheck_for_nginx_1.14+.patch
../ngx_healthcheck_module/nginx_healthcheck_for_nginx_1.14+.patch:9: trailing whitespace.
#if (NGX_HTTP_UPSTREAM_CHECK)
../ngx_healthcheck_module/nginx_healthcheck_for_nginx_1.14+.patch:10: trailing whitespace.
#include "ngx_http_upstream_check_module.h"
../ngx_healthcheck_module/nginx_healthcheck_for_nginx_1.14+.patch:11: trailing whitespace.
#endif
../ngx_healthcheck_module/nginx_healthcheck_for_nginx_1.14+.patch:19: trailing whitespace.
#if (NGX_HTTP_UPSTREAM_CHECK)
../ngx_healthcheck_module/nginx_healthcheck_for_nginx_1.14+.patch:20: trailing whitespace.
        ngx_log_debug1(NGX_LOG_DEBUG_HTTP, pc->log, 0,
error: patch failed: src/http/modules/ngx_http_upstream_hash_module.c:9
error: src/http/modules/ngx_http_upstream_hash_module.c: patch does not apply
error: patch failed: src/http/modules/ngx_http_upstream_ip_hash_module.c:9
error: src/http/modules/ngx_http_upstream_ip_hash_module.c: patch does not apply
error: patch failed: src/http/modules/ngx_http_upstream_least_conn_module.c:9
error: src/http/modules/ngx_http_upstream_least_conn_module.c: patch does not apply
error: patch failed: src/http/ngx_http_upstream_round_robin.c:9
error: src/http/ngx_http_upstream_round_robin.c: patch does not apply
error: patch failed: src/http/ngx_http_upstream_round_robin.h:38
error: src/http/ngx_http_upstream_round_robin.h: patch does not apply
error: patch failed: src/stream/ngx_stream_upstream_hash_module.c:8
error: src/stream/ngx_stream_upstream_hash_module.c: patch does not apply
error: patch failed: src/stream/ngx_stream_upstream_least_conn_module.c:8
error: src/stream/ngx_stream_upstream_least_conn_module.c: patch does not apply
error: patch failed: src/stream/ngx_stream_upstream_round_robin.c:9
error: src/stream/ngx_stream_upstream_round_robin.c: patch does not apply
error: patch failed: src/stream/ngx_stream_upstream_round_robin.h:49
error: src/stream/ngx_stream_upstream_round_robin.h: patch does not apply

UDP health checks cannot correctly detect a "host unreachable" server

For UDP servers on the same subnet (other subnets were not tested), if the server is shut down or the network is disconnected, the health check treats the UDP packet as a normal timeout and considers the server healthy; neither ngx_event_connect_peer() nor ngx_stream_upstream_check_peek_one_byte() catches the error.

Testing shows that un-commenting the code below fixes the problem, but I don't know why it was commented out in the first place; could un-commenting it cause other issues?

/* (changxun): set sock opt "IP_RECVERR" in order to recv icmp error like host/port unreachable. */
/*
note: we have invoked 'ngx_event_connect_peer()' above, so the code we commented out is not required.
int val = 1;
if( setsockopt( c->fd, SOL_IP, IP_RECVERR, &val, sizeof(val) ) == -1 ){
    ngx_log_error(NGX_LOG_ERR, event->log, 0,
                  "setsockopt(IP_RECVERR) failed with peer: %V ",
                  &peer->check_peer_addr->name);
}
*/

Configuration does not take effect

I compiled nginx 1.19.2 with this module added; the build completed without errors.
I added a check directive to an upstream in the stream block; the configuration test passes. The check directive is:
check interval=1000 rise=1 fall=1 timeout=1000 default_down=false type=tcp;

However, healthcheck_status shows 0 nodes under "stream upstream servers"; the json output is:
{"servers": {
"total": 0,
"generation": 4,
"http": [
],
"stream": [
]
}}

The response to /status?format=json&status=down is not valid JSON

Querying the status with /status?format=json&status=down returns output of the form {"http": [ {}, {}, ]}: there is an extra comma before the closing ].

{"servers": { "total": 6, "generation": 3, "http": [ {"index": 0, "upstream": "http-cluster", "name": "127.0.0.1:8080", "status": "up", "rise": 119, "fall": 0, "type": "http", "port": 0}, {"index": 1, "upstream": "http-cluster", "name": "127.0.0.2:81", "status": "down", "rise": 0, "fall": 120, "type": "http", "port": 0}, ], "stream": [ {"index": 0, "upstream": "tcp-cluster", "name": "127.0.0.1:22", "status": "up", "rise": 22, "fall": 0, "type": "tcp", "port": 0}, {"index": 1, "upstream": "tcp-cluster", "name": "192.168.0.2:22", "status": "down", "rise": 0, "fall": 7, "type": "tcp", "port": 0}, {"index": 2, "upstream": "udp-cluster", "name": "127.0.0.1:53", "status": "down", "rise": 0, "fall": 120, "type": "udp", "port": 0}, {"index": 3, "upstream": "udp-cluster", "name": "8.8.8.8:53", "status": "up", "rise": 3, "fall": 0, "type": "udp", "port": 0}, ] }}

UDP check times out in a Docker environment

Inside the same Docker instance:
nc check result:
nc -zvu 172.23.83.193 7660
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 172.23.83.193:7660.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.01 seconds.

Module check result:
[ngx_healthcheck:stream][timer]udp check time out with peer: 172.23.83.193:7660, we assum it's up :)

Can upstream checks support https?

Because the upstream check sends data when probing the back-end server, some services (for example an api-server) keep logging TLS handshake EOF errors. It would be great if the module supported health checks against private https endpoints (something like curl -k https://{node-ip}:{node-port}/url).
Thank you!

healthcheck server status down not closing active TCP connections

Hi,

Thanks for an amazing module. I have a question about closing existing TCP connections.
Is it the expected behavior that when a health check marks a server as "down", only new incoming connections are rejected while existing (active) TCP connections stay open?

Obviously it works fine for UDP where there is no session mechanism.

Error when running make

objs/addon/ngx_healthcheck_module/ngx_stream_upstream_check_module.o
objs/addon/ngx_healthcheck_module/ngx_healthcheck_common.o
objs/addon/ngx_healthcheck_module/ngx_healthcheck_status.o
objs/ngx_modules.o
-ldl -lpthread -lpthread -lcrypt -lpcre -lssl -lcrypto -ldl -lz -lprofiler
-Wl,-E
objs/addon/ngx_healthcheck_module/ngx_stream_upstream_check_module.o: In function `ngx_stream_upstream_check_init_main_conf':
/app/nginx/../ngx_healthcheck_module//ngx_stream_upstream_check_module.c:1776: undefined reference to `ngx_stream_upstream_module'
objs/addon/ngx_healthcheck_module/ngx_stream_upstream_check_module.o: In function `ngx_stream_upstream_check_init_process':
/app/nginx/../ngx_healthcheck_module//ngx_stream_upstream_check_module.c:2227: undefined reference to `ngx_stream_module'
collect2: error: ld returned 1 exit status
make[1]: *** [objs/nginx] Error 1
make[1]: Leaving directory `/app/nginx'
make: *** [build] Error 2

I followed the steps exactly; why do I keep getting this error? I don't know how to fix it.

Is it because I passed too many configure options?

./auto/configure --prefix=/usr/share/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib64/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --http-client-body-temp-path=/var/lib/nginx/tmp/client_body --http-proxy-temp-path=/var/lib/nginx/tmp/proxy --http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi --http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi --http-scgi-temp-path=/var/lib/nginx/tmp/scgi --pid-path=/run/nginx.pid --lock-path=/run/lock/subsys/nginx --user=nginx --group=nginx --with-compat --with-debug --with-file-aio --with-google_perftools_module --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_degradation_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_image_filter_module=dynamic --with-http_mp4_module --with-http_perl_module=dynamic --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-http_xslt_module=dynamic --with-mail=dynamic --with-mail_ssl_module --with-pcre --with-pcre-jit --with-stream=dynamic --with-stream_ssl_module --with-stream_ssl_preread_module --with-threads --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic' --with-ld-opt='-Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -Wl,-E' --add-module=/app/ngx_healthcheck_module

check_http_send doesn't work in stream configuration

Hi! Thank you for a great module!
There seems to be an issue with layer-4 (stream) balancing. I am trying to check the upstream servers of a TCP upstream with an http health check on a custom endpoint, but I cannot set the endpoint with the 'check_http_send' option: requests are always sent to "/" instead of "/my_custom_endpoint". The same configuration works fine when checking servers in a layer-7 (http) upstream.

When a TCP check succeeds, the error log shows (11: Resource temporarily unavailable)

When a TCP check succeeds, the following error is still logged:
[ngx-healthcheck][stream] when recv one byte, recv(): -1, fd: 3 (11: Resource temporarily unavailable)
When a TCP check fails, the error is:
[ngx-healthcheck][stream] when recv one byte, recv(): -1, fd: 3 (111: Connection refused)
For the TCP error case, the nginx error_log level is [info].

Unexpected false negatives

I use http-type checks to track the health of my upstreams. Sometimes the module wrongly marks an upstream as failed even though it received 200 OK. By inspecting a packet dump I noticed the following:

  • if a check is marked as failed (recorded in error.log as check time out with peer), even though 200 OK was received from the remote host, nginx sends an RST immediately after getting the reply
  • if a check is marked as passed, the session is closed normally (FIN/ACK)
  • if I lower the number of nginx workers from auto (40) to 5-10, the false negatives become very rare
  • if I raise the timeout from 2-3 seconds to 20-30 seconds, the false negatives also become very rare

Does each nginx worker run its own checks for the upstream(s), or is there a single process that manages these checks?

TCP checks against MySQL fill the MySQL error log with "Got an error reading communication packets"

It would be great to add a dedicated mysql check in addition to the http and tcp checks. Otherwise, actively health-checking MySQL keeps producing "Got an error reading communication packets" errors in the MySQL error log. Moreover, if host_cache_size = 0 is not set in the MySQL configuration, once the error count reaches max_connect_errors the nginx host's IP is blocked by MySQL, causing application errors.
