netscaler / netscaler-adc-metrics-exporter


Export metrics from Citrix ADC (NetScaler) to Prometheus


netscaler-adc-metrics-exporter's People

Contributors

aroraharsh23, chiradeep, colinmkeith, dependabot[bot], epleterte, krdpk17, rakshith1342, rockaut, ruchit1705, sreejithgs, subashd


netscaler-adc-metrics-exporter's Issues

Enhancement: Multiple ADC instance stats

To monitor multiple NetScaler instances we currently need to run multiple exporters on multiple ports. It would be good to have a single exporter that monitors multiple NetScalers. Are there any plans to include this feature?
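One way the request above could be sketched (an assumption, not the exporter's actual design) is a single process that scrapes a list of ADCs and tags every sample with an `nsip` label, rather than one exporter per appliance; `collect_all` and `fetch` are hypothetical names:

```python
# Hypothetical sketch: one exporter scraping several ADCs, distinguishing
# them by an `nsip` label instead of running on separate ports.
def collect_all(targets, fetch):
    """fetch(nsip) yields (metric_name, value) pairs for one ADC."""
    samples = []
    for nsip in targets:
        for name, value in fetch(nsip):
            # Attach the source appliance as a label on every sample.
            samples.append((name, {"nsip": nsip}, value))
    return samples
```

Prometheus would then aggregate or filter per appliance via the `nsip` label.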

Failed to fetch some metrics

Hi,

I've followed the README, added the user & policy to my NetScaler, and installed this exporter in my Rancher cluster (Docker orchestrator). I get some metrics, although not all; specifically, I am missing the ones I was looking for (service & servicegroup up/down status).

Here is the debug log from the exporter:

2019-08-02T13:09:23+0000 WARNING  Could not collect metric: 'servicegroupmember'
2019-08-02T13:09:23+0000 INFO     Collecting metric ns for w.x.y.z:443
2019-08-02T13:09:23+0000 INFO     Collecting metric lbvserver for w.x.y.z:443
2019-08-02T13:09:23+0000 INFO     Collecting metric protocolip for w.x.y.z:443
2019-08-02T13:09:23+0000 INFO     Collecting metric nscapacity for w.x.y.z:443
2019-08-02T13:09:23+0000 INFO     metrices for lbvserver with k8sprefix "VIP" are not fetched
2019-08-02T13:09:27+0000 INFO     Collecting metric protocoltcp for w.x.y.z:443
2019-08-02T13:09:27+0000 INFO     Collecting metric aaa for w.x.y.z:443
2019-08-02T13:09:27+0000 INFO     Collecting metric service for w.x.y.z:443
2019-08-02T13:09:27+0000 WARNING  Could not collect metric: u'service'
2019-08-02T13:09:27+0000 INFO     Collecting metric csvserver for w.x.y.z:443
2019-08-02T13:09:28+0000 INFO     Collecting metric Interface for w.x.y.z:443
2019-08-02T13:09:28+0000 INFO     Collecting metric system for w.x.y.z:443
2019-08-02T13:09:28+0000 INFO     Collecting metric protocolhttp for w.x.y.z:443
2019-08-02T13:09:28+0000 INFO     Collecting metric ssl for w.x.y.z:443
2019-08-02T13:09:28+0000 INFO     Collecting metric services for w.x.y.z:443

When switching to HTTP and doing a network trace, I can see that the NS answers 200 to most requests and that the answer looks correct, so it seems the exporter is unable to process them. For example:

 curl https://user:password@w.x.y.z/nitro/v1/stat/servicegroup/ServiceGroup_Name?statbindings=yes
{ "errorcode": 0, "message": "Done", "severity": "NONE", "servicegroup": [ { "servicegroupname": "ServiceGroup_Name", "state": "ENABLED", "servicetype": "HTTP" } ] }

Could it be an incompatibility with my hardware/firmware?
Hardware : NSMPX-8000-10G
Firmware : NS12.1 Build 52.15
Exporter versions : 1.0.7 and/or latest (found no release note / version history)
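A minimal sketch of defensive parsing for the servicegroup response above: on some builds the `servicegroupmember` key is absent from the response (as in the curl output shown), which is one plausible reason the exporter raises instead of skipping the entry. `member_stats` is a hypothetical helper, not the exporter's actual code:

```python
# Skip servicegroup entries that have no member list instead of failing.
def member_stats(response):
    stats = []
    for sg in response.get("servicegroup", []):
        # `servicegroupmember` may be missing on some firmware builds.
        for member in sg.get("servicegroupmember", []):
            stats.append((sg.get("servicegroupname"), member))
    return stats
```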

Changelog

Hello,

Could you please create a CHANGELOG file?
Especially for upgrade procedures like added dependencies (e.g. pip install retrying) and similar changes.

Thanks.

system cpu usage percent reported as 4294967295

Describe the bug

We are using this exporter and are seeing invalid data returned by the /nitro/v1/stat/system endpoint.

JSON provided by the endpoint /nitro/v1/stat/system (redacted for brevity):

{ "errorcode": 0, "message": "Done", "severity": "NONE", "system": { "cpuusage": "4294967295", "rescpuusage": "9", "slavecpuusage": "0", "mastercpuusage": "9", "numcpus": "3", "memusagepcnt": 17.192564, "memuseinmb": "897", "addimgmtcpuusagepcnt": 0.000000, "mgmtcpu0usagepcnt": 1.000000, "mgmtcpuusagepcnt": 1.000000, "pktcpuusagepcnt": 8.600000, "cpuusagepcnt": 4294967295.000000, "rescpuusagepcnt": 4294967295.000000 } }

Note that system.cpuusagepcnt is reported as 4294967295.000000, i.e. 2**32 - 1. I suspect there is a divide-by-zero or an underflow in the stats code producing this unsigned 32-bit sentinel. It would be easier to graph CPU usage if this case were reported as zero rather than the sentinel value.

Forgive me if this is the wrong place to raise this bug.
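A hedged sketch of the workaround suggested above: clamp the 32-bit sentinel to zero before exposing the gauge. `clamp_pcnt` is a hypothetical helper, not part of the exporter:

```python
SENTINEL = 2 ** 32 - 1  # 4294967295, returned when the stat is unavailable

def clamp_pcnt(value):
    # Treat the unsigned 32-bit sentinel as "no data" and report 0.0
    # so CPU graphs are not blown out by the bogus value.
    v = float(value)
    return 0.0 if v >= SENTINEL else v
```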

python 2.7

Error while reading config file::'module' object has no attribute 'FullLoader'

Solution: upgrade PyYAML to 5.1 or later (pip install --upgrade PyYAML), the version that introduced FullLoader.

Metric name typo: ip_rx_packers_rate

The metric called ip_rx_packers_rate specified in metrics.json should probably be called ip_rx_packets_rate, like its companion ip_tx_packets_rate.

'NoneType' object has no attribute '__getitem__'

When the exporter is collecting data, some metrics fail with 'NoneType' object has no attribute '__getitem__'

The exporter is running in a docker container, latest version from master (1.0.5), with the following flags:

    - --target-nsip=<target host>
    - --port=9400
    - --username=<user>
    - --password=<password>
    - --secure=yes

The exporter was tested by retrieving the /metrics url.

Enhancement: performance improvements

Depending on the number of content switches and virtual services, a scrape can take up to ~30 s. Maybe there are some easy optimisations that could speed things up?
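One possible optimisation (an assumption, not profiled against the exporter): the per-entity NITRO requests are I/O bound, so issuing them concurrently should cut total scrape time roughly by the pool size. `fetch_all` and `fetch_stat` are hypothetical names:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(entities, fetch_stat, max_workers=8):
    # Fetch each entity's stats in parallel instead of sequentially;
    # fetch_stat(entity) would perform one NITRO request.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(entities, pool.map(fetch_stat, entities)))
```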

Typo in metrics.json

There is a typo in metrics.json :
["avgcltttlb", "citrixadc_lb_avergage_ttlb"] should be ["avgcltttlb", "citrixadc_lb_average_ttlb"] I guess

yaml loader deprecation warning

Describe the bug

exporter.py:22: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(stream)

Additional context
Python 3.6 on RHEL7.7
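The fix for the warning above is to pass an explicit Loader (available in PyYAML >= 5.1); SafeLoader is sufficient for a plain config file. The document string below is a stand-in for the real config file:

```python
import yaml

# Passing Loader= silences the YAMLLoadWarning shown above.
document = "username: exporter\nport: 8888\n"  # stand-in for the config file
config = yaml.load(document, Loader=yaml.SafeLoader)
# equivalently: config = yaml.safe_load(document)
```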

exporter.py has no shebang

There is no shebang line in exporter.py telling the OS which interpreter to use, and the file is not executable.
Fixing this will not affect the current instructions: running python exporter.py will keep working as before, but it will no longer be necessary, and ./exporter.py will be sufficient.

Enhancement: add sslvserver metrics

We found that the statistics for SSL virtual server resources were lacking in the metrics.json file and we would like to be able to collect these.

We have created the following addition for the metrics.json file which allows the collection of these metrics.

    "sslvserver": {
        "counters": [
            ["sslctxtotdecbytes", "citrixadc_sslvserver_decrypt_bytes_total"],
            ["sslctxtotencbytes", "citrixadc_sslvserver_encrypt_bytes_total"],
            ["sslctxtothwdec_bytes", "citrixadc_sslvserver_decrypt_hardware_bytes_total"],
            ["sslctxtothwencbytes", "citrixadc_sslvserver_encrypt_hardware_bytes_total"],
            ["sslctxtotsessionnew", "citrixadc_sslvserver_session_new_total"],
            ["sslctxtotsessionhits", "citrixadc_sslvserver_session_hits_total"],
            ["ssltotclientauthsuccess", "citrixadc_sslvserver_auth_success_total"],
            ["ssltotclientauthfailure", "citrixadc_sslvserver_auth_failure_total"]
        ],
        "gauges": [
            ["vslbhealth", "citrixadc_sslvserver_health"],
            ["actsvcs", "citrixadc_sslvserver_active_services"],
            ["sslclientauthsuccessrate", "citrixadc_sslvserver_auth_success_rate"],
            ["sslclientauthfailurerate", "citrixadc_sslvserver_auth_failure_rate"],
            ["sslctxencbytesrate", "citrixadc_sslvserver_encrypt_bytes_rate"],
            ["sslctxdecbytesrate", "citrixadc_sslvserver_decrypt_bytes_rate"],
            ["sslctxhwencbytesrate", "citrixadc_sslvserver_hw_encrypt_bytes_rate"],
            ["sslctxhwdecbytesrate", "citrixadc_sslvserver_hw_decrypt_bytes_rate"],
            ["sslctxsessionnewrate", "citrixadc_sslvserver_session_new_rate"],
            ["sslctxsessionhitsrate", "citrixadc_sslvserver_session_hits_rate"]
        ],
        "labels": [
            ["vservername", "citrixadc_sslvserver_name"],
            ["type", "citrixadc_sslvserver_type"],
            ["primaryipaddress", "citrixadc_sslvserver_ip"],
            ["state", "citrixadc_sslvserver_state"]
        ]
    }

add custom label

Is it possible to add custom label to each metric, for example, the hostname of the netscaler ?

(We plan on exposing the NITRO API on a different network, so it's not feasible for us to use a hostname instead of an IP in the --target-nsip arg.)

Enhancement: Extract and add label for ServiceGroup member name

Describe the bug
Not a bug, but an enhancement: get all the servicegroupmember names of a servicegroup and add them as a label.
You can get a servicegroup together with its members via the API request:

http://<netscaler-ip-address>/nitro/v1/stat/servicegroup/lbvip1?statbindings=yes

The servicegroupmembername is visible in the servicegroupname field of each member:

{
    "errorcode": 0,
    "message": "Done",
    "severity": "NONE",
    "servicegroup": [
        {
            "servicegroupname": "lbvsg-SERVICEGROUP1-443",
            "state": "ENABLED",
            "servicetype": "SSL",
            "servicegroupmember": [
                {
                    "servicegroupname": "lbvsg-SERVICEGROUP1-443?SERVERNAME?443",
                    "port": 0,
                    "avgsvrttfb": "0",
                    "primaryipaddress": "10.x.y.z",
                    "primaryport": 443,
                    "servicetype": "SSL",
                    "state": "UP",
                    "totalrequests": "1960838",
                    "requestsrate": 0,
                    "totalresponses": "1960787",
                    "responsesrate": 0,
                    "totalrequestbytes": "1296529870",
                    "requestbytesrate": 89,
                    "totalresponsebytes": "3693259016",
                    "responsebytesrate": 109,
                    "curclntconnections": "0",
                    "surgecount": "0",
                    "cursrvrconnections": "0",
                    "svrestablishedconn": "0",
                    "curreusepool": "0",
                    "maxclients": "0",
                    "totsvrttlbtransactions": "0",
                    "toleratingttlbtransactions": "0",
                    "frustratingttlbtransactions": "0"
                },
                {
                ... other servicegroupmember stats
                }

Expected behavior
For every member of a servicegroup it should be possible to have a label with its member name (the server name in NetScaler), because "primaryipaddress" is not very human-friendly in dashboards.
An example usage is a top-10 list of members (by name, not IP address) with the highest TTFB values.
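A sketch of extracting the server name from the member's servicegroupname field shown above, assuming the "<group>?<servername>?<port>" encoding holds for all members (an assumption based only on the sample response; `member_label` is a hypothetical helper):

```python
def member_label(servicegroupname):
    # Member entries encode "<group>?<servername>?<port>"; the middle
    # token is the human-friendly server name requested here.
    parts = servicegroupname.split("?")
    return parts[1] if len(parts) == 3 else servicegroupname
```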

Personalized gauge for AAA sessions

Thank you very much for your work. We would like a custom metric for AAA sessions, for example a gauge counting the number of user and device IPs. Is this possible?

Upgrade dependencies

Please upgrade the project's pip dependencies as well as the Python version.

Sporadic metrics not pulling through

Describe the bug
When checking the metrics exported by the pod, it will occasionally not pull through all the metrics, with probe_success being 0.0.

The host itself is fine, and as soon as you refresh, all the metrics get pulled through.

This also happens when I curl localhost on the pod itself. It causes alerts on our monitoring systems even though the node itself is fine.

Example output when the metrics don't come through.

# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 435.0
python_gc_objects_collected_total{generation="1"} 12.0
python_gc_objects_collected_total{generation="2"} 0.0
# HELP python_gc_objects_uncollectable_total Uncollectable object found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 64.0
python_gc_collections_total{generation="1"} 5.0
python_gc_collections_total{generation="2"} 0.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="8",patchlevel="10",version="3.8.10"} 1.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 3.264512e+07
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 2.6468352e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.71146154557e+09
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 802.16
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 9.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP citrixadc_probe_success probe_success
# TYPE citrixadc_probe_success gauge
citrixadc_probe_success{nsip="pl2-ns-dmz2"} 0.0

To Reproduce
Steps to reproduce the behavior:

  1. Steps - curl localhost:8888 on the pod multiple times until you notice the metrics not being pulled through.
  2. Version of the metrics exporter - 1.4.9
  3. Version of the Citrix ADC MPX/VPX/CPX - NS13.0 92.21.nc
  4. Logs from the metrics exporter

Expected behavior
All metrics pulled through all the time.

Additional context
Add any other context about the problem here.

Exception in thread

From time to time (at least once a week) I get an error that causes thread spawning on the exporter host, and management CPU utilization on the NetScaler goes up to 100%.

Error log:
Exception in thread Thread-6927:
Traceback (most recent call last):
  File "/usr/lib64/python3.4/socketserver.py", line 617, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib64/python3.4/socketserver.py", line 344, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib64/python3.4/socketserver.py", line 673, in __init__
    self.handle()
  File "/usr/lib64/python3.4/http/server.py", line 401, in handle
    self.handle_one_request()
  File "/usr/lib64/python3.4/http/server.py", line 389, in handle_one_request
    method()
  File "/usr/lib/python3.4/site-packages/prometheus_client/exposition.py", line 153, in do_GET
    self.wfile.write(output)
  File "/usr/lib64/python3.4/socket.py", line 398, in write
    return self._sock.send(b)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.4/threading.py", line 911, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.4/threading.py", line 859, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.4/socketserver.py", line 620, in process_request_thread
    self.handle_error(request, client_address)
  File "/usr/lib64/python3.4/socketserver.py", line 360, in handle_error
    print('-'*40)
OSError: [Errno 5] Input/output error

Enhancement: Allow for pulling data from the /nitro/v1/config/

Currently only stats can be obtained from /nitro/v1/stat, but there are other metrics that would be beneficial that can only be obtained from /nitro/v1/config/. It would be awesome if there was a way to specify in the config file to obtain a stat from the config endpoint.
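One way to sketch the request above (an assumption about how it could be configured, not the exporter's actual API): let a metric definition name its API family, and build the NITRO URL accordingly. `nitro_url` is a hypothetical helper:

```python
def nitro_url(nsip, feature, resource=None, secure=True):
    # feature selects between the stat API and the config API,
    # e.g. metrics.json could mark an entity as coming from "config".
    assert feature in ("stat", "config")
    scheme = "https" if secure else "http"
    base = f"{scheme}://{nsip}/nitro/v1/{feature}"
    return f"{base}/{resource}" if resource else base
```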

exporter crashes on NoneType data_items

When looping through data_items from entity_stats, it appears some items are NoneType in my environment. This causes the exporter to crash.

I get this error in the logs:

2019-06-25T10:24:39+0000 ERROR    Exiting: invalid arguments! could not register collector for ['10.0.0.1']::'NoneType' object has no attribute 'keys'

After some debugging, I can see that entity_stats is a list of dicts, but some entries come back as NoneType. After some of these NoneType entries I see dicts again, and later more NoneType entries.
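A minimal guard for the crash described above: drop non-dict entries from entity_stats before the collector iterates over `.keys()`. This is a hedged sketch (`clean_entity_stats` is a hypothetical helper), not the exporter's actual fix:

```python
def clean_entity_stats(entity_stats):
    # Filter out NoneType entries so downstream .keys() access
    # cannot raise 'NoneType' object has no attribute 'keys'.
    return [item for item in entity_stats if isinstance(item, dict)]
```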

Global Load Balancer metrics

Thank you for all the work on this exporter :)

Is it possible to have the following with this exporter:

  • global load balancer metrics
  • underlying IPs for services OR status of services global load balancer is routing to? (this is to alert in Prometheus on DNS failover)

Thank you! really appreciate any help.

Typo in metrics.json

Describe the bug
There is a typo in metrics.json:
["sslcryptoutilizationstat", "citrixadc_ssl_crypto_untilization_stat"],
I think it should be citrixadc_ssl_crypto_utilization_stat

urllib3 is not part of requests anymore

Describe the bug
pip install requests installs the latest version of the library, and on recent versions of requests this no longer works:
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

The resulting warning log fills up the filesystem within days.

To Reproduce
Steps to reproduce the behavior:

  1. Use any ADC with self signed cert
  2. Latest as today
  3. Any
  4. Log:
    /usr/lib/python2.7/site-packages/urllib3/connectionpool.py:986: InsecureRequestWarning: Unverified HTTPS request is being made to host

Expected behavior
No log.

Probable solution
Something along the lines of:
Lines to remove:
-from requests.packages.urllib3.exceptions import InsecureRequestWarning
-requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
-requests.packages.urllib3.disable_warnings(SubjectAltNameWarning)

Lines to add in the proper place:
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
urllib3.disable_warnings(urllib3.exceptions.SubjectAltNameWarning)
