ganglia / gmond_python_modules

Repository of user-contributed Gmond Python DSO metric modules

Home Page: http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_gmond_python_modules

PHP 5.66% Python 94.33% Shell 0.01%

gmond_python_modules's People

rcrowley orieg hirose31 jnewland pdt256 sholiday dhutty osabina thebaldwin marcosmamorim jackywu elfe joshdevins alexishuxley carenas empower bernardl cristim kmullin dushyy frasermcampbell chemikadze phobos182 sunqiang gosquared georgiou gallois formspring jmartelletti npinto tmg-nl nathan-gs drawks sfontes ajsince1986 jimjcollins holdenk lovelysystems evanjfraser egallen sjthespian monkeymantra waxie phrawzty haegongster initrd asinbow funollet alappe atdt jonlives gofullstack johnewart barnybug seacoastboy hassenriahi blackthornedk martinwalsh ajdecon drgonzo65 doctaweeks maplebed xfgong elmer-garduno johann8384 dilchenko mijairaf zwqjsj0404 hc-itops ohlinux xmurobi subroto himanshu31 diaomianren888 noodlesnz jbarber nbald huzheng phulevts santhoshsukumaran mingchen himgod webbob wgencmac amplify-education xueqiu hashar bradcater diyan ch3m deeno35 mnikhil-git thomasalrin jasonthomas tedski dennisvdvliet anikundesu zvictorino arnaudaz kuzalex2

gmond_python_modules's Issues

ipmi.py sends ipmi metrics with forward-slash in their names

This report applies to ganglia 3.7.2 as distributed with CentOS-7.5.

Many errors appear in the logs like this:

gmetad: RRD_create: creating '/var/lib/ganglia/rrds/__SummaryInfo__/ipmi_bb_1_8v_ib_i/o.rrd': No such file or directory
gmetad: Unable to write meta data for metric ipmi_bb_1_8v_ib_i/o to RRD

I'm no Python programmer, but it looks like the routine mangle_metric_name needs a .replace("/","_") added to its last return statement.
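The suggested change could look roughly like the sketch below. The body of `mangle_metric_name` shown here is a hypothetical stand-in, not the module's actual code, and the sensor name is made up to match the log line; only the trailing `.replace("/", "_")` reflects the proposal above.

```python
def mangle_metric_name(raw_name, prefix="ipmi"):
    """Hypothetical stand-in for the module's name-mangling routine."""
    name = raw_name.strip().lower().replace(" ", "_")
    # Proposed addition: gmetad uses the metric name as a file name, so a "/"
    # would be read as a directory separator. Replace it as well.
    return ("%s_%s" % (prefix, name)).replace("/", "_")


print(mangle_metric_name("BB 1_8V IB I/O"))
```

With this change the metric from the log would be reported as ipmi_bb_1_8v_ib_i_o, which is a valid RRD file name.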

redis-gmond plugin problem

Hi,

I want to use the redis-gmond plugin to monitor my Redis cluster, but one node runs two Redis instances. I therefore created two configuration files in /etc/ganglia/conf.d, named "redis-gmond-6379.pyconf" and "redis-gmond-6380.pyconf", and linked them to the files "redis-gmond-6379.py" and "redis-gmond-6380.py" in /ganglia/python_modules. I expected two Redis groups on this node, one monitoring port 6379 and one monitoring port 6380. But when I restart the gmond service, only one Redis group appears.

Could you help me solve this problem and monitor two instances on one node?

Thanks.

yyx

php_fpm get status from unix socket

Please add a feature to php_fpm so that it can get the status from a Unix socket. Currently it seems to support getting the status over TCP only.

The GPU Nvidia module's documentation should make clear what to do with the patch file

The GPU Nvidia module (/ganglia/gmond_python_modules/gpu/nvidia/) should make clear what to do with the patch file ganglia_web.patch. (I don't know the answer.)

/usr/sbin/gmond[4818]: [PYTHON] Can't call the metric_init function in the python module.

procstat.py does not see metrics for sge_qmaster

I have Univa Grid Engine 8.1.7p5 running. The sge_qmaster process is running, and top(1) shows that the process has non-zero CPU and memory usage. However, procstat.py reports 0 for both:

    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND 
  ...
  41856 root      20   0  162m  77m 9580 S 29.7  0.1   1250:27 sge_qmaster

but:

python procstat.py -p sge_qmaster -v '/cm/shared/apps/sge/var/default/spool/qmaster.pid'  -t
 Testing sge_qmaster: /cm/shared/apps/sge/var/default/spool/qmaster.pid
 Processes in this group: 
 PID, ARGS
 41856 /cm/shared/apps/sge/current/bin/lx-amd64/sge_qmaster
 waiting 2 seconds
 procstat_sge_qmaster_mem: 0 KB [The total memory utilization]
 procstat_sge_qmaster_cpu: 0.0 percent [The total percent CPU utilization]

Memory usage is very low, so I can understand the 0 KB reported value. But CPU usage has been constantly > 20%, and goes up to 125%, for the past several hours, and the procstat module does not show any load.

From /proc/$pid/status:

Name:   sge_qmaster
State:  S (sleeping)
Tgid:   41856
Pid:    41856
PPid:   1
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
Utrace: 0
FDSize: 256
Groups: 0 
VmPeak: 67820056 kB
VmSize:   166456 kB
VmLck:         0 kB
VmHWM:  62500364 kB
VmRSS:     67696 kB
VmData:   132300 kB
VmStk:        88 kB
VmExe:      2812 kB
VmLib:      5636 kB
VmPTE:       512 kB
VmSwap:      260 kB
Threads:        13
SigQ:   0/514879
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: fffffffe7ffbfeff
SigIgn: 0000000000010001
SigCgt: 0000000180000000
CapInh: 0000000000000000
CapPrm: ffffffffffffffff
CapEff: ffffffffffffffff
CapBnd: ffffffffffffffff
Cpus_allowed:   ffff,ffffffff
Cpus_allowed_list:      0-47
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list:      0-1
voluntary_ctxt_switches:        335
nonvoluntary_ctxt_switches:     2738

vm_stats module's vm_vmeff metric ends up with error

Hi,
Thanks in advance for a great module!

All metrics other than vm_vmeff work fine, but vm_vmeff appears to fail.
I found the error below in syslog.
[PYTHON] Can't call the metric handler function for [vm_vmeff] in the python module [vm_stats].

Here's my environment.

OS: centos 5.7
$ python -V
Python 2.4.3
$ gmond --version
gmond 3.1.7

Does Ganglia support Email Alert, Port Monitoring and URL Monitoring as well?

Nvidiamodules and Ganglia 3.2.0

The patch for upgrading host_view does not work with the new Ganglia version 3.2.0. The errors seem to be caused by a change in how host_view.php invokes the template file. I have not found a solution yet.

Unsupported NVML Commands

After installation and enabling the python module my node becomes DOWN in ganglia. Testing the python module directly results in the following error:

# python nvidia_smi.py
Traceback (most recent call last):
  File "nvidia_smi.py", line 873, in <module>
    print(XmlDeviceQuery())
  File "nvidia_smi.py", line 503, in XmlDeviceQuery
    strResult += GetClocksThrottleReasons(handle);
  File "nvidia_smi.py", line 184, in GetClocksThrottleReasons
    val = handleError(NVML_ERROR_NOT_SUPPORTED);
  File "nvidia_smi.py", line 196, in handleError
    if (err.value == NVML_ERROR_NOT_SUPPORTED):
AttributeError: 'int' object has no attribute 'value'
[root@snc-gpu site-packages]# python --version
Python 2.7.5

Thoughts?
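The traceback suggests that `handleError` is being passed the plain integer `NVML_ERROR_NOT_SUPPORTED`, while it expects an exception object carrying a `.value` attribute. A minimal sketch of a handler that tolerates both follows. The names `handle_error` and `FakeNVMLError` are hypothetical stand-ins, and the constant's value is assumed for illustration; this is not the code in nvidia_smi.py.

```python
NVML_ERROR_NOT_SUPPORTED = 3  # assumed stand-in for the NVML constant


class FakeNVMLError(Exception):
    """Hypothetical stand-in for an NVML exception carrying an error code."""

    def __init__(self, value):
        Exception.__init__(self, "NVML error %d" % value)
        self.value = value


def handle_error(err):
    # Accept either an exception object with a .value attribute
    # or a bare integer error code.
    code = getattr(err, "value", err)
    if code == NVML_ERROR_NOT_SUPPORTED:
        return "N/A"
    return str(err)


print(handle_error(NVML_ERROR_NOT_SUPPORTED))
print(handle_error(FakeNVMLError(NVML_ERROR_NOT_SUPPORTED)))
```

Both calls take the "not supported" branch, so an unsupported query would produce a placeholder value and not an AttributeError.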

Gmond and modpython.so

I'm not sure if this is the right place to post this:
I get a segmentation fault when the module modpython.so is loaded. The cluster is running CentOS 6.4 and I installed everything from the EPEL repos, yet I can't make this work.

Here is what I added to the gmond config file:

modules {
  ...
  module {
    name = "python_module"
    path = "/usr/lib64/ganglia/modpython.so"
    params = "/usr/lib64/ganglia/python_modules/"
  }
}
include ('/etc/ganglia/conf.d/*.pyconf')

I did not add any Python modules, and I tried removing the ones that were already installed; nothing worked.
Python 2.6.6, gcc 4.7.0, gmond & gmetad 3.7.1

Any advice?

Ganglia shows no data. Why?

I run the following Python script, but in the Ganglia XML file the value is 0. Why?

#!/usr/bin/env python

acpi_file = "./file"

def temp_handler(name):
    try:
        f = open(acpi_file, 'r')
    except IOError:
        return 0

    for l in f:
        line = l.split(':')

    return int(line[1])

def metric_init(params):
    global descriptors, acpi_file

    if 'acpi_file' in params:
        acpi_file = params['acpi_file']

    d1 = {'name': 'temp',
        'call_back': temp_handler,
        'time_max': 90,
        'value_type': 'uint',
        'units': 'C',
        'slope': 'both',
        'format': '%u',
        'description': 'Temperature of host',
        'groups': 'health'}

    descriptors = [d1]

    return descriptors

def metric_cleanup():
    '''Clean up the metric module.'''
    pass

# This code is for debugging and unit testing
if __name__ == '__main__':
    metric_init({})
    for d in descriptors:
        v = d['call_back'](d['name'])
        print('value for %s is %u' % (d['name'], v))

Running the script directly prints: value for temp is 8.
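One possible explanation, which is an assumption and not confirmed in this report: when gmond loads the module, its working directory differs from the one used on the command line, so the relative path "./file" cannot be opened and the handler returns 0 from its IOError branch. The sketch below uses a hypothetical helper, `read_temp`, that resolves the path explicitly and closes the file after reading.

```python
import os


def read_temp(path):
    """Return the integer after ':' on the last matching line, or 0 on failure."""
    # A relative path is interpreted against the daemon's working directory,
    # so resolve it to an absolute path to make the lookup explicit.
    full_path = os.path.abspath(path)
    try:
        with open(full_path, "r") as f:
            lines = [line for line in f if ":" in line]
        return int(lines[-1].split(":")[1])
    except (IOError, IndexError, ValueError):
        return 0
```

Passing an absolute path through the `acpi_file` parameter, which the script already reads in `metric_init`, would avoid depending on the working directory at all.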

MySQL master stats: binlog_space_total wraps around 32bit max value

Hi team!

I have noticed this as MySQL binary logs grew to around 17 GB in size.
Ganglia only reports binlog_space_total as big as 2 GB and then values above that wrap around between 0 and 2 GB.
It is easy to reproduce by hardcoding the mysql_stats['binlog_space_total'] to a value bigger than 2GB in the python code where it gets summed up from the MySQL "SHOW MASTER LOGS" response cursor.

Calling gmetric manually on the command line to push metric values around 17 GB works as expected, and the Ganglia graphs display them accordingly.

Please confirm this.

Thanks!
Alex
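The symptom is consistent with the value being squeezed into a 32-bit integer somewhere between the module and gmond. That is an assumption, since the report does not pinpoint where the truncation happens. The sketch below uses `ctypes` to show what 32-bit truncation does to a total of roughly 17 GB, followed by an illustrative descriptor that reports the metric as a double. The descriptor fields and the handler name are hypothetical, not the module's actual definitions.

```python
import ctypes

binlog_bytes = 17 * 1024 ** 3  # hypothetical total, about 17 GiB

# Truncating to 32 bits keeps only the low-order bits of the value.
wrapped = ctypes.c_uint32(binlog_bytes).value
print(binlog_bytes, wrapped)

# Illustrative descriptor that declares the metric as a double
# instead of a 32-bit unsigned integer.
descriptor = {
    "name": "mysql_binlog_space_total",
    "value_type": "double",
    "format": "%.0f",
    "units": "bytes",
    "slope": "both",
    "description": "Total space used by binary logs",
}


def binlog_space_handler(name):
    # Return a float so that the full magnitude survives.
    return float(binlog_bytes)
```

A double holds integers exactly up to 2**53, which is far beyond any realistic binlog size.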

Mysqld module errors

If I try to add the module to Ganglia, I get this error in the logs:

/usr/sbin/gmond[3861]: [PYTHON] Can't call the metric_init function in the python module [mysql].#012

If I try to run the module from the console:

# python /usr/lib/ganglia/python_modules/mysql.py -u root -p password --no-master --no-slave
Traceback (most recent call last):
  File "/usr/lib/ganglia/python_modules/mysql.py", line 1160, in <module>
    'unix_socket': options.unix_socket,
  File "/usr/lib/ganglia/python_modules/mysql.py", line 1095, in metric_init
    update_stats(REPORT_INNODB, REPORT_MASTER, REPORT_SLAVE)
  File "/usr/lib/ganglia/python_modules/mysql.py", line 266, in update_stats
    mysql_stats[key] = int(global_status[key]) - int(mysql_stats_last[key])
ValueError: invalid literal for int() with base 10: ''
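The ValueError comes from calling `int()` on an empty string, so at least one status value was empty when `update_stats` computed the delta. A defensive conversion might look like the sketch below. The helper `to_int` and the sample dictionaries are hypothetical and are not part of mysql.py.

```python
def to_int(value, default=0):
    """Convert a status value to int, using a default when it is empty or malformed."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


# Hypothetical sample data: one normal counter and one empty value.
global_status = {"Questions": "120", "Some_empty_status": ""}
mysql_stats_last = {"Questions": "100", "Some_empty_status": ""}

delta = dict(
    (key, to_int(global_status[key]) - to_int(mysql_stats_last[key]))
    for key in global_status
)
print(delta)
```

Whether treating an empty value as 0 is right depends on the metric; skipping such keys entirely is another option.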

procstat.py errors

I noticed these on most of my gmond instances:

abrt: [PYTHON] Can't call the metric handler function for [procstat_gmond_cpu] in the python module [procstat].#12

Is anything wrong here?

[PYTHON] Can't find the metric_init function in the python module [varnish].

Hi,

I'm trying to plot Varnish stats with the varnish Python module, but when I run gmond -d 5 I get:

[PYTHON] Can't find the metric_init function in the python module [varnish].

Unable to find any metric information for 'varnish_(.+)'. Possible that a module has not been loaded

Everything is in place and all libraries are linked properly to modpython.so.

Any suggestions?

Thanks

Ayman

repeated errors in the netstats module (tcp_retrans_percentage and tcpext_tcploss_percentage metrics)

Environment:

CentOS 6.2
Python 2.6.6
gmond 3.3.0

We're seeing the following error messages in the syslog every 15 seconds on several CentOS 6.2 systems.

[PYTHON] Can't call the metric handler function for [tcpext_tcploss_percentage] in the python module [netstats].#012
[PYTHON] Can't call the metric handler function for [tcp_retrans_percentage] in the python module [netstats].#012

These are consistently tcpext_tcploss_percentage and tcp_retrans_percentage, on all systems, and other metrics appear to be working fine.

Thanks in advance!

May need to reconsider default elasticsearch/python_modules/elasticsearch.py status API call

The module is over three years old and the default status API call doesn't seem to be compatible with newer versions of ES (1.2+ and now 1.3+). Newer ES versions favor node-specific calls instead:
http://(node-ip):9200/_nodes/(node-ip)/stats

Users are already experiencing this problem:
#158 (comment)

ganglia metrics disappear when ES cluster is busy

I'm not sure whether this is specific to the ES module or pertains to gmond as a whole. When my Elasticsearch cluster is running a deep search through a large data set, the nodes themselves remain up, but Elasticsearch can no longer respond to its APIs. I noticed that the machines then appeared down in Ganglia, even though I could still ssh into the hosts and check load and so on. The hosts and their network interfaces weren't overwhelmed, so I assume that the way the gmond_python_modules in general, or the elasticsearch module in particular, are written causes gmond to hang: gmond will not return other metrics unless the ES module can get the ES stats. Is this assumption correct? If so, any suggestions on how to go about fixing this?
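If the assumption holds (it is not confirmed here), one common way to keep a slow backend from stalling metric callbacks is to poll it in a background thread and let the callbacks return the last known values. The sketch below illustrates that pattern. `CachedPoller` and its `fetch` argument are hypothetical and not part of the elasticsearch module; `fetch` is expected to enforce its own network timeout.

```python
import threading


class CachedPoller(object):
    """Poll a slow source in the background; callbacks read the last known values."""

    def __init__(self, fetch, interval=15.0):
        self._fetch = fetch            # callable returning a dict of metric values
        self._interval = interval
        self._lock = threading.Lock()
        self._values = {}
        self._stop = threading.Event()

    def refresh_once(self):
        try:
            fresh = self._fetch()
        except Exception:
            return False               # keep the previous values on failure
        with self._lock:
            self._values = dict(fresh)
        return True

    def run_forever(self):
        while not self._stop.is_set():
            self.refresh_once()
            self._stop.wait(self._interval)

    def start(self):
        thread = threading.Thread(target=self.run_forever)
        thread.daemon = True
        thread.start()
        return thread

    def stop(self):
        self._stop.set()

    def get(self, name, default=0):
        # Called from the metric callback: never touches the network.
        with self._lock:
            return self._values.get(name, default)
```

A metric callback would then call `poller.get(name)`, which returns immediately even while the backend is unresponsive.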

Can't call the metric handler function for...

Similar to (#88), the systemd logs of my systems are all constantly flooded with streams of messages such as:

journalctl | grep gmond

[PYTHON] Can't call the metric handler function for [udp_inerrors] in the python module [netstats].
[PYTHON] Can't call the metric handler function for [udp_rcvbuferrors] in the python module [netstats].
[PYTHON] Can't call the metric handler function for [tcpext_listendrops] in the python module [netstats].
[PYTHON] Can't call the metric handler function for [tcp_attemptfails] in the python module [netstats].
[PYTHON] Can't call the metric handler function for [tcpext_tcploss_percentage] in the python module [netstats].
[PYTHON] Can't call the metric handler function for [tcp_retrans_percentage] in the python module [netstats].
[PYTHON] Can't call the metric handler function for [tcp_outsegs] in the python module [netstats].
[PYTHON] Can't call the metric handler function for [tcp_insegs] in the python module [netstats].
[PYTHON] Can't call the metric handler function for [udp_indatagrams] in the python module [netstats].
[PYTHON] Can't call the metric handler function for [udp_outdatagrams] in the python module [netstats].

Perhaps the netstats module needs to do some more validation before calling functions?

I'm on debian 9 (ganglia-monitor 3.6.0-7)
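The report does not state why the handlers fail. One plausible cause, assumed here, is that a handler raises when a counter is missing or when a percentage is computed with a zero denominator. The validation suggested above could be sketched as follows; `safe_percentage` and the dictionary keys are hypothetical and do not come from the netstats module.

```python
def safe_percentage(stats, numerator_key, denominator_key):
    """Return numerator/denominator as a percentage, or 0.0 when data is unusable."""
    try:
        numerator = float(stats[numerator_key])
        denominator = float(stats[denominator_key])
    except (KeyError, TypeError, ValueError):
        return 0.0                     # counter missing or not numeric
    if denominator == 0:
        return 0.0                     # no traffic yet, avoid ZeroDivisionError
    return 100.0 * numerator / denominator


print(safe_percentage({"retranssegs": 5, "outsegs": 100}, "retranssegs", "outsegs"))
```

Returning 0.0 hides the underlying condition, so logging the failure once may be preferable to silently swallowing it.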

jenkins module has whacky behavior on SIGINT

saving metadata for metric: tcpext_listendrops host: csb-buildbot-0
Processing a metric value message from csb-buildbot-0
***Allocating value packet for host--csb-buildbot-0-- and metric --tcpext_listendrops-- ****

Processing a metric metadata message from csb-buildbot-0
***Allocating metadata packet for host--csb-buildbot-0-- and metric --tcp_attemptfails-- ****

saving metadata for metric: tcp_attemptfails host: csb-buildbot-0
Processing a metric value message from csb-buildbot-0
***Allocating value packet for host--csb-buildbot-0-- and metric --tcp_attemptfails-- ****

Processing a metric metadata message from csb-buildbot-0
***Allocating metadata packet for host--csb-buildbot-0-- and metric --disk_free_absolute_rootfs-- ****

saving metadata for metric: disk_free_absolute_rootfs host: csb-buildbot-0
Processing a metric value message from csb-buildbot-0
***Allocating value packet for host--csb-buildbot-0-- and metric --disk_free_absolute_rootfs-- ****

Processing a metric metadata message from csb-buildbot-0
***Allocating metadata packet for host--csb-buildbot-0-- and metric --disk_free_percent_rootfs-- ****

saving metadata for metric: disk_free_percent_rootfs host: csb-buildbot-0
Processing a metric value message from csb-buildbot-0
***Allocating value packet for host--csb-buildbot-0-- and metric --disk_free_percent_rootfs-- ****

^Capr_pollset_poll returned unexpected status 4 = Interrupted system call

Traceback (most recent call last):
  File "/usr/lib64/ganglia/python_modules/jenkins.py", line 96, in refresh_metrics
    data = UpdateJenkinsThread._get_jenkins_statistics(self.base_url, self.username, self.apitoken)
  File "/usr/lib64/ganglia/python_modules/jenkins.py", line 66, in _get_jenkins_statistics
    c = urllib2.urlopen(url, None, 2)
  File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 97] Address family not supported by protocol>
Traceback (most recent call last):
  File "/usr/lib64/ganglia/python_modules/jenkins.py", line 104, in refresh_metrics
    for k, v in data.items():
UnboundLocalError: local variable 'data' referenced before assignment
Traceback (most recent call last):
  File "/usr/lib64/ganglia/python_modules/diskstat.py", line 361, in metric_cleanup
    logging.shutdown()
  File "/usr/lib64/python2.6/logging/__init__.py", line 1512, in shutdown
    for h in handlerList[:]:
AttributeError: 'module' object has no attribute 'metric_cleanup'
Traceback (most recent call last):
  File "/usr/lib64/ganglia/python_modules/apache_status.py", line 377, in metric_cleanup
    _Worker_Thread.shutdown()
NameError: global name '_Worker_Thread' is not defined
*** glibc detected *** gmond: double free or corruption (!prev): 0x0000000000f23ee0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x39e1a75e66]
/lib64/libc.so.6[0x39e1a789b3]
/usr/lib64/libapr-1.so.0(apr_allocator_destroy+0x1d)[0x39e2a17f7d]
/usr/lib64/libapr-1.so.0(apr_pool_terminate+0x34)[0x39e2a18b84]
/lib64/libc.so.6(exit+0xe2)[0x39e1a35b22]
/lib64/libc.so.6(__libc_start_main+0x104)[0x39e1a1ed64]
gmond[0x404579]
======= Memory map: ========
00400000-0041c000 r-xp 00000000 fd:00 801322                             /usr/sbin/gmond
0061c000-0061d000 rw-p 0001c000 fd:00 801322                             /usr/sbin/gmond
0061d000-0061e000 rw-p 00000000 00:00 0 
00a93000-01153000 rw-p 00000000 00:00 0                                  [heap]
3394400000-3394483000 r-xp 00000000 fd:00 1310776                        /lib64/libm-2.12.so
3394483000-3394682000 ---p 00083000 fd:00 1310776                        /lib64/libm-2.12.so
3394682000-3394683000 r--p 00082000 fd:00 1310776                        /lib64/libm-2.12.so
3394683000-3394684000 rw-p 00083000 fd:00 1310776                        /lib64/libm-2.12.so
39e1200000-39e1220000 r-xp 00000000 fd:00 1313808                        /lib64/ld-2.12.so
39e141f000-39e1420000 r--p 0001f000 fd:00 1313808                        /lib64/ld-2.12.so
39e1420000-39e1421000 rw-p 00020000 fd:00 1313808                        /lib64/ld-2.12.so
39e1421000-39e1422000 rw-p 00000000 00:00 0 
39e1600000-39e1602000 r-xp 00000000 fd:00 1322010                        /lib64/libdl-2.12.so
39e1602000-39e1802000 ---p 00002000 fd:00 1322010                        /lib64/libdl-2.12.so
39e1802000-39e1803000 r--p 00002000 fd:00 1322010                        /lib64/libdl-2.12.so
39e1803000-39e1804000 rw-p 00003000 fd:00 1322010                        /lib64/libdl-2.12.so
39e1a00000-39e1b8a000 r-xp 00000000 fd:00 1322003                        /lib64/libc-2.12.so
39e1b8a000-39e1d8a000 ---p 0018a000 fd:00 1322003                        /lib64/libc-2.12.so
39e1d8a000-39e1d8e000 r--p 0018a000 fd:00 1322003                        /lib64/libc-2.12.so
39e1d8e000-39e1d8f000 rw-p 0018e000 fd:00 1322003                        /lib64/libc-2.12.so
39e1d8f000-39e1d94000 rw-p 00000000 00:00 0 
39e1e00000-39e1e17000 r-xp 00000000 fd:00 1322008                        /lib64/libpthread-2.12.so
39e1e17000-39e2017000 ---p 00017000 fd:00 1322008                        /lib64/libpthread-2.12.so
39e2017000-39e2018000 r--p 00017000 fd:00 1322008                        /lib64/libpthread-2.12.so
39e2018000-39e2019000 rw-p 00018000 fd:00 1322008                        /lib64/libpthread-2.12.so
39e2019000-39e201d000 rw-p 00000000 00:00 0 
39e2200000-39e222c000 r-xp 00000000 fd:00 1310772                        /lib64/libpcre.so.0.0.1
39e222c000-39e242b000 ---p 0002c000 fd:00 1310772                        /lib64/libpcre.so.0.0.1
39e242b000-39e242c000 rw-p 0002b000 fd:00 1310772                        /lib64/libpcre.so.0.0.1
39e2600000-39e2615000 r-xp 00000000 fd:00 1322005                        /lib64/libz.so.1.2.3
39e2615000-39e2814000 ---p 00015000 fd:00 1322005                        /lib64/libz.so.1.2.3
39e2814000-39e2815000 r--p 00014000 fd:00 1322005                        /lib64/libz.so.1.2.3
39e2815000-39e2816000 rw-p 00015000 fd:00 1322005                        /lib64/libz.so.1.2.3
39e2a00000-39e2a2b000 r-xp 00000000 fd:00 791234                         /usr/lib64/libapr-1.so.0.3.9
39e2a2b000-39e2c2a000 ---p 0002b000 fd:00 791234                         /usr/lib64/libapr-1.so.0.3.9
39e2c2a000-39e2c2c000 rw-p 0002a000 fd:00 791234                         /usr/lib64/libapr-1.so.0.3.9
39e3200000-39e320b000 r-xp 00000000 fd:00 789171                         /usr/lib64/libconfuse.so.0.0.0
39e320b000-39e340b000 ---p 0000b000 fd:00 789171                         /usr/lib64/libconfuse.so.0.0.0
39e340b000-39e340c000 rw-p 0000b000 fd:00 789171                         /usr/lib64/libconfuse.so.0.0.0
39e3600000-39e375d000 r-xp 00000000 fd:00 806218                         /usr/lib64/libpython2.6.so.1.0
39e375d000-39e395c000 ---p 0015d000 fd:00 806218                         /usr/lib64/libpython2.6.so.1.0
39e395c000-39e3998000 rw-p 0015c000 fd:00 806218                         /usr/lib64/libpython2.6.so.1.0
39e3998000-39e39a6000 rw-p 00000000 00:00 0 
39e3a00000-39e3a16000 r-xp 00000000 fd:00 1316044                        /lib64/libresolv-2.12.so
39e3a16000-39e3c16000 ---p 00016000 fd:00 1316044                        /lib64/libresolv-2.12.so
39e3c16000-39e3c17000 r--p 00016000 fd:00 1316044                        /lib64/libresolv-2.12.so
39e3c17000-39e3c18000 rw-p 00017000 fd:00 1316044                        /lib64/libresolv-2.12.so
39e3c18000-39e3c1a000 rw-p 00000000 00:00 0 
39e4200000-39e43b8000 r-xp 00000000 fd:00 808285                         /usr/lib64/libcrypto.so.1.0.1e
39e43b8000-39e45b8000 ---p 001b8000 fd:00 808285                         /usr/lib64/libcrypto.so.1.0.1e
39e45b8000-39e45d3000 r--p 001b8000 fd:00 808285                         /usr/lib64/libcrypto.so.1.0.1e
39e45d3000-39e45df000 rw-p 001d3000 fd:00 808285                         /usr/lib64/libcrypto.so.1.0.1e
39e45df000-39e45e3000 rw-p 00000000 00:00 0 
39e4600000-39e4616000 r-xp 00000000 fd:00 1322045                        /lib64/libnsl-2.12.so
39e4616000-39e4815000 ---p 00016000 fd:00 1322045                        /lib64/libnsl-2.12.so
39e4815000-39e4816000 r--p 00015000 fd:00 1322045                        /lib64/libnsl-2.12.so
39e4816000-39e4817000 rw-p 00016000 fd:00 1322045                        /lib64/libnsl-2.12.so
39e4817000-39e4819000 rw-p 00000000 00:00 0 
39e4e00000-39e4e73000 r-xp 00000000 fd:00 1322011                        /lib64/libfreebl3.so
39e4e73000-39e5072000 ---p 00073000 fd:00 1322011                        /lib64/libfreebl3.so
39e5072000-39e5074000 r--p 00072000 fd:00 1322011                        /lib64/libfreebl3.so
39e5074000-39e5075000 rw-p 00074000 fd:00 1322011                        /lib64/libfreebl3.so
39e5075000-39e5079000 rw-p 00000000 00:00 0 
39e5600000-39e5607000 r-xp 00000000 fd:00 1322012                        /lib64/libcrypt-2.12.so
39e5607000-39e5807000 ---p 00007000 fd:00 1322012                        /lib64/libcrypt-2.12.so
39e5807000-39e5808000 r--p 00007000 fd:00 1322012                        /lib64/libcrypt-2.12.so
39e5808000-39e5809000 rw-p 00008000 fd:00 1322012                        /lib64/libcrypt-2.12.so
39e5809000-39e5837000 rw-p 00000000 00:00 0 
39e5a00000-39e5a26000 r-xp 00000000 fd:00 1322006                        /lib64/libexpat.so.1.5.2
39e5a26000-39e5c25000 ---p 00026000 fd:00 1322006                        /lib64/libexpat.so.1.5.2
Aborted

Stack trace between when I hit C-c and the Abort

[root@csb-buildbot-0 ~]# gstack 15725
Thread 2 (Thread 0x7fbc12432700 (LWP 15734)):
#0  0x00000039e1ae1453 in select () from /lib64/libc.so.6
#1  0x00007fbc14d05219 in ?? () from /usr/lib64/python2.6/lib-dynload/timemodule.so
#2  0x00000039e36d59e4 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#3  0x00000039e36d6b8f in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#4  0x00000039e36d6b8f in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#5  0x00000039e36d7657 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
#6  0x00000039e366acb0 in ?? () from /usr/lib64/libpython2.6.so.1.0
#7  0x00000039e3643c63 in PyObject_Call () from /usr/lib64/libpython2.6.so.1.0
#8  0x00000039e36566af in ?? () from /usr/lib64/libpython2.6.so.1.0
#9  0x00000039e3643c63 in PyObject_Call () from /usr/lib64/libpython2.6.so.1.0
#10 0x00000039e36cfc93 in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.6.so.1.0
#11 0x00000039e37017ba in ?? () from /usr/lib64/libpython2.6.so.1.0
#12 0x00000039e1e079d1 in start_thread () from /lib64/libpthread.so.0
#13 0x00000039e1ae89dd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fbc1c12f740 (LWP 15725)):
#0  0x00000039e1e0d930 in sem_wait () from /lib64/libpthread.so.0
#1  0x00000039e36fd438 in PyThread_acquire_lock () from /usr/lib64/libpython2.6.so.1.0
#2  0x00000039e3701324 in ?? () from /usr/lib64/libpython2.6.so.1.0
#3  0x00000039e36d59e4 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#4  0x00000039e36d7657 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
#5  0x00000039e36d5aa4 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#6  0x00000039e36d7657 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
#7  0x00000039e36d5aa4 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#8  0x00000039e36d6b8f in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#9  0x00000039e36d7657 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
#10 0x00000039e366acb0 in ?? () from /usr/lib64/libpython2.6.so.1.0
#11 0x00000039e3643c63 in PyObject_Call () from /usr/lib64/libpython2.6.so.1.0
#12 0x00000039e3643d51 in ?? () from /usr/lib64/libpython2.6.so.1.0
#13 0x00000039e3644972 in PyObject_CallFunction () from /usr/lib64/libpython2.6.so.1.0
#14 0x00007fbc1525b0ec in pyth_metric_cleanup () from /usr/lib64/ganglia/modpython.so
#15 0x00000039e2a1899e in apr_pool_destroy () from /usr/lib64/libapr-1.so.0
#16 0x00000039e2a18975 in apr_pool_destroy () from /usr/lib64/libapr-1.so.0
#17 0x0000000000408a5c in main ()
[root@csb-buildbot-0 ~]# pgrep gmond
18608
You have new mail in /var/spool/mail/root
[root@csb-buildbot-0 ~]# gstack 18608
Thread 1 (Thread 0x7fcefd47c740 (LWP 18608)):
#0  0x00007fcef21798eb in ?? () from /lib64/libgcc_s.so.1
#1  0x00000039e1b266f6 in dl_iterate_phdr () from /lib64/libc.so.6
#2  0x00007fcef217a207 in _Unwind_Find_FDE () from /lib64/libgcc_s.so.1
#3  0x00007fcef2177603 in ?? () from /lib64/libgcc_s.so.1
#4  0x00007fcef2178119 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
#5  0x00000039e1afea66 in backtrace () from /lib64/libc.so.6
#6  0x00000039e1a7054b in __libc_message () from /lib64/libc.so.6
#7  0x00000039e1a75e66 in malloc_printerr () from /lib64/libc.so.6
#8  0x00000039e1a789b3 in _int_free () from /lib64/libc.so.6
#9  0x00000039e2a17f7d in apr_allocator_destroy () from /usr/lib64/libapr-1.so.0
#10 0x00000039e2a18b84 in apr_pool_terminate () from /usr/lib64/libapr-1.so.0
#11 0x00000039e1a35b22 in exit () from /lib64/libc.so.6
#12 0x00000039e1a1ed64 in __libc_start_main () from /lib64/libc.so.6
#13 0x0000000000404579 in _start ()
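The UnboundLocalError in the jenkins.py traceback above appears to arise because `data` is only assigned when the HTTP fetch succeeds, yet it is used afterwards regardless. A sketch of the usual remedy follows, with hypothetical function and argument names. It is not the module's code and it does not address the double free in gmond itself.

```python
def refresh_metrics(fetch, metrics):
    """Update metrics from fetch(); leave them untouched if the fetch fails."""
    data = {}                          # defined up front so it always exists
    try:
        data = fetch()
    except Exception as exc:           # e.g. URLError when the server is unreachable
        print("fetch failed: %s" % exc)
    for key, value in data.items():
        metrics[key] = value
    return metrics
```

With `data` initialised before the `try`, a failed fetch simply leaves the previous metric values in place.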

passenger stops collecting metrics when trying to kill status process

If running the passenger module as a non-root user with sudo access, when the passenger-status command times out, it tries to run os.kill, which fails with an exception:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib/ganglia/python_modules/passenger.py", line 70, in run
    self.update_metric()
  File "/usr/lib/ganglia/python_modules/passenger.py", line 77, in update_metric
    status_output = timeout_command(self.status, self.timeout)
  File "/usr/lib/ganglia/python_modules/passenger.py", line 193, in timeout_command
    os.kill(process.pid, signal.SIGKILL)
OSError: [Errno 1] Operation not permitted

gmond keeps running, but it no longer collects metrics. Since the original process was run with sudo but the call to kill it wasn't, the kill doesn't work.

The fix might be to use a shell call with sudo to kill the process, increase the timeout, or just ignore the error. Not sure what the best approach would be but any help would be appreciated.

cc: @kmullin
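Of the options mentioned above, ignoring the error is the simplest to sketch. The helper below is hypothetical and not part of passenger.py. It swallows only the two errno values that mean "cannot signal" or "already gone", and re-raises anything else so the collecting thread does not hide unrelated failures.

```python
import errno
import os
import signal


def kill_quietly(pid, sig=signal.SIGKILL, kill=os.kill):
    """Try to signal a process; report failure instead of raising."""
    try:
        kill(pid, sig)
        return True
    except OSError as exc:
        # EPERM: not allowed to signal it (for example, it runs as root).
        # ESRCH: the process has already exited.
        if exc.errno in (errno.EPERM, errno.ESRCH):
            return False
        raise
```

The `kill` argument exists only so the function can be exercised without signalling a real process. Note that the timed-out passenger-status process would keep running in the EPERM case.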

Outstanding ipmi pull requests

The last commit to the ipmi plugin was over two years ago, and there have been three pull requests since then that try to address various shortcomings in the code in various (sometimes mutually exclusive) ways:

May I request that someone act on these pull requests, one way or the other?

How to install?

Is there a guide on how to install a module in Ganglia?
I could not find one anywhere.

GPU custom graphs don't work with default installation in CentOS 6

I installed the nVidia GPU plugin for several nodes that contain Quadro M4000 cards on CentOS 6. All of the metrics worked fine in ganglia-web except for the 5 that utilize the custom graphs in the files added to graph.d. These files are named gpu_... but since the graphics card is identified as gpu0, I was only able to get these graphs to appear by renaming the files gpu0..., and then going into each file and similarly updating the function name, and then hard-coding the $dIndex variable to 0. I tried to go through the PHP files to understand where the core problem was, but the amount of files & scripting is overwhelming.

IPMI plugin has no license

Could you please specify which license the ipmi module is published under?

Sorry if this is obvious to you, but to me it seems that different parts of the repository have different licenses, and no global license file is given. Therefore it's not clear to me under which license the IPMI module can be used...

diskstats module only supports readings for one drive

The diskstats module is only reporting values for one drive.
The commandline/debug output shows that all matched drives are logged but only one drive is stored and forwarded to ganglia.

output format from vmax.py

I am getting this output in rows. Is it possible to get the output in a more readable form?
value for vmax_site1_cache_hits is 11511
value for vmax_site1_fe_reads is 4920
value for vmax_site1_fe_writes is 1224
value for vmax_site1_megabytes_read is 498
value for vmax_site1_megabytes_written is 47
value for vmax_site1_response_time_read is 4
value for vmax_site1_response_time_write is 0
value for vmax_site1_vol_iorate is 5912
value for vmax_site1_0000_tp_21_1_reads is 74
value for vmax_site1_0000_tp_21_1_writes is 112
value for vmax_site1_0000_tp_21_1_response_time_reads is 5
value for vmax_site1_0000_tp_21_1_response_time_writes is 4
value for vmax_site1_0000_tp_21_1_megabytes_read is 4
value for vmax_site1_0000_tp_21_1_megabytes_written is 3
value for vmax_site1_0000_tp_21_2_reads is 93
value for vmax_site1_0000_tp_21_2_writes is 815
value for vmax_site1_0000_tp_21_2_response_time_reads is 5
value for vmax_site1_0000_tp_21_2_response_time_writes is 5
value for vmax_site1_0000_tp_21_2_megabytes_read is 5
value for vmax_site1_0000_tp_21_2_megabytes_written is 45
value for vmax_site2_cache_hits is 22619
value for vmax_site2_fe_reads is 17088
value for vmax_site2_fe_writes is 5129
value for vmax_site2_megabytes_read is 493
value for vmax_site2_megabytes_written is 114
value for vmax_site2_response_time_read is 0
value for vmax_site2_response_time_write is 0
value for vmax_site2_vol_iorate is 22262

apache_status: reqs/sec issue

First, thanks for this module. I've just started using ganglia and this module is a life saver.

Now, to what I found.
I'm not 100% sure, but the requests per second reported on Apache's server-status page seem to be the average req/sec since Apache was started. If that number is used for trending, then after some time the graph line is pretty straight, with few oscillations (at least for me).

I find that the req/sec varies from about 60 to 140 during the day, but the requests per second on the server-status page always stays around 90..100.

I am not a programmer, but I had a look at the module code and made a small change to the ap_hits metric so that it gives me the hits per second instead of the hits since the last collection. That's what I am using for my graphs.

This is what I changed, and it seems to be very accurate. You might want to add an extra metric or change ap_reqs to use this, if you think it makes sense:



---

*** 92,98 ****
                      else:
                          # subtract counter's old value from the new value and
                          # write it
!                         hits = new_value - last_total_accesses
                          self.status["ap_hits"] = hits
                      # store for next time
                      last_total_accesses = new_value
--- 92,100 ----
                      else:
                          # subtract counter's old value from the new value and
                          # write it
!                         #hits = new_value - last_total_accesses
!                       # Modified to get the hit rate instead of total hits
!                         hits = (new_value - last_total_accesses) / self.refresh_rate
                          self.status["ap_hits"] = hits
                      # store for next time
                      last_total_accesses = new_value

Hope this helps and please correct me if what I am saying doesn't make sense.

Bruno
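A note on the change above, as a sketch rather than a patch for apache_status.py: the modules in this repository target Python 2, where `/` between two integers truncates, so dividing by `self.refresh_rate` can lose the fractional part. Dividing by the nominal refresh interval also assumes that collections happen exactly on schedule. The class below (a hypothetical helper, not part of the module) divides the counter delta by the measured elapsed time and reports 0.0 when the counter goes backwards, for example after an Apache restart.

```python
import time


class HitRate(object):
    """Turn a monotonically increasing counter into a per-second rate."""

    def __init__(self):
        self.last_value = None
        self.last_time = None

    def update(self, new_value, now=None):
        """Return hits per second since the previous call (0.0 on the first call)."""
        if now is None:
            now = time.time()
        rate = 0.0
        if self.last_value is not None:
            elapsed = now - self.last_time
            delta = new_value - self.last_value
            # A negative delta means the counter was reset (e.g. Apache restarted).
            if elapsed > 0 and delta >= 0:
                rate = delta / float(elapsed)  # float() avoids Python 2 truncation
        # Store the sample for the next call.
        self.last_value = new_value
        self.last_time = now
        return rate
```

The `now` argument exists only to make the helper easy to exercise with fixed timestamps; a module would normally call `update(total_accesses)` and let it read the clock.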

nginx pymodule config error

Hello,
in nginx_status/conf.d/nginx_status.pyconf

    param status_url {
      value = 'http://nginx_status'
    }

should point to http://localhost/nginx_status (the script's default value), because otherwise it won't work.

Regards,
Sandro

i can't monitor my infiniband network

[root@node4 conf.d]#systemctl status gmond
● gmond.service - Ganglia Monitoring Daemon
Loaded: loaded (/usr/lib/systemd/system/gmond.service; disabled; vendor preset: disabled)
Active: active (running) since Fri 2017-08-25 13:56:35 CST; 2h 2min ago
Process: 22873 ExecStart=/usr/sbin/gmond (code=exited, status=0/SUCCESS)
Main PID: 22874 (gmond)
CGroup: /system.slice/gmond.service
└─22874 /usr/sbin/gmond

Aug 25 13:56:35 node4 /usr/sbin/gmond[22874]: Unable to find any metric information for 'ib_local_link_integrity_errors_([\S]+)'. Possible that a module has not been loaded.
Aug 25 13:56:35 node4 /usr/sbin/gmond[22874]: Unable to find any metric information for 'ib_port_rcv_constraint_errors_([\S]+)'. Possible that a module has not been loaded.
Aug 25 13:56:35 node4 /usr/sbin/gmond[22874]: Unable to find any metric information for 'ib_port_rcv_errors_([\S]+)'. Possible that a module has not been loaded.
Aug 25 13:56:35 node4 /usr/sbin/gmond[22874]: Unable to find any metric information for 'ib_port_rcv_remote_physical_errors_([\S]+)'. Possible that a module has not been loaded.
Aug 25 13:56:35 node4 /usr/sbin/gmond[22874]: Unable to find any metric information for 'ib_port_rcv_switch_relay_errors_([\S]+)'. Possible that a module has not been loaded.
Aug 25 13:56:35 node4 /usr/sbin/gmond[22874]: Unable to find any metric information for 'ib_port_xmit_constraint_errors_([\S]+)'. Possible that a module has not been loaded.
Aug 25 13:56:35 node4 /usr/sbin/gmond[22874]: Unable to find any metric information for 'ib_port_xmit_discards_([\S]+)'. Possible that a module has not been loaded.
Aug 25 13:56:35 node4 /usr/sbin/gmond[22874]: Unable to find any metric information for 'ib_symbol_error_([\S]+)'. Possible that a module has not been loaded.
Aug 25 13:56:35 node4 /usr/sbin/gmond[22874]: Unable to find any metric information for 'ib_vl15_dropped_([\S]+)'. Possible that a module has not been loaded.
Aug 25 13:56:35 node4 /usr/sbin/gmond[22874]: Unable to find any metric information for 'ib_rate_([\S]+)'. Possible that a module has not been loaded.

I have followed all the steps in "InfiniBand monitoring plugin for gmond":
https://github.com/ganglia/gmond_python_modules/tree/master/network/infiniband

Can anyone tell me which module I need to get, and where it is?

apache_status.py - _Worker_Thread is not defined

Not sure where the thread is supposed to be defined.... I'm using the latest version of this file.

Debian GNU/Linux 7 \n \l
gmond 3.6.0 build from src

Traceback (most recent call last):
File "/usr/lib64/ganglia/python_modules/apache_status.py", line 377, in metric_cleanup
_Worker_Thread.shutdown()
NameError: global name '_Worker_Thread' is not defined

php_fpm module metric_init error

Not sure how to fix this...

[PYTHON] Can't call the metric_init function in the python module [php_fpm].

Traceback (most recent call last):
  File "/usr/lib/ganglia/python_modules_enabled/php_fpm.py", line 409, in metric_init
    descriptors = _create_descriptors(params)
  File "/usr/lib/ganglia/python_modules_enabled/php_fpm.py", line 375, in _create_descriptors
    ports = params['ports'].split(',')
KeyError: 'ports'
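The traceback shows that the `params` dictionary passed to `metric_init` has no `ports` key, which suggests that the loaded pyconf does not define a `ports` param. Besides adding the param to the configuration, the lookup could tolerate its absence. A minimal sketch, assuming a default of "9000" (an assumed value chosen for illustration; `get_ports` is a hypothetical helper, not a function in php_fpm.py):

```python
def get_ports(params, default="9000"):
    """Return a list of port strings from a gmond params dict.

    Falls back to `default` (an assumed value) when 'ports' is missing or empty.
    """
    raw = params.get("ports") or default
    # Accept "9000,9001" as well as "9000, 9001" and skip empty entries.
    return [p.strip() for p in raw.split(",") if p.strip()]
```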

No such option name_match in diskfree

/etc/ganglia/conf.d/diskfree.pyconf:15: no such option 'name_match'
Parse error for '/etc/ganglia/gmond.conf'

bind_xml issue

I am getting the following traceback when I attempt to run bind_xml by hand:

Traceback (most recent call last):
File "bind_xml.py", line 244, in <module>
main(sys.argv[1:])
File "bind_xml.py", line 232, in main
v = d'call_back'
File "bind_xml.py", line 123, in get_metric_value
self.update_stats()
File "bind_xml.py", line 108, in update_stats
self.get_bind_reader().get_stats()
File "/usr/lib/python2.7/site-packages/pybindxml/reader.py", line 51, in get_stats
self.stats = XmlV22(self.bs_xml)
File "/usr/lib/python2.7/site-packages/pybindxml/reader.py", line 102, in __init__
super(XmlV22, self).__init__(xml)
File "/usr/lib/python2.7/site-packages/pybindxml/reader.py", line 68, in __init__
self.zone_stats = self.set_zone_stats()
File "/usr/lib/python2.7/site-packages/pybindxml/reader.py", line 142, in set_zone_stats
serial = int(zone.find('serial').string)
ValueError: invalid literal for int() with base 10: '-'

Is this a known issue?
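The last frame shows `int()` being called on the string '-', so at least one zone in the statistics XML apparently reports '-' instead of a numeric serial. The failing call is inside pybindxml, not bind_xml.py, so a fix would belong there. As an illustration only, a tolerant conversion could look like this (`to_int` is a hypothetical helper, not part of pybindxml):

```python
def to_int(text, default=None):
    """Convert text to int, returning `default` for placeholders such as '-'."""
    try:
        return int(text)
    except (TypeError, ValueError):
        # TypeError covers None, ValueError covers non-numeric strings.
        return default
```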

nvidia gpu module install failed

Hello,
After installing the GPU module for Ganglia on ROCKS 6.2, I can't find the graph on the website. When I read the err_log in /var/log/httpd, it says: ERROR: opening '/var/lib/ganglia/rrds/YR-Cluster/cluster.local/gpu_power_usage_report.rrd': No such file or directory. I don't know how to track down my mistake. Can anyone help me?
My e-mail is [email protected]
Thank you very much!

Updates for MongoDB 2.4

I got this email from 10gen the other day. Unfortunately, I haven't been using MongoDB for a while now, so I'm leaving this here with the hope that someone else can take a look at what needs to be changed:

Mike,
We appreciate all the work you have put into building the MongoDB gmond module for Ganglia. We have been hard at work on a slew of improvements in the upcoming 2.4 release of MongoDB.

We wanted you to be aware of a change that might affect your integration. The serverStatus output has changed and includes additional metrics. More complete details can be found at http://docs.mongodb.org/manual/release-notes/2.4/#changes-to-serverstatus-output-including-additional-metrics

We encourage you to download our release candidates for version 2.4 from http://www.mongodb.org/downloads to ensure that your integration works with these changes. See http://docs.mongodb.org/manual/release-notes/2.4/ for a complete set of changes.

Thank you,
Steve & the engineering team at 10gen

Elasticsearch module is not working with upgraded elasticsearch.

I have upgraded Elasticsearch and the ganglia module as well, but the module is not able to register stats. What could be the reason?

The following is the output I am getting:

value for es_index___docs_count is 37694540
value for es_index___size is 282020013146
value for es_heap_committed is None
value for es_heap_used is None
value for es_non_heap_committed is None
value for es_non_heap_used is None
value for es_threads is None
value for es_threads_peak is None
value for es_gc_time is None
value for es_transport_open is None
value for es_transport_rx_count is None
value for es_transport_rx_size is None
value for es_transport_tx_count is None
value for es_transport_tx_size is None
value for es_http_current_open is None
value for es_http_total_open is None
value for es_indices_size is None
value for es_gc_count is None
value for es_merges_current is None
value for es_merges_current_docs is None
value for es_merges_total is None
value for es_merges_total_docs is None
value for es_merges_current_size is None
value for es_merges_total_size is None
value for es_merges_time is None
value for es_refresh_total is None
value for es_refresh_time is None
value for es_docs_count is None
value for es_docs_deleted is None
value for es_open_file_descriptors is None
value for es_cache_field_eviction is None
value for es_cache_field_size is None
value for es_cache_filter_count is None
value for es_cache_filter_evictions is None
value for es_cache_filter_size is None
value for es_query_current is None
value for es_query_time is None
value for es_fetch_current is None
value for es_fetch_total is None
value for es_fetch_time is None
value for es_flush_total is None
value for es_flush_time is None
value for es_get_exists_time is None
value for es_get_exists_total is None
value for es_get_time is None
value for es_get_total is None
value for es_get_missing_time is None
value for es_get_missing_total is None
value for es_indexing_delete_time is None
value for es_indexing_delete_total is None
value for es_indexing_index_time is None
value for es_indexing_index_total is None
value for es_query_total is None

How to monitor AMD GPUs ?

There are monitors for Nvidia GPUs, how about AMD ?

diskstat returns no metrics when devices values is set to empty string in diskstat.conf

There are two possible workarounds: comment out the entire devices parameter in the diskstat conf, or add a check in diskstat.py that tests for an empty string as well as a Python None value for devices. The relevant code is around line 176 (within the get_partitions method).
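The second workaround can be sketched as follows. This is not the code from diskstat.py; `parse_devices` is a hypothetical helper, and it assumes that the devices value is a whitespace-separated list of device names and that returning None means "autodetect".

```python
def parse_devices(devices):
    """Return a list of device names, or None to mean 'autodetect all devices'.

    None, '' and whitespace-only strings are all treated as 'not configured'.
    """
    if devices is None or not devices.strip():
        return None
    # Assumption: names are separated by whitespace, e.g. "sda sdb".
    return devices.split()
```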

nfsstats AttributeError

# gmond -f
[PYTHON] Can't call the metric_init function in the python module [nfsstats].

Traceback (most recent call last):
  File "/usr/lib/ganglia/python_modules/nfsstats.py", line 141, in metric_init
    (ts, value) =  get_value(configtable[i]['prefix'] + name)
  File "/usr/lib/ganglia/python_modules/nfsstats.py", line 184, in get_value
    return (ts, int(m.group(1)))
AttributeError: 'NoneType' object has no attribute 'group'
Unable to find the metric information for 'nfs_v3_getattr'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfs_v3_setattr'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfs_v3_lookup'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfs_v3_access'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfs_v3_readlink'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfs_v3_read'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfs_v3_write'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfs_v3_create'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfs_v3_mkdir'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfs_v3_symlink'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfs_v3_mknod'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfs_v3_remove'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfs_v3_rmdir'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfs_v3_rename'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfs_v3_link'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfs_v3_readdir'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfs_v3_readdirplus'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfs_v3_fsstat'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfs_v3_fsinfo'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfs_v3_pathconf'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfs_v3_commit'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfsd_v3_getattr'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfsd_v3_setattr'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfsd_v3_lookup'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfsd_v3_access'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfsd_v3_readlink'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfsd_v3_read'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfsd_v3_write'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfsd_v3_create'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfsd_v3_mkdir'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfsd_v3_symlink'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfsd_v3_mknod'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfsd_v3_remove'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfsd_v3_rmdir'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfsd_v3_rename'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfsd_v3_link'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfsd_v3_readdir'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfsd_v3_readdirplus'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfsd_v3_fsstat'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfsd_v3_fsinfo'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfsd_v3_pathconf'. Possible that the module has not been loaded.

Unable to find the metric information for 'nfsd_v3_commit'. Possible that the module has not been loaded.

# cat /proc/net/rpc/nfs
net 0 0 0 0
rpc 18325608 4 0
proc2 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
proc3 22 0 11116973 172815 779538 3775010 3271 1395612 408022 96132 2332 18288 0 44932 2000 149604 1634 1779 48591 1120 1854 927 306968
proc4 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

# uname -r
2.6.32.41

# gmond -V
gmond 3.4.0
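The AttributeError means that a regular expression search in `get_value` returned None, i.e. one of the module's patterns did not match this /proc/net/rpc/nfs content. Which pattern failed cannot be told from the traceback alone. A guard like the one below would at least let `metric_init` complete; it is a sketch with a hypothetical helper name and a default of 0, not the module's actual code.

```python
import re


def find_counter(text, pattern, default=0):
    """Return the first capture group of `pattern` in `text` as an int.

    Returns `default` instead of raising when the pattern does not match.
    """
    m = re.search(pattern, text, re.MULTILINE)
    if m is None:
        return default
    return int(m.group(1))
```

With the first lines of the file shown above, `find_counter(text, r"^rpc (\d+)")` picks up the first rpc counter, while a pattern for a line that does not exist falls back to the default.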

cpu_temp: wrong directory

Dear all,

I've tried to use the cpu_temp module. I'm using openSUSE and the temperature information, e.g. "temp?_input" can be found under "/sys/devices/platform/coretemp.?/hwmon/hwmon?", where the question marks "?" should be replaced by a number.

The current Python script only looks for the input files in the directory "/sys/devices/platform/coretemp.?/".

Can you please add support for searching in subdirectories?

mysqld python module has error "Can't call the metric_init function in the python module"

When I use the mysql python module I get an error. Output of tail -f /var/log/message:
Sep 14 09:51:53 localhost /usr/sbin/gmond[24970]: [PYTHON] Can't call the metric_init function in the python module [mysql_mo].#12
Sep 14 09:51:53 localhost abrt: Unable to find any metric information for 'filechecks_(.+)'. Possible that a module has not been loaded.#12
....
Running gmond -m shows that python_module has been loaded, but the mysql metrics cannot be found.

Running gmond -d2 shows no error.

mysql.pyconf:

    modules {
      module {
        name = 'mysql_mo'
        language = 'python'
        param host {
          value = 'localhost'
        }
        param user {
          value = 'root'
        }
        param passwd {
          value = 'root'
        }
        param get_innodb {
          value = True
        }
        param get_master {
          value = False
        }
        param get_slave {
          value = False
        }
        param delta_per_second {
          value = True
        }
      }
    }

My mysql_mo.py in /usr/lib64/ganglia/python_modules is unmodified.
Does anyone know how to fix this?
Thanks!

Use of sudo within python plugins in Ganglia

Using https://github.com/ganglia/gmond_python_modules/tree/master/gpfs to monitor gpfs aka Spectrum Scale 4.2.3

On the first call to mmpmon, sudo is invoked by root, so root is sudoing to root and therefore does not pick up the !requiretty flag.

Once it has started, the plugin is downgraded to the ganglia user, and the !requiretty setting from the readme file works.

If I add
Defaults:root !requiretty
or
Defaults!/usr/lpp/mmfs/bin/mmpmon !requiretty
it works fine.

This was found after quite a bit of puzzled looking around and needing to enable the debug options in sudo.

I think this is a security issue in how plugins operate in general rather than in just this one plugin, and hence it affects other plugins.

This is running Ganglia 3.7.2 under CentOS 7.3, where ganglia runs as the ganglia user.

Error when using the ganglia mysqld module to monitor MariaDB

Hello, author
I am using the ganglia mysqld module to monitor MariaDB and get the following error. I hope it can be solved as soon as possible!
Starting GANGLIA gmond: [PYTHON] Can't call the metric_init function in the python module [mysql].

Traceback (most recent call last):
File "/usr/lib64/ganglia/python_modules/mysql.py", line 1092, in metric_init
update_stats(REPORT_INNODB, REPORT_MASTER, REPORT_SLAVE)
File "/usr/lib64/ganglia/python_modules/mysql.py", line 139, in update_stats
innodb_status = parse_innodb_status(cursor.fetchone()[2].split('\n'), innodb_version)
File "/usr/lib64/ganglia/python_modules/DBUtil.py", line 123, in parse_innodb_status
innodb_status['transactions'] += longish(istatus[3])
File "/usr/lib64/ganglia/python_modules/DBUtil.py", line 74, in longish
return longish(x[:-1])
File "/usr/lib64/ganglia/python_modules/DBUtil.py", line 74, in longish
return longish(x[:-1])
File "/usr/lib64/ganglia/python_modules/DBUtil.py", line 74, in longish
return longish(x[:-1])
File "/usr/lib64/ganglia/python_modules/DBUtil.py", line 74, in longish
return longish(x[:-1])
File "/usr/lib64/ganglia/python_modules/DBUtil.py", line 74, in longish
return longish(x[:-1])
File "/usr/lib64/ganglia/python_modules/DBUtil.py", line 74, in longish
return longish(x[:-1])
File "/usr/lib64/ganglia/python_modules/DBUtil.py", line 74, in longish
return longish(x[:-1])
File "/usr/lib64/ganglia/python_modules/DBUtil.py", line 76, in longish
raise ValueError
ValueError
Unable to find the metric information for 'mysql_innodb_pending_normal_aio_reads'. Possible that the module has not been loaded.

Unable to find the metric information for 'mysql_innodb_pending_log_writes'. Possible that the module has not been loaded.

Unable to find the metric information for 'mysql_innodb_transactions'. Possible that the module has not been loaded.

Unable to find the metric information for 'mysql_innodb_pending_chkp_writes'. Possible that the module has not been loaded.

Unable to find the metric information for 'mysql_innodb_rows_updated'. Possible that the module has not been loaded.

Unable to find the metric information for 'mysql_innodb_data_fsyncs'. Possible that the module has not been loaded.

Unable to find the metric information for 'mysql_innodb_spin_rounds'. Possible that the module has not been loaded.

Unable to find the metric information for 'mysql_innodb_spin_waits'. Possible that the module has not been loaded.

Unable to find the metric information for 'mysql_innodb_pending_aio_log_ios'. Possible that the module has not been loaded.

Unable to find the metric information for 'mysql_innodb_pending_log_flushes'. Possible that the module has not been loaded.

Unable to find the metric information for 'mysql_innodb_rows_read'. Possible that the module has not been loaded.

Unable to find the metric information for 'mysql_innodb_transactions_unpurged'. Possible that the module has not been loaded.

Unable to find the metric information for 'mysql_innodb_queries_queued'. Possible that the module has not been loaded.

Unable to find the metric information for 'mysql_innodb_pending_ibuf_aio_reads'. Possible that the module has not been loaded.

Unable to find the metric information for 'mysql_innodb_pending_normal_aio_writes'. Possible that the module has not been loaded.

Unable to find the metric information for 'mysql_innodb_pages_read'. Possible that the module has not been loaded.

Unable to find the metric information for 'mysql_innodb_read_views'. Possible that the module has not been loaded.

Unable to find the metric information for 'mysql_innodb_buffer_pool_pages_data'. Possible that the module has not been loaded.

Unable to find the metric information for 'mysql_innodb_data_reads'. Possible that the module has not been loaded.

Unable to find the metric information for 'mysql_innodb_queries_inside'. Possible that the module has not been loaded.

Unable to find the metric information for 'mysql_innodb_log_bytes_written'. Possible that the module has not been loaded.
.....................................

MySQL itself is running normally.
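From the traceback, `longish` in DBUtil.py strips one trailing character at a time and raises ValueError once nothing numeric is left. That suggests the field it was given (`istatus[3]`) is not a number in this MariaDB version's InnoDB status output. The sketch below shows one tolerant alternative; `leading_int` is a hypothetical helper, not the DBUtil.py implementation, and silently returning a default would hide the parsing mismatch rather than fix it.

```python
import re

_LEADING_INT = re.compile(r"\s*(-?\d+)")


def leading_int(text, default=0):
    """Return the integer at the start of `text`, or `default` if there is none."""
    m = _LEADING_INT.match(text)
    return int(m.group(1)) if m else default
```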

[Ganglia restart extended metrics data lost]

Linux CentOS 7, Ganglia 3.7.2, using Python extended metrics.

When gmetad and gmond start at the same time, all metrics are collected normally.

If gmond is stopped and then restarted, the extended metrics are lost with high probability: there is no new data, or no RRD files are created. Restarting gmond or gmetad repeatedly eventually generates data or RRD files again.

DNS and /etc/hosts are properly configured, the Ganglia log shows no errors, and gmond is sending data over UDP, so my guess is that the problem is on gmetad's TCP side.

What is the reason?

Update mysqld module for MySQL 5.6

There are some variable changes in MySQL 5.6 which stop some of the mysqld stats from showing up in ganglia. This needs to be handled delicately as it needs to be backward compatible.

elasticsearch/python_modules/elasticsearch.py fails if ES not running on init

The elasticsearch module fails to initialize when it is unable to connect to elasticsearch, causing the module not to be loaded (fails in module_init()).

This is due to the descriptors being discovered in module_init(). It seems as though the ES module needs to be refactored to have the actual metric "discovery" code in a separate method, then simply keeping an internal variable which checks to see whether it has been properly initialized or not. That variable could be checked before any subsequent metric collection runs, which would allow it to properly weather starting up without a running ES instance.
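The refactoring described above could look roughly like this. It is a sketch under assumptions, not the elasticsearch.py code: `LazyCollector` and the `fetch` callable are hypothetical, and it presumes the metric names can be declared without contacting Elasticsearch so that descriptors can still be returned at startup.

```python
import time


class LazyCollector(object):
    """Defer talking to the backend until it is reachable."""

    def __init__(self, fetch, interval=30):
        self.fetch = fetch          # hypothetical callable returning a dict of stats
        self.interval = interval    # seconds between fetch attempts
        self.initialized = False    # the "properly initialized" flag
        self.stats = {}
        self.next_attempt = 0.0

    def refresh(self, now=None):
        """Fetch stats at most once per interval; never raise."""
        now = time.time() if now is None else now
        if now < self.next_attempt:
            return
        self.next_attempt = now + self.interval
        try:
            self.stats = self.fetch()
            self.initialized = True
        except Exception:
            # Backend is down: remember that and retry after the interval.
            self.initialized = False

    def value(self, name, default=0, now=None):
        """Metric callback body: return `default` until the backend answers."""
        self.refresh(now)
        if not self.initialized:
            return default
        return self.stats.get(name, default)
```

With this shape, starting gmond before Elasticsearch only produces default values until the first successful fetch, instead of failing module initialization.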

Support for aggregate GPU usage statistics

It'd be nice to be able to estimate a total utilization across the many GPUs in a cluster (e.g. overall percent utilization, overall memory utilization, etc). It seems possible to implement by adding to this file
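As an illustration of what such an aggregate might compute, here is a small sketch. The function and the dictionary keys (`util_pct`, `mem_used`, `mem_total`) are hypothetical and not taken from the GPU module. Utilization is averaged across GPUs, while memory is summed first so that GPUs with more memory weigh more.

```python
def aggregate_gpu(samples):
    """Combine per-GPU samples into overall utilization figures.

    `samples` is a list of dicts with the hypothetical keys
    'util_pct', 'mem_used' and 'mem_total'.
    """
    if not samples:
        return {"util_pct": 0.0, "mem_pct": 0.0}
    # Mean of the per-GPU utilization percentages.
    util = sum(s["util_pct"] for s in samples) / float(len(samples))
    # Memory utilization over the pooled memory of all GPUs.
    mem_total = sum(s["mem_total"] for s in samples)
    mem_used = sum(s["mem_used"] for s in samples)
    mem_pct = 100.0 * mem_used / mem_total if mem_total else 0.0
    return {"util_pct": util, "mem_pct": mem_pct}
```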

How to add metrics to Ganglia

Hi all,
How can I add metrics to Ganglia to monitor the operation of hosted applications, check their availability and response time, and notify the administrators by email of any overrun? Please help me.