
iml-agent's Introduction

iml-agent

The Manager for Lustre Agent

Copr Devel: Build Status

ZFS Resource Agent

When the resource-agents release 4.0.1 is widely available, this can be phased out or become a symlink.

iml-agent's People

Contributors

alextalker, brianjmurrell, chrisgearing, jgrund, johnsonw, kprantis, mjmac, mkpankov, mrexox, petertix, tanabarr, utopiabound

iml-agent's Issues

`update_profile` should not attempt to install packages

When running in an environment where packages are preinstalled, `update_profile` should not attempt to install anything. Reference:

if old_profile["managed"] != profile["managed"]:
    if profile["managed"]:
        action = "install"
    else:
        action = "remove"
    try:
        yum_util(action, packages=["python2-iml-agent-management"])
    except AgentShell.CommandExecutionError as cee:
        return agent_error(
            "Unable to set profile because yum returned %s" % cee.result.stdout
        )
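A minimal sketch of the proposed behaviour, using a hypothetical package_installed() helper (not in the codebase) so a yum transaction only runs when the installed state actually needs to change:

import subprocess

# Hypothetical helper: consult the rpm database instead of driving yum.
def package_installed(name):
    return subprocess.call(["rpm", "-q", name]) == 0

if old_profile["managed"] != profile["managed"]:
    installed = package_installed("python2-iml-agent-management")
    if profile["managed"] and not installed:
        yum_util("install", packages=["python2-iml-agent-management"])
    elif not profile["managed"] and installed:
        yum_util("remove", packages=["python2-iml-agent-management"])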

Likewise, `configure_repo` should be a no-op if the trimmed repo contents are empty:

def configure_repo(filename, file_contents):
    crypto = Crypto(ENV_PATH)
    full_filename = os.path.join(REPO_PATH, filename)
    temp_full_filename = full_filename + ".tmp"
    # this format needs to match create_repo() in manager agent-bootstrap-script
    file_contents = file_contents.format(
        crypto.AUTHORITY_FILE, crypto.PRIVATE_KEY_FILE, crypto.CERTIFICATE_FILE
    )
    try:
        file_handle = os.fdopen(
            os.open(temp_full_filename, os.O_WRONLY | os.O_CREAT, 0o644), "w"
        )
        file_handle.write(file_contents)
        file_handle.close()
        os.rename(temp_full_filename, full_filename)
    except OSError as error:
        return agent_error(str(error))
    return agent_result_ok
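A minimal sketch of the suggested guard, added at the top of configure_repo with the rest of the body unchanged:

def configure_repo(filename, file_contents):
    # Suggested no-op: an empty (after trimming) repo definition should not
    # create or overwrite a repo file.
    if not file_contents.strip():
        return agent_result_ok
    # ... existing body continues unchanged ...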

Do not perform update to packages during IML install

IML has an install_packages action. install_packages does an install followed by an update, as seen here:

yum_util('install', enablerepo=repos, packages=packages)
# So now we have installed the packages requested, we will also make sure that any installed packages we
# have that are already installed are updated to our presumably better versions.
update_packages = yum_check_update(repos)
if update_packages:
    daemon_log.debug("The following packages need update after we installed IML packages %s" % update_packages)
    yum_util('update', packages=update_packages, enablerepo=repos)

The update seems like an overreach for IML; even in managed mode it doesn't appear to serve a purpose for IML-managed repos.

The install_packages action is called on the manager side here:

https://github.com/whamcloud/integrated-manager-for-lustre/blob/054a091ed7eb7e3b001cc023feffd1adab493fae/chroma_core/models/host.py#L695-L698

and here:

https://github.com/whamcloud/integrated-manager-for-lustre/blob/054a091ed7eb7e3b001cc023feffd1adab493fae/chroma_core/models/host.py#L1415-L1419

In either case, a {yum,dnf} install should trigger an update if needed, so we should be ok to remove this.
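If the update step is dropped, the action reduces to the first line of the snippet above; a sketch:

# Install only; rely on yum/dnf dependency resolution to pull in newer
# versions where a package actually requires them.
yum_util('install', enablerepo=repos, packages=packages)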

Fixes whamcloud/integrated-manager-for-lustre#698

ValueError: could not convert string to float:

We are seeing this in the chroma-agent logs:

[18/Apr/2018:21:15:51] console DEBUG Exception raised in sandbox START:
  File "/usr/lib/python2.7/site-packages/iml_common/lib/exception_sandbox.py", line 23, in wrapper
    return function(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/chroma_agent/device_plugins/audit/local.py", line 68, in properties
    return dict(item for cls in self.audit_classes() for item in cls().properties().items())
  File "/usr/lib/python2.7/site-packages/chroma_agent/device_plugins/audit/local.py", line 68, in <genexpr>
    return dict(item for cls in self.audit_classes() for item in cls().properties().items())
  File "/usr/lib/python2.7/site-packages/chroma_agent/device_plugins/audit/node.py", line 64, in properties
    'distro_version': float('.'.join(platform.linux_distribution()[1].split('.')[:2])),
ValueError: could not convert string to float: 
Exception raised in sandbox END
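The failing expression is float('.'.join(platform.linux_distribution()[1].split('.')[:2])), and the empty string at the end of the error suggests platform.linux_distribution() returned an empty version field on this host. A minimal reproduction (Python 2.7):

import platform

# On an affected host the version field comes back empty.
version = platform.linux_distribution()[1]      # e.g. ''
major_minor = '.'.join(version.split('.')[:2])  # still ''
float(major_minor)  # ValueError: could not convert string to float: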

lustre-audit mount detection broken for raidZ pools

In:

new_device = next(
    child['Disk']['path'] for child in pool['vdev']['Root']['children']
    if child.get('Disk')
)

The code assumes that the VDev Root children will always contain a Disk.

This is true for a pool with at least one top-level backing Disk, e.g.:

     "vdev": {
        "Root": {
          "children": [
            {
              "Disk": {
                "guid": 1031789542385231700,
                "state": "ONLINE",
                "path": "/dev/disk/by-id/dm-uuid-mpath-3600140560ab2b7d5caa4c18a2266b935",
                "dev_id": "dm-uuid-mpath-3600140560ab2b7d5caa4c18a2266b935",
                "phys_path": null,
                "whole_disk": false,
                "is_log": false
              }
            }
          ],
          "spares": [],
          "cache": []
        }
      }

but it is not true for any other type of setup.

https://github.com/whamcloud/rust-libzfs/blob/30f5771d56be31d8ece98024276fff030b5a759a/libzfs/src/vdev.rs#L15-L46

This shows the full VDev enum that would need to be accounted for.

The current implementation raises an error like:

[07/Jul/2018:03:23:30] console DEBUG Exception raised in sandbox START:
  File "/usr/lib/python2.7/site-packages/iml_common/lib/exception_sandbox.py", line 23, in wrapper
    return function(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/chroma_agent/device_plugins/lustre.py", line 201, in _scan_mounts
    fs_label, fs_uuid, new_device = process_zfs_mount(device, data, zfs_mounts)
  File "/usr/lib/python2.7/site-packages/chroma_agent/device_plugins/lustre.py", line 156, in process_zfs_mount
    child['Disk']['path'] for child in pool['vdev']['Root']['children']
StopIteration

While looking at this section, it appears this code is not actually being used on the manager side, so it should be OK to remove.
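Should the detection be kept instead, it would need a recursive walk over the vdev tree; a minimal sketch, with the nested variant names assumed from the linked rust-libzfs enum:

def iter_disk_paths(vdev):
    # Leaf variant: yield the backing disk's path.
    if 'Disk' in vdev:
        yield vdev['Disk']['path']
        return
    # Nested variants carry further vdevs under 'children'.
    for variant in ('Root', 'Mirror', 'RaidZ', 'Replacing'):
        if variant in vdev:
            for child in vdev[variant].get('children', []):
                for path in iter_disk_paths(child):
                    yield path

new_device = next(iter_disk_paths(pool['vdev']), None)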

_has_link() in lib/corosync.py returns the wrong status when the link state is Unknown

On the host side:

[root@sfa7990-c0 ~]# ip link set up dev enp0s20f0u8u2c2 ; echo $?
0
[root@sfa7990-c0 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:01:ff:0d:00:40 brd ff:ff:ff:ff:ff:ff
    inet 10.36.44.22/22 brd 10.36.47.255 scope global dynamic eno5
       valid_lft 13617sec preferred_lft 13617sec
    inet6 fe80::201:ffff:fe0d:40/64 scope link 
       valid_lft forever preferred_lft forever
3: eno6: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 00:01:ff:4d:00:40 brd ff:ff:ff:ff:ff:ff
4: enp0s20f0u8u2c2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether 3a:18:3b:7c:53:19 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::3818:3bff:fe7c:5319/64 scope link 
       valid_lft forever preferred_lft forever
5: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:07:f9:fe:80:00:00:00:00:00:00:50:6b:4b:03:00:23:b7:cc brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 172.172.172.22/24 brd 172.172.172.255 scope global ib0
       valid_lft forever preferred_lft forever
    inet6 fe80::526b:4b03:23:b7cc/64 scope link 
       valid_lft forever preferred_lft forever
6: ib1: <BROADCAST,MULTICAST> mtu 4092 qdisc mq state DOWN group default qlen 256
    link/infiniband 00:00:11:b0:fe:80:00:00:00:00:00:00:50:6b:4b:03:00:23:b7:cd brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

[root@sfa7990-c0 ~]# cat /sys/class/net/enp0s20f0u8u2c2/carrier
1
[root@sfa7990-c0 ~]# cat /sys/class/net/enp0s20f0u8u2c2/operstate 
unknown

[root@sfa7990-c1 ~]# for nic in $(ls -1 /sys/class/net/); do echo $nic: $(readlink /sys/class/net/$nic/device/driver); done
eno5: ../../../../bus/pci/drivers/igb
eno6: ../../../../bus/pci/drivers/igb
enp0s20f0u8u2c2: ../../../../../../../bus/usb/drivers/cdc_ether
ib0: ../../../../bus/pci/drivers/mlx5_core
ib1: ../../../../bus/pci/drivers/mlx5_core
lo:

[root@sfa7990-c1 ~]# cat /sys/class/net/*/device/interface 
CDC Notification Interface

[root@sfa7990-c1 ~]# modinfo cdc_ether
filename:       /lib/modules/3.10.0-957.el7_lustre.x86_64/kernel/drivers/net/usb/cdc_ether.ko.xz
license:        GPL
description:    USB CDC Ethernet devices
author:         David Brownell
retpoline:      Y
rhelversion:    7.6
srcversion:     D329B19ACE6E9677F544BB8

[root@sfa7990-c0 tmp]# python
Python 2.7.5 (default, Oct 30 2018, 23:45:53) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import array, struct, fcntl, socket
>>> SIOCETHTOOL = 0x8946
>>> ETHTOOL_GLINK = 0x0000000a
>>> sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> struct
struct
>>> ecmd = array.array('B', struct.pack('2I', ETHTOOL_GLINK, 0))
>>> ifreq = struct.pack('16sP', 'enp0s20f0u8u2c2', ecmd.buffer_info()[0])
>>> fcntl.ioctl(sock.fileno(), SIOCETHTOOL, ifreq)
'enp0s20f0u8u2c2\x000\x14J\x01\x00\x00\x00\x00'
>>> sock.close()
>>> struct.unpack('4xI', ecmd.tostring())[0]
1
>>> def _has_link(name):
...     import array
...     import struct
...     import fcntl
...     import socket
...     SIOCETHTOOL = 0x8946
...     ETHTOOL_GLINK = 0x0000000a
...     sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
...     ecmd = array.array('B', struct.pack('2I', ETHTOOL_GLINK, 0))
...     ifreq = struct.pack('16sP', name, ecmd.buffer_info()[0])
...     fcntl.ioctl(sock.fileno(), SIOCETHTOOL, ifreq)
...     sock.close()
...     return bool(struct.unpack('4xI', ecmd.tostring())[0])
... 
>>> 
>>> 
>>> 
>>> _has_link('ib0')
True
>>> _has_link('ib1')
False
>>> _has_link('lo')
True
>>> _has_link('enp0s20f0u8u2c2')
True

According to https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-class-net, the interface can be in one of the following states:

What:		/sys/class/net/<iface>/operstate
Date:		March 2006
KernelVersion:	2.6.17
Contact:	[email protected]
Description:
		Indicates the interface RFC2863 operational state as a string.
		Possible values are:
		"unknown", "notpresent", "down", "lowerlayerdown", "testing",
		"dormant", "up".

I suggest changing the code logic so that everything except 'up' returns False.
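A minimal sketch of that suggestion, assuming the check reads operstate from sysfs rather than using the ethtool ioctl:

def _has_link(name):
    # Treat anything other than 'up' ('unknown', 'down', 'dormant', ...)
    # as no link.
    with open('/sys/class/net/%s/operstate' % name) as f:
        return f.read().strip() == 'up'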

use --as-xml output option of crm_mon

We frequently encounter issues with crm_mon's output format changing. That's because we are trying to "read" a human-targeted output format.

crm_mon has an --as-xml output format switch which should produce a more stable XML format instead. We should switch to using it.
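A minimal sketch of consuming the XML output (the element and attribute names below are assumptions based on pacemaker's status XML, not verified against a specific version):

import subprocess
import xml.etree.ElementTree as ET

# --as-xml emits a machine-readable snapshot of cluster state.
out = subprocess.check_output(['crm_mon', '--as-xml'])
root = ET.fromstring(out)

# Assumed layout: <crm_mon>...<resources><resource id= role= active= .../>
for res in root.iter('resource'):
    print('%s %s %s' % (res.get('id'), res.get('role'), res.get('active')))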

Issues from scapy bump

I've started seeing the following traceback today:

Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/chroma_agent/device_plugins/action_runner.py", line 182, in run
    self.action, agent_daemon_context, self.args

  File "/usr/lib/python2.7/site-packages/chroma_agent/plugin_manager.py", line 321, in run
    return fn(**args)

  File "/usr/lib/python2.7/site-packages/chroma_agent/action_plugins/manage_corosync_common.py", line 38, in get_corosync_autoconfig
    ring1 = detect_ring1(ring0, ring1_ipaddr, ring1_prefix)

  File "/usr/lib/python2.7/site-packages/chroma_agent/lib/corosync.py", line 139, in detect_ring1
    iface.mcastport = find_unused_port(ring0)

  File "/usr/lib/python2.7/site-packages/chroma_agent/lib/corosync.py", line 206, in find_unused_port
    for dport in set(dports):

  File "/usr/lib/python2.7/site-packages/scapy/packet.py", line 1459, in __hash__
    raise TypeError('unhashable type: %r' % self.__class__.__name__)

TypeError: unhashable type: 'Ether'

This is in the following code:

for dport in set(dports):
    try:
        ports.remove(dport)
    except ValueError:
        # already removed
        pass

A recent change in scapy made Packet subclasses unhashable, meaning we can no longer use a set here.

However, we should be able to remove the set without issue, as any port which has already been removed is passed over.
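A sketch of that change, just dropping the set():

# Duplicates in dports are harmless: a second remove() of the same port
# simply falls into the existing ValueError branch.
for dport in dports:
    try:
        ports.remove(dport)
    except ValueError:
        # already removed
        pass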

Failover node isn't detected on Active/Passive storage

Hi @jgrund @utopiabound

I've run into a problem with filesystem detection.
I have a dual-controller storage system where each block device is in either an active or a passive state.
The filesystem is created on a block device that is accessible only while its state is 'active'.
The trouble is that your primary/failover node detection relies on filesystem metadata being readable from the block devices, which isn't the case here. However, the failover information in the filesystem configuration does contain the address of the 'passive' node.
My question is: is there a way to detect filesystems in such configurations while still correctly determining the failover node?
I don't know whether this question relates more to the UI or to the agent, so feel free to redirect me.

`set_iml_profile` does not run on upgrade

In:

profile = json.loads(profile_json)
try:
    config.set("settings", "profile", profile)
    set_iml_profile(
        profile.get("name"), profile.get("bundles"), profile.get("packages")
    )
except ConfigKeyExistsError:
    config.update("settings", "profile", profile)

/etc/iml/profile.conf will not be created if its settings equivalent already exists.

This means it will get skipped on an upgrade scenario.
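A minimal sketch of one possible fix: move the set_iml_profile call out of the try block so it also runs on the upgrade path:

profile = json.loads(profile_json)
try:
    config.set("settings", "profile", profile)
except ConfigKeyExistsError:
    # Upgrade path: the settings key already exists.
    config.update("settings", "profile", profile)

# Always (re)write /etc/iml/profile.conf, including on upgrade.
set_iml_profile(
    profile.get("name"), profile.get("bundles"), profile.get("packages")
)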

Back to back connection should be used for ring0

IML uses corosync "rrp_mode: passive", which means that only one interface is in use at a time; the second is used only if the ring0 NIC fails.
According to https://whamcloud.github.io/Online-Help/docs/Install_Guide/ig_ch_03_building.html :

  • A dedicated Ethernet port capable of one gigabit/sec. This port connects to the Management network.
  • HA servers are configured in pairs, with a primary server and a failover server.
  • Crossover cable – Each HA server (excluding the manager server) is connected to its peer HA server by a private crossover link. This is an Ethernet connection.

According to the get_ring0() function in lib/corosync.py, the management interface will be used for ring0:

def get_ring0():
    # ring0 will always be on the interface used for agent->manager comms
    from urlparse import urlparse
    server_url = urljoin(os.environ["IML_MANAGER_URL"], "agent")
    manager_address = socket.gethostbyname(urlparse(server_url).hostname)
    out = AgentShell.try_run(['/sbin/ip', 'route', 'get', manager_address])
    match = re.search(r'dev\s+([^\s]+)', out)
    if match:
        manager_dev = match.groups()[0]
    else:
        raise RuntimeError("Unable to find ring0 dev in %s" % out)

    console_log.info("Chose %s for corosync ring0" % manager_dev)
    ring0 = CorosyncRingInterface(manager_dev)

    if ring0.ipv4_prefixlen < 9:
        raise RuntimeError("%s subnet is too large (/%s)" %
                           (ring0.name, ring0.ipv4_prefixlen))

    return ring0

It is logically wrong to assign the rings (the cluster transport, which sends and receives information across all nodes in the cluster) to the management network, which may be reachable from the outside world. Another con is that L2+ switches, routers, and firewalls can block multicast/unicast traffic.
In any case we have a dedicated back-to-back interface, and it should be used as ring0 and for all inter-cluster communication.

fence_chroma monitor action always returns status 0

According to the official documentation:

Monitoring the fencing devices
Just like any other resource, the stonith class agents also support the monitor operation. Given that we have often seen monitor either not configured or configured in a wrong way, we have decided to devote a section to the matter.
Monitoring stonith resources, which is actually checking status of the corresponding fencing devices, is strongly recommended. So strongly, that we should consider a configuration without it invalid.

A good example of how this works is the fence-agents-scsi rpm for CentOS 7.

Due to issue 824 we could have a situation where we have no fencing device, or our fencing device is not ready, and as a result fencing will never occur.

ZFS-backed resource create occasionally fails in SSI test

Commands run, with exit codes:

crm_resource -W -r testfs-OST0000_add680: 6

Resource 'testfs-OST0000_add680' not found
Error performing operation: No such device or address

pcs resource create testfs-OST0000_add680-zfs ocf:chroma:ZFS pool=zfs_pool_scsi0QEMU_QEMU_HARDDISK_target1 op start timeout=90 op stop timeout=90 --disabled --group group-testfs-OST0000_add680: 0


pcs resource create testfs-OST0000_add680 ocf:lustre:Lustre target=zfs_pool_scsi0QEMU_QEMU_HARDDISK_target1/testfs-OST0000 mountpoint=/mnt/testfs-OST0000 --disabled --group group-testfs-OST0000_add680: 0


pcs constraint location add testfs-OST0000_add680-primary testfs-OST0000_add680 vm7 20: 1

Error: Resource 'testfs-OST0000_add680' does not exist

corosync.log:

Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: --- 0.28.1 2
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: +++ 0.28.2 (null)
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       +  /cib:  @num_updates=2
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       ++ /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources:  <lrm_resource id="testfs-OST0000_add680" type="Lustre" class="ocf" provider="lustre"/>
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       ++                                                                <lrm_rsc_op id="testfs-OST0000_add680_last_0" operation_key="testfs-OST0000_add680_monitor_0" operation="monitor" crm-debug-origin="do
_update_resource" crm_feature_set="3.0.14" transition-key="2:17:7:43e23f53-fdcf-43f7-8de0-943d43f8ab0a" transition-magic="-1:193;2:17:7:43e23f53-fdcf-43f7-8de0-943d43f8ab0a" exit-reason="" on_node="vm7" call-id="-1" rc-code="193" op-status="-1"
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       ++                                                              </lrm_resource>
Nov 14 09:43:37 [30372] vm8 stonith-ng:     info: update_cib_stonith_devices_v2:        Updating device list from the cib: create lrm_resources
Nov 14 09:43:37 [30372] vm8 stonith-ng:     info: cib_devices_update:   Updating devices to version 0.28.2
Nov 14 09:43:37 [30372] vm8 stonith-ng:   notice: unpack_config:        On loss of CCM Quorum: Ignore
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_process_request:  Completed cib_modify operation for section status: OK (rc=0, origin=vm7/crmd/21, version=0.28.2)
Nov 14 09:43:37  Lustre(testfs-OST0000_add680)[31191]:    ERROR: zfs_pool_scsi0QEMU_QEMU_HARDDISK_target1/testfs-OST0000 is not mounted
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: --- 0.28.2 2
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: +++ 0.29.0 (null)
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       -- /cib/configuration/resources/group[@id='group-testfs-OST0000_add680']/primitive[@id='testfs-OST0000_add680']
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       +  /cib:  @epoch=29, @num_updates=0
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       ++ /cib/configuration/constraints:  <rsc_location id="testfs-OST0001_44d8b5-secondary" node="vm7" rsc="testfs-OST0001_44d8b5" score="10"/>
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_process_request:  Completed cib_replace operation for section configuration: OK (rc=0, origin=vm7/cibadmin/2, version=0.29.0)
Nov 14 09:43:37 [30376] vm8       crmd:   notice: abort_transition_graph:       Transition aborted by deletion of primitive[@id='testfs-OST0000_add680']: Configuration change | cib=0.29.0 source=te_update_diff:456 path=/cib/configuration/resources/group[@id='group-testfs-
OST0000_add680']/primitive[@id='testfs-OST0000_add680'] complete=false
Nov 14 09:43:37 [30372] vm8 stonith-ng:     info: stonith_device_remove:        Device 'testfs-OST0000_add680' not found (1 active devices)
Nov 14 09:43:37 [30372] vm8 stonith-ng:     info: update_cib_stonith_devices_v2:        Updating device list from the cib: create constraints
Nov 14 09:43:37 [30372] vm8 stonith-ng:     info: cib_devices_update:   Updating devices to version 0.29.0
Nov 14 09:43:37 [30372] vm8 stonith-ng:   notice: unpack_config:        On loss of CCM Quorum: Ignore
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_file_backup:      Archived previous version as /var/lib/pacemaker/cib/cib-27.raw
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: --- 0.29.0 2
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: +++ 0.29.1 (null)
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       +  /cib:  @num_updates=1
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       +  /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='testfs-OST0000_add680']/lrm_rsc_op[@id='testfs-OST0000_add680_last_0']:  @transition-magic=0:7;2:17:7:43e23f53-fdcf-43f7
-8de0-943d43f8ab0a, @call-id=19, @rc-code=7, @op-status=0, @exec-time=21
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_process_request:  Completed cib_modify operation for section status: OK (rc=0, origin=vm7/crmd/22, version=0.29.1)
Nov 14 09:43:37 [30376] vm8       crmd:     info: match_graph_event:    Action testfs-OST0000_add680_monitor_0 (2) confirmed on vm7 (rc=7)
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_process_request:  Forwarding cib_modify operation for section status to all (origin=local/crmd/57)
Nov 14 09:43:37 [30376] vm8       crmd:   notice: process_lrm_event:    Result of probe operation for testfs-OST0000_add680 on vm8: 7 (not running) | call=17 key=testfs-OST0000_add680_monitor_0 confirmed=true cib-update=57
Nov 14 09:43:37 [30372] vm8 stonith-ng:     info: update_cib_stonith_devices_v2:        Updating device list from the cib: modify lrm_rsc_op[@id='testfs-OST0000_add680_last_0']
Nov 14 09:43:37 [30372] vm8 stonith-ng:     info: cib_devices_update:   Updating devices to version 0.29.1
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: --- 0.29.1 2
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: +++ 0.29.2 (null)
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       +  /cib:  @num_updates=2
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       +  /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='testfs-OST0000_add680']/lrm_rsc_op[@id='testfs-OST0000_add680_last_0']:  @transition-magic=0:7;3:17:7:43e23f53-fdcf-43f7
-8de0-943d43f8ab0a, @call-id=17, @rc-code=7, @op-status=0, @exec-time=28
Nov 14 09:43:37 [30372] vm8 stonith-ng:   notice: unpack_config:        On loss of CCM Quorum: Ignore
Nov 14 09:43:37 [30376] vm8       crmd:     info: match_graph_event:    Action testfs-OST0000_add680_monitor_0 (3) confirmed on vm8 (rc=7)
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_process_request:  Completed cib_modify operation for section status: OK (rc=0, origin=vm8/crmd/57, version=0.29.2)
Nov 14 09:43:37 [30376] vm8       crmd:   notice: run_graph:    Transition 17 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-199.bz2): Complete
Nov 14 09:43:37 [30376] vm8       crmd:     info: do_state_transition:  State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd
Nov 14 09:43:37 [30372] vm8 stonith-ng:     info: update_cib_stonith_devices_v2:        Updating device list from the cib: modify lrm_rsc_op[@id='testfs-OST0000_add680_last_0']
Nov 14 09:43:37 [30372] vm8 stonith-ng:     info: cib_devices_update:   Updating devices to version 0.29.2
Nov 14 09:43:37 [30372] vm8 stonith-ng:   notice: unpack_config:        On loss of CCM Quorum: Ignore
Nov 14 09:43:37 [30375] vm8    pengine:   notice: unpack_config:        On loss of CCM Quorum: Ignore
Nov 14 09:43:37 [30375] vm8    pengine:     info: determine_online_status_fencing:      Node vm7 is active
Nov 14 09:43:37 [30375] vm8    pengine:     info: determine_online_status:      Node vm7 is online
Nov 14 09:43:37 [30375] vm8    pengine:     info: determine_online_status_fencing:      Node vm8 is active
Nov 14 09:43:37 [30375] vm8    pengine:     info: determine_online_status:      Node vm8 is online
Nov 14 09:43:37 [30375] vm8    pengine:     info: determine_op_status:  Operation monitor found resource testfs-OST0000_add680-zfs active on vm7
Nov 14 09:43:37 [30375] vm8    pengine:     info: unpack_node_loop:     Node 1 is already processed
Nov 14 09:43:37 [30375] vm8    pengine:     info: unpack_node_loop:     Node 2 is already processed
Nov 14 09:43:37 [30375] vm8    pengine:     info: unpack_node_loop:     Node 1 is already processed
Nov 14 09:43:37 [30375] vm8    pengine:     info: unpack_node_loop:     Node 2 is already processed
Nov 14 09:43:37 [30375] vm8    pengine:     info: common_print: st-fencing      (stonith:fence_chroma): Started vm7
Nov 14 09:43:37 [30375] vm8    pengine:     info: common_print: testfs-OST0001_44d8b5   (ocf::lustre:Lustre):   Stopped (disabled)
Nov 14 09:43:37 [30375] vm8    pengine:     info: group_print:   Resource Group: group-testfs-OST0000_add680
Nov 14 09:43:37 [30375] vm8    pengine:     info: common_print:      testfs-OST0000_add680-zfs  (ocf::chroma:ZFS):      Stopped (disabled)
Nov 14 09:43:37 [30375] vm8    pengine:   notice: DeleteRsc:    Removing testfs-OST0000_add680 from vm7
Nov 14 09:43:37 [30375] vm8    pengine:   notice: DeleteRsc:    Removing testfs-OST0000_add680 from vm8
Nov 14 09:43:37 [30375] vm8    pengine:     info: native_color: Resource testfs-OST0001_44d8b5 cannot run anywhere
Nov 14 09:43:37 [30375] vm8    pengine:     info: native_color: Resource testfs-OST0000_add680-zfs cannot run anywhere
Nov 14 09:43:37 [30375] vm8    pengine:     info: LogActions:   Leave   st-fencing      (Started vm7)
Nov 14 09:43:37 [30375] vm8    pengine:     info: LogActions:   Leave   testfs-OST0001_44d8b5   (Stopped)
Nov 14 09:43:37 [30375] vm8    pengine:     info: LogActions:   Leave   testfs-OST0000_add680-zfs       (Stopped)
Nov 14 09:43:37 [30375] vm8    pengine:   notice: process_pe_message:   Calculated transition 18, saving inputs in /var/lib/pacemaker/pengine/pe-input-200.bz2
Nov 14 09:43:37 [30376] vm8       crmd:     info: do_state_transition:  State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
Nov 14 09:43:37 [30376] vm8       crmd:     info: do_te_invoke: Processing graph 18 (ref=pe_calc-dc-1542188617-41) derived from /var/lib/pacemaker/pengine/pe-input-200.bz2
Nov 14 09:43:37 [30376] vm8       crmd:   notice: te_rsc_command:       Initiating delete operation testfs-OST0000_add680_delete_0 locally on vm8 | action 3
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_process_request:  Forwarding cib_delete operation for section //node_state[@uname='vm8']//lrm_resource[@id='testfs-OST0000_add680'] to all (origin=local/crmd/59)
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: --- 0.29.2 2
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: +++ 0.29.3 (null)
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       -- /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='testfs-OST0000_add680']
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       +  /cib:  @num_updates=3
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_process_request:  Completed cib_delete operation for section //node_state[@uname='vm8']//lrm_resource[@id='testfs-OST0000_add680']: OK (rc=0, origin=vm8/crmd/59, version=0.29.2)
Nov 14 09:43:37 [30376] vm8       crmd:     info: delete_resource:      Removing resource testfs-OST0000_add680 for tengine (internal) on (null)
Nov 14 09:43:37 [30376] vm8       crmd:     info: notify_deleted:       Notifying tengine on localhost that testfs-OST0000_add680 was deleted
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_process_request:  Forwarding cib_delete operation for section //node_state[@uname='vm8']//lrm_resource[@id='testfs-OST0000_add680'] to all (origin=local/crmd/60)
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: --- 0.29.2 2
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: +++ 0.29.3 (null)

Resource creation and constraint adding are serialized, so this is not a race between nodes.

IML chroma-agent-daemon: InsecureRequestWarning

Describe the bug
Disabling the insecure-request warning is not working. The statement `urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)` in /usr/lib/python2.7/site-packages/chroma_agent/agent_daemon.py has no effect, and the following warning is still output on all Lustre servers repeatedly every few seconds.

Warning:
oss002: May 31 14:43:30 oss002 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss002: May 31 14:43:30 oss002 chroma-agent-daemon: InsecureRequestWarning)
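The path in the warning (requests/packages/urllib3/...) suggests it is raised by requests' vendored copy of urllib3, so disabling warnings on the standalone urllib3 module never reaches it. A sketch of disabling the warning on the vendored copy instead (assuming a requests build that still vendors urllib3, as on these hosts):

import requests

# Target the vendored urllib3 that requests actually uses here, not the
# standalone top-level urllib3 module.
requests.packages.urllib3.disable_warnings(
    requests.packages.urllib3.exceptions.InsecureRequestWarning
)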

To Reproduce
Steps to reproduce the behavior:

  1. Install IMLv5.0 on CentOS 7.5, with default python version 2.7.5
  2. IML agents installed on CentOS 7.4, with python version 2.7.5

Screenshots
pdsh> grep InsecureRequestWarning /var/log/messages | tail -2
mds002: May 31 14:43:26 mds002 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
mds002: May 31 14:43:26 mds002 chroma-agent-daemon: InsecureRequestWarning)
mds001: May 31 14:43:28 mds001 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
mds001: May 31 14:43:28 mds001 chroma-agent-daemon: InsecureRequestWarning)
oss007: May 31 14:43:30 oss007 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss007: May 31 14:43:30 oss007 chroma-agent-daemon: InsecureRequestWarning)
oss006: May 31 14:43:27 oss006 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss006: May 31 14:43:27 oss006 chroma-agent-daemon: InsecureRequestWarning)
oss011: May 31 14:43:30 oss011 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss011: May 31 14:43:30 oss011 chroma-agent-daemon: InsecureRequestWarning)
oss005: May 31 14:43:29 oss005 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss005: May 31 14:43:29 oss005 chroma-agent-daemon: InsecureRequestWarning)
oss003: May 31 14:43:31 oss003 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss003: May 31 14:43:31 oss003 chroma-agent-daemon: InsecureRequestWarning)
oss001: May 31 14:43:25 oss001 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss001: May 31 14:43:25 oss001 chroma-agent-daemon: InsecureRequestWarning)
oss008: May 31 14:43:29 oss008 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss008: May 31 14:43:29 oss008 chroma-agent-daemon: InsecureRequestWarning)
oss012: May 31 14:43:30 oss012 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss012: May 31 14:43:30 oss012 chroma-agent-daemon: InsecureRequestWarning)
oss004: May 31 14:43:30 oss004 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss004: May 31 14:43:30 oss004 chroma-agent-daemon: InsecureRequestWarning)
oss002: May 31 14:43:30 oss002 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss002: May 31 14:43:30 oss002 chroma-agent-daemon: InsecureRequestWarning)

Version

What version of IML? v5.0
What Operating System? CentOS
What Operating System version? v 7.5

Please let me know if you need additional information. Thank you!!
