
iml-agent's Introduction

iml-agent

The Manager for Lustre Agent

Copr Devel: Build Status

ZFS Resource Agent

When the resource-agents release 4.0.1 is widely available, this can be phased out or become a symlink.

iml-agent's People

Contributors

alextalker, brianjmurrell, chrisgearing, jgrund, johnsonw, kprantis, mjmac, mkpankov, mrexox, petertix, tanabarr, utopiabound

iml-agent's Issues

`update_profile` should not attempt to install packages

When running in an environment where packages are preinstalled, `update_profile` should not attempt to install anything. Reference:

if old_profile["managed"] != profile["managed"]:
    if profile["managed"]:
        action = "install"
    else:
        action = "remove"
    try:
        yum_util(action, packages=["python2-iml-agent-management"])
    except AgentShell.CommandExecutionError as cee:
        return agent_error(
            "Unable to set profile because yum returned %s" % cee.result.stdout
        )
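A minimal sketch of the proposed behaviour, using a hypothetical package_installed() helper (not in the codebase) so a yum transaction only runs when the installed state actually needs to change:

import subprocess

# Hypothetical helper: consult the rpm database instead of driving yum.
def package_installed(name):
    return subprocess.call(["rpm", "-q", name]) == 0

if old_profile["managed"] != profile["managed"]:
    installed = package_installed("python2-iml-agent-management")
    if profile["managed"] and not installed:
        yum_util("install", packages=["python2-iml-agent-management"])
    elif not profile["managed"] and installed:
        yum_util("remove", packages=["python2-iml-agent-management"])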

Likewise, `configure_repo` should be a no-op if the trimmed repo contents are empty:

def configure_repo(filename, file_contents):
    crypto = Crypto(ENV_PATH)
    full_filename = os.path.join(REPO_PATH, filename)
    temp_full_filename = full_filename + ".tmp"
    # this format needs to match create_repo() in manager agent-bootstrap-script
    file_contents = file_contents.format(
        crypto.AUTHORITY_FILE, crypto.PRIVATE_KEY_FILE, crypto.CERTIFICATE_FILE
    )
    try:
        file_handle = os.fdopen(
            os.open(temp_full_filename, os.O_WRONLY | os.O_CREAT, 0o644), "w"
        )
        file_handle.write(file_contents)
        file_handle.close()
        os.rename(temp_full_filename, full_filename)
    except OSError as error:
        return agent_error(str(error))
    return agent_result_ok
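A minimal sketch of the suggested guard, added at the top of configure_repo with the rest of the body unchanged:

def configure_repo(filename, file_contents):
    # Suggested no-op: an empty (after trimming) repo definition should not
    # create or overwrite a repo file.
    if not file_contents.strip():
        return agent_result_ok
    # ... existing body continues unchanged ...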

Do not perform update to packages during IML install

IML has an install_packages action. install_packages does an install followed by an update, as seen here:

yum_util('install', enablerepo=repos, packages=packages)
# So now we have installed the packages requested, we will also make sure that any installed packages we
# have that are already installed are updated to our presumably better versions.
update_packages = yum_check_update(repos)
if update_packages:
    daemon_log.debug("The following packages need update after we installed IML packages %s" % update_packages)
    yum_util('update', packages=update_packages, enablerepo=repos)

The update seems like an overreach for IML; even in managed mode it doesn't appear to serve a purpose for IML-managed repos.

The install_packages action is called on the manager side here:

https://github.com/whamcloud/integrated-manager-for-lustre/blob/054a091ed7eb7e3b001cc023feffd1adab493fae/chroma_core/models/host.py#L695-L698

and here:

https://github.com/whamcloud/integrated-manager-for-lustre/blob/054a091ed7eb7e3b001cc023feffd1adab493fae/chroma_core/models/host.py#L1415-L1419

In either case, a {yum,dnf} install should trigger an update if needed, so we should be ok to remove this.
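If the update step is dropped, the action reduces to the first line of the snippet above; a sketch:

# Install only; rely on yum/dnf dependency resolution to pull in newer
# versions where a package actually requires them.
yum_util('install', enablerepo=repos, packages=packages)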

Fixes whamcloud/integrated-manager-for-lustre#698

ValueError: could not convert string to float:

We are seeing this in the chroma-agent logs:

[18/Apr/2018:21:15:51] console DEBUG Exception raised in sandbox START:
  File "/usr/lib/python2.7/site-packages/iml_common/lib/exception_sandbox.py", line 23, in wrapper
    return function(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/chroma_agent/device_plugins/audit/local.py", line 68, in properties
    return dict(item for cls in self.audit_classes() for item in cls().properties().items())
  File "/usr/lib/python2.7/site-packages/chroma_agent/device_plugins/audit/local.py", line 68, in <genexpr>
    return dict(item for cls in self.audit_classes() for item in cls().properties().items())
  File "/usr/lib/python2.7/site-packages/chroma_agent/device_plugins/audit/node.py", line 64, in properties
    'distro_version': float('.'.join(platform.linux_distribution()[1].split('.')[:2])),
ValueError: could not convert string to float: 
Exception raised in sandbox END
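The failing expression is float('.'.join(platform.linux_distribution()[1].split('.')[:2])), and the empty string at the end of the error suggests platform.linux_distribution() returned an empty version field on this host. A minimal reproduction (Python 2.7):

import platform

# On an affected host the version field comes back empty.
version = platform.linux_distribution()[1]      # e.g. ''
major_minor = '.'.join(version.split('.')[:2])  # still ''
float(major_minor)  # ValueError: could not convert string to float: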

lustre-audit mount detection broken for raidZ pools

In:

new_device = next(
    child['Disk']['path'] for child in pool['vdev']['Root']['children']
    if child.get('Disk')
)

The code assumes that the VDev Root children will always contain a Disk.

This is true for a pool with at least one top-level backing Disk, e.g.:

     "vdev": {
        "Root": {
          "children": [
            {
              "Disk": {
                "guid": 1031789542385231700,
                "state": "ONLINE",
                "path": "/dev/disk/by-id/dm-uuid-mpath-3600140560ab2b7d5caa4c18a2266b935",
                "dev_id": "dm-uuid-mpath-3600140560ab2b7d5caa4c18a2266b935",
                "phys_path": null,
                "whole_disk": false,
                "is_log": false
              }
            }
          ],
          "spares": [],
          "cache": []
        }
      }

but it is not true for any other type of setup.

https://github.com/whamcloud/rust-libzfs/blob/30f5771d56be31d8ece98024276fff030b5a759a/libzfs/src/vdev.rs#L15-L46

This shows the full VDev enum that would need to be accounted for.

The current implementation raises an error like:

[07/Jul/2018:03:23:30] console DEBUG Exception raised in sandbox START:
  File "/usr/lib/python2.7/site-packages/iml_common/lib/exception_sandbox.py", line 23, in wrapper
    return function(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/chroma_agent/device_plugins/lustre.py", line 201, in _scan_mounts
    fs_label, fs_uuid, new_device = process_zfs_mount(device, data, zfs_mounts)
  File "/usr/lib/python2.7/site-packages/chroma_agent/device_plugins/lustre.py", line 156, in process_zfs_mount
    child['Disk']['path'] for child in pool['vdev']['Root']['children']
StopIteration

While looking at this section, it appears this code is not actually being used on the manager side, so it should be OK to remove.
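Should the detection be kept instead, it would need a recursive walk over the vdev tree; a minimal sketch, with the nested variant names assumed from the linked rust-libzfs enum:

def iter_disk_paths(vdev):
    # Leaf variant: yield the backing disk's path.
    if 'Disk' in vdev:
        yield vdev['Disk']['path']
        return
    # Nested variants carry further vdevs under 'children'.
    for variant in ('Root', 'Mirror', 'RaidZ', 'Replacing'):
        if variant in vdev:
            for child in vdev[variant].get('children', []):
                for path in iter_disk_paths(child):
                    yield path

new_device = next(iter_disk_paths(pool['vdev']), None)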

_has_link() in lib/corosync.py returns the wrong status when the link state is Unknown

On the host side:

[root@sfa7990-c0 ~]# ip link set up dev enp0s20f0u8u2c2 ; echo $?
0
[root@sfa7990-c0 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:01:ff:0d:00:40 brd ff:ff:ff:ff:ff:ff
    inet 10.36.44.22/22 brd 10.36.47.255 scope global dynamic eno5
       valid_lft 13617sec preferred_lft 13617sec
    inet6 fe80::201:ffff:fe0d:40/64 scope link 
       valid_lft forever preferred_lft forever
3: eno6: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 00:01:ff:4d:00:40 brd ff:ff:ff:ff:ff:ff
4: enp0s20f0u8u2c2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether 3a:18:3b:7c:53:19 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::3818:3bff:fe7c:5319/64 scope link 
       valid_lft forever preferred_lft forever
5: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:07:f9:fe:80:00:00:00:00:00:00:50:6b:4b:03:00:23:b7:cc brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 172.172.172.22/24 brd 172.172.172.255 scope global ib0
       valid_lft forever preferred_lft forever
    inet6 fe80::526b:4b03:23:b7cc/64 scope link 
       valid_lft forever preferred_lft forever
6: ib1: <BROADCAST,MULTICAST> mtu 4092 qdisc mq state DOWN group default qlen 256
    link/infiniband 00:00:11:b0:fe:80:00:00:00:00:00:00:50:6b:4b:03:00:23:b7:cd brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

[root@sfa7990-c0 ~]# cat /sys/class/net/enp0s20f0u8u2c2/carrier
1
[root@sfa7990-c0 ~]# cat /sys/class/net/enp0s20f0u8u2c2/operstate 
unknown

[root@sfa7990-c1 ~]# for nic in $(ls -1 /sys/class/net/); do echo $nic: $(readlink /sys/class/net/$nic/device/driver); done
eno5: ../../../../bus/pci/drivers/igb
eno6: ../../../../bus/pci/drivers/igb
enp0s20f0u8u2c2: ../../../../../../../bus/usb/drivers/cdc_ether
ib0: ../../../../bus/pci/drivers/mlx5_core
ib1: ../../../../bus/pci/drivers/mlx5_core
lo:

[root@sfa7990-c1 ~]# cat /sys/class/net/*/device/interface 
CDC Notification Interface

[root@sfa7990-c1 ~]# modinfo cdc_ether
filename:       /lib/modules/3.10.0-957.el7_lustre.x86_64/kernel/drivers/net/usb/cdc_ether.ko.xz
license:        GPL
description:    USB CDC Ethernet devices
author:         David Brownell
retpoline:      Y
rhelversion:    7.6
srcversion:     D329B19ACE6E9677F544BB8

[root@sfa7990-c0 tmp]# python
Python 2.7.5 (default, Oct 30 2018, 23:45:53) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import array, struct, fcntl, socket
>>> SIOCETHTOOL = 0x8946
>>> ETHTOOL_GLINK = 0x0000000a
>>> sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> struct
struct
>>> ecmd = array.array('B', struct.pack('2I', ETHTOOL_GLINK, 0))
>>> ifreq = struct.pack('16sP', 'enp0s20f0u8u2c2', ecmd.buffer_info()[0])
>>> fcntl.ioctl(sock.fileno(), SIOCETHTOOL, ifreq)
'enp0s20f0u8u2c2\x000\x14J\x01\x00\x00\x00\x00'
>>> sock.close()
>>> struct.unpack('4xI', ecmd.tostring())[0]
1
>>> def _has_link(name):
...     import array
...     import struct
...     import fcntl
...     import socket
...     SIOCETHTOOL = 0x8946
...     ETHTOOL_GLINK = 0x0000000a
...     sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
...     ecmd = array.array('B', struct.pack('2I', ETHTOOL_GLINK, 0))
...     ifreq = struct.pack('16sP', name, ecmd.buffer_info()[0])
...     fcntl.ioctl(sock.fileno(), SIOCETHTOOL, ifreq)
...     sock.close()
...     return bool(struct.unpack('4xI', ecmd.tostring())[0])
... 
>>> 
>>> 
>>> 
>>> _has_link('ib0')
True
>>> _has_link('ib1')
False
>>> _has_link('lo')
True
>>> _has_link('enp0s20f0u8u2c2')
True

According to https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-class-net, the interface can be in one of the following states:

What:		/sys/class/net/<iface>/operstate
Date:		March 2006
KernelVersion:	2.6.17
Contact:	[email protected]
Description:
		Indicates the interface RFC2863 operational state as a string.
		Possible values are:
		"unknown", "notpresent", "down", "lowerlayerdown", "testing",
		"dormant", "up".

I suggest changing the code logic so that everything except 'up' returns False.
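A minimal sketch of that suggestion, assuming the check reads operstate from sysfs rather than using the ethtool ioctl:

def _has_link(name):
    # Treat anything other than 'up' ('unknown', 'down', 'dormant', ...)
    # as no link.
    with open('/sys/class/net/%s/operstate' % name) as f:
        return f.read().strip() == 'up'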

use --as-xml output option of crm_mon

We frequently encounter issues with crm_mon's output format changing. That's because we are trying to "read" a human-targeted output format.

crm_mon has an --as-xml output format switch which should produce a more stable XML format instead. We should switch to using it.
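A minimal sketch of consuming the XML output (the element and attribute names below are assumptions based on pacemaker's status XML, not verified against a specific version):

import subprocess
import xml.etree.ElementTree as ET

# --as-xml emits a machine-readable snapshot of cluster state.
out = subprocess.check_output(['crm_mon', '--as-xml'])
root = ET.fromstring(out)

# Assumed layout: <crm_mon>...<resources><resource id= role= active= .../>
for res in root.iter('resource'):
    print('%s %s %s' % (res.get('id'), res.get('role'), res.get('active')))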

Issues from scapy bump

I've started seeing the following traceback today:

Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/chroma_agent/device_plugins/action_runner.py", line 182, in run
    self.action, agent_daemon_context, self.args

  File "/usr/lib/python2.7/site-packages/chroma_agent/plugin_manager.py", line 321, in run
    return fn(**args)

  File "/usr/lib/python2.7/site-packages/chroma_agent/action_plugins/manage_corosync_common.py", line 38, in get_corosync_autoconfig
    ring1 = detect_ring1(ring0, ring1_ipaddr, ring1_prefix)

  File "/usr/lib/python2.7/site-packages/chroma_agent/lib/corosync.py", line 139, in detect_ring1
    iface.mcastport = find_unused_port(ring0)

  File "/usr/lib/python2.7/site-packages/chroma_agent/lib/corosync.py", line 206, in find_unused_port
    for dport in set(dports):

  File "/usr/lib/python2.7/site-packages/scapy/packet.py", line 1459, in __hash__
    raise TypeError('unhashable type: %r' % self.__class__.__name__)

TypeError: unhashable type: 'Ether'

This is in the following code:

for dport in set(dports):
    try:
        ports.remove(dport)
    except ValueError:
        # already removed
        pass

A recent change in scapy made Packet subclasses unhashable, meaning we can no longer use a set here.

However, we should be able to remove the set without issue, as any port which has already been removed is passed over.
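A sketch of that change, just dropping the set():

# Duplicates in dports are harmless: a second remove() of the same port
# simply falls into the existing ValueError branch.
for dport in dports:
    try:
        ports.remove(dport)
    except ValueError:
        # already removed
        pass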

Failover node isn't detected on Active/Passive storage

Hi @jgrund @utopiabound

I've run into a problem with filesystem detection.
I have a dual-controller storage system where each block device is in either an active or a passive state.
The filesystem is created on a block device that is accessible only while its state is 'active'.
The trouble is that your primary/failover node detection relies on filesystem metadata being readable from the block devices, which isn't the case here. However, the failover information in the filesystem configuration does contain the address of the 'passive' node.
My question is: is there a way to detect filesystems in such configurations while still correctly determining the failover node?
I don't know whether this question relates more to the UI or to the agent, so feel free to redirect me.

`set_iml_profile` does not run on upgrade

In:

profile = json.loads(profile_json)
try:
    config.set("settings", "profile", profile)
    set_iml_profile(
        profile.get("name"), profile.get("bundles"), profile.get("packages")
    )
except ConfigKeyExistsError:
    config.update("settings", "profile", profile)

/etc/iml/profile.conf will not be created if its settings equivalent already exists.

This means it will get skipped on an upgrade scenario.
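A minimal sketch of one possible fix: move the set_iml_profile call out of the try block so it also runs on the upgrade path:

profile = json.loads(profile_json)
try:
    config.set("settings", "profile", profile)
except ConfigKeyExistsError:
    # Upgrade path: the settings key already exists.
    config.update("settings", "profile", profile)

# Always (re)write /etc/iml/profile.conf, including on upgrade.
set_iml_profile(
    profile.get("name"), profile.get("bundles"), profile.get("packages")
)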

Back to back connection should be used for ring0

IML uses corosync "rrp_mode: passive", which means that only one interface is in use at a time; the second is used only if the ring0 NIC fails.
According to https://whamcloud.github.io/Online-Help/docs/Install_Guide/ig_ch_03_building.html :

  • A dedicated Ethernet port capable of one gigabit/sec. This port connects to the Management network.
  • HA servers are configured in pairs, with a primary server and a failover server.
  • Crossover cable – Each HA server (excluding the manager server) is connected to its peer HA server by a private crossover link. This is an Ethernet connection.

According to the get_ring0() function in lib/corosync.py, the management interface will be used for ring0:

def get_ring0():
    # ring0 will always be on the interface used for agent->manager comms
    from urlparse import urlparse
    server_url = urljoin(os.environ["IML_MANAGER_URL"], "agent")
    manager_address = socket.gethostbyname(urlparse(server_url).hostname)
    out = AgentShell.try_run(['/sbin/ip', 'route', 'get', manager_address])
    match = re.search(r'dev\s+([^\s]+)', out)
    if match:
        manager_dev = match.groups()[0]
    else:
        raise RuntimeError("Unable to find ring0 dev in %s" % out)

    console_log.info("Chose %s for corosync ring0" % manager_dev)
    ring0 = CorosyncRingInterface(manager_dev)

    if ring0.ipv4_prefixlen < 9:
        raise RuntimeError("%s subnet is too large (/%s)" %
                           (ring0.name, ring0.ipv4_prefixlen))

    return ring0

It is logically wrong to assign the rings (the cluster transport, which sends and receives information across all nodes in the cluster) to the management network, which may be reachable from the outside world. Another con is that L2+ switches, routers, and firewalls can block multicast/unicast traffic.
In any case we have a dedicated back-to-back interface, and it should be used as ring0 and for all inter-cluster communication.

fence_chroma monitor action always returns status 0

According to the official documentation:

Monitoring the fencing devices
Just like any other resource, the stonith class agents also support the monitor operation. Given that we have often seen monitor either not configured or configured in a wrong way, we have decided to devote a section to the matter.
Monitoring stonith resources, which is actually checking status of the corresponding fencing devices, is strongly recommended. So strongly, that we should consider a configuration without it invalid.

A good example of how this works is the fence-agents-scsi rpm for CentOS 7.

Due to issue 824 we could have a situation where we have no fencing device, or our fencing device is not ready, and as a result fencing will never occur.

ZFS-backed resource create occasionally fails in SSI test

Commands run, with exit codes:

crm_resource -W -r testfs-OST0000_add680: 6

Resource 'testfs-OST0000_add680' not found
Error performing operation: No such device or address

pcs resource create testfs-OST0000_add680-zfs ocf:chroma:ZFS pool=zfs_pool_scsi0QEMU_QEMU_HARDDISK_target1 op start timeout=90 op stop timeout=90 --disabled --group group-testfs-OST0000_add680: 0


pcs resource create testfs-OST0000_add680 ocf:lustre:Lustre target=zfs_pool_scsi0QEMU_QEMU_HARDDISK_target1/testfs-OST0000 mountpoint=/mnt/testfs-OST0000 --disabled --group group-testfs-OST0000_add680: 0


pcs constraint location add testfs-OST0000_add680-primary testfs-OST0000_add680 vm7 20: 1

Error: Resource 'testfs-OST0000_add680' does not exist

corosync.log:

Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: --- 0.28.1 2
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: +++ 0.28.2 (null)
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       +  /cib:  @num_updates=2
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       ++ /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources:  <lrm_resource id="testfs-OST0000_add680" type="Lustre" class="ocf" provider="lustre"/>
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       ++                                                                <lrm_rsc_op id="testfs-OST0000_add680_last_0" operation_key="testfs-OST0000_add680_monitor_0" operation="monitor" crm-debug-origin="do
_update_resource" crm_feature_set="3.0.14" transition-key="2:17:7:43e23f53-fdcf-43f7-8de0-943d43f8ab0a" transition-magic="-1:193;2:17:7:43e23f53-fdcf-43f7-8de0-943d43f8ab0a" exit-reason="" on_node="vm7" call-id="-1" rc-code="193" op-status="-1"
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       ++                                                              </lrm_resource>
Nov 14 09:43:37 [30372] vm8 stonith-ng:     info: update_cib_stonith_devices_v2:        Updating device list from the cib: create lrm_resources
Nov 14 09:43:37 [30372] vm8 stonith-ng:     info: cib_devices_update:   Updating devices to version 0.28.2
Nov 14 09:43:37 [30372] vm8 stonith-ng:   notice: unpack_config:        On loss of CCM Quorum: Ignore
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_process_request:  Completed cib_modify operation for section status: OK (rc=0, origin=vm7/crmd/21, version=0.28.2)
Nov 14 09:43:37  Lustre(testfs-OST0000_add680)[31191]:    ERROR: zfs_pool_scsi0QEMU_QEMU_HARDDISK_target1/testfs-OST0000 is not mounted
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: --- 0.28.2 2
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: +++ 0.29.0 (null)
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       -- /cib/configuration/resources/group[@id='group-testfs-OST0000_add680']/primitive[@id='testfs-OST0000_add680']
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       +  /cib:  @epoch=29, @num_updates=0
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       ++ /cib/configuration/constraints:  <rsc_location id="testfs-OST0001_44d8b5-secondary" node="vm7" rsc="testfs-OST0001_44d8b5" score="10"/>
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_process_request:  Completed cib_replace operation for section configuration: OK (rc=0, origin=vm7/cibadmin/2, version=0.29.0)
Nov 14 09:43:37 [30376] vm8       crmd:   notice: abort_transition_graph:       Transition aborted by deletion of primitive[@id='testfs-OST0000_add680']: Configuration change | cib=0.29.0 source=te_update_diff:456 path=/cib/configuration/resources/group[@id='group-testfs-
OST0000_add680']/primitive[@id='testfs-OST0000_add680'] complete=false
Nov 14 09:43:37 [30372] vm8 stonith-ng:     info: stonith_device_remove:        Device 'testfs-OST0000_add680' not found (1 active devices)
Nov 14 09:43:37 [30372] vm8 stonith-ng:     info: update_cib_stonith_devices_v2:        Updating device list from the cib: create constraints
Nov 14 09:43:37 [30372] vm8 stonith-ng:     info: cib_devices_update:   Updating devices to version 0.29.0
Nov 14 09:43:37 [30372] vm8 stonith-ng:   notice: unpack_config:        On loss of CCM Quorum: Ignore
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_file_backup:      Archived previous version as /var/lib/pacemaker/cib/cib-27.raw
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: --- 0.29.0 2
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: +++ 0.29.1 (null)
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       +  /cib:  @num_updates=1
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       +  /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='testfs-OST0000_add680']/lrm_rsc_op[@id='testfs-OST0000_add680_last_0']:  @transition-magic=0:7;2:17:7:43e23f53-fdcf-43f7
-8de0-943d43f8ab0a, @call-id=19, @rc-code=7, @op-status=0, @exec-time=21
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_process_request:  Completed cib_modify operation for section status: OK (rc=0, origin=vm7/crmd/22, version=0.29.1)
Nov 14 09:43:37 [30376] vm8       crmd:     info: match_graph_event:    Action testfs-OST0000_add680_monitor_0 (2) confirmed on vm7 (rc=7)
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_process_request:  Forwarding cib_modify operation for section status to all (origin=local/crmd/57)
Nov 14 09:43:37 [30376] vm8       crmd:   notice: process_lrm_event:    Result of probe operation for testfs-OST0000_add680 on vm8: 7 (not running) | call=17 key=testfs-OST0000_add680_monitor_0 confirmed=true cib-update=57
Nov 14 09:43:37 [30372] vm8 stonith-ng:     info: update_cib_stonith_devices_v2:        Updating device list from the cib: modify lrm_rsc_op[@id='testfs-OST0000_add680_last_0']
Nov 14 09:43:37 [30372] vm8 stonith-ng:     info: cib_devices_update:   Updating devices to version 0.29.1
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: --- 0.29.1 2
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: +++ 0.29.2 (null)
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       +  /cib:  @num_updates=2
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       +  /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='testfs-OST0000_add680']/lrm_rsc_op[@id='testfs-OST0000_add680_last_0']:  @transition-magic=0:7;3:17:7:43e23f53-fdcf-43f7
-8de0-943d43f8ab0a, @call-id=17, @rc-code=7, @op-status=0, @exec-time=28
Nov 14 09:43:37 [30372] vm8 stonith-ng:   notice: unpack_config:        On loss of CCM Quorum: Ignore
Nov 14 09:43:37 [30376] vm8       crmd:     info: match_graph_event:    Action testfs-OST0000_add680_monitor_0 (3) confirmed on vm8 (rc=7)
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_process_request:  Completed cib_modify operation for section status: OK (rc=0, origin=vm8/crmd/57, version=0.29.2)
Nov 14 09:43:37 [30376] vm8       crmd:   notice: run_graph:    Transition 17 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-199.bz2): Complete
Nov 14 09:43:37 [30376] vm8       crmd:     info: do_state_transition:  State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd
Nov 14 09:43:37 [30372] vm8 stonith-ng:     info: update_cib_stonith_devices_v2:        Updating device list from the cib: modify lrm_rsc_op[@id='testfs-OST0000_add680_last_0']
Nov 14 09:43:37 [30372] vm8 stonith-ng:     info: cib_devices_update:   Updating devices to version 0.29.2
Nov 14 09:43:37 [30372] vm8 stonith-ng:   notice: unpack_config:        On loss of CCM Quorum: Ignore
Nov 14 09:43:37 [30375] vm8    pengine:   notice: unpack_config:        On loss of CCM Quorum: Ignore
Nov 14 09:43:37 [30375] vm8    pengine:     info: determine_online_status_fencing:      Node vm7 is active
Nov 14 09:43:37 [30375] vm8    pengine:     info: determine_online_status:      Node vm7 is online
Nov 14 09:43:37 [30375] vm8    pengine:     info: determine_online_status_fencing:      Node vm8 is active
Nov 14 09:43:37 [30375] vm8    pengine:     info: determine_online_status:      Node vm8 is online
Nov 14 09:43:37 [30375] vm8    pengine:     info: determine_op_status:  Operation monitor found resource testfs-OST0000_add680-zfs active on vm7
Nov 14 09:43:37 [30375] vm8    pengine:     info: unpack_node_loop:     Node 1 is already processed
Nov 14 09:43:37 [30375] vm8    pengine:     info: unpack_node_loop:     Node 2 is already processed
Nov 14 09:43:37 [30375] vm8    pengine:     info: unpack_node_loop:     Node 1 is already processed
Nov 14 09:43:37 [30375] vm8    pengine:     info: unpack_node_loop:     Node 2 is already processed
Nov 14 09:43:37 [30375] vm8    pengine:     info: common_print: st-fencing      (stonith:fence_chroma): Started vm7
Nov 14 09:43:37 [30375] vm8    pengine:     info: common_print: testfs-OST0001_44d8b5   (ocf::lustre:Lustre):   Stopped (disabled)
Nov 14 09:43:37 [30375] vm8    pengine:     info: group_print:   Resource Group: group-testfs-OST0000_add680
Nov 14 09:43:37 [30375] vm8    pengine:     info: common_print:      testfs-OST0000_add680-zfs  (ocf::chroma:ZFS):      Stopped (disabled)
Nov 14 09:43:37 [30375] vm8    pengine:   notice: DeleteRsc:    Removing testfs-OST0000_add680 from vm7
Nov 14 09:43:37 [30375] vm8    pengine:   notice: DeleteRsc:    Removing testfs-OST0000_add680 from vm8
Nov 14 09:43:37 [30375] vm8    pengine:     info: native_color: Resource testfs-OST0001_44d8b5 cannot run anywhere
Nov 14 09:43:37 [30375] vm8    pengine:     info: native_color: Resource testfs-OST0000_add680-zfs cannot run anywhere
Nov 14 09:43:37 [30375] vm8    pengine:     info: LogActions:   Leave   st-fencing      (Started vm7)
Nov 14 09:43:37 [30375] vm8    pengine:     info: LogActions:   Leave   testfs-OST0001_44d8b5   (Stopped)
Nov 14 09:43:37 [30375] vm8    pengine:     info: LogActions:   Leave   testfs-OST0000_add680-zfs       (Stopped)
Nov 14 09:43:37 [30375] vm8    pengine:   notice: process_pe_message:   Calculated transition 18, saving inputs in /var/lib/pacemaker/pengine/pe-input-200.bz2
Nov 14 09:43:37 [30376] vm8       crmd:     info: do_state_transition:  State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
Nov 14 09:43:37 [30376] vm8       crmd:     info: do_te_invoke: Processing graph 18 (ref=pe_calc-dc-1542188617-41) derived from /var/lib/pacemaker/pengine/pe-input-200.bz2
Nov 14 09:43:37 [30376] vm8       crmd:   notice: te_rsc_command:       Initiating delete operation testfs-OST0000_add680_delete_0 locally on vm8 | action 3
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_process_request:  Forwarding cib_delete operation for section //node_state[@uname='vm8']//lrm_resource[@id='testfs-OST0000_add680'] to all (origin=local/crmd/59)
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: --- 0.29.2 2
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: +++ 0.29.3 (null)
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       -- /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='testfs-OST0000_add680']
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       +  /cib:  @num_updates=3
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_process_request:  Completed cib_delete operation for section //node_state[@uname='vm8']//lrm_resource[@id='testfs-OST0000_add680']: OK (rc=0, origin=vm8/crmd/59, version=0.29.2)
Nov 14 09:43:37 [30376] vm8       crmd:     info: delete_resource:      Removing resource testfs-OST0000_add680 for tengine (internal) on (null)
Nov 14 09:43:37 [30376] vm8       crmd:     info: notify_deleted:       Notifying tengine on localhost that testfs-OST0000_add680 was deleted
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_process_request:  Forwarding cib_delete operation for section //node_state[@uname='vm8']//lrm_resource[@id='testfs-OST0000_add680'] to all (origin=local/crmd/60)
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: --- 0.29.2 2
Nov 14 09:43:37 [30371] vm8        cib:     info: cib_perform_op:       Diff: +++ 0.29.3 (null)

Resource creation and constraint adding are serialized, so this is not a race between nodes.

IML chroma-agent-daemon: InsecureRequestWarning

Describe the bug
Disabling the insecure-request warning is not working. The statement `urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)` in /usr/lib/python2.7/site-packages/chroma_agent/agent_daemon.py has no effect, and the following warning is still output on all Lustre servers repeatedly every few seconds.

Warning:
oss002: May 31 14:43:30 oss002 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss002: May 31 14:43:30 oss002 chroma-agent-daemon: InsecureRequestWarning)
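The path in the warning (requests/packages/urllib3/...) suggests it is raised by requests' vendored copy of urllib3, so disabling warnings on the standalone urllib3 module never reaches it. A sketch of disabling the warning on the vendored copy instead (assuming a requests build that still vendors urllib3, as on these hosts):

import requests

# Target the vendored urllib3 that requests actually uses here, not the
# standalone top-level urllib3 module.
requests.packages.urllib3.disable_warnings(
    requests.packages.urllib3.exceptions.InsecureRequestWarning
)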

To Reproduce
Steps to reproduce the behavior:

  1. Install IMLv5.0 on CentOS 7.5, with default python version 2.7.5
  2. IML agents installed on CentOS 7.4, with python version 2.7.5

Screenshots
pdsh> grep InsecureRequestWarning /var/log/messages | tail -2
mds002: May 31 14:43:26 mds002 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
mds002: May 31 14:43:26 mds002 chroma-agent-daemon: InsecureRequestWarning)
mds001: May 31 14:43:28 mds001 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
mds001: May 31 14:43:28 mds001 chroma-agent-daemon: InsecureRequestWarning)
oss007: May 31 14:43:30 oss007 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss007: May 31 14:43:30 oss007 chroma-agent-daemon: InsecureRequestWarning)
oss006: May 31 14:43:27 oss006 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss006: May 31 14:43:27 oss006 chroma-agent-daemon: InsecureRequestWarning)
oss011: May 31 14:43:30 oss011 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss011: May 31 14:43:30 oss011 chroma-agent-daemon: InsecureRequestWarning)
oss005: May 31 14:43:29 oss005 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss005: May 31 14:43:29 oss005 chroma-agent-daemon: InsecureRequestWarning)
oss003: May 31 14:43:31 oss003 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss003: May 31 14:43:31 oss003 chroma-agent-daemon: InsecureRequestWarning)
oss001: May 31 14:43:25 oss001 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss001: May 31 14:43:25 oss001 chroma-agent-daemon: InsecureRequestWarning)
oss008: May 31 14:43:29 oss008 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss008: May 31 14:43:29 oss008 chroma-agent-daemon: InsecureRequestWarning)
oss012: May 31 14:43:30 oss012 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss012: May 31 14:43:30 oss012 chroma-agent-daemon: InsecureRequestWarning)
oss004: May 31 14:43:30 oss004 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss004: May 31 14:43:30 oss004 chroma-agent-daemon: InsecureRequestWarning)
oss002: May 31 14:43:30 oss002 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss002: May 31 14:43:30 oss002 chroma-agent-daemon: InsecureRequestWarning)

Version

What version of IML? v5.0
What Operating System? CentOS
What Operating System version? v 7.5

Please let me know if you need additional information. Thank you!!
