The Manager for Lustre Agent
ZFS Resource Agent: once resource-agents release 4.0.1 is widely available, this can be phased out or become a symlink.
Integrated Manager for Lustre Agent
License: MIT License
chroma_agent/action_plugins/manage_targets.py
Cited as HYD-1898 and HYD-7230.
These predate the rewritten wait_target and should be removed.
start_resource and stop_resource should not die if a resource has no location (or any other) constraints.
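The tolerant behavior could look like the following minimal sketch; the helper name and the exact pcs invocation are illustrative assumptions, not the agent's actual code:

```python
import subprocess


def clear_location_constraints(constraint_id):
    """Remove a location constraint, tolerating its absence.

    Hypothetical sketch: start_resource/stop_resource should treat a
    missing constraint as a no-op rather than dying. The pcs command
    shape here is an assumption, not the agent's real call.
    """
    try:
        subprocess.check_output(
            ["pcs", "constraint", "remove", constraint_id],
            stderr=subprocess.STDOUT,
        )
        return True
    except (OSError, subprocess.CalledProcessError):
        # pcs missing or constraint not found: nothing to clean up
        return False
```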
When running in an env where packages are preinstalled, update_profile
should not attempt to install anything ref:
iml-agent/chroma_agent/action_plugins/agent_updates.py
Lines 81 to 92 in ec2a08b
Likewise, configure_repo should be a no-op if the trimmed repo contents are empty:
iml-agent/chroma_agent/action_plugins/agent_updates.py
Lines 24 to 44 in ec2a08b
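The no-op guard could be as simple as this sketch; the helper name and signature are hypothetical, not the actual configure_repo code:

```python
def write_repo_if_nonempty(path, contents):
    """Write a repo file only when the trimmed contents are non-empty.

    Hypothetical helper illustrating the suggested no-op guard for
    configure_repo; names are not from the actual agent code.
    """
    trimmed = contents.strip()
    if not trimmed:
        return False  # nothing to configure: act as a no-op
    with open(path, "w") as f:
        f.write(trimmed + "\n")
    return True
```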
IML has an install_packages action. install_packages does an install followed by an update, as seen here:
iml-agent/chroma_agent/action_plugins/agent_updates.py
Lines 113 to 121 in d1b6887
The update seems like an overreach for IML, even in managed mode, as it doesn't appear to serve a purpose for IML-managed repos.
The install_packages action is called on the manager side here:
and here:
In either case, a {yum, dnf} install should trigger an update if needed, so we should be OK to remove this.
Fixes whamcloud/integrated-manager-for-lustre#698
We are seeing this in the chroma-agent logs:
[18/Apr/2018:21:15:51] console DEBUG Exception raised in sandbox START:
File "/usr/lib/python2.7/site-packages/iml_common/lib/exception_sandbox.py", line 23, in wrapper
return function(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/chroma_agent/device_plugins/audit/local.py", line 68, in properties
return dict(item for cls in self.audit_classes() for item in cls().properties().items())
File "/usr/lib/python2.7/site-packages/chroma_agent/device_plugins/audit/local.py", line 68, in <genexpr>
return dict(item for cls in self.audit_classes() for item in cls().properties().items())
File "/usr/lib/python2.7/site-packages/chroma_agent/device_plugins/audit/node.py", line 64, in properties
'distro_version': float('.'.join(platform.linux_distribution()[1].split('.')[:2])),
ValueError: could not convert string to float:
Exception raised in sandbox END
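The failing expression assumes platform.linux_distribution() returns a non-empty version string; float('') raises exactly the ValueError shown above. A defensive sketch of the parsing step (the helper name is illustrative):

```python
def parse_distro_version(version_string):
    """Return major.minor as a float, or None if unparseable.

    Defensive sketch of the parsing done in node.py, which raised
    "ValueError: could not convert string to float:" when the distro
    version string came back empty.
    """
    try:
        return float(".".join(version_string.split(".")[:2]))
    except ValueError:
        return None
```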
In:
iml-agent/chroma_agent/device_plugins/lustre.py
Lines 155 to 158 in a167f1c
The code assumes that the VDev Root's children will always contain a Disk.
This is true for a pool with at least one top-level backing Disk, for example:
```json
"vdev": {
  "Root": {
    "children": [
      {
        "Disk": {
          "guid": 1031789542385231700,
          "state": "ONLINE",
          "path": "/dev/disk/by-id/dm-uuid-mpath-3600140560ab2b7d5caa4c18a2266b935",
          "dev_id": "dm-uuid-mpath-3600140560ab2b7d5caa4c18a2266b935",
          "phys_path": null,
          "whole_disk": false,
          "is_log": false
        }
      }
    ],
    "spares": [],
    "cache": []
  }
}
```
but it is not true for any other type of setup.
Shows the full enum that would need to be accounted for.
The current implementation raises an error like:
[07/Jul/2018:03:23:30] console DEBUG Exception raised in sandbox START:
File "/usr/lib/python2.7/site-packages/iml_common/lib/exception_sandbox.py", line 23, in wrapper
return function(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/chroma_agent/device_plugins/lustre.py", line 201, in _scan_mounts
fs_label, fs_uuid, new_device = process_zfs_mount(device, data, zfs_mounts)
File "/usr/lib/python2.7/site-packages/chroma_agent/device_plugins/lustre.py", line 156, in process_zfs_mount
child['Disk']['path'] for child in pool['vdev']['Root']['children']
StopIteration
While looking at this section, we noticed the code does not appear to be used on the manager side at all, so it should be OK to remove.
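If the code is kept, a tolerant traversal could be sketched as follows, assuming the vdev tree shape shown above; the container variant names (Mirror, RaidZ, Replacing) are typical ZFS layouts and may not match the full enum exactly:

```python
def iter_disk_paths(vdev):
    """Recursively yield Disk paths from a vdev tree.

    Sketch only: handles Disk plus container variants by recursing into
    their children, rather than assuming Root's children are always
    Disks. Variant names are illustrative assumptions.
    """
    if "Disk" in vdev:
        yield vdev["Disk"]["path"]
        return
    for name in ("Root", "Mirror", "RaidZ", "Replacing"):
        inner = vdev.get(name)
        if inner:
            for child in inner.get("children", []):
                for path in iter_disk_paths(child):
                    yield path
```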
on the host side:
[root@sfa7990-c0 ~]# ip link set up dev enp0s20f0u8u2c2 ; echo $?
0
[root@sfa7990-c0 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:01:ff:0d:00:40 brd ff:ff:ff:ff:ff:ff
inet 10.36.44.22/22 brd 10.36.47.255 scope global dynamic eno5
valid_lft 13617sec preferred_lft 13617sec
inet6 fe80::201:ffff:fe0d:40/64 scope link
valid_lft forever preferred_lft forever
3: eno6: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether 00:01:ff:4d:00:40 brd ff:ff:ff:ff:ff:ff
4: enp0s20f0u8u2c2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
link/ether 3a:18:3b:7c:53:19 brd ff:ff:ff:ff:ff:ff
inet6 fe80::3818:3bff:fe7c:5319/64 scope link
valid_lft forever preferred_lft forever
5: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
link/infiniband 00:00:07:f9:fe:80:00:00:00:00:00:00:50:6b:4b:03:00:23:b7:cc brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
inet 172.172.172.22/24 brd 172.172.172.255 scope global ib0
valid_lft forever preferred_lft forever
inet6 fe80::526b:4b03:23:b7cc/64 scope link
valid_lft forever preferred_lft forever
6: ib1: <BROADCAST,MULTICAST> mtu 4092 qdisc mq state DOWN group default qlen 256
link/infiniband 00:00:11:b0:fe:80:00:00:00:00:00:00:50:6b:4b:03:00:23:b7:cd brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
[root@sfa7990-c0 ~]# cat /sys/class/net/enp0s20f0u8u2c2/carrier
1
[root@sfa7990-c0 ~]# cat /sys/class/net/enp0s20f0u8u2c2/operstate
unknown
[root@sfa7990-c1 ~]# for nic in $(ls -1 /sys/class/net/); do echo $nic: $(readlink /sys/class/net/$nic/device/driver); done
eno5: ../../../../bus/pci/drivers/igb
eno6: ../../../../bus/pci/drivers/igb
enp0s20f0u8u2c2: ../../../../../../../bus/usb/drivers/cdc_ether
ib0: ../../../../bus/pci/drivers/mlx5_core
ib1: ../../../../bus/pci/drivers/mlx5_core
lo:
[root@sfa7990-c1 ~]# cat /sys/class/net/*/device/interface
CDC Notification Interface
[root@sfa7990-c1 ~]# modinfo cdc_ether
filename: /lib/modules/3.10.0-957.el7_lustre.x86_64/kernel/drivers/net/usb/cdc_ether.ko.xz
license: GPL
description: USB CDC Ethernet devices
author: David Brownell
retpoline: Y
rhelversion: 7.6
srcversion: D329B19ACE6E9677F544BB8
[root@sfa7990-c0 tmp]# python
Python 2.7.5 (default, Oct 30 2018, 23:45:53)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import array, struct, fcntl, socket
>>> SIOCETHTOOL = 0x8946
>>> ETHTOOL_GLINK = 0x0000000a
>>> sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> ecmd = array.array('B', struct.pack('2I', ETHTOOL_GLINK, 0))
>>> ifreq = struct.pack('16sP', 'enp0s20f0u8u2c2', ecmd.buffer_info()[0])
>>> fcntl.ioctl(sock.fileno(), SIOCETHTOOL, ifreq)
'enp0s20f0u8u2c2\x000\x14J\x01\x00\x00\x00\x00'
>>> sock.close()
>>> struct.unpack('4xI', ecmd.tostring())[0]
1
>>> def _has_link(name):
... import array
... import struct
... import fcntl
... import socket
... SIOCETHTOOL = 0x8946
... ETHTOOL_GLINK = 0x0000000a
... sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
... ecmd = array.array('B', struct.pack('2I', ETHTOOL_GLINK, 0))
... ifreq = struct.pack('16sP', name, ecmd.buffer_info()[0])
... fcntl.ioctl(sock.fileno(), SIOCETHTOOL, ifreq)
... sock.close()
... return bool(struct.unpack('4xI', ecmd.tostring())[0])
...
>>> _has_link('ib0')
True
>>> _has_link('ib1')
False
>>> _has_link('lo')
True
>>> _has_link('enp0s20f0u8u2c2')
True
According to https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-class-net, the interface can be in one of the following states:
What: /sys/class/net/<iface>/operstate
Date: March 2006
KernelVersion: 2.6.17
Contact: [email protected]
Description:
Indicates the interface RFC2863 operational state as a string.
Possible values are:
"unknown", "notpresent", "down", "lowerlayerdown", "testing",
"dormant", "up".
I suggest changing the code logic so that everything except 'up' returns False.
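That suggested logic could be sketched as follows, reading the sysfs path documented above:

```python
def iface_is_up(name):
    """Return True only when /sys/class/net/<name>/operstate reads 'up'.

    Sketch of the suggested logic: treat every other RFC 2863 state
    ('unknown', 'dormant', 'down', ...) as not up, since the USB CDC
    interface above reports 'unknown' even while carrier is 1.
    """
    try:
        with open("/sys/class/net/%s/operstate" % name) as f:
            return f.read().strip() == "up"
    except IOError:
        # interface disappeared or sysfs entry missing
        return False
```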
When the agent is deployed it creates the following repo file: /etc/yum.repos.d/Intel-Lustre-Agent.repo. This should be "Integrated-Manager-For-Lustre-Agent.repo" or something along those lines.
We frequently encounter issues with crm_mon's output format changing. That's because we are trying to "read" a human-targeted output format. crm_mon has an --as-xml switch, which should produce a more stable XML format instead. We should switch to using it.
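Consuming the XML could then look like this sketch; the element and attribute names follow the usual crm_mon XML layout but should be verified against the installed pacemaker version:

```python
import xml.etree.ElementTree as ET


def parse_crm_mon_xml(xml_text):
    """Map resource id -> role from crm_mon XML output.

    Sketch: feed it the output of something like
    subprocess.check_output(['crm_mon', '--as-xml', '-1']).
    The <resource id=... role=...> layout is an assumption to check
    against the pacemaker version in use.
    """
    root = ET.fromstring(xml_text)
    return {r.get("id"): r.get("role") for r in root.iter("resource")}
```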
I've started seeing the following traceback today:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/chroma_agent/device_plugins/action_runner.py", line 182, in run
self.action, agent_daemon_context, self.args
File "/usr/lib/python2.7/site-packages/chroma_agent/plugin_manager.py", line 321, in run
return fn(**args)
File "/usr/lib/python2.7/site-packages/chroma_agent/action_plugins/manage_corosync_common.py", line 38, in get_corosync_autoconfig
ring1 = detect_ring1(ring0, ring1_ipaddr, ring1_prefix)
File "/usr/lib/python2.7/site-packages/chroma_agent/lib/corosync.py", line 139, in detect_ring1
iface.mcastport = find_unused_port(ring0)
File "/usr/lib/python2.7/site-packages/chroma_agent/lib/corosync.py", line 206, in find_unused_port
for dport in set(dports):
File "/usr/lib/python2.7/site-packages/scapy/packet.py", line 1459, in __hash__
raise TypeError('unhashable type: %r' % self.__class__.__name__)
TypeError: unhashable type: 'Ether'
This is in the following code:
iml-agent/chroma_agent/lib/corosync.py
Lines 206 to 211 in 7695c54
A recent change in scapy made packet subclasses unhashable, meaning we can no longer use a set here.
However, we should be able to remove the set without issue, as any port which has already been removed is passed over.
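The fix could be sketched like this, using plain integers for illustration (in the real find_unused_port the observed dports come from sniffed packets):

```python
def remove_seen_ports(candidates, seen_dports):
    """Drop observed destination ports from the candidate pool.

    Sketch of the suggested fix: iterate seen_dports directly instead of
    wrapping it in set(), which now raises TypeError because scapy
    packet classes are unhashable. Duplicates are harmless: a port
    already discarded is simply passed over.
    """
    remaining = set(candidates)
    for dport in seen_dports:  # no set() around the source sequence
        remaining.discard(dport)
    return remaining
```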
I have run into a problem with filesystem detection.
I have a dual-controller storage system where block devices are either in an active or a passive state.
A filesystem created on a block device is available only while the device's state is 'active'.
The trouble is that your Primary/Failover node detection relies on filesystem metadata being readable from the block devices, which isn't the case for me. However, the failover information in the filesystem configuration does contain the address of the 'passive' node.
My question: is there a way to detect filesystems in such configurations while still successfully determining the failover node?
I don't know whether this question relates more to the UI or to the agent, so feel free to redirect me.
In:
iml-agent/chroma_agent/action_plugins/settings_management.py
Lines 17 to 25 in 6fda4ed
/etc/iml/profile.conf will not be created if its settings equivalent already exists.
This means it will get skipped in an upgrade scenario.
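One way to avoid the skip, sketched with a hypothetical helper (not the actual settings_management code): write the file unconditionally but idempotently, so upgrades always end up with the file on disk.

```python
import os


def ensure_profile_file(path, contents):
    """Write the profile file even when equivalent settings exist.

    Hypothetical sketch: the write is unconditional (fixing the upgrade
    skip) but only touches the file when the contents differ.
    """
    current = None
    if os.path.exists(path):
        with open(path) as f:
            current = f.read()
    if current != contents:
        with open(path, "w") as f:
            f.write(contents)
```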
Configure pacemaker using standard resource and fencing agents instead of custom IML ones.
IML is using corosync "rrp_mode: passive", which means that only one interface is in use at a time and the second one is used only if the ring0 NIC fails.
According to https://whamcloud.github.io/Online-Help/docs/Install_Guide/ig_ch_03_building.html :
According to the function get_ring0() in lib/corosync.py, the management interface will be used for ring0:

```python
def get_ring0():
    # ring0 will always be on the interface used for agent->manager comms
    from urlparse import urlparse
    server_url = urljoin(os.environ["IML_MANAGER_URL"], "agent")
    manager_address = socket.gethostbyname(urlparse(server_url).hostname)
    out = AgentShell.try_run(['/sbin/ip', 'route', 'get', manager_address])
    match = re.search(r'dev\s+([^\s]+)', out)
    if match:
        manager_dev = match.groups()[0]
    else:
        raise RuntimeError("Unable to find ring0 dev in %s" % out)
    console_log.info("Chose %s for corosync ring0" % manager_dev)
    ring0 = CorosyncRingInterface(manager_dev)
    if ring0.ipv4_prefixlen < 9:
        raise RuntimeError("%s subnet is too large (/%s)" %
                           (ring0.name, ring0.ipv4_prefixlen))
    return ring0
```
It is logically wrong to put the rings (the cluster transport layer, which sends and receives information across all nodes in the cluster) on the management network, which may be reachable from the outside world. Another downside is that L2+ switches, routers, and firewalls could block multicast/unicast traffic.
In any case, we have a dedicated (back-to-back) interface which should be used as ring0 and for all inter-cluster communications.
In line with iml-update-check#10.
Because of metadata incoherence between dnf and yum, we need to either use only yum, or pull information from and synthesize results across both dnf and yum.
Revert 5dbb409 once LU-11461 is fixed for Lustre 2.10/2.12.
Lustre master change: https://review.whamcloud.com/33277
According to the official documentation:
Monitoring the fencing devices
Just like any other resource, the stonith class agents also support the monitor operation. Given that we have often seen monitor either not configured or configured in a wrong way, we have decided to devote a section to the matter.
Monitoring stonith resources, which is actually checking status of the corresponding fencing devices, is strongly recommended. So strongly, that we should consider a configuration without it invalid.
A good example of how this works can be taken from the fence-agents-scsi rpm for CentOS 7.
Due to issue 824, we could end up in a situation where we have no fencing device, or our fencing device is not ready, and as a result fencing will never occur.
Commands run (the trailing number is each command's exit status):
crm_resource -W -r testfs-OST0000_add680: 6
Resource 'testfs-OST0000_add680' not found
Error performing operation: No such device or address
pcs resource create testfs-OST0000_add680-zfs ocf:chroma:ZFS pool=zfs_pool_scsi0QEMU_QEMU_HARDDISK_target1 op start timeout=90 op stop timeout=90 --disabled --group group-testfs-OST0000_add680: 0
pcs resource create testfs-OST0000_add680 ocf:lustre:Lustre target=zfs_pool_scsi0QEMU_QEMU_HARDDISK_target1/testfs-OST0000 mountpoint=/mnt/testfs-OST0000 --disabled --group group-testfs-OST0000_add680: 0
pcs constraint location add testfs-OST0000_add680-primary testfs-OST0000_add680 vm7 20: 1
Error: Resource 'testfs-OST0000_add680' does not exist
corosync.log:
:
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: Diff: --- 0.28.1 2
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: Diff: +++ 0.28.2 (null)
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: + /cib: @num_updates=2
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: ++ /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources: <lrm_resource id="testfs-OST0000_add680" type="Lustre" class="ocf" provider="lustre"/>
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: ++ <lrm_rsc_op id="testfs-OST0000_add680_last_0" operation_key="testfs-OST0000_add680_monitor_0" operation="monitor" crm-debug-origin="do
_update_resource" crm_feature_set="3.0.14" transition-key="2:17:7:43e23f53-fdcf-43f7-8de0-943d43f8ab0a" transition-magic="-1:193;2:17:7:43e23f53-fdcf-43f7-8de0-943d43f8ab0a" exit-reason="" on_node="vm7" call-id="-1" rc-code="193" op-status="-1"
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: ++ </lrm_resource>
Nov 14 09:43:37 [30372] vm8 stonith-ng: info: update_cib_stonith_devices_v2: Updating device list from the cib: create lrm_resources
Nov 14 09:43:37 [30372] vm8 stonith-ng: info: cib_devices_update: Updating devices to version 0.28.2
Nov 14 09:43:37 [30372] vm8 stonith-ng: notice: unpack_config: On loss of CCM Quorum: Ignore
Nov 14 09:43:37 [30371] vm8 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=vm7/crmd/21, version=0.28.2)
Nov 14 09:43:37 Lustre(testfs-OST0000_add680)[31191]: ERROR: zfs_pool_scsi0QEMU_QEMU_HARDDISK_target1/testfs-OST0000 is not mounted
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: Diff: --- 0.28.2 2
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: Diff: +++ 0.29.0 (null)
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: -- /cib/configuration/resources/group[@id='group-testfs-OST0000_add680']/primitive[@id='testfs-OST0000_add680']
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: + /cib: @epoch=29, @num_updates=0
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: ++ /cib/configuration/constraints: <rsc_location id="testfs-OST0001_44d8b5-secondary" node="vm7" rsc="testfs-OST0001_44d8b5" score="10"/>
Nov 14 09:43:37 [30371] vm8 cib: info: cib_process_request: Completed cib_replace operation for section configuration: OK (rc=0, origin=vm7/cibadmin/2, version=0.29.0)
Nov 14 09:43:37 [30376] vm8 crmd: notice: abort_transition_graph: Transition aborted by deletion of primitive[@id='testfs-OST0000_add680']: Configuration change | cib=0.29.0 source=te_update_diff:456 path=/cib/configuration/resources/group[@id='group-testfs-
OST0000_add680']/primitive[@id='testfs-OST0000_add680'] complete=false
Nov 14 09:43:37 [30372] vm8 stonith-ng: info: stonith_device_remove: Device 'testfs-OST0000_add680' not found (1 active devices)
Nov 14 09:43:37 [30372] vm8 stonith-ng: info: update_cib_stonith_devices_v2: Updating device list from the cib: create constraints
Nov 14 09:43:37 [30372] vm8 stonith-ng: info: cib_devices_update: Updating devices to version 0.29.0
Nov 14 09:43:37 [30372] vm8 stonith-ng: notice: unpack_config: On loss of CCM Quorum: Ignore
Nov 14 09:43:37 [30371] vm8 cib: info: cib_file_backup: Archived previous version as /var/lib/pacemaker/cib/cib-27.raw
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: Diff: --- 0.29.0 2
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: Diff: +++ 0.29.1 (null)
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: + /cib: @num_updates=1
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='testfs-OST0000_add680']/lrm_rsc_op[@id='testfs-OST0000_add680_last_0']: @transition-magic=0:7;2:17:7:43e23f53-fdcf-43f7
-8de0-943d43f8ab0a, @call-id=19, @rc-code=7, @op-status=0, @exec-time=21
Nov 14 09:43:37 [30371] vm8 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=vm7/crmd/22, version=0.29.1)
Nov 14 09:43:37 [30376] vm8 crmd: info: match_graph_event: Action testfs-OST0000_add680_monitor_0 (2) confirmed on vm7 (rc=7)
Nov 14 09:43:37 [30371] vm8 cib: info: cib_process_request: Forwarding cib_modify operation for section status to all (origin=local/crmd/57)
Nov 14 09:43:37 [30376] vm8 crmd: notice: process_lrm_event: Result of probe operation for testfs-OST0000_add680 on vm8: 7 (not running) | call=17 key=testfs-OST0000_add680_monitor_0 confirmed=true cib-update=57
Nov 14 09:43:37 [30372] vm8 stonith-ng: info: update_cib_stonith_devices_v2: Updating device list from the cib: modify lrm_rsc_op[@id='testfs-OST0000_add680_last_0']
Nov 14 09:43:37 [30372] vm8 stonith-ng: info: cib_devices_update: Updating devices to version 0.29.1
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: Diff: --- 0.29.1 2
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: Diff: +++ 0.29.2 (null)
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: + /cib: @num_updates=2
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: + /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='testfs-OST0000_add680']/lrm_rsc_op[@id='testfs-OST0000_add680_last_0']: @transition-magic=0:7;3:17:7:43e23f53-fdcf-43f7
-8de0-943d43f8ab0a, @call-id=17, @rc-code=7, @op-status=0, @exec-time=28
Nov 14 09:43:37 [30372] vm8 stonith-ng: notice: unpack_config: On loss of CCM Quorum: Ignore
Nov 14 09:43:37 [30376] vm8 crmd: info: match_graph_event: Action testfs-OST0000_add680_monitor_0 (3) confirmed on vm8 (rc=7)
Nov 14 09:43:37 [30371] vm8 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=vm8/crmd/57, version=0.29.2)
Nov 14 09:43:37 [30376] vm8 crmd: notice: run_graph: Transition 17 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-199.bz2): Complete
Nov 14 09:43:37 [30376] vm8 crmd: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd
Nov 14 09:43:37 [30372] vm8 stonith-ng: info: update_cib_stonith_devices_v2: Updating device list from the cib: modify lrm_rsc_op[@id='testfs-OST0000_add680_last_0']
Nov 14 09:43:37 [30372] vm8 stonith-ng: info: cib_devices_update: Updating devices to version 0.29.2
Nov 14 09:43:37 [30372] vm8 stonith-ng: notice: unpack_config: On loss of CCM Quorum: Ignore
Nov 14 09:43:37 [30375] vm8 pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
Nov 14 09:43:37 [30375] vm8 pengine: info: determine_online_status_fencing: Node vm7 is active
Nov 14 09:43:37 [30375] vm8 pengine: info: determine_online_status: Node vm7 is online
Nov 14 09:43:37 [30375] vm8 pengine: info: determine_online_status_fencing: Node vm8 is active
Nov 14 09:43:37 [30375] vm8 pengine: info: determine_online_status: Node vm8 is online
Nov 14 09:43:37 [30375] vm8 pengine: info: determine_op_status: Operation monitor found resource testfs-OST0000_add680-zfs active on vm7
Nov 14 09:43:37 [30375] vm8 pengine: info: unpack_node_loop: Node 1 is already processed
Nov 14 09:43:37 [30375] vm8 pengine: info: unpack_node_loop: Node 2 is already processed
Nov 14 09:43:37 [30375] vm8 pengine: info: unpack_node_loop: Node 1 is already processed
Nov 14 09:43:37 [30375] vm8 pengine: info: unpack_node_loop: Node 2 is already processed
Nov 14 09:43:37 [30375] vm8 pengine: info: common_print: st-fencing (stonith:fence_chroma): Started vm7
Nov 14 09:43:37 [30375] vm8 pengine: info: common_print: testfs-OST0001_44d8b5 (ocf::lustre:Lustre): Stopped (disabled)
Nov 14 09:43:37 [30375] vm8 pengine: info: group_print: Resource Group: group-testfs-OST0000_add680
Nov 14 09:43:37 [30375] vm8 pengine: info: common_print: testfs-OST0000_add680-zfs (ocf::chroma:ZFS): Stopped (disabled)
Nov 14 09:43:37 [30375] vm8 pengine: notice: DeleteRsc: Removing testfs-OST0000_add680 from vm7
Nov 14 09:43:37 [30375] vm8 pengine: notice: DeleteRsc: Removing testfs-OST0000_add680 from vm8
Nov 14 09:43:37 [30375] vm8 pengine: info: native_color: Resource testfs-OST0001_44d8b5 cannot run anywhere
Nov 14 09:43:37 [30375] vm8 pengine: info: native_color: Resource testfs-OST0000_add680-zfs cannot run anywhere
Nov 14 09:43:37 [30375] vm8 pengine: info: LogActions: Leave st-fencing (Started vm7)
Nov 14 09:43:37 [30375] vm8 pengine: info: LogActions: Leave testfs-OST0001_44d8b5 (Stopped)
Nov 14 09:43:37 [30375] vm8 pengine: info: LogActions: Leave testfs-OST0000_add680-zfs (Stopped)
Nov 14 09:43:37 [30375] vm8 pengine: notice: process_pe_message: Calculated transition 18, saving inputs in /var/lib/pacemaker/pengine/pe-input-200.bz2
Nov 14 09:43:37 [30376] vm8 crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
Nov 14 09:43:37 [30376] vm8 crmd: info: do_te_invoke: Processing graph 18 (ref=pe_calc-dc-1542188617-41) derived from /var/lib/pacemaker/pengine/pe-input-200.bz2
Nov 14 09:43:37 [30376] vm8 crmd: notice: te_rsc_command: Initiating delete operation testfs-OST0000_add680_delete_0 locally on vm8 | action 3
Nov 14 09:43:37 [30371] vm8 cib: info: cib_process_request: Forwarding cib_delete operation for section //node_state[@uname='vm8']//lrm_resource[@id='testfs-OST0000_add680'] to all (origin=local/crmd/59)
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: Diff: --- 0.29.2 2
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: Diff: +++ 0.29.3 (null)
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: -- /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='testfs-OST0000_add680']
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: + /cib: @num_updates=3
Nov 14 09:43:37 [30371] vm8 cib: info: cib_process_request: Completed cib_delete operation for section //node_state[@uname='vm8']//lrm_resource[@id='testfs-OST0000_add680']: OK (rc=0, origin=vm8/crmd/59, version=0.29.2)
Nov 14 09:43:37 [30376] vm8 crmd: info: delete_resource: Removing resource testfs-OST0000_add680 for tengine (internal) on (null)
Nov 14 09:43:37 [30376] vm8 crmd: info: notify_deleted: Notifying tengine on localhost that testfs-OST0000_add680 was deleted
Nov 14 09:43:37 [30371] vm8 cib: info: cib_process_request: Forwarding cib_delete operation for section //node_state[@uname='vm8']//lrm_resource[@id='testfs-OST0000_add680'] to all (origin=local/crmd/60)
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: Diff: --- 0.29.2 2
Nov 14 09:43:37 [30371] vm8 cib: info: cib_perform_op: Diff: +++ 0.29.3 (null)
Resource creation and constraint adding are serialized, so this is not a race between nodes:
Describe the bug
Disabling the insecure-request warning is not working. The statement "urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)" in /usr/lib/python2.7/site-packages/chroma_agent/agent_daemon.py has no effect, and the following warnings are still output on all Lustre servers repeatedly every few seconds.
Warning:
oss002: May 31 14:43:30 oss002 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss002: May 31 14:43:30 oss002 chroma-agent-daemon: InsecureRequestWarning)
Screenshots
pdsh> grep InsecureRequestWarning /var/log/messages | tail -2
mds002: May 31 14:43:26 mds002 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
mds002: May 31 14:43:26 mds002 chroma-agent-daemon: InsecureRequestWarning)
mds001: May 31 14:43:28 mds001 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
mds001: May 31 14:43:28 mds001 chroma-agent-daemon: InsecureRequestWarning)
oss007: May 31 14:43:30 oss007 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss007: May 31 14:43:30 oss007 chroma-agent-daemon: InsecureRequestWarning)
oss006: May 31 14:43:27 oss006 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss006: May 31 14:43:27 oss006 chroma-agent-daemon: InsecureRequestWarning)
oss011: May 31 14:43:30 oss011 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss011: May 31 14:43:30 oss011 chroma-agent-daemon: InsecureRequestWarning)
oss005: May 31 14:43:29 oss005 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss005: May 31 14:43:29 oss005 chroma-agent-daemon: InsecureRequestWarning)
oss003: May 31 14:43:31 oss003 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss003: May 31 14:43:31 oss003 chroma-agent-daemon: InsecureRequestWarning)
oss001: May 31 14:43:25 oss001 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss001: May 31 14:43:25 oss001 chroma-agent-daemon: InsecureRequestWarning)
oss008: May 31 14:43:29 oss008 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss008: May 31 14:43:29 oss008 chroma-agent-daemon: InsecureRequestWarning)
oss012: May 31 14:43:30 oss012 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss012: May 31 14:43:30 oss012 chroma-agent-daemon: InsecureRequestWarning)
oss004: May 31 14:43:30 oss004 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss004: May 31 14:43:30 oss004 chroma-agent-daemon: InsecureRequestWarning)
oss002: May 31 14:43:30 oss002 chroma-agent-daemon: /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
oss002: May 31 14:43:30 oss002 chroma-agent-daemon: InsecureRequestWarning)
What version of IML? v5.0
What Operating System? CentOS
What Operating System version? v 7.5
Please let me know if you need additional information. Thank you!!
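One likely cause, offered as a guess: on EL7 the requests package vendors its own copy of urllib3 (note the requests/packages/urllib3 path in the warning), so disabling warnings on the standalone urllib3 module never reaches the copy that actually emits them. A message-based filter via the stdlib warnings module would catch both copies:

```python
import warnings


def silence_insecure_request_warnings():
    """Silence InsecureRequestWarning regardless of which urllib3 emits it.

    Sketch under the assumption above: filtering by message text sidesteps
    the standalone-vs-vendored urllib3 class mismatch that would make
    urllib3.disable_warnings() ineffective against
    requests.packages.urllib3.connectionpool.
    """
    warnings.filterwarnings(
        "ignore", message="Unverified HTTPS request is being made"
    )
```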
The ldiskfs MMP timeout may be larger than pacemaker's default 300 s operation timeout.
The pacemaker timeout should probably be increased.
The Configure NTP plugin is now being implemented in the rust-iml-agent and will be landing in the IML repo: whamcloud/integrated-manager-for-lustre#1200. Once this lands, there is no reason to have the configure_ntp implementation in iml-agent.
In
iml-agent/python-iml-agent.spec
Line 145 in 0840c91
We call preset on chroma-agent.service.
We also need to call it on iml-storage-server.target.