suse-enceladus / cloud-regionsrv-client
Access the region service to retrieve region local update server information and register the instance
License: GNU Lesser General Public License v3.0
The region service client obtains cloud specific update server information from the region service and then uses this information to register the guest instance with the region local update server. The server hosts for the region service are configured in the /etc/regionserverclnt.cfg file.
Instance data is sent to the repository server with every zypper access via the URL resolver. Every one of these requests hits the metadata server. We should support a caching mechanism that caches the metadata for a period of time and reuses it for subsequent zypper requests.
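A minimal sketch of such a time-based cache, assuming a hypothetical cache path and refresh interval (neither exists in the current code), with the real metadata lookup passed in as a callable:

```python
import json
import os
import time

# Both values are assumptions for illustration, not existing settings.
METADATA_CACHE = '/var/cache/cloudregister/metadata.json'
CACHE_TTL = 600  # seconds

def get_instance_metadata(fetch_metadata, cache_path=METADATA_CACHE,
                          ttl=CACHE_TTL):
    """Reuse metadata written within the last `ttl` seconds instead of
    contacting the metadata server on every zypper access.
    `fetch_metadata` stands in for the real metadata-server lookup."""
    if os.path.exists(cache_path):
        if time.time() - os.path.getmtime(cache_path) < ttl:
            with open(cache_path) as cached:
                return json.load(cached)
    data = fetch_metadata()
    os.makedirs(os.path.dirname(cache_path), exist_ok=True)
    with open(cache_path, 'w') as cache_file:
        json.dump(data, cache_file)
    return data
```

With this shape, only the first zypper access within a TTL window pays the cost of a metadata-server round trip.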
Because of how the cloudregister.registerutils.set_proxy() function is written, if the file /etc/sysconfig/proxy does not exist, the function returns before even looking at the environment variables http_proxy and https_proxy.
See https://github.com/SUSE-Enceladus/cloud-regionsrv-client/blob/master/lib/cloudregister/registerutils.py#L845 for the relevant code.
I believe this is unintended behaviour, and it is certainly unintuitive for users.
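A sketch of the suggested behaviour (the file-parsing step is elided; the control flow, not the parsing, is the point): fall back to the environment variables when /etc/sysconfig/proxy is missing instead of returning early.

```python
import os

def set_proxy():
    """Sketch: consult /etc/sysconfig/proxy when present, but fall back
    to the http_proxy / https_proxy environment variables when the file
    does not exist, instead of returning early."""
    proxy_config = '/etc/sysconfig/proxy'
    http_proxy = https_proxy = None
    if os.path.exists(proxy_config):
        # Parse HTTP_PROXY / HTTPS_PROXY from the file here
        # (parsing elided in this sketch).
        pass
    if not http_proxy:
        http_proxy = os.environ.get('http_proxy')
    if not https_proxy:
        https_proxy = os.environ.get('https_proxy')
    if not (http_proxy or https_proxy):
        return False
    os.environ['http_proxy'] = http_proxy or ''
    os.environ['https_proxy'] = https_proxy or ''
    return True
```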
Installing with the top-level Makefile results in the following error:
# make install
error: Too many levels of recursion in macro expansion. It is likely caused by recursive macro declaration.
error: line 124:
test -n "$FIRST_ARG" || FIRST_ARG=$1
if test "$FIRST_ARG" = "0" ; then
test -f /etc/sysconfig/services && . /etc/sysconfig/services
if test "$YAST_IS_RUNNING" != "instsys" -a "$DISABLE_STOP_ON_REMOVAL" != yes ; then
for service in
error: query of specfile cloud-regionsrv-client.spec failed, can't parse
Makefile:10: *** "Version mismatch, will not take any action". Stop.
System Info
# cat /etc/os-release
NAME="SLES"
VERSION="12-SP5"
VERSION_ID="12.5"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP5"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:12:sp5"
A minimal reproduction appears to be:
$ rpm --eval '%service_del_preun'
error: Too many levels of recursion in macro expansion. It is likely caused by recursive macro declaration.
test -n "$FIRST_ARG" || FIRST_ARG=$1
if test "$FIRST_ARG" = "0" ; then
test -f /etc/sysconfig/services && . /etc/sysconfig/services
if test "$YAST_IS_RUNNING" != "instsys" -a "$DISABLE_STOP_ON_REMOVAL" != yes ; then
for service in
$ rpm --version
RPM version 4.11.2
Running containerbuild-regionsrv (via the systemd service or by hand) results in the following stack trace:
$ /usr/sbin/containerbuild-regionsrv
Traceback (most recent call last):
File "/usr/sbin/containerbuild-regionsrv", line 109, in <module>
main()
File "/usr/sbin/containerbuild-regionsrv", line 104, in main
with socketserver.TCPServer((ip, port), ContainerBuildTCPServer) as server:
AttributeError: __exit__
System info:
$ cat /etc/os-release
NAME="SLES"
VERSION="12-SP5"
VERSION_ID="12.5"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP5"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:12:sp5"
$ python3 --version
Python 3.4.10
We recently upgraded our SUSE SLES 12SP4 azure VMs. Afterwards all additional upgrades failed with the message:
Problem retrieving files from 'SLE-Module-Adv-Systems-Management12-Updates'.
Not all credentials files are eqivalent
Problem retrieving files from 'SLE-Module-Containers12-Updates'.
Not all credentials files are eqivalent
After tracing the issue with strace, we discovered that the susecloud zypper plugin, part of cloud-regionsrv-client, compares all files in /etc/zypp/credentials.d/ against a base file as a precondition. While this normally works, our VMs also have other private repos with credentials files in the same default zypper credentials location. Since these private credentials files differ from the base SUSE ones, all calls to resolve the SUSE repo URLs (e.g. plugin:/susecloud?) via the urlresolver Python code fail and none of our repos can refresh anymore.
To illustrate this I have suggested a PR fix for the registerutils module. I'm hoping you will consider it as a mitigation for multiple private repos with different credentials using zypper.
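One possible mitigation, sketched here as a hypothetical helper (the function name and the `SCC` prefix are assumptions, not existing code): restrict the equivalence check to credentials files that follow the SUSE update-infrastructure naming scheme, ignoring private-repo credentials in the same directory.

```python
import os

def get_suse_credentials_files(cred_dir='/etc/zypp/credentials.d',
                               prefix='SCC'):
    """Hypothetical sketch: only return credentials files that match
    the SUSE naming scheme, so private-repo credentials stored in the
    same directory are left out of the equivalence comparison."""
    return sorted(
        name for name in os.listdir(cred_dir)
        if name.startswith(prefix)
    )
```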
An excerpt from /var/log/zypper.log on a distro migration helper system:
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! Traceback (most recent call last):
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! File "/usr/lib/zypp/plugins/urlresolver/susecloud", line 64, in <module>
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! plugin.main()
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! File "/usr/lib/python3.6/site-packages/zypp_plugin.py", line 143, in main
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! self.__collect_frame()
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! File "/usr/lib/python3.6/site-packages/zypp_plugin.py", line 115, in __collect_frame
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! method(frame.headers, frame.body)
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! File "/usr/lib/zypp/plugins/urlresolver/susecloud", line 34, in RESOLVEURL
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! update_server = utils.get_smt()
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! File "/usr/lib/python3.6/site-packages/cloudregister/registerutils.py", line 579, in get_smt
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! replace_hosts_entry(current_smt, server)
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! File "/usr/lib/python3.6/site-packages/cloudregister/registerutils.py", line 907, in replace_hosts_entry
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! current_smt_ipv4 = current_smt.get_ipv4()
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! AttributeError: 'NoneType' object has no attribute 'get_ipv4'
2019-11-15 11:16:45 <1> localhost(1346) [zypp::plugin++] PluginScript.cc(close):229 Close:PluginScript[1667] /usr/lib/zypp/plugins/urlresolver/susecloud
There's indeed no currentSMTInfo.obj:
localhost:~ # ls -la /var/lib/cloudregister/
total 16
drwxr-xr-x 2 root root 4096 Nov 15 11:16 .
drwxr-xr-x 1 root root 120 Nov 14 22:04 ..
-rw------- 1 root root 219 Nov 15 11:09 availableSMTInfo_1.obj
-rw------- 1 root root 220 Nov 15 11:09 availableSMTInfo_2.obj
-rw------- 1 root root 220 Nov 15 11:09 availableSMTInfo_3.obj
Since the addition of IPv6 support back in April, the "regionsrv" value in /etc/regionserverclnt.cfg seems to accept only IPv4 or IPv6 addresses, preventing region servers from being listed by host name.
Is this expected?
If so, do the SSL/TLS certificates then have to be updated to also reference the IP address(es)?
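If both forms were to be accepted, distinguishing them is straightforward with the standard library; a sketch (the function is illustrative, not existing code):

```python
import ipaddress

def is_ip_address(value):
    """Sketch: classify a 'regionsrv' entry as a literal IPv4/IPv6
    address or a host name, so either could be accepted in the
    configuration."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False
```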
After the server cache is refreshed (#14), the client needs to check whether the current credentials are valid on the new target server. If they are not, the system needs to be re-registered.
cloud-regionsrv-client tries to create the file /usr/lib/zypp/plugins/urlresolver/susecloud without explicitly conflicting with container-suseconnect or obsoleting it. This causes failures in the SLE BCI images if you try to install the public cloud patterns:
500edf9c12f6:/ # zypper -n in -t pattern Amazon_Web_Services_Instance_Init Amazon_Web_Services_Instance_Tools Amazon_Web_Services_Tools
# snip
File /usr/lib/zypp/plugins/urlresolver/susecloud
  from install of cloud-regionsrv-client-10.0.4-150000.6.73.1.noarch (SLE_BCI)
  conflicts with file from package container-suseconnect-2.3.0-4.17.1.x86_64 (@System)
Please rename this file or obsolete container-suseconnect.
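Assuming the two packages genuinely cannot coexist, one way to make the relationship explicit is in the package's spec file; `Conflicts:` and `Obsoletes:` are standard RPM tags, and which one fits is a packaging decision:

```
# in cloud-regionsrv-client.spec (sketch)
Conflicts:      container-suseconnect
```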
If an instance is registered but its image moves to a new region, the client should trigger re-registration in order to connect to region-local updates.
For dnf support we need to provide a plugin (https://dnf.readthedocs.io/en/latest/api_plugins.html) that does client-side failover. Then we do not have to change the architecture of the update infrastructure.
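The failover logic itself is independent of the dnf plugin API; a minimal sketch of the idea (names and the `fetch` callable are assumptions for illustration):

```python
def resolve_with_failover(update_servers, fetch):
    """Sketch of client-side failover: try each region-local update
    server in order and return the first successful response. `fetch`
    is assumed to raise OSError for an unreachable server."""
    last_error = None
    for server in update_servers:
        try:
            return fetch(server)
        except OSError as error:
            last_error = error
    raise last_error
```

A real dnf plugin would run this kind of selection in one of its hooks before repository metadata is downloaded.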
In some SUSE distributions, guestregister and ca-certificates.service have a race problem; see https://www.suse.com/support/kb/doc/?id=000021252.
I noticed that in the new version there is a commit at 3007f85 that aims to alleviate this race. However, I have a question regarding the import_smtcert_12 function in the file at
. When smt.write_cert(key_chain) is executed, it triggers the monitoring of ca-certificates.path, which subsequently invokes ca-certificates.service to run /usr/sbin/update-ca-certificates to update the certificates. But then, in the update_ca_chain function, this update operation seems to be called again, causing the issue. I am puzzled as to why this update needs to be invoked twice.
Tools like Azure Image Builder run an instance in order to allow customers to simply customize. We need a way for customers or tooling to simply clean up when deprovisioning. This will allow such customized images to properly register for region-local updates.
When a region server is set up to use certificates signed by a commonly accepted certificate authority (CA), the client should not insist on having the cert located in a specific location. Rather, the client should default to using the system certificate infrastructure for validation.
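In Python this is what the standard library already does by default; a sketch of relying on the system trust store instead of a client-shipped certificate file:

```python
import ssl

# create_default_context() loads the platform's trusted CAs (e.g. the
# ca-certificates bundle on SLES) and enables hostname checking, so a
# server certificate signed by a commonly trusted CA validates without
# any client-specific certificate path.
context = ssl.create_default_context()
```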
Observed on SLES 15 with v8.1.4 of the package installed, on an instance launched from ami-06ea7729e394412c8:
registercloudguest --force-new
Could not parse configuration file /usr/lib/zypp/plugins/services/SMT-http_smt-ec2_susecloud_net
File contains no section headers.
file: '/usr/lib/zypp/plugins/services/SMT-http_smt-ec2_susecloud_net', line: 16
'"""This script provides the repositories for on-demand instances."""\n'
Filing this so that I can investigate later. The instance was registered with the --smt-ip flag; after reboot it seems to have cleaned up the registration server cache and all zypper repos/services:
2019-11-01 14:14:42,492 INFO:Using API: regionInfo
2019-11-01 14:14:42,618 INFO:Region server arguments: ?regionHint=westus2
2019-11-01 14:14:42,618 INFO:Using region server: 104.45.31.195
2019-11-01 14:14:43,327 INFO:Have extra cached SMT data, clearing cache
2019-11-01 14:14:43,327 INFO:Extra cached server is current registration target, cleaning up registration
2019-11-01 14:14:43,327 INFO:Clean current registration server: ('34.218.245.132', None)
After this, running registercloudguest died with a stack trace:
# registercloudguest
Traceback (most recent call last):
File "/usr/sbin/registercloudguest", line 416, in <module>
utils.clear_new_registration_flag()
File "/usr/lib/python3.6/site-packages/cloudregister/registerutils.py", line 132, in clear_new_registration_flag
os.unlink(REGISTRATION_DATA_DIR + NEW_REGISTRATION_MARKER)
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/cloudregister/newregistration'
The instance had v9.0.4 of cloud-regionsrv-client; this might not be an issue in the most recent version.
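If the marker file can legitimately already be gone by the time the flag is cleared, one mitigation sketch is a tolerant unlink (the function mirrors the name from the traceback, but this body is an illustration, not the project's code):

```python
import os

def clear_new_registration_flag(
        path='/var/lib/cloudregister/newregistration'):
    """Sketch: remove the new-registration marker, tolerating the case
    where it has already been removed, which avoids the
    FileNotFoundError seen in the traceback above."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
```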
A Longhorn end-to-end test pipeline was previously running using AWS ami-0dd7f664cbd69840b (suse-sles-15-sp4-v20230428-hvm-ssd-x86_64). Terraform was (and is) used to create the VMs and then configure them. When we migrated to AWS ami-04ce03e19570c1618 (suse-sles-15-sp5-v20230719-hvm-ssd-x86_64), we started to experience an issue where occasionally a node would not have any zypper repositories configured by the time our scripts started attempting to install packages. We saw this issue pretty often, though not always. While digging in, we found that it was caused by a failure in the registration scripts.
The old (always working) AMI used v10.1.0 of this repo. The new (sometimes failing) AMI uses v10.1.2. It's not clear whether some change in this repo or a larger change between SP4 and SP5 leads to the occasional failure.
Please see longhorn/longhorn#6504 (comment) for logs and command output pulled from two nodes launched by the same Terraform operation. One of them did not fail to register and one did. In that comment, I suspected a temporary network problem (and maybe still do), but the exact mode of failure doesn't make total sense to me. The "bad" node appears to be able to contact and retrieve information from a "region server", but fails to communicate with any SMT. So it seems like the node's network is up and able to communicate externally, but registration still fails.
The workaround for us appears to be executing sudo systemctl restart guestregister when our user scripts start running. So far, this seems to reliably get us registered so we can start installing packages.