suse-enceladus / cloud-regionsrv-client
Access the region service to retrieve region local update server information and register the instance
License: GNU Lesser General Public License v3.0
The region service client obtains cloud specific update server information from the region service and then uses this information to register the guest instance with the region local update server. The server hosts for the region service are configured in the /etc/regionserverclnt.cfg file.
Instance data is sent to the repository server with every zypper access via the URL resolver. Every one of these requests hits the metadata server. We should support a caching mechanism that caches the metadata for a period of time and reuses it for subsequent zypper requests.
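A minimal sketch of such a time-based cache, assuming a hypothetical cache path and refresh interval (neither exists in the current code), with the real metadata lookup passed in as a callable:

```python
import json
import os
import time

# Both values are assumptions for illustration, not existing settings.
METADATA_CACHE = '/var/cache/cloudregister/metadata.json'
CACHE_TTL = 600  # seconds

def get_instance_metadata(fetch_metadata, cache_path=METADATA_CACHE,
                          ttl=CACHE_TTL):
    """Reuse metadata written within the last `ttl` seconds instead of
    contacting the metadata server on every zypper access.
    `fetch_metadata` stands in for the real metadata-server lookup."""
    if os.path.exists(cache_path):
        if time.time() - os.path.getmtime(cache_path) < ttl:
            with open(cache_path) as cached:
                return json.load(cached)
    data = fetch_metadata()
    os.makedirs(os.path.dirname(cache_path), exist_ok=True)
    with open(cache_path, 'w') as cache_file:
        json.dump(data, cache_file)
    return data
```

With this shape, only the first zypper access within a TTL window pays the cost of a metadata-server round trip.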
Because of how the cloudregister.registerutils.set_proxy() function is written, if the file /etc/sysconfig/proxy does not exist, the function returns before even looking at the environment variables http_proxy and https_proxy.
See https://github.com/SUSE-Enceladus/cloud-regionsrv-client/blob/master/lib/cloudregister/registerutils.py#L845 for the relevant code.
I believe this is unintended behaviour, and it is certainly unintuitive for users.
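A sketch of the suggested behaviour (the file-parsing step is elided; the control flow, not the parsing, is the point): fall back to the environment variables when /etc/sysconfig/proxy is missing instead of returning early.

```python
import os

def set_proxy():
    """Sketch: consult /etc/sysconfig/proxy when present, but fall back
    to the http_proxy / https_proxy environment variables when the file
    does not exist, instead of returning early."""
    proxy_config = '/etc/sysconfig/proxy'
    http_proxy = https_proxy = None
    if os.path.exists(proxy_config):
        # Parse HTTP_PROXY / HTTPS_PROXY from the file here
        # (parsing elided in this sketch).
        pass
    if not http_proxy:
        http_proxy = os.environ.get('http_proxy')
    if not https_proxy:
        https_proxy = os.environ.get('https_proxy')
    if not (http_proxy or https_proxy):
        return False
    os.environ['http_proxy'] = http_proxy or ''
    os.environ['https_proxy'] = https_proxy or ''
    return True
```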
Installing with the top-level Makefile results in the following error:
# make install
error: Too many levels of recursion in macro expansion. It is likely caused by recursive macro declaration.
error: line 124:
test -n "$FIRST_ARG" || FIRST_ARG=$1
if test "$FIRST_ARG" = "0" ; then
test -f /etc/sysconfig/services && . /etc/sysconfig/services
if test "$YAST_IS_RUNNING" != "instsys" -a "$DISABLE_STOP_ON_REMOVAL" != yes ; then
for service in
error: query of specfile cloud-regionsrv-client.spec failed, can't parse
Makefile:10: *** "Version mismatch, will not take any action". Stop.
System Info
# cat /etc/os-release
NAME="SLES"
VERSION="12-SP5"
VERSION_ID="12.5"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP5"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:12:sp5"
A minimal reproduction appears to be:
$ rpm --eval '%service_del_preun'
error: Too many levels of recursion in macro expansion. It is likely caused by recursive macro declaration.
test -n "$FIRST_ARG" || FIRST_ARG=$1
if test "$FIRST_ARG" = "0" ; then
test -f /etc/sysconfig/services && . /etc/sysconfig/services
if test "$YAST_IS_RUNNING" != "instsys" -a "$DISABLE_STOP_ON_REMOVAL" != yes ; then
for service in
$ rpm --version
RPM version 4.11.2
Running containerbuild-regionsrv (via the systemd service or by hand) results in the following stack trace:
$ /usr/sbin/containerbuild-regionsrv
Traceback (most recent call last):
File "/usr/sbin/containerbuild-regionsrv", line 109, in <module>
main()
File "/usr/sbin/containerbuild-regionsrv", line 104, in main
with socketserver.TCPServer((ip, port), ContainerBuildTCPServer) as server:
AttributeError: __exit__
System info:
$ cat /etc/os-release
NAME="SLES"
VERSION="12-SP5"
VERSION_ID="12.5"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP5"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:12:sp5"
$ python3 --version
Python 3.4.10
We recently upgraded our SUSE SLES 12SP4 azure VMs. Afterwards all additional upgrades failed with the message:
Problem retrieving files from 'SLE-Module-Adv-Systems-Management12-Updates'.
Not all credentials files are eqivalent
Problem retrieving files from 'SLE-Module-Containers12-Updates'.
Not all credentials files are eqivalent
After tracing the issue with strace, we discovered that the susecloud zypper plugin, part of cloud-regionsrv-client, compares all files in /etc/zypp/credentials.d/ against a base file as a precondition. While this normally works, our VMs also have other private repos with credentials files in the same default zypper credentials location. Since these private credentials files differ from the base SUSE ones, all calls to resolve the SUSE repo URLs (e.g. plugin:/susecloud?) via the urlresolver Python code fail and none of our repos can refresh anymore.
To illustrate this I have suggested a PR fix for the registerutils module. I'm hoping you will consider it as a mitigation for multiple private repos with different credentials using zypper.
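One possible mitigation, sketched here as a hypothetical helper (the function name and the `SCC` prefix are assumptions, not existing code): restrict the equivalence check to credentials files that follow the SUSE update-infrastructure naming scheme, ignoring private-repo credentials in the same directory.

```python
import os

def get_suse_credentials_files(cred_dir='/etc/zypp/credentials.d',
                               prefix='SCC'):
    """Hypothetical sketch: only return credentials files that match
    the SUSE naming scheme, so private-repo credentials stored in the
    same directory are left out of the equivalence comparison."""
    return sorted(
        name for name in os.listdir(cred_dir)
        if name.startswith(prefix)
    )
```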
An excerpt from /var/log/zypper.log on a distro migration helper system:
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! Traceback (most recent call last):
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! File "/usr/lib/zypp/plugins/urlresolver/susecloud", line 64, in <module>
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! plugin.main()
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! File "/usr/lib/python3.6/site-packages/zypp_plugin.py", line 143, in main
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! self.__collect_frame()
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! File "/usr/lib/python3.6/site-packages/zypp_plugin.py", line 115, in __collect_frame
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! method(frame.headers, frame.body)
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! File "/usr/lib/zypp/plugins/urlresolver/susecloud", line 34, in RESOLVEURL
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! update_server = utils.get_smt()
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! File "/usr/lib/python3.6/site-packages/cloudregister/registerutils.py", line 579, in get_smt
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! replace_hosts_entry(current_smt, server)
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! File "/usr/lib/python3.6/site-packages/cloudregister/registerutils.py", line 907, in replace_hosts_entry
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! current_smt_ipv4 = current_smt.get_ipv4()
2019-11-15 11:16:45 <2> localhost(1346) [PLUGIN] PluginScript.cc(~PluginDumpStderr):75 ! AttributeError: 'NoneType' object has no attribute 'get_ipv4'
2019-11-15 11:16:45 <1> localhost(1346) [zypp::plugin++] PluginScript.cc(close):229 Close:PluginScript[1667] /usr/lib/zypp/plugins/urlresolver/susecloud
There's indeed no currentSMTInfo.obj:
localhost:~ # ls -la /var/lib/cloudregister/
total 16
drwxr-xr-x 2 root root 4096 Nov 15 11:16 .
drwxr-xr-x 1 root root 120 Nov 14 22:04 ..
-rw------- 1 root root 219 Nov 15 11:09 availableSMTInfo_1.obj
-rw------- 1 root root 220 Nov 15 11:09 availableSMTInfo_2.obj
-rw------- 1 root root 220 Nov 15 11:09 availableSMTInfo_3.obj
Since the addition of IPv6 support back in April, the "regionsrv" value in /etc/regionserverclnt.cfg seems to accept only IPv4 or IPv6 addresses, preventing region servers from being listed by host name.
Is this expected?
If so, do the SSL/TLS certificates then have to be updated to also reference the IP address(es)?
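If both forms were to be accepted, distinguishing them is straightforward with the standard library; a sketch (the function is illustrative, not existing code):

```python
import ipaddress

def is_ip_address(value):
    """Sketch: classify a 'regionsrv' entry as a literal IPv4/IPv6
    address or a host name, so either could be accepted in the
    configuration."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False
```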
After the server cache is refreshed (#14), the client needs to check whether the current credentials are valid on the new target server. If they are not, the system needs to be re-registered.
cloud-regionsrv-client tries to create the file /usr/lib/zypp/plugins/urlresolver/susecloud without explicitly conflicting with container-suseconnect or obsoleting it. This causes failures in the SLE BCI images if you try to install the public cloud patterns:
500edf9c12f6:/ # zypper -n in -t pattern Amazon_Web_Services_Instance_Init Amazon_Web_Services_Instance_Tools Amazon_Web_Services_Tools
# snip
File /usr/lib/zypp/plugins/urlresolver/susecloud
  from install of cloud-regionsrv-client-10.0.4-150000.6.73.1.noarch (SLE_BCI)
  conflicts with file from package container-suseconnect-2.3.0-4.17.1.x86_64 (@System)
Please rename this file or obsolete container-suseconnect.
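Assuming the two packages genuinely cannot coexist, one way to make the relationship explicit is in the package's spec file; `Conflicts:` and `Obsoletes:` are standard RPM tags, and which one fits is a packaging decision:

```
# in cloud-regionsrv-client.spec (sketch)
Conflicts:      container-suseconnect
```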
If an instance is registered but its image moves to a new region, the client should trigger re-registration in order to connect to region-local updates.
For dnf support we need to provide a plugin (https://dnf.readthedocs.io/en/latest/api_plugins.html) that does client-side failover. Then we do not have to change the architecture of the update infrastructure.
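The failover logic itself is independent of the dnf plugin API; a minimal sketch of the idea (names and the `fetch` callable are assumptions for illustration):

```python
def resolve_with_failover(update_servers, fetch):
    """Sketch of client-side failover: try each region-local update
    server in order and return the first successful response. `fetch`
    is assumed to raise OSError for an unreachable server."""
    last_error = None
    for server in update_servers:
        try:
            return fetch(server)
        except OSError as error:
            last_error = error
    raise last_error
```

A real dnf plugin would run this kind of selection in one of its hooks before repository metadata is downloaded.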
In some SUSE distributions, guestregister and ca-certificates.service have a race problem; see https://www.suse.com/support/kb/doc/?id=000021252.
I noticed that in the new version there is a commit at 3007f85 that aims to alleviate this race. However, I have a question regarding the import_smtcert_12 function in the file at
. When smt.write_cert(key_chain) is executed, it triggers the monitoring of ca-certificates.path, which subsequently invokes ca-certificates.service to run /usr/sbin/update-ca-certificates to update the certificates. But then, in the update_ca_chain function, this update operation seems to be called again, causing the issue. I am puzzled as to why this update needs to be invoked twice.
Tools like Azure Image Builder run an instance in order to allow customers to simply customize. We need a way for customers or tooling to simply clean up when deprovisioning. This will allow such customized images to properly register for region-local updates.
When a region server is set up to use certificates signed by a commonly accepted certificate authority (CA), the client should not insist on having the cert located in a specific location. Rather, the client should default to using the system certificate infrastructure for validation.
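In Python this is what the standard library already does by default; a sketch of relying on the system trust store instead of a client-shipped certificate file:

```python
import ssl

# create_default_context() loads the platform's trusted CAs (e.g. the
# ca-certificates bundle on SLES) and enables hostname checking, so a
# server certificate signed by a commonly trusted CA validates without
# any client-specific certificate path.
context = ssl.create_default_context()
```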
Observed on SLES 15 with v8.1.4 of the package installed, on an instance launched from ami-06ea7729e394412c8:
registercloudguest --force-new
Could not parse configuration file /usr/lib/zypp/plugins/services/SMT-http_smt-ec2_susecloud_net
File contains no section headers.
file: '/usr/lib/zypp/plugins/services/SMT-http_smt-ec2_susecloud_net', line: 16
'"""This script provides the repositories for on-demand instances."""\n'
Filing this so that I can investigate later. The instance was registered with the --smt-ip flag; after reboot it seems to have cleaned up the registration server cache and all zypper repos/services:
2019-11-01 14:14:42,492 INFO:Using API: regionInfo
2019-11-01 14:14:42,618 INFO:Region server arguments: ?regionHint=westus2
2019-11-01 14:14:42,618 INFO:Using region server: 104.45.31.195
2019-11-01 14:14:43,327 INFO:Have extra cached SMT data, clearing cache
2019-11-01 14:14:43,327 INFO:Extra cached server is current registration target, cleaning up registration
2019-11-01 14:14:43,327 INFO:Clean current registration server: ('34.218.245.132', None)
After this, running registercloudguest died with a stack trace:
# registercloudguest
Traceback (most recent call last):
File "/usr/sbin/registercloudguest", line 416, in <module>
utils.clear_new_registration_flag()
File "/usr/lib/python3.6/site-packages/cloudregister/registerutils.py", line 132, in clear_new_registration_flag
os.unlink(REGISTRATION_DATA_DIR + NEW_REGISTRATION_MARKER)
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/cloudregister/newregistration'
The instance had v9.0.4 of cloud-regionsrv-client; this might not be an issue in the most recent version.
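If the marker file can legitimately already be gone by the time the flag is cleared, one mitigation sketch is a tolerant unlink (the function mirrors the name from the traceback, but this body is an illustration, not the project's code):

```python
import os

def clear_new_registration_flag(
        path='/var/lib/cloudregister/newregistration'):
    """Sketch: remove the new-registration marker, tolerating the case
    where it has already been removed, which avoids the
    FileNotFoundError seen in the traceback above."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
```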
A Longhorn end-to-end test pipeline was previously running using AWS ami-0dd7f664cbd69840b (suse-sles-15-sp4-v20230428-hvm-ssd-x86_64). Terraform was (and is) used to create the VMs and then configure them. When we migrated to AWS ami-04ce03e19570c1618 (suse-sles-15-sp5-v20230719-hvm-ssd-x86_64), we started to experience an issue where occasionally a node would not have any zypper repositories configured by the time our scripts started attempting to install packages. We saw this issue pretty often, though not always. While digging in, we found that it was caused by a failure in the registration scripts.
The old (always working) AMI used v10.1.0 of this repo. The new (sometimes failing) AMI uses v10.1.2. It's not clear whether some change in this repo or a larger change between SP4 and SP5 leads to the occasional failure.
Please see longhorn/longhorn#6504 (comment) for logs and command output pulled from two nodes launched by the same Terraform operation. One of them did not fail to register and one did. In that comment, I suspected a temporary network problem (and maybe still do), but the exact mode of failure doesn't make total sense to me. The "bad" node appears to be able to contact and retrieve information from a "region server", but fails to communicate with any SMT. So it seems like the node's network is up and able to communicate externally, but registration still fails.
The workaround for us appears to be executing sudo systemctl restart guestregister when our user scripts start running. So far, this seems to reliably get us registered so we can start installing packages.