
remoto's Introduction

remoto

A very simplistic remote-command executor that uses connections to hosts (ssh, local, containers, and several others are supported) and Python on the remote end.

All the heavy lifting is done by execnet, while this minimal API provides the bare minimum needed for easy logging and for handling connections to the remote end.

remoto is a bit opinionated, as it was conceived to replace the helpers and remote utilities of ceph-deploy, a tool that runs remote commands to configure and set up the distributed file system Ceph. ceph-medic uses remoto as well to inspect Ceph clusters.

Example Usage

The usage aims to be extremely straightforward, with a very minimal set of helpers and utilities for remote processes and logging output.

The most basic example uses the run helper to execute a command on the remote end. It requires a logging object, which at the very least must provide both error and debug methods. These are called for stderr and stdout respectively.

This is how it would look with a basic logger passed in:

>>> conn = remoto.Connection('hostname')
>>> run(conn, ['ls', '-a'])
INFO:hostname:Running command: ls -a
DEBUG:hostname:.
DEBUG:hostname:..
DEBUG:hostname:.bash_history
DEBUG:hostname:.bash_logout
DEBUG:hostname:.bash_profile
DEBUG:hostname:.bashrc
DEBUG:hostname:.lesshst
DEBUG:hostname:.pki
DEBUG:hostname:.ssh
DEBUG:hostname:.vim
DEBUG:hostname:.viminfo

The run helper will display the stderr and stdout as ERROR and DEBUG respectively.
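The logger requirement is minimal: any object exposing error and debug callables will do, and a stdlib logger satisfies it. A small sketch (the Connection keyword follows the signature used elsewhere in this README):

```python
import logging

# Any object exposing both .error and .debug callables satisfies remoto's
# logger requirement; a plain stdlib logger does.
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger('hostname')

# error() receives stderr lines, debug() receives stdout lines; the logger
# would then be passed as: remoto.Connection('hostname', logger=logger)
```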

remoto also provides helpers for other types of usage, such as checking exit status codes or raising exceptions on failure.

Remote Commands

process.run

Calling remote commands can be done in a few different ways. The simplest one is with process.run:

>>> from remoto.process import run
>>> from remoto import connection
>>> Connection = connection.get('ssh')
>>> conn = Connection('myhost')
>>> run(conn, ['whoami'])
INFO:myhost:Running command: whoami
DEBUG:myhost:root

Note, however, that you are not capturing results or information from the remote end. The intention here is only to run a command and log its output. It is a fire-and-forget call.

process.check

This callable allows the caller to deal with stderr, stdout, and the exit code. It returns them as a three-item tuple:

>>> from remoto.process import check
>>> check(conn, ['ls', '/nonexistent/path'])
([], ['ls: cannot access /nonexistent/path: No such file or directory'], 2)

Note that the stdout and stderr items are returned as lists with the \n characters removed.

This is useful if you need to process the information back locally, as opposed to just firing and forgetting (while logging, like process.run).
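A sketch of consuming that three-item tuple locally; `result` is hard-coded here for illustration, standing in for the `check(conn, ['ls', '/nonexistent/path'])` call shown above:

```python
# The tuple shape is (stdout_lines, stderr_lines, exit_code), with
# newlines already stripped from each line.
result = ([], ['ls: cannot access /nonexistent/path: No such file or directory'], 2)

out, err, code = result
if code != 0:
    # Join the stderr lines into a message suitable for raising or logging.
    message = '\n'.join(err)
```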

Remote Functions

There are two supported ways to execute functions on the remote side. The library that remoto uses to connect (execnet) only supports a few backends natively, and remoto has extended this ability for other backend connections like kubernetes.

The remote function capabilities are provided by LegacyModuleExecute and JsonModuleExecute. By default, both ssh and local connections use the legacy execution class, and everything else uses the json class. The ssh and local connections can still be forced to use the json module execution by setting:

conn.remote_import_system = 'json'

json

The default module execution for docker, kubernetes, podman, and openshift. It does not require any magic in the module to be executed; however, it is worth noting that the library adds the following bit of magic when sending the module to the remote end for execution:

if __name__ == '__main__':
    import json, traceback
    obj = {'return': None, 'exception': None}
    try:
        obj['return'] = function_name(*a)
    except Exception:
        obj['exception'] = traceback.format_exc()
    try:
        print(json.dumps(obj).decode('utf-8'))
    except AttributeError:
        print(json.dumps(obj))

This allows the system to execute function_name (replaced by the real function to be executed with its arguments), grab any results, serialize them with json and send them back for local processing.
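The round trip can be simulated locally. The sketch below (not remoto's code, just the same envelope shape) runs a function, captures its return value or traceback, serializes the envelope, and decodes it the way the local end would:

```python
import json
import traceback

def execute(func, *args):
    # Build the same {'return': ..., 'exception': ...} envelope shown above.
    obj = {'return': None, 'exception': None}
    try:
        obj['return'] = func(*args)
    except Exception:
        obj['exception'] = traceback.format_exc()
    return json.dumps(obj)

payload = execute(sorted, [3, 1, 2])
response = json.loads(payload)  # what the local end does with the output
# response == {'return': [1, 2, 3], 'exception': None}
```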

If you had a function in a module named foo that looks like this:

import os

def listdir(path):
    return os.listdir(path)

To execute that listdir function remotely, you would need to pass the module to the connection object and then call the function:

>>> import foo
>>> conn = Connection('hostname')
>>> remote_foo = conn.import_module(foo)
>>> remote_foo.listdir('.')
['.bash_logout',
 '.profile',
 '.veewee_version',
 '.lesshst',
 'python',
 '.vbox_version',
 'ceph',
 '.cache',
 '.ssh']

Note that functions to be executed remotely cannot accept arbitrary objects as arguments, only plain Python data structures such as tuples, lists, and dictionaries. Ints and strings are also safe to use.
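One way to reason about this restriction: an argument is safe if it survives JSON serialization. The helper below is hypothetical (not part of remoto), just a pre-flight check under that assumption:

```python
import json

def serializable(*args):
    # Plain data structures (lists, dicts, tuples, ints, strings) can be
    # JSON-encoded; arbitrary objects such as functions cannot.
    try:
        json.dumps(args)
        return True
    except TypeError:
        return False

serializable([1, 2], {'key': 'value'}, 'name', 10)  # plain data: fine
serializable(open)  # a function object: cannot be sent
```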

legacy

When using the legacy execution model (the default for local and ssh connections), modules are required to append the following to the end of the module:

if __name__ == '__channelexec__':
    for item in channel:
        channel.send(eval(item))

This piece of code is fully compatible with the json execution model and will not cause conflicts.
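What that loop actually does can be shown locally: remoto sends the call over the channel as a string like `"listdir('.')"`, and the remote end eval()s it against the module's namespace. A local simulation of one loop iteration:

```python
import os

def listdir(path):
    return os.listdir(path)

item = "listdir('.')"   # what would arrive over the channel
result = eval(item)     # what channel.send(eval(item)) evaluates
```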

Automatic detection for ssh connections

remoto automatically detects whether a remote (SSH) connection is needed by comparing the hostname of the current host against the host it is connecting to.

If the local hostname matches the remote hostname, a local connection (via Popen) will be opened and used instead of ssh, avoiding the problem of having to ssh into the same host.
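A simplified sketch of that detection (not remoto's exact implementation): compare the target name against the local hostname and fall back to Popen when they match.

```python
import socket

def needs_ssh(hostname):
    # If the target is this host (or localhost), a local Popen connection
    # suffices and SSH can be skipped entirely.
    local = socket.gethostname()
    return hostname not in ('localhost', local, local.split('.')[0])

needs_ssh(socket.gethostname())  # the local host never needs SSH
```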

Automatic detection for using sudo

This magical detection can be enabled by using the detect_sudo flag in the Connection class. It is disabled by default.

When enabled, it will prefix commands with sudo as needed. This is useful for libraries that need superuser permissions and want to avoid passing sudo everywhere, which can be non-trivial when dealing with root users connecting via SSH.
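The effect on commands can be sketched with a hypothetical helper (not remoto's API): once sudo is detected as needed, every command list gets the prefix.

```python
def maybe_sudo(command, needs_sudo):
    # Prefix the command list with sudo only when detection said we need it.
    return ['sudo'] + command if needs_sudo else command

maybe_sudo(['ceph-disk', 'list'], True)   # ['sudo', 'ceph-disk', 'list']
maybe_sudo(['ceph-disk', 'list'], False)  # ['ceph-disk', 'list']
```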

remoto's People

Contributors

abduhbm, alfredodeza, andrewschoen, cfsnyder, chombourger, chrisrd, hrnciar, ktdreyer, laoqi, matthewoliver, oprypin, psukys, scronkfinkle, shaunduncan, zmc


remoto's Issues

import_module doesn't return anything

Hi,
Firstly, thanks for your work on this library.

The method import_module should either return self.remote_module so it can be referenced by remote_foo, or the documentation needs to be modified to access the attributes of the loaded module via conn.remote_module.listdir(), as per your example, without the need for remote_foo.

local connection is broken

In [5]: from remoto.process import check

In [6]: check(local, ['ls'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-0cba15a27c5a> in <module>
----> 1 check(local, ['ls'])

~/.virtualenvs/e2e/lib/python3.8/site-packages/remoto/process.py in check(conn, command, exit, timeout, **kw)
    173     responsibility to do so.
    174     """
--> 175     command = conn.cmd(command)
    176
    177     stop_on_error = kw.pop('stop_on_error', True)

TypeError: cmd() missing 1 required positional argument: 'cmd'

`process.run` may exit before sending all `stdout`/`stderr` back

This is because remoto is using select, and process.poll() may indicate that the child
process has completed before all of its output has been flushed and handled by logging.

Running ceph-deploy with a few debug statements in process.run() clearly identifies the issue:

ceph-deploy disk list node2
[ceph_deploy.conf][DEBUG ] found configuration file at: /Users/alfredo/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.23): /Users/alfredo/.virtualenvs/ceph-deploy/bin/ceph-deploy disk list node2
[node2][DEBUG ] connection detected need for sudo
[node2][DEBUG ] connected to host: node2
[node2][DEBUG ] detect platform information from remote host
[node2][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: CentOS 6.4 Final
[ceph_deploy.osd][DEBUG ] Listing disks on node2...
[node2][DEBUG ] find the location of an executable
[node2][INFO  ] Running command: sudo /usr/sbin/ceph-disk list
[node2][WARNIN] WARNING:ceph-disk:Old blkid does not support ID_PART_ENTRY_* fields, trying sgdisk; may not correctly identify ceph volumes with dmcrypt
[node2][DEBUG ] /dev/sda :
[node2][DEBUG ] process.poll() is not None
[node2][DEBUG ] it is actually: 0
[node2][DEBUG ] reads state is: [5]
[node2][DEBUG ] ensure we do not have anything pending
[node2][DEBUG ]  /dev/sda1 other, ext4, mounted on /
[node2][DEBUG ]  /dev/sda2 other, swap
[node2][DEBUG ] /dev/sdb :
[node2][DEBUG ]  /dev/sdb1 ceph data, prepared, unknown cluster 84bd78fc-e2f9-4b4b-a075-0f9157ded94d, osd.0, journal /dev/sdb2
[node2][DEBUG ]  /dev/sdb2 ceph journal, for /dev/sdb1

After getting 0 from process.poll() we still have more output to deal with.

local variable referenced before assignment

args = (), arguments = '', tb_line = ''

    def wrapper(*args):
        arguments = self._convert_args(args)
        if docstring:
            self.logger.debug(docstring)
        self.channel.send("%s(%s)" % (name, arguments))
        try:
            return self.channel.receive()
        except Exception as error:
            # Error will come as a string of a traceback, remove everything
            # up to the actual exception since we do get garbage otherwise
            # that points to non-existent lines in the compiled code
            for tb_line in reversed(str(error).split('\n')):
                if tb_line:
                    exc_line = tb_line
                    break
>           raise RuntimeError(exc_line)
E           UnboundLocalError: local variable 'exc_line' referenced before assignment

.venv/lib/python3.6/site-packages/remoto/connection.py:126: UnboundLocalError

It seems that exc_line is not initialized.

upstream (?) release missing fixes

The fix 9f52696 is not in gh/ceph/remoto, which is what I was using when I discovered that same bug; that repo doesn't have the fix. Should users be using this repo for the most up-to-date version of remoto?

could you add python version flag in class `Connection`

I did some work on ceph-deploy to support Arch Linux, and it seems to work fine.
But the default Python version on Arch Linux is 3.x, so I had to write some ugly code. Maybe you can add a parameter to switch it.
I just changed the following:

connection.py
function: _make_connection_string(self, hostname, _needs_ssh=None, use_sudo=None)
line 57: interpreter = 'sudo python' if use_sudo else 'python'
--->interpreter = 'sudo python2' if use_sudo else 'python2'
line 59: interpreter = 'sudo python' if self.sudo else 'python'
--->interpreter = 'sudo python2' if self.sudo else 'python2'

I hope you can understand my poor English :-)

detect the need for `sudo`

Because the connection initiator (a user on a given machine) might not know what user it will end up being on the remote host.

This can happen if a user configures ssh to use a different user than the current user used for connecting.

0.0.25 on pypi contains pyc files

https://pypi.python.org/packages/source/r/remoto/remoto-0.0.25.tar.gz contains .pyc files, and
https://pypi.python.org/packages/source/r/remoto/remoto-0.0.24.tar.gz does not.

I noticed this because during the RPM build, the test suite fails with odd references to your computer's home directory:

Executing(%check): /bin/sh -e /var/tmp/rpm-tmp.Eitv0l
+ umask 022
+ cd /builddir/build/BUILD
+ cd remoto-0.0.25
+ export REMOTO_NO_VENDOR=1
+ REMOTO_NO_VENDOR=1
++ pwd
+ export PYTHONPATH=/builddir/build/BUILD/remoto-0.0.25
+ PYTHONPATH=/builddir/build/BUILD/remoto-0.0.25
+ py.test-2.7 -v remoto/tests
============================= test session starts ==============================
platform linux2 -- Python 2.7.9 -- py-1.4.26 -- pytest-2.6.4 -- /usr/bin/python
collecting ... collected 0 items / 5 errors

==================================== ERRORS ====================================
_______________ ERROR collecting remoto/tests/test_connection.py _______________
import file mismatch:
imported module 'test_connection' has this __file__ attribute:
  /Users/alfredo/python/remoto/remoto/tests/test_connection.py
which is not the same as the test file we want to collect:
  /builddir/build/BUILD/remoto-0.0.25/remoto/tests/test_connection.py
HINT: remove __pycache__ / .pyc files and/or use a unique basename for your test file modules
__________________ ERROR collecting remoto/tests/test_log.py ___________________
import file mismatch:
imported module 'test_log' has this __file__ attribute:
  /Users/alfredo/python/remoto/remoto/tests/test_log.py
which is not the same as the test file we want to collect:
  /builddir/build/BUILD/remoto-0.0.25/remoto/tests/test_log.py
HINT: remove __pycache__ / .pyc files and/or use a unique basename for your test file modules
________________ ERROR collecting remoto/tests/test_process.py _________________
import file mismatch:
imported module 'test_process' has this __file__ attribute:
  /Users/alfredo/python/remoto/remoto/tests/test_process.py
which is not the same as the test file we want to collect:
  /builddir/build/BUILD/remoto-0.0.25/remoto/tests/test_process.py
HINT: remove __pycache__ / .pyc files and/or use a unique basename for your test file modules
_________________ ERROR collecting remoto/tests/test_rsync.py __________________
import file mismatch:
imported module 'test_rsync' has this __file__ attribute:
  /Users/alfredo/python/remoto/remoto/tests/test_rsync.py
which is not the same as the test file we want to collect:
  /builddir/build/BUILD/remoto-0.0.25/remoto/tests/test_rsync.py
HINT: remove __pycache__ / .pyc files and/or use a unique basename for your test file modules
__________________ ERROR collecting remoto/tests/test_util.py __________________
import file mismatch:
imported module 'test_util' has this __file__ attribute:
  /Users/alfredo/python/remoto/remoto/tests/test_util.py
which is not the same as the test file we want to collect:
  /builddir/build/BUILD/remoto-0.0.25/remoto/tests/test_util.py
HINT: remove __pycache__ / .pyc files and/or use a unique basename for your test file modules
ERROR: Exception(/home/ktdreyer/fedora-scm/python-remoto/python-remoto-0.0.25-1.fc23.src.rpm) Config(fedora-rawhide-x86_64) 3 minutes 29 seconds
INFO: Results and/or logs in: /home/ktdreyer/fedora-scm/python-remoto/results_python-remoto/0.0.25/1.fc23
INFO: Cleaning up build root ('cleanup_on_failure=True')
Start: clean chroot
Finish: clean chroot
ERROR: Command failed. See logs for output.
 # bash --login -c /usr/bin/rpmbuild -bb --target x86_64 --nodeps  /builddir/build/SPECS/python-remoto.spec 
=========================== 5 error in 0.02 seconds ============================

Running find . -name '*.pyc' -print0 | xargs -0 rm during the RPM's %prep stage fixes this.

Allow remote python interpreter path

Currently, get_python_executable is used to determine the path of the Python interpreter on remote nodes. This method runs the which command to locate a python3, python2, or python executable on the remote system.

Different systems can have different Python environment setups, especially for projects using virtualenvs. JsonModuleExecute could accept an option to run Python modules using a provided interpreter path on remote nodes.

Something like:

    def __init__(self, conn, module, logger=None, remote_python=None):
        ...
        self.python_executable = remote_python
        ...

Usage of eval vs ast.literal_eval

I've been using some linters for my code base and they report warnings on eval usage and that ast.literal_eval should be a safe replacement. Though, whenever I replace, I receive the following error:

Traceback (most recent call last):
  File "~/.venv/lib/python3.6/site-packages/remoto/connection.py", line 137, in wrapper_continuous
    for item in self.channel:
  File "~/.venv/lib/python3.6/site-packages/execnet/gateway_base.py", line 746, in next
    return self.receive()
  File "~/.venv/lib/python3.6/site-packages/execnet/gateway_base.py", line 737, in receive
    raise self._getremoteerror() or EOFError()
execnet.gateway_base.RemoteError: Traceback (most recent call last):
  File "<string>", line 1072, in executetask
  File "<string>", line 1, in do_exec
  File "<remote exec>", line 156, in <module>
  File "/usr/lib/python3.6/ast.py", line 85, in literal_eval
    return _convert(node_or_string)
  File "/usr/lib/python3.6/ast.py", line 84, in _convert
    raise ValueError('malformed node or string: ' + repr(node))
ValueError: malformed node or string: <_ast.Call object at 0xb69b4790>


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/.venv/lib/python3.6/site-packages/remoto/connection.py", line 151, in wrapper_continuous
    raise RuntimeError(exc_line)
RuntimeError: ValueError: malformed node or string: <_ast.Call object at 0xb69b4790>
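The failure above is inherent to ast.literal_eval: it only accepts Python literals, while the legacy channel sends call expressions such as `listdir('.')`, which parse to an ast.Call node. A minimal local reproduction:

```python
import ast

# A literal is accepted fine...
assert ast.literal_eval("[1, 2, 3]") == [1, 2, 3]

# ...but a call expression raises ValueError, matching the traceback above.
try:
    ast.literal_eval("listdir('.')")
    raised = False
except ValueError:
    raised = True
```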

be optional about vendoring execnet

execnet needs to be optionally vendored so that upstream distro package maintainers can
package execnet independently.

This will also mean that the imports in remoto need to be imported as a normal dependency, like:

import execnet

As opposed to:

from remoto.lib import execnet

ssh connection exit() leaves behind zombie

Using the simple test script

import remoto
import remoto.process
import time

while True:
    conn = remoto.Connection('root@gnit')
    out, err, code = remoto.process.check(
        conn, ['true'])
    print('out %s err %s code %s' % (out, err, code))
    conn.exit()
    time.sleep(1)

I see zombie children accumulate, one per second:

3875432 pts/1    S+     0:00          \_ /usr/bin/python3 ./test_remoto.py
3875471 pts/1    Z+     0:00              \_ [ssh] <defunct>
3875585 pts/1    Z+     0:00              \_ [ssh] <defunct>
3875719 pts/1    Z+     0:00              \_ [ssh] <defunct>

Probably a wait() call missing somewhere?

Remote Functions: continuous output (generators)

Currently, trying to run a function that yields results (a generator) doesn't work.

Remote function file foo.py:

def func():
    for i in range(100):
        yield i


if __name__ == '__channelexec__':
    for item in channel:
        channel.send(eval(item))

Execution code:

import remoto, logging, foo
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger('otherhostname')
conn = remoto.Connection('otherhostname', logger=logger)
remote_foo = conn.import_module(foo)
foo.func()

Received output after executing last line in Execution code:
RuntimeError: DumpError: can't serialize <class 'generator'>

Is there a possible workflow for receiving continuous output from remote function?
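One workaround sketch, assuming the limitation is execnet's serialization (lists serialize, generator objects do not): materialize the generator before sending.

```python
def func():
    # Return a list instead of: for i in range(100): yield i
    return list(range(100))

result = func()
result[:3]  # the first few items, now plain serializable data
```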

respect the true order of stdout and stderr as received from the remote end

Right now process.run will output all of stdout first and then all of stderr, which makes it hard to understand what is going on in remote processes where stdout is mixed with stderr.

execnet does not allow threaded (or class-based) execution on remote nodes, so select.select() is the only way to accomplish this.

Allow setting environment variables in the `Connection` object

If the Connection class allowed adding/extending environment variables, they could be reused wherever the Popen helpers are called.

This would avoid having to pass in the environment variables to each call that is made.

This is specifically meant to avoid No such file errors on remote hosts, when executables cannot be found because the $PATH is wrong.
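A sketch of what storing the environment on the connection would enable: build the env dict once and reuse it for every Popen call (the PATH entry here is illustrative).

```python
import os
import subprocess
import sys

# Extend the inherited environment once, then reuse it everywhere.
env = dict(os.environ)
env['PATH'] = '/opt/custom/bin:' + env.get('PATH', '')

# Any Popen helper can now receive the same env; the child sees the
# extended PATH, so its executables resolve correctly.
proc = subprocess.Popen(
    [sys.executable, '-c', 'import os; print(os.environ["PATH"])'],
    stdout=subprocess.PIPE, env=env)
out, _ = proc.communicate()
```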

allow disabling the needs_ssh

So that users can force an SSH connection to the local host. This is necessary for situations where root access is needed by a non-sudoer user.

release new version to pypi

Mind tagging and pushing a new version? Here's a changelog to get started

0.0.30
------
05-Jul-2016

* Fix test issue with py3
* Remove vendored execnet
* Include tests when building
* Strip carriage-returns from messages in logs

None of the examples from the documentation work

None of the examples from the documentation / README actually work. Apparently, for all of them, there are additional required steps that are not documented.

A single standalone example of how to use remoto to run a remote command, and one to run a single remote function call, would be really, really useful.

better error message when JSON response from remote module is not serializable

remoto relies on serializable responses from remote functions. If the return value contains non-serializable values, the error is a traceback from the json module on the remote end. remoto should try/except and attempt to craft a nicer error message in that case, potentially with a str() representation of the object.

This is particularly problematic when trying to serialize bytes.

`process.check` returns a `None` when clients expect a 3 item tuple

When a timeout occurs, it returns None, and breakage ensues because the client expects a three-item tuple.

[node2][INFO  ] Running command: sudo ceph --cluster=ceph osd stat --format=json
[node2][WARNIN] No data was received after 300 seconds, disconnecting...
Traceback (most recent call last):
  File "/Users/alfredo/.virtualenvs/ceph-deploy/bin/ceph-deploy", line 8, in <module>
    load_entry_point('ceph-deploy==1.4.0', 'console_scripts', 'ceph-deploy')()
  File "/Users/alfredo/python/ceph-deploy/ceph_deploy/util/decorators.py", line 62, in newfunc
    return f(*a, **kw)
  File "/Users/alfredo/python/ceph-deploy/ceph_deploy/cli.py", line 147, in main
    return args.func(args)
  File "/Users/alfredo/python/ceph-deploy/ceph_deploy/osd.py", line 532, in osd
    activate(args, cfg)
  File "/Users/alfredo/python/ceph-deploy/ceph_deploy/osd.py", line 338, in activate
    catch_osd_errors(distro.conn, distro.conn.logger, args)
  File "/Users/alfredo/python/ceph-deploy/ceph_deploy/osd.py", line 152, in catch_osd_errors
    status = osd_status_check(conn, args.cluster)
  File "/Users/alfredo/python/ceph-deploy/ceph_deploy/osd.py", line 129, in osd_status_check
    command,
TypeError: 'NoneType' object is not iterable

drop vendored execnet

I ran into this bug in our vendoring logic today:

vendor.py is supposed to clone execnet if a vendored execnet with the right __version__ number does not already exist.

    if path.exists(vendor_init):
        module_file = open(vendor_init).read()
        metadata = dict(re.findall(r"__([a-z]+)__\s*=\s*['\"]([^'\"]*)['\"]", module_file))
        if metadata.get('version') != version:
            run(['rm', '-rf', vendor_dest])

So we parse execnet's __init__.py into metadata['version'] there, and if it doesn't match the version we expect, we delete the vendor_dest and try cloning from Git.

The problem is that our vendored execnet fork is tagged and vendored here as "1.2post2", but its __version__ variable remains unchanged from upstream: 1.2.0. So 1.2post2 will never equal 1.2.0, and we effectively hit the git clone code path every single time we invoke setup.py for anything. This causes issues when trying to package remoto in buildsystems that disallow internet access.

It would be great to drop the vendoring entirely to avoid this problem.

failure to detect some hosts to prevent local ssh

From a host whose hostname is vpm179 but which is also vpm179.front.sepia.ceph.com:

In [3]: remoto.connection.needs_ssh('vpm179.front.sepia.ceph.com')
Out[3]: True

In [4]: remoto.connection.needs_ssh('vpm179')
Out[4]: False
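A sketch of one possible fix (not remoto's implementation): compare the short hostname as well, so a fully-qualified name still matches the local host.

```python
def same_host(local, remote):
    # Compare only the short (first-label) hostnames, so 'vpm179' and
    # 'vpm179.front.sepia.ceph.com' are treated as the same machine.
    return local.split('.')[0] == remote.split('.')[0]

same_host('vpm179', 'vpm179.front.sepia.ceph.com')  # same machine
same_host('vpm179', 'vpm180')                       # different machine
```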

Output of remote function return has debug info

I am attempting to use remoto to call a remote function in a module, working from an interactive interpreter:

from remoto import connection
Connection = connection.get('ssh')
conn = Connection('nas01-internal')
conn.remote_import_system = 'json'
import nas_drive_manager
remote_nas_drive_manager = conn.import_module(nas_drive_manager)
remote_nas_drive_manager.get_drive_info('total_current_files', '/mnt/enclosure0/front/column0/drive1')

This is what it returns:

DEBUG:plot01:trying to determine remote python executable with python3
INFO:plot01:Running command: which python3
INFO:plot01:Running command: /usr/bin/python3
83

What I need is just the 83; I am not sure if I am using it the wrong way. The Python module has no logging enabled.

`stdin.encode` does not properly encode the stdin on python3

Link to issue: https://tracker.ceph.com/issues/51291

I was getting the following error when trying to use Ceph with the docker container for 15.2.13:

RuntimeError: Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15 --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mds.cephfs.athos4.nsevry --config-json -
2021-06-28T12:19:36.468436+0000 mgr.athos2 (mgr.5211678) 1623980 : cephadm [INF] Deploying daemon mds.cephfs.athos2.adupvw on athos2
2021-06-28T12:19:36.504448+0000 mgr.athos2 (mgr.5211678) 1623985 : cephadm [ERR] Traceback (most recent call last):
2021-06-28T12:19:36.504677+0000 mgr.athos2 (mgr.5211678) 1623986 : cephadm [ERR]   File "/lib/python3.6/site-packages/remoto/process.py", line 188, in check
2021-06-28T12:19:36.504883+0000 mgr.athos2 (mgr.5211678) 1623987 : cephadm [ERR]     response = result.receive(timeout)
2021-06-28T12:19:36.505087+0000 mgr.athos2 (mgr.5211678) 1623988 : cephadm [ERR]   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 749, in receive
2021-06-28T12:19:36.505290+0000 mgr.athos2 (mgr.5211678) 1623989 : cephadm [ERR]     raise self._getremoteerror() or EOFError()
2021-06-28T12:19:36.505525+0000 mgr.athos2 (mgr.5211678) 1623990 : cephadm [ERR] execnet.gateway_base.RemoteError: Traceback (most recent call last):
2021-06-28T12:19:36.505741+0000 mgr.athos2 (mgr.5211678) 1623991 : cephadm [ERR]   File "<string>", line 1088, in executetask
2021-06-28T12:19:36.505951+0000 mgr.athos2 (mgr.5211678) 1623992 : cephadm [ERR]   File "/lib/python3.6/site-packages/remoto/process.py", line 151, in _remote_check
2021-06-28T12:19:36.506161+0000 mgr.athos2 (mgr.5211678) 1623993 : cephadm [ERR]   File "/usr/lib/python3.6/subprocess.py", line 863, in communicate
2021-06-28T12:19:36.506370+0000 mgr.athos2 (mgr.5211678) 1623994 : cephadm [ERR]     stdout, stderr = self._communicate(input, endtime, timeout)
2021-06-28T12:19:36.506579+0000 mgr.athos2 (mgr.5211678) 1623995 : cephadm [ERR]   File "/usr/lib/python3.6/subprocess.py", line 1519, in _communicate
2021-06-28T12:19:36.506785+0000 mgr.athos2 (mgr.5211678) 1623996 : cephadm [ERR]     input_view = memoryview(self._input)
2021-06-28T12:19:36.506993+0000 mgr.athos2 (mgr.5211678) 1623997 : cephadm [ERR] TypeError: memoryview: a bytes-like object is required, not 'str'
2021-06-28T12:19:36.507202+0000 mgr.athos2 (mgr.5211678) 1623998 : cephadm [ERR] 
2021-06-28T12:19:36.507409+0000 mgr.athos2 (mgr.5211678) 1623999 : cephadm [ERR] 
2021-06-28T12:19:36.508516+0000 mgr.athos2 (mgr.5211678) 1624001 : cephadm [ERR] Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15 --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mds.cephfs.athos2.adupvw --config-json -
Traceback (most recent call last):
  File "/lib/python3.6/site-packages/remoto/process.py", line 188, in check
    response = result.receive(timeout)
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 749, in receive
    raise self._getremoteerror() or EOFError()
execnet.gateway_base.RemoteError: Traceback (most recent call last):
  File "<string>", line 1088, in executetask
  File "/lib/python3.6/site-packages/remoto/process.py", line 151, in _remote_check
  File "/usr/lib/python3.6/subprocess.py", line 863, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.6/subprocess.py", line 1519, in _communicate
    input_view = memoryview(self._input)
TypeError: memoryview: a bytes-like object is required, not 'str'

I dug into the source code and found this on line 151 of remoto/process.py

    if stdin:
        if not isinstance(stdin, bytes):
            stdin.encode('utf-8', errors='ignore')
        stdout_stream, stderr_stream = process.communicate(stdin)
    else:

In Python 3.6 (the version shipped with the Ceph container), stdin.encode('utf-8', errors='ignore') does not mutate stdin; it only returns the encoded value, which is discarded. To verify this, I modified the code so that it assigns the encoded value back to stdin:

    if stdin:
        if not isinstance(stdin, bytes):
            stdin = stdin.encode('utf-8', errors='ignore')
        stdout_stream, stderr_stream = process.communicate(stdin)
    else:

Ceph was able to reconfigure and run without issue after this change.

`JsonModuleExecute` is extremely verbose when executing remote functions

This happens because process.check is used, which logs the remote command at INFO level. The workaround is to set the level higher, but that would cause other useful information to be omitted.

The function docstring is logged at DEBUG level, for example. Not a good combination, and this output is overkill:

INFO:ADEZA:Running command: kubectl get pod -l component=api -n current -o name
DEBUG:ADEZA:trying to determine remote python executable with python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c which python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c /usr/bin/python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c /usr/bin/python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c /usr/bin/python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c /usr/bin/python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c /usr/bin/python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c /usr/bin/python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c /usr/bin/python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c /usr/bin/python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c /usr/bin/python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c /usr/bin/python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c /usr/bin/python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c /usr/bin/python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c /usr/bin/python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c /usr/bin/python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c /usr/bin/python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c /usr/bin/python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c /usr/bin/python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c /usr/bin/python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c /usr/bin/python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c /usr/bin/python3
INFO:ADEZA:Running command: kubectl exec -i -n current pod/current-anchore-engine-api-554689b88d-khhhv -- /bin/sh -c /usr/bin/python3

Deadlock scenario in process._remote_check due to use of sub-process stdout/stderr pipes and read methods

The use of read methods on subprocess stdout/stderr pipes can cause deadlocks. The following warning is noted in the Python docs:

Warning: Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.

This can happen in process.py._remote_check.

In this particular case, a deadlock occurs when the sub-process fills the OS stderr pipe buffer and is blocked while attempting to continue writing to stderr. In that case, stdout never reaches EOF and _remote_check is blocked trying to read from stdout.

This issue was discovered because it was causing a deadlock while attempting to upgrade a Ceph cluster from 15.2.10 -> 16.2.3. This deadlock brought our whole upgrade process to a halt.
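The safe pattern from that warning can be demonstrated locally: communicate() drains both pipes concurrently, so a child writing more than one OS pipe buffer (typically ~64 KiB) to stderr cannot block the parent.

```python
import subprocess
import sys

# The child writes 200,000 bytes to stderr, far more than a pipe buffer
# holds. communicate() reads both pipes as they fill, so neither side
# blocks and no deadlock occurs.
proc = subprocess.Popen(
    [sys.executable, '-c', 'import sys; sys.stderr.write("x" * 200000)'],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()
len(err)  # all 200,000 bytes were drained
```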
