acehoss / rnsh

rnsh is a command-line utility written in Python that facilitates shell sessions over Reticulum networks and aims to provide a similar experience to SSH.

License: MIT License

Python 100.00%

rnsh's Issues

Automated tests for packet retry mechanism

Perhaps use the PipeInterface to create a lossy interface that can drop or mangle traffic?
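As a rough sketch of the lossy part of that idea, the filter below drops or corrupts frames at configurable rates. It is only an illustration: `lossy`, its parameters, and the single bit-flip corruption are assumptions made here, and wiring it into a PipeInterface subprocess (including that interface's framing) is left out.

```python
import random


def lossy(frames, drop_rate=0.1, mangle_rate=0.1, seed=None):
    """Yield frames, silently dropping some and flipping one bit in others.

    Hypothetical test helper, not part of rnsh or RNS.
    """
    rng = random.Random(seed)  # seedable so test runs are repeatable
    for frame in frames:
        roll = rng.random()
        if roll < drop_rate:
            continue  # simulate a lost packet
        if roll < drop_rate + mangle_rate and frame:
            data = bytearray(frame)
            pos = rng.randrange(len(data))
            data[pos] ^= 1 << rng.randrange(8)  # flip a single bit
            frame = bytes(data)
        yield frame
```

A fixed seed makes a failing retry test reproducible, which matters more here than realistic loss patterns.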

Getting an asyncio.exceptions.CancelledError when trying to connect to the BBS

When trying to connect to the BBS mentioned in markqvist/Reticulum#231, I'm getting an asyncio.exceptions.CancelledError. The platform is Debian 11 with Python 3.9.2 and the latest RNS (0.4.9). I'm connected to both public nodes of the testnet.

rnsh 1490b99d47d3bac32270cfa90f771af8 -vvvvvv
[2023-02-19 22:16:59] [Debug]   Connected to locally available Reticulum instance via: LocalInterface[37428]
[2023-02-19 22:16:59] [Verbose] Configuration loaded from /home/dgrig/.reticulum/config
[2023-02-19 22:16:59] [Verbose] Loaded 310 known destination from storage
[2023-02-19 22:16:59] [Verbose] Loaded Transport Identity from storage
[2023-02-19 22:16:59] [Info]    rnsh.rnsh._initiate_link       Requesting path... [MainThread]
[2023-02-19 22:16:59] [Extra]   Valid announce for <1490b99d47d3bac32270cfa90f771af8> 2 hops away, received via <4e1ac99b1c4a4161cd06bfa04b74754c> on LocalInterface[37428]
[2023-02-19 22:16:59] [Debug]   Destination <1490b99d47d3bac32270cfa90f771af8> is now 2 hops away via <4e1ac99b1c4a4161cd06bfa04b74754c> on LocalInterface[37428]
[2023-02-19 22:16:59] [Debug]   rnsh.rnsh._initiate_link       No link [MainThread]
[2023-02-19 22:16:59] [Extra]   Registering link <0cfd19c5683e3a6aab16356fe2050334>
[2023-02-19 22:16:59] [Debug]   Link request <0cfd19c5683e3a6aab16356fe2050334> sent to <rnsh.default.afd460e2af7939a622e4faf5ab13e842/1490b99d47d3bac32270cfa90f771af8>
[2023-02-19 22:16:59] [Info]    rnsh.rnsh._initiate_link       Establishing link... [MainThread]
[2023-02-19 22:17:00] [Debug]   Path request for <9513e2d9f8ea795cbac394fbbd547ecd> on LocalInterface[37428]
[2023-02-19 22:17:00] [Debug]   Ignoring path request for <9513e2d9f8ea795cbac394fbbd547ecd> on LocalInterface[37428], no path known
[2023-02-19 22:17:00] [Debug]   Path request for <8ed1daeee0d96214a32b1ecad27201e5> on LocalInterface[37428]
[2023-02-19 22:17:00] [Debug]   Ignoring path request for <8ed1daeee0d96214a32b1ecad27201e5> on LocalInterface[37428], no path known
[2023-02-19 22:17:09] [Verbose] Link establishment timed out
[2023-02-19 22:17:09] [Debug]   rnsh.retry.RetryThread         stopping timer thread [MainThread]
Traceback (most recent call last):
  File "/home/dgrig/reticulum2/lib/python3.9/site-packages/rnsh/rnsh.py", line 538, in _rnsh_cli_main
    return_code = await _initiate(
  File "/home/dgrig/reticulum2/lib/python3.9/site-packages/rnsh/rnsh.py", line 335, in _initiate
    await _initiate_link(
  File "/home/dgrig/reticulum2/lib/python3.9/site-packages/rnsh/rnsh.py", line 315, in _initiate_link
    if not await _spin(until=lambda: _link.status == RNS.Link.ACTIVE, timeout=timeout):
  File "/home/dgrig/reticulum2/lib/python3.9/site-packages/rnsh/rnsh.py", line 227, in _spin
    raise asyncio.CancelledError()
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dgrig/reticulum2/bin/rnsh", line 8, in <module>
    sys.exit(rnsh_cli())
  File "/home/dgrig/reticulum2/lib/python3.9/site-packages/rnsh/rnsh.py", line 560, in rnsh_cli
    return_code = asyncio.run(_rnsh_cli_main())
  File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
asyncio.exceptions.CancelledError

I haven't looked at the source code yet, but I'm happy to answer any questions until then if it helps debugging this.
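For reference, the traceback shows the caller already testing the result of the wait (`if not await _spin(...)`), so one possible shape for such a helper is to report a timeout as False rather than raising. This is a minimal sketch under that assumption; `spin` and its `interval` parameter are hypothetical and this is not rnsh's actual `_spin`.

```python
import asyncio
import time


async def spin(until, timeout, interval=0.01):
    """Poll until() until it returns true or the timeout elapses.

    Returns True if the condition was met, False on timeout.
    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout
    while not until():
        if time.monotonic() >= deadline:
            return False  # let the caller decide how to report the timeout
        await asyncio.sleep(interval)
    return True
```

With this shape, the caller can print something like "link establishment timed out" and exit with a non-zero code instead of surfacing a traceback.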

Startup exception on Android

On Android, this exception is thrown at program startup, but it still works fine afterwards (at least as initiator):

Traceback (most recent call last):
  File "/data/data/com.termux/files/usr/lib/python3.10/site-packages/psutil/_common.py", line 399, in wrapper
    return cache[key]
KeyError: (('/proc',), frozenset())

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/data/com.termux/files/usr/lib/python3.10/site-packages/psutil/_pslinux.py", line 285, in <module>
    set_scputimes_ntuple("/proc")
  File "/data/data/com.termux/files/usr/lib/python3.10/site-packages/psutil/_common.py", line 401, in wrapper
    ret = cache[key] = fun(*args, **kwargs)
  File "/data/data/com.termux/files/usr/lib/python3.10/site-packages/psutil/_pslinux.py", line 268, in set_scputimes_ntuple
    with open_binary('%s/stat' % procfs_path) as f:
  File "/data/data/com.termux/files/usr/lib/python3.10/site-packages/psutil/_common.py", line 728, in open_binary
    return open(fname, "rb", buffering=FILE_READ_BUFFER_SIZE)
PermissionError: [Errno 13] Permission denied: '/proc/stat'

This looks like a common problem, and a possible solution is to redirect stderr to /dev/null temporarily before the import. It's a little hacky, but just might work.

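import os
import sys

# Temporarily point stderr at os.devnull so the traceback printed during import is hidden
sys.stderr = open(os.devnull, "w")
try:
    import psutil
finally:
    # After importing, close the devnull handle and restore the original stderr
    sys.stderr.close()
    sys.stderr = sys.__stderr__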

From discussion.

Double timestamp when logging to journald

When running under systemd, a timestamp is added to each line by journald in addition to the first one added by the RNS logger.
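One way this might be handled is to leave out the timestamp when the process appears to be logging to the journal. systemd sets the `JOURNAL_STREAM` environment variable for services whose stdout or stderr is connected to journald, so its presence is used as a simple heuristic below. `format_log_line` is a hypothetical function for illustration and does not reflect how the RNS logger is actually configured.

```python
import os
import time


def format_log_line(level, message, environ=os.environ):
    """Prefix message with a timestamp unless output likely goes to journald.

    Hypothetical formatter; checking only for the presence of
    JOURNAL_STREAM is a simplification.
    """
    if "JOURNAL_STREAM" in environ:
        return f"[{level}] {message}"  # journald adds its own timestamp
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"[{stamp}] [{level}] {message}"
```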

Dependency compatibility for rns 0.5.1

Hi @acehoss!

I was wondering if you could update the rns dependency for rnsh to rns >= 0.5.0 instead of rns == 0.5.0, as it is now. I just released RNS 0.5.1, and due to the dependency issue, rnsh is giving dependency errors in pip.

Also, when you have time, I sent you a message on Matrix :)

Rework protocol handling for better version compatibility

Refactor the protocol handling parts of the application to use an interface that can be implemented by multiple protocol versions. This way, when new features are added to the protocol, older versions can interoperate for a longer time.
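A minimal sketch of what such an interface could look like, assuming each side advertises the versions it supports and the highest common one is chosen. The class names, the `encode_data` method, and the wire formats are all invented for illustration and are not rnsh's protocol.

```python
from abc import ABC, abstractmethod


class Protocol(ABC):
    """Interface each protocol version implements (hypothetical)."""

    version: int

    @abstractmethod
    def encode_data(self, payload: bytes) -> bytes:
        """Wrap payload in this version's message format."""


class ProtocolV1(Protocol):
    version = 1

    def encode_data(self, payload):
        return b"\x01" + payload  # made-up format: version byte + payload


class ProtocolV2(Protocol):
    version = 2

    def encode_data(self, payload):
        # made-up format: version byte + 2-byte length + payload
        return b"\x02" + len(payload).to_bytes(2, "big") + payload


SUPPORTED = {cls.version: cls for cls in (ProtocolV1, ProtocolV2)}


def negotiate(peer_versions):
    """Return a handler for the highest version both sides support."""
    common = set(SUPPORTED) & set(peer_versions)
    if not common:
        raise ValueError("no common protocol version")
    return SUPPORTED[max(common)]()
```

The rest of the application would then talk only to the `Protocol` interface, so adding a version means adding one class and one registry entry.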

Interrupt does not work if the channel is saturated

I was testing running i=0; while [ $i -lt 1000000 ]; do i=$((i+1)); printf "%032d\n" $i; done in a shell session over rnsh, and I could not terminate it with Control-C. Over ssh the termination works fine.

Listener stops responding to new links after some time

Without fail, the listener stops responding to new links after an hour or so. I'm not sure if the time is specific or random, but it seems to always happen. This is the same issue that caused #9.

Start/Restart the listener
Connect and it works fine
Disconnect and leave it idle for at least 30 minutes
Can't connect anymore

Remove service name from aspects

Since the service name is included in the aspects, the initiator is required to provide it along with the destination hash to recreate the destination object. This adds unnecessary complexity on the initiator end, since the aspects change the destination hash value. If the destination hash stayed the same for all services using the same identity file, that would make more sense.

Instead, the service name could be provided only on the listener end and used to determine which identity to use. If an identity for that service doesn't exist yet, it will be created. On the listener then, either the identity file or the service name should be provided, not both. The initiator would only ever need to supply the destination hash.

The command line would look like this:

    rnsh -l [--config <configfile>] [-i <identityfile> | -s <service_name>] 
         [-v... | -q...] [-b <period>] (-n | -a <identity_hash> [-a <identity_hash>] ...) 
         [-A | -C] [[--] <program> [<arg> ...]]
    rnsh [--config <configfile>] [-i <identityfile>] [-v... | -q...] [-N] [-m] 
         [-w <timeout>] <destination_hash> [[--] <program> [<arg> ...]]
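The "identity file or service name, not both" rule on the listener could be enforced by the argument parser itself. The sketch below uses argparse purely for illustration (rnsh's own option handling may differ) and covers only the -l, -i and -s options from the usage above; `parse_listener_args` is a hypothetical name.

```python
import argparse


def parse_listener_args(argv):
    """Parse a reduced listener command line where -i and -s are exclusive."""
    parser = argparse.ArgumentParser(prog="rnsh")
    parser.add_argument("-l", dest="listen", action="store_true")
    source = parser.add_mutually_exclusive_group()
    source.add_argument("-i", dest="identity_file", metavar="<identityfile>")
    source.add_argument("-s", dest="service_name", metavar="<service_name>")
    return parser.parse_args(argv)
```

Passing both -i and -s makes argparse print a usage error and exit, so the listener never has to resolve a conflict between the two.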

Intermittent communications failures, particularly over LoRa

I've seen some intermittent but repeatable failures while testing on LoRa links. Even at higher bandwidths, sessions will spontaneously stop communicating. The exact circumstances are still unclear, but I have a few data points.

911.325 MHz, BW 500 kHz, SF 8, CR 5
Connect an rnsh session
Run top -s 30 (macOS) or top -d 30 (Linux) to keep data running over the session
After some amount of time on the order of 5 minutes, no more refreshes will be sent
The listener may show that the process has been terminated and the link has been closed.
The initiator will likely show nothing, even with a couple of -vs
Typing on the initiator session will cause a packet to be sent. This will eventually retry out.
Big frustration: at this point, initiating a new session may fail to establish the link, or may start the session and remote process but send no data.
Since the listener sends no data, the initiator tty is not yet in raw mode and a Ctrl-C will terminate the session.
Restarting the listener restores ability to connect.

The issue could be at any layer: rnsh, Reticulum, or RNode. Often it looks like packets aren't being received on one end or the other.

Lack of non-tty mode breaks some scripts

Without a non-tty mode, the remote command's stderr is redirected to stdout, and this causes issues--particularly with binary streams. This is why the file transfer with tar over a pipe doesn't work--there is some stray information printed to stderr at the beginning of the transfer that mangles the tar stream. A forced non-tty mode, like ssh -T, is needed so that stderr is sent as a separate stream.

On the remote end, this would be implemented by opening three pipes with os.pipe(…), calling os.fork(), and then connecting the pipes to the correct file descriptors with os.dup2(...).
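The pipe/fork/dup2 sequence described above can be sketched roughly as follows. This is an illustration for POSIX systems, not the rnsh implementation; `spawn_with_pipes` and its return shape are assumptions.

```python
import os


def spawn_with_pipes(argv):
    """Fork argv with stdin, stdout and stderr each on its own pipe.

    Returns (pid, stdin_write_fd, stdout_read_fd, stderr_read_fd).
    Hypothetical helper for illustration.
    """
    in_r, in_w = os.pipe()
    out_r, out_w = os.pipe()
    err_r, err_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: wire the pipe ends to fds 0, 1 and 2, then exec the program
        try:
            os.dup2(in_r, 0)
            os.dup2(out_w, 1)
            os.dup2(err_w, 2)
            for fd in (in_r, in_w, out_r, out_w, err_r, err_w):
                if fd > 2:
                    os.close(fd)  # drop the originals, keep only 0, 1, 2
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if setup or exec failed
    # Parent: close the ends that belong to the child
    for fd in (in_r, out_w, err_w):
        os.close(fd)
    return pid, in_w, out_r, err_r
```

Because stderr has its own pipe, anything the remote command prints there can be forwarded as a separate stream and never mixes into the stdout bytes.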

The current implementation using pty.fork() still needs to be used for tty mode, otherwise the remote command doesn't think it is connected to a tty and won't provide interactive features.

"Unhandled exception: Path not found" when connecting through hops

Hi, and first of all, thanks for writing this tool.
I am testing it in different configurations. It works when the peers are directly visible on the same local network, but it fails when they are connected via at least one hop.
The listener waits, and the initiator returns:
Unhandled exception: Path not found

In the same configuration, the peers can see each other (tested with nomadnet), and a link can be created successfully using the Reticulum utility rnx.

For this specific case, Reticulum v0.5.5 and rnsh v0.1.1 are used.
Listener: Pi Zero 2 W, Raspbian GNU/Linux 10 (buster) (32-bit)
Initiator: macOS.

Any hints to solve the issue would be appreciated.

Linux listener hangs on initiator exit

On Linux there is a hang when exiting the remote session, and the listener spins one cpu at 100% when the initiator exits. It hangs for a long time, but does exit eventually.

From discussion.

Listener: new protocol does not handle no-auth connections correctly

Even with the --no-auth option set on the listener, a session still follows the same state sequence, starting at LSSTATE_WAIT_IDENT after link establishment, rather than advancing directly to LSSTATE_WAIT_VERS. The initiator_identified callback needs a tweak as well then to handle an identification event even if it was not necessary.

The initiator identifies by default, unless the -N option is specified, so unless a user explicitly opts out, the protocol will not break for them.
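A minimal sketch of the intended state handling, assuming the session picks its initial state from the no-auth setting and treats a late or unnecessary identification as harmless. Only the two state names come from the issue; the constant values, the `ListenerSession` class, and its attributes are hypothetical.

```python
LSSTATE_WAIT_IDENT = "wait_ident"  # placeholder values, not rnsh's
LSSTATE_WAIT_VERS = "wait_vers"


class ListenerSession:
    """Reduced model of the listener-side session state (illustrative)."""

    def __init__(self, no_auth):
        self.remote_identity = None
        # With no-auth, skip straight to waiting for the version message
        self.state = LSSTATE_WAIT_VERS if no_auth else LSSTATE_WAIT_IDENT

    def initiator_identified(self, identity):
        # Record the identity even when identification was not required
        self.remote_identity = identity
        if self.state == LSSTATE_WAIT_IDENT:
            self.state = LSSTATE_WAIT_VERS
```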

Obtrusive error message on LoRa sessions

On LoRa connections, I'm seeing frequent errors like this:

[2023-05-12 10:33:31] [Error]   Decryption failed on link <e1b885eb922404c3882778f70dd1ed47>. The contained exception was: Fernet token HMAC was invalid
[2023-05-12 10:33:31] [Error]   An error ocurred while receiving data on <RNS.Channel.Channel object at 0x103ab1dd0>. The contained exception was: 'NoneType' object is not subscriptable

The data around it seems to be correct--I ran ls /bin over LoRa and that message appeared several times, but it appears that no text was lost. But the error message interrupts the flow of text, and especially in TUI apps really messes up the formatting.

The first error comes from RNS and will probably need an override or option there to suppress that message or reduce its error level.

The second error looks like a channel callback is still being called, but with a None value rather than data, and this is not handled correctly.

Sliding window acknowledgements

As link RTT increases, the throughput of the terminal session decreases. This is because currently only one outstanding packet is allowed on the link at a time. The current packet needs to travel to the recipient and the proof packet be received back before the next packet can be sent.

In the rnsh-bbs listener debug logs, it's clear that this is at least one of the bottlenecks for throughput, if not the primary one. The "pending message found" message in the logs shows where traffic is being held up.

Some amount of back pressure on each end is good--waiting for more data before sending helps increase packet size to improve link efficiency. But if the packets are already maxed out, then it's only increasing latency.
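To put rough numbers on this, the upper bound on throughput with a fixed number of unacknowledged packets in flight is window size times packet size divided by RTT, capped by the link rate. The function below is a back-of-the-envelope model with made-up example figures, not a measurement of rnsh.

```python
def max_throughput(packet_bytes, rtt_seconds, window=1,
                   link_bytes_per_second=float("inf")):
    """Upper bound in bytes/second with `window` packets allowed in flight."""
    return min(window * packet_bytes / rtt_seconds, link_bytes_per_second)


# Illustrative figures only: 400-byte packets over a link with a 2 s RTT
one_outstanding = max_throughput(400, 2.0)        # 200.0 bytes/s
four_in_flight = max_throughput(400, 2.0, window=4)  # 800.0 bytes/s
```

With one outstanding packet the bound falls in direct proportion to RTT, which matches the behavior described above; a larger window only helps until the link rate itself becomes the limit.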

Linux will show docopts but silently exits when real options specified

After installing the dev build from wheel, rnsh just returns to the command line if a valid set of options is specified, even just -p. Adding verbose flags doesn't change anything.

OS: Ubuntu 20.04.4 LTS x86_64 
Host: SEi TBD by 
Kernel: 5.15.0-58-generic 
Uptime: 28 days, 19 hours, 8 mins 
Packages: 2066 (dpkg), 9 (snap) 
Shell: bash 5.0.17 
Resolution: 1280x720 
Terminal: /dev/pts/3 
CPU: Intel i5-8279U (8) @ 4.100GHz 
GPU: Intel Iris Plus Graphics 655 
Memory: 7803MiB / 15841MiB

Data corruption

The latest changes in RNS have fixed the issues with channels and listeners becoming unresponsive. I ran top with rnsh overnight over both ethernet and LoRa, and the sessions are still running this morning--something that was not possible on the previous version.

However, I am noticing some subtle problems. In the session running over ethernet, I'm noticing text out of place after a minute or so. I simultaneously ran a session over SSH and it had no problems. (Screenshots at the bottom)

It seems like macOS top doesn't clear the terminal or the line, but rather only updates characters when they change by positioning the cursor and printing the updated text. This means if a data message is lost or duplicated, it could result in this garbling (particularly if relative cursor positioning is used). This could be a bug introduced by the recent changes in RNS, but I think I may have seen issues like this before that update.

SSH:

rnsh - notice the garbled first few lines compared to the SSH session:

De-duplicate packets

First I need to switch to retrying with Packet.resend() instead of creating a new packet when a timeout occurs. I assume that RNS will automatically de-duplicate packets that use resend.

Whatever the case, the retry mechanism I'm using currently creates a new packet after the timeout, but this has occasional issues with the original packet eventually arriving and causing spurious data in the stream or breaking the protocol.
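Independent of what RNS does with resent packets, one common way to guard against a late original arriving after its retry is for the receiver to track a sequence number. The sketch below assumes each message carries a sequence number and that only one packet is outstanding at a time; `DedupReceiver` is hypothetical and not part of rnsh or RNS.

```python
class DedupReceiver:
    """Accept each sequence number once, in order (illustrative)."""

    def __init__(self):
        self.expected = 0  # next sequence number we are willing to deliver

    def accept(self, seq, payload):
        """Return payload if it is the next new message, else None."""
        if seq != self.expected:
            return None  # late duplicate or out-of-order copy: drop it
        self.expected += 1
        return payload
```

A dropped duplicate would still need to be acknowledged so the sender stops retrying; that part is omitted here.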

And then once the Buffer API is done, all this changes (for the better!)