Comments (6)

EricCousineau-TRI commented on July 17, 2024

Thanks! Yes, that matches what we have experienced.
For point (1), this can be concretely reproduced with something like:

bazel-bin/external/ros2/ros2 daemon stop
env ROS_LOCALHOST_ONLY=0 bazel-bin/external/ros2/ros2 daemon start
# then run normal commands, with implied or explicit ROS_LOCALHOST_ONLY=1, and see the discrepancy appear

Two suggestions for improvement:

  • If one does not exist already, there should be a warning (or several) about this behavior when mixing invocations; I do not presently see one in the docs (link). Basically, ROS_LOCALHOST_ONLY in one bash session could mean nothing, depending on how the daemon was launched.

  • Ideally, there should be a quick hash based on networking for the daemon; clients should die or complain loudly if their candidate daemon has a different hash. More ideally, there should be a verbose indication of what the differences are (e.g. "These key environment variables are different [...]")
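
A minimal sketch of the hash-and-complain idea from the second suggestion, not how ros2cli works today. The set of variables in `NETWORK_ENV_KEYS` is an assumption (the relevant set depends on the RMW implementation), and all function names here are hypothetical:

```python
import hashlib

# Assumption: these are the variables that affect discovery. The real set
# depends on the RMW implementation in use.
NETWORK_ENV_KEYS = ("ROS_DOMAIN_ID", "ROS_LOCALHOST_ONLY", "RMW_IMPLEMENTATION")


def network_env(environ):
    """Pick out the networking-related variables, using '' for unset ones."""
    return {key: environ.get(key, "") for key in NETWORK_ENV_KEYS}


def network_fingerprint(environ):
    """Return a short, stable hash of the networking-related variables."""
    items = sorted(network_env(environ).items())
    payload = "\n".join(f"{key}={value}" for key, value in items)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def describe_mismatch(client_environ, daemon_environ):
    """List the variables whose values differ between client and daemon."""
    client = network_env(client_environ)
    daemon = network_env(daemon_environ)
    return [
        f"{key}: client={client[key]!r} daemon={daemon[key]!r}"
        for key in NETWORK_ENV_KEYS
        if client[key] != daemon[key]
    ]


def check_daemon(client_environ, daemon_environ):
    """Raise if the daemon was started with different networking settings."""
    if network_fingerprint(client_environ) != network_fingerprint(daemon_environ):
        details = "; ".join(describe_mismatch(client_environ, daemon_environ))
        raise RuntimeError(
            f"These key environment variables are different: {details}"
        )
```

Note that this sketch compares raw strings, so an unset ROS_LOCALHOST_ONLY and an explicit ROS_LOCALHOST_ONLY=0 would count as a mismatch; a real check would likely normalize defaults first.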

from drake-ros.

sloretz commented on July 17, 2024

I think the results can be explained by the ros2 CLI reusing a daemon despite a different ROS_LOCALHOST_ONLY setting. I'll look more closely at ros2cli to see if that's the case.

Why does ROS_DOMAIN_ID=0 not work, locally or remotely, for ROS_LOCALHOST_ONLY=0?
This contrasts with documentation that seems to indicate ROS_DOMAIN_ID=0 is valid: https://docs.ros.org/en/humble/Concepts/About-Domain-ID.html#choosing-a-domain-id-short-version

ROS_DOMAIN_ID=0 is definitely valid.

I'm having trouble following the formatting, so I'll try giving the tests numbers. Do I understand correctly that in each test you run oracle_cc on one machine, and ros2 node list on another? Were the tests run in the order listed?

[Test 1] By default, I do not see cross-talk traffic:

What I think is happening:

  • Machine 1: oracle_cc; ROS_DOMAIN_ID=0, ROS_LOCALHOST_ONLY=1
  • Machine 2: ros2 node list; ROS_DOMAIN_ID=0, ROS_LOCALHOST_ONLY=1 -> creates a daemon for ROS_DOMAIN_ID=0 with ROS_LOCALHOST_ONLY=1 set

Assuming this test was run first, the referenced commit does set ROS_LOCALHOST_ONLY=1 by default.
Assuming no other daemon was running, the ros2 node list created one with ROS_LOCALHOST_ONLY=1 and ROS_DOMAIN_ID=0. I believe there is one daemon created per ROS_DOMAIN_ID, but from your test results, I suspect that it doesn't create a new daemon when ROS_LOCALHOST_ONLY changes.

[Test 2] Using ROS_LOCALHOST_ONLY=1:

What I think is happening:

  • Machine 1: oracle_cc; ROS_DOMAIN_ID=0, ROS_LOCALHOST_ONLY=1
  • Machine 2: ros2 node list; ROS_DOMAIN_ID=0, ROS_LOCALHOST_ONLY=1 -> Uses existing daemon with ROS_DOMAIN_ID=0 and ROS_LOCALHOST_ONLY=1

As in the first test, the expected result is no inter-machine communication.

[Test 3] Using ROS_LOCALHOST_ONLY=0:

What I think is happening:

  • Machine 1: oracle_cc; ROS_DOMAIN_ID=0, ROS_LOCALHOST_ONLY=0
  • Machine 2: ros2 node list; ROS_DOMAIN_ID=0, ROS_LOCALHOST_ONLY=0 -> Uses existing daemon with ROS_DOMAIN_ID=0 and ROS_LOCALHOST_ONLY=1

I think the intuitive result would be that inter-machine communication happens, but if the ros2 node list command reused the daemon, that would explain why you saw no inter-machine communication.

[Test 4] On top of that, I don't see local traffic when using ROS_LOCALHOST_ONLY=1 with ROS_DOMAIN_ID=0. Unclear why.

I'm not sure what commands were run for this one.

[Test 5] Using ROS_LOCALHOST_ONLY=0: [...] I see cross-talk traffic if ROS_DOMAIN_ID=1 ROS_DOMAIN_ID=1

What I think is happening:

  • Machine 1: oracle_cc; ROS_DOMAIN_ID=1, ROS_LOCALHOST_ONLY=0
  • Machine 2: ros2 node list; ROS_DOMAIN_ID=1, ROS_LOCALHOST_ONLY=0 -> Creates new daemon with ROS_DOMAIN_ID=1 and ROS_LOCALHOST_ONLY=0

The ros2 CLI will definitely make a new daemon here, so inter-machine communication is expected.
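
The suspected behavior across Tests 1, 2, 3, and 5 can be sketched as a toy model. This is only an illustration of the hypothesis above (one daemon per ROS_DOMAIN_ID, reused regardless of ROS_LOCALHOST_ONLY), not ros2cli's actual implementation; `ToyDaemonRegistry` and `run_cli` are hypothetical names:

```python
class ToyDaemonRegistry:
    """Toy model: daemons are looked up by ROS_DOMAIN_ID only (an assumption)."""

    def __init__(self):
        # Maps domain_id -> the ROS_LOCALHOST_ONLY value the daemon started with.
        self._daemons = {}

    def run_cli(self, domain_id, localhost_only):
        """Simulate one CLI call; return (action, effective localhost_only)."""
        if domain_id in self._daemons:
            # The caller's own localhost_only setting is silently ignored.
            return "reused", self._daemons[domain_id]
        self._daemons[domain_id] = localhost_only
        return "created", localhost_only


registry = ToyDaemonRegistry()
results = [
    registry.run_cli(domain_id=0, localhost_only=1),  # Test 1
    registry.run_cli(domain_id=0, localhost_only=1),  # Test 2
    registry.run_cli(domain_id=0, localhost_only=0),  # Test 3
    registry.run_cli(domain_id=1, localhost_only=0),  # Test 5
]
for result in results:
    print(result)
# ('created', 1)
# ('reused', 1)
# ('reused', 1)   <- Test 3 asked for 0 but gets the daemon's 1
# ('created', 0)
```

Under this model, Test 3 ends up with the localhost-only setting of the daemon created in Test 1, which matches the absence of inter-machine communication reported there.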

EricCousineau-TRI commented on July 17, 2024

Ah, three additional suggestions:

  • Add a ros2 --no-daemon [...] option for additional debugging (beyond stopping / restarting / hash checks), with documentation stating that it will be slower
  • Add a mention of ros2 daemon to output of ros2 doctor --report
  • State the relevant environment variables / process ID / args /etc. when using ros2 daemon status

\cc @cottsay

sloretz commented on July 17, 2024

Add a ros2 --no-daemon [...] option for additional debugging (beyond stopping / restarting / hash checks), with documentation stating that it will be slower

This one exists, but in a different place. Commands that use the daemon offer a --no-daemon option and a --spin-time option, which sets how many seconds to wait for discovery:

ros2 node list --no-daemon --spin-time 1

$ ros2 node list --help
[...]
  --spin-time SPIN_TIME
                        Spin time in seconds to wait for discovery (only applies when not using an already running daemon)
[...]
  --no-daemon           Do not spawn nor use an already running daemon

EricCousineau-TRI commented on July 17, 2024

Gotcha! Is it easy to tell which commands need it? (and how many?)

A (dumb?) suggestion is to hoist the daemon arguments to the top level, even if unused; then users can easily know they're disabling the daemon with an alias / wrapper for ros2.
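
Until something like that exists, a wrapper could decide per command by inspecting the --help output. The following is a rough sketch under that assumption; `supports_no_daemon`, `with_no_daemon`, and `get_help` are hypothetical helpers, and only the --no-daemon flag itself comes from the help text quoted earlier in this thread:

```python
import subprocess


def supports_no_daemon(help_text):
    """Return True if a command's --help output lists the --no-daemon option."""
    return any(
        line.split()[:1] == ["--no-daemon"] for line in help_text.splitlines()
    )


def with_no_daemon(args, help_text):
    """Append --no-daemon to args when the command supports it and lacks it."""
    if "--no-daemon" in args or not supports_no_daemon(help_text):
        return list(args)
    return list(args) + ["--no-daemon"]


def get_help(args):
    """Fetch the --help text for a ros2 command (requires ros2 on PATH)."""
    completed = subprocess.run(
        ["ros2", *args, "--help"], capture_output=True, text=True, check=False
    )
    return completed.stdout
```

A wrapper script would call `with_no_daemon(args, get_help(args))` and then run `ros2` with the returned arguments. The cost is one extra `--help` invocation per command, on top of the slower discovery without a daemon.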


Side note: Does our rmw_isolation still correctly isolate, even if a daemon is invoked?
Concretely, we do things to DDS that are not expressible via ROS_DOMAIN_ID, but the daemon seems to be located by the domain itself?

EricCousineau-TRI commented on July 17, 2024

Related to #99, it may be good for users of Ubuntu (and other systems of similar config?) to use the startup script as Shane illustrated in eclipse-cyclonedds/cyclonedds#1400
