Solitary messages published from ROS 1 publishers that do not latch will sometimes not arrive at the ROS 2 subscriber (ros1_bridge issue, closed)

dljsjr commented on September 6, 2024

Solitary messages published from ROS 1 publishers that do not latch will sometimes not arrive at the ROS 2 subscriber

from ros1_bridge.

Comments (8)

dljsjr commented on September 6, 2024

That makes sense. Thanks for discussing it with me. I might take a pass at the READMEs later this week after I play around with it a bit more.

dirk-thomas commented on September 6, 2024

From the description I have my doubts that the single message would be received by any ROS 1 subscriber / node. There is a non-zero delay between creating a publisher and establishing the connections to all interested subscribers. Messages published "early", during that interval, are expected to be lost. That is by design in an asynchronous publish / subscribe system without any kind of caching (which latching does provide). If you can confirm that this is the source of your problem, I don't think there is anything in the bridge that could improve this behavior.
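The loss-versus-latching behavior described above can be illustrated with a toy, in-process model. This is a minimal sketch under stated assumptions, not ROS code: `ToyTopic` and `publish_then_connect` are hypothetical names, and the model captures only the property discussed here, namely that a message reaches only subscribers that are already connected unless the publisher caches (latches) its last message for late joiners.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyTopic:
    """Hypothetical toy model of one topic (not a ROS API).

    A published message is delivered only to inboxes that are already
    connected. If latch is True, the last message is cached and replayed
    to subscribers that connect later, mimicking a latched ROS 1 publisher.
    """
    latch: bool = False
    subscribers: List[List[str]] = field(default_factory=list)
    last_msg: Optional[str] = None

    def publish(self, msg: str) -> None:
        if self.latch:
            self.last_msg = msg  # cache for late joiners
        for inbox in self.subscribers:
            inbox.append(msg)  # only current connections receive it

    def connect(self) -> List[str]:
        inbox: List[str] = []
        if self.latch and self.last_msg is not None:
            inbox.append(self.last_msg)  # replay the cached message
        self.subscribers.append(inbox)
        return inbox


def publish_then_connect(latch: bool) -> List[str]:
    """Publish one message before the subscriber's connection exists."""
    topic = ToyTopic(latch=latch)
    topic.publish("go")      # published right after "advertising"
    inbox = topic.connect()  # the connection is established afterwards
    return inbox


print(publish_then_connect(latch=False))  # []
print(publish_then_connect(latch=True))   # ['go']
```

Without latching the early message is simply gone; with latching the late subscriber still receives it, which is why switching the test scripts to latched publishers made the problem disappear.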

dljsjr commented on September 6, 2024

Just to avoid confusion, this is an orthogonal issue to the hard hang issue. We're still investigating that. It was just something we discovered during testing.

The scripts we're using to test (the ones without the latching publishers) are acceptance testing scripts from the University of Edinburgh that have worked fine in a pure ROS 1 environment in the past. They use them to shake out their robot every time they have a maintenance visit from NASA, and NASA has used them a bit in the past as well. You can find them here: https://github.com/ipab-slmc/valkyrie_testing_edi. Our API is moving from custom comms with an optional custom ROS 1 translator to DDS with ROS 2-compliant conventions and an optional ROS 1 bridge, so we're updating their scripts to serve as an acceptance test for the ROS 1 bridge layer. On the ROS 2 side we're using Reliable QoS configurations for Fast-RTPS.

It just seems like a regression that we have to make the publisher scripts latch when using the bridge when we didn't have to before. But I understand that the bridge is fundamentally different than pure ROS 1, so it might be more useful to document this formally in the bridge usage instructions than to "fix" an "issue" that might not be an issue at all, just a side effect of the technology. It could also be specific to Fast-RTPS; we can't currently change our DDS implementation, but we're investigating that.

dirk-thomas commented on September 6, 2024

"But I understand that the bridge is fundamentally different than pure ROS 1"

The ros1_bridge is a "normal" ROS 1 node that frequently polls the master for information about available topics / publishers / subscribers / servers / clients and then creates ROS 1 publishers / subscribers / servers / clients on demand. I would say there is nothing "fundamentally different" about it; it is just "pure ROS 1".

Since the master needs to be polled for information, the delay between your publisher being created and the bridge actually subscribing to the topic will likely be longer than when a subscriber is already running. I would guess that this increased delay is causing your message loss.
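A rough way to reason about the extra window that polling adds is sketched below. It assumes, hypothetically, that the bridge polls the master at a fixed period and then needs a fixed `connect_time` to establish the connection. The actual poll interval and connection setup time of ros1_bridge are not stated in this thread, so both are plain parameters, and `discovery_window` / `message_lost` are illustrative names, not bridge APIs.

```python
import math


def discovery_window(created_at: float, poll_period: float,
                     connect_time: float, first_poll_at: float = 0.0) -> float:
    """Seconds from publisher creation until the bridge's subscriber is connected.

    Hypothetical model: the bridge polls the master at
    first_poll_at + k * poll_period (k = 0, 1, 2, ...), notices the new
    publisher at the first poll at or after created_at, and then needs
    connect_time seconds to establish the connection.
    """
    if poll_period <= 0:
        raise ValueError("poll_period must be positive")
    # Index of the first poll that happens at or after the publisher appears.
    k = max(0, math.ceil((created_at - first_poll_at) / poll_period))
    next_poll = first_poll_at + k * poll_period
    return (next_poll - created_at) + connect_time


def message_lost(publish_delay: float, window: float) -> bool:
    """A non-latched message published before the connection exists is dropped."""
    return publish_delay < window


window = discovery_window(created_at=0.3, poll_period=1.0, connect_time=0.05)
print(round(window, 3))            # 0.75
print(message_lost(0.01, window))  # True
```

Under these assumed numbers, a message published 10 ms after advertising falls well inside the window. With an already-running ROS 1 subscriber there is no polling term, so only the connection setup time remains, which is consistent with the scripts having worked in pure ROS 1.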

You can either update your code so that it does not rely on the connections being established very quickly, or start the bridge so that it explicitly bridges the topic in question instead of relying on the information polled from the master.
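For the first option, one common pattern is to wait until the publisher reports at least one connection before publishing. The helper below is a hypothetical sketch that takes the connection count as a callable so it can be exercised without ROS; with rospy you could pass a publisher's `get_num_connections` method. A non-zero count only shows that some subscriber (for example the bridge's ROS 1 subscriber) is connected, not that the message will reach every downstream consumer, so latching or explicit bridging remain the more robust options.

```python
import time
from typing import Callable


def wait_for_subscribers(num_connections: Callable[[], int],
                         minimum: int = 1,
                         timeout: float = 5.0,
                         poll_interval: float = 0.05) -> bool:
    """Hypothetical helper: block until num_connections() >= minimum.

    Returns True as soon as enough subscribers are connected, or False if
    the timeout (in seconds) expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if num_connections() >= minimum:
            return True
        time.sleep(poll_interval)  # avoid a busy loop while connections form
    # One last check in case a connection appeared right at the deadline.
    return num_connections() >= minimum
```

With rospy (not run here), you would create the publisher as usual and call its `publish` method only if `wait_for_subscribers(pub.get_num_connections)` returns True, handling the timeout case explicitly rather than publishing into the void.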

dljsjr commented on September 6, 2024

Maybe a better way to phrase it is that I acknowledge using the bridge to connect a ROS 1 publisher to a ROS 2 subscriber is a very different beast than publishing from a ROS 1 node to a ROS 2 subscriber with no middleman.

I guess where I'm coming from is that the way the docs are written seems to imply that flags like --bridge-all-topics should be used sparingly. They mention that the flag is useful for things like rqt and listing topics and that it's off by default for efficiency reasons, but there is no discussion of the tradeoff when running in the purely dynamic mode. The issue I'm experiencing is admittedly not a bug based on the way you are describing it, but it's also not intuitive behavior for a user who doesn't have a firm understanding of the bridge internals and who, based on the language in the docs, might try to avoid the forced bridging flags. We spent almost a week of our testing time in TX avoiding those flags because we thought we "weren't supposed to" use them, and about 60% of our issues went away when we enabled them.

dirk-thomas commented on September 6, 2024

Using --bridge-all-topics implies significant overhead, since you potentially bridge a lot of messages unnecessarily. That is why it is not usually recommended.

I don't think that code which creates a publisher and immediately publishes a single message is a common use case, simply because ROS doesn't guarantee that this works all the time, even without the bridge being involved. Please feel free to update the docs with a paragraph (or more) about this scenario to guide future readers.

dljsjr commented on September 6, 2024

I don't have a good intuition for what exactly the overhead is here: is it computational or memory? We haven't noticed any issues when running the bridge with those flags, but we have relatively beefy systems.

I'd love to make a PR against the README, but I'm far from an expert on the bridge internals, so I want to make sure I have a firm grasp on them and that we're using the bridge correctly.

dirk-thomas commented on September 6, 2024

"I don't have a good intuition for what exactly the overhead is here, is it computational or memory?"

If your ROS graph has many topics / services you are not interested in bridging, all of them will be subscribed to and their messages will be sent to the bridge unnecessarily. Depending on the size of your system and the size and frequency of the messages, that can pose significant overhead. For example, consider a camera node advertising raw as well as compressed topics. Without the bridge, most of them might not even be used, so the node doesn't perform any computation for them. With the bridge running with --bridge-all-topics there are now subscribers for every topic, so the messages need to be generated, serialized, transferred, and deserialized just to be thrown away.
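To put rough numbers on the camera example, the sketch below computes the bytes per second that would be serialized and transferred for topics nobody downstream consumes. The topic names follow the usual image_transport naming but are illustrative, the ~60 kB compressed frame size is an assumption, and message header bytes are ignored; `raw_image_bytes` and `wasted_bandwidth` are hypothetical helpers.

```python
from typing import Dict, Set, Tuple


def raw_image_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Payload size of one uncompressed image (message header ignored)."""
    return width * height * bytes_per_pixel


def wasted_bandwidth(topics: Dict[str, Tuple[int, float]], used: Set[str]) -> float:
    """Bytes/s spent on subscribed topics that no consumer actually needs.

    topics maps topic name -> (bytes per message, publish rate in Hz).
    """
    return sum(size * rate for name, (size, rate) in topics.items()
               if name not in used)


# Hypothetical camera node: raw 640x480 RGB at 30 Hz plus a compressed stream.
topics = {
    "/camera/image_raw": (raw_image_bytes(640, 480, 3), 30.0),
    "/camera/image_raw/compressed": (60_000, 30.0),  # assumed ~60 kB per frame
}
print(wasted_bandwidth(topics, used={"/camera/image_raw/compressed"}))
# 27648000.0
```

If only the compressed stream is actually needed, bridging everything still moves about 27.6 MB/s of raw frames (921,600 bytes per frame at 30 Hz), before counting the CPU cost of serializing and deserializing them.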
