Comments (15)
@Jmeyer1292, I'm running into issues with unacknowledged result messages that leave the action client in a WAITING_FOR_RESULT state. Is this similar to the issues you have observed? If it is, what queue sizes for the client/server seemed to fix it for you?
from actionlib.
@safrimus Frankly, I set the queue sizes to zero so that they buffer indefinitely. It's kind of a heavyweight solution and may not be appropriate for many users, but it has helped me.
I added a PR #47 that lets you set these sizes through parameters should you be interested in testing.
A better solution would have us look into the action protocol itself to ensure that the server gets an acknowledgement to any Result types it sends.
from actionlib.
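To make the queue-size discussion concrete, here is a minimal sketch in plain Python (no ROS involved) of a bounded queue that discards its oldest entry when a slow consumer falls behind. The `deliver` function and the `burst` parameter are hypothetical names invented for this illustration, and the sketch assumes a drop-oldest policy with `0` meaning "unbounded"; it is not the actionlib or roscpp implementation.

```python
from collections import deque


def deliver(messages, queue_size, burst):
    """Simulate a subscriber that drains its queue only after every `burst` messages.

    queue_size == 0 is treated as unbounded (assumption mirroring the thread).
    Returns the messages the consumer actually gets to see.
    """
    queue = deque(maxlen=queue_size or None)
    seen = []
    for i, msg in enumerate(messages, start=1):
        queue.append(msg)      # a full bounded deque silently drops its oldest entry
        if i % burst == 0:     # the slow consumer only wakes up now
            seen.extend(queue)
            queue.clear()
    seen.extend(queue)         # drain whatever is left at the end
    return seen


goals = [f"goal-{n}" for n in range(6)]
bounded = deliver(goals, queue_size=2, burst=3)
unbounded = deliver(goals, queue_size=0, burst=3)
print(bounded)    # ['goal-1', 'goal-2', 'goal-4', 'goal-5']
print(unbounded)  # all six goals
```

With a queue of two and bursts of three, one message per burst is lost, while the unbounded queue keeps everything at the cost of unbounded memory use.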
+1 for acknowledgement of Result messages. Especially since it's a required message without which the client is essentially stuck waiting for a message that may have simply been "missed".
from actionlib.
Making queue sizes configurable through parameters #47 seems like a good idea. In the past queue sizes were infinite (at least in the Python implementation). This change would need thorough testing before being integrated.
I agree that implementing a clean acknowledgement of Result messages could prevent this "action client hanging forever" issue, and would be another step forward.
from actionlib.
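The acknowledgement idea suggested above can be sketched as a toy simulation: the server resends a result until the client confirms it. Everything here (`LossyChannel`, `Client`, `send_result_with_ack`, `max_attempts`) is hypothetical and written only for illustration; the action protocol discussed in this thread has no such acknowledgement, and the sketch ignores the case where the ack itself is lost.

```python
class LossyChannel:
    """Toy transport that drops the transmissions whose index is in `drop`."""

    def __init__(self, drop):
        self.drop = set(drop)
        self.sent = 0

    def send(self, msg, receiver):
        index = self.sent
        self.sent += 1
        if index in self.drop:
            return None          # lost: the receiver never sees it, so no ack
        return receiver(msg)     # delivered: the receiver's return value is the ack


class Client:
    def __init__(self):
        self.results = {}

    def on_result(self, msg):
        goal_id, result = msg
        self.results.setdefault(goal_id, result)  # duplicate deliveries are harmless
        return goal_id                            # the ack names the goal


def send_result_with_ack(channel, client, goal_id, result, max_attempts=5):
    """Resend until the client acknowledges; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if channel.send((goal_id, result), client.on_result) == goal_id:
            return attempt
    raise TimeoutError(f"no ack for {goal_id} after {max_attempts} attempts")


client = Client()
channel = LossyChannel(drop={0, 1})   # the first two transmissions are lost
attempts = send_result_with_ack(channel, client, "goal-7", "SUCCEEDED")
print(attempts, client.results)       # 3 {'goal-7': 'SUCCEEDED'}
```

Because the server only stops once it has seen an ack, a single missed Result no longer leaves the client waiting forever; the server would instead report a timeout after `max_attempts`.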
I can confirm all problems mentioned here, in particular those related to result msgs getting lost, see my post here: answers.ros.org/question/240056/bug-in-ros-extending-simpleactionserver-client-to-allow-rejecting-goals . Configurable queue sizes with a default value of 10 (or better 100) would be great.
One question: when I use the default Ubuntu 16.04 ROS packages, how do I get this update? I guess I have two options: (a) wait until it is merged into the Ubuntu packages (which possibly takes too long), or (b) patch the sources in my ROS installation. Am I right? Thanks!
from actionlib.
Update: If I increase the queue sizes to 10, 100, or 1000 (or even 0 -> infinite) and set up a large number (~35) of requesting clients (on a 1-core VM), messages still get lost, causing my clients to time out. Very strange; I need to investigate this further.
Update 2 (edit): Messages can still get lost regardless of the queue size. Any ideas why this happens?
@Jmeyer1292 Setting the queue sizes to zero fixed your issues completely?
from actionlib.
@CodeFinder2 have you resolved this issue? Any updates? We are experiencing similar message loss regardless of the queue size in the ActionServer. The Python client does not show the same behavior; this only happens with the C++ client.
from actionlib.
@CodeFinder2 @progtologist I had a single requesting client but with many goals. The queue size changes did seem to alleviate my issues for this case. There may be another queue issue or race condition w/ many clients.
from actionlib.
@progtologist Unfortunately, I've not solved it yet. And to me, it seems to be a serious problem in the actionlib. Note that all my posts in this issue were/are related to the C++ interface. I've not used the Python variant.
I also thought about network buffer sizes being too small, so that there is actual packet loss (even though it's TCP); see, e.g., http://stackoverflow.com/questions/7865069/how-to-find-the-socket-buffer-size-of-linux. However, I don't really think that this is likely to be the reason. (I've not yet tried to increase the buffer size.)
I can possibly provide code that allows one to reproduce the issue. However, it also seems to be related to the underlying software/hardware configuration (#cores, #threads, speed, ...), so I'm not sure whether my code would let others reproduce it. If anyone is interested (and would like to debug / investigate this issue), please let me know.
from actionlib.
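For anyone who wants to check the socket buffer sizes mentioned above, the following sketch reads the kernel defaults for a freshly created TCP socket using Python's standard `socket` module. The helper name `tcp_buffer_sizes` is made up for this example, and the values it reports are the defaults on the machine it runs on, not whatever ROS configures on its own connections.

```python
import socket


def tcp_buffer_sizes():
    """Return the default (receive, send) buffer sizes of a new TCP socket, in bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return rcv, snd


rcv, snd = tcp_buffer_sizes()
print(f"SO_RCVBUF={rcv} bytes, SO_SNDBUF={snd} bytes")
```

Note that a full TCP buffer normally stalls the sender rather than discarding data, which is consistent with the doubt expressed above that buffer sizes are the cause.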
Well, in our experience, several aspects affect this behavior:
- Use an async spinner with as many threads as possible (given the number of processors). Python does this by default.
- Change the settings of the client/server queues (C++ has different defaults compared to Python; why is that, really?)
- The problem arises mostly on high-latency links, e.g. over the internet. In a LAN, the above two configurations solved all of our issues.
My colleague @vagvaz has had more hands-on experience on this so I am inviting him in the conversation.
from actionlib.
@CodeFinder2 can you share your code if it is not too complex? I have a similar issue, but with our complete robotic stack, so it is a bit hard to track down. If you have a smaller example that leads to the problem, I would take a look to see whether I can reproduce it on my hardware.
from actionlib.
@adegroote Here's the code for testing:
- actionlib_client.cpp: https://pastebin.com/bbJAMtyX
- actionlib_server.cpp: https://pastebin.com/ZBwqrib9
- RoundTrip.action: https://pastebin.com/FAXpLyif
- CMakeLists.txt: https://pastebin.com/5Pme9mHh
If it prints something like "Some error during the i-th action processing [...]", that indicates an error we are interested in (although it may 'just' be feedback messages, which shouldn't get lost either). For whatever reason, I am not able to find the other code snippets (which also showed dropped goals).
Sorry for the late response!
from actionlib.
Additionally, here's a move_base_msgs.MoveBaseAction client written in Python which I successfully used to "reproduce" the issue with move_base! For me, it requires running the script using python move_base_square.py (along with a running move_base w/ DWA) about 10 times. Most probably, in ~1 out of 10 tries, the script sends a goal (cf. "Sending goal {:d}/4: {:s}\n") which is never received by the server (= move_base). I verified this in another terminal using rostopic echo /move_base/goal: the goal was never received by any of the subscribers. However, when the client times out for the first goal (cf. code), it is able to send the remaining 3 goals successfully (forming a square).
This must be a serious issue in the (actionlib?) implementation ...
from actionlib.
@CodeFinder2 I can't seem to reproduce this issue using your code. I've set up my environment as follows:
- virtual machine with one core running Ubuntu Xenial and Kinetic
- actionlib both from package and compiled from source
- simulated latency via tc (I've tried 100ms and 200ms)
- ran for 1000 iterations several times
but (un)fortunately the client always succeeds.
@progtologist @vagvaz were you able to reproduce this issue with @CodeFinder2's code? In what kind of environment have you seen this issue show up? Do you happen to have a snippet that shows this behavior? Thanks in advance.
from actionlib.
@CodeFinder2 It seems we have experienced a similar issue.
We actually found a race condition related to ros_comm (fixed here: ros/ros_comm#1054).
As a result, the first message to a subscriber could be lost. This seems to explain why the first "result" (or "goal") can be missed, since it is a single-shot message.
Hope this helps.
from actionlib.
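To illustrate why a lost first message hurts single-shot messages in particular, here is a toy publish/subscribe model in plain Python. `ToyTopic` and `publish_checked` are hypothetical names; the model only shows a message published before any subscriber is connected being silently dropped, and it does not reproduce the actual race fixed in ros/ros_comm#1054.

```python
class ToyTopic:
    """Stand-in for a non-latched topic: messages reach only current subscribers."""

    def __init__(self):
        self.callbacks = []

    def subscribe(self, callback):
        self.callbacks.append(callback)

    def num_connections(self):
        return len(self.callbacks)

    def publish(self, msg):
        for callback in self.callbacks:
            callback(msg)


def publish_checked(topic, msg):
    """Publish only when someone is listening; report whether it was sent."""
    if topic.num_connections() == 0:
        return False
    topic.publish(msg)
    return True


# Unguarded: the single-shot result goes out before the subscriber exists.
late = []
topic = ToyTopic()
topic.publish("result-1")        # nobody connected yet: silently lost
topic.subscribe(late.append)
topic.publish("result-2")
print(late)                      # ['result-2']

# Guarded: the sender notices the missing subscriber and retries later.
guarded = []
topic2 = ToyTopic()
first_try = publish_checked(topic2, "result-1")    # False, nothing was sent
topic2.subscribe(guarded.append)
second_try = publish_checked(topic2, "result-1")   # True
print(first_try, second_try, guarded)              # False True ['result-1']
```

A periodic message such as a status update recovers from this on the next cycle, whereas a goal or result is sent once, so losing it leaves the other side waiting. Whether a connection check like this would have avoided the specific race in ros_comm is not something this sketch can show.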