Comments (29)
The traceback looks right to me. :) Just kidding. That definitely looks like a problem.
It looks like it's a problem with Python <2.5 (not x86_64). There's a feature that psshlib uses that was introduced in Python 2.5, and the workaround I did for Python 2.4 seems to be broken. I think I have access to a machine with Python 2.4 somewhere, so I think I should be able to test it out there.
Thanks for the report. I'll let you know when I have something to test.
Original comment by [email protected]
on 26 Feb 2010 at 6:27
- Changed state: Started
from parallel-ssh.
Okay. I've made a commit that should fix this crash in Python 2.4. Would you mind testing to see if this works for you, too? If it works, I will release a version 2.1.1. Let me know if you need instructions for cloning the Git repository and testing. Thanks for your help.
Original comment by [email protected]
on 26 Feb 2010 at 8:10
Your fix is full of win. Thank you!
$ pssh -i -H localhost date
[1] 19:02:40 [SUCCESS] localhost
Sat Feb 27 19:02:40 UTC 2010
Pete
Original comment by [email protected]
on 27 Feb 2010 at 7:04
Further issues, probably similar and probably not warranting a separate ticket, but if you want me to break it out, I will.
When I try to run with more than one host, I see this on my Macbook:
$ pssh -i -H localhost -H localhost date
[1] 11:30:58 [SUCCESS] localhost
Sat Feb 27 11:30:58 PST 2010
[2] 11:30:58 [SUCCESS] localhost
Sat Feb 27 11:30:58 PST 2010
When I run on Python 2.4 (same system as above):
$ pssh -i -H localhost -H localhost date
Traceback (most recent call last):
  File "/usr/bin/pssh", line 5, in ?
    pkg_resources.run_script('pssh==2.1', 'pssh')
  File "/usr/lib/python2.4/site-packages/setuptools-0.6c11-py2.4.egg/pkg_resources.py", line 489, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.4/site-packages/setuptools-0.6c11-py2.4.egg/pkg_resources.py", line 1214, in run_script
    exec script_code in namespace, namespace
  File "/usr/bin/pssh", line 119, in ?
  File "/usr/bin/pssh", line 110, in do_pssh
  File "build/bdist.linux-x86_64/egg/psshlib/manager.py", line 61, in run
  File "build/bdist.linux-x86_64/egg/psshlib/manager.py", line 113, in start_tasks
  File "build/bdist.linux-x86_64/egg/psshlib/task.py", line 84, in start
  File "/usr/lib64/python2.4/subprocess.py", line 550, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.4/subprocess.py", line 988, in _execute_child
    data = os.read(errpipe_read, 1048576) # Exceptions limited to 1 MB
OSError: [Errno 4] Interrupted system call
Pete
Original comment by [email protected]
on 27 Feb 2010 at 7:33
It looks like this is a bug in Python that was fixed today in Python 3.1 and 2.6:
http://bugs.python.org/issue1068268
I wonder if there's any way we can work around this.
Original comment by [email protected]
on 1 Mar 2010 at 6:28
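The traceback above shows os.read failing with EINTR ("Interrupted system call"). A common way to work around that class of failure is to retry the call whenever it is interrupted. The sketch below is only an illustration of that idea, not the workaround that was actually committed to pssh; the helper name retry_on_eintr is hypothetical. Note that Python 3.5 and later retry most interrupted system calls automatically, so a wrapper like this mainly matters on older interpreters.

```python
import errno
import os


def retry_on_eintr(func, *args):
    """Call func(*args), retrying for as long as it fails with EINTR."""
    while True:
        try:
            return func(*args)
        except OSError as exc:
            # Anything other than "interrupted system call" is a real error.
            if exc.errno != errno.EINTR:
                raise


# Usage: read from a pipe through the wrapper.
read_end, write_end = os.pipe()
os.write(write_end, b"ok")
data = retry_on_eintr(os.read, read_end, 1024)
os.close(read_end)
os.close(write_end)
```

The wrapper only helps for calls made in your own code. The failing os.read in the traceback is inside the standard library's subprocess module, so guarding it would mean wrapping or retrying the Popen call itself.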
I think I have a workaround for the problem described in comments 4 and 5. pemerson, would you please do a git pull again and see if this works for you? Thanks.
Original comment by [email protected]
on 1 Mar 2010 at 8:33
Original comment by [email protected]
on 1 Mar 2010 at 8:39
- Changed title: pssh broken with Python 2.4
For me it looks like the first host succeeds, and then the second host is just hanging.
When I control-c it, I get this:
Traceback (most recent call last):
  File "/usr/bin/pssh", line 5, in ?
    pkg_resources.run_script('pssh==2.1', 'pssh')
  File "/usr/lib/python2.4/site-packages/setuptools-0.6c11-py2.4.egg/pkg_resources.py", line 489, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.4/site-packages/setuptools-0.6c11-py2.4.egg/pkg_resources.py", line 1214, in run_script
    exec script_code in namespace, namespace
  File "/usr/bin/pssh", line 119, in ?
  File "/usr/bin/pssh", line 110, in do_pssh
  File "build/bdist.linux-x86_64/egg/psshlib/manager.py", line 73, in run
  File "build/bdist.linux-x86_64/egg/psshlib/manager.py", line 174, in interrupted
  File "build/bdist.linux-x86_64/egg/psshlib/task.py", line 111, in interrupted
  File "build/bdist.linux-x86_64/egg/psshlib/task.py", line 99, in _kill
OSError: [Errno 3] No such process
Original comment by [email protected]
on 2 Mar 2010 at 4:57
Issue 17 has been merged into this issue.
Original comment by [email protected]
on 2 Mar 2010 at 6:13
pemerson, is this with commit "7c6d668" ("work around http://bugs.python.org/issue1068268")?
I'll keep on looking at it, but I'm not getting any errors when I run the command you posted in comment #4. I'll keep on trying to reproduce it, but is there anything you can think of that might make it easier for me to reproduce this error? Thanks.
Original comment by [email protected]
on 2 Mar 2010 at 6:20
pemerson, I just pushed a commit that should stop the "OSError: [Errno 3] No such process" error, but the real problem is that it was hanging to begin with. I'm still trying to reproduce this hang.
Original comment by [email protected]
on 2 Mar 2010 at 6:29
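"[Errno 3] No such process" is ESRCH, which os.kill raises when the target process has already exited and been reaped. One way to avoid a crash in that situation is to treat ESRCH as "nothing left to kill". This is a minimal sketch under that assumption; the function name kill_quietly is hypothetical, and the actual pssh commit may handle it differently.

```python
import errno
import os
import signal
import subprocess
import sys


def kill_quietly(pid, sig=signal.SIGKILL):
    """Send sig to pid. Return False if the process no longer exists."""
    try:
        os.kill(pid, sig)
    except OSError as exc:
        if exc.errno == errno.ESRCH:
            return False  # already gone, so there is nothing to do
        raise
    return True


# Usage: a child that has exited and been waited for is reported as gone.
child = subprocess.Popen([sys.executable, "-c", "pass"])
child.wait()
gone = not kill_quietly(child.pid, 0)  # signal 0 only checks for existence
```

As the comment above points out, suppressing the error only hides the symptom. The hang that prompted the control-c is a separate problem.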
This was a nasty problem, but I think I've finally fixed it. Please do a "git pull", which should get you commit fe8306c, and let me know if you still see problems. Thanks.
Original comment by [email protected]
on 2 Mar 2010 at 9:16
Looks like it's working for me - thanks!
Can you maybe release this as a v2.1.1 when you get a chance?
Original comment by [email protected]
on 2 Mar 2010 at 10:28
I would love to release this as version 2.1.1, but I'm a little nervous about doing it before we hear from pemerson.
Original comment by [email protected]
on 2 Mar 2010 at 10:34
pemerson, have you had a chance to try out the fix from yesterday? Thanks.
Original comment by [email protected]
on 3 Mar 2010 at 8:47
So strange, I replied, but it looks like gmail ate the outbound email.
All good here!
I think 12 seconds is far too long for a parallel ssh to two nodes,
but that's probably for a separate thread.
Here's the output:
$ time pssh -i -H localhost -H localhost whoami
[1] 02:39:44 [SUCCESS] localhost
pete
[2] 02:39:45 [SUCCESS] localhost
pete
real 0m12.921s
user 0m10.676s
sys 0m1.402s
Original comment by [email protected]
on 4 Mar 2010 at 6:09
pemerson, it might be related, so maybe it should still go in this bug report. Unfortunately, I'm not having much luck reproducing it. On my Python 2.4 system, pssh does the parallel ssh to two nodes in 0.33 seconds on average. Do you have any other information that would help reproduce it? If not, I could whip up a custom commit with a bunch of print statements that might be able to give more information.
I should probably go ahead and release pssh 2.1.1 now, to at least get it working for people with Python 2.4, but let's keep on working on your problem in this issue for now.
Original comment by [email protected]
on 4 Mar 2010 at 6:52
Well, it's definitely in the script, as this works with all due speed:
$ cat mypssh
#!/usr/bin/python
import os
os.system("ssh -A localhost whoami")
os.system("ssh -A localhost whoami")
$ time ./mypssh
pete
pete
real 0m1.236s
user 0m0.014s
sys 0m0.021s
Other than that, I'm not sure how I can help, but I'd be glad to run a custom pssh when you can add in some debugging / timing statements.
Pete
Original comment by [email protected]
on 4 Mar 2010 at 7:00
I've released PSSH 2.1.1. At least people with Python 2.4 shouldn't see crashes anymore.
pemerson, I just pushed a branch called "issue15". Would you please do a "git pull; git checkout issue15" and give me the output? The debugging info is a little crude, but if it turns out to be helpful, I might leave it in and add a "--debug" option or something.
Original comment by [email protected]
on 4 Mar 2010 at 7:40
Did you git push issue15?
$ git clone git://aml.cs.byu.edu/pssh.git
Initialized empty Git repository in /home/pete/pssh/.git/
remote: Counting objects: 771, done.
remote: Compressing objects: 100% (423/423), done.
remote: Total 771 (delta 540), reused 452 (delta 323)
Receiving objects: 100% (771/771), 198.62 KiB, done.
Resolving deltas: 100% (540/540), done.
$ cd pssh
$ git checkout issue15
error: pathspec 'issue15' did not match any file(s) known to git.
Original comment by [email protected]
on 4 Mar 2010 at 7:50
Oops. That should have been "git checkout origin/issue15". Sorry for the mistake.
Original comment by [email protected]
on 4 Mar 2010 at 7:55
Ah, well, I'm still a git newb (but liking what I've seen so far)!
$ time pssh -i -H localhost -H localhost whoami
Thu Mar 4 20:04:32 2010 process starting
Thu Mar 4 20:04:38 2010 process started
Thu Mar 4 20:04:38 2010 process starting
Thu Mar 4 20:04:44 2010 process started
Thu Mar 4 20:04:44 2010 task still running
Thu Mar 4 20:04:44 2010 task still running
Thu Mar 4 20:04:44 2010 starting select
Thu Mar 4 20:04:44 2010 select finished
Thu Mar 4 20:04:44 2010 closing stderr
Thu Mar 4 20:04:44 2010 task still running
Thu Mar 4 20:04:44 2010 task still running
Thu Mar 4 20:04:44 2010 starting select
Thu Mar 4 20:04:44 2010 select finished
Thu Mar 4 20:04:44 2010 closing stdout
Thu Mar 4 20:04:44 2010 task finished
[1] 20:04:44 [SUCCESS] localhost
pete
Thu Mar 4 20:04:44 2010 task still running
Thu Mar 4 20:04:44 2010 task still running
Thu Mar 4 20:04:44 2010 starting select
Thu Mar 4 20:04:45 2010 select finished
Thu Mar 4 20:04:45 2010 task still running
Thu Mar 4 20:04:45 2010 starting select
Thu Mar 4 20:04:45 2010 select finished
Thu Mar 4 20:04:45 2010 closing stdout
Thu Mar 4 20:04:45 2010 task still running
Thu Mar 4 20:04:45 2010 starting select
Thu Mar 4 20:04:45 2010 select finished
Thu Mar 4 20:04:45 2010 closing stderr
Thu Mar 4 20:04:45 2010 task still running
Thu Mar 4 20:04:45 2010 starting select
Thu Mar 4 20:04:45 2010 handling sigchld
Thu Mar 4 20:04:45 2010 select interrupted
Thu Mar 4 20:04:45 2010 task finished
[2] 20:04:45 [SUCCESS] localhost
pete
real 0m13.008s
user 0m10.684s
sys 0m1.394s
Original comment by [email protected]
on 4 Mar 2010 at 8:06
Fascinating. I put a timestamp just before the Popen and just after the Popen on a whim. I really didn't think there was a chance that the Popen would actually be hanging. I have no idea why the Popen call would hang for 6 seconds. Do you have any ideas?
Original comment by [email protected]
on 4 Mar 2010 at 8:39
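The measurement described above, a timestamp just before and just after Popen, can be reproduced outside pssh with a few lines. This is only a rough sketch for checking whether process creation itself is slow on a given machine; timed_popen is a hypothetical helper and is not part of the issue15 debugging branch.

```python
import subprocess
import sys
import time


def timed_popen(args, **kwargs):
    """Start a child process and return (process, seconds spent in Popen)."""
    start = time.monotonic()
    proc = subprocess.Popen(args, **kwargs)
    elapsed = time.monotonic() - start
    return proc, elapsed


# Usage: time the creation of a trivial child, then wait for it to finish.
proc, elapsed = timed_popen([sys.executable, "-c", "pass"])
proc.wait()
print("Popen took %.3f seconds" % elapsed)
```

If the elapsed time is measured in seconds rather than milliseconds, as in the log above, the delay is happening inside Popen before the child command even starts.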
This probably isn't relevant, but what do you get if you do this in the Python
interactive interpreter:
os.sysconf("SC_OPEN_MAX")
Original comment by [email protected]
on 4 Mar 2010 at 9:40
$ python
Python 2.4.3 (#1, Sep 3 2009, 15:37:37)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.sysconf("SC_OPEN_MAX")
1000000
Original comment by [email protected]
on 4 Mar 2010 at 9:44
What did you do to your system? :) On mine, SC_OPEN_MAX is 4096.
It looks like what's happening is it's taking forever to close all open file descriptors. In Python 2.6, they added os.closerange to make this more efficient when the maximum file descriptor is really high. To improve performance for older versions of Python, we could set FD_CLOEXEC with fcntl on all of our file descriptors. For more information on the problem, see:
http://bugs.python.org/issue1663329
I'll try to see how bad it is to set FD_CLOEXEC as a long-term workaround.
Original comment by [email protected]
on 4 Mar 2010 at 10:11
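Setting FD_CLOEXEC with fcntl, as proposed above, looks roughly like the sketch below. A descriptor marked close-on-exec is closed by the kernel when the child calls exec, so the parent does not need to loop over every possible descriptor number up to SC_OPEN_MAX. The helper name set_cloexec is hypothetical and this is not the code from the pssh commit. The os.set_inheritable call is only there so the example starts from an inheritable descriptor on Python 3.4 and later, where descriptors are non-inheritable by default.

```python
import fcntl
import os


def set_cloexec(fd):
    """Mark fd close-on-exec so it is closed automatically in exec'd children."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


# Usage: start from an inheritable pipe end, then mark it close-on-exec.
read_end, write_end = os.pipe()
os.set_inheritable(read_end, True)
before = bool(fcntl.fcntl(read_end, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)
set_cloexec(read_end)
after = bool(fcntl.fcntl(read_end, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)
os.close(read_end)
os.close(write_end)
```

The flag has to be set on every descriptor the program opens, which is why the comment above describes it as something to evaluate rather than an obvious fix.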
Okay, try running the latest master (with the "set FD_CLOEXEC" commit), and see if that goes more quickly.
Original comment by [email protected]
on 4 Mar 2010 at 10:30
Oh, HUGE win. Well done!
$ time pssh -i -H localhost -H localhost whoami
[1] 23:41:41 [SUCCESS] localhost
pete
[2] 23:41:41 [SUCCESS] localhost
pete
real 0m0.895s
user 0m0.075s
sys 0m0.031s
Original comment by [email protected]
on 4 Mar 2010 at 11:43
I'm glad I could make you happy. :) So why does your system have such a high maximum file descriptor number?
Anyway, this fix will show up in version 2.2, which I'm guessing is about a month away. One of the main holdups there is man pages; if you want 2.2 to happen more quickly, feel free to help with issue #10. :)
Original comment by [email protected]
on 5 Mar 2010 at 3:57
- Changed state: Verified