Comments (79)
@exarkun set owner to @itamarst |
---|
- The
twisted/test/test_tcp_internals.py
will fail to set the new rlimit if the number of open files is close to (within 10) the open file limit. It would also be nice if this change were made in a separate ticket/branch. - The
twisted/test/test_ssl.py
change seems fairly unrelated. It would be nice if it were made in a separate ticket/branch. - The
twisted/internet/iocpreactor/abstract.py
still needs a test. Maybe if the ratio of test difficulty to "fix" difficulty is so high, the fix can be made before the test - but that still involves understanding the case it fixes and being confident that it really works. Either way, it would be nice if this were done in a separate ticket/branch. lambda foo: bar(foo)
is an antipattern. Intwisted/internet/test/test_tls.py
, inAbortSSLConnectionTest.buildReactor
, just usecooperator.cooperate
as the patch value.- In
twisted/internet/test/test_tcp.py
, in theminimizeTCPBuffers
docstring, likely makes me uncomfortable. Are the tests that use this helper non-deterministic? And is the IOCP case going to be more likely to fail (or pass?) spuriously due to its lack of buffer size tweaks? - In
twisted/internet/test/test_tcp.py
, the trick with Xs and Ys used byReadAbortServerProtocol
andAbortingClient
could probably be explained. - Typo in
StreamingProducerClient.pauseProducing
's docstring - Wwe.
1. Also, the behavior thecallLater(0.05)
is working around seems like a bug in Twisted. The reactor should really be moving bytes on to the next level more aggressively (I have some numbers that show this is a performance win too). Separate ticket, of course. And then the comment above the 0.05 can link to that ticket, and we can get rid of the delayed call when that ticket is resolved. - The SSL handling in
runAbortTest
is sort of inverted. It would be better to let a subclass override something to define any extra exceptions. This can be fixed later though (it's a nice, easy refactoring ticket for some newbie). - in r32606, was it the addition of the loop, or the corrected handling of the
paused
attribute that fixed the tests with a "modern version of pyOpenSSL"? - TLS
test_fullWriteBufferAfterByteExchange
is still pretty slow. It does 300 writes now instead of 100, and writes 90MB instead of the 30MB it wrote last time. I really want all oftwisted.internet.test.test_tls
to take under a second. In my ideal universe, it also wouldn't take 90MB of data to test basic transport features.
Thanks.
from twisted.
@exarkun set owner to @exarkun |
---|
from twisted.
@jyknight set owner to @jyknight |
---|
from twisted.
Automation removed owner |
---|
from twisted.
@exarkun set owner to @itamarst |
---|
- For TLS,
test_fullWriteBufferAfterByteExchange
takes foreeeeeeeeeeeeeeever. I would really like to not add half a dozen new tests that take 45 seconds of 100% CPU each. The branch'stwisted.internet.test.test_tls
takes longer than the whole trunk test suite. I don't really understand why this is. It happens with_newtls
but not_oldtls
. Does that mean it's caused by the producer bugs in_newtls
? But then why does it peg the CPU instead of idle for a while? - The skip logic for
ReactorBuilder
tests is inbuildReactor
.AbortSSLConnectionTest
uses SSL APIs before it callsbuildReactor
though. - Lots of trailing whitespace to clean up
- You should merge forward again:
1. pick up the Failure fix so 'trial --debug' works
1. there are some new tests in trunk that fail when this branch is merged into it,twisted.internet.test.test_tcp.TCPPortTestsBuilder.test_serverGetHostOnIPv4
. I expect this is straightforward to fix, justclient.getpeername()
earlier. - Might the previous point suggest the cause of the failing tests on Windows? They mainly appear to be failing with unexpected
ConnectionLost
failures. It could be that whereas previously the connections were not shut down hard enough to for the client to notice (or to notice quickly), now they are and the tests don't account for this. Why it only happens on Windows, I don't know, I can only conjecture that it is similar in cause to all of the other issues in which Windows behaves differently from other platforms. classImplements
intwisted/internet/tcp.py
is imported but unused- It would be nice to use endpoints instead of
ClientCreator
in all new code, I think. - The
if self.disconnected:
check inreadConnectionLost
intwisted/internet/tcp.py
doesn't appear to be executed by the test suite - Any reason to directly compare
reason.value.__class__
inconnectionLost
intwisted/internet/tcp.py
andtwisted/internet/iocpreactor/tcp.py
instead of usingFailure.check
? - in
twisted/internet/_oldtls.py
, the modified conditionals instopReading
andstopWriting
are only half tested. The test suite doesn't exercise the don't-call-self._base
-method case of either one. Maybe I'm not really worried about this, but I'd like to at least know which case the newif self.disconnected
check is handling there. Add a comment or something? - In
runAbortTest
, I think rather thandeferLater
, it might be better to just verify there are no extra event sources left in the reactor and then finish right away.
Each of the actual new unit tests probably need more consideration than I have given them. Broadly the idea seems correct, but I don't really understand them completely yet. Hopefully this is enough of a review for you to make some further progress though!
from twisted.
@itamarst removed owner |
---|
Anyone can review!
from twisted.
@itamarst set owner to PenguinOfDoom |
---|
Pavel, could you take a look at the IOCP code and see if it makes sense to you? Thanks!
from twisted.
@itamarst removed owner |
---|
from twisted.
@exarkun set owner to @itamarst |
---|
- The buildbot failures look pretty unrelated to me, but the rhel/fedora failures on gtk may mean that these new tests push us over some limit (remember, gtk leaks file descriptors like crazy, probably at least 1 per reactor).
- The howto says that
connectionLost
will be invoked re-entrantly, but it looks like the implementation goes for non-re-entrant. - in
_SocketCloser._closeSocket
, I'm not sure how accurate the last bit of the comment is. I think SO_LINGER affects shutdown the same way it affects close. It seems fine to skip the shutdown in the non-orderly case, but if it's not actually required then the comment shouldn't say it is. - Oh no,
Connection._aborting
. This absolutely must to be the last ad hoc state attribute we add to transports. After this, anything new needs to be added to an explicit state machine. For now, please add documentation for the attribute. - Oh no,
hasattr
inConnection.connectionLost
. Sure would be nice to have an explicit state machine. :) - the
abortConnection
in twisted/internet/tcp.py is almost identical totwisted/internet/iocpreactor/tcp.py
. They probably should be identical, and it would be nice to avoid the duplication. - In
twisted/internet/test/test_tcp.py
,minimizeTCPBuffers
docstring is misformatted. Also, it'd be nice to know why this is a useful thing to do, either with more docstring or comments at the call sites. - Also, it has an
XXX
in it. StreamingProducerClient.write
seems to provokepauseProducing
after just a single write. Does it need to do 99 more? Could it early out afterpauseProducing
? And should it be upset if it gets through all the writes without being paused?- Why
0.05
inStreamingProducerClient.pauseProducing
? - in
twisted/internet/test/test_tls.py
,ContextGeneratingMixin
should be new-style. - And it looks like
AbortSSLConnectionTest.buildReactor
could useself.patch
to be a little simpler. - The news fragment could talk about TCP and SSL I guess, since SSL seems to support this feature as well. Also that comma in the middle is gross. :) ... abortConnection() which, unlike loseConnection(), always ...
- Which test exercises the
self.stopWriting()
fix in twisted/internet/iocpreactor/abstract.py (line ~365)? - I have some reservations about the
ReadAbortServerProtocol
-based tests. These are providing a pretty strong (but fuzzy) guarantee - no bytes written in the same reactor iteration as anabortConnection
call will be sent. If we wanted to switch back to synchronous write attempts, these tests will become unhappy. I don't know what to do about this, maybe nothing. - The docstrings for the multi-
abortConnection
-call test methods confused me into thinking that multiple calls toloseConnection
would raise an exception. - All the
runAbortTest
-based tests are great, as far as readability goes. It took a couple minutes to figure out whatrunAbortTest
does, but then the individual test methods were a breeze. - Some of their docstrings could be expanded though. dataReceived calls abortConnection(), and then throws exception. - yea, then what?
from twisted.
@itamarst removed owner |
---|
Added @SInCE, ready for review:
http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/abortConnection-78-10
from twisted.
@exarkun set owner to @itamarst |
---|
Okay. I think this is good to merge. Thanks!
from twisted.
@itamarst set status to closed |
---|
(In [32851]) Merge abortConnection-78-10.
Author: itamar, jknight, exarkun, ivank, glyph
Review: exarkun, thijs
Fixes: #7153
Add abortConnection() method to TCP and TLS transports.
from twisted.
@exarkun commented |
---|
Mechanism separate from policy.
from twisted.
@glyph commented |
---|
I assert that GTK does not leak file descriptors when simply running the reactor, and I have attached a file which I believe proves it. Perhaps this isn't true of our rhel6 buildbot, so someone can run this file there and let me know if there's a problem. However, I've run this program on every Ubuntu release back to Hardy and none of them exhibit the described behavior.
from twisted.
@itamarst commented |
---|
Re #11338... not really sure what you're suggesting. None of the new tests open many connections...
Re #14, quite possibly there is no test covering it. But I've written enough convoluted tests. If you really feel this is a problem, I can revert that fix :)
All other comments have been addressed. I just have to figure out why merging forward breaks a test.
from twisted.
@itamarst commented |
---|
#!html
<pre>
Sounds good. Maybe with a default timeout? Which should
probably be high enough that existing code won't be affected
by slow connections (and for producers this is especially an
issue, though I'm not sure if we have any code that depends
on loseConnection's producer interaction).
</pre>
from twisted.
@glyph commented |
---|
#!html
<pre>
I bet that our existing code deals with
loseConnection+Producers differently in different reactors.
That should be on our list of things to clean up when we
revamp the Producer API.
</pre>
from twisted.
@itamarst commented |
---|
#!html
<pre>
Actually it's in tcp.py (or actually maybe abstract.py), so
the reactor shouldn't matter.
</pre>
from twisted.
@itamarst commented |
---|
#!html
<pre>
Similar to loseConnection(), but doesn't flush the write
buffers or wait for producer to finish. Useful for when you
know connection should die ASAP, because the other side is
totally screwed up anyway.
</pre>
from twisted.
@glyph commented |
---|
#!html
<pre>
I hope you don't object. BUT I DON'T CARE IF YOU DO, HAHA
</pre>
from twisted.
@glyph commented |
---|
#!html
<pre>
This should probably just be an optional argument to
loseConnection, i.e.
loseConnection(terminateTimeout=600)
A lot of code uses loseConnection() and expects the socket
to be closed eventually, but it is pretty easy to construct
an attack which will cause Twisted to never ever close a
socket this way. By adding an optional argument to this
interface, we can enhance the semantics of the existing
method in a mostly-backwards-compatible way.
</pre>
from twisted.
@itamarst commented |
---|
#!html
<pre>
Hm. Wouldn't loseConnection time out potentially break existing code? Like,
"send really huge file and then loseConnection". Of course, if the default
timeout was large enough it'd mostly be OK...
</pre>
from twisted.
naked commented |
---|
#!html
<pre>
I wrote a patch to this effect, although the method is called abortConnection().
My needs were to explicitly close a connection without waiting for write
availability on the socket/whatever first. Also this means that with SSL, we
don't send or wait for the reply to close notify, which is important.
There is a problem though, that in the normal case, connectionLost() is invoked
from the reactor after doWrite or doRead has returned CONNECTION_LOST or
CONNECTION_DONE. In this case, connectionLost() is called directly as a result
of calling abortConnection(). I do not know enough about twisted internals to
say if this is a serious problem or not, and how it should best be fixed.
</pre>
from twisted.
@glyph commented |
---|
#!html
<pre>
We do have code that depends on that behavior. Quite a bit, actually. I think I prefer
abortConnection(), since terminateTimeout is a convenience and will have all the issues normally
associated with callLater.
</pre>
from twisted.
@jyknight commented |
---|
#!html
<pre>
I think there's two issues here. One is a API to abort the connection. The other is the timeout
issue.
For the write timeout issue, I don't think an absolute timeout on loseConnection is the right
thing. Instead, it should have behavior like: "If there is data in the write buffer, and I have not sent any
bytes for the last minute, the connection is as good as dead." I believe this behavior can and should
default to on without compatibility problems.
</pre>
from twisted.
@jyknight commented |
---|
#!html
<pre>
Here's a different patch. transport.abortConnection() loses the connection and tcp's _closeSocket is
updated to do an abrupt TCP disconnect unless the reason is ConnectionDone. Seems like it works but
mostly untested.
</pre>
from twisted.
foom commented |
---|
(In [17458]) An abortConnection() function for transport, and some tests.
Refs #7153
from twisted.
@glyph commented |
---|
I was not trying to make a point about the validity of this ticket.
I was responding to the comment by kvogt that "This is actually a pretty evil security hole". This ticket isn't going to fix any security hole (or, more accurately, DoS). It may be a necessary prerequisite, but we should probably have a separate issue to deal with the larger problem.
from twisted.
@itamarst commented |
---|
Obviously we should have abortConnection that is separate from loseConnection. The question is, should loseConnection additionally have some sort of a default timeout case where it calls abortConnection. loseConnection essentially is a policy, since it can be implemented in terms of producer API + abortConnection.
from twisted.
kvogt commented |
---|
This is actually a pretty evil security hole. I logged into a server to see why it was out of memory, and found 1434 open connections with no activity. A malicious user was opening hundreds of connections to a twisted server but not reading from them. Since loseConnection() won't close the sockets unless the write buffer is empty, I have no way to fight this attack at the application level.
We have had a twisted IRC server and an entire twisted live video cluster (93 nodes) get destroyed by this exact exploit once a certain hacking group figured it out. In the mean time I'm patching in abortConnection2.diff, but I thought this real world example might be useful information for people trying to fix this.
from twisted.
@itamarst commented |
---|
I'm torn between happiness at live video clustering and sadness about the exploit. Upping priority, changing to defect.
from twisted.
@glyph commented |
---|
I'm a little concerned about the form of the solution here. It means that every protocol that is not specifically designed to be resistant to this particular exploit (in other words, all the protocols in Twisted) will continue to be a problem. Perhaps a better way to deal with the exploit would be to have a default timeout on loseConnection if no data is being consumed by the client? Or an easy way to set up administrative limits on the number of simultaneous connections from a single IP?
I haven't thought too hard about the solution yet, but I definitely don't think we could consider this form of DoS dealt with if you have to hand-patch some code for every port that you want to open.
from twisted.
ivank commented |
---|
Many tests fail with abortConnection-78 when using iocpreactor. One issue is that it doesn't pass in the 'orderly' argument. I think there are other issues too.
from twisted.
@exarkun commented |
---|
Shouldn't we just fix this patch? The ticket is still open, after all...
from twisted.
@itamarst commented |
---|
Could you open a ticket for the windows connectionlost/connectiondone issue you discovered? With a failing test or minimal example if at all possible. Thanks!
from twisted.
ivank commented |
---|
The patch causes almost all SSL-related tests to fail on Windows, even with the select reactor. If I discover why, I will post here.
from twisted.
ivank commented |
---|
By SSL-related, I meant anything that uses SSL. Not SSL tests in particular.
from twisted.
ivank commented |
---|
Why the tests fail on Windows:
On Windows, twisted.internet.tcp.Connection.connectionLost
is incorrectly called with reason
ConnectionLost
even for "normal" disconnects.
[Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
Sometimes you see ConnectionDone
, but most of the time it is the incorrect ConnectionLost
. It "should" be called with reason
ConnectionDone
(the correct behavior, which you see on Linux)
This incorrect reason
causes a disorderly _closeSocket
on almost every SSL connection, which causes data loss and test failures.
Hopefully someone who knows how PyOpenSSL and OpenSSL work will comment :-)
from twisted.
@itamarst commented |
---|
Oh, I hadn't realized the bug was in the patch. Good plan!
from twisted.
truekonrads commented |
---|
Lo and behold!
I have the connectionLost example nailed down:
from twisted.internet import win32eventreactor
win32eventreactor.install()
from twisted.internet import reactor
from twisted.protocols.policies import TrafficLoggingFactory
from twisted.web.client import HTTPClientFactory,_parse
from twisted.python.util import println
from twisted.internet.defer import DeferredList
from OpenSSL import SSL
class NoVerifyClientContextFactory:
"""A context factory for SSL clients."""
isClient = 1
method = SSL.SSLv3_METHOD
def getContext(self):
def x(*args):
return True
ctx=SSL.Context(self.method)
#print dir(ctx)
ctx.set_verify(SSL.VERIFY_NONE,x)
return ctx
url="https://online.lkb.lv/"
httpfactory=HTTPClientFactory(url)
dl=[]
for i in xrange(0,10):
print i
factory = TrafficLoggingFactory(httpfactory,"C:\\foo%i" %i)
dl.append(httpfactory.deferred)
pscheme,host, port,ppath = _parse(url)
reactor.connectSSL(host, port, factory,NoVerifyClientContextFactory())
reactor.run()
DeferredList(dl).addCallback(lambda _:reactor.stop())
Requirmentes are win32eventreactor and SSL connections!
from twisted.
ivank commented |
---|
I've been looking at this again. I noticed that abortConnection
in abstract.py
does a very wrong thing -- it calls connectionLost
synchronously, leading to some really broken behavior where a server calling abortConnection
would have connectionLost
called twice.
The 20091225 patch makes abortConnection
empty the buffers and go through the typical loseConnection
codepath, but with changes to return the correct error reason in _postLoseConnection
. Hopefully someone knows if these changes are on the right track.
Like the previous patches, this patch patch breaks the IOCP reactor, and still has the OpenSSL problem mentioned above (so it breaks many tests that use SSL on Windows). It also needs a test for a server aborting a client, not just a client aborting. There's also likely an untested codepath in _sendCloseAlert
, but it is beyond my abilities to describe.
from twisted.
ivank commented |
---|
Not below, above.
from twisted.
ivank commented |
---|
Here is why I suspected there is an untested codepath in _sendCloseAlert
: before I changed this in _closeWriteConnection
, all of the tests still passed:
- if result is main.CONNECTION_DONE:
+ if shouldClose:
from twisted.
@glyph commented |
---|
(In [28708]) Branching to 'abortConnection-78-2'
from twisted.
@glyph commented |
---|
(In [28709]) Pull the changes forward and make the changes slight less awful re #7153
from twisted.
@exarkun commented |
---|
(In [29980]) Branching to 'abortConnection-78-3'
from twisted.
ivank commented |
---|
Given the SSL-related issues here, this will be a lot easier after #4854 is done.
from twisted.
naked commented |
---|
Sorry I can't test this more fully, but some just a simple comment:
Do verify with wireshark that the connection is really in the state you expect it to be in. Basically, what it should be in is a state where the the TCP window advertised by the recipient is zero, and the sender is periodically testing whether the window is still zero. The timeout is doubled on each try, so it might take a long time for you to see the retransmits.
So, tcpdump or wireshark, and the response packets from server should end up looking like this:
10:40:07.413776 IP localhost.14141 > localhost.41522: Flags [.], ack 1, win 0, options [nop,nop,TS val 735552 ecr 707404], length 0
The important parts are: ACK packet, window size 0.
When the connection is in a stable state like this, the normal loseConnection() should be called.
from twisted.
@itamarst commented |
---|
So, I wrote some tests, and read some code. What I learned:
-
There already exists code that calls FileDescriptor.connectionLost directly in a code path that may be reentrant: loseConnection if you've previously done loseWriteConnection. It wouldn't be reentrant with doWrite though.
-
I wrote a test that simulates the likely use case of "abortConnection if other sides stop reading what I'm writing". It works both with the "call FileDescriptor.connectionLost immediately" implementation, and, contrary to my expectations, with another implementation based on loseConnection.
-
Writing tests for this is hard.
Item 2 has me very confused. Anyone care to take a look and see what exactly is wrong with my code and/or conceptual model? I would think the file descriptor would never be writeable if the other side stopped reading, thus doWrite would never be called, and yet it is.
from twisted.
naked commented |
---|
I think sending RST is somewhat a separate issue.
The main issue is freeing resources for the userland process. Just calling close() on the socket achieves this well enough (as long as no pending writes are waited and no SSL close is waited). This looks like an orderly shutdown to the remote end (unless there's SSL) in most cases. The exception to this is if the other end sends a data packet after we have called close() - in this case, the kernel will send a RST packet to the connection. (This could be avoided by never calling close(), but by calling shutdown() to close the sending end of the connection only.)
However, if the connections are indeed malicious, it might be that the wish is to free up resources for the kernel as well - atleast as much as is possible. Normally, the kernel keeps sending already written data after close(), and keeps handling the socket shutdown FIN exchanges. If a socket is closed with SO_LINGER on and it's timeout set to zero, the kernel will discard its write buffers and send a RST to the other end. (I'm not sure if this could end up being an orderly shutdown if buffers are empty.) This way the kernel can free up most of the memory immediately, but it will still keep TIME_WAIT sockets around for quite a while.
But, I must say, I believe Twisted is going to be the only project anywhere that is going to bother with tricking around with SO_LINGER. And the whole thing is actually even somewhat non-portable.
The feature has been missing for 8 years - and for 8 years I've had to add the abortConnection() API to every single server I've written, since every one of them have had a need for it. Perhaps it is time to finally slay the beast?
from twisted.
@itamarst commented |
---|
What's the motivation for sending a RST instead of a FIN when doing abortConnection()? I'm preparing to rebranch this, and then maybe try to get everything working, so I'm trying to understand the code and its choices.
from twisted.
naked commented |
---|
And obviously, the normal loseConnection() shouldn't be able to close the socket, while abortConnection() should! :-)
from twisted.
@itamarst commented |
---|
I don't think the RST issue is a problem, I was just curious. And finally fixing this is my goal, yes.
More problematic: calling connectionLost directly from abortConnection will have reentrancy issues, leading to potential connectionLost being called twice. This was the original patch. The new patch (apparently never applied to subversion for some reason) relies via loseConnection on the socket eventually becoming writeable... which is exactly what won't happen if you're dealing with a wedged other side that has stopped reading.
The solution is probably going to be a horrible combination of callLater(0, self.connectionLost), overriding doRead and doWrite to always return CONNECTION_LOST, and then making sure connectionLost can handle being called twice. Unless someone has a better idea?
(It's time to replace FileDescriptor with something that has an explicit state machine... but I'll only think about that once this is done, since this is more immediately important.)
from twisted.
@itamarst commented |
---|
(In [31596]) Branching to 'abortConnection-78-4'
from twisted.
@itamarst commented |
---|
Review time! Out of desperation, in part. My questions for reviewers, in addition to whatever you can find that's wrong:
-
What tests are still missing? I still suspect the possibility of FileDescriptor.connectionLost being called twice, somehow.
-
http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/abortConnection-78-5 has py-without-middles not skipping the new TLS tests I added. Any idea why?
-
What is causing the failing tests in (some of) the Windows buildslaves? They pass on trunk.
from twisted.
@itamarst commented |
---|
(In [31784]) Branching to 'abortConnection-78-5'
from twisted.
@itamarst commented |
---|
(In [31911]) Branching to 'abortConnection-78-6'
from twisted.
@itamarst commented |
---|
(In [31916]) "Fix" slowness in new TLS code by doing 100 smaller transport.write()s instead of one large one.
Refs #7153
from twisted.
@itamarst commented |
---|
Further investigation of the TLS test slowness suggests the problem may not be in TLS producer code, though I did see hints it may be buggy, but perhaps in different way than I originally thought. Rather, it seems that writing a really large chunk of data to a (t.p.tls) TLS transport is far more expensive than using multiple writes in a row (see [31916]). This seems like a bug in the new TLS code.
from twisted.
@itamarst commented |
---|
Remaining work include figuring out why the changes to readConnectionLost and old SSL code's start/stopReading are necessary, getting rid of ClientCreator and maybe some whitespace cleanup.
Work will continue once [#5063](#5063) is fixed and we can merge forward; this will allow getting rid of some code to workaround #5063.
from twisted.
@itamarst commented |
---|
(In [32373]) Branching to 'abortConnection-78-7'
from twisted.
@itamarst commented |
---|
Ready for review, I guess. Some notes:
- I'd like feedback from Pavel about IOCP version.
- http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/abortConnection-78-8 has some remaining mysterious errors which seem like they shouldn't be my fault but maybe they are (py-without-models and the fedora builder - the win32 interface/adapter ones are known issue, I think.) Thoughts?
- The change to _oldtls appears necessary for tests to pass, but makes me slightly wary given _oldtls sets disconnected=True even when it's not really disconnected...
- Use callLater(0, f) to make protocol's connectionLost call non-reentrant. Sorry, but there's no better API at the moment; we can fix it along with all the other internal uses of this idiom when we do have a better API.
from twisted.
@itamarst commented |
---|
(In [32457]) Branching to 'abortConnection-78-8'
from twisted.
@exarkun commented |
---|
Re #11338... not really sure what you're suggesting. None of the new tests open many connections...
You added new tests though. And each test probably leaks a file descriptor.
from twisted.
@glyph commented |
---|
Replying to exarkun:
Re #11338... not really sure what you're suggesting. None of the new tests open many connections...
You added new tests though. And each test probably leaks a file descriptor.
For those who might see this and not be hip enough to have our entire ticket database memorized, exarkun's talking about: http://twistedmatrix.com/trac/ticket/4482
from twisted.
@glyph commented |
---|
Or... no, that's a different bug? Hrm. I can't find a bug report about FD leaks in gtkreactor. I thought I knew about this though. Anyone have a reference handy?
from twisted.
@itamarst commented |
---|
OK, I think I know what to do now. Taking off from review until I can get buildbot in better shape.
from twisted.
@itamarst commented |
---|
(In [32604]) Branching to 'abortConnection-78-9'
from twisted.
@itamarst commented |
---|
Noooooooooooooooooooooooooooooooooo f u cfreactor:
[ERROR]
Traceback (most recent call last):
Failure: exceptions.RuntimeError: BUG: We should be paused a this point.
twisted.internet.test.test_tls.AbortSSLConnectionTest_CFReactor.test_fullWriteBufferAfterByteExchange
Doesn't happen on the other OS X builder, strangely. Can I just ignore that? Please? Or mark it failing? I don't want to have to deal with another crappy broken reactor :(
There's still also random mystery errors elsewhere (item 1) -- suggestions?
http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/abortConnection-78-9
from twisted.
@itamarst commented |
---|
http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/abortConnection-78-9
We're down to failures that are even more mysterious and not-my-faulty.
Only unaddressed review comment is 14; see comments above. I can, if you want, remove the fix and open new ticket (Jerub claims to have seen bugs caused by it, so possibly writing a test isn't impossible).
from twisted.
@exarkun commented |
---|
Not exactly what I thought I was talking about, but it seems likely to be related anyway: https://bugzilla.gnome.org/show_bug.cgi?id=579406. I think you didn't see the problem on Ubuntu because eventually Ubuntu backported the fix, but maybe RHEL did not.
from twisted.
@itamarst commented |
---|
Created ticket #5301 for item 1.
Created ticket #5302 for item 2.
Created ticket #5300 for item 3.
from twisted.
@itamarst commented |
---|
I got rid of the minimize thing, it's not strictly necessary and the tests are still reasonable without it, and also dealt with 6 and 7.
Re 8, I will file a ticket after this branch is merged.
As far as 9, I think it was the bug fix, not the loop, so I lowered the number down to 100 (30MB write)
As far as 10 goes, the above means it's faster and writes less... but I think there may actually be a bottleneck in TLS. I don't think crypto is so slow that it should add add 2 CPU seconds, and it's much slower if I transfer same number of bytes with a smaller number of (larger) writes. Anyway, it's down to 3 seconds on my computer now. I don't want to lower it much more, because the idea is to fill up buffers.
Since I've addressed all review comments, I'll put this up for review once I can merge forward to include the fix for #5301.
from twisted.
@thijstriemstra commented |
---|
Can you also add @since
for the new public methods and classes?
from twisted.
@itamarst commented |
---|
Good point, will do. And maybe rename the abort mixin to start with an _ if it doesn't already.
from twisted.
@itamarst commented |
---|
(In [32809]) Address review items 4-7. Refs #7153
from twisted.
@itamarst commented |
---|
(In [32824]) Branching to 'abortConnection-78-10'
from twisted.
@itamarst commented |
---|
Thanks for the info. So it sounds like there's a problem case where certain BSDs (which?) will block on close() if SO_LINGER is set to 0, even for non-blocking sockets? Or am I misunderstanding?
from twisted.
koebenhavn commented |
---|
Just for refrence regarding SO_LINGER:
Portability note 1: Some implementations of the BSD socket API do not implement SO_LINGER at all. On such systems, applying SO_LINGER either fails with EINVAL or is (silently) ignored. Having SO_LINGER defined in the headers is no guarantee that SO_LINGER is actually implemented.
Portability note 2: Since the BSD documentation on SO_LINGER is sparse and inadequate, it is not surprising to find the various implementations interpreting the effect of SO_LINGER differently. For instance, the effect of SO_LINGER on non-blocking sockets is not mentioned at all in BSD documentation, and is consequently treated differently on different platforms. Taking case 3 for example: Some implementations behave as described above. With others, a non-blocking socket close() succeed immediately leaving the rest to a background process. Others ignore non-blocking'ness and behave as if the socket were blocking. Yet others behave as if SO_LINGER wasn't in effect [as if the case 1, the default, was in effect], or ignore linger->l_linger [case 3 is treated as case 2]. Given the lack of
adequate documentation, such differences are not (by themselves) indicative of an "incomplete" or "broken" implementation. They are simply different, not incorrect.
Portability note 3: Some implementations of the BSD socket API do not implement SO_LINGER completely. On such systems, the value of linger->l_linger is ignored (always treated as if it were zero).
/Event.
Replying to naked:
I think sending RST is somewhat a separate issue.
The main issue is freeing resources for the userland process. Just calling close() on the socket achieves this well enough (as long as no pending writes are waited and no SSL close is waited). This looks like an orderly shutdown to the remote end (unless there's SSL) in most cases. The exception to this is if the other end sends a data packet after we have called close() - in this case, the kernel will send a RST packet to the connection. (This could be avoided by never calling close(), but by calling shutdown() to close the sending end of the connection only.)
However, if the connections are indeed malicious, it might be that the wish is to free up resources for the kernel as well - atleast as much as is possible. Normally, the kernel keeps sending already written data after close(), and keeps handling the socket shutdown event FIN exchanges. If a socket is closed with SO_LINGER on and it's timeout set to zero, the kernel will discard its write buffers and send a RST to the other end. (I'm not sure if this could end up being an orderly shutdown if buffers are empty.) This way the kernel can free up most of the memory immediately, but it will still keep TIME_WAIT sockets around for quite a while.
But, I must say, I believe Twisted is going to be the only project anywhere that is going to bother with tricking around with SO_LINGER. And the whole thing is actually even somewhat non-portable.
The feature has been missing for 8 years - and for 8 years I've had to add the abortConnection() API to every single server I've written, since every one of them have had a need for it. Perhaps it is time to finally slay the beast?
from twisted.
Related Issues (20)
- mypy failing on trunk HOT 1
- exactly pin all dependencies for CI so we stop having failures for trunk
- Speed up http_headers operations
- More HTTP client speedups
- Memory leak with Twisted 24.3.0 HOT 7
- Remove isinstance assertions in Deferred
- Cython version of Deferred HOT 32
- `twisted.internet._sslverify.OpenSSLDefaultPaths` leaks memory because OpenSSL's `SSL_CTX_load_verify_locations` does
- twistd FTP TAP server allow configuring user home folder from twistd CLI HOT 3
- Evaluate extending OpenSSLCertificateOptions to set SSL_OP_CLEANSE_PLAINTEXT option
- `contextvars` pypi package is still listed as a supported optional dependency in the documentation
- Idea: Remove `_get_async_param` method HOT 1
- Verification: Ok even for protocols disabled HOT 5
- Speed up twisted.web, part 1 of N HOT 5
- Adding custom cipher list to serverFromString HOT 5
- Continued Use of Insecure Default Parameters HOT 8
- Deprecation of SSH ciphers HOT 1
- t.c.ssh.channel.SSHChannel.write has no limit to the buffer
- t.c.ssh.filetransferserver.FileTransferServer.packet_READ with unhandled error
- type confusion in KnownHosts.verifyHostKey traceback when adding ip key
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from twisted.