Giter Site home page Giter Site logo

Comments (79)

twisted-trac avatar twisted-trac commented on May 19, 2024
exarkun's avatar @exarkun set owner to @itamarst
  1. The twisted/test/test_tcp_internals.py will fail to set the new rlimit if the number of open files is close to (within 10) the open file limit. It would also be nice if this change were made in a separate ticket/branch.
  2. The twisted/test/test_ssl.py change seems fairly unrelated. It would be nice if it were made in a separate ticket/branch.
  3. The twisted/internet/iocpreactor/abstract.py still needs a test. Maybe if the ratio of test difficulty to "fix" difficulty is so high, the fix can be made before the test - but that still involves understanding the case it fixes and being confident that it really works. Either way, it would be nice if this were done in a separate ticket/branch.
  4. lambda foo: bar(foo) is an antipattern. In twisted/internet/test/test_tls.py, in AbortSSLConnectionTest.buildReactor, just use cooperator.cooperate as the patch value.
  5. In twisted/internet/test/test_tcp.py, in the minimizeTCPBuffers docstring, likely makes me uncomfortable. Are the tests that use this helper non-deterministic? And is the IOCP case going to be more likely to fail (or pass?) spuriously due to its lack of buffer size tweaks?
  6. In twisted/internet/test/test_tcp.py, the trick with Xs and Ys used by ReadAbortServerProtocol and AbortingClient could probably be explained.
  7. Typo in StreamingProducerClient.pauseProducing's docstring - Wwe.
    1. Also, the behavior the callLater(0.05) is working around seems like a bug in Twisted. The reactor should really be moving bytes on to the next level more aggressively (I have some numbers that show this is a performance win too). Separate ticket, of course. And then the comment above the 0.05 can link to that ticket, and we can get rid of the delayed call when that ticket is resolved.
  8. The SSL handling in runAbortTest is sort of inverted. It would be better to let a subclass override something to define any extra exceptions. This can be fixed later though (it's a nice, easy refactoring ticket for some newbie).
  9. in r32606, was it the addition of the loop, or the corrected handling of the paused attribute that fixed the tests with a "modern version of pyOpenSSL"?
  10. TLS test_fullWriteBufferAfterByteExchange is still pretty slow. It does 300 writes now instead of 100, and writes 90MB instead of the 30MB it wrote last time. I really want all of twisted.internet.test.test_tls to take under a second. In my ideal universe, it also wouldn't take 90MB of data to test basic transport features.

Thanks.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
exarkun's avatar @exarkun set owner to @exarkun

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
jyknight's avatar @jyknight set owner to @jyknight

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
Automation's avatar Automation removed owner

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
exarkun's avatar @exarkun set owner to @itamarst
  1. For TLS, test_fullWriteBufferAfterByteExchange takes foreeeeeeeeeeeeeeever. I would really like to not add half a dozen new tests that take 45 seconds of 100% CPU each. The branch's twisted.internet.test.test_tls takes longer than the whole trunk test suite. I don't really understand why this is. It happens with _newtls but not _oldtls. Does that mean it's caused by the producer bugs in _newtls? But then why does it peg the CPU instead of idle for a while?
  2. The skip logic for ReactorBuilder tests is in buildReactor. AbortSSLConnectionTest uses SSL APIs before it calls buildReactor though.
  3. Lots of trailing whitespace to clean up
  4. You should merge forward again:
    1. pick up the Failure fix so 'trial --debug' works
    1. there are some new tests in trunk that fail when this branch is merged into it, twisted.internet.test.test_tcp.TCPPortTestsBuilder.test_serverGetHostOnIPv4. I expect this is straightforward to fix, just client.getpeername() earlier.
  5. Might the previous point suggest the cause of the failing tests on Windows? They mainly appear to be failing with unexpected ConnectionLost failures. It could be that whereas previously the connections were not shut down hard enough to for the client to notice (or to notice quickly), now they are and the tests don't account for this. Why it only happens on Windows, I don't know, I can only conjecture that it is similar in cause to all of the other issues in which Windows behaves differently from other platforms.
  6. classImplements in twisted/internet/tcp.py is imported but unused
  7. It would be nice to use endpoints instead of ClientCreator in all new code, I think.
  8. The if self.disconnected: check in readConnectionLost in twisted/internet/tcp.py doesn't appear to be executed by the test suite
  9. Any reason to directly compare reason.value.__class__ in connectionLost in twisted/internet/tcp.py and twisted/internet/iocpreactor/tcp.py instead of using Failure.check?
  10. in twisted/internet/_oldtls.py, the modified conditionals in stopReading and stopWriting are only half tested. The test suite doesn't exercise the don't-call-self._base-method case of either one. Maybe I'm not really worried about this, but I'd like to at least know which case the new if self.disconnected check is handling there. Add a comment or something?
  11. In runAbortTest, I think rather than deferLater, it might be better to just verify there are no extra event sources left in the reactor and then finish right away.

Each of the actual new unit tests probably need more consideration than I have given them. Broadly the idea seems correct, but I don't really understand them completely yet. Hopefully this is enough of a review for you to make some further progress though!

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst removed owner

Anyone can review!

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst set owner to PenguinOfDoom

Pavel, could you take a look at the IOCP code and see if it makes sense to you? Thanks!

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst removed owner

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
exarkun's avatar @exarkun set owner to @itamarst
  1. The buildbot failures look pretty unrelated to me, but the rhel/fedora failures on gtk may mean that these new tests push us over some limit (remember, gtk leaks file descriptors like crazy, probably at least 1 per reactor).
  2. The howto says that connectionLost will be invoked re-entrantly, but it looks like the implementation goes for non-re-entrant.
  3. in _SocketCloser._closeSocket, I'm not sure how accurate the last bit of the comment is. I think SO_LINGER affects shutdown the same way it affects close. It seems fine to skip the shutdown in the non-orderly case, but if it's not actually required then the comment shouldn't say it is.
  4. Oh no, Connection._aborting. This absolutely must to be the last ad hoc state attribute we add to transports. After this, anything new needs to be added to an explicit state machine. For now, please add documentation for the attribute.
  5. Oh no, hasattr in Connection.connectionLost. Sure would be nice to have an explicit state machine. :)
  6. the abortConnection in twisted/internet/tcp.py is almost identical to twisted/internet/iocpreactor/tcp.py. They probably should be identical, and it would be nice to avoid the duplication.
  7. In twisted/internet/test/test_tcp.py, minimizeTCPBuffers docstring is misformatted. Also, it'd be nice to know why this is a useful thing to do, either with more docstring or comments at the call sites.
  8. Also, it has an XXX in it.
  9. StreamingProducerClient.write seems to provoke pauseProducing after just a single write. Does it need to do 99 more? Could it early out after pauseProducing? And should it be upset if it gets through all the writes without being paused?
  10. Why 0.05 in StreamingProducerClient.pauseProducing?
  11. in twisted/internet/test/test_tls.py, ContextGeneratingMixin should be new-style.
  12. And it looks like AbortSSLConnectionTest.buildReactor could use self.patch to be a little simpler.
  13. The news fragment could talk about TCP and SSL I guess, since SSL seems to support this feature as well. Also that comma in the middle is gross. :) ... abortConnection() which, unlike loseConnection(), always ...
  14. Which test exercises the self.stopWriting() fix in twisted/internet/iocpreactor/abstract.py (line ~365)?
  15. I have some reservations about the ReadAbortServerProtocol-based tests. These are providing a pretty strong (but fuzzy) guarantee - no bytes written in the same reactor iteration as an abortConnection call will be sent. If we wanted to switch back to synchronous write attempts, these tests will become unhappy. I don't know what to do about this, maybe nothing.
  16. The docstrings for the multi-abortConnection-call test methods confused me into thinking that multiple calls to loseConnection would raise an exception.
  17. All the runAbortTest-based tests are great, as far as readability goes. It took a couple minutes to figure out what runAbortTest does, but then the individual test methods were a breeze.
  18. Some of their docstrings could be expanded though. dataReceived calls abortConnection(), and then throws exception. - yea, then what?

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst removed owner

Added @SInCE, ready for review:

http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/abortConnection-78-10

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
exarkun's avatar @exarkun set owner to @itamarst

Okay. I think this is good to merge. Thanks!

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst set status to closed

(In [32851]) Merge abortConnection-78-10.

Author: itamar, jknight, exarkun, ivank, glyph
Review: exarkun, thijs
Fixes: #7153

Add abortConnection() method to TCP and TLS transports.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
exarkun's avatar @exarkun commented

Mechanism separate from policy.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
glyph's avatar @glyph commented

I assert that GTK does not leak file descriptors when simply running the reactor, and I have attached a file which I believe proves it. Perhaps this isn't true of our rhel6 buildbot, so someone can run this file there and let me know if there's a problem. However, I've run this program on every Ubuntu release back to Hardy and none of them exhibit the described behavior.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

Re #11338... not really sure what you're suggesting. None of the new tests open many connections...

Re #14, quite possibly there is no test covering it. But I've written enough convoluted tests. If you really feel this is a problem, I can revert that fix :)

All other comments have been addressed. I just have to figure out why merging forward breaks a test.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented
#!html
<pre>
Sounds good. Maybe with a default timeout? Which should
probably be high enough that existing code won't be affected
by slow connections (and for producers this is especially an
issue, though I'm not sure if we have any code that depends
on loseConnection's producer interaction).

</pre>

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
glyph's avatar @glyph commented
#!html
<pre>
I bet that our existing code deals with
loseConnection+Producers differently in different reactors.
 That should be on our list of things to clean up when we
revamp the Producer API.

</pre>

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented
#!html
<pre>
Actually it's in tcp.py (or actually maybe abstract.py), so
the reactor shouldn't matter.

</pre>

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented
#!html
<pre>
Similar to loseConnection(), but doesn't flush the write
buffers or wait for producer to finish. Useful for when you
know connection should die ASAP, because the other side is
totally screwed up anyway.

</pre>

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
glyph's avatar @glyph commented
#!html
<pre>
I hope you don't object.  BUT I DON'T CARE IF YOU DO, HAHA

</pre>

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
glyph's avatar @glyph commented
#!html
<pre>
This should probably just be an optional argument to
loseConnection, i.e.

    loseConnection(terminateTimeout=600)

A lot of code uses loseConnection() and expects the socket
to be closed eventually, but it is pretty easy to construct
an attack which will cause Twisted to never ever close a
socket this way.  By adding an optional argument to this
interface, we can enhance the semantics of the existing
method in a mostly-backwards-compatible way.

</pre>

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented
#!html
<pre>
Hm. Wouldn't loseConnection time out potentially break existing code? Like,
"send really huge file and then loseConnection". Of course, if the default
timeout was large enough it'd mostly be OK...
</pre>

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
naked's avatar naked commented
#!html
<pre>
I wrote a patch to this effect, although the method is called abortConnection().

My needs were to explicitly close a connection without waiting for write
availability on the socket/whatever first. Also this means that with SSL, we
don't send or wait for the reply to close notify, which is important.

There is a problem though, that in the normal case, connectionLost() is invoked
from the reactor after doWrite or doRead has returned CONNECTION_LOST or
CONNECTION_DONE. In this case, connectionLost() is called directly as a result
of calling abortConnection(). I do not know enough about twisted internals to
say if this is a serious problem or not, and how it should best be fixed.
</pre>

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
glyph's avatar @glyph commented
#!html
<pre>
We do have code that depends on that behavior.  Quite a bit, actually.  I think I prefer 
abortConnection(), since terminateTimeout is a convenience and will have all the issues normally 
associated with callLater.
</pre>

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
jyknight's avatar @jyknight commented
#!html
<pre>
I think there's two issues here. One is a API to abort the connection. The other is the timeout 
issue.

For the write timeout issue, I don't think an absolute timeout on loseConnection is the right 
thing. Instead, it should have behavior like: "If there is data in the write buffer, and I have not sent any 
bytes for the last minute, the connection is as good as dead." I believe this behavior can and should 
default to on without compatibility problems.
</pre>

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
jyknight's avatar @jyknight commented
#!html
<pre>
Here's a different patch. transport.abortConnection() loses the connection and tcp's _closeSocket is 
updated to do an abrupt TCP disconnect unless the reason is ConnectionDone. Seems like it works but 
mostly untested.
</pre>

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
foom's avatar foom commented

(In [17458]) An abortConnection() function for transport, and some tests.

Refs #7153

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
glyph's avatar @glyph commented

I was not trying to make a point about the validity of this ticket.

I was responding to the comment by kvogt that "This is actually a pretty evil security hole". This ticket isn't going to fix any security hole (or, more accurately, DoS). It may be a necessary prerequisite, but we should probably have a separate issue to deal with the larger problem.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

Obviously we should have abortConnection that is separate from loseConnection. The question is, should loseConnection additionally have some sort of a default timeout case where it calls abortConnection. loseConnection essentially is a policy, since it can be implemented in terms of producer API + abortConnection.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
kvogt's avatar kvogt commented

This is actually a pretty evil security hole. I logged into a server to see why it was out of memory, and found 1434 open connections with no activity. A malicious user was opening hundreds of connections to a twisted server but not reading from them. Since loseConnection() won't close the sockets unless the write buffer is empty, I have no way to fight this attack at the application level.

We have had a twisted IRC server and an entire twisted live video cluster (93 nodes) get destroyed by this exact exploit once a certain hacking group figured it out. In the mean time I'm patching in abortConnection2.diff, but I thought this real world example might be useful information for people trying to fix this.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

I'm torn between happiness at live video clustering and sadness about the exploit. Upping priority, changing to defect.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
glyph's avatar @glyph commented

I'm a little concerned about the form of the solution here. It means that every protocol that is not specifically designed to be resistant to this particular exploit (in other words, all the protocols in Twisted) will continue to be a problem. Perhaps a better way to deal with the exploit would be to have a default timeout on loseConnection if no data is being consumed by the client? Or an easy way to set up administrative limits on the number of simultaneous connections from a single IP?

I haven't thought too hard about the solution yet, but I definitely don't think we could consider this form of DoS dealt with if you have to hand-patch some code for every port that you want to open.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
ivank's avatar ivank commented

Many tests fail with abortConnection-78 when using iocpreactor. One issue is that it doesn't pass in the 'orderly' argument. I think there are other issues too.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
exarkun's avatar @exarkun commented

Shouldn't we just fix this patch? The ticket is still open, after all...

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

Could you open a ticket for the windows connectionlost/connectiondone issue you discovered? With a failing test or minimal example if at all possible. Thanks!

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
ivank's avatar ivank commented

The patch causes almost all SSL-related tests to fail on Windows, even with the select reactor. If I discover why, I will post here.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
ivank's avatar ivank commented

By SSL-related, I meant anything that uses SSL. Not SSL tests in particular.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
ivank's avatar ivank commented

Why the tests fail on Windows:

On Windows, twisted.internet.tcp.Connection.connectionLost is incorrectly called with reason ConnectionLost even for "normal" disconnects.

[Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.

Sometimes you see ConnectionDone, but most of the time it is the incorrect ConnectionLost. It "should" be called with reason ConnectionDone (the correct behavior, which you see on Linux)

This incorrect reason causes a disorderly _closeSocket on almost every SSL connection, which causes data loss and test failures.

Hopefully someone who knows how PyOpenSSL and OpenSSL work will comment :-)

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

Oh, I hadn't realized the bug was in the patch. Good plan!

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
truekonrads's avatar truekonrads commented

Lo and behold!
I have the connectionLost example nailed down:

from twisted.internet import win32eventreactor 
win32eventreactor.install()

from twisted.internet import reactor
from twisted.protocols.policies import TrafficLoggingFactory
from twisted.web.client import HTTPClientFactory,_parse
from twisted.python.util import println
from twisted.internet.defer import DeferredList
from OpenSSL import SSL

class NoVerifyClientContextFactory:
    """A context factory for SSL clients."""

    isClient = 1
    method = SSL.SSLv3_METHOD

    def getContext(self):
        def x(*args):
            return True
        ctx=SSL.Context(self.method)
        #print dir(ctx)
        ctx.set_verify(SSL.VERIFY_NONE,x)
        return ctx


url="https://online.lkb.lv/"
httpfactory=HTTPClientFactory(url)
dl=[]
for i in xrange(0,10):
    print i
    factory = TrafficLoggingFactory(httpfactory,"C:\\foo%i" %i)
    dl.append(httpfactory.deferred)
    pscheme,host, port,ppath = _parse(url)
    reactor.connectSSL(host, port, factory,NoVerifyClientContextFactory())
reactor.run()
DeferredList(dl).addCallback(lambda _:reactor.stop())                                     

Requirmentes are win32eventreactor and SSL connections!

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
ivank's avatar ivank commented

I've been looking at this again. I noticed that abortConnection in abstract.py does a very wrong thing -- it calls connectionLost synchronously, leading to some really broken behavior where a server calling abortConnection would have connectionLost called twice.

The 20091225 patch makes abortConnection empty the buffers and go through the typical loseConnection codepath, but with changes to return the correct error reason in _postLoseConnection. Hopefully someone knows if these changes are on the right track.

Like the previous patches, this patch patch breaks the IOCP reactor, and still has the OpenSSL problem mentioned above (so it breaks many tests that use SSL on Windows). It also needs a test for a server aborting a client, not just a client aborting. There's also likely an untested codepath in _sendCloseAlert, but it is beyond my abilities to describe.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
ivank's avatar ivank commented

Not below, above.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
ivank's avatar ivank commented

Here is why I suspected there is an untested codepath in _sendCloseAlert: before I changed this in _closeWriteConnection, all of the tests still passed:

-	        if result is main.CONNECTION_DONE: 
+	        if shouldClose:

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
glyph's avatar @glyph commented

(In [28708]) Branching to 'abortConnection-78-2'

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
glyph's avatar @glyph commented

(In [28709]) Pull the changes forward and make the changes slight less awful re #7153

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
exarkun's avatar @exarkun commented

(In [29980]) Branching to 'abortConnection-78-3'

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
ivank's avatar ivank commented

Given the SSL-related issues here, this will be a lot easier after #4854 is done.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
naked's avatar naked commented

Sorry I can't test this more fully, but some just a simple comment:

Do verify with wireshark that the connection is really in the state you expect it to be in. Basically, what it should be in is a state where the the TCP window advertised by the recipient is zero, and the sender is periodically testing whether the window is still zero. The timeout is doubled on each try, so it might take a long time for you to see the retransmits.

So, tcpdump or wireshark, and the response packets from server should end up looking like this:

10:40:07.413776 IP localhost.14141 > localhost.41522: Flags [.], ack 1, win 0, options [nop,nop,TS val 735552 ecr 707404], length 0

The important parts are: ACK packet, window size 0.

When the connection is in a stable state like this, the normal loseConnection() should be called.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

So, I wrote some tests, and read some code. What I learned:

  1. There already exists code that calls FileDescriptor.connectionLost directly in a code path that may be reentrant: loseConnection if you've previously done loseWriteConnection. It wouldn't be reentrant with doWrite though.

  2. I wrote a test that simulates the likely use case of "abortConnection if other sides stop reading what I'm writing". It works both with the "call FileDescriptor.connectionLost immediately" implementation, and, contrary to my expectations, with another implementation based on loseConnection.

  3. Writing tests for this is hard.

Item 2 has me very confused. Anyone care to take a look and see what exactly is wrong with my code and/or conceptual model? I would think the file descriptor would never be writeable if the other side stopped reading, thus doWrite would never be called, and yet it is.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
naked's avatar naked commented

I think sending RST is somewhat a separate issue.

The main issue is freeing resources for the userland process. Just calling close() on the socket achieves this well enough (as long as no pending writes are waited and no SSL close is waited). This looks like an orderly shutdown to the remote end (unless there's SSL) in most cases. The exception to this is if the other end sends a data packet after we have called close() - in this case, the kernel will send a RST packet to the connection. (This could be avoided by never calling close(), but by calling shutdown() to close the sending end of the connection only.)

However, if the connections are indeed malicious, it might be that the wish is to free up resources for the kernel as well - atleast as much as is possible. Normally, the kernel keeps sending already written data after close(), and keeps handling the socket shutdown FIN exchanges. If a socket is closed with SO_LINGER on and it's timeout set to zero, the kernel will discard its write buffers and send a RST to the other end. (I'm not sure if this could end up being an orderly shutdown if buffers are empty.) This way the kernel can free up most of the memory immediately, but it will still keep TIME_WAIT sockets around for quite a while.

But, I must say, I believe Twisted is going to be the only project anywhere that is going to bother with tricking around with SO_LINGER. And the whole thing is actually even somewhat non-portable.

The feature has been missing for 8 years - and for 8 years I've had to add the abortConnection() API to every single server I've written, since every one of them have had a need for it. Perhaps it is time to finally slay the beast?

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

What's the motivation for sending a RST instead of a FIN when doing abortConnection()? I'm preparing to rebranch this, and then maybe try to get everything working, so I'm trying to understand the code and its choices.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
naked's avatar naked commented

And obviously, the normal loseConnection() shouldn't be able to close the socket, while abortConnection() should! :-)

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

I don't think the RST issue is a problem, I was just curious. And finally fixing this is my goal, yes.

More problematic: calling connectionLost directly from abortConnection will have reentrancy issues, leading to potential connectionLost being called twice. This was the original patch. The new patch (apparently never applied to subversion for some reason) relies via loseConnection on the socket eventually becoming writeable... which is exactly what won't happen if you're dealing with a wedged other side that has stopped reading.

The solution is probably going to be a horrible combination of callLater(0, self.connectionLost), overriding doRead and doWrite to always return CONNECTION_LOST, and then making sure connectionLost can handle being called twice. Unless someone has a better idea?

(It's time to replace FileDescriptor with something that has an explicit state machine... but I'll only think about that once this is done, since this is more immediately important.)

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

(In [31596]) Branching to 'abortConnection-78-4'

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

Review time! Out of desperation, in part. My questions for reviewers, in addition to whatever you can find that's wrong:

  1. What tests are still missing? I still suspect the possibility of FileDescriptor.connectionLost being called twice, somehow.

  2. http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/abortConnection-78-5 has py-without-middles not skipping the new TLS tests I added. Any idea why?

  3. What is causing the failing tests in (some of) the Windows buildslaves? They pass on trunk.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

(In [31784]) Branching to 'abortConnection-78-5'

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

(In [31911]) Branching to 'abortConnection-78-6'

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

(In [31916]) "Fix" slowness in new TLS code by doing 100 smaller transport.write()s instead of one large one.

Refs #7153

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

Further investigation of the TLS test slowness suggests the problem may not be in TLS producer code, though I did see hints it may be buggy, but perhaps in different way than I originally thought. Rather, it seems that writing a really large chunk of data to a (t.p.tls) TLS transport is far more expensive than using multiple writes in a row (see [31916]). This seems like a bug in the new TLS code.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

Remaining work include figuring out why the changes to readConnectionLost and old SSL code's start/stopReading are necessary, getting rid of ClientCreator and maybe some whitespace cleanup.

Work will continue once [#5063](#5063) is fixed and we can merge forward; this will allow getting rid of some code to workaround #5063.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

(In [32373]) Branching to 'abortConnection-78-7'

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

Ready for review, I guess. Some notes:

  1. I'd like feedback from Pavel about IOCP version.
  2. http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/abortConnection-78-8 has some remaining mysterious errors which seem like they shouldn't be my fault but maybe they are (py-without-models and the fedora builder - the win32 interface/adapter ones are known issue, I think.) Thoughts?
  3. The change to _oldtls appears necessary for tests to pass, but makes me slightly wary given _oldtls sets disconnected=True even when it's not really disconnected...
  4. Use callLater(0, f) to make protocol's connectionLost call non-reentrant. Sorry, but there's no better API at the moment; we can fix it along with all the other internal uses of this idiom when we do have a better API.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

(In [32457]) Branching to 'abortConnection-78-8'

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
exarkun's avatar @exarkun commented

Re #11338... not really sure what you're suggesting. None of the new tests open many connections...

You added new tests though. And each test probably leaks a file descriptor.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
glyph's avatar @glyph commented

Replying to exarkun:

Re #11338... not really sure what you're suggesting. None of the new tests open many connections...

You added new tests though. And each test probably leaks a file descriptor.

For those who might see this and not be hip enough to have our entire ticket database memorized, exarkun's talking about: http://twistedmatrix.com/trac/ticket/4482

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
glyph's avatar @glyph commented

Or... no, that's a different bug? Hrm. I can't find a bug report about FD leaks in gtkreactor. I thought I knew about this though. Anyone have a reference handy?

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

OK, I think I know what to do now. Taking off from review until I can get buildbot in better shape.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

(In [32604]) Branching to 'abortConnection-78-9'

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

Noooooooooooooooooooooooooooooooooo f u cfreactor:

[ERROR]
Traceback (most recent call last):
Failure: exceptions.RuntimeError: BUG: We should be paused a this point.

twisted.internet.test.test_tls.AbortSSLConnectionTest_CFReactor.test_fullWriteBufferAfterByteExchange

Doesn't happen on the other OS X builder, strangely. Can I just ignore that? Please? Or mark it failing? I don't want to have to deal with another crappy broken reactor :(

There's still also random mystery errors elsewhere (item 1) -- suggestions?

http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/abortConnection-78-9

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/abortConnection-78-9

We're down to failures that are even more mysterious and not-my-faulty.

Only unaddressed review comment is 14; see comments above. I can, if you want, remove the fix and open new ticket (Jerub claims to have seen bugs caused by it, so possibly writing a test isn't impossible).

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
exarkun's avatar @exarkun commented

Not exactly what I thought I was talking about, but it seems likely to be related anyway: https://bugzilla.gnome.org/show_bug.cgi?id=579406. I think you didn't see the problem on Ubuntu because eventually Ubuntu backported the fix, but maybe RHEL did not.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

Created ticket #5301 for item 1.

Created ticket #5302 for item 2.

Created ticket #5300 for item 3.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

I got rid of the minimize thing, it's not strictly necessary and the tests are still reasonable without it, and also dealt with 6 and 7.

Re 8, I will file a ticket after this branch is merged.

As far as 9, I think it was the bug fix, not the loop, so I lowered the number down to 100 (30MB write)

As far as 10 goes, the above means it's faster and writes less... but I think there may actually be a bottleneck in TLS. I don't think crypto is so slow that it should add add 2 CPU seconds, and it's much slower if I transfer same number of bytes with a smaller number of (larger) writes. Anyway, it's down to 3 seconds on my computer now. I don't want to lower it much more, because the idea is to fill up buffers.

Since I've addressed all review comments, I'll put this up for review once I can merge forward to include the fix for #5301.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
thijstriemstra's avatar @thijstriemstra commented

Can you also add @since for the new public methods and classes?

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

Good point, will do. And maybe rename the abort mixin to start with an _ if it doesn't already.

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

(In [32809]) Address review items 4-7. Refs #7153

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

(In [32824]) Branching to 'abortConnection-78-10'

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
itamarst's avatar @itamarst commented

Thanks for the info. So it sounds like there's a problem case where certain BSDs (which?) will block on close() if SO_LINGER is set to 0, even for non-blocking sockets? Or am I misunderstanding?

from twisted.

twisted-trac avatar twisted-trac commented on May 19, 2024
koebenhavn's avatar koebenhavn commented

Just for refrence regarding SO_LINGER:

Portability note 1: Some implementations of the BSD socket API do not implement SO_LINGER at all. On such systems, applying SO_LINGER either fails with EINVAL or is (silently) ignored. Having SO_LINGER defined in the headers is no guarantee that SO_LINGER is actually implemented.

Portability note 2: Since the BSD documentation on SO_LINGER is sparse and inadequate, it is not surprising to find the various implementations interpreting the effect of SO_LINGER differently. For instance, the effect of SO_LINGER on non-blocking sockets is not mentioned at all in BSD documentation, and is consequently treated differently on different platforms. Taking case 3 for example: Some implementations behave as described above. With others, a non-blocking socket close() succeed immediately leaving the rest to a background process. Others ignore non-blocking'ness and behave as if the socket were blocking. Yet others behave as if SO_LINGER wasn't in effect [as if the case 1, the default, was in effect], or ignore linger->l_linger [case 3 is treated as case 2]. Given the lack of
adequate documentation, such differences are not (by themselves) indicative of an "incomplete" or "broken" implementation. They are simply different, not incorrect.
Portability note 3: Some implementations of the BSD socket API do not implement SO_LINGER completely. On such systems, the value of linger->l_linger is ignored (always treated as if it were zero).

/Event.

Replying to naked:

I think sending RST is somewhat a separate issue.

The main issue is freeing resources for the userland process. Just calling close() on the socket achieves this well enough (as long as no pending writes are waited and no SSL close is waited). This looks like an orderly shutdown to the remote end (unless there's SSL) in most cases. The exception to this is if the other end sends a data packet after we have called close() - in this case, the kernel will send a RST packet to the connection. (This could be avoided by never calling close(), but by calling shutdown() to close the sending end of the connection only.)

However, if the connections are indeed malicious, it might be that the wish is to free up resources for the kernel as well - atleast as much as is possible. Normally, the kernel keeps sending already written data after close(), and keeps handling the socket shutdown event FIN exchanges. If a socket is closed with SO_LINGER on and it's timeout set to zero, the kernel will discard its write buffers and send a RST to the other end. (I'm not sure if this could end up being an orderly shutdown if buffers are empty.) This way the kernel can free up most of the memory immediately, but it will still keep TIME_WAIT sockets around for quite a while.

But, I must say, I believe Twisted is going to be the only project anywhere that is going to bother with tricking around with SO_LINGER. And the whole thing is actually even somewhat non-portable.

The feature has been missing for 8 years - and for 8 years I've had to add the abortConnection() API to every single server I've written, since every one of them have had a need for it. Perhaps it is time to finally slay the beast?

from twisted.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.