Giter Site home page Giter Site logo

Comments (11)

antrik avatar antrik commented on July 22, 2024

@jdm yes, the "unknown token" message is probably the relevant one -- AIUI this shouldn't happen ever, even if there are failures elsewhere...

The error check is in the code introduced by #94 -- though it's not necessarily a regression in the new implementation: IIRC we have had rare intermittents in ipc-channel in the past :-( Don't remember exactly which tests, but it very well might have been this one.

Either way, @danlrobertson is probably the most familiar with this part of the code now -- so maybe he has an idea what to do about this? :-)

from ipc-channel.

dlrobertson avatar dlrobertson commented on July 22, 2024

Thanks for bringing this issue up! I think this is because we currently use the file descriptor number as the Token here. This could be similar to the same issue fixed with the auto-incrementing id. I see a few possible fixes for this

  • Switch to using Slab instead of a HashMap.
  • Use an auto-incrementing id. This presents a problem when considering a 32bit arch.

@antrik @jdm thoughts?

from ipc-channel.

antrik avatar antrik commented on July 22, 2024

Ugh... I did investigate this somewhat a while back, but never got around to write down my findings :-(

I'm pretty sure the problem is not actually with token collisions.

(If we wanted to avoid using FDs as tokens, that could be done easily: antrik@deaf8aa . I'm not sure though it would really be helpful at all...)

The mio tokens -- unlike the IDs passed back to the caller -- are used only internally in OsIpcReceiverSet, and the implementation makes sure the tokens live exactly as long as the FDs they refer to. So unless there is a bug in there, they can never collide even in the face of FD reuse.

As far as I can see, we can only get an invalid token here if we actually get an event for an FD we already closed. For this to happen, two distinct conditions need to be met at the same time:

  1. We get another readable event for a channel, after we already got an ECONNRESET while attempting to receive from it previously.

It turns out this is actually possible with big transfers (which is the case for the test failure reported here): the recv() implementation in Unix back-end will return this error not only when the sender end of the main channel is closed, but also when the sender end of the dedicated channel for a big transfer is closed -- so we could get this error even if the actual sender is still present.

I'd say the fact that our recv() returns an ECONNRESET in this case is a bug that needs to be fixed (my bad) -- though I'm not sure how this situation can actually be triggered...

  1. We get to process an event for an FD we already closed explicitly (upon getting the bogus ECONNRESET).

One way this could happen is if there is an outstanding duplicate of the FD: as the underlying socket is still open in this case, epoll() will still report events for it, even if the FD that was originally added to the set has already been closed. However, since we are no longer duplicating channel FDs after #127 , I don't think this can actually happen...

The other way this could happen is if another readable event has already been queued, before we get to close the channel upon processing the bogus ECONNRESET. I don't really see how this can happen either though, since in my understanding epoll() will not return more than one event for the same FD in a single invocation; and we close the FD before it is invoked again...

from ipc-channel.

antrik avatar antrik commented on July 22, 2024

FWIW, I submitted #148 for the ECONNRESET issue -- but I have no idea whether this will actually help with the problem at all...

from ipc-channel.

antrik avatar antrik commented on July 22, 2024

BTW, I realised (again) that while we don't explicitly clone FDs, sending a receiver over another channel does create a duplicate FD temporarily, until the one in the originating process is dropped. (Which happens before the high-level send() returns to the caller, but after the low-level send() is finished, and thus possibly after already seeing use in the receiving process.)

Receiver transfers are actually happening in the router mechanism -- so this may very well be part of the problem.

from ipc-channel.

antrik avatar antrik commented on July 22, 2024

#148 doesn't seem to affect the situation at all ( http://build.servo.org/builders/linux-rel-wpt/builds/2401/steps/test__2/logs/stdio looks pretty much the same as before) -- so apparently it's not related to the big transfers after all. I can't think of any other situation right now that could trigger condition 1. though :-(

from ipc-channel.

dlrobertson avatar dlrobertson commented on July 22, 2024

I just did some debugging and was able to reliably repro the issue with the following.

# Only hits the panic if you have the "--binary-arg=--multiprocess"
./mach test-wpt --binary-arg=--multiprocess "eventsource/eventsource-constructor-url-multi-window.htm"

It has been far to long since I've worked on "real" servo code. I've had to re-learn a lot, so it took way longer than it should have. I realized that the readable events that were panicing on were errors. #149 just doesn't panic if it hits the unknown token branch with an error event. Not sure if it entirely fixes the issue, but I no longer hit panics when I run the above wpt test.

from ipc-channel.

antrik avatar antrik commented on July 22, 2024

@danlrobertson indeed many ipc-channel issues only manifest when actually using multiple processes... Though I do understand that this wasn't obvious. Not sure what we could do to improve the situation :-( Maybe add a contributing guide to ipc-channels README?

(It was actually worse at the time you worked on this, since Servo was not even using ipc-channel in some places unless running multi-process -- that only changed quite recently.)

from ipc-channel.

dlrobertson avatar dlrobertson commented on July 22, 2024

Maybe add a contributing guide to ipc-channels README?

Definitely, my plan is to start documenting all the things once the shared memory PR lands. I'll add creating a CONTRIBUTING.md to that.

from ipc-channel.

antrik avatar antrik commented on July 22, 2024

Just put it in the main README I'd say -- it's not like it's so large that people might get lost... ;-) (Many smaller projects do it like that.)

Either way, yay for more documentation! :-)

(Though I might actually do this particular bit myself, unless you beat me to it...)

from ipc-channel.

dlrobertson avatar dlrobertson commented on July 22, 2024

(Though I might actually do this particular bit myself, unless you beat me to it...)

My goal is mostly selfish :) I figured I'd learn a good bit about the structures and methods by documenting them. I was going to start with the basics in ipc.rs or platform/unix/mod.rs since that is where I've spent most my time.

from ipc-channel.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.