Giter Site home page Giter Site logo

Comments (10)

Phencys avatar Phencys commented on July 22, 2024

Hello Delian,

Thanks for the feedback, our project always likes to listen to new people trying out our software.

As for the bugfixes:

  • The SIG_CHILD handler, is already been taken care of currently. Even though we don't explicitly create any signal handlers in the connector processes, we have our Config lib to thank for this. In lib/config.cpp we have the activate() function which sets up all the signal handlers appropriately. This function is called in every connector, as well as the main process.
  • The S.close() call in the child is indeed something we have missed out on, thanks for pointing it out :) We are currently working hard to get the 1.3.1 patch release up and running, and the fix will be incorporated into this patch.

Greetings,
Erik

from mistserver.

delian avatar delian commented on July 22, 2024

You may already know it, but S.close() does not help because of the unclean shutdown you do on a socket in the libmist. The result is that the loop is blocked and the whole process stop performing. I intentionally did close(S.getSocket()) as this is a way to correctly close the socket and not blocking the process, avoiding libmist.
A similar issue exists in the httpconnector, which is partially hidden by the fact that you use threads there. As a result it leaves a huge amount of half closed sockets and even though I haven't been affected personally, I am guessing it is also bad for the performance of the application. With threads you do not need to close the socket in the master thread, but you still need to have a clean close at the end of the thread, which is not happening for some reason.

The code you have in mistlib's socket.cpp ...

  if (sock != -1){
    shutdown(sock, SHUT_RDWR);
    errno = EINTR;
    while (::close(sock) != 0 && errno == EINTR){
    }
    sock = -1;
  }

...will block forever (at least on linux) if you have non empty buffers in the socket (quite often). Also, if you carefully look, you will see that the while will block forever. Why simply you do not replace that code with:

if (sock!=-1) { close(sock); sock = -1; }

... and leave the kernel to do the thing you were trying to do? This way your threads will not block and S.close() will perform correctly.

from mistserver.

Phencys avatar Phencys commented on July 22, 2024

Hello Delian,

Thanks for the reply.

I am wondering however, to what extent the shutdown of our sockets would be
considered "unclean". As far as we know we do everything the way it should
-- same as the linux kernel headers.

Regards,

On Mon, Jan 27, 2014 at 12:39 PM, Delian Delchev
[email protected]:

You may already know it, but S.close() does not help because of the
unclean shutdown you do on a socket in the libmist (I guess you already
know it). The result is that the loop is blocked and the whole process stop
performing. I intentionally did close(S.getSocket()) as this is a way to
correctly close the socket and not blocking the process.
A similar issue exists in the httpconnector, which is partially hidden by
the fact that you use threads there. As a result it leaves a huge amount of
half closed sockets and even though I haven't been affected personally, I
am guessing it is also bad for the performance of the application.


Reply to this email directly or view it on GitHubhttps://github.com//issues/5#issuecomment-33360498
.

from mistserver.

delian avatar delian commented on July 22, 2024

Just do a little experiment - try to debug httpprogressive process, having S.close() set, while there is a traffic sent to the user (the user is having a lower bw connection so the output socket buffer is full).
Then you will see that S.close() will block. When you start investigating you will see that shutdown(...) and then ::close() never returns 0. Even if shutdown is successful, close may return error, because the socket is already destroyed. So your loop will block forever. close + shutdown is redundant operation. But also shutdown does allow a half closed sessions (even though you try to close them in both directions by a parameter) and this is what I see in the lsof when my mist server crashes -

MistConnH 1733 root cwd DIR 252,0 4096 655618 /home/mist/mistserver
MistConnH 1733 root rtd DIR 252,0 4096 2 /
...
MistConnH 1733 root 5u IPv6 10190 0t0 TCP *:http-alt (LISTEN)
MistConnH 1733 root 6u unix 0xffff88013788ad80 0t0 11681606 socket
MistConnH 1733 root 7u IPv6 14745133 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:48088 (CLOSE_WAIT)
MistConnH 1733 root 8u IPv6 15386914 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:49406 (CLOSE_WAIT)
MistConnH 1733 root 9u IPv6 14745134 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:48089 (CLOSE_WAIT)
MistConnH 1733 root 10u IPv6 11694990 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:46120 (ESTABLISHED)
MistConnH 1733 root 11u IPv6 13536320 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:47544 (ESTABLISHED)
MistConnH 1733 root 12u IPv6 15332175 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:49332 (CLOSE_WAIT)
MistConnH 1733 root 13u unix 0xffff880107a0d380 0t0 14746778 socket
MistConnH 1733 root 14u IPv6 14745139 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:48090 (CLOSE_WAIT)
MistConnH 1733 root 15u IPv6 14745142 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:48091 (CLOSE_WAIT)
MistConnH 1733 root 16u IPv6 13162442 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:47372 (ESTABLISHED)
MistConnH 1733 root 17u IPv6 15462473 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:49528 (CLOSE_WAIT)
MistConnH 1733 root 18u IPv6 14745145 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:48092 (CLOSE_WAIT)
MistConnH 1733 root 19u IPv6 14992270 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:48694 (CLOSE_WAIT)
MistConnH 1733 root 20u IPv6 15439011 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:49493 (CLOSE_WAIT)
MistConnH 1733 root 21u IPv6 14745147 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:48093 (CLOSE_WAIT)
MistConnH 1733 root 22u IPv6 14777189 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:48267 (CLOSE_WAIT)
MistConnH 1733 root 23u unix 0xffff8801341be080 0t0 11697711 socket
MistConnH 1733 root 24u IPv6 14745582 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:48132 (CLOSE_WAIT)
MistConnH 1733 root 25u IPv6 14745149 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:48094 (CLOSE_WAIT)
MistConnH 1733 root 26u IPv6 14745155 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:48095 (CLOSE_WAIT)
MistConnH 1733 root 27u IPv6 15519771 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:49602 (CLOSE_WAIT)
MistConnH 1733 root 28u IPv6 14745586 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:48133 (CLOSE_WAIT)
...
MistConnH 1733 root 1081u IPv6 15142618 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:49014 (CLOSE_WAIT)
MistConnH 1733 root 1082u IPv6 15142644 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:49015 (CLOSE_WAIT)
MistConnH 1733 root 1083u IPv6 15147136 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:49016 (CLOSE_WAIT)
MistConnH 1733 root 1084u IPv6 15147188 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:49017 (CLOSE_WAIT)
MistConnH 1733 root 1085u IPv6 15147217 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:49018 (CLOSE_WAIT)
....
MistConnH 1733 root 1994u IPv6 15811895 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:49991 (CLOSE_WAIT)
MistConnH 1733 root 1995u IPv6 15812200 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:49992 (CLOSE_WAIT)
MistConnH 1733 root 1996u IPv6 15812575 0t0 TCP 365limed05.visir.is:http-alt->365limed05.visir.is:49993 (CLOSE_WAIT
....
(see 2000+ half closed sockets)

http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html#closedown

from mistserver.

delian avatar delian commented on July 22, 2024

I am doing a little experiment - replacing the close() code in the mistlib with what I just said (pure close), as I just realized from my logs that my mist httpconnector has been blocked because the total max amount of threads has been reached, as they are not cleaned because they have been blocked by the close() method.

from mistserver.

delian avatar delian commented on July 22, 2024

For the moment the test I did behave well, I have now a lot less close_wait sockets, but I still have them. Thay may happen if the socket is not closed somewhere, while the thread exits (and this way it will even keep the thread alive). Soon that may exhaust the total amount of threads per process quota. So either this was not the reason for the CLOSE_WAIT or it was not the only reason. I need to wait for 24h to confirm though.

However, I have another suggestion -
Your DEBUG printouts are very much useless. Not only that but in 1.2 (and I guess in 1.3) they are not fixed according to the new structures so there are couple of places the compilation breaks if you compile it with DEBUG.

I would suggest you instead of doing

#ifdef DEBUG
fprintf(err,'asldas');
#endif 

to replace them with a macro, defined in debug.h for example (where you will do your checks).

The macro can have the value like:

#ifdef DEBUG
#define dbg(...) { fprintf(stderr,"%s:%d:",__FILE__,__LINE__); fprintf(stderr,##__VA_ARGS__); };
#else
#define dbg(...)
#endif

Then you can remove the unistd.h and the rest includes you use in every file just because of the debug, and replace the #if DEBUG>xxx or #ifdef DEBUG's with just dbg("%s", .... );
This will simplify the compilation. Will allow you easily to redefine the debug macro rules, will decrease the amount of lines you have in every code, but mainly for me - i will be able to find which one of the processes and where exactly generated that printout, as to be fair, there are a lot of messages with the same value, produced by different processes and it is really hard to follow a flow, if you have multiple connections.

from mistserver.

Thulinma avatar Thulinma commented on July 22, 2024

Funny that you should mention this - I have just finished making exactly this change.
I'm also working on unifying the connector code and cleaning up some left over sockets (there are several other places where a socket or two was forgotten).
You can expect this code to land in the development branch somewhere today or tomorrow, and in the master branch sometime this week.
Running some tests first, before making it all publicly available.

On a different but related topic: calling shutdown before closing is necessary on some systems to signal EOF to the other end of the connection. However, calling shutdown on a socket that is still going to be used is of course horribly wrong - and I'm working on fixing this, right now. Same as above - code will land sometime today or tomorrow.

Thanks again for your continued feedback and comments! If you find anything else that you think isn't behaving as it should, please don't hesitate to let us know. :-)

from mistserver.

Thulinma avatar Thulinma commented on July 22, 2024

For reference - the code I spoke about just landed in the development branches of mistserver/mistlib. Make sure to checkout and compile both, or it won't work.

Not all debug messages have been updated to the new format yet - and not all of them work at levels above 4 right now. Fixes for that will arrive soon, too.

from mistserver.

delian avatar delian commented on July 22, 2024

Ok, I will look at it, I never even thought there may be different branches there :)

from mistserver.

Thulinma avatar Thulinma commented on July 22, 2024

We keep the master branch at the currently stable version. The development branch has the latest changes and contains "soon to be stable" code. So, if you want the latest changes you'll want to keep up with the development branches :-)

from mistserver.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.