Giter Site home page Giter Site logo

Comments (10)

davepacheco avatar davepacheco commented on August 10, 2024

How reproducible is this problem for you? Are you (or can you) run with --abort-on-uncaught-exception and share the resulting core file?

from manatee.

AmlingPalantir avatar AmlingPalantir commented on August 10, 2024

Uh, pretty reproducible (as in I can make it happen again with stress testing, not that it happens often). Due to the vagaries of how we're running it I'm slightly reticent to go down this path, especially when I think the nature of the crash is relatively clear from the debug logging I had added (both error handlers are called and in an order that makes the second one crash).

from manatee.

davepacheco avatar davepacheco commented on August 10, 2024

I think I understand what you're saying, but without understanding this better, I'm not sure what the most appropriate fix is. (I don't want to blindly add a defensive check that papers over other potential problems.) Do you happen to still have the debug output, or anything about which errors were emitted? Was it the same underlying error that was reported on both the query and the client?

Thanks again for reporting this.

from manatee.

AmlingPalantir avatar AmlingPalantir commented on August 10, 2024

I super hate to be that guy, but I'm not sure what I can or can't share so I'm in a weird spot. At the least the error the query handler logged was:

{"name":"manatee-sitter","hostname":"XXX","pid":126104,"component":"PostgresMgr","level":40,"err":{"message":"terminating connection due to administrator command","name":"error","stack":"error: terminating connection due to administrator command\n at Connection.parseE (PATH/manatee/node_modules/pg/lib/connection.js:521:11)\n at Connection.parseMessage (PATH/manatee/node_modules/pg/lib/connection.js:351:17)\n at Socket. (PATH/manatee/node_modules/pg/lib/connection.js:101:22)\n at Socket.emit (events.js:95:17)\n at Socket. (stream_readable.js:765:14)\n at Socket.emit (events.js:92:17)\n at emitReadable (_stream_readable.js:427:10)\n at emitReadable (_stream_readable.js:423:5)\n at readableAddChunk (_stream_readable.js:166:9)\n at Socket.Readable.push (_stream_readable.js:128:10)","code":"57P01"},"msg":"got err","time":"2015-09-17T22:49:38.024Z","src":{"file":"PATH/manatee/lib/postgresMgr.js","line":1675,"func":"PostgresMgr._pgQueryFini"},"v":0}

The error the client error handler logged (right before dying) was:

{"name":"manatee-sitter","hostname":"XXX","pid":126104,"component":"PostgresMgr","level":50,"err":{"message":"read ECONNRESET","name":"Error","stack":"Error: read ECONNRESET\n at errnoException (net.js:905:11)\n at TCP.onread (net.js:559:19)","code":"ECONNRESET"},"msg":"got pg client error","time":"2015-09-17T22:49:38.026Z","src":{"file":"PATH/manatee/lib/postgresMgr.js","line":1615},"v":0}

This was immediately after it decided it was going to stop postgres to assume the role of sync (last normal line above first error was the "trying SIGINT" message).

from manatee.

davepacheco avatar davepacheco commented on August 10, 2024

Thanks. Interesting that they were two different errors.

from manatee.

AmlingPalantir avatar AmlingPalantir commented on August 10, 2024

Sorry to be so sparing with the actual data, but the powers that be move
slowly. Here is an extremely minimal slice of that debugging log (just time
and msg, and maybe a minor comment of mine in square brackets):

2015-09-17T22:49:36.019Z _pgQueryKick dispatching request
2015-09-17T22:49:36.019Z start query 'end' callback
2015-09-17T22:49:36.019Z start _pgQueryFini
2015-09-17T22:49:36.020Z finish _pgQueryFini
2015-09-17T22:49:36.020Z end query 'end' callback
2015-09-17T22:49:37.021Z _pgQueryKick dispatching request
2015-09-17T22:49:37.021Z start query 'end' callback
2015-09-17T22:49:37.022Z start _pgQueryFini
2015-09-17T22:49:37.022Z finish _pgQueryFini
2015-09-17T22:49:37.022Z end query 'end' callback
2015-09-17T22:49:38.012Z incoming event [event was activeChange]
2015-09-17T22:49:38.019Z incoming event [event was clusterStateChange]
2015-09-17T22:49:38.020Z assuming role [role was sync]
2015-09-17T22:49:38.020Z PostgresMgr.reconfigure: entering
2015-09-17T22:49:38.021Z PostgresMgr._standby: entering
2015-09-17T22:49:38.021Z PostgresMgr.initDb: stop postgres
2015-09-17T22:49:38.021Z PostgresMgr.stop: entering
2015-09-17T22:49:38.021Z PostgresMgr.stop: trying SIGINT
2015-09-17T22:49:38.022Z _pgQueryKick dispatching request
2015-09-17T22:49:38.023Z start query 'error' callback
2015-09-17T22:49:38.023Z start _pgQueryFini
2015-09-17T22:49:38.024Z got err [this was the first error I pasted above]
2015-09-17T22:49:38.025Z _pgQueryFini: nuking _pgClient due to error
2015-09-17T22:49:38.026Z finish _pgQueryFini
2015-09-17T22:49:38.026Z finish query 'error' callback
2015-09-17T22:49:38.026Z start client error callback
2015-09-17T22:49:38.026Z got pg client error [this was the second error I pasted above]

from manatee.

AmlingPalantir avatar AmlingPalantir commented on August 10, 2024

I put up a sketch of a rewrite of some of the state management which fixes the problem, for values of "fixes" equal to "based on my limited understanding of javascript" and "could no longer repro crash".

I got the go-ahead to post a core dump so if you'd still rather see that I guess I can remove the proposed fix locally, reproduce again, etc. I guess let me know either way.

from manatee.

davepacheco avatar davepacheco commented on August 10, 2024

Thanks for taking a swing at fixing this. I'm not sure I understand the new invariants, though. Before the change, the invariants were that there was only ever one active client, and it was represented by self._pgClient. There was only ever one outstanding request, and it's represented by self._pgRequestOutstanding. Your change obviously assumes that it's possible to get a client's error callback fired even after it's no longer the only client. Is that what happened in your test case? The change also assumes that when a query ends, it may not be the currently outstanding request. How can that happen?

At 1645 in the new file, it seems like you leak the 'error' listener with each new query. If we issue a request that succeeds, but there's a subsequent client error (say, 10 requests later), won't we fire 'error' callbacks for all of the successful requests that happened before it?

Thanks again for taking a look. I know this code is rather hairy, and it's also tricky to get right. (It could also be made a lot simpler with a fix for node-postgres#718.)

from manatee.

AmlingPalantir avatar AmlingPalantir commented on August 10, 2024

Thanks for taking a swing at fixing this. I'm not sure I understand the new
invariants, though. Before the change, the invariants were that there was
only ever one active client, and it was represented by self._pgClient. There
was only ever one outstanding request, and it's represented by
self._pgRequestOutstanding. Your change obviously assumes that it's possible
to get a client's error callback fired even after it's no longer the only
client. Is that what happened in your test case?

Uh, I am not strictly sure whether or not I observed that. I chose
"self._pgClient !== selfClient" over "self._pgClient !== null" because I think
it's going to be safer and more resilient (either to mistakes or to future
mucking in this code).

The change also assumes that when a query ends, it may not be the currently
outstanding request. How can that happen?

Again, I don't have an execution path that leads to that. Maybe client errors
(killing request) but query actually then completes? I don't know if that's
possible due to extreme lack of knowledge of what the postgres library
guarantees or doesn't. As I implied earlier, most choices in what I wrote are
informed by fear and desire for resilience in the face of uncertainty (about
what current libraries do, about what people might change surrounding code to
do, etc.).

At 1645 in the new file, it seems like you leak the 'error' listener with
each new query. If we issue a request that succeeds, but there's a subsequent
client error (say, 10 requests later), won't we fire 'error' callbacks for
all of the successful requests that happened before it?

Uh, sure does seem that way. At the least _pgQueryFini won't do anything
stupid since it will check the outstanding request and log the error as
spurious (presumably several times). Maybe the other two handlers (query error
and query completion) should remove that one listener specifically? Let me go
actually learn enough JS to figure out how to do that...

from manatee.

AmlingPalantir avatar AmlingPalantir commented on August 10, 2024

Updated the PR (now that the new cut has been through a few nights of not breaking).

from manatee.

Related Issues (8)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.