Comments (10)
How reproducible is this problem for you? Are you (or can you) run with --abort-on-uncaught-exception and share the resulting core file?
from manatee.
Uh, pretty reproducible (as in I can make it happen again with stress testing, not that it happens often). Due to the vagaries of how we're running it I'm slightly reticent to go down this path, especially when I think the nature of the crash is relatively clear from the debug logging I had added (both error handlers are called and in an order that makes the second one crash).
from manatee.
I think I understand what you're saying, but without understanding this better, I'm not sure what the most appropriate fix is. (I don't want to blindly add a defensive check that papers over other potential problems.) Do you happen to still have the debug output, or anything about which errors were emitted? Was it the same underlying error that was reported on both the query and the client?
Thanks again for reporting this.
from manatee.
I super hate to be that guy, but I'm not sure what I can or can't share so I'm in a weird spot. At the least the error the query handler logged was:
{"name":"manatee-sitter","hostname":"XXX","pid":126104,"component":"PostgresMgr","level":40,"err":{"message":"terminating connection due to administrator command","name":"error","stack":"error: terminating connection due to administrator command\n at Connection.parseE (PATH/manatee/node_modules/pg/lib/connection.js:521:11)\n at Connection.parseMessage (PATH/manatee/node_modules/pg/lib/connection.js:351:17)\n at Socket. (PATH/manatee/node_modules/pg/lib/connection.js:101:22)\n at Socket.emit (events.js:95:17)\n at Socket. (stream_readable.js:765:14)\n at Socket.emit (events.js:92:17)\n at emitReadable (_stream_readable.js:427:10)\n at emitReadable (_stream_readable.js:423:5)\n at readableAddChunk (_stream_readable.js:166:9)\n at Socket.Readable.push (_stream_readable.js:128:10)","code":"57P01"},"msg":"got err","time":"2015-09-17T22:49:38.024Z","src":{"file":"PATH/manatee/lib/postgresMgr.js","line":1675,"func":"PostgresMgr._pgQueryFini"},"v":0}
The error the client error handler logged (right before dying) was:
{"name":"manatee-sitter","hostname":"XXX","pid":126104,"component":"PostgresMgr","level":50,"err":{"message":"read ECONNRESET","name":"Error","stack":"Error: read ECONNRESET\n at errnoException (net.js:905:11)\n at TCP.onread (net.js:559:19)","code":"ECONNRESET"},"msg":"got pg client error","time":"2015-09-17T22:49:38.026Z","src":{"file":"PATH/manatee/lib/postgresMgr.js","line":1615},"v":0}
This was immediately after it decided it was going to stop postgres to assume the role of sync (last normal line above first error was the "trying SIGINT" message).
from manatee.
Thanks. Interesting that they were two different errors.
from manatee.
Sorry to be so sparing with the actual data, but the powers that be move
slowly. Here is an extremely minimal slice of that debugging log (just time
and msg, and maybe a minor comment of mine in square brackets):
2015-09-17T22:49:36.019Z _pgQueryKick dispatching request
2015-09-17T22:49:36.019Z start query 'end' callback
2015-09-17T22:49:36.019Z start _pgQueryFini
2015-09-17T22:49:36.020Z finish _pgQueryFini
2015-09-17T22:49:36.020Z end query 'end' callback
2015-09-17T22:49:37.021Z _pgQueryKick dispatching request
2015-09-17T22:49:37.021Z start query 'end' callback
2015-09-17T22:49:37.022Z start _pgQueryFini
2015-09-17T22:49:37.022Z finish _pgQueryFini
2015-09-17T22:49:37.022Z end query 'end' callback
2015-09-17T22:49:38.012Z incoming event [event was activeChange]
2015-09-17T22:49:38.019Z incoming event [event was clusterStateChange]
2015-09-17T22:49:38.020Z assuming role [role was sync]
2015-09-17T22:49:38.020Z PostgresMgr.reconfigure: entering
2015-09-17T22:49:38.021Z PostgresMgr._standby: entering
2015-09-17T22:49:38.021Z PostgresMgr.initDb: stop postgres
2015-09-17T22:49:38.021Z PostgresMgr.stop: entering
2015-09-17T22:49:38.021Z PostgresMgr.stop: trying SIGINT
2015-09-17T22:49:38.022Z _pgQueryKick dispatching request
2015-09-17T22:49:38.023Z start query 'error' callback
2015-09-17T22:49:38.023Z start _pgQueryFini
2015-09-17T22:49:38.024Z got err [this was the first error I pasted above]
2015-09-17T22:49:38.025Z _pgQueryFini: nuking _pgClient due to error
2015-09-17T22:49:38.026Z finish _pgQueryFini
2015-09-17T22:49:38.026Z finish query 'error' callback
2015-09-17T22:49:38.026Z start client error callback
2015-09-17T22:49:38.026Z got pg client error [this was the second error I pasted above]
from manatee.
I put up a sketch of a rewrite of some of the state management which fixes the problem, for values of "fixes" equal to "based on my limited understanding of javascript" and "could no longer repro crash".
I got the go-ahead to post a core dump so if you'd still rather see that I guess I can remove the proposed fix locally, reproduce again, etc. I guess let me know either way.
from manatee.
Thanks for taking a swing at fixing this. I'm not sure I understand the new invariants, though. Before the change, the invariants were that there was only ever one active client, and it was represented by self._pgClient. There was only ever one outstanding request, and it's represented by self._pgRequestOutstanding. Your change obviously assumes that it's possible to get a client's error callback fired even after it's no longer the only client. Is that what happened in your test case? The change also assumes that when a query ends, it may not be the currently outstanding request. How can that happen?
At 1645 in the new file, it seems like you leak the 'error' listener with each new query. If we issue a request that succeeds, but there's a subsequent client error (say, 10 requests later), won't we fire 'error' callbacks for all of the successful requests that happened before it?
Thanks again for taking a look. I know this code is rather hairy, and it's also tricky to get right. (It could also be made a lot simpler with a fix for node-postgres#718.)
from manatee.
Thanks for taking a swing at fixing this. I'm not sure I understand the new
invariants, though. Before the change, the invariants were that there was
only ever one active client, and it was represented by self._pgClient. There
was only ever one outstanding request, and it's represented by
self._pgRequestOutstanding. Your change obviously assumes that it's possible
to get a client's error callback fired even after it's no longer the only
client. Is that what happened in your test case?
Uh, I am not strictly sure whether or not I observed that. I chose
"self._pgClient !== selfClient" over "self._pgClient !== null" because I think
it's going to be safer and more resilient (either to mistakes or to future
mucking in this code).
The change also assumes that when a query ends, it may not be the currently
outstanding request. How can that happen?
Again, I don't have an execution path that leads to that. Maybe client errors
(killing request) but query actually then completes? I don't know if that's
possible due to extreme lack of knowledge of what the postgres library
guarantees or doesn't. As I implied earlier, most choices in what I wrote are
informed by fear and desire for resilience in the face of uncertainty (about
what current libraries do, about what people might change surrounding code to
do, etc.).
At 1645 in the new file, it seems like you leak the 'error' listener with
each new query. If we issue a request that succeeds, but there's a subsequent
client error (say, 10 requests later), won't we fire 'error' callbacks for
all of the successful requests that happened before it?
Uh, sure does seem that way. At the least _pgQueryFini won't do anything
stupid since it will check the outstanding request and log the error as
spurious (presumably several times). Maybe the other two handlers (query error
and query completion) should remove that one listener specifically? Let me go
actually learn enough JS to figure out how to do that...
from manatee.
Updated the PR (now that the new cut has been through a few nights of not breaking).
from manatee.
Related Issues (8)
- manatee hard codes postgres username
- Various wrong paths and typos HOT 1
- node-gyp fails to rebuild HOT 3
- lib/adm.js leaks ZK clients HOT 2
- need to bump some deps for recent versions of Node? HOT 1
- manatee should support using the system postgres
- ./tools/mkdevsitters: postgres 12 git commit went missing
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from manatee.