Comments (14)
Let me know if I can help diagnose this problem. I wrote the stream implementation here and I know it's quite complicated and not easy to understand, so if there's anything I can do to help don't hesitate to ask 😃
from apm-nodejs-http-client.
Kibana's dev server starts up with multiple Node.js processes.
It's unclear if all these processes are started via the cluster module, or if some are started via a traditional child_process.fork().
It's also unclear if all these processes are serving Kibana and being instrumented by the agent, or if some processes are independent of that.
Finally, it's unclear what's meant by a "kibana restart" -- does this mean some (all) of the child processes are restarted?
Understanding Kibana's process model will be critical in understanding this bug. Without the cluster module, each running Node.js process will have its own apm-agent attached, with its own TCP connections to APM Server. However, if these processes are created with the cluster module, that means they share network resources and TCP connections.
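To help answer the process-model question, here is a minimal sketch of how a process could report how it was started. The describeProcess helper and its label text are hypothetical, not part of Kibana or the agent. It only relies on cluster.isWorker and on process.send being defined when an IPC channel to a parent exists.

```javascript
const cluster = require('cluster');

// Hypothetical helper: returns a short label describing how this process was started.
function describeProcess() {
  // Cluster workers are also forked, so this check has to come first.
  if (cluster.isWorker) {
    return `cluster worker ${cluster.worker.id} (pid ${process.pid})`;
  }
  // A child started with child_process.fork() has an IPC channel to its parent,
  // which makes process.send available.
  if (typeof process.send === 'function') {
    return `forked child (pid ${process.pid})`;
  }
  return `standalone or primary process (pid ${process.pid})`;
}

console.log(describeProcess());
```

Logging this once at startup in each process would show whether the processes that load the agent are cluster workers or plain forks.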
Our current working theory on this bug is that during the process cycling of a restart (waves hands vaguely) bad things happen with the processes and the TCP connections while things are settling. (Theory: one process closes the connection but other processes try to use that connection.)
In addition to solving this for Kibana, this also points to a general need to expand our multi-process support.
from apm-nodejs-http-client.
Another aspect to consider here -- users have reported they're using APM Server in the cloud when this error occurs. This means their agent configuration looks something like:
var apm = require('elastic-apm-node').start({
  // Set custom APM Server URL (default: http://localhost:8200)
  serverUrl: 'https://long-fake-string.apm.us-east-1.aws.cloud.es.io:443',
  // could be GCP or Azure as well
})
Understanding what sort of load balancing layers exist between the Agent and the APM Server in the cloud will be important in diagnosing this issue.
from apm-nodejs-http-client.
From Slack:
Here’s what I think is happening:
- The error itself is caused by the agent aborting the request after serverTimeout has been reached (15s).
- The agent writes data to a stream and pipes that stream to a request. It closes this stream every 10s, which should close the request as well.
- In some cases, the socket for the outgoing HTTP request is created shortly before the Kibana development server’s file watcher starts (e.g. watching for changes (8010 files)).
- In some cases, mostly when a secure connection has not been established before the file watcher starts, the stream is closed before the socket has established a (secure) connection. When this happens, the request never ends and eventually times out.
- I’m not sure why a secure connection is not established. If connect fires before the file watcher starts, it will usually get a secureConnect later. In other cases, it sends a ClientHello but never receives a ServerHello. I’ve tried fiddling with keepAliveMsecs but wasn’t able to consistently fix it.
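The connect-without-secureConnect case described above can be spotted by recording which handshake events a socket emits. This is a minimal sketch, not the agent's code: traceHandshake is a hypothetical helper, and a bare EventEmitter stands in for the real socket so the example runs offline. With a real request, the socket would come from the request's 'socket' event.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: records which handshake milestones a socket has reached.
function traceHandshake(socket) {
  const seen = { connect: false, secureConnect: false, close: false };
  socket.once('connect', () => { seen.connect = true; });             // TCP connection established
  socket.once('secureConnect', () => { seen.secureConnect = true; }); // TLS handshake completed
  socket.once('close', () => { seen.close = true; });                 // socket fully closed
  return seen;
}

// Real usage would look like:
//   req.on('socket', (socket) => { const seen = traceHandshake(socket); /* log it on timeout */ })
// Simulated here: the TCP connection opens but the TLS handshake never completes.
const fakeSocket = new EventEmitter();
const seen = traceHandshake(fakeSocket);
fakeSocket.emit('connect');

console.log(seen);
```

Logging the recorded state when the request times out would show whether a given failure stalled before or after the TLS handshake.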
from apm-nodejs-http-client.
I see this happening on starts, but also restarts, and potentially at any time during the lifecycle of the proxy server, but I haven't been able to confirm the latter yet.
from apm-nodejs-http-client.
@dgieselaar Do you know if it happens outside of Kibana as well, or have you only seen this in Kibana so far? If only in Kibana, do you know if it also happens if connecting to a non-proxied APM Server?
from apm-nodejs-http-client.
I've only seen it in Kibana in development mode with a proxy Kibana server (which is the proxy I'm referring to). I've not tried any other ways of running Kibana.
from apm-nodejs-http-client.
Dario and Tyler have been using Kibana's master branch, which IIUC no longer uses cluster as of elastic/kibana@fd1328f
from apm-nodejs-http-client.
I was able to consistently reproduce this by delaying the initialisation of the stream by about 1.5s. I did this in a very gross manner, which was adding a timeout before initialising the StreamChopper instance. There is probably a better way. The right delay probably depends on the machine. But for it to reproduce consistently, the stream has to be created before the file watcher log message (watching for changes), and the socket should only connect after this message.
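A rough sketch of that kind of delay, under stated assumptions: createStreamLater is a hypothetical helper, and a PassThrough stream stands in for the StreamChopper instance, since the actual change to the client is not shown in this thread. The 1500 ms value is the approximate delay mentioned above and would likely need tuning per machine.

```javascript
const { PassThrough } = require('stream');

// Hypothetical helper: postpones creating the stream by delayMs milliseconds,
// then hands the new stream to onReady.
function createStreamLater(createStream, delayMs, onReady) {
  return setTimeout(() => onReady(createStream()), delayMs);
}

let stream = null;
createStreamLater(
  () => new PassThrough(), // stand-in for the StreamChopper instance
  1500,                    // approximate delay from the comment above
  (created) => {
    stream = created;
    console.log('stream created after delay');
  }
);
```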
from apm-nodejs-http-client.
Thanks all for helping out with this. Here is a bit more information:
While in development, I was able to see the socket hang up without a Kibana server restart or any other change.
server log [13:37:44.590] [info][plugins][watcher] Your basic license does not support watcher. Please upgrade your license.
server log [13:37:44.594] [info][crossClusterReplication][plugins] Your basic license does not support crossClusterReplication. Please upgrade your license.
server log [13:37:44.595] [info][kibana-monitoring][monitoring][monitoring][plugins] Starting monitoring stats collection
server log [13:37:45.353] [info][listening] Server running at http://localhost:5601/ued
server log [13:37:45.866] [info][server][Kibana][http] http server running at http://localhost:5601/ued
APM Server transport error (ECONNRESET): socket hang up
APM Server transport error (ECONNRESET): socket hang up
There have been discussions around this being related to the Kibana server restarts, so I decided to work on reproducing outside that environment.
I have been able to reproduce using an 8.0.0 snapshot build of Kibana.
- Download Kibana snapshot from https://artifacts-api.elastic.co/v1/versions/8.0.0-SNAPSHOT/builds/latest
- Either also download and run Elasticsearch, or use yarn es snapshot from the Kibana repository
- Start Kibana with:
ELASTIC_APM_SERVER_TIMEOUT=10 ELASTIC_APM_ACTIVE=true bin/kibana --elasticsearch.username=kibana --elasticsearch.password=changeme
I am wondering if this is an issue with the APM server in Cloud. Is there anything that would be helpful to make that determination or rule it out?
from apm-nodejs-http-client.
@dgieselaar has informed me my previous comment was due to the apmRequestTimeout being the same as the serverTimeout.
from apm-nodejs-http-client.
I have recently started sending data to a different APM Server instance, also in cloud, and am not seeing this error anymore, at least not as often.
from apm-nodejs-http-client.
@dgieselaar v3.14.0 of the agent includes a fix for the blocking behaviour issue we were seeing with the agent talking to APM server. I have elastic/kibana#97509 open to update Kibana to use the new agent. Would you be able/willing some time to try to reproduce those same errors you were seeing with the updated agent?
from apm-nodejs-http-client.
I don't know for certain, but I've not heard any more APM Server transport error (ECONNRESET): socket hang up issues come up in the intervening time (2y+). I'm hoping that the changes in #144 resolved this issue.
I'm closing now. We can re-open this or an issue on https://github.com/elastic/apm-agent-nodejs later if the issue re-occurs. (Note that in elastic/apm-agent-nodejs#3507 the http-client code was moved to the apm-agent-nodejs repo.)
from apm-nodejs-http-client.