Kibana Instrumentation and `APM Server transport error (ECONNRESET): socket hang up` Log Messages (apm-nodejs-http-client, closed)

elastic commented on September 21, 2024
Kibana Instrumentation and `APM Server transport error (ECONNRESET): socket hang up` Log Messages

from apm-nodejs-http-client.

Comments (14)

watson commented on September 21, 2024

Let me know if I can help diagnose this problem. I wrote the stream implementation here and I know it's quite complicated and not easy to understand, so if there's anything I can do to help don't hesitate to ask 😃

astorm commented on September 21, 2024

Kibana's dev server starts up with multiple Node.js processes.

It's unclear if all these processes are started via the cluster module, or if some are started via a traditional child_process.fork().

It's also unclear if all these processes are serving Kibana and being instrumented by the agent, or if some processes are independent of that.

Finally, it's unclear what's meant by a "kibana restart" -- does this mean some (all) of the child processes are restarted?

Understanding Kibana's process model will be critical to understanding this bug. Without the cluster module, each running Node.js process will have its own apm-agent attached, with its own TCP connections to APM Server. However, if these processes are created with the cluster module, that means they share network resources and TCP connections.
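One way to answer the process-model questions above is to log, from inside each process, how it was started. This is a minimal sketch using only Node.js built-ins; the helper name `describeProcess` is hypothetical and is not part of Kibana or the agent.

```javascript
const cluster = require('cluster');

// Hypothetical diagnostic helper: report how the current process was started.
function describeProcess() {
  return {
    pid: process.pid,
    ppid: process.ppid,
    // true only for workers created via cluster.fork()
    clusterWorker: cluster.isWorker,
    // process.send is defined when the process was spawned with an IPC
    // channel, which is the case for child_process.fork() and cluster.fork()
    hasIpcChannel: typeof process.send === 'function',
  };
}

console.log(describeProcess());
```

Calling this early in each process and comparing the output across PIDs should show which processes are cluster workers and which were forked another way.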

Our current working theory on this bug is that during the process cycling of a restart (waves hands vaguely) bad things happen with the processes and the TCP connections while things are settling. (Theory: one process closes the connection but other processes try to use that connection.)

In addition to solving this for Kibana, this also points to a general need to expand our multi-process support.

astorm commented on September 21, 2024

Another aspect to consider here -- users have reported they're using APM Server in the cloud when this error occurs. This means their agent configuration looks something like:

var apm = require('elastic-apm-node').start({
  // Set custom APM Server URL (default: http://localhost:8200)
  serverUrl: 'https://long-fake-string.apm.us-east-1.aws.cloud.es.io:443',
  // could be GCP or Azure as well
})

Understanding what sort of load balancing layers exist between the Agent and the APM Server in the cloud will be important in diagnosing this issue.

dgieselaar commented on September 21, 2024

From Slack:

Here’s what I think is happening:

  • The error itself is caused by the agent aborting the request after serverTimeout has been reached (15s).
  • The agent writes data to a stream and pipes that stream to a request. It closes this stream every 10s, which should close the request as well.
  • In some cases, the socket for the outgoing HTTP request is created shortly before the Kibana development server’s file watcher starts (e.g. `watching for changes (8010 files)`).
  • In some cases, mostly when a secure connection has not been established before the file watcher starts, the stream is closed before the socket has established a (secure) connection. When this happens, the request never ends and eventually times out.
  • I’m not sure why a secure connection is not established. If `connect` fires before the file watcher starts, it will usually get a `secureConnect` later. In other cases, it sends a ClientHello but never receives a ServerHello. I’ve tried fiddling with keepAliveMsecs but wasn’t able to consistently fix it.
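The first bullet can be illustrated with plain Node.js: destroying an in-flight `http` request before any response arrives surfaces on the client as an `ECONNRESET` error with the message `socket hang up`. This is a minimal sketch, assuming a local server that never responds. The helper name `reproduceHangUp` and the timeout value are hypothetical, and this is not the agent's actual code.

```javascript
const http = require('http');

// Hypothetical helper: start a local server that never responds, send one
// request, and destroy it after `timeoutMs` of inactivity.
// Resolves with the error emitted on the client request.
function reproduceHangUp(timeoutMs) {
  return new Promise((resolve, reject) => {
    // The handler never calls res.end(), so the request stalls.
    const server = http.createServer(() => {});
    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      const req = http.request({
        host: '127.0.0.1',
        port,
        method: 'POST',
        agent: false, // do not reuse pooled sockets
      });
      req.on('response', () => {
        server.close();
        reject(new Error('unexpected response'));
      });
      req.on('error', (err) => {
        server.close();
        resolve(err);
      });
      // Give up once the socket has been idle for timeoutMs, similar in
      // spirit to a client-side server timeout.
      req.setTimeout(timeoutMs, () => req.destroy());
      // Start the body but never end the stream.
      req.write('{"metadata":{}}\n');
    });
  });
}

reproduceHangUp(100).then((err) => {
  console.log(`${err.code}: ${err.message}`);
});
```

The sketch only shows where the error message comes from. It does not explain why the TLS handshake stalls in the cases described above.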

dgieselaar commented on September 21, 2024

I see this happening on starts, but also restarts, and potentially at any time during the lifecycle of the proxy server, but I haven't been able to confirm the latter yet.

watson commented on September 21, 2024

@dgieselaar Do you know if it happens outside of Kibana as well, or have you only seen this in Kibana so far? If only in Kibana, do you know if it also happens if connecting to a non-proxied APM Server?

dgieselaar commented on September 21, 2024

I've only seen it in Kibana in development mode with a proxy Kibana server (which is the proxy I'm referring to). I've not tried any other ways of running Kibana.

trentm commented on September 21, 2024

Dario and Tyler have been using Kibana's master branch, which IIUC no longer uses cluster as of elastic/kibana@fd1328f

dgieselaar commented on September 21, 2024

I was able to consistently reproduce this by delaying the initialisation of the stream by about 1.5s. I did this in a very gross manner, by adding a timeout before initialising the StreamChopper instance. There is probably a better way. The right delay probably depends on the machine. But for it to reproduce consistently, the stream has to be created before the file watcher log message (watching for changes), and the socket should only connect after this message.
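The delay described here could be sketched roughly as follows. This is only an illustration under stated assumptions: `createStreamAfterDelay` is a hypothetical helper, and a `PassThrough` stream stands in for the StreamChopper instance, whose constructor options are not shown in this thread.

```javascript
const { PassThrough } = require('stream');

// Hypothetical helper: postpone creating a stream by `delayMs`.
// `createStream` stands in for whatever builds the real stream.
function createStreamAfterDelay(createStream, delayMs) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(createStream()), delayMs);
  });
}

// Usage: the thread mentions a delay of roughly 1.5s, and the right value
// probably depends on the machine.
createStreamAfterDelay(() => new PassThrough(), 1500).then((stream) => {
  stream.end();
});
```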

tylersmalley commented on September 21, 2024

Thanks all for helping out with this. Here is a bit more information:

While in development, I was able to see the socket hang up without a Kibana server restart or any other change.

server    log   [13:37:44.590] [info][plugins][watcher] Your basic license does not support watcher. Please upgrade your license.
server    log   [13:37:44.594] [info][crossClusterReplication][plugins] Your basic license does not support crossClusterReplication. Please upgrade your license.
server    log   [13:37:44.595] [info][kibana-monitoring][monitoring][monitoring][plugins] Starting monitoring stats collection
server    log   [13:37:45.353] [info][listening] Server running at http://localhost:5601/ued
server    log   [13:37:45.866] [info][server][Kibana][http] http server running at http://localhost:5601/ued
APM Server transport error (ECONNRESET): socket hang up
APM Server transport error (ECONNRESET): socket hang up

There have been discussions around this being related to the Kibana server restarts, so I decided to work on reproducing outside that environment.

I have been able to reproduce using an 8.0.0 snapshot build of Kibana.

I am wondering if this is an issue with the APM server in Cloud. Is there anything that would be helpful to make that determination or rule it out?

tylersmalley commented on September 21, 2024

@dgieselaar has informed me my previous comment was due to the apmRequestTimeout being the same as the serverTimeout.

dgieselaar commented on September 21, 2024

I have recently started sending data to a different APM Server instance, also in cloud, and am not seeing this error anymore, at least not as often.

trentm commented on September 21, 2024

@dgieselaar v3.14.0 of the agent includes a fix for the blocking behaviour issue we were seeing with the agent talking to APM server. I have elastic/kibana#97509 open to update Kibana to use the new agent. Would you be able and willing at some point to try to reproduce the errors you were seeing, using the updated agent?

trentm commented on September 21, 2024

I don't know for certain, but I've not heard any more APM Server transport error (ECONNRESET): socket hang up issues come up in the intervening time (2y+). I'm hoping that the changes in #144 resolved this issue.

I'm closing now. We can re-open this or an issue on https://github.com/elastic/apm-agent-nodejs later if the issue re-occurs. (Note that in elastic/apm-agent-nodejs#3507 the http-client code was moved to the apm-agent-nodejs repo.)
