Dieterbe commented on August 21, 2024

Hypothesis

Using spooling in TCP routes (or reconnecting connections) leaks file descriptors: many get opened and are never closed.

Investigation approach

  • dig in at the destination level, as it owns the TCP connections and spooling
  • don't investigate route, table and other structures, as they are far removed from TCP connections and spooling.
  • focus on "normal" operation with a TCP destination running, connections dropping and reconnecting, spool usage etc. Not shutdown of the relay.

Learnings

Diagram of where FDs are created and owned

Destination{
    connUpdate: chan *Conn {
        conn: *net.TCPConn  // these conns get created by NewConn(), which is
                            // called in dest.updateConn(), and are owned by dest.relay()
    }
    spool: Spool {
        queue: nsqd.Diskqueue
    }
}

Connections

  • does Conn close conn: yes, upon a network error or a call to Close(). (*)
    Conn does not touch its writer (which embeds the conn) when closing the conn, but that should be ok.

  • updateConn does not close the prior connection, because it does not have access to it.
    updateConn is called from...

  1. destination.relay(), once at startup (when no pre-existing conn), and once if the conn has gone down, at which point it should have been closed (see isAlive check) (*)
  2. when updating the destination's address, at which point a pre-existing conn may still be live -> this was a bug that is now fixed: the previous conn was not being closed. #481
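
To illustrate the class of bug described in case 2, here is a minimal sketch of replacing a connection while closing the one it replaces. It is not the carbon-relay-ng code: `dest`, `swapConn` and `fakeConn` are hypothetical names made up for this example, and the real fix in #481 may be structured differently.

```go
package main

import (
	"fmt"
	"io"
)

// dest is a hypothetical stand-in for a destination that owns one connection.
type dest struct {
	conn io.Closer // current connection, nil if there is none yet
}

// swapConn installs next and closes the connection it replaces, if any.
// Skipping the Close here is what would leak one FD per address update.
func (d *dest) swapConn(next io.Closer) error {
	prev := d.conn
	d.conn = next
	if prev == nil {
		return nil
	}
	return prev.Close()
}

// fakeConn records whether Close was called; it stands in for a real net.Conn.
type fakeConn struct{ closed bool }

func (c *fakeConn) Close() error {
	c.closed = true
	return nil
}

func main() {
	d := &dest{}
	a, b := &fakeConn{}, &fakeConn{}
	d.swapConn(a)                   // startup: no prior conn to close
	d.swapConn(b)                   // address update: a must be closed here
	fmt.Println(a.closed, b.closed) // prints "true false"
}
```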

(*) these mechanisms rely on channel messaging between different goroutines. If goroutines get stuck, processing may not properly advance. This is best diagnosed with goroutine dumps.
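
For reference, one way to take a goroutine dump from inside a Go process is the standard `runtime/pprof` package. This is a generic sketch, not necessarily what the diagnostics instructions in #481 describe; `goroutineDump` is a helper name made up for this example.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
	"strings"
	"time"
)

// goroutineDump returns the stack traces of all current goroutines as text.
func goroutineDump() (string, error) {
	var sb strings.Builder
	// debug=2 prints one full stack per goroutine, in the same form a Go
	// program uses when dying from an unrecovered panic.
	if err := pprof.Lookup("goroutine").WriteTo(&sb, 2); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Simulate a stuck goroutine: nothing ever sends on this channel.
	stuck := make(chan struct{})
	go func() { <-stuck }()
	time.Sleep(50 * time.Millisecond) // give the goroutine time to block

	dump, err := goroutineDump()
	if err != nil {
		fmt.Fprintln(os.Stderr, "dump failed:", err)
		os.Exit(1)
	}
	fmt.Print(dump)
}
```

In the output, a goroutine blocked like the one above should show up with a channel-receive wait state, which is the kind of signal to look for when a relay loop stops advancing.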

Spool/diskqueue

  • dest.Run creates NewSpool() (and only there do we call NewSpool)
  • dest.Run runs only when route.Run() and AddDest() are called, which are considered "safe" paths.

So then, looking at NewSpool()...
The spool stays tied to the destination until the dest is shut down, which does not happen during "normal operation". Likewise, 1 diskqueue is tied to 1 spool for its entire lifetime.
So it comes down to the diskqueue. After some digging, it seems that it correctly opens/closes its file descriptors.

How to avoid in the future?

see diagnostics instructions added in #481
TODO: instrumentation for open FDs?

from carbon-relay-ng.

Dieterbe avatar Dieterbe commented on August 21, 2024

The two ways I see to instrument open FDs are:

  1. polling /proc/<pid>/fd and counting the number of files in there.
  2. accounting ourselves, any time we open/close a conn or file.

1 is expensive, 2 is verbose and possibly inaccurate.
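
For the record, here is a rough sketch of what both options could look like. Everything in it is hypothetical (`countDirEntries`, `fdCounter`, `trackedConn` and `track` are made-up names, not carbon-relay-ng code), and option 1 assumes a Linux-style /proc filesystem.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"sync"
	"sync/atomic"
)

// Option 1: count the entries of a directory such as /proc/self/fd.
// The result includes the FD that ReadDir itself opens to list the directory.
func countDirEntries(dir string) (int, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

// Option 2: manual accounting. Only opens and closes that go through the
// counter are seen, which is why this approach can be inaccurate.
type fdCounter struct{ n int64 }

func (c *fdCounter) Opened()     { atomic.AddInt64(&c.n, 1) }
func (c *fdCounter) Closed()     { atomic.AddInt64(&c.n, -1) }
func (c *fdCounter) Open() int64 { return atomic.LoadInt64(&c.n) }

// trackedConn decrements the counter exactly once, on the first Close.
type trackedConn struct {
	net.Conn
	counter *fdCounter
	once    sync.Once
}

// track registers conn with the counter and returns the wrapped conn.
func track(conn net.Conn, c *fdCounter) *trackedConn {
	c.Opened()
	return &trackedConn{Conn: conn, counter: c}
}

func (t *trackedConn) Close() error {
	t.once.Do(t.counter.Closed)
	return t.Conn.Close()
}

func main() {
	if n, err := countDirEntries("/proc/self/fd"); err == nil {
		fmt.Println("open FDs according to /proc:", n)
	}

	// net.Pipe is an in-memory stand-in for real TCP conns in this demo.
	var c fdCounter
	a, b := net.Pipe()
	ta, tb := track(a, &c), track(b, &c)
	ta.Close()
	ta.Close() // a repeated Close is not counted twice
	fmt.Println("tracked conns still open:", c.Open()) // prints 1
	tb.Close()
}
```

The wrapper shows where the verbosity comes from: every place that creates a conn or file would have to go through something like `track`, and anything that bypasses it is silently missed.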

I think I will scrap that idea for now.
