Comments (15)

charlievieth commented on June 19, 2024

from bosh-agent.

flavorjones commented on June 19, 2024

It might also be worth noting that the Windows VM is consistently at ~25% CPU utilization, even when nothing is going on. It's not clear to me whether this is due to the Concourse worker job or the BOSH agent.

crawsible commented on June 19, 2024

@flavorjones Hrm, very interesting and definitely unexpected (cc: @davidjahn). Thanks for reporting! I've scheduled a bug investigation in our backlog here.

FWIW, naively killing the bosh director on my cf deployment on GCP, using a stemcell v1200.1 release candidate, wasn't enough to reproduce this issue for me. That's pretty surprising -- I can't imagine any running process aside from the agent would be affected by the director's accessibility. I think I know the answer to this question, but just to be sure, would you happen to know whether the agent was performing any director-requested work at the time the director stopped?

davidjahn commented on June 19, 2024

Ohhh, this is interesting... not sure whether the problem is agent-induced alone or some interaction with Concourse... we'll see if we can reproduce it on GCP.

charlievieth commented on June 19, 2024

One thing to note: if the agent dies, its jobs (the services it creates) will keep running.

charlievieth commented on June 19, 2024

Is that red line the agent's CPU usage? If not, do we know which process it correlates to?

mhoran commented on June 19, 2024

In investigating this bug, we discovered a completely unrelated bug where the agent will not restart after termination. Thanks!

We're still trying to reproduce the behavior you're seeing. In the stemcell you're using (where the aforementioned unrelated bug is not present), the agent does try to restart when the director connection is lost. We don't see our VMs using 100% CPU, though. Could you tell us a bit more about the instance types in this deployment?

Also, logs from /var/vcap/bosh/log would be super helpful.

flavorjones commented on June 19, 2024

OK, just landed and will try to reproduce and get y'all some logs and maybe some screenshots from Task Manager.

mhoran commented on June 19, 2024

OK, so we do see about 25% CPU usage with the default of restarting the agent every 5 seconds on failure. That's excessive, so we're going to switch to an exponential backoff for restarting the agent. We'll back off up to 5 minutes, and then try to start the agent every 5 minutes thereafter. Note that the Linux agent is also chatty on startup when NATS is unreachable, but it probably doesn't have such an expensive bootstrap process.

It's worth noting that with a backoff of 5 minutes, when you bring your director back online and the resurrector is enabled, the resurrector may recreate the VM before the agent has a chance to restart itself.

mhoran commented on June 19, 2024

(Hopefully) Fixed in dc9a5b4.

flavorjones commented on June 19, 2024

Looks good. I updated service_wrapper.xml with the changes from dc9a5b4 and went through the stop/uninstall/install/start cycle. Here's what CPU util looked like while the director was stopped:

[Screenshot from 2017-08-12 17-13-48: CPU utilization while the director was stopped]

Thanks all!

crawsible commented on June 19, 2024

@flavorjones Great to hear! We'll ping back here and close this issue once this agent change makes its way into a 2012R2 stemcell release.

cppforlife commented on June 19, 2024

@crawsible did this make it through?

crawsible commented on June 19, 2024

@cppforlife Yes it did, thanks for the reminder.

flavorjones commented on June 19, 2024

For the record, I believe this was patched in 1200.3 (though it may have been 1200.2).
