driskell commented on July 3, 2024

Hello.

When you look at lc-admin, can you see what it says under the transport section?

The key metric is "pending payloads". If it sticks at 10, it's essentially saying it has sent all it can to Logstash (10 is the default maximum) and is waiting for responses. If it's less than 10, it's never waiting and always sending. In all honesty, I've never seen a case where it was less than 10 while there were active harvesters ("completion" < 100), which would indicate Logstash running faster than logs can be read and sent.
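The rule of thumb above can be sketched as a small Python function. This is a reading aid under stated assumptions, not something lc-admin provides: the function name `diagnose` and its return labels are hypothetical, and `max_pending` is assumed to be the configured maximum (10 by default, per this thread).

```python
def diagnose(pending_payloads: int, max_pending: int, completion: float) -> str:
    """Rough reading of lc-admin figures (hypothetical helper).

    pending_payloads -- the "pending payloads" value reported by lc-admin
    max_pending      -- the configured maximum (assumed 10, the default here)
    completion       -- a harvester's "completion" percentage
    """
    if pending_payloads >= max_pending:
        # Every slot is in flight: the shipper is waiting for Logstash to respond.
        return "waiting on Logstash"
    if completion < 100:
        # Free slots while a file is still being read: Logstash is keeping up,
        # so reading and sending is the limit (the case described as never seen).
        return "Logstash keeping up; reading is the limit"
    # Free slots and nothing left to read: the shipper is caught up.
    return "caught up"


print(diagnose(pending_payloads=10, max_pending=10, completion=67.3))
```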

Usually things are IO bound, and the slowest link is nearly always Logstash, due to the grok, user-agent lookup and IP lookup work going on. Based on that, the usual way to speed things up would be to look at the Logstash side and see whether it needs more CPU, or at the ES ingestion side to see whether IO or CPU is the limit there.
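The reasoning here is that a chain of stages can only sustain the rate of its slowest stage. A minimal Python sketch of that idea follows; the stage names and capacity figures are hypothetical, chosen only to illustrate how you would pinpoint the limiting stage once you have measured each one.

```python
from typing import Dict, Tuple


def bottleneck(capacity_eps: Dict[str, float]) -> Tuple[str, float]:
    """Return the slowest stage and the end-to-end rate it permits.

    capacity_eps maps a stage name to the events/second that stage could
    sustain on its own. In a chain of stages, the sustained end-to-end rate
    cannot exceed the slowest stage's capacity.
    """
    stage = min(capacity_eps, key=lambda name: capacity_eps[name])
    return stage, capacity_eps[stage]


# Hypothetical figures for illustration only.
stages = {
    "log-courier (read and send)": 20000.0,
    "Logstash (grok, user-agent, IP lookups)": 1500.0,
    "Elasticsearch (indexing)": 3000.0,
}
print(bottleneck(stages))
```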

Regarding files held open: nowadays I see that as undesirable. When things are working fine, holding a file open for a few minutes after deletion, to make sure everything was read and sent, is a good thing. But if things aren't working fine, it will hold the file until all logs are sent, and that could mean many deleted files held open if Logstash is hitting resource limits. It's something I haven't got around to looking at, but log-courier should really have a setting that stops holding a deleted file open after a specific amount of time.
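A minimal sketch of the kind of setting proposed above, assuming a hypothetical `max_hold` timeout. This is not an existing log-courier option; the function and parameter names are illustrative only.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_release(deleted_at: Optional[datetime], fully_sent: bool,
                   now: datetime, max_hold: timedelta) -> bool:
    """Decide whether to close a file handle (hypothetical policy)."""
    if deleted_at is None:
        return False  # File still exists: keep harvesting it.
    if fully_sent:
        return True   # Deleted and everything shipped: close as usual.
    # Deleted but not fully shipped: give up once max_hold has elapsed,
    # trading possible data loss for freeing the handle and disk space.
    return now - deleted_at >= max_hold


t0 = datetime(2024, 1, 1, 12, 0, 0)
print(should_release(t0, False, t0 + timedelta(minutes=15), timedelta(minutes=10)))
```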

from log-courier.

dale-busse-av commented on July 3, 2024

Hello, and thanks for the response. Wonderful software.

I must have an "older" version (PPA from Ubuntu), because I don't see that metric. This is the output I currently get from running status:

```
0xc000eb0600:
    error: n/a
    harvester:
      codecs: none
      completion: 67.30813995648552
      current_offset: 67702147
      last_eof_offset: n/a
      last_known_size: 100585378
      processed_lines: 374686
      speed_bps: 126984.0948625534
      speed_lps: 689.7015856023845
      stale_bytes: 0
      status: alive
    id: 0xc000eb0600
    orphaned: no
    path: /var/log/mylog.log
    status: running
    type: file
```

Or is there a command line switch I need to add? Thanks again!
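In the harvester status above, "completion" appears to be `current_offset` as a percentage of `last_known_size` (the reported figures match that exactly), which also lets you estimate how long the file has left at the current read speed. A sketch under that assumption; the helper names are hypothetical, and the estimate ignores the file growing while it is read.

```python
def completion_pct(current_offset: int, last_known_size: int) -> float:
    # Share of the file's bytes already read, as a percentage.
    return current_offset / last_known_size * 100


def seconds_to_eof(current_offset: int, last_known_size: int,
                   speed_bps: float) -> float:
    # Remaining bytes divided by the current read rate (bytes per second).
    return (last_known_size - current_offset) / speed_bps


# Figures taken from the status output above.
print(completion_pct(67702147, 100585378))
print(seconds_to_eof(67702147, 100585378, 126984.0948625534))
```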

driskell commented on July 3, 2024

Hmm, would you be willing to post the full status output? Possibly it is called "publisher", not "transport".

dale-busse-av commented on July 3, 2024

Ah yes, I see it now:

```
  endpoints:
    logstash-1:5001:
      averageLatency: 1
      pendingPayloads: 8
      publishedLines: 60600878
      server: logstash-1:5001
      status: Active
    logstash-2:5001:
      averageLatency: 3
      pendingPayloads: 0
      publishedLines: 24883666
      server: logstash-2:5001
      status: Active
  status:
    pendingPayloads: 10
    publishedLines: 85483520
    speed: 707.028702900108
reload: callback
version: 2.0.6
```

OK, so if that stays at 10, then Logstash is my issue, correct?

driskell commented on July 3, 2024

It looks like one endpoint isn't doing much, as it has 0 pending. What's the "method"? I think the default is random, which only sends to one endpoint at any one time and uses the others as failover. If both are able to process, it may be worth trying the loadbalance method.

https://github.com/driskell/log-courier/blob/master/docs/Configuration.md#method
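To illustrate the difference described above, here is a toy Python model. It is not log-courier's implementation: it assumes "random" sends every payload to one randomly chosen endpoint (failover is not modelled), and "loadbalance" hands each new payload to the endpoint with the fewest pending. The function names are hypothetical.

```python
import random
from typing import Dict, List


def assign_random(endpoints: List[str], payloads: int, seed: int = 0) -> Dict[str, int]:
    """Toy 'random' method: one randomly chosen endpoint receives everything."""
    chosen = random.Random(seed).choice(endpoints)
    counts = {name: 0 for name in endpoints}
    counts[chosen] = payloads
    return counts


def assign_loadbalance(endpoints: List[str], payloads: int) -> Dict[str, int]:
    """Toy 'loadbalance' method: each payload goes to the least-loaded endpoint."""
    counts = {name: 0 for name in endpoints}
    for _ in range(payloads):
        target = min(endpoints, key=lambda name: counts[name])
        counts[target] += 1
    return counts


eps = ["logstash-1:5001", "logstash-2:5001"]
print(assign_random(eps, 10))
print(assign_loadbalance(eps, 10))
```

Under these assumptions, one endpoint sitting at 0 pending is what the "random" model produces, while "loadbalance" spreads the 10 payload slots across both.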

It is reporting a speed of about 700 lines per second, though, so maybe it is catching up? If you started log-courier with lots of full files, it might still be sending everything and might catch up eventually.
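Catching up only happens if lines are sent faster than new ones are written. A small sketch of that arithmetic: the roughly 700 lines/s send rate comes from the status output, while the backlog size and write rate are hypothetical inputs you would need to measure yourself.

```python
def backlog_drain_seconds(backlog_lines: int, send_lps: float,
                          write_lps: float) -> float:
    """Time to clear a backlog while new lines keep arriving.

    Returns infinity when lines are written at least as fast as they are
    sent, because the backlog then never shrinks.
    """
    net_lps = send_lps - write_lps
    if net_lps <= 0:
        return float("inf")
    return backlog_lines / net_lps


# Hypothetical backlog and write rate; 700 lines/s is the reported send speed.
print(backlog_drain_seconds(1_000_000, send_lps=700.0, write_lps=200.0))
```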

dale-busse-av commented on July 3, 2024

Method is "loadbalance". I ran a status again and now I see this:

```
  endpoints:
    logstash-1:5001:
      averageLatency: 2
      pendingPayloads: 5
      publishedLines: 61546030
      server: logstash-1:5001
      status: Active
    logstash-2:5001:
      averageLatency: 3
      pendingPayloads: 5
      publishedLines: 25255378
      server: logstash-2:5001
      status: Active
  status:
    pendingPayloads: 10
    publishedLines: 86800384
    speed: 0
reload: callback
version: 2.0.6
```

Would adding another Logstash instance help? Or, since pending payloads is at 10, would it not do much in this case?

driskell commented on July 3, 2024

Ah, here we have a speed of 0, which is in essence a momentarily blocked pipeline, so yes, I think there is some bottleneck on the Logstash side.

Adding another could help, but one thing I found helpful was to also optimise the processing in Logstash. Specifically, consider looking at the actions taken on events, such as grok, as I know that tends to be the slowest. Some patterns can be slower than others.

driskell commented on July 3, 2024

It's worth checking Logstash CPU etc. too, and the ES cluster CPU, to make sure you pinpoint where the constraint is.

dale-busse-av commented on July 3, 2024

I tweaked my Logstash instances using the workers and batch size settings, and CPU/memory are doing just fine, so this leads me to believe the problem is on the Elasticsearch side of things, since those nodes are CPU bound.

dale-busse-av commented on July 3, 2024

As a follow-up: when I run lc-admin to get a status, I see pending payloads maxed at 10, which, as stated above, means it is waiting on Logstash. I have 2 Logstash servers, load balanced, and both have CPUs (x4) running at about 50%, memory is not fully consumed, and there is no IOWait. In your experience, could this also be a network issue?

driskell commented on July 3, 2024

You could try increasing the max pending payloads. Memory usage will increase, but perhaps the throughput is such that 10 is not enough. Essentially, it dictates the maximum amount of data in transit at any one moment.
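One way to reason about this setting is Little's law: throughput is at most the amount of data in flight divided by how long each payload spends in flight. A sketch under stated assumptions: each payload is assumed to carry a fixed number of lines (1024 here is purely illustrative, not a confirmed default), and the round-trip time is hypothetical.

```python
def max_in_flight_lines(max_pending: int, lines_per_payload: int) -> int:
    # Upper bound on lines sent but not yet acknowledged.
    return max_pending * lines_per_payload


def throughput_ceiling_lps(max_pending: int, lines_per_payload: int,
                           round_trip_s: float) -> float:
    # Little's law: throughput <= items in flight / time each item is in flight.
    return max_in_flight_lines(max_pending, lines_per_payload) / round_trip_s


# Hypothetical payload size and round-trip time.
print(throughput_ceiling_lps(10, 1024, round_trip_s=2.0))
print(throughput_ceiling_lps(20, 1024, round_trip_s=2.0))
```

Raising the limit lifts this ceiling only while round-trip time stays the same; if Logstash or Elasticsearch is already saturated, the round trip grows instead, and the extra in-flight data mostly adds memory use.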

dale-busse-av commented on July 3, 2024

The documentation says the default is 4. Is that correct, or is it 10? Since I see pendingPayloads: 10, I'm assuming I should increase it beyond 10. Thanks for the help!

dale-busse-av commented on July 3, 2024

Thank you for all the help! Great app.

driskell commented on July 3, 2024

@dale-busse-av No problem at all! Thanks for the patience. Did you manage to get things improved in the end?
You're right, too: that default is incorrect. I'll remember to update that eventually.

dale-busse-av commented on July 3, 2024

Yep, we did. Thanks to your help, I was able to determine that my ES cluster was the bottleneck. I learned about log-courier along the way and appreciated all the help. Thanks again!!
