Comments (12)

driskell commented on July 23, 2024

The multiline codec is now deprecated in newer Logstash, so this makes codec support much simpler and more reliable.

I'm also looking at JSON decoding in the shipper.

driskell commented on July 23, 2024

Hi @bcicen

Log Courier produces structured data out of the log file. It takes the line, the host, the path, and any additional fields to generate the event, so the resulting event transmitted to Logstash already has JSON structure to it.

The best way at the moment to decode and "merge" is to use a filter on the Logstash side to decode the line field. I don't plan to allow codecs in the plugin, as those generally expect single-line data rather than structured data.
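
For example, a minimal sketch of that filter, assuming the raw line ends up in the message field as in the examples later in this thread (adjust source to whichever field holds the JSON):

filter {
  json {
    source => "message"
  }
}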

I am considering adding a JSON codec to Log Courier in the future to do this on the client side, which would save a filter on the indexers. It's a really small resource gain compared to the multiline codec and filters, however, as JSON decoding in Logstash is very fast indeed.

Jason

mahnunchik commented on July 23, 2024

+1

matthughes commented on July 23, 2024

Maybe I'm wrong, but I believe the OP is just referring to supporting this: http://logstash.net/docs/1.4.2/codecs/json. Logstash Forwarder supports it, yet I don't see any reference to it in their source code. Basically, given a JSON log 'line' of:

{
  "foo": "bar",
  "bin": "baz",
  "message": "this is the message"
}

should produce a source of:

{
  "_index": "logstash-2015.01.16",
  "_type": "...",
  "_id": "AUrz8DM4tItrB37ovWRo",
  "_score": 1,
  "_source": {
    "host": "...",
    "message": "this is the message",
    "foo": "bar",
    "bin": "baz"
    "offset": 3288,
    "path": "/opt/app/logs/log.json",
    "type": "devlog",
    "@version": "1",
    "@timestamp": "2015-01-16T18:10:12.716Z"
  }
}

Instead, log-courier does this, squashing the JSON entry into the message field:

{
  "_index": "logstash-2015.01.16",
  "_type": "...",
  "_id": "AUrz8DM4tItrB37ovWRo",
  "_score": 1,
  "_source": {
    "host": "...",
    "message": "{ \"message\": \"this is the message\", \"foo\": \"bar\", \"bin\": \"baz\"}"
    "offset": 3288,
    "path": "/opt/app/logs/log.json",
    "type": "devlog",
    "@version": "1",
    "@timestamp": "2015-01-16T18:10:12.716Z"
  }
}

All I do to configure this with LSF is:

input {
  lumberjack {
    codec => "json"
  }
}

output {
   elasticsearch {
      host => "localhost"
      protocol => "transport"
   }
}

I don't need any filters. Again, I can't even tell if LSF is doing anything special to support this; I certainly don't see any references to a json codec in their codebase. I tried both json and json_lines, but both just embed the whole JSON structure inside the message field.

matthughes commented on July 23, 2024

This is my workaround for the time being:

input {
  lumberjack {
    port => 9000
    codec => "json"
  }

  courier {
    port => 9001
    codec => "json"
  }
}

filter {
  if [shipper] == "lc-1.3" {
      json {
        source => "message"
      }
  }
}

I have my clients declare a 'shipper' field of lc-1.3. That way, if we ever get a JsonCodec, I can just change that value to the new version and won't double-parse the JSON in the future.
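
For reference, a sketch of the client-side half of this workaround; this assumes a Log Courier file stanza roughly like the following, with the path and type taken from the earlier example:

{
  "files": [
    {
      "paths": [ "/opt/app/logs/log.json" ],
      "fields": {
        "type": "devlog",
        "shipper": "lc-1.3"
      }
    }
  ]
}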

driskell commented on July 23, 2024

I'm going to look at this again.

The issue is that LSF does not stream logs to the codec properly, so if the codec is, say, multiline, it corrupts entries by mixing entries from different clients together. JSON would work OK; the problem is streaming codecs like multiline. I just do not want to implement something that is inherently broken, even if it works "sometimes", and that's why I removed it when I forked.
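
For illustration, the kind of configuration in question attaches a streaming codec to the input; a sketch only, with a placeholder pattern and SSL options omitted as in the earlier examples:

input {
  lumberjack {
    port => 9000
    codec => multiline {
      pattern => "^\s"
      what => "previous"
    }
  }
}

With lines from different clients interleaved into a single codec, entries from separate sources can be folded into the same multiline event, which is the corruption described above.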

I can see the TCP input has a real, working implementation where streams are handled correctly, so I'll use that as a reference point. The internal queue in the courier plugin looks to be the only barrier, but I'll know more once I can sit down and have another think.

It would definitely be useful to support codecs: Logstash 1.5 makes installing third-party plugins and codecs so easy that it would be silly not to let people take advantage of them.

jenshz commented on July 23, 2024

+1

driskell commented on July 23, 2024

Things work great with JSON and the like, so resolving this ticket would be feasible. However, it would then mean the "multiline" codec could be used, and this is where things get complicated.

If a multiline event consists of 11 lines and only 10 have been received so far, but not the final 11th, the data cannot be acknowledged; otherwise, if we lose the connection or Logstash crashes, the chunk is lost and a lone single-line event appears (I see this as corruption).

Overall this means some heavy work on the acknowledgement code. I've done some initial work in a feature branch to allow codecs; it's just missing the heavy work on acknowledgements.

driskell commented on July 23, 2024

Further thoughts:

What if a codec is added that performs other types of modifications, such as filtering events or combining them in an arbitrary fashion? Such codecs would be completely impossible to handle with acknowledgements unless the codec were aware that we are aiming for guaranteed delivery.

As such, it may be that PR #95 is all we need for now, but we're back in the realm where some codecs will just act strangely and break.

Proposal: I will hardcode that only plain, json, and other tested codecs will be allowed, throwing a configuration error otherwise.

Interestingly, I noticed someone started looking at a guaranteed pipeline in Logstash before I did:
catalyst@5b9d27b
Further work there could mean codecs themselves could be tagged as "I support guaranteed delivery", and events would not be acknowledged until they reach Elasticsearch ... definitely the path to take.

landonix commented on July 23, 2024

+1 for json codec support; filters are not needed

padusumilli commented on July 23, 2024

+1 for json codec support

Balanc3 commented on July 23, 2024

+1
