Comments (12)
The multiline codec is now deprecated in newer versions of Logstash, which makes codec support much simpler and more reliable.
I'm also looking at JSON decoding in the shipper.
from log-courier.
Hi @bcicen
Log Courier produces structured data from the log file. It takes the line, the host, the path, and any additional fields to generate the event, so the resulting event transmitted to Logstash already has a JSON structure to it.
The best way at the moment to decode and "merge" is to use a filter on the Logstash side to decode the line field. I don't plan to allow codecs in the plugin, as those generally expect single-line data rather than structured data.
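A minimal sketch of that filter-side decode, assuming the JSON payload arrives in the message field as in the events shown later in this thread:
filter {
  json {
    source => "message"
  }
}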
I am considering adding a JSON codec to Log Courier in the future, to do this on the client side, which would save a filter on the indexers. Compared to the multiline and filter codecs it is a very small resource gain, however, as JSON decoding in Logstash is very fast indeed.
Jason
from log-courier.
+1
from log-courier.
Maybe I'm wrong, but I believe the OP is just referring to supporting this: http://logstash.net/docs/1.4.2/codecs/json. Logstash Forwarder supports it, even though I don't see any reference to it in their source code. Basically, given a JSON log 'line' of:
{
  "foo": "bar",
  "bin": "baz",
  "message": "this is the message"
}
should produce a source of:
{
  "_index": "logstash-2015.01.16",
  "_type": "...",
  "_id": "AUrz8DM4tItrB37ovWRo",
  "_score": 1,
  "_source": {
    "host": "...",
    "message": "this is the message",
    "foo": "bar",
    "bin": "baz",
    "offset": 3288,
    "path": "/opt/app/logs/log.json",
    "type": "devlog",
    "@version": "1",
    "@timestamp": "2015-01-16T18:10:12.716Z"
  }
}
Instead log-courier does this, squashing the JSON entry into the message field:
{
  "_index": "logstash-2015.01.16",
  "_type": "...",
  "_id": "AUrz8DM4tItrB37ovWRo",
  "_score": 1,
  "_source": {
    "host": "...",
    "message": "{ \"message\": \"this is the message\", \"foo\": \"bar\", \"bin\": \"baz\"}",
    "offset": 3288,
    "path": "/opt/app/logs/log.json",
    "type": "devlog",
    "@version": "1",
    "@timestamp": "2015-01-16T18:10:12.716Z"
  }
}
All I do to configure this with LSF is:
input {
  lumberjack {
    codec => "json"
  }
}
output {
  elasticsearch {
    host => "localhost"
    protocol => "transport"
  }
}
I don't need any filters. Again, I can't even tell if LSF is doing anything special to support this; I certainly don't see any references to a json codec in their codebase. I tried both json and json_lines, but both just embed the whole JSON structure inside message.
from log-courier.
This is my workaround for the time being:
input {
  lumberjack {
    port => 9000
    codec => "json"
  }
  courier {
    port => 9001
    codec => "json"
  }
}
filter {
  if [shipper] == "lc-1.3" {
    json {
      source => "message"
    }
  }
}
I have my clients declare a 'shipper' field of lc-1.3. That way, if we ever get a JsonCodec, I can just change that value to the new version and won't double-parse the JSON in the future.
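A rough sketch of how that field can be declared on the client, assuming log-courier's files/fields configuration layout (the path here is just illustrative):
{
  "files": [
    {
      "paths": [ "/opt/app/logs/log.json" ],
      "fields": { "type": "devlog", "shipper": "lc-1.3" }
    }
  ]
}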
from log-courier.
I'm going to look at this again.
The issue is that LSF does not stream logs to the codec properly. So if the codec is, say, multiline, it corrupts entries by mixing entries from different clients together. JSON would work OK - the problem is streaming codecs like multiline. I just do not want to implement something that is inherently broken, even if it works "sometimes", and that's why I removed it when I forked.
I can see the TCP input has a real working implementation where streams are handled correctly. I'll use that as a reference point. The internal queue in the courier plugin looks to be the only barrier, but I'll know more once I can sit down and have another think.
It would definitely be useful to support codecs: now that Logstash 1.5 makes installing third-party plugins and codecs so easy, it would be silly not to let people take advantage of them.
from log-courier.
+1
from log-courier.
Things work great with JSON etc., so resolving this ticket would be feasible. However, it would then mean the "multiline" codec could be used, and this is where things get complicated.
If there's a multiline event that consists of 11 lines and only 10 lines have been received so far, but not the final 11th line, the data cannot be acknowledged; otherwise, if we lose the connection or Logstash crashes, the chunk is lost and a lone-wolf single-line event appears (I see this as corruption).
Overall this means some heavy work on the acknowledgement code. I've done some initial work in a feature branch to allow the codecs; it's just missing the heavy work on acknowledgements.
from log-courier.
Further thoughts:
What if a codec is added that performs other types of modifications, such as filtering events or combining them in an arbitrary fashion? Such codecs would be completely impossible to handle with acknowledgements without the codec being aware that we are aiming for guaranteed delivery.
As such, it may be that PR #95 is all we need for now - but we're back in the realm where some codecs will just act strangely and break.
Proposal: I will hardcode that only plain, json, and other tested codecs are allowed, throwing a configuration error otherwise.
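Under that proposal, configuration would presumably look the same as it does today - a sketch, with the exact whitelist of tested codecs still to be decided:
input {
  courier {
    port => 9001
    codec => "json"   # plain, json, and other tested codecs accepted; anything else raises a configuration error
  }
}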
Interestingly, I noticed someone started looking at a guaranteed pipeline in Logstash before I did:
catalyst@5b9d27b
Further work there could mean codecs themselves are tagged as "I support guaranteed delivery", so that events aren't acknowledged until they reach Elasticsearch ... definitely the path to go.
from log-courier.
+1 for json codec support, filters are not needed
from log-courier.
+1 for json codec support
from log-courier.
+1
from log-courier.