Giter Site home page Giter Site logo

Comments (36)

mkedwards avatar mkedwards commented on August 19, 2024

For discussion's sake, this is the bit of code that could be reworked to use a JsonGenerator instead of an ObjectWriter:

 86     private String render(LogEvent event, ObjectNode srcRootNode) {
 87 
 88         // Resolve node and bail-out if empty.
 89         JsonNode dstRootNode = resolveNode(event, srcRootNode);
 90         if (dstRootNode.isNull()) {
 91             return "{}";
 92         }
 93 
 94         // Serialize resolved node.
 95         try {
 96             ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
 97             try (OutputStreamWriter writer = new OutputStreamWriter(outputStream, JSON_CHARSET)) {
 98                 objectWriter.writeValue(writer, dstRootNode);
 99                 writer.write(System.lineSeparator());
100             }
101             return outputStream.toString(JSON_CHARSET_NAME);
102         } catch (IOException error) {
103             throw new RuntimeException("failed serializing JSON", error);
104         }
105 
106     }

It would be ideal not to have to create the dstRootNode intermediate object at all. As a halfway step to that, I think it would be useful to subclass the MultiformatMessage interface (or add an interface mixin or something) to add a streamMessageTo(JsonGenerator out) method. Pass it an instance buffer of TokenBuffer, and then use objectMapper.readTree(buffer.asParser()) to get a JsonNode that can be inserted into dstRootNode.

from log4j2-logstash-layout.

mkedwards avatar mkedwards commented on August 19, 2024

Note that streaming events into a JsonGenerator implementation is also a good way to serialize to XML. See FasterXML/jackson-dataformat-xml#216 for details of the setup (which the Message doesn't need to care about; it's the Layout's job to prepare the outer envelope).

from log4j2-logstash-layout.

vy avatar vy commented on August 19, 2024

@mkedwards, really appreciated the detailed feedback and hard work (incl. #18)! Please, keep them coming! (Apologies for my slow pace. Could not take much time off from my day job due to upcoming season in e-commerce business.)

In addition to things you mentioned, in its current state, resolveNode() is working more like an interpreter rather than a compiler. That is, at each invocation it repeats many if/else branching on the templateRootNode tree, whereas they could have checked once and compiled the appropriate Function<LogEvent, JsonNode> function to avoid the repeating branching.

But before diving into nitty gritty details of all this optimization tricks, I first want to build a harness to enable us to quantify the performance impact of changes. For this purpose, I will add a JMH benchmark for measuring the rendering performance. Give me 1-2 days to push it to the master.

from log4j2-logstash-layout.

vy avatar vy commented on August 19, 2024

TODOs:

  • Accept ObjectMapper as a parameter in LogstashLayout.Builder.
  • Use Jackson Afterburner module.

from log4j2-logstash-layout.

vy avatar vy commented on August 19, 2024

Adding Afterburner is not improving anything -- makes sense, we are not doing any POJO serialization. Long live benchmarks!

from log4j2-logstash-layout.

mkedwards avatar mkedwards commented on August 19, 2024

I love what you're doing, and there's no need to apologize for the pace. We're going through the open-sourcing motions here, should have something out that uses the Streamable API shortly.

from log4j2-logstash-layout.

mkedwards avatar mkedwards commented on August 19, 2024

Public now: https://github.com/nexiahome/formatted_data_message/tree/logstash-streamable

from log4j2-logstash-layout.

vy avatar vy commented on August 19, 2024

I think fish tagging and using JsonGenerator are two separate concerns, let me address them individually.

Fish Tagging

Trying to push a streamable LogEvent API to the upstream (that is, Log4j 2.X master) is quite some challenge. Further, due to the consumer agnostic nature of streaming, our root visitor will need to pass over each emitted item (exception, log level, message, etc.), even though it might not necessarily be interested in particular items.

In the commit 591ed46 that I pushed to master, I traverse the template JSON object once and compile a function that accesses only to the fields of interest. AFAIC, this approach delivers the same efficiency goal the fish tagging aims and avoids unnecessary visits to fields that are not of interest.

Using JsonGenerator

I spent quite some time to get JsonGenerator working, though failed each time due to the following short-coming: there is no clean way to exclude empty/null fields in JsonGenerator, AFAIK. You can see my work in json-generator branch (221e68c). Layout works, but fails to clean-up null fields.

Consider the following case: Given {"exception": {"stacktrace": "${json:exceptionStackTrace}"}} template, how would you know if you need to start exception object without knowing if stacktrace is available in the LogEvent, it will not be excluded due to an exclusion pattern, etc.? In json-generator branch, see the output of the failing test.

One important remark: Replacing ObjectMapper with JsonGenerator raised the bar from 29k log/sec to 37k log/sec on a single thread!

Conclusion

At this stage, plugin delivers 29k log/sec. (Note that these are full blown LogEvents, that is, they contain MDC, NDC, and all the exception whistles and bells. In reality, the majority of the log messages are way more lightweight.) I can employ ThreadLocal<ByteArrayOutputStream> in TemplateRenderer, use TransactionalByteArrayOutputStream for the JsonGenerator short-coming aforementioned, etc. But I have my doubts about the functionality we will get in return compared to the complexity we will be introducing to a simple, maintainable code base. What do you think? What is the use case that would necessitate a throughput more than 29k log/sec?

from log4j2-logstash-layout.

mkedwards avatar mkedwards commented on August 19, 2024

Distributed trace/profiling infrastructure, à la Google Dapper https://ai.google/research/pubs/pub36356 . A colleague and I built something similar a few years ago for a JavaScript/Python/C++ ETL/datastore/query architecture, with always-on tracing down to tasks with a granularity less than a millisecond, at an overhead of (depending on how it was measured) 1/2 to 2 percent. We then fed the trace data into an instance of our own product and used it to drive performance work. I find myself needing to do something similar again, this time in a JS/Ruby/Java world. The last time around we paid a lot of attention to minimizing allocation churn in the logging and thread-context maintenance pathways, and that paid off in minimal overhead. Don't see why it'd be any different this time. 😎

I'll take a look at the null-handling issues you've encountered, and see if I can propose something constructive. We've got one strategy for that built into the lazy-evaluation/memoization wrapper we whipped up, which isn't yet hardened for production but sort of illustrates the idea. We have some work to do around conditional logging too. And we're using this in a Netty-based event-driven system, so context agility is very important. (Probably needs a custom ContextDataInjector.)

from log4j2-logstash-layout.

mkedwards avatar mkedwards commented on August 19, 2024

The other thing that we found to be very important in the previous project was readability/familiarity of the syntactic sugar at the log site. In C++ that was handled with macro/template magic to interpolate tagged values into a template string, with automagical fallbacks for untagged parameters, all wrapped up in a syntax identical to the Google-logging library we were already using. We'll want to do something similar in Java-space, probably involving horrible generic/reflection machinery. I'm not that clever at Java 8+ yet, but I fear that's going to have to change.

from log4j2-logstash-layout.

vy avatar vy commented on August 19, 2024

Fair enough. I am working on a solution.

from log4j2-logstash-layout.

mkedwards avatar mkedwards commented on August 19, 2024

Very cool! Thank you, and please let me know how I can help. I will be baking this into our Netty-based system soon -- we're currently sorting out thread affinity stuff -- and should be able to produce a relevant parallel benchmark that spews per-message log traffic into kafka for later aggregation into per-device traffic stats.

from log4j2-logstash-layout.

vy avatar vy commented on August 19, 2024

Done. In json-generator branch, see the benchmark results reported in README.md. I would really appreciate an in-detail review of the commit. If you approve, I will merge it to master and put the ribbon for a new release. For the records, I have GiST'ed the output of LogstashLayoutBenchmark on my machine.

from log4j2-logstash-layout.

mkedwards avatar mkedwards commented on August 19, 2024

Fantastic. I'll give it a good focused review tomorrow, probably with Eric at my elbow, and bake it into our soon-to-go-to-production Netty-based system where we can get some real-world metrics.

from log4j2-logstash-layout.

mkedwards avatar mkedwards commented on August 19, 2024

I haven't forgotten about this. Getting to it now. Just had to rework a library under our Netty thing first, so I could get a high-volume logger where I wanted it.

from log4j2-logstash-layout.

vy avatar vy commented on August 19, 2024

Just finished wrapping up the GC-free version of the layout. I increased the performance up to 1,109,501 msg/sec with 0.645 KB/sec alloc rate. I am busy with spitting out a plot after the JMH benchmark runs.

from log4j2-logstash-layout.

vy avatar vy commented on August 19, 2024

@mkedwards, @emschwar, I am done with my changes. Would you mind taking a look when you have time, please?

from log4j2-logstash-layout.

mkedwards avatar mkedwards commented on August 19, 2024

Excellent. I picked up your garbage-free layout change earlier today and have just started testing with it. I'll rebase on your latest push and give it a real review as well as testing.

By the way, have you thought about reloading the config file when it changes, the way that log4j2 itself does? I've gotten partway there by changing the template loading to use ConfigurationSource, which gives me a file to watch (and gives classloader-dependent classpath search consistent with log4j2's). Now I just need to figure out what to do when we notice a change.

from log4j2-logstash-layout.

vy avatar vy commented on August 19, 2024

@mkedwards, hmmm..., no, we do not listen configuration changes. Let me check that. But please do review the rest at your convenience.

from log4j2-logstash-layout.

mkedwards avatar mkedwards commented on August 19, 2024

I'm having trouble combining the KafkaAppender and LogstashLayout. That used to work, but it regressed recently. Any idea offhand why?

from log4j2-logstash-layout.

mkedwards avatar mkedwards commented on August 19, 2024

To be precise, I'm using this branch of our fork of log4j2-logstash-layout: https://github.com/nexiahome/log4j2-logstash-layout/tree/json-generator

and this branch of our streamable message class: https://github.com/nexiahome/formatted_data_message/tree/logstash-streamable

and when I run mvn test in formatted_data_message, which logs some JSON-formatted messages to both console and kafka, the console version looks fine and the kafka version has no message body. Something about the LogstashLayout being attached to the KafkaAppender doesn't work quite right.

I'll try to reproduce this without our forkage, but if you have any insight into where I might look, that would be quite helpful.

from log4j2-logstash-layout.

mkedwards avatar mkedwards commented on August 19, 2024

For context, here's what I'm currently getting out of the JSON formatter:

{
  "@version" : 1,
  "@timestamp" : "2018-11-07T11:49:27.774-08:00",
  "source_host" : "BitPushers-MacBook-Pro-4.local",
  "level" : "INFO",
  "logger_name" : "org.apache.logging.log4j.message.lazy.FormattedDataMessageTest",
  "thread_name" : "main",
  "message" : {
    "type" : "some_message_type",
    "id" : "some_message_id",
    "message" : "This is an exception message. a=aVal b=bVal",
    "template" : "This is an exception message. a=%(a) b=%(b)",
    "a" : "aVal",
    "b" : "bVal",
    "c" : "cVal"
  },
  "class" : "org.apache.logging.log4j.message.lazy.FormattedDataMessageTest",
  "method" : "throwerLogs",
  "file" : "FormattedDataMessageTest.java",
  "line_number" : 134,
  "exception" : {
    "exception_class" : "java.lang.RuntimeException",
    "exception_message" : "Nolan Ryan pitch",
    "stacktrace" : "java.lang.RuntimeException: Nolan Ryan pitch\n\tat org.apache.logging.log4j.message.lazy.FormattedDataMessageTest.throwerLogs(FormattedDataMessageTest.java:132)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)\n\tat org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:515)\n\tat org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:115)\n\tat org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:171)\n\tat org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:72)\n\tat org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:167)\n\tat org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:114)\n\tat org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:59)\n\tat org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:105)\n\tat org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:72)\n\tat org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:95)\n\tat org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:71)\n\tat java.base/java.util.ArrayList.forEach(ArrayList.java:1540)\n\tat org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38)\n\tat org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:110)\n\tat org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:72)\n\tat org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:95)\n\tat org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:71)\n\tat java.base/java.util.ArrayList.forEach(ArrayList.java:1540)\n\tat org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38)\n\tat org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:110)\n\tat org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:72)\n\tat org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:95)\n\tat org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:71)\n\tat org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:32)\n\tat org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57)\n\tat org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:51)\n\tat org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:220)\n\tat org.junit.platform.launcher.core.DefaultLauncher.lambda$execute$6(DefaultLauncher.java:188)\n\tat org.junit.platform.launcher.core.DefaultLauncher.withInterceptedStreams(DefaultLauncher.java:202)\n\tat org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:181)\n\tat org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:128)\n\tat org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invokeAllTests(JUnitPlatformProvider.java:142)\n\tat org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invoke(JUnitPlatformProvider.java:117)\n\tat org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)\n\tat org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)\n\tat org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)\n\tat org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)\n"
  }
}
{
  "@version" : 1,
  "@timestamp" : "2018-11-07T11:49:28.150-08:00",
  "source_host" : "BitPushers-MacBook-Pro-4.local",
  "level" : "INFO",
  "logger_name" : "org.apache.logging.log4j.message.lazy.FormattedDataMessageTest",
  "thread_name" : "main",
  "message" : {
    "type" : "a_message_type",
    "id" : "a_message_id",
    "message" : "This is a message. a=aVal b=bVal",
    "template" : "This is a message. a=%(a) b=%(b)",
    "a" : "aVal",
    "b" : "bVal",
    "c" : "cVal"
  },
  "class" : "org.apache.logging.log4j.message.lazy.FormattedDataMessageTest",
  "method" : "loggerLogs",
  "file" : "FormattedDataMessageTest.java",
  "line_number" : 125
}

Our goal is to be able to stream structured logging messages into kafka (or similar plumbing) and ingest them from there into a mostly column-oriented datastore. Entries in the column-oriented datastore don't keep any of the non-timestamp high-cardinality columns, such as message.message or exception.stacktrace. When you've got to see those columns, you go get the raw log message, looking up from a hash over a few columns that get you a reasonable degree of uniqueness (@timestamp, source_host, maybe line_number) into a document store. But for most kinds of analysis, it's the message.id and the structured message parameters that you want to see, along with @timestamp and source_host and context parameters. (That's where the ContextDataInjector comes in; we haven't written that part yet, but I hope to get to it soon, once we've got the ingestion part up and running.)

from log4j2-logstash-layout.

mkedwards avatar mkedwards commented on August 19, 2024

Okay, so what's going on here is that the Kafka appender wants toByteArray(). You probably want to inherit from AbstractStringLayout instead of Layout<String> to get the fast conversion from the String returned by toSerializable() to a byte[]. We've done this in our fork; I'll disentangle that from our version/artifactory noise and send you a separate PR.

from log4j2-logstash-layout.

vy avatar vy commented on August 19, 2024

I had deliberately switched from AbstractStringLayout to Layout<String> to have a gc-free rendering. AbstractStringLayout requires a String toSerializable(LogEvent) method, which, as you can guess, spoils the entire parade of tricks for gc-free plumbing. (How would you create a String without generating garbage?) I am curious why KafkaAppender requires the layout to extend from AbstractStringLayout.

from log4j2-logstash-layout.

mkedwards avatar mkedwards commented on August 19, 2024

It doesn't require it to extend from there; that just seemed like the smoothest way to get toByteArray(). Tell me more about how gc-free rendering works, and let's figure out how to get KafkaAppender to use that path.

from log4j2-logstash-layout.

vy avatar vy commented on August 19, 2024

Disclaimer: Following is my interpretation of the Log4j 2 source code. I am neither a Log4j 2 expert, nor committer, but just someone with quite some experience above the average.

Every layout is expected to implement org.apache.logging.log4j.core.Layout<T extends Serializable>, which extends Encoder<LogEvent>. There, the javadoc states the following: Since 2.6, Layouts can encode (Encoder#encode(Object, ByteBufferDestination)) a LogEvent directly to a ByteBufferDestination without creating temporary intermediary objects. That is, implementing Layout<String> in LogstashLayout is 1) sufficient for an interface compliant Log4j 2 layout and 2) provides access to garbage-free layout rendering via Encoder#encode().

Assuming you are using the default KafkaAppender packaged with Log4j 2, I am really puzzled why it stopped working.

(Almost) all the default Log4j 2 layouts (JSONLayout, XMLLayout, etc.) extend from AbstractStringLayout, which implements StringLayout, which extends Layout<String> with an extra Charset getCharset() method. AbstractStringLayout contains the base set of utilities to ease String rendering, though these are not really usable for a gc-free LogstashLayout, hence I implemented Layout<String> directly myself. What I can further try is implementing StringLayout and see if that helps. I first need to reproduce the KafkaAppender problem locally. Need to get a Kafka Docker image, etc.

I will keep you updated.

from log4j2-logstash-layout.

vy avatar vy commented on August 19, 2024

Sorry, totally my mistake. It should have been fixed with 72db860 that I've just pushed. Would you mind testing further, please? I really would like to ship a release for this branch.

from log4j2-logstash-layout.

mkedwards avatar mkedwards commented on August 19, 2024

Thanks; this looks like a clean, efficient way to provide the needed interface. I'll apply our local changes (mostly just the Streamable interface) on top of it and verify.

from log4j2-logstash-layout.

vy avatar vy commented on August 19, 2024

For the records, (as of date) KafkaAppender is not garbage-free, hence it requires a handle to the byte[] rather than implementing ByteBufferDestination.

from log4j2-logstash-layout.

mkedwards avatar mkedwards commented on August 19, 2024

Maybe we can fix that in KafkaAppender?

from log4j2-logstash-layout.

vy avatar vy commented on August 19, 2024

Well... That depends:

  • If you want to have a garbage-free KafkaAppender, that will probably necessitate a garbage-free Kafka client too. And sorry, I will pass that.
  • If you just want to make the KafkaAppender work with ByteBufferDestinations, that should be doable. But I don't really have any experience with Kafka Java client and hence don't know what such a feature will entail.

from log4j2-logstash-layout.

mkedwards avatar mkedwards commented on August 19, 2024

For what it's worth, I don't think it would be terribly difficult to extend the Kafka producer code to implement ByteBufferDestination. There are half a dozen layers of plumbing on the way down to https://github.com/apache/kafka/blob/1.1.1/clients/src/main/java/org/apache/kafka/common/record/DefaultRecord.java#L175, but they all basically treat byte[] value as opaque. It would be pretty straightforward to wrap LogstashLayout.encode(LogEvent source, ByteBufferDestination destination) in a lambda that captures this and source and adapts ByteBufferDestination to java.io.DataOutputStream. The result would be assignable to java.util.function.Consumer<java.io.DataOutputStream>. Plumb that through KafkaAppender to KafkaProducer and down to DefaultRecord, and we're done.

from log4j2-logstash-layout.

mkedwards avatar mkedwards commented on August 19, 2024

Verified that KafkaAppender works fine after rebasing. Can't yet say I've fully reviewed the code; blocking out time on Tuesday for that.

from log4j2-logstash-layout.

vy avatar vy commented on August 19, 2024

@mkedwards, did you have chance to test in your side?

from log4j2-logstash-layout.

vy avatar vy commented on August 19, 2024

Released v0.15. Let's address further discussion in another ticket.

from log4j2-logstash-layout.

mkedwards avatar mkedwards commented on August 19, 2024

Sorry I haven't updated this one in a bit. We're running happily in QA with the code state at https://github.com/nexiahome/log4j2-logstash-layout/tree/eb35c1c208cf534bf493cc25946c700262f6ed2d , which is based on bbda80a . I'll rebase on v0.15 as released in advance of our canary in production (planned for Monday).

from log4j2-logstash-layout.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.