
apache-log-parser's Introduction

Apache Log Aggregating Client

This application is meant to emulate a client that a user would install to provide observability metrics for their Apache server.

How it works

There is one main goroutine and two helper goroutines.

The main goroutine creates an input channel and an output channel, then passes each one to the goroutine that handles that side of the I/O.

The input channel is fed by a simple goroutine that uses the open-source library "github.com/gocarina/gocsv" to deserialize the CSV data.

The output goroutine takes the real-time data and the light calculations done by the main goroutine, performs additional processing (such as interval calculations), and prints any alarms that are triggered.
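One possible shape for the output goroutine's interval calculation is a pure function that folds a batch of records into a summary. The IntervalStats fields (hits, 4xx and 5xx counts, bytes) and the summarize function are illustrative assumptions; the README does not say which statistics are reported.

```go
package main

import "fmt"

// LogRecord is a hypothetical parsed log line.
type LogRecord struct {
	Request string
	Status  int
	Bytes   int
}

// IntervalStats is one possible per-interval summary (fields assumed).
type IntervalStats struct {
	Hits       int
	ClientErrs int // 4xx responses
	ServerErrs int // 5xx responses
	TotalBytes int
}

// summarize folds one interval's records into a single IntervalStats value.
func summarize(batch []*LogRecord) IntervalStats {
	var s IntervalStats
	for _, r := range batch {
		s.Hits++
		s.TotalBytes += r.Bytes
		switch {
		case r.Status >= 500 && r.Status < 600:
			s.ServerErrs++
		case r.Status >= 400 && r.Status < 500:
			s.ClientErrs++
		}
	}
	return s
}

func main() {
	batch := []*LogRecord{
		{Request: "GET /", Status: 200, Bytes: 512},
		{Request: "GET /missing", Status: 404, Bytes: 0},
		{Request: "POST /report", Status: 500, Bytes: 64},
	}
	s := summarize(batch)
	fmt.Printf("hits=%d 4xx=%d 5xx=%d bytes=%d\n", s.Hits, s.ClientErrs, s.ServerErrs, s.TotalBytes)
	// Prints: hits=3 4xx=1 5xx=1 bytes=576
}
```

Keeping the calculation free of I/O makes it easy to unit test apart from the goroutine plumbing.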

The primary reason for this split is to avoid blocking on I/O as much as possible. That said, the output goroutine is probably unnecessary, and its work could be done in the main goroutine. It made the code more complex than I wanted (in particular, it needed an extra sync.WaitGroup just for this purpose), but it was fun to debug, and it would be interesting to try in production code to see whether it makes any difference in performance, so I left it in.
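The three-goroutine layout, including the extra WaitGroup for the output goroutine, can be sketched as follows. The per-record work (measuring line length) and the runPipeline name are placeholders, assumed for illustration only.

```go
package main

import (
	"fmt"
	"sync"
)

// runPipeline sketches the layout: a reader goroutine feeds input, the main
// goroutine does light per-record work, and an output goroutine formats results.
func runPipeline(lines []string) []string {
	input := make(chan string)
	output := make(chan int)
	var printed []string
	var wg sync.WaitGroup

	// Input goroutine: stands in for the CSV reader.
	go func() {
		defer close(input)
		for _, l := range lines {
			input <- l
		}
	}()

	// Output goroutine: the WaitGroup lets the main goroutine wait for it.
	wg.Add(1)
	go func() {
		defer wg.Done()
		for n := range output {
			printed = append(printed, fmt.Sprintf("processed %d bytes", n))
		}
	}()

	// Main goroutine: do the light work and hand results to the output side.
	for l := range input {
		output <- len(l)
	}
	close(output) // no more work: ends the output goroutine's range loop
	wg.Wait()     // block until every queued result has been handled
	return printed
}

func main() {
	for _, line := range runPipeline([]string{"GET /", "POST /report"}) {
		fmt.Println(line)
	}
}
```

Closing output before calling wg.Wait is what makes this safe: the output goroutine leaves its range loop only after it has handled every value already sent, and reading printed after wg.Wait is race-free.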

Once both channels and their goroutines are set up, I run the main loop, which schedules the printing of alarms and interval stats. After the CSV file has been completely read, I flush any remaining data, wait for the output goroutine to finish processing it, and then exit.
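Because the data comes from a file rather than live traffic, one way to schedule interval output is to bucket records by their own timestamps and flush the partial last bucket once the file ends. This sketch assumes records arrive roughly in timestamp order and carry Unix-second timestamps; batchByInterval and its emit callback are hypothetical names.

```go
package main

import "fmt"

// LogRecord keeps only the (assumed) Unix-second timestamp.
type LogRecord struct {
	Date int64
}

// batchByInterval groups records into consecutive interval-second buckets based
// on each record's timestamp, emitting empty buckets for quiet periods, and
// flushes the partial final bucket after the input is exhausted.
func batchByInterval(records []LogRecord, interval int64, emit func(start int64, batch []LogRecord)) {
	if interval <= 0 {
		panic("interval must be positive")
	}
	var batch []LogRecord
	var start int64
	for i, r := range records {
		if i == 0 {
			start = r.Date
		}
		// Close every interval that ended before this record.
		for r.Date >= start+interval {
			emit(start, batch)
			batch = nil // start a fresh slice so the emitted one is never reused
			start += interval
		}
		batch = append(batch, r)
	}
	if len(batch) > 0 {
		emit(start, batch) // final flush once the file is fully read
	}
}

func main() {
	records := []LogRecord{{Date: 100}, {Date: 101}, {Date: 105}, {Date: 112}}
	batchByInterval(records, 5, func(start int64, batch []LogRecord) {
		fmt.Printf("interval starting at %d: records=%d\n", start, len(batch))
	})
	// Prints:
	// interval starting at 100: records=2
	// interval starting at 105: records=1
	// interval starting at 110: records=1
}
```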

How to install

  • Install Go.
  • Run go get.

I ran this on the following Go version, but if you have an older one installed, you can update the go.mod file to use a version earlier than Go 1.15.

$ go version
> go version go1.15.8 darwin/amd64

How to run application

Running is simple. Here's how to run with example parameters. The input file defaults to "input_files/sample_csv.txt".

go run main.go -interval=10 -window-retention=290 -alarm-threshold=10 -input-filepath=<your-filepath>
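The flags in the command above could be declared with the standard flag package as below. Only the flag names and the input-file default come from this README; the int types, the other defaults (copied from the example invocation), the usage strings, and the parseFlags helper are assumptions.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Config mirrors the command-line flags (types and defaults partly assumed).
type Config struct {
	Interval        int
	WindowRetention int
	AlarmThreshold  int
	InputFilepath   string
}

// parseFlags parses args into a Config using a dedicated FlagSet.
func parseFlags(args []string) (Config, error) {
	fs := flag.NewFlagSet("apache-log-parser", flag.ContinueOnError)
	var c Config
	fs.IntVar(&c.Interval, "interval", 10, "length of each stats interval (assumed seconds)")
	fs.IntVar(&c.WindowRetention, "window-retention", 290, "how much history to keep in the window (unit assumed)")
	fs.IntVar(&c.AlarmThreshold, "alarm-threshold", 10, "traffic level that triggers the alarm")
	fs.StringVar(&c.InputFilepath, "input-filepath", "input_files/sample_csv.txt", "path to the CSV log file")
	err := fs.Parse(args)
	return c, err
}

func main() {
	cfg, err := parseFlags(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Using a separate FlagSet with ContinueOnError lets tests parse argument slices without exiting the process.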

How to run tests

go test ./...

Caveats

Interval Printing

Interval printing somewhat violates the Go best practice of sending data rather than sharing memory ("share memory by communicating"). The idea is that goroutines should pass messages to each other to avoid data corruption due to race conditions. In my case, I copy a slice spanning the interval supplied by the user and send it over to be processed for output. This slice holds shallow copies (i.e. pointers to the records) rather than complete copies of the records.

That being said, the buffer is treated as read-only, and this is a standard pattern in producer-consumer systems like Kafka, which I'm trying to emulate. The window is large enough that slices being processed should never be overwritten, since the main goroutine blocks until the output goroutine is done with the previous slice.
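The shallow-copy behavior described above comes down to how copy works on a slice of pointers: the batch gets its own backing array, but the records it points at are shared. LogRecord and snapshot below are hypothetical names used only to illustrate this.

```go
package main

import "fmt"

// LogRecord is a hypothetical record type.
type LogRecord struct {
	Status int
}

// snapshot copies the pointers in window[start:end] into a fresh slice.
// The new slice has its own backing array, so later writes to window's slots
// do not affect it, but the records themselves are shared.
func snapshot(window []*LogRecord, start, end int) []*LogRecord {
	batch := make([]*LogRecord, end-start)
	copy(batch, window[start:end])
	return batch
}

func main() {
	window := []*LogRecord{{Status: 200}, {Status: 404}, {Status: 500}}
	batch := snapshot(window, 0, 2)

	// Replacing a slot in the window does not touch the snapshot...
	window[0] = &LogRecord{Status: 301}
	fmt.Println(batch[0].Status) // 200

	// ...but mutating a shared record is visible through both slices,
	// which is why the consumer must treat the batch as read-only.
	window[1].Status = 418
	fmt.Println(batch[1].Status) // 418
}
```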

Potential Improvements

Output Buffer

At the moment, I'm just printing everything to standard out. This is fine because in production we usually redirect standard out to whatever sink we choose. However, I configured the code to use a writer interface, so replacing standard out with a different writer in the future would be straightforward.

2-minute warning

I left the 2-minute warning window hard-coded. It might be nice to let users configure it as they see fit (e.g. 15 minutes). This would be useful for people with a bursty traffic profile who want to average over a longer period.

Contributors

hardboiled
