netsarlacc's People

Contributors

bmenrigh, btlyons1, joeddy

netsarlacc's Issues

All logging should be done via a single goroutine

Right now any worker that produces logs calls straight into the logging routines. This opens up the possibility for race conditions, especially during log rotation. We should create a logging goroutine and pass all log contents over a channel to eliminate any chance of races.
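A minimal sketch of that pattern, assuming hypothetical names like logChan and loggerLoop (this is not the existing code):

package main

import (
    "fmt"
    "os"
)

// logChan carries finished log lines from workers to the single logging
// goroutine; the name and buffer size here are illustrative.
var logChan = make(chan string, 1024)

// loggerLoop is the only goroutine that ever touches the log file, so log
// rotation (close the old file, open the new one) can also happen here
// without racing against the workers.
func loggerLoop(f *os.File) {
    for line := range logChan {
        fmt.Fprintln(f, line)
    }
}

Workers would then just send their encoded JSON over the channel (logChan <- encodedJSON) instead of calling into the logging routines directly.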

Logging is prepending a timestamp on our JSON output

Example:

2017/03/02 22:27:30 {"timestamp":"2017-03-02 22:27:30.00813237 +0000 UTC","bytes_client":"144","http_method":"GET","url_path":"/test/path/here.txt?crap","http_version":"HTTP/1.1","http_user_agent":"curl/7.52.1","dest_name":"127.0.0.1:3333","http_referer":"\u003cscript\u003ealert(\"pwned\");\u003c/script\u003e","src_ip":{"IP":"127.0.0.1","Port":58328,"Zone":""},"dest_ip":"127.0.0.1:3333","raw_data":"474554202f746573742f706174682f686572652e7478743f6372617020485454502f312e310d0a486f73743a203132372e302e302e313a333333330d0a557365722d4167656e743a206375726c2f372e35322e310d0a4163636570743a202a2f2a0d0a526566657265723a203c7363726970743e616c657274282270776e656422293b3c2f7363726970743e0d0a0d0a"}

And:

2017/03/03 22:29:18 {"raw_data":"476554202f746869735f4745545f7761735f47655420485454502f312e300d0a0d0a"}

Maybe we should just roll our own logging library? That would eliminate the dependency on an external lib like Lumberjack.
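If we keep using the standard library logger in the meantime, one small sketch of a fix (assuming the JSON goes through a *log.Logger; the file name below is made up) is to create the logger with flag 0 so no date/time prefix is added:

package main

import (
    "log"
    "os"
)

func main() {
    // Stand-in for however the sinkhole opens its log file.
    logFile, err := os.OpenFile("sinkhole.log", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0644)
    if err != nil {
        log.Fatal(err)
    }
    defer logFile.Close()

    // A flag value of 0 means no prefix at all, so the JSON line is written as-is.
    jsonLogger := log.New(logFile, "", 0)
    jsonLogger.Println(`{"timestamp":"2017-03-02 22:27:30.00813237 +0000 UTC","http_method":"GET"}`)
}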

Error reading from connection assumes IO timeout

Before reading from a connection we set a read deadline. Then when we try to read we check for an error, and if one happened we report it with fmt.Println("Error reading:", err.Error()).

But then later we try to write to the socket to tell the client it was an I/O timeout: work.Connection.Write([]byte("Error I/O timeout. \n")). However, there are other reasons the read could fail; for example, the client could close the connection before we call read. If we try to write to a socket that is in an error state we could just compound the trouble. Instead we should just move on without trying to send anything to the client.
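A sketch of what the error handling could look like instead (handleRead and the variable names are hypothetical, not the current worker code):

package main

import (
    "log"
    "net"
)

// handleRead only sends the timeout message when the error really is a
// timeout; anything else gets logged and the connection is left alone.
func handleRead(conn net.Conn, buf []byte) {
    _, err := conn.Read(buf)
    if err != nil {
        if nerr, ok := err.(net.Error); ok && nerr.Timeout() {
            // Only a genuine timeout warrants telling the client anything.
            conn.Write([]byte("Error I/O timeout. \n"))
        } else {
            // Connection reset, EOF, etc.: don't write into a broken socket.
            log.Println("Error reading:", err.Error())
        }
    }
}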

Get code ready for additional protocols

Right now the sinkhole is built assuming HTTP only but people may want to add other protocols like SMTP, POP, IRC, etc. The code should at least be ready for these additional protocols to be added without having to restructure too much code.

No TLS support yet

The code doesn't currently support TLS. We probably should support listening on multiple different sockets, with a flag specifying which sockets are TLS. That way we can listen on 80, 443, 8000, 8080, and 8443.
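A rough sketch of multi-socket listening with a per-socket TLS flag; the listenSpec type, the port list, and the certificate handling are assumptions for illustration:

package main

import (
    "crypto/tls"
    "net"
)

// listenSpec pairs a listen address with whether that socket should speak TLS.
type listenSpec struct {
    addr   string
    useTLS bool
}

func openListeners(cert tls.Certificate) ([]net.Listener, error) {
    specs := []listenSpec{
        {":80", false}, {":8000", false}, {":8080", false},
        {":443", true}, {":8443", true},
    }
    conf := &tls.Config{Certificates: []tls.Certificate{cert}}

    var listeners []net.Listener
    for _, s := range specs {
        var l net.Listener
        var err error
        if s.useTLS {
            l, err = tls.Listen("tcp", s.addr, conf)
        } else {
            l, err = net.Listen("tcp", s.addr)
        }
        if err != nil {
            return nil, err
        }
        listeners = append(listeners, l)
    }
    return listeners, nil
}

Each listener would then get its own accept loop feeding the same worker pipeline.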

Add generic "listen" protocol.

It would be great if netsarlacc had a protocol handler that just listened for whatever the client sends and recorded it. Maybe listen for a second or so before closing the connection. This could give people some flexibility to handle other protocols where the client says something first, without actually having to handle any of the protocol details.
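Something along these lines might be enough (a sketch; the one-second window and buffer size are assumptions):

package main

import (
    "net"
    "time"
)

// genericListen waits about a second for whatever the client volunteers,
// then returns the raw bytes for logging without speaking any protocol.
func genericListen(conn net.Conn) []byte {
    defer conn.Close()

    conn.SetReadDeadline(time.Now().Add(1 * time.Second))
    buf := make([]byte, 8192)
    n, _ := conn.Read(buf) // hitting the deadline here is expected, not an error
    return buf[:n]
}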

Some JSON fields contain sub fields

Right now src_ip and dest_ip contain sub fields:

"src_ip":{"IP":"127.0.0.1","Port":54660,"Zone":""}
"dest_ip":"127.0.0.1:3333"

We should have src_ip, src_port, dst_ip, and dst_port fields. We may actually consider not having dst_ip at all since it'll be the same for the sinkhole. Instead we should probably include a sinkhole instance name / ID in the JSON so that if we're running more than one sinkhole we can tell them apart in the logs.

dest_name is the "Host:" header provided by the client
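A flattened record along these lines is one option (the struct and field names below are a proposal, not what the code currently emits):

package main

// FlatLogRecord sketches the proposed flat JSON layout.
type FlatLogRecord struct {
    Timestamp  string `json:"timestamp"`
    SinkholeID string `json:"sinkhole_instance"` // which sinkhole produced the log
    SrcIP      string `json:"src_ip"`
    SrcPort    string `json:"src_port"`
    DstPort    string `json:"dst_port"` // dst_ip possibly dropped entirely
    DestName   string `json:"dest_name"` // the client-supplied Host: header
    RawData    string `json:"raw_data"`
}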

Consider decoupling reading from socket and the parsing, logging of HTTP, and sending responses

Right now each worker does a read on the socket to get the HTTP request, parses the request, and then builds a response to send back.

Building the response is very CPU intensive so we don't want too many workers going in parallel or everything slows down. Testing shows that maximum performance is achieved roughly when the number of workers matches the number of physical CPU cores.

However, reading on the socket can tie a worker up doing nothing for however long the client-read-timeout is (by default 300 ms). This means with just a small number of bogus requests at a time, all the workers can be tied up for 300 ms. It's easy to starve legitimate requests by tying up workers in this way.

One possible solution is to have a very large pool of workers that just read from the sockets and then a smaller set of workers that parse the requests and respond. The easiest thing to do here is to take out the reading code and put it in its own worker pool and then feed the read results to the existing (now modified) workers.

So instead of ACCEPT -> WORKER we'd have ACCEPT -> READ -> WORKER where there are many more routines implementing READ. The more routines that are dedicated to just reads, the harder it is to tie up all of the reading routines waiting for a timeout.
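A sketch of the READ stage feeding the existing workers (the rawRequest type, channel names, and buffer size are made up for illustration):

package main

import (
    "net"
    "time"
)

// rawRequest carries the bytes a reader pulled off the socket plus the
// connection itself so a worker can still respond later.
type rawRequest struct {
    conn net.Conn
    data []byte
}

// readPool starts many cheap reader goroutines; each blocks on the (possibly
// slow) client read and then hands the result to the small worker pool.
func readPool(accepted <-chan net.Conn, toWorkers chan<- rawRequest, readers int, timeout time.Duration) {
    for i := 0; i < readers; i++ {
        go func() {
            for conn := range accepted {
                conn.SetReadDeadline(time.Now().Add(timeout))
                buf := make([]byte, 8192)
                n, _ := conn.Read(buf)
                toWorkers <- rawRequest{conn: conn, data: buf[:n]}
            }
        }()
    }
}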

Moar, better, updated docs, readme, examples

We gotta get our readme and other docs up-to-date. We should add an authors file and figure out a license too. Examples of how to run the sinkhole and how to tune a system to make it perform well should be completed too.

"Snap" log file names to 10 minute boundaries

Right now log file names are based on the moment the sinkhole was started and rotate every 10 minutes after that. This produces names like:

sinkhole-2017-05-30-22-14-43.log

Instead we should snap logging into files that fall on 10-minute boundaries within the hour, like:

12:00:00
12:10:00
12:20:00
....

This will require a minor re-work of the timer to instead be a loop checking time.Now() and doing some basic rounding / modular arithmetic. Then we could just drop seconds from the filename altogether. The code protected by the mutex that closes the old file and opens a new one doesn't really have to change; only the name creation code does.
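A sketch of the name generation using time.Truncate, which rounds down to the 10-minute boundary (the filename format, minus seconds, is assumed to stay the same otherwise):

package main

import (
    "fmt"
    "time"
)

// logFileName snaps the current time down to the nearest 10-minute boundary
// so rotation always lands on :00, :10, :20 and so on.
func logFileName(now time.Time) string {
    snapped := now.UTC().Truncate(10 * time.Minute)
    return fmt.Sprintf("sinkhole-%s.log", snapped.Format("2006-01-02-15-04"))
}

func main() {
    fmt.Println(logFileName(time.Now())) // e.g. sinkhole-2017-05-30-22-10.log
}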

Current printing to stdout needs to be cleaned up

Right now there are multiple prints that go to stdout, which are fine for testing but won't work for production. We should clean up the output. If we still need output for debugging purposes we should support a debug (or verbose) flag and send the output to stderr instead.

Move the worker stopping code into the dispatcher file

The dispatcher.go file handles starting workers. It should also handle more of the stopping of the workers instead of relying on netsarlacc.go to fully track the stopping. This will probably mean moving the stop channels over to it too.

Fatal errors should (try to) gracefully shut down sinkhole

We have a number of places in the code where we check for an error that shouldn't ever happen and if the error does happen, it's not clear that the code can actually recover. One such example is in opening / closing log files in the logger code. If an error happens on open or write or close we're pretty much hosed and we should do our best to just shut everything down after we've sent some details about the error to syslog.

I'm not sure yet of the best way to signal all the goroutines to gracefully stop but we need to look into it.
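One common Go option, shown here only as a sketch, is to broadcast shutdown by closing a channel that every goroutine selects on:

package main

import "sync"

// stopAll is closed exactly once on a fatal error; a closed channel is
// readable by every goroutine at the same time, so it acts as a broadcast.
var stopAll = make(chan struct{})
var wg sync.WaitGroup

func worker(work <-chan []byte) {
    defer wg.Done()
    for {
        select {
        case <-stopAll:
            return // clean up and exit
        case w := <-work:
            _ = w // handle the work item
        }
    }
}

After sending the error details to syslog, the fatal path would close(stopAll) and wg.Wait() so files get flushed and closed before exiting.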

Start using branches

To avoid any issues, we should start using branches and submitting pull requests instead of committing to the master branch.

Consider restructuring how logging is done in the worker

Right now, when a worker handles a connection, it fills out a header struct with information about the request. Assuming the request goes well, that header struct is then encoded into a LoggedRequest object and returned with a nil error:

validConnLogging := LoggedRequest{Timestamp: time.Now().UTC().String(), Header: req_header, SourceIP: sourceIP, SourcePort: sourcePort, Destination: allHeaders["host"], EncodedConn: raw}
return validConnLogging, nil

If something goes wrong, we instead fill out an empty LoggedRequest and send the error:

return LoggedRequest{}, err

The trouble with this is that in (almost?) all cases we want to log at least some basic information about the client that caused the error. I think we should change the Header struct to an information log struct where we store all the information we can log about a client. As soon as we get any bytes from the client we can put them in the raw_data field in the information struct. The same goes for when we learn the client IP and port. Then either on success or error we encode what we've recorded in the information struct into a LoggedRequest.

We could create a function that takes an information log struct and reads all the non-nil fields and fills out and returns a LoggedRequest struct. Then returning on error would look more like:

return BuildLogRequest(client_info), err

And success would look pretty much the same:

return BuildLogRequest(client_info), nil

This also gives us the opportunity to set the error message in the JSON we log. A log could look like:

{"error":"true", "error_message":"Request header failed regex validation", "src_ip":"1.2.3.4", "src_port":"5678", "raw_data":"476554202f746869735f4745545f7761735f47655420485454502f312e300d0a0d0a"}

Need to audit usage of time for timezone correctness

We haven't been careful about timezones. I've only been testing on a machine running in UTC so any timezone bugs would not have shown up.

Ultimately the code should log in UTC by default but support a local timezone with a cmdline flag / config file option.

IPv6 support

netsarlacc should be able to handle IPv6 in the same way it handles v4.

Code should accept a path via cmdline / config and not assume working in '.'

Right now all of the file-based operations (logging, PID file, daemonization) assume paths are relative to '.', which isn't adequate for production. The daemonization code can't chdir to / right now for this reason.

It should be possible to specify a pidfile path, a logging directory path, and a template file path. The code should also be able to continue functioning even after a chdir to /.
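A sketch of the kind of flags that would remove the implicit dependence on the working directory (flag names and defaults are made up):

package main

import "flag"

var (
    pidFilePath  = flag.String("pid-file", "/var/run/netsarlacc.pid", "path to write the PID file")
    logDirPath   = flag.String("log-dir", "/var/log/netsarlacc", "directory for sinkhole log files")
    templatePath = flag.String("template", "/etc/netsarlacc/response.tmpl", "path to the HTTP response template")
)

With absolute paths in hand, the daemonization code is free to chdir to / as usual.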

Explore making protocol handlers more modular

Right now the HTTP (and to a lesser extent SMTP) protocol handling is just built straight into the code and isn't modular at all. It would be nice if all the stuff needed to handle a protocol were put together into one file with a clearly defined API for reading, interacting, and logging the activity.

The main benefit of this would be at a code organization level. It might also lower the bar to adding support for more interactive protocols or extending the interactivity of an existing protocol like SMTP.
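One possible shape for such an API, purely as a design sketch:

package main

import "net"

// Handler is what each protocol file would implement and register; the core
// accept/worker code would only ever talk to this interface.
type Handler interface {
    // Name identifies the protocol in config and logs ("http", "smtp", ...).
    Name() string
    // Handle reads from (and optionally responds on) the connection and
    // returns the encoded log record for it.
    Handle(conn net.Conn) ([]byte, error)
}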

Move global configuration vars to a config struct

There are many global configuration variables just mixed into the netsarlacc namespace. It would be nice to throw these into a struct and then maybe pass a reference to the struct around instead of accessing them as global vars across files.
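Something like the struct below, with the field list being illustrative rather than exhaustive:

package main

// Config gathers what are currently scattered package-level variables.
type Config struct {
    Workers           int
    ClientReadTimeout int // milliseconds
    LogDir            string
    PIDFile           string
    TemplatePath      string
    Daemonize         bool
}

A pointer to the Config could then be handed to the dispatcher, workers, and logger instead of each file reaching for globals.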

Parsing of headers inefficient, includes separating space in value

The way header parsing is done right now isn't particularly efficient and it's a bit error-prone. Splitting on ":" can produce many fields if the user-controlled value itself contains ":". Also, the space after the header ":" is not part of the value, yet the current parsing includes that space in the value. Example:

" curl/7.52.1" (note the leading space)
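A sketch of the fix using strings.SplitN, so the value is split off exactly once and trimmed (parseHeader is a hypothetical helper, not the current code):

package main

import "strings"

// parseHeader splits a header line into name and value exactly once, so a ":"
// inside the value stays intact, and trims the leading space off the value.
func parseHeader(line string) (name, value string, ok bool) {
    parts := strings.SplitN(line, ":", 2)
    if len(parts) != 2 {
        return "", "", false
    }
    return strings.ToLower(strings.TrimSpace(parts[0])), strings.TrimSpace(parts[1]), true
}

// parseHeader("User-Agent: curl/7.52.1") -> ("user-agent", "curl/7.52.1", true)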

Needs a daemon mode for running unattended

We need to be able to daemonize and interact cleanly with init scripts or similar. This also means we need a way of gracefully stopping the daemon non-interactively, presumably via a signal handler, but there may be a better "Go" way.
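A sketch of the usual Go pattern, where the init script's stop action sends SIGTERM and we trigger the same graceful shutdown path used elsewhere (watchSignals and shutdown are placeholders):

package main

import (
    "os"
    "os/signal"
    "syscall"
)

// watchSignals runs the provided shutdown function when SIGTERM or SIGINT
// arrives, so an init script can stop the daemon non-interactively.
func watchSignals(shutdown func()) {
    sigs := make(chan os.Signal, 1)
    signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)
    go func() {
        <-sigs
        shutdown()
    }()
}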

init scripts

It would be nice to provide init scripts / service files for the various common distributions.

Rework socket read() calls to do more than one

Right now sockets only get one read() call:

// Make enough space to receive client bytes
read.Buffer = make([]byte, 8192)

err = read.Conn.SetReadDeadline(time.Now().Add(time.Millisecond * time.Duration(*ClientReadTimeout)))

But for large requests that don't all fit into one packet, sometimes the kernel will make a subset of the data available on the socket and not all of it will come back in the single read. This causes parsing to see a truncated request.

It seems like a good option here would be to call read() with an initial timeout and then exponentially decrease the timeout value for subsequent calls to read() until either a minimum is reached (maybe 50ms) or a timeout occurs because no more data is available.

The danger here is letting a client trickle data to the server so that the read loop never ends. An absolute cap on total read time needs to be set. The current ClientReadTimeout could be re-worked to be a maximum time to spend reading and any time left over from the first read could be used for a second, third, fourth, etc.
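A sketch of that loop; the 50ms floor, buffer sizes, and halving factor are assumptions:

package main

import (
    "net"
    "time"
)

// readAll keeps calling Read with shrinking deadlines until no more data
// arrives, the per-read floor is hit, or the overall budget is spent.
func readAll(conn net.Conn, maxTotal time.Duration) []byte {
    const floor = 50 * time.Millisecond
    data := make([]byte, 0, 8192)
    chunk := make([]byte, 8192)

    deadline := time.Now().Add(maxTotal) // absolute cap so a client can't trickle forever
    perRead := maxTotal

    for time.Now().Before(deadline) && perRead >= floor {
        next := time.Now().Add(perRead)
        if next.After(deadline) {
            next = deadline
        }
        conn.SetReadDeadline(next)

        n, err := conn.Read(chunk)
        if n > 0 {
            data = append(data, chunk[:n]...)
        }
        if err != nil {
            break // a timeout or closed connection ends the loop
        }
        perRead /= 2 // back off exponentially for follow-up reads
    }
    return data
}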
