bitkeks / python-netflow-v9-softflowd

PyPI "netflow" package. NetFlow v9 parser, collector and analyzer implemented in Python 3. Developed and tested with softflowd

Home Page: https://bitkeks.eu/blog/2016/08/collecting-netflow-v9-on-openwrt.html

License: MIT License

Topics: netflow, netflow-v9-parser, netflow-exports, softflowd, netflow-v9, python3, analyzer

python-netflow-v9-softflowd's Introduction

Python NetFlow/IPFIX library

This package contains libraries and tools for NetFlow versions 1, 5 and 9, and IPFIX. It is available on PyPI as "netflow".

Version 9 is the first NetFlow version that uses templates. Templates make dynamically sized and configured NetFlow data flowsets possible, which makes the collector's job harder. The library provides the netflow.parse_packet() function as the main API entry point (see below). By importing netflow.v1, netflow.v5 or netflow.v9 you get direct access to the respective parsing objects, but to start out you will probably have more success running the reference collector (example below) and looking into its code. IPFIX (IP Flow Information Export) is based on NetFlow v9 and standardized by the IETF. All related classes are contained in netflow.ipfix.

Data flow diagram

Copyright 2016-2023 Dominik Pataky [email protected]

Licensed under MIT License. See LICENSE.

Using the library

If you choose to use the classes provided by this library directly, here's an example for a NetFlow v5 export packet:

  1. Create a collector which listens for exported packets on some UDP port. It will then receive UDP packets from exporters.
  2. The NetFlow payload is contained inside these UDP packets. For NetFlow v5, for example, it should begin with the bytes 0005.
  3. Call the netflow.parse_packet() function with the payload as the first argument (it accepts a bytes object or a hex-encoded string).

Example UDP collector server (receiving exports on port 2055):

import netflow
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 2055))
payload, client = sock.recvfrom(4096)  # experimental, tested with 1464 bytes
p = netflow.parse_packet(payload)  # Test result: <ExportPacket v5 with 30 records>
print(p.header.version)  # Test result: 5

Or from hex dump:

import netflow
p = netflow.parse_packet("00050003000379a35e80c58622a...")  # see test_netflow.py
assert p.header.version == 5  # NetFlow v5 packet
assert p.flows[0].PROTO == 1  # ICMP flow

In NetFlow v9 and IPFIX, templates are used instead of a fixed set of fields (like PROTO). See collector.py for how to handle these. You must store received templates between exports and pass them to the parser when new packets arrive; not storing the templates will always result in parsing failures.
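A minimal sketch of this pattern, assuming a plain dict is passed as the second argument to netflow.parse_packet() (the exact exception raised for a not-yet-known template differs between versions - see netflow/collector.py for the reference handling):

import socket
import netflow

templates = {}  # must outlive individual packets: the parser fills it from template flowsets

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 2055))

while True:
    payload, client = sock.recvfrom(4096)
    try:
        p = netflow.parse_packet(payload, templates)
    except Exception:
        # A v9/IPFIX data flowset arrived before its template was received.
        # Real code should catch the library's specific exception and may
        # buffer the payload for re-parsing once a template shows up.
        continue
    print(p.header.version, len(p.flows))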

Using the collector and analyzer

Since v0.9.0 the netflow library also includes reference implementations of a collector and an analyzer as CLI tools. These can be used on the CLI with python3 -m netflow.collector and python3 -m netflow.analyzer. Use the -h flag to receive the respective help output with all provided CLI flags.

Example: to start the collector, run python3 -m netflow.collector -p 9000 -D. This starts a collector instance at port 9000 in debug mode. Point your flow exporter to this port on your host and after some time the first ExportPackets should appear (the flows need to expire first). After you have collected some data, the collector exports it into GZIP files, simply named <timestamp>.gz (or the filename you specified with --file/-o).

To analyze the saved traffic, run python3 -m netflow.analyzer -f <gzip file>. The output will look similar to the following snippet, with resolved hostnames and services, transferred bytes and connection duration:

2017-10-28 23:17.01: SSH     | 4.25M    | 15:27 min | local-2 (<IPv4>) to local-1 (<IPv4>)
2017-10-28 23:17.01: SSH     | 4.29M    | 16:22 min | remote-1 (<IPv4>) to local-2 (<IPv4>)
2017-10-28 23:19.01: HTTP    | 22.79M   | 47:32 min | uwstream3.somafm.com (173...) to local-1 (<IPv4>)
2017-10-28 23:22.01: HTTPS   | 1.21M    | 3 sec     | fra16s12-in-x0e.1e100.net (2a00:..) to local-1 (<IPv6>)
2017-10-28 23:23.01: SSH     | 93.79M   | 21 sec    | remote-1 (<IPv4>) to local-2 (<IPv4>)
2017-10-28 23:51.01: SSH     | 14.08M   | 1:23.09 hours | remote-1 (<IPv4>) to local-2 (<IPv4>)

Please note that the collector and analyzer are experimental reference implementations. Do not rely on them in production monitoring use cases! In any case I recommend looking into the netflow/collector.py and netflow/analyzer.py scripts for customization. Feel free to use the code and extend it in your own tool set - that's what the MIT license is for!

Resources

Development environment

The library was written specifically against NetFlow exports from Hitoshi Irino's fork of softflowd (v1.0.0), though it should work with every conformant NetFlow/IPFIX implementation. If you stumble upon new custom template fields, please let me know - they will make a fine addition to the netflow.v9.V9_FIELD_TYPES collection.

Running and creating tests

The test files contain tests for all use cases in the library, based on real softflowd export packets. Whenever softflowd is referenced, a compiled version of softflowd 1.0.0 is meant, which is probably NOT the one in your distribution's package. During the development of this library, two ways of gathering these hex dumps were used. First, the tcpdump/Wireshark export way:

  1. Run tcpdump/Wireshark on your public-facing interface (with tcpdump, save the pcap to disk).
  2. Produce some sample flows, e.g. surf the web and refresh your mail client. With Wireshark, save the captured packets to disk.
  3. Run tcpdump/Wireshark again on a local interface.
  4. Run softflowd with the -r <pcap_file> flag. softflowd reads the captured traffic, produces the flows and exports them. Use the interface you are capturing packets on to send the exports to. E.g. capture on the localhost interface (with -i lo or on loopback) and then let softflowd export to 127.0.0.1:1337.
  5. Examine the captured traffic. Use Wireshark and set the CFLOW "decode as" dissector on the export packets (e.g. based on the port). The data fields should then be shown correctly as NetFlow payload.
  6. Extract this payload as hex stream. Anonymize the IP addresses with a hex editor if necessary. A recommended hex editor is bless.

Second, a Docker way:

  1. Run a softflowd daemon in the background inside a Docker container, listening on eth0 and exporting to e.g. 172.17.0.1:1337.
  2. On your host start Wireshark to listen on the Docker bridge.
  3. Create some traffic from inside the container.
  4. Check the softflow daemon with softflowctl dump-flows.
  5. If you have some flows shown to you, export them with softflowctl expire-all.
  6. Your Wireshark should have picked up the export packets (it does not matter if there's a port-unreachable error).
  7. Set the decoder for the packets to CFLOW and copy the hex value from the NetFlow packet.

Your exported hex string should begin with 0001, 0005, 0009 or 000a, depending on the version.
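As a quick sanity check, the version can be read from the first two bytes of the hex stream (a standalone sketch; the string below is a placeholder, not a full capture):

hex_stream = "0005"  # first four hex digits copied from your dump
version = int.from_bytes(bytes.fromhex(hex_stream)[:2], "big")
assert version in (1, 5, 9, 10)  # 0x000a == 10 marks IPFIX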

The collector is run in a background thread. Differences in transmission speed from the exporting client can lead to different results, possibly caused by race conditions during the usage of the GZIP output file.

python-netflow-v9-softflowd's People

Contributors: bitkeks, cooox, deeso, gitoldgrumpy, grafolean, j-licht, kaysiz, pr0ps, randerzander


python-netflow-v9-softflowd's Issues

Performance improvements

Hi @bitkeks,

A bug report against my app prompted me to upgrade this project to a newer version (thank you for adding support for v9 options!). While upgrading, I noticed that I had quite a few performance improvements in a separate branch that I never made a PR for. Checking it, I saw that the changes are still relevant, so I took the liberty of opening a PR. I hope you find it useful.

The changes solve the performance issues I encountered when running the Grafolean NetFlow bot (which uses this package) on a very busy network. I used PyFlame to generate flame charts, then analysed the parts where the app seemed to spend most of its time. The last version of this branch all but removed the bottleneck. Unfortunately, this was over a year ago and I no longer have access to the testing system, so I was unable to test these improvements again. However, looking at the code, I believe the effect should still be the same.

I am certain that the code could be nicer and that even further optimizations could be achieved, so please see this PR as a set of ideas. Feel free to steal whatever you find valuable and disregard the rest. :)

Kind regards,

Anze

Incorrect clearing of Enterprise flag bit

It looks to me like the enterprise flag bit is incorrectly cleared. Looking at line 957 in ipfix.py, it clears the 7th bit, whereas according to the spec the enterprise flag is the 15th bit.

The flag is first checked on line 953 using the 7th bit, because at that point the code is working on a single byte; it then unpacks a short, which moves the flag to the 15th bit.
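A standalone illustration of the reported off-by-one-byte clearing (a sketch, not the library's code):

import struct

raw = b"\x80\x08\x00\x04"  # field type 8 with the enterprise bit set, length 4
field_type, field_length = struct.unpack("!HH", raw)

has_enterprise = bool(raw[0] & 0x80)  # checking the first *byte* uses bit 7 ...
plain_type = field_type & ~0x8000     # ... but after unpacking a 16-bit short,
wrong_type = field_type & ~0x80       # the flag sits at bit 15, so clearing
                                      # with 0x80 leaves it set
assert has_enterprise and plain_type == 8 and wrong_type == 0x8008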

Error while handling packets with no template

Greetings.
I used a MikroTik-generated NetFlow export and found there is a problem when the full template is not sent with every packet.

For example:

Listening on interface :2055
Starting the NetFlow listener
Received data from 192.168.255.254, length 204
Processed ExportPacket with 2 flows.
Received data from 192.168.255.254, length 1268
Exception happened during processing of request from ('192.168.255.254', 2055)
Traceback (most recent call last):
  File "/usr/lib/python3.4/socketserver.py", line 305, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python3.4/socketserver.py", line 331, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python3.4/socketserver.py", line 344, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python3.4/socketserver.py", line 673, in __init__
    self.handle()
  File "main.py", line 72, in handle
    export = ExportPacket(data, self.TEMPLATES)
  File "/home/stelios/projects/netflow/python-netflow-v9-softflowd/src/netflow/collector_v9.py", line 266, in __init__
    dfs = DataFlowSet(data[offset:], self.templates)
  File "/home/stelios/projects/netflow/python-netflow-v9-softflowd/src/netflow/collector_v9.py", line 146, in __init__
    fkey = field_types[field.field_type]
KeyError: 225
Received data from 192.168.255.254, length 204
Processed ExportPacket with 2 flows.

The first packet contains a complete template flowset for two templates (with IDs 256 and 257).
256 is for IPv4 traffic and 257 is for IPv6.
Since IPv6 has different fields, when a data packet arrives for IPv4 it does not have the expected fields.

According to the specs, the sender does not have to send the entire template every time; the recipient has to cache the template and use it until it expires or is re-sent.
softflowd sends only one template with ID 1024, so this is probably why you haven't seen this.

See:

https://www.cisco.com/en/US/technologies/tk648/tk362/technologies_white_paper09186a00800a3db9.html

unpack requires a buffer of 4 bytes

Hello, I used the collector with two MikroTik routers. Router 1 has little traffic, about 150 connections, and it worked very well,

but router 2 has a lot of traffic, about 11,200 connections. There I got the following exception:

Exception happened during processing of request from ('192.168.19.250', 9000)
Traceback (most recent call last):
  File "/usr/lib/python3.6/socketserver.py", line 317, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python3.6/socketserver.py", line 348, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python3.6/socketserver.py", line 361, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python3.6/socketserver.py", line 721, in __init__
    self.handle()
  File "main.py", line 74, in handle
    export = ExportPacket(data, self.templates)
  File "/home/kukulcan/Documentos/github/python-netflow-v9-softflowd/src/netflow/collector_v9.py", line 316, in __init__
    tfs = TemplateFlowSet(data[offset:])
  File "/home/kukulcan/Documentos/github/python-netflow-v9-softflowd/src/netflow/collector_v9.py", line 270, in __init__
    field_type, field_length = struct.unpack('!HH', data[offset:offset+4])
struct.error: unpack requires a buffer of 4 bytes

Any idea how to increase the buffer?

Thanks.

IPFIX: check if handling of signed values is needed

The bytes parser in IPFIX packets uses struct.unpack for each field, based on the corresponding field length. But this implementation ignores signed/unsigned fields, defaulting to unsigned. Since there is at least one signed32 in the fields dict, this check must be implemented. Tests should cover this case as well.
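For illustration, the difference for a signed32 field (a standalone sketch, not the library's parser):

import struct

raw = struct.pack("!i", -5)                # a signed32 value on the wire
as_unsigned = struct.unpack("!I", raw)[0]  # unsigned default: 4294967291
as_signed = struct.unpack("!i", raw)[0]    # correct for signed32: -5
assert as_unsigned != as_signed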

IPFIX

Thanks for this great project. Any plans for supporting IPFIX?

Analyzer error

Hi @bitkeks,
Thanks for your app. I tried it today and captured some packets from an ASR 1002 router's NetFlow v9 export.
But when I analyze the .gz file, it shows an error:

(p3venv) [vuht@dashboard python-netflow-v9-softflowd]$ python analyzer.py -f 1581912445.gz
Traceback (most recent call last):
  File "analyzer.py", line 215, in <module>
    for flow in sorted(flows, key=lambda x: x["FIRST_SWITCHED"]):
  File "analyzer.py", line 215, in <lambda>
    for flow in sorted(flows, key=lambda x: x["FIRST_SWITCHED"]):
KeyError: 'FIRST_SWITCHED'

Incorrect IPFIX TemplateField type checking

Hi,

not completely sure if this is actually an issue, but it seems there is a bug in the datatype check of the IPFIX TemplateFields in lines 769-785 of ipfix.py. The datatype checks in lines 772, 775, and 777 are not checking the actual datatype (i.e., string) of the IPFIX TemplateField but its name (i.e., applicationGroupName).

I think to fix this, line 742 should be changed to

discovered_fields.append((field_type.name, field_type_id, datatype))

Accordingly line 769 should be changed to

for index, ((field_type_name, field_type_id, field_datatype), value) in enumerate(zip(discovered_fields, pack)):

and the aforementioned lines 772, 775, and 777 should check field_datatype instead of field_type_name.
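To make the name-versus-datatype confusion concrete (a standalone sketch with a hypothetical FieldType tuple, not the library's classes):

from collections import namedtuple

FieldType = namedtuple("FieldType", ["id", "name", "datatype"])
ft = FieldType(id=96, name="applicationGroupName", datatype="string")

assert (ft.name == "string") is False     # the buggy check never matches
assert (ft.datatype == "string") is True  # the fixed check matches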

Hope this description of the problem helps.

Best :)

analyze_json.py - error

Hi,

Could you please fix this issue?

python3 analyze_json.py 1534395555.json 
Traceback (most recent call last):
  File "analyze_json.py", line 127, in <module>
    data = json.loads(fh.read())
  File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

v9 template example?

I cannot create a template to receive v9 packets. Are there any examples I missed?

Failed to decode a v9/IPFIX ExportPacket

Hi
I want to collect and analyze IPFIX flows, and I found your awesome Python code.
But when I start "python3 -m netflow.collector -p 9000 -D"
I get the message "Failed to decode a v9/IPFIX ExportPacket - will re-attempt when a new template is discovered".
Is there any documentation for the CLI?

Add tests for netflow v5

Right now, tests are only implemented and tested for netflow v9. They should be extended with tests for netflow v5. For further info on how to create test cases, see the section Running and creating tests in the README.

Fails to run on python 3.7 due to lru_cache syntax

Both the pip-installed 0.10.2 and a git clone fail to run python3 -m netflow.collector on Python 3.7 with this exception:

$ python -m netflow.collector -h
Traceback (most recent call last):
  File "C:\Users\dhristov\AppData\Local\Continuum\anaconda3\envs\py37\lib\runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "C:\Users\dhristov\AppData\Local\Continuum\anaconda3\envs\py37\lib\runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "C:\techno\git\python-netflow-v9-softflowd\netflow\__init__.py", line 10, in <module>
    from .utils import parse_packet
  File "C:\techno\git\python-netflow-v9-softflowd\netflow\utils.py", line 16, in <module>
    from .ipfix import IPFIXExportPacket
  File "C:\techno\git\python-netflow-v9-softflowd\netflow\ipfix.py", line 526, in <module>
    class IPFIXDataTypes:
  File "C:\techno\git\python-netflow-v9-softflowd\netflow\ipfix.py", line 559, in IPFIXDataTypes
    def by_name(cls, key: str) -> Optional[DataType]:
  File "C:\Users\dhristov\AppData\Local\Continuum\anaconda3\envs\py37\lib\functools.py", line 477, in lru_cache
    raise TypeError('Expected maxsize to be an integer or None')
TypeError: Expected maxsize to be an integer or None

The root cause is that before Python 3.8, lru_cache cannot be used as a bare decorator - it has to be called, i.e. written as "lru_cache()".
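A standalone sketch of the two spellings (both run on Python 3.8+, only the second fails on older versions):

from functools import lru_cache

@lru_cache(maxsize=None)  # works on Python 3.2+: the decorator is called
def cached_a(x):
    return x * 2

@lru_cache  # bare decorator, works only on Python 3.8+
def cached_b(x):
    return x * 2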

Maybe the minimum Python version in setup.py should be bumped from 3.5 to 3.8.

Analyzer issue

Hi there!
I've got an issue while using the analyzer. I collected some data with the collector, which received it from a Cisco router. When I try to analyze the data with the analyzer, I get nothing - nothing happens at all. When I ran the analyzer with the -v parameter, I got some output (screenshot not included here).
Does anyone have an idea how to resolve this?

FIRST_SWITCHED and LAST_SWITCHED keys are missing in parsed packet

I have softflowd (softflowd-1.0.0) running on my pfSense box with "Flow Tracking Level" set to Full and the "Netflow Version" set to 9. When I use nfcapd to capture packets and inspect them using nfdump, I see the expected results. An example flow record is shown below.

Flow Record: 
  Flags        =              0x06 FLOW, Unsampled
  label        =            <none>
  export sysid =                 1
  size         =                80
  first        =        1587416220 [2020-04-20 16:57:00]
  last         =        1587416220 [2020-04-20 16:57:00]
  msec_first   =               557
  msec_last    =               711
  src addr     =     HIDDEN_WAN_IP
  dst addr     =           1.1.1.1
  src port     =             12118
  dst port     =               853
  fwd status   =                 0
  tcp flags    =              0x1b ...AP.SF
  proto        =                 6 TCP
  (src)tos     =                 0
  (in)packets  =                12
  (in)bytes    =              1044
  input        =                 1
  output       =                 1
  ip router    =       192.168.1.1
  engine type  =                 0
  engine ID    =                 0
  received at  =     1587416521844 [2020-04-20 17:02:01.844]

However, when running the collector and analyzer with the same softflowd settings, I am getting an error:

$ python3 -m netflow.analyzer -f 1587416506.gz
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/siam/Projects/python-netflow/venv/lib/python3.8/site-packages/netflow/analyzer.py", line 261, in <module>
    for flow in sorted(flows, key=lambda x: x["FIRST_SWITCHED"]):
  File "/home/siam/Projects/python-netflow/venv/lib/python3.8/site-packages/netflow/analyzer.py", line 261, in <lambda>
    for flow in sorted(flows, key=lambda x: x["FIRST_SWITCHED"]):
KeyError: 'FIRST_SWITCHED'

Inspecting an element of the flows list in analyzer.py, the collected flows are missing keys (see below). The UNKNOWN_FIELD_TYPE value may correspond to either FIRST_SWITCHED or LAST_SWITCHED.

{'INPUT_SNMP': 1, 'IN_BYTES': 1480, 'IN_PKTS': 9, 'IPV4_DST_ADDR': '199.197.246.60', 'IPV4_SRC_ADDR': 'WAN_IP', 'IP_PROTOCOL_VERSION': 4, 'L4_DST_PORT': 443, 'L4_SRC_PORT': 28453, 'NF_F_FLOW_CREATE_TIME_MSEC': 1587416629854, 'OUTPUT_SNMP': 1, 'PROTOCOL': 6, 'SRC_TOS': 0, 'TCP_FLAGS': 26, 'UNKNOWN_FIELD_TYPE': 1587416630141}

Since nfcapd is capturing the FIRST_SWITCHED and LAST_SWITCHED fields and this library isn't, could there be an issue with parsing somewhere? I have not debugged with a raw hex dump, but can if you want me to.

KeyError: 'IP_PROTOCOL_VERSION'

I get some kind of parsing error:

python3 analyze_json.py 1519108140.json
Traceback (most recent call last):
  File "analyze_json.py", line 133, in <module>
    con = Connection(pending, flow)
  File "analyze_json.py", line 48, in __init__
    ips = getIPs(src)
  File "analyze_json.py", line 22, in getIPs
    if flow['IP_PROTOCOL_VERSION'] == 4:
KeyError: 'IP_PROTOCOL_VERSION'

Example from json-file:

{"IPV4_SRC_PREFIX": 0, "LAST_SWITCHED": 2397666550, "L4_SRC_PORT": 47597, "PROTOCOL": 6, "IN_BYTES": 436, "TCP_FLAGS": 25, "SRC_MASK": 0, "INPUT_SNMP": 52, "IPV4_DST_ADDR": 2249824527, "IPV4_SRC_ADDR": 2249835426, "FLOW_SAMPLER_ID": 7, "FIRST_SWITCHED": 2397666400, "DST_AS": 0, "DIRECTION": 1, "OUTPUT_SNMP": 52, "IPV4_NEXT_HOP": 2887713825, "SRC_AS": 0, "IN_PKTS": 1, "DST_MASK": 0, "L4_DST_PORT": 80, "SRC_TOS": 0}

Debug from main.py looks "fine":

Received data from x.y.z.n, length 372
Processed ExportPacket with 6 flows.
...

Output is from a Cisco 2T-sup and is working in pmacct etc.

// David

Implement options templates/data records

As noted in #29, options templates and options data records are missing a correct implementation. The current workaround is a special case for flowsets with ID 1, which is always the Options Template FlowSet. This mixing might lead to errors, therefore correct handling is needed.

cc @j-licht

No Export packets appear

Hi @bitkeks ,
First, thank you so much for this app.
I ran into issues when running it on my system in debug mode. I configured exports from an ASR 1002 router and a VMware Distributed Switch to this host, but after 10 minutes no export packets appeared on the screen.
I can see many connections from the exporters with tcpdump.

Fixing tests

Hi @bitkeks!

First of all, thank you for this app, it works like a charm!

I am trying to do some performance optimizations on packet parsing (would be happy to create a PR later if you are interested and if I'm successful of course).

However, while working on this I also tried running tests.py to make sure I would be getting the same results, but stumbled into some trouble: 5 out of 6 tests failed to run because they were using the wrong index (p[1] instead of p[2]) when addressing the records. I assume client was added between timestamp and records somewhere along the line. I took the liberty of creating a PR for this.

Just a side note: test_analyzer still fails on my machine (Python 3.6) because it uses capture_output parameter to subprocess.run(), which was added in Python 3.7. But I assume that is not a problem for most users, and it doesn't affect me since I don't use the analyzer.

While on the subject, IIUC the decoded packets are not compared to some known representation in the tests? I am asking because I would like to avoid regressions when doing performance optimizations. Does adding this to the tests sound like a good idea to you, or does something like this even maybe already exist?

collection from softflow fails

Hi,

After installing and running the collector, I get the following error:

Exception happened during processing of request from (, 55757)
Traceback (most recent call last):
  File "/usr/lib/python3.5/socketserver.py", line 313, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python3.5/socketserver.py", line 341, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python3.5/socketserver.py", line 354, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
    self.handle()
  File "main.py", line 72, in handle
    export = ExportPacket(data, self.TEMPLATES)
  File "/root/netflow/src/netflow/collector_v9.py", line 312, in __init__
    tfs = TemplateFlowSet(data[offset:])
  File "/root/netflow/src/netflow/collector_v9.py", line 266, in __init__
    field_type, field_length = struct.unpack('!HH', data[offset:offset+4])
struct.error: unpack requires a bytes object of length 4

Any idea what could be the issue?

Add TCP to collector for IPFIX

In IPFIX, exporters and collectors can/should use TCP or SCTP for their connection instead of UDP. At least TCP should be added to the collector in this repo.

Related to #20 IPFIX

Add sqlite3 as alternative storage

Until now a custom gzip storage format was used for the collector. This approach is based on the first implementation where a simple JSON dict was exported to a file. In the future, sqlite3 might bring more stability to persistent storage.

Note: Postgres is also a candidate, but I do not want to introduce external dependencies (psycopg2, sqlalchemy) just yet. Better to focus on a small footprint and stability.
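A minimal sketch of what that could look like (table layout and names are hypothetical, not a proposed schema):

import json
import sqlite3

con = sqlite3.connect("flows.db")
con.execute("CREATE TABLE IF NOT EXISTS flows (ts INTEGER, client TEXT, flow TEXT)")
# one row per flow, stored as JSON much like the current gzip line format
con.execute("INSERT INTO flows VALUES (?, ?, ?)",
            (1587416521, "192.0.2.1", json.dumps({"IN_BYTES": 1044})))
con.commit()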

Template is not detected

Hi,

I wanted to report an issue that I encountered. I am using flowd from mindrot.org to send the NetFlow data to my Python script.
Unfortunately the netflow implementation is not getting the template.

I checked the packets that are sent by flowd: sometimes it sends the information containing the template, but the netflow implementation doesn't handle it. For reference I included the output and the template that is sent with flowset ID 0.

NetFlow v9 packet detected, but no templates dict was passed! For correct parsing of packets with templates, create a 'templates' dict and pass it into the 'parse_packet' function.
(0, 172, b'\x01\x00\x00\x14\x00\x08\x00\x04\x00\x0c\x00\x04\x00\x0f\x00\x04\x00\n\x00\x02\x00\x0e\x00\x02\x00\x02\x00\x04\x00\x01\x00\x04\x00\x18\x00\x04\x00\x17\x00\x04\x00\x16\x00\x04\x00\x15\x00\x04\x00\x07\x00\x02\x00\x0b\x00\x02\x00\x06\x00\x01\x00\x04\x00\x01\x00\x05\x00\x01\x00\x10\x00\x04\x00\x11\x00\x04\x00\t\x00\x01\x00\r\x00\x01\x01\x03\x00\x14\x00\x1b\x00\x10\x00\x1c\x00\x10\x00>\x00\x10\x00\n\x00\x02\x00\x0e\x00\x02\x00\x02\x00\x04\x00\x01\x00\x04\x00\x18\x00\x04\x00\x17\x00\x04\x00\x16\x00\x04\x00\x15\x00\x04\x00\x07\x00\x02\x00\x0b\x00\x02\x00\x06\x00\x01\x00\x04\x00\x01\x00\x05\x00\x01\x00\x10\x00\x04\x00\x11\x00\x04\x00\t\x00\x01\x00\r\x00\x01')
(1, 0, 0, 20, 0, 8, 0, 4, 0, 12, 0, 4, 0, 15, 0, 4, 0, 10, 0, 2, 0, 14, 0, 2, 0, 2, 0, 4, 0, 1, 0, 4, 0, 24, 0, 4, 0, 23, 0, 4, 0, 22, 0, 4, 0, 21, 0, 4, 0, 7, 0, 2, 0, 11, 0, 2, 0, 6, 0, 1, 0, 4, 0, 1, 0, 5, 0, 1, 0, 16, 0, 4, 0, 17, 0, 4, 0, 9, 0, 1, 0, 13, 0, 1, 1, 3, 0, 20, 0, 27, 0, 16, 0, 28, 0, 16, 0, 62, 0, 16, 0, 10, 0, 2, 0, 14, 0, 2, 0, 2, 0, 4, 0, 1, 0, 4, 0, 24, 0, 4, 0, 23, 0, 4, 0, 22, 0, 4, 0, 21, 0, 4, 0, 7, 0, 2, 0, 11, 0, 2, 0, 6, 0, 1, 0, 4, 0, 1, 0, 5, 0, 1, 0, 16, 0, 4, 0, 17, 0, 4, 0, 9, 0, 1, 0, 13, 0, 1)
NetFlow v9 packet detected, but no templates dict was passed! For correct parsing of packets with templates, create a 'templates' dict and pass it into the 'parse_packet' function.

The rest is sent with flowset ID 256.

I wanted to define the template manually but I don't know how I could do it.

If someone has an idea on how to fix it. Please let me know.

Thank you in advance
Mike

Performance tests

Tests which cover functionality and correct parsing of packets exist. But what about performance? The tests should also cover some performance measurements to detect optimizable or even leaking code fragments.

Investigate softflowd small packets

During development and testing I came across a situation where neighbor solicitation flows would cause softflowd to go crazy. Scenario: something in the network causes NS packets every 5-30 seconds. EACH of these flows will be picked up by softflowd as TWO flows (neighbor solicitation and neighbor advertisement). Then these flows will be exported. And for some reason, softflowd exports them in a single UDP packet each. Result: softflowd also captures each of these UDP packets as one flow, causing exponential packet generation.

This causes a massive stream of incoming packets at the collector, eventually even going OOM.

will re-attempt when a new template is discovered

Good day. I was trying to collect some NetFlow v9 statistics, but when I try to collect them using the main.py script, I get the following error:

@usflmianexaut01:~/netflow/netflowv9/python-netflow-v9-softflowd$ python3 main.py -p 9996 -D
Starting the NetFlow listener on 0.0.0.0:9996
Received 128 bytes of data from ('172.240.245.2', 57071)
Failed to decode a v9 ExportPacket - will re-attempt when a new template is discovered
Received 160 bytes of data from ('172.240.245.2', 57071)
Failed to decode a v9 ExportPacket - will re-attempt when a new template is discovered
Received 92 bytes of data from ('172.240.245.2', 57071)
Failed to decode a v9 ExportPacket - will re-attempt when a new template is discovered
Received 160 bytes of data from ('172.240.245.2', 57071)
Failed to decode a v9 ExportPacket - will re-attempt when a new template is discovered
Received 160 bytes of data from ('172.240.245.2', 57071)
Failed to decode a v9 ExportPacket - will re-attempt when a new template is discovered
Received 264 bytes of data from ('172.240.245.2', 57071)
Failed to decode a v9 ExportPacket - will re-attempt when a new template is discovered
Received 128 bytes of data from ('172.240.245.2', 57071)
Failed to decode a v9 ExportPacket - will re-attempt when a new template is discovered
Received 160 bytes of data from ('172.240.245.2', 57071)
Failed to decode a v9 ExportPacket - will re-attempt when a new template is discovered
Received 296 bytes of data from ('172.240.245.2', 57071)
Failed to decode a v9 ExportPacket - will re-attempt when a new template is discovered
Received 264 bytes of data from ('172.240.245.2', 57071)
Failed to decode a v9 ExportPacket - will re-attempt when a new template is discovered
^CShutting down the NetFlow listener
Received KeyboardInterrupt, passing through
@usflmianexaut01:~/netflow/netflowv9/python-netflow-v9-softflowd$

I don't know how to change this new template thing - is that something that needs to be changed on the router or on the server itself? Thanks for your kind help.

version 0.12.1 causes "TypeError: unsupported operand type(s) for |: 'type' and 'type' "

Hi
I have a nightly process that installs the latest netflow version and then runs some functionality.
I noticed that yesterday a new version was released. Today my nightly process installed the latest version (0.12.1) and started my script. I got the following exception:

  File "/tmp/my_script.py", line 12, in <module>
    import netflow
  File "/usr/local/lib/python3.8/dist-packages/netflow/__init__.py", line 10, in <module>
    from .utils import parse_packet
  File "/usr/local/lib/python3.8/dist-packages/netflow/utils.py", line 13, in <module>
    from .ipfix import IPFIXExportPacket
  File "/usr/local/lib/python3.8/dist-packages/netflow/ipfix.py", line 935, in <module>
    class IPFIXExportPacket:
  File "/usr/local/lib/python3.8/dist-packages/netflow/ipfix.py", line 974, in IPFIXExportPacket
    def flows(self) -> list[IPFIXTemplateRecord | IPFIXOptionsTemplateRecord | IPFIXDataRecord]:
TypeError: unsupported operand type(s) for |: 'type' and 'type'

The quick fix I made was using the previous version 0.11.3.

I noticed this recent commit that should fix the issue, but there is no new version released. Is there an upcoming release planned?
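For context, the failing annotation mixes two features newer than Python 3.8: built-in generics (list[...], Python 3.9+) and PEP 604 unions (X | Y, Python 3.10+). A standalone sketch of the portable spelling, with placeholder classes:

from typing import List, Union

class A: ...
class B: ...

# Python 3.10+ only:
#     def flows(self) -> list[A | B]: ...
# Portable down to older Python 3 versions:
def flows() -> List[Union[A, B]]:
    return []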

to ntop?

And how do I get the data into ntop?

Infinite loop when parsing packet

I found a scenario where

def parse_packet(data: Union[str, bytes], templates: Dict = None) -> Union[V1ExportPacket, V5ExportPacket,

leads to an infinite loop. The example I provide here is an edge case where some parts of the packet are filled with zero bytes, probably because of packet loss.

The bug is in the constructor of V9ExportPacket; specifically, the infinite loop occurs because the function gets stuck in the while loop

while offset != len(data):
Below is an example of such a packet, an explanation of why the loop never finishes, and a proposed solution:

The packet (UDP data):

b'\x00\t\x00\x05{r\xe80b\x0bq}x\xcf4\xe3\x01\x02\x00\x00\x01\x04\x00H\x00\x00\x00\x00\x00\x00\x00n\x00\x00\x00\x01\x01\x00\x00\x00\x00\n \x07j\x06\x06\\\x08\x00\x00\rk\x15\xc8\x00\x00\x00\x0b{r\xe4H{r\xe4H\x08\x00\x00\x00\x00\x00\x00\x048\xbfd\x01\xc7e\xad\x1e\rk\x15\xc8\x00\x00\x00\x00\x00\x01\x04\x00H\x00\x00\x00\x00\x00\x00\x00g\x00\x00\x00\x01\x11\x00\x00\xc9Q\xac\x18\x0b\x03\x06\x06\\\x08\x005\x01\x00\x00\x01\x00\x00\x00\x0b{r\xe4H{r\xe4H\x00\x00\x00\x00\x00\x00\x00\x04C\x17{\x01\xc7e\xad\xa5\x01\x00\x00\x01\xc3\x9c\x005\x00\x01\x04\x00H\x00\x00\x00\x00\x00\x00\x00J\x00\x00\x00\x01\x01\x00\x00\x00\x00\xac\x19\xbc2\x06\x06\\\x08\x00\x00(pH\xcd\x00\x00\x00\x0b{r\xe80{r\xe80\x08\x00\x00\x00\x00\x00\x00\x04\x8f\x07\x1a\x01\xc7e\xaeB(pH\xcd\x00\x00\x00\x00\x00\x01\x04\x00H\x00\x00\x00\x00\x00\x00\x00F\x00\x00\x00\x01\x06\x00\x02\xcb\xef\n0h\x1f\x06\x06\\\x08\x01\xbb\x14*I\x18\x00\x00\x00\x0b{r\xe80{r\xe80\x00\x00\x00\x00\x00\x00\x00\x04\x80\x1cx\x01\xc7e\xae\x1d\x14*I\x18Z+\x01\xbb\x00\x01\x04\x00H\x00\x00\x00\x00\x00\x00\x00F\x00\x00\x00\x01\x06\x00\x02\xfe\x80\n/`\x12\x06\x06\\\x08\x01\xbb\x14*I\x18\x00\x00\x00\x0b{r\xe80{r\xe80\x00\x00\x00\x00\x00\x00\x00\x04\x08\x06\xb0\x01\xc7e\xae(\x14*I\x18\xd4\xb5\x01\xbb\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0
0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

A Demonstration of the Infinite Loop

Most of this packet consists of zero bytes. It is a V9ExportPacket

class V9ExportPacket:
(it starts with \x00\t). At the start of the constructor the V9Header is parsed. The header length is a constant in the code, so the offset
offset = self.header.length
starts at 20. If we follow the code with the above payload, we can see that pack
pack = struct.unpack('!HH', data[offset:offset + 4])
is always equal to (260, 72). So flowset_id
flowset_id = pack[0]  # = template id
equals 260 and the following piece of code is executed:

# Data / option flowsets
# First, check if template is known
if flowset_id not in self._templates:
    # Could not be parsed, continue to check for templates
    skipped_flowsets_offsets.append(offset)
    offset += flowset_length
    continue

Specifically, 260 does not appear in self._templates, as this key is missing from V9_FIELD_TYPES

V9_FIELD_TYPES = {
which means that
offset += flowset_length
is executed, so offset is increased by 72. This happens 5 times, so eventually offset equals 20 + 5 * 72 = 380. From this offset on, all remaining bytes of data are zeros (\x00), so
pack = struct.unpack('!HH', data[offset:offset + 4])
is (0, 0) and the following piece of code is executed:

# Data template flowsets
if flowset_id == 0:  # TemplateFlowSet always have id 0
    tfs = V9TemplateFlowSet(data[offset:])
    # Update the templates with the provided templates, even if they are the same
    for id_, template in tfs.templates.items():
        if id_ not in self._templates:
            self._new_templates = True
        self._templates[id_] = template
    offset += tfs.length
    continue

The constructor of V9TemplateFlowSet also calculates pack and sets length to pack[1], which is 0. The offset there starts at 4. Since the condition

while offset < self.length:
is false, the loop doesn't even start, so we get a V9TemplateFlowSet with empty templates. Hence the loop over templates
for id_, template in tfs.templates.items():
is skipped,
offset += tfs.length
is executed, and offset remains 380. The outer loop continues, and since offset never grows, it repeats infinitely.

Proposed Solution

The solution I propose is to add a check right after

offset += tfs.length
to ensure that the offset grew; if it stays the same, break the loop. We could also add a safety check right after the main loop condition
while offset != len(data):
that breaks if set(data[offset:]) == {0} (the rest of the data is all zeros).
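A self-contained toy loop mirroring the structure of the constructor, showing where the two guards would sit (a sketch, not the library's actual code):

import struct

def parse_flowsets(data: bytes, header_length: int = 20) -> None:
    offset = header_length
    while offset != len(data):
        # guard 1: stop when the rest of the packet is zero padding
        if set(data[offset:]) == {0}:
            break
        flowset_id, flowset_length = struct.unpack("!HH", data[offset:offset + 4])
        previous_offset = offset
        offset += flowset_length  # in the real code: tfs.length etc.
        # guard 2: a flowset must advance the offset
        if offset == previous_offset:
            break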

Would like to get your opinion about the bug and the solution I propose.

Find a better way to pair two flows into one connection

In the analyzer, two flows between the same pair of hosts are matched with each other. Then one of the hosts is determined to be the source, the other the destination. This is currently done by looking at the size of the flows; earlier versions used the lower port to determine which host is the destination (e.g. expecting a port like 80 to be the destination and 33251 to be the client).

# Assume the size that sent the most data is the source
# TODO: this might not always be right, maybe use earlier timestamp?
size1 = fallback(flow1, ['IN_BYTES', 'IN_OCTETS'])
size2 = fallback(flow2, ['IN_BYTES', 'IN_OCTETS'])
if size1 >= size2:
    src = flow1
    dest = flow2
else:
    src = flow2
    dest = flow1

# TODO: this next approach uses the lower port as the service identifier
# port1 = fallback(flow1, ['L4_SRC_PORT', 'SRC_PORT'])
# port2 = fallback(flow2, ['L4_SRC_PORT', 'SRC_PORT'])
#
# src = flow1
# dest = flow2
# if port1 > port2:
#     src = flow2
#     dest = flow1
Maybe timestamps could solve this, since the initiating flow must have an earlier timestamp than the responding flow. In early tests this failed due to equal timestamps, but the research was not completed.
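A sketch of that idea, reusing the fallback helper and flow variables from the snippet above (the FIRST_SWITCHED key is an assumption here, since field availability varies by exporter):

# Prefer the earlier FIRST_SWITCHED timestamp to pick the initiating flow;
# fall back to the existing size heuristic when the timestamps are equal.
ts1 = fallback(flow1, ['FIRST_SWITCHED'])
ts2 = fallback(flow2, ['FIRST_SWITCHED'])
if ts1 != ts2:
    src, dest = (flow1, flow2) if ts1 < ts2 else (flow2, flow1)
else:
    src, dest = (flow1, flow2) if size1 >= size2 else (flow2, flow1)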

Add tests for netflow v1

Right now, tests are only implemented and tested for netflow v9. They should be extended with tests for netflow v1. For further info on how to create test cases, see the section Running and creating tests in the README.
