
corelight / pycommunityid


A Python implementation of the Community ID flow hashing standard

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
flow-hashing network-monitoring network-security network-security-monitoring community-id


pycommunityid's Issues

Additional request-reply pair: ICMP Extended Echo

RFC 8335 (https://tools.ietf.org/html/rfc8335) documents an additional request-reply pair:
ICMP field Type: Extended Echo Request. The value for ICMPv4 is 42. The value for ICMPv6 is 160.
ICMP field Type: Extended Echo Reply. The value for ICMPv4 is 43. The value for ICMPv6 is 161.

Since this pair wasn't in the version 1 table, hashes for these messages have been using the code octet instead of the type value of the request-reply peer.
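
To make the effect concrete, here is a minimal sketch of how, as I understand the Community ID v1 spec, ICMP type and code stand in for ports: a type with a known request-reply counterpart uses the counterpart's type as the destination "port", and anything else falls back to the code octet and is treated as one-way. The table and function names are hypothetical and are not pycommunityid's API, and only the echo pairs are listed.

```python
# Hypothetical lookup tables: ICMP type -> type of its request-reply peer.
# Only the echo pairs are shown; the real v1 table has more entries.
ICMP4_V1 = {8: 0, 0: 8}                       # Echo / Echo Reply
ICMP4_WITH_RFC8335 = {**ICMP4_V1, 42: 43, 43: 42}  # plus Extended Echo
ICMP6_V1 = {128: 129, 129: 128}               # Echo / Echo Reply
ICMP6_WITH_RFC8335 = {**ICMP6_V1, 160: 161, 161: 160}


def icmp_port_equivalents(mtype, mcode, pairs):
    """Return (sport, dport, is_one_way) for an ICMP message.

    If the type has a known peer, the peer's type plays the role of the
    destination port, so request and reply hash to the same ID.
    Otherwise the code octet is used and the flow counts as one-way.
    """
    if mtype in pairs:
        return mtype, pairs[mtype], False
    return mtype, mcode, True


# Extended Echo Request (ICMPv4 type 42, code 0) under both tables:
print(icmp_port_equivalents(42, 0, ICMP4_V1))            # (42, 0, True)
print(icmp_port_equivalents(42, 0, ICMP4_WITH_RFC8335))  # (42, 43, False)
```

Note that adding the pair changes the tuple that gets hashed, so IDs computed for Extended Echo traffic before and after such a change would differ.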

FlowTupleError port invalid for specific ports

Hi, I started experimenting with Community ID and pycommunityid, and I think I found a bug in the function in_nbo():

def in_nbo(self):
    """
    Returns a copy of this tuple where the addresses and port are
    rendered into NBO byte strings.
    """
    saddr = self._addr_to_nbo(self.saddr)
    daddr = self._addr_to_nbo(self.daddr)
    if isinstance(self.sport, int):
        sport = struct.pack('!H', self.sport)
    else:
        sport = self.sport
    if isinstance(self.dport, int):
        dport = struct.pack('!H', self.dport)
    else:
        dport = self.dport
    return FlowTuple(self.proto, saddr, daddr, sport, dport, self.is_one_way)

  • The problem is with the creation of the new FlowTuple at the end of the function.
  • An exception occurs if sport or dport is in the range 11569-11577:
    communityid.error.FlowTupleError: Destination port "b'-1'" invalid

You can test it with your sample application:
$ community-id tcp 10.0.0.1 10.0.0.2 10 11569

The number 11569 is 0x2D31 in hex, and those two bytes are '-1' in ASCII. I think the problem is with this line in the function is_port(val):

port = int(val)

  • the packed bytes value of 11569 is parsed as -1, which is wrong
  • I think other large port numbers can be problematic too, because their packed bytes can also be read as ASCII characters
    • port 14392 is 0x3838 in hex, which is '88' in ASCII
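
The ambiguity can be reproduced with the standard library alone. The sketch below packs a port the way in_nbo() does and then parses the result with int(). The second function, is_port_strict, is a hypothetical stricter check that always treats a bytes value as an already-packed port. It is an assumption about one possible fix, not the library's code.

```python
import struct


def int_view_of_packed_port(port):
    """Pack a port as 2 bytes in network byte order, then parse those
    bytes with int() the way a text-based check would."""
    packed = struct.pack('!H', port)
    try:
        return packed, int(packed)
    except ValueError:
        # Most packed ports are not ASCII digits, so int() rejects them.
        return packed, None


def is_port_strict(val):
    """Hypothetical validator: bytes are always a packed port, never text."""
    if isinstance(val, bytes):
        # Any two bytes decode to a value in 0..65535.
        return len(val) == 2
    try:
        return 0 <= int(val) <= 65535
    except (TypeError, ValueError):
        return False


print(int_view_of_packed_port(11569))            # (b'-1', -1)
print(int_view_of_packed_port(14392))            # (b'88', 88)
print(is_port_strict(struct.pack('!H', 11569)))  # True
```

The trade-off in this sketch is that a port passed as ASCII bytes (for example b'80') would no longer be accepted as text. Callers would have to pass an int or a str instead.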

I hope somebody will check this bug and find a solution.

Community ID generated using pycommunityid does not match the one generated by Suricata

Issue

I have a pcap. When I run Suricata on it, it produces flows with Community IDs (CIDs).
When I run Zeek on it and generate the CID of each Zeek flow using the pycommunityid library, some flows don't have the same CIDs as the ones produced by Suricata.


Steps to reproduce

Here's the pcap I used: https://github.com/stratosphereips/StratosphereLinuxIPS/blob/develop/dataset/test7-malicious.pcap

I ran Suricata on it using the following command:
suricata -r test7-malicious.pcap

I ran Zeek on it using the following command:
zeek -C -r test7-malicious.pcap

For each line in the Zeek conn.log output, I ran the following script to get the CID of each flow:

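import communityid

community_id = communityid.CommunityID()

def get_cid(flow):
    # Map the conn.log protocol name to the matching FlowTuple constructor.
    proto = flow.proto.lower()
    cases = {
        'tcp': communityid.FlowTuple.make_tcp,
        'udp': communityid.FlowTuple.make_udp,
        'icmp': communityid.FlowTuple.make_icmp,
    }
    try:
        tpl = cases[proto](flow.saddr, flow.daddr, flow.sport, flow.dport)
        return community_id.calc(tpl)
    except KeyError:
        # Protocol not in the table above: no CID for this flow.
        return ''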

Now, for example, this flow produced by Suricata:

{"timestamp": "2018-03-09T22:49:16.520001+0200", "flow_id": 1898491295854895, "event_type": "flow", "src_ip": "fe80:0000:0000:0000:00d2:4591:568e:c3d1", "src_port": 5353, "dest_ip": "ff02:0000:0000:0000:0000:0000:0000:00fb", "dest_port": 5353, "proto": "UDP", "app_proto": "failed", "flow": {"pkts_toserver": 13, "pkts_toclient": 0, "bytes_toserver": 5188, "bytes_toclient": 0, "start": "2018-03-09T22:49:16.553263+0200", "end": "2018-03-09T22:50:26.234272+0200", "age": 70, "state": "new", "reason": "timeout", "alerted": false}, "community_id": "1:JpepHprmBz0RFdlLGhEMO4jAPvA="}

is the same as this flow produced by zeek:

conn.log:{"ts":1520628556.553263,"uid":"CJwrIjmGopvQP6Gx1","id.orig_h":"fe80::d2:4591:568e:c3d1","id.orig_p":5353,"id.resp_h":"ff02::fb","id.resp_p":5353,"proto":"udp","service":"dns","duration":14.121544122695923,"orig_bytes":1892,"resp_bytes":0,"conn_state":"S0","local_orig":false,"local_resp":false,"missed_bytes":0,"history":"D","orig_pkts":7,"orig_ip_bytes":2228,"resp_pkts":0,"resp_ip_bytes":0,"orig_l2_addr":"68:5b:35:b1:55:93","resp_l2_addr":"33:33:00:00:00:fb"}

However, pycommunityid gives me this CID: 1:Ij3wBn8AhEgwlNMz41h3vXi0yL8=, which doesn't match the one produced by Suricata for the same flow.


Update

When I tried generating the CID using Zeek's Corelight plugin, Corelight/CommunityID, I got the same Community ID as with the pycommunityid library:

{"ts":1520628556.553263,"uid":"C0ADPg3q0T5H6xlzdb","id.orig_h":"fe80::d2:4591:568e:c3d1","id.orig_p":5353,"id.resp_h":"ff02::fb","id.resp_p":5353,"proto":"udp","service":"dns","duration":14.121544122695923,"orig_bytes":1892,"resp_bytes":0,"conn_state":"S0","local_orig":false,"local_resp":false,"missed_bytes":0,"history":"D","orig_pkts":7,"orig_ip_bytes":2228,"resp_pkts":0,"resp_ip_bytes":0,"orig_l2_addr":"68:5b:35:b1:55:93","resp_l2_addr":"33:33:00:00:00:fb","community_id":"1:Ij3wBn8AhEgwlNMz41h3vXi0yL8="}

I guess this means that Suricata is the one doing something wrong, and not pycommunityid?
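
One way to get an independent data point is to compute the ID by hand. The sketch below follows my reading of the Community ID v1 spec for port-based protocols: order the two endpoints, hash seed, addresses, protocol, a padding byte, and ports with SHA-1, then base64-encode and prefix "1:". The function name community_id_v1 is made up for this example, and it covers only TCP/UDP-style tuples, not ICMP.

```python
import base64
import hashlib
import ipaddress
import struct


def community_id_v1(proto, saddr, daddr, sport, dport, seed=0):
    """Hand-rolled Community ID v1 for port-based protocols (a sketch)."""
    src = ipaddress.ip_address(saddr).packed
    dst = ipaddress.ip_address(daddr).packed
    # Order the endpoints so both directions of a flow hash alike:
    # the smaller (address, port) pair goes first, compared byte-wise
    # in network byte order.
    if (src, sport) > (dst, dport):
        src, dst, sport, dport = dst, src, dport, sport
    data = (
        struct.pack('!H', seed)
        + src
        + dst
        + struct.pack('!BBHH', proto, 0, sport, dport)  # proto, pad, ports
    )
    digest = hashlib.sha1(data).digest()
    return '1:' + base64.b64encode(digest).decode('ascii')


UDP = 17
# The mDNS flow from the report above.
print(community_id_v1(UDP, 'fe80::d2:4591:568e:c3d1', 'ff02::fb', 5353, 5353))
```

Running it on the flow from the report gives a third value to compare against the two IDs quoted above. Swapping source and destination should not change the result.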
