
Comments (5)

TheTechromancer avatar TheTechromancer commented on June 18, 2024

Thanks for your work here.

Does this actually result in a bug in bbot? I'm aware these regexes aren't perfect, but they shouldn't be used for validation, only for event type detection. The actual validation happens later via ipaddress and urllib.

These regexes were designed for speed and simplicity. In cases where a full RFC-compliant regex is required, I'd much rather offload it to an official library (others have already written better validation):

import ipaddress

try:
    ipaddress.ip_address(data)
    # it's an IP
except ValueError:
    # it's a DNS name
    pass

So we can avoid situations like this:
Screenshot_20240522-212054.png

from bbot.

colin-stubbs avatar colin-stubbs commented on June 18, 2024

No existing module currently uses ipv6_regex directly. It only seems to be used as part of the open port and URL regexes, so perhaps it is used indirectly.

I've started using it directly though, as I'm also interested in detecting as much IP addressing related to targets as possible, particularly in situations where IPs are used directly instead of DNS names. That is uncommon, but it does occur, especially within internal networks.

I totally agree you'll want to avoid having to manage/maintain regex patterns; offloading that to a central library that's going to do a better job would be ideal.

That said... making patterns available for modules to use via bbot/core/helpers/regexes.py seems to be the current approach to providing a simple and reliable interface to do that?

~/bbot$ grep -C 4 _regex\. bbot/core/helpers/dns.py
            results.add((rdtype, self._clean_dns_record(record.target)))
        elif rdtype == "TXT":
            for s in record.strings:
                s = self.parent_helper.smart_decode(s)
                for match in dns_name_regex.finditer(s):
                    start, end = match.span()
                    host = s[start:end]
                    results.add((rdtype, host))
        elif rdtype == "NSEC":
~/bbot$ 
~/bbot$ grep -E '_regex\.(match|find)' bbot/modules/*.py
bbot/modules/azure_tenant.py:        matches = self.helpers.regexes.uuid_regex.findall(authorization_endpoint)
bbot/modules/azure_tenant.py:        found_domains = list(set(self.d_xml_regex.findall(r.text)))
bbot/modules/digitorus.py:            for match in extract_regex.finditer(content):
bbot/modules/git.py:                if getattr(result, "status_code", 0) == 200 and "[core]" in text and not self.fp_regex.match(text):
bbot/modules/httpx.py:            if tempdir.is_dir() and self.httpx_tempdir_regex.match(tempdir.name):
bbot/modules/__init__.py:    if e.is_dir() and dir_regex.match(e.name) and not e.name == "modules":
bbot/modules/massdns.py:        digits = self.digit_regex.findall(d)
bbot/modules/rapiddns.py:        for match in self.helpers.regexes.dns_name_regex.findall(text):
bbot/modules/riddler.py:        for match in self.helpers.regexes.dns_name_regex.findall(text):
bbot/modules/sslcert.py:            if issuer.emailAddress and self.helpers.regexes.email_regex.match(issuer.emailAddress):
bbot/modules/sslcert.py:            if subject.emailAddress and self.helpers.regexes.email_regex.match(subject.emailAddress):
bbot/modules/viewdns.py:                if self.date_regex.match(table_cells[1].text.strip()):
bbot/modules/virustotal.py:        for match in self.helpers.regexes.dns_name_regex.findall(text):
~/bbot$ 

ipaddress is only used by ipneighbor:

~/bbot$ grep ipaddress bbot/modules/*.py
bbot/modules/ipneighbor.py:import ipaddress
bbot/modules/ipneighbor.py:        network = ipaddress.ip_network(f"{main_ip}/{netmask}", strict=False)
~/bbot$ 
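
For context, a small standalone sketch of what that ip_network(..., strict=False) call does. The values of main_ip and netmask here are made-up examples, not taken from ipneighbor.py; with strict=False the host bits are masked off instead of raising ValueError.

```python
import ipaddress

# Hypothetical inputs for illustration only.
main_ip = "192.168.1.77"
netmask = 30

# strict=False masks the host bits, so a host address is accepted
# and the enclosing network is returned.
network = ipaddress.ip_network(f"{main_ip}/{netmask}", strict=False)

# Iterating a network yields every address in it, including the
# network and broadcast addresses.
neighbors = [str(ip) for ip in network]

# With strict=True (the default) the same input is rejected.
try:
    ipaddress.ip_network(f"{main_ip}/{netmask}")
    strict_ok = True
except ValueError:
    strict_ok = False
```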

None of them seem to use get_event_type() as the test modules do, though perhaps that's the best validation process after any form of extraction?

~/bbot$ grep get_event_type bbot/modules/*.py
~/bbot$ 
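
To illustrate the "extract loosely, validate afterwards" idea, here is a minimal sketch that uses only the standard library. The candidate_regex pattern and the extract_ipv6 name are hypothetical; this is not bbot's ipv6_regex, get_event_type(), or any bbot helper. It only shows a cheap regex finding candidates while ipaddress makes the real decision.

```python
import ipaddress
import re

# Hypothetical, deliberately loose pattern: any run of two or more hex
# digits, colons, or dots. It over-matches on purpose; validation is
# left to the ipaddress module.
candidate_regex = re.compile(r"[0-9A-Fa-f:.]{2,}")


def extract_ipv6(text):
    """Return the set of valid IPv6 addresses found in text."""
    found = set()
    for match in candidate_regex.finditer(text):
        # Drop dots picked up from sentence punctuation.
        candidate = match.group().strip(".")
        try:
            ip = ipaddress.ip_address(candidate)
        except ValueError:
            # Not a valid address, just something that looked like one.
            continue
        if ip.version == 6:
            found.add(str(ip))
    return found


sample = "Server at 2001:db8::1. Bad: 1:2:3:4:5:6:7:8:9 and fe80::dead:beef, also 10.0.0.1"
addresses = extract_ipv6(sample)
```

The regex stays simple and fast, and anything it over-matches (such as the nine-group string above) is discarded by ipaddress rather than by a more complicated pattern.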


TheTechromancer avatar TheTechromancer commented on June 18, 2024

Ah okay, I'm starting to see your use case. Are you wanting to extract IP addresses from HTTP responses, etc.?


TheTechromancer avatar TheTechromancer commented on June 18, 2024

I should mention we have lots of helpers for converting to IP addresses/networks, parsing, validation, etc. that don't require you to import anything. From inside a module, these are available under self.helpers.


TheTechromancer avatar TheTechromancer commented on June 18, 2024

#1399 has been merged into dev.

