Comments (5)
Thanks for your work here.
Does this actually result in a bug in bbot? I'm aware these regexes aren't perfect, but they shouldn't be used for validation, only for event type detection. The actual validation happens later via ipaddress and urllib.
These regexes were designed for speed and simplicity. In cases where a full RFC-compliant regex is required, I'd much rather offload it to an official library (others have already written better validation):
    import ipaddress

    try:
        ipaddress.ip_address(data)
        # it's an IP
    except ValueError:
        # it's a DNS name
        pass
So we can avoid situations like this:
from bbot.
No existing module currently uses ipv6_regex directly. It just seems to get used as part of the open port and URL regexes, so perhaps it is used indirectly.
I've started using it directly though, as I'm also interested in detecting as much IP addressing related to targets as possible, in particular in situations where IPs are used directly instead of DNS names. That is uncommon, but it does occur, particularly within internal networks.
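To make that use case concrete, here is a minimal sketch of pulling IP addresses out of free text: a loose regex only finds candidates, and the standard ipaddress module decides what is actually valid. The pattern and the extract_ips name are hypothetical and are not bbot's ipv6_regex or any bbot helper.

```python
import ipaddress
import re

# Hypothetical, deliberately loose pattern: it only finds candidates made of
# hex digits, dots, and colons. It is not bbot's ipv6_regex.
CANDIDATE_RE = re.compile(r"(?<![0-9A-Za-z])[0-9A-Fa-f:.]{2,}(?![0-9A-Za-z])")


def extract_ips(text):
    """Return the unique IP addresses found in text, in order of appearance."""
    found = []
    for match in CANDIDATE_RE.finditer(text):
        # A sentence-ending period may be captured; no IP ends with a dot.
        candidate = match.group().rstrip(".")
        try:
            ip = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # looked like an IP but is not one
        if ip not in found:
            found.append(ip)
    return found
```

The regex stays cheap and permissive, so it never has to be RFC-compliant; anything it over-matches (a bare port such as ":8080", a version string such as "1.2.3.4.5") is simply rejected by ipaddress.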
I totally agree you'll want to avoid having to manage/maintain regex patterns, and offloading that to a central library that's going to do a better job would be ideal.
That said... making patterns available for modules to use via bbot/core/helpers/regexes.py seems to be the current approach to providing a simple and reliable interface to do that?
~/bbot$ grep -C 4 _regex\. bbot/core/helpers/dns.py
        results.add((rdtype, self._clean_dns_record(record.target)))
    elif rdtype == "TXT":
        for s in record.strings:
            s = self.parent_helper.smart_decode(s)
            for match in dns_name_regex.finditer(s):
                start, end = match.span()
                host = s[start:end]
                results.add((rdtype, host))
    elif rdtype == "NSEC":
~/bbot$
~/bbot$ grep -E '_regex\.(match|find)' bbot/modules/*.py
bbot/modules/azure_tenant.py: matches = self.helpers.regexes.uuid_regex.findall(authorization_endpoint)
bbot/modules/azure_tenant.py: found_domains = list(set(self.d_xml_regex.findall(r.text)))
bbot/modules/digitorus.py: for match in extract_regex.finditer(content):
bbot/modules/git.py: if getattr(result, "status_code", 0) == 200 and "[core]" in text and not self.fp_regex.match(text):
bbot/modules/httpx.py: if tempdir.is_dir() and self.httpx_tempdir_regex.match(tempdir.name):
bbot/modules/__init__.py: if e.is_dir() and dir_regex.match(e.name) and not e.name == "modules":
bbot/modules/massdns.py: digits = self.digit_regex.findall(d)
bbot/modules/rapiddns.py: for match in self.helpers.regexes.dns_name_regex.findall(text):
bbot/modules/riddler.py: for match in self.helpers.regexes.dns_name_regex.findall(text):
bbot/modules/sslcert.py: if issuer.emailAddress and self.helpers.regexes.email_regex.match(issuer.emailAddress):
bbot/modules/sslcert.py: if subject.emailAddress and self.helpers.regexes.email_regex.match(subject.emailAddress):
bbot/modules/viewdns.py: if self.date_regex.match(table_cells[1].text.strip()):
bbot/modules/virustotal.py: for match in self.helpers.regexes.dns_name_regex.findall(text):
~/bbot$
ipaddress is only used by ipneighbor:
~/bbot$ grep ipaddress bbot/modules/*.py
bbot/modules/ipneighbor.py:import ipaddress
bbot/modules/ipneighbor.py: network = ipaddress.ip_network(f"{main_ip}/{netmask}", strict=False)
~/bbot$
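For context on that ip_network call, a small sketch of what strict=False does in the standard library. The main_ip and netmask values are made-up examples, not taken from ipneighbor.py.

```python
import ipaddress

main_ip = "192.168.1.57"  # example host address (assumption)
netmask = 28              # example prefix length (assumption)

# strict=False masks off the host bits instead of raising ValueError.
network = ipaddress.ip_network(f"{main_ip}/{netmask}", strict=False)
print(network)  # 192.168.1.48/28

# With the default strict=True, the same input is rejected.
try:
    ipaddress.ip_network(f"{main_ip}/{netmask}")
    strict_rejected = False
except ValueError:
    strict_rejected = True
```

This is why a host address can be passed straight in to find its surrounding network without first computing the network address by hand.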
None of them seem to use get_event_type() as the test modules do, though perhaps that's the best validation process after any form of extraction?
~/bbot$ grep get_event_type bbot/modules/*.py
~/bbot$
from bbot.
Ah okay, I'm starting to see your use case. Are you wanting to extract IP addresses from HTTP responses, etc.?
from bbot.
I should mention we have lots of helpers for converting to IP addresses/networks, parsing, validation, etc. that don't require you to import anything. From inside a module, these are available under self.helpers.
from bbot.
#1399 has been merged into dev.
from bbot.