ndfred / unifi-pi-hole
A Pi-hole equivalent for the Unifi Security Gateway
Uncomment the ads_lists = download_ads_list_urls('https://v.firebog.net/hosts/lists.php?type=tick') line in build_rules.py and boom:
$ ./build_rules.py
Parsing https://hosts-file.net/grm.txt
Parsing https://reddestdream.github.io/Projects/MinimalHosts/etc/MinimalHostsBlocker/minimalhosts
Parsing https://raw.githubusercontent.com/StevenBlack/hosts/master/data/KADhosts/hosts
Parsing https://raw.githubusercontent.com/StevenBlack/hosts/master/data/add.Spam/hosts
Parsing https://v.firebog.net/hosts/static/w3kbl.txt
Parsing https://adaway.org/hosts.txt
Traceback (most recent call last):
  File "./build_rules.py", line 99, in <module>
    sys.exit(main())
  File "./build_rules.py", line 94, in main
    output_rules('configure.sh')
  File "./build_rules.py", line 74, in output_rules
    for domain in parse_host_file(url):
  File "./build_rules.py", line 59, in parse_host_file
    for line in download_file(url).split('\n'):
  File "./build_rules.py", line 25, in download_file
    return urllib2.urlopen(url).read()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 437, in open
    response = meth(req, response)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
This doesn't occur when using curl, so something in the Python fetching stack is not behaving right.
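One possible explanation, not confirmed for this server, is that it rejects the default User-Agent sent by Python's URL library while accepting curl's. The sketch below shows how a request with an explicit User-Agent could be built. It uses Python 3's urllib.request rather than the urllib2 module seen in the traceback, and the names build_request and USER_AGENT and the header value itself are hypothetical.

```python
import urllib.request

# Hypothetical User-Agent value; any string the server accepts would do.
USER_AGENT = "curl/7.64.1"


def build_request(url):
    """Return a Request that sends an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def download_file(url, timeout=30):
    """Download a URL and return its body decoded as text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

If the 403 persists with a different User-Agent, the cause lies elsewhere (for example TLS settings or rate limiting) and this change would not help.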
What do the blacklists and whitelists look like?
How do they generate their regexp list?
When and how do they leverage their web server to serve pixels?
How do they collect statistics to build a list of top blocked domains?
We have tests, let's run them!
The logic to determine whether a host is a domain is pretty crude, and doesn't consider domains like .co.uk (see the test_is_domain failure).
Maybe we could use the tldextract module or import some of the code: https://github.com/john-kurkowski/tldextract
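To illustrate what a suffix-aware check might look like without pulling in a dependency, here is a minimal sketch. It is not the project's is_domain implementation and it does not use tldextract; the PUBLIC_SUFFIXES set is a tiny hand-written stand-in for the real Public Suffix List, and registered_domain is a hypothetical helper.

```python
# Tiny illustrative suffix set; a real one would come from the Public Suffix List.
PUBLIC_SUFFIXES = {"com", "org", "net", "uk", "co.uk", "org.uk"}


def registered_domain(host, suffixes=PUBLIC_SUFFIXES):
    """Return the registrable domain of host, or None if there is none."""
    labels = host.lower().rstrip(".").split(".")
    # Try the longest candidate suffix first so "co.uk" wins over "uk".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                return None  # the host is itself a public suffix
            return ".".join(labels[i - 1:])
    return None


def is_domain(host, suffixes=PUBLIC_SUFFIXES):
    """True when host is exactly a registrable domain, not a sub-domain."""
    return registered_domain(host, suffixes) == host.lower().rstrip(".")
```

With this approach example.co.uk counts as a domain while ads.example.co.uk does not, which is the case the crude check gets wrong.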
I thought this was standard but after I removed the blacklist package the configuration options disappeared. I'll need to dig to understand how to get these back.
The USG configuration can specify a blacklist URL, but we'd still like to refresh it every day / week. Should we reload dnsmasq through a configured cron command?
Hi,
If I understand this right, there is no extra hardware required as Pi-hole is installed on the USG.
Will this script install the new Pi-hole 5 version on a USG Pro?
How is the Pi-hole GUI reachable for setup?
Produce a whitelist and a blacklist, with configuration instructions to deploy on the Unifi USG and EdgeMax
Re-generate these files periodically through continuous integration if possible
Have the USG re-download the rules periodically if possible
With 700k hosts that might make dnsmasq slow or use a lot of memory.
Would this solution cost a lot of space on disk?
The server sends back HTML rather than the expected list of hosts:
Parsing https://pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts;showintro=0
<html>
<head>
<meta name="author" value="Peter Lowe; [email protected]">
<meta name="description" value="Blocklist of hostnames and domains for blocking ads, trackers and others (format: hosts -- in hosts file format)">
<meta name="keywords" value="ad blocking, blocking ads, ad servers, trackers, ads, banners, hosts file, privacy">
<link rel=alternate type="application/rss+xml" title="Blocklist of hostnames and domains for blocking ads, trackers and others (format: hosts -- in hosts file format) (RSS - last 50 added)" href="rss/1.0/adservers.rss">
<title>
Blocklist of hostnames and domains for blocking ads, trackers and others (format: hosts -- in hosts file format)</title>
<link rel='stylesheet' href='/css/pgl.css' type='text/css'></head>
<body>
<p><pre>
</pre>
</body>
</html>
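A parser could guard against this kind of response before treating the body as a hosts file. The sketch below is one hedged way to do it: looks_like_html and parse_hosts_body are hypothetical names, and the tag check is a simple heuristic rather than a full content-type inspection.

```python
def looks_like_html(body):
    """Heuristic: True when the first non-blank line starts with an HTML tag."""
    for line in body.splitlines():
        stripped = line.strip().lower()
        if stripped:
            return stripped.startswith(("<html", "<!doctype", "<head", "<body"))
    return False


def parse_hosts_body(body):
    """Return host names from hosts-format text, refusing HTML responses."""
    if looks_like_html(body):
        raise ValueError("expected a hosts file but received HTML")
    hosts = []
    for line in body.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) >= 2:
            hosts.append(fields[1])  # "<address> <host>" lines
    return hosts
```

Failing loudly here makes a broken source visible in the build output instead of silently contributing zero hosts.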
The firebog website suggests using whitelists to leverage more advanced host lists. Compile these so we can use them.
remove_duplicate_domains is very inefficient and causes CI to time out (see 1b2a725). Figure out a faster way to match sub-domains of an existing domain.
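One candidate approach is to keep all entries in a set and, for each entry, look up its parent suffixes, so the work per entry is bounded by its number of labels instead of the size of the list. The sketch below assumes the goal is to drop any entry that is a sub-domain of another entry; drop_covered_subdomains is a hypothetical name, not the existing function.

```python
def drop_covered_subdomains(domains):
    """Drop every entry that is a sub-domain of another entry in the input."""
    unique = {d.lower().rstrip(".") for d in domains}
    kept = []
    for domain in unique:
        labels = domain.split(".")
        # Parents of "a.b.example.com": "b.example.com", "example.com", "com".
        parents = (".".join(labels[i:]) for i in range(1, len(labels)))
        if not any(parent in unique for parent in parents):
            kept.append(domain)
    return sorted(kept)
```

Because parents are matched on whole labels, notexample.com is not mistaken for a sub-domain of example.com.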
The Ka-Block list is hand curated and looks much smaller and less prone to breaking websites than the other sources. See https://github.com/dgraham/Ka-Block/blob/master/Ka-Block.safariextension/blockerList.json
When trying to run the configuration commands the system runs out of memory:
# commit; save; exit
[ service dns forwarding blacklist ]
NOTI[001]14:58:42.963: Starting blacklist update...
INFO[002]14:58:42.965: Removing stale blacklists...
INFO[003]14:58:43.043: Downloading domains source unifi-pi-hole
INFO[004]15:00:02.339: unifi-pi-hole: downloaded: 360565
INFO[005]15:00:02.340: unifi-pi-hole: extracted: 360563
INFO[006]15:00:02.341: unifi-pi-hole: dropped: 2
INFO[007]15:00:14.945: Downloading hosts source unifi-pi-hole
INFO[008]15:01:42.968: unifi-pi-hole: downloaded: 375021
INFO[009]15:01:42.970: unifi-pi-hole: extracted: 109722
INFO[00a]15:01:42.971: unifi-pi-hole: dropped: 265299
NOTI[00b]15:01:46.341: Total entries found: 735586
NOTI[00c]15:01:46.343: Total entries extracted 470285
NOTI[00d]15:01:46.345: Total entries dropped 265301
ERRO[00e]15:01:46.349: ReloadDNS():
error: fork/exec /bin/bash: cannot allocate memory
Commit failed
It looks like a lot of hosts entries get dropped, maybe because of bad formatting or duplicates with the domains. It is worth filtering these out when we produce the files.
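The filtering could be sketched as below, under two assumptions that the log does not confirm: that entries are dropped either because they are not well-formed host names or because a blocked domain already covers them. The names is_valid_hostname and filter_hosts are hypothetical, and the label pattern is a simplified check.

```python
import re

# Letters, digits and hyphens per label, no leading or trailing hyphen.
LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def is_valid_hostname(host):
    """Simplified syntactic check for a dotted host name."""
    host = host.lower().rstrip(".")
    if not host or len(host) > 253:
        return False
    return all(LABEL.match(label) for label in host.split("."))


def filter_hosts(hosts, domains):
    """Keep valid hosts that are not covered by any blocked domain."""
    blocked = {d.lower() for d in domains}
    kept = set()
    for host in hosts:
        host = host.lower().rstrip(".")
        if not is_valid_hostname(host):
            continue
        labels = host.split(".")
        # The host itself and each of its parent suffixes.
        suffixes = (".".join(labels[i:]) for i in range(len(labels)))
        if any(suffix in blocked for suffix in suffixes):
            continue
        kept.add(host)
    return sorted(kept)
```

Shrinking the hosts file this way would also reduce the number of entries the USG has to download and load.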