
Nginx Ultimate Bad Bot Spam Referrer Blocker - Nginx Block Bad Bots, Vulnerability Scanners, Malware and Adware, Malicious Sites, Spam Referrers, Bad Referrers, Spam Blocker with DDOS, Wordpress Theme Detector Blocking and Fail2Ban Jail for Repeat Offenders

BECOME A SPONSOR
Your logo and a link to your domain will appear here if you become a sponsor. Simply email me at [email protected] if you would like to sponsor this project, as South Africa is not yet supported under the GitHub Sponsors program.
Follow @ArchIsTheBest | Help support me at https://ko-fi.com/mitchellkrog

Nginx Bad Bot and User-Agent Blocker, Spam Referrer Blocker, Anti DDOS, Bad IP Blocker and Wordpress Theme Detector Blocker

The Ultimate Nginx Bad Bot, User-Agent, Spam Referrer Blocker, Adware, Malware and Ransomware Blocker, Clickjacking Blocker, Click-Redirecting Blocker, SEO Companies and Bad IP Blocker with Anti-DDOS System, Nginx Rate Limiting and Wordpress Theme Detector Blocking. Stop and block all kinds of bad internet traffic, even fake Googlebots, from ever reaching your web sites. PLEASE SEE: Definition of Bad Bots

Version: V4.2024.05.4482

Bad Referrers Blocked: 7104

Bad User-Agents (Bots) Blocked: 662

Fake Googlebots Blocked: 217


Help Support This Project

Buy me Coffee



Tested On:
nginx version: nginx/1.10.x -> mainstream ✔️

Not using Nginx? Get the APACHE ULTIMATE BAD BOT BLOCKER

Please make sure you are subscribed to GitHub notifications to be notified when the blocker is updated or when any important or mission-critical (potentially breaking) changes take place.


EASY AUTO CONFIGURATION INSTRUCTIONS FOR THE NGINX BAD BOT BLOCKER

Please follow the instructions below step by step ❗

  • This is our new preferred method of installation, which is now done through a set of shell scripts contributed to this repo and maintained by Stuart Cardall @itoffshore, who is one of the Alpine Linux package maintainers.

  • The instructions below are for a quick and pain-free installation process which downloads all the files required by the blocker; the scripts also add the required includes to your nginx.conf and Nginx vhost files. The setup script assumes your vhost config files are located in /etc/nginx/sites-available/ and that each vhost config file ends with a file extension of .vhost

  • For manual installation instructions, please see: https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/blob/master/MANUAL-CONFIGURATION.md

  • setup-ngxblocker, install-ngxblocker and update-ngxblocker can all be configured with custom installation / update locations from the command line. (See Step 11 of the instructions for how to use these scripts with non-standard Nginx locations.)

  • Run any of the setup, install or update scripts with --help or -h to view options.

PLEASE NOTE: For those using Let's Encrypt SSL certificates, the preferred and 100% working method is the webroot authenticator method. There appear to be some issues for people using the HTTP challenge method, but we can confirm that webroot works flawlessly. We are uncertain at this point whether the http-01 challenge issue is a Certbot or an Nginx bug.
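For reference, a typical Certbot invocation using the webroot authenticator looks something like the following (the webroot path and domain names here are placeholders; adjust them to your own setup):

sudo certbot certonly --webroot -w /var/www/example.com -d example.com -d www.example.com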

[WARN] DUPLICATE NETWORK MESSAGES FROM NGINX

PLEASE READ THIS

The duplicate network messages reported by Nginx are NOT a bug and cannot be "fixed"; this is the desired behaviour of the blocker. Daily updates of the IP blacklists cause some well-known IPs and ranges to be blacklisted (old value "1"); these are then whitelisted at the very end of globalblacklist.conf which, due to the order of loading, sets the IPs we know are good to their new value "0", thereby whitelisting them. It has been this way since day 1 of the blocker and will remain this way. These are simple [WARN] messages, not [EMERG] messages, and they do not affect the operation of Nginx in any way whatsoever.
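To illustrate why Nginx prints these warnings, here is a simplified sketch of the pattern described above (the address range is a placeholder, not an entry from the real globalblacklist.conf):

geo $validate_client {
    default 0;
    # range pulled in from an upstream blacklist during the daily update
    203.0.113.0/24 1;
    # ...
    # whitelisted again at the very end of the file; the later entry wins,
    # but Nginx logs: [warn] duplicate network "203.0.113.0/24"
    203.0.113.0/24 0;
}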


Linux

Download install-ngxblocker to your /usr/local/sbin/ directory and make the script executable.

sudo wget https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/install-ngxblocker -O /usr/local/sbin/install-ngxblocker
sudo chmod +x /usr/local/sbin/install-ngxblocker

If your Linux distribution does not have wget you can replace the wget command above using curl as follows:

curl -sL https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/install-ngxblocker -o /usr/local/sbin/install-ngxblocker
FreeBSD

Install the package.

pkg install www/nginx-ultimate-bad-bot-blocker

Alternatively install via portmaster:

portmaster www/nginx-ultimate-bad-bot-blocker

Now run the install-ngxblocker script in DRY-MODE, which will show you what changes it will make and what files it will download for you. This is only a DRY-RUN, so no changes are being made yet.

The install-ngxblocker downloads all required files including the setup and update scripts.

cd /usr/local/sbin
sudo ./install-ngxblocker

This will show you output as follows of the changes that will be made (NOTE: this is only a DRY-RUN, no changes have been made yet):

Checking url: https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/include_filelist.txt

** Dry Run ** | not updating files | run  as 'install-ngxblocker -x' to install files.

Creating directory: /etc/nginx/bots.d

REPO = https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master

Downloading [FROM]=>  [REPO]/conf.d/globalblacklist.conf            [TO]=>  /etc/nginx/conf.d/globalblacklist.conf
Downloading [FROM]=>  [REPO]/conf.d/botblocker-nginx-settings.conf  [TO]=>  /etc/nginx/conf.d/botblocker-nginx-settings.conf

REPO = https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master

Downloading [FROM]=>  [REPO]/bots.d/blockbots.conf              [TO]=>  /etc/nginx/bots.d/blockbots.conf
Downloading [FROM]=>  [REPO]/bots.d/ddos.conf                   [TO]=>  /etc/nginx/bots.d/ddos.conf
Downloading [FROM]=>  [REPO]/bots.d/whitelist-ips.conf          [TO]=>  /etc/nginx/bots.d/whitelist-ips.conf
Downloading [FROM]=>  [REPO]/bots.d/whitelist-domains.conf      [TO]=>  /etc/nginx/bots.d/whitelist-domains.conf
Downloading [FROM]=>  [REPO]/bots.d/blacklist-user-agents.conf  [TO]=>  /etc/nginx/bots.d/blacklist-user-agents.conf
Downloading [FROM]=>  [REPO]/bots.d/blacklist-ips.conf          [TO]=>  /etc/nginx/bots.d/blacklist-ips.conf
Downloading [FROM]=>  [REPO]/bots.d/bad-referrer-words.conf     [TO]=>  /etc/nginx/bots.d/bad-referrer-words.conf
Downloading [FROM]=>  [REPO]/bots.d/custom-bad-referrers.conf   [TO]=>  /etc/nginx/bots.d/custom-bad-referrers.conf

REPO = https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master

Downloading [FROM]=>  [REPO]/setup-ngxblocker      [TO]=>  /usr/local/sbin/setup-ngxblocker
Downloading [FROM]=>  [REPO]/update-ngxblocker     [TO]=>  /usr/local/sbin/update-ngxblocker

setup-ngxblocker, install-ngxblocker and update-ngxblocker can all be configured with custom installation / update locations from the command line.

Run any of the setup, install or update scripts with --help or -h to view options.


Now run the install script with the -x parameter to download all the necessary files from the repository:

cd /usr/local/sbin/
sudo ./install-ngxblocker -x

This will give you the following output:

Checking url: https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/include_filelist.txt

Creating directory: /etc/nginx/bots.d

REPO = https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master

Downloading [FROM]=>  [REPO]/conf.d/globalblacklist.conf            [TO]=>  /etc/nginx/conf.d/globalblacklist.conf...OK
Downloading [FROM]=>  [REPO]/conf.d/botblocker-nginx-settings.conf  [TO]=>  /etc/nginx/conf.d/botblocker-nginx-settings.conf...OK

REPO = https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master

Downloading [FROM]=>  [REPO]/bots.d/blockbots.conf              [TO]=>  /etc/nginx/bots.d/blockbots.conf...OK
Downloading [FROM]=>  [REPO]/bots.d/ddos.conf                   [TO]=>  /etc/nginx/bots.d/ddos.conf...OK
Downloading [FROM]=>  [REPO]/bots.d/whitelist-ips.conf          [TO]=>  /etc/nginx/bots.d/whitelist-ips.conf...OK
Downloading [FROM]=>  [REPO]/bots.d/whitelist-domains.conf      [TO]=>  /etc/nginx/bots.d/whitelist-domains.conf...OK
Downloading [FROM]=>  [REPO]/bots.d/blacklist-user-agents.conf  [TO]=>  /etc/nginx/bots.d/blacklist-user-agents.conf...OK
Downloading [FROM]=>  [REPO]/bots.d/blacklist-ips.conf          [TO]=>  /etc/nginx/bots.d/blacklist-ips.conf...OK
Downloading [FROM]=>  [REPO]/bots.d/bad-referrer-words.conf     [TO]=>  /etc/nginx/bots.d/bad-referrer-words.conf...OK
Downloading [FROM]=>  [REPO]/bots.d/custom-bad-referrers.conf   [TO]=>  /etc/nginx/bots.d/custom-bad-referrers.conf...OK

REPO = https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master

Downloading [FROM]=>  [REPO]/setup-ngxblocker      [TO]=>  /usr/local/sbin/setup-ngxblocker...OK
Downloading [FROM]=>  [REPO]/update-ngxblocker     [TO]=>  /usr/local/sbin/update-ngxblocker...OK

All the required files have now been downloaded into the correct Nginx folders for you, directly from the repository.

MAKE SURE you set your setup and update scripts to be executable by running the following two commands. This is important before continuing with Step 4 and onwards.

sudo chmod +x /usr/local/sbin/setup-ngxblocker
sudo chmod +x /usr/local/sbin/update-ngxblocker

setup-ngxblocker, install-ngxblocker and update-ngxblocker can all be configured with custom installation / update locations from the command line.

Run any of the setup, install or update scripts with --help or -h to view options.


Now run the setup-ngxblocker script in DRY-MODE, which will show you what changes it will make and what files it will download for you. This is only a DRY-RUN, so no changes are being made yet.

cd /usr/local/sbin/
sudo ./setup-ngxblocker

This will give you output as follows (the output below assumes your nginx.conf file already has the default include of /etc/nginx/conf.d/*). All Nginx installations I know of ship with this default include in the distributed nginx.conf file.

Checking url: https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/include_filelist.txt

** Dry Run ** | not updating files | run  as 'setup-ngxblocker -x' to setup files.

INFO:      /etc/nginx/conf.d/* detected               => /etc/nginx/nginx.conf
inserting: include /etc/nginx/bots.d/blockbots.conf;  => /etc/nginx/sites-available/mydomain2.com.vhost
inserting: include /etc/nginx/bots.d/ddos.conf;       => /etc/nginx/sites-available/mydomain2.com.vhost
inserting: include /etc/nginx/bots.d/blockbots.conf;  => /etc/nginx/sites-available/mydomain1.com.vhost
inserting: include /etc/nginx/bots.d/ddos.conf;       => /etc/nginx/sites-available/mydomain1.com.vhost

Whitelisting ip:  x.x.x.x  => /etc/nginx/bots.d/whitelist-ips.conf

This script also whitelists your IP in the whitelist-ips.conf file for you. Further IPs or IP ranges can be added to your customizable whitelist-ips.conf file located at /etc/nginx/bots.d/whitelist-ips.conf.

setup-ngxblocker, install-ngxblocker and update-ngxblocker can all be configured with custom installation / update locations from the command line.

Run any of the setup, install or update scripts with --help or -h to view options.


Now run the setup script with the -x parameter to make all the necessary changes to your nginx.conf (if required) and also to add the required includes into all your vhost files.

This setup-ngxblocker script assumes that all your vhost files located in /etc/nginx/sites-available end with the extension .vhost. It is good practice to make all your vhost config files end with a .vhost extension, but if you prefer to stick with what you already have, e.g. .conf, you can simply run setup-ngxblocker with the -e parameter to specify the extension you use for your vhost files.

For instance, if your vhost files end in .conf, you would execute setup-ngxblocker with an additional command-line parameter as follows:

sudo ./setup-ngxblocker -x -e conf

So now let's run the setup script and let it make all the changes we need to make the Bot Blocker active on all your sites.

cd /usr/local/sbin/
sudo ./setup-ngxblocker -x

You will see output as follows:

Checking url: https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/include_filelist.txt

INFO:      /etc/nginx/conf.d/* detected               => /etc/nginx/nginx.conf
inserting: include /etc/nginx/bots.d/blockbots.conf;  => /etc/nginx/sites-available/mydomain2.com.vhost
inserting: include /etc/nginx/bots.d/ddos.conf;       => /etc/nginx/sites-available/mydomain2.com.vhost
inserting: include /etc/nginx/bots.d/blockbots.conf;  => /etc/nginx/sites-available/mydomain1.com.vhost
inserting: include /etc/nginx/bots.d/ddos.conf;       => /etc/nginx/sites-available/mydomain1.com.vhost

Whitelisting ip:  x.x.x.x  => /etc/nginx/bots.d/whitelist-ips.conf

You will note it has added the includes to all the .vhost files (the output above is from my test bed server) and also whitelisted your own IP address in the whitelist-ips.conf file for you. Further IPs or IP ranges can be added to your customizable whitelist-ips.conf file located at /etc/nginx/bots.d/whitelist-ips.conf.

What this setup script has done is simply add the following include statements into your .vhost files for you. It also adds /etc/nginx/conf.d/* to the includes in nginx.conf (if it is not already there); without that include, the whole blocker would fail to load.

# Bad Bot Blocker
include /etc/nginx/bots.d/ddos.conf;
include /etc/nginx/bots.d/blockbots.conf;

setup-ngxblocker, install-ngxblocker and update-ngxblocker can all be configured with custom installation / update locations from the command line.

Run any of the setup, install or update scripts with --help or -h to view options.


Now test your nginx configuration

sudo nginx -t

and you should see

nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

Now simply reload / restart Nginx and the Bot Blocker will immediately be active and protecting all your web sites.

sudo nginx -t && sudo nginx -s reload

or

sudo service nginx restart

That's it, the blocker is now active and protecting your sites from thousands of malicious bots and domains.


Now set up cron to automatically update the blocker for you every day so you always have the latest, most up-to-date protection.

sudo crontab -e

Add the following line at the end of your crontab file. Note the -e command-line parameter, which specifies the email address the update report is sent to. Obviously substitute [email protected] with your real email address or you will not receive the email when the script has run an update.

00 22 * * * sudo /usr/local/sbin/update-ngxblocker -e [email protected]

This will update the blocker every night for you at 10 PM.

If you want it to update more frequently (as sometimes I push out 3-4 updates a day) you can set it as follows to run the cron every 8 hours, although just once a day is more than enough.

00 */8 * * * sudo /usr/local/sbin/update-ngxblocker -e [email protected]

If you don't want any email notification after an update (not advisable in case Nginx ever has an EMERG when reloading), then simply run your cron as follows.

00 */8 * * * sudo /usr/local/sbin/update-ngxblocker -n

If you would rather send email via Mailgun, then run your cron as follows:

00 22 * * * sudo /usr/local/sbin/update-ngxblocker -g [email protected] -d yourdomain.com -a mailgun api key -f [email protected]

That's it, the blocker will automatically keep itself up to date and also reload Nginx once it has downloaded the latest version of the globalblacklist.conf file.
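You can also trigger an update manually at any time by running the update script yourself, for example (using the same email parameter as the cron entries above):

sudo /usr/local/sbin/update-ngxblocker -e [email protected]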


You can now customize any of the following files below to suit your environment or requirements. These include files are never modified by the auto-update script above, so whatever customizations you make here will never be overwritten during an update.

/etc/nginx/bots.d/whitelist-ips.conf
/etc/nginx/bots.d/whitelist-domains.conf
/etc/nginx/bots.d/blockbots.conf
/etc/nginx/bots.d/blacklist-domains.conf
/etc/nginx/bots.d/blacklist-user-agents.conf
/etc/nginx/bots.d/blacklist-ips.conf
/etc/nginx/bots.d/bad-referrer-words.conf
/etc/nginx/bots.d/custom-bad-referrers.conf
/etc/nginx/bots.d/ddos.conf

Let's say, for some "obscure" reason, you actually want to block GoogleBot from accessing your site. You would simply add it to the /etc/nginx/bots.d/blacklist-user-agents.conf file and it will override the default whitelist for GoogleBot. The same applies to any other bots that are whitelisted by default.
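As an example, an entry along the following lines in /etc/nginx/bots.d/blacklist-user-agents.conf would blacklist GoogleBot (the exact syntax and value to use are documented in the comments of the include file itself):

"~*Googlebot"		3;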

All include files are commented for your convenience.


If this project helped you out, help support it

Buy me Coffee


(TEST THAT IT IS WORKING)

TESTING

Run the following commands one by one from a terminal on another linux machine against your own domain name.

Substitute http://yourdomain.com ❗ in the examples below with your own REAL domain name ❗

curl -A "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -I http://yourdomain.com

curl -A "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" -I http://yourdomain.com

Should respond with 200 OK

curl -A "Xenu Link Sleuth/1.3.8" -I http://yourdomain.com

curl -A "Mozilla/5.0 (compatible; AhrefsBot/5.2; +http://ahrefs.com/robot/)" -I http://yourdomain.com

Should respond with either of the following error messages:

  • curl: (52) Empty reply from server
  • curl: (56) TCP connection reset by peer
  • curl: (92) HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)

curl -I http://yourdomain.com -e http://100dollars-seo.com

curl -I http://yourdomain.com -e http://zx6.ru

Should respond with either of the following error messages:

  • curl: (52) Empty reply from server
  • curl: (56) TCP connection reset by peer
  • curl: (92) HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)

The Nginx Ultimate Bot Blocker is now WORKING and PROTECTING your web sites !!!


NOTE to Cloudflare Users !!!

If you are a Cloudflare user who is using the Cloudflare CDN / caching system, you should always disable the Cloudflare CDN (set the grey cloud) while testing.

With the CDN disabled you will get the correct response codes while testing, as below:

  • curl: (52) Empty reply from server
  • curl: (56) TCP connection reset by peer
  • curl: (92) HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)

With the CDN enabled, what happens instead is by design: the CDN is doing its work. The first response from your server tells the bot to "go away" by issuing it a 444 response, Cloudflare caches that response, and the second time you test you will be served a 520 Origin Error message.

While testing the blocker, disable the CDN / caching system; once you are happy with your tests, re-enable the CDN / cache on your live environment, as you do want the bots to get that response from Cloudflare.

Most other CDN systems will probably show the same behaviour, so always disable a CDN during testing to rule out anything that might interfere with your results. Always make sure to re-enable the CDN when you are done testing !!

Once you enable the Cloudflare CDN and test again, you will get:

  • curl: 520 Origin Error

This means the CDN is working.


OPTIONAL

INSTALLING THE BLOCKER TO NON-STANDARD NGINX FOLDER LOCATIONS

Some people build Nginx themselves and do not end up having the standard nginx folder locations at /etc/nginx

For users like this, you can run install-ngxblocker, setup-ngxblocker and update-ngxblocker specifying your folder locations on the command line as follows.

sudo ./install-ngxblocker -x -c /usr/local/nginx/conf.d -b /usr/local/nginx/bots.d

sudo ./setup-ngxblocker -x -c /usr/local/nginx/conf.d -b /usr/local/nginx/bots.d

sudo ./update-ngxblocker -c /usr/local/nginx/conf.d -b /usr/local/nginx/bots.d -e [email protected]

This will automatically put the files into the locations you specify, do the includes in your vhosts using your custom locations, and when update-ngxblocker pulls a new update it will also automatically rewrite the "Include" sections inside the globalblacklist.conf file to point at your own custom locations. Thanks again to Stuart Cardall @itoffshore for contributing these excellent scripts.


[WARN] DUPLICATE NETWORK MESSAGES FROM NGINX

PLEASE READ THIS

The duplicate network messages reported by Nginx are NOT a bug and cannot be "fixed"; this is the desired behaviour of the blocker. Daily updates of the IP blacklists cause some well-known IPs and ranges to be blacklisted (old value "1"); these are then whitelisted at the very end of globalblacklist.conf which, due to the order of loading, sets the IPs we know are good to their new value "0", thereby whitelisting them. It has been this way since day 1 of the blocker and will remain this way. These are simple [WARN] messages, not [EMERG] messages, and they do not affect the operation of Nginx in any way whatsoever.


WHY BLOCK BAD BOTS ?

Definition of Bad Bots

Bad bots are:
  • Bad Referrers
  • Bad User-Agent Strings
  • Spam Referrers
  • Spam Bots and Bad Bots
  • Nuisance or Unwanted Bots
  • Sites Linked to Lucrative Malware, Adware and Ransomware Clickjacking Campaigns
  • Vulnerability scanners
  • Gambling and Porn Web Sites
  • E-mail harvesters
  • Content scrapers
  • Link Ranking Bots
  • Aggressive bots that scrape content
  • Image Hotlinking Sites and Image Thieves
  • Bots or Servers linked to viruses or malware
  • Government surveillance bots
  • Botnet Attack Networks (Mirai)
  • Known Wordpress Theme Detectors (Updated Regularly)
  • SEO companies that your competitors use to try to improve their SEO
  • Link Research and Backlink Testing Tools
  • Stopping Google Analytics Ghost Spam
  • Browser Adware and Malware (Yontoo etc)

(Over 4000 bad referrers, spam referrers, user-agents, bad bots, bad IPs, porn, gambling and clickjacking sites, lucrative SEO companies, wordpress theme detectors and counting)


Help Support This Project

Thousands of hours of programming and testing have gone into this project, show some love

Buy me Coffee


Welcome to the Ultimate Nginx Bad Bot, User-Agent, Spam Referrer Blocker, Adware, Malware and Ransomware Blocker, Click-Jacking Blocker, Click-Redirect Blocker and Bad IP Blocker with Anti DDOS System, Nginx Rate Limiting and Wordpress Theme Detector Blocking.

Bots attempt to make themselves look like other software or web sites by disguising their user agent. Their user agent names may look harmless, perfectly legitimate even.

For example, "^Java" but according to Project Honeypot, it's actually one of the most dangerous BUT a lot of legitimate bots out there have "Java" in their user agent string so the approach taken by many to block "Java" is not only ignorant but also blocking out very legitimate crawlers including some of Google's and Bing's and makes it very clear to me that those people writing bot blocking scripts seldom ever test them.

Spam referrers and spam domain names use very clever techniques to hop off your sites, running very lucrative click-jacking and click-redirecting campaigns which serve ads to unsuspecting people browsing the web, or even plant malware, adware or ransomware in their browsers, which then become part of a lucrative network of bots.

This Bot Blocker includes hundreds of domain names and IP addresses that most people will not even see in their Nginx logs. This comes as a result of all my sites running over SSL and using Content-Security-Policy (CSP), which flags things before they even get to Nginx; through this I have picked up, and continue to pick up, some of the worst domains and bots out there.

A massive number of porn, gambling and fake-news web sites are also blocked by this blocker, and that list also grows at a rapid pace.

Unfortunately, most bot blocker scripts out there are simply copied and pasted from other people's scripts and made to look like their own work. This one was inspired by the one created by https://github.com/mariusv; I contributed to that project but then went off into a totally new layout, cleaned it up big time and started from scratch. It is now a completely independent project. It's clean, it works and has been thoroughly tested.


THE BASICS

This Nginx bad bot blocker list is designed to be a global Nginx include file and uses the Nginx map $http_user_agent, map $http_referer and geo $validate_client directives.

This way the .conf file is loaded once into memory by Nginx and is available to all web sites that you operate. You simply need to use an Include statement in an Nginx vhost conf file.
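Conceptually, the blocker works along the following lines (a heavily simplified sketch with illustrative variable names and entries, not the actual contents of globalblacklist.conf):

# loaded once at the http{} level via /etc/nginx/conf.d/globalblacklist.conf
map $http_user_agent $bad_bot {
    default        0;
    "~*AhrefsBot"  1;
}

map $http_referer $bad_referer {
    default                 0;
    "~*100dollars-seo\.com" 1;
}

# each vhost then pulls in the small per-server includes (blockbots.conf,
# ddos.conf) which act on these mapped variables inside its server{} block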


IT'S CENTRALISED:

The beauty of this is that it is one central file used by all your web sites. This means there is only one place to make amendments, i.e. adding new bots that you discover in your log files. Any changes are applied immediately to all sites after a simple "sudo service nginx reload". But of course, always do a "sudo nginx -t" to test any config changes before you reload.
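In practice, that means:

sudo nginx -t && sudo service nginx reload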


IT IS TINY AND LIGHTWEIGHT

The file is tiny in size. At the time of writing (the first public commit), the file size, including all the commenting (which Nginx ignores), was a mere 185 KB, already containing over 5000 bad domains, bad bots and bad IP addresses. It is so lightweight that Nginx does not even know it's there. It now contains thousands of entries - the totals are updated at the top of this README.


IT IS ACCURATE AND IS FALSE POSITIVE PROOF

Unlike many other bad bot blockers out there for Nginx and Apache, where people simply copy and paste lists from others, this list has been built from the ground up and tested thoroughly - and I mean thoroughly - for over 10 months now. It comes from actual server logs that are monitored daily, and there are at least 3-10 new additions to this file almost daily.

It has also been thoroughly tested for false positives, using months of constant and regular testing and monitoring of log files.

All web sites listed as bad referrers are checked one by one before they are even added. Simply copying anything that looks suspicious in your log file and adding it to a blocker like this without actually checking what it is first... well, it's foolish to say the least.


DROP THEM AND THAT'S IT

Nginx has a lovely status code, 444, which simply drops the connection. All these rules issue a 444 response, so if a rule matches, the requesting IP simply gets no response and your server appears to them not to exist, or to be offline.

A test with curl, using one of the test command lines documented in the /conf.d/globalblacklist.conf file, will give a simple "curl: (52) Empty reply from server" - and that is exactly the reply the bad referrers and bots get.
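The mechanism is, roughly, a conditional check against the mapped variables inside each server block (again a simplified sketch with an illustrative variable name, not the literal contents of blockbots.conf):

# inside each server{} block, via the blockbots.conf include
if ($bad_bot) {
    return 444;
}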


RATE LIMITING FUNCTIONALITY BUILT IN

For bots or spiders that you still want to allow, but whose visitation rate you want to limit, you can use the built-in rate limiting functions included in the blocker. The file is extensively commented throughout, so you should be able to figure it out; otherwise simply message me if you are having problems.
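For reference, the rate limiting zones shipped in botblocker-nginx-settings.conf look like the following at the time of writing (tune the rates to your own traffic profile):

limit_req_zone $binary_remote_addr zone=flood:50m rate=90r/s;
limit_conn_zone $binary_remote_addr zone=addr:50m;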


PULL REQUESTS / CORRECTIONS / FALSE POSITIVES:

To contribute your own bad referrers or bots, or to make corrections to any incorrectly blocked bots or domains, please fork a copy of this repository, make your changes to the individual files located here, and then send a pull request (PR).

All additions, removals and corrections will be checked for accuracy before being merged into the main blocker.

ISSUES:

Log an issue regarding incorrect listings or any other problems on the issues system and they will be investigated and removed if necessary. I respond very quickly to user problems and have helped countless users for days on end to get their bot blocker working. You could say I am mad (disputable), but I love helping people and do not ignore issues or people with problems getting this to work.


FEATURES OF THE NGINX BAD BOT BLOCKER:

  • Extensive Lists of Bad and Known Bad Bots and Scrapers (updated almost daily)
  • Blocking of Spam Referrer Domains and Web Sites
  • Blocking of SEO data collection companies like Semalt.com, Builtwith.com, WooRank.com and many others (updated regularly)
  • Blocking of clickjacking Sites linked to Adware, Malware and Ransomware
  • Blocking of porn and gambling web sites which use lucrative ways to earn money through serving ads by hopping off your domain names and web sites
  • Blocking of Bad Domains and IP's that you cannot even see in your Nginx Logs. Thanks to the Content Security Policy (CSP) on all my SSL sites I can see things trying to pull resources off my sites before they even get to Nginx and get blocked by the CSP.
  • Anti-DDOS filter and rate limiting of aggressive bots
  • Alphabetically ordered for easier maintenance (Pull Requests Welcomed)
  • Commented sections of certain important bots to be sure of before blocking
  • Includes the IP range of Cyveillance who are known to ignore robots.txt rules and snoop around all over the Internet.
  • Whitelisting of Google, Bing and Cloudflare IP Ranges
  • Whitelisting of your own IP Ranges that you want to avoid blocking by mistake.
  • Ability to add other IP ranges and IP blocks that you want to block out.
  • If it's out there and it's bad, it's already in here and BLOCKED !!

UNDERSTANDS PUNYCODE / IDN DOMAIN NAMES

A lot of lists out there put raw internationalized (IDN) domains into their hosts files. Your hosts file and DNS will not understand these. This list uses converted domains which are in the correct DNS (punycode) format to be understood by any operating system. Avoid using lists that do not put the correctly formatted domain structure into their lists.

For instance, the domain:

lifehacĸer.com (note the ĸ, which is not an ordinary k)

actually translates to:

xn--lifehacer-1rb.com

You can do an nslookup on any operating system and it will resolve correctly.

nslookup xn--lifehacer-1rb.com

	origin = dns1.yandex.net
	mail addr = iskalko.yandex.ru
	serial = 2016120703
	refresh = 14400
	retry = 900
	expire = 1209600
	minimum = 14400
xn--lifehacer-1rb.com	mail exchanger = 10 mx.yandex.net.
Name:	xn--lifehacer-1rb.com
Address: 78.110.60.230
xn--lifehacer-1rb.com	nameserver = dns2.yandex.net.
xn--lifehacer-1rb.com	text = "v=spf1 redirect=_spf.yandex.net"
xn--lifehacer-1rb.com	nameserver = dns1.yandex.net.

ALWAYS MONITOR WHAT YOU ARE DOING:

MAKE SURE to monitor your web site logs after implementing this. I suggest you first load this into one site and monitor it for any possible false positives before putting this into production on all your web sites.

Do not sit like an ostrich with your head in the sand; being a responsible server operator and web site owner means you must monitor your logs frequently. A reason many of you ended up here in the first place is that you saw nasty-looking stuff in your Nginx log files.

Also monitor your logs daily for new bad referers and user-agent strings that you want to block. Your best source of adding to this list is your own server logs, not mine.

Feel free to contribute bad referers from your own logs to this project by sending a Pull Request (PR). You can however rely on this list to keep out 99% of the baddies out there.


HOW TO MONITOR YOUR LOGS DAILY (The Easy Way):

With great thanks and appreciation to

https://blog.nexcess.net/2011/01/21/one-liners-for-apache-log-files/

To monitor the top referrers in a web site's log files on a daily basis, use the following simple cron jobs, which will email you a list of the top referrers / user agents every morning from a particular web site's log files. This is an example of just one cron job for one site; set up multiple ones for each site you want to monitor. Here is a cron that runs at 8 am every morning and emails me a stripped-down log of referrers. By stripped down I mean that the domain of the site itself and other referrers like Google and Bing are stripped from the results. Of course you must change the log file name, domain name and your email address in the examples below. The second cron, for collecting user agents, does not strip out any referrers, but you can add that functionality if you like by copying the awk !~ statement from the first example.

Cron for Monitoring Daily Referers on Nginx

00 08 * * * tail -10000 /var/log/nginx/mydomain-access.log | awk '$11 !~ /google|bing|yahoo|yandex|mywebsite.com/' | awk '{print $11}' | tr -d '"' | sort | uniq -c | sort -rn | head -1000 | mail -s "Top 1000 Referers for Mydomain.com" [email protected]

This emails you a daily list of referrers using an awk command to exclude domains like google, bing and your own domain name.

Cron for Monitoring Daily User Agents on Nginx

00 08 * * * tail -50000 /var/log/nginx/mydomain-access.log | awk '{print $12}' | tr -d '"' | sort | uniq -c | sort -rn | head -1000 | mail -s "Top 1000 Agents for Mydomain.com" [email protected]

This emails you a list of the top User-Agents that visited your site in the last 24 hours, helpful for spotting any rogue or suspicious-looking User-Agent strings.


BLOCK AGGRESSIVE BOTS AT FIREWALL LEVEL USING FAIL2BAN:

I have added a custom Fail2Ban filter and action that I have written, which monitor your Nginx logs for bots that generate a large number of 444 errors. This custom jail for Fail2Ban scans logs over a one-week period and bans the offender for 24 hours. It helps a great deal in keeping out some repeat offenders and preventing them from filling up your log files with 444 errors. See the Fail2Ban folder for instructions on configuring this great add-on for the Nginx Bad Bot Blocker.
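The resulting jail definition looks broadly like the following (a sketch assembled from the ban times and retry counts described above; the log path is a placeholder and the authoritative filter and jail files live in the repository's Fail2Ban folder):

[nginxrepeatoffender]
enabled  = true
port     = http,https
filter   = nginxrepeatoffender
logpath  = /var/log/nginx/access.log
maxretry = 20
findtime = 604800
bantime  = 86400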


STOPPING GOOGLE ANALYTICS "GHOST" SPAM:

Simply using the Nginx blocker does not stop Google Analytics ghost referral spam, because the spammers hit Analytics directly and do not necessarily ever touch your website.

You should use regex filters in Analytics to prevent ghost referral spam.

For this, several google-exclude-0*.txt files have been created for you; they are updated at the same time as the Nginx blocker. As the list grows, more exclude files will be created.


TO STOP "GHOST" SPAM ON GOOGLE ANALYTICS FOLLOW THE SIMPLE VISUAL GUIDE BELOW

Follow the step by step visual instructions below to add these google-exclude files as segments to your web site.

(Step-by-step screenshots: Google Analytics - Adding Segments to Stop Ghost Spam)

BLOCKING SPAM DOMAINS USING GOOGLE SEARCH CONSOLE / WEBMASTER TOOLS

(How to use the google-disavow.txt file)

I have added the creation of a Google Disavow text file called google-disavow.txt. This file can be used in Google's Webmaster Tools to block all these domains out as spammy or bad links. Use with caution.


ROBOTS.txt VERSION for those who cannot use this full blocker?

Lots of people are at the mercy of their hosting company and do not have root access to the server running behind their web site. If this is your situation, check out the automatically generated robots.txt file, which will help to some degree to keep a lot of bad bots and user-agents out of your sites.


.htaccess VERSIONS for those who cannot use this full blocker?

Lots of people are at the mercy of their hosting company and do not have root access to the server running behind their web site.

If this is your situation, check out the automatically generated .htaccess versions of the Spam Referrer Blocker, which can be found in this repository. This .htaccess method (FOR APACHE SITES ONLY) will help you keep all the spam referrers in this blocker out of your site.

This is merely mentioned here because a lot of people using cPanel systems think they are sitting behind an Nginx server, but in reality are actually running on an Apache server sitting behind an Nginx proxy server. .htaccess does not work on Nginx sites.

Not using the Nginx web server? Get the APACHE ULTIMATE BAD BOT BLOCKER


IT FORKING WORKS !!!


Just Enjoy now what the Nginx Bad Bot Blocker Can Do For You and Your Web Sites.

And Help Support This Project

Thousands of hours of programming and testing have gone into this project, show some love

Buy me Coffee


HAS YOUR WEB SITE BEEN HACKED?

Contact me for help cleaning up and securing your web site.


SOME OTHER AWESOME FREE PROJECTS


ALSO CHECKOUT THE NEW BIG LIST OF HACKED MALWARE & WORDPRESS WEB SITES

This repository contains a list of all web sites I come across that are hacked with malware. Most site owners are unaware their sites have been hacked and are being used to plant malware.

Check it out at: https://github.com/mitchellkrogza/The-Big-List-of-Hacked-Malware-Web-Sites


INTO PHOTOGRAPHY?

Come drop by and visit me at mitchellkrog.com, on Facebook, or follow me on Twitter: @MitchellKrog


ACKNOWLEDGMENTS & CONTRIBUTORS:

Many Thanks to those contributing to this project.

Many parts of the generator scripts and code running behind this project have been adapted from snippets from hundreds of sources. In fact, it is hard to mention everyone, but here are a few key people whose little snippets of code have helped me introduce new features over time. Show them some love and check out some of their projects too.

If you believe your name should be here, drop me a line.


Writing Code like this takes lots of time !!

Thousands of hours of programming and testing have gone into this project, show some love

Buy me Coffee


MIT License

Copyright (c) 2017 Mitchell Krog - [email protected]

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Contributors

angel333, atastycookie, danci1973, doomedraven, fl02, francesco-filicetti, franciscopaniskaseker, hong823, ics, itoffshore, jaybizzle, jchimene, jsibbiso, lukeb, markwbrown, mitchellkrogza, netchild, petecooper, peterdavehello, reatlat, roniemartinez, starkravingza, stefanobaldo, udanieli, uded, xcorat, xopez


nginx-ultimate-bad-bot-blocker's Issues

Update script generating error after trying to reload Nginx configuration

I don't know why, but after I updated the update script to the latest version it fails to reload the Nginx configuration. It doesn't happen all the time, so if I run the script twice it normally works the second time.

Now I set my crontab to run updates three times per day to circumvent this issue. Here goes the output:

1 - Latest Blacklist Already Installed: 3.2017.06.569

2 - Update Available => 3.2017.06.570

2017-06-19 11:20:03 URL:https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/conf.d/globalblacklist.conf [154957/154957] -> "/etc/nginx/conf.d/globalblacklist.conf" [1]

Reloading NGINX configuration...[FAILED]

3 - Latest Blacklist Already Installed: 3.2017.06.570

Oh well, it looks like the second round did work this time despite the bad output. But it normally doesn't.

And yes, everything is OK with my nginx.conf and nginx -t won't output any error. My environment is an Ubuntu 16.04 box.

Any clue?

Thanks in advance.

Auto Update Fails with NGINX EMERG

Please read the updated configuration instructions.
A new include file is required for blacklisting your own bad user agents; Nginx will fail to reload without the presence of the file /etc/nginx/bots.d/blacklist-user-agents.conf

A quick fix is to simply run

sudo touch /etc/nginx/bots.d/blacklist-user-agents.conf

Which file can I edit to whitelist an IP from the rate limit rule of the DDOS filter?

Hi,
botblocker-nginx-settings.conf has a rate limit rule; the recommended rate for a WordPress site is 90 requests per second, but I'm using Magento, so I'm not sure which value is suitable, or whether I can add an IP to a whitelist to make it ignore this rule somehow.

limit_req_zone $binary_remote_addr zone=flood:50m rate=90r/s;
limit_conn_zone $binary_remote_addr zone=addr:50m;

I can see there is a whitelist-ips.conf, but it only seems to apply to the bad bot blocking.
Thanks.

NUBBB Integration with Engintron Nginx Plugin

An introduction to fully integrating the Nginx Ultimate Bad Bot Blocker with the Engintron Nginx plugin on your cPanel/WHM server.

STEP 1:

cd /etc/nginx/conf.d
sudo wget https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/conf.d/globalblacklist.conf -O globalblacklist.conf
sudo wget https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/conf.d/botblocker-nginx-settings.conf -O botblocker-nginx-settings.conf
sudo mkdir /etc/nginx/bots.d
cd /etc/nginx/bots.d
sudo wget https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/bots.d/blockbots.conf -O blockbots.conf
sudo wget https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/bots.d/ddos.conf -O ddos.conf
sudo wget https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/bots.d/whitelist-ips.conf -O whitelist-ips.conf
sudo wget https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/bots.d/whitelist-domains.conf -O whitelist-domains.conf
sudo wget https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/bots.d/blacklist-user-agents.conf -O blacklist-user-agents.conf
sudo wget https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/bots.d/blacklist-ips.conf -O blacklist-ips.conf
sudo wget https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/bots.d/bad-referrer-words.conf -O bad-referrer-words.conf
sudo wget https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/bots.d/custom-bad-referrers.conf -O custom-bad-referrers.conf

STEP 2:

Remove this line from nginx.conf (/etc/nginx/nginx.conf)

server_names_hash_bucket_size 512;
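If you prefer doing this from the shell rather than in an editor, a one-liner along these lines would remove it (back up nginx.conf first; this sed command is only a suggestion):

sudo sed -i '/server_names_hash_bucket_size 512;/d' /etc/nginx/nginx.conf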

STEP 3:

Add the include lines to /etc/nginx/conf.d/default.conf (Lines: 61-62), /etc/nginx/conf.d/default_https.conf (Lines: 20-21, 57-58, 77-78, 97-98, 117-118) and /etc/nginx/utilities/https_vhosts.php (Lines: 53-54)

include /etc/nginx/bots.d/blockbots.conf; 
include /etc/nginx/bots.d/ddos.conf;

STEP 4:

TESTING YOUR NGINX CONFIGURATION

sudo nginx -t

If you get no errors, then you followed my instructions correctly, so now you can make the blocker go live with a simple:

sudo service nginx reload

The blocker is now active and working, so you can run some simple tests from another Linux machine to make sure it's working.

You need to re-do STEP 2, STEP 3 and the step below after every update of Engintron !!!

cd /etc/nginx/conf.d
sudo wget https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/conf.d/globalblacklist.conf -O globalblacklist.conf
sudo wget https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/conf.d/botblocker-nginx-settings.conf -O botblocker-nginx-settings.conf

Incorrect Detections using "CPython" and "python-requests" user agent names

Through constant log monitoring and a recent new fail2ban filter I created, I noticed that a valid Google cloud IP address was being blacklisted by Fail2ban for generating too many "444" errors.

I tracked this down to the user agent strings "CPython" and "python-requests" which were previously in my list of bad bots. They have both been removed as they are used by "UniversalFeedParser/5.2.1 +https://code.google.com/p/feedparser/" which also uses a User-Agent String of "python-requests/2.5.3 CPython/2.7.9 Linux/3.16.0-4-amd64"

Net Range Using this is also valid:

NetRange: 104.196.0.0 - 104.199.255.255
CIDR: 104.196.0.0/14
NetName: GOOGLE-CLOUD
NetHandle: NET-104-196-0-0-1
Parent: NET104 (NET-104-0-0-0-0)
NetType: Direct Allocation
OriginAS: AS15169
Organization: Google Inc. (GOOGL-2)
RegDate: 2014-08-27
Updated: 2015-09-21

Content scrapers log

sisfeed.com is using Cloudflare, but I have found their own IPs (sisfeed.com / IP: 176.99.4.10/11). Does this log help?

176.99.4.10 - - [17/Apr/2017:15:49:56 +0600] "GET / HTTP/1.1" 200 229413 "" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0"
176.99.4.11 - - [17/Apr/2017:15:49:58 +0600] "GET /cometchat/cometchatcss.php HTTP/1.1" 200 55650 "http://www.mysite.com/" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0"
176.99.4.11 - - [17/Apr/2017:15:49:58 +0600] "GET /cometchat/cometchatjs.php HTTP/1.1" 200 728214 "http://www.mysite.com/" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0"

Omgilibot

Received an email regarding Omgilibot as below. Further research reveals they are compliant as a good bot and obey robots.txt.

I hope all is well. My name is Ran Geva and I'm the founder & CEO of Webhose.io. We provide web data feeds for thousands of users from students to enterprise around the world. Basically we create structured web data repositories anyone can tap into instead of crawling the web themselves.

Anyway, it seems like you have listed our polite crawler (omgilibot) into a blacklist and I was wondering what can we do to take it out, so our crawler could do a better job?

Looking forward for your reply,

--
Ran Geva
CEO, Webhose.io

Fail2ban jails not working

Hello Mitchell,

My nginxrepeatoffender and nginx-limit-req jails are just starting up normally, but they won't ban anyone. So, I tried running fail2ban (0.9.3) in verbose mode, but I see no errors. I also edited paths-common.conf as you suggest somewhere, and I get the same result.

Take a look on how they are starting up:

Creating new jail 'nginxrepeatoffender'
Jail 'nginxrepeatoffender' uses pyinotify
Set jail log file encoding to UTF-8
Initiated 'pyinotify' backend
Set maxRetry = 20
Added logfile = /var/log/nginx/22222.access.log
Added logfile = /var/log/nginx/mysite.com.access.log
Added logfile = /var/log/nginx/access.log
Set banTime = 86400
Set jail log file encoding to UTF-8
Set findtime = 604800
Jail nginxrepeatoffender is not a JournalFilter instance

Creating new jail 'nginx-limit-req'
Jail 'nginx-limit-req' uses pyinotify
Set jail log file encoding to UTF-8
Initiated 'pyinotify' backend
Set maxRetry = 1
Added logfile = /var/log/nginx/mysite.com.error.log
Added logfile = /var/log/nginx/error.log
Added logfile = /var/log/nginx/22222.error.log
Set banTime = 600
Set jail log file encoding to UTF-8
Set findtime = 600
Jail 'nginxrepeatoffender' started
Jail 'nginx-limit-req' started

Any clue?

Thanks in advance,

César

Google Analytics - Segments Not Saving

Hi Mitchell,

After adding the first "google-exclude-01.txt" to the filter under the segment, Google does not allow me to save the Segment. These are intended to be all under the same Segment, correct? Maybe Google is further reducing the limit for adding keywords/filters to a segment?

Thank you everyone for your support !!!

I would just like to thank everyone for their support, usage and feedback on this project. It's growing in leaps and bounds lately. The repo now has 175 stars and 919 commits since I started it. I see new users starring it almost daily and starting to use this blocker.

I have lots of ideas for the future of this project which will slowly come on line over time but thanks again to everyone for your support. 👍 💯


Update script returning error

Hello Mitchell,

Update script (b919876) is returning an error on line 82:

./update-ngxblocker: 82: local: --: bad variable name

Can you please look into it?

Thanks in advance,

Cesar

Fail2ban + Debian

I am running the latest version of Debian, and when I add everything for Fail2Ban it will not start. First it could not understand the Nginx logs location, so I replaced that with the path to my access logs, and now I get a bad substitution error.

● fail2ban.service - LSB: Start/stop fail2ban
Loaded: loaded (/etc/init.d/fail2ban)
Active: active (exited) since Wed 2017-01-11 19:39:44 UTC; 5min ago
Process: 25824 ExecStop=/etc/init.d/fail2ban stop (code=exited, status=0/SUCCESS)
Process: 25831 ExecStart=/etc/init.d/fail2ban start (code=exited, status=0/SUCCESS)

Jan 11 19:39:43 ACROSS-LANTIS systemd[1]: Starting LSB: Start/stop fail2ban...
Jan 11 19:39:44 ACROSS-LANTIS fail2ban[25831]: Starting authentication failure monitor: fail2banERROR Failed during configuration: Bad value substitution:
Jan 11 19:39:44 ACROSS-LANTIS fail2ban[25831]: section: [nginxrepeatoffender]
Jan 11 19:39:44 ACROSS-LANTIS fail2ban[25831]: option : action
Jan 11 19:39:44 ACROSS-LANTIS fail2ban[25831]: key : port
Jan 11 19:39:44 ACROSS-LANTIS fail2ban[25831]: rawval : ", protocol="%(protocol)s", chain="%(chain)s"]
Jan 11 19:39:44 ACROSS-LANTIS fail2ban[25831]: failed!

blacklist-user-agents.conf not working, or bad config?

Firstly, this is an awesome repository, thank you very much

I am trying to completely block Baidu/Baiduspider from my site, however this doesn't seem to work. I am sure I am doing something wrong, but it is not completely obvious to me.

/etc/nginx/bots.d/blacklist-user-agents.conf

"~*Baidu"               3;
"~*Baiduspider"               3;

After restarting nginx

# curl -A "AhrefsBot" https://www.mydomain.com
curl: (52) Empty reply from server
# curl -A "Baiduspider" https://www.mydomain.com
<!doctype html>
<html>
//snipped

looking at my access logs, I can see

180.76.15.154 - - [10/May/2017:11:39:56 +0200] "GET /product/9781441101471 HTTP/1.1" 200 4060 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

"geo" supports IPv4 only

When testing, I get this error message:
nginx: [emerg] "geo" supports IPv4 only in /etc/nginx/conf.d/globalblacklist.conf:4865
nginx: configuration file /etc/nginx/nginx.conf test failed

Travis CI

Busy introducing Travis CI for checking each commit and build. I already have thorough cross-checks on the server where the scripts are generated, but feel a Travis double check is a good thing. Will have it sorted in the morning.

Dots should be escaped

I understand that all the regular expressions should have their dots escaped, as follows:

This:
"~*ico.re" 1;

Should become this:
"~*ico\.re" 1;

The problem is that the "." from the expression is being interpreted as "any single character", and we are getting some false positives, like this:

curl -I http://xxx -e http://locatellicorretor.com.br
curl: (52) Empty reply from server

Auto Update

Hi mitchell,
first a big thanks for your hard work on that useful stuff.

I have some issues with the auto update functions.
Well, it is working so far and downloading globalblacklist.conf, which seems to be the only thing it updates. Don't get me wrong, I am just wondering if I have to worry about updates to the other files.

One problem is that it doesn't restart Nginx automatically after an update.
I am on CentOS 7.3 and can't see any failure messages logged.
Do I have to change something in the update script to make it work?
I checked the paths etc. and all is correct.
I only changed my email address (to get notified) and nothing else, and added the cronjob as described.
The email output i receive is
2017-06-06 08:00:01 URL:https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/conf.d/globalblacklist.conf [149482/149482] -> "/etc/nginx/conf.d/globalblacklist.conf" [1]

Is it meant to be like that or should there be more info, maybe the version downloaded or that nginx has been restarted successfully?

Sorry for so many questions, but I got a bit stuck on this. ;)

TravisCI generator scripts brought online.

TravisCI now does the generation of all the bad bot blocker files during the build process, including creating the robots.txt file, google-disavow and google-exclude files. The build process is now based entirely off pull requests on the generator list files in the _generator_lists folder.

User Request - Allow Including of Own Bad Bots List

V2.2017.05 - Major Version Update

Introduced in: f19f327

  • PLEASE READ UPDATED CONFIGURATION INSTRUCTIONS
  • PLEASE READ CHANGELOG
  • New Custom Include File (/etc/nginx/bots.d/blacklist-user-agents.conf) for blacklisting your own User-Agents, which will not get wiped out with updates
  • New include file (/etc/nginx/conf.d/botblocker-nginx-settings.conf) for having the important Nginx settings automatically included into your nginx.conf for you
  • New Bash Installer Script for Easy Installation and Copying Files Directly from the Repo
  • Important to note changes in this Version Update as you will get EMERG errors if you are missing any of the new include files

Travis Build Errors (In Progress)

Travis CI decided to stop playing nice today and started failing on generating the globalblacklist.conf file. Relates to cat and sed errors which did not exist before, perhaps something has changed in Travis CI without users knowing about it?

At the moment the globalblacklist.conf file is being generated with no bad bot or referrer information printed into it.

Working on the issue right now, will have working blocker back on track shortly.

False Positives / Incorrectly Blacklisted Domains?

If you ever find any false positives or domains that you believe should not be blacklisted by this blocker, please list them separately as issues here, explaining why they should not be on the blocker. They will be investigated and promptly removed if found to be valid removals.

Unfortunately this list started over a year ago and was originally based on a few other lists which did contain a few false positives. Most have been ironed out over the past few months thanks to issues noted on this and other repos I run or contribute to.

The original list this began from had a mere 200+ listings; it has now grown to over 3800 entries, built day by day through monitoring my servers' logs across 18 very busy web sites. So yes, there may be a few remnants from the original list which do not belong here. I have spent thousands of hours building this blocker and maintaining this list.

It is NOT the intention of this blocker to block out valid domains and services. I maintain a very clean list which is regularly tested for dead and expired domains, which are removed every now and again to keep the list clean and functional as it should be.

So if you find anything being blocked that you believe is unwarranted, kindly list it here under issues and the matter will be looked into and promptly attended to.

Agents and Referrer Cronjob with info of bad bots

Hi mitchellkrogza,
first of all, thanks for the updated version of the update script.
The info via email is now just perfect for seeing that everything is fine.

The following is not really an issue, it's more an idea I had which could possibly be useful.
(If there is a better place to post ideas or suggestions, please let me know for next time.)
I have the cronjobs for agents and referrers that you suggest in the install notes set up, and they are running fine.
The idea is to have a note or marker in the list which shows that a referrer or agent is one you should keep an eye on because it is not yet known to be good or bad.
I have no idea how complex this would be to implement, but I thought I would share the idea.

cheers, sascha

Bad to create breaking changes when you have people using an auto-update script

When you have people setting up and using this project, including an auto-update script that you are supplying, it seems like a bad idea to send breaking changes through that auto-update mechanism.

I think it would be better to do one of two things:

  1. never send breaking changes at all

or

  2. architect an auto-update system / script that can handle things like creating newly required .conf files, so that you don't push changes that cause nginx to [emerg] behind people's backs.

Domain referrer exclusion not working correctly?

Hi,

To start, I'd like to thank you for creating this list, it's awesome! It works well, except that I have one little problem. The whole site is working, except one page that seems to trigger the bad-bot list. The page has the word 'hardcore' in it and that seems to trigger the bad-bot list, yet I have excluded my server IP and domain names from the referrer check:

"~*mydomain.com" 0; "~*metalfans.be" 0; "~*home-theater.be" 0;

The url is as follows:
https://metalfans.be/metalfestivals/ieper-hardcore-fest-2017

Also, it seems that the page itself is allowed to load, but not its assets?

Thank you for your time :)

Do I need access.log to be turned on?

Hi,

Newbie question: do I need the nginx access.log to be turned on for this to work?

I have a lot of bots trying to access my WordPress site, and the access.log grows too large, so I decided to disable it instead.
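For context, the blocker acts at request time through nginx map variables and a return directive in the vhost include, conceptually like the sketch below (variable name assumed, not the repo's exact include files), so it does not read or depend on access.log:

# simplified sketch of the blocking mechanism
if ($bad_bot) {
    return 444;   # drop the connection immediately; no access_log entry is needed for this to work
}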

Avoiding Nginx EMERG errors when Major Updates Take Place / Notifications Mailing List

I apologize that some people's auto-update scripts broke in the past week due to a major update which requires several new include files to exist in your /etc/nginx/bots.d folder.

Users who have not updated to the latest release will notice they get an Nginx EMERG error on an nginx reload due to the missing new include files.

This was a very necessary and very overdue update which should have been done much earlier in the blocker's development, but as they say, better late than never. It was not my intention to break anyone's blocker through an update and cause Nginx to EMERG in the process, but I knew the new changes would do exactly that.

I do suggest you please subscribe to the mailing list to be notified of all new updates, builds and any major changes.

Simply send a blank email to [email protected]
and you will be subscribed to the mailing list. Google will send you an email asking you to confirm your addition, so please check for that or you won't actually be added.

Thanks everyone for your patience and understanding of the new changes and how much more power they bring to you as a user of the blocker. They allow much more self-customization than ever before and, even more importantly, your customized include files never get overwritten when pulling the (almost) daily updates of globalblacklist.conf.

Thank you all for your support of this project, and also for the great work @itoffshore Stuart Cardall is doing on the install, setup and update scripts. I assure you this blocker will only go from strength to strength.

robots.txt fixed - non-escaped user-agent names

Picked up a bug where robots.txt was being generated from the list of user-agents which includes escaped spaces.

This is now fixed.

Where it was doing the following:

User-agent: Battleztar\ Bazinga
Disallow:/

It is now correctly printed as:

User-agent: Battleztar Bazinga
Disallow:/

Firefox/7.0.1 (Brute Force Against Wordpress Sites)

Added "Firefox/7.0" as a bad bot, this old version of Firefox should no longer be in uses at all as it supports no modern day protocols. It is used exclusively by bots trying to register fake accounts or to brute force attack wordpress sites. Grep your logs for Firefox/7 and you will see

grep 'Firefox/7' /var/log/nginx/*.log

Unblock IP

Hey guys,

Is there a way I can unblock an IP when it's blocked?
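For an IP you control, a whitelist entry would normally live in an include file under /etc/nginx/bots.d/ rather than in globalblacklist.conf itself. The file name and map format below are assumptions, so verify them against your install:

# /etc/nginx/bots.d/whitelist-ips.conf -- an IP mapped to 0 is treated as good and never blocked
203.0.113.10    0;   # example address only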

Update Notification System

Busy building an update notification system. Users of the list will be able to subscribe and get immediate email notifications when the blocker is updated. Should be up and running tomorrow. This will ensure that you are always notified of updates and any major changes to the blocker which will affect operation.

Whitelist a user agent?

I've run across a situation where a partial user agent string that is in the global blacklist is blocking a user agent that I need to whitelist.

The line in the global blacklist looks like this:

"~*Disco" 3;

In my case, the user agent string that I need to whitelist (currently blocked by the rule above) looks like this:

Discourse Forum Onebox v1.9.0.beta3

I tried putting
"~*Disco" 0;

into whitelist-domains.conf but that didn't work.

Is there any correct way to whitelist a particular user agent string that is on the global blocklist, or to override a blocked user agent string, other than directly editing the global blocklist file (which would then make automatic updates unusable)?

globalblacklist.conf directory errors on self-compiled nginx

Thanks for the great solution. There is a problem with the globalblacklist.conf file. It seems that after an update it reverts to include paths like:

include /etc/nginx/bots.d/whitelist-domains.conf;

My installation of nginx needs the include path to be:

include /usr/local/nginx/bots/whitelist-domains.conf;

The directory should be taken from a static file instead of being overwritten every time after an update.
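One possible workaround, assuming the helper scripts from this repo are installed, is to let them handle non-standard locations instead of hand-editing globalblacklist.conf after every update; their options can be listed with:

sudo ./update-ngxblocker --help   # check the available options for custom install / update locations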

Duplicate mappings and directives preventing use of the bad bot blocker configs

I wanted to add a layer of security to my NGINX by blocking malicious bots, so I followed the step-by-step instructions provided to set up the configuration, but I'm facing errors every time I try to start my server. Trying to fix one error leads to a new error. My nginx version: nginx/1.10.3.

Some of the console outputs:
nginx: [emerg] "server_names_hash_max_size" directive is duplicate in /etc/nginx/conf.d/botblocker-nginx-settings.conf:2 nginx: [emerg] limit_req_zone "flood" is already bound to key "$binary_remote_addr" in /etc/nginx/conf.d/botblocker-nginx-settings.conf:3 nginx: [emerg] limit_conn_zone "bot2_connlimit" is already bound to key "$bot_iplimit" in /etc/nginx/conf.d/globalblacklist.conf:5895

EDIT: My bad, I had a duplicate directive from a previous iteration of my nginx.conf which included '/etc/nginx/conf.d/*.conf'
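A reconstruction of the misconfiguration described in the edit above (not a recommended config): two wildcard includes load botblocker-nginx-settings.conf twice, so limit_req_zone and limit_conn_zone are declared twice and nginx aborts with the duplicate [emerg] errors shown.

http {
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/conf.d/*.conf;   # leftover duplicate from an earlier nginx.conf iteration
}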

install on Amazon AWS

Hi,

Has anyone tried to install nginx-ultimate-bad-bot-blocker on AWS Elastic Beanstalk using .ebextensions?

Qwant blocked

Why did you decide to block the Qwant-bot? It's a legitimate search engine which reads and obeys the robots.txt.

PLEASE UPDATE YOUR bad-referrer-words.conf include file !!!

Dear Users

PLEASE UPDATE YOUR bad-referrer-words.conf include file !!!

  • Major changes made to the default bad-referrer-words.conf file.
  • Stripped down to only a few words I choose to search for.
  • Commenting in the new file explains how dangerous this include file can be if used incorrectly.
  • All users are urged to update their include file to the new one pushed out to the repo today.

Simply pull a new copy by means of

sudo wget https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/bots.d/bad-referrer-words.conf -O /etc/nginx/bots.d/bad-referrer-words.conf

and then reload nginx

To explain why:

An issue was logged where the user's own domain name was specialisteparquet.com; because this list contained the word "cialis", it was detected within his domain name, causing his entire site to go down and not serve any assets.

That one entry would even cause any site containing a word like "specialist" anywhere on any of its pages to be blocked, and whitelisting your own domain name in the whitelist-domains.conf file will not bypass this. SO PLEASE BE VERY CAREFUL with the use of the bad-referrer-words.conf include file.
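A quick shell check makes the overlap obvious:

# the referrer word is contained inside the legitimate domain name, so a substring match blocks the whole site
echo "specialisteparquet.com" | grep -i 'cialis'   # matches -> false positive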

Investigation Ongoing

Testing why whitelist-domains.conf was ignored; will fix accordingly when I find a solution.

error installing + still seeing AhrefsBot and SemrushBot requests coming in

I am still seeing AhrefsBot requests coming in. I haven't added AhrefsBot to my robots file. I thought these conf additions would take care of it?

[26/Apr/2017:15:56:26 +0000] "GET /vendor/abx-engineering-inc/reviews/ HTTP/1.1" 444 0 "-" "Mozilla/5.0 (compatible; AhrefsBot/5.2; +http://ahrefs.com/robot/)" "164.132.161.58"

[26/Apr/2017:15:58:33 +0000] "GET /locations/indio-ca/pallets/ HTTP/1.1" 444 0 "-" "Mozilla/5.0 (compatible; SemrushBot/1.2~bl; +http://www.semrush.com/bot.html)" "46.229.168.70"

I tried using the scripts to do the install, but got errors:

$ sudo ~/install-ngxblocker 
Checking url: https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/include_filelist.txt

** Dry Run ** | -x or --exec to download files

awk: {$1="",$2=""; print}
awk:       ^ syntax error
Nothing to update for directory: /etc/nginx/conf.d
awk: {$1="",$2=""; print}
awk:       ^ syntax error
Nothing to update for directory: /etc/nginx/bots.d

I went through the manual installation.
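A note on the awk errors above: some awk implementations reject a comma between the two field assignments. An equivalent form that parses everywhere (shown here against a made-up sample line, as a guess at what the script intends) uses semicolons instead:

echo "conf.d globalblacklist.conf 149482" | awk '{$1=""; $2=""; print}'   # prints the trailing fields only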
