Tuesday, February 15, 2011

How To Block Bots, Ban IP Addresses With .htaccess

SkyHi @ Tuesday, February 15, 2011
Got a spambot or scraper constantly showing up in your server logs? Or maybe there's another site that's leeching all your bandwidth? Perhaps you just want to ban a user from a certain IP address? In this article, I'll show you how to use .htaccess to do all of that and more!

Identifying bad bots

So you've noticed a certain user-agent keeps showing up in your logs, but you're not sure what it is, or whether you want to ban it? There are a few ways to find out, such as searching the web for the user-agent string.
Once you've determined that the bot is something you want to block, the next step is to add it to your .htaccess file.

Blocking bots with .htaccess

This example, and all of the following examples, can be placed at the bottom of your .htaccess file. If you don't already have a file called .htaccess in your site's root directory, you can create a new one.
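# Get rid of the bad bot
RewriteEngine on
# Match any user-agent string that starts with "BadBot"
RewriteCond %{HTTP_USER_AGENT} ^BadBot
# Redirect every matching request to a non-existent site
RewriteRule ^(.*)$ http://go.away/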
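So, what does this code do? It's simple: the above lines tell your webserver to check every request for a user-agent string that starts with "BadBot". When a request matches, the server answers with a redirect to a non-existent site called "go.away". Because the target is a full URL on another host, Apache sends this as an external redirect rather than rewriting the request internally.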
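If you would rather not point bots at someone else's hostname, one alternative is to refuse the request outright. The following is a minimal sketch, assuming mod_rewrite is enabled and reusing the placeholder name "BadBot". The [NC] flag is an addition that the example above does not use; it makes the match case-insensitive.

```apache
# Sketch: answer with 403 Forbidden instead of redirecting
RewriteEngine on
# [NC] is an assumption here: it also matches "badbot", "BADBOT", and so on
RewriteCond %{HTTP_USER_AGENT} ^BadBot [NC]
# "-" means "no substitution"; [F] sends the 403 and stops rule processing
RewriteRule ^ - [F]
```

A 403 keeps the whole exchange on your own server and costs the bot one short response, whereas a redirect relies on the bot choosing to follow it.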
Now, that's great to start with, but what if you want to block more than one bot?
# Get rid of bad bots
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^BadBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EvilScraper [OR]
RewriteCond %{HTTP_USER_AGENT} ^FakeUser
RewriteRule ^(.*)$ http://go.away/
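The code above does the same thing as before, but this time I'm blocking 3 different bots. Note the "[OR]" flag after the first two bot names: by default, every RewriteCond must match before the rule fires, and [OR] tells the server that matching this condition or the next one is enough. The last condition in the list doesn't need the flag.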
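The three conditions can also be folded into a single regular expression. This is a sketch using the same placeholder bot names; the alternation in parentheses matches a user-agent that starts with any of the listed names, so no [OR] flag is needed.

```apache
# Sketch: one condition covering all three placeholder bots
RewriteEngine on
# Matches user-agents starting with BadBot, EvilScraper, or FakeUser
RewriteCond %{HTTP_USER_AGENT} ^(BadBot|EvilScraper|FakeUser)
RewriteRule ^(.*)$ http://go.away/
```

Both forms behave the same. A single pattern is shorter, while separate [OR] lines are easier to comment out one at a time.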

Blocking Bandwidth Leeches

Say there's a certain forum that's always hotlinking your images, and it's eating up all your bandwidth. You could replace the image with something really gross, but in some countries that might get you sued! The best way to deal with this problem is simply to block the site, like so:
RewriteEngine on
RewriteCond %{HTTP_REFERER} ^http://.*somebadforum\.com [NC]
RewriteRule .* - [F] 
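This code will return a 403 Forbidden error for any request whose Referer header points at somebadforum.com. The end result: users on that site will see a broken image, and your bandwidth is no longer being stolen. Keep in mind that the rule pattern ".*" matches every file, not just images, so visitors who follow an ordinary link from that forum to your site will get the 403 too.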
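A narrower variant might look like the following sketch. It assumes your images use the gif, jpg, jpeg, or png extensions (adjust the list to your site) and that the forum may be served over either http or https; neither assumption comes from the example above.

```apache
# Sketch: refuse only image requests referred by the forum
RewriteEngine on
# Referer is somebadforum.com or one of its subdomains, over http or https
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?somebadforum\.com [NC]
# Only image files get the 403; pages and other files are still served
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
```

With this version, people who click a normal link on the forum still reach your pages, and only embedded images are refused.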
Here's the code for blocking more than one site:
RewriteEngine on
RewriteCond %{HTTP_REFERER} ^http://.*somebadforum\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://.*example\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://.*lastexample\.com [NC]
RewriteRule .* - [F] 
If you want to block hotlinking completely, so that no one can hotlink your files, take a look at my article on using .htaccess to block hotlinkers.

Banning An IP Address

Sometimes you just don't want a certain person (or bot) accessing your website at all. One simple way to block them is to ban their IP address:
order allow,deny
deny from 192.168.44.201
deny from 224.39.163.12
deny from 172.16.7.92
allow from all
The example above shows how to block 3 different IP addresses. Sometimes you might want to block a whole range of IP addresses:
order allow,deny
deny from 192.168.
deny from 10.0.0.
allow from all
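The above code will block any IP address starting with "192.168." or "10.0.0." from accessing your site. (These are private network ranges, used here only as examples; substitute the addresses you actually see in your logs.)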
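Partial addresses only work when the range ends on a dot. For ranges that don't line up that way, the same directive also accepts CIDR notation or a network/netmask pair. A sketch, again using private address ranges purely as placeholders:

```apache
order allow,deny
# CIDR form: 192.168.0.0 through 192.168.255.255
deny from 192.168.0.0/16
# Netmask form: 10.0.0.0 through 10.0.0.255
deny from 10.0.0.0/255.255.255.0
# A range a partial address cannot express: 172.16.7.64 through 172.16.7.127
deny from 172.16.7.64/26
allow from all
```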
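Finally, here's the code to block a specific ISP by hostname. Apache matches any client whose hostname ends in the name you give, and it has to do a reverse DNS lookup on each request to check, which can slow your site down: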
order allow,deny
deny from some-evil-isp.com
deny from subdomain.another-evil-isp.com
allow from all
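The order, allow, and deny directives date from Apache 2.2. On Apache 2.4 they are deprecated and only available through mod_access_compat. If your server runs 2.4, the same kind of ban can be sketched with Require directives, assuming mod_authz_core and mod_authz_host are loaded (typically the case). The addresses and hostname are the placeholders from the examples above.

```apache
<RequireAll>
    # Start by allowing everyone
    Require all granted
    # Refuse a single address
    Require not ip 192.168.44.201
    # Refuse a partial address: everything starting with 10.0.0.
    Require not ip 10.0.0
    # Refuse by hostname (this triggers a reverse DNS lookup)
    Require not host some-evil-isp.com
</RequireAll>
```

Inside a RequireAll block every line must pass, so a client that matches any of the "not" lines is refused even though "all granted" matches.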

Final notes on using .htaccess

As you can see, .htaccess is a very powerful tool for controlling who can do what on your website. Because it's so powerful, it's also fairly easy for things to go wrong. If you have any mistakes or typos in your .htaccess file, the server will return a 500 Internal Server Error page instead of showing your site, so be sure to back up your .htaccess file before making any changes.
If you'd like to learn more about writing rewrite rules, I recommend checking out The Definitive Guide to Apache mod_rewrite. This book covers everything you need to know about Apache's rewrite system, including how to use it from .htaccess files.

REFERENCES
http://blamcast.net/articles/block-bots-hotlinking-ban-ip-htaccess