Identifying bad bots
So you've noticed a certain user-agent keeps showing up in your logs, but you're not sure what it is, or whether you want to ban it? There are a few ways to find out:
- Google it: Try a search like this.
- Check the User Agent Database.
- Head over to Webmaster World and search again or start a new thread.
Blocking bots with .htaccess
This example, and all of the following examples, can be placed at the bottom of your .htaccess file. If you don't already have a file called .htaccess in your site's root directory, you can create a new one.
    #get rid of the bad bot
    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} ^BadBot
    RewriteRule ^(.*)$ http://go.away/
So, what does this code do? It's simple: the above lines tell your webserver to check for any bot whose user-agent string starts with "BadBot". When it sees a bot that matches, it redirects them to a non-existent site called "go.away".
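If you would rather refuse the request than redirect it to a site that doesn't exist, one possible variation is to return a 403 Forbidden instead. This is only a sketch: "BadBot" is the same placeholder name used above, and the [F] flag is swapped in for the redirect, so it is not the rule shown in the original example.

```apache
# Sketch: refuse the request outright instead of redirecting it.
RewriteEngine on
# Match any user-agent string that starts with "BadBot" (placeholder name).
RewriteCond %{HTTP_USER_AGENT} ^BadBot
# "-" means "no substitution"; [F] answers with 403 Forbidden.
RewriteRule ^ - [F]
```

The bot gets a short error response from your own server, and no second request is sent anywhere.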
Now, that's great to start with, but what if you want to block more than one bot?
    #get rid of bad bots
    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} ^BadBot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EvilScraper [OR]
    RewriteCond %{HTTP_USER_AGENT} ^FakeUser
    RewriteRule ^(.*)$ http://go.away/
The code above does the same thing as before, but this time I'm blocking 3 different bots. Note the "[OR]" flag after the first two conditions: by default every RewriteCond has to match before the rule fires, so "[OR]" tells the server that matching any one of them is enough. The last condition in the list doesn't need it.
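As the list grows, the separate conditions can also be folded into a single one using regex alternation. This is a sketch of an equivalent form using the same three placeholder bot names, not a replacement the original recommends.

```apache
# Sketch: one condition covering all three placeholder bot names.
RewriteEngine on
# (A|B|C) matches any of the alternatives at the start of the user-agent string.
# Adding the [NC] flag to this line would make the match case-insensitive.
RewriteCond %{HTTP_USER_AGENT} ^(BadBot|EvilScraper|FakeUser)
RewriteRule ^(.*)$ http://go.away/
```

With a single condition there is no "[OR]" to forget when a new name is added to the list.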
Blocking Bandwidth Leeches
Say there's a certain forum that's always hotlinking your images, and it's eating up all your bandwidth. You could replace the image with something really gross, but in some countries that might get you sued! The best way to deal with this problem is simply to block the site, like so:
    RewriteEngine on
    RewriteCond %{HTTP_REFERER} ^http://.*somebadforum\.com [NC]
    RewriteRule .* - [F]
This code will return a 403 Forbidden error to anyone trying to hotlink your images on somebadforum.com. The end result: users on that site will see a broken image, and your bandwidth is no longer being stolen.
Here's the code for blocking more than one site:
    RewriteEngine on
    RewriteCond %{HTTP_REFERER} ^http://.*somebadforum\.com [NC,OR]
    RewriteCond %{HTTP_REFERER} ^http://.*example\.com [NC,OR]
    RewriteCond %{HTTP_REFERER} ^http://.*lastexample\.com [NC]
    RewriteRule .* - [F]
If you want to block hotlinking completely, so that no one can hotlink your files, take a look at my article on using .htaccess to block hotlinkers.
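The referrer rules above only match referrers that begin with "http://", and they refuse every request coming from the listed sites, including ordinary links to your pages. A possible tightening, sketched below, also accepts "https://" and limits the block to image files. The extension list is an assumption and should be adjusted to whatever files are actually being hotlinked.

```apache
# Sketch: block image hotlinking from one site over http or https.
RewriteEngine on
# Match the site itself or any of its subdomains, with either scheme.
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?somebadforum\.com [NC]
# Only refuse requests for common image extensions (assumed list).
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
```

Visitors who follow a normal link from that forum to one of your pages still get through; only the embedded images are refused.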
Banning An IP Address
Sometimes you just don't want a certain person (or bot) accessing your website at all. One simple way to block them is to ban their IP address:
    order allow,deny
    deny from 192.168.44.201
    deny from 224.39.163.12
    deny from 172.16.7.92
    allow from all
The example above shows how to block 3 different IP addresses. Sometimes you might want to block a whole range of IP addresses:
    order allow,deny
    deny from 192.168.
    deny from 10.0.0.
    allow from all
The above code will block any IP address starting with "192.168." or "10.0.0." from accessing your site.
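Partial addresses only work when the range ends on a dot boundary. For other ranges, the same directive also accepts CIDR notation. The sketch below writes the two ranges from the example above in that form.

```apache
# Sketch: the same two ranges written as network/prefix-length.
order allow,deny
# 192.168.0.0/16 covers every address starting with "192.168."
deny from 192.168.0.0/16
# 10.0.0.0/24 covers every address starting with "10.0.0."
deny from 10.0.0.0/24
allow from all
```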
Finally, here's the code to block any specific ISP from getting access:
    order allow,deny
    deny from some-evil-isp.com
    deny from subdomain.another-evil-isp.com
    allow from all
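The order/deny/allow directives above are the Apache 2.2 style. On Apache 2.4 they still work only through the compatibility module mod_access_compat, and the newer "Require" syntax is preferred. A rough equivalent, assuming Apache 2.4 and using the same placeholder addresses and hostname as above, might look like this:

```apache
# Sketch for Apache 2.4+; addresses and hostname are placeholders.
<RequireAll>
    # Let everyone in by default...
    Require all granted
    # ...except a single address,
    Require not ip 192.168.44.201
    # a partial address (everything starting with "10.0.0."),
    Require not ip 10.0.0
    # and a hostname suffix (matching by hostname requires DNS lookups).
    Require not host some-evil-isp.com
</RequireAll>
```

Mixing the old and new styles in one file can give confusing results, so it is usually safer to pick one of them.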
Final notes on using .htaccess
As you can see, .htaccess is a very powerful tool for controlling who can do what on your website. Because it's so powerful, it's also fairly easy for things to go wrong. If you have any mistakes or typos in your .htaccess file, the server will spit out an Error 500 page instead of showing your site, so be sure to back up your .htaccess file before making any changes.
If you'd like to learn more about writing .htaccess files, I recommend checking out the Definitive Guide to Mod_Rewrite. This book covers everything you need to know about Apache's .htaccess rewrite system.
REFERENCES
http://blamcast.net/articles/block-bots-hotlinking-ban-ip-htaccess