Wednesday, December 30, 2009

.htaccess, 301 Redirects & SEO: Guest Post by NotSleepy

SkyHi @ Wednesday, December 30, 2009
Tony Spencer here doing a guest spot on SEOBook. Aaron asked me some 301 redirect questions a while back and recently asked if I would drop in with some
tips on common scenarios, so here goes. Feel free to drop me any questions in the comments box.

301 non-www to www

From what I can tell, Google has yet to clean up the canonicalization problem that arises when the www version of your site gets indexed along with the non-www version (e.g. http://www.seobook.com and http://seobook.com). The following .htaccess rules 301 the non-www host to the www host:

<code>
RewriteEngine On

RewriteCond %{HTTP_HOST} ^seobook\.com [NC]
RewriteRule ^(.*)$ http://www.seobook.com/$1 [L,R=301]
</code>

The '(.*)$' says that we'll take anything that comes after http://seobook.com and append it to the end of 'http://www.seobook.com' (that's the '$1' part) and redirect to that URL. For more grit on how this works, check out a good regular expressions resource or two.
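To make the capture-and-append step concrete, here is a rough PHP sketch that mimics what the two directives do to a request. The function name canonicalRedirect is made up for illustration; Apache does this work itself and never calls anything like it.

```php
<?php
// Hypothetical helper: mimics the non-www to www rule for illustration only.
function canonicalRedirect(string $host, string $path): ?string {
    // Mirrors the RewriteCond: the host must start with seobook.com,
    // compared case-insensitively (that is what [NC] means).
    if (!preg_match('/^seobook\.com/i', $host)) {
        return null; // condition failed, so no redirect is issued
    }
    // In .htaccess the rule pattern sees the path without its leading slash.
    $relative = ltrim($path, '/');
    // Mirrors the RewriteRule: capture everything, then append it as $1.
    preg_match('/^(.*)$/', $relative, $m);
    return 'http://www.seobook.com/' . $m[1];
}

// Prints http://www.seobook.com/archives/001714.shtml
echo canonicalRedirect('seobook.com', '/archives/001714.shtml'), "\n";
```

A request that already uses the www host fails the condition, so the function returns null and nothing is redirected.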

Note: You only have to enter 'RewriteEngine On' once at the top of your .htaccess file.

Alternatively, you may choose to do this 301 redirect
in the Apache config file httpd.conf.

<code>
<VirtualHost 67.xx.xx.xx>
ServerName www.seobook.com
ServerAdmin webmaster@seobook.com
DocumentRoot /home/seobook/public_html
</VirtualHost>

<VirtualHost 67.xx.xx.xx>
ServerName seobook.com
RedirectMatch permanent ^/(.*) http://www.seobook.com/$1
</VirtualHost>
</code>

Note that web host managers like cPanel often place a 'ServerAlias seobook.com' line in the first VirtualHost entry, which would negate the second VirtualHost, so be sure to remove the non-www ServerAlias.

301 www to non-www

Going the other way, the 301 redirect from the www version to the non-www version would look like:

<code>
RewriteCond %{HTTP_HOST} ^www\.seobook\.com [NC]
RewriteRule ^(.*)$ http://seobook.com/$1 [L,R=301]
</code>

Redirect All Files in a Folder to One File

Let's say you no longer carry 'Super Hot Product' and hence want to redirect all requests to the folder /superhotproduct to a single page called /new-hot-stuff.php. This redirect can be accomplished easily by adding the following to your .htaccess file:

<code>
RewriteRule ^superhotproduct(.*)$ /new-hot-stuff.php [L,R=301]
</code>

But what if you want to do the same as the above example EXCEPT for one file? In the next example all files from /superhotproduct/ folder will redirect to the /new-hot-stuff.php file EXCEPT /superhotproduct/tony.html which will redirect to /imakemoney.html

<code>
RewriteRule ^superhotproduct/tony\.html$ /imakemoney.html [L,R=301]
RewriteRule ^superhotproduct(.*)$ /new-hot-stuff.php [L,R=301]
</code>
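The order of those two rules matters: the specific rule has to come before the catch-all, because the first rule that matches wins. A minimal PHP sketch of that behavior follows, with patterns modeled on the rules above and a hypothetical function name:

```php
<?php
// Hypothetical illustration of rule order; this is not how Apache is implemented.
function firstMatchingTarget(string $relativePath): ?string {
    // Rules are tried top to bottom, just like lines in .htaccess.
    $rules = [
        '#^superhotproduct/tony\.html$#' => '/imakemoney.html',
        '#^superhotproduct(.*)$#'        => '/new-hot-stuff.php',
    ];
    foreach ($rules as $pattern => $target) {
        if (preg_match($pattern, $relativePath)) {
            return $target; // the [L] flag stops processing at the first hit
        }
    }
    return null; // no rule matched, so the request is served normally
}

// Prints /imakemoney.html
echo firstMatchingTarget('superhotproduct/tony.html'), "\n";
// Prints /new-hot-stuff.php
echo firstMatchingTarget('superhotproduct/widget.html'), "\n";
```

If the two entries were swapped, the catch-all would match tony.html first and the special case would never fire.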

Redirect a Dynamic URL to a New Single File

It's common to need to redirect dynamic URLs with parameters to a single
static file:

<code>
RewriteCond %{QUERY_STRING} ^id=
RewriteRule ^article\.jsp$ /latestnews.htm? [L,R=301]
</code>

In the above example, a request to a dynamic URL such as http://www.seobook.com/article.jsp?id=8932
will be redirected to http://www.seobook.com/latestnews.htm
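One thing worth keeping in mind with dynamic URLs: mod_rewrite matches a RewriteRule pattern against the URL path only and exposes the query string separately as %{QUERY_STRING}. The PHP sketch below shows that split for the example URL; splitRequest is a hypothetical helper, not part of Apache or PHP.

```php
<?php
// Hypothetical helper showing the path / query-string split for a request URL.
function splitRequest(string $url): array {
    $parts = parse_url($url);
    return [
        // What a RewriteRule pattern sees in .htaccess (no leading slash)
        'path'  => ltrim($parts['path'] ?? '', '/'),
        // What %{QUERY_STRING} holds
        'query' => $parts['query'] ?? '',
    ];
}

$request = splitRequest('http://www.seobook.com/article.jsp?id=8932');
echo $request['path'], "\n";  // prints article.jsp
echo $request['query'], "\n"; // prints id=8932
```

So anything after the '?' has to be tested with a RewriteCond on %{QUERY_STRING} rather than inside the RewriteRule pattern.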

SSL https to http

This one is more difficult, but I have experienced serious canonicalization problems
when the secure https version of my site was fully indexed alongside my http version. I have yet
to find a way to redirect https for the bots only, so the only solution I have for now is
to attempt to tell the bots not to index the https version. There are only two ways I know to do this and neither is pretty.

1. Create the following PHP file and include it at the top of each page:

<?php
if (isset($_SERVER['HTTPS']) && strtolower($_SERVER['HTTPS']) == 'on') {
    echo '<meta name="robots" content="noindex,nofollow">'. "\n";
}

2. Cloak your robots.txt file.
If a request arrives over https and comes from one of the known bots such as Googlebot, you will display:

User-agent: *
Disallow: /

Otherwise display your normal robots.txt. To do this you'll need to alter your .htaccess
file to treat .txt files as PHP or some other dynamic language and then proceed to write
the cloaking code.
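As a rough idea of what that cloaking code might look like, here is a minimal PHP sketch. Everything named here is an assumption: the file name robots.php, the rewrite rule in the comment that routes /robots.txt to it, the list of crawler tokens, and the placeholder for your normal robots.txt content.

```php
<?php
// Hypothetical robots.php; assumes .htaccess routes /robots.txt here, e.g.
//   RewriteRule ^robots\.txt$ /robots.php [L]
function robotsBody(bool $isHttps, string $userAgent, string $normalBody): string {
    // Example crawler tokens only (an assumption, not a complete list)
    $bots = ['googlebot', 'bingbot', 'slurp'];
    $isBot = false;
    foreach ($bots as $bot) {
        if (stripos($userAgent, $bot) !== false) {
            $isBot = true;
            break;
        }
    }
    // Known bot on https: block everything. Anyone else: the normal file.
    if ($isHttps && $isBot) {
        return "User-agent: *\nDisallow: /\n";
    }
    return $normalBody;
}

// Only emit output when the web server runs this file, not from the command line.
if (PHP_SAPI !== 'cli') {
    $isHttps = isset($_SERVER['HTTPS']) && strtolower($_SERVER['HTTPS']) == 'on';
    $agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
    // Placeholder: substitute the contents of your real robots.txt here.
    $normal = "User-agent: *\nDisallow:\n";
    header('Content-Type: text/plain');
    echo robotsBody($isHttps, $agent, $normal);
}
```

Keeping the decision in a function that takes plain arguments makes it easy to check the three cases (bot on https, bot on http, ordinary visitor) without a live server.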

I really wish the search engines would get together and add a new attribute to robots.txt
that would allow us to stop them from indexing https URLs.

Getting Spammy With it!!!

OK, maybe you aren't getting spammy with it, but you just need to redirect a shit ton of pages. First, it'll take you a long time to type them all into .htaccess; second, too many entries in .htaccess tend to slow Apache down; and third, it's too prone to human error. So hire a programmer and do some dynamic redirecting from code.

The following example is in PHP but is easy to do in any language. Let's say you switched to a new system and all URLs that ended in the old ID need to be redirected. First create a database table that will hold the old ID and the new URL to redirect to:

old_id INT
new_url VARCHAR (255)

Next, write code to populate it with your old IDs and your new URLs.

Next, add the following line to .htaccess:

<code>
RewriteRule ^product-(.*)_([0-9]+)\.php$ /redirectold.php?productid=$2 [L]
</code>

Then create the PHP file redirectold.php which will handle the 301:

<code>
<?php
function getRedirectUrl($productid) {
    // Connect to the database (mysqli replaces the mysql_* functions removed in PHP 7)
    $dServer = "localhost";
    $dDb = "mydbname";
    $dUser = "mydb_user";
    $dPass = "password";

    $s = @mysqli_connect($dServer, $dUser, $dPass, $dDb)
        or die("Couldn't connect to database");

    // Fall back to the home page when no redirect is on file
    $ret = 'http://www.yoursite.com/';

    // Bind the id as an integer so request data never reaches the SQL text
    $stmt = mysqli_prepare($s, "SELECT new_url FROM redirects WHERE old_id = ?");
    mysqli_stmt_bind_param($stmt, "i", $productid);
    mysqli_stmt_execute($stmt);
    mysqli_stmt_bind_result($stmt, $newUrl);
    if (mysqli_stmt_fetch($stmt)) {
        $ret = 'http://www.yoursite.com/'. $newUrl;
    }
    mysqli_stmt_close($stmt);
    mysqli_close($s);
    return $ret;
}

// Only numeric ids are valid, so cast whatever arrives to an integer
$productid = isset($_GET["productid"]) ? (int) $_GET["productid"] : 0;
$url = getRedirectUrl($productid);

header("HTTP/1.1 301 Moved Permanently");
header("Location: $url");
exit();
?>
</code>

Now all requests to your old URLs will call redirectold.php, which will look up the new URL and return an HTTP 301 redirect to it.


Reference: http://www.seobook.com/archives/001714.shtml