Generating MT Entry Redirects
John invoked the LazyWeb * and asked for a script that would go through Movable Type archive files, rename them, and generate the appropriate mod_rewrite rules. Furthermore, I'm going to assume that we want each renamed file to be named after the title of its MT entry.
If you want to see the answer, jump to the script.
While mod_rewrite will get you what you want, I'm going to suggest a hybrid mod_rewrite/PHP solution.
What We Know
The standard file system layout for Movable Type keeps the archives in the folder "archives". Looking at that folder on my evaluation MT weblog, I see:
/btvs/game/archives:
00001.html    12.xml
00002.html    2002_11.html
...           ...
00027.html    cat_mechanics.html
From that, I know what my mod_rewrite rules will look like:
RewriteEngine on
RewriteRule ^00001\.html$ title_of_first_entry.html [R,L]
...
RewriteRule ^99999\.html$ title_of_last_entry.html [R,L]
The flags [R,L] at the end of each rule mean: redirect the browser to the new URL (R, a 302 by default), and stop processing further rules (L).
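Since each rule follows the same pattern, it can be produced by a small function. What follows is a minimal sketch, not the final script: rewrite_rule is a hypothetical helper name, and it assumes the old name should be matched literally, so it runs it through Perl's quotemeta to escape the "." (an unescaped dot in a pattern matches any character).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build one redirect rule for a renamed entry.
# quotemeta() escapes the "." in the old name so it matches only a literal dot.
sub rewrite_rule
{
    my ($old, $new) = @_;
    return sprintf ("RewriteRule ^%s\$ %s [R,L]", quotemeta ($old), $new);
}

# Prints: RewriteRule ^00001\.html$ title_of_first_entry.html [R,L]
print rewrite_rule ("00001.html", "title_of_first_entry.html"), "\n";
```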
So I'd like to pipe the names of the files in the archives directory to a script that renames the files, and generates the appropriate mod_rewrite rule.
First Pass
Okay, so that means I need to find all the html files in the directory whose filenames match the regular expression ^[0-9]+\.html$.
I'd like to use UNIX find to get the list of files, but its portable -name test only understands globbing, not regular expressions (GNU find has -regex, but that is not available in every find).
So I'll just cd to /path/to/archives and pipe ls *.html into my script, which can apply the regular expression itself.
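If you would rather not depend on ls at all, the same filtering can be done inside Perl. This is only a sketch of that alternative, assuming a hypothetical helper named entry_files that is handed the archive directory; it uses the core opendir/readdir functions and the same digits-plus-.html pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names in $dir that look like
# individual entries (digits followed by ".html"), sorted by name.
sub entry_files
{
    my ($dir) = @_;
    opendir (my $dh, $dir) or die "Can't open $dir: $!\n";
    my @entries = sort grep { /^[0-9]+\.html$/ } readdir ($dh);
    closedir ($dh);
    return @entries;
}
```

Calling entry_files (".") from inside the archives directory would give the same list of names that the ls pipeline feeds to the script.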
The script will have the usual Perlish input loop:
while (<>)
{
    chomp;
    if (/^[0-9]+\.html$/)    # this is an individual entry
    {
        # get name of entry
        # generate mod_rewrite rule
        # write mod_rewrite rule
        # make sure we don't have a duplicate filename
        # copy current entry file to new file name
        # move old file
    }
}
We could use HTML::Parser, but all we want is the title of the entry. In the stock MT setup, that's:
<h3 class="title">My Spiffy Blog Entry</h3>
We can read the file in one gulp, and look for the appropriate regular expression.
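m{<h3 class="title">(.+?)</h3>}
That leaves the title in $1. (Using m{...} as the delimiter means the "/" in </h3> doesn't need escaping.) Then we can fold it to lower case, replace spaces with '_', and strip any remaining characters that aren't alphanumeric or '_'.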
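Those two steps can be tried on their own before wiring them into the loop. A minimal sketch, assuming the stock MT title markup; entry_title and slug_for_title are hypothetical helper names, not anything MT provides.

```perl
use strict;
use warnings;

# Pull the entry title out of the page text, or return undef if the
# stock <h3 class="title"> markup is not there.
sub entry_title
{
    my ($html) = @_;
    return $html =~ m{<h3 class="title">(.+?)</h3>} ? $1 : undef;
}

# Turn a title into a file-name-safe string.
sub slug_for_title
{
    my ($title) = @_;
    my $slug = lc ($title);
    $slug =~ s/\s/_/g;           # each whitespace character becomes "_"
    $slug =~ s/[^a-z0-9_]//g;    # drop anything that isn't a-z, 0-9 or "_"
    return $slug;
}

my $html = qq{<html><body>\n<h3 class="title">My Spiffy Blog Entry</h3>\n</body></html>};

# Prints: my_spiffy_blog_entry.html
print slug_for_title (entry_title ($html)), ".html\n";
```

Note that two different titles can collapse to the same string (for example "Hello, World!" and "Hello World"), which is why the script has to check for duplicates before renaming.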
Putting it all together
The first try yields:
#!/usr/bin/perl -w
#
# Renames mt archive entries with file names of the form [0-9]+.html
# to a name based on the entry's title. Moves the old file to ../oldarchive,
# and creates an .htaccess file with mod_rewrite rules to redirect the old
# URLs to the new ones.
#
# Usage: ls *.html | mtRename.pl
#
# (c) 2003, Bill Humphries, http://www.whump.com/
# Released under a Creative Commons License:
# http://creativecommons.org/licenses/by/1.0

use strict;
use File::Copy 'cp';
use FileHandle;
use File::Path;

my ($title, $rule, $ENTRY, $entrytext, $fh, $fhout, %rename, $previous);

# open .htaccess
$fhout = new FileHandle;
$fhout->open ("> .htaccess") or die "Can't open .htaccess for writing. $!\n";
print $fhout "RewriteEngine On\n";

# create directory to move old files into
mkpath ("../oldarchive");

while (<>)
{
    chomp;    # strip the newline (chop would eat a real character on an unterminated last line)
    if (/^[0-9]+\.html$/)
    {
        # read the whole entry in one gulp
        $fh = new FileHandle;
        if ($fh->open ("< $_"))
        {
            local $/;    # slurp mode, restored when this block ends
            $entrytext = <$fh>;
            $fh->close ();
        }
        else
        {
            die "Can't open $_: $!\n";
        }

        if ($entrytext =~ m{<h3 class="title">(.+?)</h3>})
        {
            $title = $1;
            print "In $_ found $title, ";

            # Now get the new title:
            $title =~ tr/A-Z/a-z/;
            $title =~ s/\s/_/g;
            $title =~ s/[^a-z0-9_]//g;
            print "trying to rename to $title.html\n";

            # Make sure we're not overwriting a file
            if (exists ($rename{$title}))
            {
                $previous = $rename{$title};
                die "We have a problem.\n$previous has already been renamed as $title.\nGo give $_ a new title.\n";
            }

            # Copy to new name
            $fh = new FileHandle;
            if ($fh->open ("> $title.html"))
            {
                print $fh $entrytext;
                $fh->close ();
            }
            else
            {
                die "Can't open $title.html for writing. $!\n";
            }

            # Copy to backup
            $fh = new FileHandle;
            if ($fh->open ("> ../oldarchive/$_"))
            {
                print $fh $entrytext;
                $fh->close ();
            }
            else
            {
                die "Can't open ../oldarchive/$_ for writing. $!\n";
            }

            # print the rewrite rule; \Q...\E escapes the "." in the old name
            print $fhout "RewriteRule ^\Q$_\E\$\t$title.html [R,L]\n";

            # delete the original
            unlink;

            # remember which old file took this title
            $rename{$title} = $_;
        }
    }
}

$fhout->close ();
An improvement?
What if you have several hundred, or several thousand entries?
You don't want one mod_rewrite rule per entry. With .htaccess, Apache has to read and parse that file on every request. And even if you put the rules in httpd.conf, that is a large ruleset that costs memory and processor time on every request. The set of mod_rewrite rules should be small.
An intermediate step would be to write a PHP script that handles the redirection for us.
PHP receives the URI of the request in $_SERVER['REQUEST_URI'], so we can create an array mapping old and new filenames, do a lookup on the URI, get the new file name and redirect to it.
The PHP for this looks like:
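<?php
// redirect.php

// Map of old archive file names to new ones.
$map = array('00001.html' => 'first_entry_title.html',
             // ...
             '99999.html' => 'last_entry_title.html');

$base = "/path/to/archive/";

// Drop any query string, then strip the base path to get the old file name.
$path = parse_url ($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$request = str_replace ($base, "", $path);

if (isset ($map[$request]))
{
    // Known old entry: send the browser to its new name.
    header ("Location: http://".$_SERVER['HTTP_HOST'].$base.$map[$request]);
}
else
{
    // Not in the map, so it really is missing.
    header ("HTTP/1.0 404 Not Found");
}
?>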
Which reduces the mod_rewrite rule to:
RewriteCond %{REQUEST_FILENAME} !-s
RewriteRule ^.+\.html$ redirect.php?%{QUERY_STRING} [L]
The condition tests whether the requested file exists (and is not empty); if it doesn't, the rule sends the request to redirect.php. That script looks up the URL and redirects the request. Alternatively, you can specify redirect.php as the 404 error handler.
However, PHP has to load the remapping array, so that's a hit if it's large.
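The map itself doesn't have to be typed by hand: the renaming script could write it out in place of the per-entry rewrite rules. A rough sketch of that idea, assuming a hypothetical helper named php_map that is given a hash of old file names to new file names (the reverse of how %rename is keyed in the script above), and assuming the names contain no quote characters, which holds for the lower-case, underscore-only names generated earlier.

```perl
use strict;
use warnings;

# Hypothetical helper: render an old-name => new-name hash as the PHP
# source for the $map array used by redirect.php.
sub php_map
{
    my (%renamed) = @_;
    my @pairs = map { sprintf ("    '%s' => '%s'", $_, $renamed{$_}) }
                sort keys %renamed;
    return "\$map = array(\n" . join (",\n", @pairs) . "\n);\n";
}

print php_map ("00001.html" => "first_entry_title.html",
               "00002.html" => "second_entry_title.html");
```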
Ultimately, you'll want to tap into Movable Type's database to handle the redirection.
Think of it this way: if the map is larger than the PHP or Perl script required to hit MT's database, then use the database.
* How does one invoke the LazyWeb? Does a searchlight shine the initials "LW" on the low clouds over Gotham City? Does a Red Phone light up in the Kremlin? Do you try to IM Willow Rosenthal? Do you rub the magic lamp you keep with the spare inkjet cartridges in the supplies cabinet?