I’ve been hearing a lot of chatter in the blogosphere about caching blog posts so your site doesn’t drown when it gets linked to from a high-traffic site.
Most suggestions reference the WordPress “Super Cache” plugin, since a lot of bloggers use WordPress.
I don’t use WordPress, however. I actually hand-code my blog, so I started thinking about how I could quickly add caching to my own site without adding too much extra work to my custom CMS.
The main problem, for anyone unfamiliar with it (or anyone else with a hand-coded site who wants to add caching), is that a heavy traffic surge to a site or page can cripple the database. If your CMS queries the database every time it displays a blog post, that’s a problem, because the database can only handle so many concurrent queries. For casual Google referrals, this is not a big deal. But if a major site links to one of your blog posts, you could have hundreds or thousands of people requesting the same page at once, and every one of those page loads runs the same query.
This matters even more on shared hosting, where database resources are shared with the other users on the same server. I can’t overstate how important some sort of caching system is in that setup!
If possible, you should always limit how many queries your page executes.
The examples I provide below will be using PHP.
Getting started
To get started, let’s review what needs to be done at a high level:
- Cache every blog post to a file, using file_get_contents().
- When loading a blog post page, check for a cached file first; if it exists, serve that instead of running the query.
- Only re-cache at appropriate times, such as when comments are posted to a blog post.
Caching each blog post initially
To cache every blog post to a file, we could run some sort of script that processes every post at once, but if you have more than a few hundred posts, this could really slam your server.
It’s probably better to cache each post as it’s loaded. In other words, the first time someone lands on a post (from a search engine, say), that’s when its cache file gets created. This way the cache sort of takes care of itself.
What to cache
You should cache the entire HTML page – everything – including header, navigation, sidebar(s), and footer. The idea is to display a representation of how the page appeared at a certain point in time.
So the simplest approach is to just use file_get_contents() and pass the blog post URL. For example:
file_get_contents("http://matthom.com/archive/2011/04/10/test-post")
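This PHP code will attempt to fetch the page source for that URL: the fully rendered HTML, exactly as your server would send it to a browser. Keep in mind that file_get_contents() can only open URLs when PHP’s allow_url_fopen setting is enabled, and that it returns false if the request fails.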
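If your own site is the one under load, a bare file_get_contents() call on its URL could hang for a long time or fail outright. Here is a minimal sketch, assuming PHP 5.3 or later, of a fetch that gives up after a timeout and never hands back false or an empty page to be cached. The function name fetch_rendered_page() and the five-second default are hypothetical choices for illustration.

```php
<?php
// Hypothetical helper: fetch the rendered HTML for a URL (or local path),
// giving up after $timeout seconds. Returns null on any failure so the
// caller never writes "false" or an empty page into the cache.
function fetch_rendered_page($url, $timeout = 5.0)
{
    // file_get_contents() can only open http(s) URLs when allow_url_fopen is on
    if (preg_match('#^https?://#i', $url) && !ini_get('allow_url_fopen'))
    {
        return null;
    }

    // The "http" options only apply to http(s) URLs; local paths ignore them
    $context = stream_context_create(array(
        'http' => array('timeout' => $timeout),
    ));

    // @ hides the warning PHP emits on connection or HTTP errors;
    // the return value is checked explicitly instead
    $html = @file_get_contents($url, false, $context);
    if ($html === false || $html === '')
    {
        return null;
    }
    return $html;
}
```

You would then only write the cache file when the result is not null.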
Writing and displaying cached content
Once you have that, save it to a file on your server, in a directory your PHP scripts can write to. (It doesn’t need to be web-accessible, since PHP reads the file straight from disk.) Something like cache/blog/2011/2011-04-10_test-post.html should work. Just make sure the file name is absolutely unique, so that no two posts could ever end up with the same file name and location.
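One way to guarantee unique names is to derive them from the same pieces that make the post’s URL unique. The sketch below assumes each post has a date stored as a YYYY-MM-DD string and a slug that is unique for that day (as the /archive/2011/04/10/test-post URL structure suggests); cache_path_for() is a hypothetical name. It also strips unsafe characters from the slug so a malformed slug can’t point outside the cache directory.

```php
<?php
// Hypothetical helper: build a cache path like
// /home/cache/blog/2011/2011-04-10_test-post.html from a post's date and slug.
// Assumes $post_date is a trusted "YYYY-MM-DD" string from the database.
function cache_path_for($cache_root, $post_date, $slug)
{
    // Collapse anything that isn't a lowercase letter, digit, or hyphen
    // into a single hyphen, so "../" can never reach the file system
    $safe_slug = preg_replace('/[^a-z0-9-]+/', '-', strtolower($slug));
    $year = substr($post_date, 0, 4);
    return rtrim($cache_root, '/') . "/$year/{$post_date}_{$safe_slug}.html";
}

echo cache_path_for("/home/cache/blog", "2011-04-10", "test-post");
// prints /home/cache/blog/2011/2011-04-10_test-post.html
```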
Use fwrite() to write to the file. Try something like this:
$file_path = "/home/cache/blog/2011/2011-04-10_test-post.html";
$file_content = file_get_contents("http://matthom.com/archive/2011/04/10/test-post");
// Only write the cache file if the page was fetched successfully
if ($file_content !== false)
{
    $fp = fopen($file_path, "w");
    fwrite($fp, $file_content);
    fclose($fp);
}
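During a traffic spike, one visitor’s request might read the cache file while another request is still writing it, and get a truncated page. A common way around that is to write to a temporary file in the same directory and then rename() it into place, since a rename within one file system is atomic on POSIX systems. This is a sketch under those assumptions; write_cache_atomically() is a hypothetical name, and it also creates the year directory (e.g. blog/2011) if it doesn’t exist yet.

```php
<?php
// Hypothetical helper: write $html to $file_path so readers only ever see
// either the old complete file or the new complete file, never a partial one.
function write_cache_atomically($file_path, $html)
{
    $dir = dirname($file_path);
    // Create the directory tree on first use (tolerates a concurrent mkdir)
    if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir))
    {
        return false;
    }

    // Temporary file in the SAME directory, so rename() stays on one file system
    $tmp_path = tempnam($dir, 'cache_');
    if ($tmp_path === false)
    {
        return false;
    }
    if (file_put_contents($tmp_path, $html) === false)
    {
        unlink($tmp_path);
        return false;
    }

    chmod($tmp_path, 0644);               // tempnam() creates files as 0600
    return rename($tmp_path, $file_path); // swap the finished file into place
}
```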
Then, to check if the cache file exists (when loading a blog post page), try this:
$cache_filename = "/home/cache/blog/2011/2011-04-10_test-post.html";
$cache_file_exists = file_exists($cache_filename);
if ($cache_file_exists)
{
    $cached_content = file_get_contents($cache_filename);
    echo $cached_content;
    exit();
}
else
{
    // run database query to retrieve blog post
}
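The else branch above is where “cache each post as it’s loaded” happens. If that caching step lives in the same script that renders the post, fetching the post’s own URL with file_get_contents() would re-enter that script on a cache miss. A different technique, swapped in here, is PHP’s output buffering: render the page as usual, capture everything that was echoed with ob_start() and ob_get_clean(), and save that. In this sketch, serve_post() and the $render_post_page callable (standing in for your CMS’s existing query-and-render code) are hypothetical; the callable type hint assumes PHP 5.4 or later.

```php
<?php
// Sketch: serve the cached page if there is one; otherwise render it once,
// capture the HTML via output buffering, and save it for the next visitor.
function serve_post($cache_filename, callable $render_post_page)
{
    if (is_file($cache_filename))
    {
        readfile($cache_filename); // cache hit: no database query at all
        return;
    }

    ob_start();                    // start capturing everything echoed
    $render_post_page();           // runs the query and echoes the page
    $html = ob_get_clean();        // stop capturing and take the HTML

    // LOCK_EX serializes concurrent writers; write errors are ignored here
    @file_put_contents($cache_filename, $html, LOCK_EX);
    echo $html;
}
```

Only the first visitor after a cache miss touches the database; everyone after that gets the saved file.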
Re-caching
You should only re-cache a particular blog post if the content on the page has changed, such as when someone leaves a comment on the post.
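Since posts are cached as they’re loaded, the simplest way to re-cache is just to delete the post’s cache file when a comment is saved: the next visitor misses the cache, the page is rendered from the database once, and a fresh copy gets written. A minimal sketch, with invalidate_post_cache() as a hypothetical name you would call from your comment-saving code:

```php
<?php
// Hypothetical hook: call right after a new comment is saved to the database.
// Removing the file forces the next page load to rebuild the cache.
function invalidate_post_cache($cache_filename)
{
    if (is_file($cache_filename))
    {
        return unlink($cache_filename);
    }
    return true; // nothing cached yet, so nothing to do
}

// Example, after inserting the comment:
// invalidate_post_cache("/home/cache/blog/2011/2011-04-10_test-post.html");
```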
Otherwise, always serve visitors the cached version. You’ll save a lot of database back-and-forth, making your site much more responsive and able to handle far more visitors!