Hello, I'm Chris Murphy — I specialize in creating engaging, user-centric interactive experiences.

URL Token Replacement Techniques for WordPress 3.0

Migrating content when you use WordPress as a CMS can often be painful, but a few simple techniques built on WordPress' content filters and plug-in hooks can make the process far less cumbersome.

WordPress comes with many built-in features for working with Page and Post content, and in some cases WordPress’ feature sets are well suited to being used as a Content Management System (CMS) to power websites. But any good development workflow suffers from some common issues—in this case: content migration.

Generally speaking, if you’re using WordPress as a CMS (or any CMS really), you’re likely to have some sort of tiered development setup. Such a setup typically consists of Development, Staging, and Production environments (servers, virtualized or otherwise), and each of those environments will likely have a corresponding database server, such as MySQL.

The Issue

A tiered development workflow typically requires that you promote changes from one database to another (upstream), e.g. Development to Staging to Production. In some cases the replication can be bi-directional (upstream and downstream), e.g. Production to Staging and back to Production. This last case is pretty rare, but it exists, and where a CMS is involved, replicating the database can be a chore unless you’ve done some serious work to automate the process.

Where WordPress is concerned, we run into issues when we migrate content from our Staging Server and onto our Production server.

Here’s a Simple Illustration:

URL: http://localhost/mywordpress/

<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras ut risus purus. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. <a href="http://localhost/mywordpress/2010/09/01/the-quick-brown-fox">The quick brown fox jumps over the lazy dog</a>. Integer iaculis augue non lorem fermentum ac vulputate odio convallis. Curabitur congue sem vitae ipsum tempor sagittis.</p>

URL: http://www.stagingserver.com/mywordpress/

<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras ut risus purus. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. <a href="http://localhost/mywordpress/2010/09/01/the-quick-brown-fox">The quick brown fox jumps over the lazy dog</a>. Integer iaculis augue non lorem fermentum ac vulputate odio convallis. Curabitur congue sem vitae ipsum tempor sagittis.</p>

URL: http://www.productionserver.com/

<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras ut risus purus. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. <a href="http://localhost/mywordpress/2010/09/01/the-quick-brown-fox">The quick brown fox jumps over the lazy dog</a>. Integer iaculis augue non lorem fermentum ac vulputate odio convallis. Curabitur congue sem vitae ipsum tempor sagittis.</p>

Each instance of WordPress on the Development, Staging, and Production environments uses a replicated database. If you examine the links in each, you’ll see that each link is identical:

<a href="http://localhost/mywordpress/2010/09/01/the-quick-brown-fox">…</a>

We’ve moved the content from one environment/database to another, but the URIs in the content haven’t changed. That leaves us with our fundamental migration issue, and if you’ve been in this situation, you’ve probably dealt with it as follows:

Manually Updating The Database:

In most cases, you’d likely run a Search & Replace SQL statement that will modify all of the posts/pages/etc for you. Your Search & Replace SQL might look something like this:

UPDATE wp_options SET option_value = REPLACE(option_value, "localhost", "www.stagingserver.com");
UPDATE wp_posts SET post_content = REPLACE(post_content, "localhost", "www.stagingserver.com");
UPDATE wp_posts SET post_content_filtered = REPLACE(post_content_filtered, "localhost", "www.stagingserver.com");
UPDATE wp_postmeta SET meta_value = REPLACE(meta_value, "localhost", "www.stagingserver.com");

With this approach you’d need to run this search and replace each time you migrate the database from your localhost to your staging platform.

What a pain.

WordPress Content Filters to The Rescue

To avoid all of the unnecessary hassle we need to dig a little deeper into some of the inner workings of WordPress. In this case, content filters.

WordPress comes with a number of built-in filters that plug-in developers can hook into to enhance parts of WordPress’ core feature set. In fact, in a default setup, WordPress applies a number of filters to the content when you first save a draft, when you publish an article or page, and when you use template tags within The Loop.

“Token” Replacement

A token, in the context of this article, is simply a common placeholder string that can be easily referenced programmatically and transformed or replaced with something else. As the term suggests, the “token” should be simple and easy to find in the HTML markup of your content.

The objective of this approach is to replace all of the instances of your site URL with a token, and to do this in such a way that it is invisible to you or whoever is tasked with managing the content, i.e. set-it-and-forget-it.

Let’s begin.

Step 1: Choosing a Token

The first thing we need to do is to determine the format of our token. Here are some considerations:

  1. The token should be unique in comparison to the surrounding content, meaning that if it were mixed in with HTML markup and other text, it would still be easily recognizable.
  2. The token should use only characters that are HTML safe. This excludes any special characters that might be transformed into HTML entities by WordPress’ other default content filters. There are ways to get around this, but that will be for you to discover.
  3. The token that we choose should be stored as a reference in a globally accessible file, e.g. functions.php.

For the purposes of this article we will use the following as our token: [%URL%]
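
As a rough check of the second consideration, here is a minimal sketch. It assumes that PHP's htmlspecialchars() is a fair stand-in for the kind of entity encoding another content filter might apply (WordPress' own filters may behave differently). The helper name f3_tokenSurvivesEncoding() and the angle-bracket candidate are hypothetical and only there for contrast.

```php
<?php
// Candidate tokens: the article's choice, plus a hypothetical
// angle-bracket variant that is NOT HTML safe.
$candidates = array( '[%URL%]', '<%URL%>' );

/**
* Hypothetical helper: does the token come through entity encoding untouched?
* Assumption: htmlspecialchars() approximates what other filters might do.
*/
function f3_tokenSurvivesEncoding( $token )
{
	return htmlspecialchars( $token ) === $token;
}

foreach ( $candidates as $candidate ) {
	$verdict = f3_tokenSurvivesEncoding( $candidate ) ? 'unchanged' : 'altered';
	echo $candidate, ' => ', $verdict, "\n";
}
```

The square-bracket token passes through untouched, while the angle-bracket variant would be rewritten into entities and could no longer be found by a plain string search.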

Step 2: The Token Replacement Functions

<?php
// The token that stands in for the site URL in stored content.
// Declared global in case this file is ever included from inside a function.
global $URL_TOKEN;
$URL_TOKEN = '[%URL%]';
/**
* Find instances of the current domain and replace them with a token
* @param String $content - The original content
* @return String - The updated content with the token replacement in place of the site URL
*/
function f3_createDomainToken( $content )
{
	// Without this, $URL_TOKEN would be undefined inside the function
	global $URL_TOKEN;

	$domain = get_bloginfo('url');
	$token = $URL_TOKEN;

	$content = str_replace( $domain, $token, $content );

	return $content;
}

/**
* Find instances of the token and replace them with the current domain
* @param String $content - The original content
* @return String - The updated content with the site URL in place of the token
*/
function f3_replaceDomainToken( $content )
{
	// Without this, $URL_TOKEN would be undefined inside the function
	global $URL_TOKEN;

	$domain = get_bloginfo('url');
	$token = $URL_TOKEN;

	// Find instances of the token and replace them with the current domain
	$content = str_replace( $token, $domain, $content );

	return $content;
}
?>

These two functions parse the content supplied to them: the first transforms any references to your domain (the “Site URL”) into the token, and the second transforms the token back into the current Site URL. The first statement in the above code stores the token in a variable that both functions read; it should be placed at the top of your functions.php file. The functions themselves can be placed in functions.php wherever you see fit.
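
To see the round trip outside of WordPress, here is a minimal sketch. The helpers demo_createToken() and demo_replaceToken() are hypothetical: they take the site URL as an explicit argument in place of the call to get_bloginfo('url'), purely so that two environments can be simulated in one script. The URLs are the ones from the illustration earlier, written without a trailing slash.

```php
<?php
// Hypothetical stand-ins for the two functions above; the site URL is
// passed in rather than read from get_bloginfo('url').
function demo_createToken( $content, $siteUrl, $token = '[%URL%]' )
{
	return str_replace( $siteUrl, $token, $content );
}

function demo_replaceToken( $content, $siteUrl, $token = '[%URL%]' )
{
	return str_replace( $token, $siteUrl, $content );
}

$developmentUrl = 'http://localhost/mywordpress';
$stagingUrl     = 'http://www.stagingserver.com/mywordpress';

// What the editor typed on the Development server
$typed = '<a href="http://localhost/mywordpress/2010/09/01/the-quick-brown-fox">The quick brown fox</a>';

// What would be written to the database on save
$stored = demo_createToken( $typed, $developmentUrl );

// What would be output after the same row is migrated to Staging
$rendered = demo_replaceToken( $stored, $stagingUrl );

echo $stored, "\n";
echo $rendered, "\n";
```

The stored copy contains only the token, so the same database row should render with the right domain on whichever server reads it.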

Step 3: The Content Filters, e.g. “Hooks”

This last step is essential. For these functions to be of any benefit to you, they must be added to WordPress’ filter queue using the add_filter() function. You can call these functions directly rather than registering them as filters, but you’ll gain only marginal benefit, and you’ll have to call them yourself in every template where you use the_content() or get_the_content().

What we want is for the output of the_content() in your template to be correctly formatted already, i.e. your token has been replaced. Here are the hooks that you will need to use these functions effectively:

<?php
// DOMAIN REPLACEMENT FILTERS
add_filter( 'the_content',      'f3_replaceDomainToken', 11);
//add_filter( 'get_the_content',  'f3_replaceDomainToken', 11);
add_filter( 'content_edit_pre', 'f3_replaceDomainToken', 11);
add_filter( 'content_save_pre', 'f3_createDomainToken', 11);
add_filter( 'the_editor_content', 'f3_replaceDomainToken', 11);
?>

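The add_filter() function accepts up to four parameters: the hook name and the callback are required, while the priority (default 10) and the number of accepted arguments (default 1) are optional. The example above passes the first three. If you look at it, you can see that we’re using some of WordPress’ hooks: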
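
To make the role of the priority argument concrete, here is a toy model of a prioritised filter queue. It is not WordPress' implementation: the ToyFilters class and the toy_ functions are hypothetical names, and the only behaviour assumed from WordPress is that callbacks with lower priority numbers run first. A priority of 11 would therefore run after anything registered at the default of 10, which is presumably why the article uses it.

```php
<?php
// Toy model only; NOT how WordPress stores or runs its filters.
class ToyFilters
{
	private static $queue = array();

	// Mirrors the three arguments passed to add_filter() in the article
	public static function add( $hook, $callback, $priority = 10 )
	{
		self::$queue[ $hook ][ $priority ][] = $callback;
	}

	public static function apply( $hook, $value )
	{
		if ( empty( self::$queue[ $hook ] ) ) {
			return $value;
		}
		ksort( self::$queue[ $hook ] ); // lower priority numbers run first
		foreach ( self::$queue[ $hook ] as $callbacks ) {
			foreach ( $callbacks as $callback ) {
				$value = call_user_func( $callback, $value );
			}
		}
		return $value;
	}
}

// Hypothetical filter that alters the text (stands in for any default filter)
function toy_shout( $content )
{
	return strtoupper( $content );
}

// Hypothetical token replacement with a hard-coded Staging URL
function toy_replaceToken( $content )
{
	return str_replace( '[%URL%]', 'http://www.stagingserver.com/mywordpress', $content );
}

// Registered first, but priority 11 means it runs after the priority-10 filter
ToyFilters::add( 'the_content', 'toy_replaceToken', 11 );
ToyFilters::add( 'the_content', 'toy_shout', 10 );

echo ToyFilters::apply( 'the_content', 'see [%URL%]/about' ), "\n";
```

Because the replacement runs last in this model, the URL it inserts is left untouched by the earlier filter; had the order been reversed, the domain itself would have been upper-cased.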

<?php
// Hook names used above (these are filter tags, not function calls)
// the_content
// get_the_content
// content_edit_pre
// content_save_pre
// the_editor_content
?>

The first two you might already be familiar with if you’ve done some template development, but the last three are ones that are primarily used by WordPress plug-in developers. What you probably don’t know is that all of what I’ve illustrated here can be easily wrapped into a simple WordPress plug-in. If you really consider how WordPress is built, you’ll also realize that functions.php is a very simple example of how a plug-in works.

Notes:

  • If you’re paying attention, you’ll notice that the get_the_content filter in the above code is commented out. This is because some plug-ins might already be applying filters to the output of get_the_content(), and you as a WordPress developer might want to do some other transformations on your content.
  • The last filter, the_editor_content, is a rarely used (please correct me if I’m wrong) plug-in hook that’s used for plug-ins that introduce additional content fields into the normal WordPress editor.
  • You will still need to update the site URL in the options table (wp_options) when you migrate from one database/environment to another, but you can use the first SQL statement I’ve provided in this article to handle that.
  • Full automation of this process can be accomplished with some edits to wp-config.php: define a constant there that holds the site URL for each server, and use it in place of the call to get_bloginfo('url') in the functions I’ve provided (I’m going to let you figure that one out).
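
For readers who would rather not start from scratch, here is one possible shape for that last note. It is only a sketch: the constant F3_SITE_URL and the helpers f3_getSiteUrl() and f3_replaceDomainTokenFromConstant() are hypothetical names, and the define() shown here would in practice live in each server's wp-config.php, which stays on the server rather than travelling with the database.

```php
<?php
// Hypothetical constant. In practice this line would sit in wp-config.php
// on each server, with that server's own URL as the value.
if ( ! defined( 'F3_SITE_URL' ) ) {
	define( 'F3_SITE_URL', 'http://www.stagingserver.com/mywordpress' );
}

/**
* Hypothetical helper: prefer the constant, fall back to get_bloginfo('url')
* when running inside WordPress without the constant defined.
*/
function f3_getSiteUrl()
{
	if ( defined( 'F3_SITE_URL' ) ) {
		return rtrim( F3_SITE_URL, '/' ); // normalise away any trailing slash
	}
	return function_exists( 'get_bloginfo' ) ? get_bloginfo( 'url' ) : '';
}

// Same idea as f3_replaceDomainToken(), reading the URL from the helper
function f3_replaceDomainTokenFromConstant( $content )
{
	return str_replace( '[%URL%]', f3_getSiteUrl(), $content );
}

echo f3_replaceDomainTokenFromConstant( '<a href="[%URL%]/2010/09/01/the-quick-brown-fox">…</a>' ), "\n";
```

With this arrangement the content functions would no longer depend on the site URL stored in wp_options, although WordPress itself still does.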

Conclusion:

Content migration is still going to be painful at times, and it doesn’t really matter if you’re using WordPress or some other setup (even your own); however, it doesn’t always have to be that way. What I’ve illustrated here is a means to mitigate some of that headache. Take some time to explore how WordPress’ content filters work; you might be amazed at how much more control you actually have over your content and its formatting.

References:

  1. WordPress Codex: the_loop() explained
  2. WordPress Codex: apply_filters() explained

farfromfearless

Comments on This Post:

  1. WP_HOME and …
    Date: October 19, 2012
    Time: 10:17 am

    [...] all of the database tables except wp_site and wp_blogs to a local database.I highly recommend the URL Token Replacement Techniques for WordPress 3.0 article by Chris Murphy to help handle URLs in your content.This example assumes a subdomain [...]

  2. obmerk99
    Date: September 9, 2013
    Time: 9:49 pm

    That is a great little technique . But I have a small doubt .
    Isn’t that going to overload the server ? after all, this would require the str_replace() twice on every page load ( even more on archives, home etc.. ). maybe some enhancements where those changes actually reflect in the DB is more efficient in terms of server load .
