Using WordPress Transients to Fail Gracefully

June 11, 2015

A long-standing mantra for web developers has been to ‘fail gracefully’: if some feature on your website doesn’t work in an old browser, or network congestion prevents images from loading, make sure the visitor can still read your text. WordPress websites often depend on outside services like Disqus for comments, Twitter for feeds, AddThis/Sharethis for sharing widgets, and so on. As ‘graceful’ as we try to make our own failures, a catastrophic failure in one of those outside services can cause a domino effect and give your visitors a terrible impression. Fortunately, there are some techniques built around the WordPress Transient API that we can use to keep this under control.

The problem:

On the PBS Newshour website, we built sidebar widgets to show the ‘Most Popular’ and ‘Most Discussed’ posts on the website. The PBS Newshour site is highly trafficked, with a lot of discussion and content shares happening, and we found the built-in Jetpack widgets didn’t do what we needed. Instead, we took advantage of the Chartbeat API to get the most up-to-date feed of ‘popular’ posts for the site (based on traffic and social shares) and the Disqus API to get the feed of posts with the most active recent discussions. Both services return a JSON feed, which we request via PHP and then use to render a list of posts. The lists don’t change that often, so requesting these JSON feeds on every page load seemed really inefficient. We looked for a way to cache them: WP Transients.

WP Transients are awesome – sort of.

WordPress has a Transient API that is quite easy to use and handles retrieving and caching data very nicely. The idea is that you request some data that will change occasionally and store it in the WP Options table with a built-in expiration time. If the expiration hasn’t passed, WordPress retrieves the stored data from the Options table instead of regenerating it; if the expiration has passed, get_transient() returns false, and your code regenerates the data and stores it again with a new expiration time.

WordPress provides several functions to make this all work, and you should review their documentation, but using transients is straightforward. Here’s a simple example from the WordPress documentation:

// Get any existing copy of our transient data
if ( false === ( $special_query_results = get_transient( 'special_query_results' ) ) ) {
  // It wasn't there, so regenerate the data and save the transient
  $special_query_results = new WP_Query( 'cat=5&order=random&tag=tech&post_meta_key=thumbnail' );
  set_transient( 'special_query_results', $special_query_results, 12 * HOUR_IN_SECONDS );
}
// Use the data like you would have normally...

In our case, we wanted to fetch JSON-formatted data from Chartbeat (or Disqus) and store it in a transient, getting a new copy of the data every 10 minutes (600 seconds), like so (the request URL, which contains the API key, has been obscured):

// Use the cached copy if there is one; otherwise fetch and cache for 600 seconds
if ( false === ( $sidebar_popular_chartbeat = get_transient( 'sidebar_popular_chartbeat' ) ) ) {
  $sidebar_popular_chartbeat = wp_remote_get("");
  set_transient( 'sidebar_popular_chartbeat', $sidebar_popular_chartbeat, 600 );
}

And now we had those JSON results stored in the $sidebar_popular_chartbeat variable. We used the same code to retrieve and set a $sidebar_popular_disqus variable, and everything ran fast and efficiently… but what happens if Chartbeat or Disqus goes down?

Outside data source down + WP Transients = faceplant

System outages happen to everyone – Google, Facebook, everyone! When one of these services goes down, you see lots of broken websites and panicked messages from the people maintaining them. Disqus is generally quite reliable, but during testing we saw Disqus go down for a few hours. Aside from prompting panicked messages from people swearing never to use Disqus again, the outage exposed some serious issues with using transients to store remote data. Not only did we have a big blank space in our sidebar where visitors expected to see a list of posts, we also had lots of error messages in our logs and increased server load. Maybe not exactly the ‘faceplant’ that Disqus was having, but still not a very graceful failure on our part.

What went wrong?

The example code above doesn’t do any error checking on what we got back from wp_remote_get(). However, website requests can fail in a lot of different ways, and detecting those failures from what wp_remote_get() returns is complex:

  • client connectivity errors like a DNS lookup or networking problem will be detected by is_wp_error();
  • 40x/50x response codes can be found by inspecting the returned array under $ary['response']['code'];
  • some websites will return a success (200) response code but pass back an empty body in the return;
  • even worse, some websites might return a 200 response code but pass back some error language in the body.

To cover the possible errors you really need to check them all, and there’s no built-in method from WordPress to do so.
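
The four checks in the list above can be combined into one small helper. This is only a minimal sketch: example_validate_json_response() is a hypothetical name, not a WordPress function, and it assumes the remote service returns a JSON object or array. A service that reports errors inside otherwise valid JSON would need an extra, service-specific check that is not shown here.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): returns the decoded JSON from a
 * wp_remote_get() response, or false if any of the checks fails.
 */
function example_validate_json_response( $response ) {
    // 1. Connectivity problems (DNS, timeouts) come back as a WP_Error.
    if ( is_wp_error( $response ) ) {
        return false;
    }
    // 2. Anything other than a 200 status is treated as a failure here.
    if ( 200 !== (int) wp_remote_retrieve_response_code( $response ) ) {
        return false;
    }
    // 3. A 200 with an empty body is still useless to us.
    $body = wp_remote_retrieve_body( $response );
    if ( '' === trim( $body ) ) {
        return false;
    }
    // 4. A 200 whose body is an HTML error page will not parse as JSON.
    $data = json_decode( $body, true );
    if ( ! is_array( $data ) || empty( $data ) ) {
        return false;
    }
    return $data;
}
```

Returning false for every failure mirrors how get_transient() signals a miss, so the caller only has one ‘bad’ value to test for.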

Transients store whatever data you set them to until they expire, at which point they disappear. If you get an error when you try to get the value to store in the transient, you have two problems:

  1. If you store that error as your value, you’re stuck with it until the transient expires. There are manual means of deleting a transient, but you’re better off making sure you don’t store a bad value in the first place.
  2. Even if you do error checking on the value you’ve just retrieved before storing it, the old ‘good’ value has disappeared from the database without you doing anything! So your choice is between storing and displaying an error message, or storing and displaying nothing.
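
For reference, the manual deletion mentioned in the first point is a single call to delete_transient(). A minimal sketch, assuming the raw wp_remote_get() result was cached as in our earlier code (so a failed request leaves a WP_Error in the transient); example_purge_bad_transient() is a hypothetical helper name:

```php
<?php
// Hypothetical clean-up helper: drop a cached value that turns out to be bad.
function example_purge_bad_transient( $name ) {
    $value = get_transient( $name );
    // A cached WP_Error or empty value is worse than no cache at all.
    if ( false !== $value && ( is_wp_error( $value ) || empty( $value ) ) ) {
        return delete_transient( $name ); // true if the transient was removed
    }
    return false; // nothing cached, or the cached value looks usable
}
```

This only cleans up after the fact; it does nothing to bring back the previous good value.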

None of these are graceful failures, but these last two features of transients make failures particularly painful: your errors are stored for the life of the transient, and there’s no old copy of the data to fall back on. It almost seems like you’d be better off not using transients at all. Unless…

Solution: Stale cache storage and lots of error checking

As we saw before, ‘transients’ are simply entries in the wp_options table that have an ‘expiration date’ value. Our solution was to store a backup of every transient without an expiration date.

We created a wrapper function for transients that get their data from remote sources. It retrieves the remote data, extensively error checks it, and stores a backup in the options table. If there’s an error, it uses that backup to reset the transient with a ‘stale’ version of the cache, in the hope that the remote data source will be up and running the next time the transient is refreshed.

We replaced the old transient code:

if ( false === ( $sidebar_popular_chartbeat = get_transient( 'sidebar_popular_chartbeat' ) ) ) {
  $sidebar_popular_chartbeat = wp_remote_get("");
  set_transient( 'sidebar_popular_chartbeat', $sidebar_popular_chartbeat, 600 );
}

with this:

$sidebar_popular_chartbeat = wnet_get_transient_remote_json( 'sidebar_popular_chartbeat', '', 600 );

wnet_get_transient_remote_json() is a wrapper function for getting and setting transients safely. Let’s walk through the function code below:
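
The function’s own listing does not appear at this point, so what follows is only a minimal sketch of how such a wrapper could be written, not the original wnet_get_transient_remote_json(). The function name, the ‘_backup’ option suffix, the choice to return a decoded array, and the specific checks are all assumptions made for illustration.

```php
<?php
/**
 * Sketch of a stale-cache wrapper (hypothetical; names and checks are assumptions).
 * Returns decoded JSON as an array, or false if nothing usable is available.
 */
function example_get_transient_remote_json( $name, $url, $expiration = 600 ) {
    // Fast path: the transient is still fresh.
    $cached = get_transient( $name );
    if ( false !== $cached ) {
        return $cached;
    }

    $backup_key = $name . '_backup'; // hypothetical naming scheme
    $data       = false;

    // Fetch and validate; any failed check leaves $data as false.
    $response = wp_remote_get( $url );
    if ( ! is_wp_error( $response )
        && 200 === (int) wp_remote_retrieve_response_code( $response ) ) {
        $decoded = json_decode( wp_remote_retrieve_body( $response ), true );
        if ( is_array( $decoded ) && ! empty( $decoded ) ) {
            $data = $decoded;
        }
    }

    if ( false !== $data ) {
        // Good data: refresh the non-expiring backup copy.
        update_option( $backup_key, $data, false ); // false = do not autoload
    } else {
        // Bad data: fall back to the last known good copy, if there is one.
        $data = get_option( $backup_key, false );
        if ( false === $data ) {
            return false; // nothing cached yet; the caller should render nothing
        }
    }

    // Store fresh or stale data alike, so the fetch is retried after $expiration seconds.
    set_transient( $name, $data, $expiration );
    return $data;
}
```

Resetting the transient even when only stale data is available is what keeps server load down during an outage: the remote request is retried once per expiration period rather than on every page load.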

The end result: with careful error checking on the retrieved data and a backup copy saved alongside it, you can safely and effectively use transients to store remotely fetched data. You get the speed and efficiency of transients instead of retrieving remote data on every page load, and if the remote data source fails spectacularly, your local copy remains available until the source is repaired. We ‘fail’ so gracefully that visitors don’t even know there’s a problem, even if the data source has (metaphorically) fallen down face-first and knocked out a couple of teeth.

Digital at The WNET Group is not responsible for your or any third party’s use of code from this tutorial. All the information on this website is published in good faith “as is” and for general information purposes only; WNET and IEG make no representation or warranty regarding reliability, accuracy, or completeness of the content on this website, and any use you make of this code is strictly at your own risk.