While making some updates to the blog, such as adding author information and other details that will
hopefully make things look a bit more legitimate, I spent about two hours trying to figure out why a
shortcode in the theme I use (Suffusion) would not parse its arguments. The tag
suffusion-the-author would display the author's name, but if I added an argument to make it display the
description or a link instead, it would always fail and fall back to showing only the author's
name. Some quick googling suggested that nobody else was having this problem, and after
four major versions of the theme, I'm pretty sure someone would have noticed if a basic feature were
entirely nonfunctional. After lots of debugging, I discovered that the problem was happening in
WordPress's own code, inside shortcode_parse_atts(), which parses the attributes
matched in the shortcode into an array for use inside the callback. For me, it instead threw away
all of the tag arguments and returned an empty string. In the end I found that the culprit was a regular
expression that appears to be designed to replace certain kinds of Unicode space (non-breaking and
zero-width spaces) with ordinary spaces, to simplify the argument-parsing regular expression that comes after it:
$text = preg_replace("/[\x{00a0}\x{200b}]+/u", " ", $text);
On my system, this line set $text to an empty string every time. My first instinct was to
just comment it out, since I don't expect to see any of these characters inside tags, but that
bothered me: this line has been part of the WordPress code for a very long time, and nobody
else appears to have problems with it. I concluded that Unicode support must somehow not be
enabled, so I started searching through yum. Bad news: mbstring and pcre were both already
installed, and php -i told me that PHP was built with multibyte string support enabled,
as well as PCRE regex support. I tested my PCRE libraries and they claimed to support Unicode as
well.
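One detail that may explain the "empty string" symptom: preg_replace() returns NULL when it fails (for example, when a /u pattern can't be compiled or the subject isn't valid UTF-8), and NULL used as a string is just "". Below is a minimal sketch, not WordPress's actual code, of running the same replacement while checking for that failure instead of silently carrying on. normalize_unicode_spaces() is a hypothetical helper name, and falling back to the unmodified text is my own assumption about sensible behaviour. It sticks to PHP 5.3-compatible syntax.

```php
<?php
// Hypothetical helper (not a WordPress API): performs the same space
// normalisation as the line in shortcode_parse_atts(), but detects failure.
function normalize_unicode_spaces($text)
{
    // Replace runs of non-breaking spaces (U+00A0) and zero-width spaces
    // (U+200B) with a single ordinary space. The /u modifier makes PCRE treat
    // the pattern and subject as UTF-8.
    $result = @preg_replace('/[\x{00a0}\x{200b}]+/u', ' ', $text);

    if ($result === null) {
        // preg_replace() signals an error by returning NULL; used as a string,
        // that NULL would silently become "".
        trigger_error(
            'Unicode space normalisation failed (preg_last_error() = ' . preg_last_error() . ')',
            E_USER_WARNING
        );
        return $text; // assumption: keep the original text rather than lose it
    }
    return $result;
}

// "\xc2\xa0" is a UTF-8 encoded non-breaking space.
echo normalize_unicode_spaces("author\xc2\xa0description=\"true\""), "\n";
```

With a working PCRE build, the usage line prints the attribute string with an ordinary space in place of the non-breaking one; with a broken build you get a warning instead of an arguments list that quietly vanishes.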
In the end, I solved the problem by updating PHP and PCRE to the latest available versions: in my case, PHP went from php-5.3.13 to php-5.3.24, and PCRE from 7.x to 8.21. It appears that there was some kind of incompatibility between PHP 5.3 and the 7.x branch of PCRE (if you build PHP 5.3 from source, it targets PCRE 8.x), which prevented me from using Unicode even though support for it appeared to be present everywhere. So, if you have trouble getting your PHP installation to handle Unicode inside regular expressions and you're running on Amazon Web Services, make sure you update to the latest versions of PHP and PCRE, as there appear to be some issues with older packages.
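After upgrading, one way to confirm from inside PHP itself (rather than from the PCRE command-line tools, which may not reflect what PHP actually loads) which PCRE library is in use and whether UTF-8 patterns really work is a small script along these lines. It's a rough sketch: pcre_utf8_works() is a hypothetical helper, and its two probes are simple spot checks, not an exhaustive test of Unicode support.

```php
<?php
// Hypothetical diagnostic helper: returns true only if UTF-8 mode patterns
// compile and behave as expected at runtime.
function pcre_utf8_works()
{
    // Probe 1: "\xc3\xa9" is U+00E9 encoded as two UTF-8 bytes. In /u mode,
    // "." matches the whole character, so /^.$/u should match exactly once.
    $single_char = @preg_match('/^.$/u', "\xc3\xa9") === 1;

    // Probe 2: the same kind of replacement shortcode_parse_atts() performs.
    // On a broken build this returns NULL instead of "a b".
    $replaced = @preg_replace('/[\x{00a0}\x{200b}]+/u', ' ', "a\xc2\xa0b") === 'a b';

    return $single_char && $replaced;
}

// PCRE_VERSION reports the PCRE library PHP was built against.
echo 'PCRE_VERSION: ', PCRE_VERSION, "\n";
echo 'UTF-8 patterns: ', pcre_utf8_works() ? 'working' : 'BROKEN', "\n";
```

Checking the runtime behaviour, rather than only what php -i reports, is the point here: in my case every configuration report said Unicode was supported, yet the replacement still failed.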