In our service-oriented system, I wanted to pass around a custom Message object. We currently use Amazon SQS for messaging, which requires that every character in a message fall within the valid XML character range (as defined by the W3C XML 1.0 specification). This range is:
#x9 | #xA | #xD | [#x20 to #xD7FF] | [#xE000 to #xFFFD] | [#x10000 to #x10FFFF]
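The range translates fairly directly into a regular expression. Here is a minimal sketch, assuming the message is UTF-8 text; isValidXmlText() is a hypothetical helper name used for illustration, not part of any SQS SDK.

```php
<?php
// Hypothetical helper: true if every character of $text falls inside the
// XML 1.0 character range quoted above. Assumes $text is UTF-8.
function isValidXmlText(string $text): bool
{
    $pattern = '/\A[\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]*\z/u';
    // preg_match() returns false on malformed UTF-8, which is treated as invalid here.
    return preg_match($pattern, $text) === 1;
}

var_dump(isValidXmlText("Hello world!")); // bool(true)
var_dump(isValidXmlText("bad\0byte"));    // bool(false)
```

Running a check like this before enqueueing would flag a bad message locally rather than waiting for the service to reject it.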
Here’s some code to serialize the object and enqueue it to SQS.
class Message
{
    private $_msg;

    public function setMessage($msg)
    {
        $this->_msg = $msg;
    }
}

$obj = new Message;
$obj->setMessage('Hello world!');
$msg = serialize($obj);
enqueueToSQS($msg); // enqueueToSQS() is not shown; it sends $msg as the SQS message body
Unfortunately, this code produced this exception:
Amazon_SQS_Exception: An invalid binary character was found in the message body, the set of allowed characters is #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
What was going on?
I echoed out the serialized $msg variable and looked at it carefully. There was a lot of funny punctuation, but nothing outside of the XML character range.
O:7:"Message":1:{s:13:" Message _msg";s:12:"Hello world!";}
When in doubt, look at the binary. I wrote $msg out to a file and looked at the file’s hex dump.
0000000: 4f3a 373a 224d 6573 7361 6765 223a 313a  O:7:"Message":1:
0000010: 7b73 3a31 333a 2200 4d65 7373 6167 6500  {s:13:".Message.
0000020: 5f6d 7367 223b 733a 3132 3a22 4865 6c6c  _msg";s:12:"Hell
0000030: 6f20 776f 726c 6421 223b 7d              o world!";}
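The culprit is in the second line of the dump: two NUL (0x00) bytes, at offsets 0x17 and 0x1f, surround the class name in the private property's key. NUL is most definitely not in the valid character range. Some googling turned up a comment in the PHP manual itself that confirmed my suspicions: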
If you are serializing an object with private variables, beware. The serialize() function returns a string with null (x00) characters embedded within it, which you have to escape.
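The same thing can be confirmed without a hex dump. Here is a minimal sketch, reusing the Message class from above, that lists the byte offset of every NUL in the serialized string:

```php
<?php
class Message
{
    private $_msg;

    public function setMessage($msg)
    {
        $this->_msg = $msg;
    }
}

$obj = new Message;
$obj->setMessage('Hello world!');
$msg = serialize($obj);

// Walk the string and record the offset of each NUL byte.
$offsets = [];
$pos = 0;
while (($pos = strpos($msg, "\0", $pos)) !== false) {
    $offsets[] = $pos;
    $pos++;
}

echo implode(', ', $offsets), "\n"; // 23, 31
```

Offsets 23 and 31 are 0x17 and 0x1f, the two 00 bytes on either side of the class name in the private property's key.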
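Other comments on that manual page also gave me my solution: escaping with addslashes(), which turns each NUL byte into a backslash followed by a zero. The receiving side then has to undo the escaping with stripslashes() before calling unserialize(). So I replaced the serialize() line in the code above like so, and I was then able to send objects over SQS with no problem.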
$msg = addslashes(serialize($obj));
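For completeness, here is a sketch of the full round trip, including the consuming side. getMessage() is a hypothetical accessor added only so the result can be inspected, and the SQS send and receive calls are left out. The base64 variant at the end is an alternative encoding, not what this post uses.

```php
<?php
class Message
{
    private $_msg;

    public function setMessage($msg)
    {
        $this->_msg = $msg;
    }

    // Hypothetical accessor, added for this example only.
    public function getMessage()
    {
        return $this->_msg;
    }
}

$obj = new Message;
$obj->setMessage('Hello world!');

// Producer: escape NUL bytes, quotes and backslashes before sending.
$wire = addslashes(serialize($obj));
var_dump(strpos($wire, "\0")); // bool(false): no raw NUL bytes remain

// Consumer: undo the escaping first, then unserialize.
$restored = unserialize(stripslashes($wire));
echo $restored->getMessage(), "\n"; // Hello world!

// Alternative (an assumption, not the post's approach): base64 keeps the
// payload within plain ASCII at the cost of roughly a third more bytes.
$viaBase64 = unserialize(base64_decode(base64_encode(serialize($obj))));
echo $viaBase64->getMessage(), "\n"; // Hello world!
```

Whichever encoding is used, producer and consumer have to agree on it, since unserialize() will not accept the escaped string directly.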