<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Straylight Run &#187; xml</title>
	<atom:link href="http://blog.straylightrun.net/tag/xml/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.straylightrun.net</link>
	<description>Software, Technology, PHP</description>
	<lastBuildDate>Tue, 11 May 2010 03:53:07 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Preventing XML-Restricted Characters</title>
		<link>http://blog.straylightrun.net/2009/01/07/preventing-xml-restricted-characters/</link>
		<comments>http://blog.straylightrun.net/2009/01/07/preventing-xml-restricted-characters/#comments</comments>
		<pubDate>Wed, 07 Jan 2009 19:03:26 +0000</pubDate>
		<dc:creator>gerard</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[encoding]]></category>
		<category><![CDATA[sqs]]></category>
		<category><![CDATA[utf-8]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://blog.straylightrun.net/?p=87</guid>
		<description><![CDATA[In the last post, I told you how I wanted to send around a custom Message object among services in our system using Amazon SQS messaging.  I soon ran into another problem.
As a reminder, here is the set of allowed XML characters again.
#x9 &#124; #xA &#124; #xD &#124; [#x20 to #xD7FF] &#124; [#xE000 to #xFFFD] [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a href="http://blog.straylightrun.net/2009/01/05/sending-a-serialized-object-to-amazon-sqs/">last post</a>, I described how I wanted to pass a custom <code>Message</code> object among the services in our system using Amazon SQS messaging.  I soon ran into another problem.</p>
<p>As a reminder, here is the set of allowed XML characters again.</p>
<blockquote><p>#x9 | #xA | #xD | [#x20 to #xD7FF] | [#xE000 to #xFFFD] | [#x10000 to #x10FFFF]</p></blockquote>
<p>Amazon SQS threw another exception.  I inspected the hex dump of the message and found a 0x19 byte in the text.</p>
<p>The <a href="http://www.fileformat.info/info/unicode/char/0019/index.htm">0x19 character is the END OF MEDIUM character</a>, a control character that I imagine can end up in a web form when text is copied and pasted from a Windows machine.  In Firefox, 0x19 looks like this (right after the phrase &#8220;Basket Makers&#8221;).</p>
<p style="text-align: center;"><a href="http://blog.straylightrun.net/wp-content/uploads/2009/01/ff-xml-char.jpg"><img style="text-align: center; border-width: 0px;" src="http://blog.straylightrun.net/wp-content/uploads/2009/01/ff-xml-char-thumb.jpg" border="0" alt="EOM (0x19) in Firefox" width="354" height="127" /></a></p>
<p>In IE7, it looks like this.</p>
<p style="text-align: center;"><a href="http://blog.straylightrun.net/wp-content/uploads/2009/01/ie-xml-char.jpg"><img style="text-align: center; border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" src="http://blog.straylightrun.net/wp-content/uploads/2009/01/ie-xml-char-thumb.jpg" border="0" alt="EOM (0x19) in IE7" width="340" height="130" /></a></p>
<p>One thing you might try is to strip out invalid UTF-8 byte sequences using <a href="http://us.php.net/manual/en/function.iconv.php">iconv</a>.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000088;">$string</span> <span style="color: #339933;">=</span> <span style="color: #990000;">iconv</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;UTF-8&quot;</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">&quot;UTF-8//IGNORE&quot;</span><span style="color: #339933;">,</span> <span style="color: #000088;">$string</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>But this won&#8217;t work.  0x19 is a <em>valid</em> UTF-8 character but an <em>invalid</em> XML character, so there is no choice but to filter out the invalid XML characters explicitly.  Applying this filter (assuming the text is already UTF-8) to instance values before serializing the object seems to work.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// \x{...} names a whole code point; the /u modifier treats pattern and subject as UTF-8</span>
<span style="color: #000088;">$str</span> <span style="color: #339933;">=</span> <span style="color: #990000;">preg_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">''</span><span style="color: #339933;">,</span> <span style="color: #000088;">$str</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>
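
<p>As a rough sketch, a filter like this can be wrapped in a function that uses PCRE&#8217;s <code>\x{...}</code> code point escapes with the <code>/u</code> modifier and reports bad input.  The name <code>stripInvalidXmlChars</code> is made up for illustration, the type declarations assume PHP 7 or later, and the sample string only mimics the text from the screenshots above.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">&lt;?php
// Hypothetical helper (the name is illustrative): remove every character
// that falls outside the XML 1.0 character range from a UTF-8 string.
function stripInvalidXmlChars(string $str): string
{
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u',
        '',
        $str
    );
    if ($clean === null) {
        // preg_replace() returns null on error, e.g. when $str is not valid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}

// Sample text with an END OF MEDIUM (0x19) character embedded in it.
$text = "Basket Makers\x19 and more";

// An empty pattern with /u matches only if the subject is valid UTF-8.
$isValidUtf8 = preg_match('//u', $text) === 1;

// The filter drops the 0x19 character and leaves the rest untouched.
$filtered = stripInvalidXmlChars($text);</pre></div></div>

<p>Here <code>$isValidUtf8</code> is true even though the string contains 0x19, which is why a UTF-8 cleanup alone does not help.  Since <code>preg_replace()</code> gives up on input that is not valid UTF-8, the two steps complement each other rather than compete.</p>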

]]></content:encoded>
			<wfw:commentRss>http://blog.straylightrun.net/2009/01/07/preventing-xml-restricted-characters/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Sending A Serialized Object To Amazon SQS</title>
		<link>http://blog.straylightrun.net/2009/01/05/sending-a-serialized-object-to-amazon-sqs/</link>
		<comments>http://blog.straylightrun.net/2009/01/05/sending-a-serialized-object-to-amazon-sqs/#comments</comments>
		<pubDate>Mon, 05 Jan 2009 16:08:30 +0000</pubDate>
		<dc:creator>gerard</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[serialization]]></category>
		<category><![CDATA[sqs]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://blog.straylightrun.net/?p=80</guid>
		<description><![CDATA[In our service-oriented system, I wanted to send around a custom Message object.  We currently use Amazon SQS for messaging, which requires that all message characters fall within the valid XML character range (according to W3C XML 1.0 spec).  This range is:
#x9 &#124; #xA &#124; #xD &#124; [#x20 to #xD7FF] &#124; [#xE000 to #xFFFD] &#124; [...]]]></description>
			<content:encoded><![CDATA[<p>In our service-oriented system, I wanted to pass around a custom <code>Message</code> object.  We currently use <a href="http://aws.amazon.com/sqs">Amazon SQS</a> for messaging, which requires that <a href="http://docs.amazonwebservices.com/AWSSimpleQueueService/2008-01-01/SQSDeveloperGuide/index.html?Query_QuerySendMessage.html">all message characters fall within the valid XML character range</a> (according to the <a href="http://www.w3.org/TR/REC-xml/#charsets">W3C XML 1.0 spec</a>).  This range is:</p>
<blockquote><p>#x9 | #xA | #xD | [#x20 to #xD7FF] | [#xE000 to #xFFFD] | [#x10000 to #x10FFFF]</p></blockquote>
<p>Here&#8217;s some code to serialize the object and enqueue it to SQS.</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
</pre></td><td class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">class</span> Message
<span style="color: #009900;">&#123;</span>
   <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000088;">$_msg</span><span style="color: #339933;">;</span>
&nbsp;
   <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">function</span> setMessage<span style="color: #009900;">&#40;</span><span style="color: #000088;">$msg</span><span style="color: #009900;">&#41;</span>
   <span style="color: #009900;">&#123;</span>
       <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span>_msg <span style="color: #339933;">=</span> <span style="color: #000088;">$msg</span><span style="color: #339933;">;</span>
   <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #000088;">$obj</span> <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Message<span style="color: #339933;">;</span>
<span style="color: #000088;">$obj</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">setMessage</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Hello world!'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$msg</span> <span style="color: #339933;">=</span> <span style="color: #990000;">serialize</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$obj</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
enqueueToSQS<span style="color: #009900;">&#40;</span><span style="color: #000088;">$msg</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>

<p>Unfortunately, this code produced this exception:</p>
<blockquote><p>Amazon_SQS_Exception: An invalid binary character was found in the message body, the set of allowed characters is #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]</p></blockquote>
<p>What was going on?</p>
<p>I echoed out the serialized <code>$msg</code> variable and looked at it carefully.  There was a lot of funny punctuation, but nothing that appeared to be outside of the XML character range.</p>

<div class="wp_syntax"><div class="code"><pre class="sh" style="font-family:monospace;">O:7:&quot;Message&quot;:1:{s:13:&quot; Message _msg&quot;;s:12:&quot;Hello world!&quot;;}</pre></div></div>

<p>When in doubt, look at the binary.  I wrote <code>$msg</code> out to a file, and looked at the file&#8217;s hex dump.</p>

<div class="wp_syntax"><div class="code"><pre class="sh" style="font-family:monospace;">0000000: 4f3a 373a 224d 6573 7361 6765 223a 313a  O:7:&quot;Message&quot;:1:
0000010: 7b73 3a31 333a 2200 4d65 7373 6167 6500  {s:13:&quot;.Message.
0000020: 5f6d 7367 223b 733a 3132 3a22 4865 6c6c  _msg&quot;;s:12:&quot;Hell
0000030: 6f20 776f 726c 6421 223b 7d              o world!&quot;;}</pre></div></div>

<p>The culprits are both on line 2: a NUL (0x00) byte on either side of the class name, the second one at the very end of the line.  The NUL character is most definitely not in the valid character range.  Some googling confirmed my suspicions in the <a href="http://us3.php.net/manual/en/function.serialize.php#60834">PHP manual</a> itself.</p>
<blockquote><p>If you are serializing an object with private variables, beware. The serialize() function returns a string with null (x00) characters embedded within it, which you have to escape.</p></blockquote>
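<p>The same check can be sketched in PHP instead of a hex dump.  The helper name <code>findInvalidXmlChars</code> is hypothetical, the code assumes PHP 7.1 or later, and it expects valid UTF-8 input.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">&lt;?php
class Message
{
    private $_msg;

    public function setMessage($msg)
    {
        $this-&gt;_msg = $msg;
    }
}

// Hypothetical helper: report the byte offset and hex bytes of every
// character outside the XML 1.0 character range.
function findInvalidXmlChars(string $str): array
{
    $count = preg_match_all(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        $str,
        $matches,
        PREG_OFFSET_CAPTURE
    );
    if ($count === false) {
        // preg_match_all() returns false on error, e.g. invalid UTF-8 input.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $found = [];
    foreach ($matches[0] as [$char, $offset]) {
        $found[] = sprintf('offset 0x%02x: 0x%s', $offset, bin2hex($char));
    }
    return $found;
}

$obj = new Message;
$obj-&gt;setMessage('Hello world!');

// One entry per offending character found in the serialized object.
$report = findInvalidXmlChars(serialize($obj));</pre></div></div>

<p>For the <code>Message</code> object, <code>$report</code> holds two entries, both 0x00, at the offsets where the hex dump shows the NUL bytes.</p>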
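<p>Other comments on that manual page also gave me my solution: escaping with <a href="http://php.net/addslashes">addslashes()</a>.  So I replaced line 13 in the code above like so, and I was then able to send objects over SQS with no problem.  (The receiving side has to reverse the escaping with <code>stripslashes()</code> before calling <code>unserialize()</code>.)</p>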

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>13
</pre></td><td class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000088;">$msg</span> <span style="color: #339933;">=</span> <span style="color: #990000;">addslashes</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">serialize</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$obj</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>
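
<p>For completeness, here is a minimal sketch of the full round trip, with the queue itself left out.  The receiving half is my assumption about how a consumer would undo the escaping, and the base64 variant at the end is an alternative encoding that is not part of the fix described above.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">&lt;?php
class Message
{
    private $_msg;

    public function setMessage($msg)
    {
        $this-&gt;_msg = $msg;
    }
}

$obj = new Message;
$obj-&gt;setMessage('Hello world!');

// Sender: addslashes() rewrites each NUL byte as a backslash followed by "0",
// and also backslash-escapes quotes and backslashes.
$wire = addslashes(serialize($obj));

// Receiver (assumed): undo the escaping before unserializing.
$restored = unserialize(stripslashes($wire));

// Alternative, not from the post: base64 keeps the payload in plain ASCII,
// at the cost of a larger message body.
$wireB64 = base64_encode(serialize($obj));
$restoredB64 = unserialize(base64_decode($wireB64));</pre></div></div>

<p>Either way, the string handed to SQS no longer contains a raw NUL byte, and the consumer gets back an object equal to the one that was sent.</p>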

]]></content:encoded>
			<wfw:commentRss>http://blog.straylightrun.net/2009/01/05/sending-a-serialized-object-to-amazon-sqs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>