Here’s a good reason to keep your development and production environments identical. The task was simple enough: I wanted to strip all punctuation from a UTF-8 encoded string. Here’s some example code that does it, using PHP’s PCRE functions.

<?php
 
// remove everything but letters, numbers, and spaces
// the 'u' modifier enables UTF-8
 
$string = "TAGholy! “moley”.    & bát's were _killed ^%by ; dogs, for £50 ümlauts";
$string = preg_replace('/[^\p{L}\p{N}\p{Zs}]+/u', '', $string);
echo "$string\n";

On PHP 5.2.4:

% php -i | grep PCRE
PCRE (Perl Compatible Regular Expressions) Support => enabled
PCRE Library Version => 6.6 06-Feb-2006
% php pcre_test.php
ss

On PHP 5.2.6:

% php -i | grep PCRE
PCRE (Perl Compatible Regular Expressions) Support => enabled
PCRE Library Version => 7.6 2008-01-28
% php pcre_test.php
TAGholy moley     báts were killed by  dogs for 50 ümlauts

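Same script, different output; the two builds ship different PCRE library versions (6.6 versus 7.6). It only took me a day to figure out.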
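One way to catch this kind of mismatch sooner is a small smoke test that runs on every environment. What follows is only a rough sketch: the `strip_punctuation()` helper and the sample strings are made up for illustration, and it assumes the file is saved as UTF-8. `PCRE_VERSION` is a built-in PHP constant that reports the same version string as `php -i`.

```php
<?php

// Hypothetical helper wrapping the regex from the example above.
function strip_punctuation($string)
{
    // keep letters, numbers, and space separators;
    // the 'u' modifier treats pattern and subject as UTF-8
    return preg_replace('/[^\p{L}\p{N}\p{Zs}]+/u', '', $string);
}

// Built-in constant, available in PHP 5.2.4 and later
echo 'PCRE library: ' . PCRE_VERSION . "\n";

// Illustrative check: known input, known expected output
$input    = "bát's, ümlauts!";
$expected = "báts ümlauts";

if (strip_punctuation($input) !== $expected) {
    echo "Unicode property classes misbehave on this build\n";
} else {
    echo "ok\n";
}
```

Running this on both machines before deploying would probably have shown the difference in minutes rather than a day.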