Archive for 'Coding'

Since I knew that the MySQL Query Cache used the literal queries as keys, it made sense that MySQL did not cache queries with certain SQL functions in them, such as this one:

$sql = "select event_id from events where event_dt >= curdate()";

Because MySQL knows that this query run today is not the same query when it is run tomorrow. There are other SQL functions such as rand() and unix_timestamp() that will bypass the query cache. These are listed here.

So I avoid these functions when possible by calculating the value in PHP. For example, I’d rewrite the above query as:

$date = date('Y-m-d');
$sql = "select event_id from events where event_dt >= '$date'";

Database Unit Testing

When I first looked into unit testing, the concept seemed simple enough.  The canonical example seemed to be the sum function.

public function sum($x, $y)
{
    return $x + $y;
}

And the test.

public function testSum()
{
    $summer = new Summer();

    $this->assertEquals(7, $summer->sum(3, 4));

    // and after awhile, trying to break the 
    // function turned out to be part of the fun.
    $this->assertEquals(4, $summer->sum(7, -3));
    $this->assertEquals(4, $summer->sum(4, null));
}

In any case, this was nice and all for arithmetic, but like in any typical web app, I soon ran into the big problem: how do I test the database?

Now, there is a chapter on database testing in the PHPUnit manual with DBUnit for PHP.  But as I read about those, I saw a lot of documentation about syntax and handling of null values and empty tables. I wondered why I couldn’t just create a real database and test it with real queries?

Enter SQLite

SQLite is a self-contained, serverless, zero-configuration, ACID-compliant RDBMS.  Since SQLite requires no setup, and runs completely in memory, it makes the perfect database to run self-contained, serverless, automated unit tests against.

Database Access Abstraction To The Rescue

Being able to test seamlessly with a SQLite database in my unit tests and run against a MySQL database in live code required some careful thought about how I went about setting up my classes.  First of all, I created a database access interface with some standard functions like query(), getRows(), startTransaction(), commit(), etc.  (You are using a database access class, aren’t you?)  Then, I implemented the interface with a MySQL implementation, and an SQLite implementation.

Since there are places in the code that takes advantage of MySQL specific features, I had to emulate these features in my SQLite driver.  For example, there was no lpad() function in SQLite.  Fortunately, thanks to the magic of createFunction(), I could create a SQLite User-Defined Function.

public function lpad($value, $numchars, $char)
{
    return str_pad($value, $numchars, $char, STR_PAD_LEFT);
}

$sqlite->createFunction('lpad', 'lpad', 3);

Some of the MySQL bells and whistles are of course missing, such as ON DUPLICATE KEY UPDATE. With some fancy query re-writing, I got those MySQL queries working with SQLite.  Here is my current SQLite database access class in its entirety.

Dependency Injection FTW!

Now, that I had a MySQL database driver, and an equivalent SQLite database driver, I needed a way to get my business objects to use the SQLite version in a test, and the MySQL version during real execution.  If only there was a pattern that let me “inject” the proper database dependency into the object.  Well, this is exactly what Dependency Injection is!

Instead of instantiating a database driver within the class, I passed the database object in the constructor.  There are other ways to pass in the object, such as with a setter method, or with a global registry, but I like constructor the best.  So the class might look like this:

class Customer
{
    private $_db;

    public function __construct($db)
    {
        $this->_db = $db;
    }
}

Writing The Test

With the database access classes in place set to be injected into my classes, I could test functions that persisted data to the database, and then actually inspect the database.  There was no XML data files.  There was no database testing DSL.  There was only plain ol’ SQL.

Here is a simplified function that saves fictional customer data, and a test for it.

public function save($firstName, $lastName)
{
    $this->_db->query("INSERT INTO customers (first_name, last_name) VALUES (?, ?)", 
        array($firstName, $lastName));

    return $this->_db->lastInsertId();
}

And the test.

class CustomerTest extends PHPUnit_Framework_TestCase
{
    private $_db;

    public function setUp()
    {
        $this->_db = new Test_Database;

        $this->_db->query("CREATE TABLE customers (first_name, last_name_);");
    }

    public function tearDown()
    {
        $this->_db->query("DROP TABLE customers");
    }

    public function testSave()
    {
        $customer = new Customer($this->_db);

        $customer->save("Mickey", "Mouse");

        // test that customer is saved to DB
        $results = $this->_db->getRows("SELECT * FROM customers");

        $this->assertEquals("Mickey", $results[0]['first_name']);
        $this->assertEquals("Mouse", $results[0]['last_name']);
        $this->assertEquals(1, count($results));
    }
}

One Last Thing

This method could get to be overkill in certain situations as you write tons of SQL to set up your tests.  But even then I find it preferable to setting up XML or CSV data.  As far as mocking out a database with some sort of Mock Object, I still maintain, why use a mock and a DSL when you can use the real thing?  This has worked well for me so far.

This past Thursday I gave a short talk at the monthly OINK-PUG meeting about how I discovered the Test-Driven Development Kool-Aid®.  I knew about TDD years before I actually started doing it.  I tried to start several times, and never kept it going.  About a year ago, while developing a web service with no screens, the light bulb went on and gave me the idea for this talk.

In which I give a short introduction to unit testing, and then outline the story that finally turned on the light bulb about real test-driven development and test first for me. The third part of the talk gives an introduction to Dependency Injection (DI), an integral part to unit testing.

Autoload Magic

So to use a class in PHP, usually, we first have to include the file that contains the definition for that class.

require('myclass.php');

$class = new MyClass;

But after you start instantiating a few classes here and there, problems arise.

  • If you call a lot of classes, the number calls to require() becomes big.
  • When you start including files in other files, you are not sure whether or not you’ve already included a certain file, so you use require_once(), which is inefficient.
  • Worst of all, included files become disorganized, and you may accidentally remove a require() that is needed, not in the current file, but in a file downstream, leading to a Fatal Error (if you use require() and not include()).

The solution: __autoload() The PHP autoload feature is one of the coolest features in the whole language. Basically, if PHP tries to load a class and cannot find the class definition, it will call the __autoload() function that you provide giving it the name of the class it can’t find. At that point, you’re on your own. But how to find the file located on disk? There are four strategies for finding the right class file.

  1. Keep all class files in one directory. Not a very attractive method.
  2. Maintain a global array of class names to class definition files. In this case, the name of the class is the key, and the location of the file on disk is the value. This global hash could be created in memory upon server start time, or created as the application executes. Obviously, seems kind of a heavy solution to me.
  3. Use a special container class that all other classes inherit from. This method is suitable mostly when attempting to unserialize objects (when unserializing objects, PHP must have the class definition to recover the object). This special container will automatically know the class file for the contained object. Then, the only class file that needs to be located is the container’s class file. (Autoloading in the context of object un/serialization is a special case of class loading.)
  4. Use a naming convention. The idea here is to name your classes in a way that corresponds to your file system. For example, Zend names their classes where underscores represent directories in the project. So the Zend_Auth_Storage_Session class is defined in the file Zend/Auth/Storage/Session.php. Autoload simply needs to replace the underscores for the system directory separator char and give it to require().

At work, we use #4, naming conventions. Why? Well, why not? The other methods are heavier or more complicated, and I don’t see any gain. By using a simple naming convention, never again will we need to call require() or include() in our app. As a bonus side-effect, the code is organized in a consistent matter. Provided we follow the naming convention that Zend (and PEAR) use, our __autoload() function looks like:

// by using PEAR-type naming conventions, autoload will always know where to
// find class definitions
function __autoload($class)
{
   require(str_replace('_', DIRECTORY_SEPARATOR, $class) . '.php');
}

The typical argument for database abstraction is database portability. By abstracting the database, you are free to switch from one RDBMS to another effortlessly. Popular database abstraction layers are PEAR MDB2, ADOdb, and built-in PHP PDO library (not quite a database abstraction layer, but we’ll throw it in there anyway).

But there are three problems with that logic.

  1. How often are you going to switch databases in the life of your application anyway? I don’t know if I’ve ever switched a database out from under an application. Ever.
  2. To achieve true database independence, you’ll need to avoid using all the little syntax nuances that vary from DB to DB (e.g. the MySQL LIMIT clause) and avoid any feature in that database that makes it interesting (e.g. say goodbye to ON DUPLICATE KEY UPDATE). Chances are, you haven’t done these things. So to switch databases, your biggest problem will not be having to change all the mysql_* calls to ora_* or whatever.
  3. As with any layer, when you add another layer, there will always be some performance impact.

In light of these reasons, database dependence can be a good thing. You can take advantage of features in the RDBMS. And if you can make native calls directly to the extension, you’re saving a lot of code from being executed.

But still, something feels wrong from having all those native function calls throughout the codebase. The solution is database access abstraction, which does not attempt to abstract away the entire database, but attempts to abstract away access to the database.

Practically, this means building a wrapper class around your database code.  Then, your application can use this wrapper class for its database needs. This is somewhat the best of both worlds. If you do need to switch DB’s, all your native functions calls will at least be in one file. You can also insert any system-wide logic pertaining to DB’s in this one class. For example, if you move to a replicated MySQL environment, you’ll need to direct READ queries to connect to one of multiple slave servers, and direct WRITE queries to the master server.  This seems like an obvious thing to do, but a lot of people assume using a DBAL is enough abstraction already.

At work, my biggest motivation was performance. Running tests on our current DBAL, ADOdb, against using the mysqli_* functions in PHP revealed significant performance gains in going without the DBAL, which makes sense.

This blog repeats much of the thinking here, but is a more comprehensive looks at the topic (though the language is confusing at times.)