Tag: mysql

At work, we had set up some wildcard virtual hosts in Apache config, and that got us by for quite some time.  But the time came when we needed finer-grained control of where to send incoming requests for different domains.  I needed to store my virtual hosts in a Mysql database, mapping domains to project directories.

I’ll spare you the problems I ran into and overcame, and just list the steps to get this done.  These instructions are based on a 64-bit, RHEL 5 server running the pre-packaged Apache server.  So if you follow these instructions on a different setup, of course, filenames, directories, versions, etc. may differ.

Install mod_vhost_dbd

Download dbd-modules from Google Code.  This is a great piece of code in the form of an Apache module that uses mod_dbd and a DBD Mysql (or other database) driver to fetch the DocumentRoot for a given domain from a database.

% wget http://dbd-modules.googlecode.com/files/dbd-modules-1.0.5.zip

Unzip the archive in a directory. As indicated on the website, build and install the module.

% apxs -c mod_vhost_dbd.c
% apxs -i mod_vhost_dbd.la

This places mod_vhost_dbd.so in /usr/lib64/httpd/modules.  Enable both this module and mod_dbd by adding two lines to httpd.conf, or equivalently creating a new include file in /etc/httpd/conf.d containing these lines.

LoadModule dbd_module modules/mod_dbd.so
LoadModule vhost_dbd_module modules/mod_vhost_dbd.so

In true unit fashion, now might be a good time to restart Apache, just so you can be sure everything is working up to this point.

% service httpd restart

Install Mysql DBD Driver to APR

Unfortunately, on my system, the Mysql DBD driver was nowhere to be found.  I had to rebuild Apache Portable Runtime (APR) utils with the Mysql driver enabled.

Download apr and apr-util from Apache.  Note these are not the latest versions, but the versions that matched the packages in worked for RHEL 5.

% wget http://archive.apache.org/dist/apr-1.2.8.tar.bz2
% wget http://archive.apache.org/dist/apr-util-1.2.8.tar.bz2

Unpack and untar these archives in the same parent directory.

Build and install APR.  Now, I do not think this is absolutely necessary, but it seems like a good idea to keep the versions in sync.

% ./configure --prefix=/usr
% make
% make install

Build and install apr-util.  Due to licensing issues, apr-util does not actually contain the Mysql DBD driver until apr-util-1.2.12.  Prior to that version, it must be downloaded separately, and the configure script rebuilt.

% wget http://apache.webthing.com/svn/apache/apr/apr_dbd_mysql.c
% ./buildconf --with-apr=../apr-1.2.7

Now for the three commands every Linux admin loves.

% ./configure --prefix=/usr --with-apr=/usr --libdir=/usr/lib64 --with-expat=builtin --with-ldap-include=/usr/include --with-ldap-lib=/usr/lib64 --with-ldap=ldap --with-mysql
% make
% make install

The first time I tried this, Apache could not find any LDAP-related modules.  Adding those configure switches seemed to do the trick.  Restart Apache.

% service httpd restart

Apache should now be able to query a Mysql database to get the DocumentRoot for a domain.  My VirtualHost block looked something like this.


    ServerName *.example.com
    DocumentRoot "/path/to/default/document/root"

    DBDriver mysql
    DBDParams host=localhost,user=root,pass=secret,dbname=vhosts

    DBDocRoot "SELECT path FROM vhosts WHERE host = %s"  HOSTNAME

For more details and instructions on mod_vhost_dbd configuration directives, read the project wiki.

The typical argument for database abstraction is database portability. By abstracting the database, you are free to switch from one RDBMS to another effortlessly. Popular database abstraction layers are PEAR MDB2, ADOdb, and built-in PHP PDO library (not quite a database abstraction layer, but we’ll throw it in there anyway).

But there are three problems with that logic.

  1. How often are you going to switch databases in the life of your application anyway? I don’t know if I’ve ever switched a database out from under an application. Ever.
  2. To achieve true database independence, you’ll need to avoid using all the little syntax nuances that vary from DB to DB (e.g. the MySQL LIMIT clause) and avoid any feature in that database that makes it interesting (e.g. say goodbye to ON DUPLICATE KEY UPDATE). Chances are, you haven’t done these things. So to switch databases, your biggest problem will not be having to change all the mysql_* calls to ora_* or whatever.
  3. As with any layer, when you add another layer, there will always be some performance impact.

In light of these reasons, database dependence can be a good thing. You can take advantage of features in the RDBMS. And if you can make native calls directly to the extension, you’re saving a lot of code from being executed.

But still, something feels wrong from having all those native function calls throughout the codebase. The solution is database access abstraction, which does not attempt to abstract away the entire database, but attempts to abstract away access to the database.

Practically, this means building a wrapper class around your database code.  Then, your application can use this wrapper class for its database needs. This is somewhat the best of both worlds. If you do need to switch DB’s, all your native functions calls will at least be in one file. You can also insert any system-wide logic pertaining to DB’s in this one class. For example, if you move to a replicated MySQL environment, you’ll need to direct READ queries to connect to one of multiple slave servers, and direct WRITE queries to the master server.  This seems like an obvious thing to do, but a lot of people assume using a DBAL is enough abstraction already.

At work, my biggest motivation was performance. Running tests on our current DBAL, ADOdb, against using the mysqli_* functions in PHP revealed significant performance gains in going without the DBAL, which makes sense.

This blog repeats much of the thinking here, but is a more comprehensive looks at the topic (though the language is confusing at times.)

The MySQL Query Cache is not very hard to understand. It is at its most basic a giant hash where the literal queries are the keys and the array of result records are the values. So this query:

SELECT event_name FROM events WHERE event_id = 8;

is different from this query:

SELECT  event_name FROM events WHERE event_id = 10;

Important note!  This means that even though your parameterized queries may look the same without the parameters, to the query cache, they are not!

As with all caches, the query cache is concerned about freshness of data. It takes perhaps the simplest approach possible to this problem by keeping track of any tables involved in your cached query. If any of these tables changes, it invalidates the query and removes it from the cache. This means that if your query returns frequently-changing data in its results, the query cache will invalidate the query frequently, leading to thrashing. For example, if you had a query that returned a view count of an event:

SELECT event_name, views FROM events WHERE event_id = 8;

Every time that event is viewed, the cached query will be invalidated. What’s the solution?

In general, write queries so that their result sets do not change often. In specific, mixing static attributes with frequently updated fields in a single table leads to thrashing, so separate out things like view counts and analytics into their own tables. The frequently updated data can be read with a separate query, or perhaps cached in your application in a data structure that periodically flushes to the DB.

This vertical partitioning of a single table’s columns into multiple tables helps immensely with the query cache. What’s more is that the table with the unchanging data can be further optimized for READS, and the frequently updated table can be optimized for UPDATES.

In MySQL (and MySQL only AFAIK), INSERT has a clause called ON DUPLICATE KEY UDPATE. When ON DUPLICATE KEY UPDATE is used with INSERT, the insert will update the record if a value for a unique or primary key already exists, or else create a record if the value does not exist. So now when a form can either create a new something or edit an existing something, you can use one query to do it, and not have to query to see if the something exists already.

So instead of this:

$num = $db->getone("select count(*) from events where event_id = 15");

if ($num > 0)
{
   $sql = "update events set name = 'New name' where event_id = 15";
}
else
{
   $sql = "insert into events (event_id, name) values (null, 'New name')";
}

you can do something like this:

$id = (empty($_POST['id'])) ? null : $_POST['id'];
$sql = "insert into events (event_id, name) values ($id, 'New name') on duplicate key update name = 'New name'";

Neat, eh?

In case you’re wondering what the difference is between ON DUPLICATE KEY UPDATE and a REPLACE query, a REPLACE fires a DELETE followed by an INSERT query, as opposed to a real UPDATE.