Archive for March, 2009

Ward Cunningham On Technical Debt

Ward Cunningham reflects on the history, motivation, and common misunderstandings of the “debt metaphor” as an argument for refactoring.

The MySQL Query Cache is not very hard to understand. At its most basic, it is a giant hash in which the literal query text is the key and the array of result records is the value. So this query:

SELECT event_name FROM events WHERE event_id = 8;

is different from this query:

SELECT event_name FROM events WHERE event_id = 10;

Important note! This means that even though your parameterized queries look the same apart from their parameter values, to the query cache they are different queries, and each one is cached under its own key.
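The "giant hash" idea can be sketched with a toy model in Python. This is only an illustration of keying on the literal query text, not a description of MySQL's internals; `ToyQueryCache` and `fake_backend` are hypothetical names invented for the sketch.

```python
# Toy model of a cache keyed on the literal query text.
# Illustration only: this is not how MySQL is implemented internally.

class ToyQueryCache:
    def __init__(self):
        self._results = {}  # literal SQL text -> cached result rows
        self.hits = 0
        self.misses = 0

    def fetch(self, sql, run_query):
        """Return cached rows for `sql`, calling run_query(sql) on a miss."""
        if sql in self._results:
            self.hits += 1
            return self._results[sql]
        self.misses += 1
        rows = run_query(sql)
        self._results[sql] = rows
        return rows

    def __len__(self):
        return len(self._results)


def fake_backend(sql):
    # Hypothetical stand-in for the database: returns a canned row.
    return [("row for: " + sql,)]


cache = ToyQueryCache()
q8 = "SELECT event_name FROM events WHERE event_id = 8;"
q10 = "SELECT event_name FROM events WHERE event_id = 10;"

cache.fetch(q8, fake_backend)   # miss: first time this text is seen
cache.fetch(q8, fake_backend)   # hit: identical text
cache.fetch(q10, fake_backend)  # miss: same shape, different literal
```

After these three calls the toy cache holds two separate entries, one per literal value, which is why a query run with many different parameter values fills the cache with many near-identical keys.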

As with all caches, the query cache has to keep its data fresh. It takes perhaps the simplest possible approach to this problem: it keeps track of the tables involved in each cached query, and if any of those tables changes, it invalidates the query and removes it from the cache. This means that if your query reads from a table that changes frequently, the query cache will invalidate the query frequently, leading to thrashing. For example, suppose you had a query that returned the view count of an event:

SELECT event_name, views FROM events WHERE event_id = 8;

Every time an event is viewed and its count is updated, the events table changes and the cached query is invalidated, even if the view belonged to a different event. What’s the solution?

In general, write queries so that their result sets do not change often. Specifically, mixing static attributes with frequently updated fields in a single table leads to thrashing, so move things like view counts and analytics into their own tables. The frequently updated data can be read with a separate query, or cached in your application in a data structure that periodically flushes to the database.
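A minimal sketch of such a split follows. It uses Python's built-in sqlite3 module purely so that it runs without a MySQL server; the table name `event_stats`, the helper `record_view`, and the sample row are assumptions made for illustration, and a real MySQL schema would choose its own column types and storage engine.

```python
import sqlite3

# SQLite stands in for MySQL here only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Static attributes: rarely written, so cached reads stay valid.
    CREATE TABLE events (
        event_id   INTEGER PRIMARY KEY,
        event_name TEXT NOT NULL
    );
    -- Frequently updated counters live in their own table (hypothetical name).
    CREATE TABLE event_stats (
        event_id INTEGER PRIMARY KEY REFERENCES events(event_id),
        views    INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO events (event_id, event_name) VALUES (8, 'Launch party');
    INSERT INTO event_stats (event_id, views) VALUES (8, 0);
""")


def record_view(event_id):
    # Only event_stats is written; the events table is untouched.
    conn.execute(
        "UPDATE event_stats SET views = views + 1 WHERE event_id = ?",
        (event_id,),
    )


record_view(8)
record_view(8)

# The static and the volatile data are now read with separate queries.
name = conn.execute(
    "SELECT event_name FROM events WHERE event_id = 8"
).fetchone()[0]
views = conn.execute(
    "SELECT views FROM event_stats WHERE event_id = 8"
).fetchone()[0]
```

The point of the layout is that `record_view` never writes to `events`, so a cached query that reads only from `events` has no reason to be invalidated by page views.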

This vertical partitioning of a single table’s columns into multiple tables helps immensely with the query cache. What’s more, the table with the unchanging data can be further optimized for reads, and the frequently updated table can be optimized for updates.
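To make the effect on the cache concrete, here is a toy Python model of table-level invalidation. It is a sketch under stated assumptions, not MySQL's actual code: `TableTrackingCache` is a hypothetical class, table names are detected with a deliberately naive regular expression, `event_stats` is an assumed name for the table holding the counters, and the result rows are made-up sample data.

```python
import re


class TableTrackingCache:
    """Toy cache: remembers which tables each cached query read from."""

    def __init__(self):
        self._entries = {}  # literal SQL text -> (tables, rows)

    def store(self, sql, rows):
        # Naive table detection for this sketch: names after FROM or JOIN.
        tables = set(re.findall(r"\b(?:FROM|JOIN)\s+(\w+)", sql, re.IGNORECASE))
        self._entries[sql] = (tables, rows)

    def lookup(self, sql):
        entry = self._entries.get(sql)
        return None if entry is None else entry[1]

    def table_changed(self, table):
        # Any write to a table drops every cached query that read from it.
        stale = [sql for sql, (tables, _) in self._entries.items()
                 if table in tables]
        for sql in stale:
            del self._entries[sql]
        return len(stale)


# Single-table layout: the view counter lives in events.
single = TableTrackingCache()
q_single = "SELECT event_name, views FROM events WHERE event_id = 8;"
single.store(q_single, [("Launch party", 41)])
single.table_changed("events")        # a page view updates events.views
single_survived = single.lookup(q_single) is not None

# Split layout: the counter lives in a hypothetical event_stats table.
split = TableTrackingCache()
q_name = "SELECT event_name FROM events WHERE event_id = 8;"
split.store(q_name, [("Launch party",)])
split.table_changed("event_stats")    # a page view updates event_stats.views
split_survived = split.lookup(q_name) is not None
```

In the single-table layout the cached entry is dropped on the first page view, while in the split layout the entry for the static attributes survives, because the write touched a table the cached query never read.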