View counts, click counts, hit counts, traffic statistics… analytics and reporting are a must-have for web products.  Well, the easiest way to do that is to simply increment a database value each time.  The problem is when those counts come in hundreds of times per second.  Writes are among the most expensive queries:

  1. Writes usually trigger updating an index.
  2. If you’re using a MyISAM storage engine, the table-level locking can get out of hand. 
  3. Writes are not query-cacheable.
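For concreteness, the naive per-hit write looks something like this (the table and column names are hypothetical; in production the statement would be sent to mysql rather than echoed):

```shell
# Issued once per page view; at hundreds of hits per second this
# becomes hundreds of writes per second. Echoed here so the
# statement itself is visible.
echo "UPDATE stat_views SET hits = hits + 1 WHERE page_id = 42;"
```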

After observing subpar write behavior, I wanted to know: just how many of our total writes were for updating statistics?

First, I ran mysqltuner.

% mysqltuner
...
[**] Reads / Writes: 93% / 7%
...
%

So 7% of all queries were writes.  That wasn’t bad.  Then, I took the binary log of all DML statements for yesterday, starting at midnight.  I figured 24 hours was a good sample.

% mysqlbinlog --start-datetime='2010-06-06 00:00:00' binary-log.000152 > cow

I grepped out DML lines, to get rid of the binary log stuff.

% grep -i '^insert' cow > cow2
% grep -i '^update' cow >> cow2
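The two passes can also be collapsed into one with an extended-regex alternation (same scratch file names as above):

```shell
# Single-pass equivalent of the two greps: keep lines that start
# with INSERT or UPDATE, case-insensitively.
grep -iE '^(insert|update)' cow > cow2
```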

I counted up lines that wrote to our stat tables.

% wc -l cow2
24898 cow2
% grep -i -c 'stat_' cow2
20880
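The share can be computed directly from those two counts rather than by hand:

```shell
# Fraction of logged writes that touched the stat_ tables.
awk 'BEGIN { printf "%.2f\n", 20880 / 24898 }'
# → 0.84
```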

Doing the math: 20880 / 24898 ≈ 0.84. About 84% of all writes to our database were for statistics.  Which wasn’t too surprising: most web sites have to store and log a lot of data to know how users are using the site and where to improve.

So what do we do?

That’s the subject of another post, but the short answer is that these writes can be batched.  Whether the queries are buffered in some sort of write-through cache or pushed onto a job queue, the database no longer has to absorb a constant stream of single-row writes.
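As a rough sketch of the spool-and-flush idea (file names, table, and column are all hypothetical): each hit appends a counter name to a local spool file, and a periodic job collapses the spool into one UPDATE per counter.

```shell
# Hypothetical sketch of batching stat writes through a spool file.
SPOOL=/tmp/stat_spool

# Called by the page handler instead of issuing an UPDATE per hit.
record_hit() {
    echo "$1" >> "$SPOOL"
}

# Run from cron: collapse duplicate hits into one UPDATE per counter.
# The SQL is printed here; in production it would be piped to mysql.
flush_stats() {
    sort "$SPOOL" | uniq -c | while read -r n key; do
        echo "UPDATE stat_counters SET hits = hits + $n WHERE name = '$key';"
    done
    : > "$SPOOL"
}
```

With a scheme like this, a thousand views of the same page between flushes become a single write instead of a thousand.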