View counts, click counts, hit counts, traffic statistics… analytics and reporting are a must-have for any web product. The easiest way to track them is to simply increment a database value on each hit. The problem is when those hits come in hundreds of times per second, because writes are the most expensive queries.
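To make that concrete, here's a minimal sketch of the naive approach, assuming a hypothetical stat_views table with one counter row per page; every single hit fires its own write:
mysql> -- hypothetical counter table: stat_views(page_id PRIMARY KEY, views INT)
mysql> UPDATE stat_views SET views = views + 1 WHERE page_id = 42;
At hundreds of hits per second, that's hundreds of UPDATEs per second hammering the same handful of rows.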
After observing subpar write performance, I wanted to know just how many of our total writes were for updating statistics.
First, I ran mysqltuner.
% mysqltuner
...
[**] Reads / Writes: 93% / 7%
...
%
So only 7% of all queries were writes. That wasn't bad. Then I pulled yesterday's DML statements out of the binary log, starting at midnight. I figured 24 hours was a good sample.
% mysqlbinlog --start-datetime='2010-06-06 00:00:00' binary-log.000152 > cow
I grepped out the INSERT and UPDATE lines to get rid of the binary log bookkeeping.
% grep -i '^insert' cow > cow2
% grep -i '^update' cow >> cow2
I counted up lines that wrote to our stat tables.
% wc -l cow2
24898 cow2
% grep -i -c 'stat_' cow2
20880
Doing the math: 20880 / 24898 ≈ 0.84. About 84% of all writes to our database were for statistics, which wasn't too surprising. Most web sites store and log a lot of data to learn how users behave and where to improve.
So what do we do?
That’s the subject of another post, but the short answer is that these writes can be batched. Whether the increments are buffered in a write-behind cache or funneled through a job queue, the database stops suffering a write query on every single hit.
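As a rough sketch of the batching idea, assuming the same hypothetical stat_views table with page_id as its primary key: let the application accumulate counts in memory (or in a queue) and flush them every few seconds as a single multi-row upsert:
mysql> -- hypothetical table: stat_views(page_id PRIMARY KEY, views INT)
mysql> INSERT INTO stat_views (page_id, views)
    ->   VALUES (42, 137), (43, 89), (44, 211)
    ->   ON DUPLICATE KEY UPDATE views = views + VALUES(views);
One statement applies hundreds of buffered increments at once, so write load scales with the flush interval rather than with traffic.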