Archive for 'Performance'

ETags

This one is filed under “that’s pretty picky, but I guess it couldn’t hurt.”

The entity tag (ETag) HTTP header is a string that uniquely identifies a specific version of a resource. When the browser first downloads a resource, it stores the ETag. When it requests the resource again, it sends the stored ETag to the server in an If-None-Match header. If that ETag matches the server's current one, the server responds with 304 Not Modified and no body, saving the download.

The problem is that the default format for the ETag (in Apache) is inode-size-timestamp. The inode will differ from server to server, so the server handling a request may compute a different ETag from the one the browser holds, even though the file is in fact identical.

According to Yahoo:

The end result is ETags generated by Apache and IIS for the exact same component won’t match from one server to another. If the ETags don’t match, the user doesn’t receive the small, fast 304 response that ETags were designed for; instead, they’ll get a normal 200 response along with all the data for the component. If you host your web site on just one server, this isn’t a problem. But if you have multiple servers hosting your web site, and you’re using Apache or IIS with the default ETag configuration, your users are getting slower pages, your servers have a higher load, you’re consuming greater bandwidth, and proxies aren’t caching your content efficiently.

There is another scenario where it isn’t a problem: if you are using sticky sessions in your load balancer.

In any case, as stated above, it couldn’t hurt to rectify this. So I configured the ETag format in Apache to exclude the inode, and use only size and timestamp.

FileETag MTime Size

With this setting, identical files get the same ETag on every server, provided their sizes and modification times match.
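To make the mechanism concrete, here is a minimal PHP sketch of the conditional-request logic. The function names and the exact tag format are assumptions made for illustration; Apache's real ETag string is formatted differently, and this sketch ignores weak validators.

```php
<?php
// Hypothetical helper: build an ETag from size and modification time only,
// in the spirit of "FileETag MTime Size". The string format is illustrative.
function make_etag(int $mtime, int $size): string
{
    return sprintf('"%x-%x"', $size, $mtime);
}

// Decide whether a conditional request can be answered with 304 Not Modified.
// $ifNoneMatch is the raw If-None-Match request header, or null if absent.
function is_not_modified(string $etag, ?string $ifNoneMatch): bool
{
    if ($ifNoneMatch === null) {
        return false;
    }
    // The header may carry several comma-separated tags.
    foreach (explode(',', $ifNoneMatch) as $candidate) {
        if (trim($candidate) === $etag) {
            return true;
        }
    }
    return false;
}

// Two servers holding the same file (same size, same mtime) agree on the tag,
// because no server-specific inode is involved.
$serverA = make_etag(1200000000, 5120);
$serverB = make_etag(1200000000, 5120);
```

Because the inode is not part of the tag, a browser that cached the file from one server can still get a 304 from the other.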

Serving Javascript and CSS

Editor’s note: This post formed the basis of the Front-End Optimization talk I’ve given in the past.

You’ve programmed websites for years and know the ins and outs of PHP and MySQL, so why are Javascript and CSS files such a big deal? You put them in a directory, and link to them from your pages. Done. Right?

Not if you want maximum performance.

According to the Yahoo Exceptional Performance team:

…Only 10% of the time is spent here for the browser to request the HTML page, and for apache to stitch together the HTML and return the response back to the browser. The other 90% of the time is spent fetching other components in the page including images, scripts and stylesheets.

So static content is very important. The same Yahoo people provide us with a comprehensive list of Best (Front-end) Practices for Speeding Up Your Website.  IMO, some of the rules are more important than others, and some are more easily achieved.  Leaving aside hardware solutions (static server, CDN, etc.) for now, let’s look at six of the rules:

  1. Rule 1: Make Fewer HTTP Requests, or combine files. The fewer downloads the better. Simple file concatenation would do. Our goal is at most one Javascript and one CSS file per page.
  2. Rule 3: Add an Expires Header. Every static file must be accompanied by a time-stamp so we can take advantage of the HTTP Expires: header. A time-stamp in the GET parameters might work, but some CDNs and browser/version/platform combinations reportedly will not request a new file when only the query string changes. A better solution would be to put the time-stamp in the filename somewhere.
  3. Rule 4: Gzip Components. This is easily achieved by enabling mod_deflate in Apache.
  4. Rule 9: Reduce DNS Lookups. Okay, the real value in this rule is introducing parallel downloads by using at least two but no more than four host names. This is better explained here.
  5. Rule 10: Minify JavaScript, or at the least strip out all whitespace and comments. There are more sophisticated compressors out there that replace your actual variables with shorter symbols, but the chance of introducing bugs is higher.
  6. Rule 12: Remove Duplicate Scripts, which as they say is more common than you think.

Rule 4 is a matter of configuring Apache. How to achieve the other five?

As I see it, there are three broad ways to achieve them.

  1. Handle every request in real-time.  This means using a PHP file to serve the files (e.g. <link rel="stylesheet" type="text/css" href="custom_handler.php?file1.css,file2.css" /> or something like that).  It can also mean using mod_rewrite to direct incoming requests for CSS and Javascript to go to a PHP script. Either way, there is processing on every page load. Caching the end-product helps. Still, there must be a better way.
  2. Use a template or view plugin.  If you are using a templating system to dynamically generate your HTML, you can use some sort of plugin or function to read in a list of static files, check their last-modified times, and if changed build a combined, minified, time-stamped output file to serve up.  This is better than method #1 because by the time the page is built, there is a static file that is simply served to the browser.  Still, there must be a better way.
  3. The best way is to do it offline.  This means a job that checks static files to see if they’ve been modified.  If so, it processes the files and builds the output file that is directly served to the browser.  This job could be run in cron, or run manually by developers, but the best way is to make it a part of the build server.
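As a rough illustration of the offline approach, the following PHP sketch combines a list of source files into one bundle and puts a time-stamp in the filename. The function name, the filename pattern, and the choice of the newest modification time as the time-stamp are all assumptions; minification is deliberately left out.

```php
<?php
// Hypothetical build step: concatenate source files into one bundle whose
// name carries the newest modification time, e.g. "site.1200000500.css".
function build_bundle(array $sources, string $outDir, string $name, string $ext): string
{
    $newest = 0;
    $combined = '';
    // array_unique() drops files that were listed twice (Rule 12).
    foreach (array_unique($sources) as $file) {
        $mtime = filemtime($file);
        $body = file_get_contents($file);
        if ($mtime === false || $body === false) {
            throw new RuntimeException("Cannot read $file");
        }
        $newest = max($newest, $mtime);
        $combined .= $body . "\n";
    }
    $target = sprintf('%s/%s.%d.%s', rtrim($outDir, '/'), $name, $newest, $ext);
    // Rebuild only when no bundle exists yet for this time-stamp.
    if (!is_file($target) && file_put_contents($target, $combined) === false) {
        throw new RuntimeException("Cannot write $target");
    }
    return $target;
}
```

A cron job or build script could call this once per bundle and write the returned filename wherever the page templates look it up. Editing any source file changes the time-stamp, so browsers fetch the new bundle even with a far-future Expires header.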

Don’t have a build server?  That’s a whole other topic.

tmpfs Rules!

In any system, the biggest bottlenecks will usually be related to I/O. What this means practically is two things:

  1. Memory is faster than disk.
  2. Disk is faster than network.

But moving across the boundaries of memory, disk, and network is usually cumbersome. For example, storing things on disk is programmatically easy, but slow. Storing things in memory, in a persistent way, can be hard. This is truer of a shared-nothing architecture like PHP's than of Java's, so in PHP you may have to deal with shared memory libraries and SysV IPC-style calls.

Enter tmpfs, the Linux shared-memory file system. You can mount it just like ext3, create files, and otherwise treat it like a normal disk, but it’s in memory! Awesome!

On RHEL, Fedora, CentOS – not sure about others – there is a tmpfs drive mounted under /dev/shm by default.  One other note: since it is memory, its contents will be lost upon reboot.  I usually re-create any directories I need in the /etc/rc.d/rc.local script.  Note, however, that this is the last file to run on boot, so if you have a service or daemon that assumes a folder in /dev/shm, you will need to create it in the service’s startup script (usually in /etc/init.d).
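An alternative to boot scripts is to have the application create its directory on demand. This is only a sketch: the helper name and the directory name "myapp-cache" are made up, and it assumes the PHP process has write access to the mount.

```php
<?php
// Hypothetical helper: make sure a working directory exists under a tmpfs
// mount before using it, since tmpfs contents vanish on reboot.
function ensure_dir(string $base, string $name): string
{
    $dir = rtrim($base, '/') . '/' . $name;
    // mkdir() can lose a race with another process, so re-check afterwards.
    if (!is_dir($dir) && !@mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create $dir");
    }
    return $dir;
}

// Assumed usage on a host where /dev/shm is a tmpfs mount:
// $cacheDir = ensure_dir('/dev/shm', 'myapp-cache');
// file_put_contents("$cacheDir/fragment.html", $html);
```

With this in place the code no longer depends on the order in which startup scripts run.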

When looking for something in an array of values, it is very tempting to use in_array(). After all, that’s what the name says. However, searching through an array, even with best-case search algorithms, will never be faster than a single index lookup, which is where isset() comes in. With isset(), you can use one operation to see if a value exists, provided those values exist as keys. PHP arrays are hash tables, so a key lookup takes roughly constant time on average, no matter how large the array is.

So, instead of something like this:

$exclude = array(1, 4, 6, 8);

for ($i = 0, $size = count($data); $i < $size; $i++)
{
   // in_array() scans $exclude on every iteration
   if (in_array($data[$i]['id'], $exclude))
   {
      // do something
   }
}

do something like:

// Store the IDs as keys so that each check is a single key lookup
$exclude = array();
$exclude[1] = true;
$exclude[4] = true;
$exclude[6] = true;
$exclude[8] = true;

for ($i = 0, $size = count($data); $i < $size; $i++)
{
   if (isset($exclude[$data[$i]['id']]))
   {
      // do something
   }
}

So does this make a difference? Let’s write a little benchmark script.

#!/usr/bin/php
<?php
$haystack = array();
for ($i = 0; $i < 1000; $i++)
{
    $haystack[] = rand(0, 1000);
}
 
$needles = array();
for ($i = 0; $i < 1000; $i++)
{
    $needles[] = rand(0, 1000);
}
 
for ($i = 0; $i < 1000; $i++)
{
    foreach ($needles as $needle)
    {
        if (in_array($needle, $haystack));
    }
}

We fill two arrays with 1000 random integers. One is the haystack – what we will search through. The other is the list of needles – we want to search for each one. For each needle, we look for it in the haystack. Then, we repeat this 1000 times.

Executing this, the script takes around 37 seconds:

% time ./bench.php
 
real    0m37.400s
user    0m37.282s
sys     0m0.068s

Now, let’s change the last for() loop to this:

for ($i = 0; $i < 1000; $i++)
{
    $tmp = array_flip($haystack);
    foreach ($needles as $needle)
    {
        if (isset($tmp[$needle]));
    }
}

The new output:

% time ./bench.php
 
real    0m0.778s
user    0m0.764s
sys     0m0.008s

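Execution time drops from around 37 seconds to under 0.8 seconds, roughly 48 times faster, even though array_flip() runs again on every pass of the outer loop.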
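Before swapping one approach for the other, it is worth checking that both give the same answers. The sketch below is not part of the original benchmark: the function names are made up, the random data is seeded so runs are repeatable, and the flip happens once per call. Note that array_flip() only works when the values are integers or strings.

```php
<?php
// Count how many needles occur in the haystack using a linear scan.
function count_hits_in_array(array $needles, array $haystack): int
{
    $hits = 0;
    foreach ($needles as $needle) {
        if (in_array($needle, $haystack)) {
            $hits++;
        }
    }
    return $hits;
}

// Same count, but via key lookups on a flipped copy of the haystack.
function count_hits_isset(array $needles, array $haystack): int
{
    $lookup = array_flip($haystack); // values become keys
    $hits = 0;
    foreach ($needles as $needle) {
        if (isset($lookup[$needle])) {
            $hits++;
        }
    }
    return $hits;
}

mt_srand(42); // fixed seed so both functions see repeatable data
$haystack = array();
$needles = array();
for ($i = 0; $i < 1000; $i++) {
    $haystack[] = mt_rand(0, 1000);
    $needles[] = mt_rand(0, 1000);
}

$start = microtime(true);
$slow = count_hits_in_array($needles, $haystack);
$slowTime = microtime(true) - $start;

$start = microtime(true);
$fast = count_hits_isset($needles, $haystack);
$fastTime = microtime(true) - $start;

printf("in_array: %d hits in %.5f s\n", $slow, $slowTime);
printf("isset:    %d hits in %.5f s\n", $fast, $fastTime);
```

The two hit counts should be identical; the timings will vary by machine and PHP version.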

It’s All About Scope

A while ago, we were struggling with the question of whether or not to use a framework with some new code. Specifically, did we want to use Zend Framework or not? (The reasons for settling on ZF vs. others are a topic for a different post.) We had been using our own little framework for almost a year, and it had served us relatively well.

Reasons to use ZF:

  • Many have worked on it and will continue to work on it, and it enjoys community support. After all, it was created by the PHP people.
  • Why re-invent the wheel?
  • Accelerates development and time to release.

Reasons not to use ZF:

  • It will impact performance. After all, it is plainly more code. The question is how much?
  • There will be a learning curve to learn the “framework” way, when we already know our own code.

My #1 concern was with performance.  So I ran some tests using Apache Bench and Zend Framework 1.0.  I won’t concern you with the details of the test because 1.0 is now a bit outdated, and the performance of Zend Framework is not really the point of this post.  But I will say that ZF was much slower than what we were currently using.

Then I got to thinking about all the websites out there.  Zend is a very nice framework – perhaps my #1 choice for frameworks at the time (though that changes frequently). And getting “Hello World” to work was easy and enjoyable.

But different websites have different audiences. If I am making a local storefront website, or even a chain website, the traffic patterns are going to be very different from a website aiming to be a national destination. If I were cranking out storefront websites every month, I think ZF is the way to go. It’s quick, and I’m sure I would be doing similar things over and over again. ZF is very good for that. Same goes for a shrink-wrap application (by shrinkwrap, I mean something downloadable and installable). If I wanted to make the next great blogging platform, and thousands of people would come to my website just to download it and put on their own hosts, again, ZF is very good for that. (In fact, I would make the application in ZF, and the website to download it from in ZF!)

But for my company, it’s different. If we are to become the national destination that we want to be, the site will need every ounce of performance squeezed out of it. It will have a very custom environment, and require very custom features. That is why Zend Framework, and other frameworks, fail for us.

Performance !== Scalability

When we talk about performance, we mean the response time of a single request, web page, SQL query, etc. It is the actual execution time for something in the absence of load. To illustrate, suppose you wanted to test the performance of a web page using Apache Bench. You could run something like:

% ab -n 1000 -c 1 http://www.whatever.com/


The -n is the number of requests and the -c is the number of concurrent requests. Since we’re interested in end-to-end response time, we only need one concurrent request.

Scalability, on the other hand, is usually about throughput: the number of requests the system can serve within a certain period of time under concurrent load. Using the example above, the Apache Bench command should be something like:

% ab -c 100 -t 60 http://www.whatever.com/


The -t is the amount of time to run the test. We can vary -c until individual response times begin to grow, at which point something in the system has reached its maximum capacity.
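Finding the point where response times begin to grow can be automated once the mean response time from each run is recorded. The PHP sketch below is one possible way to do it. The function name, the 25% tolerance, and the sample numbers are all invented for illustration; they are not measurements.

```php
<?php
// Hypothetical helper: given mean response times (ms) keyed by the -c value
// used for each ab run, return the first concurrency level whose response
// time exceeds the lowest-concurrency baseline by more than $tolerance.
function find_saturation(array $msByConcurrency, float $tolerance = 0.25): ?int
{
    ksort($msByConcurrency);            // lowest concurrency first
    $baseline = reset($msByConcurrency); // response time with the least load
    foreach ($msByConcurrency as $concurrency => $ms) {
        if ($ms > $baseline * (1 + $tolerance)) {
            return $concurrency;
        }
    }
    return null; // no run exceeded the tolerance
}

// Made-up numbers for illustration only.
$runs = array(1 => 20.0, 10 => 21.0, 50 => 22.5, 100 => 41.0, 200 => 95.0);
$knee = find_saturation($runs);
```

With these invented numbers the function returns 100, the first level at which response time rises more than 25% above the single-request baseline.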