
HTTP Keep-Alive

Like most people, I did not know much about HTTP Keep-Alive headers other than that they could be very bad if used incorrectly. So I’ve kept them off, which is the default. But I ran across this blog post, which explains HTTP Keep-Alive, its benefits, and its potential pitfalls pretty clearly.

It’s all pretty simple really. There is overhead in opening and closing TCP connections. To alleviate this, Apache can agree to provide persistent connections by sending HTTP Keep-Alive headers. The browser can then open a single connection and use it to download multiple resources. But Apache won’t know when the browser is done downloading, so it simply keeps the connection open until the Keep-Alive timeout expires, which is 15 seconds by default. The problem is that the machine can only keep so many connections open simultaneously due to physical limitations (RAM, CPU, etc.). And 15 seconds is a long time.
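
For illustration (not a capture from a real server), the negotiation looks something like this: the browser asks to keep the connection open, and Apache replies with a Keep-Alive header advertising its timeout and request limit.

    GET /css/style.css HTTP/1.1
    Host: static0.yourdomain.com
    Connection: keep-alive

    HTTP/1.1 200 OK
    Connection: Keep-Alive
    Keep-Alive: timeout=15, max=100
    Content-Type: text/css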

To allow browsers to gain some parallelism on downloading files, without keeping persistent connections open too long, the Keep-Alive timeout value should be set to something very low, e.g. 2 seconds.

I’ve done this for static content only. Why only static content? It doesn’t really make much sense for the main page source itself since that’s the page the user wants to view.

I’ve mentioned before that by serving all static content on dedicated subdomains, we indirectly get the benefit of being able to optimize just those subdomains. So far, this meant:

  1. disabling .htaccess files
  2. setting a far-future Expires: header
  3. avoiding setting cookies on the subdomain

Now we can add to the list: enabling HTTP Keep-Alive headers. The VirtualHost block might look like this now:


    <VirtualHost *:80>
        ServerName      static0.yourdomain.com
        ServerAlias     static1.yourdomain.com
        ServerAlias     static2.yourdomain.com
        ServerAlias     static3.yourdomain.com
        DocumentRoot    /var/www/vhosts/yourdomain.com
        KeepAlive On
        KeepAliveTimeout 2

        <Directory /var/www/vhosts/yourdomain.com>
            AllowOverride None
            ExpiresActive On
            ExpiresByType text/css "access plus 1 year"
            ExpiresByType application/x-javascript "access plus 1 year"
            ExpiresByType image/jpeg "access plus 1 year"
            ExpiresByType image/gif "access plus 1 year"
            ExpiresByType image/png "access plus 1 year"
        </Directory>
    </VirtualHost>


Note the following applies to Windows Vista, but is probably easier on MacOS/Linux.

Is your hosts file becoming monstrous?  Do you have an alias or shortcut to your hosts file because you edit it so often?  Tired of manually adding every subdomain and domain you work on?

I was too, until I figured there must be a better way.  And there was.

The general idea is this: by running a local DNS nameserver with BIND, we can set up local development domains that look like regular domains on the internet. For real domains, we’ll just forward the requests on to a real nameserver.  This gives us a couple more benefits: 1) we can use the local nameserver as a caching nameserver to speed up DNS queries (in theory; I have not actually done this), and 2) we can choose to use any DNS service we wish, e.g. OpenDNS or Google DNS.

Here are the steps.

  1. Follow these instructions on installing and configuring BIND and configuring a zone for your local domain.
    1. I installed BIND to C:\Windows\system32\dns.
    2. Here is my named.conf in its entirety.
      options {
          directory "c:\windows\system32\dns\zones";
          allow-transfer { none; };
          forward only;
          forwarders {
              //208.67.222.222; // OpenDNS
              //208.67.220.220;
              8.8.8.8; // Google DNS
              8.8.4.4;
          };
          query-source address * port 53;
      };
      
      /*
      logging {
          channel queries_log {
              file "c:\windows\system32\dns\var\queries.log";
              print-severity yes;
              print-time yes;
          };
          category queries { queries_log ; };
      };
      */
      
      zone "work.local" IN {
          type master;
          file "work.local.txt";
      };
      
      key "rndc-key" {
          algorithm hmac-md5;
          secret "xxxxxxxxxxxxxxxxxxxxxxxx";
      };
      
      controls {
          inet 127.0.0.1 port 953
              allow { 127.0.0.1; } keys { "rndc-key"; };
      };
    3. I created a zone file for my development domain work.local following this zone file example. Here is the zone file in its entirety.  Note the CNAME wildcard record.
      $TTL 86400
      @	IN SOA	ns1.work.local.	admin.work.local. (
      			2008102403
      			10800
      			3600
      			604800
      			86400 )
      
      @		NS	ns1.work.local.
      
      	IN A	127.0.0.1
      ns1	IN A	127.0.0.1
      www	IN A	127.0.0.1
      *	IN CNAME	www
  2. Start or restart the BIND service.
  3. Configure your network connection to use 127.0.0.1 as your primary nameserver, instead of the DHCP-provided one.  My IPv4 properties look like this:

    Set DNS nameserver to 127.0.0.1

  4. Flush the Windows DNS cache by running:
    C:\> ipconfig /flushdns
  5. Test BIND by pinging www.work.local.  If you have errors, you can uncomment the logging block in named.conf.
  6. Once that is working, create a VirtualHost in Apache for your development domain.  Thanks to VirtualDocumentRoot (from mod_vhost_alias), we can map any number of subdomains to project roots.  Here is my VirtualHost block.

        <VirtualHost *:80>
            ServerName www.work.local
            ServerAlias *.work.local
            VirtualDocumentRoot "C:/_work/%1"

            <Directory "C:/_work">
                Options Indexes FollowSymLinks Includes ExecCGI
                AllowOverride All
                Order allow,deny
                Allow from all
            </Directory>
        </VirtualHost>

  7. Start or restart Apache.
  8. Create a directory in C:\_work, for example, C:\_work\awesomeapp.  Create a test index.html file in that directory.
  9. You should now be able to go to http://awesomeapp.work.local in your browser and see your index.html file!

Now, you should be able to repeat step 8 for any new website you create!  No editing of hosts files, no bouncing the webserver!  Just create the project directory and it’s immediately available.
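
For example, a request for http://awesomeapp.work.local matches the wildcard CNAME in the zone file and resolves to 127.0.0.1, and Apache expands VirtualDocumentRoot’s %1 (the first dot-separated part of the hostname) to serve files out of C:/_work/awesomeapp.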

One other important note: Firefox has its own DNS cache, independent of the OS.  If results look stale, restarting Firefox resets its DNS cache. You can also permanently disable DNS caching in Firefox.
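
If I remember right, the relevant about:config preferences are network.dnsCacheExpiration and network.dnsCacheEntries; setting both to 0 effectively turns the Firefox DNS cache off.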

According to these Munin memory graphs, the large orange area is the OS buffer cache – a buffer the OS uses to cache plain ol’ file data on disk.  The graph below shows one of our web servers after we upgraded its memory. 

Web server memory usage

It makes sense that most of the memory not used by apps would be used by the OS to improve disk access.  So seeing the memory graphs filled with orange is generally a good thing.  After a few days, I watched the orange area grow and thought, “Great!  Linux is putting all that extra memory to use.”  Maybe it was caching images and CSS files for Apache to serve.  But was that true?

Looking At A Different Server

Here is a memory graph from one of our database servers after the RAM upgrade.

Database server memory usage

Again, I first thought that the OS was caching all that juicy database data from disk.  The problem is that we don’t have 12GB of data, and that step pattern growth was suspiciously consistent.

Looking again at the web server graph, I saw giant downward spikes of blue color, where the buffer cache was emptied.  (The blue is unused memory.)  These occurred every day at 4 am, and on Sundays there’s a huge one.  What happens every day at 4 am?  The logs are rotated.  And on Sundays, the granddaddy log of them all – the Apache log – is rotated.

The Problem

It was starting to make sense.  Log files seem to take up most of the OS buffer cache on the web servers.  Not optimal, I’m sure.  And when they’re rotated, the data in the cache is invalidated and thus freed.

Here is a memory graph for one of our other database servers.

Database server memory usage

That step pattern growth is missing!  In fact, most of RAM is unused.  What is the difference between the first database server and this one?  The first has the `mysqldump` backup.  It occurs every night at 2:30 am, right when those step changes occur on its memory usage graph.

It was clear to me that most of the OS buffer cache was wasted on logs and backups and such.  There had to be a way to tell the OS not to cache a file. 

The Solution

Google gave me this page: Improving Linux performance by preserving Buffer Cache State.  I copied the little C program into a file and ran it on all the `mysqldump` backups.  Here is what happened to the memory usage.

Database server memory usage

Quite a bit of buffer cache was freed.  On that night’s backup, I logged the buffer cache size before the backup and after.

% cat 2008.08.21.02.30.log
Starting at Thu Aug 21 02:30:03 EDT 2008
=========================================
Cached:        4490232 kB
Cached:        5350908 kB
=========================================
Ending at Thu Aug 21 02:30:55 EDT 2008

Just under a gigabyte increase in buffer cache size.  What was the size of the new backup file?

% ll 2008.08.21.02.30.sql
-rw-r--r-- 1 root root 879727872 Aug 21 02:30 2008.08.21.02.30.sql

About 900MB.
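
The heart of that little C program is, as I understand it, posix_fadvise() with POSIX_FADV_DONTNEED.  If you’d rather not dig up the original, a minimal sketch of the same idea (mine, not the program from that page) looks like this:

    /* dropcache.c -- sketch: ask the kernel to drop each named file's
     * pages from the buffer cache. */
    #define _XOPEN_SOURCE 600
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        int i;
        for (i = 1; i < argc; i++) {
            int fd = open(argv[i], O_RDONLY);
            if (fd < 0) { perror(argv[i]); continue; }
            fdatasync(fd);   /* flush any dirty pages so they can be dropped */
            int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
            if (rc != 0)
                fprintf(stderr, "%s: posix_fadvise failed (%d)\n", argv[i], rc);
            close(fd);
        }
        return 0;
    }

Compile it with gcc and point it at whatever files you want evicted, e.g. % gcc -o dropcache dropcache.c && ./dropcache /backups/*.sql (the paths here are placeholders).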

Did It Work?

I used the C program on that page to ensure no database backups were cached by the OS.  I did the same on the web servers in the logrotate config files.  A couple days later, I checked the memory graph on the database server that performed the backup.  Notice how the buffer cache did not fill up.  It looked like the program worked, and the OS was free to cache more important things.

Database server memory usage
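
On the web servers, the hook went into the logrotate configs.  I haven’t copied our exact stanza here; a sketch of the idea, with placeholder paths and options, looks something like this:

    /var/log/httpd/*log {
        weekly
        rotate 4
        sharedscripts
        postrotate
            /bin/kill -HUP `cat /var/run/httpd.pid 2>/dev/null` 2>/dev/null || true
            /usr/local/bin/dropcache /var/log/httpd/*log.1 > /dev/null 2>&1 || true
        endscript
    }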

At work, we had set up some wildcard virtual hosts in Apache config, and that got us by for quite some time.  But the time came when we needed finer-grained control of where to send incoming requests for different domains.  I needed to store my virtual hosts in a Mysql database, mapping domains to project directories.

I’ll spare you the problems I ran into and overcame, and just list the steps to get this done.  These instructions are based on a 64-bit RHEL 5 server running the pre-packaged Apache server.  If you follow these instructions on a different setup, filenames, directories, versions, etc. may of course differ.

Install mod_vhost_dbd

Download dbd-modules from Google Code.  This is a great piece of code in the form of an Apache module that uses mod_dbd and a DBD Mysql (or other database) driver to fetch the DocumentRoot for a given domain from a database.

% wget http://dbd-modules.googlecode.com/files/dbd-modules-1.0.5.zip

Unzip the archive in a directory. As indicated on the website, build and install the module.

% apxs -c mod_vhost_dbd.c
% apxs -i mod_vhost_dbd.la

This places mod_vhost_dbd.so in /usr/lib64/httpd/modules.  Enable both this module and mod_dbd by adding two lines to httpd.conf, or equivalently creating a new include file in /etc/httpd/conf.d containing these lines.

LoadModule dbd_module modules/mod_dbd.so
LoadModule vhost_dbd_module modules/mod_vhost_dbd.so

In true unit-testing fashion, now might be a good time to restart Apache, just so you can be sure everything is working up to this point.

% service httpd restart

Install Mysql DBD Driver to APR

Unfortunately, on my system, the Mysql DBD driver was nowhere to be found.  I had to rebuild Apache Portable Runtime (APR) utils with the Mysql driver enabled.

Download apr and apr-util from Apache.  Note these are not the latest versions, but the versions that matched the packages shipped with RHEL 5.

% wget http://archive.apache.org/dist/apr-1.2.8.tar.bz2
% wget http://archive.apache.org/dist/apr-util-1.2.8.tar.bz2

Extract both archives into the same parent directory.

Build and install APR.  Now, I do not think this is absolutely necessary, but it seems like a good idea to keep the versions in sync.

% ./configure --prefix=/usr
% make
% make install

Build and install apr-util.  Due to licensing issues, apr-util does not actually contain the Mysql DBD driver until apr-util-1.2.12.  For earlier versions, it must be downloaded separately into apr-util’s dbd/ subdirectory, and the configure script rebuilt.

% wget http://apache.webthing.com/svn/apache/apr/apr_dbd_mysql.c
% ./buildconf --with-apr=../apr-1.2.8

Now for the three commands every Linux admin loves.

% ./configure --prefix=/usr --with-apr=/usr --libdir=/usr/lib64 --with-expat=builtin --with-ldap-include=/usr/include --with-ldap-lib=/usr/lib64 --with-ldap=ldap --with-mysql
% make
% make install

The first time I tried this, Apache could not find any LDAP-related modules.  Adding those configure switches seemed to do the trick.  Restart Apache.

% service httpd restart
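
If you want to double-check that the Mysql driver actually made it into the build (in the 1.2 series the DBD drivers are compiled straight into libaprutil), the rebuilt library should now reference the Mysql client library.  The exact path may differ on your system:

% ldd /usr/lib64/libaprutil-1.so.0 | grep -i mysql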

Apache should now be able to query a Mysql database to get the DocumentRoot for a domain.  My VirtualHost block looked something like this.


    <VirtualHost *:80>
        ServerName *.example.com
        DocumentRoot "/path/to/default/document/root"

        DBDriver mysql
        DBDParams host=localhost,user=root,pass=secret,dbname=vhosts

        DBDocRoot "SELECT path FROM vhosts WHERE host = %s"  HOSTNAME
    </VirtualHost>
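
The query assumes a table mapping hostnames to document roots.  The schema can be anything you like as long as it matches the DBDocRoot query; a minimal version might be:

    CREATE TABLE vhosts (
        host VARCHAR(255) NOT NULL PRIMARY KEY,
        path VARCHAR(255) NOT NULL
    );

    INSERT INTO vhosts (host, path)
    VALUES ('awesomeapp.example.com', '/var/www/vhosts/awesomeapp');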

For more details and instructions on mod_vhost_dbd configuration directives, read the project wiki.

At work, every project has an .htaccess file containing at the least some mod_rewrite rules.  This way, all I need to do to run a project is check it out of version control.  I don’t need to modify my local Apache configuration.

But allowing .htaccess files may be a performance hit. More specifically, enabling the AllowOverride option in Apache is a performance hit. The Apache docs sum up the problem best:

“Wherever in your URL-space you allow overrides (typically .htaccess files) Apache will attempt to open .htaccess for each filename component. For example,

DocumentRoot /www/htdocs
<Directory />
    AllowOverride all
</Directory>

and a request is made for the URI /index.html. Then Apache will attempt to open /.htaccess, /www/.htaccess, and /www/htdocs/.htaccess.”

So I disabled all .htaccess files in production, and inserted each file’s individual mod_rewrite rules into the main Apache config file. After a quick Apache Bench run, one project looked around 3% faster. Note that there are a few other useful optimizations on that page.
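
Concretely, each project’s rules move out of its .htaccess and into a Directory block in the main config, with overrides switched off.  A sketch, with a placeholder path and placeholder rules rather than one of our real projects:

    <Directory "/var/www/vhosts/awesomeapp">
        AllowOverride None

        # rules that used to live in awesomeapp/.htaccess
        RewriteEngine On
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteRule ^(.*)$ index.php/$1 [L]
    </Directory>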