Well, it only took me 18 months, but I finally got around to cleaning up and publishing the Phing filters we use to automatically transform a static site into one that implements many of Yahoo's Exceptional Performance Rules.  These filters, together with the Apache configurations in the README, implement the process outlined in my talk from php|works 2008.

To see it in action, first create a VirtualHost pointing to the mysite directory in the project as the web root.  Then run:

% phing optimize

which creates a parallel site in a build directory.  Point your VirtualHost to the new build directory to see the same site with the performance transformations.
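
For concreteness, here is roughly what the VirtualHost switch looks like – a minimal sketch with hypothetical paths, not the exact configuration from the README:

<VirtualHost *:80>
    ServerName mysite.local
    # point DocumentRoot at mysite for the original site,
    # or at build after running phing optimize
    DocumentRoot /path/to/project/build
</VirtualHost>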

You could run this Phing task in a continuous integration process as part of deployment.  You could run it at production deployment time, but it's probably a good idea to run it at staging time instead, in case it flubs on some CSS or HTML syntax it isn't expecting.

Note that there are other miscellaneous Phing tasks in that GitHub project.  I threw them in there in case they could be of use to other Phing users.

Securely Running A Command As Root

As much as I wish we deployed builds from our continuous integration server, all but one of our products are deployed with good ol' `svn up`.  Developers generally have access to only one web server, so I needed an rsync command to propagate new code to the rest of the web servers.  I wanted normal user accounts to be able to run it at any time, in any directory, with one command.  Developers would then be instructed to run this command after updating any files.

So I whipped up a shell script that called rsync with some predefined options and targets.  Unfortunately, in order to preserve ownership and permissions at the destination, rsync needed to run as root.

At first, I looked at the setuid bit. By changing the ownership of the rsync shell script to root and running `chmod u+s` on it, any user could execute it and it would run as root. Well, it turns out that the kernel will not honor setuid on shell scripts, for security reasons. But what if I wrote a C program instead of a shell script? That actually ran with root privileges, but it still did not rsync as root for some reason. So that was out.
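
For the record, the setuid experiment amounted to something like this (the wrapper path is hypothetical):

# owned by root with the setuid bit set; the kernel ignores this for scripts
chown root:root /usr/local/bin/rsync-wrapper
chmod u+s /usr/local/bin/rsync-wrapper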

The second solution was to insert sudo before the rsync command in the script. I modified /etc/sudoers to allow the users group to run rsync under sudo. That worked perfectly, so if I put this script in /usr/local/bin, I would be done. But I had already written this magnificent (two-line) C program, so why not make it even tighter?  Instead of allowing all users to run rsync, with any arguments, under sudo, I could limit them to running only my C program. Then, in my script, I could replace rsync with my C program. So that's what I did. I again modified /etc/sudoers and my shell script, threw both the script and the C executable in /usr/local/bin, and I was done.
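
The final /etc/sudoers entry looked something like this – the group name and the NOPASSWD flag are my assumptions:

# allow the users group to run only the wrapper binary as root
%users ALL=(root) NOPASSWD: /usr/local/bin/zipsync.bin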

I named the final command `zipsync`. Here is the shell script for that, anonymized a bit.

#!/bin/sh
 
cd /var/www/vhosts

# repeat for each web server
sudo zipsync.bin \
   -av --delete \
   --exclude=".svn" \
   --exclude="logs" \
   --exclude="tmp" \
   --exclude="cache" \
   --exclude="*.swp" \
   * 192.168.1.101:/var/www/vhosts

cd -

And the C program, zipsync.bin.

#include <unistd.h>

/* Swap argv[0] for "rsync" and exec it with the caller's arguments.
   The sudoers entry allows only this wrapper, not rsync itself. */
int main(int argc, char** argv)
{
   *argv = "rsync";
   return execvp(*argv, argv);
}

In these Munin memory graphs, the large orange area is the OS buffer cache – memory the OS uses to cache plain ol' file data on disk.  The graph below shows one of our web servers after we upgraded its memory.

Web server memory usage

It makes sense that most of the memory not used by apps would be used by the OS to improve disk access.  So seeing the memory graphs filled with orange is generally a good thing.  Over the next few days, I watched the orange area grow and thought, "Great!  Linux is putting all that extra memory to use."  I figured it was caching things like images and CSS files to serve through Apache.  But was that true?

Looking At A Different Server

Here is a memory graph from one of our database servers after the RAM upgrade.

Database server memory usage

Again, my first thought was that the OS was caching all that juicy database data from disk.  The problem is that we don't have 12GB of data, and that stair-step growth was suspiciously regular.

Looking again at the web server graph, I saw giant downward spikes of blue, where the buffer cache was emptied.  (The blue is unused memory.)  These occurred every day at 4 am, and on Sundays there was a huge one.  What happens every day at 4 am?  The logs are rotated.  And on Sundays, the granddaddy log of them all – the Apache log – is rotated.

The Problem

It was starting to make sense.  Log files seemed to take up most of the OS buffer cache on the web servers.  Not optimal, I'm sure.  And when they're rotated, the data in the cache is invalidated and thus freed.

Here is a memory graph for one of our other database servers.

Database server memory usage

That stair-step growth is missing!  In fact, most of the RAM is unused.  What is the difference between the first database server and this one?  The first runs the `mysqldump` backup.  It occurs every night at 2:30 am, right when those steps appear on its memory usage graph.

It was clear to me that most of the OS buffer cache was wasted on logs and backups and such.  There had to be a way to tell the OS not to cache a file. 

The Solution

Google gave me this page: Improving Linux performance by preserving Buffer Cache State.  I copied the little C program into a file and ran it on all the `mysqldump` backups.
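
The technique boils down to posix_fadvise(2) with POSIX_FADV_DONTNEED.  Here is a minimal sketch along those lines – my reconstruction, not the exact program from that page:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char** argv)
{
   int i;
   /* for each file named on the command line, ask the kernel
      to drop its pages from the buffer cache */
   for (i = 1; i < argc; i++) {
      int fd = open(argv[i], O_RDONLY);
      if (fd < 0) {
         perror(argv[i]);
         continue;
      }
      fdatasync(fd);                                 /* flush dirty pages first */
      posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);  /* evict the file's pages */
      close(fd);
   }
   return 0;
}

Here is what happened to the memory usage.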

Database server memory usage

Quite a bit of buffer cache was freed.  On that night’s backup, I logged the buffer cache size before the backup and after.

% cat 2008.08.21.02.30.log
Starting at Thu Aug 21 02:30:03 EDT 2008
=========================================
Cached:        4490232 kB
Cached:        5350908 kB
=========================================
Ending at Thu Aug 21 02:30:55 EDT 2008

Just under a gigabyte increase in buffer cache size.  What was the size of the new backup file?

% ll 2008.08.21.02.30.sql
-rw-r--r-- 1 root root 879727872 Aug 21 02:30 2008.08.21.02.30.sql

About 900MB.

Did It Work?

I used the C program from that page to ensure no database backups were cached by the OS.  I did the same for the logs on the web servers, via the logrotate config files.
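
On the web servers, that meant hooking the same tool into logrotate, roughly like this – the paths and tool name are hypothetical:

/var/www/vhosts/*/logs/*.log {
   daily
   sharedscripts
   prerotate
      # evict the old log data from the buffer cache before rotation
      /usr/local/bin/fadvise-dontneed /var/www/vhosts/*/logs/*.log
   endscript
}

A couple days later, I checked the memory graph on the database server that performed the backup.  Notice below how the buffer cache did not fill up.  It looked like the program worked, and the OS was free to cache more important things.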

Database server memory usage

Coding In A Startup

The Center of Innovation program at the College of Applied Science at UC aims to show seniors that their choices for employment upon graduation are not limited to Fifth Third, Kroger's, and Great American Insurance (not that there's anything wrong with those fine companies).  The program also aims to show them that, in a region full of marketing, design, and business talent, there is a serious need for young technology talent with an entrepreneurial bent.  For those who might want to take the plunge, the program also outlines the business skills and resources they'll need to complement their technology skills.

This is a great thing, and kudos and support go to Andy Erickson and Dr. Hazem Said for their work so far.

This past Tuesday I gave a short talk to students in the Innovation Seminar series in CAS at UC about what it's like to work in a startup from a coder's point of view.  I talked about transitioning from a cubicle-farm job to a startup environment, the nature and pace of working in a startup, and the tons and tons of learning that is inevitable.

You won’t get a lot from these slides without the narrative, but I post all my talks here so I thought I’d post this one.

Cross-posted on my Cincinnati blog.


We’ve never had long-lived sessions.  It was never a requirement.  I think we had a “Remember me” checkbox that didn’t work at one point, but we soon removed it.  Then, suddenly, customer requests started coming in: “Why do I have to log in every time I use the site?  Why can’t I stay logged in forever, like on Facebook or Twitter?”  That was a good question.

Basic User Login

Like most sites, we used the PHP session to maintain a logged-in user.  We started a session, kept track of some data indicating whether the user was logged in, and that was about it.
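
In sketch form, it was hardly more than this – check_login() is a hypothetical stand-in for our credential check:

<?php
session_start();

// on login, after verifying credentials:
if ($userId = check_login($_POST['username'], $_POST['password'])) {
    $_SESSION['user_id'] = $userId;
}

// on every subsequent request, this is the whole "logged in" test:
if (empty($_SESSION['user_id'])) {
    header('Location: /login.php');
    exit;
}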

I had never looked at sessions and cookies in depth before, though I knew generally how sessions worked.  PHP sets a cookie in the client’s browser containing a session ID.  When a request comes in, PHP reads the session ID, looks for a file corresponding to that ID on disk (or an entry in a database, memcached, etc.), reads in the session data, and loads it into the request.  When the request finishes, the session data is saved to the file again.

Implementing The “Remember Me” Checkbox

First, naively, I thought all I had to do was find the right php.ini directive to make sessions last forever.  Browsing the PHP manual and googling, I came across the session.cookie_lifetime directive, configured either in php.ini or via session_set_cookie_params().

session.cookie_lifetime specifies the lifetime of the cookie in seconds which is sent to the browser. The value 0 means “until the browser is closed.” Defaults to 0.

I set this to 24 hours.  Well, that was easy, I thought.
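
In code, that naive fix is a one-liner before starting the session – 86400 seconds being 24 hours:

<?php
session_set_cookie_params(86400);  // must be called before session_start()
session_start();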

Except it didn’t work.  Users reported logging in, going out to lunch, coming back, and getting logged out on the first link clicked.  I dug deeper and found another directive.

session.gc_maxlifetime specifies the number of seconds after which data will be seen as ‘garbage’ and cleaned up. Garbage collection occurs during session start.

It defaults to 1440 seconds, or 24 mins.

It’s important to know that session.cookie_lifetime starts when the cookie is set, regardless of last user activity, so it is an absolute expiration time.  session.gc_maxlifetime starts from when the user was last active (clicked), so it’s more like a maximum idle time.
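
Side by side in php.ini terms (the values here are just for illustration):

; absolute: the cookie expires this many seconds after it is set
session.cookie_lifetime = 3600
; idle: session data becomes garbage this many seconds after the last request
session.gc_maxlifetime = 1440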

Starting To Understand

Now I could see that both of these directives must cooperate to get the desired effect. Specifically, the shorter of these two values determines my session duration.

For example, let’s say I have session.cookie_lifetime set to its default of 0, and session.gc_maxlifetime is set to its default of 24 mins.  A user who logs in can stay logged in forever, provided he never closes his browser, and he never stops clicking for more than 24 mins.

Now, let’s say the same user takes a 30 min. lunch break and leaves his browser open.  When he gets back, he’ll most likely have been logged out, because his session data was garbage collected on the server even though his browser cookie was still there.

Now, let’s change session.cookie_lifetime to 1 hour.  A user who logs in can stay logged in for up to an hour if he keeps clicking the whole time, regardless of whether or not he closes and reopens his browser.  If he takes his 30 min. lunch break after working for 15 mins., he will most likely be logged out when he returns, even though his browser cookie had 15 more mins. of life.

Now, keeping session.cookie_lifetime at 1 hour, let’s set session.gc_maxlifetime to 2 hours.  A user who logs in can stay logged in for up to an hour, period.  He does not have to click at all in that time, but he’ll be logged out after an hour.

The Real “Remember Me” Solution

Back to my problem.  At this point, I could’ve just set both directives to something like 1 year.  But since session.gc_maxlifetime controls garbage collection of session data, I’d have session data up to a year old left on the server!  I did a quick check of the PHP session directory: there were already several thousand sessions, and that was with only a 24-minute lifetime!

Clearly, this was not how Twitter did it.  A little more digging, and I realized that sites like those do not keep your specific session around for long periods of time.  What they do is set a long-lasting cookie that contains some sort of security token.  From that token, they can authenticate you, and re-create your session, even if your session data has already been removed from the server.  (The cookie name for Twitter is auth_token and looks to have a lifetime of 20 years.)

With the session recreation method, I could control when and how to log out users, if at all.  So this enabled us to give users indefinite sessions, while keeping all session directives at their default values.
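
Here is a rough sketch of the session re-creation idea.  The token storage and lookup functions are hypothetical, random_bytes() stands in for whatever strong random source you have, and a real implementation needs to treat the token as carefully as a password:

<?php
session_start();

// at login time, when "Remember me" is checked:
$token = bin2hex(random_bytes(32));
setcookie('auth_token', $token, time() + 60*60*24*365, '/');
save_token_hash($userId, hash('sha256', $token));      // hypothetical storage

// on a later request, when the server-side session is long gone:
if (empty($_SESSION['user_id']) && isset($_COOKIE['auth_token'])) {
    $userId = find_user_by_token_hash(hash('sha256', $_COOKIE['auth_token']));  // hypothetical lookup
    if ($userId) {
        $_SESSION['user_id'] = $userId;                // re-create the session
    }
}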

Beyond Session Cookies

This only scratches the surface of authentication topics, of course.  We didn’t discuss the security implications of the session re-creation method, though I will say that the best practice against session-based attacks seems to be prompting for the password again when the user attempts to change or view sensitive account information.  LinkedIn is the first example that comes to mind.

Shortly after implementing this, a request came down from high above to centralize the authentication for our multiple products.  I began to investigate single sign-on (like Google accounts) and federated identity (like OpenID), but those are topics of another post.

Here are a couple blogs that got me on my way to the final solution. Be sure to read the comments: