This one is filed under “that’s pretty picky, but I guess it couldn’t hurt.”
The ETag (entity tag) HTTP header is a string that uniquely identifies a specific version of a resource. When the browser first downloads a resource, it stores the ETag. When it requests the resource again, it sends the ETag along to the server. If the server sees the same ETag, it responds with a 304 Not Modified, saving the download.
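The exchange looks roughly like this (the ETag value here is made up):

GET /logo.png HTTP/1.1
→ 200 OK
  ETag: "1a2b-3c4d-5e6f"
  (full response body)

GET /logo.png HTTP/1.1
If-None-Match: "1a2b-3c4d-5e6f"
→ 304 Not Modified
  (no body; the browser uses its cached copy)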
The problem is that the default ETag format in Apache is built from the file's inode, size, and timestamp. The inode will be different from server to server, meaning the server may see a different ETag from the browser even though it is in fact an identical file.
The end result is that ETags generated by Apache and IIS for the exact same component won’t match from one server to another. If the ETags don’t match, the user doesn’t receive the small, fast 304 response that ETags were designed for; instead, they’ll get a normal 200 response along with all the data for the component. If you host your web site on just one server, this isn’t a problem. But if you have multiple servers hosting your web site, and you’re using Apache or IIS with the default ETag configuration, your users are getting slower pages, your servers have a higher load, you’re consuming greater bandwidth, and proxies aren’t caching your content efficiently.
There is another scenario where it isn’t a problem: if you are using sticky sessions in your load balancer, each returning user is pinned to the same server anyway.
In any case, as stated above, it couldn’t hurt to rectify this. So I configured the ETag format in Apache to exclude the inode, and use only size and timestamp.
FileETag MTime Size
So files across servers have the same ETag.
Editor’s note: This post formed the basis of the Front-End Optimization talk I’ve given in the past.
You’ve programmed websites for years and know the ins and outs of PHP and MySQL, so why are JavaScript and CSS files such a big deal? You put them in a directory, and link to them from your pages. Done. Right?
Not if you want maximum performance.
According to the Yahoo! Exceptional Performance team:
…Only 10% of the time is spent here for the browser to request the HTML page, and for apache to stitch together the HTML and return the response back to the browser. The other 90% of the time is spent fetching other components in the page including images, scripts and stylesheets.
So static content is very important. The same Yahoo! people provide us with a comprehensive list of Best (Front-end) Practices for Speeding Up Your Website. IMO, some of the rules are more important than others, and some are more easily achieved. Leaving aside hardware solutions (static server, CDN, etc.) for now, let’s look at six of the rules. Gzipping components is handled by enabling mod_deflate in Apache, and Rule 3 is a matter of configuring Apache. How to achieve the other five?
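As a sketch of the Apache side (assuming Rule 3 here is the Expires-header rule; the MIME types and cache lifetimes are only illustrative):

# mod_deflate: gzip text-based responses before sending them
AddOutputFilterByType DEFLATE text/html text/css application/javascript

# mod_expires: far-future Expires headers for static components
ExpiresActive On
ExpiresByType text/css "access plus 1 year"
ExpiresByType application/javascript "access plus 1 year"
ExpiresByType image/png "access plus 1 year"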
As I see it, there are three broad ways to achieve them.
<link rel="stylesheet" type="text/css" href="custom_handler.php?file1.css,file2.css" />
or something like that. It can also mean using mod_rewrite to direct incoming requests for CSS and JavaScript to a PHP script. Either way, there is processing on every page load. Caching the end product helps. Still, there must be a better way. Don’t have a build server? That’s a whole other topic.
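For the mod_rewrite variant, the rules might look something like this (the handler name combine.php is hypothetical):

# .htaccess sketch: hand every request for a .css or .js file to a PHP script,
# which can concatenate, minify, and cache the result before serving it
RewriteEngine On
RewriteRule \.(css|js)$ /combine.php [L,QSA]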
I’ve been around offshoring for quite a while now, and I’ve heard people tout the benefit of the 24-hour development cycle many times. The idea is that when your developers in the Western Hemisphere are going to sleep, your developers in the Eastern Hemisphere are waking up. When those developers go to sleep, the cycle begins again. Voilà! 24×5 coding effort.
But it never made sense to me. It’s like two people building a house and claiming that if one works on it during the day and the other works at night, the house will go up faster than if both worked during the day. At best, it’s equal effort. It may even be a little worse, because if two people work on it together, they can re-use tools and such, and they each have someone to keep them company.
Looking for some literature on using agile methods with offshore teams a while ago, I finally found that someone had put the same thought in writing.
Another benefit of offshore that’s coming up is the use of 24 hour development to reduce time to market. The benefit that’s touted is that by putting hands on the code base at all hours of the day, functionality gets written faster. Frankly I think this is a totally bogus argument, since I don’t see what adding people does in India that it wouldn’t do by adding them to the onshore team. If I need to add people, it’s more efficient to do it while minimizing the communication difficulties.
Martin Fowler, "Using an Agile Software Process with Offshore Development"
I have been thinking lately that the longer I am in this business, the more I am amazed that any software anywhere runs successfully at all. Every scenario has to be accounted for. Every detail has to be precise. There are so many opportunities for error, from translating requirements, to miscommunications, to unanticipated inputs, to simple flaws in logic and typos. Not to mention the equally complex task of maintaining the necessary system, storage, and networking environment for the program to run within.
In Code Complete, Steve McConnell talks about this foolhardy profession.
Nobody is really smart enough to program computers. Fully understanding an average program requires an almost limitless capacity to absorb details and an equal capacity to comprehend them all at the same time. The way you focus your intelligence is more important than how much intelligence you have.
At the 1972 Turing Award lecture, Edsger Dijkstra delivered a paper titled "The Humble Programmer." He argued that most of programming is an attempt to compensate for the strictly limited size of our skulls. The people who are best at programming are the people who realize how small their brains are. They are humble. The people who are the worst at programming are the people who refuse to accept the fact that their brains aren’t equal to the task.
The purpose of many good programming practices is to reduce the load on your gray cells. You might think that the high road would be to develop better mental abilities so you wouldn’t need these programming crutches. You might think that a programmer who uses mental crutches is taking the low road. Empirically, however, it’s been shown that humble programmers who compensate for their fallibilities write code that’s easier for themselves and others to understand and that has fewer errors.
Speaking of Dijkstra, he already knew this in 1968. In "The Structure of the ‘THE’-Multiprogramming System," wherein he describes the design of one of the first multitasking systems, he gives props to his students:
"The other remark is that the members of the group have previously enjoyed as good students a university training of five to eight years and are of Master’s or Ph.D. level. I mention this explicitly because at least in my country the intellectual level needed for system design is in general grossly underestimated. I am convinced more than ever that this type of work is very difficult, and that every effort to do it with other than the best people is doomed to either failure or moderate success at enormous expense."
Most of our live production code was written (by me) without any attention paid to character encodings. Fortunately, nearly every link in the LAMP chain seems to default nicely to ISO-8859-1, so things have mostly worked out. Every now and then a UTF-8 character will pop up, and we’ll either change the character in the database, or someone will use random combinations of htmlentities() and mb_convert_encoding() in some random file until it looks right in that particular case. It’s one of those cases of building up a smidgen of technical debt. The thought of doing it the right way and switching all of our code, databases, and data from ISO-8859-1 to UTF-8 at this point makes me shudder.
For our newer systems coming online, I really wanted to get this character encoding problem right. Since we started from scratch, all the necessary endpoints were written to support UTF-8 encoded text, and we made sure that all incoming data is UTF-8 encoded. If it was not, we converted it, basically using this single line:
$string = mb_convert_encoding($string, 'UTF-8');
But something was wrong. When I tried to convert a single smart quote (’) generated on my Windows machine and view it in my browser, it simply disappeared. Trawling the PHP manual for a solution (as usual), I came upon the explanation on the manual page for utf8_encode().
Note that you should only use utf8_encode() on ISO-8859-1 data, and not on data using the Windows-1252 codepage. Microsoft’s Windows-1252 codepage contains ISO-8859-1, but it includes several characters in the range 0x80-0x9F whose codepoints in Unicode do not match the byte’s value (in Unicode, codepoints U+80 – U+9F are unassigned).
utf8_encode() simply assumes the byte’s integer value is the codepoint number in Unicode.
What this means is that, for example, a single smart quote (’), sent to PHP as ISO-8859-1 and converted to UTF-8 using utf8_encode(), will not convert to the proper multi-byte character, and thus will appear in the browser either as garbage or not at all (in fact, not at all, since those codepoints are unassigned).
Since no third argument is given, mb_convert_encoding() uses the platform’s default internal encoding. Unfortunately, even on Windows, PHP defaults to ISO-8859-1 rather than the so-similar-yet-different-it’s-annoying-that-it-must-be-a-Microsoft-product Windows-1252 encoding, which mostly overlaps with ISO-8859-1 but assigns printable punctuation (smart quotes, dashes, and the like) to the 0x80-0x9F range that ISO-8859-1 leaves to control characters.
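To make the failure concrete, here is a minimal repro sketch (it assumes the mbstring extension; the explicit source encodings are spelled out just to show the difference):

// 0x92 is the right single quote in Windows-1252, but in ISO-8859-1 it maps to
// the invisible control codepoint U+0092, so a blind conversion produces the
// bytes C2 92 instead of the expected E2 80 99 (U+2019).
$win1252 = "\x92";
echo bin2hex(mb_convert_encoding($win1252, 'UTF-8', 'ISO-8859-1'));   // c292
echo bin2hex(mb_convert_encoding($win1252, 'UTF-8', 'Windows-1252')); // e28099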
Fortunately, the solution was on the same manual page: a function with a hard-coded map that replaces all the incorrectly converted Windows-1252 characters with their correct UTF-8 values.
So I modified the above line of code to look like the following, and I could see my smart quotes once again.
$string = strtr(mb_convert_encoding($string, 'UTF-8'), self::$_cp1252_map);
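For reference, $_cp1252_map is just an array along these lines (an abbreviated, illustrative version; the real map covers the whole 0x80-0x9F range). The keys are the wrongly converted bytes, and the values are the intended UTF-8 sequences.

// Abbreviated, illustrative Windows-1252 repair map (not the full list):
private static $_cp1252_map = array(
    "\xc2\x80" => "\xe2\x82\xac", // U+20AC euro sign
    "\xc2\x91" => "\xe2\x80\x98", // U+2018 left single quotation mark
    "\xc2\x92" => "\xe2\x80\x99", // U+2019 right single quotation mark (the smart quote)
    "\xc2\x93" => "\xe2\x80\x9c", // U+201C left double quotation mark
    "\xc2\x94" => "\xe2\x80\x9d", // U+201D right double quotation mark
    "\xc2\x96" => "\xe2\x80\x93", // U+2013 en dash
    "\xc2\x97" => "\xe2\x80\x94", // U+2014 em dash
);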