Critical Section

Site Optimization

Sunday,  04/13/03  05:29 PM

Recently I systematically optimized this little site.  By way of documentation and in case it is of public interest, here's what I did...

  1. Conform to standards.  Make more people and robots able to "view" the site.
  2. Reduce file sizes.  Make pages load faster.
  3. Serve a special home page to "robots".  Help them find everything easily.

Conform to Standards

HTML is a "loose" language.  Just about anything goes.  The popular browsers like Internet Explorer and Mozilla will "do the right thing" with all kinds of weird errors.  But for maximum compatibility it is best to have pages which are "correct".

The easiest way to make sure your pages are correct is to use an HTML validator.  I like Doctor HTML, but there are a bunch out there.  You point Doctor HTML at a page, and it tells you what (if anything) is wrong with it.  This is a great way to pick up unclosed tags, invalid syntax, etc. - it also verifies links and even checks spelling.

Most browsers and programs don't care about character encoding, but some do.  (The ones that don't pretty much assume U.S. ASCII is in use.)  The easiest way to take care of this is simply to specify the encoding in a META tag:

    <META HTTP-EQUIV="Content-type" CONTENT="text/HTML; charset=US-ASCII">

If you have templates for your pages, put this in the template and you're done.
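
To confirm that every page built from a template really carries the declaration, a small check along these lines could be run over the generated files.  This is only a sketch using Python's standard library; the class and function names are made up for illustration and are not part of any tool mentioned here.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Collects the charset declared in a META http-equiv Content-Type tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "content-type":
            return
        content = attributes.get("content") or ""
        # The content looks like "text/html; charset=US-ASCII".
        for part in content.split(";"):
            name, _, value = part.strip().partition("=")
            if name.lower() == "charset" and value:
                self.charset = value.strip()


def declared_charset(html_text):
    """Return the declared charset, or None if the page does not declare one."""
    finder = CharsetFinder()
    finder.feed(html_text)
    return finder.charset
```

Calling declared_charset() on the text of each page and flagging the ones that return None is enough to find pages that missed the template.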

Finally, if you're a heavy user of CSS, be sure to test the CSS you're using on all the browsers with which you want to be compatible.  I test with Internet Explorer, Mozilla, and Opera (Windows); Internet Explorer, Mozilla, and Safari (Mac); and Mozilla (Linux).  Even though your CSS may be "valid", it may not be interpreted the way you want by all browsers.  This is one reason I've stuck to frames and tables; they've been around so long that pretty much all browsers treat them the same way.

Reduce File Sizes

Everyone's browsing experience will improve if you can reduce file sizes, especially people with slower connections to the Internet.  It will also enable your site to serve more people concurrently with the same amount of bandwidth.  There is nothing you can do which is better for your visitors (except give them interesting content!)

Reducing file sizes bifurcates into two kinds of activities: reducing image sizes, and reducing page sizes.

Reducing Image Sizes

Image sizes are a function of three things - the pixel dimensions of the image, the type of image, and the compression ratio.  You should never make images any bigger than they have to be.  If you have a really big image which just must be big, then put a thumbnail for it in the page's content which links to a new window with the big image.  Any image bigger than 200 x 200 pixels is a candidate for shrinkage or thumbnailing.
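
As a rough illustration of the thumbnail rule, here is a sketch of the arithmetic for shrinking an image to fit a 200 x 200 box without distorting it.  The function name is hypothetical, and the 200 pixel default is just the rule of thumb from the paragraph above, not a hard limit.

```python
def thumbnail_size(width, height, max_side=200):
    """Scale (width, height) to fit a max_side x max_side box, keeping the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    if width <= max_side and height <= max_side:
        return width, height  # already small enough; never enlarge
    # Shrink both sides by the same factor so the longer side equals max_side.
    scale = max_side / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, an 800 x 600 photo would get a 200 x 150 thumbnail, while a 150 x 100 image would be left alone.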

There are two kinds of images in wide use on the web: GIFs and JPEGs.  GIFs are best for images with a small number of colors and well-defined borders - cartoons, diagrams, flow charts, etc.  JPEGs are best for images with gradients of colors and smooth transitions - mainly photographs.  The coolest tool for shrinking images is Adobe Photoshop's "Save for Web" feature.  This allows you to take any image and try "what if" scenarios with file format and compression ratio.  In addition, when Photoshop saves for the web it optimizes image headers, storing only the minimum information required, and enables progressive rendering, allowing larger images to be displayed incrementally as the browser receives data.  There are other tools which have similar capabilities, but Photoshop is the leader.

Reducing Page Sizes

HTML pages are plain text; making them smaller is pretty tough.  Of course it is always better to use fewer words if you can; "brevity is the soul of wit" and all that.  But that won't really make your pages much smaller.

The best thing to do for reducing HTML page sizes is to implement GZIP compression.  This means each page will be compressed before sending it out over the network, and decompressed by the browser.  Typically this reduces file sizes by about 50%.  All modern browsers say they support compression and do, but many robots do not.  If the client does not support compression the server will automatically send an uncompressed page.  There is really no downside to implementing this - do it!
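
The negotiation described above can be sketched in a few lines of Python.  This is not how mod_gzip or IIS is implemented; it is a simplified illustration using the standard gzip module, with hypothetical function names, and it ignores details such as the "*" wildcard in Accept-Encoding.

```python
import gzip


def client_accepts_gzip(accept_encoding):
    """Return True if an Accept-Encoding header value lists gzip with a non-zero q."""
    for item in (accept_encoding or "").split(","):
        coding, _, params = item.strip().partition(";")
        if coding.strip().lower() not in ("gzip", "x-gzip"):
            continue
        params = params.strip().lower().replace(" ", "")
        # "gzip;q=0" means the client explicitly refuses gzip.
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False


def encode_page(html_bytes, accept_encoding):
    """Return (body, headers), compressing only when the client asked for it."""
    if client_accepts_gzip(accept_encoding):
        body = gzip.compress(html_bytes)
        return body, {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
    # No gzip support advertised (many robots): send the page as-is.
    return html_bytes, {"Vary": "Accept-Encoding"}
```

Comparing len(body) with len(html_bytes) for a few real pages is a quick way to see what savings to expect; the actual ratio depends on the page.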

If you're using Apache, the way to implement compression is via mod_gzip.  There are many parameters for mod_gzip; I found this page to be very helpful.  I use the following directives in my HTTPD.CONF file:

    Directive                                     Notes
    --------------------------------------------  ------------------------------------------
    LoadModule gzip_module modules/               in LoadModule section, should be last
    AddModule mod_gzip.c                          in AddModule section, should be last
    <IfModule mod_gzip.c>
    mod_gzip_on Yes                               enable mod_gzip
    mod_gzip_command_version '/mod_gzip_status'   status URL
    mod_gzip_minimum_file_size 500                minimum file size to compress
    mod_gzip_maximum_file_size 500000             maximum file size to compress
    mod_gzip_maximum_inmem_size 100000            maximum file size to compress in memory
    mod_gzip_min_http 1000                        require at least HTTP/1.0 for compression
    mod_gzip_handle_methods GET POST              use compression for GET or POST
    mod_gzip_item_include file \.html$            compress HTML files
    mod_gzip_item_include file \.cgi$             compress CGI output
    mod_gzip_item_exclude file nph-.*\.cgi$       don't compress nph CGI output
    mod_gzip_item_exclude file \.css$             don't compress CSS files
    mod_gzip_item_include mime ^text/             compress any text types
    mod_gzip_item_exclude mime ^image/            don't compress any image types
    mod_gzip_add_header_count Yes                 include header size in statistics
    mod_gzip_dechunk Yes                          correctly handle chunked output
    mod_gzip_send_vary Yes                        send a Vary header so caches handle compressed pages correctly
    </IfModule>

If you're using IIS, the way to implement compression is via the Web Service property sheet.  Microsoft has a good description of how to do this on their website.  They are cautious about recommending page compression for CPU utilization reasons, but in my experience it is always beneficial; most of the time your webserver runs out of bandwidth long before it runs out of CPU cycles.  This page also has good information about configuring IIS for compression.

After you get compression configured, you can test it using this site.  Very handy.
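
If you would rather check from your own machine, something like the following Python sketch does the same kind of test: it asks for a page while advertising gzip support and reports whether the server actually compressed it.  The function names are hypothetical, and check_compression() needs network access to whatever URL you pass it.

```python
import gzip
import urllib.request


def summarize(content_encoding, raw_body):
    """Report whether a response was gzip-compressed and how much it saved."""
    if (content_encoding or "").lower() not in ("gzip", "x-gzip"):
        return {"compressed": False, "sent_bytes": len(raw_body)}
    original = gzip.decompress(raw_body)
    saved = 1 - len(raw_body) / len(original) if original else 0.0
    return {
        "compressed": True,
        "sent_bytes": len(raw_body),
        "original_bytes": len(original),
        "saved_fraction": saved,
    }


def check_compression(url):
    """Fetch url while advertising gzip support, then summarize the response."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    # urllib does not decompress for us, so read() returns the bytes as sent.
    with urllib.request.urlopen(request, timeout=10) as response:
        return summarize(response.headers.get("Content-Encoding"), response.read())
```

A result with "compressed" set to False for an HTML page suggests the server configuration is not taking effect for that URL.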

Serve a Special Home Page to "Robots"

I don't know about you, but I've found that "robots" make up a good deal of the traffic to my site.  These robots can be search engine spiders, various indexing tools like Technorati, or analysis tools.  There are also tons of RSS aggregators out there, and although they load your site's RSS feed first, many of them come back and get page data, too.

So - I have my website set up to look at the HTTP_USER_AGENT, and if the client is a robot I serve a different home page.  This serves several purposes:

  • Robots are not interested in visual presentation, so you can eliminate images, tables, styles, etc.  (And if you're using them, you can eliminate frames, too!)  This makes the page smaller and also avoids confusing the robot.
  • Robots are interested in your links.  My "normal" home page has links as part of the articles posted there, but all the navigation links are on a separate page served as the navigation bar.  And this doesn't have all the links, either, because I have an "extended blogroll" of sites I like.  So for robots I serve a page which has the home page content, all the navigation bar links, and the extended blogroll.  This gives them all the links in one place.

How do you tell if you're dealing with a robot?  Well, if the agent string doesn't start with "Mozilla" or "Opera", it's a robot.  (For historical reasons all versions of Netscape and Internet Explorer have always used "Mozilla" in their user agent strings.)  If it starts with "Mozilla" it might still be a robot pretending to be a browser; I check for two common cases, "Slurp" (Inktomi's spider) and "Teoma" (Ask Jeeves / Teoma's spider).  There are others, but this will get you 99% of the robots.
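
The rule of thumb above is easy to express in code.  Here is a minimal Python sketch of it; the function names and the two page file names are hypothetical, and the match on "Slurp" and "Teoma" is assumed to be case-sensitive, exactly as the names are written above.

```python
# Robots that pretend to be browsers; only the two named in the text.
DISGUISED_ROBOTS = ("Slurp", "Teoma")
BROWSER_PREFIXES = ("Mozilla", "Opera")


def is_robot(user_agent):
    """Apply the rule of thumb from the text to an HTTP User-Agent string."""
    agent = user_agent or ""
    if not agent.startswith(BROWSER_PREFIXES):
        return True  # also covers a missing or empty User-Agent header
    # Starts like a browser, but may still be a known disguised robot.
    return any(name in agent for name in DISGUISED_ROBOTS)


def home_page_for(user_agent):
    """Pick which home page to serve (file names are hypothetical)."""
    return "home_robot.html" if is_robot(user_agent) else "home.html"
```

Adding another disguised robot is then just a matter of appending its name to DISGUISED_ROBOTS.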

Some handheld browsers, like Handspring's Blazer and AvantGo, report a non-Mozilla user agent.  This is a good thing; the robot version of the home page is perfect for a handheld (no graphics, straightforward layout, all links present, etc.).  For this reason it is better to put the links at the bottom of the page than at the top; nobody wants to see your blogroll before your content.

Here's the robot version of my home page, in case you're interested...  It was a little more work, but it's nice to keep the robots happy :)

