Are you sick of hearing about the cloud yet? If not, then you’ll probably be eager to read this article. If you are, then you should read this article for an easy way to take advantage of cloud infrastructures like Amazon’s S3 in order to speed up your web site or application.
Why would you want to use S3?
In one word: performance. Web browsers allow a fixed number of simultaneous connections to each host as a way of balancing per-connection speed against overall page load time, and many browsers are limited to just two connections per host. Think about your home page – does it have 20 or 30 images? Even if each is only 500 bytes, every asset still costs a DNS lookup, a TCP connection, a handshake, the transfer itself, and processing time. Much of this can be cached, but the per-host limit remains, and a tool like YSlow will show you visually how well your site's asset downloads are parallelized. There are two ways to get around the limit:
- Combine multiple assets into one file to reduce the download count
  - Combine multiple images into a sprite file
  - Concatenate CSS into a single file – even print and screen styles can share one file!
- Spread your site's assets across more hosts
There are two reasons why I like S3. One is that it has an incredible number of tools and users. The other is that you have a range of options, from a “poor man’s” accelerator (just putting your assets on S3) to a full-blown CDN – with literally the click of a mouse – via Amazon’s CloudFront service, which uses S3 as its source. CloudFront, like CDN pioneer Akamai, puts servers around the world close to users and serves your files from the machine closest to each visitor. That means fewer network hops, lower latency, and faster downloads, all improving the user experience. Other players include CacheFly, Limelight, and SimpleCDN.
Another great use of S3 is hosting user-uploaded files. Especially when you have a cluster of servers, you may need to make those files available across all of the nodes, which normally involves some form of synchronization. S3 can fill that role while providing dirt-cheap storage ($0.15/GB per month plus $0.10/GB for transfer). Plus, there are no concerns about backing up those files – S3 automatically stores several copies of each file across its network for redundancy.
Lastly, S3 supports HTTP headers like Expires and Cache-Control which can tell web browsers to keep your static assets in the local cache. Subsequent page views will have a “Primed Cache” experience that radically reduces the data required to load the page, often to as little as just the raw HTML.
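Generating a far-future Expires plus a matching Cache-Control header is easy to script at upload time. A minimal Python sketch (the helper name is mine, not part of any S3 tooling):

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# One year is the longest expiry the HTTP/1.1 spec recommends.
ONE_YEAR = timedelta(days=365)

def far_future_headers(now=None):
    """Headers to attach to a static asset so browsers keep it cached."""
    now = now or datetime.now(timezone.utc)
    return {
        "Expires": format_datetime(now + ONE_YEAR, usegmt=True),
        "Cache-Control": "public, max-age=%d" % int(ONE_YEAR.total_seconds()),
    }
```

Attach these headers when you upload and repeat visitors will pull almost nothing but the raw HTML.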
Did I mention it’s cheap?
What are the Gotchas?
S3 is not a web server. If you’re used to something like Apache and mod_gzip, well, you’re in for a few changes:
- Amazon can serve gzipped content (which shrinks the size of the file sent over the wire – something most browsers support), but it doesn’t negotiate the encoding automatically. That is, you have to compress the file yourself and conditionally serve the compressed version only to clients that accept it.
- Most people recommend setting up a FQDN like cdn.motorsportreg.com so you can point a DNS CNAME record at s3.amazonaws.com. Once configured, you can access files at a URL like:
- http://vanity.yourdomain.com/file.jpg – where “vanity” is the name of your bucket and vanity.yourdomain.com is a CNAME pointing to s3.amazonaws.com (something you do in your DNS setup)
- Amazon will let you set HTTP headers for caching and mime type but you must do it on a per-file basis; there is no mime.types file that automagically determines the right value for you. This can be automated but must be accounted for.
- SSL is supported, but Amazon uses a wildcard SSL certificate for *.s3.amazonaws.com. That means https://bucketname.s3.amazonaws.com works without browser certificate mismatch issues, but a vanity CNAME like https://vanity.yourdomain.com will throw a fit when accessed over SSL because the certificate covers *.s3.amazonaws.com, not vanity.yourdomain.com. If you serve CSS from S3 over SSL using the vanity approach, your site will appear completely unstyled in WebKit browsers like Chrome and Safari. Bummer! As of today, Amazon will not let you use your own SSL certificate, so if SSL is a requirement for you, skip the vanity approach. And if you do go the vanity route, don’t be tempted to use the vanity URL over HTTP and the non-vanity URL over HTTPS – always use the same reference, or the user will download those files a second time!
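The compression gotcha above is straightforward to handle at build time. A Python sketch with illustrative file contents – you would store the compressed bytes in the gzipped bucket and send a Content-Encoding: gzip header alongside them:

```python
import gzip

# Since S3 won't negotiate encodings for you, pre-compress each asset
# yourself and upload the result to the gzipped bucket.
css = b"body { margin: 0; font-family: sans-serif; }"
compressed = gzip.compress(css)

# Browsers that send Accept-Encoding: gzip get `compressed`; everyone
# else gets the original bytes from the uncompressed bucket.
assert gzip.decompress(compressed) == css  # round-trips losslessly
```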
Despite these limitations, we’re using S3 with great success by automating our deployments to account for the gotchas and we’re now serving only CFML requests from our two-node cluster. Last month, S3 storage and transfer for our medium-sized web application cost us about $1.22.
- Open an Amazon Web Services account and add S3
- Get a free S3 client like S3Fox or Cloudberry Explorer. The latter lets you set HTTP headers in a GUI application before you get to the point of doing it programmatically, and you’ll need that capability to set expiry headers, mime types and so forth. S3Fox is so easy to pop into Firefox that it doesn’t hurt to have it around as well.
- Create two buckets, one for compressed (gzipped) content and one for uncompressed. We named ours cdn-sitename and cdnz-sitename, where cdnz represents the gzipped bucket. IMO, it’s preferable to have two buckets with identical files in each rather than have different names for the compressed file. This makes switching between the two much simpler and you can always rely on a single path/filename regardless of how it will eventually be served. KISS!
- If you need to serve assets over SSL, then you should NOT use the CNAME vanity approach to avoid the SSL mismatch. Instead, just use the bucket names and access the files as https://bucketname.s3.amazonaws.com/…
- Big sites like YouTube and Yahoo take browser download parallelism to the max by having more than one hostname for assets. Rather than just “static.domain.com”, they have static1 and static2 and maybe more. Figure out a strategy for which assets are put on which host (you don’t want it to be random or the user will wind up downloading an equivalent asset more than once). This will let each page split up the downloads as much as possible.
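One simple strategy is to hash each asset path to a host, so a given asset always lands on the same hostname across every page. A Python sketch with hypothetical hostnames – substitute your own CNAMEs:

```python
import zlib

# Hypothetical asset hostnames; use however many CNAMEs you set up.
ASSET_HOSTS = ["static1.example.com", "static2.example.com"]

def asset_host(path):
    # Deterministic: the same path always maps to the same host, so the
    # browser never re-downloads an equivalent asset from a second host.
    return ASSET_HOSTS[zlib.crc32(path.encode("utf-8")) % len(ASSET_HOSTS)]
```

Because the mapping is a pure function of the path, it stays stable between deploys as long as the host list doesn't change.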
- Upload your files using Cloudberry. Be aware that, by default, Cloudberry uploads any file larger than 10 MB in “chunks” and then masks those chunks in the UI – in reality you wind up with multiple 10 MB chunks on S3 instead of a single file, so be sure to disable this under preferences. When you upload, set the Expires header to a date in the far future; the maximum allowed by the RFC is one year out. You should also set the mime types (image/gif for GIFs, text/css for CSS, etc.) – if you access a file directly, the browser won’t know what to do with it otherwise (though in my experience it will still work when included in a web page).
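The mime-type step lends itself to scripting too, since S3 has no mime.types lookup of its own. A Python sketch (the function is illustrative; any S3 client that accepts per-object headers can apply its result):

```python
import mimetypes

def content_type(filename):
    # Derive the Content-Type header per file, falling back to a safe
    # default for anything the mapping doesn't recognize.
    ctype, _ = mimetypes.guess_type(filename)
    return ctype or "application/octet-stream"
```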
- Modify your site to use a prefix on your static assets like:
<img src="#request.prefix#/images/foo.gif" />
Using the prefix in this way is helpful during development on your local machine where you can set prefix to an empty string and work completely offline or use local files independent of S3. In production, you populate the prefix and can turn on/off your references to S3 with a single configuration. In fact, if S3 were to have an outage, we could switch to our local web servers with a simple config file change.
- Use YUI-compressor or another tool to minify your JS and CSS on the fly and even combine files. This allows retaining your heavily commented and nicely formatted JS and CSS during development without worrying about bloating the files for deployment. No more compromise!
- You can speed up your site even further by locally including third party files you would otherwise reference remotely. For example, if you use Google Analytics, you can download ga.js and append it to your local builds. Just remember you need to update them periodically. I’ll share my Ant script in the near future which does this automatically during deployment.
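The fetch itself is trivial to automate as a build step. A Python sketch (function name and paths are mine; wire it into whatever deployment script you use):

```python
import urllib.request

def vendor(url, dest):
    # Fetch a third-party script at build time so it ships from your own
    # buckets; re-run on each deploy to pick up upstream changes.
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        out.write(resp.read())

# e.g. vendor("http://www.google-analytics.com/ga.js", "build/js/ga.js")
```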
- In your application, you need code like the following to determine HTTP vs. HTTPS and uncompressed vs. compressed support:
<cfif cgi.server_port_secure>
    <cfif findNoCase("gzip", cgi.HTTP_ACCEPT_ENCODING)>
        <cfset request.prefix = "https://cdnz-sitename.s3.amazonaws.com" />
    <cfelse>
        <cfset request.prefix = "https://cdn-sitename.s3.amazonaws.com" />
    </cfif>
<cfelse>
    <cfif findNoCase("gzip", cgi.HTTP_ACCEPT_ENCODING)>
        <cfset request.prefix = "http://cdnz-sitename.s3.amazonaws.com" />
    <cfelse>
        <cfset request.prefix = "http://cdn-sitename.s3.amazonaws.com" />
    </cfif>
</cfif>
Basically we’re building a source prefix that switches between HTTP and HTTPS and the two buckets you created for holding compressed and uncompressed content. So long as every static asset on your site has #request.prefix# at the beginning of the src, embed or link, you’ll be good to go.
- Monitor your access logs for any pages that might still be referencing static assets like HTML email templates or scheduled tasks. Within a week or two, your web access logs should be free of static asset requests.
It took me several days to get this right the first time. And now I need to do it again every time we update our static assets? How about automating this whole thing? Lucky for you, I’m going to give you the fruits of a full week of labor on my part – my Ant script! It does everything from check out my static assets from subversion to automatically pulling in remote assets like Google Analytics ga.js to compressing and uploading the two versions of each file to my buckets on S3. Look for this in the next week or so!