Storing files on Amazon's S3
Let's say you want to provide lots of media files online as part of your site. (These could be videos, or MP3s of, say, sermons, podcasts or your music.)
Most web hosting packages today offer a generous amount of storage and bandwidth. For example, GoDaddy offers 5 GB of storage and 250 GB of bandwidth for just $3.99 pm, or 100 GB of storage and 1,000 GB of bandwidth for $6.99 pm (less if you pay for a year or more).
(Note that bandwidth is the amount of data moving in and out of your account each month. So if you upload thirty 10 MB files, that's 300 MB of upload bandwidth used that month, and if your users then download just one of those files five hundred times, that will be a further 5 GB of download bandwidth, for a total of 5.3 GB of bandwidth used that month).
But what if a video on your site gets mentioned in a popular blog read by hundreds of thousands of people, many of whom then try to download it? This is known as the Slashdot effect. The popular online technical news site Slashdot, with 5.5 million visitors a month, often links to previously obscure items of news. Then over the next few hours up to a million people suddenly try to access that item! Often the server hosting the item crashes. But let's say it doesn't - the person hosting that account may get a whopping bill that month for extra bandwidth usage! Say the item was a 20 MB video, and it was downloaded a million times - this would be 20,000 GB of bandwidth!
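The bandwidth arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and it assumes 1 GB = 1,000 MB, as the figures in the text do.

```python
def transfer_gb(file_size_mb, transfers):
    """Bandwidth in GB used by moving one file `transfers` times.

    Assumes 1 GB = 1,000 MB, matching the round numbers in the text.
    """
    return file_size_mb * transfers / 1000


# Uploading thirty 10 MB files, then 500 downloads of one of them.
upload_gb = transfer_gb(10, 30)
download_gb = transfer_gb(10, 500)
monthly_total_gb = upload_gb + download_gb

# A 20 MB video downloaded a million times.
slashdot_gb = transfer_gb(20, 1_000_000)

print(f"Normal month: {monthly_total_gb:.1f} GB")
print(f"Slashdotted:  {slashdot_gb:,.0f} GB")
```

The first figure fits comfortably inside an ordinary hosting package; the second is eighty times the 250 GB allowance quoted earlier.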
There is an alternative. Companies like Google and Amazon see a future in which we will have everything stored in 'the cloud' online. Having already spent billions on developing enormous data centers, they are starting to make their facilities available for other people to use as well.
Amazon's S3 (Simple Storage Service) is one such offering. It allows you to store your files on Amazon's servers, either in the US or in Europe, for a minuscule sum of money. Here are S3's basic rates for the US (Europe is slightly different).
Storage
$0.15 per GB-Month of storage used
Data Transfer
$0.10 per GB - all data transfer in
$0.18 per GB - first 10 TB / month data transfer out
$0.16 per GB - next 40 TB / month data transfer out
$0.13 per GB - data transfer out / month over 50 TB
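So if you have 100 GB of storage online, this would cost you $15 per month. If you used the 20,000 GB (or 20 TB) of bandwidth we mentioned above, this would cost you a further $3,400 or so ($1,800 for the first 10 TB at $0.18 per GB, plus $1,600 for the next 10 TB at $0.16 per GB). That is not small change, but it is still a reasonable bill for a million downloads of a 20 MB video!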
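If you want to try other numbers, the tiered pricing can be sketched as a small calculator. The rates come from the table above; the function and constant names are my own, and I am assuming 1 TB = 1,000 GB to keep the sums simple.

```python
# Transfer-out tiers from the rate table: (tier size in GB, price per GB).
# Assumption: 1 TB is treated as 1,000 GB.
TRANSFER_OUT_TIERS = [
    (10_000, 0.18),        # first 10 TB / month
    (40_000, 0.16),        # next 40 TB / month
    (float("inf"), 0.13),  # everything over 50 TB / month
]
STORAGE_PER_GB_MONTH = 0.15


def storage_cost(stored_gb):
    """Monthly storage charge in dollars."""
    return stored_gb * STORAGE_PER_GB_MONTH


def transfer_out_cost(gb_out):
    """Monthly download charge in dollars, filling each tier in turn."""
    cost = 0.0
    remaining = gb_out
    for tier_size, price in TRANSFER_OUT_TIERS:
        used = min(remaining, tier_size)
        cost += used * price
        remaining -= used
        if remaining <= 0:
            break
    return cost


print(round(storage_cost(100), 2))          # 15.0
print(round(transfer_out_cost(2_000), 2))   # 360.0
print(round(transfer_out_cost(20_000), 2))  # 3400.0
```

So 2 TB of downloads in a month comes to $360, while the full 20 TB of the Slashdot example crosses into the second tier and comes to $3,400.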
I recently decided to use Amazon's S3 service to store all the sermons for our church. These already take up over 3.5 GB, and while we're not expecting a million people to simultaneously download any of them soon, the price of storage was very attractive!
S3 is surprisingly easy to use. Start by creating an Amazon account for S3: go to the Amazon S3 page and click on the sign-up link there.
You will need to enter a credit card number. Once you have set up your account, Amazon will send you an email with a link to a page that has your Access Key ID and Secret Access Key. You will need these to access S3.

S3 stores files in a different way to regular web hosts, so you will need a special S3 client program to interface with S3. On Windows there is an excellent free Firefox extension called S3Fox that opens and runs in a tab in your browser. Or you can purchase the standalone Bucket Explorer program for $29.99. Both have strengths and weaknesses, and I'll be using them both in the rest of this tutorial. On the Mac, Panic Inc's Transmit FTP program now also supports S3, as does BinaryNight's Forklift (which also supports FXP).
S3 stores files in buckets. These are containers that can hold an unlimited number of objects, or files. Each account can have up to 100 buckets, and you can specify for each bucket whether it is to be hosted in the US or in Europe. So your first step, once you have installed one of these programs and logged into S3 with your Access Key ID and your Secret Access Key, will be to create a bucket to hold your files. Please note that bucket names have to be unique across all users in the entire S3 space.
Once you have created a bucket, you can then upload files to it from any directory on your computer, much as one does with an FTP program. You can also create folders within buckets. Here is what it looks like in Bucket Explorer:

Here is the same process in S3Fox. Note that you will not see the word 'bucket' anywhere when you first log into S3Fox - it refers to folders throughout, but the top-level folders for each account are the buckets.

Once you have copied the files into your bucket, you will need to set permissions (called ACL - Access Control List - Preferences) so that these files can be seen by other people. (If you forget to do this you will get an Access Denied error when you try to access the file in your browser.) You will need to set Read permissions for both the bucket and each of the files within it (by default, objects in S3 carry no permissions for other users). In Bucket Explorer this looks like:

And in S3Fox this looks like:

Finally, you need to get the path name for each file that you want to place online. Amazon S3 files have the following pathname structure:
http://<bucket_name>.s3.amazonaws.com/<file_name>
In Bucket Explorer you can easily get the pathname for any file by right-clicking it and choosing Generate Web Url. Note that by default the URL shown is for a secure (https) connection:

Click on HTTP to get a standard URL (or edit the "s" out). If you forget to do this, then when you access the file through your browser you will get a certificate mismatch if you are using your own domain (see below), like this:

In S3Fox, simply right-click the file, and choose Copy URL to Clipboard.
You can also map your own domain to an Amazon S3 bucket. To do this, create a subdomain (CNAME) record at your domain registrar for your bucket. For our sermons, this was s3.knysnavineyard.org. Point this subdomain at s3.amazonaws.com. Then create a bucket with exactly the same name as your subdomain, in our case "s3.knysnavineyard.org". This is how it then works: S3 looks at the host name of each incoming request, and serves the request from the bucket with the same name.
In our case, I had created a folder called "sermons" within the "s3.knysnavineyard.org" bucket, so a given sermon stored in a file called S-080106a-JVH.mp3 could then be accessed either through our domain, or through the Amazon domain:
http://s3.knysnavineyard.org/sermons/S-080106a-JVH.mp3
or
http://s3.knysnavineyard.org.s3.amazonaws.com/sermons/S-080106a-JVH.mp3
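If you have many files to link to, the two URL forms can be generated rather than copied one at a time. Here is a minimal sketch; the function name and the custom_domain flag are my own invention, not part of any S3 tool, and the custom-domain form only works once the CNAME described above is in place.

```python
from urllib.parse import quote


def s3_url(bucket, key, custom_domain=False):
    """Build a plain-HTTP URL for a file (key) stored in a bucket.

    With custom_domain=True the bucket name itself is used as the host,
    which assumes a CNAME for that name has been pointed at S3.
    """
    host = bucket if custom_domain else f"{bucket}.s3.amazonaws.com"
    # Percent-encode unsafe characters in the key but keep "/" for folders.
    return f"http://{host}/{quote(key, safe='/')}"


bucket = "s3.knysnavineyard.org"
key = "sermons/S-080106a-JVH.mp3"

print(s3_url(bucket, key, custom_domain=True))
print(s3_url(bucket, key))
```

The first line printed is the URL through our own domain, the second the URL through the Amazon domain. File names with spaces or other awkward characters are percent-encoded, so "my file.mp3" becomes "my%20file.mp3".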
All right, so now you want to know, how many times did your video of your dog riding a surfboard get downloaded? Unfortunately, statistics are one of the weak points on S3, requiring one to slog through analysing lots of log files.

Read the article Roll your own Web Stats for Amazon S3 for more details, or visit the author's site S3STAT for one solution.
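To give a feel for what that slog involves, here is a rough sketch of counting downloads per file from S3 access log lines. It assumes the space-delimited log layout (owner, bucket, time, IP address, requester, request ID, operation, key, request line, status, and so on) and counts only successful REST.GET.OBJECT requests; check this against your own logs before trusting the numbers. The sample lines are made up for illustration, and all names in the code are my own.

```python
import re
from collections import Counter

# Assumed layout: owner bucket [time] ip requester request-id operation key
# "request line" status ... (the remaining fields are ignored here).
LOG_LINE = re.compile(
    r'^\S+ (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) \S+ \S+ '
    r'(?P<operation>\S+) (?P<key>\S+) "(?P<request>[^"]*)" (?P<status>\d{3}|-) '
)


def count_downloads(lines):
    """Count successful object downloads (status 200) per key."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m["operation"] == "REST.GET.OBJECT" and m["status"] == "200":
            counts[m["key"]] += 1
    return counts


def sample_line(operation, key, status):
    """Build a made-up log line in the assumed layout (illustration only)."""
    return (
        f"OWNER s3.knysnavineyard.org [06/Feb/2008:00:00:38 +0000] 192.0.2.3 - REQID "
        f'{operation} {key} "GET /{key} HTTP/1.1" {status} - 1024 1024 50 10 "-" "Mozilla/5.0"'
    )


SAMPLE_LOG = [
    sample_line("REST.GET.OBJECT", "sermons/S-080106a-JVH.mp3", 200),
    sample_line("REST.GET.OBJECT", "sermons/S-080106a-JVH.mp3", 200),
    sample_line("REST.GET.OBJECT", "sermons/S-080106a-JVH.mp3", 403),  # Access Denied
    sample_line("REST.GET.BUCKET", "-", 200),                          # a listing, not a download
]

downloads = count_downloads(SAMPLE_LOG)
print(downloads.most_common())  # [('sermons/S-080106a-JVH.mp3', 2)]
```

A real script would also have to fetch the log files from the bucket they are delivered to and decide how to treat partial downloads, which is where most of the work lies.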
Finally, while S3 gets great reviews for price and reliability, and can certainly withstand getting Slashdotted, I have been less impressed so far with download speeds from S3. Here are the speeds achieved downloading the same 10 MB file from different servers to my computer in South Africa.
From S3 - US: 2 minutes 26 seconds
From S3 - Europe: 2 minutes 1 second
From GoDaddy: 54 seconds
From Network Solutions: 56 seconds
As you can see, the speeds from S3 are significantly slower. Perhaps if I were sitting in the US or Europe this would be somewhat better - though note that both the Network Solutions and GoDaddy servers are also presumably in the US. You may need to assess where your users will be downloading to, and whether they will notice this difference (the difference may not be noticeable on slower links; I'm using 3G broadband at 1.8 Mbps). As they say in the funnies, "your mileage may vary"!
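To compare these timings with the speed of my link, it helps to turn them into megabits per second. A quick sketch, assuming 1 MB = 8 megabits and using the times listed above (the names in the code are just for illustration):

```python
def throughput_mbps(size_mb, seconds):
    """Average download rate in megabits per second (1 MB = 8 megabits)."""
    return size_mb * 8 / seconds


# Download times for the same 10 MB file, in seconds.
timings = {
    "S3 - US": 2 * 60 + 26,
    "S3 - Europe": 2 * 60 + 1,
    "GoDaddy": 54,
    "Network Solutions": 56,
}

for server, secs in timings.items():
    print(f"{server}: {throughput_mbps(10, secs):.2f} Mbps")
```

This works out at roughly 0.55 and 0.66 Mbps for the two S3 locations, against 1.48 and 1.43 Mbps for GoDaddy and Network Solutions. In other words the two web hosts came close to filling my 1.8 Mbps link, while S3 used only about a third of it.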
There were also concerns about S3's reliability after an outage of two hours in February 2008, although this was probably overstated as an issue, given that it was the first outage ever for the service and affected only one of their three geographic locations. Amazon S3 offers an SLA (Service Level Agreement) that commits to over 99.9% uptime.
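To put an uptime percentage into perspective, you can convert it into minutes of permitted downtime. The sketch below assumes a 30-day month and that uptime is measured per month; both are my assumptions for illustration, not details taken from the SLA itself.

```python
def allowed_downtime_minutes(uptime_percent, days=30):
    """Minutes of downtime that still meet the given uptime percentage."""
    total_minutes = days * 24 * 60
    return (1 - uptime_percent / 100) * total_minutes


print(round(allowed_downtime_minutes(99.9), 1))  # 43.2
```

On those assumptions 99.9% uptime leaves room for about 43 minutes of downtime in a month, so a single two-hour outage would use up nearly three months' worth of that allowance.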
Note that with S3 your content is served straight out of their servers, and must then find its own path across the 'net. This can often be a long route, leading to delays like those noted above. Large media companies like Yahoo, MTV, Disney and many others use Content Delivery Networks (CDNs) to ensure that their content is delivered at high speed all around the world. CDNs accomplish this by automatically caching copies of their customers' content on servers around the globe. The major players in this space are Akamai and Limelight. Their services are really only affordable for large corporations. Recently, however, smaller CDNs have started entering the marketplace, with pricing that is more accessible. Some players in this space are BitGravity, LocalMirror, Value CDN and CacheFly.
There is also a new S3 competitor called Nirvanix, which has been funded in part by Intel. Nirvanix offers a similar service and pricing to S3 (except for domain mapping, which costs $50 for setup and $15 pm for each domain). It also promises some basic CDN functionality of "intelligent routing of files to the closest global location for accelerated performance. As Nirvanix deploys additional nodes around the globe, the architecture balances localized demand from users in a geo-location by moving content to the closest storage node available".
Note of caution: Nirvanix is also affiliated with MediaMax, which has had some significant problems in the past.