Posts Tagged ‘aws’

Honesty Box: EBS Performance Revisited

Tuesday, March 2nd, 2010

As part of my work on Honesty Box, I’ve been reviewing EBS disk performance once again. This was a great opportunity to expand on the research from last year. After re-reading what I posted then, along with the wealth of data that has been compiled since, I realized I still didn’t have sufficient information to answer two key questions.

  1. How does the number of EBS volumes impact a performance of RAID 0?
  2. Does the instance size, make a significant difference in the RAID performance?

As before I used Bonnie++ to measure the results. You can read about the full method I used below.

Results

  • RAID 0 performed better with an even number of EBS volumes.
  • RAID 0 performed best with 8 volumes for writes and random seek.
  • RAID 0 performed poorly for reads!
  • Larger instances perform significantly better than smaller instances.
  • The ephemeral store has very good overall performance.

Data

The titles of the Bonnie tests can be confusing for folks removed from the programming process. Be sure to read the full explanation of what each test is doing.

Sequential Output is a measure of the write performance to the drive. Higher bars are better. With RAID 0 it appears that an even number of drives performs significantly better than an odd number.

Sequential Create is a measure of the files created by Bonnie. Higher values are better. Test that complete too quickly return no values. That is the cause of the missing bars for Read/sec above. You can safely consider that value too fast to measure.

Sequential Input is a measure of the read performance from the drive. Higher values are better. This is concerning because of the steady decline in block read performance associated with the number of available volumes. This may have to do with the time of day that these tests were run and really warrants more investigation. It should also be noted that this is a measure of sequential performance so unless your reading contiguous files off the disk, this number may be irrelevant to you.

Random Create measures how the files are created and deleted. Higher values are better. Again, tests that happen too quickly are discarded explaining the Read/sec result having no values.

Random Seeks should scale consistently with the number of EBS volumes added. Higher values are better. However, that did not appear to be the case and a limit appeared to be reached at 8.

Effect of CPU

To test the impact of the CPU units, I selected the 4 volume array and then compared it with the tests run last year. Both were using 4 volume EBS RAID 0 with XFS file systems. They both used the noop IO scheduler. The underlying OS did change from Fedora to Ubuntu and a year has passed.

Sequential Output Taller is better. Clearly the additional IO capacity in the larger instance does make a big difference in the performance of the volumes. I would expect smaller increments in CPU capacity would result in smaller differences.

Sequential Input Taller is better. Clearly the m1.large instance out performs the smaller m1.small instance.

Thoughts and Next Steps

After reviewing the performance of the native ephemeral storage, I wonder if partitioning the ephemeral store and assembling a RAID array from there might not be the best route for high speed storage? Of course backup would be a potential issue, but snapshotting of XFS may be able to mitigate that. For future tests I would like to study the impact of using the -b flag which causes Bonnie++ to flush to disk. I also think larger volume sets as shown by these tests and different I/O schedulers may yield different results.

Method

As before I used Bonnie++ to measure disk performance but it’s limitations are fairly well understood and it gives us a metric that can be compared with other metrics. You can read the full explanation of what each value actually means here. Armed with 16 EBS stores mapped to an unused m1.large instance, I began running tests. The process was as follows:

  1. Create a new RAID set using a chunk size of 256
  2. Use XFS to format the drives
  3. Mount the filesystem w/ Ubuntu defaults
  4. Capture Bonnie results
  5. Dissassemble the RAID set
  6. Rinse and repeat

I did this for 2-10 volumes and then one additional test with 16 volumes. For comparison, I also ran the test with the ephemeral store and a single EBS volume. Those are the results represented in each of the graphs above. I reran the 6 volume test 3 times over the course of a day and took an average value for the graphs.

Cloud Pricing Models

Monday, December 14th, 2009

By ArcticNomad Yesterday Amazon announced their Spot pricing model. Effectively providing market driven pricing for instances on EC2. Depending on your product, this probably won’t impact you much, but it got me to thinking about pricing of the cloud. Amazon’s Web Services was a game changer when it launched. Buy the computing resources you need for only the time you needed them. However, your stuck with a very limited set of instances and therefore you need to architect your systems around their pre-defined instance sizes. While they expanded their instance offering to include high cpu and more recently high memory instances, you’re still stuck with a fairly rigid set of boxes from which to run your systems.

A specific weak spot I’m having with the pre-defined box sizes is Memcached. It turns out that Memcached is fairly light on the processor and requires essentially no disk I/O. Really the processor is just a go between for the memory and the network card. If you are looking at putting a 32Gb server online to manage the caching tier for your app, you’d need to buy the “High-Memory Double Extra Large Instance” for $1.20/hr (or $10,512/year) wait… what?! Okay, obviously we should pre-pay this, typical business model is to run the hardware over a 3 year cycle, so lets pay the $4,900 up front and then we enjoy a more comfortable $0.42/hr (or $3,679.20/year + $1,633.34/year for the pre-pay = $5,312.54 each year for 3 years). Obviously the $15,937.60 we pay over 3 years is easier to swallow than the $31,536 if we don’t pre-pay it.

Now, if your running your infrastructure in the cloud and considering using Memcached, you really can’t put a box in a rack somewhere else because the increased latency and unreliability means you may not be able to get data from your cache in a cost effective way so I’m not going to look at what buying a box with that kind of memory would cost, not to mention there is such variation in buying rack/ping/power that it would be too messy to calculate here.

This has me intruiged to see how other providers are doing their billing. I love the idea of a-la-carte servers paid by the hour. But really what would be great is allowing me to choose the CPU, memory, and I/O I need. This brings me to two smaller cloud providers who seem to have interesting offerings.

First up is 3Tera. 3Tera offers a completely different take on the cloud infrastructure model. The idea behind their offering is that you purchase hardware (or lease it) and then slice the box however you want. Basically, running your own virtual cloud! You can consider different hardware options, including stuffing a ton of RAM into weaker boxes and so on. Ultimately the product is a resource allocation tool. The dark side is that you have to pay for all that hardware, even if your not using it. Really this isn’t a cost savings over EC2. Although it’s an interesting idea if your system resource needs shift significantly over time, but are consistent enough to warrant buying or leasing hardware. I’m really interested in their technology and they have an impressive list of partners running the software that you can then lease the virtual images from.

The second provider is OpSource Cloud. OpSource charges a base fee for the VLAN service and then you build your infrastructure on top of that. The beauty is that it’s a-la-carte down to the cpu cycles and memory! Currently the memory footprint is limited to 8Gb and each machine needs between 1 and 4 CPU’s. However, this pricing model is interesting as you can provision a single CPU with 8Gb of RAM which comes out to roughly $0.24/hr (or $2,102.40/year). Starting 4 of these instances to hold the 32Gb of cache is only slightly cheaper than Amazon’s model coming in at a whopping $8,409.60/year. There are some cost savings available if you buy a silver, gold or platinum pricing tier for a monthly pre-pay. The pricing for those starts at $500/month and goes up; so you really need to have some significant hardware running to justify those costs. Another gotcha with this plan is that you need to provision a network which is $0.20/hr. I’m going to be keeping an eye on this provider. I think in the future they may have a winning solution.

Unfortunately, I don’t yet see a solution that fits my specific need. Perhaps I need to adjust my thinking and look at alternatives. It may be time to consider Amazon’s Simple DB, which provides simple key/value storage like Memcached, although as a service. Is it the answer for putting large amounts of data into a non-RDBMS? I’ll consider that in another post.


Creative Commons Photo by ArcticNomad

Amazon Opening US-WEST-1

Wednesday, December 2nd, 2009

While I don’t have access to it yet, apparently Amazon has opened US-WEST-1 for EC2. Customers using enStratus have access already.

This is important because anyone leveraging platforms like Facebook or MySpace have just put themselves nearly 3000 miles closer to the key data centers where these platforms are running out of.

MySQL Relational Database Service on AWS

Tuesday, October 27th, 2009

Amazon Web Services Logo The stable of services available through AWS is continuing to expand! Last night Amazon announced RDS (Relational Database Service) which look a lot like EC2 instances running MySQL with EBS volumes – something I have a fair bit of experience with. However, these have the added benefit of being a service that can scale memory and processor both up and down with a single service call.

# ds-modify-db-instance mydbinstance --db-instance-class db.m1.xlarge -s 100

This flexibility comes with a downside, namely a 4 hour monthly service window where patches, updates and those requested capacity changes are applied. You can choose to apply them immediately, but your application should be prepared to handle the downtime. What happens is, your database instance goes offline and when it comes back, it has all the changes you requested applied. So at best, you should expect uptime in the 99.4% range. Most applications can handle a 4 hour downtime if it’s planned for. Under more conventional MySQL builds, developers or system administrators will mitigate these downtimes by first applying changes to slaves, promotion of one slave to master and then finally applying the changes to the original master. This sort of safety net provides gives applications smaller downtime windows (at most a few minutes each) allowing for theoretical 99.999% uptime.

Transitioning to RDS may not be without pain either. Importing your data is done through a mysqldump (or other flatflile export) and then playing that file back into your AWS instance. Depending on the size of your dataset a full mysqldump and re-importing may take days (no I’m not exaggerating). Also note, during the time mysqldump runs, your original database will acquire a read lock for consistency. With some DB’s I manage, I’ve stopped using MySQL dump entirely because the dumps took more than 4 hours to complete on a dedicated slave. With the myriad of snapshotting technologies available, it’s much easier to grab a binary copy of the DB every few hours. One last limitation is replication isn’t an option. I suspect AWS will be working on this soon as part of a HA (High Availability) release option.

Despite the limitations, I’m excited about this offering. This offloads much of the maintenance and management tasks which are usually the most tedious. I also hope that this means a higher IO disk subsystem may be coming to EBS soon.

Elastic Load Balancing in Multiple Zones

Thursday, July 30th, 2009

Ran into a problem this morning with Amazon’s Elastic Load Balancer. If you want to have multiple availability zones, say us-east-1a and us-east-1b, behind your elastic load balancer, be sure to have at least one healthy instance running in each. What happens otherwise is inbound requests will “dead end” and serve up 503 errors. This is because the DNS actually resolves to each zone regardless of the health checks and before passing requests to the actual load balancer for that zone. In otherwords, the zones are unaware of the status of the machines in different zones. :(

From Paul@AWS on the Amazon Discussion Forum:

The output from your describe call shows that you have two zones enabled:

       <AvailabilityZones> 
          <member>us-east-1b</member> 
          <member>us-east-1a</member> 
        </AvailabilityZones>

but you only have instances behind one of them. Whenever your client happens to get directed to the empty zone (which happens at DNS resolution time), it will have a dead-end.

The solution is to either add instances in the additional zone or disable that extra zone.

You can read the full thread that tipped me off this morning to the issue I was experiencing.

3 Amazon Elastic Load Balancer Tips

Wednesday, July 15th, 2009

Amazon Web Services Logo

Getting running on Amazon’s Elastic Load Balancer is easy. Once your up, you’ll also need to monitor it and do some basic maintenance of your nodes. These tips should make the most of the Elastic Load Balancer and show you some simple ways to get the monitoring data you’ll need.

1. Configure Health Checks

If an instance is off or not responding, you will want the load balancer to stop sending requests to those instances ASAP. The following code will setup a check that polls your server every 5 seconds. Be warned, this functionality is not enabled by default!

elb-configure-healthcheck  ApplicationServers  --target "TCP:80" --interval 5 --timeout 3 --unhealthy-threshold 2 --healthy-threshold 2

Once you have this setup, the load balancer will check that port 80 responds to http requests and will stop sending requests to any instances if it sees a problem. You can then check in on the status of your instances with the following command:

elb-describe-instance-health ApplicationServers
INSTANCE-ID  i-12345678  InService
INSTANCE-ID  i-23456789  InService

2. Clear Out Old Instances

If you are using auto scaling to automatically add the instances to the load balancer, you can probably skip this one. But if you are like me and add instances to the load balancer only after completing the startup scripts, you’ll need to periodically clean out any invalid instances. After running with elastic load balancer for a few weeks, I found I had extra instances registered with the load balancer that were no longer running. When an instance that registers itself is shut down by auto scaling, the load balancer isn’t updated. This is VERY important, because after a week, the instance id will likely have cycled through to someone else!

elb-deregister-instances-from-lb ApplicationServers --instances i-23456789,i-34567890

3. Check Monitoring Values

You already know about describing the instance health from when you setup the health check before. Now checkout the cloud watch monitoring tools. If you’re not using auto scaling, these values in combination with your own internal metrics will help determine when to add capacity. Spend some time with your log files and these metrics.

mon-list-metrics | grep "ApplicationServers"
 
HealthyHostCount    AWS/ELB  {LoadBalancerName=ApplicationServers}
Latency             AWS/ELB  {LoadBalancerName=ApplicationServers}
RequestCount        AWS/ELB  {LoadBalancerName=ApplicationServers}
UnHealthyHostCount  AWS/ELB  {LoadBalancerName=ApplicationServers}
 
mon-get-stats HealthyHostCount --statistics Average,Minimum,Maximum --dimensions "LoadBalancerName=ApplicationServers" --namespace "AWS/ELB" --period 600 --headers
 
Time                 Samples  Average  Minimum  Maximum  Unit
2009-07-16 03:47:00  98.0     2        2.0      2.0      Count
2009-07-16 03:57:00  103.0    2        2.0      2.0      Count
2009-07-16 04:07:00  98.0     2        2.0      2.0      Count
2009-07-16 04:17:00  98.0     2        2.0      2.0      Count
2009-07-16 04:27:00  99.0     2        2.0      2.0      Count
2009-07-16 04:37:00  98.0     2        2.0      2.0      Count

Scaling Out with EC2, CloudWatch, Auto Scaling and Elastic Load Balancing

Thursday, July 9th, 2009

Amazon Web Services Logo Earlier this year, Amazon launched a suite of new services that replaced the need to work with a product like Scalr and RightScale for building scaleable applications on the EC2 platform. Those tools help you allocate more resources according to current application load. The key benefit of using a cloud based service is that you only pay for what you use. However, without one of the afore mentioned providers, and their additional costs, you were in a lurch designing a system that could detect the current load of your infrastructure and respond accordingly. Amazon has now made it very simple to create infrastructure that can expand AND contract very simply, of course only paying for what you use.

The Tools

Elastic Load Balancer

Solutions for load balancing were as varied as round robin selection DNS to running a load balancer on an instance (I’d been using Nginx on an m1.small instance $0.10/hr). Nginx worked well, with an assigned an elastic ip (static ip) that could move from machine to machine as needed and special scripts to manage the pool of servers (or do it manually). It worked, but was by no means efficient or even easy to maintain. Furthermore, there is a single point of failure with the Nginx host. Being proactive, it was possible to create a monitoring system to monitor Nginx, and then bring up and configure a new server before re-directing the elastic ip to the new host should it fail. It was a hack and certainly not elegant!

Enter elastic load balancing. You create an elastic load balancer and then add the instances to the load balancer. That’s it! Amazon handles the redundancy and the best part is that it’s only $0.025/hr that’s a savings of $54/month over running a load balancer instance. There is of course a drawback. With Nginx and other load balancers, you have the option to do intelligent load balancing. Advanced functionality like sticky sessions and response rewriting isn’t available for the Amazon solution. However, with a well designed application, this should be irrelevant.

CloudWatch

Monitoring the cloud is VERY important. Amazon has issues with all sorts of things from EBS stores going offline to instances being completely unavailable. Before CloudWatch, I used a mix of systems including SNMP monitoring and 3rd party service Pingdom to keep tabs on my instances. The CloudWatch product doesn’t replace these, but rather supplements the data I gather from them. CloudWatch is an additional $0.015 per server above the default instance cost, it takes about 2 minutes to come online and the statistics are available through the API almost immediately after that. CloudWatch provides access to monitored instance’s CPU utilization, disk read bytes, disk read operations, disk write bytes, disk write operations, network in, and network out. I find for my needs, CPU utilization is an excellent indicator of server performance and I use that to determine when to add a new server or take one away.

You can gather these statistics grouped by AMI, Instance Id, instance type and even AutoScaling group. If you can reliably detect your need to add an additional server based on these statistics, you’ll be able to take advantage of Auto Scaling; more on that in a minute. If not, it’s very simple to write a script that determines if it’s time to start a new server up to help with processing and register it with the load balancer. Oh, and did I mention for the load balancer you also get access to healthy host count, latency, request count, and unhealthy host count? These could be helpful metrics for rolling your own scaling scripts or may be sufficient for knowing when you need an additional server.

Auto Scaling

This is the glue that brings it all together. Auto Scaling monitors your statistics from CloudWatch and starts new instances when needed then turns them back off when no longer needed. Currently this is all offered for FREE if you are using CloudWatch! The setup is simple once you go through it the first time, but took me a couple of tries to get it right. So in my case, I monitor my application server pool and when I see that it’s stressed, I add another server. Because of the way it’s configured, I have some safe guards in place that keep me from starting thousands of instances too.

How To Do It

Background

This assumes you’ve installed all the Amazon CLI tools for Elastic Load Balancing, Auto Scaling and CloudWatch, your fairly comfortable at the command line and know how to make your own AMI. Now, you’ll need to determine what the best way for you to publish your code to a new server is. Some possible solutions to this are rsync, subversion, nfs mount, s3 or a mix of technologies. Some folks just bundle up the code in their AMI (works well if your codebase is static). Regardless, that’s a bit beyond the scope of this post. After you create your solution, you’ll create a server image (AMI) that can boot up and correctly get a copy of the code you’re running. If you already have that, you can of course just use that one. Once you create an instance that can be turned on and handle traffic…

The Process

  1. Create the Load Balancer
  2. Create the Auto Scale Launch Config
  3. Create the Auto Scale Group
  4. Create the Auto Scale Trigger(s)

Create the Load Balancer

The DNS-NAME that is returned is the point you’ll direct all traffic to. Add this as a CNAME in your DNS for your domain.

elb-create-lb ApplicationServer --availability-zones us-east-1a --listener "protocol=http,lb-port=80,instance-port=80"
DNS-NAME ApplicationServer-12345678.us-east-1.elb.amazonaws.com

Create the Auto Scale Launch Config

The AMI will of course be your AMI that knows how to come online and get a fresh copy of your code and you may be using different instance types. Definitely take a look over the documentation to ensure you are doing it all right

as-create-launch-config AppServerConfig --image-id ami-12345678 --instance-type m1.small --group default

Create the Auto Scale Group

I use a nice long cooldown period here (10 minutes) so that the servers don’t come online or go offline too quickly. If you expect an occasional slashdotting, you might want this to be shorter. This also provides some a boundry. There will always be at least 1 server and no more than 3. This also tells auto scaling that you want the new instance to join the load balancer.

as-create-auto-scaling-group AppServerGroup --launch-configuration AppServerConfig --availability-zones us-east-1a --min-size 1 --max-size 3 --cooldown 600 --load-balancers ApplicationServer

Create the Auto Scale Trigger(s)

You will likely spend a good bit of time working on this portion. What this basically does is if the average CPU utilization for my servers is above 70% for 10 minutes, bring a new server online. Then likewise, if it falls below 30% for 10 minutes, turn one off. The Auto Scaling Group we created ensures there is always at least 1 server online.

as-create-or-update-trigger AppServerTrigger --auto-scaling-group AppServerGroup --namespace "AWS/EC2" --measure CPUUtilization --statistic Average --dimensions "AutoScalingGroupName=AppServerGroup" --period 60 --lower-threshold 30 --upper-threshold 70 --lower-breach-increment=-1 --upper-breach-increment 1 --breach-duration 600

That is all there is to it! You now have an system that can grow your application servers up on demand! I hope this helps you build out an infrastructure that lets you scale up your next web application. You might want to look over the command line tool documentation before getting started.

Loading Data Into Bash Variables

Tuesday, July 7th, 2009

Go away or I will replace you with a very small shell script. For those unfamiliar with Unix and Linux environments, bash is the command line shell that is standard on many distributions. These examples grew out of challenges attempting to automate EC2 processes. These basic principles of course can be applied more generally as needed. My goal is to simply provide the options I need most often in a single place. As I continue to automate routine tasks, I might just be able to replace myself, or others, with a series of scripts!

The Gaps

The Solutions

Loading data from a URL

With Amazon EC2 many startup values are available via a HTTP request to the internal address instance-data.ec2.internal. If you want to know more about these values, the Developer Guide is a good resource.

#!/bin/bash
MY_INSTANCE_ID=`exec wget -q -O - http://instance-data.ec2.internal/latest/meta-data/instance-id`
echo $MY_INSTANCE_ID

This script grabs the instance id and puts it into a variable and prints it back out by using the result of a remote execution to wget with the quiet (-q) and the output file set as standard output (-O -), the second dash is what sends the data to standard output so don’t forget it! Now anywhere in our script we want the instance id for string comparison, logging or whatever, we have it!

Loading data from a file

What if the data we want to load is in a file on the disk? This method is not good for processing giant apache access logs, but with smaller text files, it will work just fine.

#!/bin/bash
FILE_DATA=( $( /bin/cat file_data.txt ) )
for I in $(/usr/bin/seq 0 $((${#FILE_DATA[@]} - 1)))
	do
		echo $I $FILE_DATA[$i]
	done

What’s going on? The code is being loaded into an array, in bash, called FILE_DATA. It then loops over each element in the array using a for loop. Finally within the loop, we simply print the current index and then output the line we loaded. This would be roughly equivalent to running cat -n file_data.txt from the shell directly, but obviously gives us the flexibility to do further processing with the string contained in the variable.

Loading data from the user

Obviously this is not ideal for creating a process that runs on a cron job. However, if a script is being run by a user, they often need to tweak something about the way it runs that often can’t be detected automatically. In this case, you’ll want the user to key the data directly into your script.

#!/bin/bash
read -p "Enter Something: " VARIABLE
echo $VARIABLE

This example uses read with the optional prompt (-p) flag. This causes the text in the quotes to be displayed on the users standard output or terminal window.

Loading data from the command line

A step further is to let the user pass in data on the command line at run time. This of course can also be automated if needed. The following example leverages getopts to parse the parameters that were called in.

#!/bin/bash
OPT_A=0
OPT_B='Undefined'
while getopts ":ab:" OPTION
do
	case $OPTION in
	a ) OPT_A=1 ;;
	b ) OPT_B=$OPTARG ;;
	esac
done
shift $(($OPTIND - 1))
echo $OPT_A $OPT_B

The example script takes 2 different parameters a flag “-a” and a flag “-b” which expect data. In the example, default values are provided for each value, this gives the effect of making all flags optional. Using the flag -a would likely toggle a specific behavior within your script, perhaps loading a specific configuration file instead of the default one. If you wanted to collect data in each field, you simply add a colon “:” after each flag, ‘a’ in this example, following the getopts command. You would then update the case statement to reflect your expectation of data being present in $OPTARG. See the modified script below for clarification.

#!/bin/bash
OPT_A=0
OPT_B='Undefined'
while getopts ":a:b:" OPTION
do
	case $OPTION in
	a ) OPT_A=$OPTARG ;;
	b ) OPT_B=$OPTARG ;;
	esac
done
shift $(($OPTIND - 1))
echo $OPT_A $OPT_B

But wait… there’s more!

There’s also a simple way to pass data in that just stores the input from the command line into the $1, $2, $3, $4 and so on input variables.

#!/bin/bash
echo $2 $1

The script above when run as “./test_script hello world” will output “world hello”. This method can be handy for scripting quick tasks that you often use a series of parameters for. For example, adding the flags “-la” to “ls” as demonstrated below.

#!/bin/bash
ls -la $1

Script Configuration

So now that we can get different bits of data from all these different sources, what if all my scripts leverage the same data? Can’t I just have it as a single configuration file that I edit once? YES! This next example does just that. While it doesn’t technically load data into a variable, it does allow you to encapsulate your code, including a file full of variable assignments, into logical chunks. In my case, I was looking to avoid editing multiple scripts when configuration changes were needed.

First I created my configuration script, my_script.cfg, in the same directory I am running my example script below.

# Comments are allowed
OPT_1='Ubuntu'
OPT_2='Linux'
OPT_3='64bit'

Now the script that uses the configuration file above.

#!/bin/bash
OPT_1='RedHat'
OPT_2='Linux'
OPT_3='i386'
if [ -f my_script.cfg ];then 
	. my_script.cfg
fi
echo $OPT_1 $OPT_2 $OPT_3

Dissecting the script you’ll see that I first set some default values. Next the code checks for the existence of the configuration file. If found, it is included. It’s important to note that this is included because it actually allows you to run code within the configuration file. An EC2 instance might, for example, place all of the calls to instance-data.ec2.internal for metadata into a configuration file that’s simply included on scripts that use that information.

That’s it! Hope you find this resource helpful!

And for anyone looking to put those around you on alert, buy the t-shirt from Think Geek.

Exploring Amazon CloudFront

Sunday, July 5th, 2009

A few weeks ago I switched Sexii to use Amazon’s CloudFront content delivery network (CDN) instead of serving images through my own Apache server. The decision came after analysis of watching my EC2 instance get slammed through a couple of high volume peaks. I crunched my logs and found that over 80% of the traffic I had was serving images! Adding a new instance for handling images exclusively would certainly solve the problem, but at ~$75/month for a new instance, I figured there was probably a better way. My ultimate goal here to increase the potential work my infrastructure can do per instance hour, not necessarily save costs, but I don’t want to spend more than I need to.

Setting Up

Setting up CloudFront is like any AWS service, agree to the terms and costs and get going. There’s a full API for interacting with it, but rather than learn another new API, I opted for the point and click features of S3 Organizer. If you’re unfamiliar with the product, it gives you Finder or Explorer like functionality for managing the data you store on the Amazon S3 platform.

I started by creating a unique bucket in S3 and uploading my images. Now with my content happily hosted on S3, I actually had a CDN of sorts. I could have set the access control list (ACL) to allow anonymous reads for that bucket then updated my code accordingly and stopped there. S3 would handle the serving of the content and my server is free to do other tasks. Some popular websites are already doing this; for example Twitter serves your profile picture from S3. This actually accomplishes my primary goal, reducing the stress on my server. But, being the curious geek that I am, I wanted to try out the full blown CDN and with the pay per use pricing model, it was low risk.

With the bucket setup, S3 Fox was able to create the distribution with a couple of clicks (literally). What surprised me was how long it actually took (10 minutes) for the distribution to come online. Since my image URL’s are assembled on the fly in code, I updated my configuration file to reflect the new source and I was live. I did run into one problem, the URL’s for images had to be syntactically correct and I was bit by this. By default, Apache is more forgiving and just ignores the double forward slash.

Works:
<img src="http://a5e3px8iw78h4.cloudfront.net/images/logo.png" />
Doesn't:
<img src="http://a5e3px8iw78h4.cloudfront.net/images//logo.png" />

A nice feature, that I didn’t use, is the ability to map your own domain name to the delivery network; effectively masking a name like http://a5e3px8iw78h4.cloudfront.net with http://mycdn.example.com if you want. This wasn’t critical for me and so I skipped it, besides Amazon’s DNS servers are likely much more robust than mine.

Costs

Before dissecting the costs, its important to understand the content I moved. Sexii doesn’t actually have many images. It relies heavily on CSS and the layout as the design. It was done this way to eliminate hosting costs early on. MySpace applications that have home and profile surfaces incur a very high traffic volume with a low return on the investment. The assets hosted by the application, are a set of small icons used in a feature that facilitates flirting. The feature shows a grid of around 30 icons that the user can attach to a flirt. Each icon is a separate image file and all of these icons are shown at the same time resulting in about 30 requests for <1k images. See below for the consequences of this design mistake.

Costs are cumulative across the AWS platform and so are a bit difficult to break down on a per application basis, but I'm going to try to give you an overview of what the change to CloudFront cost me and where I can potentially save more money. I've skipped over the costs of uploading and hosting on S3 as they're insignificant for my data and usage pattern, less than $0.08/month.

CloudFront Requests By Country

During the roughly 4 1/2 day window analyised, the cost for CloudFront was $6.44, roughly $1.46/day. I served over 5.4 million images or 14.3 images per second to the 4 regions. If I had built a dedicated server and configured my site to use that for all images, it would have cost me around $11.44 including bandwidth for the same time period. The largest portion of the cost for me was the overall number of requests. By rewriting the icon code to use CSS positioning, I expect to reduce the request count more than 90% (-$4.95 or -$1.12/day). The bandwidth number should stay about the same because the larger icon file transfers the same amount of information as the 30 independent requests.

CloudFront Cost Breakdown

For those of you who are curious, the Japanese traffic was the smallest portion at 0.11% and Hong Kong was 0.16%.

Where Next?

Move more to CloudFront!

First step, after reducing the number of requests, is to move externally loaded CSS and JavaScript files. These are no-brainers (and are even suggested on the CloudFront site). Taking it a bit further, I’m considering moving static advertising iframe files, common in Social Network based applications. I can continue to maintaing my generic demographic targeting by creating a series of files that include the appropriate demographic distinctions. skyscraper_m_18.html, skyscraper_f_18.html, skyscraper_m_25.html, skyscraper_f_25.html and so on. I’ll be looking to see if this can be better handled through S3 alone to keep the costs of these low value CPM/CPC ads lower. MySpace hosts the actual application code on their servers, but sites like Hi5 and Orkut do not. I would strongly suggest moving your application.xml files to a CDN and only run the ajax response files on your application servers.

Where would it all end?
It’s conceivable you could host an entire brochureware site on CloudFront but that may be taking it too far.

Conclusion

I’m pleased with CloudFront. I’ve definitely increased the capacity of the application server which is especially useful during peak times. It also exposed a design weakness, a sad but important lesson to learn. This is a GREAT way for companies to get the speed benefits of a CDN and increase the processing capacity of an application server.

Amazon AWS Command Line Tool Help

Monday, June 29th, 2009

Amazon Web Services Logo Amazon’s Web Services are very handy, although sometimes the command line tool syntax is a little awkward to remember and the documentation, while extensive, is not quite as simple to navigate as I’d like. I’m providing these help files as a reference for anyone who might need them. As you are no doubt aware, you can also get this content directly by issuing <aws_command_name> --help in the shell. For me it’s much easier to have these up in a browser window so I can quickly toggle between it and the command line without losing my place. I’ve added references for Auto Scaling, CloudWatch, Elastic Compute Cloud and Elastic Load Balancing.

© 1998-2008 AF-Design, All rights reserved.