Archive for the ‘hosting’ Category

Testing Your AWS Elastic Load Balancer

Tuesday, July 27th, 2010

Vijay Ramachandran asked me, via Twitter, how to test whether an Amazon Elastic Load Balancer is really doing its job. Because 140 characters really isn’t sufficient space to handle this answer, I’ve created this post. Feel free to use any of this in any of your environments.

First, I’ll assume you’ve covered some of the basics with ELB.

The default configuration you’ll end up with following my guides above is a stateless system that distributes the requests more or less evenly across all configured servers. However, when you do it the first time, it’s nice to see that it’s actually doing what you think it should be. The steps are simple:

  1. Verify each instance is working as expected
  2. Verify the load balancer is distributing the requests across multiple instances
  3. Verify the instances are working behind the load balancer

1. Verify each instance is working

This is far and away the easiest step. You can simply access each machine by the Amazon assigned address for that specific instance and ensure that it’s doing what you expect. The only potential issue here is that you might jump from one machine to a different machine if you are not watching your URL. For example, if you are on ec2-123-123-123-123.compute-1.amazonaws.com, access your application at that address and ensure it works as expected. If it jumps to a domain name because you’ve hard coded a link somewhere, you may not be testing the new server at all.

2. Verify the load balancer is distributing the requests across multiple instances

To test that requests are being distributed across multiple machines, I use a test file. I generate my test file automatically by running the following script as part of the boot-up routine. This simply saves the instance-id from the metadata into a text file. If you are uncomfortable placing this information in the web root, you can optionally place it behind basic authentication, put it into a script that hashes it (md5 or sha1) or some other application based logic to access it.

/usr/local/bin/curl http://169.254.169.254/latest/meta-data/instance-id > /var/www/html/instance-id.txt

Check the path for curl and the web root for your local system and adjust accordingly. This should work on Red Hat flavored distributions.

Once you’ve run this on each of your instances, you can tell that requests are being distributed to both machines by simply requesting your load balancer address and verifying that it changes. (Obviously replace the following request with the correct address for your machine.)

http://applicationservers-123456789.us-east-1.elb.amazonaws.com/instance-id.txt
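
Refreshing by hand works, but a tally is easier to read. The following is only a sketch: the function name tally_instance_ids is made up for illustration, and the commented usage assumes the example load balancer address above and that curl is on your path.

```shell
# Tally instance IDs read from stdin (one per line) and print "id count" pairs.
tally_instance_ids() {
  awk 'NF { seen[$1]++ } END { for (id in seen) print id, seen[id] }' | sort
}

# Hypothetical usage against the example address (commented out, nothing is requested here).
# The metadata value has no trailing newline, so echo adds one after each response.
#   for i in 1 2 3 4 5 6 7 8 9 10; do
#     curl -s http://applicationservers-123456789.us-east-1.elb.amazonaws.com/instance-id.txt
#     echo
#   done | tally_instance_ids
```

If only one instance ID ever shows up in the tally, the load balancer is not spreading requests across your machines.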

3. Verify the instances are working behind the load balancer

Now for the final test. Confident that your requests are being distributed across both machines, test that your application works as expected: first under the Amazon assigned name (applicationservers-123456789.us-east-1.elb.amazonaws.com in this example), then under your CNAME’d alias.

If everything still works, you can assume all is good.

4. Bonus Check

If you really, really, really want to know… you can also verify using your access logs. Check in /var/log/httpd/access_log or wherever your web server logs are kept to see that requests are being distributed to each machine.

DNS Tips:

1. Never use the real IP returned from dig or nslookup as an A record in DNS unless you automate checking it (and even then I wouldn’t) because the actual IP changes from time to time. Only use CNAME entries.

2. If you are using GoDaddy’s DNS tool, you can’t CNAME the root of a domain (i.e. example.com). For this case I use one instance as a permanent instance with an elastic IP and point the root A record for my domains to this. I then assign www. as a CNAME for the load balancer’s AWS assigned domain. Last but not least, I use .htaccess and mod_rewrite to ensure requests are sent to www.example.com. This ensures traffic is being sent to the load balancer address.

Amazon Sweetens the Cloud Pot

Friday, March 26th, 2010

I was excited to learn that AWS has sweetened the pot for people who want to try out infrastructure as a service by eliminating bandwidth charges for the first 1 GB of outbound transfer each month. Furthermore, the aggregation of pricing across services means many smaller sites will get their bandwidth for FREE!

I strongly believe that if all you’re putting up for your website is static pages and a few photos, you could effectively use S3 as your lone hosting solution. Now that the bandwidth for these small sites would be zero, your charge is only the cost of storing the files, which would likely be less than $0.10 per month.

The email I received is below.

Dear AWS Customers,

Starting April 1, 2010, your Data Transfer Out pricing tier for a given Region will be based on your total Data Transfer Out usage within that Region for Amazon Simple Storage Service (Amazon S3), Amazon Elastic Compute Cloud (Amazon EC2), Amazon SimpleDB, Amazon Relational Database Service (Amazon RDS), Amazon Virtual Private Cloud (Amazon VPC), and Amazon Simple Queue Service (Amazon SQS). Until now, usage tiers have been calculated individually for each service, based on data transfer related to that service. Because AWS is now aggregating your total Data Transfer Out usage across multiple services, you can reach higher usage tiers and lower pricing more quickly. In addition, you’ll benefit from a complimentary tier which provides your first GB of outbound transfer in each Region each month at no charge.

The tiered pricing for Data Transfer Out is as follows for each Region:

First 1 GB of data transferred out per month is free
Remainder of first 10 TB per Month: $0.15 per GB
Next 40 TB per Month: $0.11 per GB
Next 100 TB per Month: $0.09 per GB
Over 150 TB per Month: $0.08 per GB

As you may know, all inbound data transfer is free of charge until June 30, 2010. All data transfer usage (both inbound and outbound) for participating Amazon Web Services now appears in aggregate in its own section of your AWS account activity page and monthly bill. As a bonus, you’ll notice that your first GB of outbound data transfer in each Region is now included free of charge.

As always, thank you for your support.

Sincerely,

The Amazon Web Services Team
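
To see what the tiers in that email mean for a given month, here is a rough calculator. It is a sketch under assumptions: the function name transfer_out_cost is hypothetical, it treats 1 TB as 1024 GB (the email does not say which definition applies), and it covers a single Region only.

```shell
# Estimate the monthly Data Transfer Out charge (USD) for a usage given in GB,
# using the per-Region tiers quoted in the email above.
transfer_out_cost() {
  awk -v gb="$1" 'BEGIN {
    tb = 1024                       # assumption: 1 TB = 1024 GB
    # Tier sizes in GB and price per GB, in the order listed in the email.
    size[1] = 1;            price[1] = 0.00
    size[2] = 10 * tb - 1;  price[2] = 0.15
    size[3] = 40 * tb;      price[3] = 0.11
    size[4] = 100 * tb;     price[4] = 0.09
    cost = 0
    for (i = 1; i <= 4 && gb > 0; i++) {
      used = (gb < size[i]) ? gb : size[i]
      cost += used * price[i]
      gb -= used
    }
    cost += gb * 0.08               # anything left is over 150 TB
    printf "%.2f\n", cost
  }'
}

transfer_out_cost 1     # a small site that stays inside the free first GB
transfer_out_cost 11    # 1 GB free plus 10 GB at $0.15
```

The two sample calls print 0.00 and 1.50, which is the "free for small sites" effect described above.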

Three Types of Clouds

Wednesday, March 3rd, 2010


Last night I attended CloudCamp in Minneapolis. While there was much healthy discussion about the “cloud”, one thing became crystal clear for me: the cloud means different things to different people. George Reese summed it up well: there are three distinct types of clouds, Infrastructure, Platform and Software. I took away from the discussions that this distinction wasn’t clear for many people (including me).

Infrastructure as a Service (IaaS):
Amazon and Rackspace are the two largest players in this space, but there are other solid offerings (including ReliaCloud) that compete with them. This is very similar in concept to leasing a dedicated server from an ISP, but with flexible pricing. Keith Schacht pointed out on my post about cloud pricing models that some providers are offering non-virtualized infrastructure on a per-hour basis. Is there any benefit to choosing a virtualized machine vs. a real machine? I think that goes beyond the scope of this discussion. Something companies need to be aware of here is that running infrastructure in the cloud doesn’t reduce the need for good system administrators and that in terms of architecture very little has changed. Developers in this tier still need to be concerned with system capacity etc. The upside is that many of these problems are well understood and the solutions for dealing with them are commonplace.

Platform as a Service (PaaS):
Salesforce and Google App Engine run platforms which you can build services on. These providers abstract everything away so that the product can become the focus. Designing products for platforms doesn’t require an in-depth understanding of the sub-systems. Developers don’t need to know if MySQL, Oracle, MS SQL Server or some other storage engine is handling the data storage layer; they can just trust that the data is being stored and retrieve it when they need it. Of course this model has limitations and anyone building a product would be well served to learn about the best way to leverage the platform efficiently. The drawbacks are obvious as well. Google is an extremely reliable provider; however, they do have downtime. It is also extremely difficult to migrate platforms. None of the vendors currently provide import/export style functionality for data.

Software as a Service (SaaS):
SaaS isn’t consumer offerings, such as those from 37signals. Those are products or applications. What I’m referring to is lower level software like MySQL, SQL Server, Amazon’s SQS and so on. Leveraging these services provides a unique opportunity to use the best solutions for each task, instead of a complete vendor lock-in. Developers interact with the tools and sub-systems they’re already familiar with. What the SaaS vendor does is abstract the management and scalability tasks. Unfortunately this has a dark side. Reliance on multiple providers requires building systems that degrade gracefully when any single sub-system is no longer available. Zynga, developers of the massively popular Facebook game Farmville, build their scalable systems in the cloud using the notion of degradable services: the solution is architected such that its dependence on external systems can be dialed back on demand. Building this into products up-front requires a different way of thinking about application design. Someone raised the point last night, during the breakout about architecting for the cloud, that these are the same problems that were being solved in the ’70s. Designing networks of loosely coupled systems is not a new problem. However, it is a problem that many developers I’ve met haven’t spent much time thinking about… yet.
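
To make "degrade gracefully" a little more concrete, here is a deliberately tiny sketch. The helper name with_fallback is hypothetical and is not how Zynga or anyone else named here does it; it only illustrates running a call to an external sub-system and serving a degraded default when that call fails.

```shell
# Run a command that depends on an external service. If it succeeds, print its
# output; if it fails, print the supplied fallback value instead.
with_fallback() {
  fallback=$1
  shift
  if output=$("$@" 2>/dev/null); then
    printf '%s\n' "$output"
  else
    printf '%s\n' "$fallback"
  fi
}

with_fallback "cached value" echo "live value"   # dependency is up
with_fallback "cached value" false               # dependency is down
```

The first call prints the live value and the second prints the cached one. In a real application the fallback would be a cached response or a feature switched off, not a fixed string.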

MySQL Relational Database Service on AWS

Tuesday, October 27th, 2009

The stable of services available through AWS is continuing to expand! Last night Amazon announced RDS (Relational Database Service), which looks a lot like EC2 instances running MySQL with EBS volumes, something I have a fair bit of experience with. However, these have the added benefit of being a service that can scale memory and processor both up and down with a single service call.

rds-modify-db-instance mydbinstance --db-instance-class db.m1.xlarge -s 100

This flexibility comes with a downside, namely a 4 hour monthly service window where patches, updates and those requested capacity changes are applied. You can choose to apply them immediately, but your application should be prepared to handle the downtime. What happens is, your database instance goes offline and when it comes back, it has all the changes you requested applied. So at best, you should expect uptime in the 99.4% range (4 hours out of a roughly 720 hour month). Most applications can handle a 4 hour downtime if it’s planned for. Under more conventional MySQL builds, developers or system administrators mitigate these downtimes by first applying changes to slaves, promoting one slave to master and then finally applying the changes to the original master. This sort of safety net gives applications smaller downtime windows (at most a few minutes each), allowing for theoretical 99.999% uptime.
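
The 99.4% figure is just arithmetic on the maintenance window. A small sketch, assuming a 30 day month by default (the function name uptime_percent is made up for illustration):

```shell
# Best-case monthly uptime, in percent, given planned downtime in hours.
# The second argument is the number of days in the month (default: 30).
uptime_percent() {
  awk -v down="$1" -v days="${2:-30}" \
    'BEGIN { printf "%.2f\n", 100 * (1 - down / (days * 24)) }'
}

uptime_percent 4    # the 4 hour RDS service window
```

This prints 99.44, which is where the "99.4% range" comes from.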

Transitioning to RDS may not be without pain either. Importing your data is done through a mysqldump (or other flat-file export) and then playing that file back into your AWS instance. Depending on the size of your dataset, a full mysqldump and re-import may take days (no, I’m not exaggerating). Also note that while mysqldump runs, your original database will acquire a read lock for consistency. With some databases I manage, I’ve stopped using mysqldump entirely because the dumps took more than 4 hours to complete on a dedicated slave. With the myriad of snapshotting technologies available, it’s much easier to grab a binary copy of the DB every few hours. One last limitation is that replication isn’t an option. I suspect AWS will be working on this soon as part of an HA (High Availability) release option.

Despite the limitations, I’m excited about this offering. This offloads much of the maintenance and management tasks which are usually the most tedious. I also hope that this means a higher IO disk subsystem may be coming to EBS soon.

Elastic Load Balancing in Multiple Zones

Thursday, July 30th, 2009

Ran into a problem this morning with Amazon’s Elastic Load Balancer. If you want to have multiple availability zones, say us-east-1a and us-east-1b, behind your elastic load balancer, be sure to have at least one healthy instance running in each. What happens otherwise is that inbound requests will “dead end” and serve up 503 errors. This is because the DNS actually resolves to each zone regardless of the health checks and before passing requests to the actual load balancer for that zone. In other words, the zones are unaware of the status of the machines in different zones. :(

From Paul@AWS on the Amazon Discussion Forum:

The output from your describe call shows that you have two zones enabled:

        <AvailabilityZones>
          <member>us-east-1b</member>
          <member>us-east-1a</member>
        </AvailabilityZones>

but you only have instances behind one of them. Whenever your client happens to get directed to the empty zone (which happens at DNS resolution time), it will have a dead-end.

The solution is to either add instances in the additional zone or disable that extra zone.

You can read the full thread that tipped me off this morning to the issue I was experiencing.
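
A check for this condition is easy to script once you have the two lists in hand. The sketch below assumes you have already collected the enabled zones and the zone of each registered instance yourself (how you gather them is up to you); the function name empty_zones is hypothetical.

```shell
# Print every enabled zone that has no instance in it.
#   $1: space-separated zones enabled on the load balancer
#   $2: space-separated zones of the instances registered behind it
empty_zones() {
  enabled=$1
  populated=$2
  for zone in $enabled; do
    case " $populated " in
      *" $zone "*) ;;                 # at least one instance lives here
      *) printf '%s\n' "$zone" ;;     # dead-end zone
    esac
  done
}

empty_zones "us-east-1b us-east-1a" "us-east-1a us-east-1a"
```

With the zones from the forum thread and instances only in us-east-1a, this prints us-east-1b, the zone to either populate or disable.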

3 Amazon Elastic Load Balancer Tips

Wednesday, July 15th, 2009

Getting running on Amazon’s Elastic Load Balancer is easy. Once you’re up, you’ll also need to monitor it and do some basic maintenance of your nodes. These tips should help you make the most of the Elastic Load Balancer and show you some simple ways to get the monitoring data you’ll need.

1. Configure Health Checks

If an instance is off or not responding, you will want the load balancer to stop sending requests to that instance ASAP. The following command will set up a check that polls your server every 5 seconds. Be warned, this functionality is not enabled by default!

elb-configure-healthcheck  ApplicationServers  --target "TCP:80" --interval 5 --timeout 3 --unhealthy-threshold 2 --healthy-threshold 2

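Once you have this set up, the load balancer will check that port 80 accepts TCP connections and will stop sending requests to any instance where it sees a problem. (A "TCP:80" target only tests the connection; use an HTTP target such as "HTTP:80/" if you want the check to require an actual HTTP response.) You can then check in on the status of your instances with the following command: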

elb-describe-instance-health ApplicationServers
INSTANCE-ID  i-12345678  InService
INSTANCE-ID  i-23456789  InService
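
If you want to alert on this instead of eyeballing it, the output is easy to filter. This is a sketch that assumes the three-column layout shown above and treats any state other than InService as a problem; the function name unhealthy_instances is made up.

```shell
# Read elb-describe-instance-health output on stdin and print the IDs of
# instances whose state is anything other than InService.
unhealthy_instances() {
  awk '$1 == "INSTANCE-ID" && $3 != "InService" { print $2 }'
}

# Hypothetical usage (commented out, since it needs the ELB tools and credentials):
#   elb-describe-instance-health ApplicationServers | unhealthy_instances
```

An empty result means every registered instance is in service.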

2. Clear Out Old Instances

If you are using auto scaling to automatically add the instances to the load balancer, you can probably skip this one. But if you are like me and add instances to the load balancer only after completing the startup scripts, you’ll need to periodically clean out any invalid instances. After running with elastic load balancer for a few weeks, I found I had extra instances registered with the load balancer that were no longer running. When an instance that registers itself is shut down by auto scaling, the load balancer isn’t updated. This is VERY important, because after a week, the instance id will likely have cycled through to someone else!

elb-deregister-instances-from-lb ApplicationServers --instances i-23456789,i-34567890
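
Working out which IDs to pass to that command is a set difference: registered instances minus running ones. A sketch, assuming you supply both lists yourself (the function name stale_instances is hypothetical):

```shell
# Print a comma-separated list of registered instance IDs that are no longer running,
# in the form the --instances option above expects.
#   $1: space-separated IDs registered with the load balancer
#   $2: space-separated IDs of instances that are actually running
stale_instances() {
  registered=$1
  running=$2
  stale=""
  for id in $registered; do
    case " $running " in
      *" $id "*) ;;                          # still running, keep it
      *) stale="${stale:+$stale,}$id" ;;     # gone, queue for deregistration
    esac
  done
  printf '%s\n' "$stale"
}

stale_instances "i-12345678 i-23456789 i-34567890" "i-12345678"
```

For the example IDs this prints i-23456789,i-34567890, matching the deregister command above.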

3. Check Monitoring Values

You already know about describing the instance health from when you set up the health check before. Now check out the CloudWatch monitoring tools. If you’re not using auto scaling, these values in combination with your own internal metrics will help determine when to add capacity. Spend some time with your log files and these metrics.

mon-list-metrics | grep "ApplicationServers"
 
HealthyHostCount    AWS/ELB  {LoadBalancerName=ApplicationServers}
Latency             AWS/ELB  {LoadBalancerName=ApplicationServers}
RequestCount        AWS/ELB  {LoadBalancerName=ApplicationServers}
UnHealthyHostCount  AWS/ELB  {LoadBalancerName=ApplicationServers}
 
mon-get-stats HealthyHostCount --statistics Average,Minimum,Maximum --dimensions "LoadBalancerName=ApplicationServers" --namespace "AWS/ELB" --period 600 --headers
 
Time                 Samples  Average  Minimum  Maximum  Unit
2009-07-16 03:47:00  98.0     2        2.0      2.0      Count
2009-07-16 03:57:00  103.0    2        2.0      2.0      Count
2009-07-16 04:07:00  98.0     2        2.0      2.0      Count
2009-07-16 04:17:00  98.0     2        2.0      2.0      Count
2009-07-16 04:27:00  99.0     2        2.0      2.0      Count
2009-07-16 04:37:00  98.0     2        2.0      2.0      Count
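
Those rows are simple to scan for trouble. The sketch below assumes the column layout shown above (date, time, samples, average, minimum, maximum, unit) and prints the periods where the minimum fell below a threshold you choose; the function name periods_below is made up.

```shell
# Read mon-get-stats output on stdin and print the timestamp of every period
# whose Minimum column is below the given threshold. Header lines are skipped
# because they do not start with a date.
periods_below() {
  awk -v min="$1" \
    '$1 ~ /^[0-9][0-9][0-9][0-9]-/ && $5 + 0 < min + 0 { print $1, $2 }'
}

# Hypothetical usage (commented out, since it needs the CloudWatch tools):
#   mon-get-stats HealthyHostCount --statistics Average,Minimum,Maximum \
#     --dimensions "LoadBalancerName=ApplicationServers" --namespace "AWS/ELB" \
#     --period 600 --headers | periods_below 2
```

With a threshold of 2, the sample output above produces no lines, since every period had two healthy hosts.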

Scaling Out with EC2, CloudWatch, Auto Scaling and Elastic Load Balancing

Thursday, July 9th, 2009

Earlier this year, Amazon launched a suite of new services that replaced the need to work with a product like Scalr or RightScale for building scalable applications on the EC2 platform. Those tools help you allocate more resources according to current application load. The key benefit of using a cloud based service is that you only pay for what you use. However, without one of the aforementioned providers, and their additional costs, you were left in the lurch when designing a system that could detect the current load of your infrastructure and respond accordingly. Amazon has now made it very simple to create infrastructure that can expand AND contract, of course only paying for what you use.

The Tools

Elastic Load Balancer

Solutions for load balancing were as varied as round robin DNS and running a load balancer on an instance (I’d been using Nginx on an m1.small instance at $0.10/hr). Nginx worked well, with an assigned elastic IP (static IP) that could move from machine to machine as needed and special scripts to manage the pool of servers (or do it manually). It worked, but was by no means efficient or even easy to maintain. Furthermore, there is a single point of failure with the Nginx host. Being proactive, it was possible to create a monitoring system to monitor Nginx, and then bring up and configure a new server before re-directing the elastic IP to the new host should it fail. It was a hack and certainly not elegant!

Enter elastic load balancing. You create an elastic load balancer and then add the instances to the load balancer. That’s it! Amazon handles the redundancy, and the best part is that it’s only $0.025/hr. That’s a savings of $54/month over running a load balancer instance. There is of course a drawback. With Nginx and other load balancers, you have the option to do intelligent load balancing. Advanced functionality like sticky sessions and response rewriting isn’t available for the Amazon solution. However, with a well designed application, this should be irrelevant.
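
The $54 figure follows from the two hourly prices. A quick sketch of the arithmetic, assuming a 720 hour (30 day) month; the function name monthly_savings is made up for illustration.

```shell
# Monthly savings (USD) from switching between two hourly prices.
#   $1: old hourly price, $2: new hourly price, $3: hours per month (default: 720)
monthly_savings() {
  awk -v old="$1" -v new="$2" -v hours="${3:-720}" \
    'BEGIN { printf "%.2f\n", (old - new) * hours }'
}

monthly_savings 0.10 0.025   # m1.small running Nginx vs. the load balancer
```

This prints 54.00. It leaves out data transfer and any other per-request charges, so treat it as the instance-hour difference only.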

CloudWatch

Monitoring the cloud is VERY important. Amazon has issues with all sorts of things from EBS stores going offline to instances being completely unavailable. Before CloudWatch, I used a mix of systems including SNMP monitoring and the 3rd party service Pingdom to keep tabs on my instances. The CloudWatch product doesn’t replace these, but rather supplements the data I gather from them. CloudWatch costs an additional $0.015 per server per hour above the default instance cost. It takes about 2 minutes to come online and the statistics are available through the API almost immediately after that. CloudWatch provides access to a monitored instance’s CPU utilization, disk read bytes, disk read operations, disk write bytes, disk write operations, network in, and network out. I find for my needs, CPU utilization is an excellent indicator of server performance and I use that to determine when to add a new server or take one away.

You can gather these statistics grouped by AMI, Instance Id, instance type and even AutoScaling group. If you can reliably detect your need to add an additional server based on these statistics, you’ll be able to take advantage of Auto Scaling; more on that in a minute. If not, it’s very simple to write a script that determines if it’s time to start a new server up to help with processing and register it with the load balancer. Oh, and did I mention for the load balancer you also get access to healthy host count, latency, request count, and unhealthy host count? These could be helpful metrics for rolling your own scaling scripts or may be sufficient for knowing when you need an additional server.

Auto Scaling

This is the glue that brings it all together. Auto Scaling monitors your statistics from CloudWatch and starts new instances when needed, then turns them back off when no longer needed. Currently this is all offered for FREE if you are using CloudWatch! The setup is simple once you go through it the first time, but took me a couple of tries to get it right. So in my case, I monitor my application server pool and when I see that it’s stressed, I add another server. Because of the way it’s configured, I have some safeguards in place that keep me from starting thousands of instances too.

How To Do It

Background

This assumes you’ve installed all the Amazon CLI tools for Elastic Load Balancing, Auto Scaling and CloudWatch, you’re fairly comfortable at the command line and you know how to make your own AMI. Now, you’ll need to determine the best way for you to publish your code to a new server. Some possible solutions to this are rsync, subversion, nfs mount, s3 or a mix of technologies. Some folks just bundle up the code in their AMI (works well if your codebase is static). Regardless, that’s a bit beyond the scope of this post. After you create your solution, you’ll create a server image (AMI) that can boot up and correctly get a copy of the code you’re running. If you already have that, you can of course just use that one. Once you create an instance that can be turned on and handle traffic…

The Process

  1. Create the Load Balancer
  2. Create the Auto Scale Launch Config
  3. Create the Auto Scale Group
  4. Create the Auto Scale Trigger(s)

Create the Load Balancer

The DNS-NAME that is returned is the point you’ll direct all traffic to. Add this as a CNAME in your DNS for your domain.

elb-create-lb ApplicationServer --availability-zones us-east-1a --listener "protocol=http,lb-port=80,instance-port=80"
DNS-NAME ApplicationServer-12345678.us-east-1.elb.amazonaws.com

Create the Auto Scale Launch Config

The AMI will of course be your AMI that knows how to come online and get a fresh copy of your code, and you may be using different instance types. Definitely take a look over the documentation to ensure you are doing it all right.

as-create-launch-config AppServerConfig --image-id ami-12345678 --instance-type m1.small --group default

Create the Auto Scale Group

I use a nice long cooldown period here (10 minutes) so that the servers don’t come online or go offline too quickly. If you expect an occasional slashdotting, you might want this to be shorter. The group also sets a boundary: there will always be at least 1 server and no more than 3. This also tells auto scaling that you want the new instance to join the load balancer.

as-create-auto-scaling-group AppServerGroup --launch-configuration AppServerConfig --availability-zones us-east-1a --min-size 1 --max-size 3 --cooldown 600 --load-balancers ApplicationServer

Create the Auto Scale Trigger(s)

You will likely spend a good bit of time working on this portion. In short: if the average CPU utilization for my servers is above 70% for 10 minutes, bring a new server online. Likewise, if it falls below 30% for 10 minutes, turn one off. The Auto Scaling Group we created ensures there is always at least 1 server online.

as-create-or-update-trigger AppServerTrigger --auto-scaling-group AppServerGroup --namespace "AWS/EC2" --measure CPUUtilization --statistic Average --dimensions "AutoScalingGroupName=AppServerGroup" --period 60 --lower-threshold 30 --upper-threshold 70 --lower-breach-increment=-1 --upper-breach-increment 1 --breach-duration 600
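
If it helps to reason about what that trigger will do, the decision can be modeled in a few lines. This is only a sketch of the rule as described in this post, not of Auto Scaling’s actual internals: it assumes the CPU figure is an integer percentage that has already held for the full breach duration, and the function name next_group_size is hypothetical.

```shell
# Given the sustained average CPU (integer percent) and the current group size,
# print the group size the trigger above would move to.
next_group_size() {
  cpu=$1
  size=$2
  lower=30; upper=70    # --lower-threshold / --upper-threshold
  min=1; max=3          # --min-size / --max-size from the group
  if [ "$cpu" -gt "$upper" ] && [ "$size" -lt "$max" ]; then
    size=$((size + 1))  # --upper-breach-increment 1
  elif [ "$cpu" -lt "$lower" ] && [ "$size" -gt "$min" ]; then
    size=$((size - 1))  # --lower-breach-increment=-1
  fi
  echo "$size"
}

next_group_size 85 1   # stressed single server: grows to 2
next_group_size 10 1   # idle, but already at the minimum: stays at 1
```

Walking a few load patterns through a model like this is a cheap way to sanity check thresholds before paying for real instances.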

That is all there is to it! You now have a system that can grow your application servers on demand! I hope this helps you build out an infrastructure that lets you scale up your next web application. You might want to look over the command line tool documentation before getting started.

Exploring Amazon CloudFront

Sunday, July 5th, 2009

A few weeks ago I switched Sexii to use Amazon’s CloudFront content delivery network (CDN) instead of serving images through my own Apache server. The decision came after watching my EC2 instance get slammed through a couple of high volume peaks. I crunched my logs and found that over 80% of the traffic I had was serving images! Adding a new instance for handling images exclusively would certainly solve the problem, but at ~$75/month for a new instance, I figured there was probably a better way. My ultimate goal here is to increase the potential work my infrastructure can do per instance hour, not necessarily to save costs, but I don’t want to spend more than I need to.

Setting Up

Setting up CloudFront is like any AWS service, agree to the terms and costs and get going. There’s a full API for interacting with it, but rather than learn another new API, I opted for the point and click features of S3 Organizer. If you’re unfamiliar with the product, it gives you Finder or Explorer like functionality for managing the data you store on the Amazon S3 platform.

I started by creating a unique bucket in S3 and uploading my images. Now with my content happily hosted on S3, I actually had a CDN of sorts. I could have set the access control list (ACL) to allow anonymous reads for that bucket then updated my code accordingly and stopped there. S3 would handle the serving of the content and my server is free to do other tasks. Some popular websites are already doing this; for example Twitter serves your profile picture from S3. This actually accomplishes my primary goal, reducing the stress on my server. But, being the curious geek that I am, I wanted to try out the full blown CDN and with the pay per use pricing model, it was low risk.

With the bucket set up, S3 Fox was able to create the distribution with a couple of clicks (literally). What surprised me was how long it actually took (10 minutes) for the distribution to come online. Since my image URLs are assembled on the fly in code, I updated my configuration file to reflect the new source and I was live. I did run into one problem: the URLs for images had to be syntactically correct and I was bitten by this. By default, Apache is more forgiving and just ignores the double forward slash.

Works:
<img src="http://a5e3px8iw78h4.cloudfront.net/images/logo.png" />
Doesn't:
<img src="http://a5e3px8iw78h4.cloudfront.net/images//logo.png" />
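
If your URLs are assembled from pieces the way mine are, one defensive option is to collapse repeated slashes before emitting them. This is a sketch, not what I actually deployed, and the function name collapse_slashes is made up; it leaves the double slash after the scheme alone.

```shell
# Collapse runs of slashes in a URL path into a single slash,
# without touching the "//" that follows "http:".
collapse_slashes() {
  printf '%s\n' "$1" | sed 's#\([^:/]\)//*#\1/#g'
}

collapse_slashes "http://a5e3px8iw78h4.cloudfront.net/images//logo.png"
```

This prints http://a5e3px8iw78h4.cloudfront.net/images/logo.png, the form that works on CloudFront.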

A nice feature, that I didn’t use, is the ability to map your own domain name to the delivery network, effectively masking a name like http://a5e3px8iw78h4.cloudfront.net with http://mycdn.example.com if you want. This wasn’t critical for me and so I skipped it; besides, Amazon’s DNS servers are likely much more robust than mine.

Costs

Before dissecting the costs, it’s important to understand the content I moved. Sexii doesn’t actually have many images. It relies heavily on CSS and the layout as the design. It was done this way to eliminate hosting costs early on. MySpace applications that have home and profile surfaces incur a very high traffic volume with a low return on the investment. The assets hosted by the application are a set of small icons used in a feature that facilitates flirting. The feature shows a grid of around 30 icons that the user can attach to a flirt. Each icon is a separate image file and all of these icons are shown at the same time, resulting in about 30 requests for <1k images. See below for the consequences of this design mistake.

Costs are cumulative across the AWS platform and so are a bit difficult to break down on a per application basis, but I'm going to try to give you an overview of what the change to CloudFront cost me and where I can potentially save more money. I've skipped over the costs of uploading and hosting on S3 as they're insignificant for my data and usage pattern, less than $0.08/month.

CloudFront Requests By Country

During the roughly 4 1/2 day window analyzed, the cost for CloudFront was $6.44, roughly $1.46/day. I served over 5.4 million images, or 14.3 images per second, to the 4 regions. If I had built a dedicated server and configured my site to use that for all images, it would have cost me around $11.44 including bandwidth for the same time period. The largest portion of the cost for me was the overall number of requests. By rewriting the icon code to use CSS positioning on a single combined icon image, I expect to reduce the request count more than 90% (-$4.95 or -$1.12/day). The bandwidth number should stay about the same because the larger icon file transfers the same amount of information as the 30 independent requests.

CloudFront Cost Breakdown

For those of you who are curious, the Japanese traffic was the smallest portion at 0.11% and Hong Kong was 0.16%.

Where Next?

Move more to CloudFront!

The first step, after reducing the number of requests, is to move externally loaded CSS and JavaScript files. These are no-brainers (and are even suggested on the CloudFront site). Taking it a bit further, I’m considering moving static advertising iframe files, common in Social Network based applications. I can continue to maintain my generic demographic targeting by creating a series of files that include the appropriate demographic distinctions: skyscraper_m_18.html, skyscraper_f_18.html, skyscraper_m_25.html, skyscraper_f_25.html and so on. I’ll be looking to see if this can be better handled through S3 alone to keep the costs of these low value CPM/CPC ads lower. MySpace hosts the actual application code on their servers, but sites like Hi5 and Orkut do not. I would strongly suggest moving your application.xml files to a CDN and only running the ajax response files on your application servers.

Where would it all end?
It’s conceivable you could host an entire brochureware site on CloudFront but that may be taking it too far.

Conclusion

I’m pleased with CloudFront. I’ve definitely increased the capacity of the application server which is especially useful during peak times. It also exposed a design weakness, a sad but important lesson to learn. This is a GREAT way for companies to get the speed benefits of a CDN and increase the processing capacity of an application server.

© 1998-2008 AF-Design, All rights reserved.