Archive for the ‘hosting’ Category

Three Types of Clouds

Wednesday, March 3rd, 2010


Last night I attended CloudCamp in Minneapolis. While there was much healthy discussion about the “cloud”, one thing became crystal clear for me: the cloud means different things to different people. George Reese summed it up well by naming three distinct types of clouds: Infrastructure, Platform and Software. I took away from the discussions that this distinction wasn’t clear for many people (including myself).

Infrastructure as a Service (IaaS):
Amazon and Rackspace are the two largest players in this space, but there are other solid offerings (including ReliaCloud) that compete with them. This is very similar in concept to leasing a dedicated server from an ISP, but with flexible pricing. Keith Schacht pointed out on my post about cloud pricing models that some providers are offering non-virtualized infrastructure on a per-hour basis. Is there any benefit to choosing a virtualized machine vs. a real machine? I think that goes beyond the scope of this discussion. Something companies need to be aware of here is that running infrastructure in the cloud doesn’t reduce the need for good system administrators, and that in terms of architecture very little has changed. Developers in this tier still need to be concerned with system capacity and similar issues. The upside is that many of these problems are well understood and the solutions for dealing with them are commonplace.

Platform as a Service (PaaS):
Salesforce and Google App Engine run platforms which you can build services on. These providers abstract everything away so that the product can become the focus. Designing products for platforms doesn’t require an in-depth understanding of the sub-systems. Developers don’t need to know whether MySQL, Oracle, MS SQL Server or some other storage engine is handling the data storage layer; they can just trust that the data is being stored and retrieve it when they need it. Of course this model has limitations, and anyone building a product would be well served to learn about the best way to leverage the platform efficiently. The drawbacks are obvious as well. Google is an extremely reliable provider; however, they do have downtime. It is also extremely difficult to migrate between platforms. None of the vendors currently provide import/export style functionality for data.

Software as a Service (SaaS):
By SaaS I don’t mean consumer offerings, such as those from 37signals. Those are products or applications. What I’m referring to is lower-level software like MySQL, SQL Server, Amazon’s SQS and so on. Leveraging these services provides a unique opportunity to use the best solution for each task, instead of accepting complete vendor lock-in. Developers interact with the tools and sub-systems they’re already familiar with. What the SaaS vendor does is abstract the management and scalability tasks. Unfortunately this has a dark side. Reliance on multiple providers requires building systems that degrade gracefully when any single sub-system is no longer available. Zynga, developers of the massively popular Facebook game Farmville, build their scalable systems in the cloud using the notion of degradable services: the solution is architected so that its dependence on external systems can be dialed back on demand. Building this into products up-front requires a different way of thinking about application design. Someone raised the point last night, during the breakout about architecting for the cloud, that these are the same problems that were being solved in the ’70s. Designing networks of loosely coupled systems is not a new problem. However, it is a problem that many developers I’ve met haven’t spent much time thinking about… yet.
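
To make the idea of degradable services concrete, here is a minimal sketch in Python. It is not how Zynga does it; the helper name call_with_fallback, the enabled switch and the fetch_friend_feed dependency are all hypothetical, invented for illustration. The point is only that each external call has a cheap fallback and a switch that lets you dial the dependency back.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_fallback(primary: Callable[[], T], fallback: T, enabled: bool = True) -> T:
    """Return primary() when the dependency is enabled and healthy, else the fallback."""
    if not enabled:
        # The dependency has been dialed back on purpose (e.g. during a load spike).
        return fallback
    try:
        return primary()
    except Exception:
        # The dependency is unavailable; degrade instead of failing the whole page.
        return fallback


def fetch_friend_feed() -> list:
    # Hypothetical external service that happens to be down right now.
    raise ConnectionError("feed service unreachable")


feed = call_with_fallback(fetch_friend_feed, fallback=[])
print(feed)  # []
```

The page still renders with an empty feed, which is usually a better outcome than an error page.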

MySQL Relational Database Service on AWS

Tuesday, October 27th, 2009

The stable of services available through AWS is continuing to expand! Last night Amazon announced RDS (Relational Database Service), which looks a lot like EC2 instances running MySQL with EBS volumes – something I have a fair bit of experience with. However, these have the added benefit of being a service that can scale memory and processor both up and down with a single service call.

# rds-modify-db-instance mydbinstance --db-instance-class db.m1.xlarge -s 100

This flexibility comes with a downside, namely a 4 hour monthly service window where patches, updates and those requested capacity changes are applied. You can choose to apply them immediately, but your application should be prepared to handle the downtime. What happens is, your database instance goes offline and when it comes back, it has all the changes you requested applied. So at best, you should expect uptime in the 99.4% range. Most applications can handle a 4 hour downtime if it’s planned for. Under more conventional MySQL builds, developers or system administrators will mitigate these downtimes by first applying changes to slaves, promoting one slave to master and then finally applying the changes to the original master. This sort of safety net gives applications smaller downtime windows (at most a few minutes each), allowing for a theoretical 99.999% uptime.
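
The uptime figures above are simple arithmetic. A quick sketch, assuming a 30-day month and that the full 4 hour window is actually used as downtime every month (both are my assumptions, not anything Amazon states):

```python
def uptime_percent(downtime_hours: float, period_hours: float) -> float:
    """Percentage of the period during which the service is available."""
    return 100.0 * (1.0 - downtime_hours / period_hours)


HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month

# Worst case for the RDS window: 4 hours offline every month.
monthly_window = uptime_percent(4, HOURS_PER_MONTH)
print(f"{monthly_window:.2f}%")  # 99.44%

# For comparison, the downtime budget that 99.999% uptime allows in a year.
five_nines_minutes_per_year = (1 - 0.99999) * 365 * 24 * 60
print(f"{five_nines_minutes_per_year:.2f} minutes/year")  # 5.26 minutes/year
```

In other words, one fully used service window costs more downtime than five nines allows in a year.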

Transitioning to RDS may not be without pain either. Importing your data is done through a mysqldump (or other flat-file export) and then playing that file back into your AWS instance. Depending on the size of your dataset, a full mysqldump and re-import may take days (no, I’m not exaggerating). Also note that while mysqldump runs, your original database will hold a read lock for consistency. With some DBs I manage, I’ve stopped using mysqldump entirely because the dumps took more than 4 hours to complete on a dedicated slave. With the myriad of snapshotting technologies available, it’s much easier to grab a binary copy of the DB every few hours. One last limitation is that replication isn’t an option. I suspect AWS will be working on this soon as part of an HA (High Availability) release option.

Despite the limitations, I’m excited about this offering. This offloads much of the maintenance and management tasks which are usually the most tedious. I also hope that this means a higher IO disk subsystem may be coming to EBS soon.

Elastic Load Balancing in Multiple Zones

Thursday, July 30th, 2009

Ran into a problem this morning with Amazon’s Elastic Load Balancer. If you want to have multiple availability zones, say us-east-1a and us-east-1b, behind your elastic load balancer, be sure to have at least one healthy instance running in each. Otherwise, inbound requests will “dead end” and serve up 503 errors. This is because DNS resolves to each zone regardless of the health checks, before the request is passed to the actual load balancer for that zone. In other words, the zones are unaware of the status of the machines in different zones. :(

From Paul@AWS on the Amazon Discussion Forum:

The output from your describe call shows that you have two zones enabled:

        <AvailabilityZones>
          <member>us-east-1b</member>
          <member>us-east-1a</member>
        </AvailabilityZones>

but you only have instances behind one of them. Whenever your client happens to get directed to the empty zone (which happens at DNS resolution time), it will have a dead-end.

The solution is to either add instances in the additional zone or disable that extra zone.
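
A check like this is easy to automate. Below is a minimal sketch that takes the list of enabled zones and a map of instances, and reports any zone with no healthy instance behind it. The function name empty_zones and the shape of the instances dictionary are my own assumptions; you would fill them from your describe calls.

```python
def empty_zones(enabled_zones, instances):
    """Return enabled zones that have no InService instance behind them.

    instances maps an instance id to a (zone, state) tuple.
    """
    covered = {zone for zone, state in instances.values() if state == "InService"}
    return sorted(set(enabled_zones) - covered)


# The situation from the thread: two zones enabled, instances in only one.
enabled = ["us-east-1b", "us-east-1a"]
instances = {
    "i-12345678": ("us-east-1a", "InService"),
    "i-23456789": ("us-east-1a", "InService"),
}
print(empty_zones(enabled, instances))  # ['us-east-1b']
```

Any zone this prints is a dead end waiting to happen: either add an instance there or disable the zone.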

You can read the full thread that tipped me off this morning to the issue I was experiencing.

3 Amazon Elastic Load Balancer Tips

Wednesday, July 15th, 2009

Getting up and running on Amazon’s Elastic Load Balancer is easy. Once you’re up, you’ll also need to monitor it and do some basic maintenance of your nodes. These tips should help you make the most of the Elastic Load Balancer and show you some simple ways to get the monitoring data you’ll need.

1. Configure Health Checks

If an instance is off or not responding, you will want the load balancer to stop sending requests to it ASAP. The following command will set up a check that polls your server every 5 seconds. Be warned, this functionality is not enabled by default!

elb-configure-healthcheck  ApplicationServers  --target "TCP:80" --interval 5 --timeout 3 --unhealthy-threshold 2 --healthy-threshold 2

Once you have this set up, the load balancer will check that port 80 accepts TCP connections and will stop sending requests to any instance that fails the check. Note that a "TCP:80" target only confirms that the port is open; it does not issue an HTTP request. You can then check in on the status of your instances with the following command:

elb-describe-instance-health ApplicationServers
INSTANCE-ID  i-12345678  InService
INSTANCE-ID  i-23456789  InService
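
To get a feel for what the two threshold options do, here is a small simulation of the health check bookkeeping. This is only my mental model, not Amazon’s implementation: an instance flips state after the configured number of consecutive results that disagree with its current state. The function name track_health is made up.

```python
def track_health(results, unhealthy_threshold=2, healthy_threshold=2, start="InService"):
    """Replay a sequence of check results (True = passed) and return the state after each."""
    state = start
    streak = 0  # consecutive results that disagree with the current state
    history = []
    for passed in results:
        disagrees = (state == "InService") != passed
        streak = streak + 1 if disagrees else 0
        needed = unhealthy_threshold if state == "InService" else healthy_threshold
        if streak >= needed:
            state = "OutOfService" if state == "InService" else "InService"
            streak = 0
        history.append(state)
    return history


# One pass, two failures, then two passes.
print(track_health([True, False, False, True, True]))
# ['InService', 'InService', 'OutOfService', 'OutOfService', 'InService']
```

With a 5 second interval and an unhealthy threshold of 2, that works out to roughly 10 seconds before a dead instance stops receiving traffic under this model.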

2. Clear Out Old Instances

If you are using auto scaling to automatically add the instances to the load balancer, you can probably skip this one. But if you are like me and add instances to the load balancer only after completing the startup scripts, you’ll need to periodically clean out any invalid instances. After running with the Elastic Load Balancer for a few weeks, I found I had extra instances registered with the load balancer that were no longer running. When an instance that registers itself is shut down by auto scaling, the load balancer isn’t updated. This is VERY important, because after a week, the instance id will likely have cycled through to someone else!

elb-deregister-instances-from-lb ApplicationServers --instances i-23456789,i-34567890
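
Finding the stale ids by hand gets old quickly. Here is a minimal sketch that parses output in the shape shown in tip 1 and compares it to the instances you know are running. The running set is hard-coded here as an assumption; in practice you would build it from your own instance listing.

```python
def registered_ids(describe_output: str) -> set:
    """Collect instance ids from elb-describe-instance-health style output."""
    ids = set()
    for line in describe_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "INSTANCE-ID":
            ids.add(fields[1])
    return ids


# Example output; the second and third instances no longer exist.
output = """INSTANCE-ID  i-12345678  InService
INSTANCE-ID  i-23456789  OutOfService
INSTANCE-ID  i-34567890  OutOfService"""

running = {"i-12345678"}  # assumption: ids of instances you know are running
stale = sorted(registered_ids(output) - running)
print(",".join(stale))  # i-23456789,i-34567890
```

The printed list is already in the comma-separated form that the --instances option above expects.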

3. Check Monitoring Values

You already know about describing the instance health from when you set up the health check before. Now check out the CloudWatch monitoring tools. If you’re not using auto scaling, these values in combination with your own internal metrics will help determine when to add capacity. Spend some time with your log files and these metrics.

mon-list-metrics | grep "ApplicationServers"
 
HealthyHostCount    AWS/ELB  {LoadBalancerName=ApplicationServers}
Latency             AWS/ELB  {LoadBalancerName=ApplicationServers}
RequestCount        AWS/ELB  {LoadBalancerName=ApplicationServers}
UnHealthyHostCount  AWS/ELB  {LoadBalancerName=ApplicationServers}
 
mon-get-stats HealthyHostCount --statistics Average,Minimum,Maximum --dimensions "LoadBalancerName=ApplicationServers" --namespace "AWS/ELB" --period 600 --headers
 
Time                 Samples  Average  Minimum  Maximum  Unit
2009-07-16 03:47:00  98.0     2        2.0      2.0      Count
2009-07-16 03:57:00  103.0    2        2.0      2.0      Count
2009-07-16 04:07:00  98.0     2        2.0      2.0      Count
2009-07-16 04:17:00  98.0     2        2.0      2.0      Count
2009-07-16 04:27:00  99.0     2        2.0      2.0      Count
2009-07-16 04:37:00  98.0     2        2.0      2.0      Count
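
If you want to act on these numbers rather than eyeball them, the output is easy to parse. A rough sketch, assuming the whitespace-separated layout shown above (with --headers); the function names are mine, and I changed one Minimum value in the sample to 1.0 so there is something to find.

```python
def parse_stats(text):
    """Parse mon-get-stats style output (with a header row) into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    columns = lines[0].split()[1:]  # every column after "Time"
    rows = []
    for line in lines[1:]:
        date, clock, *values = line.split()  # the timestamp is two tokens
        row = dict(zip(columns, values))
        row["Time"] = f"{date} {clock}"
        rows.append(row)
    return rows


def periods_below(rows, expected_hosts):
    """Timestamps where the minimum healthy host count dipped below expected_hosts."""
    return [row["Time"] for row in rows if float(row["Minimum"]) < expected_hosts]


# Sample data; the 1.0 in the second row is altered for illustration.
sample = """Time                 Samples  Average  Minimum  Maximum  Unit
2009-07-16 03:47:00  98.0     2        2.0      2.0      Count
2009-07-16 03:57:00  103.0    2        1.0      2.0      Count"""

print(periods_below(parse_stats(sample), expected_hosts=2))
# ['2009-07-16 03:57:00']
```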

Scaling Out with EC2, CloudWatch, Auto Scaling and Elastic Load Balancing

Thursday, July 9th, 2009

Earlier this year, Amazon launched a suite of new services that replaced the need to work with a product like Scalr or RightScale for building scalable applications on the EC2 platform. Those tools help you allocate more resources according to current application load. The key benefit of using a cloud based service is that you only pay for what you use. However, without one of the aforementioned providers, and their additional costs, you were left in the lurch when designing a system that could detect the current load of your infrastructure and respond accordingly. Amazon has now made it very simple to create infrastructure that can expand AND contract, while still only paying for what you use.

The Tools

Elastic Load Balancer

Solutions for load balancing ranged from round-robin DNS to running a load balancer on an instance (I’d been using Nginx on an m1.small instance at $0.10/hr). Nginx worked well, with an assigned elastic IP (static IP) that could move from machine to machine as needed and special scripts to manage the pool of servers (or you could do it manually). It worked, but was by no means efficient or even easy to maintain. Furthermore, the Nginx host is a single point of failure. Being proactive, it was possible to create a monitoring system to watch Nginx and, should it fail, bring up and configure a new server before re-directing the elastic IP to the new host. It was a hack and certainly not elegant!

Enter elastic load balancing. You create an elastic load balancer and then add the instances to the load balancer. That’s it! Amazon handles the redundancy, and the best part is that it’s only $0.025/hr; that’s a savings of $54/month over running a load balancer instance. There is of course a drawback. With Nginx and other load balancers, you have the option to do intelligent load balancing. Advanced functionality like sticky sessions and response rewriting isn’t available for the Amazon solution. However, with a well designed application, this should be irrelevant.
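
For anyone checking my math on the savings, here it is spelled out. It assumes a 30-day month and compares only the hourly rates quoted above; any data transfer charges on either side are ignored.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month

nginx_instance_rate = 0.10  # $/hr for the m1.small running Nginx
elb_rate = 0.025            # $/hr for the elastic load balancer

monthly_savings = (nginx_instance_rate - elb_rate) * HOURS_PER_MONTH
print(f"${monthly_savings:.2f}/month")  # $54.00/month
```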

CloudWatch

Monitoring the cloud is VERY important. Amazon has issues with all sorts of things, from EBS stores going offline to instances being completely unavailable. Before CloudWatch, I used a mix of systems including SNMP monitoring and the 3rd party service Pingdom to keep tabs on my instances. The CloudWatch product doesn’t replace these, but rather supplements the data I gather from them. CloudWatch is an additional $0.015 per server per hour above the default instance cost. It takes about 2 minutes to come online and the statistics are available through the API almost immediately after that. CloudWatch provides access to a monitored instance’s CPU utilization, disk read bytes, disk read operations, disk write bytes, disk write operations, network in, and network out. I find for my needs, CPU utilization is an excellent indicator of server performance and I use that to determine when to add a new server or take one away.

You can gather these statistics grouped by AMI, Instance Id, instance type and even AutoScaling group. If you can reliably detect your need to add an additional server based on these statistics, you’ll be able to take advantage of Auto Scaling; more on that in a minute. If not, it’s very simple to write a script that determines if it’s time to start a new server up to help with processing and register it with the load balancer. Oh, and did I mention for the load balancer you also get access to healthy host count, latency, request count, and unhealthy host count? These could be helpful metrics for rolling your own scaling scripts or may be sufficient for knowing when you need an additional server.

Auto Scaling

This is the glue that brings it all together. Auto Scaling monitors your statistics from CloudWatch, starts new instances when needed, then turns them back off when they are no longer needed. Currently this is all offered for FREE if you are using CloudWatch! The setup is simple once you go through it the first time, but it took me a couple of tries to get it right. So in my case, I monitor my application server pool and when I see that it’s stressed, I add another server. Because of the way it’s configured, I have some safeguards in place that keep me from starting thousands of instances too.

How To Do It

Background

This assumes you’ve installed all the Amazon CLI tools for Elastic Load Balancing, Auto Scaling and CloudWatch, you’re fairly comfortable at the command line and you know how to make your own AMI. Now, you’ll need to determine the best way for you to publish your code to a new server. Some possible solutions are rsync, subversion, an NFS mount, S3 or a mix of technologies. Some folks just bundle up the code in their AMI (works well if your codebase is static). Regardless, that’s a bit beyond the scope of this post. After you create your solution, you’ll create a server image (AMI) that can boot up and correctly get a copy of the code you’re running. If you already have that, you can of course just use that one. Once you create an instance that can be turned on and handle traffic…

The Process

  1. Create the Load Balancer
  2. Create the Auto Scale Launch Config
  3. Create the Auto Scale Group
  4. Create the Auto Scale Trigger(s)

Create the Load Balancer

The DNS-NAME that is returned is the point you’ll direct all traffic to. Add this as a CNAME in your DNS for your domain.

elb-create-lb ApplicationServer --availability-zones us-east-1a --listener "protocol=http,lb-port=80,instance-port=80"
DNS-NAME ApplicationServer-12345678.us-east-1.elb.amazonaws.com

Create the Auto Scale Launch Config

The AMI will of course be your AMI that knows how to come online and get a fresh copy of your code, and you may be using a different instance type. Definitely take a look over the documentation to ensure you are doing it all right.

as-create-launch-config AppServerConfig --image-id ami-12345678 --instance-type m1.small --group default

Create the Auto Scale Group

I use a nice long cooldown period here (10 minutes) so that the servers don’t come online or go offline too quickly. If you expect an occasional slashdotting, you might want this to be shorter. This also provides a boundary: there will always be at least 1 server and no more than 3. It also tells auto scaling that you want the new instance to join the load balancer.

as-create-auto-scaling-group AppServerGroup --launch-configuration AppServerConfig --availability-zones us-east-1a --min-size 1 --max-size 3 --cooldown 600 --load-balancers ApplicationServer

Create the Auto Scale Trigger(s)

You will likely spend a good bit of time working on this portion. With the trigger below, if the average CPU utilization for my servers stays above 70% for 10 minutes, a new server is brought online. Likewise, if it stays below 30% for 10 minutes, one is turned off. The Auto Scaling Group we created ensures there is always at least 1 server online.

as-create-or-update-trigger AppServerTrigger --auto-scaling-group AppServerGroup --namespace "AWS/EC2" --measure CPUUtilization --statistic Average --dimensions "AutoScalingGroupName=AppServerGroup" --period 60 --lower-threshold 30 --upper-threshold 70 --lower-breach-increment=-1 --upper-breach-increment 1 --breach-duration 600
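
Before committing to thresholds, it can help to replay some CPU numbers through a toy model of the trigger. The sketch below is a deliberately simplified model and almost certainly not how Amazon evaluates triggers internally: one CPU sample per period, a breach counted once the threshold has been crossed for the whole breach duration, and a cooldown after every change. The function name simulate and all its defaults mirror the commands above but are otherwise my own invention.

```python
def simulate(cpu_samples, size=1, min_size=1, max_size=3, lower=30, upper=70,
             period=60, breach_duration=600, cooldown=600):
    """Return the group size after each CPU sample under a simplified trigger model."""
    needed = breach_duration // period  # consecutive samples that count as a breach
    high = low = 0
    cooldown_left = 0
    sizes = []
    for cpu in cpu_samples:
        high = high + 1 if cpu > upper else 0
        low = low + 1 if cpu < lower else 0
        if cooldown_left > 0:
            cooldown_left -= period  # still settling after the last change
        elif high >= needed and size < max_size:
            size, high, cooldown_left = size + 1, 0, cooldown
        elif low >= needed and size > min_size:
            size, low, cooldown_left = size - 1, 0, cooldown
        sizes.append(size)
    return sizes


# Forty minutes of sustained 90% CPU, one sample per minute.
sizes = simulate([90] * 40)
print(sizes[9], sizes[-1])  # 2 3
```

Under this model the second server arrives after 10 minutes and the group stops growing at the max size of 3, which is the safeguard against runaway scaling.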

That is all there is to it! You now have a system that can grow and shrink your pool of application servers on demand! I hope this helps you build out an infrastructure that lets you scale up your next web application. You might want to look over the command line tool documentation before getting started.

Exploring Amazon CloudFront

Sunday, July 5th, 2009

A few weeks ago I switched Sexii to use Amazon’s CloudFront content delivery network (CDN) instead of serving images through my own Apache server. The decision came after watching my EC2 instance get slammed through a couple of high volume peaks. I crunched my logs and found that over 80% of the traffic I had was serving images! Adding a new instance for handling images exclusively would certainly solve the problem, but at ~$75/month for a new instance, I figured there was probably a better way. My ultimate goal here is to increase the potential work my infrastructure can do per instance hour, not necessarily to save costs, but I don’t want to spend more than I need to.

Setting Up

Setting up CloudFront is like any AWS service: agree to the terms and costs and get going. There’s a full API for interacting with it, but rather than learn another new API, I opted for the point and click features of S3Fox Organizer. If you’re unfamiliar with the product, it gives you Finder or Explorer like functionality for managing the data you store on the Amazon S3 platform.

I started by creating a unique bucket in S3 and uploading my images. Now with my content happily hosted on S3, I actually had a CDN of sorts. I could have set the access control list (ACL) to allow anonymous reads for that bucket then updated my code accordingly and stopped there. S3 would handle the serving of the content and my server is free to do other tasks. Some popular websites are already doing this; for example Twitter serves your profile picture from S3. This actually accomplishes my primary goal, reducing the stress on my server. But, being the curious geek that I am, I wanted to try out the full blown CDN and with the pay per use pricing model, it was low risk.

With the bucket set up, S3Fox was able to create the distribution with a couple of clicks (literally). What surprised me was how long it actually took (10 minutes) for the distribution to come online. Since my image URLs are assembled on the fly in code, I updated my configuration file to reflect the new source and I was live. I did run into one problem: the URLs for images had to be syntactically correct, and I was bitten by this. By default, Apache is more forgiving and just ignores the double forward slash.

Works:
<img src="http://a5e3px8iw78h4.cloudfront.net/images/logo.png" />
Doesn't:
<img src="http://a5e3px8iw78h4.cloudfront.net/images//logo.png" />
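
Since the URLs are assembled in code, the simplest fix is to normalize the path before emitting it. A minimal sketch using Python’s standard library; the helper name collapse_slashes is made up, and it only touches the path so the "//" after the scheme is left alone.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url: str) -> str:
    """Collapse repeated slashes in the path portion of a URL."""
    parts = urlsplit(url)
    clean_path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=clean_path))


print(collapse_slashes("http://a5e3px8iw78h4.cloudfront.net/images//logo.png"))
# http://a5e3px8iw78h4.cloudfront.net/images/logo.png
```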

A nice feature that I didn’t use is the ability to map your own domain name to the delivery network, effectively masking a name like http://a5e3px8iw78h4.cloudfront.net with http://mycdn.example.com if you want. This wasn’t critical for me and so I skipped it. Besides, Amazon’s DNS servers are likely much more robust than mine.

Costs

Before dissecting the costs, it’s important to understand the content I moved. Sexii doesn’t actually have many images. It relies heavily on CSS and the layout as the design. It was done this way to eliminate hosting costs early on. MySpace applications that have home and profile surfaces incur a very high traffic volume with a low return on the investment. The assets hosted by the application are a set of small icons used in a feature that facilitates flirting. The feature shows a grid of around 30 icons that the user can attach to a flirt. Each icon is a separate image file and all of these icons are shown at the same time, resulting in about 30 requests for images under 1 KB each. See below for the consequences of this design mistake.

Costs are cumulative across the AWS platform and so are a bit difficult to break down on a per application basis, but I'm going to try to give you an overview of what the change to CloudFront cost me and where I can potentially save more money. I've skipped over the costs of uploading and hosting on S3 as they're insignificant for my data and usage pattern, less than $0.08/month.

CloudFront Requests By Country

During the roughly 4 1/2 day window analyzed, the cost for CloudFront was $6.44, roughly $1.46/day. I served over 5.4 million images, or 14.3 images per second, to the 4 regions. If I had built a dedicated server and configured my site to use that for all images, it would have cost me around $11.44 including bandwidth for the same time period. The largest portion of the cost for me was the overall number of requests. By rewriting the icon code to use a single combined image with CSS positioning, I expect to reduce the request count by more than 90% (-$4.95 or -$1.12/day). The bandwidth number should stay about the same because the larger combined icon file transfers about the same amount of information as the 30 independent requests.
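
The expected drop in request count comes straight from the icon grid. A back-of-the-envelope sketch, assuming the grid is served as one combined image instead of about 30 separate files; it covers only the icon requests, not any other images the application serves.

```python
def request_reduction(icons_per_view: int, combined_images_per_view: int = 1) -> float:
    """Percentage drop in image requests when icons are merged into combined images."""
    return 100.0 * (icons_per_view - combined_images_per_view) / icons_per_view


print(f"{request_reduction(30):.1f}%")  # 96.7%
```

Because almost all of my image requests are these icons, that is where the "more than 90%" estimate comes from.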

CloudFront Cost Breakdown

For those of you who are curious, the Japanese traffic was the smallest portion at 0.11% and Hong Kong was 0.16%.

Where Next?

Move more to CloudFront!

The first step, after reducing the number of requests, is to move externally loaded CSS and JavaScript files. These are no-brainers (and are even suggested on the CloudFront site). Taking it a bit further, I’m considering moving static advertising iframe files, common in social network based applications. I can continue to maintain my generic demographic targeting by creating a series of files that include the appropriate demographic distinctions: skyscraper_m_18.html, skyscraper_f_18.html, skyscraper_m_25.html, skyscraper_f_25.html and so on. I’ll be looking to see if this can be better handled through S3 alone to keep the costs of these low value CPM/CPC ads lower. MySpace hosts the actual application code on their servers, but sites like Hi5 and Orkut do not. I would strongly suggest moving your application.xml files to a CDN and running only the ajax response files on your application servers.

Where would it all end?
It’s conceivable you could host an entire brochureware site on CloudFront but that may be taking it too far.

Conclusion

I’m pleased with CloudFront. I’ve definitely increased the capacity of the application server which is especially useful during peak times. It also exposed a design weakness, a sad but important lesson to learn. This is a GREAT way for companies to get the speed benefits of a CDN and increase the processing capacity of an application server.

© 1998-2008 AF-Design, All rights reserved.