I Love the Cloud/I Hate the Cloud

March 5th, 2010 by Erik

Developers have been consuming “cloud” services since long before it was a buzzword. For me the first real transition to a cloud mentality was with web services. WSDLs provided a uniform way to consume a remote resource that was tuned to provide specific information. There were of course limitations with data typing, etc., but most of those could be worked around. I didn’t concern myself with how the services I called generated or manipulated the information, only that they responded quickly and correctly. Jump forward a few years and now we can get more than data: we can get infrastructure, platforms and software via simple requests. The terminology has changed but the underlying ideas are the same.

I’ve spent a fair amount of time working with and thinking about “cloud” technology in the last year. Some of this time has been joyful and some of it painful. I started this list a while ago and feel it’s finally reached a critical mass, so I’m unleashing it on the world. The remainder of this post covers some of the things I Love and Hate about the cloud and the services it provides me today.

I Love the Cloud

  • Provision 500GB of storage on an 8 volume RAID array in less than 10 minutes
  • Incremental backup in seconds
  • 500GB of redundant storage costs pennies
  • Access to mountains of meaningful data about almost any topic (Twitter, Flickr, Google)
  • Geo-location encoding/decoding!
  • Work from anywhere (although Coworking would be my first choice)
  • A powerful server online in minutes, use it for a day and then turn it off
  • New ways of building applications using loosely coupled systems
  • I don’t have to manage failed/failing hardware
  • Server Software as a Service (MySQL, SQS, SMTP etc)
  • Rapid scalability without capital expense
  • Wide variety of service offerings (and growing every day)

I Hate the Cloud

  • Inconsistent performance from infrastructure providers
  • Inconsistent performance from APIs (ahem Facebook)
  • Automating EC2 is labor intensive
  • Inconsistent use of terminology confuses developers, executives, media, consumers… really everyone
  • Difficult to monitor resource usage to see if upgrades are necessary
  • I still have to patch and administer infrastructure (EC2)
  • Code isn’t portable
  • More vague technology acronyms and buzzwords
  • Many points of failure within applications that leverage multiple services
  • Merging / Evolving / Failing / Deprecating platforms and services
  • Quotas and request limits

What do you love and hate about “the cloud”?

People Don’t Use Clouds

March 4th, 2010 by Erik

Are Microsoft Outlook and Apple’s Mail software? Are web based products like Gmail and Windows Live Mail cloud offerings? What about Flickr? I can edit my photos using Picnik (for now), giving me basic photo editing functionality. Does moving traditional desktop applications into a web browser make them into “cloud” software? If so, it should hold true that any web based product or service is in some way a “cloud” service from a customer perspective.

Online CRM solutions like Salesforce are often referred to as software as a service (SaaS). How then do we classify software that enables other software to function? Amazon’s SimpleDB and SQS services, for example, are SaaS solutions for developers to build products on. Do we need to further break down SaaS into more granular distinctions? CDW offers 13 different categories for software; Best Buy has 15. Clearly SaaS is too broad in scope to accurately describe what is being offered.

When I talk with non-technical people and mention the word “cloud”, eyes quickly glaze over. People don’t store their family photos in the cloud, they use Flickr. They don’t care how Flickr stores them, so long as they continue to have access to them. They can get their minds around products and services and generally don’t care how that product or service is delivered. Do Gmail, Flickr and Twitter make their life easier, more enjoyable, more profitable, more fun and so on? These are the areas they care about, not whether it’s built on scalable cloud infrastructure or redundant dedicated hardware in an enterprise grade datacenter.

In the mindset of my non-technical friends, these are services, tools, websites and in some cases just verbs, adjectives and nouns. The non-technical people I talk with don’t use “search engines” anymore, they “Google it” or use Bing. People don’t use “social networking sites”, they use Facebook, Twitter or MySpace. They have little idea how their computer interacts with these services and generally don’t care. They describe the usable features a specific product has and what it does for them.

Clouds, for them, are things in the sky! People don’t use clouds; people use products and services. People interact with brands. Individuals outside of technology circles are far less likely to understand or even care about the distinction between SaaS, IaaS or PaaS. The current cloud terminology is lost on them.

As for technologists, what’s most important is to be clear about what you mean, even when speaking with folks who might “get it.” Yesterday, I described what the different types of clouds mean to me. I did this to clarify for myself and for others who might read my thoughts later. After my experience at CloudCamp, it became clear to me that irrespective of how savvy an audience may seem, it is worthwhile to take a minute or two up front to define what you mean when you use different cloud terminology. After all, not everyone is functioning with the same operational knowledge.

So if people don’t use clouds, who does? Applications use clouds, or more specifically, cloud services. Applications and products are built to interact with services. Services are abstracted access to some unknown back end. If an application needs to write a file, it simply creates a file handle, stores the information and moves on. It doesn’t need to be aware of the underlying technology (SAN, NFS, SSD etc) that might be driving it. Using cloud services is really a discussion of how to architect applications, products and solutions to effectively and efficiently take advantage of the growing array of on-demand infrastructure, platforms and software. We do this to avoid provisioning physical resources. We do this to reduce time to market. We do this to be able to rapidly prototype products. We do this so we can throw something out there and see if it sticks. We do this to change technology cost from capital expenses to operational expenses.
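The file handle analogy can be sketched in a few lines of Python. Everything named here (`BlobStore`, `MemoryStore`, `DiskStore`, `save_report`) is hypothetical and invented for illustration, not any vendor’s API. The only point is that the application calls `put` and `get` without knowing what sits behind them.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """Hypothetical storage interface: the application only sees put/get."""

    @abstractmethod
    def put(self, key, data):
        """Store the bytes in `data` under `key`."""

    @abstractmethod
    def get(self, key):
        """Return the bytes previously stored under `key`."""


class MemoryStore(BlobStore):
    """Keeps blobs in a dict; stands in for any remote storage service."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


class DiskStore(BlobStore):
    """Keeps blobs as files under a root directory."""

    def __init__(self, root):
        self._root = root

    def put(self, key, data):
        with open(os.path.join(self._root, key), "wb") as handle:
            handle.write(data)

    def get(self, key):
        with open(os.path.join(self._root, key), "rb") as handle:
            return handle.read()


def save_report(store, text):
    """Application code: unaware of which back end it is writing to."""
    store.put("report.txt", text.encode("utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for store in (MemoryStore(), DiskStore(root)):
            save_report(store, "hello")
            print(type(store).__name__, store.get("report.txt"))
```

Swapping the back end means passing a different object to `save_report`; the application code itself does not change.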

Thinking about the cloud in this way moves the discussion from the philosophical to the tangible. Engineers, system architects and developers can use these services to build products. Need a way to store data? The discussion becomes which vendor’s product meets the need of the application. Do the relational table structures in Microsoft’s SQL Azure meet the current need, or would it be better served using hash tables like Bigtable or SimpleDB? The focus for me is about the correct solution and not about clouds.

Three Types of Clouds

March 3rd, 2010 by Erik


Last night I attended CloudCamp in Minneapolis. While there was much healthy discussion about the “cloud”, one thing became crystal clear for me: the cloud means different things to different people. George Reese summed it up well: there are three distinct types of clouds, Infrastructure, Platform and Software. I took away from the discussions that this distinction wasn’t clear for many people (including myself).

Infrastructure as a Service (IaaS):
Amazon and Rackspace are the two largest players in this space, but there are other solid offerings (including ReliaCloud) that compete with them. This is very similar in concept to leasing a dedicated server from an ISP, but with flexible pricing. Keith Schacht pointed out on my post about cloud pricing models that some providers are offering non-virtualized infrastructure on a per-hour basis. Is there any benefit to choosing a virtualized machine vs. a real machine? I think that goes beyond the scope of this discussion. Something companies need to be aware of here is that running infrastructure in the cloud doesn’t reduce the need for good system administrators and that in terms of architecture very little has changed. Developers in this tier still need to be concerned with system capacity, etc. The upside is that many of these problems are well understood and the solutions for dealing with them are commonplace.

Platform as a Service (PaaS):
Salesforce and Google App Engine run platforms which you can build services on. These providers abstract everything away so that the product can become the focus. Designing products for platforms doesn’t require an in-depth understanding of the sub-systems. Developers don’t need to know if MySQL, Oracle, MS SQL Server or some other storage engine is handling the data storage layer; they can just trust that the data is being stored and retrieve it when they need it. Of course this model has limitations, and anyone building a product would be well served to learn about the best way to leverage the platform efficiently. The drawbacks are obvious as well. Google is an extremely reliable provider; however, they do have downtime. It’s also extremely difficult to migrate platforms. None of the vendors currently provide import/export style functionality for data.

Software as a Service (SaaS):
SaaS isn’t consumer offerings, such as those from 37signals. Those are products or applications. What I’m referring to is lower level software like MySQL, SQL Server, Amazon’s SQS and so on. Leveraging these services provides a unique opportunity to use the best solutions for each task, instead of a complete vendor lock-in. Developers interact with the tools and sub-systems they’re already familiar with. What the SaaS vendor does is abstract the management and scalability tasks. Unfortunately this has a dark side. Reliance on multiple providers requires building systems that degrade gracefully when any single sub-system is no longer available. Zynga, developers of the massively popular Facebook game Farmville, build their scalable systems in the cloud using the notion of degradable services: architecting the solution such that its dependence on external systems can be dialed back on demand. Building this into products up-front requires a different way of thinking about application design. Someone raised the point last night during the breakout about architecting for the cloud that these are the same problems that were being solved in the ’70s. Designing networks of loosely coupled systems is not a new problem. However, it is a problem that many developers I’ve met haven’t spent much time thinking about… yet.
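As a rough illustration of what “degrade gracefully” and “dialed back on demand” might look like in code, here is a minimal Python sketch. The `DegradableFeature` class and its names are hypothetical and are not a description of how Zynga or anyone else actually builds this. It wraps one optional external call, falls back to a default when the call fails, and exposes a switch so the dependency can be turned off deliberately.

```python
class DegradableFeature:
    """Wraps an optional external call with a fallback and an on/off dial.

    Hypothetical helper for illustration only.
    """

    def __init__(self, fetch, default):
        self.fetch = fetch        # callable that talks to the external service
        self.default = default    # value to serve when the service is unavailable
        self.enabled = True       # operators can flip this to dial the feature back

    def get(self):
        # Dialed back on purpose: do not even try the external service.
        if not self.enabled:
            return self.default
        try:
            return self.fetch()
        except Exception:
            # The sub-system is down; degrade instead of failing the whole page.
            return self.default


if __name__ == "__main__":
    def broken_service():
        raise ConnectionError("service unavailable")

    friends = DegradableFeature(broken_service, default=[])
    print(friends.get())  # falls back to the empty list
```

A real system would also want timeouts and some way to notice that a service has recovered, which this sketch leaves out.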

Honesty Box: EBS Performance Revisited

March 2nd, 2010 by Erik

As part of my work on Honesty Box, I’ve been reviewing EBS disk performance once again. This was a great opportunity to expand on the research from last year. After re-reading what I posted then, along with the wealth of data that has been compiled since, I realized I still didn’t have sufficient information to answer two key questions.

  1. How does the number of EBS volumes impact the performance of RAID 0?
  2. Does the instance size make a significant difference in the RAID performance?

As before I used Bonnie++ to measure the results. You can read about the full method I used below.

Results

  • RAID 0 performed better with an even number of EBS volumes.
  • RAID 0 performed best with 8 volumes for writes and random seek.
  • RAID 0 performed poorly for reads!
  • Larger instances perform significantly better than smaller instances.
  • The ephemeral store has very good overall performance.

Data

The titles of the Bonnie tests can be confusing for folks removed from the programming process. Be sure to read the full explanation of what each test is doing.

Sequential Output is a measure of the write performance to the drive. Higher bars are better. With RAID 0 it appears that an even number of drives performs significantly better than an odd number.

Sequential Create is a measure of the files created by Bonnie. Higher values are better. Tests that complete too quickly return no values. That is the cause of the missing bars for Read/sec above. You can safely consider that value too fast to measure.

Sequential Input is a measure of the read performance from the drive. Higher values are better. This is concerning because of the steady decline in block read performance as the number of volumes increases. This may have to do with the time of day that these tests were run and really warrants more investigation. It should also be noted that this is a measure of sequential performance, so unless you’re reading contiguous files off the disk, this number may be irrelevant to you.

Random Create measures how the files are created and deleted. Higher values are better. Again, tests that happen too quickly are discarded, which explains the Read/sec result having no values.

Random Seeks should scale consistently with the number of EBS volumes added. Higher values are better. However, that did not appear to be the case and a limit appeared to be reached at 8.

Effect of CPU

To test the impact of the CPU units, I selected the 4 volume array and then compared it with the tests run last year. Both were using 4 volume EBS RAID 0 with XFS file systems. They both used the noop IO scheduler. The underlying OS did change from Fedora to Ubuntu and a year has passed.

Sequential Output: Taller is better. Clearly the additional IO capacity in the larger instance does make a big difference in the performance of the volumes. I would expect smaller increments in CPU capacity would result in smaller differences.

Sequential Input: Taller is better. Clearly the m1.large instance outperforms the smaller m1.small instance.

Thoughts and Next Steps

After reviewing the performance of the native ephemeral storage, I wonder if partitioning the ephemeral store and assembling a RAID array from there might not be the best route for high speed storage? Of course backup would be a potential issue, but snapshotting of XFS may be able to mitigate that. For future tests I would like to study the impact of using the -b flag which causes Bonnie++ to flush to disk. I also think larger volume sets as shown by these tests and different I/O schedulers may yield different results.

Method

As before, I used Bonnie++ to measure disk performance. Its limitations are fairly well understood, and it gives us a metric that can be compared with other metrics. You can read the full explanation of what each value actually means here. Armed with 16 EBS stores mapped to an unused m1.large instance, I began running tests. The process was as follows:

  1. Create a new RAID set using a chunk size of 256
  2. Use XFS to format the drives
  3. Mount the filesystem w/ Ubuntu defaults
  4. Capture Bonnie results
  5. Disassemble the RAID set
  6. Rinse and repeat

I did this for 2-10 volumes and then one additional test with 16 volumes. For comparison, I also ran the test with the ephemeral store and a single EBS volume. Those are the results represented in each of the graphs above. I reran the 6 volume test 3 times over the course of a day and took an average value for the graphs.
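The loop described above could be scripted roughly as follows. This is a sketch under stated assumptions, not the exact commands I ran: the device names, `/dev/md0` and `/mnt/raid` are hypothetical, the chunk size of 256 is assumed to be in the KiB unit that `mdadm --chunk` uses, and the Bonnie++ invocation is the simplest one that works as root. The function only builds the command lines; it does not execute them.

```python
# Volume counts tested: 2 through 10, plus one run with 16.
VOLUME_COUNTS = list(range(2, 11)) + [16]


def raid0_benchmark_commands(devices, md="/dev/md0", mount_point="/mnt/raid",
                             chunk_kb=256):
    """Return the command lines for one create/format/mount/test/teardown cycle.

    `devices` is the list of EBS block devices to stripe (hypothetical names).
    """
    if len(devices) < 2:
        raise ValueError("RAID 0 needs at least two devices")
    return [
        # 1. Create a new RAID 0 set with the given chunk size.
        ["mdadm", "--create", md, "--level=0", "--chunk=%d" % chunk_kb,
         "--raid-devices=%d" % len(devices)] + list(devices),
        # 2. Format the array with XFS.
        ["mkfs.xfs", "-f", md],
        # 3. Mount it with default options.
        ["mount", md, mount_point],
        # 4. Run Bonnie++ against the mounted filesystem.
        ["bonnie++", "-d", mount_point, "-u", "root"],
        # 5. Unmount and disassemble the RAID set.
        ["umount", mount_point],
        ["mdadm", "--stop", md],
    ]


if __name__ == "__main__":
    # Hypothetical device names: /dev/sdf, /dev/sdg, ...
    all_devices = ["/dev/sd" + chr(ord("f") + i) for i in range(16)]
    for count in VOLUME_COUNTS:
        print("# %d volumes" % count)
        for command in raid0_benchmark_commands(all_devices[:count]):
            print(" ".join(command))
```

Printing the commands first makes it easy to review them before anything destructive touches a disk.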

Cloud Pricing Models

December 14th, 2009 by Erik

Yesterday Amazon announced their Spot pricing model, effectively providing market driven pricing for instances on EC2. Depending on your product, this probably won’t impact you much, but it got me thinking about pricing of the cloud. Amazon’s Web Services was a game changer when it launched: buy the computing resources you need for only the time you need them. However, you’re stuck with a very limited set of instances and therefore you need to architect your systems around their pre-defined instance sizes. While they expanded their instance offering to include high CPU and more recently high memory instances, you’re still stuck with a fairly rigid set of boxes from which to run your systems.

A specific weak spot I’m having with the pre-defined box sizes is Memcached. It turns out that Memcached is fairly light on the processor and requires essentially no disk I/O. Really the processor is just a go-between for the memory and the network card. If you are looking at putting a 32GB server online to manage the caching tier for your app, you’d need to buy the “High-Memory Double Extra Large Instance” for $1.20/hr (or $10,512/year). Wait… what?! Okay, obviously we should pre-pay this. The typical business model is to run the hardware over a 3 year cycle, so let’s pay the $4,900 up front and then enjoy a more comfortable $0.42/hr (or $3,679.20/year + $1,633.33/year for the pre-pay = $5,312.53 each year for 3 years). Obviously the $15,937.60 we pay over 3 years is easier to swallow than the $31,536 if we don’t pre-pay it.
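The arithmetic is easy to check. A small Python sketch using only the rates quoted above, and assuming the instance runs around the clock for 8,760 hours a year (the function names are mine):

```python
from decimal import Decimal

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def on_demand_cost(hourly_rate, years):
    """Total cost of paying the on-demand hourly rate around the clock."""
    return Decimal(hourly_rate) * HOURS_PER_YEAR * years


def reserved_cost(upfront_fee, hourly_rate, years):
    """Total cost of a one-time up-front fee plus a reduced hourly rate."""
    return Decimal(upfront_fee) + Decimal(hourly_rate) * HOURS_PER_YEAR * years


if __name__ == "__main__":
    print("on-demand, 1 year: ", on_demand_cost("1.20", 1))
    print("on-demand, 3 years:", on_demand_cost("1.20", 3))
    print("reserved, 3 years: ", reserved_cost("4900", "0.42", 3))
```

`Decimal` is used instead of floats so the dollar amounts come out exact: $10,512 for one on-demand year, $31,536 for three, and $15,937.60 for three years with the pre-pay.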

Now, if you’re running your infrastructure in the cloud and considering using Memcached, you really can’t put a box in a rack somewhere else, because the increased latency and unreliability mean you may not be able to get data from your cache in a cost effective way. So I’m not going to look at what buying a box with that kind of memory would cost, not to mention there is such variation in buying rack/ping/power that it would be too messy to calculate here.

This has me intrigued to see how other providers are doing their billing. I love the idea of a-la-carte servers paid by the hour. But really what would be great is allowing me to choose the CPU, memory, and I/O I need. This brings me to two smaller cloud providers who seem to have interesting offerings.

First up is 3Tera. 3Tera offers a completely different take on the cloud infrastructure model. The idea behind their offering is that you purchase hardware (or lease it) and then slice the box however you want. Basically, you’re running your own virtual cloud! You can consider different hardware options, including stuffing a ton of RAM into weaker boxes and so on. Ultimately the product is a resource allocation tool. The dark side is that you have to pay for all that hardware, even if you’re not using it. Really this isn’t a cost savings over EC2, although it’s an interesting idea if your system resource needs shift significantly over time but are consistent enough to warrant buying or leasing hardware. I’m really interested in their technology, and they have an impressive list of partners running the software from whom you can then lease the virtual images.

The second provider is OpSource Cloud. OpSource charges a base fee for the VLAN service and then you build your infrastructure on top of that. The beauty is that it’s a-la-carte down to the CPU cycles and memory! Currently the memory footprint is limited to 8GB and each machine needs between 1 and 4 CPUs. However, this pricing model is interesting as you can provision a single CPU with 8GB of RAM, which comes out to roughly $0.24/hr (or $2,102.40/year). Starting 4 of these instances to hold the 32GB of cache is only slightly cheaper than Amazon’s model, coming in at a whopping $8,409.60/year. There are some cost savings available if you buy a silver, gold or platinum pricing tier for a monthly pre-pay. The pricing for those starts at $500/month and goes up, so you really need to have some significant hardware running to justify those costs. Another gotcha with this plan is that you need to provision a network, which is $0.20/hr. I’m going to be keeping an eye on this provider. I think in the future they may have a winning solution.
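To see how the network fee changes the comparison, here is a quick Python sketch. It assumes the four servers share a single $0.20/hr network and that everything runs all year; both are my assumptions about how the billing would work, not something taken from OpSource’s price list.

```python
from decimal import Decimal

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def yearly_cost(hourly_rate, count=1):
    """Cost of running `count` resources at `hourly_rate` for a full year."""
    return Decimal(hourly_rate) * count * HOURS_PER_YEAR


# Four 1 CPU / 8GB servers to hold 32GB of cache.
opsource_servers = yearly_cost("0.24", count=4)
# Assumption: one network shared by all four servers.
opsource_network = yearly_cost("0.20")
opsource_total = opsource_servers + opsource_network
# Amazon's on-demand High-Memory Double Extra Large instance.
amazon_on_demand = yearly_cost("1.20")

if __name__ == "__main__":
    print("OpSource servers:", opsource_servers)
    print("OpSource network:", opsource_network)
    print("OpSource total:  ", opsource_total)
    print("Amazon on-demand:", amazon_on_demand)
    print("Difference:      ", amazon_on_demand - opsource_total)
```

Under those assumptions the network adds $1,752/year, bringing the OpSource total to $10,161.60/year and narrowing the gap with Amazon’s on-demand price to $350.40.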

Unfortunately, I don’t yet see a solution that fits my specific need. Perhaps I need to adjust my thinking and look at alternatives. It may be time to consider Amazon’s SimpleDB, which provides simple key/value storage like Memcached, although as a service. Is it the answer for putting large amounts of data into a non-RDBMS? I’ll consider that in another post.


Creative Commons Photo by ArcticNomad

Amazon Opening US-WEST-1

December 2nd, 2009 by Erik

While I don’t have access to it yet, apparently Amazon has opened US-WEST-1 for EC2. Customers using enStratus have access already.

This is important because anyone leveraging platforms like Facebook or MySpace has just been put nearly 3000 miles closer to the key data centers these platforms run out of.

Is Desktop Software Dead?

December 1st, 2009 by Erik

I’ve always been a huge fan of desktop software. It allows developers to create a unique experience tailored to a specific task. It promotes consistency within the OS; always knowing that the close window button is in the same location is a huge boon to usability. It’s generally faster and can work where your internet connection doesn’t. Last but not least, you have that copy on your hard drive that you can back up, put on a thumb drive or even print out as a hex dump if you’re so inclined.

Lately, though, it seems that more and more of the software I use on a daily basis is heavily reliant on its client connecting to the real back end over the internet. Email, instant messaging, Skype and web based documents are quickly becoming the primary conduit for my communication with clients. Add in the dizzying array of Twitter, Facebook, FriendFeed, YouTube, LinkedIn and Flickr, and suddenly there are a lot of different ways to get a hold of me, none of which are dependent on my desktop software. The dark side of this cloud based interaction is that I don’t have a nice, easy way to interact: a single touchpoint, like email, through which I can be part of the conversation.

Enter Raindrop, an amazing new project from Mozilla Labs that promises to make email relevant again. In addition to email, it will bring all of my conversation channels into one place, becoming a communication hub for me. This software is hugely exciting, taming the communication channels in a way that Tweetdeck and Seesmic Desktop can only dream of, while keeping all the information I gather in one central location. If this project ever launches and does only 1/2 of what it says it will do, it’s going to be awesome!

Today, TechCrunch posted an article about a new project called Inbox2. This project is web based and even has a Facebook application. It also promises to tame your communication environment! While the Raindrop application will definitely rock, it can’t touch the flexibility that the web version can reach. Raindrop is doomed because all of the data it’s taming is online; it only makes sense to build the application online as well.

This brings me to my original question: is desktop software dead? Google’s new Chrome OS is betting it is. In my experience, many software applications and iPhone apps are little more than a thin wrapper around a series of web services and API calls. Even when they’re not, like Pages or Word, I’m usually going to share the result with a client via email, so why not write it in Google Docs to begin with? Obviously, the line is already blurring. As a developer, it seems easier to me to prototype an application in just about any web language and simply run it in a browser window than it is to build a client for each device: iPhone, Mac, PC, Android, Linux, you get the idea…

What do you think, is the end here for the Desktop Software market?

Twitter Beta Testing Re-Tweet Interface

November 18th, 2009 by Erik

It seems Twitter is now showing re-tweets in a new way for some users, myself included. Instead of showing the person who re-tweeted, it’s now showing the original poster. Interesting way to give credit where credit is due.


This is the explanation.


And an example from my feed.

And a nice tabbed interface showing popular tweets and who’s retweeting them.

PHP & 64-bit Integer Modulus (Almost)

October 28th, 2009 by Erik

While at times PHP seems to be capable of 64 bit math, it’s important to understand what’s really going on. Beyond 32 bit integers, PHP is silently converting your integers to floats. While this usually isn’t a problem, many of the operations you might perform on an int, such as modulus, choke when attempting to convert back to a 32 bit integer internally.

This is actually the cause of the sprintf / printf issue I encountered before. The code below provides the maximum signed value for integers between 1 and 64 bits in PHP, along with the result of the built in modulus operator “%” and a function I wrote, mod(), which doesn’t go all the way to 64 bits but gets us a lot closer by leveraging the built in data types. If you can install external modules, you might review and test the performance of BCMath or GMP, which can both handle much larger values.

<?php
 
// Find out what our internal values are capable of
print "PHP_INT_MAX: " . PHP_INT_MAX . "\n";
print "PHP_INT_SIZE: " . PHP_INT_SIZE . " bytes (" . (PHP_INT_SIZE * 8) . " bits)\n";
 
// Generate the maximum signed value for each width from 1 to 64 bits: 2^(n-1) - 1.
// Past PHP_INT_MAX, pow() returns a float rather than an int.
$ints = array();
for($pwr = 0; $pwr < 64; $pwr++){ $ints[] = pow(2,$pwr) - 1; }
 
// Generate a table of values
print "bits\t%100\tmod()\t%s\n";
$bits = 0;
foreach($ints as $int){
	$bits++;
	printf("%d\t%s\t%s\t%s\n", $bits, $int%100, mod($int,100), $int);
}
 
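// Float based modulus: subtract the largest multiple of $mod that fits in $val.
// Only exact while $val fits in the 53 bit mantissa of a float.
function mod($val, $mod){ return $val - floor($val/$mod) * $mod; }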
?>

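This generates the following table. Notice the internal value for integers is capped at 2,147,483,647 and the built in modulus operation goes kaput beyond 32 bits. The mod() function appears to hold up through 59 bits before returning 0 at 60, but its results are only exact through row 54. A float carries a 53 bit mantissa, so from row 55 on the input itself has already been rounded: 2^54 - 1 is stored as 2^54, which is why mod() reports 84 there instead of 83.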

PHP_INT_MAX: 2147483647
PHP_INT_SIZE: 4 bytes (32 bits)
bits	%100	mod()	%s
1	0	0	0
2	1	1	1
3	3	3	3
4	7	7	7
5	15	15	15
6	31	31	31
7	63	63	63
8	27	27	127
9	55	55	255
10	11	11	511
11	23	23	1023
12	47	47	2047
13	95	95	4095
14	91	91	8191
15	83	83	16383
16	67	67	32767
17	35	35	65535
18	71	71	131071
19	43	43	262143
20	87	87	524287
21	75	75	1048575
22	51	51	2097151
23	3	3	4194303
24	7	7	8388607
25	15	15	16777215
26	31	31	33554431
27	63	63	67108863
28	27	27	134217727
29	55	55	268435455
30	11	11	536870911
31	23	23	1073741823
32	47	47	2147483647
33	-1	95	4294967295
34	-1	91	8589934591
35	-1	83	17179869183
36	-1	67	34359738367
37	-1	35	68719476735
38	-1	71	137438953471
39	-1	43	274877906943
40	-1	87	549755813887
41	-1	75	1099511627775
42	-1	51	2199023255551
43	-1	3	4398046511103
44	-1	7	8796093022207
45	-1	15	17592186044415
46	-1	31	35184372088831
47	-1	63	70368744177663
48	-1	27	1.4073748835533E+14
49	-1	55	2.8147497671066E+14
50	-1	11	5.6294995342131E+14
51	-1	23	1.1258999068426E+15
52	-1	47	2.2517998136852E+15
53	-1	95	4.5035996273705E+15
54	-1	91	9.007199254741E+15
55	0	84	1.8014398509482E+16
56	0	68	3.6028797018964E+16
57	0	32	7.2057594037928E+16
58	0	64	1.4411518807586E+17
59	0	32	2.8823037615171E+17
60	0	0	5.7646075230342E+17
61	0	0	1.1529215046068E+18
62	0	0	2.3058430092137E+18
63	0	0	4.6116860184274E+18
64	0	0	9.2233720368548E+18
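The point where the mod() column stops being exact is a property of 64 bit floats in general rather than of PHP. As a cross-check, here is the same float based formula sketched in Python, where floats are the same IEEE 754 doubles and plain integers are arbitrary precision, so the true answer is available for comparison. The function name `float_mod` is mine.

```python
import math


def float_mod(val, mod):
    """Same formula as the PHP mod() above, applied to a float."""
    return val - math.floor(val / mod) * mod


if __name__ == "__main__":
    for bits in (53, 54):
        exact = 2 ** bits - 1          # arbitrary precision integer
        as_float = float(exact)        # what a double can actually hold
        print(bits, exact % 100, float_mod(as_float, 100), as_float == exact)
```

For 2^53 - 1 the float version agrees with the exact answer (91). For 2^54 - 1 the value no longer fits in the mantissa and is rounded up to 2^54 before any arithmetic happens, so the float version returns 84 where the exact answer is 83.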

MySQL Relational Database Service on AWS

October 27th, 2009 by Erik

The stable of services available through AWS is continuing to expand! Last night Amazon announced RDS (Relational Database Service), which looks a lot like EC2 instances running MySQL with EBS volumes, something I have a fair bit of experience with. However, these have the added benefit of being a service that can scale memory and processor both up and down with a single service call.

# rds-modify-db-instance mydbinstance --db-instance-class db.m1.xlarge -s 100

This flexibility comes with a downside, namely a 4 hour monthly service window where patches, updates and those requested capacity changes are applied. You can choose to apply them immediately, but your application should be prepared to handle the downtime. What happens is, your database instance goes offline and when it comes back, it has all the changes you requested applied. So at best, you should expect uptime in the 99.4% range. Most applications can handle a 4 hour downtime if it’s planned for. Under more conventional MySQL builds, developers or system administrators will mitigate these downtimes by first applying changes to slaves, promoting one slave to master and then finally applying the changes to the original master. This sort of safety net gives applications smaller downtime windows (at most a few minutes each), allowing for theoretical 99.999% uptime.
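For anyone who wants to check the uptime figures, here is the arithmetic as a short Python sketch. It assumes a 30 day month (720 hours) and that the full 4 hour window is used every month; the function names are mine.

```python
HOURS_PER_MONTH = 30 * 24  # 720 hours in a 30 day month


def availability_pct(downtime_hours, period_hours=HOURS_PER_MONTH):
    """Percentage of the period during which the service is up."""
    return 100.0 * (1 - downtime_hours / period_hours)


def allowed_downtime_seconds(target_pct, period_hours=HOURS_PER_MONTH):
    """Seconds of downtime per period that still meet the uptime target."""
    return period_hours * 3600 * (100.0 - target_pct) / 100.0


if __name__ == "__main__":
    print("4 hour window:  %.2f%% uptime" % availability_pct(4))
    print("99.999%% target: %.2f seconds/month" % allowed_downtime_seconds(99.999))
```

A full 4 hour window works out to about 99.44% uptime for the month, while a 99.999% target leaves only about 26 seconds of downtime a month, or a little over 5 minutes a year.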

Transitioning to RDS may not be without pain either. Importing your data is done through a mysqldump (or other flat file export) and then playing that file back into your AWS instance. Depending on the size of your dataset, a full mysqldump and re-import may take days (no, I’m not exaggerating). Also note that while mysqldump runs, your original database will acquire a read lock for consistency. With some DBs I manage, I’ve stopped using mysqldump entirely because the dumps took more than 4 hours to complete on a dedicated slave. With the myriad of snapshotting technologies available, it’s much easier to grab a binary copy of the DB every few hours. One last limitation is that replication isn’t an option. I suspect AWS will be working on this soon as part of an HA (High Availability) release option.

Despite the limitations, I’m excited about this offering. It offloads many of the maintenance and management tasks, which are usually the most tedious. I also hope that this means a higher IO disk subsystem may be coming to EBS soon.

© 1998-2008 AF-Design, All rights reserved.