Cloud Pricing Models

December 14th, 2009 by Erik

By ArcticNomad Yesterday Amazon announced their Spot pricing model. Effectively providing market driven pricing for instances on EC2. Depending on your product, this probably won’t impact you much, but it got me to thinking about pricing of the cloud. Amazon’s Web Services was a game changer when it launched. Buy the computing resources you need for only the time you needed them. However, your stuck with a very limited set of instances and therefore you need to architect your systems around their pre-defined instance sizes. While they expanded their instance offering to include high cpu and more recently high memory instances, you’re still stuck with a fairly rigid set of boxes from which to run your systems.

A specific weak spot I’m having with the pre-defined box sizes is Memcached. It turns out that Memcached is fairly light on the processor and requires essentially no disk I/O. Really the processor is just a go between for the memory and the network card. If you are looking at putting a 32Gb server online to manage the caching tier for your app, you’d need to buy the “High-Memory Double Extra Large Instance” for $1.20/hr (or $10,512/year) wait… what?! Okay, obviously we should pre-pay this, typical business model is to run the hardware over a 3 year cycle, so lets pay the $4,900 up front and then we enjoy a more comfortable $0.42/hr (or $3,679.20/year + $1,633.34/year for the pre-pay = $5,312.54 each year for 3 years). Obviously the $15,937.60 we pay over 3 years is easier to swallow than the $31,536 if we don’t pre-pay it.

Now, if your running your infrastructure in the cloud and considering using Memcached, you really can’t put a box in a rack somewhere else because the increased latency and unreliability means you may not be able to get data from your cache in a cost effective way so I’m not going to look at what buying a box with that kind of memory would cost, not to mention there is such variation in buying rack/ping/power that it would be too messy to calculate here.

This has me intruiged to see how other providers are doing their billing. I love the idea of a-la-carte servers paid by the hour. But really what would be great is allowing me to choose the CPU, memory, and I/O I need. This brings me to two smaller cloud providers who seem to have interesting offerings.

First up is 3Tera. 3Tera offers a completely different take on the cloud infrastructure model. The idea behind their offering is that you purchase hardware (or lease it) and then slice the box however you want. Basically, running your own virtual cloud! You can consider different hardware options, including stuffing a ton of RAM into weaker boxes and so on. Ultimately the product is a resource allocation tool. The dark side is that you have to pay for all that hardware, even if your not using it. Really this isn’t a cost savings over EC2. Although it’s an interesting idea if your system resource needs shift significantly over time, but are consistent enough to warrant buying or leasing hardware. I’m really interested in their technology and they have an impressive list of partners running the software that you can then lease the virtual images from.

The second provider is OpSource Cloud. OpSource charges a base fee for the VLAN service and then you build your infrastructure on top of that. The beauty is that it’s a-la-carte down to the cpu cycles and memory! Currently the memory footprint is limited to 8Gb and each machine needs between 1 and 4 CPU’s. However, this pricing model is interesting as you can provision a single CPU with 8Gb of RAM which comes out to roughly $0.24/hr (or $2,102.40/year). Starting 4 of these instances to hold the 32Gb of cache is only slightly cheaper than Amazon’s model coming in at a whopping $8,409.60/year. There are some cost savings available if you buy a silver, gold or platinum pricing tier for a monthly pre-pay. The pricing for those starts at $500/month and goes up; so you really need to have some significant hardware running to justify those costs. Another gotcha with this plan is that you need to provision a network which is $0.20/hr. I’m going to be keeping an eye on this provider. I think in the future they may have a winning solution.

Unfortunately, I don’t yet see a solution that fits my specific need. Perhaps I need to adjust my thinking and look at alternatives. It may be time to consider Amazon’s Simple DB, which provides simple key/value storage like Memcached, although as a service. Is it the answer for putting large amounts of data into a non-RDBMS? I’ll consider that in another post.


Creative Commons Photo by ArcticNomad

Amazon Opening US-WEST-1

December 2nd, 2009 by Erik

While I don’t have access to it yet, apparently Amazon has opened US-WEST-1 for EC2. Customers using enStratus have access already.

This is important because anyone leveraging platforms like Facebook or MySpace have just put themselves nearly 3000 miles closer to the key data centers where these platforms are running out of.

Is Desktop Software Dead?

December 1st, 2009 by Erik

I’ve always been a huge fan of desktop software. It allows developers to create a unique experience specifically tailored to a specific task. It promotes consistency within the OS, always knowing that the close window button is in the same location is a huge boon to usability. It’s generally faster and can work where your internet connection doesn’t. Last but not least, you have that copy on your hard drive that you can backup, put on a thumb drive or even print out as a hex dump if your so inclined.

Lately it seems though that, more and more of the software I use on a daily basis is heavily reliant on it’s client client connecting to the real back end over the internet. Email, instant messaging, Skype and web based documents are quickly being the primary conduit for my communication with clients. Add in the dizzying array of Twitter, Facebook, FriendFeed, YouTube, LinkedIn, Flickr and suddenly there are a lot of different ways to get a hold of me, none of which are dependent on my desktop software. The dark side of this cloud based interaction is that I don’t have a nice easy way to interact – a single touchpoint, like email, that I can be part of the conversation.

Enter Raindrop, it is an amazing new project from Mozilla Labs that promises to make email relevant again. In addition to email, it will bring in all of my conversation channels into one place, becoming a communication hub for me. This software is hugely exciting, taming the communication channels in a way that Tweetdeck and Seesmic Desktop can only dream of, while keeping all the information I gather in once central location. If this project ever launches and does only 1/2 of what it says it will do, it’s going to be awesome!

Today, TechCrunch posted an article about a new project called Inbox2. This project is web based and even has a Facebook application. It also promises to tame your communication environment! While the Raindrop application will definitely rock, it can’t touch the flexibility that the web version can reach. Raindrop is doomed because all of the data it’s taming is online, it only makes sense to build the application online as well.

This brings me to my original question, is desktop software dead? Google’s new Chrome OS is betting it is. In my experience, many software applications and iPhone apps are little more than a thin wrapper around a series of web services and API calls. Even when they’re not, like Pages or Word, I’m usually going to share it with a client via email, so why not write it in Google Docs to begin with? Obviously, the line is already blurring. As a developer, it seems easier to me to prototype an application in just about any web language and just simply run it in a browser window than it is to build a client for each device, iPhone, Mac, PC, Andriod, Linux you get the idea…

What do you think, is the end here for the Desktop Software market?

Twitter Beta Testing Re-Tweet Interface

November 18th, 2009 by Erik

It seems Facebook is now showing re-tweets in a new way for some users, myself included. Instead of showing the person who re-tweeted, it’s now showing the original poster. Interesting way to give credit where credit is due.


This is the explanation.


And an example from my feed.

And a nice tabbed interface showing popular tweets and who’s retweeting them.

PHP & 64-bit Integer Modulus (Almost)

October 28th, 2009 by Erik

While at times PHP seems to be capable of 64 bit math, it’s important to understand what’s really going on. Beyond 32 bit integers, PHP is silently converting your integers to floats. While this usually isn’t a problem, many of the operations you might perform on an int, such as modulus choke when attempting to convert back to a 32 bit integer internally.

This is actually the cause of the sprintf / printf issue I encountered before. The code below provides the maximum signed value for integers between 1 and 64 bit in PHP along with the result of the built in modulus operand “%” and a function I wrote mod() which doesn’t go all the way to 64 bits, but gets us a lot closer leveraging the built in data types. If you can install external modules, you might review and test the performance of BCMath or GMP which can both handle much larger values.

<?php
 
// Find out what our internal values are capable of
print "PHP_INT_MAX: " . PHP_INT_MAX . "\n";
print "PHP_INT_SIZE: " . PHP_INT_SIZE . " bytes (" . (PHP_INT_SIZE * 8) . " bits)\n";
 
// Generate an array of maximum signed 32 bit values
$ints = array();
for($pwr = 0; $pwr < 64; $pwr++){ $ints[] = pow(2,$pwr) - 1; }
 
// Generate a table of values
print "bits\t%100\tmod()\t%s\n";
$bits = 0;
foreach($ints as $int){
	$bits++;
	printf("%d\t%s\t%s\t%s\n", $bits, $int%100, mod($int,100), $int);
}
 
// (60 bit) - 1 aware modulus function
function mod($val, $mod){ return $val - floor($val/$mod) * $mod; }
?>

This generates the following table. Notice the internal value for integers is capped at 2,147,483,647 and the modulus operation goes kaput beyond 32 bits. The function provided seems to holds up through 59 bits before failing to function properly at 60.

PHP_INT_MAX: 2147483647
PHP_INT_SIZE: 4 bytes (32 bits)
bits	%100	mod()	%s
1	0	0	0
2	1	1	1
3	3	3	3
4	7	7	7
5	15	15	15
6	31	31	31
7	63	63	63
8	27	27	127
9	55	55	255
10	11	11	511
11	23	23	1023
12	47	47	2047
13	95	95	4095
14	91	91	8191
15	83	83	16383
16	67	67	32767
17	35	35	65535
18	71	71	131071
19	43	43	262143
20	87	87	524287
21	75	75	1048575
22	51	51	2097151
23	3	3	4194303
24	7	7	8388607
25	15	15	16777215
26	31	31	33554431
27	63	63	67108863
28	27	27	134217727
29	55	55	268435455
30	11	11	536870911
31	23	23	1073741823
32	47	47	2147483647
33	-1	95	4294967295
34	-1	91	8589934591
35	-1	83	17179869183
36	-1	67	34359738367
37	-1	35	68719476735
38	-1	71	137438953471
39	-1	43	274877906943
40	-1	87	549755813887
41	-1	75	1099511627775
42	-1	51	2199023255551
43	-1	3	4398046511103
44	-1	7	8796093022207
45	-1	15	17592186044415
46	-1	31	35184372088831
47	-1	63	70368744177663
48	-1	27	1.4073748835533E+14
49	-1	55	2.8147497671066E+14
50	-1	11	5.6294995342131E+14
51	-1	23	1.1258999068426E+15
52	-1	47	2.2517998136852E+15
53	-1	95	4.5035996273705E+15
54	-1	91	9.007199254741E+15
55	0	84	1.8014398509482E+16
56	0	68	3.6028797018964E+16
57	0	32	7.2057594037928E+16
58	0	64	1.4411518807586E+17
59	0	32	2.8823037615171E+17
60	0	0	5.7646075230342E+17
61	0	0	1.1529215046068E+18
62	0	0	2.3058430092137E+18
63	0	0	4.6116860184274E+18
64	0	0	9.2233720368548E+18

MySQL Relational Database Service on AWS

October 27th, 2009 by Erik

Amazon Web Services Logo The stable of services available through AWS is continuing to expand! Last night Amazon announced RDS (Relational Database Service) which look a lot like EC2 instances running MySQL with EBS volumes – something I have a fair bit of experience with. However, these have the added benefit of being a service that can scale memory and processor both up and down with a single service call.

# ds-modify-db-instance mydbinstance --db-instance-class db.m1.xlarge -s 100

This flexibility comes with a downside, namely a 4 hour monthly service window where patches, updates and those requested capacity changes are applied. You can choose to apply them immediately, but your application should be prepared to handle the downtime. What happens is, your database instance goes offline and when it comes back, it has all the changes you requested applied. So at best, you should expect uptime in the 99.4% range. Most applications can handle a 4 hour downtime if it’s planned for. Under more conventional MySQL builds, developers or system administrators will mitigate these downtimes by first applying changes to slaves, promotion of one slave to master and then finally applying the changes to the original master. This sort of safety net provides gives applications smaller downtime windows (at most a few minutes each) allowing for theoretical 99.999% uptime.

Transitioning to RDS may not be without pain either. Importing your data is done through a mysqldump (or other flatflile export) and then playing that file back into your AWS instance. Depending on the size of your dataset a full mysqldump and re-importing may take days (no I’m not exaggerating). Also note, during the time mysqldump runs, your original database will acquire a read lock for consistency. With some DB’s I manage, I’ve stopped using MySQL dump entirely because the dumps took more than 4 hours to complete on a dedicated slave. With the myriad of snapshotting technologies available, it’s much easier to grab a binary copy of the DB every few hours. One last limitation is replication isn’t an option. I suspect AWS will be working on this soon as part of a HA (High Availability) release option.

Despite the limitations, I’m excited about this offering. This offloads much of the maintenance and management tasks which are usually the most tedious. I also hope that this means a higher IO disk subsystem may be coming to EBS soon.

Deleting Data From InnoDB

October 24th, 2009 by Erik

MySQL Logo Problem: We are given a large MySQL database table that no longer fits in your system’s working memory. You need to prune the data since a significant portion of this data is no longer relevant to keep in this table. Our expectation is ~75% of the data will remain in the table because of a uniform and random distribution of values in col1 and col2. How then, do we go about pruning this table as efficiently as possible?

The table structure is as follows:

describe big_table;
+-----------+---------------------+------+-----+---------+----------------+
| Field     | Type                | Null | Key | Default | Extra          |
+-----------+---------------------+------+-----+---------+----------------+
| id        | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment | 
| col1      | bigint(20) unsigned | NO   | MUL | NULL    |                | 
| col2      | bigint(20) unsigned | NO   | MUL | NULL    |                | 
| chardata  | varchar(35)         | YES  |     | NULL    |                | 
+-----------+---------------------+------+-----+---------+----------------+

Solution 1: Delete

We can simply run a query against the database that will delete all records that are irrelevant. The idea here is that we use a single query that’s easily readable to filter out the unwanted records. For this example we’ll assume we are sharding this data and want to keep all odd values for either col1 or col2 in this table. We can use MySQL’s MOD() function and check for even values and delete them. We expect this query to remove ~25% of our table.

DELETE FROM big_table WHERE MOD(col1,2) = 0 AND MOD(col2, 2) = 0;

Solution 2: Create -> Insert -> Rename -> Drop (CIRD)

Another solution is to create a new table identical to the original table then simply insert the records we want to keep into the new table. Once we have all the records we wish to keep, we simply rename the tables and then drop our original table. The difference in this query is to think about what we want to keep as opposed to what we want to get rid of. We expect this method to insert ~75% of the original table.

CREATE TABLE big_table_copy LIKE big_table;
INSERT INTO big_table_copy SELECT * FROM big_table WHERE MOD(col1, 2) = 1 OR MOD(col2, 2) = 1;
RENAME TABLE big_table TO big_table_old, big_table_copy TO big_table;
DROP TABLE big_table_old;

So which is faster?

It turns out it’s considerably faster to use solution 2. Solution 1 finished preparing our test table in just over 33 minutes. Solution 2 completed the inverse task in just over 5 1/2 minutes. The test data was 1.58 million rows of randomly generated data. The MySQL server was cripped to a 32Mb buffer pool to closely mimic our real world workload which would be heavily IO bound. The on disk file was about 260Mb. Using CIRD is about 6x faster than a straight delete. This probably isn’t surprising as it’s been reported for a long time that DELETE is ~10x slower than INSERT with InnoDB, but I wanted to validate these findings to determine if they still held true for newer versions (5.1.30) of MySQL.

Fotosync

October 23rd, 2009 by Erik

AF-Design is proud to announce the launch of Fotosync, a new Facebook application. Fotosync allows users to continue using their current photo sharing solutions while still getting the benefits of sharing on Facebook. Users simply install the application, provide their publicly accessibly username at the services they wish to sync and the application does the rest.


Fotosync

Fotosync currently supports Flickr, Photobucket and Picasa with other services coming soon. These services retain high-resolution photographs, great for cloud backup and a service not provided by Facebook – as well as offering a number of additional services. Fotosync offers a premium version and of course a free version.


premium_free_fotosync

API’s Should Be Free

October 21st, 2009 by Erik

Nothing can stifle development faster than putting a price tag on the development tools for your product. A case in point is email provider iContact who for whatever reason requires developers to register for a $9.95 per month account to create an application on their platform. Imagine for a minute how different the Facebook ecosystem would have been if they had charged developers to integrate with Platform. Many of iContact’s competitors including those featured on the iContact website, Constant Contact and Bronto, provide easy access to their development API’s. I don’t know if there’s a correlation, but these providers have been continuing to grow while iContact’s growth has remained relatively flat over the last 6 months.

iContact vs. Constant Contact vs. Bronto

There are business reasons why it’s probably important to control what applications make it onto a platform. Quality control, security of client data, inability to handle scale are all good reasons to keep people out. If this is what you need to do – don’t offer a public API.

The spectrum of current solutions for controlling access is varied. Apple’s often criticized application review process is one way to ensure that only “quality” applications become available to clients. On the other extreme, also not without criticism, Facebook’s more capitalistic approach allows users to ultimately decide what applications to use and thus drives the ecosystem from the other direction. Interestingly, regardless of the application approval process issues, anyone can create a FREE developer accounts, explore and interact with the SDK and Facebook API and start building applications. In fact, according to Alexa, only 2 of the Top 20 US Web Properties (ESPN and Disney’s GO) don’t offer free access to their API.

I strongly advise iContact and any other parties who sell their software as a service to provide an environment that encourages developers to interact with your service or don’t bother at all. Allowing developers to add value to your clients will always result in a happier customer and may even give you a competitive advantage in the marketplace, especially one as heavily saturated as email marketing.

Redirect Clients While Processing Continues

October 16th, 2009 by Erik

PHP LogoGenerally, if you need to have the browser see a page while you continue executing the request I would use a queue system like Gearman or Amazon’s SQS. However, in some rare cases running the code in the template requested by the user is just as fast as communicating with a remote queue. For those instances, redirecting the client while processing finishes makes sense.

<?php
// Redirect the client
ignore_user_abort(true);
set_time_limit(1);
ob_start();
header("Location: http://www.example.com/");
header("Connection: close");
header("Content-Length: " . ob_get_length());
while(@ob_end_flush());
flush();
 
// Continue processing
sleep(10);
?>

An example in action? Adding an entry to a database server on a redirect script. That DB server may be very busy and so take longer than we’d like to respond. Using this redirect code we can ensure the browser bounces to the next page as quickly as possible.

© 1998-2008 AF-Design, All rights reserved.