Archive for October, 2009

Share PHP & 64-bit Integer Modulus (Almost)

Wednesday, October 28th, 2009

While at times PHP seems to be capable of 64 bit math, it’s important to understand what’s really going on. Beyond 32 bit integers, PHP is silently converting your integers to floats. While this usually isn’t a problem, many of the operations you might perform on an int, such as modulus choke when attempting to convert back to a 32 bit integer internally.

This is actually the cause of the sprintf / printf issue I encountered before. The code below provides the maximum signed value for integers between 1 and 64 bit in PHP along with the result of the built in modulus operand “%” and a function I wrote mod() which doesn’t go all the way to 64 bits, but gets us a lot closer leveraging the built in data types. If you can install external modules, you might review and test the performance of BCMath or GMP which can both handle much larger values.

<?php
 
// Find out what our internal values are capable of
print "PHP_INT_MAX: " . PHP_INT_MAX . "\n";
print "PHP_INT_SIZE: " . PHP_INT_SIZE . " bytes (" . (PHP_INT_SIZE * 8) . " bits)\n";
 
// Generate an array of maximum signed 32 bit values
$ints = array();
for($pwr = 0; $pwr < 64; $pwr++){ $ints[] = pow(2,$pwr) - 1; }
 
// Generate a table of values
print "bits\t%100\tmod()\t%s\n";
$bits = 0;
foreach($ints as $int){
	$bits++;
	printf("%d\t%s\t%s\t%s\n", $bits, $int%100, mod($int,100), $int);
}
 
// (60 bit) - 1 aware modulus function
function mod($val, $mod){ return $val - floor($val/$mod) * $mod; }
?>

This generates the following table. Notice the internal value for integers is capped at 2,147,483,647 and the modulus operation goes kaput beyond 32 bits. The function provided seems to holds up through 59 bits before failing to function properly at 60.

PHP_INT_MAX: 2147483647
PHP_INT_SIZE: 4 bytes (32 bits)
bits	%100	mod()	%s
1	0	0	0
2	1	1	1
3	3	3	3
4	7	7	7
5	15	15	15
6	31	31	31
7	63	63	63
8	27	27	127
9	55	55	255
10	11	11	511
11	23	23	1023
12	47	47	2047
13	95	95	4095
14	91	91	8191
15	83	83	16383
16	67	67	32767
17	35	35	65535
18	71	71	131071
19	43	43	262143
20	87	87	524287
21	75	75	1048575
22	51	51	2097151
23	3	3	4194303
24	7	7	8388607
25	15	15	16777215
26	31	31	33554431
27	63	63	67108863
28	27	27	134217727
29	55	55	268435455
30	11	11	536870911
31	23	23	1073741823
32	47	47	2147483647
33	-1	95	4294967295
34	-1	91	8589934591
35	-1	83	17179869183
36	-1	67	34359738367
37	-1	35	68719476735
38	-1	71	137438953471
39	-1	43	274877906943
40	-1	87	549755813887
41	-1	75	1099511627775
42	-1	51	2199023255551
43	-1	3	4398046511103
44	-1	7	8796093022207
45	-1	15	17592186044415
46	-1	31	35184372088831
47	-1	63	70368744177663
48	-1	27	1.4073748835533E+14
49	-1	55	2.8147497671066E+14
50	-1	11	5.6294995342131E+14
51	-1	23	1.1258999068426E+15
52	-1	47	2.2517998136852E+15
53	-1	95	4.5035996273705E+15
54	-1	91	9.007199254741E+15
55	0	84	1.8014398509482E+16
56	0	68	3.6028797018964E+16
57	0	32	7.2057594037928E+16
58	0	64	1.4411518807586E+17
59	0	32	2.8823037615171E+17
60	0	0	5.7646075230342E+17
61	0	0	1.1529215046068E+18
62	0	0	2.3058430092137E+18
63	0	0	4.6116860184274E+18
64	0	0	9.2233720368548E+18

Share MySQL Relational Database Service on AWS

Tuesday, October 27th, 2009

Amazon Web Services Logo The stable of services available through AWS is continuing to expand! Last night Amazon announced RDS (Relational Database Service) which look a lot like EC2 instances running MySQL with EBS volumes – something I have a fair bit of experience with. However, these have the added benefit of being a service that can scale memory and processor both up and down with a single service call.

# ds-modify-db-instance mydbinstance --db-instance-class db.m1.xlarge -s 100

This flexibility comes with a downside, namely a 4 hour monthly service window where patches, updates and those requested capacity changes are applied. You can choose to apply them immediately, but your application should be prepared to handle the downtime. What happens is, your database instance goes offline and when it comes back, it has all the changes you requested applied. So at best, you should expect uptime in the 99.4% range. Most applications can handle a 4 hour downtime if it’s planned for. Under more conventional MySQL builds, developers or system administrators will mitigate these downtimes by first applying changes to slaves, promotion of one slave to master and then finally applying the changes to the original master. This sort of safety net provides gives applications smaller downtime windows (at most a few minutes each) allowing for theoretical 99.999% uptime.

Transitioning to RDS may not be without pain either. Importing your data is done through a mysqldump (or other flatflile export) and then playing that file back into your AWS instance. Depending on the size of your dataset a full mysqldump and re-importing may take days (no I’m not exaggerating). Also note, during the time mysqldump runs, your original database will acquire a read lock for consistency. With some DB’s I manage, I’ve stopped using MySQL dump entirely because the dumps took more than 4 hours to complete on a dedicated slave. With the myriad of snapshotting technologies available, it’s much easier to grab a binary copy of the DB every few hours. One last limitation is replication isn’t an option. I suspect AWS will be working on this soon as part of a HA (High Availability) release option.

Despite the limitations, I’m excited about this offering. This offloads much of the maintenance and management tasks which are usually the most tedious. I also hope that this means a higher IO disk subsystem may be coming to EBS soon.

Share Deleting Data From InnoDB

Saturday, October 24th, 2009

MySQL Logo Problem: We are given a large MySQL database table that no longer fits in your system’s working memory. You need to prune the data since a significant portion of this data is no longer relevant to keep in this table. Our expectation is ~75% of the data will remain in the table because of a uniform and random distribution of values in col1 and col2. How then, do we go about pruning this table as efficiently as possible?

The table structure is as follows:

describe big_table;
+-----------+---------------------+------+-----+---------+----------------+
| Field     | Type                | Null | Key | Default | Extra          |
+-----------+---------------------+------+-----+---------+----------------+
| id        | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment | 
| col1      | bigint(20) unsigned | NO   | MUL | NULL    |                | 
| col2      | bigint(20) unsigned | NO   | MUL | NULL    |                | 
| chardata  | varchar(35)         | YES  |     | NULL    |                | 
+-----------+---------------------+------+-----+---------+----------------+

Solution 1: Delete

We can simply run a query against the database that will delete all records that are irrelevant. The idea here is that we use a single query that’s easily readable to filter out the unwanted records. For this example we’ll assume we are sharding this data and want to keep all odd values for either col1 or col2 in this table. We can use MySQL’s MOD() function and check for even values and delete them. We expect this query to remove ~25% of our table.

DELETE FROM big_table WHERE MOD(col1,2) = 0 AND MOD(col2, 2) = 0;

Solution 2: Create -> Insert -> Rename -> Drop (CIRD)

Another solution is to create a new table identical to the original table then simply insert the records we want to keep into the new table. Once we have all the records we wish to keep, we simply rename the tables and then drop our original table. The difference in this query is to think about what we want to keep as opposed to what we want to get rid of. We expect this method to insert ~75% of the original table.

CREATE TABLE big_table_copy LIKE big_table;
INSERT INTO big_table_copy SELECT * FROM big_table WHERE MOD(col1, 2) = 1 OR MOD(col2, 2) = 1;
RENAME TABLE big_table TO big_table_old, big_table_copy TO big_table;
DROP TABLE big_table_old;

So which is faster?

It turns out it’s considerably faster to use solution 2. Solution 1 finished preparing our test table in just over 33 minutes. Solution 2 completed the inverse task in just over 5 1/2 minutes. The test data was 1.58 million rows of randomly generated data. The MySQL server was cripped to a 32Mb buffer pool to closely mimic our real world workload which would be heavily IO bound. The on disk file was about 260Mb. Using CIRD is about 6x faster than a straight delete. This probably isn’t surprising as it’s been reported for a long time that DELETE is ~10x slower than INSERT with InnoDB, but I wanted to validate these findings to determine if they still held true for newer versions (5.1.30) of MySQL.

Share Fotosync

Friday, October 23rd, 2009

AF-Design is proud to announce the launch of Fotosync, a new Facebook application. Fotosync allows users to continue using their current photo sharing solutions while still getting the benefits of sharing on Facebook. Users simply install the application, provide their publicly accessibly username at the services they wish to sync and the application does the rest.


Fotosync

Fotosync currently supports Flickr, Photobucket and Picasa with other services coming soon. These services retain high-resolution photographs, great for cloud backup and a service not provided by Facebook – as well as offering a number of additional services. Fotosync offers a premium version and of course a free version.


premium_free_fotosync

Share API’s Should Be Free

Wednesday, October 21st, 2009

Nothing can stifle development faster than putting a price tag on the development tools for your product. A case in point is email provider iContact who for whatever reason requires developers to register for a $9.95 per month account to create an application on their platform. Imagine for a minute how different the Facebook ecosystem would have been if they had charged developers to integrate with Platform. Many of iContact’s competitors including those featured on the iContact website, Constant Contact and Bronto, provide easy access to their development API’s. I don’t know if there’s a correlation, but these providers have been continuing to grow while iContact’s growth has remained relatively flat over the last 6 months.

iContact vs. Constant Contact vs. Bronto

There are business reasons why it’s probably important to control what applications make it onto a platform. Quality control, security of client data, inability to handle scale are all good reasons to keep people out. If this is what you need to do – don’t offer a public API.

The spectrum of current solutions for controlling access is varied. Apple’s often criticized application review process is one way to ensure that only “quality” applications become available to clients. On the other extreme, also not without criticism, Facebook’s more capitalistic approach allows users to ultimately decide what applications to use and thus drives the ecosystem from the other direction. Interestingly, regardless of the application approval process issues, anyone can create a FREE developer accounts, explore and interact with the SDK and Facebook API and start building applications. In fact, according to Alexa, only 2 of the Top 20 US Web Properties (ESPN and Disney’s GO) don’t offer free access to their API.

I strongly advise iContact and any other parties who sell their software as a service to provide an environment that encourages developers to interact with your service or don’t bother at all. Allowing developers to add value to your clients will always result in a happier customer and may even give you a competitive advantage in the marketplace, especially one as heavily saturated as email marketing.

Share Redirect Clients While Processing Continues

Friday, October 16th, 2009

PHP LogoGenerally, if you need to have the browser see a page while you continue executing the request I would use a queue system like Gearman or Amazon’s SQS. However, in some rare cases running the code in the template requested by the user is just as fast as communicating with a remote queue. For those instances, redirecting the client while processing finishes makes sense.

<?php
// Redirect the client
ignore_user_abort(true);
set_time_limit(1);
ob_start();
header("Location: http://www.example.com/");
header("Connection: close");
header("Content-Length: " . ob_get_length());
while(@ob_end_flush());
flush();
 
// Continue processing
sleep(10);
?>

An example in action? Adding an entry to a database server on a redirect script. That DB server may be very busy and so take longer than we’d like to respond. Using this redirect code we can ensure the browser bounces to the next page as quickly as possible.

© 1998-2008 AF-Design, All rights reserved.