Archive for the ‘php’ Category

Memcached with PHP on Mac OS X

Saturday, October 18th, 2008

Nate Haug provides a great script for installing memcached along with some very detailed instructions on setting up a sandbox environment. I’m not using his MAMP sandbox, instead opting for the built in PHP / Apache install, so I needed to change a few things from his tutorial. My system is a fully updated Intel MacBook Pro running OS X 10.5.5 with the Xcode tools installed - YMMV. PHP is currently reporting version 5.2.6.

  1. I added my revised start script for memcached.
  2. The PHP version that ships with OS X doesn’t have PECL, so I downloaded the source and compiled manually.
    phpize; configure; make; sudo make install
  3. Edit to /etc/php.ini: Changed: extension_dir = /usr/lib/php/extensions/no-debug-non-zts-20060613/
  4. Edit to /etc/php.ini: Added: extension=memcache.so

You can skip his Apache scripts. Restart apache by restarting Web Sharing in the System Preferences.

The major changes I made from Nate’s memcached startup script were the singular instance and binding the service to localhost (127.0.0.1) only. This keeps memcached slightly more secure by only having it listen on the loopback adapter. If you need more space, just change the -m attribute to be higher; it’s measured in MB.

# /bin/sh
memcached -m 1 -l 127.0.0.1 -p 11211 -d

NOTE: As with any other service running on your system, opening a web server exposes your system to potential attack and worse. Be sure to keep production data away from your test environment. Someone at Starbucks, sharing your WiFi connection, may be surfing your development site too. Consider yourself warned!

Data Gotcha’s or Learn to Love CAST()

Wednesday, October 1st, 2008

PHP Loves Microsoft SQL Server In the last post, I explained my work around to get data out of a Microsoft SQL Server using stored procedures. Since then, I’ve continued learning more about the limitations of the FreeTDS driver in PHP and hit another snafu when working with the data. Fields that are exported as VARCHAR are truncated at 256 characters! Whoa! big gotcha when today’s database tables might contain values in VARCHAR(1024) or even bigger! So what’s a PHP script to do? CAST the values!!!

 
-- original statement
SELECT myBigVarChar 
     FROM myTable 
     WHERE myId = 1;
 
-- revised statement
SELECT CAST(myBigVarChar AS text) AS myBigVarChar 
     FROM myTable 
     WHERE myId = 1;

Notice the statement’s a little more unruly to look at, but is functionally the same. The case to text allows the FreeTDS driver to properly convert the data. I also noticed this issue with UNIQUEIDENTIFIER type columns coming through as a BLOB. My fix was to CAST the identifier to a VARCHAR(36) to safely hold the string representation of the UUID.

Hope this saves you some head scratching!

Running a Stored Procedure from MS-SQL Server with PHP on Linux

Wednesday, October 1st, 2008

PHP Loves Microsoft SQL Server After some effort getting the MSSQL drivers in PHP, I was ready to hook into the RDBMS and use my nicely developed stored procedures. FreeTDS required minimal setup and configuration and I was able to establish a connection and run simple queries against tables in no-time.

However, when I attempted to execute stored procedures, they would fail. To make troubleshooting harder, the MSSQL functions do not return error messages — at least none that I could access in PHP 5.1. My work around to the proper methodology init(), bind(), and execute() process was to use query() and pass in the procedure and parameters as needed. It’s slightly less elegant but opens the data up to you.

// Establish the connection and run the procedure
$connection = mssql_connect("123.123.123.123","db_login","db_password");
mssql_select_db("db_schemaname",$connection);
 
// Run the procedure
$username = "user"; // sanitized!!!
$password = "secret_password"; // sanitized!!!
$result = mssql_query("CheckLogin '$username', '$password'", $connection);
 
// handle the result as needed

The standard disclaimers apply about validating, escaping and bullet proofing any data passed into Microsoft SQL Server using this method. Seriously, heed these warnings, sanitize any data, or you will find your site vulnerable to SQL Injection attacks which are serious business.

PHP Development Tips

Tuesday, September 9th, 2008

Hammer and Screwdriver If you develop code in PHP or any other language take 5 minutes to read this. It raises excellent points regardless of the language you author in despite it’s focusing on PHP. I particularly liked #7 Use a PHP Framework & #8 Don’t use a PHP Framework. It harkens back to point #1 Use PHP Only When You Need It. Hammers are good at driving nails, but they make terrible screwdrivers. The same can be said of frameworks which are not the panacea that many tout them to be. There is a good tool for each job - the roll of technology folks is to accurately select and implement the correct tool for the task - because not everything is a nail.

Memcache Feature-Bug Gotchas

Thursday, July 17th, 2008

Recently I’ve been doing a lot of work with memcached using PHP and have been bitten a few times by different things with how things worked. I’m calling those items out here so anyone getting started with Memcache can learn from my mistakes. Memcached is an amazingly powerful caching layer with lots and lots of online documentation. It’s easy to get running on Linux and hooking PHP into it - I’ll save yet another post about how to do it since there are so many excellent resources already. The hard part is determining where you’ll implement it and in what way. For this post, I’ll leave the implementation strategy aside and walk you through a couple of examples of where I’ve been bit. I’ve created a layer to further abstract the memcache() object in PHP so I can ignore dealing with the add() vs. replace() vs. set() switches and allowing me to have one stop shopping for all of my configuration settings. The source for that basic class is included at the bottom of this post, feel free to use/distribute as you see fit.

One last item before we delve into the examples. Memcached and memcache are not quite the same thing. When I reference Memcached - I’m actually referring to the server instance of Memcache which you are running on the server, which is accessible using a variety of methods, on a variety of platforms including but not limited to PHP. When I use memcache in this post, I’m referring to the API hooks that have been created for PHP to interact with your Memcached server instance. Some (or possibly all) of these items ONLY APPLY TO PHP and shouldn’t be construed as feature-bugs with the Memcached server itself.

Compression and Object Sizes

It took me a while to track down this bug - but I finally read up on memcache a bit more and learned that memcache behaves oddly with small chunks of information if compression is turned on. I’m pretty sure this is memcache not memcached that’s causing the issue because the compression layer happens in PHP. The output for the following code is The value is not the same. However, if $compression is set to false, it will work as expected. Integers and character strings seem to be okay with compression on - as do complex objects. The setCompressThreshold method allows adjustment of this size but I’ve gotten in the habit of not caching simple values like true and false - instead opting to cache objects, classes, arrays and alternately JSON.

// create the cache
$cache = new Memcache();
$cache->addServer("localhost","11211");
$compression = true;
 
// create the value in the cache
$x = true;
$cache->add("x", $x, $compression, 10);
 
// access the value
$y = $cache->get("x", $compression);
 
// check what happened
if($x == $y){
   print "The value is the same";
} else {
   print "The value is not the same";
}

Caching of Class Objects

When caching complex objects like classes, memcache serializes the entire object and then caches it was as it was instantiated. So be warned if your class definition changes, you’ll need to flush your cache entirely of those objects or you might find it behaves a little differently than your expecting. Let’s say you have a class with a few properties that update multiple properties when certain methods are called and you wish to change one of those properties slightly. Any objects that are in the cache already will continue to work with the old values until they are flushed from the cache. It’s not sufficient to read it out and put it back in - the object really is your OLD class definition.

class Foo{
   protected $property = array();
   public function __construct($arr){
      if(count($arr) > 0){
         $this->properties = $arr;
      }
   }
   public function __get($key){
      return $this->property[$key];
   }
   public function __set($key, $value){
      $this->property[$key] = $value;
   }
}

So now you can create Foo objects all day and stuff all sorts of information into them and cache them. You can also get them back out willy nilly later (I’ll use my cache class to save time on the code below).

$foo1 = new Foo(array("apples"=>11,"orange"=>20));
$cache->set("foo1",$foo1, 60);
$foo2 = $cache->get("foo1");
echo $foo2->apples; // should be 11

So we see that all works but, what if we change the way the class works? For example adding a layer of math to calculate a tax or something along those lines.

class Foo{
   protected $property = array();
   public function __construct($arr){
      if(count($arr) > 0){
         $this->properties = $arr;
      }
   }
   public function __get($key){
      return round($this->property[$key] * .9); // calculate storage/depreciation loss
   }
   public function __set($key, $value){
      $this->property[$key] = $value;
      $this->property['num_items'] = count($this->properties) - 1;
   }
}

Our existing cached object doesn’t behave as expected. One of two things seems to happen, and I haven’t fully flushed it out what happens when. First, the object just comes back as it was initially instantiated or second, it silently dies without returning an error. This might be a good reason to create a version value for your cached objects so you can switch on the version to determine if the cached value is valid.

// let's access our existing cache object from before again...
$foo2 = $cache->get("foo1");
echo $foo2->apples; // we might hope for 10 but...

Database Results

Caching of resources doesn’t work. The data being cached needs to be able to be serialized by memcache so it can be inserted into memcached. Database handles are much like your memcached connection - they’re a socket you talk to and unfortunately, so are MySQL results. There are good reasons for this so don’t gripe about it. You’ll need to write a simple wrapper that does all of the result parsing for you prior to caching. Then you can easily create a cacheable MySQL object that can be inserted into memcached. It only takes a few minutes to do this and I may even post later describing the wrappers I’m now using to do just this. Until then - know that you can quickly create an array of your data using the following code and cache that result instead.

// create a cache object (using class from below)
$cache = new Cache();
 
// create an array to populate the data with
$data_array = array();
 
// run the query 
$mysqli_result = $mysqli->query("select * from table where condition=true");
 
// stuff all the data into the array
while($row = $mysqli_result->fetch_assoc()){
   $data_array[] = $row;
}
 
// cache the array
$cache->set("query_data",$data_array,90);

Cache Time to Live

Nothing too big here, but if you provide 0 (zero) or false for a time to live/expiration value, the item never expires, it just gets pushed out if needed later. This all happens on a LRU basis and is well documented.

Protect your Namespaces

This may seem trivial, but I’ve been bit here too. Often it’s sufficient to use one server for multiple tasks. Since Memcached is easy to run in one large pool and share it among multiple resources (much like you would with MySQL) it’s easy to share across multiple applications. There are some nice economies of scale this will afford you. But consider the following bug you could create for yourself in your logic.

Application 1 accessing it’s DB table.

 
// Application 1 fetching content about a user
$memcache = new Cache();
$query = "select * from users where userid = 12";
$result = $memcache->get(hash(md5,$query));
 
// The data wasn't in cache, so we run the query below and store the data
if(!$result){
     $result = $mysql->query($query);
     $memcache->set(hash(md5,$query), $result, 600);
}

Application 2 accessing it’s DB table.

// Application 2 fetching content about a user
 
$memcache = new Cache();
$query = "select * from users where userid = 12";
$result = $memcache->get(hash(md5,$query));
 
// the value existed in cache - so it skips the query and uses the cached value
if(!$result){
     $result = $mysql->query($query);
     $memcache->set(hash(md5,$query), $result, 600);
}

Application 2 and Application 1 are using the EXACT same key to reference their data. Unless this is intentional (because they share a common database) it can be a real pain to debug. The easiest way to correct this is to create a namespace for the cache layer and append it to any keys you may use. The example class provided below does just that with minimal fuss. The code above would be changed to reflect the correct namespace for each application and they could co-exist using the memcached server together.

// in app 1 - use that namespace
$cache = new Cache("app1");
 
// in app 2 - use that namespace
$cache = new Cache("app2");

Memcache Abstraction Class

This is the abstraction class I use to handle all memcache interaction. It’s little more than a thin veneer over the existing PHP object. You can see where it’s easy to expand this basic cache layer within the constructor and you can tune for your data, servers and other bits relevant to your implementation as needed.

class Cache{
 
   protected $cache = false;
   protected $namespace = "";
 
   public function __construct($namespace = ""){
      $this->cache = new Memcache();
      $this->cache->addServer("localhost","11211");
      $this->cache->setCompressThreshold(127,0.2);
      $this->namespace = $namespace;
   }
 
   public function __destruct(){
      $this->cache->close();
   }
 
   public function set($key, $value, $ttl = 600){
      $this->cache->set($key . $this->namespace, $value, true, $ttl);
   }
 
   public function get($key){
      return $this->cache->get($key . $this->namespace, true);
   }
 
}

Getting Accurate Metrics in WordPress

Thursday, May 22nd, 2008

WordPress + Google Analytics Google Analytics does a great job keeping track of visitors, sessions, geographic locations and with the addition of the Benchmarking feature, even compares your traffic to other blogs of similar size and topic. However, there’s a problem. It counts you too! If your using a fancy plug-in to manage your Google Analytics account, you won’t need this. I, however, have been tweaking and tuning my template over time and have my it all stuffed right in header. Today I made a minor tweak to my WordPress template it to suppress my views when I’m logged in and updated to the new tracking code from the old Urchin based code. This snippet is from my ./wp-content/templates/header.php file.

<?php if($user_ID != 1){ ?>
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
var pageTracker = _gat._getTracker("UA-123456-1");
pageTracker._initData();
pageTracker._trackPageview();
</script>
<?php } ?>

Be sure to add the id (or id’s) you want to suppress and to put the correct Google Analytics account information (the UA-123456-1 bit) in your code.
Happy tracking ;)

Not So Fast! MySQL Stored Procedures Are Not Panacea

Wednesday, May 21st, 2008

MySQL Logo MySQL 5.0 introduced stored procedures, but are you taking advantage of them? MySQL claims a 50% lift in performance just having an execution plan on hand, but what real world performance boost can you expect to see? I set out to discover just that and found the results unsatisfying.

Column Type Indexed
id int(11) unsigned Primary Key
content varchar(4096) No

I then conducted a test which would closely mimic a real world example to determine just that. I started with a table with a few million rows of variable length data in it. I then created a stored procedure that would as closely as possible match my hand coded SQL query. Then I created a simple PHP script to test it all out.

CREATE PROCEDURE sp_getdata (i INT(11))
BEGIN
	SELECT * FROM test_table WHERE id = t;
END;
$iterations = 100000;
$conn = mysqli_database($server, $user, $pass, $schema);
 
// find the extents of our random range and make sure the connection actually opens
$max = $conn->query("select max(id) as maxid from test_table");
$max = $max->fetch_assoc();
$max = $max['maxid'];
 
$sql = "SELECT * FROM test_table WHERE id = %d";
$procedure = "CALL sp_getmessage(%d)";
 
// run the stored procedure query
for($i=0;$i<$iterations; $i++){
     $query = sprintf($procedure, rand(1,$max));
     $conn->multi_query($query);
     $result = $conn->store_result();
}
print number_format(microtime(true)-$t2,2) . "s\n";
 
// run the straight sql query
print "Test SQL Query: ";
$t1 = microtime(true);
for($i=0; $i<$iterations; $i++){
     $query = sprintf($sql, rand(1,$max));
     $result = $conn->query($sql);
}
print number_format(microtime(true)-$t1,2) . "s\n";

I then ran the test 5 times to eliminate the so called warm-up period which MySQL would inevitably use to more accurately answer future questions. I then ran the test a number of times swapping the position of the two “tests” within the code so that neither received an unfair advantage. The results are below:

Run SQL Stored Procedure Order
1. 0.28s 0.39s Proc/SQL
2. 0.28s 0.37s Proc/SQL
3. 0.28s 0.41s Proc/SQL
4. 0.28s 0.38s Proc/SQL
5. 0.28s 0.39s Proc/SQL
6. 0.28s 0.80s SQL/Proc
7. 0.28s 0.82s SQL/Proc
8. 0.28s 0.81s SQL/Proc
9. 0.28s 0.82s SQL/Proc
10. 0.27s 0.80s SQL/Proc

I was taken by surprise by two obvious trends. First, the SQL statement was amazingly consistent regardless of when it was run. Second, not only was the stored procedure slower, it was significantly slower if a batch of plain old SQL statement executions were run first.

Next I decided to track the response time of each individual query that I ran to get a better look at the execution time costs for each. I expected to see a bell curve showing some queries running very fast and others running very slow with the majority falling in the middle somewhere. I had to increase the sensitivity of my timer millionths of a second to get a good picture of just how fast things were or were not running. Again I was surprised by the results. The basic query was still much faster in total elapsed time, but suddenly the performance of the stored procedure began showing itself. I ran the tests the same way as before and swapped the order in which they ran to eliminate preferential execution order. The following table gives results from one run of 100,000 so you can see the results first hand. The method was to insert the time value into an array and increment the value of that index by one each time a query executed in that time. So the first value you see is also the first query that gets sent to the server. This is important because for the stored procedure, you’ll notice it’s MUCH longer than all subsequent runs. Oddly, the query to get the maxium value had already been run so I’m not sure what causes the increase delay in runtime.

Stored Procedure Basic SQL Statement
    [0.00144] => 1
    [0.00005] => 2
    [0.00002] => 52
    [0.00001] => 99938
    [0.00003] => 6
    [0.00004] => 1
    [0.00063] => 1
    [0.00040] => 1
    [0.00047] => 16
    [0.00049] => 42172
    [0.00048] => 26
    [0.00299] => 21
    [0.00050] => 1735
    [0.00124] => 88
    [0.00074] => 517
    [0.00249] => 32
    [0.00224] => 38
    [0.00174] => 171
    [0.00149] => 138
    [0.00274] => 22
    [0.00099] => 139
    [0.00324] => 19
    [0.00399] => 12
    [0.00374] => 12
    [0.00349] => 12
    [0.00199] => 102
    [0.00054] => 1
    [0.00043] => 1
    [0.00075] => 35
    [0.00046] => 5
    [0.00250] => 2
    [0.00024] => 49970
    [0.00100] => 4
    [0.00025] => 4657
    [0.00125] => 6
    [0.00023] => 11
    [0.00175] => 2
    [0.00022] => 6
    [0.00449] => 4
    [0.00028] => 2
    [0.00052] => 1
    [0.00044] => 1
    [0.00200] => 1
    [0.00030] => 1
    [0.00041] => 1
    [0.00026] => 3
    [0.00027] => 4
    [0.00070] => 1
    [0.00499] => 1
    [0.00150] => 2
    [0.29807] => 1
    [0.00275] => 1
    [0.00424] => 2

Clearly the fluctuation in times we see in the first table are largely related to the first query we execute. After that initial procedure call, stored procedures run consistently fast, faster in fact than the plain old SQL query. But back to the “real world test”. How often do we create a single page that executes 100,000 queries at a time? More likely its 2-20 queries which are all very different than the others in the group. Given that variable, which is really the best solution? Ultimately it’s going to depend on your setup. This was a mostly idle app server communicating over a 100Mb connection to a moderately loaded DB server with ample RAM and disk performance.

Future exploration areas will be to research the most efficient way to execute stored procedures for maximum performance and deciding if a stored procedure is faster for more complex queries involving joins etc. If anyone has some good directions I can head in here, I’m all ears.

It’s All In a Roll of the Dice

Friday, May 16th, 2008

White dice Sitting in an airplane recently I began thinking about roll playing games and how they’re slowly gaining popularity on Facebook and MySpace as MMOG’s (Massively Multiplayer Online Games for those who don’t know). Basically the game engines of Dungeons & Dragons etc are simply extensions of more conventional game dynamics from simple board games. Specifically I began thinking about RISK and how the odds change based on the number of armies you bring to the fight.

So sitting in that airplane, I created a PHP class to play around with the odds and how things come out using classic risk weighting. First I created a simple class to roll a six sided die and then expanded it so many instances could be created representing virtual dice. I then created a roll method which will take an optional parameter to handle multiple dice (say 2 dice with 6 sides each common in many games). Last but not least I created to classes for actually rolling an opponent vs. defense, vs() and risk(). The risk class returns a Losses object with the number of units defeated in battle.

Have Fun!

I should note, I’m not the first person to be curious about the odds of Risk or other games for that matter. Google returns over 3.4 Million results for “risk odds”, but I wanted to call out a few that do a nice job of explaining the whole idea here, here and here.

class Dice {
	private $s = 6;
 
	// General constructor
	function __construct($faces = 6){
		$this->s = $faces;
	}
 
	// Returns a roll based on the sides of the dice being thrown
	function roll($dice = 1){
		$total = 0;
		while($dice){
			$total += rand(1,$this->s);
			$dice--;
		} 
		return $total;
	}
 
	// Uses a classic risk model that rewards putting more troops in play
	// removing the 3/2 limit if desired.
	function risk($of = 3, $df = 2){
		$losses = new Losses();
		$dl = 0; $ol = 0;
		$dt = array(); $ot = array();
		for($i=$df; $i>0; $i--){ $dt[] = $this->roll(); }
		for($i=$of; $i>0; $i--){ $ot[] = $this->roll(); }
		rsort($dt); rsort($ot);
		while(count($ot) > 0 && count($dt) > 0){
			if(array_shift($ot) > array_shift($dt)){
				$losses->defense--;
			} else {
				$losses->offense--;
			}
		}
		return $losses;
	}
 
}
// Helper class to pass back the losses from a risk roll
class Losses{ var $defense = 0; var $offense = 0; }

Finding a Maximum Range

Sunday, April 27th, 2008

I recently wrote about the Google Chart API but neglected to include a function for finding a maximum range that made sense. The following code can be used to come up with a good range limiter. This is very handy for setting limits on graphs and also for calculating the next level of measurement.

There are two functions doing the work here. The first function, get_max_range() is the one you will call in your code passing in your value as a singular parameter. It will return an integer value that you can use as your delimiter for the maximum value on your graph. The second function is a helper function that is used recursively to determine the base unit of the most significant digit. That value then becomes the exponent of the divisor and multiplier in the ceil() function and the return value.

function get_max_range($num){
	// Erik Giberti, 2008
	$magnitude = pow(10,get_exponent(abs($num),0));
	$new_num = ceil(abs($num)/$magnitude);
	if($num < 0){ $new_num = $new_num * -1; }
	return $new_num * $magnitude;
}
 
function get_exponent($num, $mag = 0){
	// Erik Giberti, 2008
	if($num/10 > 1){
		return get_exponent($num/10, $mag+1);
	} else {
		return $mag;
	}
}

Examples:
As you’ll see it works equally well with positive and negative numbers.

get_max_range(0.1) = 1
get_max_range(1.2) = 2
get_max_range(23.45) = 30
get_max_range(364)= 400
get_max_range(4567) = 5000
get_max_range(-59136789) = -60000000

Limitations:

1. While this works well for both positive and negative numbers, it doesn’t work well for very small numbers. For example 0.012 will still return a maximum scale of 1. If someone would like to expand on this work, please feel free and I’ll post the changes here.

2. Very large numbers only consider the most significant digit. The of course could be changed by altering the base_exponent value to better suit your datasets. I find this general value works well for graphing which is what this was intended for anyway.

License:

This code is released to you for use under the GPL.

Presenting Data Visually

Thursday, April 24th, 2008

Anyone who’s immersed in the world of data and its presentation knows the value of a well designed and laid out graphic can lead to a project. Currently application developers have two primary tools at their disposal, Adobe Flex and the Google Chart API for making dynamic graphics. Silverlight may offer some similar features, but I’ve no experience building with it. Additionally, Java is also very capable, but when you see how simple the Google product is, you’d need a very strong reason to consider running anything like this in house again. As such I have left them out of the discussion.

I’ve used Flex for some simple reporting graphs. The real power in Flex lays in it’s ability to create entire interfaces, even entire websites entirely in a rich multimedia capable environment. I’ve been intoxicated with the raw power to manipulate the data in real time and interact in a way not easily possible on the web before. However, all of this power comes at a steep price, while the core SDK is free - for time pressed developers - the IDE environment saves a huge amount of effort. Because of it’s great power, sometimes using Flex is a little like bringing a machete to the dinner table - a little bit more power than you really need to cut the vegetables with. This is where Google Charts comes in.

Group Male Female
0 52 38
1 48 34
2 61 40
3 77 63
4 79 89
5 120 102
6 138 125
7 125 154
8 172 155
9 156 173
10 193 193
11 217 189
12 291 293
13 300 345
14 292 321
15 297 333
16 321 348
17 365 366
18 364 337
19 327 306
20 243 248
21 161 179
22 122 110
23 74 68

Way back when in December of 2007, Google anounced the Google Chart API. This revolutionized graphing for developers, although I suspect many may not realize it yet! Lets start our exploration with some fake data and see just what we can do with the API. We’ll use a small slice of some new user signup data I have at my disposal to get our hands dirty with.

Generating this table is no special feat, some simple database queries against your own internal data and presto, you’ve got a column of numbers too. But that’s where the fun begins. Consider this table for a minute, it’s pretty clear there’s a normal distribution curve there, but are they the same? Are there more men or women - or is it nearly equal? Are there other things this data could be telling us?

Let’s start with the marketing department’s questions. Are we attracting more men or women? If we sum the numbers in the table together, we get 4,595 men and 4,609 women. We can create a pie chart with these numbers so folks can look at a glance and see what that would look like.

Google Chart API demo

http://chart.apis.google.com/chart?
cht=p&
chs=250x100&
chd=t:4595,4609&
chl=Male|Female&
chco=0000ff,ff0000

The line break is for clarity only. To walk through the pieces, the cht is simply the chart type, p for pie (p3 is a 3d pie), chs is a width x height value, chd is the data, chl are the labels and chco are the colors. Interestingly, you can actually skip the labels on and still generate a graph, but obviously it becomes much clearer when you provide the labels.

Now lets look at the curves - obviously we have a peak during the day and lows at night, but just what does that look like and is it the same for men and women? Here we can use the bar chart.

http://chart.apis.google.com/chart?
cht=bvg&
chs=630x150&
chbh=10,0,5&
chco=0000ff,ff0000&
chg=101,33.3&
chxt=x,y&
chxl=0:|12a|1a|2a|3a|4a|5a|6a|7a|8a|9a|10a|11a|12p|1p|2p|3p|4p|5p|6p|7p|8p|9p|10p|11p|1:|0|125|250|375&
chd=t:13.9,12.8,16.3,20.5,21.1,32.0,36.8,33.3,45.9,41.6,51.5,57.9,77.6,80.0,77.9,79.2,85.6,97.3,97.1,87.2,64.8,42.9,32.5,19.7|10.1,9.1,10.7,16.8,23.7,27.2,33.3,41.1,41.3,46.1,51.5,50.4,78.1,92.0,85.6,88.8,92.8,97.6,89.9,81.6,66.1,47.7,29.3,18.1&

Now with this chart, there’s quite a bit more going on. Google recommends that for larger data sets you use their special encoding system, but in order to keep this simple I went ahead and used the decimal encoding so it would be easier to understand. For those who are ready for that next step, take a look at the docs for the API, there’s excellent pseudo code there for how to transition to the encoding method they use. cht still defines the chart type, in this case a horizontal bar chart with grouped data. chs is the size again; read the docs carefully if you decide to implement one of these. The bar chart can extend past your chosen size and you’ll simply crop your data. chbh comes in handy here by letting me shrink the size of the bars to 10 pixels and removing lots of padding. Last but not least chco sets the colors bringing us to the really new stuff.

For bar charts the data has to be limited into a range of 0-100. If your maximum data point is less than 100, you’re in luck, just feed the data in place of this method below. However, if your data doesn’t live in the realm of 0-100, you’ll need to consider doing some more work. The simplest method is to create a percentage (some PHP code is below outlining how to do that). The X and Y axis are blank by default with a bar graph. There are a number of methods for making them display but I find this method the most logical. If your data is broken out by percentages already, using the default Y axis value is fine, however, in our example, it’s not. So we’re going to pass two parameters. First chxt=x,y, what this does is tells the API that the first label set I pass in is for the X axis, and the second is for the Y. Now we can pass in the values we want to display on the X and Y axis. chxl defines our labels using position indexes to reference back to the values we had assigned in our chxt parameter. In our example, index 0 is X and index 1 is Y - notice they’re bolded: chxl=0:|label1|label2||labeln|1:|label1|label2| … and so on. The last parameter I skipped over from before is chg which sets up the grid based on a percentage. I passed in 101% spacing for the vertical bars along the X axis, and 33.3% for the horizontal bars because they lined up with my increments.

For those unsure how to generate the percentages in PHP, it’s very simple. The first thing we’ll do is loop over our dataset to determine what our largest value is. Then, instead of passing in the raw values as we did with the pie chart above, we’ll actually pass in their percentage of the maximum number. If your like me, you might want to step the scale up to the next logical number. I did that on my graph, but did not do it in the code below - I’ll save that for another blog post. Then you can at last loop over the data one more time to create the actual data strings. Once it’s all done, you’ll have a single string you can drop into your image call.

$result = $conn->query($select);

// get the maximum value in our data
$max = 0;
while($row = $result->fetch_assoc()){
   if($row['Male'] > $max){ $max = $row['Male']; }
   if($row['Female'] > $max){ $max = $row['Female']; }
}
$result->data_seek(0);

// generate the two datasets
$data_male = "";
$data_female = "";
while($row = $result->fetch_assoc()){
   if(strlen($data_male) == 0){ $data_male .= ","; $data_female .= ","; }
   $data_male .= round($row['Male']/$max,1);
   $data_female .= round($row['Female']/$max,1);
}

// Concatenate the strings to complete the data parameter
$data = $data_male . "|" . $data_female;

There are so many different graph types available, I obviously can’t share them all in this already rather long post. I highly encourage you to read over the API. It’s very simple to understand and extremely easy to use. Google only requests that if you’ll use more than 250,000 graphs a day, you let them know.

© 1998-2008 AF-Design, All rights reserved.