Presenting Data Visually
Thursday, April 24th, 2008Anyone who’s immersed in the world of data and its presentation knows the value of a well designed and laid out graphic can lead to a project. Currently application developers have two primary tools at their disposal, Adobe Flex and the Google Chart API for making dynamic graphics. Silverlight may offer some similar features, but I’ve no experience building with it. Additionally, Java is also very capable, but when you see how simple the Google product is, you’d need a very strong reason to consider running anything like this in house again. As such I have left them out of the discussion.
I’ve used Flex for some simple reporting graphs. The real power in Flex lays in it’s ability to create entire interfaces, even entire websites entirely in a rich multimedia capable environment. I’ve been intoxicated with the raw power to manipulate the data in real time and interact in a way not easily possible on the web before. However, all of this power comes at a steep price, while the core SDK is free - for time pressed developers - the IDE environment saves a huge amount of effort. Because of it’s great power, sometimes using Flex is a little like bringing a machete to the dinner table - a little bit more power than you really need to cut the vegetables with. This is where Google Charts comes in.
| Group | Male | Female |
|---|---|---|
| 0 | 52 | 38 |
| 1 | 48 | 34 |
| 2 | 61 | 40 |
| 3 | 77 | 63 |
| 4 | 79 | 89 |
| 5 | 120 | 102 |
| 6 | 138 | 125 |
| 7 | 125 | 154 |
| 8 | 172 | 155 |
| 9 | 156 | 173 |
| 10 | 193 | 193 |
| 11 | 217 | 189 |
| 12 | 291 | 293 |
| 13 | 300 | 345 |
| 14 | 292 | 321 |
| 15 | 297 | 333 |
| 16 | 321 | 348 |
| 17 | 365 | 366 |
| 18 | 364 | 337 |
| 19 | 327 | 306 |
| 20 | 243 | 248 |
| 21 | 161 | 179 |
| 22 | 122 | 110 |
| 23 | 74 | 68 |
Way back when in December of 2007, Google anounced the Google Chart API. This revolutionized graphing for developers, although I suspect many may not realize it yet! Lets start our exploration with some fake data and see just what we can do with the API. We’ll use a small slice of some new user signup data I have at my disposal to get our hands dirty with.
Generating this table is no special feat, some simple database queries against your own internal data and presto, you’ve got a column of numbers too. But that’s where the fun begins. Consider this table for a minute, it’s pretty clear there’s a normal distribution curve there, but are they the same? Are there more men or women - or is it nearly equal? Are there other things this data could be telling us?
Let’s start with the marketing department’s questions. Are we attracting more men or women? If we sum the numbers in the table together, we get 4,595 men and 4,609 women. We can create a pie chart with these numbers so folks can look at a glance and see what that would look like.
http://chart.apis.google.com/chart? cht=p& chs=250x100& chd=t:4595,4609& chl=Male|Female& chco=0000ff,ff0000
The line break is for clarity only. To walk through the pieces, the cht is simply the chart type, p for pie (p3 is a 3d pie), chs is a width x height value, chd is the data, chl are the labels and chco are the colors. Interestingly, you can actually skip the labels on and still generate a graph, but obviously it becomes much clearer when you provide the labels.
Now lets look at the curves - obviously we have a peak during the day and lows at night, but just what does that look like and is it the same for men and women? Here we can use the bar chart.
http://chart.apis.google.com/chart? cht=bvg& chs=630x150& chbh=10,0,5& chco=0000ff,ff0000& chg=101,33.3& chxt=x,y& chxl=0:|12a|1a|2a|3a|4a|5a|6a|7a|8a|9a|10a|11a|12p|1p|2p|3p|4p|5p|6p|7p|8p|9p|10p|11p|1:|0|125|250|375& chd=t:13.9,12.8,16.3,20.5,21.1,32.0,36.8,33.3,45.9,41.6,51.5,57.9,77.6,80.0,77.9,79.2,85.6,97.3,97.1,87.2,64.8,42.9,32.5,19.7|10.1,9.1,10.7,16.8,23.7,27.2,33.3,41.1,41.3,46.1,51.5,50.4,78.1,92.0,85.6,88.8,92.8,97.6,89.9,81.6,66.1,47.7,29.3,18.1&
Now with this chart, there’s quite a bit more going on. Google recommends that for larger data sets you use their special encoding system, but in order to keep this simple I went ahead and used the decimal encoding so it would be easier to understand. For those who are ready for that next step, take a look at the docs for the API, there’s excellent pseudo code there for how to transition to the encoding method they use. cht still defines the chart type, in this case a horizontal bar chart with grouped data. chs is the size again; read the docs carefully if you decide to implement one of these. The bar chart can extend past your chosen size and you’ll simply crop your data. chbh comes in handy here by letting me shrink the size of the bars to 10 pixels and removing lots of padding. Last but not least chco sets the colors bringing us to the really new stuff.
For bar charts the data has to be limited into a range of 0-100. If your maximum data point is less than 100, you’re in luck, just feed the data in place of this method below. However, if your data doesn’t live in the realm of 0-100, you’ll need to consider doing some more work. The simplest method is to create a percentage (some PHP code is below outlining how to do that). The X and Y axis are blank by default with a bar graph. There are a number of methods for making them display but I find this method the most logical. If your data is broken out by percentages already, using the default Y axis value is fine, however, in our example, it’s not. So we’re going to pass two parameters. First chxt=x,y, what this does is tells the API that the first label set I pass in is for the X axis, and the second is for the Y. Now we can pass in the values we want to display on the X and Y axis. chxl defines our labels using position indexes to reference back to the values we had assigned in our chxt parameter. In our example, index 0 is X and index 1 is Y - notice they’re bolded: chxl=0:|label1|label2| … |labeln|1:|label1|label2| … and so on. The last parameter I skipped over from before is chg which sets up the grid based on a percentage. I passed in 101% spacing for the vertical bars along the X axis, and 33.3% for the horizontal bars because they lined up with my increments.
For those unsure how to generate the percentages in PHP, it’s very simple. The first thing we’ll do is loop over our dataset to determine what our largest value is. Then, instead of passing in the raw values as we did with the pie chart above, we’ll actually pass in their percentage of the maximum number. If your like me, you might want to step the scale up to the next logical number. I did that on my graph, but did not do it in the code below - I’ll save that for another blog post. Then you can at last loop over the data one more time to create the actual data strings. Once it’s all done, you’ll have a single string you can drop into your image call.
$result = $conn->query($select);
// get the maximum value in our data
$max = 0;
while($row = $result->fetch_assoc()){
if($row['Male'] > $max){ $max = $row['Male']; }
if($row['Female'] > $max){ $max = $row['Female']; }
}
$result->data_seek(0);
// generate the two datasets
$data_male = "";
$data_female = "";
while($row = $result->fetch_assoc()){
if(strlen($data_male) == 0){ $data_male .= ","; $data_female .= ","; }
$data_male .= round($row['Male']/$max,1);
$data_female .= round($row['Female']/$max,1);
}
// Concatenate the strings to complete the data parameter
$data = $data_male . "|" . $data_female;
There are so many different graph types available, I obviously can’t share them all in this already rather long post. I highly encourage you to read over the API. It’s very simple to understand and extremely easy to use. Google only requests that if you’ll use more than 250,000 graphs a day, you let them know.