Without Context, Data is Meaningless

November 21st, 2008 by Erik

Without context, answers, facts and data are meaningless. “The ultimate answer to life, the universe and everything” is undeniably 42; however, without the question, which was lost in an unfortunate galactic construction project, it’s useless.

Wikipedia is an awesome collection of facts, some even with context, but there are many areas that, despite the best efforts of the contributors, lack context. For example, Vincent Van Gogh has a wonderful entry detailing much of his life and explaining the inspirations behind many of his works. However, only a single link to the post-impressionism article and a few brief mentions of his contemporaries’ praise and admonitions provide context for his work. This is not a fault of the user-driven model but of the toolset. The unstructured nature of a wiki doesn’t lend itself to comparative analysis, which is, of course, just one way to provide context to information.

Quantitative information alone is, of course, insufficient. What if we were to measure the sum of all works by Van Gogh? What measure would we use: mass, dimensions, net worth, or a convoluted formula of all of the above? I am sure the answer would be 42. It’s absurd to compare everything using quantity alone; qualitative analysis is a critical component of every evaluation. In this area, Wikipedia’s model excels. Because its articles are living documents, they can reflect the current thinking on a topic, and the edit history shows us how perceptions have changed. Unfortunately, it still takes human beings to interpret the changes and provide the perspective. Perhaps some graduate student is working on an automated system to solve this problem as their final thesis.

But if we leave the world of art and move toward less abstract concepts that are easier to quantify, Wikipedia fails. Its inability to compare and contrast two articles of a similar type, such as corporations, leaves it lacking. Consider Dell and HP. Both are publicly traded companies, both have Wikipedia entries, and both have massive amounts of qualitative and quantitative data available about them. To look at the two companies side by side, we have to look beyond the wiki walls to a more data-oriented set of information. Google, for example, does a great job of comparing corporate stock prices, and a whole host of other providers offer different toolsets for interacting with the publicly available financial data.

I think there should be a world where these two universes collide: one that empowers the public to contribute the qualitative components of data sets in a wiki model while also providing comparative tools that act on the quantitative data.

301 Redirects in Apache

November 12th, 2008 by Erik

Taming the Beast has a nice primer on HTTP 301 redirects. I spent some time this morning cleaning up AF-Design’s internal issues after reading the HubSpot Website Grader report. If you manage a website and haven’t already done so, consider running the free report. It takes only a few minutes to generate and read, and most problems can be fixed in a few more minutes. You can get a handy dynamic badge too, which you can put on your internal monitoring tools to keep tabs on things.

Website Grader Score Badge Screenshot

The real takeaway from this article for me was swapping all references to www.af-design.com for plain af-design.com. WordPress was already handling this for all blog entries, but the remaining sections of the site were still being referenced with the “www.” prefix. The additional lines for the .htaccess file are provided below for reference.

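# Permanently (301) redirect any request for www.af-design.com to af-design.com
RewriteEngine on
# Dots are escaped so they match a literal "." rather than any character
RewriteCond %{HTTP_HOST} ^www\.af-design\.com [NC]
RewriteRule ^(.*)$ http://af-design.com/$1 [L,R=301]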
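To check the logic before touching a live .htaccess, here is a small Python model of what the condition and rule above do. It is a sketch, not Apache: it assumes the per-directory (.htaccess) context, where the leading slash is stripped before the rule is matched, and it ignores query strings, which mod_rewrite passes through unchanged by default. The function name redirect_for is made up for illustration.

```python
import re

# Mirrors: RewriteCond %{HTTP_HOST} ^www\.af-design\.com [NC]
WWW_HOST = re.compile(r"^www\.af-design\.com", re.IGNORECASE)

def redirect_for(host, path):
    """Return (status, location) if the rule fires, else None.

    `path` is the request path without its leading slash, which is what a
    RewriteRule in .htaccess is matched against; ^(.*)$ captures all of it as $1.
    """
    if not WWW_HOST.match(host):
        return None                               # condition fails: rule is skipped
    return 301, "http://af-design.com/" + path    # [R=301]; [L] stops further rules

print(redirect_for("www.af-design.com", "blog/"))
print(redirect_for("af-design.com", "blog/"))
```

The [NC] flag is what makes the host comparison case-insensitive, which is why re.IGNORECASE appears in the model.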

Make Sure Your Data’s Right!

November 3rd, 2008 by Erik

Two widgets, both provided by major companies, have some discrepancies in their data, and the election hasn’t even started yet! I’m not sure who’s wrong here, Google or Microsoft, but either way someone’s data isn’t accurate. When it comes to something like an election, accuracy is very important. As best I can tell, Hawaii is the last state to close its polls, and given that the discrepancy is a whole hour, it seems to me someone didn’t account for daylight saving time or a time zone somewhere.

Screen grabs taken at the same time: Google widget and MSN widget

The live widgets for both companies appear in the full entry.
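A sketch of how a whole-hour error like this can arise. Hawaii stays at UTC-10 all year, while US Eastern time switched from EDT (UTC-4) to EST (UTC-5) on November 2, 2008, two days before the election. The 6:00 pm local closing time below is an assumption for illustration only; check the state's actual poll hours.

```python
from datetime import datetime, timedelta, timezone

HST = timezone(timedelta(hours=-10), "HST")  # Hawaii: no daylight saving time
EST = timezone(timedelta(hours=-5), "EST")   # US Eastern after Nov 2, 2008
EDT = timezone(timedelta(hours=-4), "EDT")   # US Eastern before Nov 2, 2008

# Assumed closing time, for illustration only.
hawaii_close = datetime(2008, 11, 4, 18, 0, tzinfo=HST)

right = hawaii_close.astimezone(EST)  # what a correct widget would show
wrong = hawaii_close.astimezone(EDT)  # a widget still applying daylight time

# Difference between the two wall-clock readings a viewer would see
gap = wrong.replace(tzinfo=None) - right.replace(tzinfo=None)

print(right.strftime("%b %d %I:%M %p %Z"))
print(wrong.strftime("%b %d %I:%M %p %Z"))
print("wall-clock gap:", gap)
```

Both conversions describe the same instant; only the offset applied to the display differs, which is exactly the kind of bug that produces an "even hour" disagreement.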

Netflix Opens Watch Instantly to Macs

November 2nd, 2008 by Erik

NewsGang posted this tonight, and it’s been picked up by CrunchGear as well: the long-awaited Silverlight version of Watch Instantly is now in beta, bringing to Macs a service that has been available on PCs for a long time!

Thanks Netflix!

– Update: 9:10pm

Netflix on TiVo

Apparently Netflix has been harder at work than I had realized! Netflix will be offering TiVo subscribers access in December. W00t! Amazon may have gotten there first, but its UnBox video-on-demand offering hasn’t impressed me.

Before You Buy that new DB

October 27th, 2008 by Erik

Database servers are tricky. Before jumping to the conclusion that more hardware is needed, make sure you’re getting the most out of the hardware you already have. Hiring a DBA is definitely the right way to go; they’ll quickly be able to point out the pain points in your configuration. If you can’t afford a DBA, you can use this quick guide as a starting point for finding what might need adjustment.

At the system level, start by checking over the following items.

  • Does the machine have sufficient RAM?
  • How much idle CPU time is available?
  • Is the hard disk okay?
  • Is the networking speed sufficient?
  • Are there services that should run somewhere else?
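A few of the system-level questions can be answered with a quick snapshot from Python's standard library. This is a rough sketch, assuming a Unix-like host (os.getloadavg is not available on Windows) and that the database files live under the path you pass in. RAM and network speed are left to OS tools such as free (Linux) or vm_stat (OS X).

```python
import os
import shutil

def system_snapshot(data_path="/"):
    """A few quick numbers for the system-level checklist (Unix-only)."""
    cpus = os.cpu_count() or 1
    load1, load5, load15 = os.getloadavg()
    disk = shutil.disk_usage(data_path)
    return {
        "cpus": cpus,
        # A load average per CPU near or above 1.0 suggests little idle CPU time is left.
        "load_per_cpu_15min": load15 / cpus,
        "disk_free_pct": 100.0 * disk.free / disk.total,
    }

print(system_snapshot())
```

Run it a few times during peak hours; a single reading taken while the server is quiet says little about where the pain points are.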

Next investigate the application itself.

  • Is the RDBMS allowed to use all of the RAM available?
  • Are the tables indexed, and indexed properly?
  • Are the indexes up to date?
  • Are the statistics up to date?
  • Are there queries that can be offloaded, refactored, or eliminated?
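The index and statistics questions are easiest to answer by asking the query planner directly. In this sketch SQLite stands in for your RDBMS (MySQL and SQL Server have their own EXPLAIN output and statistics commands), and the table, index and rows are made up. It shows the plan for the same query before and after adding an index and refreshing statistics with ANALYZE.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for one query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)  # last column holds the description

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 500, i * 1.5) for i in range(10_000)],  # made-up rows
)

sql = "SELECT * FROM orders WHERE customer_id = ?"
before = query_plan(conn, sql, (42,))   # no index yet: full table scan

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
conn.execute("ANALYZE")                 # refresh the statistics the planner relies on
after = query_plan(conn, sql, (42,))    # should now search via the index

print("before:", before)
print("after: ", after)
```

A plan that still says SCAN for a selective, frequently run query is a good candidate for an index or a rewrite before any hardware is purchased.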

At minimum, this will save time when you’re thinking about putting more power in front of the database. If you still decide to make the move, check these values again as soon as you are done, or you may find that performance decreases when it should have increased.

Database Migration Tips

October 25th, 2008 by Erik

Moving database servers is a nontrivial task that requires planning and forethought. This guide provides some tips for planning the migration of small to midsize databases in the 5 GB to 500 GB range, short of data-warehouse scale. Of course, the process will work for smaller databases, and even larger ones as well, but your mileage may vary. Depending on the software your database runs on, there may be alternate paths, such as replication, available to you. Spend time researching your options before embarking on this type of migration; they can save time and frustration.

This document will not help with sizing your systems; however, completing the “Prepwork” will get you thinking in the right direction about what you’ll ultimately need to purchase. I highly recommend reading Refactoring SQL Applications by Stéphane Faroult before buying hardware for capacity reasons alone.

Prepwork

Possibly the most important thing you’ll do during the migration is the preparation work. Spend some time up front looking at what you’re moving and where you’re moving it to. It will force you to answer some questions early rather than late, so you won’t make rash decisions in the heat of the moment.

1. Inventory and Audit

It’s time to find all the dirty laundry. Users with inappropriate access for running queries (like ‘sa’ or ‘root’) are obvious examples. While it’s tempting to do all of your house cleaning up front, don’t. Document it all so it can be shopped around your organization.

  • Look at your user accounts on the RDBMS. Are all of them still currently used? Can any be removed?
  • Take a look at the schemas (databases). Are all of them still necessary? Can any be removed?
  • Take a look at the tables within each schema. Make note of large objects, you’ll want to do further investigation on these to ensure they’re really serving their intended purpose.
  • Make sure your backups are running with the intended frequency (and that they’re actually being generated).
  • Understand the underlying sub-system design and architecture you currently have.
  • Document the pain points you’re experiencing (assuming this is a capacity based move).
  • Pick a downtime window for the cutover and ensure you build in some extra time. The length of your downtime can be estimated using your Soggy run.
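Checking that backups are actually being generated is easy to automate. A minimal sketch, assuming your dumps land in a single directory as *.sql.gz files and should appear at least every 24 hours; both the pattern and the interval are placeholders to adjust for your own schedule.

```python
import os
import tempfile
import time
from pathlib import Path

def check_backups(backup_dir, pattern="*.sql.gz", max_age_hours=24):
    """List problems with the backups in backup_dir; an empty list means none found."""
    files = sorted(Path(backup_dir).glob(pattern), key=lambda p: p.stat().st_mtime)
    if not files:
        return ["no files matching %s in %s" % (pattern, backup_dir)]
    problems = []
    newest = files[-1]
    age_hours = (time.time() - newest.stat().st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append("newest backup %s is %.1f hours old" % (newest.name, age_hours))
    empty = [p.name for p in files if p.stat().st_size == 0]
    if empty:
        problems.append("empty backup files: " + ", ".join(empty))
    return problems

# Self-check against a throwaway directory (file names are made up).
with tempfile.TemporaryDirectory() as d:
    dump = Path(d) / "nightly.sql.gz"
    dump.write_bytes(b"stand-in for a real dump")
    fresh = check_backups(d)                    # recent, non-empty file
    two_days_ago = time.time() - 48 * 3600
    os.utime(dump, (two_days_ago, two_days_ago))
    stale = check_backups(d)                    # nothing written in 48 hours
    (Path(d) / "broken.sql.gz").write_bytes(b"")
    broken = check_backups(d)                   # newest file is empty

print(fresh, stale, broken, sep="\n")
```

A recent, non-empty file is not proof that the backup restores cleanly; only the Soggy run answers that.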

Armed with this data, you can begin identifying garbage that doesn’t need to move. An example from my experience is a ColdFusion client-storage database running on Microsoft SQL Server. This database held two tables, CDATA and CGLOBAL, each with about 680K rows. With transaction logs and so on, the database’s footprint had swelled to 7.5 GB. Setting aside for a minute that keeping transaction logs on these tables was overkill, we were also making significant changes to the application servers. Because of other infrastructure changes being made concurrently, none of this mattered: the database could simply be dropped. This is a single example of how some research early on can save time later. Moving chunks of data, to disk or over the wire, still takes time, as does the restoration process.

2. Plan the actual steps required to move

This is a page out of David Allen’s GTD. Obvious, right? Back up the data, move it to the new server, restore it… done! Well, sure, that’s one way to do it, but make sure you’ve got the details of the “move it” bit hashed out. If you’re making this change in a major metro area at rush hour, you’re introducing some huge risks into your timeline. But traffic aside, is it the best approach for your environment and your technology? Can you handle the downtime? Consider the following options when fleshing out your plan.

  • Replication, if available to you, is an ideal way to minimize downtime. However, you’ll still need to push all of the data over the wire (or via disk) at some point to bring the new equipment to the correct point in time for log-based replication to take over effectively.
  • Sending up to 20 GB of binary data over the wire is pretty quick, but beyond that it may be faster to consider other options, including moving data via USB/FireWire/eSATA/(insert your favorite external storage method here) or even moving a whole server.
  • Moving data physically takes time and requires coordination, possibly at a datacenter (or multiple datacenters), and access may need to be set up ahead of time.

Write down your execution plan and optimize it. You’ll run against it later to keep the noise out of your head.

When moving a large MySQL database between two geographically separated systems, you’re at the mercy of the public internet. Finding the time when throughput on your route is best is ideal, but not always an option. After completing a full backup of an entire MySQL database, I pushed the 75 GB over the wire to the new equipment, and that process took nearly 2 hours. Armed with the database footprint from your audit, you can use a download calculator to estimate a best-case transfer time. Keep in mind that network congestion, routing and latency could increase this time, and all of them tend to get worse with distance.
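The download-calculator arithmetic is simple enough to do yourself. A sketch, assuming decimal units (1 GB = 8,000 megabits) and a steady link rate; the 100 Mbit/s link and the 70% efficiency factor in the example calls are made-up figures, the latter a rough allowance for protocol overhead and congestion.

```python
def transfer_hours(size_gb, link_mbps, efficiency=1.0):
    """Hours to move size_gb gigabytes over a link_mbps megabit/s link.

    Decimal units: 1 GB = 8,000 megabits. efficiency=1.0 is the best case; a lower
    value is a rough allowance for protocol overhead and congestion.
    """
    megabits = size_gb * 8_000
    return megabits / (link_mbps * efficiency) / 3600

def achieved_mbps(size_gb, hours):
    """Average throughput actually achieved, in megabits per second."""
    return size_gb * 8_000 / (hours * 3600)

print(f"75 GB at 100 Mbit/s, best case: {transfer_hours(75, 100):.2f} h")
print(f"75 GB at 100 Mbit/s, 70% efficient: {transfer_hours(75, 100, 0.7):.2f} h")
print(f"75 GB in 2 hours averaged {achieved_mbps(75, 2):.1f} Mbit/s")
```

Run in reverse, the same arithmetic says the 75 GB push that took nearly 2 hours averaged roughly 83 Mbit/s, a useful number to plug into next time's estimate.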

Spring Cleaning

Grab a baseline of your performance. This is a great way to validate that the new solution will do what you expect it to. It’s always nice to see an increase in performance, but it’s better to have a quantified value of just how much increase you get per dollar spent. If you’ll be doing any refactoring as part of the spring cleaning, be sure to benchmark again after you finish that process.
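A minimal sketch of capturing that baseline and turning it into an improvement-per-dollar figure. The workload passed to benchmark is a stand-in (hypothetical); in practice it would be a representative query or page load, run the same way before and after the move.

```python
import statistics
import time

def benchmark(fn, runs=20):
    """Call fn() `runs` times and return median and worst latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return {"median_ms": statistics.median(samples), "max_ms": max(samples)}

def improvement_per_dollar(baseline_ms, new_ms, cost_dollars):
    """Percent latency reduction bought by each dollar spent."""
    percent = 100.0 * (baseline_ms - new_ms) / baseline_ms
    return percent / cost_dollars

# Stand-in workload; swap in a representative query against your own database.
baseline = benchmark(lambda: sum(range(100_000)))
print(baseline)
```

With made-up numbers, dropping a 200 ms median to 150 ms for $5,000 is a 25% improvement, or 0.005 percent per dollar. The median is used because a single slow run can skew an average badly.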

Begin your house-cleaning list by truncating or dropping unnecessary data, tables and schemas. Clean out all of those old user accounts and get your house in order. Ensure your backup process has been generating good files and that you’re ready to use them for a Soggy run.

If all has gone well with your process so far, you may be considering just keeping the equipment you’ve got! But if not, and a move is still in your future… get ready to do a Soggy run: not quite the real thing, but not a “dry” run either, because we want to time our process.

Soggy Run

Make note of the time… and get moving. Stick to the script. Should you deviate (by changing the order or adding steps), be sure to note it on the script so you’ll be prepared for the final run. Also note time deltas at intervals so you’ll know if something’s taking too long and your window might need to be extended. If you’ll be using replication to manage the transfer, you may not need to execute this again, but if you do, it’ll be nice to have the actual steps you’ve taken.

  • Make your backups
  • Begin your plan for transferring the data
  • Ensure you’ve got all the tools, scripts, and equipment you need in the right place in the right order.
  • Restore the data
  • Validate that EVERYTHING worked. Don’t just look at numbers of databases, but spot check a few (if there are too many to check them all) and ensure the data you’re expecting to be there is there!
    If you are using replication, can you start it, and does it work? Try validating a database or two to ensure everything is right; after the migration is no time to realize you’ve missed data.
  • Try your environment against the database and ensure it performs as expected, not just from an accuracy standpoint, but from a performance one as well.
  • If you’re not using replication to handle the next steps, drop all databases from the new server. Otherwise the restore will have to drop them when it re-creates them, and this takes time, especially for large databases.
  • Make note of the final time. Add some padding to it and set that as your downtime window. Doubling it is safest, but 20% may be sufficient.
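Noting time deltas as you go and padding the total can be scripted. A sketch with a hypothetical StepLog class; the demo uses a fake clock and made-up timings so the numbers are predictable, and in a real run you would keep the default monotonic clock.

```python
import time

class StepLog:
    """Record how long each step of a Soggy-run script takes."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.start = self.last = clock()
        self.steps = []  # (step name, seconds for this step, seconds since start)

    def done(self, name):
        now = self.clock()
        self.steps.append((name, now - self.last, now - self.start))
        self.last = now

    def total(self):
        return self.last - self.start

def downtime_window(measured_seconds, padding=1.0):
    """Padded window: padding=1.0 doubles the measured time, padding=0.2 adds 20%."""
    return measured_seconds * (1 + padding)

# Demo with a fake clock and made-up timings (seconds), not real measurements.
fake_clock = iter([0, 600, 4200, 6000]).__next__
log = StepLog(clock=fake_clock)
for step in ("make backups", "transfer data", "restore data"):
    log.done(step)

for name, took, elapsed in log.steps:
    print(f"{name:<15} {took / 60:5.0f} min  (elapsed {elapsed / 60:.0f} min)")
print(f"doubled window: {downtime_window(log.total()) / 3600:.1f} hours")
print(f"20% padding:    {downtime_window(log.total(), 0.2) / 3600:.1f} hours")
```

Keeping the per-step deltas, not just the total, tells you on D-Day which step is running long while there is still time to react.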

D-Day

It’s been rehearsed, the pitfalls discovered, and the process documented - it’s now down to simple execution steps. Ensure you have access, either via phone, or in person to people who can assist with the migration should something go wrong. It’s nice to have resources available to call on. Gather necessary materials and get going.

  • Make sure you’re well rested. Nothing is worse than trying to make good decisions when extremely tired.
  • Begin your planned downtime window with a clear head. Disable the necessary systems and put up any necessary messaging. Now is the time to ensure your data is perfect and not being altered.
  • Run the playbook you’ve created…
  • Bring the systems back online!
  • Once you’re done with the plan - validate that it’s all working and point your apps, users and the world at your newly completed creation.
  • Benchmark! Quantify the improvements you’ve made with similar load to what you’d been running before if possible.
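For the "validate that EVERYTHING worked" step, comparing per-table row counts between the old and new servers is a cheap first pass. This sketch uses two in-memory SQLite databases as stand-ins, with made-up table names; on MySQL or SQL Server you would list tables from that system's own catalog instead. Matching counts don't prove the contents match, so still spot check the data itself.

```python
import sqlite3

def table_counts(conn):
    """Map each user table to its row count (SQLite catalog; other RDBMSs differ)."""
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    return {name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0] for name in names}

def compare_counts(old, new):
    """Return human-readable differences between two {table: rows} maps."""
    problems = []
    for table in sorted(set(old) | set(new)):
        if table not in new:
            problems.append(f"{table}: missing on new server")
        elif table not in old:
            problems.append(f"{table}: only on new server")
        elif old[table] != new[table]:
            problems.append(f"{table}: {old[table]} rows before, {new[table]} after")
    return problems

# Stand-in "old" and "new" servers with a deliberately lost row.
old_db = sqlite3.connect(":memory:")
new_db = sqlite3.connect(":memory:")
for db in (old_db, new_db):
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    db.executemany("INSERT INTO customers VALUES (?)", [(i,) for i in range(3)])
old_db.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(5)])
new_db.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(4)])

print(compare_counts(table_counts(old_db), table_counts(new_db)))
```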

K.I.S.S.

October 24th, 2008 by Erik

There is always a fine balance between K.I.S.S. and giving users the flexibility they really need. I’ve been configuring a number of Windows Server machines over the last few days, getting ready for a major shift in architectural design and philosophy that will happen tonight beginning at 6pm. On numerous occasions while dialing in the configuration, I’ve found myself overwhelmed by the unnecessary complexity of the Windows wizard interfaces and underwhelmed by the advanced settings panes of many services.

More important than a step-by-step wizard that walks users through a myriad of options would be a context-specific wizard that takes into account not only what the user has clicked just now, but what they’ve clicked and interacted with before. The wizard could be set up to solve problems instead of providing functionality. Then, for battle-hardened users who no longer need the training wheels the wizards offer, provide simple, automatable tools that let them configure the server quickly and export the settings for duplication!

Let the software learn the user through prior interaction and adjust itself accordingly. Finally, if there’s an option that a single word can’t explain, place contextual help right there so the user doesn’t have to search Google to find out what it does! Before anyone thinks this problem is unique to Windows, it’s not. Almost every digital device I’ve ever used (BlackBerry, OS X, Linux: can’t we agree on a pseudo-consistent path structure, guys?) has had a UI that could be improved through careful thought and extensive testing.

jQuery Autocomplete Plugin

October 20th, 2008 by Erik

jQuery users, take a look at the autocomplete plugin that is shipping in the UI framework. While the current plugin has some serious limitations, it provides a drop-dead simple way to give your site’s users Google Suggest-style functionality without significant overhead. It’s widely supported and weighs in at 78K for the minified components (without data, CSS and markup).

My one complaint is the lack of JSON support for remotely fetched results. Currently you must return items for the suggest widget as a plain-text list, although you can parse individual lines (which could encode objects). I expect the jQuery team will add this functionality soon enough. Jörn Zaefferer has built a modified version of the plugin that supports JSON, but I’m holding out until it’s merged into the UI trunk to limit my sources of code updates in this instance.

Below is a quick example of the syntax so you can see how easy it is to set up. The file names.php simply returns a series of names, one per line. The list might be static or pulled from a database based on the “q” GET parameter, which holds the contents of the input field.

jQuery Autocomplete Plugin Screenshot

<link rel="stylesheet" href="css/jquery.autocomplete.css" type="text/css" />
<script type="text/javascript" src="jquery.min.js"></script>
<script type="text/javascript" src="ui.core.min.js"></script>
<script type="text/javascript" src="jquery.bgiframe.min.js"></script>
<script type="text/javascript" src="jquery.dimensions.min.js"></script>
<script type="text/javascript" src="ui.autocomplete.min.js"></script>
<script type="text/javascript">
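$(document).ready(function(){
   // Attach suggestions to the #name field; the plugin requests names.php,
   // passing the typed text as the "q" GET parameter.
   $("#name").autocomplete("names.php");
});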
</script>
<form>
   Name <input type="text" id="name" name="name" value="" /><input type="button" value="Go" />
</form>
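names.php itself isn't shown in the post. As a hypothetical stand-in, here is the same contract in Python: read q from the query string and return matching names as plain text, one per line, which is the format the plugin expects. The hard-coded name list, the case-insensitive prefix match and the limit of 10 are all assumptions; a real endpoint might query a database instead.

```python
from urllib.parse import parse_qs

NAMES = ["Alice Adams", "Albert Brooks", "Bob Carter", "alfred Dunn"]  # hypothetical data

def suggest(query_string, names=NAMES, limit=10):
    """Return the body names.php would send: matching names, one per line."""
    q = parse_qs(query_string).get("q", [""])[0].strip().lower()
    if not q:
        return ""  # nothing typed yet: send an empty list
    hits = [n for n in names if n.lower().startswith(q)]
    return "\n".join(hits[:limit])

print(suggest("q=al"))
```

Returning an empty body for an empty query keeps the widget from dumping the whole list on the user before they have typed anything.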

Memcached with PHP on Mac OS X

October 18th, 2008 by Erik

Nate Haug provides a great script for installing memcached along with some very detailed instructions on setting up a sandbox environment. I’m not using his MAMP sandbox, instead opting for the built-in PHP/Apache install, so I needed to change a few things from his tutorial. My system is a fully updated Intel MacBook Pro running OS X 10.5.5 with the Xcode tools installed, so YMMV. PHP currently reports version 5.2.6.

  1. I added my revised start script for memcached.
  2. The PHP version that ships with OS X doesn’t have PECL, so I downloaded the memcache extension source and compiled it manually:
    phpize && ./configure && make && sudo make install
  3. Edit /etc/php.ini and change: extension_dir = /usr/lib/php/extensions/no-debug-non-zts-20060613/
  4. Edit /etc/php.ini and add: extension=memcache.so

You can skip his Apache scripts. Restart Apache by turning Web Sharing off and back on in System Preferences.

The major changes I made to Nate’s memcached startup script were running a single instance and binding the service to localhost (127.0.0.1) only. This keeps memcached slightly more secure because it listens only on the loopback adapter. If you need more space, just raise the -m option; it’s measured in MB.

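#!/bin/sh
# One memcached instance: 1 MB of memory (-m), loopback only (-l), port 11211 (-p), run as a daemon (-d)
memcached -m 1 -l 127.0.0.1 -p 11211 -d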
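To confirm the instance started above is answering on the loopback address, a small Python sketch can speak memcached's text protocol directly (a set followed by a get) without installing a client library. The key and value are arbitrary; cache_roundtrip needs the daemon running, while set_command can be checked on its own.

```python
import socket

def set_command(key, value, exptime=0, flags=0):
    """Encode a text-protocol 'set <key> <flags> <exptime> <bytes>' plus its data block."""
    data = value.encode("utf-8")
    return b"set %s %d %d %d\r\n%s\r\n" % (key.encode("ascii"), flags, exptime, len(data), data)

def cache_roundtrip(key, value, host="127.0.0.1", port=11211):
    """Store a value in memcached, read it back, and return it (None on a miss)."""
    with socket.create_connection((host, port), timeout=2) as sock, sock.makefile("rb") as reply:
        sock.sendall(set_command(key, value))
        status = reply.readline()
        if status != b"STORED\r\n":
            raise RuntimeError("set failed: %r" % status)
        sock.sendall(b"get %s\r\n" % key.encode("ascii"))
        header = reply.readline()            # 'VALUE <key> <flags> <bytes>' or 'END'
        if header == b"END\r\n":
            return None
        length = int(header.split()[3])
        data = reply.read(length + 2)[:-2]   # data block plus its trailing \r\n
        reply.readline()                     # closing 'END'
        return data.decode("utf-8")

if __name__ == "__main__":
    try:
        print(cache_roundtrip("greeting", "hello"))
    except OSError as exc:
        print("memcached not reachable:", exc)
```

Because the daemon is bound to 127.0.0.1, the same call pointed at the machine's external address should fail, which is a quick way to confirm the -l setting took effect.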

NOTE: As with any other service running on your system, opening a web server exposes your system to potential attack and worse. Be sure to keep production data away from your test environment. Someone at Starbucks, sharing your WiFi connection, may be surfing your development site too. Consider yourself warned!

Finding Good Data

October 17th, 2008 by Erik

The amount of data publicly available online is astounding. The US Government has done a pretty good job of providing its citizens with detailed data on pretty much anything it keeps tabs on. The census data immediately comes to mind, but there is much, much more if you take the time to look for it. I stumbled across the USASearch.gov search engine while looking for population data for cities. The search engine is not as accurate as Google unless you know the government lingo for your data, but it’s pretty good. I suspect there’s a data set there somewhere that would enhance any business’s results, and the only cost to you is figuring out how it applies to what you do.

© 1998-2008 AF-Design, All rights reserved.