Posts Tagged ‘MySQL’

Before You Buy that new DB

Monday, October 27th, 2008

Database Server Icon - Visio Database servers are tricky. Before jumping to the conclusion that more hardware is needed, make sure you’re getting the most out of the hardware you already have. Hiring a DBA is definitely the right way to go. They’ll quickly be able to point out where the pain points are in your configuration. If you can’t afford a DBA, you can use this quick guide to get a starting point on what might need adjustment.

At the system level, start by checking over the following items.

  • Does the machine have sufficient RAM?
  • How much idle CPU time is available?
  • Is the hard disk okay?
  • Is the networking speed sufficient?
  • Are there services that should run somewhere else?

Next investigate the application itself.

  • Is the RDBMS allowed to use all of the RAM available?
  • Are the tables indexed; properly?
  • Are the indexes up to date?
  • Are the statistics up to date?
  • Are there queries that can be offloaded; refactored; eliminated?

This, at minimum, will save time when your thinking about sticking more power in-front of the database. If you still decide to make the move, make sure you check over these values as soon as you are done or you may find your performance decreases when it should have increased.

Database Migration Tips

Saturday, October 25th, 2008

Database Server Icon - Visio Moving database servers is a nontrivial task that requires planning and forethought. This guide provides some tips for planning a migration of a small to midsize databases, but not yet data-warehousing in the 5Gb - 500Gb size. Of course the process will work for smaller databases, and even larger as well. Ultimately your mileage may vary. Depending on the software you’re running your database on, there may be alternate paths, such as replication, available to you. Spend time researching your options before embarking on this type of backup. They can save time and frustration.

This document will not help with sizing your systems, however, completing the “Prepwork” will get you thinking in the right direction as to what you’ll ultimately need to purchase. I highly recommend reading Refactoring SQL Applications by Stéphane Faroult before buying hardware for capacity reasons alone.

Prepwork

Possibly the most important thing you’ll do during the migration is the preparation work. Spend some time looking at what your moving and where you’re moving to up front, it will cause you to answer some questions earlier than later which results in not making rash decisions in the heat of the moment.

1. Inventory and Audit

It’s time to find all the dirty laundry. User’s with inappropriate access (like ’sa’ or ‘root’) for running queries are obvious examples. While it’s tempting to do all of your house cleaning up front, don’t. Document it all so it can be shopped around your organization.

  • Look at your user accounts on the RDBMS. Are all of them still currently used? Can any be removed?
  • Take a look at the schemas (databases). Are all of them still necessary? Can any be removed?
  • Take a look at the tables within each schema. Make note of large objects, you’ll want to do further investigation on these to ensure they’re really serving their intended purpose.
  • Make sure you’re backups are running with the intended frequency (and that they’re actually being generated).
  • Understand the underlying sub-system design and architecture you currently have.
  • Document the pain points you’re experiencing (assuming this is a capacity based move).
  • Pick a downtime window for the cutover and ensure you build in some extra time. The length of your downtime can be estimated using your Soggy run.

Armed with this data, you can begin identifying garbage that doesn’t need to move. An example from my experience is a ColdFusion client storage database running on Microsoft SQL Server. This database holds 2 tables CDATA and CGLOBAL each with ~680K rows. With transactional logs etc, the footprint of the database swelled to 7.5Gb. Let alone for a minute that keeping transactional logs on this table was overkill, we were also making significant changes to the application servers. All of this was irrelevant because of other infrastructure changes that were being made concurrently and the database was able to be dropped. This is a singular example of how some research early on can save time later. Moving chunks of data, to disk or over the wire, still takes time - as does the restoration process.

2. Plan the actual steps required to move

1Tb External iBook Hard Drive This is a page out of David Allen’s GTD. Obvious right? Backup the data and move it to the new server, restore it… done! Well, sure that’s one way to do it, but make sure you’ve got details hashed out on the “move it” bit. If you’re doing this change in a major metro area at rush hour, you’re opening some huge risks into your timeline. But, forgetting traffic; is it best for your environment and your technology? Can you handle the downtime? Consider the following options when flushing out your plan.

  • Replication, if available to you, is an ideal way to minimize the downtime. However, you’ll still need to push all data over the wire (or via disk) at some point to get the new equipment to the correct point in time for the log files to begin replication of data effectively.
  • Sending up to 20Gb of binary data over the wire is pretty quick, but after that, it may be faster to consider other options for moving data, including moving data via USB/Firewire/eSATA/(insert your favorite external storage method here) or even a whole server.
  • Physical data moving takes time and requires coordination, possibly at a datacenter (or multiple datacenters) and may require access be setup ahead of time.

Write down your execution plan, be optimized… you’ll run against this later to keep the noise out of your head.

When moving a large MySQL database between two geographically separated systems, you’re at the mercy of the public internet. Obviously, finding a time that the throughput for your route is best is ideal, but not always an option. After completing a full backup of an entire MySQL database, I pushed the 75Gb over the wire to the new equipment. That process took nearly 2 hours. Armed with the database footprint from your audit, you can use a download calculator to calculate a best case scenario for the transfer time. Keep in mind network congestion, route traffic and latency could increase this time - and those will all increase with distance.

Spring Cleaning

Grab a baseline of your performance. This is a great way to validate that the new solution will do what you expect it to. It’s always nice to see an increase in performance, but it’s better to have a quantified value of just how much increase you get per dollar spent. If you’ll be doing any re-factoring as part of the spring cleaning, be sure to benchmark after you finish that process again.

Begin your house cleaning list by truncating or dropping unnecessary data, tables and schemas. Clean out all of those old user accounts and get your house in order. Ensure you backup proccess has been generating good files and your ready to use those for a Soggy run.

If all has gone well with your process so far, you may be considering just keeping the equipment you’ve got! But if not, and a move is still in your future… get ready to do a Soggy run - note quite the real thing, but not a “dry” run either, because we want to time our process.

Soggy Run

Make note of the time… and get moving - stick to the script. Should you deviate (in order or adding additional steps) be sure to note it on the script so you’ll be prepared for the final run. Also note time deltas at intervals so you’ll know if somethings taking too long and your window might need to be extended. It’s possible if you’ll be using replication to manage the transfer, that you won’t need to execute this again, but if you do, it’ll be nice to have the actual steps you’ve taken.

  • Make your backups
  • Begin your plan for transfering the data
  • Ensure you’ve got all the tools, scripts, and equipment you need in the right place in the right order.
  • Restore the data
  • Validate that EVERYTHING worked. Don’t just look at numbers of databases, but spot check a few (if there are too many to check them all) and ensure the data you’re expecting to be there is there!
    If you are using replication, can you start it, does it work. Trying validating a database or two to ensure everything is right - after the migration is no time to realize you’ve missed data.
  • Try your environment against the database and ensure it performs as expected, not just from an accuracy standpoint, but from a performance one as well.
  • If your not using replication to handle the next steps, drop all databases from your server. The system will need to drop them when you re-create them and this takes time especially for large databases.
  • Make note of the final time. Add some padding to it and set that as your downtime window. Doubling it is safest, but 20% may be sufficient.

D-Day

It’s been rehearsed, the pitfalls discovered, and the process documented - it’s now down to simple execution steps. Ensure you have access, either via phone, or in person to people who can assist with the migration should something go wrong. It’s nice to have resources available to call on. Gather necessary materials and get going.

  • Make sure you’re well rested. Nothing is worse than trying to make good decisions when extremely tired.
  • Begin your planned downtime window with a clear head. Disable the necessary systems and put up any necessary messaging. Now is the time to ensure your data is perfect and not being altered.
  • Run the playbook you’ve created…
  • Bring the systems back online!
  • Once you’re done with the plan - validate that it’s all working and point your apps, users and the world at your newly completed creation.
  • Benchmark! Quantify the improvements you’ve made with similar load to what you’d been running before if possible.

MySQL GUI Tools

Monday, June 2nd, 2008

MySQL Logo I was struggling with MySQL Query Browser, trying to make some minor modifications to a schema recently when a friend, aptly named Erik, from WebCentrix.net turned me onto CocoaMySQL. I’ve only been playing with it for a short time but have found it to be far more stable, easier to use (after a short learning curve) and generally a more flexible product. MySQL should give up on their in house project and hire the developers or at least leverage the open source technology that’s up on SourceForge.

Not So Fast! MySQL Stored Procedures Are Not Panacea

Wednesday, May 21st, 2008

MySQL Logo MySQL 5.0 introduced stored procedures, but are you taking advantage of them? MySQL claims a 50% lift in performance just having an execution plan on hand, but what real world performance boost can you expect to see? I set out to discover just that and found the results unsatisfying.

Column Type Indexed
id int(11) unsigned Primary Key
content varchar(4096) No

I then conducted a test which would closely mimic a real world example to determine just that. I started with a table with a few million rows of variable length data in it. I then created a stored procedure that would as closely as possible match my hand coded SQL query. Then I created a simple PHP script to test it all out.

CREATE PROCEDURE sp_getdata (i INT(11))
BEGIN
	SELECT * FROM test_table WHERE id = t;
END;
$iterations = 100000;
$conn = mysqli_database($server, $user, $pass, $schema);
 
// find the extents of our random range and make sure the connection actually opens
$max = $conn->query("select max(id) as maxid from test_table");
$max = $max->fetch_assoc();
$max = $max['maxid'];
 
$sql = "SELECT * FROM test_table WHERE id = %d";
$procedure = "CALL sp_getmessage(%d)";
 
// run the stored procedure query
for($i=0;$i<$iterations; $i++){
     $query = sprintf($procedure, rand(1,$max));
     $conn->multi_query($query);
     $result = $conn->store_result();
}
print number_format(microtime(true)-$t2,2) . "s\n";
 
// run the straight sql query
print "Test SQL Query: ";
$t1 = microtime(true);
for($i=0; $i<$iterations; $i++){
     $query = sprintf($sql, rand(1,$max));
     $result = $conn->query($sql);
}
print number_format(microtime(true)-$t1,2) . "s\n";

I then ran the test 5 times to eliminate the so called warm-up period which MySQL would inevitably use to more accurately answer future questions. I then ran the test a number of times swapping the position of the two “tests” within the code so that neither received an unfair advantage. The results are below:

Run SQL Stored Procedure Order
1. 0.28s 0.39s Proc/SQL
2. 0.28s 0.37s Proc/SQL
3. 0.28s 0.41s Proc/SQL
4. 0.28s 0.38s Proc/SQL
5. 0.28s 0.39s Proc/SQL
6. 0.28s 0.80s SQL/Proc
7. 0.28s 0.82s SQL/Proc
8. 0.28s 0.81s SQL/Proc
9. 0.28s 0.82s SQL/Proc
10. 0.27s 0.80s SQL/Proc

I was taken by surprise by two obvious trends. First, the SQL statement was amazingly consistent regardless of when it was run. Second, not only was the stored procedure slower, it was significantly slower if a batch of plain old SQL statement executions were run first.

Next I decided to track the response time of each individual query that I ran to get a better look at the execution time costs for each. I expected to see a bell curve showing some queries running very fast and others running very slow with the majority falling in the middle somewhere. I had to increase the sensitivity of my timer millionths of a second to get a good picture of just how fast things were or were not running. Again I was surprised by the results. The basic query was still much faster in total elapsed time, but suddenly the performance of the stored procedure began showing itself. I ran the tests the same way as before and swapped the order in which they ran to eliminate preferential execution order. The following table gives results from one run of 100,000 so you can see the results first hand. The method was to insert the time value into an array and increment the value of that index by one each time a query executed in that time. So the first value you see is also the first query that gets sent to the server. This is important because for the stored procedure, you’ll notice it’s MUCH longer than all subsequent runs. Oddly, the query to get the maxium value had already been run so I’m not sure what causes the increase delay in runtime.

Stored Procedure Basic SQL Statement
    [0.00144] => 1
    [0.00005] => 2
    [0.00002] => 52
    [0.00001] => 99938
    [0.00003] => 6
    [0.00004] => 1
    [0.00063] => 1
    [0.00040] => 1
    [0.00047] => 16
    [0.00049] => 42172
    [0.00048] => 26
    [0.00299] => 21
    [0.00050] => 1735
    [0.00124] => 88
    [0.00074] => 517
    [0.00249] => 32
    [0.00224] => 38
    [0.00174] => 171
    [0.00149] => 138
    [0.00274] => 22
    [0.00099] => 139
    [0.00324] => 19
    [0.00399] => 12
    [0.00374] => 12
    [0.00349] => 12
    [0.00199] => 102
    [0.00054] => 1
    [0.00043] => 1
    [0.00075] => 35
    [0.00046] => 5
    [0.00250] => 2
    [0.00024] => 49970
    [0.00100] => 4
    [0.00025] => 4657
    [0.00125] => 6
    [0.00023] => 11
    [0.00175] => 2
    [0.00022] => 6
    [0.00449] => 4
    [0.00028] => 2
    [0.00052] => 1
    [0.00044] => 1
    [0.00200] => 1
    [0.00030] => 1
    [0.00041] => 1
    [0.00026] => 3
    [0.00027] => 4
    [0.00070] => 1
    [0.00499] => 1
    [0.00150] => 2
    [0.29807] => 1
    [0.00275] => 1
    [0.00424] => 2

Clearly the fluctuation in times we see in the first table are largely related to the first query we execute. After that initial procedure call, stored procedures run consistently fast, faster in fact than the plain old SQL query. But back to the “real world test”. How often do we create a single page that executes 100,000 queries at a time? More likely its 2-20 queries which are all very different than the others in the group. Given that variable, which is really the best solution? Ultimately it’s going to depend on your setup. This was a mostly idle app server communicating over a 100Mb connection to a moderately loaded DB server with ample RAM and disk performance.

Future exploration areas will be to research the most efficient way to execute stored procedures for maximum performance and deciding if a stored procedure is faster for more complex queries involving joins etc. If anyone has some good directions I can head in here, I’m all ears.

Cloud Databases Coming Soon

Monday, April 7th, 2008

Startups looking for a database back end will soon have more options (hopefully). MySQL is a fantastic lightweight database that scales reasonably well for most projects. Microsoft SQL Server, Oracle, DB2 and other commercial offerings scale well and have lots of great support but at a massive cost, creating a barrier to entry for most providers. Last year, Amazon announced their solution, a database in the cloud (my thoughts). Not to be outdone, Google is rumored to be announcing their own online equivillent tonight at one of their campfire format press events. They used Campfire One to announce OpenSocial and Tech Crunch is rumoring that they may be announcing their cloud database platform BigTable. With this, small businesses will be able to better outsource database (and database management) with minimal costs and potentially unlimited scale. This is ideal as the amount of user generated, tags, index, and categorized content online continues to grow. Many companies are having great success using Amazon’s Web Services for storage and computational intensive projects. This is a natural extension to that.

© 1998-2008 AF-Design, All rights reserved.