Archive for the ‘mysql’ Category

Share MySQL Relational Database Service on AWS

Tuesday, October 27th, 2009

Amazon Web Services Logo The stable of services available through AWS is continuing to expand! Last night Amazon announced RDS (Relational Database Service) which look a lot like EC2 instances running MySQL with EBS volumes – something I have a fair bit of experience with. However, these have the added benefit of being a service that can scale memory and processor both up and down with a single service call.

# ds-modify-db-instance mydbinstance --db-instance-class db.m1.xlarge -s 100

This flexibility comes with a downside, namely a 4 hour monthly service window where patches, updates and those requested capacity changes are applied. You can choose to apply them immediately, but your application should be prepared to handle the downtime. What happens is, your database instance goes offline and when it comes back, it has all the changes you requested applied. So at best, you should expect uptime in the 99.4% range. Most applications can handle a 4 hour downtime if it’s planned for. Under more conventional MySQL builds, developers or system administrators will mitigate these downtimes by first applying changes to slaves, promotion of one slave to master and then finally applying the changes to the original master. This sort of safety net provides gives applications smaller downtime windows (at most a few minutes each) allowing for theoretical 99.999% uptime.

Transitioning to RDS may not be without pain either. Importing your data is done through a mysqldump (or other flatflile export) and then playing that file back into your AWS instance. Depending on the size of your dataset a full mysqldump and re-importing may take days (no I’m not exaggerating). Also note, during the time mysqldump runs, your original database will acquire a read lock for consistency. With some DB’s I manage, I’ve stopped using MySQL dump entirely because the dumps took more than 4 hours to complete on a dedicated slave. With the myriad of snapshotting technologies available, it’s much easier to grab a binary copy of the DB every few hours. One last limitation is replication isn’t an option. I suspect AWS will be working on this soon as part of a HA (High Availability) release option.

Despite the limitations, I’m excited about this offering. This offloads much of the maintenance and management tasks which are usually the most tedious. I also hope that this means a higher IO disk subsystem may be coming to EBS soon.

Share Deleting Data From InnoDB

Saturday, October 24th, 2009

MySQL Logo Problem: We are given a large MySQL database table that no longer fits in your system’s working memory. You need to prune the data since a significant portion of this data is no longer relevant to keep in this table. Our expectation is ~75% of the data will remain in the table because of a uniform and random distribution of values in col1 and col2. How then, do we go about pruning this table as efficiently as possible?

The table structure is as follows:

describe big_table;
+-----------+---------------------+------+-----+---------+----------------+
| Field     | Type                | Null | Key | Default | Extra          |
+-----------+---------------------+------+-----+---------+----------------+
| id        | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment | 
| col1      | bigint(20) unsigned | NO   | MUL | NULL    |                | 
| col2      | bigint(20) unsigned | NO   | MUL | NULL    |                | 
| chardata  | varchar(35)         | YES  |     | NULL    |                | 
+-----------+---------------------+------+-----+---------+----------------+

Solution 1: Delete

We can simply run a query against the database that will delete all records that are irrelevant. The idea here is that we use a single query that’s easily readable to filter out the unwanted records. For this example we’ll assume we are sharding this data and want to keep all odd values for either col1 or col2 in this table. We can use MySQL’s MOD() function and check for even values and delete them. We expect this query to remove ~25% of our table.

DELETE FROM big_table WHERE MOD(col1,2) = 0 AND MOD(col2, 2) = 0;

Solution 2: Create -> Insert -> Rename -> Drop (CIRD)

Another solution is to create a new table identical to the original table then simply insert the records we want to keep into the new table. Once we have all the records we wish to keep, we simply rename the tables and then drop our original table. The difference in this query is to think about what we want to keep as opposed to what we want to get rid of. We expect this method to insert ~75% of the original table.

CREATE TABLE big_table_copy LIKE big_table;
INSERT INTO big_table_copy SELECT * FROM big_table WHERE MOD(col1, 2) = 1 OR MOD(col2, 2) = 1;
RENAME TABLE big_table TO big_table_old, big_table_copy TO big_table;
DROP TABLE big_table_old;

So which is faster?

It turns out it’s considerably faster to use solution 2. Solution 1 finished preparing our test table in just over 33 minutes. Solution 2 completed the inverse task in just over 5 1/2 minutes. The test data was 1.58 million rows of randomly generated data. The MySQL server was cripped to a 32Mb buffer pool to closely mimic our real world workload which would be heavily IO bound. The on disk file was about 260Mb. Using CIRD is about 6x faster than a straight delete. This probably isn’t surprising as it’s been reported for a long time that DELETE is ~10x slower than INSERT with InnoDB, but I wanted to validate these findings to determine if they still held true for newer versions (5.1.30) of MySQL.

Share A Smaller Database is a Faster Database

Tuesday, February 24th, 2009

You can probably save a considerable amount of storage by right sizing your columns. It’s been documented that right sizing data types for columns can return data faster and of course making the data smaller results in less wasted storage space on disk. So I decided to find out just how much.

Inserts

I created two user tables and attempted to insert 1,200,000 rows in each, getting some performance information along the way. The method was to insert 100,000 records in a table, check out how disk space had been used and how the performance of the inserts was impacted. The code to run a similar test on your system is available at the end of this post.

Query Insert Performance

As you can see on the graph, when the on-disk file reached ~45Mb, the performance of the inserts fell very quickly. This is likely due to the low innodb_buffer_pool_size set on this machine (32M) and the forced swapping for the table. Analysis of the table after the inserts showed a average row length of 80 bytes for User1 and 67 bytes for User2. Roughly 15% smaller, which lends credibility to the roughly 15% increase in number of rows inserted before suffering similar performance bottlenecks.

Interestingly, I expected a higher throughput for the total queries per second for the user2 table because of the smaller data segments. While a small 1% increase was observed, this could eassily have been due to fragmented disk or any other number of potential elements impacting the test machine. However, since MySQL uses pages and not individual file writes for each record, it is unlikely to make a large impact unless the difference in record size (currently a difference of 13 bytes per record) is significant.

Selects

Curious how significant the impact would be on select speed, I turned my attention to seeing how fast I could get the data back. So I modified my script below to use a select statement instead and hammered the DB again. As you can see below, the performance of the select is greatly impacted by the number of rows. I ran the test with 300K, 600K and 1.2M rows and saw a performance decrease with each increment, however, the smaller record size was consistently faster than the larger format.

Query Select Performance

With user1 at 1.2Million rows, I was able to select 1M random uid values at an average rate of 3,458.3 qps. My smaller table, user2, was able to select 1M random uid values at an average rate of 3,590.6 qps, about 3.8% faster than the larger table. This translated to 132.3 queries per second more. Again, I suspect the performance increase has to do with the number of records stored in memory at any given time. Increasing the density of records by decreasing their size effectively makes the ram more efficient.

Running the tests again after purging roughly 1/2 of the records had similar results as you can see in the graph. The improvement was 4.3% which is clearly even faster but as the random pages were fetched from disk, the majority of the data now fit in RAM.

Investigating further, I opted to trim the file to just 300K records and run the tests one last time. The difference was only 1.5% faster, similar to the increased performance for the inserts. Perhaps a future test to measure the increased performance as a measure of the decrease in record length would show that a 15% decrease in record length will result in 1/10 increase in read/write performance? Alas, that’s for another blog post.

Conclusion

There are serious benefits to keeping your data footprint smaller. Although this is not the silver bullet of database tuning, it will certainly help your system run more efficiently. The best way to increase query throughput for reads is to have faster disk I/O. However, as this test shows, you can stretch your performance dollars just a little bit further by using smaller data values that make more sense for the values you are actually storing.

You can get more information on the space required by column type from MySQL’s website and do your own back of the napkin performance calculations on how much of a lift you can hope to see… good luck, and happy tuning.

The Scripts

<?php
	$records = 100000;
	$conn = mysqli_connect("127.0.0.1","root","","test");
	$start = microtime(true);
	for($i=0;$i<$records;$i++){
		$query = "INSERT IGNORE INTO user1 (uid, age, gender, lastvisit) VALUES (" . rand(1000000, 100000000) . ", " . rand(13, 50) . ", " . rand(0,2) . ", NOW())";
		// $query = "INSERT IGNORE INTO user2 (uid, age, gender, lastvisit) VALUES (" . rand(1000000, 100000000) . ", " . rand(13, 50) . ", " . rand(0,2) . ", NOW())";
		// $query = "SELECT * FROM user1 WHERE uid = " . rand(1000000, 100000000);
		// $query = "SELECT * FROM user2 WHERE uid = " . rand(1000000, 100000000);
		$conn->query($query);
	}
	$stop = microtime(true);
	mysqli_close($conn);
	print round(($stop - $start), 4) . " seconds elapsed filling the table (~" . number_format($records/round($stop-$start,4),1) . " qps).\n";
?>
mysql> CREATE TABLE `user1` (`uid` bigint(20) NOT NULL, `age` int(11) DEFAULT NULL, `gender` int(11) DEFAULT NULL, `lastvisit` datetime DEFAULT NULL, PRIMARY KEY (`uid`)) ENGINE=InnoDB;
mysql> CREATE TABLE `user2` (`uid` bigint(20) unsigned NOT NULL, `age` tinyint(3) unsigned NOT NULL, `gender` tinyint(3) unsigned NOT NULL, `lastvisit` timestamp NULL DEFAULT NULL, PRIMARY KEY (`uid`)) ENGINE=InnoDB
mysql> describe user1;
+-----------+------------+------+-----+---------+-------+
| Field     | Type       | Null | Key | Default | Extra |
+-----------+------------+------+-----+---------+-------+
| uid       | bigint(20) | NO   | PRI | NULL    |       | 
| age       | int(11)    | YES  |     | NULL    |       | 
| gender    | int(11)    | YES  |     | NULL    |       | 
| lastvisit | datetime   | YES  |     | NULL    |       | 
+-----------+------------+------+-----+---------+-------+
 
mysql> describe user2;
+-----------+---------------------+------+-----+---------+-------+
| Field     | Type                | Null | Key | Default | Extra |
+-----------+---------------------+------+-----+---------+-------+
| uid       | bigint(20) unsigned | NO   | PRI | NULL    |       | 
| age       | tinyint(3) unsigned | NO   |     | NULL    |       | 
| gender    | tinyint(3) unsigned | NO   |     | NULL    |       | 
| lastvisit | timestamp           | YES  |     | NULL    |       | 
+-----------+---------------------+------+-----+---------+-------+

Share PHP MySQL vs MySQLi Database Access Metrics

Friday, January 30th, 2009

Last week I had a MySQL DBA/Developer tell me, “MySQLi sucks” when I asked why some code I was reviewing used the mysql_* extensions provided in PHP instead of mysqli_*. This really didn’t sit well with me. I turned to a trusted source for PHP information, and found they recommend mysqli_* for connecting to MySQL servers versions 4.1.3 and above. Although, I read many anecdotal stories about how mysqli_* was faster, I was unable to find any quantified proof of just how much faster. So in keeping with my performance measurement track I’ve been on this week, I decided to conduct my own test. I was surprised by the result. MySQL is faster in some cases.

The database I used is a pet project of mine that tracks some vehicle information. It uses a varchar primary key based on an automotive VIN number and the sample table I queried against has about 30K rows. The average row length is about 150bytes. I conducted 2 types of tests. The first was a completely unrealistic test measuring how fast a single row could be returned using each of the extension types. The other also returned a random sampling of rows. Since this isn’t a performance test of MySQL, I allowed each test to run one time completely to ensure the query cache was fully populated in MySQL. Then each test was run 3 subsequent times. Each test executed 1,000,000 queries to the server.

Single Hard Coded Query:

The first listing is the mysql_* calls. You’ll notice the

mysql_free_result()

function is commented out. I did test with this enabled and disabled and found that with this function enabled the queries returned at the same rate as with it enabled. So in the interest of fairness, I disabled it to more closely mimic the code executing with the mysqli_* calls.

<?php
	$tq = 1000000;
	$mt = microtime(true);
	$conn = mysql_connect("127.0.0.1","user","");
	mysql_select_db("test_schema", $conn);
	for($i=0; $i<$tq; $i++){
		$data = mysql_query("SELECT * FROM vehicles LIMIT 1", $conn);
		$data_object = mysql_fetch_assoc($data);
		// mysql_free_result($data);
	}
	mysql_close($conn);
	$et = microtime(true) - $mt;
	print "Elapsed time: " . number_format($et) . "s\n";
	print "QPS: " . ($tq / $et) . "\n";
?>

The next listing is the same code converted to leverage the mysqli_* calls. You’ll see that the result is not freed each time through the loop.

<?php
	$tq = 1000000;
	$mt = microtime(true);
	$conn = mysqli_connect("127.0.0.1","user","","test_schema");
	for($i=0; $i<$tq; $i++){
		$data = $conn->query("SELECT * FROM vehicles LIMIT 1");
		$data_obj = $data->fetch_assoc();
	}
	$conn->close();
	$et = microtime(true) - $mt;
	print "Elapsed time: " . number_format($et) . "s\n";
	print "QPS: " . ($tq / $et) . "\n";
?>

The results were surprising to me. The mysql_* extension actually ran considerably faster than the mysqli_* extenstion. In fact, the mysql_* extension ran 3.6% faster. Perhaps this DBA was on to something? I also tested this same methodology using prepared statements with MySQLi and found the results actually degraded. Discouraged, I devised a new test that would hopefully be more realistic. The values provided are queries per second.

Method Run #1 Run #2 Run #3 Avg
mysql_* 7,227.7 7,222.2 7,220.7 7,223.5
mysqli_* 6,963.0 6,974.0 6,974.4 6,970.5
mysqli prepared statements 4,849.6 5,005.3 4,978.7 4,944.5

Parameterized Queries:

More often than not, we do not run the same select query 1 million times, and we are actually more likely to run a query over a series of data sets. While this in itself is a terrible idea in production code because you should re-write the query to do it all as a single database call, it provides a better picture of how the database might be interacted with in PHP; especially if persistent connections are being utilized. However, if you find that you have some code that fits the patterns used here, it might be to your advantage to review the prepared statement code below.

The basic flow of the scripts is to first fetch 10 thousand (~1/3) of the table’s primary keys and populate an array. Then the code will select 1 million rows using a randomly selected primary key (vin). The data is still relatively small, and fits into the query cache after a warm up period. This shows how efficiently the query passing is instead of how efficient MySQL is at answering different questions, making it a better comparison of the two technologies. The three test scripts I ran are provided here for your reference.

Leveraging the mysql_* calls

 
<?php
	print "Fetching VIN array...\n";
	$vin_array = array();
	$conn = mysql_connect("127.0.0.1","user","");
	mysql_select_db("test_schema",$conn);
	$vins = mysql_query("SELECT vin FROM vehicles LIMIT 10000", $conn);
	while($vin = mysql_fetch_assoc($vins)){
		$vin_array[] = $vin['vin'];
	}
	mysql_close($conn);
	unset($conn);
 
	print "Starting test...\n";
	$tq = 1000000;
	$mt = microtime(true);
	$conn = mysql_connect("127.0.0.1","user","");
	mysql_select_db("test_schema", $conn);
	for($i=0; $i<$tq; $i++){
		$data = mysql_query("SELECT * FROM vehicles WHERE vin = '" . $vin_array[rand(0,count($vin_array)-1)] . "'", $conn);
		$data_object = mysql_fetch_assoc($data);
		// mysql_free_result($data);
	}
	mysql_close($conn);
	$et = microtime(true) - $mt;
	print "Elapsed time: " . number_format($et) . "s\n";
	print "QPS: " . ($tq / $et) . "\n";
?>

Leveraging the mysqli_* calls

<?php
	print "Fetching VIN Array...\n";
	$vin_array = array();
	$conn = mysqli_connect("127.0.0.1","user","","test_schema");
	$vins = $conn->query("SELECT vin FROM vehicles LIMIT 10000");
	while($vin = $vins->fetch_assoc()){
		$vin_array[] = $vin['vin'];
	}
	$conn->close();
	unset($conn);
 
	print "Starting test...\n";
	$tq = 1000000;
	$mt = microtime(true);
	$conn = mysqli_connect("127.0.0.1","user","","test_schema");
	for($i=0; $i<$tq; $i++){
		$data = $conn->query("SELECT * FROM vehicles WHERE vin = '" . $vin_array[rand(0,count($vin_array)-1)] . "'");
		$data_obj = $data->fetch_assoc();
	}
	$conn->close();
	$et = microtime(true) - $mt;
	print "Elapsed time: " . number_format($et) . "s\n";
	print "QPS: " . ($tq / $et) . "\n";
?>

Leveraging mysqli_ calls and prepared statements.

<?php
	print "Fetching VIN array...\n";
	$vin_array = array(); 
	$conn = mysqli_connect("127.0.0.1","user","","test_schema");
	$vins = $conn->query("SELECT vin FROM vehicles LIMIT 10000");
	while($vin = $vins->fetch_assoc()){
		$vin_array[] = $vin['vin'];
	}
	$conn->close();
	unset($conn);
 
	print "Running tests...\n";
	$tq = 1000000;
	$mt = microtime(true);
	$conn = mysqli_connect("127.0.0.1","user","","test_schema");
	$stmt = $conn->stmt_init();
	$stmt->prepare("SELECT * FROM vehicles WHERE VIN = ?");
	for($i=0; $i<$tq; $i++){
		$stmt->bind_param("s", $vin_array[rand(0,count($vin_array) - 1)]);
		$stmt->execute();
		$stmt->bind_result($a,$b,$c,$d,$e,$f,$g,$h,$z,$j,$k,$l,$m,$n);
		$stmt->fetch();
	}
	$conn->close();
	$et = microtime(true) - $mt;
	print "Elapsed time: " . number_format($et) . "s\n";
	print "QPS: " . ($tq / $et) . "\n";
?>

The performance of this “harder” task degraded for all technologies. Again, the mysql_* extension was about 3.6% faster, but the prepared statements were 9.7% faster than the old mysql_* calls and 13.7% faster than the mysqli_ calls. This improvement makes a good case for transitioning repetitive calls in your code to prepared statements leveraging the MySQLi extenstions. Again, the values are measured in queries per second.

Method Run #1 Run #2 Run #3 Avg
mysql_* 4,063.6 4,016.7 4,054.0 4,044.8
mysqli_* 3,902.3 3,905.8 3,896.0 3,901.4
mysqli prepared statements 4,443.8 4,421.8 4,445.7 4,437.1

Without testing stored procedures, it’s difficult to say if they would have continued to improve the performance of this test, however, it is clear that the speed advantages of mysqli_* extensions are only present if you are utilizing the advanced features of the more recent versions of the database.

Conclusion:

If your code is purely hard coded statements, the old school MySQL syntax may be your best bet for a performance lift. However, most likely your code would benefit more from prepared statements and some of the other performance enhancements provided by the MySQLi extension set. I wouldn’t say, “MySQLi Sucks” but it does appear there is a valid argument for continuing to use it if you are not leveraging any of the performance enhancements of the newer versions of MySQL.

System Configuration:

  • Mac OS 10.5.6
  • 2.16Ghz Intel Core Duo
  • 2 Gb RAM
  • PHP 5.2.6
  • MySQL Server 5.0.51b

Share Before You Buy that new DB

Monday, October 27th, 2008

Database Server Icon - Visio Database servers are tricky. Before jumping to the conclusion that more hardware is needed, make sure you’re getting the most out of the hardware you already have. Hiring a DBA is definitely the right way to go. They’ll quickly be able to point out where the pain points are in your configuration. If you can’t afford a DBA, you can use this quick guide to get a starting point on what might need adjustment.

At the system level, start by checking over the following items.

  • Does the machine have sufficient RAM?
  • How much idle CPU time is available?
  • Is the hard disk okay?
  • Is the networking speed sufficient?
  • Are there services that should run somewhere else?

Next investigate the application itself.

  • Is the RDBMS allowed to use all of the RAM available?
  • Are the tables indexed; properly?
  • Are the indexes up to date?
  • Are the statistics up to date?
  • Are there queries that can be offloaded; refactored; eliminated?

This, at minimum, will save time when your thinking about sticking more power in-front of the database. If you still decide to make the move, make sure you check over these values as soon as you are done or you may find your performance decreases when it should have increased.

Share MySQL GUI Tools

Monday, June 2nd, 2008

MySQL Logo I was struggling with MySQL Query Browser, trying to make some minor modifications to a schema recently when a friend, aptly named Erik, from WebCentrix.net turned me onto CocoaMySQL. I’ve only been playing with it for a short time but have found it to be far more stable, easier to use (after a short learning curve) and generally a more flexible product. MySQL should give up on their in house project and hire the developers or at least leverage the open source technology that’s up on SourceForge.

© 1998-2008 AF-Design, All rights reserved.