Share A Smaller Database is a Faster Database
February 24th, 2009 by ErikYou can probably save a considerable amount of storage by right sizing your columns. It’s been documented that right sizing data types for columns can return data faster and of course making the data smaller results in less wasted storage space on disk. So I decided to find out just how much.
Inserts
I created two user tables and attempted to insert 1,200,000 rows in each, getting some performance information along the way. The method was to insert 100,000 records in a table, check out how disk space had been used and how the performance of the inserts was impacted. The code to run a similar test on your system is available at the end of this post.

As you can see on the graph, when the on-disk file reached ~45Mb, the performance of the inserts fell very quickly. This is likely due to the low innodb_buffer_pool_size set on this machine (32M) and the forced swapping for the table. Analysis of the table after the inserts showed a average row length of 80 bytes for User1 and 67 bytes for User2. Roughly 15% smaller, which lends credibility to the roughly 15% increase in number of rows inserted before suffering similar performance bottlenecks.
Interestingly, I expected a higher throughput for the total queries per second for the user2 table because of the smaller data segments. While a small 1% increase was observed, this could eassily have been due to fragmented disk or any other number of potential elements impacting the test machine. However, since MySQL uses pages and not individual file writes for each record, it is unlikely to make a large impact unless the difference in record size (currently a difference of 13 bytes per record) is significant.
Selects
Curious how significant the impact would be on select speed, I turned my attention to seeing how fast I could get the data back. So I modified my script below to use a select statement instead and hammered the DB again. As you can see below, the performance of the select is greatly impacted by the number of rows. I ran the test with 300K, 600K and 1.2M rows and saw a performance decrease with each increment, however, the smaller record size was consistently faster than the larger format.

With user1 at 1.2Million rows, I was able to select 1M random uid values at an average rate of 3,458.3 qps. My smaller table, user2, was able to select 1M random uid values at an average rate of 3,590.6 qps, about 3.8% faster than the larger table. This translated to 132.3 queries per second more. Again, I suspect the performance increase has to do with the number of records stored in memory at any given time. Increasing the density of records by decreasing their size effectively makes the ram more efficient.
Running the tests again after purging roughly 1/2 of the records had similar results as you can see in the graph. The improvement was 4.3% which is clearly even faster but as the random pages were fetched from disk, the majority of the data now fit in RAM.
Investigating further, I opted to trim the file to just 300K records and run the tests one last time. The difference was only 1.5% faster, similar to the increased performance for the inserts. Perhaps a future test to measure the increased performance as a measure of the decrease in record length would show that a 15% decrease in record length will result in 1/10 increase in read/write performance? Alas, that’s for another blog post.
Conclusion
There are serious benefits to keeping your data footprint smaller. Although this is not the silver bullet of database tuning, it will certainly help your system run more efficiently. The best way to increase query throughput for reads is to have faster disk I/O. However, as this test shows, you can stretch your performance dollars just a little bit further by using smaller data values that make more sense for the values you are actually storing.
You can get more information on the space required by column type from MySQL’s website and do your own back of the napkin performance calculations on how much of a lift you can hope to see… good luck, and happy tuning.
The Scripts
<?php $records = 100000; $conn = mysqli_connect("127.0.0.1","root","","test"); $start = microtime(true); for($i=0;$i<$records;$i++){ $query = "INSERT IGNORE INTO user1 (uid, age, gender, lastvisit) VALUES (" . rand(1000000, 100000000) . ", " . rand(13, 50) . ", " . rand(0,2) . ", NOW())"; // $query = "INSERT IGNORE INTO user2 (uid, age, gender, lastvisit) VALUES (" . rand(1000000, 100000000) . ", " . rand(13, 50) . ", " . rand(0,2) . ", NOW())"; // $query = "SELECT * FROM user1 WHERE uid = " . rand(1000000, 100000000); // $query = "SELECT * FROM user2 WHERE uid = " . rand(1000000, 100000000); $conn->query($query); } $stop = microtime(true); mysqli_close($conn); print round(($stop - $start), 4) . " seconds elapsed filling the table (~" . number_format($records/round($stop-$start,4),1) . " qps).\n"; ?> |
mysql> CREATE TABLE `user1` (`uid` bigint(20) NOT NULL, `age` int(11) DEFAULT NULL, `gender` int(11) DEFAULT NULL, `lastvisit` datetime DEFAULT NULL, PRIMARY KEY (`uid`)) ENGINE=InnoDB; mysql> CREATE TABLE `user2` (`uid` bigint(20) unsigned NOT NULL, `age` tinyint(3) unsigned NOT NULL, `gender` tinyint(3) unsigned NOT NULL, `lastvisit` timestamp NULL DEFAULT NULL, PRIMARY KEY (`uid`)) ENGINE=InnoDB mysql> describe user1; +-----------+------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-----------+------------+------+-----+---------+-------+ | uid | bigint(20) | NO | PRI | NULL | | | age | int(11) | YES | | NULL | | | gender | int(11) | YES | | NULL | | | lastvisit | datetime | YES | | NULL | | +-----------+------------+------+-----+---------+-------+ mysql> describe user2; +-----------+---------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-----------+---------------------+------+-----+---------+-------+ | uid | bigint(20) unsigned | NO | PRI | NULL | | | age | tinyint(3) unsigned | NO | | NULL | | | gender | tinyint(3) unsigned | NO | | NULL | | | lastvisit | timestamp | YES | | NULL | | +-----------+---------------------+------+-----+---------+-------+ |
You should follow me on Twitter.