Deleting Data From InnoDB
Saturday, October 24th, 2009
Problem: We are given a large MySQL database table that no longer fits in your system’s working memory. You need to prune the data since a significant portion of this data is no longer relevant to keep in this table. Our expectation is ~75% of the data will remain in the table because of a uniform and random distribution of values in col1 and col2. How then, do we go about pruning this table as efficiently as possible?
The table structure is as follows:
describe big_table; +-----------+---------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-----------+---------------------+------+-----+---------+----------------+ | id | bigint(20) unsigned | NO | PRI | NULL | auto_increment | | col1 | bigint(20) unsigned | NO | MUL | NULL | | | col2 | bigint(20) unsigned | NO | MUL | NULL | | | chardata | varchar(35) | YES | | NULL | | +-----------+---------------------+------+-----+---------+----------------+ |
Solution 1: Delete
We can simply run a query against the database that will delete all records that are irrelevant. The idea here is that we use a single query that’s easily readable to filter out the unwanted records. For this example we’ll assume we are sharding this data and want to keep all odd values for either col1 or col2 in this table. We can use MySQL’s MOD() function and check for even values and delete them. We expect this query to remove ~25% of our table.
DELETE FROM big_table WHERE MOD(col1,2) = 0 AND MOD(col2, 2) = 0; |
Solution 2: Create -> Insert -> Rename -> Drop (CIRD)
Another solution is to create a new table identical to the original table then simply insert the records we want to keep into the new table. Once we have all the records we wish to keep, we simply rename the tables and then drop our original table. The difference in this query is to think about what we want to keep as opposed to what we want to get rid of. We expect this method to insert ~75% of the original table.
CREATE TABLE big_table_copy LIKE big_table; INSERT INTO big_table_copy SELECT * FROM big_table WHERE MOD(col1, 2) = 1 OR MOD(col2, 2) = 1; RENAME TABLE big_table TO big_table_old, big_table_copy TO big_table; DROP TABLE big_table_old; |
So which is faster?
It turns out it’s considerably faster to use solution 2. Solution 1 finished preparing our test table in just over 33 minutes. Solution 2 completed the inverse task in just over 5 1/2 minutes. The test data was 1.58 million rows of randomly generated data. The MySQL server was cripped to a 32Mb buffer pool to closely mimic our real world workload which would be heavily IO bound. The on disk file was about 260Mb. Using CIRD is about 6x faster than a straight delete. This probably isn’t surprising as it’s been reported for a long time that DELETE is ~10x slower than INSERT with InnoDB, but I wanted to validate these findings to determine if they still held true for newer versions (5.1.30) of MySQL.


Generally, if you need to have the browser see a page while you continue executing the request I would use a queue system like
Storing application data on the OpenSocial host is a great way to offload some unnecessary database and application server load. Why request a preference such as a skin for a user profile from your servers if we can just let the container handle it? MySpace allows for ~1K of data storage per user per application. However, there is a bug with the method
I have found the tool I’ve been looking for!
Another great tool that I discovered today is
Following up on the post about
I really like the date strings in the Facebook Newsfeed. They somehow make the data feel more personal when they relate to me with more natural language “Today”. This in place of callous standardized formatting so many websites present such as “2/10/2009″ or worse yet “02/10/2004″ or even worse still “2009-02-10″. The following code illustrates how you can create a quick and easy formatting of data that’s ordered by date and output it in a meaningful way to your users. You can see an example of this in action on the 