How (not) to find unused indexes

I’ve seen a few people link to an INFORMATION_SCHEMA query that finds indexes with low cardinality, in an effort to work out which indexes should be removed.  This method is flawed. Here’s the first reason why:

CREATE TABLE `sales` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`customer_id` int(11) DEFAULT NULL,
`status` enum('archived','active') DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `status` (`status`)
) ENGINE=MyISAM AUTO_INCREMENT=65691 DEFAULT CHARSET=latin1;
mysql> SELECT count(*), status FROM sales GROUP by status;
+----------+----------+
| count(*) | status   |
+----------+----------+
|    65536 | archived |
|      154 | active   |
+----------+----------+
2 rows in set (0.17 sec)
mysql> EXPLAIN SELECT * FROM sales WHERE status='active'; # query 1
+----+-------------+-------+------+---------------+--------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key    | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+--------+---------+-------+------+-------------+
|  1 | SIMPLE      | sales | ref  | status        | status | 2       | const |  196 | Using where |
+----+-------------+-------+------+---------------+--------+---------+-------+------+-------------+
1 row in set (0.06 sec)
mysql> EXPLAIN SELECT * FROM sales WHERE status='archived'; # query 2
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | sales | ALL  | status        | NULL | NULL    | NULL | 65690 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.01 sec)

The cardinality of the status index is woeful, but provided the application only ever sends query 1 to MySQL, it’s actually a pretty good index: that query matches just 154 of the 65,690 rows.  It’s not always like this, but there are a lot of cases where applications get good selectivity from some queries despite what cardinality shows.
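To see why a single cardinality figure hides this, it can help to look at the share of the table each value matches. The query below is a minimal sketch against the `sales` table defined above; the column aliases are made up for illustration.

```sql
-- Share of the table matched by each status value.
-- A value that matches a tiny fraction of rows can be served well by the
-- index even though the index as a whole has only two distinct values.
SELECT status,
       COUNT(*) AS matching_rows,
       COUNT(*) / (SELECT COUNT(*) FROM sales) AS fraction_of_table
FROM sales
GROUP BY status;
```

With the counts shown earlier, 'active' covers 154 of 65,690 rows (roughly 0.2%), while 'archived' covers nearly everything, which is consistent with the optimizer using the index for query 1 and scanning the table for query 2.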

Not convinced?  Here’s reason number two:

CREATE TABLE `Country` (
`Code` char(3) NOT NULL DEFAULT '',
`Name` char(52) NOT NULL DEFAULT '',
`Continent` enum('Asia','Europe','North America','Africa','Oceania','Antarctica','South America') NOT NULL DEFAULT 'Asia',
`Region` char(26) NOT NULL DEFAULT '',
`SurfaceArea` float(10,2) NOT NULL DEFAULT '0.00',
`IndepYear` smallint(6) DEFAULT NULL,
`Population` int(11) NOT NULL DEFAULT '0',
`LifeExpectancy` float(3,1) DEFAULT NULL,
`GNP` float(10,2) DEFAULT NULL,
`GNPOld` float(10,2) DEFAULT NULL,
`LocalName` char(45) NOT NULL DEFAULT '',
`GovernmentForm` char(45) NOT NULL DEFAULT '',
`HeadOfState` char(60) DEFAULT NULL,
`Capital` int(11) DEFAULT NULL,
`Code2` char(2) NOT NULL DEFAULT '',
PRIMARY KEY (`Code`),
KEY `Population` (`Population`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
mysql> SELECT count(*) FROM Country;
+----------+
| count(*) |
+----------+
|      239 |
+----------+
1 row in set (0.00 sec)
mysql> SELECT count(distinct(population)) FROM Country;
+-----------------------------+
| count(distinct(population)) |
+-----------------------------+
|                         226 |
+-----------------------------+
1 row in set (0.05 sec)
mysql> EXPLAIN SELECT * FROM country WHERE population > 1000; # query 3
+----+-------------+---------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+---------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | country | ALL  | Population    | NULL | NULL    | NULL |  239 | Using where |
+----+-------------+---------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.04 sec)
mysql> EXPLAIN SELECT * FROM country WHERE population > 100000000; # query 4
+----+-------------+---------+-------+---------------+------------+---------+------+------+-------------+
| id | select_type | table   | type  | possible_keys | key        | key_len | ref  | rows | Extra       |
+----+-------------+---------+-------+---------------+------------+---------+------+------+-------------+
|  1 | SIMPLE      | country | range | Population    | Population | 4       | NULL |   23 | Using where |
+----+-------------+---------+-------+---------------+------------+---------+------+------+-------------+
1 row in set (0.00 sec)

The Population index has high cardinality (226 distinct values in 239 rows), yet it should not be used for query 3, since too many countries have a population greater than 1000.  An automated search for low-cardinality indexes wouldn’t have revealed its uselessness.  With range scans, it’s very easy to fall into a trap where your index cannot filter out enough rows to be effective.  I see this a lot in consulting issues where customers have queries that use a BETWEEN on a date, but the window of time being searched is too wide.

Side Note: In some texts you’ll see people quote “20-30%” as the share of rows a query has to filter down to for an index to be useful (that is, the index must eliminate 70-80% of rows).  It’s not quite correct to quote this as an exact percentage, since the value is not fixed in MySQL and can fall in a much wider window (15-60%) depending on the circumstances.  In this case, MySQL flipped from a table scan to the index at about 34%.
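If you want to find the flip point on your own data, one rough approach is to measure what fraction of the table a range predicate matches and compare that against what EXPLAIN chooses. This is only a sketch: the threshold is the one from query 4, and `fraction_matched` is an illustrative alias.

```sql
-- In MySQL a comparison evaluates to 1 or 0, so SUM() counts the matching rows.
SELECT SUM(Population > 100000000) / COUNT(*) AS fraction_matched
FROM Country;

-- Then check which access method the optimizer picks for the same predicate.
EXPLAIN SELECT * FROM Country WHERE Population > 100000000;
```

Repeating both statements with different thresholds shows where the plan switches between `ALL` and `range`. For reference, the EXPLAIN estimate for query 4 above was 23 of 239 rows, a little under 10%.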

How am I supposed to find unused indexes then?
You really have to run queries against your server – there is no other way.  From there, there’s a helpful patch in 5.0-percona called INDEX_STATISTICS that can then show you which indexes were touched and which were not.
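As a sketch of how that might look: assuming the patched server exposes the statistics as an `INFORMATION_SCHEMA.INDEX_STATISTICS` table with `TABLE_SCHEMA`, `TABLE_NAME` and `INDEX_NAME` columns (check the documentation for your build, since the patch has changed between releases), indexes that were never touched can be listed with an anti-join against the standard `STATISTICS` table.

```sql
-- Assumption: INDEX_STATISTICS only contains indexes that have been read
-- since the counters were last reset.
SELECT DISTINCT s.TABLE_SCHEMA, s.TABLE_NAME, s.INDEX_NAME
FROM INFORMATION_SCHEMA.STATISTICS AS s
LEFT JOIN INFORMATION_SCHEMA.INDEX_STATISTICS AS i
       ON i.TABLE_SCHEMA = s.TABLE_SCHEMA
      AND i.TABLE_NAME   = s.TABLE_NAME
      AND i.INDEX_NAME   = s.INDEX_NAME
WHERE i.INDEX_NAME IS NULL                 -- no recorded reads
  AND s.INDEX_NAME <> 'PRIMARY'            -- never a removal candidate here
  AND s.TABLE_SCHEMA NOT IN ('mysql', 'information_schema');
```

The result is only as good as the workload behind it: an index used by a monthly report will look unused if the server has only seen a day of traffic.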

If you are not running a patched server, then the alternative is to either use a proxy that checks EXPLAIN information (like QUAN) or set your slow query log threshold to zero microseconds (a 5.1 feature), find some way to parse and EXPLAIN every logged query, and then subtract the indexes that were mentioned from the set of all known indexes.  There’s an old tool called mysqlidxchx which should be able to do this.
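A rough outline of the slow-log route, under stated assumptions: the `used_indexes` table is hypothetical and would be filled by whatever script parses the log and records the `key` column of each EXPLAIN. That parsing step is not shown here.

```sql
-- Log every statement by setting the threshold to zero.
-- SET GLOBAL long_query_time only affects connections opened afterwards.
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 0;

-- Hypothetical holding table for the indexes EXPLAIN reported as used.
-- utf8 is chosen to match the character set of INFORMATION_SCHEMA columns.
CREATE TABLE used_indexes (
  table_schema VARCHAR(64) NOT NULL,
  table_name   VARCHAR(64) NOT NULL,
  index_name   VARCHAR(64) NOT NULL,
  PRIMARY KEY (table_schema, table_name, index_name)
) DEFAULT CHARSET=utf8;

-- All known indexes minus the ones that were mentioned.
SELECT DISTINCT s.TABLE_SCHEMA, s.TABLE_NAME, s.INDEX_NAME
FROM INFORMATION_SCHEMA.STATISTICS AS s
LEFT JOIN used_indexes AS u
       ON u.table_schema = s.TABLE_SCHEMA
      AND u.table_name   = s.TABLE_NAME
      AND u.index_name   = s.INDEX_NAME
WHERE u.index_name IS NULL
  AND s.TABLE_SCHEMA NOT IN ('mysql', 'information_schema');
```

Logging every statement adds overhead and the log grows quickly on a busy server, so it is worth restoring the previous `long_query_time` once a representative sample has been captured.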

The post How (not) to find unused indexes appeared first on MySQL Performance Blog.

