Secrets of MySQL: a short cheat sheet
In this post I have gathered interesting facts about MySQL that were once eye-opening for me, and maybe some of them will be new to you. This is not an in-depth post, but rather a collection of tips and tricks that might encourage you to read more about some aspect of MySQL in the documentation or other blog posts.
Starting with MySQL 5.6, the EXPLAIN command supports JSON format. This is not just another output type: it contains extended, very useful data. So if you need more information on what’s going on with your query, just use:
EXPLAIN FORMAT=JSON SELECT ...
As we know, there are two main storage engines in MySQL: MyISAM and InnoDB. MyISAM stores in each index an address: the row’s offset from the beginning of the data file. InnoDB instead stores the values of all primary key columns in every secondary index. This has several interesting consequences:
To fetch a row from the table using a secondary index, InnoDB needs to make two lookups: first in the secondary index, then in the clustered index (the data file), using the primary key values it found. So InnoDB has a noticeable overhead on using non-primary indexes.
if a table on the InnoDB engine has a big primary key (many fields, or a string field) and several indexes, you might want to replace the primary key with an auto-increment field; otherwise you will have a pretty big overhead on the indexes.
Covering index: if you use the InnoDB engine and have a table with three fields (a, b, c), the primary key is (a, b), and you have an index on the field "c", then SELECT a, b, c FROM tablename WHERE c=1 will get all the data from the index alone (no need to search for and read rows from the table data file).
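The covering-index case above can be reproduced like this (a minimal sketch; the table and column names are just the placeholders from the example):

```sql
-- Placeholder table from the example above: PK (a, b), secondary index on c.
CREATE TABLE tablename (
  a INT NOT NULL,
  b INT NOT NULL,
  c INT NOT NULL,
  PRIMARY KEY (a, b),
  KEY idx_c (c)
) ENGINE=InnoDB;

-- The secondary index idx_c implicitly contains (c, a, b),
-- so this query can be satisfied from the index alone.
-- EXPLAIN should show "Using index" in the Extra column.
EXPLAIN SELECT a, b, c FROM tablename WHERE c = 1;
```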
even if you do not create a primary key, InnoDB will create one for you: it will keep it hidden and use it in secondary indexes.
one of the most important configuration parameters for InnoDB is innodb_buffer_pool_size. If it is big enough, all the data can fit entirely in RAM, which speeds things up considerably.
There is a belief in the MySQL world that SELECT count(*) FROM tablename; is always blazingly fast. Actually, that is only half of the truth. Such a query runs really fast (in O(1)) only on the MyISAM engine, not on InnoDB, because MyISAM keeps an exact row count in the table metadata, so the query only reads a single value from it. But if we add any condition in a WHERE clause, this shortcut no longer works and a full scan or index search is required. As far as I know, such an optimization is also absent in other MySQL engines.
It was once eye-opening for me that GROUP BY has a WITH ROLLUP modifier.
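WITH ROLLUP adds summary rows to grouped output. A minimal sketch (the sales table and its columns are hypothetical):

```sql
-- Hypothetical sales table.
CREATE TABLE sales (
  year INT,
  product VARCHAR(32),
  amount DECIMAL(10,2)
);

-- WITH ROLLUP appends extra rows with NULL in the grouped columns:
-- a subtotal per year, plus a grand total at the end.
SELECT year, product, SUM(amount) AS total
FROM sales
GROUP BY year, product WITH ROLLUP;
```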
Another modifier, SQL_CALC_FOUND_ROWS, which is sometimes used to determine the full number of rows matched by a SELECT statement in order to build a pager, should usually be avoided, especially for big datasets, because it forces a full scan. A better strategy is to SELECT a few more rows than you need to show on the current page, and use the surplus to render several "next page" links and a "last page" link.
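The "fetch a bit more than one page" strategy can be sketched like this (the articles table, page size, and look-ahead depth are illustrative):

```sql
-- Page size 20; fetch up to 3 extra pages' worth of rows plus one.
-- If 61 rows come back, you know at least three more pages exist,
-- without ever counting the whole table.
SELECT id, title
FROM articles        -- hypothetical table
ORDER BY id
LIMIT 0, 61;         -- 20 * 3 + 1
```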
A single index can’t be used to sort in mixed directions, so ORDER BY a ASC, b DESC would not use an index. MySQL 5.7, though, has some new features to work around this issue, like generated (virtual) columns: https://www.percona.com/blog/2015/04/29/generated-virtual-columns-in-mys...
MySQL can't use an index for a WHERE clause such as WHERE age BETWEEN 18 AND 25 if the same query uses an index to sort users by another field: once MySQL uses an index for a range criterion, it cannot also use another index (or a suffix of the same index) for ordering. MySQL will also stop using the key parts of a multi-part index as soon as it meets a range condition (<, >, BETWEEN); it will, however, continue using the index if the range is expressed with IN().
To test it ( http://pastebin.com/f7vSimRy ):
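A minimal sketch of the same effect, separate from the pastebin above (the users table and its columns are hypothetical):

```sql
-- Hypothetical users table with a composite index.
CREATE TABLE users (
  id INT PRIMARY KEY AUTO_INCREMENT,
  age INT NOT NULL,
  city VARCHAR(32) NOT NULL,
  KEY idx_age_city (age, city)
) ENGINE=InnoDB;

-- The range on age stops key-part usage: only the age part of
-- idx_age_city narrows the search (compare key_len in EXPLAIN).
EXPLAIN SELECT * FROM users
WHERE age BETWEEN 18 AND 25 AND city = 'London';

-- With IN(), both key parts stay usable: MySQL scans one
-- (age, city) range per value in the list.
EXPLAIN SELECT * FROM users
WHERE age IN (18, 19, 20, 21, 22, 23, 24, 25) AND city = 'London';
```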
Starting with MySQL 5.6, there is a new feature called "index condition pushdown" (ICP). It means that some conditions that previously could not use an index at all now can do so at least partially. Let’s look at the examples:
SELECT a FROM tbl WHERE b=3 AND c LIKE '%hi%'
Before MySQL 5.6 such a query could use:
KEY(b) - standard key, nothing special.
KEY(b,c,a) - covering index, so all the data for the result of the query could be read from it.
And now such an index can also be used:
KEY(b,c) - it makes a range access by "b", and after that the condition on "c" is "pushed down", so rows are filtered by the c column on the index data, and only matching rows are read from the table itself.
Another example, for KEY(a,b,c), might be the following:
SELECT * FROM tbl WHERE a=5 AND c=6; such a query will now use the index to filter by "a" in O(log n), then filter the matching index entries by "c" in a linear pass over them, and only after that read the rows from the table.
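A sketch of that second example (tbl and its columns are placeholders; with ICP enabled, EXPLAIN shows "Using index condition" in the Extra column):

```sql
-- Placeholder table for the ICP example above.
CREATE TABLE tbl (
  id INT PRIMARY KEY AUTO_INCREMENT,
  a INT NOT NULL,
  b INT NOT NULL,
  c INT NOT NULL,
  KEY idx_abc (a, b, c)
) ENGINE=InnoDB;

-- a=5 narrows the index range; c=6 is pushed down and checked
-- against index entries before any table rows are fetched.
EXPLAIN SELECT * FROM tbl WHERE a = 5 AND c = 6;

-- ICP can be toggled to compare the plans:
SET optimizer_switch = 'index_condition_pushdown=off';
```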
It is important to keep "related" content close together on disk.
Since InnoDB stores table rows ordered by the primary key, it can be a good idea to design the PK so that similar content is grouped together. A good example is comments on blog articles. The most common use case is fetching all of them from the DB to show under the article, but with an auto_increment field as the PK, the comments for a single article are scattered across the whole table. One possible solution is PRIMARY KEY (post_id, id): this way the comments of a single post fit into a few pages on disk, so they can be easily cached and read at once.
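A sketch of such a comments table (the names are illustrative; note that InnoDB requires an auto_increment column to be the leftmost column of some index, hence the extra key on id):

```sql
-- Comments clustered by post: rows for one post are stored together.
CREATE TABLE comments (
  post_id INT NOT NULL,
  id INT NOT NULL AUTO_INCREMENT,
  body TEXT NOT NULL,
  PRIMARY KEY (post_id, id),
  KEY idx_id (id)   -- InnoDB needs auto_increment to lead some index
) ENGINE=InnoDB;

-- Reading all comments of one post is now a single
-- clustered-index range scan.
SELECT id, body FROM comments WHERE post_id = 42 ORDER BY id;
```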
FULLTEXT indexes are sometimes heavily underestimated. The obvious application for them is text search; that works fine for small to medium sites, while big sites usually go for dedicated solutions like Apache Solr or Sphinx. But MySQL also has an interesting feature named "boolean full-text search". This type of search can quickly check whether certain strings are present in a text or not. One application might be searching for goods in a shop. Imagine you have a shop that sells a huge variety of different goods, each with unique characteristics, and for each set of characteristics you need to let your users quickly find out whether you have something that fits their needs.
If you have hundreds or thousands of features, you cannot make a table column for each of them, and it would be nearly impossible to build an index over all those columns.
The solution is to create a separate text field for "technical use". This field stores a text in which every characteristic of a good and its value are encoded, for example: "cpu_core_i5 cpu_version_5250U RAM_8GB touchscreen_yes ssd_yes" and so on. Then you create a full-text index on this field, and you can quickly search for the objects that match your conditions:
SELECT * FROM goods WHERE MATCH (characteristics) AGAINST ('+cpu_core_i5 -cpu_version_5250U' IN BOOLEAN MODE);
This will find all the devices that have a Core i5 processor but not the 5250U revision.
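A sketch of the full setup (table and column names are illustrative; note that the default minimum indexed token length, ft_min_word_len for MyISAM or innodb_ft_min_token_size for InnoDB, may need lowering so that short tokens get indexed):

```sql
-- Goods with an encoded characteristics field for boolean full-text search.
CREATE TABLE goods (
  id INT PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(128) NOT NULL,
  characteristics TEXT NOT NULL,
  FULLTEXT KEY ft_chars (characteristics)
) ENGINE=InnoDB;   -- InnoDB supports FULLTEXT since MySQL 5.6

INSERT INTO goods (name, characteristics) VALUES
  ('Laptop A', 'cpu_core_i5 cpu_version_5250U RAM_8GB touchscreen_yes ssd_yes'),
  ('Laptop B', 'cpu_core_i5 cpu_version_4300M RAM_16GB touchscreen_no ssd_yes');

-- + means the token must be present, - means it must be absent.
SELECT * FROM goods
WHERE MATCH (characteristics)
      AGAINST ('+cpu_core_i5 -cpu_version_5250U' IN BOOLEAN MODE);
```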
If you want to find out more about the topics described here, one excellent resource is “High Performance MySQL” by Baron Schwartz, Peter Zaitsev, and Vadim Tkachenko. Most of the information in this post initially came from that book and from lectures by Mr. Zaitsev.
BTW, Mr. Zaitsev and co. are currently giving away a part of their new book for free.
Hope this was helpful,