I recently asked this question on StackOverflow about how to join to a derived table using an index. I found out that even if your derived table is generated with tables that use all of the proper indexes, and you try to join that derived table to a base table in your outer query, no indexes are used. Ever...
So, let's start with a contrived example (as we often do)...
OK... so let me get this straight... even if t1.f1 and t2.f1 are indexed, the join will still be a full table scan? You betcha!
Let me say that again... whenever you write a subquery and try to join the derived table to your outer query tables, no indexes are ever used. It's a full table scan.
So, I think to myself... "Wow... so that's really stupid, actually... because I have tables that only have a few thousand rows in them, and the query performance is shit. Like... over 2 seconds... terrible and unacceptable." What's the point of subqueries in MySQL if they throw foul-smelling excrement all over your query performance? Eww... gross.
After some research, I found out that the query optimizer in MySQL 5.6.3+, which is a preview release at the time of this writing, is actually smart enough to create an index on the derived table if it can improve query performance. YAY! Unfortunately, everyone is running 5.1 right now; 5.6 is many months away. *sigh* Bummer, dudes.
But, fortunately, I found a workaround, as discussed by Venu Anuganti (kudos, dude!) in one of his blog posts. His blog post talks about creating a temporary table containing the rows of the derived table, adding an index to it, and then running your main query, joining to the temporary table, instead of writing the derived query. This approach is a workaround, but it's almost an insult to the MySQL query optimizer... you have to hold MySQL's hand to tell it how to not be stupid.
For those who don't like this solution... consider this...
Things I tried that don't work:
So, let's start with a contrived example (as we often do)...
SELECT * FROM t1
JOIN (SELECT * FROM t2) AS derived_t2 ON t1.f1=derived_t2.f1;
OK... so let me get this straight... even if t1.f1 and t2.f1 are indexed, the join will still be a full table scan? You betcha!
Let me say that again... whenever you write a subquery and try to join the derived table to your outer query tables, no indexes are ever used. It's a full table scan.
So, I think to myself... "Wow... so that's really stupid, actually... because I have tables that only have a few thousand rows in them, and the query performance is shit. Like... over 2 seconds... terrible and unacceptable." What's the point of subqueries in MySQL if they throw foul-smelling excrement all over your query performance? Eww... gross.
After some research, I found out that the query optimizer in MySQL 5.6.3+, which is a preview release at the time of this writing, is actually smart enough to create an index on the derived table if it can improve query performance. YAY! Unfortunately, everyone is running 5.1 right now; 5.6 is many months away. *sigh* Bummer, dudes.
But, fortunately, I found a workaround, as discussed by Venu Anuganti (kudos, dude!) in one of his blog posts. His blog post talks about creating a temporary table containing the rows of the derived table, adding an index to it, and then running your main query, joining to the temporary table, instead of writing the derived query. This approach is a workaround, but it's almost an insult to the MySQL query optimizer... you have to hold MySQL's hand to tell it how to not be stupid.
For those who don't like this solution... consider this...
Things I tried that don't work:
- Moving the derived table into a view - no noticeable performance gains here... there are no indexes on a view either. Duh!
- Using "USE/FORCE INDEX" syntax on the derived table. MySQL doesn't like this - invalid syntax.
- Curse at MySQL (although this did help... it didn't solve the problem)
- Cry/weep - what are you, some kind of cry-baby? Geez!
Things that might work:
- Joining directly to t2 and using a GROUP BY in the query, eliminating the subquery altogether. I like this approach, too, but it involves re-writing your query and GROUPing BY every single column in t1. Lame. I wonder what that does to query performance?
- Strangle a MySQL RDBMS Software Engineer
And, finally, I'd like to say that neither the workaround above nor the query optimizer improvements in MySQL 5.6.3+ solve the entire problem. Sometimes you might not need to generate a key on the derived table because the field in the derived table is already indexed in the first place. In this case, MySQL should not regenerate the key... which takes O(n * log(n) ) time, by the way... it should just re-use the index that already exists to join the derived table to the outer query. In our example above, if t2.f1 is already indexed, there is no need for the optimizer to create an index on the derived table because the field used to join to the outer query is already indexed.
Comments