Skip to main content

Improving query performance - Subqueries (*aka Derived Queries)

I recently asked this question on StackOverflow about how to join to a derived table using an index.  I found out that even if your derived table is generated with tables that use all of the proper indexes, and you try to join that derived table to a base table in your outer query, no indexes are used.  Ever...

So, let's start with a contrived example (as we often do)...


SELECT * FROM t1
  JOIN (SELECT * FROM t2) AS derived_t2 ON t1.f1=derived_t2.f1;


OK... so let me get this straight... even if t1.f1 and t2.f1 are indexed, the join will still be a full table scan?  You betcha!

Let me say that again... whenever you write a subquery and try to join the derived table to your outer query tables, no indexes are ever used.  It's a full table scan.

So, I think to myself... "Wow... so that's really stupid, actually... because I have tables that only have a few thousand rows in them, and the query performance is shit.  Like... over 2 seconds... terrible and unacceptable."  What's the point of subqueries in MySQL if they throw foul-smelling excrement all over your query performance?  Eww... gross.

After some research, I found out that the query optimizer in MySQL 5.6.3+, which is a preview release at the time of this writing, is actually smart enough to create an index on the derived table if it can improve query performance.  YAY!  Unfortunately, everyone is running 5.1 right now; 5.6 is many months away.  *sigh* Bummer, dudes.

But, fortunately, I found a workaround, as discussed by Venu Anuganti (kudos, dude!) in one of his blog posts.  His blog post talks about creating a temporary table containing the rows of the derived table, adding an index to it, and then running your main query, joining to the temporary table, instead of writing the derived query.  This approach is a workaround, but it's almost an insult to the MySQL query optimizer... you have to hold MySQL's hand to tell it how to not be stupid.

For those who don't like this solution... consider this...

Things I tried that don't work:

  • Moving the derived table into a view - no noticeable performance gains here... there are no indexes on a view either. Duh!
  • Using "USE/FORCE INDEX" syntax on the derived table. MySQL doesn't like this - invalid syntax.
  • Curse at MySQL (although this did help... it didn't solve the problem)
  • Cry/weep - what are you, some kind of cry-baby?  Geez!
Things that might work:
  • Joining directly to t2 and using a GROUP BY in the query, eliminating the subquery altogether.  I like this approach, too, but it involves re-writing your query and GROUPing BY every single column in t1.  Lame.  I wonder what that does to query performance?
  • Strangle a MySQL RDBMS Software Engineer
And, finally, I'd like to say that neither the workaround above nor the query optimizer improvements in MySQL 5.6.3+ solve the entire problem.  Sometimes you might not need to generate a key on the derived table because the field in the derived table is already indexed in the first place.  In this case, MySQL should not regenerate the key... which takes O(n * log(n) ) time, by the way... it should just re-use the index that already exists to join the derived table to the outer query.  In our example above, if t2.f1 is already indexed, there is no need for the optimizer to create an index on the derived table because the field used to join to the outer query is already indexed.

Comments

Popular posts from this blog

Wedding Prediction - October, 2013

Carla and I are planning on getting married sometime in October next year.  We need to pick a date, and that decision may  involve some science and mathematics.  :) For example, we want the weather to be nice.  To be more precise, we'd like the high temperature for the wedding day to be between 60 and 80 degrees Fahrenheit.  Obviously, we have both lived in Ohio our entire lives, and we have a pretty good idea of what the weather will be like.  We both hypothesised that October was a "hit or miss" sort of month; it could be cold, or it could be nice. But, for me, a simple hypothesis was not enough; I really wanted to know the probabilities of decent weather based on historical weather data.  Many websites on the Internet (i.e. almanac.com) charge you to review historical weather data, but Carla and I discovered a cool page on cleveland.com that provided exactly what we wanted.  I loaded the historical temperature data from 1903 to 2011 f...

Web Browsers You Should Support

As a web developer, generally speaking, you should consider supporting the following browsers (at the time of this writing): Chrome (latest) - the browser that sets the bar for the others; you should be using it and supporting it Internet Explorer 9+ - the browser that finally caught up with the times a bit; basically, a Chrome wannabe.  I still say that IE sucks... even if it really doesn't anymore.  Yes... I'm sour about IE8 and below. Internet Explorer 8 - the old, sad browser that we sadly still have to support for a while.  CSS 3 is not well-supported here, so we use projects like CSS3 PIE or whatever.  By the way... IE8 sucks.  I can't wait until this comes off of the list. Firefox (latest) - the browser that was once awesome and has sadly suffered recently because it's slower than Chrome... but hey, lots of people still use it. Safari (latest) - Watch out for Safari as more iPhones, iPads, Macs, and more overly-priced Apple products flood the ...

BallWorld Screen Saver

Overview: My last Java programming assignment for my class at the University of Akron was called "BallWorld."  Its details can be found here .  I will not post any source code here, but I will post an executative JAR file that will run the screen saver.  Anyways, the final project of the BallWorld project was kind of cool, so I modified it a little bit to make a pretty neato screen saver.  You can download the project here:  BallWorld.zip .   The zip file contains a .JAR, an .EXE, and a .JOB.  The JAR file should execute the screen saver on any operating system.  The EXE file works only on Windows.  The JOB (Windows Task Scheduler) file can be used to automatically run the EXE file after a specified amount of computer idle time. Details: The EXE file was created using a program called Launch4j .  Launch4j simply takes a JAR file and converts it into a Win32 EXE.  Obviously, this destroys platform-inde...