Pages

Sunday 27 September 2009

Reading the SQL Server Execution Plan

We're having a pretty good weather in London: still sunny and about 18C. I took the bicycle to Hampstead Heath with the VAIO in the rucksack and started looking into SQL performance tuning, as you do.

Went through SQL Server 2008 Query Performance Tuning Distilled in Safari.

Here are a few reading notes...

What are the different types of joins?
  • nested loop join: the most intuitive one. This is the sort of join you would write if you were to code it in C++: iterate over the smallest table first and for each row, look for a match in the other table. Efficient only if the first input is small, and the second is large and indexed.
  • hash join: used if the largest input is not indexed. This is done in two steps:
step 1: builds a hash table with the smallest of both inputs. This hash table uses a hash function to associate a value in the joined column with an index to a bucket. Go through the whole input row by row and add each row to its appropriate bucket using the hash function.
step 2: go through the second input row by row. For each value in the joined column, work out the index to the bucket in the hash table using the hash function. If a row is present, then there is a match and the row is kept in the result set.
  • merge join: used if an index exists on the join columns of both tables. The join columns are sorted in both tables using the indexes. Then comparing columns is relatively fast because it takes advantage of the ordering.
What is a RID Lookup?
  • If a table does not have a clustered index, data pages are on the heap.
  • If a table has a clustered index, they are inside the clustered index.
Non clustered indexes contain pointers to table rows: this pointer is either
  • a RID (Row ID) if the table is on the heap
  • or a clustered index key if the table has a clustered index.

A RID lookup takes place on a heap table (table without clustered index). In order to locate data using a non clustered index SQL Server uses the RID to locate the data row in the heap. A RID lookup is costly because it involves an extra page read (on top of the page read needed for the non clustered index). You wouldn't get this extra page read with a clustered index.

How to see the execution plan directly from the SQL profiler?
This is very handy! No need to try and re-run a slow query in SSMS. Simply track the event ShowPlan XML in SQL Profiler under the Performance group. When you run the trace you can see the actual execution plan of any statement. The plan is displayed in a graphical way as in SSMS.
Warning, not to be used in production! It slows down the performance of the database quite a lot.

More about indexes:
Index Analysis
Index Design Recommendations 

Other resources:
Checklist for analysing slow-running queries