Optimizing SQL

Breck Carter
Last modified: May 15, 1996
mail to: bcarter@bcarter.com
[Home]

Optimizing SQL

Test Your Changes: Different databases handle SQL differently so it's not possible to lay down hard and fast rules that apply in every situation. What you can do, however, is look here for ideas to try and use an execution profiler to perform timing tests.

Check the Execution Plan: Most databases provide some facility to display the optimization strategy or plan that will be used. This will tell you if a fast index or primary key search will be used instead of a slow full table scan. For Sybase and Microsoft SQL Server use ISQL and the set showplan on command. For Watcom SQL use the plan() function as follows:

   select plan ( 'select * from customer where id = 101' )
     from sys.dummy;

   Estimate 2 I/O operations
   Scan customer using primary key for rows where id equals 101
   Estimate getting here 1 times

   select plan ( 'select * from customer where fname = ''Beth''' )
     from sys.dummy;

   Estimate 10 I/O operations
   Scan customer sequentially
   Estimate getting here 126 times

Use Indexes: Carefully chosen indexes speed up SQL queries without slowing updates too much. Almost all tables can benefit from an index, and experience has shown that the "ideal index" is almost never the primary key (even though a primary key index may be required to preserve referential integrity). Be aware, however, that many databases only store the first few bytes (e.g., 10 for Watcom SQL) of each entry in the actual index. And with Watcom SQL it is not necessary to define indexes for primary and foreign keys because they are automatically created; if you use ERwin or a similar tool be sure to disable the creation of indexes for Watcom primary and foreign keys.
Avoid Not: If you can rewrite your queries to use the positive exists, in and like operators instead of not exists, not in or not like the database may be able to stop looking as soon as it finds an entry that satisfies the condition rather than proving that no entries exist.
Use Exists instead of Count: The DBMS may be able to stop looking as soon as it finds a row that satisfies the exists condition whereas with count it must process all matching rows.
See also Exists Versus Count(*).
Try >= instead of >: If there is an index on column try select * from table where column >= 4 instead of where column > 3. Instead of looking in the index for the first row with column = 3 and then scanning forward for the first value that is > 3, the DBMS may jump directly to the first entry that is = 4.
Try Union instead of Or: The database may do a better job of optimizing two selects connected via union rather than one select with an or operator. For example,
```
   select * from a, b where a.p = b.q
    union
   select * from a, b where a.x = b.y
```
may run faster than
```
   select * from a, b where a.p = b.q or a.x = b.y
```
Avoid Where Column Like '%string': On the other hand where column like 'string%' may run quite quickly especially if there's an index on column.
Use Like Instead of Substr: If there is an index on column the database may process where column like 'x%' faster than where substr ( column, 1, 1 ) = 'x'.
Split Procedure to Create Then Select: If a stored procedure creates a temporary table and then selects from it, the database may not be able to optimize the select properly. Try moving the select to a second stored procedure which is called from the first.
Split Procedure to Turn a Variable Into a Parameter: The database may do a better job of optimizing where column = @var if @var is a stored procedure parameter rather than a local variable. Try moving the select to a second procedure which is called from the first procedure with @var passed as a parameter to the second procedure.
Consider Indexes for Max() and Min() The max ( column ) and min ( column ) aggregate functions may run faster if there is an index on column. Other restrictions may apply; try putting the function all by itself in a separate select rather than combining it with other expressions. The presence of a group by or where clause may also inhibit optimization.
Make Indexes Unique When They Are: The database may do a better job with unique indexes than with non-unique ones. So if an index is truly unique, declare it as such.
Avoid Correlated Subselects: A correlated subselect is a nested select that refers to a column from the outer select. Here is an example that uses product.id as a correlation column to find all products that have no sales orders:
```
   select product.id
     from product
    where not exists (
   select sales_order_items.id
     from sales_order_items
    where sales_order_items.prod_id = product.id )
```
Correlated subselects can be very slow if the inner result set is re-selected for each and every candidate row in the outer result set. Alternative SQL can sometimes look rather bizarre but it's usually worth the effort. In Watcom SQL the following select runs almost 4 times faster by using an outer join instead of a correlated subselect:
```
select product.id
  from product left outer join sales_order_items
    on product.id = sales_order_items.prod_id
 where IfNull ( sales_order_items.id, 1, 0 ) = 1
```

Breck Carter can be reached by phone at (416) 763-5200 or via email at bcarter@bcarter.com.