Breck Carter
Last modified: May 27, 1996
mail to: bcarter@bcarter.com
Here's an example using the Powersoft Demo Database. The original select returns about 1000 rows as expected:
select sales_order.id,
sales_order_items.prod_id,
product.unit_price
from sales_order,
sales_order_items,
product
where sales_order.id = sales_order_items.id
and sales_order_items.prod_id = product.id
order by sales_order.id;
Here's the modified version; why does it return over 10,000 rows?
select sales_order.id,
sales_order_items.prod_id
from sales_order,
sales_order_items,
product
where sales_order.id = sales_order_items.id
order by sales_order.id;
The answer lies in the from clause. All references to columns in the
product table have been removed but not the reference to the table
itself. Instead of the intended two-table join (sales_order and
sales_order_items) it is still a three-table join.
Because there is nothing in the where clause to say otherwise, each and every row in the product table is joined with each and every row in the expected result set (the join of sales_order and sales_order_items). There are 10 products in the Powersoft Demo Database so the final result set is ten times larger than expected.
The join of "all rows in one table with all rows in another" is called a cartesian product in RelationalSpeak. It's the number one reason why a select returns more rows than expected, and it's the first thing you should think of when the result set is suddenly overwhelming.