Breck Carter
Last modified: June 7, 1996
mail to: bcarter@bcarter.com
Yesterday's Pop Quiz Watcom 'Rithmetic contained a huge error, completely separate from the relatively minor problem with integer - versus - real arithmetic. Did you spot it?
If not, here it is again, repeated as today's Pop Quiz: What's really wrong with this select?
select product.name,
sum ( sales_order_items.quantity
/ 12 ) / 365
from product,
sales_order_items
where sales_order_items.ship_date >= '1994 01 01'
and sales_order_items.ship_date <= '1994 12 31'
group by product.name
order by product.name;
Here's a clue: Have a look at Too Many Rows?
Yes, indeed... if it wasn't for the group by clause the cartesian product would be obvious. Every single quantity is being summarized for every product; i.e., all the sales are being paired with all the products. The where clause is missing a line:
and product.id = sales_order_items.prod_id
Before the sum() function and group by clause summarized everything, the result set should have contained 411 product/quantity pairs. But without the join on product.id it actually contained 4110 rows.
If it wasn't for this explosion of data the arithmetic problem might have been more obvious. This select...
select product.name,
sum ( sales_order_items.quantity
/ 12 ) / 365
from product,
sales_order_items
where sales_order_items.ship_date >= '1994 01 01'
and sales_order_items.ship_date <= '1994 12 31'
and product.id = sales_order_items.prod_id
group by product.name
order by product.name;
...returns all zeros for the daily average sales figures:
Baseball Cap 0 Shorts 0 Sweatshirt 0 Tee Shirt 0 Visor 0
Here's another question: If there are 5 products, and 411 order quantities, why did the cartesian product contain 4110 rows? Shouldn't it contain only 2055 rows, or 411 times 5?
Have another look at the select: The group by is on product.name whereas the missing join is on product.id. The name column isn't unique. There are actually 10 different id values and only 5 different names: different styles of product have different ids but the same name.
For the record, here's the final solution:
select product.name,
sum ( cast ( sales_order_items.quantity as real )
/ 12 ) / 365
from product,
sales_order_items
where sales_order_items.ship_date >= '1994 01 01'
and sales_order_items.ship_date <= '1994 12 31'
and product.id = sales_order_items.prod_id
group by product.name
order by product.name;
Here's the final result set, with neither an arithmetic problem nor a cartesian product:
Baseball Cap 0.556393 Shorts 0.419178 Sweatshirt 0.394521 Tee Shirt 0.646575 Visor 0.430137
And the moral of this story? If you're going to use a group by clause to summarize data make sure the base result set contains the right rows. Test the select without the summarization first. A group by clause can hide a huge mistake, even a cartesian product!