Foxhound is the better* Database Monitor for SQL Anywhere.
*better: More thorough, more relevant, more effective.
...more Alerts, more All Clears, more details, more control in your hands.


Breck Carter
Last modified: June 7, 1996
mail to: bcarter@bcarter.com



Group By Peril

Yesterday's Pop Quiz Watcom 'Rithmetic contained a huge error, completely separate from the relatively minor problem with integer - versus - real arithmetic. Did you spot it?

If not, here it is again, repeated as today's Pop Quiz: What's really wrong with this select?

   select product.name,
          sum ( sales_order_items.quantity
                / 12 ) / 365
     from product,
          sales_order_items
    where sales_order_items.ship_date >= '1994 01 01'
      and sales_order_items.ship_date <= '1994 12 31'
    group by product.name
    order by product.name;

Here's a clue: Have a look at Too Many Rows?

Yes, indeed... if it wasn't for the group by clause the cartesian product would be obvious. Every single quantity is being summarized for every product; i.e., all the sales are being paired with all the products. The where clause is missing a line:

   and product.id = sales_order_items.prod_id

Before the sum() function and group by clause summarized everything, the result set should have contained 411 product/quantity pairs. But without the join on product.id it actually contained 4110 rows.

If it wasn't for this explosion of data the arithmetic problem might have been more obvious. This select...

   select product.name,
          sum ( sales_order_items.quantity
                / 12 ) / 365
     from product,
          sales_order_items
    where sales_order_items.ship_date >= '1994 01 01'
      and sales_order_items.ship_date <= '1994 12 31'
      and product.id = sales_order_items.prod_id
    group by product.name
    order by product.name;

...returns all zeros for the daily average sales figures:

   Baseball Cap  0
   Shorts        0
   Sweatshirt    0
   Tee Shirt     0
   Visor         0

Here's another question: If there are 5 products, and 411 order quantities, why did the cartesian product contain 4110 rows? Shouldn't it contain only 2055 rows, or 411 times 5?

Have another look at the select: The group by is on product.name whereas the missing join is on product.id. The name column isn't unique. There are actually 10 different id values and only 5 different names: different styles of product have different ids but the same name.

For the record, here's the final solution:

   select product.name,
          sum ( cast ( sales_order_items.quantity as real )
                / 12 ) / 365
     from product,
          sales_order_items
    where sales_order_items.ship_date >= '1994 01 01'
      and sales_order_items.ship_date <= '1994 12 31'
      and product.id = sales_order_items.prod_id
    group by product.name
    order by product.name;

Here's the final result set, with neither an arithmetic problem nor a cartesian product:

   Baseball Cap  0.556393
   Shorts        0.419178
   Sweatshirt    0.394521
   Tee Shirt     0.646575
   Visor         0.430137

And the moral of this story? If you're going to use a group by clause to summarize data make sure the base result set contains the right rows. Test the select without the summarization first. A group by clause can hide a huge mistake, even a cartesian product!


See also: Watcom 'Rithmetic


Breck Carter can be reached by phone at (416) 763-5200 or via email at bcarter@bcarter.com.