Foxhound is the better* Database Monitor for SQL Anywhere.
*better: More thorough, more relevant, more effective.
...more Alerts, more All Clears, more details, more control in your hands.


Breck Carter
Last modified: May 31, 1996
mail to: bcarter@bcarter.com



Dealing With Hierarchies

A "fishhook" is a foreign key relationship where the same table acts as both parent and child. It is often used to represent hierarchical relationships between different rows in the same table. For example, each employee row may contain a column pointing to that employee's supervisor who in turn is represented by another row in the same table.

Figure 1: Hierarchical "Fishhook" Relationship

Figure 1 shows a "location" table used to represent a company's various offices. At the top of the hierarchy is the corporate head office, and beneath that there are levels for regional, district and local branch offices. The location.parent_location_id for each branch points to its district office. For each district office the parent is the regional office, and so on up the chain. At the very top the parent_location_id for the corporate office is set to null.

While these fishook relationships are easy to implement and maintain they present problems when it comes time to produce reports. For example, the simple request to "display all locations in region 2" becomes a difficult "tree walking" affair. Figure 2 shows that branch office "c" should be included but it's parent_location_id points to district 5, not region 2.

Figure 2: Sample Location Hierarchy

                       child/parent  child/parent
           1              1   null      a    4
      -----------         2    1        b    4
      2         3         3    1        c    5
    ------    ------      4    2        d    5
    4    5    6    7      5    2        e    6
   ---  ---  ---   -      6    3        f    6
   a b  c d  e f   g      7    3        g    7

If the number of levels is fixed and known, it is possible to write SQL statements involving unions of selects to traverse the various levels but it's not a pleasant task. And it becomes even more unpleasant if the hierarchy can become arbitrarily deep.

Figure 3: Denormalized Hierarchy Table

A brute force solution is to flatten the hierarchy into a denormalized table that contains one row for each location and each of its "ancestors": each parent, each grandparent and so on up to the location at the top. Figure 3 shows what such a hiearchy table might look like.

Figure 4 shows what the hierarchy table would contain for the example above. A query can now easily find all locations in district 2 via select child_id from hierarchy where parent_id = 2 and the result set will indeed contain location "c".

Figure 4: Hiearchy Table Contents

                            parent/child pairs
           1            1 1   1 a   2 4   3 7   6 e
      -----------       1 2   1 b   2 5   3 e   6 f
      2         3       1 3   1 c   2 a   3 f   7 g
    ------    ------    1 4   1 d   2 b   3 g
    4    5    6    7    1 5   1 e   2 c   4 a
   ---  ---  ---   -    1 6   1 f   2 d   4 b
   a b  c d  e f   g    1 7   1 g   3 6   5 c

But how is this table kept up to date? One method is to treat it as as a temporary table and refresh it whenever a report is to be run. If an actual temporary table cannot be used, the current user's id could be included as part of the primary key so more than one user can load and use the table without too many annoying locks. That's why the user_id column is shown in Figure 3.

Also shown in Figure 3 is the hierarchy.type column which contains the string "location" for this example. This column makes it possible to use the same table for more than one hierarchy; e.g., type could be set to "person" for the employee reporting relationship.

The last problem is "How does the hierarchy table get loaded?" Figure 5 shows a SQL Anywhere 5 stored procedure which refreshes the location rows for the current user. This procedure makes one pass through a loop for each level it finds in the hierarchy and stops when no more rows have been inserted. This loop process makes it possible to compute the level number which is included in the hierarchy table to make some queries and sorts easier.

Figure 6 shows how this procedure can be called from a PowerBuilder script.

Figure 5: Loading The Location Hierarchy

drop procedure p_load_location_hierarchy`

create procedure p_load_location_hierarchy()

// Load a "location" hierarchy for this user
// based on location.location_id and
// location.parent_location_id values.

begin

declare previous_row_count integer;
declare this_row_count integer;
declare new_level_number integer;

// Make room for a new hierarchy.

delete from hierarchy
 where user_id = user
   and type = 'location';

// Insert "root" rows as id-id pairs.

set new_level_number = 1;

insert into hierarchy
select distinct
       user,
       'location',
       location_id,
       location_id,
       new_level_number
  from location
 where parent_location_id = location_id
    or IfNull ( parent_location_id, 1, 0 ) = 1;

// Insert the first layer of explicit
// parent-child pairs. The "not exists"
// where clauses in all the inserts
// guards against loops in the parent
// child relationships.

set new_level_number = new_level_number + 1;

insert into hierarchy
select distinct
       user,
       'location',
       a.parent_location_id,
       a.location_id,
       new_level_number
  from location a,
       hierarchy b
 where ( b.child_id = a.parent_location_id )
   and ( not exists (
       select c.parent_id
         from hierarchy c
        where c.parent_id = a.parent_location_id
          and c.child_id  = a.location_id ) )
   and ( IfNull ( a.parent_location_id, 1, 0 ) = 0 );

set previous_row_count = 0;

loop_insert:
   loop

   // Insert the next layer of explicit
   // parent-child pairs.

   set new_level_number = new_level_number + 1;

   insert into hierarchy
   select distinct
          user,
          'location',
          a.parent_location_id,
          a.location_id,
          new_level_number
     from location a,
          hierarchy b
    where ( b.child_id = a.parent_location_id )
      and ( not exists (
          select c.parent_id
            from hierarchy c
           where c.parent_id = a.parent_location_id
             and c.child_id  = a.location_id ) )
      and ( IfNull ( a.parent_location_id, 1, 0 ) = 0 );

   // Link the new children to their "grandparents"
   // which are actually their parent's parents as
   // already loaded by previous passes through this
   // loop. This works by looking for existing pairs
   // of rows a and b where a.child_id is the same
   // as b.parent_id, and then building a third row
   // using a.parent_id and b.child_id to create
   // the grandparent-grandchild link. The where
   // not exists condition excludes links that
   // have already been loaded.

   insert into hierarchy
   select distinct
          user,
          'location',
          a.parent_id,
          b.child_id,
          new_level_number
     from hierarchy a,
          hierarchy b
    where ( a.child_id = b.parent_id )
      and ( not exists (
          select c.parent_id
            from hierarchy c
           where c.parent_id = a.parent_id
             and c.child_id  = b.child_id ) );

   // Stop when no more rows are inserted.

   select count(*)
     into this_row_count
     from hierarchy;

   if this_row_count = previous_row_count then
      leave loop_insert;
   end if;

   set previous_row_count = this_row_count;

end loop loop_insert;

end`

Figure 6: Calling p_load_location_hierarchy()

declare p_load procedure for p_load_location_hierarchy
  using SQLCA;

execute p_load;
if  ( SQLCA.SQLCode <> 0 ) &
and ( SQLCA.SQLCode <> 100 ) then
   MessageBox ( "Error", "Hierarchy load failed" )
   return
end if


Breck Carter can be reached by phone at (416) 763-5200 or via email at bcarter@bcarter.com.