Recursive WITH, part II: Hierarchical queries

Natalka Roshak's picture
articles: 

In my last post, I looked at using recursive WITH to implement simple recursive algorithms in SQL. One very common use of recursion is to traverse hierarchical data. I recently wrote a series of posts on hierarchical data, using Oracle’s CONNECT BY syntax and a fun example. In this post, I’ll be revisiting the same data using recursive WITH.

There are dozens of examples of hierarchical data, from the EMP table to the Windows Registry to binary trees, but I went with something more fun: the skeleton from the old song “Dem Dry Bones”.

Quote:
Toe bone connected to the foot bone
Foot bone connected to the heel bone
Heel bone connected to the ankle bone
Ankle bone connected to the shin bone
Shin bone connected to the knee bone
Knee bone connected to the thigh bone
Thigh bone connected to the hip bone
Hip bone connected to the back bone
Back bone connected to the shoulder bone
Shoulder bone connected to the neck bone
Neck bone connected to the head bone

Since every bone has only one ancestor, and there is a root bone with no ancestor, this is hierarchical data and we can stick it in a table and query it.

SELECT * FROM skeleton;
BONE                                     CONNECTED_TO_THE
---------------------------------------- ----------------------------------------
shoulder                                 neck
back                                     shoulder
hip                                      back
thigh                                    hip
knee                                     thigh
leg                                      knee
foot                                     heel
head
neck                                     head
toe                                      foot
arm                                      shoulder
wrist                                    arm
ankle                                    leg
heel                                     ankle
finger                                   wrist
a rib                                    back
b rib                                    back
c rib                                    back

You can see that I added some ribs and an arm to make the skeleton more complete!

Using Oracle’s CONNECT BY syntax:

SQL> col bone FOR a10
SQL> col connected_to_the FOR a9
SQL> col level FOR 99
SQL> col bone_tree FOR a27
SQL> col path FOR a65
 
SELECT bone, connected_to_the, level, 
lpad(' ',2*level, ' ') || bone AS bone_tree , 
ltrim(sys_connect_by_path(bone,'>'),'>') AS path
FROM skeleton
START WITH connected_to_the IS NULL
CONNECT BY prior bone=connected_to_the 
ORDER siblings BY 1

BONE       CONNECTED LEVEL BONE_TREE                   PATH
---------- --------- ----- --------------------------- -----------------------------------------------------------------
head                     1   head                      head
neck       head          2     neck                    head>neck
shoulder   neck          3       shoulder              head>neck>shoulder
arm        shoulder      4         arm                 head>neck>shoulder>arm
wrist      arm           5           wrist             head>neck>shoulder>arm>wrist
finger     wrist         6             finger          head>neck>shoulder>arm>wrist>finger
back       shoulder      4         back                head>neck>shoulder>back
a rib      back          5           a rib             head>neck>shoulder>back>a rib
b rib      back          5           b rib             head>neck>shoulder>back>b rib
c rib      back          5           c rib             head>neck>shoulder>back>c rib
hip        back          5           hip               head>neck>shoulder>back>hip
thigh      hip           6             thigh           head>neck>shoulder>back>hip>thigh
knee       thigh         7               knee          head>neck>shoulder>back>hip>thigh>knee
leg        knee          8                 leg         head>neck>shoulder>back>hip>thigh>knee>leg
ankle      leg           9                   ankle     head>neck>shoulder>back>hip>thigh>knee>leg>ankle
heel       ankle        10                     heel    head>neck>shoulder>back>hip>thigh>knee>leg>ankle>heel
foot       heel         11                       foot  head>neck>shoulder>back>hip>thigh>knee>leg>ankle>heel>foot
toe        foot         12                         toe head>neck>shoulder>back>hip>thigh>knee>leg>ankle>heel>foot>toe

The above CONNECT BY query uses the LEVEL pseudocolumn and the SYS_CONNECT_BY_PATH function. With recursive WITH, there’s no need for these built-ins because these values fall naturally out of the recursion.

Let’s start with the basic hierarchical query rewritten in recursive WITH.
The hierarchical relationship in our table is:
Parent(row.bone) = row.connected_to_the

WITH skellarchy (bone, parent) AS
 ( SELECT bone, connected_to_the FROM skeleton 
   WHERE bone = 'head'                         -- Start with the root
 UNION ALL
   SELECT s.bone, s.connected_to_the 
   FROM skeleton s, skellarchy r
   WHERE r.bone = s.connected_to_the           -- Parent(row.bone) = row.connected_to_the
 )
SELECT * FROM skellarchy;

BONE       PARENT
---------- ----------------------------------------
head
neck       head
shoulder   neck
back       shoulder
arm        shoulder
hip        back
wrist      arm
a rib      back
b rib      back
c rib      back
thigh      hip
finger     wrist
knee       thigh
leg        knee
ankle      leg
heel       ankle
foot       heel
toe        foot

Because we built up the SKELLARCHY table recursively, it’s easy to make an equivalent to the LEVEL pseudocolumn; it falls right out of the recursion:

WITH skellarchy (bone, parent, the_level) AS
 ( SELECT bone, connected_to_the, 0 FROM skeleton 
   WHERE bone = 'head'                         
 UNION ALL
   SELECT s.bone, s.connected_to_the , r.the_level + 1
   FROM skeleton s, skellarchy r
   WHERE r.bone = s.connected_to_the           
 )
SELECT * FROM skellarchy;

BONE       PARENT      THE_LEVEL
---------- ---------- ----------
head                           0
neck       head                1
shoulder   neck                2
back       shoulder            3
arm        shoulder            3
hip        back                4
wrist      arm                 4
a rib      back                4
b rib      back                4
c rib      back                4
thigh      hip                 5
finger     wrist               5
knee       thigh               6
leg        knee                7
ankle      leg                 8
heel       ankle               9
foot       heel               10
toe        foot               11

and it’s also easy to build up a path from root to the current node like the “SYS_CONNECT_BY_PATH” function does for CONNECT BY queries:

WITH skellarchy (bone, parent, the_level, the_path) AS
 ( SELECT bone, connected_to_the, 0, CAST(bone AS varchar2(4000)) FROM skeleton 
   WHERE bone = 'head'                         
 UNION ALL
   SELECT s.bone, s.connected_to_the , r.the_level + 1, r.the_path || '->' || s.bone
   FROM skeleton s, skellarchy r
   WHERE r.bone = s.connected_to_the           
 )
SELECT * FROM skellarchy;

BONE       PARENT     THE_LEVEL THE_PATH
---------- ---------- --------- --------------------------------------------------------------------------------
head                          0 head
neck       head               1 head->neck
shoulder   neck               2 head->neck->shoulder
back       shoulder           3 head->neck->shoulder->back
arm        shoulder           3 head->neck->shoulder->arm
hip        back               4 head->neck->shoulder->back->hip
wrist      arm                4 head->neck->shoulder->arm->wrist
a rib      back               4 head->neck->shoulder->back->a rib
b rib      back               4 head->neck->shoulder->back->b rib
c rib      back               4 head->neck->shoulder->back->c rib
thigh      hip                5 head->neck->shoulder->back->hip->thigh
finger     wrist              5 head->neck->shoulder->arm->wrist->finger
knee       thigh              6 head->neck->shoulder->back->hip->thigh->knee
leg        knee               7 head->neck->shoulder->back->hip->thigh->knee->leg
ankle      leg                8 head->neck->shoulder->back->hip->thigh->knee->leg->ankle
heel       ankle              9 head->neck->shoulder->back->hip->thigh->knee->leg->ankle->heel
foot       heel              10 head->neck->shoulder->back->hip->thigh->knee->leg->ankle->heel->foot
toe        foot              11 head->neck->shoulder->back->hip->thigh->knee->leg->ankle->heel->foot->toe

and we can use our generated the_level column to make a nice display just as we used the level pseudocolumn with CONNECT BY:

WITH skellarchy (bone, parent, the_level) AS
 ( SELECT bone, connected_to_the, 0  FROM skeleton 
   WHERE bone = 'head'                         
 UNION ALL
   SELECT s.bone, s.connected_to_the , r.the_level + 1
   FROM skeleton s, skellarchy r
   WHERE r.bone = s.connected_to_the           
 )
SELECT lpad(' ',2*the_level, ' ') || bone AS bone_tree FROM skellarchy;

BONE_TREE
---------------------------
head
  neck
    shoulder
      back
      arm
        hip
        wrist
        a rib
        b rib
        c rib
          thigh
          finger
            knee
              leg
                ankle
                  heel
                    foot
                      toe

Now, the bones are coming out in a bit of a funny order for a skeleton. Instead of this:

    shoulder
      back
      arm
        hip
        wrist
        a rib
        b rib
        c rib
          thigh
          finger

I want to see this:

    shoulder
      arm
        wrist
          finger
      back
        a rib
        b rib
        c rib
        hip
          thigh

The rows are coming out in BREADTH FIRST ordering – meaning all siblings of ‘shoulder’ are printed before any children of ‘shoulder’. But I want to see them in DEPTH FIRST: going from shoulder to finger before we start on the backbone.

WITH skellarchy (bone, parent, the_level) AS
 ( SELECT bone, connected_to_the, 0  FROM skeleton 
   WHERE bone = 'head'                         
 UNION ALL
   SELECT s.bone, s.connected_to_the , r.the_level + 1
   FROM skeleton s, skellarchy r
   WHERE r.bone = s.connected_to_the           
 )
SEARCH DEPTH FIRST BY bone SET bone_order
SELECT lpad(' ',2*the_level, ' ') || bone AS bone_tree FROM skellarchy
ORDER BY bone_order;

BONE_TREE
---------------------------
head
  neck
    shoulder
      arm
        wrist
          finger
      back
        a rib
        b rib
        c rib
        hip
          thigh
            knee
              leg
                ankle
                  heel
                    foot
                      toe

And now the result looks more like a proper skeleton.

Now on to cycles. A cycle is a loop in the hierarchical data: a row is its own ancestor. To put a cycle in the example data, I made the skeleton bend over and connect the head to the toe:

UPDATE skeleton SET connected_to_the='toe' WHERE bone='head';

And now if we try to run the query:

ERROR at line 2:
ORA-32044: cycle detected while executing recursive WITH query

With the CONNECT BY syntax, we can use CONNECT BY NOCYCLE to run a query even when cycles exist, and the pseudocolumn CONNECT_BY_IS_CYCLE to help detect cycles. For recursive WITH, Oracle provides a CYCLE clause, which is a bit more powerful as it allows us to name the column which is cycling.

WITH skellarchy (bone, parent, the_level) AS
 ( SELECT bone, connected_to_the, 0  FROM skeleton 
   WHERE bone = 'head'                         
 UNION ALL
   SELECT s.bone, s.connected_to_the , r.the_level + 1
   FROM skeleton s, skellarchy r
   WHERE r.bone = s.connected_to_the           
 )
SEARCH DEPTH FIRST BY bone SET bone_order
CYCLE bone SET is_a_cycle TO 'Y' DEFAULT 'N'
SELECT lpad(' ',2*the_level, ' ') || bone AS bone_tree, is_a_cycle FROM skellarchy
--where is_a_cycle='N'
ORDER BY bone_order;

BONE_TREE                                                    I
------------------------------------------------------------ -
head                                                         N
  neck                                                       N
    shoulder                                                 N
      arm                                                    N
        wrist                                                N
          finger                                             N
      back                                                   N
        a rib                                                N
        b rib                                                N
        c rib                                                N
        hip                                                  N
          thigh                                              N
            knee                                             N
              leg                                            N
                ankle                                        N
                  heel                                       N
                    foot                                     N
                      toe                                    N
                        head                                 Y

The query runs until the first cycle is detected, then stops.

The CONNECT BY syntax does provide a nice pseudocolumn, CONNECT_BY_ISLEAF, which is 1 when a row has no further children, 0 otherwise. In my next post, I’ll look at emulating this pseudocolumn with recursive WITH.


Republished with permission. Original URL: http://rdbms-insight.com/wp/?p=103