MacLochlainns Weblog

Michael McLaughlin's Technical Blog

Site Admin

Hierarchical query basics

with 10 comments

You can use hierarchical queries to step through tree-like data stored in self-referencing tables, like this one for organizations and organization structures. The ORG_STRUCTURE table contains two foreign keys (ORG_PARENT_ID and ORG_CHILD_ID). They both reference the ORGANIZATION_ID column. The top-most, or root node, contains a zero as an ORG_PARENT_ID because organizations start numbering at 1. You could also define the root node ORG_PARENT_ID as a null value. Bottom nodes, or leaf nodes, are those organizations that appear as ORG_CHILD_ID but not as ORG_PARENT_ID.

Flexible Organization Hierarchy

Flexible Organization Hierarchy

You’ll find the seeding scripts at the end of this web page. If you want to perform a hierarchical query in T-SQL, go to this blog page.

This page covers four hierarchical query techniques. It shows you how to navigate a complete tree from the top down, and order by siblings. It shows you how to navigate from a leaf, or any intermediary, node up a tree to the root node, and how to limit the depth of traversal. It also shows you how to find all leaf nodes, and use the result to navigate several trees concurrently. Lastly, it shows you how to use NOCYCLE to identify rows that cause an ORA-01436 error, which means the tree linking relationship is broken.

A quick caveat, the START WITH clause is technically optional. If you exclude it all nodes are root nodes and the default depth is only two. So, really it’s not optional for meaningful work.

Down the Tree

This query allows you to navigate down the tree from the root node. You navigate down the tree when the PRIOR keyword precedes the child or dependent node. The SQL*Plus formatting commands generate the output shown.

COL org_id FORMAT A20
COL org_name FORMAT A20
 
SELECT   LPAD(' ', 2*(LEVEL - 1)) || os.org_child_id AS org_id
,        o.organization_name org_name
FROM     organization o
,        org_structure os
WHERE    o.organization_id = os.org_child_id
START
WITH     os.org_parent_id = 0
CONNECT
BY PRIOR os.org_child_id = os.org_parent_id;

It produces the following output:

ORG_ID               ORG_NAME
-------------------- --------------------
1                    One
  2                  Two
    5                Five
      11             Eleven
      12             Twelve
    6                Six
      13             Thirteen
      14             Fourteen
        20           Twenty
  3                  Three
    7                Seven
      15             Fifteen
    8                Eight
      16             Sixteen
      17             Seventeen
  4                  Four
    9                Nine
      18             Eighteen
      19             Nineteen
    10               Ten

If there were an offending row in the table that didn’t have a connecting parent and child set of foreign keys, you’d raise an ORA-01436 error. It means you can’t CONNECT BY PRIOR because a row’s values are non-conforming. You can enter a row to break the hierarchy provided the ORG_PARENT_ID and ORG_CHILD_ID have the same value. Then, this modified query will identify the offending row for you:

SELECT        LPAD(' ', 2*(LEVEL - 1)) || os.org_child_id AS org_id
,             o.organization_name org_name
FROM          organization o
,             org_structure os
WHERE         o.organization_id = os.org_child_id
START WITH    os.org_parent_id = 0
CONNECT BY
NOCYCLE PRIOR os.org_child_id = os.org_parent_id;

The next query changes the output because it orders by siblings. This means that the numeric ordering of the parent and child nodes is overridden.

SELECT   LPAD(' ', 2*(LEVEL - 1)) || os.org_child_id AS org_id
,        o.organization_name org_name
FROM     organization o
,        org_structure os
WHERE    o.organization_id = os.org_child_id
START
WITH     os.org_parent_id = 0
CONNECT
BY PRIOR os.org_child_id = os.org_parent_id
ORDER
SIBLINGS
BY       o.organization_name;

The sort now represents a tree in alphabetical ordering:

ORG_ID               ORG_NAME
-------------------- --------------------
1                    One
  4                  Four
    9                Nine
      18             Eighteen
      19             Nineteen
    10               Ten
  3                  Three
    8                Eight
      17             Seventeen
      16             Sixteen
    7                Seven
      15             Fifteen
  2                  Two
    5                Five
      11             Eleven
      12             Twelve
    6                Six
      14             Fourteen
        20           Twenty
      13             Thirteen

Up the Tree

The query switches the position of the PRIOR keyword. It now precedes the parent node. This means that it will go up the tree. The START WITH in this case starts with an intermediary node.

SELECT  LPAD(' ', 2*(LEVEL - 1)) || os.org_child_id AS org_id
,       o.organization_name org_name
FROM    organization o
,       org_structure os
WHERE   o.organization_id = os.org_child_id
START
WITH    os.org_child_id = 6
CONNECT
BY      os.org_child_id = PRIOR os.org_parent_id;

The output is:

ORG_ID               ORG_NAME
-------------------- --------------------
6                    Six
  2                  Two
    1                One

Restricting the Depth of Search

This is another up the tree hierarchical query. It starts from the bottom most leaf node, which is at depth five from the root node. The query restricts upward navigation to three levels or the two immediate parents (or if you prefer parent and grandparent).

SELECT  LPAD(' ', 2*(LEVEL - 1)) || os.org_child_id AS org_id
,       o.organization_name org_name
FROM    organization o
,       org_structure os
WHERE   o.organization_id = os.org_child_id
AND     LEVEL <= 3
START
WITH    os.org_child_id = 20
CONNECT
BY      os.org_child_id = PRIOR os.org_parent_id;

The output is:

ORG_ID               ORG_NAME
-------------------- --------------------
20                   Twenty
  14                 Fourteen
    6                Six

Leaf Node Up

Traversing from a leaf node can start by inspecting which are leaf nodes first, and hard coding a value in the START WITH clause. A better approach is to use a subquery to identify leaf nodes and then a filter in the outer WHERE clause to limit the leaf nodes. The leaf nodes are plural in both of these solutions.

This solution works prior to Oracle 10g, and continues to work through Oracle 11g:

SELECT  LPAD(' ', 2*(LEVEL - 1)) || os.org_child_id AS org_id
,       o.organization_name org_name
FROM    organization o
,       org_structure os
WHERE   o.organization_id = os.org_child_id
START
WITH    os.org_child_id IN (SELECT os1.org_child_id
FROM    org_structure os1 LEFT JOIN org_structure os2
ON      os2.org_parent_id = os1.org_child_id
MINUS
SELECT  os1.org_child_id
FROM    org_structure os1 JOIN org_structure os2
ON      os2.org_parent_id = os1.org_child_id)
CONNECT
BY      os.org_child_id = PRIOR os.org_parent_id;

This solution uses the CONNECT_BY_LEAF pseudocolumn (introduced in Oracle 10g) to replace the outer join minus the join in the subquery. I tested it in Oracle 10gR2 and Oracle 11gR1 (The terminal release is near, isn’t it?). It also uses the ORDER SIBLINGS BY clause to order the tree by the alphabetical ORGANIZATION_NAME column values.

SELECT  LPAD(' ', 2*(LEVEL - 1)) || os.org_child_id AS org_id
,       o.organization_name org_name
FROM    organization o
,       org_structure os
WHERE   o.organization_id = os.org_child_id
START
WITH    os.org_child_id IN (SELECT   org_child_id
                            FROM     org_structure
                            WHERE    CONNECT_BY_ISLEAF  = 1
                            START
                            WITH     org_child_id = 1
                            CONNECT
                            BY PRIOR org_child_id = org_parent_id)
CONNECT
BY      os.org_child_id = PRIOR os.org_parent_id
ORDER
SIBLINGS
BY      o.organization_name;

The snapshot of output for the latter is:

ORG_ID               ORG_NAME
-------------------- --------------------
18                   Eighteen
  9                  Nine
    4                Four
      1              One
11                   Eleven
  5                  Five
    2                Two
      1              One
15                   Fifteen
  7                  Seven
    3                Three
      1              One
19                   Nineteen
  9                  Nine
    4                Four
      1              One
17                   Seventeen
  8                  Eight
    3                Three
      1              One
16                   Sixteen
  8                  Eight
    3                Three
      1              One
10                   Ten
  4                  Four
    1                One
13                   Thirteen
  6                  Six
    2                Two
      1              One
12                   Twelve
  5                  Five
    2                Two
      1              One
20                   Twenty
  14                 Fourteen
    6                Six
      2              Two
        1            One

I did leave out the CONNECT_BY_ROOT and SYS_CONNECT_BY_PATH. If you’ve got the urge, post me a comment with an example. Otherwise, I’ll try to update this page later.

The setup code for the test sample:

BEGIN
  FOR i IN (SELECT NULL
            FROM user_tables
            WHERE table_name = 'ORGANIZATION') LOOP
    EXECUTE IMMEDIATE 'DROP TABLE organization CASCADE CONSTRAINTS';
  END LOOP;
  FOR i IN (SELECT NULL
            FROM user_tables
            WHERE table_name = 'ORG_STRUCTURE') LOOP
    EXECUTE IMMEDIATE 'DROP TABLE org_structure CASCADE CONSTRAINTS';
  END LOOP;
  FOR i IN (SELECT NULL
            FROM user_sequences
            WHERE sequence_name = 'ORGANIZATION_S1') LOOP
    EXECUTE IMMEDIATE 'DROP SEQUENCE organization_s1';
  END LOOP;
  FOR i IN (SELECT NULL
            FROM user_sequences
            WHERE sequence_name = 'ORG_STRUCTURE_S1') LOOP
    EXECUTE IMMEDIATE 'DROP SEQUENCE org_structure_s1';
  END LOOP;
END;
/
 
CREATE TABLE ORGANIZATION
( organization_id NUMBER
, organization_name VARCHAR2(10));
 
CREATE SEQUENCE ORGANIZATION_S1;
 
DECLARE
  TYPE list IS TABLE OF VARCHAR2(10);
  ordinal LIST := list ('One','Two','Three','Four'
                       ,'Five','Six','Seven','Eight'
                       ,'Nine','Ten','Eleven','Twelve'
                       ,'Thirteen','Fourteen','Fifteen','Sixteen'
                       ,'Seventeen','Eighteen','Nineteen','Twenty');
  BEGIN
    FOR i IN 1..ordinal.COUNT LOOP
      INSERT INTO ORGANIZATION VALUES (organization_s1.NEXTVAL,ordinal(i));
  END LOOP;
  COMMIT;
END;
/
 
CREATE TABLE org_structure
( org_structure_id NUMBER
, org_parent_id NUMBER
, org_child_id NUMBER );
 
CREATE SEQUENCE org_structure_s1;
 
INSERT INTO org_structure VALUES (1,0,1);
INSERT INTO org_structure VALUES (1,1,2);
INSERT INTO org_structure VALUES (1,1,3);
INSERT INTO org_structure VALUES (1,1,4);
INSERT INTO org_structure VALUES (1,2,5);
INSERT INTO org_structure VALUES (1,2,6);
INSERT INTO org_structure VALUES (1,3,7);
INSERT INTO org_structure VALUES (1,3,8);
INSERT INTO org_structure VALUES (1,4,9);
INSERT INTO org_structure VALUES (1,4,10);
INSERT INTO org_structure VALUES (1,5,11);
INSERT INTO org_structure VALUES (1,5,12);
INSERT INTO org_structure VALUES (1,6,13);
INSERT INTO org_structure VALUES (1,6,14);
INSERT INTO org_structure VALUES (1,7,15);
INSERT INTO org_structure VALUES (1,8,16);
INSERT INTO org_structure VALUES (1,8,17);
INSERT INTO org_structure VALUES (1,9,18);
INSERT INTO org_structure VALUES (1,9,19);
INSERT INTO org_structure VALUES (1,14,20);
 
COMMIT;

Written by maclochlainn

July 16th, 2008 at 2:36 am

Posted in Uncategorized

10 Responses to 'Hierarchical query basics'

Subscribe to comments with RSS or TrackBack to 'Hierarchical query basics'.

  1. What would cause the START WITH and CONNECT BY PRIOR to break when doing joins?

    Justin

    8 Nov 12 at 2:43 pm

  2. Hy am having problems getting an sql query that can give me all downlines of my SQL TABLES.Its SQL 2000
    With a table ACCHART
    WITH i.d (for child)
    and i.d2 (for parent)

    Mike

    16 Jul 13 at 9:34 am

  3. Mike, As you can tell from the post, there’s no CONNECT BY PRIOR syntax in Microsoft SQL Server. It’s recommended if you need to go more than couple levels deep to write a T-SQL function and return the result set as a CRL table function. I’ve a simply example of how to write a CLR table function you can reference.

    maclochlainn

    16 Jul 13 at 6:31 pm

  4. Hello Maclochlainn,

    I need to traverse a hierarchy but would need to exclude leaf level and leaf + 1 levels.

    For eg:

                  1                                   2
            +-------------+                       +-----------+
            |             |                       |           |      
            3             5                       4           6
        +---------+    +-----------+           +-----+    +------+
        |         |    |           |           |     |    |      |
        7         9    11          13          8     10   12     14
    +-----+   +-----+  +--+    +-------+                       +-----+ 
    |     |   |     |     |    |       |                       |     |
    15    17  19    21    23   27      29                     16     18
                                                                     +---+
                                                                         |
                                                                         20

    The query should return 1, 3, 5, 2, 2, 6, and 14 as these are the nodes who have 2 or more levels down the chain.

    Please guide.

    Quick Script

    CREATE TABLE test_temp(id NUMBER(10), parent_id NUMBER(10));
     
    INSERT INTO test_temp(id, parent_id) VALUES(1, NULL);
    INSERT INTO test_temp(id, parent_id) VALUES(3, 1);
    INSERT INTO test_temp(id, parent_id) VALUES(5, 1);
    INSERT INTO test_temp(id, parent_id) VALUES(7, 3);
    INSERT INTO test_temp(id, parent_id) VALUES(9, 3);
    INSERT INTO test_temp(id, parent_id) VALUES(11, 5);
    INSERT INTO test_temp(id, parent_id) VALUES(13, 5);
    INSERT INTO test_temp(id, parent_id) VALUES(15, 7);
    INSERT INTO test_temp(id, parent_id) VALUES(17, 7);
    INSERT INTO test_temp(id, parent_id) VALUES(19, 9);
    INSERT INTO test_temp(id, parent_id) VALUES(21, 9);
    INSERT INTO test_temp(id, parent_id) VALUES(23, 11);
    INSERT INTO test_temp(id, parent_id) VALUES(27, 13);
    INSERT INTO test_temp(id, parent_id) VALUES(29, 13);
     
    INSERT INTO test_temp(id, parent_id) VALUES(2, NULL);
    INSERT INTO test_temp(id, parent_id) VALUES(4, 2);
    INSERT INTO test_temp(id, parent_id) VALUES(6, 2);
    INSERT INTO test_temp(id, parent_id) VALUES(8, 4);
    INSERT INTO test_temp(id, parent_id) VALUES(10, 4);
    INSERT INTO test_temp(id, parent_id) VALUES(12, 6);
    INSERT INTO test_temp(id, parent_id) VALUES(14, 6);
    INSERT INTO test_temp(id, parent_id) VALUES(16, 14);
    INSERT INTO test_temp(id, parent_id) VALUES(18, 14);
    INSERT INTO test_temp(id, parent_id) VALUES(20, 18);

    Thanks,
    Rakesh

    Rakesh

    29 Apr 15 at 11:39 am

  5. Rakesh,

    From the top down only three level, right? If so, here’s the solution (having renamed your ID column CHILD_ID):

    SELECT   LPAD(' ', 2*(LEVEL - 1)) || tt.child_id AS child_id
    FROM     test_temp tt
    WHERE    LEVEL <= 3
    START
    WITH     tt.parent_id IS NULL
    CONNECT
    BY PRIOR tt.child_id = tt.parent_id;

    It returns:

    CHILD_ID
    --------
    1
      3
        7
        9
      5
        11
        13
    2
      4
        8
        10
      6
        12
        14
     
    14 rows selected.

    Hope that helps.

    maclochlainn

    30 Apr 15 at 1:49 pm

  6. Rakesh,

    I’ve a guess that you’re actually looking for a dynamic solution. The first one shows you how to navigate to the level two above the deepest depth (or level).

    SELECT   LPAD(' ', 2*(LEVEL - 1)) || tt.child_id AS child_id
    FROM     test_temp tt
    WHERE    LEVEL <= (SELECT   MAX(LEVEL) - 2
                       FROM     test_temp tt
                       START
                       WITH     tt.parent_id IS NULL
                       CONNECT
                       BY PRIOR tt.child_id = tt.parent_id)
    START
    WITH     tt.parent_id IS NULL
    CONNECT
    BY PRIOR tt.child_id = tt.parent_id;

    However, I think your intent is to capture the only the level where the first branch reaches a leaf node. If that’s the case, here’s what you want:

    SELECT   LPAD(' ', 2*(LEVEL - 1)) || tt.child_id AS child_id
    FROM     test_temp tt
    WHERE    LEVEL <= (SELECT   MIN(LEVEL)
                       FROM     test_temp tt
                       WHERE    CONNECT_BY_ISLEAF  = 1
                       START
                       WITH     tt.parent_id IS NULL 
                       CONNECT
                       BY PRIOR tt.child_id = tt.parent_id)
    START
    WITH     tt.parent_id IS NULL
    CONNECT
    BY PRIOR tt.child_id = tt.parent_id;

    Hope this helps.

    maclochlainn

    30 Apr 15 at 4:01 pm

  7. […] dynamic level limiting hierarchical query […]

    Leaf node queries

    30 Apr 15 at 4:31 pm

  8. Hello

    I have a query regarding “Leaf Node Up”

    There is a scenario where 21 is child of 1 and 9 is child of 21

    That means 9 has 2 parents 4 and 21

    The inner query is showing correct data
    It shows 18 and 19 twice as leaf

    Now the query is showing

    1 -> 21 -> 9

    instead of

    1 -> 21 -> 9 -> 18
    1 -> 21 -> 9 -> 19

    Thanks to please help in the query that shows correct data with multiple parents

    Thanks
    Regards
    Ritesh Kumar

    ritesh kumar

    1 Sep 16 at 1:54 am

  9. Rites, I’ll take a look when and see what the problem is but could you provide the exact query you used?

    maclochlainn

    16 Oct 16 at 8:59 pm

  10. […] As Kris Köhntopp commented, computer science defines a programming language as Turing Complete. As his comment qualifies and the Wikipedia page explains: “Turing completeness in declarative SQL is implemented through recursive common table expressions. Unsurprisingly, procedural extensions to SQL (PLSQL, etc.) are also Turing-complete.” While PostgreSQL introduces recursive query syntax through CTEs, it added the search and cycle feature in PostgreSQL 14. The recursive query feature has existed in the Oracle database since Oracle 8, but their documentation calls them hierarchical queries. I wrote a quick a tutorial on hierarchical queries in 2008. […]

Leave a Reply