Hierarchical query basics
You can use hierarchical queries to step through tree-like data stored in self-referencing tables, like this one for organizations and organization structures. The ORG_STRUCTURE
table contains two foreign keys (ORG_PARENT_ID
and ORG_CHILD_ID
). They both reference the ORGANIZATION_ID
column. The top-most, or root node, contains a zero as an ORG_PARENT_ID
because organizations start numbering at 1. You could also define the root node ORG_PARENT_ID
as a null value. Bottom nodes, or leaf nodes, are those organizations that appear as ORG_CHILD_ID
but not as ORG_PARENT_ID
.
You’ll find the seeding scripts at the end of this web page. If you want to perform a hierarchical query in T-SQL, go to this blog page.
This page covers four hierarchical query techniques. It shows you how to navigate a complete tree from the top down, and order by siblings. It shows you how to navigate from a leaf, or any intermediary, node up a tree to the root node, and how to limit the depth of traversal. It also shows you how to find all leaf nodes, and use the result to navigate several trees concurrently. Lastly, it shows you how to use NOCYCLE
to identify rows that cause an ORA-01436
error, which means the tree linking relationship is broken.
A quick caveat, the START WITH
clause is technically optional. If you exclude it all nodes are root nodes and the default depth is only two. So, really it’s not optional for meaningful work.
Down the Tree
This query allows you to navigate down the tree from the root node. You navigate down the tree when the PRIOR
keyword precedes the child or dependent node. The SQL*Plus formatting commands generate the output shown.
COL org_id FORMAT A20 COL org_name FORMAT A20 SELECT LPAD(' ', 2*(LEVEL - 1)) || os.org_child_id AS org_id , o.organization_name org_name FROM organization o , org_structure os WHERE o.organization_id = os.org_child_id START WITH os.org_parent_id = 0 CONNECT BY PRIOR os.org_child_id = os.org_parent_id; |
It produces the following output:
ORG_ID ORG_NAME -------------------- -------------------- 1 One 2 Two 5 Five 11 Eleven 12 Twelve 6 Six 13 Thirteen 14 Fourteen 20 Twenty 3 Three 7 Seven 15 Fifteen 8 Eight 16 Sixteen 17 Seventeen 4 Four 9 Nine 18 Eighteen 19 Nineteen 10 Ten |
If there were an offending row in the table that didn’t have a connecting parent and child set of foreign keys, you’d raise an ORA-01436
error. It means you can’t CONNECT BY PRIOR
because a row’s values are non-conforming. You can enter a row to break the hierarchy provided the ORG_PARENT_ID
and ORG_CHILD_ID
have the same value. Then, this modified query will identify the offending row for you:
SELECT LPAD(' ', 2*(LEVEL - 1)) || os.org_child_id AS org_id , o.organization_name org_name FROM organization o , org_structure os WHERE o.organization_id = os.org_child_id START WITH os.org_parent_id = 0 CONNECT BY NOCYCLE PRIOR os.org_child_id = os.org_parent_id; |
The next query changes the output because it orders by siblings. This means that the numeric ordering of the parent and child nodes is overridden.
SELECT LPAD(' ', 2*(LEVEL - 1)) || os.org_child_id AS org_id , o.organization_name org_name FROM organization o , org_structure os WHERE o.organization_id = os.org_child_id START WITH os.org_parent_id = 0 CONNECT BY PRIOR os.org_child_id = os.org_parent_id ORDER SIBLINGS BY o.organization_name; |
The sort now represents a tree in alphabetical ordering:
ORG_ID ORG_NAME -------------------- -------------------- 1 One 4 Four 9 Nine 18 Eighteen 19 Nineteen 10 Ten 3 Three 8 Eight 17 Seventeen 16 Sixteen 7 Seven 15 Fifteen 2 Two 5 Five 11 Eleven 12 Twelve 6 Six 14 Fourteen 20 Twenty 13 Thirteen |
Up the Tree
The query switches the position of the PRIOR
keyword. It now precedes the parent node. This means that it will go up the tree. The START WITH
in this case starts with an intermediary node.
SELECT LPAD(' ', 2*(LEVEL - 1)) || os.org_child_id AS org_id , o.organization_name org_name FROM organization o , org_structure os WHERE o.organization_id = os.org_child_id START WITH os.org_child_id = 6 CONNECT BY os.org_child_id = PRIOR os.org_parent_id; |
The output is:
ORG_ID ORG_NAME -------------------- -------------------- 6 Six 2 Two 1 One |
Restricting the Depth of Search
This is another up the tree hierarchical query. It starts from the bottom most leaf node, which is at depth five from the root node. The query restricts upward navigation to three levels or the two immediate parents (or if you prefer parent and grandparent).
SELECT LPAD(' ', 2*(LEVEL - 1)) || os.org_child_id AS org_id , o.organization_name org_name FROM organization o , org_structure os WHERE o.organization_id = os.org_child_id AND LEVEL <= 3 START WITH os.org_child_id = 20 CONNECT BY os.org_child_id = PRIOR os.org_parent_id; |
The output is:
ORG_ID ORG_NAME -------------------- -------------------- 20 Twenty 14 Fourteen 6 Six |
Leaf Node Up
Traversing from a leaf node can start by inspecting which are leaf nodes first, and hard coding a value in the START WITH
clause. A better approach is to use a subquery to identify leaf nodes and then a filter in the outer WHERE
clause to limit the leaf nodes. The leaf nodes are plural in both of these solutions.
This solution works prior to Oracle 10g, and continues to work through Oracle 11g:
SELECT LPAD(' ', 2*(LEVEL - 1)) || os.org_child_id AS org_id , o.organization_name org_name FROM organization o , org_structure os WHERE o.organization_id = os.org_child_id START WITH os.org_child_id IN (SELECT os1.org_child_id FROM org_structure os1 LEFT JOIN org_structure os2 ON os2.org_parent_id = os1.org_child_id MINUS SELECT os1.org_child_id FROM org_structure os1 JOIN org_structure os2 ON os2.org_parent_id = os1.org_child_id) CONNECT BY os.org_child_id = PRIOR os.org_parent_id; |
This solution uses the CONNECT_BY_LEAF
pseudocolumn (introduced in Oracle 10g) to replace the outer join minus the join in the subquery. I tested it in Oracle 10gR2 and Oracle 11gR1 (The terminal release is near, isn’t it?). It also uses the ORDER SIBLINGS BY
clause to order the tree by the alphabetical ORGANIZATION_NAME
column values.
SELECT LPAD(' ', 2*(LEVEL - 1)) || os.org_child_id AS org_id , o.organization_name org_name FROM organization o , org_structure os WHERE o.organization_id = os.org_child_id START WITH os.org_child_id IN (SELECT org_child_id FROM org_structure WHERE CONNECT_BY_ISLEAF = 1 START WITH org_child_id = 1 CONNECT BY PRIOR org_child_id = org_parent_id) CONNECT BY os.org_child_id = PRIOR os.org_parent_id ORDER SIBLINGS BY o.organization_name; |
The snapshot of output for the latter is:
ORG_ID ORG_NAME -------------------- -------------------- 18 Eighteen 9 Nine 4 Four 1 One 11 Eleven 5 Five 2 Two 1 One 15 Fifteen 7 Seven 3 Three 1 One 19 Nineteen 9 Nine 4 Four 1 One 17 Seventeen 8 Eight 3 Three 1 One 16 Sixteen 8 Eight 3 Three 1 One 10 Ten 4 Four 1 One 13 Thirteen 6 Six 2 Two 1 One 12 Twelve 5 Five 2 Two 1 One 20 Twenty 14 Fourteen 6 Six 2 Two 1 One |
I did leave out the CONNECT_BY_ROOT
and SYS_CONNECT_BY_PATH
. If you’ve got the urge, post me a comment with an example. Otherwise, I’ll try to update this page later.
The setup code for the test sample:
BEGIN FOR i IN (SELECT NULL FROM user_tables WHERE table_name = 'ORGANIZATION') LOOP EXECUTE IMMEDIATE 'DROP TABLE organization CASCADE CONSTRAINTS'; END LOOP; FOR i IN (SELECT NULL FROM user_tables WHERE table_name = 'ORG_STRUCTURE') LOOP EXECUTE IMMEDIATE 'DROP TABLE org_structure CASCADE CONSTRAINTS'; END LOOP; FOR i IN (SELECT NULL FROM user_sequences WHERE sequence_name = 'ORGANIZATION_S1') LOOP EXECUTE IMMEDIATE 'DROP SEQUENCE organization_s1'; END LOOP; FOR i IN (SELECT NULL FROM user_sequences WHERE sequence_name = 'ORG_STRUCTURE_S1') LOOP EXECUTE IMMEDIATE 'DROP SEQUENCE org_structure_s1'; END LOOP; END; / CREATE TABLE ORGANIZATION ( organization_id NUMBER , organization_name VARCHAR2(10)); CREATE SEQUENCE ORGANIZATION_S1; DECLARE TYPE list IS TABLE OF VARCHAR2(10); ordinal LIST := list ('One','Two','Three','Four' ,'Five','Six','Seven','Eight' ,'Nine','Ten','Eleven','Twelve' ,'Thirteen','Fourteen','Fifteen','Sixteen' ,'Seventeen','Eighteen','Nineteen','Twenty'); BEGIN FOR i IN 1..ordinal.COUNT LOOP INSERT INTO ORGANIZATION VALUES (organization_s1.NEXTVAL,ordinal(i)); END LOOP; COMMIT; END; / CREATE TABLE org_structure ( org_structure_id NUMBER , org_parent_id NUMBER , org_child_id NUMBER ); CREATE SEQUENCE org_structure_s1; INSERT INTO org_structure VALUES (1,0,1); INSERT INTO org_structure VALUES (1,1,2); INSERT INTO org_structure VALUES (1,1,3); INSERT INTO org_structure VALUES (1,1,4); INSERT INTO org_structure VALUES (1,2,5); INSERT INTO org_structure VALUES (1,2,6); INSERT INTO org_structure VALUES (1,3,7); INSERT INTO org_structure VALUES (1,3,8); INSERT INTO org_structure VALUES (1,4,9); INSERT INTO org_structure VALUES (1,4,10); INSERT INTO org_structure VALUES (1,5,11); INSERT INTO org_structure VALUES (1,5,12); INSERT INTO org_structure VALUES (1,6,13); INSERT INTO org_structure VALUES (1,6,14); INSERT INTO org_structure VALUES (1,7,15); INSERT INTO org_structure VALUES (1,8,16); INSERT INTO org_structure VALUES (1,8,17); INSERT INTO org_structure VALUES (1,9,18); INSERT INTO org_structure VALUES (1,9,19); INSERT INTO org_structure VALUES (1,14,20); COMMIT; |
What would cause the START WITH and CONNECT BY PRIOR to break when doing joins?
Justin
8 Nov 12 at 2:43 pm
Hy am having problems getting an sql query that can give me all downlines of my SQL TABLES.Its SQL 2000
With a table ACCHART
WITH i.d (for child)
and i.d2 (for parent)
Mike
16 Jul 13 at 9:34 am
Mike, As you can tell from the post, there’s no
CONNECT BY
PRIOR syntax in Microsoft SQL Server. It’s recommended if you need to go more than couple levels deep to write a T-SQL function and return the result set as a CRL table function. I’ve a simply example of how to write a CLR table function you can reference.maclochlainn
16 Jul 13 at 6:31 pm
Hello Maclochlainn,
I need to traverse a hierarchy but would need to exclude leaf level and leaf + 1 levels.
For eg:
The query should return 1, 3, 5, 2, 2, 6, and 14 as these are the nodes who have 2 or more levels down the chain.
Please guide.
Quick Script
Thanks,
Rakesh
Rakesh
29 Apr 15 at 11:39 am
Rakesh,
From the top down only three level, right? If so, here’s the solution (having renamed your
ID
columnCHILD_ID
):It returns:
Hope that helps.
maclochlainn
30 Apr 15 at 1:49 pm
Rakesh,
I’ve a guess that you’re actually looking for a dynamic solution. The first one shows you how to navigate to the level two above the deepest depth (or level).
However, I think your intent is to capture the only the level where the first branch reaches a leaf node. If that’s the case, here’s what you want:
Hope this helps.
maclochlainn
30 Apr 15 at 4:01 pm
[…] dynamic level limiting hierarchical query […]
Leaf node queries
30 Apr 15 at 4:31 pm
Hello
I have a query regarding “Leaf Node Up”
There is a scenario where
21
is child of1
and9
is child of21
That means
9
has2
parents4
and21
The inner query is showing correct data
It shows
18
and19
twice as leafNow the query is showing
instead of
Thanks to please help in the query that shows correct data with multiple parents
Thanks
Regards
Ritesh Kumar
ritesh kumar
1 Sep 16 at 1:54 am
Rites, I’ll take a look when and see what the problem is but could you provide the exact query you used?
maclochlainn
16 Oct 16 at 8:59 pm
[…] As Kris Köhntopp commented, computer science defines a programming language as Turing Complete. As his comment qualifies and the Wikipedia page explains: “Turing completeness in declarative SQL is implemented through recursive common table expressions. Unsurprisingly, procedural extensions to SQL (PLSQL, etc.) are also Turing-complete.” While PostgreSQL introduces recursive query syntax through CTEs, it added the search and cycle feature in PostgreSQL 14. The recursive query feature has existed in the Oracle database since Oracle 8, but their documentation calls them hierarchical queries. I wrote a quick a tutorial on hierarchical queries in 2008. […]
Is SQL a Programming Language?
16 Jun 22 at 12:19 am