MacLochlainns Weblog

Michael McLaughlin's Technical Blog

Site Admin

Oracle SQL Tuning Example

with one comment

Somebody wanted a post on SQL statement tuning, so here it is. SQL statement tuning requires that you understand table definitions, constraints, and indexes plus how they interact through joins before you can begin tuning. Oracle presents some of the most advanced tools for tuning SQL statements.

This example works through a demonstration of tuning one statement with multiple joins. The key elements of tuning any SQL statements is understanding the table structures and data. The key worry about tuning is that optimizing for a query can harm the effectiveness of OLTP processing by slowing the performance of INSERT and UPDATE statements. Any DML statement that adds or modifies data in tables also modifies indexes built to support queries against the table.

Oracle

You have two approaches in an Oracle database. The old way and the new way. They’re really not that different but the DBMS_XPLAN package provides a richer set of analytical data for less work.

The old way and the new way both use a single way for generating the SQL execution plan. You do that with the EXPLAIN PLAN statement.

The EXPLAIN PLAN statement is followed by a SET operator. You use the SET operator to assign a value to the STATEMENT_ID column. This is a column that uniquely identifies statements in the PLAN_TABLE. You must ensure that all STATEMENT_ID values are unique, or you’ll get incorrect results when you analyze plans. There’s no constraint that magically does this for you. After you set the statement identifier, you use the FOR keyword before the statement that you want to analyze.

An example query follows below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
EXPLAIN PLAN
SET STATEMENT_ID = '&input'
FOR
SELECT   DISTINCT
         r.rental_id
,        c.contact_id
,        tu.check_out_date AS check_out_date
,        tu.return_date AS return_date
,        3 AS created_by
,        TRUNC(SYSDATE) AS creation_date
,        3 AS last_updated_by
,        TRUNC(SYSDATE) AS last_update_date
FROM     member m INNER JOIN contact c
ON       m.member_id = c.member_id INNER JOIN transaction_upload tu
ON       c.first_name = tu.first_name
AND      NVL(c.middle_name,'x') = NVL(tu.middle_name,'x')
AND      c.last_name = tu.last_name
AND      tu.account_number = m.account_number LEFT JOIN rental r
ON       c.contact_id = r.customer_id
AND      tu.check_out_date = r.check_out_date
AND      tu.return_date = r.return_date;

After you explain a statement, you need to analyze it. Oracle uses hierarchical queries to analyze them. The following is a simple script that lets you display the execution plan by nesting the levels of the explained plan.

Here’s where you chose to follow the old or new way. Click on the link below to see the old way if you’re interested but I’d suggest you use the new way.

Oracle’s New Way

Oracle’s new way uses the DBMS_XPLAN package. In this post, you’ll examine only how to display results and check the impact of small changes that lead to improved query performance.

Assuming you entered Query1 as the &input response to the EXPLAIN PLAN statement above, then you’d call it with the following syntax:

1
2
3
4
5
6
7
8
-- Expand the width of displayed output.
SET LINESIZE 120
 
-- Query using the TABLE function that returns a user-defined object type (UDT)
-- into a SQL aggregate table, which is a fancy word for normal SQL output
-- (unfortunately, it's the technically accurate word).
SELECT *
FROM   TABLE(dbms_xplan.display(NULL,'Query1'));

The query using the DISPLAY function of the DBMS_XPLAN returns a table of data. It should generate an output stream like the following:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------
Plan hash value: 3289798709
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                    |   229 | 24274 |    50   (8)| 00:00:01 |
|   1 |  HASH UNIQUE                   |                    |   229 | 24274 |    50   (8)| 00:00:01 |
|*  2 |   HASH JOIN OUTER              |                    |   229 | 24274 |    49   (7)| 00:00:01 |
|   3 |    VIEW                        |                    |     4 |   248 |    37   (6)| 00:00:01 |
|*  4 |     HASH JOIN                  |                    |     4 |   572 |    37   (6)| 00:00:01 |
|*  5 |      HASH JOIN                 |                    |    15 |  1230 |     7  (15)| 00:00:01 |
|   6 |       TABLE ACCESS FULL        | MEMBER             |     9 |   180 |     3   (0)| 00:00:01 |
|   7 |       TABLE ACCESS FULL        | CONTACT            |    15 |   930 |     3   (0)| 00:00:01 |
|   8 |      EXTERNAL TABLE ACCESS FULL| TRANSACTION_UPLOAD |  8168 |   486K|    30   (4)| 00:00:01 |
|   9 |    TABLE ACCESS FULL           | RENTAL             |  4689 |   201K|    11   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("TU"."RETURN_DATE"="R"."RETURN_DATE"(+) AND
              "TU"."CHECK_OUT_DATE"="R"."CHECK_OUT_DATE"(+) AND "C"."CONTACT_ID"="R"."CUSTOMER_ID"(+))
   4 - access("C"."FIRST_NAME"="TU"."FIRST_NAME" AND
              NVL("C"."MIDDLE_NAME",'x')=NVL("TU"."MIDDLE_NAME",'x') AND "C"."LAST_NAME"="TU"."LAST_NAME"
              AND "TU"."ACCOUNT_NUMBER"="M"."ACCOUNT_NUMBER")
   5 - access("C"."MEMBER_ID"="M"."MEMBER_ID")
 
Note
-----
   - dynamic sampling used for this statement
 
30 rows selected.

The first thing to do is move the data immediately from an external table into a production table. You do that with the following command:

1
2
-- Create a new table from an existing one.
CREATE TABLE transaction_uploaded AS SELECT * FROM transaction_upload;

Truncating the PLAN_TABLE table, is the easiest way to start over by reusing the statement identifiers, like Query1. Then, you can edit your EXPLAIN PLAN script and change TRANSACTION_UPLOAD table reference to TRANSACTION_UPLOADED. Now, the query using the DISPLAY function of the DBMS_XPLAN returns a reduction in cost from 50 to 34.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
Plan hash value: 3624831533
 
--------------------------------------------------------------------------------------------
| Id  | Operation              | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                   |   400 | 42400 |    34   (6)| 00:00:01 |
|   1 |  HASH UNIQUE           |                   |   400 | 42400 |    34   (6)| 00:00:01 |
|*  2 |   HASH JOIN OUTER      |                   |   400 | 42400 |    33   (4)| 00:00:01 |
|   3 |    VIEW                |                   |     7 |   434 |    22   (5)| 00:00:01 |
|   4 |     NESTED LOOPS       |                   |     7 |  1001 |    22   (5)| 00:00:01 |
|*  5 |      HASH JOIN         |                   |    15 |  1230 |     7  (15)| 00:00:01 |
|   6 |       TABLE ACCESS FULL| MEMBER            |     9 |   180 |     3   (0)| 00:00:01 |
|   7 |       TABLE ACCESS FULL| CONTACT           |    15 |   930 |     3   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN  | IMPORT_DATE_RANGE |     1 |    61 |     1   (0)| 00:00:01 |
|   9 |    TABLE ACCESS FULL   | RENTAL            |  4689 |   201K|    11   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("TU"."RETURN_DATE"="R"."RETURN_DATE"(+) AND
              "TU"."CHECK_OUT_DATE"="R"."CHECK_OUT_DATE"(+) AND
              "C"."CONTACT_ID"="R"."CUSTOMER_ID"(+))
   5 - access("C"."MEMBER_ID"="M"."MEMBER_ID")
   8 - access("TU"."ACCOUNT_NUMBER"="M"."ACCOUNT_NUMBER" AND
              "C"."LAST_NAME"="TU"."LAST_NAME" AND "C"."FIRST_NAME"="TU"."FIRST_NAME" AND
              NVL("C"."MIDDLE_NAME",'x')=NVL("MIDDLE_NAME",'x'))
 
Note
-----
   - dynamic sampling used for this statement
 
31 rows selected.

A downside or quirk of thought, I’m never quite sure but folks look to the bottom not the middle first. Maybe in this case you tackle it first because it’s the attractive inefficiency due to the number of rows returned by a full table scan (most expensive).

You add a unique index that maps to the natural key for the RENTAL table, like this:

1
2
CREATE UNIQUE INDEX natural_key_rental
ON rental (rental_id, customer_id, check_out_date, return_date);

You run the EXPLAIN PLAN again, using Query2, and then analyze it. You’ll see that the query uses an index fast full scan and that the cost decreases by 2.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
Plan hash value: 3402838417
 
---------------------------------------------------------------------------------------------
| Id  | Operation              | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                    |   400 | 42400 |    32   (7)| 00:00:01 |
|   1 |  HASH UNIQUE           |                    |   400 | 42400 |    32   (7)| 00:00:01 |
|*  2 |   HASH JOIN OUTER      |                    |   400 | 42400 |    31   (4)| 00:00:01 |
|   3 |    VIEW                |                    |     7 |   434 |    22   (5)| 00:00:01 |
|   4 |     NESTED LOOPS       |                    |     7 |  1001 |    22   (5)| 00:00:01 |
|*  5 |      HASH JOIN         |                    |    15 |  1230 |     7  (15)| 00:00:01 |
|   6 |       TABLE ACCESS FULL| MEMBER             |     9 |   180 |     3   (0)| 00:00:01 |
|   7 |       TABLE ACCESS FULL| CONTACT            |    15 |   930 |     3   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN  | IMPORT_DATE_RANGE  |     1 |    61 |     1   (0)| 00:00:01 |
|   9 |    INDEX FAST FULL SCAN| NATURAL_KEY_RENTAL |  4689 |   201K|     9   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("TU"."RETURN_DATE"="R"."RETURN_DATE"(+) AND
              "TU"."CHECK_OUT_DATE"="R"."CHECK_OUT_DATE"(+) AND
              "C"."CONTACT_ID"="R"."CUSTOMER_ID"(+))
   5 - access("C"."MEMBER_ID"="M"."MEMBER_ID")
   8 - access("TU"."ACCOUNT_NUMBER"="M"."ACCOUNT_NUMBER" AND
              "C"."LAST_NAME"="TU"."LAST_NAME" AND "C"."FIRST_NAME"="TU"."FIRST_NAME" AND
              NVL("C"."MIDDLE_NAME",'x')=NVL("MIDDLE_NAME",'x'))
 
Note
-----
   - dynamic sampling used for this statement
 
31 rows selected.

You add a unique index that maps the surrogate and natural key for the MEMBER table, like this:

1
2
CREATE UNIQUE INDEX member_account
ON member (member_id, account_number);

You run the EXPLAIN PLAN again, using Query3, and then analyze it. You’ll see that the query uses an index range scan instead of a full table access of the MEMBER table. This decreases the cost by 2.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
Plan hash value: 656512492
 
---------------------------------------------------------------------------------------------
| Id  | Operation              | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                    |   400 | 42400 |    30   (7)| 00:00:01 |
|   1 |  HASH UNIQUE           |                    |   400 | 42400 |    30   (7)| 00:00:01 |
|*  2 |   HASH JOIN OUTER      |                    |   400 | 42400 |    29   (4)| 00:00:01 |
|   3 |    VIEW                |                    |     7 |   434 |    19   (0)| 00:00:01 |
|   4 |     NESTED LOOPS       |                    |     7 |  1001 |    19   (0)| 00:00:01 |
|   5 |      NESTED LOOPS      |                    |    15 |  1230 |     4   (0)| 00:00:01 |
|   6 |       TABLE ACCESS FULL| CONTACT            |    15 |   930 |     3   (0)| 00:00:01 |
|*  7 |       INDEX RANGE SCAN | MEMBER_ACCOUNT     |     1 |    20 |     1   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN  | IMPORT_DATE_RANGE  |     1 |    61 |     1   (0)| 00:00:01 |
|   9 |    INDEX FAST FULL SCAN| NATURAL_KEY_RENTAL |  4689 |   201K|     9   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("TU"."RETURN_DATE"="R"."RETURN_DATE"(+) AND
              "TU"."CHECK_OUT_DATE"="R"."CHECK_OUT_DATE"(+) AND
              "C"."CONTACT_ID"="R"."CUSTOMER_ID"(+))
   7 - access("C"."MEMBER_ID"="M"."MEMBER_ID")
   8 - access("TU"."ACCOUNT_NUMBER"="M"."ACCOUNT_NUMBER" AND
              "C"."LAST_NAME"="TU"."LAST_NAME" AND "C"."FIRST_NAME"="TU"."FIRST_NAME" AND
              NVL("C"."MIDDLE_NAME",'x')=NVL("MIDDLE_NAME",'x'))
 
Note
-----
   - dynamic sampling used for this statement
 
31 rows selected.

You add a unique index that maps the surrogate, foreign key to the MEMBER table, and the natural key for the CONTACT table, like this:

1
2
CREATE UNIQUE INDEX contact_member
ON contact (contact_id, member_id, last_name, first_name, NVL(middle_name,'x'));

You run the EXPLAIN PLAN again, using Query3, and then analyze it. You’ll see that the query uses an index full scan instead of a full table access of the CONTACT table. This decreases the cost by another 2.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
Plan hash value: 1185696375
 
---------------------------------------------------------------------------------------------
| Id  | Operation              | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                    |   400 | 42400 |    28   (8)| 00:00:01 |
|   1 |  HASH UNIQUE           |                    |   400 | 42400 |    28   (8)| 00:00:01 |
|*  2 |   HASH JOIN OUTER      |                    |   400 | 42400 |    27   (4)| 00:00:01 |
|   3 |    VIEW                |                    |     7 |   434 |    17   (0)| 00:00:01 |
|   4 |     NESTED LOOPS       |                    |     7 |  1001 |    17   (0)| 00:00:01 |
|   5 |      NESTED LOOPS      |                    |    15 |  1230 |     2   (0)| 00:00:01 |
|   6 |       INDEX FULL SCAN  | CONTACT_MEMBER     |    15 |   930 |     1   (0)| 00:00:01 |
|*  7 |       INDEX RANGE SCAN | MEMBER_ACCOUNT     |     1 |    20 |     1   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN  | IMPORT_DATE_RANGE  |     1 |    61 |     1   (0)| 00:00:01 |
|   9 |    INDEX FAST FULL SCAN| NATURAL_KEY_RENTAL |  4689 |   201K|     9   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("TU"."RETURN_DATE"="R"."RETURN_DATE"(+) AND
              "TU"."CHECK_OUT_DATE"="R"."CHECK_OUT_DATE"(+) AND
              "C"."CONTACT_ID"="R"."CUSTOMER_ID"(+))
   7 - access("C"."MEMBER_ID"="M"."MEMBER_ID")
   8 - access("TU"."ACCOUNT_NUMBER"="M"."ACCOUNT_NUMBER" AND
              "C"."LAST_NAME"="TU"."LAST_NAME" AND "C"."FIRST_NAME"="TU"."FIRST_NAME" AND
              NVL("MIDDLE_NAME",'x')=NVL("MIDDLE_NAME",'x'))
 
Note
-----
   - dynamic sampling used for this statement
 
31 rows selected.

The question evolves after you tune the first query. Then, you must check what impacts may exist on other queries, like the statement that you use to merge data into the TRANSACTION table. You may have to suppress indexes in a subsequent query, or after you tune it, return and suppress an index here. You suppress an index by adding a '0' to a NUMBER or DATE column, and by concatenating a null string to a character data type.

Examples of Suppressing Indexes

The following is an example of suppressing an index built on a string. It extends the tuning statement earlier in the blog post, and references line #17. The concatenation of a null string suppresses the index.

17
AND      c.last_name = tu.last_name||''

The following is an example of suppressing an index built on a date. It extends the tuning statement earlier in the blog post, and references line #20. Adding a zero leaves the value intact and suppresses the index.

20
AND      tu.check_out_date = r.check_out_date+0

Written by maclochlainn

November 10th, 2011 at 12:14 am

Posted in Oracle,Oracle XE,sql