Oracle SQL Tuning Example
Somebody wanted a post on SQL statement tuning, so here it is. SQL statement tuning requires that you understand table definitions, constraints, and indexes plus how they interact through joins before you can begin tuning. Oracle presents some of the most advanced tools for tuning SQL statements.
This example works through a demonstration of tuning one statement with multiple joins. The key elements of tuning any SQL statements is understanding the table structures and data. The key worry about tuning is that optimizing for a query can harm the effectiveness of OLTP processing by slowing the performance of INSERT
and UPDATE
statements. Any DML statement that adds or modifies data in tables also modifies indexes built to support queries against the table.
Oracle
You have two approaches in an Oracle database. The old way and the new way. They’re really not that different but the DBMS_XPLAN
package provides a richer set of analytical data for less work.
The old way and the new way both use a single way for generating the SQL execution plan. You do that with the EXPLAIN PLAN
statement.
The EXPLAIN PLAN
statement is followed by a SET
operator. You use the SET
operator to assign a value to the STATEMENT_ID
column. This is a column that uniquely identifies statements in the PLAN_TABLE
. You must ensure that all STATEMENT_ID
values are unique, or you’ll get incorrect results when you analyze plans. There’s no constraint that magically does this for you. After you set the statement identifier, you use the FOR
keyword before the statement that you want to analyze.
An example query follows below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | EXPLAIN PLAN SET STATEMENT_ID = '&input' FOR SELECT DISTINCT r.rental_id , c.contact_id , tu.check_out_date AS check_out_date , tu.return_date AS return_date , 3 AS created_by , TRUNC(SYSDATE) AS creation_date , 3 AS last_updated_by , TRUNC(SYSDATE) AS last_update_date FROM member m INNER JOIN contact c ON m.member_id = c.member_id INNER JOIN transaction_upload tu ON c.first_name = tu.first_name AND NVL(c.middle_name,'x') = NVL(tu.middle_name,'x') AND c.last_name = tu.last_name AND tu.account_number = m.account_number LEFT JOIN rental r ON c.contact_id = r.customer_id AND tu.check_out_date = r.check_out_date AND tu.return_date = r.return_date; |
After you explain a statement, you need to analyze it. Oracle uses hierarchical queries to analyze them. The following is a simple script that lets you display the execution plan by nesting the levels of the explained plan.
Here’s where you chose to follow the old or new way. Click on the link below to see the old way if you’re interested but I’d suggest you use the new way.
Oracle’s Old Fashioned Way ↓
This sections shows the old way of analyzing explain plans.
The first step after you’ve explained the statement, is to analyze it. The following query performs that analysis and returns the cost of the join. More information is available with the DBMS_XPLAN
package. You can also extend this to get more information from the PLAN_TABLE
but it’s a lot of work that’s already done for you.
It’s nice to see the mechanics of how this analysis is performed, and that’s why this is in the post. This uses Oracle’s hierarchical query mechanism.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 | -- Clear any reporting breaks, columns, or computing. CLEAR BREAKS CLEAR COLUMNS CLEAR COMPUTES -- Set environment to capture explained plan. SET FEEDBACK OFF SET NULL "" SET LINESIZE 120 SET PAGESIZE 9999 SET PAUSE OFF SET VERIFY OFF -- Override default substitution variable prompt. ACCEPT input1 PROMPT "What do you want as a statement ID? " -- Format the Query Plan based on a 120 character length. COL c1 FORMAT A80 HEADING 'Query Plan' -- Spool it to file. SPOOL explain_it.log -- Create the hierarchy of the join plan. SELECT LPAD(' ',2*(LEVEL-1)) || operation||' ' || options||' ' || object_name||' ' || DECODE(id,0,'Cost = '||POSITION) "Query Plan" FROM plan_table START WITH id = 0 AND statement_id = '&input1' CONNECT BY PRIOR id = parent_id AND statement_id = '&input1'; -- Reset operating environment to default. SET FEEDBACK ON SET NULL "<Null>" SET LINESIZE 80 SET PAGESIZE 999 SET VERIFY ON -- Close output file. SPOOL OFF |
Assuming you entered Query1
as the input response to the EXPLAIN PLAN
statement and you name the query above explain_it.sql
, then you’d call it with the following syntax:
1 2 | @explain_it.sql What do you want AS a statement ID? Query1 |
The query starts at the top-most node aligned to the left. Each level down the node shifts two spaces in to the right. You start moving out of the process when the indenting reverses. It would generate an output file like the following:
1 2 3 4 5 6 7 8 9 10 11 12 | Query Plan -------------------------------------------------------------------------------- SELECT STATEMENT Cost = 34 HASH UNIQUE HASH JOIN OUTER VIEW NESTED LOOPS HASH JOIN TABLE ACCESS FULL MEMBER TABLE ACCESS FULL CONTACT INDEX RANGE SCAN IMPORT_DATE_RANGE TABLE ACCESS FULL RENTAL |
This tells you that the join isn’t cheap from a system resource standpoint. You can improve the query by creating indexes. Those steps are shown using the new method.
Oracle’s New Way
Oracle’s new way uses the DBMS_XPLAN
package. In this post, you’ll examine only how to display results and check the impact of small changes that lead to improved query performance.
Assuming you entered Query1
as the &input
response to the EXPLAIN PLAN
statement above, then you’d call it with the following syntax:
1 2 3 4 5 6 7 8 | -- Expand the width of displayed output. SET LINESIZE 120 -- Query using the TABLE function that returns a user-defined object type (UDT) -- into a SQL aggregate table, which is a fancy word for normal SQL output -- (unfortunately, it's the technically accurate word). SELECT * FROM TABLE(dbms_xplan.display(NULL,'Query1')); |
The query using the DISPLAY
function of the DBMS_XPLAN
returns a table of data. It should generate an output stream like the following:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------------------- Plan hash value: 3289798709 ----------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 229 | 24274 | 50 (8)| 00:00:01 | | 1 | HASH UNIQUE | | 229 | 24274 | 50 (8)| 00:00:01 | |* 2 | HASH JOIN OUTER | | 229 | 24274 | 49 (7)| 00:00:01 | | 3 | VIEW | | 4 | 248 | 37 (6)| 00:00:01 | |* 4 | HASH JOIN | | 4 | 572 | 37 (6)| 00:00:01 | |* 5 | HASH JOIN | | 15 | 1230 | 7 (15)| 00:00:01 | | 6 | TABLE ACCESS FULL | MEMBER | 9 | 180 | 3 (0)| 00:00:01 | | 7 | TABLE ACCESS FULL | CONTACT | 15 | 930 | 3 (0)| 00:00:01 | | 8 | EXTERNAL TABLE ACCESS FULL| TRANSACTION_UPLOAD | 8168 | 486K| 30 (4)| 00:00:01 | | 9 | TABLE ACCESS FULL | RENTAL | 4689 | 201K| 11 (0)| 00:00:01 | ----------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("TU"."RETURN_DATE"="R"."RETURN_DATE"(+) AND "TU"."CHECK_OUT_DATE"="R"."CHECK_OUT_DATE"(+) AND "C"."CONTACT_ID"="R"."CUSTOMER_ID"(+)) 4 - access("C"."FIRST_NAME"="TU"."FIRST_NAME" AND NVL("C"."MIDDLE_NAME",'x')=NVL("TU"."MIDDLE_NAME",'x') AND "C"."LAST_NAME"="TU"."LAST_NAME" AND "TU"."ACCOUNT_NUMBER"="M"."ACCOUNT_NUMBER") 5 - access("C"."MEMBER_ID"="M"."MEMBER_ID") Note ----- - dynamic sampling used for this statement 30 rows selected. |
The first thing to do is move the data immediately from an external table into a production table. You do that with the following command:
1 2 | -- Create a new table from an existing one. CREATE TABLE transaction_uploaded AS SELECT * FROM transaction_upload; |
Truncating the PLAN_TABLE
table, is the easiest way to start over by reusing the statement identifiers, like Query1
. Then, you can edit your EXPLAIN PLAN
script and change TRANSACTION_UPLOAD
table reference to TRANSACTION_UPLOADED
. Now, the query using the DISPLAY
function of the DBMS_XPLAN
returns a reduction in cost from 50 to 34.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 | PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------- Plan hash value: 3624831533 -------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 400 | 42400 | 34 (6)| 00:00:01 | | 1 | HASH UNIQUE | | 400 | 42400 | 34 (6)| 00:00:01 | |* 2 | HASH JOIN OUTER | | 400 | 42400 | 33 (4)| 00:00:01 | | 3 | VIEW | | 7 | 434 | 22 (5)| 00:00:01 | | 4 | NESTED LOOPS | | 7 | 1001 | 22 (5)| 00:00:01 | |* 5 | HASH JOIN | | 15 | 1230 | 7 (15)| 00:00:01 | | 6 | TABLE ACCESS FULL| MEMBER | 9 | 180 | 3 (0)| 00:00:01 | | 7 | TABLE ACCESS FULL| CONTACT | 15 | 930 | 3 (0)| 00:00:01 | |* 8 | INDEX RANGE SCAN | IMPORT_DATE_RANGE | 1 | 61 | 1 (0)| 00:00:01 | | 9 | TABLE ACCESS FULL | RENTAL | 4689 | 201K| 11 (0)| 00:00:01 | -------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("TU"."RETURN_DATE"="R"."RETURN_DATE"(+) AND "TU"."CHECK_OUT_DATE"="R"."CHECK_OUT_DATE"(+) AND "C"."CONTACT_ID"="R"."CUSTOMER_ID"(+)) 5 - access("C"."MEMBER_ID"="M"."MEMBER_ID") 8 - access("TU"."ACCOUNT_NUMBER"="M"."ACCOUNT_NUMBER" AND "C"."LAST_NAME"="TU"."LAST_NAME" AND "C"."FIRST_NAME"="TU"."FIRST_NAME" AND NVL("C"."MIDDLE_NAME",'x')=NVL("MIDDLE_NAME",'x')) Note ----- - dynamic sampling used for this statement 31 rows selected. |
A downside or quirk of thought, I’m never quite sure but folks look to the bottom not the middle first. Maybe in this case you tackle it first because it’s the attractive inefficiency due to the number of rows returned by a full table scan (most expensive).
You add a unique index that maps to the natural key for the RENTAL
table, like this:
1 2 | CREATE UNIQUE INDEX natural_key_rental ON rental (rental_id, customer_id, check_out_date, return_date); |
You run the EXPLAIN PLAN
again, using Query2
, and then analyze it. You’ll see that the query uses an index fast full scan and that the cost decreases by 2.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 | PLAN_TABLE_OUTPUT ----------------------------------------------------------------------------------------------- Plan hash value: 3402838417 --------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 400 | 42400 | 32 (7)| 00:00:01 | | 1 | HASH UNIQUE | | 400 | 42400 | 32 (7)| 00:00:01 | |* 2 | HASH JOIN OUTER | | 400 | 42400 | 31 (4)| 00:00:01 | | 3 | VIEW | | 7 | 434 | 22 (5)| 00:00:01 | | 4 | NESTED LOOPS | | 7 | 1001 | 22 (5)| 00:00:01 | |* 5 | HASH JOIN | | 15 | 1230 | 7 (15)| 00:00:01 | | 6 | TABLE ACCESS FULL| MEMBER | 9 | 180 | 3 (0)| 00:00:01 | | 7 | TABLE ACCESS FULL| CONTACT | 15 | 930 | 3 (0)| 00:00:01 | |* 8 | INDEX RANGE SCAN | IMPORT_DATE_RANGE | 1 | 61 | 1 (0)| 00:00:01 | | 9 | INDEX FAST FULL SCAN| NATURAL_KEY_RENTAL | 4689 | 201K| 9 (0)| 00:00:01 | --------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("TU"."RETURN_DATE"="R"."RETURN_DATE"(+) AND "TU"."CHECK_OUT_DATE"="R"."CHECK_OUT_DATE"(+) AND "C"."CONTACT_ID"="R"."CUSTOMER_ID"(+)) 5 - access("C"."MEMBER_ID"="M"."MEMBER_ID") 8 - access("TU"."ACCOUNT_NUMBER"="M"."ACCOUNT_NUMBER" AND "C"."LAST_NAME"="TU"."LAST_NAME" AND "C"."FIRST_NAME"="TU"."FIRST_NAME" AND NVL("C"."MIDDLE_NAME",'x')=NVL("MIDDLE_NAME",'x')) Note ----- - dynamic sampling used for this statement 31 rows selected. |
You add a unique index that maps the surrogate and natural key for the MEMBER
table, like this:
1 2 | CREATE UNIQUE INDEX member_account ON member (member_id, account_number); |
You run the EXPLAIN PLAN
again, using Query3
, and then analyze it. You’ll see that the query uses an index range scan instead of a full table access of the MEMBER
table. This decreases the cost by 2.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 | PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------- Plan hash value: 656512492 --------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 400 | 42400 | 30 (7)| 00:00:01 | | 1 | HASH UNIQUE | | 400 | 42400 | 30 (7)| 00:00:01 | |* 2 | HASH JOIN OUTER | | 400 | 42400 | 29 (4)| 00:00:01 | | 3 | VIEW | | 7 | 434 | 19 (0)| 00:00:01 | | 4 | NESTED LOOPS | | 7 | 1001 | 19 (0)| 00:00:01 | | 5 | NESTED LOOPS | | 15 | 1230 | 4 (0)| 00:00:01 | | 6 | TABLE ACCESS FULL| CONTACT | 15 | 930 | 3 (0)| 00:00:01 | |* 7 | INDEX RANGE SCAN | MEMBER_ACCOUNT | 1 | 20 | 1 (0)| 00:00:01 | |* 8 | INDEX RANGE SCAN | IMPORT_DATE_RANGE | 1 | 61 | 1 (0)| 00:00:01 | | 9 | INDEX FAST FULL SCAN| NATURAL_KEY_RENTAL | 4689 | 201K| 9 (0)| 00:00:01 | --------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("TU"."RETURN_DATE"="R"."RETURN_DATE"(+) AND "TU"."CHECK_OUT_DATE"="R"."CHECK_OUT_DATE"(+) AND "C"."CONTACT_ID"="R"."CUSTOMER_ID"(+)) 7 - access("C"."MEMBER_ID"="M"."MEMBER_ID") 8 - access("TU"."ACCOUNT_NUMBER"="M"."ACCOUNT_NUMBER" AND "C"."LAST_NAME"="TU"."LAST_NAME" AND "C"."FIRST_NAME"="TU"."FIRST_NAME" AND NVL("C"."MIDDLE_NAME",'x')=NVL("MIDDLE_NAME",'x')) Note ----- - dynamic sampling used for this statement 31 rows selected. |
You add a unique index that maps the surrogate, foreign key to the MEMBER
table, and the natural key for the CONTACT
table, like this:
1 2 | CREATE UNIQUE INDEX contact_member ON contact (contact_id, member_id, last_name, first_name, NVL(middle_name,'x')); |
You run the EXPLAIN PLAN
again, using Query3
, and then analyze it. You’ll see that the query uses an index full scan instead of a full table access of the CONTACT
table. This decreases the cost by another 2.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 | PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------- Plan hash value: 1185696375 --------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 400 | 42400 | 28 (8)| 00:00:01 | | 1 | HASH UNIQUE | | 400 | 42400 | 28 (8)| 00:00:01 | |* 2 | HASH JOIN OUTER | | 400 | 42400 | 27 (4)| 00:00:01 | | 3 | VIEW | | 7 | 434 | 17 (0)| 00:00:01 | | 4 | NESTED LOOPS | | 7 | 1001 | 17 (0)| 00:00:01 | | 5 | NESTED LOOPS | | 15 | 1230 | 2 (0)| 00:00:01 | | 6 | INDEX FULL SCAN | CONTACT_MEMBER | 15 | 930 | 1 (0)| 00:00:01 | |* 7 | INDEX RANGE SCAN | MEMBER_ACCOUNT | 1 | 20 | 1 (0)| 00:00:01 | |* 8 | INDEX RANGE SCAN | IMPORT_DATE_RANGE | 1 | 61 | 1 (0)| 00:00:01 | | 9 | INDEX FAST FULL SCAN| NATURAL_KEY_RENTAL | 4689 | 201K| 9 (0)| 00:00:01 | --------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("TU"."RETURN_DATE"="R"."RETURN_DATE"(+) AND "TU"."CHECK_OUT_DATE"="R"."CHECK_OUT_DATE"(+) AND "C"."CONTACT_ID"="R"."CUSTOMER_ID"(+)) 7 - access("C"."MEMBER_ID"="M"."MEMBER_ID") 8 - access("TU"."ACCOUNT_NUMBER"="M"."ACCOUNT_NUMBER" AND "C"."LAST_NAME"="TU"."LAST_NAME" AND "C"."FIRST_NAME"="TU"."FIRST_NAME" AND NVL("MIDDLE_NAME",'x')=NVL("MIDDLE_NAME",'x')) Note ----- - dynamic sampling used for this statement 31 rows selected. |
The question evolves after you tune the first query. Then, you must check what impacts may exist on other queries, like the statement that you use to merge data into the TRANSACTION
table. You may have to suppress indexes in a subsequent query, or after you tune it, return and suppress an index here. You suppress an index by adding a '0'
to a NUMBER
or DATE
column, and by concatenating a null string to a character data type.
Examples of Suppressing Indexes
The following is an example of suppressing an index built on a string. It extends the tuning statement earlier in the blog post, and references line #17. The concatenation of a null string suppresses the index.
17 | AND c.last_name = tu.last_name||'' |
The following is an example of suppressing an index built on a date. It extends the tuning statement earlier in the blog post, and references line #20. Adding a zero leaves the value intact and suppresses the index.
20 | AND tu.check_out_date = r.check_out_date+0 |