Suppose a project uses partitions to structure its data. This concept is purely business specific and does not have to do with database partitioning.
Let's say the business logic does:
- delete from output_table where partition = < partitionX >
- insert into output_table (select * from input_table where partition = < partitionX >)
Keeping in mind that everything is structured like that, let's complicate a problem (to get to the actual question).
Suppose that I have a query (the SELECT query) that is potential killer, in terms of time :
insert into output_table (
select *
from input_table
left outer join additional_table additional_table1
on input_table.id = additional_table1.id
left outer join additional_table additional_table2
on additional_table2.id = additional_table1.parent
where partition = <partitionX>
)
Let's optimize this and explore the options. Keep in mind every table has partitions. Also notice how table2 is joined twice, but on different columns. And, also notice how the additional table is joined on itself
Everything uses the WITH clause, but there are several options and I would like to know why one of these is better.
A. direct and duplicate queries in the WITH section
WITH
CACHED_input_table AS (
SELECT *
FROM input_table
WHERE PARTITION_ID = < partition X >
),
CACHED_additional_table1 AS (
SELECT *
FROM additional_table
WHERE PARTITION_ID = < partition X >
),
CACHED_additional_table2 AS (
SELECT *
FROM additional_table
WHERE PARTITION_ID = < partition X >
)
SELECT *
FROM CACHED_input_table input_table
LEFT OUTER JOIN CACHED_additional_table1 additional_table1
ON input_table.ID = additional_table1.ID
LEFT OUTER JOIN CACHED_additional_table2 additional_table2
ON additional_table1.PARENT_ID = additional_table2.ID
B. reuse of query in the FROM section
WITH
CACHED_input_table AS (
SELECT *
FROM input_table
WHERE PARTITION_ID = < partition X >
),
CACHED_additional_table AS (
SELECT *
FROM additional_table
WHERE PARTITION_ID = < partition X >
)
SELECT *
FROM CACHED_input_table input_table
LEFT OUTER JOIN CACHED_additional_table additional_table1
ON input_table.ID = additional_table1.ID
LEFT OUTER JOIN CACHED_additional_table additional_table2
ON additional_table1.PARENT_ID = additional_table2.ID
C. reuse of query in the WITH section
WITH
CACHED_input_table AS (
SELECT *
FROM input_table
WHERE PARTITION_ID = < partition X >
),
CACHED_additional_table1 AS (
SELECT *
FROM additional_table
WHERE PARTITION_ID = < partition X >
),
CACHED_additional_table2 AS (
SELECT *
FROM CACHED_additional_table1
)
SELECT *
FROM CACHED_input_table input_table
LEFT OUTER JOIN CACHED_additional_table1 additional_table1
ON input_table.ID = additional_table1.ID
LEFT OUTER JOIN CACHED_additional_table2 additional_table2
ON additional_table1.PARENT_ID = additional_table2.ID
From experience, Option A is the fastest. But why? Can someone explain this? (I am playing on Oracle v11.2)
I know that, potentially, my optimization around this company specific concept of partitions has nothing to do with the generic sql optimization around WITH clause that I am asking about, but please take it as a real-life example.
Explain plans
Option A (9900 rows in 7s)
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 1037 | 18540 (8)| 00:00:03 | | |
|* 1 | HASH JOIN OUTER | | 1 | 1037 | 18540 (8)| 00:00:03 | | |
|* 2 | HASH JOIN OUTER | | 1 | 605 | 9271 (8)| 00:00:02 | | |
| 3 | PARTITION LIST SINGLE| | 1 | 173 | 2 (0)| 00:00:01 | KEY | KEY |
| 4 | TABLE ACCESS FULL | input_table | 1 | 173 | 2 (0)| 00:00:01 | 24 | 24 |
| 5 | PARTITION LIST SINGLE| | 1362K| 561M| 9248 (8)| 00:00:02 | KEY | KEY |
| 6 | TABLE ACCESS FULL | additional_table | 1362K| 561M| 9248 (8)| 00:00:02 | 24 | 24 |
| 7 | PARTITION LIST SINGLE | | 1362K| 561M| 9248 (8)| 00:00:02 | KEY | KEY |
| 8 | TABLE ACCESS FULL | additional_table | 1362K| 561M| 9248 (8)| 00:00:02 | 24 | 24 |
------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("additional_table"."PARENT"="additional_table"."ID"(+))
2 - access("input_table"."ID"="additional_table"."ID"(+))
Option B (9900 rows in 10s)
---------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2813 | 18186 (11)| 00:00:03 | | |
| 1 | TEMP TABLE TRANSFORMATION | | | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6CA2_C26AF925 | | | | | | |
| 3 | PARTITION LIST SINGLE | | 1362K| 561M| 9248 (8)| 00:00:02 | KEY | KEY |
| 4 | TABLE ACCESS FULL | additional_table1 | 1362K| 561M| 9248 (8)| 00:00:02 | 24 | 24 |
|* 5 | HASH JOIN OUTER | | 1 | 2813 | 8939 (15)| 00:00:02 | | |
|* 6 | HASH JOIN OUTER | | 1 | 1493 | 4470 (15)| 00:00:01 | | |
| 7 | PARTITION LIST SINGLE | | 1 | 173 | 2 (0)| 00:00:01 | KEY | KEY |
| 8 | TABLE ACCESS FULL | input_table | 1 | 173 | 2 (0)| 00:00:01 | 24 | 24 |
| 9 | VIEW | | 1362K| 1714M| 4447 (14)| 00:00:01 | | |
| 10 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6CA2_C26AF925 | 1362K| 561M| 4447 (14)| 00:00:01 | | |
| 11 | VIEW | | 1362K| 1714M| 4447 (14)| 00:00:01 | | |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6CA2_C26AF925 | 1362K| 561M| 4447 (14)| 00:00:01 | | |
---------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("additional_table1"."PARENT"="additional_table2"."ID"(+))
6 - access("input_table"."ID"="additional_table1"."ID"(+))
Option C (9900 rows in 17s)
---------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2813 | 18186 (11)| 00:00:03 | | |
| 1 | TEMP TABLE TRANSFORMATION | | | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6CA7_C26AF925 | | | | | | |
| 3 | PARTITION LIST SINGLE | | 1362K| 561M| 9248 (8)| 00:00:02 | KEY | KEY |
| 4 | TABLE ACCESS FULL | additional_table | 1362K| 561M| 9248 (8)| 00:00:02 | 24 | 24 |
|* 5 | HASH JOIN OUTER | | 1 | 2813 | 8939 (15)| 00:00:02 | | |
|* 6 | HASH JOIN OUTER | | 1 | 1493 | 4470 (15)| 00:00:01 | | |
| 7 | PARTITION LIST SINGLE | | 1 | 173 | 2 (0)| 00:00:01 | KEY | KEY |
| 8 | TABLE ACCESS FULL | input_table | 1 | 173 | 2 (0)| 00:00:01 | 24 | 24 |
| 9 | VIEW | | 1362K| 1714M| 4447 (14)| 00:00:01 | | |
| 10 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6CA7_C26AF925 | 1362K| 561M| 4447 (14)| 00:00:01 | | |
| 11 | VIEW | | 1362K| 1714M| 4447 (14)| 00:00:01 | | |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6CA7_C26AF925 | 1362K| 561M| 4447 (14)| 00:00:01 | | |
---------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("additional_table1"."PARENT_ID"="CACHED_additional_table"."ID"(+))
6 - access("input_table"."ID"="additional_table1"."ID"(+))
EDIT :
- added explain plans
- edited base query : there is an input_table, and an additional_table that is joined twice, once on input_table, and once on itself
- edited query for option A : there is an input_table, and the additional_table is joined twice, once on input_table, and once on a duplicate of itself (additional_table)
- edited query for Option B : there is an input_table, and the additional_table is joined twice, once on input_table, and once on itself, using the same alias(additional_table)
- edited query for Option C : there is an input_table, and the additional_table is joined twice, once on input_table, and once on another table created from itself in the WITH section
Aucun commentaire:
Enregistrer un commentaire