samedi 25 juin 2016

Why are queries faster when subqueries within the WITH clause are duplicated?

Suppose a project uses partitions to structure its data. This concept is purely business specific and does not have to do with database partitioning.

Let's say the business logic does:

  • delete from output_table where partition = < partitionX >
  • insert into output_table (select * from input_table where partition = < partitionX >)

Keeping in mind that everything is structured like that, let's complicate a problem (to get to the actual question).

Suppose that I have a query (the SELECT query) that is potential killer, in terms of time :

insert into output_table (
  select * 
   from input_table
   left outer join additional_table additional_table1  
     on input_table.id = additional_table1.id
   left outer join additional_table additional_table2  
    on additional_table2.id = additional_table1.parent
  where partition =  <partitionX>
)

Let's optimize this and explore the options. Keep in mind every table has partitions. Also notice how table2 is joined twice, but on different columns. And, also notice how the additional table is joined on itself

Everything uses the WITH clause, but there are several options and I would like to know why one of these is better.

A. direct and duplicate queries in the WITH section

WITH 
CACHED_input_table AS (
  SELECT *
  FROM input_table
  WHERE PARTITION_ID = < partition  X >
),
CACHED_additional_table1 AS (
  SELECT *
  FROM additional_table 
  WHERE PARTITION_ID = < partition  X >
),
CACHED_additional_table2 AS (
  SELECT *
  FROM additional_table 
  WHERE PARTITION_ID = < partition  X >
)
SELECT *    
FROM CACHED_input_table input_table
  LEFT OUTER JOIN CACHED_additional_table1 additional_table1 
    ON input_table.ID = additional_table1.ID
  LEFT OUTER JOIN CACHED_additional_table2 additional_table2 
    ON additional_table1.PARENT_ID = additional_table2.ID

B. reuse of query in the FROM section

WITH 
CACHED_input_table AS (
  SELECT *
  FROM input_table
  WHERE PARTITION_ID = < partition  X >
),
CACHED_additional_table AS (
  SELECT *
  FROM additional_table 
  WHERE PARTITION_ID = < partition  X >
)
SELECT *    
FROM CACHED_input_table input_table
  LEFT OUTER JOIN CACHED_additional_table additional_table1 
    ON input_table.ID = additional_table1.ID
  LEFT OUTER JOIN CACHED_additional_table additional_table2 
    ON additional_table1.PARENT_ID = additional_table2.ID

C. reuse of query in the WITH section

WITH 
CACHED_input_table AS (
  SELECT *
  FROM input_table
  WHERE PARTITION_ID = < partition  X >
),
CACHED_additional_table1 AS (
  SELECT *
  FROM additional_table 
  WHERE PARTITION_ID = < partition  X >
),
CACHED_additional_table2 AS (
  SELECT *
  FROM CACHED_additional_table1 
)
SELECT *    
FROM CACHED_input_table input_table

LEFT OUTER JOIN CACHED_additional_table1 additional_table1 
 ON input_table.ID = additional_table1.ID

LEFT OUTER JOIN CACHED_additional_table2 additional_table2 
 ON additional_table1.PARENT_ID = additional_table2.ID

From experience, Option A is the fastest. But why? Can someone explain this? (I am playing on Oracle v11.2)

I know that, potentially, my optimization around this company specific concept of partitions has nothing to do with the generic sql optimization around WITH clause that I am asking about, but please take it as a real-life example.

Explain plans

Option A (9900 rows in 7s)

------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name                         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                              |     1 |  1037 | 18540   (8)| 00:00:03 |       |       |
|*  1 |  HASH JOIN OUTER        |                              |     1 |  1037 | 18540   (8)| 00:00:03 |       |       |
|*  2 |   HASH JOIN OUTER       |                              |     1 |   605 |  9271   (8)| 00:00:02 |       |       |
|   3 |    PARTITION LIST SINGLE|                              |     1 |   173 |     2   (0)| 00:00:01 |   KEY |   KEY |
|   4 |     TABLE ACCESS FULL   | input_table                  |     1 |   173 |     2   (0)| 00:00:01 |    24 |    24 |
|   5 |    PARTITION LIST SINGLE|                              |  1362K|   561M|  9248   (8)| 00:00:02 |   KEY |   KEY |
|   6 |     TABLE ACCESS FULL   | additional_table             |  1362K|   561M|  9248   (8)| 00:00:02 |    24 |    24 |
|   7 |   PARTITION LIST SINGLE |                              |  1362K|   561M|  9248   (8)| 00:00:02 |   KEY |   KEY |
|   8 |    TABLE ACCESS FULL    | additional_table             |  1362K|   561M|  9248   (8)| 00:00:02 |    24 |    24 |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("additional_table"."PARENT"="additional_table"."ID"(+))
   2 - access("input_table"."ID"="additional_table"."ID"(+))

Option B (9900 rows in 10s)

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                              |     1 |  2813 | 18186  (11)| 00:00:03 |       |       |
|   1 |  TEMP TABLE TRANSFORMATION |                              |       |       |            |          |       |       |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D6CA2_C26AF925  |       |       |            |          |       |       |
|   3 |    PARTITION LIST SINGLE   |                              |  1362K|   561M|  9248   (8)| 00:00:02 |   KEY |   KEY |
|   4 |     TABLE ACCESS FULL      | additional_table1            |  1362K|   561M|  9248   (8)| 00:00:02 |    24 |    24 |
|*  5 |   HASH JOIN OUTER          |                              |     1 |  2813 |  8939  (15)| 00:00:02 |       |       |
|*  6 |    HASH JOIN OUTER         |                              |     1 |  1493 |  4470  (15)| 00:00:01 |       |       |
|   7 |     PARTITION LIST SINGLE  |                              |     1 |   173 |     2   (0)| 00:00:01 |   KEY |   KEY |
|   8 |      TABLE ACCESS FULL     | input_table                  |     1 |   173 |     2   (0)| 00:00:01 |    24 |    24 |
|   9 |     VIEW                   |                              |  1362K|  1714M|  4447  (14)| 00:00:01 |       |       |
|  10 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D6CA2_C26AF925  |  1362K|   561M|  4447  (14)| 00:00:01 |       |       |
|  11 |    VIEW                    |                              |  1362K|  1714M|  4447  (14)| 00:00:01 |       |       |
|  12 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6CA2_C26AF925  |  1362K|   561M|  4447  (14)| 00:00:01 |       |       |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("additional_table1"."PARENT"="additional_table2"."ID"(+))
   6 - access("input_table"."ID"="additional_table1"."ID"(+))

Option C (9900 rows in 17s)

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                              |     1 |  2813 | 18186  (11)| 00:00:03 |       |       |
|   1 |  TEMP TABLE TRANSFORMATION |                              |       |       |            |          |       |       |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D6CA7_C26AF925  |       |       |            |          |       |       |
|   3 |    PARTITION LIST SINGLE   |                              |  1362K|   561M|  9248   (8)| 00:00:02 |   KEY |   KEY |
|   4 |     TABLE ACCESS FULL      | additional_table             |  1362K|   561M|  9248   (8)| 00:00:02 |    24 |    24 |
|*  5 |   HASH JOIN OUTER          |                              |     1 |  2813 |  8939  (15)| 00:00:02 |       |       |
|*  6 |    HASH JOIN OUTER         |                              |     1 |  1493 |  4470  (15)| 00:00:01 |       |       |
|   7 |     PARTITION LIST SINGLE  |                              |     1 |   173 |     2   (0)| 00:00:01 |   KEY |   KEY |
|   8 |      TABLE ACCESS FULL     | input_table                  |     1 |   173 |     2   (0)| 00:00:01 |    24 |    24 |
|   9 |     VIEW                   |                              |  1362K|  1714M|  4447  (14)| 00:00:01 |       |       |
|  10 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D6CA7_C26AF925  |  1362K|   561M|  4447  (14)| 00:00:01 |       |       |
|  11 |    VIEW                    |                              |  1362K|  1714M|  4447  (14)| 00:00:01 |       |       |
|  12 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6CA7_C26AF925  |  1362K|   561M|  4447  (14)| 00:00:01 |       |       |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("additional_table1"."PARENT_ID"="CACHED_additional_table"."ID"(+))
   6 - access("input_table"."ID"="additional_table1"."ID"(+))

EDIT :

  • added explain plans
  • edited base query : there is an input_table, and an additional_table that is joined twice, once on input_table, and once on itself
  • edited query for option A : there is an input_table, and the additional_table is joined twice, once on input_table, and once on a duplicate of itself (additional_table)
  • edited query for Option B : there is an input_table, and the additional_table is joined twice, once on input_table, and once on itself, using the same alias(additional_table)
  • edited query for Option C : there is an input_table, and the additional_table is joined twice, once on input_table, and once on another table created from itself in the WITH section

Aucun commentaire:

Enregistrer un commentaire