Thursday, August 18, 2016

Oracle Partition Pruning on Date Comparisons

In the previous post we saw date filter clause impact over index, now let us iterate the same testcase against partitioned tables,
Table Creation
CREATE TABLE "C##HARI"."TBL_OBJECTS_p" 
   ( "ID" NUMBER, 
 "ID_VAL" VARCHAR2(46 BYTE), 
 "OWNER" VARCHAR2(128 BYTE) NOT NULL ENABLE, 
 "OBJECT_NAME" VARCHAR2(128 BYTE) NOT NULL ENABLE, 
 "SUBOBJECT_NAME" VARCHAR2(128 BYTE), 
 "OBJECT_ID" NUMBER NOT NULL ENABLE, 
 "DATA_OBJECT_ID" NUMBER, 
 "OBJECT_TYPE" VARCHAR2(23 BYTE), 
 "CREATED" DATE NOT NULL ENABLE, 
 "LAST_DDL_TIME" DATE NOT NULL ENABLE, 
 "TIMESTAMP" VARCHAR2(19 BYTE), 
 "STATUS" VARCHAR2(7 BYTE), 
 "TEMPORARY" VARCHAR2(1 BYTE), 
 "GENERATED" VARCHAR2(1 BYTE), 
 "SECONDARY" VARCHAR2(1 BYTE), 
 "NAMESPACE" NUMBER NOT NULL ENABLE, 
 "EDITION_NAME" VARCHAR2(128 BYTE), 
 "SHARING" VARCHAR2(13 BYTE), 
 "EDITIONABLE" VARCHAR2(1 BYTE), 
 "ORACLE_MAINTAINED" VARCHAR2(1 BYTE), 
 "ID_TIMESTAMP" DATE, 
 "ID_DATE" DATE, 
 "EFFECTIVE_END_TS" TIMESTAMP (9)
   )
   PARTITION BY RANGE ( effective_end_ts )
     ( PARTITION p1 VALUES LESS THAN (timestamp' 1900-01-01 00:00:00'),
       PARTITION p2 VALUES LESS THAN (timestamp' 1950-01-01 00:00:00'),
       PARTITION p3 VALUES LESS THAN (timestamp' 2000-01-01 00:00:00'),
       PARTITION p4 VALUES LESS THAN (timestamp' 2050-01-01 00:00:00'),
       PARTITION p5 VALUES LESS THAN (MAXVALUE))
Load data and index creation 
insert into "C##HARI"."TBL_OBJECTS_p"  select * from "C##HARI"."TBL_OBJECTS_1" 
commit
create unique index idx_p on "C##HARI"."TBL_OBJECTS_p"(id, effective_end_ts)
Let us repeat the SQL's

Sky is the limit :)
select * from "C##HARI"."TBL_OBJECTS_p" where effective_end_ts > 
systimestamp
 
Plan hash value: 4166151969
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation           | Name          | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |               |       |       |   572 (100)|          |       |       |
|   1 |  PARTITION RANGE ALL|               |  4454 |   695K|   572   (1)| 00:00:01 |     1 |     5 |
|*  2 |   TABLE ACCESS FULL | TBL_OBJECTS_p |  4454 |   695K|   572   (1)| 00:00:01 |     1 |     5 |
-----------------------------------------------------------------------------------------------------
 Look at the partitions being accessed, The optimizer has chosen to go through all the partitions, for the given filter i,e '> systimestamp' - ideally the optimizer has to look in to P4 and P5 partitions and that's not happening :)

Timestamp Filter
 select * from "C##HARI"."TBL_OBJECTS_p" where effective_end_ts = 
 timestamp '9999-01-01 00:00:00'
  
 Plan hash value: 2381956076
  
 --------------------------------------------------------------------------------------------------------
 | Id  | Operation              | Name          | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
 --------------------------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT       |               |       |       |   171 (100)|          |       |       |
 |   1 |  PARTITION RANGE SINGLE|               | 26724 |  4149K|   171   (1)| 00:00:01 |     5 |     5 |
 |*  2 |   TABLE ACCESS FULL    | TBL_OBJECTS_p | 26724 |  4149K|   171   (1)| 00:00:01 |     5 |     5 |
 --------------------------------------------------------------------------------------------------------
Well this time the optimizer has picked the right partition - Partition P5

To_Timestamp Filter
select * from "C##HARI"."TBL_OBJECTS_p" where effective_end_ts = 
 (to_timestamp('01/01/9999 00:00:00','dd/mm/yyyy hh24:mi:ss'))
  
 Plan hash value: 2381956076
  
 --------------------------------------------------------------------------------------------------------
 | Id  | Operation              | Name          | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
 --------------------------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT       |               |       |       |   171 (100)|          |       |       |
 |   1 |  PARTITION RANGE SINGLE|               | 26724 |  4149K|   171   (1)| 00:00:01 |     5 |     5 |
 |*  2 |   TABLE ACCESS FULL    | TBL_OBJECTS_p | 26724 |  4149K|   171   (1)| 00:00:01 |     5 |     5 |
 --------------------------------------------------------------------------------------------------------
Similar to previous plan, it is picking up the right partition P5. 

 Conclusion - Be it index or partition SQL with sky limit(> systimestamp) will always trouble the optimizer. Instead under these circumstances if you are aware of the need and use Timestamp and To_timestamp aptly the desired results are obtained. 

oow oow hold on, though it picks up the right partition P5 with timestamp/to_timestamp, what happen to the local index and why it is not used, is this the lead for my next post :) 

Tuesday, August 16, 2016

Date Filters and their impact over Index

I was discussing with my friend on date filters and how they impact the performance, thought this topic is worth blogging.

 With tables holding historic data, you would see columns 'Effective Date or termination date' which would logically mark the record as inactive and also helps to pick the current active record. Again few store NULL or any futuristic date in these columns to determine the active status. Now from this perspective let us see how date formatting can impact the optimizer decisions

  Let us walk through a set of SQL which has date filters,
select * from tbl_objects where effective_end_ts > systimestamp
  
 Plan hash value: 2402742835
  
 ---------------------------------------------------------------------------------
 | Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
 ---------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT  |             |       |       |   563 (100)|          |
 |*  1 |  TABLE ACCESS FULL| TBL_OBJECTS |  4454 |   695K|   563   (1)| 00:00:01 |
 ---------------------------------------------------------------------------------
  At the very first look was not convinced with the plan, because i know the total number of active records where 26721 and not 4454, so let us see what is actual and expected.
 select /*+ gather_plan_statistics */ * from tbl_objects where 
 effective_end_ts > systimestamp
  
 Plan hash value: 2402742835
  
 -------------------------------------------------------------------------------------------
 | Id  | Operation         | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
 -------------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT  |             |      1 |        |  26721 |00:00:00.14 |    2539 |
 |*  1 |  TABLE ACCESS FULL| TBL_OBJECTS |      1 |   4454 |  26721 |00:00:00.14 |    2539 |
 -------------------------------------------------------------------------------------------
 Yes the actual cardinality is 26721. Now am not going to SQL Profiles to go further to validate the plan because my first bet is always on the data,application design, the first question is why this SQL and what use the operator '>' . 

         The SQL intent is to pick active trades whose end timestamp is 01-01-9999, so let us say this to Oracle and see how this overcomes the trouble created by the operator '> systimestamp'  that is sky is the limit :)   

Now with effective end timestamp values  
select * from tbl_objects where effective_end_ts = timestamp 
'9999-01-01 00:00:00'
 
Plan hash value: 1709553517
 
---------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |             |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TBL_OBJECTS |     1 |   160 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IDX_EFF_TS  |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
    Though the cardinality is still not right, we could see a change in the access method - index is used.

Now same query with a different date formatting
select * from tbl_objects where effective_end_ts = 
 (to_timestamp('01/01/9999 00:00:00','dd/mm/yyyy hh24:mi:ss'))
  
 Plan hash value: 1709553517
  
 ---------------------------------------------------------------------------------------------------
 | Id  | Operation                           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
 ---------------------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT                    |             |       |       |     2 (100)|          |
 |   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TBL_OBJECTS |     1 |   160 |     2   (0)| 00:00:01 |
 |*  2 |   INDEX RANGE SCAN                  | IDX_EFF_TS  |     1 |       |     1   (0)| 00:00:01 |
 ---------------------------------------------------------------------------------------------------
    Both timestamp and to_timestamp yield the same results in terms of access path.  So avoid using sky limit (> systimestamp) in your SQL's if you know what you are looking for. SQL's like these do they impact partitions, watch out for the next post.