Thursday, August 18, 2016

Oracle Partition Pruning on Date Comparisons

In the previous post we saw date filter clause impact over index, now let us iterate the same testcase against partitioned tables,
Table Creation
CREATE TABLE "C##HARI"."TBL_OBJECTS_p" 
   ( "ID" NUMBER, 
 "ID_VAL" VARCHAR2(46 BYTE), 
 "OWNER" VARCHAR2(128 BYTE) NOT NULL ENABLE, 
 "OBJECT_NAME" VARCHAR2(128 BYTE) NOT NULL ENABLE, 
 "SUBOBJECT_NAME" VARCHAR2(128 BYTE), 
 "OBJECT_ID" NUMBER NOT NULL ENABLE, 
 "DATA_OBJECT_ID" NUMBER, 
 "OBJECT_TYPE" VARCHAR2(23 BYTE), 
 "CREATED" DATE NOT NULL ENABLE, 
 "LAST_DDL_TIME" DATE NOT NULL ENABLE, 
 "TIMESTAMP" VARCHAR2(19 BYTE), 
 "STATUS" VARCHAR2(7 BYTE), 
 "TEMPORARY" VARCHAR2(1 BYTE), 
 "GENERATED" VARCHAR2(1 BYTE), 
 "SECONDARY" VARCHAR2(1 BYTE), 
 "NAMESPACE" NUMBER NOT NULL ENABLE, 
 "EDITION_NAME" VARCHAR2(128 BYTE), 
 "SHARING" VARCHAR2(13 BYTE), 
 "EDITIONABLE" VARCHAR2(1 BYTE), 
 "ORACLE_MAINTAINED" VARCHAR2(1 BYTE), 
 "ID_TIMESTAMP" DATE, 
 "ID_DATE" DATE, 
 "EFFECTIVE_END_TS" TIMESTAMP (9)
   )
   PARTITION BY RANGE ( effective_end_ts )
     ( PARTITION p1 VALUES LESS THAN (timestamp' 1900-01-01 00:00:00'),
       PARTITION p2 VALUES LESS THAN (timestamp' 1950-01-01 00:00:00'),
       PARTITION p3 VALUES LESS THAN (timestamp' 2000-01-01 00:00:00'),
       PARTITION p4 VALUES LESS THAN (timestamp' 2050-01-01 00:00:00'),
       PARTITION p5 VALUES LESS THAN (MAXVALUE))
Load data and index creation 
insert into "C##HARI"."TBL_OBJECTS_p"  select * from "C##HARI"."TBL_OBJECTS_1" 
commit
create unique index idx_p on "C##HARI"."TBL_OBJECTS_p"(id, effective_end_ts)
Let us repeat the SQL's

Sky is the limit :)
select * from "C##HARI"."TBL_OBJECTS_p" where effective_end_ts > 
systimestamp
 
Plan hash value: 4166151969
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation           | Name          | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |               |       |       |   572 (100)|          |       |       |
|   1 |  PARTITION RANGE ALL|               |  4454 |   695K|   572   (1)| 00:00:01 |     1 |     5 |
|*  2 |   TABLE ACCESS FULL | TBL_OBJECTS_p |  4454 |   695K|   572   (1)| 00:00:01 |     1 |     5 |
-----------------------------------------------------------------------------------------------------
 Look at the partitions being accessed, The optimizer has chosen to go through all the partitions, for the given filter i,e '> systimestamp' - ideally the optimizer has to look in to P4 and P5 partitions and that's not happening :)

Timestamp Filter
 select * from "C##HARI"."TBL_OBJECTS_p" where effective_end_ts = 
 timestamp '9999-01-01 00:00:00'
  
 Plan hash value: 2381956076
  
 --------------------------------------------------------------------------------------------------------
 | Id  | Operation              | Name          | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
 --------------------------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT       |               |       |       |   171 (100)|          |       |       |
 |   1 |  PARTITION RANGE SINGLE|               | 26724 |  4149K|   171   (1)| 00:00:01 |     5 |     5 |
 |*  2 |   TABLE ACCESS FULL    | TBL_OBJECTS_p | 26724 |  4149K|   171   (1)| 00:00:01 |     5 |     5 |
 --------------------------------------------------------------------------------------------------------
Well this time the optimizer has picked the right partition - Partition P5

To_Timestamp Filter
select * from "C##HARI"."TBL_OBJECTS_p" where effective_end_ts = 
 (to_timestamp('01/01/9999 00:00:00','dd/mm/yyyy hh24:mi:ss'))
  
 Plan hash value: 2381956076
  
 --------------------------------------------------------------------------------------------------------
 | Id  | Operation              | Name          | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
 --------------------------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT       |               |       |       |   171 (100)|          |       |       |
 |   1 |  PARTITION RANGE SINGLE|               | 26724 |  4149K|   171   (1)| 00:00:01 |     5 |     5 |
 |*  2 |   TABLE ACCESS FULL    | TBL_OBJECTS_p | 26724 |  4149K|   171   (1)| 00:00:01 |     5 |     5 |
 --------------------------------------------------------------------------------------------------------
Similar to previous plan, it is picking up the right partition P5. 

 Conclusion - Be it index or partition SQL with sky limit(> systimestamp) will always trouble the optimizer. Instead under these circumstances if you are aware of the need and use Timestamp and To_timestamp aptly the desired results are obtained. 

oow oow hold on, though it picks up the right partition P5 with timestamp/to_timestamp, what happen to the local index and why it is not used, is this the lead for my next post :) 

Tuesday, August 16, 2016

Date Filters and their impact over Index

I was discussing with my friend on date filters and how they impact the performance, thought this topic is worth blogging.

 With tables holding historic data, you would see columns 'Effective Date or termination date' which would logically mark the record as inactive and also helps to pick the current active record. Again few store NULL or any futuristic date in these columns to determine the active status. Now from this perspective let us see how date formatting can impact the optimizer decisions

  Let us walk through a set of SQL which has date filters,
select * from tbl_objects where effective_end_ts > systimestamp
  
 Plan hash value: 2402742835
  
 ---------------------------------------------------------------------------------
 | Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
 ---------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT  |             |       |       |   563 (100)|          |
 |*  1 |  TABLE ACCESS FULL| TBL_OBJECTS |  4454 |   695K|   563   (1)| 00:00:01 |
 ---------------------------------------------------------------------------------
  At the very first look was not convinced with the plan, because i know the total number of active records where 26721 and not 4454, so let us see what is actual and expected.
 select /*+ gather_plan_statistics */ * from tbl_objects where 
 effective_end_ts > systimestamp
  
 Plan hash value: 2402742835
  
 -------------------------------------------------------------------------------------------
 | Id  | Operation         | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
 -------------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT  |             |      1 |        |  26721 |00:00:00.14 |    2539 |
 |*  1 |  TABLE ACCESS FULL| TBL_OBJECTS |      1 |   4454 |  26721 |00:00:00.14 |    2539 |
 -------------------------------------------------------------------------------------------
 Yes the actual cardinality is 26721. Now am not going to SQL Profiles to go further to validate the plan because my first bet is always on the data,application design, the first question is why this SQL and what use the operator '>' . 

         The SQL intent is to pick active trades whose end timestamp is 01-01-9999, so let us say this to Oracle and see how this overcomes the trouble created by the operator '> systimestamp'  that is sky is the limit :)   

Now with effective end timestamp values  
select * from tbl_objects where effective_end_ts = timestamp 
'9999-01-01 00:00:00'
 
Plan hash value: 1709553517
 
---------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |             |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TBL_OBJECTS |     1 |   160 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IDX_EFF_TS  |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
    Though the cardinality is still not right, we could see a change in the access method - index is used.

Now same query with a different date formatting
select * from tbl_objects where effective_end_ts = 
 (to_timestamp('01/01/9999 00:00:00','dd/mm/yyyy hh24:mi:ss'))
  
 Plan hash value: 1709553517
  
 ---------------------------------------------------------------------------------------------------
 | Id  | Operation                           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
 ---------------------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT                    |             |       |       |     2 (100)|          |
 |   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TBL_OBJECTS |     1 |   160 |     2   (0)| 00:00:01 |
 |*  2 |   INDEX RANGE SCAN                  | IDX_EFF_TS  |     1 |       |     1   (0)| 00:00:01 |
 ---------------------------------------------------------------------------------------------------
    Both timestamp and to_timestamp yield the same results in terms of access path.  So avoid using sky limit (> systimestamp) in your SQL's if you know what you are looking for. SQL's like these do they impact partitions, watch out for the next post.    

Wednesday, June 15, 2016

Oracle 12c - Pattern Matching Part 1

Oracle 12c Pattern Matching





Big SQL :)

Before getting into Oracle 12c Pattern Matching - the set of questions which came across after looking at Oracle 12c Patern Matching - MATCH_RECOGNIZE are 
  1. With SQL we have many possible solution, what could be that one scenario which cannot be handled or may not be an optimal solution via SQL until Oracle 11g which lead to SQL Pattern Matching
  2. It looks big, Very BIG
  3. So many key words
  4. Complex
           Pattern Matching comes examples like V/W patterns in stock market, suspicious financial transaction and you will find much more online, But let us get into basic question why not in 11g, Demystifying the unknown usually helps so lets begin. Am going to walk you through multiple scenarios comparing regular expression and analytical SQL. 

Assume we have data stored in a comma separated format in a column like below

with tbl_reg as
(
select 'Linda' as customer, 'TV,BIKE,MOBILE,CAR' as Buying_Pattern from dual
union
select 'Lorena' as customer, 'TV,BIKE,MOBILE' as Buying_Pattern from dual
union
select 'Mark' as customer, 'TV,BIKE,BIKE,MOBILE' as Buying_Pattern from dual
)
SELECT * from tbl_reg;

CUSTOM BUYING_PATTERN
------ -------------------
Linda  TV,BIKE,MOBILE,CAR
Lorena TV,BIKE,MOBILE
Mark   TV,BIKE,BIKE,MOBILE

Scenario/Pattern 1- Data Format = CSV


           Now if we have to look out for buying pattern  like 'TV,BIKE,MOBILE,CAR', then would straight away jump to regular expressions,
SQL
with tbl_reg as
(
select 'Linda' as customer, 'TV,BIKE,MOBILE,CAR' as Buying_Pattern from dual
union
select 'Lorena' as customer, 'TV,BIKE,MOBILE' as Buying_Pattern from dual
union
select 'Mark' as customer, 'TV,BIKE,BIKE,MOBILE' as Buying_Pattern from dual
)
SELECT * FROM TBL_REG WHERE REGEXP_LIKE(REPLACE(Buying_Pattern,',',' '),'TV BIKE MOBILE CAR');


CUSTOM BUYING_PATTERN
------ -------------------
Linda  TV,BIKE,MOBILE,CAR


          With proper Data Modeling in place the data is not gonna be stored in CSV format, so let us re-iterate the requirement to identifying the pattern

Scenario/Pattern 1 - Data Format = Rows

                    With Anaytical function - LEAD the same can be achived 
SQL
WITH PRDT_ORDER AS
(SELECT
BP.*
,LEAD(PRODUCT) OVER (PARTITION BY CUSTOMER ORDER BY BUY_DT) NEXT_PRDT
,LEAD(PRODUCT,2) OVER (PARTITION BY CUSTOMER ORDER BY BUY_DT) SECOND_NEXT_PRDT
,LEAD(PRODUCT,3) OVER (PARTITION BY CUSTOMER ORDER BY BUY_DT) THIRD_NEXT_PRDT
FROM BUYING_PATTERN BP
)
SELECT CUSTOMER,BUY_DT,PRODUCT FROM PRDT_ORDER
WHERE
PRODUCT='TV'
AND NEXT_PRDT = 'BIKE'
AND SECOND_NEXT_PRDT = 'MOBILE'
AND THIRD_NEXT_PRDT = 'CAR';

CUSTOMER   BUY_DT      PRODUCT
-------- ------------------------------ -----------
LINDA     15-JUN-16 07.51.48.812559 AM     TV



Scenario 2 - Pattern 1 + Pattern 2 - Data Format = CSV


       If we want to look out for another pattern along with existing one , .ie along with pattern 1 = TV,BIKE,MOBILE,CAR, check for pattern 2 = TV,BIKE,BIKE,MOBILE  it is just a simple change to the existing SQL

with tbl_reg as
(
select 'Linda' as customer, 'TV,BIKE,MOBILE,CAR' as Buying_Pattern from dual
union
select 'Lorena' as customer, 'TV,BIKE,MOBILE' as Buying_Pattern from dual
union
select 'Mark' as customer, 'TV,BIKE,BIKE,MOBILE' as Buying_Pattern from dual
)
SELECT * FROM TBL_REG WHERE REGEXP_LIKE(REPLACE(Buying_Pattern,',',' '),'TV BIKE MOBILE CAR|TV BIKE BIKE MOBILE');

CUSTOM BUYING_PATTERN
------ -------------------
Linda  TV,BIKE,MOBILE,CAR
Mark   TV,BIKE,BIKE,MOBILE


Now the same in rows

Scenario 2 - Pattern 1+Pattern 2 - Data Format = Rows


pattern 1 = TV,BIKE,MOBILE,CAR,
pattern 2 = TV,BIKE,BIKE,MOBILE


WITH PRDT_ORDER AS
(SELECT 
BP.*
,LEAD(PRODUCT) OVER (PARTITION BY CUSTOMER ORDER BY BUY_DT) NEXT_PRDT
,LEAD(PRODUCT,2) OVER (PARTITION BY CUSTOMER ORDER BY BUY_DT) SECOND_NEXT_PRDT
,LEAD(PRODUCT,3) OVER (PARTITION BY CUSTOMER ORDER BY BUY_DT) THIRD_NEXT_PRDT
FROM BUYING_PATTERN BP
)
SELECT CUSTOMER,BUY_DT,PRODUCT FROM PRDT_ORDER 
WHERE 
PRODUCT='TV'
AND NEXT_PRDT = 'BIKE'
AND SECOND_NEXT_PRDT = 'MOBILE'
AND THIRD_NEXT_PRDT = 'CAR'
union
SELECT CUSTOMER,BUY_DT,PRODUCT FROM PRDT_ORDER 
WHERE 
PRODUCT='TV'
AND NEXT_PRDT = 'BIKE'
AND SECOND_NEXT_PRDT = 'BIKE'
AND THIRD_NEXT_PRDT = 'MOBILE';

USTOMER         BUY_DT    PRODUCT
----------- -----------------------------------  ---------------------
LINDA       15-JUN-16 07.51.48.812559 AM           TV
MARK       15-JUN-16 07.51.48.812559 AM           TV


Scenario 3 - Pattern 3 - Data Format = CSV

       Pattern 3 is some like when you think sky is the limit i.e really not sure how many of them could repeat within a pattern. For an example - All 1,2,3 from the below picture are of same pattern XYZA


with tbl_reg as
(
select 'Linda' as customer, 'TV,BIKE,MOBILE,CAR' as Buying_Pattern from dual
union
select 'Lorena' as customer, 'TV,BIKE,MOBILE' as Buying_Pattern from dual
union
select 'Mark' as customer, 'TV,BIKE,BIKE,MOBILE' as Buying_Pattern from dual
union
select 'RAJINI' as customer, 'TV,TV,BIKE,BIKE,BIKE,MOBILE,CAR' as Buying_Pattern from dual
)
SELECT customer, buying_pattern  FROM TBL_REG WHERE REGEXP_LIKE(REPLACE(Buying_Pattern,','),'(TV(.*)BIKE(.*)MOBILE(.*)CAR(.*))');

CUSTOM BUYING_PATTERN
------ -------------------------------
Linda  TV,BIKE,MOBILE,CAR
RAJINI TV,TV,BIKE,BIKE,BIKE,MOBILE,CAR


Scenario 3 - Pattern 3 -  Data Format = ROWS

                  We have two sets of data, one has a fixed pattern while other has got a subset, ie, Looking out for buying pattern, matching the order - "TV(.*) BIKE(.*) MOBILE CAR". Both 1 and 2 in the below picture belong to the same pattern "TV BIKE MOBILE CAR"
























               Untill 11g if this has to be written in a single SQL, it becomes pretty tough to have it in a single optimal SQL. Oracle 12c - Pattern Matching has got the answer, when you want to perform a pattern recognition in a sequence of rows - it is MATCH_RECOGNIZE

Here it is
SELECT *
FROM BUYING_PATTERN
MATCH_RECOGNIZE ( PARTITION BY CUSTOMER ORDER BY BUY_DT
  MEASURES
    FIRST(A.PRODUCT) first_ITEM,
    FIRST(A.BUY_DT) first_ITEM_DT,
    NVL(LAST(C.PRODUCT),LAST(B.PRODUCT)) LAST_ITEM_BEFORE_X,
    LAST(D.PRODUCT) LAST_ITEM,
    LAST(D.BUY_DT) LAST_ITEM_DT
    ,COUNT(*) CNT    
  one ROW PER MATCH   
  PATTERN (A* B* C D)
  DEFINE   
  A AS A.PRODUCT='TV',
  B AS B.PRODUCT='BIKE',
  C AS C.PRODUCT='MOBILE',
  D AS D.PRODUCT='CAR'
);


CUSTOMER    FIRST_ITEM    FIRST_ITEM_DT     LAST_ITEM_BEFORE_X     LAST_ITEM LAST_ITEM_DT                CNT
----------- ------------------ -------------------------  ------------------   -------------- -------------------------------- ----
LINDA     TV      15-JUN-16 07.51.48.812559 AM     MOBILE     CAR  15-JUN-16 07.54.48.812559 AM 4
RAJINI     TV      15-JUN-16 03.22.24.354595 PM     MOBILE     CAR  15-JUN-16 03.25.24.354595 PM 7


Once we know why it is required, where we can apply,  we are half way through in Pattern Matching - MATCH_RECOGNIZE. Rest to follow in upcoming post.