- With SQL we have many possible solution, what could be that one scenario which cannot be handled or may not be an optimal solution via SQL until Oracle 11g which lead to SQL Pattern Matching
- It looks big, Very BIG
- So many key words
- Complex
Pattern Matching comes examples like V/W patterns in stock market, suspicious financial transaction and you will find much more online, But let us get into basic question why not in 11g, Demystifying the unknown usually helps so lets begin. Am going to walk you through multiple scenarios comparing regular expression and analytical SQL.
Assume we have data stored in a comma separated format in a column like below
with tbl_reg as ( select 'Linda' as customer, 'TV,BIKE,MOBILE,CAR' as Buying_Pattern from dual union select 'Lorena' as customer, 'TV,BIKE,MOBILE' as Buying_Pattern from dual union select 'Mark' as customer, 'TV,BIKE,BIKE,MOBILE' as Buying_Pattern from dual ) SELECT * from tbl_reg; CUSTOM BUYING_PATTERN ------ ------------------- Linda TV,BIKE,MOBILE,CAR Lorena TV,BIKE,MOBILE Mark TV,BIKE,BIKE,MOBILE
Scenario/Pattern 1- Data Format = CSV
Now if we have to look out for buying pattern like 'TV,BIKE,MOBILE,CAR', then would straight away jump to regular expressions,
SQL
with tbl_reg as ( select 'Linda' as customer, 'TV,BIKE,MOBILE,CAR' as Buying_Pattern from dual union select 'Lorena' as customer, 'TV,BIKE,MOBILE' as Buying_Pattern from dual union select 'Mark' as customer, 'TV,BIKE,BIKE,MOBILE' as Buying_Pattern from dual ) SELECT * FROM TBL_REG WHERE REGEXP_LIKE(REPLACE(Buying_Pattern,',',' '),'TV BIKE MOBILE CAR'); CUSTOM BUYING_PATTERN ------ ------------------- Linda TV,BIKE,MOBILE,CAR
With proper Data Modeling in place the data is not gonna be stored in CSV format, so let us re-iterate the requirement to identifying the pattern
Scenario/Pattern 1 - Data Format = Rows
With Anaytical function - LEAD the same can be achived
SQL
WITH PRDT_ORDER AS (SELECT BP.* ,LEAD(PRODUCT) OVER (PARTITION BY CUSTOMER ORDER BY BUY_DT) NEXT_PRDT ,LEAD(PRODUCT,2) OVER (PARTITION BY CUSTOMER ORDER BY BUY_DT) SECOND_NEXT_PRDT ,LEAD(PRODUCT,3) OVER (PARTITION BY CUSTOMER ORDER BY BUY_DT) THIRD_NEXT_PRDT FROM BUYING_PATTERN BP ) SELECT CUSTOMER,BUY_DT,PRODUCT FROM PRDT_ORDER WHERE PRODUCT='TV' AND NEXT_PRDT = 'BIKE' AND SECOND_NEXT_PRDT = 'MOBILE' AND THIRD_NEXT_PRDT = 'CAR'; CUSTOMER BUY_DT PRODUCT -------- ------------------------------ ----------- LINDA 15-JUN-16 07.51.48.812559 AM TV
Scenario 2 - Pattern 1 + Pattern 2 - Data Format = CSV
If we want to look out for another pattern along with existing one , .ie along with pattern 1 = TV,BIKE,MOBILE,CAR, check for pattern 2 = TV,BIKE,BIKE,MOBILE it is just a simple change to the existing SQL
with tbl_reg as ( select 'Linda' as customer, 'TV,BIKE,MOBILE,CAR' as Buying_Pattern from dual union select 'Lorena' as customer, 'TV,BIKE,MOBILE' as Buying_Pattern from dual union select 'Mark' as customer, 'TV,BIKE,BIKE,MOBILE' as Buying_Pattern from dual ) SELECT * FROM TBL_REG WHERE REGEXP_LIKE(REPLACE(Buying_Pattern,',',' '),'TV BIKE MOBILE CAR|TV BIKE BIKE MOBILE'); CUSTOM BUYING_PATTERN ------ ------------------- Linda TV,BIKE,MOBILE,CAR Mark TV,BIKE,BIKE,MOBILE
Now the same in rows
Scenario 2 - Pattern 1+Pattern 2 - Data Format = Rows
pattern 1 = TV,BIKE,MOBILE,CAR,
pattern 2 = TV,BIKE,BIKE,MOBILE
WITH PRDT_ORDER AS (SELECT BP.* ,LEAD(PRODUCT) OVER (PARTITION BY CUSTOMER ORDER BY BUY_DT) NEXT_PRDT ,LEAD(PRODUCT,2) OVER (PARTITION BY CUSTOMER ORDER BY BUY_DT) SECOND_NEXT_PRDT ,LEAD(PRODUCT,3) OVER (PARTITION BY CUSTOMER ORDER BY BUY_DT) THIRD_NEXT_PRDT FROM BUYING_PATTERN BP ) SELECT CUSTOMER,BUY_DT,PRODUCT FROM PRDT_ORDER WHERE PRODUCT='TV' AND NEXT_PRDT = 'BIKE' AND SECOND_NEXT_PRDT = 'MOBILE' AND THIRD_NEXT_PRDT = 'CAR' union SELECT CUSTOMER,BUY_DT,PRODUCT FROM PRDT_ORDER WHERE PRODUCT='TV' AND NEXT_PRDT = 'BIKE' AND SECOND_NEXT_PRDT = 'BIKE' AND THIRD_NEXT_PRDT = 'MOBILE'; USTOMER BUY_DT PRODUCT ----------- ----------------------------------- --------------------- LINDA 15-JUN-16 07.51.48.812559 AM TV MARK 15-JUN-16 07.51.48.812559 AM TV
Scenario 3 - Pattern 3 - Data Format = CSV
Pattern 3 is some like when you think sky is the limit i.e really not sure how many of them could repeat within a pattern. For an example - All 1,2,3 from the below picture are of same pattern XYZAwith tbl_reg as ( select 'Linda' as customer, 'TV,BIKE,MOBILE,CAR' as Buying_Pattern from dual union select 'Lorena' as customer, 'TV,BIKE,MOBILE' as Buying_Pattern from dual union select 'Mark' as customer, 'TV,BIKE,BIKE,MOBILE' as Buying_Pattern from dual union select 'RAJINI' as customer, 'TV,TV,BIKE,BIKE,BIKE,MOBILE,CAR' as Buying_Pattern from dual ) SELECT customer, buying_pattern FROM TBL_REG WHERE REGEXP_LIKE(REPLACE(Buying_Pattern,','),'(TV(.*)BIKE(.*)MOBILE(.*)CAR(.*))'); CUSTOM BUYING_PATTERN ------ ------------------------------- Linda TV,BIKE,MOBILE,CAR RAJINI TV,TV,BIKE,BIKE,BIKE,MOBILE,CAR
Scenario 3 - Pattern 3 - Data Format = ROWS
We have two sets of data, one has a fixed pattern while other has got a subset, ie, Looking out for buying pattern, matching the order - "TV(.*) BIKE(.*) MOBILE CAR". Both 1 and 2 in the below picture belong to the same pattern "TV BIKE MOBILE CAR"Untill 11g if this has to be written in a single SQL, it becomes pretty tough to have it in a single optimal SQL. Oracle 12c - Pattern Matching has got the answer, when you want to perform a pattern recognition in a sequence of rows - it is MATCH_RECOGNIZE
Here it is
SELECT * FROM BUYING_PATTERN MATCH_RECOGNIZE ( PARTITION BY CUSTOMER ORDER BY BUY_DT MEASURES FIRST(A.PRODUCT) first_ITEM, FIRST(A.BUY_DT) first_ITEM_DT, NVL(LAST(C.PRODUCT),LAST(B.PRODUCT)) LAST_ITEM_BEFORE_X, LAST(D.PRODUCT) LAST_ITEM, LAST(D.BUY_DT) LAST_ITEM_DT ,COUNT(*) CNT one ROW PER MATCH PATTERN (A* B* C D) DEFINE A AS A.PRODUCT='TV', B AS B.PRODUCT='BIKE', C AS C.PRODUCT='MOBILE', D AS D.PRODUCT='CAR' ); CUSTOMER FIRST_ITEM FIRST_ITEM_DT LAST_ITEM_BEFORE_X LAST_ITEM LAST_ITEM_DT CNT ----------- ------------------ ------------------------- ------------------ -------------- -------------------------------- ---- LINDA TV 15-JUN-16 07.51.48.812559 AM MOBILE CAR 15-JUN-16 07.54.48.812559 AM 4 RAJINI TV 15-JUN-16 03.22.24.354595 PM MOBILE CAR 15-JUN-16 03.25.24.354595 PM 7
Once we know why it is required, where we can apply, we are half way through in Pattern Matching - MATCH_RECOGNIZE. Rest to follow in upcoming post.