I was not really planning to write about extended stats, it's all started with Full table scan then was trying to replace full table scan with an index and finally ended with extended stats due to the wrong cardinality estimate :)
Before we get into oracle extended statistics let check few cardinality issues when my data is skewed.
Before we get into oracle extended statistics let check few cardinality issues when my data is skewed.
Lets start with the below demo,
SQL> CREATE TABLE tbl_fts_chk 2 ( 3 id NUMBER, 4 id_val VARCHAR2(10), 5 location_val VARCHAR2(10) 6 ); Table created. Elapsed: 00:00:00.32 SQL> BEGIN 2 FOR x IN 1..1000 3 LOOP 4 INSERT INTO tbl_fts_chk 5 SELECT * 6 FROM 7 ( WITH tbl_temp AS 8 (SELECT rownum AS id, 9 'ORACLE' 10 ||rownum AS id_val, 11 'location_' 12 ||chr(rownum+64) AS location_val 13 FROM dual 14 CONNECT BY level < 11 15 ), 16 tbl_temp1 AS 17 (SELECT rownum AS id, 18 'ORACLE' 19 ||rownum AS id_val, 20 'location_' 21 ||chr(rownum+64) AS location_val 22 FROM dual 23 CONNECT BY level < 11 24 ORDER BY id DESC 25 ) 26 SELECT t1.id, 27 t1.id_val, 28 t1.location_val 29 FROM tbl_temp t1, 30 tbl_temp1 t2 31 WHERE t1.id > t2.id 32 UNION ALL 33 SELECT t1.id, 34 t1.id_val, 35 t1.location_val 36 FROM tbl_temp1 t1, 37 tbl_temp1 t2 38 WHERE t2.id > t1.id 39 ); 40 END LOOP; 41 COMMIT; 42 END; 43 / PL/SQL procedure successfully completed. Elapsed: 00:00:00.77 SQL> select count(*) from tbl_fts_chk; COUNT(*) ---------- 90000 Elapsed: 00:00:00.01 ---------------- SQL> EXEC DBMS_STATS.gather_table_stats('HARI', 'TBL_FTS_CHK', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE); PL/SQL procedure successfully completed. Elapsed: 00:00:01.23
Note the estimate percent is AUTO SAMPLE SIZE.
The below simple query triggers the doubt and lets check whether do we get any merit for investigation,
SQL> set autotrace traceonly SQL> SQL> select * from tbl_fts_chk where id = 5 and id_val = 'ORACLE5'; 9000 rows selected. Elapsed: 00:00:00.12 Execution Plan ---------------------------------------------------------- Plan hash value: 2042300665 --------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 900 | 19800 | 103 (1)| 00:00:02 | |* 1 | TABLE ACCESS FULL| TBL_FTS_CHK | 900 | 19800 | 103 (1)| 00:00:02 | --------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("ID"=5 AND "ID_VAL"='ORACLE5') Statistics ---------------------------------------------------------- 275 recursive calls 0 db block gets 1005 consistent gets 378 physical reads 0 redo size 121681 bytes sent via SQL*Net to client 7008 bytes received via SQL*Net from client 601 SQL*Net roundtrips to/from client 5 sorts (memory) 0 sorts (disk) 9000 rows processed
Why is my cardinality just 900 instead of being 9000. Which means it is calculated as 1 percent but actually it has to be 10 percent of total number of records (90,000).
The math here is
Step1 :
Column - ID - Has 10 distinct Values
SQL> select count(*),ID from tbl_fts_chk group by ID; COUNT(*) ID ---------- ---------- 9000 1 9000 6 9000 2 9000 4 9000 5 9000 8 9000 3 9000 7 9000 9 9000 10 10 rows selected. Elapsed: 00:00:00.23
Column - ID_Val - Has 10 distinct Values
SQL> select count(*),ID_VAL from tbl_fts_chk group by ID_VAL; COUNT(*) ID_VAL ---------- ---------- 9000 ORACLE7 9000 ORACLE8 9000 ORACLE6 9000 ORACLE2 9000 ORACLE4 9000 ORACLE10 9000 ORACLE3 9000 ORACLE5 9000 ORACLE9 9000 ORACLE1 10 rows selected. Elapsed: 00:00:00.06
Step2
Given value for ID = 5 and for ID_Val = ORACLE5, which means 1 out of 10 is being retrieved out of total number of records
Step 3
Number Of records * Column ID selectivity * Column ID_Val Selectivity
90000 * .1 * .1 = 900
Oracle hasn't considered the actual repetition of data. I retrieve only 10 percent of the data. Can index help me here ?, lets check
SQL> create index idx_fts_chk1 on tbl_fts_chk(id,id_val); Index created. Elapsed: 00:00:00.25 SQL> EXEC DBMS_STATS.gather_table_stats('HARI', 'TBL_FTS_CHK', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE); PL/SQL procedure successfully completed. Elapsed: 00:00:00.95
Executing the query again
SQL> variable a1 number; SQL> exec :a1:=5; PL/SQL procedure successfully completed. Elapsed: 00:00:00.00 SQL> variable a2 varchar2(10); SQL> SQL> exec :a2:='ORACLE5'; PL/SQL procedure successfully completed. Elapsed: 00:00:00.00 SQL> SQL> SELECT * FROM TBL_FTS_CHK T2 WHERE T2.ID = :a1 AND T2.ID_VAL = :a2; 9000 rows selected. Elapsed: 00:00:00.23 Execution Plan ---------------------------------------------------------- Plan hash value: 2042300665 --------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 9000 | 193K| 103 (1)| 00:00:02 | |* 1 | TABLE ACCESS FULL| TBL_FTS_CHK | 9000 | 193K| 103 (1)| 00:00:02 | --------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("T2"."ID_VAL"=:A2 AND "T2"."ID"=TO_NUMBER(:A1)) Statistics ---------------------------------------------------------- 367 recursive calls 0 db block gets 1024 consistent gets 380 physical reads 0 redo size 121681 bytes sent via SQL*Net to client 7008 bytes received via SQL*Net from client 601 SQL*Net roundtrips to/from client 6 sorts (memory) 0 sorts (disk) 9000 rows processed
Now After adding Index, Oracle is able to provide the correct cardinality -CARD-9000 i.e 10 percent of the total number of records in the table. Even though i retrive only one tenth of records my index is not used, it is a full table scan. This is different topic by itself, lets get into cardinality issue in this blog. Though being a full table scan, posting adding an Index optimizer is able to provide the correct cardinality, Does this mean that the optimizer for full table scan uses the Index stats on avalibility ? If that is the case what happens if i drop the index ?
SQL> drop index idx_fts_chk1; Index dropped. Elapsed: 00:00:00.46 SQL> SQL> EXEC DBMS_STATS.gather_table_stats('HARI', 'TBL_FTS_CHK', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE); PL/SQL procedure successfully completed. Elapsed: 00:00:03.15 SQL> SQL> SELECT * FROM TBL_FTS_CHK T2 WHERE T2.ID = :a1 AND T2.ID_VAL = :a2; 9000 rows selected. Elapsed: 00:00:00.07 Execution Plan ---------------------------------------------------------- Plan hash value: 2042300665 --------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 900 | 19800 | 103 (1)| 00:00:02 | |* 1 | TABLE ACCESS FULL| TBL_FTS_CHK | 900 | 19800 | 103 (1)| 00:00:02 | --------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("T2"."ID_VAL"=:A2 AND "T2"."ID"=TO_NUMBER(:A1)) Statistics ---------------------------------------------------------- 169 recursive calls 0 db block gets 990 consistent gets 0 physical reads 0 redo size 121681 bytes sent via SQL*Net to client 7008 bytes received via SQL*Net from client 601 SQL*Net roundtrips to/from client 5 sorts (memory) 0 sorts (disk) 9000 rows processed
Oh, back to 900 (1 percent of records) in the cardinality, Thus it is very evident that Oracle Full table scan uses Index stats to provide a better cardinality estimate. Now the question here is should i go for Index to help the optimizer to provide the correct cardinality estimate, Can't i just provide more details about these two columns ID and ID_VAL to the optimizer so that it helps in estimating the correct cardinality ?
How should i inform the optimizer about these two columns? And that's all about Extended Statistics in Oracle 11g. First lets create the additional stats on the required columns using DBMS_STATS
SQL> SQL> SELECT dbms_stats.create_extended_stats(OWNNAME=>'HARI',TABNAME=>'TBL_FTS_CHK',EXTENSION=>'(ID,ID_VAL)') FROM DUAL; DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME=>'HARI',TABNAME=>'TBL_FTS_CHK',EXTENSION=>'(ID,ID_VAL)') ------------------------------------------------------------------------------------------------------------------------------------------------------ SYS_STUSBSF29XATPMX0WFZ7WKDVIV Elapsed: 00:00:00.21
Collecting stats on this table,
SQL> EXEC DBMS_STATS.gather_table_stats('HARI', 'TBL_FTS_CHK', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE); PL/SQL procedure successfully completed. Elapsed: 00:00:00.95
Note the dbms_stats is executed with default method_opt option - FOR ALL COLUMNS SIZE AUTO and auto sampling size. Now executing the query again
SQL> SELECT * FROM TBL_FTS_CHK T2 WHERE T2.ID = :a1 AND T2.ID_VAL = :a2; 9000 rows selected. Elapsed: 00:00:00.07 Execution Plan ---------------------------------------------------------- Plan hash value: 2042300665 --------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 900 | 20700 | 103 (1)| 00:00:02 | |* 1 | TABLE ACCESS FULL| TBL_FTS_CHK | 900 | 20700 | 103 (1)| 00:00:02 | --------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("T2"."ID_VAL"=:A2 AND "T2"."ID"=TO_NUMBER(:A1)) Statistics ---------------------------------------------------------- 8 recursive calls 0 db block gets 974 consistent gets 0 physical reads 0 redo size 121681 bytes sent via SQL*Net to client 7008 bytes received via SQL*Net from client 601 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 9000 rows processed
oops still not the correct cardinality, lets try providing the literal instead of binds,
SQL> SELECT * FROM TBL_FTS_CHK T2 WHERE T2.ID = 5 AND T2.ID_VAL = 'ORACLE5'; 9000 rows selected. Elapsed: 00:00:00.07 Execution Plan ---------------------------------------------------------- Plan hash value: 2042300665 --------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 993 | 22839 | 103 (1)| 00:00:02 | |* 1 | TABLE ACCESS FULL| TBL_FTS_CHK | 993 | 22839 | 103 (1)| 00:00:02 | --------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("T2"."ID"=5 AND "T2"."ID_VAL"='ORACLE5') Statistics ---------------------------------------------------------- 8 recursive calls 0 db block gets 974 consistent gets 0 physical reads 0 redo size 121681 bytes sent via SQL*Net to client 7008 bytes received via SQL*Net from client 601 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 9000 rows processed
Not yet, how about collecting stats with appropriate columns stats - method_opt => for all columns size 254
SQL> EXEC DBMS_STATS.gather_table_stats('HARI', 'TBL_FTS_CHK', estimate_percent => NULL,method_opt => 'for all columns size 254'); PL/SQL procedure successfully completed. Elapsed: 00:00:01.27 SQL> SQL> SQL> SQL> SQL> SQL> exec :a1:=5; PL/SQL procedure successfully completed. Elapsed: 00:00:00.00 SQL> exec :a2:='ORACLE5'; PL/SQL procedure successfully completed. Elapsed: 00:00:00.00 SQL> SQL> SELECT * FROM TBL_FTS_CHK T2 WHERE T2.ID = :a1 AND T2.ID_VAL = :a2; 9000 rows selected. Elapsed: 00:00:00.07 Execution Plan ---------------------------------------------------------- Plan hash value: 2042300665 --------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 9000 | 202K| 103 (1)| 00:00:02 | |* 1 | TABLE ACCESS FULL| TBL_FTS_CHK | 9000 | 202K| 103 (1)| 00:00:02 | --------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("T2"."ID_VAL"=:A2 AND "T2"."ID"=TO_NUMBER(:A1)) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 972 consistent gets 0 physical reads 0 redo size 121681 bytes sent via SQL*Net to client 7008 bytes received via SQL*Net from client 601 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 9000 rows processed SQL> SELECT * FROM TBL_FTS_CHK T2 WHERE T2.ID = 5 AND T2.ID_VAL = 'ORACLE5'; 9000 rows selected. Elapsed: 00:00:00.07 Execution Plan ---------------------------------------------------------- Plan hash value: 2042300665 --------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 9000 | 202K| 103 (1)| 00:00:02 | |* 1 | TABLE ACCESS FULL| TBL_FTS_CHK | 9000 | 202K| 103 (1)| 00:00:02 | --------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("T2"."ID"=5 AND "T2"."ID_VAL"='ORACLE5') Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 972 consistent gets 0 physical reads 0 redo size 121681 bytes sent via SQL*Net to client 7008 bytes received via SQL*Net from client 601 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 9000 rows processed
Finally the optimizer is now able to provide the correct cardinality and has extended stats for the correlation between these two columns ID and ID_VAL.And yes Oracle uses Index (stats) for Full table scan to provide better cardinality estimates.