Wednesday, September 26, 2012

Extended Statistics in Oracle 11g

I was not really planning to write about extended stats, it's all started with Full table scan then was trying to replace full table scan with an index and finally ended with extended stats due to the wrong cardinality estimate :)

Before we get into oracle extended statistics let check few cardinality issues when my data is skewed.

Lets start with the below demo
SQL> CREATE TABLE tbl_fts_chk
  2    (
  3      id           NUMBER,
  4      id_val       VARCHAR2(10),
  5      location_val VARCHAR2(10)
  6    );

Table created.

Elapsed: 00:00:00.32
SQL> BEGIN
  2    FOR x IN 1..1000
  3    LOOP
  4      INSERT INTO tbl_fts_chk
  5      SELECT *
  6      FROM
  7        ( WITH tbl_temp AS
  8        (SELECT rownum AS id,
  9          'ORACLE'
 10          ||rownum AS id_val,
 11          'location_'
 12          ||chr(rownum+64) AS location_val
 13        FROM dual
 14          CONNECT BY level < 11
 15        ),
 16        tbl_temp1 AS
 17        (SELECT rownum AS id,
 18          'ORACLE'
 19          ||rownum AS id_val,
 20          'location_'
 21          ||chr(rownum+64) AS location_val
 22        FROM dual
 23          CONNECT BY level < 11
 24        ORDER BY id DESC
 25        )
 26      SELECT t1.id,
 27        t1.id_val,
 28        t1.location_val
 29      FROM tbl_temp t1,
 30        tbl_temp1 t2
 31      WHERE t1.id > t2.id
 32      UNION ALL
 33      SELECT t1.id,
 34        t1.id_val,
 35        t1.location_val
 36      FROM tbl_temp1 t1,
 37        tbl_temp1 t2
 38      WHERE t2.id > t1.id
 39        );
 40    END LOOP;
 41    COMMIT;
 42  END;
 43  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.77
SQL> select count(*) from tbl_fts_chk;

  COUNT(*)                                                                                                                                            
----------                                                                                                                                            
     90000                                                                                                                                            

Elapsed: 00:00:00.01
----------------
SQL> EXEC DBMS_STATS.gather_table_stats('HARI', 'TBL_FTS_CHK', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE);

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.23


    Note the estimate percent is AUTO SAMPLE SIZE. The below simple query triggers the doubt and lets check whether do we get any merit for investigation,
SQL> set autotrace traceonly
SQL> 
SQL> select * from tbl_fts_chk where id = 5 and id_val = 'ORACLE5';

9000 rows selected.

Elapsed: 00:00:00.12

Execution Plan
----------------------------------------------------------
Plan hash value: 2042300665

---------------------------------------------------------------------------------
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time    |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |             |   900 | 19800 |   103   (1)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| TBL_FTS_CHK |   900 | 19800 |   103   (1)| 00:00:02 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ID"=5 AND "ID_VAL"='ORACLE5')


Statistics
----------------------------------------------------------
        275  recursive calls
          0  db block gets
       1005  consistent gets
        378  physical reads
          0  redo size
     121681  bytes sent via SQL*Net to client
       7008  bytes received via SQL*Net from client
        601  SQL*Net roundtrips to/from client
          5  sorts (memory)
          0  sorts (disk)
       9000  rows processed

   Why is my cardinality just 900 instead of being 9000. Which means it is calculated as 1 percent but actually it has to be 10 percent of total number of records (90,000).

The math here is

Step1 :
Column - ID         - Has 10 distinct Values
SQL> select count(*),ID from tbl_fts_chk group by ID;

  COUNT(*)         ID
---------- ----------
      9000          1
      9000          6
      9000          2
      9000          4
      9000          5
      9000          8
      9000          3
      9000          7
      9000          9
      9000         10

10 rows selected.

Elapsed: 00:00:00.23


Column - ID_Val  - Has 10 distinct Values
SQL> select count(*),ID_VAL from tbl_fts_chk group by ID_VAL;

  COUNT(*) ID_VAL
---------- ----------
      9000 ORACLE7
      9000 ORACLE8
      9000 ORACLE6
      9000 ORACLE2
      9000 ORACLE4
      9000 ORACLE10
      9000 ORACLE3
      9000 ORACLE5
      9000 ORACLE9
      9000 ORACLE1

10 rows selected.

Elapsed: 00:00:00.06


Step2
Given value for ID = 5 and for ID_Val = ORACLE5, which means 1 out of 10 is being retrieved out of total number of records

Step 3
Number Of records * Column ID selectivity * Column ID_Val Selectivity
90000 * .1 * .1 = 900

Oracle hasn't considered the actual repetition of data. I retrieve only 10 percent of the data. Can index help me here ?, lets check
SQL> create index idx_fts_chk1 on tbl_fts_chk(id,id_val);

Index created.

Elapsed: 00:00:00.25
SQL> EXEC DBMS_STATS.gather_table_stats('HARI', 'TBL_FTS_CHK', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE);

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.95

Executing the query again
SQL> variable a1 number;
SQL> exec :a1:=5;

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL> variable a2 varchar2(10);
SQL> 
SQL> exec :a2:='ORACLE5';

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL> 
SQL> SELECT * FROM TBL_FTS_CHK T2 WHERE T2.ID = :a1 AND T2.ID_VAL = :a2;

9000 rows selected.

Elapsed: 00:00:00.23

Execution Plan
----------------------------------------------------------                                                                                            
Plan hash value: 2042300665                                                                                                                           
                                                                                                                                                      
---------------------------------------------------------------------------------                                                                     
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |                                                                     
---------------------------------------------------------------------------------                                                                     
|   0 | SELECT STATEMENT  |             |  9000 |   193K|   103   (1)| 00:00:02 |                                                                     
|*  1 |  TABLE ACCESS FULL| TBL_FTS_CHK |  9000 |   193K|   103   (1)| 00:00:02 |                                                                     
---------------------------------------------------------------------------------                                                                     
                                                                                                                                                      
Predicate Information (identified by operation id):                                                                                                   
---------------------------------------------------                                                                                                   
                                                                                                                                                      
   1 - filter("T2"."ID_VAL"=:A2 AND "T2"."ID"=TO_NUMBER(:A1))                                                                                         


Statistics
----------------------------------------------------------                                                                                            
        367  recursive calls                                                                                                                          
          0  db block gets                                                                                                                            
       1024  consistent gets                                                                                                                          
        380  physical reads                                                                                                                           
          0  redo size                                                                                                                                
     121681  bytes sent via SQL*Net to client                                                                                                         
       7008  bytes received via SQL*Net from client                                                                                                   
        601  SQL*Net roundtrips to/from client                                                                                                        
          6  sorts (memory)                                                                                                                           
          0  sorts (disk)                                                                                                                             
       9000  rows processed
Now After adding Index, Oracle is able to provide the correct cardinality -CARD-9000 i.e 10 percent of the total number of records in the table. Even though i retrive only one tenth of records my index is not used, it is a full table scan.  This is different topic by itself,  lets get into cardinality issue in this blog. Though being a full table scan, posting adding an Index optimizer is able to provide the correct cardinality, Does this mean that the optimizer for full table scan uses the Index stats on avalibility ? If that is the case what happens if i drop the index ?         
SQL> drop index idx_fts_chk1;

Index dropped.

Elapsed: 00:00:00.46
SQL>
SQL> EXEC DBMS_STATS.gather_table_stats('HARI', 'TBL_FTS_CHK', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE);

PL/SQL procedure successfully completed.

Elapsed: 00:00:03.15
SQL>
SQL> SELECT * FROM TBL_FTS_CHK T2 WHERE T2.ID = :a1 AND T2.ID_VAL = :a2;

9000 rows selected.

Elapsed: 00:00:00.07

Execution Plan
----------------------------------------------------------                                                                                            
Plan hash value: 2042300665                                                                                                                           
                                                                                                                                                      
---------------------------------------------------------------------------------                                                                     
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |                                                                     
---------------------------------------------------------------------------------                                                                     
|   0 | SELECT STATEMENT  |             |   900 | 19800 |   103   (1)| 00:00:02 |                                                                     
|*  1 |  TABLE ACCESS FULL| TBL_FTS_CHK |   900 | 19800 |   103   (1)| 00:00:02 |                                                                     
---------------------------------------------------------------------------------                                                                     
                                                                                                                                                      
Predicate Information (identified by operation id):                                                                                                   
---------------------------------------------------                                                                                                   
                                                                                                                                                      
   1 - filter("T2"."ID_VAL"=:A2 AND "T2"."ID"=TO_NUMBER(:A1))                                                                                         


Statistics
----------------------------------------------------------                                                                                            
        169  recursive calls                                                                                                                          
          0  db block gets                                                                                                                            
        990  consistent gets                                                                                                                          
          0  physical reads                                                                                                                           
          0  redo size                                                                                                                                
     121681  bytes sent via SQL*Net to client                                                                                                         
       7008  bytes received via SQL*Net from client                                                                                                   
        601  SQL*Net roundtrips to/from client                                                                                                        
          5  sorts (memory)                                                                                                                           
          0  sorts (disk)                                                                                                                             
       9000  rows processed
 
Oh, back to 900 (1 percent of records) in the cardinality, Thus it is very evident that Oracle Full table scan uses Index stats to provide a better cardinality estimate. Now the question here is should i go for Index to help the optimizer to provide the correct cardinality estimate, Can't i just provide more details about these two columns ID and ID_VAL to the optimizer so that it helps in estimating the correct cardinality ?

           How should i inform the optimizer about these two columns? And that's all about Extended Statistics in Oracle 11g. First lets create the additional stats on the required columns using DBMS_STATS 
SQL> 
SQL> SELECT dbms_stats.create_extended_stats(OWNNAME=>'HARI',TABNAME=>'TBL_FTS_CHK',EXTENSION=>'(ID,ID_VAL)') FROM DUAL;

DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME=>'HARI',TABNAME=>'TBL_FTS_CHK',EXTENSION=>'(ID,ID_VAL)')                                                     
------------------------------------------------------------------------------------------------------------------------------------------------------
SYS_STUSBSF29XATPMX0WFZ7WKDVIV                                                                                                                        

Elapsed: 00:00:00.21


Collecting stats on this table,  
SQL> EXEC DBMS_STATS.gather_table_stats('HARI', 'TBL_FTS_CHK', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE);

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.95

          Note the dbms_stats is executed with default method_opt option - FOR ALL COLUMNS SIZE AUTO and auto sampling size. Now executing the query again
SQL> SELECT * FROM TBL_FTS_CHK T2 WHERE T2.ID = :a1 AND T2.ID_VAL = :a2;

9000 rows selected.

Elapsed: 00:00:00.07

Execution Plan
----------------------------------------------------------                                                                                            
Plan hash value: 2042300665                                                                                                                           
                                                                                                                                                      
---------------------------------------------------------------------------------                                                                     
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |                                                                     
---------------------------------------------------------------------------------                                                                     
|   0 | SELECT STATEMENT  |             |   900 | 20700 |   103   (1)| 00:00:02 |                                                                     
|*  1 |  TABLE ACCESS FULL| TBL_FTS_CHK |   900 | 20700 |   103   (1)| 00:00:02 |                                                                     
---------------------------------------------------------------------------------                                                                     
                                                                                                                                                      
Predicate Information (identified by operation id):                                                                                                   
---------------------------------------------------                                                                                                   
                                                                                                                                                      
   1 - filter("T2"."ID_VAL"=:A2 AND "T2"."ID"=TO_NUMBER(:A1))                                                                                         


Statistics
----------------------------------------------------------                                                                                            
          8  recursive calls                                                                                                                          
          0  db block gets                                                                                                                            
        974  consistent gets                                                                                                                          
          0  physical reads                                                                                                                           
          0  redo size                                                                                                                                
     121681  bytes sent via SQL*Net to client                                                                                                         
       7008  bytes received via SQL*Net from client                                                                                                   
        601  SQL*Net roundtrips to/from client                                                                                                        
          0  sorts (memory)                                                                                                                           
          0  sorts (disk)                                                                                                                             
       9000  rows processed                                                                                                                           

    oops still not the correct cardinality, lets try providing the literal instead of binds,


SQL> SELECT * FROM TBL_FTS_CHK T2 WHERE T2.ID = 5 AND T2.ID_VAL = 'ORACLE5';

9000 rows selected.

Elapsed: 00:00:00.07

Execution Plan
----------------------------------------------------------                                                                                            
Plan hash value: 2042300665                                                                                                                           
                                                                                                                                                      
---------------------------------------------------------------------------------                                                                     
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |                                                                     
---------------------------------------------------------------------------------                                                                     
|   0 | SELECT STATEMENT  |             |   993 | 22839 |   103   (1)| 00:00:02 |                                                                     
|*  1 |  TABLE ACCESS FULL| TBL_FTS_CHK |   993 | 22839 |   103   (1)| 00:00:02 |                                                                     
---------------------------------------------------------------------------------                                                                     
                                                                                                                                                      
Predicate Information (identified by operation id):                                                                                                   
---------------------------------------------------                                                                                                   
                                                                                                                                                      
   1 - filter("T2"."ID"=5 AND "T2"."ID_VAL"='ORACLE5')                                                                                                


Statistics
----------------------------------------------------------                                                                                            
          8  recursive calls                                                                                                                          
          0  db block gets                                                                                                                            
        974  consistent gets                                                                                                                          
          0  physical reads                                                                                                                           
          0  redo size                                                                                                                                
     121681  bytes sent via SQL*Net to client                                                                                                         
       7008  bytes received via SQL*Net from client                                                                                                   
        601  SQL*Net roundtrips to/from client                                                                                                        
          0  sorts (memory)                                                                                                                           
          0  sorts (disk)                                                                                                                             
       9000  rows processed                                                                              

Not yet, how about collecting stats with appropriate columns stats - method_opt => for all columns size 254

SQL> EXEC DBMS_STATS.gather_table_stats('HARI', 'TBL_FTS_CHK', estimate_percent => NULL,method_opt => 'for all columns size 254');

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.27
SQL> 
SQL> 
SQL> 
SQL> 
SQL> 
SQL> exec :a1:=5;

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL> exec :a2:='ORACLE5';

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL> 
SQL> SELECT * FROM TBL_FTS_CHK T2 WHERE T2.ID = :a1 AND T2.ID_VAL = :a2;

9000 rows selected.

Elapsed: 00:00:00.07

Execution Plan
----------------------------------------------------------                                                                                            
Plan hash value: 2042300665                                                                                                                           
                                                                                                                                                      
---------------------------------------------------------------------------------                                                                     
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |                                                                     
---------------------------------------------------------------------------------                                                                     
|   0 | SELECT STATEMENT  |             |  9000 |   202K|   103   (1)| 00:00:02 |                                                                     
|*  1 |  TABLE ACCESS FULL| TBL_FTS_CHK |  9000 |   202K|   103   (1)| 00:00:02 |                                                                     
---------------------------------------------------------------------------------                                                                     
                                                                                                                                                      
Predicate Information (identified by operation id):                                                                                                   
---------------------------------------------------                                                                                                   
                                                                                                                                                      
   1 - filter("T2"."ID_VAL"=:A2 AND "T2"."ID"=TO_NUMBER(:A1))                                                                                         


Statistics
----------------------------------------------------------                                                                                            
          0  recursive calls                                                                                                                          
          0  db block gets                                                                                                                            
        972  consistent gets                                                                                                                          
          0  physical reads                                                                                                                           
          0  redo size                                                                                                                                
     121681  bytes sent via SQL*Net to client                                                                                                         
       7008  bytes received via SQL*Net from client                                                                                                   
        601  SQL*Net roundtrips to/from client                                                                                                        
          0  sorts (memory)                                                                                                                           
          0  sorts (disk)                                                                                                                             
       9000  rows processed                                                                                                                           

SQL> SELECT * FROM TBL_FTS_CHK T2 WHERE T2.ID = 5 AND T2.ID_VAL = 'ORACLE5';

9000 rows selected.

Elapsed: 00:00:00.07

Execution Plan
----------------------------------------------------------                                                                                            
Plan hash value: 2042300665                                                                                                                           
                                                                                                                                                      
---------------------------------------------------------------------------------                                                                     
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |                                                                     
---------------------------------------------------------------------------------                                                                     
|   0 | SELECT STATEMENT  |             |  9000 |   202K|   103   (1)| 00:00:02 |                                                                     
|*  1 |  TABLE ACCESS FULL| TBL_FTS_CHK |  9000 |   202K|   103   (1)| 00:00:02 |                                                                     
---------------------------------------------------------------------------------                                                                     
                                                                                                                                                      
Predicate Information (identified by operation id):                                                                                                   
---------------------------------------------------                                                                                                   
                                                                                                                                                      
   1 - filter("T2"."ID"=5 AND "T2"."ID_VAL"='ORACLE5')                                                                                                


Statistics
----------------------------------------------------------                                                                                            
          0  recursive calls                                                                                                                          
          0  db block gets                                                                                                                            
        972  consistent gets                                                                                                                          
          0  physical reads                                                                                                                           
          0  redo size                                                                                                                                
     121681  bytes sent via SQL*Net to client                                                                                                         
       7008  bytes received via SQL*Net from client                                                                                                   
        601  SQL*Net roundtrips to/from client                                                                                                        
          0  sorts (memory)                                                                                                                           
          0  sorts (disk)                                                                                                                             
       9000  rows processed                                             


Finally the optimizer is now able to provide the correct cardinality and has extended stats for the correlation between these two columns ID and ID_VAL.And yes Oracle uses Index (stats) for Full table scan to provide better cardinality estimates.

No comments:

Post a Comment