Re: Auto stats gathering is not sufficient - what now?

From: Kellyn Pedersen <kjped1313_at_yahoo.com>
Date: Wed, 9 Feb 2011 12:32:42 -0800 (PST)
Message-ID: <401840.79944.qm_at_web120205.mail.ne1.yahoo.com>


Professionally speaking, I think you have listed a number of my reasons for 
moving away from the auto stats job for a specific object right here.  If the 
default job can not collect statistics often enough of the default choices are 
too large for the time allotment given to grant the CBO the information it 
requires to perform well for your customers.

Since pretty much all my environments are over 1TB, I have a preference for 
disabling the default auto stats job and running my own statistics gathering job 
based off the DBA_TAB_MODIFICATIONS table.  This will allow you to set small 
sample sizes and method options that make sense for your environment, (giving 
you more definitive control) and allowing you to set different objects, schemas, 
etc. to only update stats which are stale, nothing else is touched...)

The one thing you also need to take into consideration is how often you need 
updated stats.  I am a strong advocate in an environment that contains staging 
or reporting objects that there are times that the DBA is not the one who needs 
to be collecting statistics, but the developer is in their code.  If you have 
objects that are only loaded once per night and utilized once per night with 
10%> changes, then it might be times to start training the developers and 
working with them to take statistics when they matter most.  Developers want 
their code to run efficiently and I have not found a developer yet who did not 
appreciate a first class lesson in the CBO and stats collection.

If it makes sense to move the stats collection into the procedural code, I have 
them move it into the code.  I can not help them out if I collect stats AFTER 
they need really need it.

Let me know if you are interested, I'd be happy to send you my shell script to 
collect stats based on the DBA_TAB_MODIFICATIONS table-  you are welcome to it..


Kellyn Pedersen
Multi-Platform Database Administrator
www.pythian.com
http://www.linkedin.com/in/kellynpedersen
www.dbakevlar.com
 




________________________________
From: TJ Kiernan <tkiernan_at_pti-nps.com>
To: oracle-l_at_freelists.org
Cc: TJ Kiernan <tkiernan_at_pti-nps.com>
Sent: Wed, February 9, 2011 12:50:07 PM
Subject: Auto stats gathering is not sufficient - what now?


10.2.0.3 on 32-bit Windows 2003 R2
I have a reporting table that is too large to grow by 10% very often - 66 
million records growing by 250k per week = 24 weeks before stats go stale and 
are gathered, meanwhile queries against relatively recent data (last month, last 
quarter) get horrible execution plans unless we hint them.  For instance, from 
the example below, we have an index on (GROUP_KEY, DATE_PROCESSED) that would 
return this query in <1 second.  

 
If my predicate values were in range of the statistics, then I expect to get 
better plans, so the first thing I’m considering is a periodic job (probably 
weekly) to gather stats on this table.  

 
My question: What sorts of considerations should I make when setting up a 
non-standard stats gathering job?  Particularly METHOD_OPT, but with other 
parameters as well, what prompts you to step away from defaults?
 
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------------------

SQL_ID  0udsqttt83syw, child number 0
-------------------------------------
  SELECT /*+ gather_plan_statistics */        
         field1,
         field2,
         DATE_PROCESSED   
    FROM REPORTING_TABLE  
   WHERE GROUP_KEY = 1234 
     AND DATE_PROCESSED > to_date('25-DEC-2010', 'DD-MON-YYYY')
ORDER BY GROUP_KEY, DATE_PROCESSED
 
Plan hash value: 3444608443
 
--------------------------------------------------------------------------------------------------------------------

| Id  | Operation                   | Name              | Starts | E-Rows | 
A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------------

|*  1 |  TABLE ACCESS BY INDEX ROWID| REPORTING_TABLE   |      1 |      1 |     
28 |00:00:05.84 |     617K|    148K|
|*  2 |   INDEX RANGE SCAN          | RT_DATE_IDX       |      1 |      2 |   
1599K|00:00:28.81 |    6065 |   5828 |
--------------------------------------------------------------------------------------------------------------------

 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("GROUP_KEY"=1234)
   2 - access("DATE_PROCESSED">TO_DATE('2010-12-25 00:00:00', 'yyyy-mm-dd 
hh24:mi:ss'))
       filter("DATE_PROCESSED">TO_DATE('2010-12-25 00:00:00', 'yyyy-mm-dd 
hh24:mi:ss'))
 
****10053 trace*****
  Access Path: index (RangeScan)
    Index: RT_DATE_IDX
    resc_io: 4.00  resc_cpu: 31027
    ix_sel: 2.4406e-008  ix_sel_with_filters: 2.4406e-008
    Cost: 4.01  Resp: 4.01  Degree: 1
  Using prorated density: 2.4406e-008 of col #2 as selectivity of out-of-range 
value pred
  Using prorated density: 2.4406e-008 of col #2 as selectivity of out-of-range 
value pred
  Access Path: index (RangeScan)
    Index: RT_GROUP_DP_IDX
    resc_io: 5.00  resc_cpu: 36837
    ix_sel: 3.9615e-010  ix_sel_with_filters: 3.9615e-010
    Cost: 5.01  Resp: 5.01  Degree: 1
  Using prorated density: 2.4406e-008 of col #2 as selectivity of out-of-range 
value pred
 
T. J. Kiernan
Database Administrator
Pharmaceutical Technologies, Inc.
(402) 965-8800 ext 1039
tkiernan_at_pti-nps.com


      
--
http://www.freelists.org/webpage/oracle-l
Received on Wed Feb 09 2011 - 14:32:42 CST

Original text of this message