Re: Collecting statistics for large established database

From: John Hurley <johnbhurley_at_sbcglobal.net>
Date: Wed, 19 Aug 2009 04:32:09 -0700 (PDT)
Message-ID: <620d7e2f-30f6-4555-a769-fda91f055da7_at_a13g2000yqc.googlegroups.com>



On Aug 19, 7:25 am, vsevolod afanassiev
<vsevolod.afanass..._at_gmail.com> wrote:

snip

> Let's say we have a relatively large database (2 TB or more). It is
> being used by an application that went live more than 3 years ago, so
> historic (transactional) tables contain more than 1000 days' worth of
> data. There have been no changes to the table structure, i.e. all
> tables are the same as 3 years ago. The tables are not partitioned.
> This is a classic OLTP application where most tables get populated by
> online users (call centre staff). One year we may have 1000 people in
> call centres, another year we may have 2000, but we are not going to
> have one million. So transactional tables experience linear growth.
>
> Question: why do we need to collect table/index statistics on a
> regular basis (i.e. daily or weekly)? At the end of the day the
> optimizer does not care about the number of records in a table; only
> relative values are important. Let's say we have table A with
> 1,000,000 rows and table B with 1,000 rows. One year later table A
> may have 1,500,000 rows and table B may have 2,000 rows. It shouldn't
> have any impact on the plan. Yes, it is possible that as a result of
> table growth a different plan may become slightly more efficient, but
> in a real-life scenario plan stability is more important.
>
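For what it's worth, it is easy to check what the optimizer currently
believes about those relative sizes, and how old that belief is (APP,
TABLE_A and TABLE_B are placeholder names here):

SELECT table_name, num_rows, last_analyzed
FROM   dba_tables
WHERE  owner = 'APP'
AND    table_name IN ('TABLE_A', 'TABLE_B');
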
> I think the only situation where regular statistics collection is
> important is when the plan depends on the MIN or MAX value in a
> column. A purge job may be executing something like
>
> DELETE FROM TABLEA WHERE CREATE_DATE < SYSDATE - 180
>
> If column CREATE_DATE is indexed then Oracle needs to know MIN
> (CREATE_DATE) to decide whether to use this index or not. But this is
> a rare situation. It may be simpler to add hints to a few statements
> that rely on MIN / MAX values than to collect statistics every day
> and risk plan instability.
>
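If you go the hint route, a minimal sketch might look like this (the
index name IX_TABLEA_CREATE_DATE is made up for illustration):

DELETE /*+ INDEX(t IX_TABLEA_CREATE_DATE) */
FROM   tablea t
WHERE  t.create_date < SYSDATE - 180;
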
> In 10g the default job for statistics collection relies on table
> monitoring. Tables get analyzed when the total number of inserted /
> updated / deleted rows exceeds 10%. There are two problems with this
> approach: (1) it makes statistics collection unpredictable from a DBA
> perspective; (2) the statement updating the internal table
> OBJECT_USAGE is responsible for a significant percentage (around 30%)
> of soft parses.
>
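As an aside, the counters that drive that 10% rule can be inspected
directly; something along these lines (APP is a placeholder schema)
shows them once the in-memory monitoring info has been flushed:

EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

SELECT table_name, inserts, updates, deletes, timestamp
FROM   dba_tab_modifications
WHERE  table_owner = 'APP'
ORDER  BY inserts + updates + deletes DESC;
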
> It seems to me that for established applications the best approach is
> to stop statistics collection (except for partitioned tables where
> new partitions are created every day/week/month etc). It will
> guarantee plan stability. It is possible that the performance of some
> statements will deteriorate, but it will happen gradually. With
> regular statistics collection there is always the threat of the
> optimizer choosing a sub-optimal plan. I have seen quite a few of
> these; typically they involve a switch from INDEX RANGE SCAN to INDEX
> FAST FULL SCAN, or from nested loops to a hash join - often much
> slower than the original plan.
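
If you really do want to freeze things on 10g, one possible sketch
(APP and SOME_PART_TABLE are placeholder names) is to lock the
existing statistics and disable the automatic job:

EXEC DBMS_STATS.LOCK_SCHEMA_STATS('APP');
EXEC DBMS_SCHEDULER.DISABLE('SYS.GATHER_STATS_JOB');

-- Partitioned tables that still need fresh stats can be unlocked
-- and gathered individually:
EXEC DBMS_STATS.UNLOCK_TABLE_STATS('APP', 'SOME_PART_TABLE');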

Take a look at dbms_stats.set_param('AUTOSTATS_TARGET','ORACLE') ...
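
If I remember right, setting it to 'ORACLE' limits the automatic job
to the dictionary (Oracle-owned) objects and leaves application
schemas alone - roughly:

EXEC DBMS_STATS.SET_PARAM('AUTOSTATS_TARGET', 'ORACLE');

-- Check the current setting:
SELECT DBMS_STATS.GET_PARAM('AUTOSTATS_TARGET') FROM dual;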
