Collecting statistics for large established database

From: vsevolod afanassiev <vsevolod.afanassiev_at_gmail.com>
Date: Wed, 19 Aug 2009 04:25:31 -0700 (PDT)
Message-ID: <135bad08-1a1b-4e45-bc04-bf5a2446a9ca_at_q40g2000prh.googlegroups.com>



Let's say we have a relatively large database (2 TB or more). It is being used by an application that went live more than 3 years ago, so historic (transactional) tables contain more than 1000 days' worth of data. There have been no changes to the table structure, i.e. all tables are the same as 3 years ago. The tables are not partitioned. This is a classic OLTP application where most tables get populated by online users (call centre staff). One year we may have 1000 people in call centres, another year we may have 2000, but we are not going to have one million. So transactional tables experience linear growth.

Question: why do we need to collect table/index statistics on a regular basis (i.e. daily or weekly)? At the end of the day the optimizer does not care about the absolute number of records in a table, only the relative values are important. Let's say we have table A with 1,000,000 rows and table B with 1,000 rows. One year later table A may have 1,500,000 rows and table B may have 2,000 rows. It shouldn't have any impact on the plan. Yes, it is possible that as a result of table growth a different plan may become slightly more efficient, but in a real-life scenario plan stability is more important.
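
As a sanity check, the drift can be measured by comparing the row counts recorded at the last analyze with the current counts - a minimal sketch, assuming DBA access and that TABLEA/TABLEB stand in for the real table names:

SELECT table_name, num_rows, last_analyzed
FROM   dba_tables
WHERE  table_name IN ('TABLEA', 'TABLEB');

SELECT COUNT(*) FROM TABLEA;   -- compare against NUM_ROWS recorded above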

I think the only situation where regular statistics collection is important is when the plan depends on the MIN or MAX value in a column. For example, a purge job may be executing something like:

DELETE FROM TABLEA WHERE CREATE_DATE < SYSDATE - 180

If column CREATE_DATE is indexed then Oracle needs to know MIN(CREATE_DATE) to decide whether to use this index or not. But this is a rare situation. It may be simpler to add hints to the few statements that rely on a MIN/MAX value than to collect statistics every day and risk plan instability.
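
For example, the purge statement above could be pinned to the index with a hint, so the plan no longer depends on how fresh the statistics on CREATE_DATE are (TABLEA_CREATE_DATE_IX is a made-up index name, a sketch only):

DELETE /*+ INDEX(T TABLEA_CREATE_DATE_IX) */ FROM TABLEA T
WHERE CREATE_DATE < SYSDATE - 180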

In 10g the default job for statistics collection relies on table monitoring: tables get analyzed when the total number of inserted/updated/deleted rows exceeds 10% of the table. There are two problems with this approach: (1) it makes statistics collection unpredictable from a DBA perspective; (2) the statement updating the internal table OBJECT_USAGE is responsible for a significant percentage (around 30%) of soft parses.
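
For reference, a rough way to see which tables the monitoring mechanism would currently flag (assuming DBA access; the 10% comparison against NUM_ROWS is only approximate, and stale NUM_ROWS makes it more so):

EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;   -- push in-memory DML counters to the dictionary

SELECT m.table_name, m.inserts, m.updates, m.deletes, t.num_rows
FROM   dba_tab_modifications m
       JOIN dba_tables t
         ON t.owner = m.table_owner
        AND t.table_name = m.table_name
WHERE  t.num_rows > 0
AND    m.inserts + m.updates + m.deletes > 0.1 * t.num_rows;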

It seems to me that for established applications the best approach is to stop statistics collection (except for partitioned tables where new partitions are created every day/week/month, etc). It will guarantee plan stability. It is possible that the performance of some statements will deteriorate, but it will happen gradually. With regular statistics collection there is always the threat of the optimizer choosing a suboptimal plan. I have seen quite a few of these; typically they involve a switch from INDEX RANGE SCAN to INDEX FAST FULL SCAN, or from nested loops to a hash join, etc - often much slower than the original plan.
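
A rough sketch of what that could look like in 10g (GATHER_STATS_JOB is the 10g automatic job; the schema name APPOWNER, the table names and the partition name are placeholders, not a recommendation for any particular system):

EXEC DBMS_SCHEDULER.DISABLE('SYS.GATHER_STATS_JOB');   -- stop the automatic nightly gather

-- optionally lock statistics on individual tables so ad-hoc gathers leave them alone
EXEC DBMS_STATS.LOCK_TABLE_STATS('APPOWNER', 'TABLEA');

-- partitioned tables can still be gathered explicitly when a new partition is added
EXEC DBMS_STATS.GATHER_TABLE_STATS('APPOWNER', 'SALES_HIST', partname => 'P_2009_08', granularity => 'PARTITION');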
