Re: Collecting statistics for large established database

From: Shakespeare <whatsin_at_xs4all.nl>
Date: Wed, 19 Aug 2009 18:44:39 +0200
Message-ID: <4a8c2bfc$0$184$e4fe514c_at_news.xs4all.nl>



Helma wrote:
> On Aug 19, 1:25 pm, vsevolod afanassiev
> <vsevolod.afanass..._at_gmail.com> wrote:

>> Let's say we have a relatively large database (2 TB or more). It is
>> being used by an application that went live more than 3 years ago, so
>> historic (transactional) tables contain more than 1000 days' worth of
>> data. There have been no changes to the table structure, i.e. all
>> tables are the same as 3 years ago. The tables are not partitioned.
>> This is a classic OLTP application where most tables get populated by
>> online users (call centre staff). One year we may have 1000 people in
>> call centres, another year we may have 2000, but we are not going to
>> have one million. So transactional tables experience linear growth.
>>
>> Question: why do we need to collect table/index statistics on a
>> regular basis (i.e. daily or weekly)? At the end of the day the
>> optimizer does not care about the absolute number of records in a
>> table; only relative values are important. Let's say we have table A
>> with 1,000,000 rows and table B with 1,000 rows. One year later table
>> A may have 1,500,000 rows and table B may have 2,000 rows. It
>> shouldn't have any impact on the plan. Yes, it is possible that as a
>> result of table growth a different plan may become slightly more
>> efficient, but in a real-life scenario plan stability is more
>> important.
>>
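
One way to sanity-check this 'relative values' claim is to compare the optimizer's cardinality estimates before and after a period of growth. A minimal sketch, assuming a hypothetical table TABLEA and the standard EXPLAIN PLAN / DBMS_XPLAN tools:

   -- Capture the plan, including the optimizer's estimated row counts
   EXPLAIN PLAN FOR
     SELECT * FROM TABLEA WHERE CREATE_DATE > SYSDATE - 30;

   SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

If only the Rows column scales up while the join order and access paths stay the same, the plans really are stable in the sense described above.
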
>> I think the only situation where regular statistics collection is
>> important is when the plan depends on the MIN or MAX value in a
>> column. A purge job may be executing something like
>>
>> DELETE FROM TABLEA WHERE CREATE_DATE < SYSDATE - 180
>>
>> If column CREATE_DATE is indexed then Oracle needs to know MIN
>> (CREATE_DATE) to decide whether to use this index or not. But this is
>> a rare situation. It may be simpler to add hints to the few statements
>> that rely on a MIN / MAX value than to collect statistics every day
>> and risk plan instability.
>>
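If the MIN / MAX case really is the only one that matters, pinning the access path with a hint avoids the daily gather, as suggested above. A minimal sketch, assuming a hypothetical index TABLEA_CD_IX on CREATE_DATE:

   -- Force the range scan on the CREATE_DATE index regardless of stats
   DELETE /*+ INDEX(TABLEA TABLEA_CD_IX) */ FROM TABLEA
   WHERE CREATE_DATE < SYSDATE - 180;
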
>> In 10g the default job for statistics collection relies on table
>> monitoring. Tables get analyzed when the total number of inserted /
>> updated / deleted rows exceeds 10% of the row count. There are two
>> problems with this approach: (1) it makes statistics collection
>> unpredictable from the DBA's perspective; (2) the statement updating
>> the internal table OBJECT_USAGE is responsible for a significant
>> percentage (around 30%) of soft parses.
>>
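For what it's worth, the 10% threshold is at least observable: table monitoring feeds DBA_TAB_MODIFICATIONS, so a DBA can check how close each table is to being re-analyzed. A rough sketch (the flush call writes the in-memory counters out first; 'APP' is a hypothetical schema name):

   BEGIN
     DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
   END;
   /

   -- Accumulated DML per table since its statistics were last gathered
   SELECT table_name, inserts, updates, deletes, timestamp
   FROM   dba_tab_modifications
   WHERE  table_owner = 'APP';
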
>> It seems to me that for established applications the best approach is
>> to stop statistics collection (except for partitioned tables where new
>> partitions are created every day/week/month etc). It will guarantee
>> plan stability. It is possible that the performance of some statements
>> will deteriorate, but it will happen gradually. With regular
>> statistics collection there is always the threat of the optimizer
>> choosing a sub-optimal plan. I have seen quite a few of these;
>> typically it involves a switch from an INDEX RANGE SCAN to an INDEX
>> FAST FULL SCAN, or from nested loops to a hash join, etc. - often much
>> slower than the original plan.
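
Note that 10g has a supported way to do what is proposed here without disabling the gather job: locked statistics, which GATHER_STATS_JOB simply skips. A minimal sketch, assuming a hypothetical schema APP with one partitioned table that should stay dynamic:

   BEGIN
     -- Freeze statistics for the whole schema...
     DBMS_STATS.LOCK_SCHEMA_STATS('APP');
     -- ...but keep gathering on the partitioned table
     DBMS_STATS.UNLOCK_TABLE_STATS('APP', 'SALES_HIST');  -- hypothetical
   END;
   /
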
> 
> I have worked in a shop where the Oracle guru had the same idea as
> you. He didn't update his statistics FOR 3 YEARS! Of course,
> tremendous performance problems arose, which he tried to heal by
> creating new indexes.
> He indeed thought that not updating the statistics would lead to plan
> stability (like you), and if he had been told your story about
> 'relative values' he would probably have agreed.
> 
> It seems to me that you do not have enough information on how the
> optimizer works, and your fantasy is filling in the holes. I found
> this link very useful for info on the optimizer. For someone with
> questions like yours, I think it's a must-read:
> http://www.centrexcc.com/papers.html

He should *at least* have updated his statistics after the first new year... I bet there were indexed 'year' columns. The first entries with a new year's value can dramatically degrade performance if statistics aren't updated: the stored column statistics still show the old year as the maximum, so the optimizer badly misestimates the number of rows matching the new value and may end up choosing a full table scan for queries that should touch only a handful of rows.
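
The cheap fix in that situation is not a full re-gather but refreshing the high/low values on the affected column. A rough sketch, assuming a hypothetical ORDERS table with a YEAR column:

   BEGIN
     -- Refresh the YEAR column stats so the new maximum is known;
     -- SIZE 1 means plain min/max/distinct values, no histogram
     DBMS_STATS.GATHER_TABLE_STATS(
       ownname    => 'APP',       -- hypothetical schema
       tabname    => 'ORDERS',
       method_opt => 'FOR COLUMNS YEAR SIZE 1',
       cascade    => FALSE);
   END;
   /
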
I saw the same thing happen when a company opened a second office. Location had always been indexed, but as long as there was only one location the index was useless... until records started coming in with the second location, while the statistics still said the column had only one distinct value. Unfortunately, location was the FIRST column in many indexes (multi-org design...).
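
A skewed column like that is exactly what histograms are for: once a second location exists with a very different row count, a frequency histogram lets the optimizer treat the two values differently. A minimal sketch with the same hypothetical names as above:

   BEGIN
     -- Build a histogram on LOCATION (up to 254 buckets in 10g)
     DBMS_STATS.GATHER_TABLE_STATS(
       ownname    => 'APP',
       tabname    => 'ORDERS',
       method_opt => 'FOR COLUMNS LOCATION SIZE 254');
   END;
   /

With the histogram in place, a predicate on the rare location can use the index while the dominant one still gets a full scan.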

Shakespeare
