Re: dbms_stats.gather_table_stats

From: Greg Rahn <greg_at_structureddata.org>
Date: Fri, 5 Sep 2008 14:59:41 -0700
Message-ID: <a9c093440809051459i473e3dd4k4b2e1a9865c4dbe3@mail.gmail.com>


My first comment is that you are probably spending more time gathering stats than you need to be. Here is the reason why: If you are using the command you listed below, the OPTIONS parameter is defaulted (to GATHER). GATHER gathers ALL statistics on every table/partition, whether they are stale or not. If you have tables that are partitioned and many of those partitions are nonvolatile, it is unlikely that the data has changed *significantly* enough to matter (stats are still representative), so why regather stats on them? This is the case for most time series data warehouses.
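For instance, one common approach (a sketch only - schema name and parameter values are placeholders, and GATHER STALE relies on table monitoring, which is enabled by default in 10g) is to gather at the schema level and let Oracle skip objects whose data has not changed enough to matter:

```sql
begin
  dbms_stats.gather_schema_stats(
    ownname          => 'SCHEMA',        -- placeholder schema name
    options          => 'GATHER STALE',  -- skip tables/partitions that are not stale
    estimate_percent => dbms_stats.auto_sample_size,
    degree           => 8,
    cascade          => true);
end;
/
```

With this, nonvolatile historical partitions are left alone until their change volume crosses the staleness threshold (roughly 10% of rows by default).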

Also, with the command you are using, GRANULARITY defaults to 'AUTO' (unless you changed the default with dbms_stats.set_param), so when you specify a partition you are 1) gathering stats for that specific partition, 2) gathering stats for the entire table (global), and 3) gathering stats for the indexes (cascade). If you want to gather only partition stats and not update the global stats, use GRANULARITY=>'PARTITION'. WRT the time difference, it may be that the first run did only a single pass because it gathered stats for every partition and used those samples for the global stats as well. This may be an optimization for this case (but I would have to verify). That would explain why the single-partition gathers combined take a bit longer than the first run - each needs to do 1 partition + the full table, where the first run does only the full table.
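So a partition-only gather would look something like this (owner, table, and partition names are placeholders for your own):

```sql
begin
  dbms_stats.gather_table_stats(
    ownname          => 'OWNER',
    tabname          => 'TABLE_NAME',
    partname         => 'PART1',
    estimate_percent => 25,
    degree           => 8,
    cascade          => true,
    granularity      => 'PARTITION');  -- partition-level stats only; global table stats are not regathered
end;
/
```

Keep in mind that if you never refresh the global stats, queries that do not prune to a single partition will keep using the older table-level stats, so you still want to refresh them periodically.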

Also, 25% seems like quite a large ESTIMATE_PERCENT. Do you require such a large sample to get representative stats? Does dbms_stats.auto_sample_size work?
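That is, instead of the hard-coded 25, try letting Oracle choose the sample size (again, owner/table names below are placeholders):

```sql
begin
  dbms_stats.gather_table_stats(
    ownname          => 'OWNER',
    tabname          => 'TABLE_NAME',
    estimate_percent => dbms_stats.auto_sample_size,  -- let Oracle size the sample
    degree           => 8,
    cascade          => true);
end;
/
```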

I have some more notes on this subject on my blog: http://structureddata.org/2008/03/26/choosing-an-optimal-stats-gathering-strategy/

Hope that helps.

On Fri, Sep 5, 2008 at 12:00 PM, <genegurevich_at_discover.com> wrote:
> Hi everybody:
>
> I am running oracle 10.2.0.1. Once a week we have an analyze process run to
> gather stats on all the
> tables in the schema via commands like this one exec
> dbms_stats.gather_table_stats( 'schema', 'table', null, 25,
> degree=>8, cascade=>true );
>
> Some of the tables are partitioned and we have 6 or 7 historical partitions
> which are not being updated anymore
> so I thought I would save time by restricting this command to only last two
> partitions:
>
> exec dbms_stats.gather_table_stats( 'owner', 'table', 'part1', 25,
> degree=>8, cascade=>true );
> exec dbms_stats.gather_table_stats( 'owner', 'table', 'part2', 25,
> degree=>8, cascade=>true );
>
> The partitions are roughly the same size. The indices are globally
> partitioned. The first command (analyze
> the whole table) took 25 minutes. The last two commands combined - 27 min.
> Considering that there are 5 more partitions
> in the table, the math does not add up. What am I missing?
>
>
> thank you
>
> Gene Gurevich
>
>
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>

-- 
Regards,
Greg Rahn
http://structureddata.org
Received on Fri Sep 05 2008 - 16:59:41 CDT