RE: Stats gather for big tables

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Wed, 14 Apr 2021 15:45:40 -0400
Message-ID: <235a01d73166$c05bb5e0$411321a0$_at_rsiz.com>



I think that is what JL wrote. It looks as if the scan time for the basic gather is 3.5 minutes and the minimal calculations done are essentially completed in slack time. IF you had a lot of complicated histograms and column groups, the compute time might make it take significantly longer than the scan time. So I *think* you are seeing the good case, but we cannot differentiate it from a bad case from your test results.  

I’m not sure of the best way to time just scanning the table. Perhaps select count with a full table scan hint.  

If you know the exact number of rows, possibly select some_unindexed_column from tab offset {actual_count-1}  

hmm. That might not be until oracle 12.  

If you run something like  

select n.name,s.value from sys.v_$mystat s, sys.v_$statname n

where n.statistic# = s.statistic#

     and n.name = ‘consistent gets’  

and then  

select count(*) from tab;  

and then run the query above again, you can see if it is scanning. ( if it can count some smaller index, you’ll get less )  

select count(some_unindexed_column) from tab probably works or you can hint it.  

if you set time on and set timing on in sqlplus, that will punk the basic timing wall clock and charged elapsed.      

mwf  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Pap Sent: Wednesday, April 14, 2021 2:11 PM
To: Jonathan Lewis
Cc: Oracle L
Subject: Re: Stats gather for big tables  

Tried creating a sample table having ~130million rows and ~18GBin size holding 8 range partitions without any indexes. And what i found is the time it takes to gather stats on this table with INCREMENTAL TRUE in a single thread is ~3.5minutes. And when I gathered stats for the first time, after adding a new column with method_opt as "for columns new_column size 1", it took the same time. I have also verified in dba_part_col_statstics the last_analyzed was only updated for the new column though, but it seems oracle still has to scan all the partitions and thus no relaxation in overall run time of the stats gather even if I restrict the gather to one column only.  

Regards

Pap  

On Tue, Mar 30, 2021 at 5:15 PM Jonathan Lewis <jlewisoracle_at_gmail.com> wrote:  

I would test something like the following:  

After adding the column set a table preference the table's method_op that includes the phrase "for column {new column name} size 1" so that Oracle doesn't (at least initially) gather a histogram.

Then - before the next auto gather - issue an explcit gather_table_stats restricting the stats collection to just that one column.  

The entire table will be scanned, the work load and update MIGHT restrict itself only to a sysnopsis gathering on the one column, and that might be something you can fit into the time-frame you need to avoid a massive collection across the whole table.  

I can't give you a detailed strategy - there are so many differences in what goes on in the different versions in different circumstances that it's always a question of coming up with an idea, thinking of possible drawbacks, then modelling (usually with extended tracing enabled).  

Regards

Jonathan Lewis    

P.S. If you are adding columns on a fairly regular basis, the way you populate them may make a big difference to how efficiently you can gather stats, and how long it will take to get to steady state. And if you have added several columns there's a fair chance that you are introduing a huge number of chained or migrated rows to your data and need to do something to clean up a very big mess.            

On Mon, 29 Mar 2021 at 19:56, Pap <oracle.developer35_at_gmail.com> wrote:

We are using incremental stats collection approach for gathering statistics on some big partition tables(having size ~20-30 TB). And thus we just have to pass table name to the stats gather block and rest is taken care by oracle from the synopsys information. Now we encountered a situation multiple times in which the team endup adding new columns and the stats collection job started from scratch running for days and then impacting other jobs. So want to understand if there is any better way of handling this situation? If we can by some way restrict Oracle to not scan all the data again and create synopsis from scratch rather utilize existing synopsis and just scan those delta column information which was added to the table? (Oracle version - 11.2.0.4)  

Regards

Pap

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Apr 14 2021 - 21:45:40 CEST

Original text of this message