Re: Which one is good in terms of performance

From: Tim Gorman <tim_at_evdbt.com>
Date: Mon, 27 Dec 2004 20:54:06 -0700
Message-ID: <BDF626EE.21594%tim@evdbt.com>

Spooky, isn't it? Block sampling really means rolling the dice, I think...

If you'd like the script I used to generate these test results, feel free t= o
email me offline. They're nothing special, but it might allow others a jumpstart to test on different data conditions...

on 12/27/04 11:48 AM, Jacques Kilchoer at Jacques.Kilchoer_at_quest.com wrote:

> If I read your table example right, then dbms_stats and block sampling ga=
ve
> worse results for 50% sampling than it did for 20%, .01%, .000001% sampli=
ng?

>=20

> -----Original Message-----
> From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org=
] On
> Behalf Of Tim Gorman
> Sent: vendredi, 24. d=E9cembre 2004 07:23
> To: 'Oracle-L (E-mail)
> Subject: Re: Which one is good in terms of performance

>=20

> Seema and Wolfgang,
> Here is some info I generated recently, to add to the discussion...

>=20

> My apologies for the length of the post, but feel free to delete if not
> interested...

>=20
>=20

> Accuracy when calculating table statistics with different sample sizes:
>>=20
> These are the cumulated averaged results from comparing statistics gather=
ed
> using both DBMS_STATS and ANALYZE TABLE against the baseline of performin=
g a
> COMPUTE using DBMS_STATS. For the first round, I concentrated on table
> statistics, and ignored column statistics (a.k.a. histograms) by using =B3F=
OR
> ALL COLUMNS SIZE 1=B2.

>=20

> I tested against four major tables in the Oracle E-Business Suite ERP
> system, of varying sizes (from small to humongous).
>=20
> For each of the table statistics gathered for the cost-based optimizer (i=
.e.
> the columns NUM_ROWS, BLOCKS, and AVG_ROW_LEN in DBA_TABLES), displayed i=
s
> the percentage deviation from the baseline of a COMPUTE using DBMS_STATS:

>=20


>                                                  Nbr of

>                                                  tables % Deviation %

> Deviation % Deviation

> LABEL                                             sampled Nbr of Blks Nbr=

 of

> Rows Avg Row Len
> -------------------------------------------------- ------- -----------
> ----------- -----------
> DBMS_STATS: COMPUTE, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.00 0.00
> DBMS_STATS: 10.000000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.02 0.00
> DBMS_STATS: 05.000000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.03 0.00
> DBMS_STATS: 01.000000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.10 0.00
> DBMS_STATS: 50.000000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.22 0.00
> DBMS_STATS: 10.000000% BLKSAMP, FOR ALL COLUMNS SI 4 0.00
> 0.15 0.11
> DBMS_STATS: 00.010000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.46 0.00
> DBMS_STATS: 20.000000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.58 0.00
> DBMS_STATS: 05.000000% BLKSAMP, FOR ALL COLUMNS SI 4 0.00
> 0.91 0.11
> DBMS_STATS: 00.000001%, FOR ALL COLUMNS SIZE 1 4 0.00
> 1.12 0.00
> DBMS_STATS: 20.000000% BLKSAMP, FOR ALL COLUMNS SI 4 0.00
> 1.35 0.00
> DBMS_STATS: 00.010000% BLKSAMP, FOR ALL COLUMNS SI 4 0.00
> 1.47 0.00
> DBMS_STATS: 00.000001% BLKSAMP, FOR ALL COLUMNS SI 4 0.00
> 1.58 0.00
> DBMS_STATS: 50.000000% BLKSAMP, FOR ALL COLUMNS SI 4 0.00
> 2.03 0.00
> DBMS_STATS: 01.000000% BLKSAMP, FOR ALL COLUMNS SI 4 0.00
> 4.61 0.00
> ANALYZE: 05.000000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.42 7.56
> ANALYZE: COMPUTE, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.00 8.34
> ANALYZE: 50.000000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.00 8.34
> ANALYZE: 20.000000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.62 7.73
> ANALYZE: 10.000000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 1.44 7.15
> ANALYZE: 01.000000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 7.24 8.62
> ANALYZE: 00.010000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 7.24 8.62
> ANALYZE: 00.000001%, FOR ALL COLUMNS SIZE 1 4 0.00
> 7.24 8.62

>=20
> The sorting in this report is by the summarization of the three percentag=
e
> columns, from "most accurate" results to "least accurate" results.

>=20

> A couple things to note about the results shown here:
>=20
> 1. The value in BLOCKS is always completely accurate, regardless of sampl=
e
> size, because the value is not calculated at all, but is obtained directl=
y
> from the segment header
> 2. The very worst calculations from the DBMS_STATS package are (apparentl=
y)
> more accurate than the very best calculations from ANALYZE command
>> * The biggest cause seems to be vastly different results on AVG_ROW_LEN >>> * However, the method of calculating AVG_ROW_LEN by DBMS_STATS is easil= y

>>> exposed by performing SQL trace
>>>> * Method is:  select avg(vsize(col-1)) + avg(vsize(col-2)) + ... +
>>>> avg(vsize(col-N)) from table-name ...
>> * The method of calculating AVG_ROW_LEN by the ANALYZE command cannot be
>> determined using SQL trace

> 3. The worst deviation with the results from DBMS_STATS when calculating
> NUM_ROWS is 4.61%, while the worst deviation with the results from ANALYZ=
E
> is almost twice that, at 7.24%
> 4. The results from ANALYZE TABLE ... SAMPLE 1 PERCENT, SAMPLE 0.01 PERCE=
NT,
> and SAMPLE 0.000001 PERCENT are exactly the same, indicating the lowest
> valid sampling value for the ANALYZE command is 1 PERCENT
>> * DBMS_STATS allows 0.000001% sampling >> * DBMS_STATS also permits the use of BLOCK SAMPLING, which ANALYZE does = not

>=20

> Accuracy when calculating index statistics with different sample sizes:

>=20

> In contrast, where even the very worst results obtained by the smallest
> sample of table statistics is just 7.24% off from performing a COMPUTE
> operation, the results from using smaller samples when calculating index
> statistics get much worse very quickly as the sampling size decreases.

>=20

> Here is the results of testing against the nine indexes of the
> FND_CONCURRENT_REQUESTS table, which vary widely in terms of density, dat=
a
> characteristics, etc. The results are being sorted by the summation of t=
he
> percentage columns, so that the "most accurate" shows first and the "leas=
t
> accurate" shows last...

>=20
> Nbr of % Dev'n =
%
> Dev'n
> indexes % Dev'n % Dev'n % Dev'n Avg Leaf
> Avg Data % Dev'n % Dev'n
> LABEL sampled BLevel Leaf Blks Dist Keys Blks/Ke=
y
> Blks/Key Clu Fac # Idx Entries
> ------------------------------ ------- ------- --------- --------- ------=

--


> -------- ------- -------------

> DBMS_STATS: INDEX COMPUTE           9    0.00      0.00      0.00     0.0=

0


> 0.00   0.00          0.00

> DBMS_STATS: INDEX 05.000000%        9    0.00      0.00     22.13    14.8=

5


> 16.40   0.44          0.00

> DBMS_STATS: INDEX 20.000000%        9    0.00      0.00     25.39    24.2=

9


> 24.22   0.44          0.00

> DBMS_STATS: INDEX 10.000000%        9    0.00      0.00     25.56    16.6=

1


> 33.89   0.44          0.00

> DBMS_STATS: INDEX 00.010000%        9    0.00      0.00     27.45    20.1=

3


> 37.91   0.44          0.00

> DBMS_STATS: INDEX 50.000000%        9    0.00      0.00     26.42    21.5=

3


> 38.65   0.44          0.00

> DBMS_STATS: INDEX 01.000000%        9    0.00      0.00     31.17    27.9=

0


> 28.18   0.44          0.00

> DBMS_STATS: INDEX 00.000001%        9    0.00      0.00     27.81    22.4=

6


> 40.37   0.44          0.00

> ANALYZE: INDEX COMPUTE              9    0.00    595.30     0.00    239.8=

8


> 0.00   0.00          0.00

> ANALYZE: INDEX 50.000000%           9    0.00    596.27     23.55   329.3=

1


> 13.85  17.29         17.60

> ANALYZE: INDEX 01.000000%           9    8.33    530.13     32.28   353.8=

6


> 23.04  32.87         34.08

> ANALYZE: INDEX 00.010000%           9    8.33    530.13     32.28   353.8=

6


> 23.04  32.87         34.08

> ANALYZE: INDEX 00.000001%           9    8.33    530.13     32.28   353.8=

6


> 23.04  32.87         34.08

> ANALYZE: INDEX 20.000000%           9    0.00    596.27     32.97   551.4=

3


> 58.40  19.22         19.51

> ANALYZE: INDEX 05.000000%           9    0.00    596.27     40.77   854.4=

6


> 59.40  24.67         23.23

> ANALYZE: INDEX 10.000000%           9    0.00    596.27     40.86   792.7=

2


> 141.73  29.25         29.07

>=20


> First of all, there is the huge deviation the calculations of LEAF_BLOCKS=

 by


> DBMS_STATS and by ANALYZE. Each command (DBMS_STATS or ANALYZE) is

> consistent, so one must be wrong and the other right.  Unfortunately, in

> this case, it is the DBMS_STATS package that is uniformly wrong on this,

> because the results produced by the ANALYZE INDEX commands are corroborat=

ed


> by similar results from the ANALYZE INDEX ... VALIDATE STRUCTURE command,

> the DBMS_SPACE package, and the values stored in the DBA_SEGMENTS view.  =

I


> searched MetaLink for a bug report related to DBMS_STATS and LEAF_BLOCKS,

> but could not find one.  I intend to retest this against 9.2.0.5 and

> 10.1.0.3, when I get a chance.  I=B9ll also test it against 9.2.0.1...

>=20


> A few more things to note about the results shown here:

>=20


> 1. DBMS_STATS.GATHER_INDEX_STATS and ANALYZE INDEX produce the exact same

> results when COMPUTEing

>> * except the LEAF_BLOCKS anomaly
>>> * which also apparently causes an anomaly with AVG_LEAF_BLOCKS_PER_KEY


> 2. the ANALYZE INDEX ... COMPUTE results are corroborated by other

> utilities, so the ANALYZE INDEX command is more accurate in v9.2.0.4 than

> the DBMS_STATS.GATHER_INDEX_STATS command

> 3. The results from both DBMS_STATS and ANALYZE are wildly worse when

> performing an estimated sample, instead of compute

>=20


> That last point is the most important.  Nonetheless, since the ANALYZE IN=

DEX


> ... COMPUTE command is clearly the most accurate, here is the same result

> data, this time using the results from ANALYZE INDEX .. COMPUTE STATISTIC=

S


> as the baseline instead of DBMS_STATS:

>>=20


>                              Nbr of                             % Dev'n  =

%


> Dev'n

>                             indexes % Dev'n   % Dev'n   % Dev'n Avg Leaf

> Avg Data % Dev'n       % Dev'n

> LABEL                         sampled  BLevel Leaf Blks Dist Keys Blks/Ke=

y


> Blks/Key Clu Fac # Idx Entries

> ------------------------------ ------- ------- --------- --------- ------=

--


> -------- ------- -------------

> ANALYZE: INDEX COMPUTE              9    0.00     0.00      0.00    0.00

> 0.00    0.00          0.00

> ANALYZE: INDEX 50.000000%           9    0.00     0.13     23.55    15.18

> 13.85   17.29         17.60

> DBMS_STATS: INDEX COMPUTE           9    0.00    85.04      0.00    41.30

> 0.00    0.00          0.00

> DBMS_STATS: INDEX 05.000000%        9    0.00    85.04     22.13    38.50

> 16.40    0.44          0.00

> DBMS_STATS: INDEX 20.000000%        9    0.00    85.04     25.39    36.70

> 24.22    0.44          0.00

> ANALYZE: INDEX 01.000000%           9    8.33     8.42     32.28    36.11

> 23.04   32.87         34.08

> ANALYZE: INDEX 00.010000%           9    8.33     8.42     32.28    36.11

> 23.04   32.87         34.08

> ANALYZE: INDEX 00.000001%           9    8.33     8.42     32.28    36.11

> 23.04   32.87         34.08

> DBMS_STATS: INDEX 01.000000%        9    0.00    85.04     31.17    36.24

> 28.18    0.44          0.00

> DBMS_STATS: INDEX 10.000000%        9    0.00    85.04     25.56    38.07

> 33.89    0.44          0.00

> ANALYZE: INDEX 20.000000%           9    0.00     0.13     32.97    53.44

> 58.40   19.22         19.51

> DBMS_STATS: INDEX 50.000000%        9    0.00    85.04     26.42    37.22

> 38.65    0.44          0.00

> DBMS_STATS: INDEX 00.010000%        9    0.00    85.04     27.45    37.53

> 37.91    0.44          0.00

> DBMS_STATS: INDEX 00.000001%        9    0.00    85.04     27.81    37.05

> 40.37    0.44          0.00

> ANALYZE: INDEX 05.000000%           9    0.00     0.13     40.77   107.16

> 59.40   24.67         23.23

> ANALYZE: INDEX 10.000000%           9    0.00     0.13     40.86    95.99

> 141.73   29.25         29.07

>=20


> Again, this perspective shows the two types of COMPUTE largely agreeing w=

ith


> each other (except for the LEAF_BLOCKS and the AVG_LEAF_BLOCKS_PER_KEY va=

lue


> derived from it), with all other results showing dramatically less accura=

te


> due to decreases in the sample size.

>=20


> For indexes, clearly computing yields the best results, while a bug in

> DBMS_STATS appears to be skewing the LEAF_BLOCK results in v9.2.0.4. maki=

ng


> the ANALYZE INDEX command the best alternative for that version.

>=20


> So, from this data, which is admittedly sparse and limited (needs more

> testing on a wider range of versions and applications), I think that one =

can


> conclude:

>=20


> * Perform the gathering of table/column and index statistics separately f=

rom


> one another

>> * Gather statistics on tables and columns in one operation using
>> DBMS_STATS.GATHER_TABLE_STATS
>>> * leave the CASCADE parameter at the default of FALSE
>>> * use a low ESTIMATE_PERCENT and even use BLOCK_SAMPLE=3D>TRUE, if necess=
ary
>> * Gather statistics on indexes in another operation to either ANALYZE IN=
DEX
>> or
>> DBMS_STATS.GATHER_INDEX_STATS
>>> * only do COMPUTE


> * There is a bug in DBMS_STATS.GATHER_INDEX_STATS in 9.2.0.4 regarding th=

e


> calculation of LEAF_BLOCKS that needs to be explored further...

>=20
>> From this, I don=B9t think that one can conclude that it is better to ANAL=
YZE


> INDEX ... COMPUTE instead of

> DBMS_STATS.GATHER_INDEX_STATS(ESTIMATE_PERCENT=3D>NULL), because I don=B9t kn=

ow


> if the anomaly found exists outside of 9.2.0.4...

>=20


> Hope this helps...

>=20
>=20

--
http://www.freelists.org/webpage/oracle-l

Received on Mon Dec 27 2004 - 21:48:57 CST