Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Mailing Lists -> Oracle-L -> Re: Which one is good in terms of performance

Re: Which one is good in terms of performance

From: Tim Gorman <tim_at_evdbt.com>
Date: Mon, 27 Dec 2004 20:54:06 -0700
Message-ID: <BDF626EE.21594%tim@evdbt.com>


Spooky, isn't it? Block sampling really means rolling the dice, I think...

If you'd like the script I used to generate these test results, feel free t= o
email me offline. They're nothing special, but it might allow others a jumpstart to test on different data conditions...

on 12/27/04 11:48 AM, Jacques Kilchoer at Jacques.Kilchoer_at_quest.com wrote:

> If I read your table example right, then dbms_stats and block sampling ga=
ve
> worse results for 50% sampling than it did for 20%, .01%, .000001% sampli=
ng?

>=20

> -----Original Message-----
> From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org=
] On
> Behalf Of Tim Gorman
> Sent: vendredi, 24. d=E9cembre 2004 07:23
> To: 'Oracle-L (E-mail)
> Subject: Re: Which one is good in terms of performance
>=20

> Seema and Wolfgang,
> Here is some info I generated recently, to add to the discussion...
>=20

> My apologies for the length of the post, but feel free to delete if not
> interested...
>=20
>=20

> Accuracy when calculating table statistics with different sample sizes:
>>=20
> These are the cumulated averaged results from comparing statistics gather=
ed
> using both DBMS_STATS and ANALYZE TABLE against the baseline of performin=
g a
> COMPUTE using DBMS_STATS. For the first round, I concentrated on table
> statistics, and ignored column statistics (a.k.a. histograms) by using =B3F=
OR
> ALL COLUMNS SIZE 1=B2.
>=20

> I tested against four major tables in the Oracle E-Business Suite ERP
> system, of varying sizes (from small to humongous).
>=20
> For each of the table statistics gathered for the cost-based optimizer (i=
.e.
> the columns NUM_ROWS, BLOCKS, and AVG_ROW_LEN in DBA_TABLES), displayed i=
s
> the percentage deviation from the baseline of a COMPUTE using DBMS_STATS:
>=20

> Nbr of
> tables % Deviation %
> Deviation % Deviation
> LABEL sampled Nbr of Blks Nbr=
of

> Rows Avg Row Len
> -------------------------------------------------- ------- -----------
> ----------- -----------
> DBMS_STATS: COMPUTE, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.00 0.00
> DBMS_STATS: 10.000000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.02 0.00
> DBMS_STATS: 05.000000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.03 0.00
> DBMS_STATS: 01.000000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.10 0.00
> DBMS_STATS: 50.000000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.22 0.00
> DBMS_STATS: 10.000000% BLKSAMP, FOR ALL COLUMNS SI 4 0.00
> 0.15 0.11
> DBMS_STATS: 00.010000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.46 0.00
> DBMS_STATS: 20.000000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.58 0.00
> DBMS_STATS: 05.000000% BLKSAMP, FOR ALL COLUMNS SI 4 0.00
> 0.91 0.11
> DBMS_STATS: 00.000001%, FOR ALL COLUMNS SIZE 1 4 0.00
> 1.12 0.00
> DBMS_STATS: 20.000000% BLKSAMP, FOR ALL COLUMNS SI 4 0.00
> 1.35 0.00
> DBMS_STATS: 00.010000% BLKSAMP, FOR ALL COLUMNS SI 4 0.00
> 1.47 0.00
> DBMS_STATS: 00.000001% BLKSAMP, FOR ALL COLUMNS SI 4 0.00
> 1.58 0.00
> DBMS_STATS: 50.000000% BLKSAMP, FOR ALL COLUMNS SI 4 0.00
> 2.03 0.00
> DBMS_STATS: 01.000000% BLKSAMP, FOR ALL COLUMNS SI 4 0.00
> 4.61 0.00
> ANALYZE: 05.000000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.42 7.56
> ANALYZE: COMPUTE, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.00 8.34
> ANALYZE: 50.000000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.00 8.34
> ANALYZE: 20.000000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 0.62 7.73
> ANALYZE: 10.000000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 1.44 7.15
> ANALYZE: 01.000000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 7.24 8.62
> ANALYZE: 00.010000%, FOR ALL COLUMNS SIZE 1 4 0.00
> 7.24 8.62
> ANALYZE: 00.000001%, FOR ALL COLUMNS SIZE 1 4 0.00
> 7.24 8.62

>=20
> The sorting in this report is by the summarization of the three percentag=
e
> columns, from "most accurate" results to "least accurate" results.
>=20

> A couple things to note about the results shown here:
>=20
> 1. The value in BLOCKS is always completely accurate, regardless of sampl=
e
> size, because the value is not calculated at all, but is obtained directl=
y
> from the segment header
> 2. The very worst calculations from the DBMS_STATS package are (apparentl=
y)
> more accurate than the very best calculations from ANALYZE command
>> * The biggest cause seems to be vastly different results on AVG_ROW_LEN >>> * However, the method of calculating AVG_ROW_LEN by DBMS_STATS is easil= y
>>> exposed by performing SQL trace
>>>> * Method is:  select avg(vsize(col-1)) + avg(vsize(col-2)) + ... +
>>>> avg(vsize(col-N)) from table-name ...
>> * The method of calculating AVG_ROW_LEN by the ANALYZE command cannot be
>> determined using SQL trace

> 3. The worst deviation with the results from DBMS_STATS when calculating
> NUM_ROWS is 4.61%, while the worst deviation with the results from ANALYZ=
E
> is almost twice that, at 7.24%
> 4. The results from ANALYZE TABLE ... SAMPLE 1 PERCENT, SAMPLE 0.01 PERCE=
NT,
> and SAMPLE 0.000001 PERCENT are exactly the same, indicating the lowest
> valid sampling value for the ANALYZE command is 1 PERCENT
>> * DBMS_STATS allows 0.000001% sampling >> * DBMS_STATS also permits the use of BLOCK SAMPLING, which ANALYZE does = not
>=20

> Accuracy when calculating index statistics with different sample sizes:
>=20

> In contrast, where even the very worst results obtained by the smallest
> sample of table statistics is just 7.24% off from performing a COMPUTE
> operation, the results from using smaller samples when calculating index
> statistics get much worse very quickly as the sampling size decreases.
>=20

> Here is the results of testing against the nine indexes of the
> FND_CONCURRENT_REQUESTS table, which vary widely in terms of density, dat=
a
> characteristics, etc. The results are being sorted by the summation of t=
he
> percentage columns, so that the "most accurate" shows first and the "leas=
t
> accurate" shows last...

>=20
> Nbr of % Dev'n =
%
> Dev'n
> indexes % Dev'n % Dev'n % Dev'n Avg Leaf
> Avg Data % Dev'n % Dev'n
> LABEL sampled BLevel Leaf Blks Dist Keys Blks/Ke=
y
> Blks/Key Clu Fac # Idx Entries
> ------------------------------ ------- ------- --------- --------- ------=
--

> -------- ------- -------------
> DBMS_STATS: INDEX COMPUTE 9 0.00 0.00 0.00 0.0=
0
> 0.00 0.00 0.00
> DBMS_STATS: INDEX 05.000000% 9 0.00 0.00 22.13 14.8=
5
> 16.40 0.44 0.00
> DBMS_STATS: INDEX 20.000000% 9 0.00 0.00 25.39 24.2=
9
> 24.22 0.44 0.00
> DBMS_STATS: INDEX 10.000000% 9 0.00 0.00 25.56 16.6=
1
> 33.89 0.44 0.00
> DBMS_STATS: INDEX 00.010000% 9 0.00 0.00 27.45 20.1=
3
> 37.91 0.44 0.00
> DBMS_STATS: INDEX 50.000000% 9 0.00 0.00 26.42 21.5=
3
> 38.65 0.44 0.00
> DBMS_STATS: INDEX 01.000000% 9 0.00 0.00 31.17 27.9=
0
> 28.18 0.44 0.00
> DBMS_STATS: INDEX 00.000001% 9 0.00 0.00 27.81 22.4=
6
> 40.37 0.44 0.00
> ANALYZE: INDEX COMPUTE 9 0.00 595.30 0.00 239.8=
8
> 0.00 0.00 0.00
> ANALYZE: INDEX 50.000000% 9 0.00 596.27 23.55 329.3=
1
> 13.85 17.29 17.60
> ANALYZE: INDEX 01.000000% 9 8.33 530.13 32.28 353.8=
6
> 23.04 32.87 34.08
> ANALYZE: INDEX 00.010000% 9 8.33 530.13 32.28 353.8=
6
> 23.04 32.87 34.08
> ANALYZE: INDEX 00.000001% 9 8.33 530.13 32.28 353.8=
6
> 23.04 32.87 34.08
> ANALYZE: INDEX 20.000000% 9 0.00 596.27 32.97 551.4=
3
> 58.40 19.22 19.51
> ANALYZE: INDEX 05.000000% 9 0.00 596.27 40.77 854.4=
6
> 59.40 24.67 23.23
> ANALYZE: INDEX 10.000000% 9 0.00 596.27 40.86 792.7=
2
> 141.73 29.25 29.07
>=20
> First of all, there is the huge deviation the calculations of LEAF_BLOCKS=
by
> DBMS_STATS and by ANALYZE. Each command (DBMS_STATS or ANALYZE) is
> consistent, so one must be wrong and the other right. Unfortunately, in
> this case, it is the DBMS_STATS package that is uniformly wrong on this,
> because the results produced by the ANALYZE INDEX commands are corroborat=
ed
> by similar results from the ANALYZE INDEX ... VALIDATE STRUCTURE command,
> the DBMS_SPACE package, and the values stored in the DBA_SEGMENTS view. =
I
> searched MetaLink for a bug report related to DBMS_STATS and LEAF_BLOCKS,
> but could not find one. I intend to retest this against 9.2.0.5 and
> 10.1.0.3, when I get a chance. I=B9ll also test it against 9.2.0.1...
>=20
> A few more things to note about the results shown here:
>=20
> 1. DBMS_STATS.GATHER_INDEX_STATS and ANALYZE INDEX produce the exact same
> results when COMPUTEing
>> * except the LEAF_BLOCKS anomaly >>> * which also apparently causes an anomaly with AVG_LEAF_BLOCKS_PER_KEY
> 2. the ANALYZE INDEX ... COMPUTE results are corroborated by other
> utilities, so the ANALYZE INDEX command is more accurate in v9.2.0.4 than
> the DBMS_STATS.GATHER_INDEX_STATS command
> 3. The results from both DBMS_STATS and ANALYZE are wildly worse when
> performing an estimated sample, instead of compute
>=20
> That last point is the most important. Nonetheless, since the ANALYZE IN=
DEX
> ... COMPUTE command is clearly the most accurate, here is the same result
> data, this time using the results from ANALYZE INDEX .. COMPUTE STATISTIC=
S
> as the baseline instead of DBMS_STATS:
>>=20
> Nbr of % Dev'n =
%
> Dev'n
> indexes % Dev'n % Dev'n % Dev'n Avg Leaf
> Avg Data % Dev'n % Dev'n
> LABEL sampled BLevel Leaf Blks Dist Keys Blks/Ke=
y
> Blks/Key Clu Fac # Idx Entries
> ------------------------------ ------- ------- --------- --------- ------=
--
> -------- ------- -------------
> ANALYZE: INDEX COMPUTE 9 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00
> ANALYZE: INDEX 50.000000% 9 0.00 0.13 23.55 15.18
> 13.85 17.29 17.60
> DBMS_STATS: INDEX COMPUTE 9 0.00 85.04 0.00 41.30
> 0.00 0.00 0.00
> DBMS_STATS: INDEX 05.000000% 9 0.00 85.04 22.13 38.50
> 16.40 0.44 0.00
> DBMS_STATS: INDEX 20.000000% 9 0.00 85.04 25.39 36.70
> 24.22 0.44 0.00
> ANALYZE: INDEX 01.000000% 9 8.33 8.42 32.28 36.11
> 23.04 32.87 34.08
> ANALYZE: INDEX 00.010000% 9 8.33 8.42 32.28 36.11
> 23.04 32.87 34.08
> ANALYZE: INDEX 00.000001% 9 8.33 8.42 32.28 36.11
> 23.04 32.87 34.08
> DBMS_STATS: INDEX 01.000000% 9 0.00 85.04 31.17 36.24
> 28.18 0.44 0.00
> DBMS_STATS: INDEX 10.000000% 9 0.00 85.04 25.56 38.07
> 33.89 0.44 0.00
> ANALYZE: INDEX 20.000000% 9 0.00 0.13 32.97 53.44
> 58.40 19.22 19.51
> DBMS_STATS: INDEX 50.000000% 9 0.00 85.04 26.42 37.22
> 38.65 0.44 0.00
> DBMS_STATS: INDEX 00.010000% 9 0.00 85.04 27.45 37.53
> 37.91 0.44 0.00
> DBMS_STATS: INDEX 00.000001% 9 0.00 85.04 27.81 37.05
> 40.37 0.44 0.00
> ANALYZE: INDEX 05.000000% 9 0.00 0.13 40.77 107.16
> 59.40 24.67 23.23
> ANALYZE: INDEX 10.000000% 9 0.00 0.13 40.86 95.99
> 141.73 29.25 29.07
>=20
> Again, this perspective shows the two types of COMPUTE largely agreeing w=
ith
> each other (except for the LEAF_BLOCKS and the AVG_LEAF_BLOCKS_PER_KEY va=
lue
> derived from it), with all other results showing dramatically less accura=
te
> due to decreases in the sample size.
>=20
> For indexes, clearly computing yields the best results, while a bug in
> DBMS_STATS appears to be skewing the LEAF_BLOCK results in v9.2.0.4. maki=
ng
> the ANALYZE INDEX command the best alternative for that version.
>=20
> So, from this data, which is admittedly sparse and limited (needs more
> testing on a wider range of versions and applications), I think that one =
can
> conclude:
>=20
> * Perform the gathering of table/column and index statistics separately f=
rom
> one another
>> * Gather statistics on tables and columns in one operation using >> DBMS_STATS.GATHER_TABLE_STATS >>> * leave the CASCADE parameter at the default of FALSE >>> * use a low ESTIMATE_PERCENT and even use BLOCK_SAMPLE=3D>TRUE, if necess= ary >> * Gather statistics on indexes in another operation to either ANALYZE IN= DEX >> or >> DBMS_STATS.GATHER_INDEX_STATS >>> * only do COMPUTE
> * There is a bug in DBMS_STATS.GATHER_INDEX_STATS in 9.2.0.4 regarding th=
e
> calculation of LEAF_BLOCKS that needs to be explored further...
>=20 >> From this, I don=B9t think that one can conclude that it is better to ANAL= YZE
> INDEX ... COMPUTE instead of
> DBMS_STATS.GATHER_INDEX_STATS(ESTIMATE_PERCENT=3D>NULL), because I don=B9t kn=
ow
> if the anomaly found exists outside of 9.2.0.4...
>=20
> Hope this helps...
>=20 >=20 -- http://www.freelists.org/webpage/oracle-l
Received on Mon Dec 27 2004 - 21:48:57 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US