Re: AUTO_SAMPLE_SIZE is every row

From: Neil Chandler <neil_chandler_at_hotmail.com>
Date: Wed, 17 Jan 2018 18:15:32 +0000
Message-ID: <HE1PR1001MB12432EC827651472F0A5310185E90_at_HE1PR1001MB1243.EURPRD10.PROD.OUTLOOK.COM>



Do you have large partitioned tables, or large unpartitioned tables, or thousands of tables?

Using an estimate percentage, you trade accurate stats for less reliable stats, which means poorer execution plans and plan stability issues: the blocks, and therefore the values, sampled change every time you gather stats, so (rarer) values appear and disappear, cardinalities vary, and so on.

You might also find that sampling data is slower than the full table scan unless you are using a very low percentage. Choosing the estimate percentage correctly is an attempt to find the right balance for your system. You might also find Oracle arbitrarily increasing the sample size and resampling if it didn't get enough data back from the blocks it sampled.
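As a rough sketch of what I mean (GATHER_TABLE_STATS against a hypothetical FRED.BIG_TABLE, not your actual job), gather with a small fixed percentage and then look at what the dictionary recorded; repeat the gather and the numbers will typically drift:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'FRED',
    tabname          => 'BIG_TABLE',   -- hypothetical table name
    estimate_percent => 1,             -- fixed 1% sample; different blocks get read each run
    cascade          => TRUE);
END;
/

-- Column-level stats built from the sample: NDV, density and sample size
-- will usually vary between successive gathers at a fixed low percentage
SELECT column_name, num_distinct, density, sample_size
FROM   dba_tab_col_statistics
WHERE  owner = 'FRED'
AND    table_name = 'BIG_TABLE';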

regards

Neil Chandler

Database Guy.



From: Matt Adams <MAdams_at_equian.com>
Sent: 17 January 2018 17:48
To: Neil Chandler
Subject: RE: AUTO_SAMPLE_SIZE is every row

It’s 11g. Our reason is that the statistics generation job is taking 4 days for these very large schemas.

From: Neil Chandler [mailto:neil_chandler_at_hotmail.com]
Sent: Wednesday, January 17, 2018 12:43 PM
To: Matt Adams
Subject: Re: AUTO_SAMPLE_SIZE is every row

Matt,

That is correct.

"estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE" is the default setting from Oracle 11g, and it performs a full table scan to determine the table statistics very accurately.

If you are on Oracle 12c, this has the added advantage that the 2 new histogram types - Hybrid and Top Frequency - are allowed to be created. This is a very good thing, as the hybrid histograms are significantly better than the height-balanced histograms they replace. Additionally, the gathering of Frequency and Top Frequency histograms is effectively free when using the defaults, as it allows Oracle to use a "HyperLogLog" algorithm to get the approximate NDV values in a single pass. This also makes those histograms hugely accurate and stable compared to histograms created by performing adaptive sampling of data.
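As a minimal sketch (re-using the FRED schema name from your job, and assuming 12c), gathering with the default and then checking which histogram types were created might look like this:

-- Default gather: estimate_percent is simply left at DBMS_STATS.AUTO_SAMPLE_SIZE
BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'FRED');
END;
/

-- On 12c the HISTOGRAM column can show HYBRID and TOP-FREQUENCY
-- alongside FREQUENCY and the older HEIGHT BALANCED
SELECT table_name, column_name, num_distinct, histogram
FROM   dba_tab_col_statistics
WHERE  owner = 'FRED'
AND    histogram <> 'NONE'
ORDER BY table_name, column_name;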

I would recommend that you do not change from this default unless you have a good, *proven* reason for the change.

(Whether you want lots of histograms on your data is a very different question)

regards

Neil Chandler

Database Guy


From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on behalf of Matt Adams <MAdams_at_equian.com>
Sent: 17 January 2018 16:19
To: oracle-l_at_freelists.org
Subject: AUTO_SAMPLE_SIZE is every row

A previous DBA set up all our stats gathering jobs to be of the form:

BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname          => 'FRED',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);
END;

But for every table I can see, the SAMPLE_SIZE is the same as NUM_ROWS, which is the same as the actual number of rows in the table.
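(Roughly what I'm running to see that, with FRED standing in for the real schema owner:

SELECT table_name, num_rows, sample_size, last_analyzed
FROM   dba_tab_statistics
WHERE  owner = 'FRED'
ORDER BY table_name;

and sample_size matches num_rows on every row that comes back.)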

I read somewhere that the actual value of AUTO_SAMPLE_SIZE can vary from release to release (and patch to patch), but why on earth would it always be every row in the table?

I’m getting ready to change the estimate percentage to something more reasonable. Just wondering if there is something I’m missing somewhere that is influencing this behavior.

Matt

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Jan 17 2018 - 19:15:32 CET
