Re: AUTO_SAMPLE_SIZE is every row

From: Mladen Gogala <gogala.mladen_at_gmail.com>
Date: Sat, 27 Jan 2018 13:40:01 -0500
Message-ID: <93276ce1-168b-0916-068a-9164193ecbfc_at_gmail.com>



Replies in-line

On 01/27/2018 10:12 AM, Neil Chandler wrote:
>
> Mladen,
>
>
> Sweeping generalisations are not helpful, and this is the 2nd time you
> have made this one, fishing for a response. So let me respond.
>

I haven't made any generalization, much less a "sweeping" one. My statement is that for some tables a full table scan is unacceptable. The very form of the sentence, "for some tables", tells you that this is not a generalization. And I am not fishing for a response, I'm just stating the obvious. I have a cold and I don't feel like having an argument today, much less a Dilbertian one like this. And you can consider me a narcissist.

>
> Some of the time a full table scan may be problematically slow, but it
> is simplistic to assume that sampling a fractional percentage will
> result in good overall outcomes.
>

Who said that? You are arguing against something I never said. There is nothing that will result in "good overall outcomes". Such a solution is popularly known as a "silver bullet", and its existence is widely disputed. Not even the Tuning Pack and SQL profiles will result in "good overall outcomes". That is why consultants like me still have their jobs.

>
> The full scan provides other benefits, by minimising additional work
> needed to be taken by the stats gathering job. It ensures increased
> accuracy of the stats, improves single-sample adaptive stat collection
> by understanding NULL distributions (and therefore sample escalations
> are proactively anticipated), and allows perfect frequency and top
> frequency histograms at marginal additional cost. It is needed for
> Hybrid histograms to replace the largely ineffectual Height-balanced
> histograms. It allows the use of approximate_ndv to lower the overall
> impact of the stats gather. All of those benefits are lost by
> implementing any other estimate_percent.
>

I have seen tables more than a TB in size. Good luck with a full table scan, and the benefits it supposedly produces, on a table that size.
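
Just to make the dispute concrete (BIG_TABLE is a made-up name, used purely for illustration), the two approaches we are arguing about are two values of a single DBMS_STATS parameter:

   -- the default: one full scan, approximate NDV, better histogram types
   EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'BIG_TABLE', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE)

   -- a fixed 1% sample: bounded scan time on a multi-TB table,
   -- at the cost of the benefits listed above
   EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'BIG_TABLE', estimate_percent => 1)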

>
> Given estimate_percent=auto_sample_size is the Oracle default, and has
> been through several major releases, there is a good chance that it
> has rather a lot of benefits. As a default, it's not going away.
>
>
> If your table is so large that the scan time is unacceptable, the
> better solution is more likely to be to implement partitioning and use
> incremental stats (or a locked-partition manual version), or maybe
> even look to manually fabricate your stats if you know how to do that.
> It depends upon your goals (and licensing).
>

And partitioning licenses are expensive. Oracle is the only major RDBMS vendor which charges for the right to create partitioned tables. And yes, I do know how to partition tables and manually fabricate the stats. However, that is no longer necessary. There is dynamic sampling, which can produce results of the same quality as DBMS_STATS. Furthermore, I don't see the need to be rude and personal with "if you know how to do that". Please keep the discussion civil, if you know how to do that.
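
To be concrete about dynamic sampling (BIG_TABLE, the alias t and the bind variable are made up, and the levels are only examples), it can be requested per query or per session:

   -- per query: the optimizer samples the table at hard parse time
   SELECT /*+ DYNAMIC_SAMPLING(t 4) */ COUNT(*)
     FROM big_table t
    WHERE status = :b1;

   -- per session; level 11 (12c and later) leaves the decision to the optimizer
   ALTER SESSION SET optimizer_dynamic_sampling = 11;

And if one does go the partitioning route, incremental statistics are a one-line table preference, e.g.:

   EXEC DBMS_STATS.SET_TABLE_PREFS(USER, 'BIG_TABLE', 'INCREMENTAL', 'TRUE')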

>
>
> If you are going to suggest alternatives, please ensure you list at
> least some of the benefits and disbenefits associated with the
> alternatives so people can make informed decisions about their
> approach.
>

I am not going to suggest any more alternatives than I already have. I don't really care about people making informed decisions; I am not running for office. I am a consultant, and I will help tune any SQL my customers require me to.

>
> regards
>
>
> Neil Chandler
>
> Database Guy.
>
>
>

-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217


--
http://www.freelists.org/webpage/oracle-l
Received on Sat Jan 27 2018 - 19:40:01 CET
