Re: dbms_stats default

From: John Darrah <nospam_at_nospam.net>
Date: Sat, 28 Dec 2002 22:46:52 GMT
Message-ID: <wRpP9.485312$P31.157829@rwcrnsc53>

Sorry for the vague response. The 3X increase was on the analyze operation not queries against the table. I thought there was a difference between gathering column stats and creating histograms. I thought that gathering stats with a size of 1 meant that I would only be populating the columns in the col$ table and not be creating histograms. It turns out that gathering for columns does create a histogram on that column. It seems strange that the default behavior of the dbms_stats package would be to create a one bucket histogram for every column in the table but I guess the rational is to err on the side of too much information rather than not enough. I wasn't aware that the optimizer was sophisticated enough to order predicates according by what will most reduce the row source, that's good to know. Would your strategy be to gather table stats, without gathering column stats and then selectively gather stats on columns with appropriately sized histograms? The only bad thing I can see with this method is that 1) the avg_row_len of the table may be inaccurate and 2) there will be columns in the table with no stats which could pose a problem for future queries. Finally, I just wanted to say how much I enjoyed your seminar. Its made me want to go back and take a more in-depth look at a lot of the things I thought I understood.

Thanks for the reply,

John
"Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> wrote in message news:auejup$gu5$1$8302bc10_at_news.demon.co.uk...
>
> >For wide tables, I saw a 3X increase in response time when gathering
> stats
> >for all columns in the table vs just those that are indexed.
>
> Could you clarify this - did the stats gathering take
> three times as long or did the subsequent queries
> take three times as long ?
>
> If you collect massively more column stats, it should not be
> a surprise to find that all the extra sorting and counting
> adds time to the stats collection.
>
> If you have the column stats, your queries MAY run quicker,
> they MAY run slower.
>
> The drawback to collecting REDUNDANT column stats
> is that they have to be loaded into the dictionary cache,
> and they have to be considered when the query is
> optimised. This waste shared pool space and CPU.
>
> On the other hand, if the statistics add value about
> the data - which does mean EVEN on unindexed
> columns - queries can run faster.
>
> Consider the unindexed predicates:
> date1 between 1st Jan 2003 and 31st Jan 2003
> and date2 between 1st Jan 2002 and 31st Jan 2002
>
> Assume that almost all the data has date1 in Jan 2003,
> and date2 in Dec 2002. In this case it would be more
> efficient to run the date2 test before the date1 test, as
> this would filter out all but a very few rows for the second
> (CPU costly) test. (9.2 execution plans can show this
> type of calculation taking place, by the way)
>
>
> Bottom line - most systems should have only a few
> column histograms in place, but there is no obvious
> requirement that they should only be on indexed
> columns.
>
> --
> Regards
>
> Jonathan Lewis
> http://www.jlcomp.demon.co.uk
>
> Coming soon a new one-day tutorial:
> Cost Based Optimisation
> (see http://www.jlcomp.demon.co.uk/tutorial.html )
>
> Next Seminar dates:
> (see http://www.jlcomp.demon.co.uk/seminar.html )
>
> ____England______January 21/23
>
>
> The Co-operative Oracle Users' FAQ
> http://www.jlcomp.demon.co.uk/faq/ind_faq.html
>
>
>
>
>
> John Darrah wrote in message ...
> >I use indexed columns and non-indexed columns in WHERE clauses. My
> question
> >is, for non indexed-columns, is gathering stats on density low val
> high val
> >and distinct values going to have any affect on the access plan
> generated?
> >It seems unlikely that it would since Oracle has 1 and only 1 way to
> access
> >that data, through a table access. The reason I want to know this is
> because
> >using the default method produces sql similar to the following:
> >
> >SELECT count(*),count(col1),count(distinct
> >col1),min(col1),max(col1),count(col2),count(distinct
> col2),min(col2),...
> >
> >For wide tables, I saw a 3X increase in response time when gathering
> stats
> >for all columns in the table vs just those that are indexed.
>
>
>
Received on Sat Dec 28 2002 - 16:46:52 CST