Re: searching for records with skewed data values (skewed data changes rapidly)

Home -> Community -> Usenet -> c.d.o.server -> Re: searching for records with skewed data values (skewed data changes rapidly)

From: <ctcgag_at_hotmail.com>
Date: 20 Aug 2004 01:50:58 GMT
Message-ID: <20040819215058.648$C0@newsreader.com>

mathewbutler_at_yahoo.com (Matt) wrote:
> Xho,
>
> >>Is the nature of the skew permanent (a few "A" and many "B",
> >>all the time), or does the skew oscillate (beginning of the month
> has
> >>many "A", few "B", middel of month has few "A", many "B")?
>
> Records arrive in rapid succession and are pushed through the various
> statuses. In the day there are typically 1% of pending records that
> are picked up (hopefully via the index) before some processing occurs
> and the status is then changed. So, the status changes probably happen
> every few seconds (depending on how fast the query is!)
>
> >>This doesn't make sense to me, unless the skewness oscillates
> wildly. The
> >>nature of statistics is, well, statistical. It doesn't matter
> whether any
> >>particular record is in some particular state at the time statistics
> are
> >>gathered.
>
> At the end of the day no more records arrive and the automatic
> processing of records ensure that all pending records are processed
> and their state changed. Thus, when stats are generated they don't
> (necessarily) include the pending state.

Right, OK. But then the histogram should reflect the absense of those statuses, and an absense should be even more selective than a rare presence. So the estimate error caused by the this would be in the other direction, i.e. away from FTS.

>
> >>-1) Verify that you are in fact collecting histogram data, rather
> than just
> >>table statistics without histograms.
>
> I think that this is the problem (see earlier post from me to Daniel).
> However, I am not clear how many histograms I should find in
> user_histograms when my dbms_stats call is defined correctly. I think
> some further reading and testing is required.

If the number of buckets you told it to use is more than the number of distinct values of the status, it will probably use value-based histograms, in which case you will find one row in user_histograms (on that table and column) for each distinct value of status. endpoint_value is the distinct value of status (cryptically interpreted into a number, if it is a string column), and endpoint_number should be the number of times that that status and all previous statuses occur.

Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service                        $9.95/Month 30GB

Received on Thu Aug 19 2004 - 20:50:58 CDT