| Oracle FAQ | Your Portal to the Oracle Knowledge Grid | |
Home -> Community -> Mailing Lists -> Oracle-L -> Re: RE: RULE versus CHOOSE
Vivek:
Skewed data is data that is not evenly distributed in a column. Let me try to explain. Let's say you have a shipping company that does business throughout the US (in all 50 states) and you have a shipping table with address to customer that has a STATE column with the 2 letter state abbreviation as part of their address. You have done business in all 50 states, but most of your business comes from the 6 states that are in your region of the country. Therefore, statistically speaking, those 6 state abbreviations are represented an abnormally high number of times; they may represent as much as 80% of all the shipping orders. Without evidence to the contrary, the CBO assumes evenly distributed data when it works. This would mean, for the purposes of our example, that the CBO would assume that each state is represented 2% of the time and that these 6 states taken together would represent 12% of the total, not 60%.
So, let's say I run a query selecting all the ordered shipped to one of those states. Something like "select * from orders where state='NY'. Now, the CBO is going to assume that the state 'NY' represents 2% of the total orders on average and do use the index on the "state" column. In fact, however, much of our business comes from NY and it represents 35% of all the orders. So, the CBO should go for a full table scan. But it doesn't know that--it can only assume an even distribution of the data from the normal statistics collected. A histogram on the "state" column would provide the CBO with the additional information it needs to know that, in this case, the expected number of rows to return from the query would represent a large percentage of the total number of rows and that it should do a full table scan.
So, as has been said earlier, you want to look for data in indexed columns that are used in where clauses as literal predicates where the data is not evenly distributed in the column. These generally make good candidates for histograms.
Hope this makes sense.
-- Jon Walthour, OCDBA Oracle DBA Cincinnati, Ohio ----- Original Message ----- To: "Multiple recipients of list ORACLE-L" <ORACLE-L_at_fatcity.com> Sent: Friday, July 06, 2001 1:28 PM Can you possibly Detail what you mean by Skewed Data ?Received on Sat Jul 07 2001 - 07:55:54 CDT
> -----Original Message-----
> From: Jon Walthour [SMTP:jonw_at_fuse.net]
> Sent: Friday, July 06, 2001 8:25 PM
> To: Multiple recipients of list ORACLE-L
> Subject: RE: RE: RULE versus CHOOSE
>
>
>
> Tom:
>
> I never did any official benchmark studies, per se. I studied
> the CBO/histograms about 4 months ago when we were converting
> an app over from Oracle 7 to 8i. The new version of the app had
> queries that were very different from its earlier version and
> so performance on 8i as compared to 7 was dramatically worse.
> Of course, the app owners blamed the database. In the course
> of my defense of the db, I discovered that the data (being primarily
> composed of case studies) was severely skewed to the more recent
> dates (i.e., the further back you went, the sparser the number
> of cases). I discovered that the optimizer was doing a lot of
> range scans for queries when it should have been doing full table
> scans according to the CBO thresholds. Histograms on certain
> date fields throughout the schema dropped times on certain large
> report queries from 30 minutes to under 2.
>
> That's all I know. I can't give you an hard empirical evidence,
> just anecdotal evidence that, when properly used, histograms
> do seem to have dramatic impacts.
>
> --
>
> Jon Walthour, OCDBA
> Oracle DBA
> Computer Horizons
> Cincinnati, Ohio
>
>
> >--- Original Message ---
> >From: "Terrian, Tom" <tterrian_at_daas.dla.mil>
> >To: Multiple recipients of list ORACLE-L <ORACLE-L_at_fatcity.com>
> >Date: 7/6/01 9:20:24 AM
> >
>
> >Jon,
> >
> >Great write up.
> >
> >Several times you mentioned creating Histograms for skewed data
> distributions.
> >I am just curious if you have ever studied the performance impact
> with and
> >without them? We used to maintain histograms but when we studied
> the
> >performance impact (with and without them) we determined that
> there was very
> >little benefit with histograms. The down side with them is
> that they
> >drastically increased the amount of time it took to analyze
> the tables at night.
> >We decided to do without them. Have you ever studied their
> benefits verse
> >drawbacks?
> >
> >Tom
> >
> >Tom Terrian
> >Oracle DBA
> >WPAFB - DAASC
> >tterrian_at_daas.dla.mil
> >937-656-3844
> >
>
>
-- Please see the official ORACLE-L FAQ: http://www.orafaq.com -- Author: VIVEK_SHARMA INET: VIVEK_SHARMA_at_infy.com Fat City Network Services -- (858) 538-5051 FAX: (858) 538-5051 San Diego, California -- Public Internet access / Mailing Lists -------------------------------------------------------------------- To REMOVE yourself from this mailing list, send an E-Mail message to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing). -- Please see the official ORACLE-L FAQ: http://www.orafaq.com -- Author: Jon Walthour INET: jonw_at_fuse.net Fat City Network Services -- (858) 538-5051 FAX: (858) 538-5051 San Diego, California -- Public Internet access / Mailing Lists -------------------------------------------------------------------- To REMOVE yourself from this mailing list, send an E-Mail message to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).
![]() |
![]() |