RE: index on null

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Sat, 29 Aug 2015 20:16:44 -0400
Message-ID: <053801d0e2b9$2768ef80$763ace80$_at_rsiz.com>



CP wrote, in part:

" Assume the query accesses the table using business_unit column only. " and
" So far I am trying to tune the query that supplies values for the bus_unit column. "

  1. if any of the values supplied for the business_unit column are NULL (as Stefan mentioned for specifically IS NULL, then it you'll be doing a full table scan.)
  2. otherwise the index will be a HUGE win in your case barring a bizarrely horrible cluster factor, meaning that even if you only went to the index for 242,808 rows in the case of XY410, you had to read over half (or so, gross approximation) the table blocks to get all of them.
  3. if the predicate is "business_unit is NOT NULL" then the index should be fast full scanned, and with your numbers, again barring a bizarrely bad cluster_factor, that is likely a HUGE win. (Nearly 12 million table rows DON'T need to be visited, and that most likely translates to a lot of table blocks that you will not have to visit to pick up the 500,000 or so rows that are NOT NULL.)

IF it is NOT a HUGE win, and since you assert this table is accessed ONLY by business_unit, it is worth considering physically re-order (re-loading in order using some method of direct loading) the table in business_unit order. (We don't care if this results in a bad cluster factor for some other index on the table, since you're only going after the table by business_unit so any indexes in place to support constraints don't have to visit the table and therefore we don't care if they have bad cluster factors. [I haven't seen ordering on one column do something horrible to a different index in the wild, but in theory and lab examples it can happen.])

IF you are licensed for partitioning, it might also be worthwhile to partition putting values together either null and everything else or null and each business_unit defined value in its own partition, possibly with row movement enabled. If your number of different business_unit values is not going to grow much, a partition for each is probably best. If a lot of new business_unit values (not additional occurences of the existing values) are planned to occur, then you might end up with too many partitions to be reasonable for such a small table.

Good luck. If any of this doesn't immediately make sense I've got a paper on when physical reordering should be considered and another on how to benefit from "sparse" indexes on columns containing a preponderance of NULL values. I can get you links to some conference where I presented those if you can't find them, or if they are no longer on line I can send them. Very likely it will make sense when you try it and you won't need the papers. Just please don't create a treadmill for yourself frequently reloading the table in order.

mwf

-----Original Message-----
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Stefan Koehler Sent: Friday, August 28, 2015 4:49 PM
To: carlospena999_at_gmail.com; oracle-l_at_freelists.org Subject: Re: index on null

Hi CP,

> would it make a difference adding an index on the column. Assume the query accesses the table using business_unit column only.

Depends on your query (does it contain a predicate like "business_unit IS NULL" or not) and on the index you want to create (B*Tree or Bitmap). If you just index column business_unit, then NULL values are not included in a standard B*Tree index. You can also outwit this implementation, but it all depends on your query as well of course :-)  

Best Regards
Stefan Koehler

Freelance Oracle performance consultant and researcher Homepage: http://www.soocs.de
Twitter: _at_OracleSK

> Cee Pee <carlospena999_at_gmail.com> hat am 28. August 2015 um 22:15 geschrieben:
>
>
> Hi
>
> We have a table with data distribution like this:
>
> COUNT(*) BUSINESS_UNIT
> -------------------- -----
> 539 AB340
> 242,808 XY410
> 81 AB941
> 11,890,271
> 81,494 XY411
> 71,532 TY810
>
> Give that majority of the values are null for the Business_unit
> column, would it make a difference adding an index on the column.
> Assume the query accesses the table using business_unit column only. So far I am trying to tune the query that supplies values for the bus_unit column. the query is a multi table join; trying to make this specific table driving table.
>
> CP.
 

--
http://www.freelists.org/webpage/oracle-l


--
http://www.freelists.org/webpage/oracle-l
Received on Sun Aug 30 2015 - 02:16:44 CEST

Original text of this message