RE: Unique Index Re-design

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Fri, 28 Mar 2014 07:51:13 -0400
Message-ID: <07f701cf4a7c$06053b40$120fb1c0$_at_rsiz.com>



+1, AND,   Not knowing what null represents in column b and not knowing the type of b bothers me, and the whether the clustering factor based on “a” first also represents a potential advantage to pursue.  

If column b is a date or at least represents a date, there is a good chance it is closely correlated with the time of insertion (it may in fact be the insertion date, but even if it is something else such as a transaction date not tightly linked to insertion time or an updated date it still is true weighed over most of the data I’ve ever seen it will be correlated with the time of insertion.) What has this got to do with the price of tea in China? Time of insertion is usually correlated decently (if not exactly) with blocks in storage. So without knowing more (b,a) is an odds on favorite to have a better cluster factor and in particular if your query base includes analysis of rows within date ranges (b,a) will likely be an improvement both in chances of getting picked by the optimizer and fetching fewer total blocks per date range scan.  

This is all secondary musing subordinate to what JL mentioned, ergo, starting this response with +1.  

I do think the date correlation may be quite helpful in practice, but that depends on your actual data and how it came to be physically stored.  

This entire case also seems very different from the case where perhaps the null “b” represents “I’m done processing this row” on a single column index where it might trend toward nearly 100% of the rows over the long haul and so the resultant single column index would be tiny and focused on rows of continued interest.  

mwf  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Jonathan Lewis
Sent: Friday, March 28, 2014 6:42 AM
To: mohamed.houri_at_gmail.com; ORACLE-L Subject: RE: Unique Index Re-design    

In principle if you hide the 90996518 values you're only going to reduce the volume of the index by 5% - which doesn't sound like the solution to "a lot of" single block reads.

Why are you seeing a lot of reads ? What's the nature of the queries (or DML) that causes them to happen ?

When you say this index is "generating" a lot of single block reads, do you mean that index blocks are read randomly, or that table blocks identified by the index are read randomly ?  

Looking at the states (8,000 to 9,000 rows per date) I'd guess that inserts (or updated) for a date may be happening around the same time requiring a constant volume of random reads of the index to find the blocks that need updating. But are the queries then: fetch me everything for a date, fetch me everything for an "a" value, or fetch me an "a" value across a range of dates ? And what queries do you have that address the nulls in the data column ? Is some of you db file sequential read the result of index fast full scans where lots of index blocks are already in memory ?  

Based on details supplied so far I'd be considering the two indexes (b,a) for the uniqueness and (a) for precision when only (a) has been supplied in predicate (knowing, of course) that there should be no production queries for a=90996518 - but that's just based on a few guesses about what the system might be trying to achieve.        

Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
_at_jloracle


From: oracle-l-bounce_at_freelists.org [oracle-l-bounce_at_freelists.org] on behalf of Mohamed Houri [mohamed.houri_at_gmail.com] Sent: 28 March 2014 09:53DML)
To: ORACLE-L
Subject: Unique Index Re-design

I don’t know if this is a good question worth an answer or not; nevertheless I couldn’t resist the temptation to ask it.

I have a real life unique two columns composite index ind_uk (a,b) having the following picture  

select num_rows, distinct_keys, clustering_factor

from all_indexes where index_name = ‘IND_UK’;  

num_rows, distinct_keys, clustering_factor

1,350,375 1,350,375 146,386  

The data repartition of these two indexed columns are  

select a, count(1)

from table_t

group by a

order by count(1) desc;  

shows one unique extreme count

a count(1)

90996518 67977 -à this is my concern

106628306 8

104585295 8

105558779 8

105243015 8

84407427 8

106183944 7

73262355 1

73262392 1

73393305 1

73393309 1

73393325 1

73469367 1  

The majority of the remaining records are with count = 1  

select

      b

    , count(1)

from table_t

group by b

order by count(1) desc;  

b                  count(1)

 

null                432500  -à this is my concern

13/11/2013 00:00:00 9075

14/11/2013 00:00:00 9030

08/11/2013 00:00:00 8780

15/11/2013 00:00:00 8721

12/11/2013 00:00:00 8060

19/11/2013 00:00:00 7772

22/11/2013 00:00:00 7696

21/11/2013 00:00:00 7618

26/11/2013 00:00:00 7539

Etc..  

This index when used by the CBO is generating a lot of time consuming db file sequential read wait events  

When I asked the client what is the particularity of this a value (90996518) he answered that this a dummy value used for testing (yes for testing in PRODUCTION).   My question finally is: I want to reengineer this index so that (a) it will still be unique (b) do not contain a = 90996518 value and (c) do not contains column b having null values.  

I created the following index to honor my question  

create unique index mho_ind on t4 (case when a = 90996518 then null else a end, case when b is not null then b end);  

Have you any other suggestions?  

Thanks

Mohamed Houri

www.hourim.wordpress.com <http://www.hourim.wordpress.com/>

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Mar 28 2014 - 12:51:13 CET

Original text of this message