RE: conditional unique index to restrict duplicates of a particular set

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Wed, 7 May 2014 08:57:11 -0400
Message-ID: <00fe01cf69f3$dd4cc8c0$97e65a40$_at_rsiz.com>



First, you probably do not want to start this index with the date column, which would maximize the chances of a hot block on concurrent insert.

Second, in the actual load tests you'll want to be sure your insert pattern mimics the texture of inserts you get in actual use so any block concurrency and splitting issues on the index happen in a similar way.

Third, depending on the texture of your data and especially if most entries are 'P' (which will be all null and therefore not in the index), then this index will be more or less sparse compared to the full data set, and that would determine whether it was worthwhile to code predicates able to reference this index for equality or range references when you are excluding 'P' values. I've never actually coded up a predicate to reference to a CASE defined index and I'm not sure of the syntax, but you can make this work for function based indexes, like this (not your actual table, but pretty easy to map to your case):
SQL> r
  1 create unique index junk13_a_not_1_i   2 on junk13 (

  3  decode(a,1,null,a),
  4  decode(a,1,null,b),
  5  decode(a,1,null,c),
  6  decode(a,1,null,d)

  7* )

Index created.

SQL> get gs_onetab
  1 begin
  2 dbms_stats.gather_table_stats(
  3 ownname => null

  4  ,tabname => '&tabname'
  5  ,partname => null
  6  ,estimate_percent => null
  7  ,block_sample => false
  8  ,method_opt => 'for columns access_order size 254'
  9  ,degree => NULL
 10  ,cascade => true

 11 );
 12* end;
SQL> 8
  8* ,method_opt => 'for columns access_order size 254' SQL> del
SQL> r
  1 begin
  2 dbms_stats.gather_table_stats(
  3 ownname => null
  4  ,tabname => '&tabname'
  5  ,partname => null
  6  ,estimate_percent => null
  7  ,block_sample => false
  8  ,degree => NULL
  9  ,cascade => true

 10 );
 11* end;
Enter value for tabname: junk13
old 4: ,tabname => '&tabname'
new 4: ,tabname => 'junk13'

PL/SQL procedure successfully completed.

SQL> set autotrace on;
SQL> select * from junk13

  2  where decode(a,1,null,a) is not null
  3  and   decode(a,1,null,b) is not null
  4  and   decode(a,1,null,c) is not null
  5  and   decode(a,1,null,d) is not null;

A                    B                    C                    D

-------------------- -------------------- --------------------
----------------------------------------------------
2 1 1 not aA

Execution Plan



Plan hash value: 3617190931

| Id  | Operation                   | Name             | Rows  | Bytes |
Cost (%CPU)| Time     |

----------------------------------------------------------------------------
--------------------
| 0 | SELECT STATEMENT | | 1 | 63 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| JUNK13 | 1 | 63 | 2 (0)| 00:00:01 | |* 2 | INDEX FULL SCAN | JUNK13_A_NOT_1_I | 1 | |
1 (0)| 00:00:01 |

Predicate Information (identified by operation id):


   2 - filter(DECODE(TO_NUMBER("A"),1,NULL,"A") IS NOT NULL AND
              DECODE(TO_NUMBER("A"),1,NULL,"B") IS NOT NULL AND
DECODE(TO_NUMBER("A"),1,NULL,"C") IS
              NOT NULL AND DECODE(TO_NUMBER("A"),1,NULL,"D") IS NOT NULL)

or, if you're more selective on some column: SQL> r
  1 select * from junk13

  2  where decode(a,1,null,a)  = '2'
  3  and   decode(a,1,null,b) is not null
  4  and   decode(a,1,null,c) is not null
  5* and   decode(a,1,null,d) is not null

A                    B                    C                    D

-------------------- -------------------- --------------------
----------------------------------------------------
2 1 1 not aA

Execution Plan



Plan hash value: 2352825022

| Id  | Operation                   | Name             | Rows  | Bytes |
Cost (%CPU)| Time     |

----------------------------------------------------------------------------
--------------------
| 0 | SELECT STATEMENT | | 1 | 63 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| JUNK13 | 1 | 63 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | JUNK13_A_NOT_1_I | 1 | |
1 (0)| 00:00:01 |

Predicate Information (identified by operation id):


   2 - access(DECODE(TO_NUMBER("A"),1,NULL,"A")='2')
       filter(DECODE(TO_NUMBER("A"),1,NULL,"B") IS NOT NULL AND
              DECODE(TO_NUMBER("A"),1,NULL,"C") IS NOT NULL AND
DECODE(TO_NUMBER("A"),1,NULL,"D") IS
              NOT NULL)

can get the index range scan.

Left to right concatenated column reference rules don't change, so

SQL> r
  1 select * from junk13
  2* where decode(a,1,null,a) = '2'

A                    B                    C                    D

-------------------- -------------------- --------------------
----------------------------------------------------
2 1 1 not aA

Execution Plan



Plan hash value: 2352825022

| Id  | Operation                   | Name             | Rows  | Bytes |
Cost (%CPU)| Time     |

----------------------------------------------------------------------------
--------------------
| 0 | SELECT STATEMENT | | 1 | 61 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| JUNK13 | 1 | 61 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | JUNK13_A_NOT_1_I | 1 | |
1 (0)| 00:00:01 |

Predicate Information (identified by operation id):


   2 - access(DECODE(TO_NUMBER("A"),1,NULL,"A")='2')

works just fine. In the examples I've tried it seems to need reference the left most column of the functional index to recognize it is there, but consider that anecdotal.

Alternatives to consider if there are relatively very many tuples with 'P'

are separate tables for 'P' versus non-'P' tuples (either one per type or
simply 'P' versus non-'P'),
in which case the non-'P' table or tables would have a regular unique index
quite naturally in use with appropriate predicate, and making use of the a union all view to preserve existing queries while having the insert programs choose the correct insert statement based on FEED TYPE (which seems likely to be batched in the first place). If inserts do not tend to be batched by FEED TYPE some care may be needed to avoid row by row inserts instead of bulk inserts if you are currently using bulk inserts.

And the trigger option Carlos mentions below.

mwf

-----Original Message-----
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Carlos Sierra
Sent: Wednesday, May 07, 2014 6:51 AM
To: vijaysehgal21_at_gmail.com
Cc: ORACLE-L
Subject: Re: conditional unique index to restrict duplicates of a particular set

Vijay,

  1. To me sure the performance impact of this index I would do an actual test with some volume with and without the index.
  2. I do not see how this index could be used by real predicates, so most probably this index would not be useful for anything else.
  3. You may want to consider NOT having this index and use a trigger instead. In other words, have a regular non-unique B-Tree index on acc_number, branch_code, feed_date and feed_type. Then the trigger would validate that values I and E do not duplicate. The business requirement that you describe, in my opinion is better served with a trigger than with the unique index you described. This regular non-unique B-Tree index would most probably be used by other predicates in your application.

Cheers - Carlos

On May 7, 2014, at 3:52 AM, vijay sehgal <vijaysehgal21_at_gmail.com> wrote:

> Dear Experts,
> 
> I have a few questions with reference to below scenario:
> 
> Table Structure : feed_details( id_pk      number,
>                 acc_number number,
>                                 branch_code number,
>                                 feed_date   date,
>                                 feed_type   varchar2(1),
>                                 balance     number
>                                );
> 
> The feed_type can have three values 'I','E' and 'P'. We have to put a
conditional unique constraint on acc_number, branch_code, feed_date and feed_type. The feed_date will have date and time will be always set to midnight.
>
> The constraint is to be put such that duplicates of feed type 'P' should be allowed but duplicates for 'I' and 'E' should not be allowed.
> 
> e.g. acc_number|branch_code|feed_type|feed_date
>      101       |10         | I      | 07052014
>      102       |10         | E     | 07052014
>      101       |10         | I       | 07052014 -- should not be allowed
>      102       |10         | E      | 07052014 -- should not be allowed
>      103       |10         | P      | 07052014
>      103       |10         | P      | 07052014
> 
> FB unique index as below would allow this to be achieved. 
> 
> CREATE UNIQUE INDEX idx_feed_details 
>     ON feed_details(CASE WHEN fed_typ!='P' THEN feed_date ELSE null END,
>                CASE WHEN fed_typ!='P' THEN acc_number ELSE null END,
>                CASE WHEN fed_typ!='P' THEN branch_code ELSE null END,
>                CASE WHEN fed_typ!='P' THEN feed_type ELSE null END);
> 
> 
> 1) How should we correctly measure the performance hit after his index is
created (the inserts would be concurrent from multiple java connections). > 2) Since the index is only to restrict duplicates, would this index be helpful in some other scenarios of data selection when the predicate is on these columns.
> 3) any other overheads / side effects of this kind of index.
> 
> Hope the questions and scenario is clear.
> 
> Thanking you all for your time and help.
> 
> Warm Regards,
> Vijay Sehgal.
> 
> 

--
http://www.freelists.org/webpage/oracle-l



--
http://www.freelists.org/webpage/oracle-l
Received on Wed May 07 2014 - 14:57:11 CEST

Original text of this message