RE: BTree Index Leading Column(s) Thought Question?

From: Taylor, Chris David <ChrisDavid.Taylor_at_ingrambarge.com>
Date: Fri, 9 Mar 2012 15:09:21 -0600
Message-ID: <C5533BD628A9524496D63801704AE56D6ADD5C1287_at_SPOBMEXC14.adprod.directory>



I think this answered my question. This is basically what I'm seeing - a few differences in buffer gets but nothing major. But if COMPRESS is used, there is some difference that could be considered. From: http://asktom.oracle.com/pls/asktom/f?p0:11:0::::P11_QUESTION_ID:1296165726968

Here is a short excerpt from my recent book that relates to this topic as well:

<quote>

Myth: Most Discriminating Elements Should Be First

This seems like common sense. If you are going to create an index on the columns C1 and C2 in a table with 100,000 rows, and you find C1 has 100,000 distinct values and C2 has 25,000 distinct values, you would want to create the index on T(C1,C2). This means that C1 should be first, which is the “commonsense” approach. The fact is, when comparing vectors of data (consider C1, C2 to be a vector), it doesn’t matter which you put first. Consider the following example. We will create a table based on ALL_OBJECTS and an index on the OWNER, OBJECT_TYPE, and OBJECT_NAME columns (least discriminating to most discriminating) and also on OBJECT_NAME, OBJECT_TYPE, and OWNER:

ops$tkyte_at_ORA10GR1> create table t
  2 as
  3 select * from all_objects;
Table created.

ops$tkyte_at_ORA10GR1> create index t_idx_1 on t(owner,object_type,object_name); Index created.

ops$tkyte_at_ORA10GR1> create index t_idx_2 on t(object_name,object_type,owner); Index created.

ops$tkyte_at_ORA10GR1> select count(distinct owner), count(distinct object_type),   2 count(distinct object_name ), count(*)   3 from t;

DISTINCTOWNER DISTINCTOBJECT_TYPE DISTINCTOBJECT_NAME COUNT(*)

------------- ------------------- ------------------- --------
           28                  36               28537    48243
Now, to show that neither is more efficient space-wise, we’ll measure their space utilization: ops$tkyte_at_ORA10GR1> analyze index t_idx_1 validate structure; Index analyzed.

ops$tkyte_at_ORA10GR1> select btree_space, pct_used, opt_cmpr_count, opt_cmpr_pctsave   2 from index_stats;

BTREE_SPACE PCT OPT_CMPR_COUNT OPT_CMPR_PCTSAVE

----------- ------  -------------- ----------------
    2702744   89.0               2               28

ops$tkyte_at_ORA10GR1> analyze index t_idx_2 validate structure; Index analyzed.

ops$tkyte_at_ORA10GR1> select btree_space, pct_used, opt_cmpr_count, opt_cmpr_pctsave   2 from index_stats;

BTREE_SPACE PCT OPT_CMPR_COUNT OPT_CMPR_PCTSAVE

----------- ------  -------------- ----------------
    2702744   89.0               1               13

They use exactly the same amount of space, down to the byte—there are no differences there. However, the first index is a lot more compressible if we use index key compression, as evidenced by the OPT_CMP_PCTSAVE value. There is an argument for arranging the columns in the index in order from the least discriminating to the most discriminating. Now let’s see how they perform, to determine if either index is generally more efficient than the other. To test this, we’ll use a PL/SQL block with hinted queries (so as to use one index or the other):

ops$tkyte_at_ORA10GR1> alter session set sql_trace=true; Session altered.

ops$tkyte_at_ORA10GR1> declare
  2 cnt int;
  3 begin
  4 for x in ( select /*+FULL(t)*/ owner, object_type, object_name from t )   5 loop

  6         select /*+ INDEX( t t_idx_1 ) */ count(*) into cnt
  7           from t
  8          where object_name = x.object_name
  9            and object_type = x.object_type
 10            and owner = x.owner;
 11
 12          select /*+ INDEX( t t_idx_2 ) */ count(*) into cnt
 13           from t
 14          where object_name = x.object_name
 15            and object_type = x.object_type
 16            and owner = x.owner;

 17 end loop;
 18 end;
 19 /
PL/SQL procedure successfully completed. These queries read every single row in the table by means of the index. The TKPROF report shows us the following:
SELECT /*+ INDEX( t t_idx_1 ) */ COUNT(*) FROM T WHERE OBJECT_NAME = :B3 AND OBJECT_TYPE = :B2 AND OWNER = :B1 call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0           0
Execute  48243     10.63      10.78          0          0          0           0
Fetch    48243      1.90       1.77          0     145133          0       48243

------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 96487 12.53 12.55 0 145133 0 48243 Rows Row Source Operation ------- ---------------------------------------------------
  48243 SORT AGGREGATE (cr5133 pr=0 pw=0 time#34197 us)   57879 INDEX RANGE SCAN T_IDX_1 (cr5133 pr=0 pw=0 time40672 us)(object…

SELECT /*+ INDEX( t t_idx_2 ) */ COUNT(*) FROM T WHERE OBJECT_NAME = :B3 AND OBJECT_TYPE = :B2 AND OWNER = :B1 call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse        1      0.00       0.00          0          0          0           0
Execute  48243     11.00      10.78          0          0          0           0
Fetch    48243      1.87       2.10          0     145168          0       48243

------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 96487 12.87 12.88 0 145168 0 48243 Rows Row Source Operation ------- ---------------------------------------------------
  48243 SORT AGGREGATE (cr5168 pr=0 pw=0 time"51857 us)   57879 INDEX RANGE SCAN T_IDX_2 (cr5168 pr=0 pw=0 time82547 us)(object…

They processed the same exact number of rows and very similar numbers of blocks (minor variations coming from accidental ordering of rows in the table and consequential optimizations made by Oracle), used equivalent amounts of CPU time, and ran in about the same elapsed time (run this same test again and the CPU and ELAPSED numbers will be a little different, but on average they will be the same). There are no inherent efficiencies to be gained by placing the columns in order of how discriminating they are, and as stated previously, with index key compression there is an argument for putting the least selective first. If you run the preceding example with COMPRESS 2 on the indexes, you’ll find that the first index will perform about two-thirds the I/O of the second, given the nature of the query in this case.

However, the fact is that the decision to put column C1 before C2 must be driven by how the index is used. If you have lots of queries like the following:

select * from t where c1 = :x and c2 = :y; select * from t where c2 = :y;

it makes more sense to place the index on T(C2,C1). This single index could be used by either of the queries. Additionally, using index key compression (which we looked at with regard to IOTs and will examine further later), we can build a smaller index if C2 is first. This is because each value of C2 repeats itself on average four times in the index. If C1 and C2 are both, on average, 10 bytes in length, the index entries for this index would nominally be 2,000,000 bytes (100,000 ~TMS 20). Using index key compression on (C2, C1), we could shrink this index to 1,250,000 (100,000 ~TMS 12.5), since three out of four repetitions of C2 could be suppressed.

In Oracle 5 (yes, version 5!), there was an argument for placing the most selective columns first in an index. It had to do with the way version 5 implemented index compression (not the same as index key compression). This feature was removed in version 6 with the addition of row-level locking. Since then, it is not true that putting the most discriminating entries first in the index will make the index smaller or more efficient. It seems like it will, but it will not. With index key compression, there is a compelling argument to go the other way since it can make the index smaller. However, it should be driven by how you use the index, as previously stated. </quote>

Chris Taylor

"Quality is never an accident; it is always the result of intelligent effort." -- John Ruskin (English Writer 1819-1900)

Any views and/or opinions expressed herein are my own and do not necessarily reflect the views of Ingram Industries, its affiliates, its subsidiaries or its employees.



From: Taylor, Chris David
Sent: Friday, March 09, 2012 2:16 PM
To: Taylor, Chris David; 'oracle-l_at_freelists.org' Subject: RE: BTree Index Leading Column(s) Thought Question?

(Data Distribution Fixed - I think)

Forgive me if this seems like a silly question but it's been a long week and my brain is tired LOL Let's say I have the following considerations:

a.) Large table in GBs

b.) Many queries using similar PREDICATES (but in different orders, for different reasons)

c.) Let's say I look at my data distribution for 2 of the columns (for queries using BUSINESS_UNITS and ACCOUNTING_PERIODS)

d.) Example query: select x,y,z from tableA where business_Unit = 'xxxxx' and accounting_period='yyyyy'

e.) Here is the data distribution

Does it make more sense to have a B-Tree index with Accounting Period as the first column (generally speaking I mean - based on the data dist below)?

Or does it make more sense to have BU first - because if BU != 10000 we can effectively disregard 844,775 rows immediately.

But if we have ACCOUNTING_PERIOD = 1 then we only have to go through 113,358 'rows'

(it's entirely possible I'm thinking about BTree indexes incorrectly here as well)

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Mar 09 2012 - 15:09:21 CST

Original text of this message