Re: index columns

From: Mohamed Houri <mohamed.houri_at_gmail.com>
Date: Sun, 19 Apr 2015 18:08:30 +0200
Message-ID: <CAJu8R6i6L6krb7tYtT4-9whkh8zGb8VNyKwLcHsV0-0TvB27-w_at_mail.gmail.com>



Thanks Jonathan ,

It's always a pleasure reading you.

In my article I continued doing this:

begin

dbms_stats.gather_table_stats

           (user

,'t_ext_stat'

,method_opt => 'for columns vpk_id size skewonly'

,cascade => true

,no_invalidate => false

           );

end;

/

SQL> SELECT column_name, num_distinct, density, histogram

     FROM user_tab_col_statistics

     WHERE table_name = 'T_EXT_STAT'

     AND column_name in
('VPK_ID','LAYER_CODE','SYS_STUMVIRBZA6_$QWEX6DE2NGQA1');

COLUMN_NAME                    NUM_DISTINCT    DENSITY HISTOGRAM

------------------------------ ------------ ---------- ---------------

SYS_STUMVIRBZA6_$QWEX6DE2NGQA1         4967 .000201329 NONE

LAYER_CODE                                4 6.2471E-07 FREQUENCY

VPK_ID                                  246    .000375 HYBRID

As you have explained above, vpk_id got a HYBRID histogram and the extension got no histogram again.

SQL> begin

    dbms_stats.gather_table_stats

              (user

              ,'t_ext_stat'

             ,method_opt => 'for all columns size skewonly'

              ,cascade => true

              ,no_invalidate => false

              );

   end;

   /

SQL> SELECT column_name, num_distinct, density, histogram

  2 FROM user_tab_col_statistics

  3 WHERE table_name = 'T_EXT_STAT'

  4 AND column_name in
('VPK_ID','LAYER_CODE','SYS_STUMVIRBZA6_$QWEX6DE2NGQA1');

COLUMN_NAME                                        NUM_DISTINCT    DENSITY
HISTOGRAM
  • --------------- ---------------------

SYS_STUMVIRBZA6_$QWEX6DE2NGQA1 5028 .000198886 NONE

LAYER_CODE                                                    4

    6.2846E-07 FREQUENCY

VPK_ID                                                              246
          .000351     HYBRID


 Even when forcing a histogram on all skewed columns the extension has not been found by Oracle to be skewed even when all the columns it is based on have histogram (which confirm what you have said above that a skew in the two columns forming the extension doesn’t necessarily mean that the extension will be identified as skewed)

Best regards

Mohamed Houri

2015-04-18 20:34 GMT+02:00 Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>:

>
>
> Mohamed,
>
> Your question is exactly the point I was making about Tim's example of
> yes_or_no with a 99/1 split and a nearly unique account_id.  Just because
> there is a massive skew in one column that doesn't mean you can see ANY
> skew in the combination when the other column has a large number of
> distinct values.
>
> In particular - Oracle 11g has a limit of 255 columns in a frequency
> histogram, 12c has a limit of 2,000 (or maybe 2,047), so your example with
> one column having 2,712 distinct values CLEARLY can't get a frequency
> histogram when you combine it with another column.
>
> If BOTH columns had individual histograms the histogram on your vpk_id
> column would have to be height-balanced (or top-N, or hybrid) and might be
> identifying one or more extremely popular columns - in which case a popular
> vpk_id value might frequently coincide with a popular layer_code, so that a
> height-balanced/top-N/hybrid histogram on the column group could identify
> the special combination.  But since vpk_id doesn't have (need) a histogram
> then the combination can't need a histogram.
>
>
>
> Regards
> Jonathan Lewis
> http://jonathanlewis.wordpress.com
> _at_jloracle
>   ------------------------------
> *From:* Mohamed Houri [mohamed.houri_at_gmail.com]
> *Sent:* 18 April 2015 15:50
> *To:* Jonathan Lewis
> *Cc:* oracle-l_at_freelists.org
> *Subject:* Re: index columns
>
>   When all other things being equal, pushing the less selective columns
> toward the leading edge of the index in order to have a better index
> compressibility might be an option.
>
> Below is the conclusion of an article I wrote for all things Oracle about
> index design
>
> *http://allthingsoracle.com/index-design-discard-and-sort/
> <http://allthingsoracle.com/index-design-discard-and-sort/>*
>  Conclusion
>
> *Engineering an index should be dictated first by the query predicate part
> (where clause, group by and order by). Look carefully to the column you
> will use as the leading edge of the index. They should be the ones on which
> an equality predicate is applied. You should also have a **“kill two
> birds with one stone”** design strategy as far as with one index you can
> cover multiple queries, **avoid redundant indexes*
> <http://hourim.wordpress.com/2014/03/24/redundant-indexes/>* and cover
> the foreign key lock threat. Do not forget the benefit an indexed virtual
> column could have on helping the CBO make good guesses (estimations) and
> producing attractive small indexes.*
>
> *If a switch in the column order is still able to guaranty the precision
> and the use of the index then start your index with the column having the
> lowest number of distinct values**. As such you can efficiently compress
> your index and give a CBO an extra possible path represented by the **index
> skip scan**.*
>
> As per regards to the column group extension created on columns having or
> not histogram collected on them, I have an article to be published on that
> topic which I have kept for several months waiting to validate one CBO
> estimation I am still unable to figure out.
>
> Christian article shows this
>
> “*In other words, it seems that the query optimizer bypasses if the
> extension has no histogram AND **histograms exist on the columns** on
> which the extension is based.”*
>
>
>  My article shows that if only *one* column of the group of columns
> forming the extension has histogram the CBO will by pass it
>
> *1) no histogram at all*
>
> SQL> SELECT column_name, num_distinct, density, histogram
>
>      FROM user_tab_col_statistics
>
>      WHERE table_name = 'T_EXT_STAT'
>
>      AND column_name in
> ('VPK_ID','LAYER_CODE','SYS_STUMVIRBZA6_$QWEX6DE2NGQA1');
>
>
>  COLUMN_NAME                    NUM_DISTINCT    DENSITY HISTOGRAM
>
> ------------------------------ ------------ ---------- ---------------
>
> VPK_ID                                 2712 .000368732 NONE
>
> LAYER_CODE                                4        .25 NONE
>
> SYS_STUMVIRBZA6_$QWEX6DE2NGQA1         4731 .000211372 NONE
>
> SQL> select
>
>     count(1)
>
> from
>
>     t_ext_stat
>
> where vpk_id   = 63148
>
> and layer_code = 'R';
>
>
> --------------------------------------------------------------------------------
>
> | Id  | Operation         | Name       | Starts | E-Rows | A-Rows |
> A-Time   |
>
>
> --------------------------------------------------------------------------------
>
> |   0 | SELECT STATEMENT  |            |      1 |        |      1
> |00:00:00.01 |
>
> |   1 |  SORT AGGREGATE   |            |      1 |      1 |      1
> |00:00:00.01 |
>
> |*  2 |   INDEX RANGE SCAN| T_EXT_UK_I |      1 |    172 |    338
> |00:00:00.01 |
>
>
> --------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
>
> ---------------------------------------------------
>
>    2 - access("VPK_ID"=63148 AND "LAYER_CODE"='R')
>
>
>  Access path analysis for T_EXT_STAT
>
> ***************************************
>
> SINGLE TABLE ACCESS PATH
>
>   Single Table Cardinality Estimation for T_EXT_STAT[VPK]
>
> SPD: Directive valid: dirid = 17542990197075222359, state = 5, flags = 1,
> loc = 1 {EC(98564)[2, 3]}
>
> SPD: Return code in qosdDSDirSetup: EXISTS, estType = TABLE
>
>   Column (#2): VPK_ID(NUMBER)
>
>     AvgLen: 5 NDV: 2712 Nulls: 0 Density: 0.000000 Min: 0.000000 Max:
> 62849.000000
>
>   Column (#3): LAYER_CODE(VARCHAR2)
>
>     AvgLen: 2 NDV: 4 Nulls: 0 Density: 0.000000
>
>   Column (#9): SYS_STUMVIRBZA6_$QWEX6DE2NGQA1(NUMBER)
>
>     AvgLen: 12 NDV: 4731 Nulls: 0 Density: 0.000000
>
>   ColGroup (#2, Index) T_EXT_UK_I
>
>     Col#: 2 3 4    CorStregth: -1.00
>
>   ColGroup (#1, VC) SYS_STUMVIRBZA6_$QWEX6DE2NGQA1
>
>     Col#: 2 3    CorStregth: 2.29
>
>   ColGroup Usage:: PredCnt: 2  Matches Full: #1  Partial:  Sel: 0.0002
>
>   Table: T_EXT_STAT  Alias: VPK
>
>     Card: Original: 813541.000000  Rounded: 172  Computed: 171.96  Non
> Adjusted: 171.96
>
>  ***** Logdef predicate Adjustment ******
>
>
>  The corresponding 10053 trace file clearly shows in this case that the
> extension has been used:
>
>
>  ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.0002
>
> E-Rows = num_rows(t_ext_stat) * selectivity (
> SYS_STUMVIRBZA6_$QWEX6DE2NGQA1);
>
> E-rows = num_rows(t_ext_stat) * 1/(NDV(SYS_STUMVIRBZA6_$QWEX6DE2NGQA1));
>
> E-rows = 813541.000000 * 1/*4731* = 171.959628 rounded to 172
>
>
>  *2) with histogram*
>
> However, collecting histogram changed the CBO estimation
>
> begin
>
> dbms_stats.gather_table_stats
>
>            (user
>
>            ,'t_ext_stat'
>
>            ,method_opt    => 'for all columns size auto'
>
>            ,cascade       => true
>
>            ,no_invalidate => false
>
>            );
>
> end;
>
> /
>
> SQL> SELECT column_name, num_distinct, density, histogram
>
>      FROM user_tab_col_statistics
>
>      WHERE table_name = 'T_EXT_STAT'
>
>      AND column_name in
> ('VPK_ID','LAYER_CODE','SYS_STUMVIRBZA6_$QWEX6DE2NGQA1');
>
>
>  COLUMN_NAME                    NUM_DISTINCT    DENSITY HISTOGRAM
>
> ------------------------------ ------------ ---------- ---------------
>
> SYS_STUMVIRBZA6_$QWEX6DE2NGQA1         4967 .000201329 NONE
>
> LAYER_CODE                                4 6.2471E-07 FREQUENCY
>
> VPK_ID                                 2862 .000349406 NONE
>
>
>  The *layer_code* column, as expected, has been identified as a skewed
> column and henceforth a frequency histogram has been gathered on it to
> indicate this skewness. There is two remarks which seems to be worth
> pointing them out
>
>    -
>
>    Since one of the column group extension has a histogram why the
>    extension itself has not been identified as a skewed column as well
>    -
>
>    What happens in this particular case where there is no histogram on
>    the extension and a histogram on one of the column forming the extension
>
> select
>
>     count(1)
>
> from
>
>     t_ext_stat
>
> where vpk_id   = 63148
>
> and layer_code = 'R';
>
>
>    COUNT(1)
>
> ----------
>
>        338
>
> SQL_ID  d26ra17afbfyh, child number 0
>
> -------------------------------------
>
>
> ---------------------------------------------------------------------------------
>
> | Id  | Operation         | Name       | Starts | E-Rows | A-Rows |
> A-Time   |
>
>
> ---------------------------------------------------------------------------------
>
> |   0 | SELECT STATEMENT  |            |      1 |        |      1
> |00:00:00.01 |
>
> |   1 |  SORT AGGREGATE   |            |      1 |      1 |      1
> |00:00:00.01 |
>
> |*  2 |   INDEX RANGE SCAN| T_EXT_UK_I |      1 |    142 |    338
> |00:00:00.01 |
>
>
> ---------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
>
> ---------------------------------------------------
>
>    2 - access("VPK_ID"=63148 AND "LAYER_CODE"='R')
>
>
>  A slight degradation on the estimation occurred. Has the CBO used the
> extension to compute the 142 estimated cardinality? I don't think so
> otherwise it would have given the following estimation:
>
> E-rows = num_rows(t_ext_stat) * 1/(NDV(SYS_STUMVIRBZA6_$QWEX6DE2NGQA1))
>
> E-rows = 803809 * 1/(4967) = 161.829877
>
> In addition, the following line in the 10053 trace file
>
> ColGroup Usage:: PredCnt: 2 Matches Full: Partial:
>
> confirmsthat the extended column statistics has not been used. Otherwise
> we would have observed, as shown in the first 10053 trace file above, the
> following line
>
> ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.0002
>
> This in fact confirms that when *one* of the columns on which an
> extension has been created has a histogram while the virtual column
> representing the extension has no histogram then the extension will not be
> used by the CBO. But why Oracle has not collected a frequency histogram on
> the extension when it knows that one of the columns participating in the
> extension posses a histogram? Does this means that a data skewness is not
> guaranteed for a combination of two columns where one column is skewed and
> the other isn't?
>
> I hope I will publish this article asap
>
> Best regards
>
> Mohamed Houri
>
>
>


-- 

Houri Mohamed

Oracle DBA-Developer-Performance & Tuning

Member of Oraworld-team <http://www.oraworld-team.com/>

Visit My         - Blog <http://www.hourim.wordpress.com/>

Let's Connect -
<http://fr.linkedin.com/pub/mohamed-houri/11/329/857/>*Linkedin
Profile <http://fr.linkedin.com/pub/mohamed-houri/11/329/857/>*

My Twitter <https://twitter.com/MohamedHouri>      - MohamedHouri
<https://twitter.com/MohamedHouri>

--
http://www.freelists.org/webpage/oracle-l
Received on Sun Apr 19 2015 - 18:08:30 CEST

Original text of this message