Re: null values

From: Martijn Bos <maboc_at_maboc.nl>
Date: Wed, 20 Nov 2013 23:13:23 +0100
Message-ID: <20131120221323.GC17321_at_app01.bos>



Jonathan,

As suggested I re-created the table, now with 100000 rows, to eliminate 'small-set' anomalys. Also as shown in a post to Mark Farnham, the number of nulls is reduced to 1% of the rows.

Following are some statistics:
***** Table Stats *****

TABLE_NAME STATUS PCT_FREE PCT_USED NUM_ROWS BLOCKS EMPTY_BLOCKS LAST_ANALYZED AVG_ROW_LEN

--------------- -------- ---------- ---------- ---------- ---------- ------------ -------------------- -----------
CBO_TAB_1       VALID            10                100000       7804            0 2013-NOV-20 22:27:59         518


***** Table Column Stats *****
TABLE_NAME COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_NULLS NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE GLO USE AVG_COL_LEN HISTOGRAM --------------- --------------- ------------ ---------- ---------- ---------- ---------- ----------- -------------------- ----------- --- --- ----------- --------------- CBO_TAB_1 NULL_COL 99864 C102 C30A6464 .000010014 1000 1 2013-NOV-20 22:27:58 99000 YES NO 5 NONE
***** Index Stats *****
INDEX_NAME INDEX_TYPE UNIQUENES BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR NUM_ROWS LAST_ANALYZED ------------------------------ --------------------------- --------- ---------- ----------- ------------- ----------------- ---------- -------------------- CBO_TAB_1_NULL_COL NORMAL NONUNIQUE 1 220 99000 7693 99000 2013-NOV-20 22:27:59
***** Index Columns ****
INDEX_NAME TABLE_NAME COLUMN_NAME COLUMN_POSITION ------------------------------ --------------- ------------------------- --------------- CBO_TAB_1_NULL_COL CBO_TAB_1 NULL_COL 1

The cardinality estimated by the CBO did not change (luckily): SQL> explain plan for select count(null_col) from cbo_tab_1; SQL> _at_xpln

ID         OPERATION                                OBJECT                                   ALIAS                CARDINALITY       COST OPTIMIZER            BYTES
---------- ---------------------------------------- ---------------------------------------- -------------------- ----------- ---------- --------------- ----------
--- 0       SELECT STATEMENT()                                                                                              1         11 ALL_ROWS                 5
--P 1        SORT(AGGREGATE)                                                                                                1                                     5
--P 2         INDEX(FAST FULL SCAN)                 (INDEX) MARTIJN.CBO_TAB_1_NULL_COL       CBO_TAB_1_at_SEL$1           100000         11 ANALYZED            500000

    ID PROJECTION

------ ---------------------------------------------------------------------------
     1 (#keys=0) COUNT("NULL_COL")[22]
     2 "NULL_COL"[NUMBER,22]

SQL> As shown in a post to Mark Farnham I noticed, since you and Mark mentioned it, that including some predicate that other statistics were used by the CBO.

Still I was curious as to what a 100053 trace might tell me, and as you allready showed (in another post) also, no mention of the number of rows in the index.



BASE STATISTICAL INFORMATION
  • 2013-11-20 22:49:58.415
    Table Stats:: Table: CBO_TAB_1 Alias: CBO_TAB_1 #Rows: 100000 #Blks: 7804 AvgRowLen: 518.00 ChainCnt: 0.00 Index Stats:: Index: CBO_TAB_1_CLU Col#: 3 LVLS: 1 #LB: 208 #DK: 1001 LB/K: 1.00 DB/K: 8.00 CLUF: 8592.00 Index: CBO_TAB_1_ID Col#: 1 LVLS: 1 #LB: 208 #DK: 100000 LB/K: 1.00 DB/K: 1.00 CLUF: 7693.00 Index: CBO_TAB_1_NULL_COL Col#: 4 LVLS: 1 #LB: 220 #DK: 99000 LB/K: 1.00 DB/K: 1.00 CLUF: 7693.00 Index: CBO_TAB_1_SCA Col#: 2 LVLS: 1 #LB: 194 #DK: 10 LB/K: 19.00 DB/K: 7692.00 CLUF: 76924.00 Access path analysis for CBO_TAB_1

Anyway, it strikes me as a little odd that, the number of rows in a index are not being used by the CBO (at least not as far as I can see). Maybe it's not a bug, but to me it looks close to one.

In another post you say "In fact, the only time you would need this statistic would, I think, be for the specific case you've supplied - when you want to count the number of non-null entries without including a predicate of the form "is not null"." I suppose you are right, but over time I learned that there is always "another" side, which I did not see. So I will not state that this statistic is only needed in this very spesific case :-)

Thanks for taking time to think with me.

Best Regards,
Martijn

On Tue, Nov 19, 2013 at 01:21:06AM +0000, Jonathan Lewis wrote:

> 
> 
> Martijn,
> 
> I haven't had time to look at this, but it wouldn't surprise me in the slightest that you've managed to find an optimizer error with an index fast full scan - it wouldn't be the first time that the optimizer has used a table-based strategy to handle an index-based execution path.
> 
> Note particularly that you have no predicate on any of the columns, so the selectivity of your query is necessarily 1 - at which point Oracle can detect that there are 1,000 rows in the table (which you would HAVE to visit if you did a full tablescan), but fails to allow for the fact that if you use an index fast full scan the number of index entries is smaller than the number of index entries.
> 
> I recall, though that there was a "magic" 10% relating to null-related calculations that appeared some time ago, so possibly your example is on the boundary of a change in strategy.  It would be interesting to see what happens in a more extreme case - e.g. create the table and index with 50% of the rows set to null (and create the table with a much larger volume of data anyway (small number often introduce special cases).
> 
> Obviously a 10053 trace might give you some clue that Oracle is, very specifically, failing to consider the number of rows in the index when doing its calculations.
> 
> Regards
> Jonathan Lewis
> 
> ________________________________________
> From: oracle-l-bounce_at_freelists.org [oracle-l-bounce_at_freelists.org] on behalf of Martijn Bos [maboc_at_maboc.nl]
> Sent: 17 November 2013 21:50
> To: Oracle-L
> Subject: null values
> 
> Hi List,
> 
> On a play-around sytem I do the following:
> (This is a 11.2.0.4 RDBMS on linux x86 (32 bits))
> 
> drop table cbo_tab_1;
> create table cbo_tab_1 (id number, sca number(10,0), clu number(10,0), null_col number(10,0), filler char(500)) pctfree 99 pctused 1;
> 
> -- Following statement makes sure that ther actually are nulls in column null_col
> insert into cbo_tab_1 select level, mod(level,10), round(level/100), decode(mod(level, 10), 0, null, level), 'filler' from   dual connect by level<=1000;
> commit;
> 
> create unique index CBO_TAB_1_ID on CBO_TAB_1(id);
> -- create index cbo_tab_1_id on cbo_tab_1(id);
> create index cbo_tab_1_sca on cbo_tab_1(sca);
> create index cbo_tab_1_clu on cbo_tab_1(clu);
> create index cbo_tab_1_null_col on cbo_tab_1(null_col);
> 
> exec dbms_stats.gather_table_stats('MARTIJN','CBO_TAB_1', block_sample=>false, cascade=>true, method_opt=>'for all columns size 1');
> alter system flush shared_pool; --Since I try it a lot I need it to parse over and over again.
> 
> explain plan for select count(null_col) from cbo_tab_1;  --HERE IS THE SQL OF WHICH I DON'T UNDERSTAND THE EXPLAIN PLAN
> 
> SQL> _at_xpln -- A script I created just to get a little more understanding of explaining statement.
> 
> ID         OPERATION                                OBJECT                                   ALIAS                CARDINALITY       COST OPTIMIZER            BYTES
> ---------- ---------------------------------------- ---------------------------------------- -------------------- ----------- ---------- --------------- ----------
> --- 0       SELECT STATEMENT()                                                                                              1          2 ALL_ROWS                 4
> --P 1        SORT(AGGREGATE)                                                                                                1                                     4
> --P 2         INDEX(FAST FULL SCAN)                 (INDEX) MARTIJN.CBO_TAB_1_NULL_COL       CBO_TAB_1_at_SEL$1             1000          2 ANALYZED              4000
> 
>     ID PROJECTION
> ------ ---------------------------------------------------------------------------
>      1 (#keys=0) COUNT("NULL_COL")[22]
>      2 "NULL_COL"[NUMBER,22]
> SQL>
> 
> (if you find that suspect, I also added a more documented way to display explain plans)
> 
> SQL> select * from table(dbms_xplan.display(format=>'all'));
> 
> PLAN_TABLE_OUTPUT
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Plan hash value: 2862360477
> 
> --------------------------------------------------------------------------------------------
> | Id  | Operation             | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
> --------------------------------------------------------------------------------------------
> |   0 | SELECT STATEMENT      |                    |     1 |     4 |     2   (0)| 00:02:01 |
> |   1 |  SORT AGGREGATE       |                    |     1 |     4 |            |          |
> |   2 |   INDEX FAST FULL SCAN| CBO_TAB_1_NULL_COL |  1000 |  4000 |     2   (0)| 00:02:01 |
> --------------------------------------------------------------------------------------------
> 
> Query Block Name / Object Alias (identified by operation id):
> -------------------------------------------------------------
> 
>    1 - SEL$1
>    2 - SEL$1 / CBO_TAB_1_at_SEL$1
> 
> Column Projection Information (identified by operation id):
> -----------------------------------------------------------
> 
>    1 - (#keys=0) COUNT("NULL_COL")[22]
>    2 - "NULL_COL"[NUMBER,22]
> SQL>
> 
> 
> 
> Now the thing that is troubling me is the cardinality estimate at id 2 (1000). I would say that the CBO could esitmate 900 rows.
> Where does the CBO get's that idea of 1000?
> 
> 
> SQL> select TABLE_NAME, STATUS, PCT_FREE, PCT_USED, NUM_ROWS, BLOCKS, EMPTY_BLOCKS, LAST_ANALYZED, AVG_ROW_LEN
> from   user_tables
> where  table_name='CBO_TAB_1';
> 
> TABLE_NAME      STATUS     PCT_FREE   PCT_USED   NUM_ROWS     BLOCKS EMPTY_BLOCKS LAST_ANALYZED        AVG_ROW_LEN
> --------------- -------- ---------- ---------- ---------- ---------- ------------ -------------------- -----------
> CBO_TAB_1       VALID            99                  1000       1000            0 2013-NOV-17 22:27:10         515
> 
> SQL> select index_name, index_type, uniqueness, blevel, leaf_blocks, distinct_keys, clustering_factor, num_rows, last_analyzed
> from   user_indexes
> where  table_name='CBO_TAB_1';
> 
> INDEX_NAME                     INDEX_TYPE                  UNIQUENES     BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR   NUM_ROWS LAST_ANALYZED
> ------------------------------ --------------------------- --------- ---------- ----------- ------------- ----------------- ---------- --------------------
> CBO_TAB_1_ID                   NORMAL                      UNIQUE             1           2          1000              1000       1000 2013-NOV-17 22:27:10
> CBO_TAB_1_SCA                  NORMAL                      NONUNIQUE          1           2            10              1000       1000 2013-NOV-17 22:27:10
> CBO_TAB_1_CLU                  NORMAL                      NONUNIQUE          1           2            11              1000       1000 2013-NOV-17 22:27:10
> CBO_TAB_1_NULL_COL             NORMAL                      NONUNIQUE          1           2           900               900        900 2013-NOV-17 22:27:10
> SQL>
> 
> As far as I'm concerned/understand the CBO chooses the right index (CBO_TAB_1_NULL_COL). CBO_TAB_1_NULL_COL has 900 num rows, since the other 100 rows are null.
> I would say that in this specific case it would be easy for the CBO to estimate 900 rows since it only visits this index (all information is in the index).
> 
> Now I'm thinking that I might oversee something very simple (or that i'm flat out wrong), so would anyone be so kind as to point me in the right direction for some guidance, or (better yet :-)) explain where my thinking is wrong.
> 
> 
> Best Regards,
> Martijn



--
http://www.freelists.org/webpage/oracle-l
Received on Wed Nov 20 2013 - 23:13:23 CET

Original text of this message