Re: indexing null values curious case?

From: Vishnu Potukanuma <vishnupotukanuma_at_gmail.com>
Date: Thu, 21 Nov 2019 19:23:25 +0530
Message-ID: <CAP-Ryww3j_9idJvKWxeYvhsNky2k1c=CcsT8ho2+qQ44dXx8fg_at_mail.gmail.com>



Missed your mail Andy,
yap, appears that i missed this altogether, and was looking at dba_tab_histograms regarding endpoints, dba_ind_statistics, looked at dba_tab_col_statistics view... looks like it does maintain these statistics.
have to look more keenly to how oracle estimates the selectivity. Thanks for your time... its really good that i learnt some thing new today.

Thanks,
Vishnu

On Thu, Nov 21, 2019 at 6:09 PM Andy Sayer <andysayer_at_gmail.com> wrote:

> > Another part I believe is that Oracle's statistics (including histogram
> doesn't maintain any statistics regarding nulls, which makes sense in a
> way), as these are only explicit cases and the concept of statistics
> optimizer everything resolves only around the values that are present but
> not null..
>
> It’s quite the opposite, Oracle gives you a free frequency histogram for
> null values with the num_nulls column. It will use these statistics when
> relevant.
>
> Thanks,
> Andy
>
> On Thu, 21 Nov 2019 at 12:18, Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
> wrote:
>
>> Vishnu,
>>
>> Your STUDENTS example looks like a bug. (The stats for the NULL and ''
>> case are identical, by the way, so it's likely to be related to the way
>> that Oracle stores the '' as the DEFAULT for the column definition).
>>
>> re:
>> select * from temp where mark1 is null; --> it always goes with a full
>> table scan
>> It's got to; Oracle know that there are some rows with mark1 null, some
>> with mark2 null, so it have to asume that the two sets of rows may overlap
>> to give a row which does not appear in thei index.
>>
>> select * from temp where mark1 is null and mark2 = 123123;
>> Now you're looking for a row which (if it exists) will be in the index.
>> You could equally change the mark2 = 123123 to mark2 is not null.
>>
>> re:
>> all i did was add an extra column to the index (virtual column).
>> So now you have a column in the index which is NEVER null, so every row
>> in the table will appear in the index, so the index can be used for mark1
>> is null
>>
>>
>> re
>> mark2 is null
>> The possibly paths are
>> index skip scan
>> index full scan
>> index fast full scan
>> tablescan
>>
>> The choice of tablescan is almost certainly dictated by cost.
>>
>>
>> BOTTOM LINE - your students case looks like a bug relating to the special
>> case of the way Oracle handles the DEFAULT for a constant empty string. The
>> rest are correct behaviour.
>>
>>
>> Regards
>> Jonathan Lewis
>>
>>
>>
>> ________________________________________
>> From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on
>> behalf of Vishnu Potukanuma <vishnupotukanuma_at_gmail.com>
>> Sent: 21 November 2019 11:07
>> To: Andy Sayer
>> Cc: oracle-l_at_freelists.org
>> Subject: Re: indexing null values curious case?
>>
>> Apparently looks like this entire section dealing with the null is not
>> developed in its entirety.. the following is a different case as well
>> regarding the nulls.
>>
>> even with a multi-column index, considering another scenario as the
>> following:
>> create table temp (roll number, name varchar2(20), mark1 number, mark2
>> number);
>> insert into temp select rownum, dbms_random.string(0,20),
>> round(dbms_random.value(0,100), round(dbms_random.value(0,200)) from dual
>> connect by level < 1000000;
>> insert into temp (roll, name, mark2) select rownum,
>> dbms_random.string(0,20), round(dbms_random.value(0,200) from dual connect
>> by level < 10;
>> insert into temp (roll, name, mark1) select rownum,
>> dbms_random.string(0,20), round(dbms_random.value(0,200) from dual connect
>> by level < 10;
>> commit;
>> create index idx on temp(mark1, mark2);
>> Here i have only 10 rows from each of the columns as null;
>>
>> exec dbms_Stats.gather_table_stats('VISHNU','TEMP',CASCADE=>TRUE);
>>
>> from the leaf blocks we can find that the nulls are stored in the index.
>> The leaf blocks containing the NULLs is as follows:
>> row#0[8019] flag: -------, lock: 0, len=13
>> col 0; len 2; (2): c1 02
>> col 1; NULL
>> col 2; len 6; (6): 01 c0 21 de 00 00
>> row#1[7992] flag: -------, lock: 2, len=14
>> col 0; len 3; (3): c2 02 04
>> col 1; NULL
>> col 2; len 6; (6): 01 c0 21 de 00 02
>> row#2[8006] flag: -------, lock: 0, len=13
>> col 0; NULL
>> col 1; len 2; (2): c1 02
>> col 2; len 6; (6): 01 c0 21 de 00 01
>> row#3[7978] flag: -------, lock: 2, len=14
>> col 0; NULL
>> col 1; len 3; (3): c2 0d 20
>> col 2; len 6; (6): 01 c0 21 de 00 03
>>
>> if we issue a query like select * from temp where mark1 is null; --> it
>> always goes with a full table scan even when 10 rows only have null values
>> and apparently the index leaf blocks do contain entries regarding the null
>> values), for the leading as well as the tail columns as shown above. But
>> the wierd thing is if we include both the columns mark1, mark2 in the
>> predicate clause as following:
>> select * from temp where mark1 is null and mark2 = 123123;
>> the optimizer selects the index.
>>
>>
>> now comes even better part.
>> drop the index and create the index as follows:
>> create index idx on temp(mark1,mark2,1);
>> all i did was add an extra column to the index (virtual column).
>> collected the statistics and ran the query back again.
>> select * from temp where mark1 is null;
>> this time it selects the index properly which makes me wonder if they
>> have included or written the code or made the optimizer to consider the
>> nulls in the predicate clause only in the presence of virtual columns.
>>
>> Another part I believe is that Oracle's statistics (including histogram
>> doesn't maintain any statistics regarding nulls, which makes sense in a
>> way), as these are only explicit cases and the concept of statistics
>> optimizer everything resolves only around the values that are present but
>> not null..
>>
>> also an another distinct behavior observed is this regarding of whether
>> the index is
>> create index idx on temp(mark1,mark2)
>> or create index idx on temp(mark1,mark2,1);
>>
>> any query with predicates involving only mark2 regarding nulls goes with
>> full table scan forget skip scans etc etc.
>> queries like
>> select * from temp where mark1 is not null and mark2 is null;
>> select * from temp where mark2 is null;
>> apparently some explanation for this can be the way oracle doesn't
>> include nulls in the histograms.. and I am tired for the day!
>>
>> I can't make any assumptions as of which of the optimizer code has to be
>> developed just with these observations, but the way oracle handles nulls
>> should be improved consistently.
>>
>> can someone please tell me if i am missing any thing here? Oracle is a
>> like a ocean. so many concepts no wonder i became bald at a very young age.
>>
>> Thanks,
>> Vishnu
>>
>> On Thu, Nov 21, 2019 at 2:05 PM Vishnu Potukanuma <
>> vishnupotukanuma_at_gmail.com<mailto:vishnupotukanuma_at_gmail.com>> wrote:
>> This is reproducible case and probably hitting an undeveloped section of
>> oracle code or a bug. Sure, let me see what I can do.
>>
>> Thanks,
>> Vishnu
>>
>> On Thu, Nov 21, 2019 at 1:55 PM Andy Sayer <andysayer_at_gmail.com<mailto:
>> andysayer_at_gmail.com>> wrote:
>> Hi Vishnu
>>
>> File your reproducible test case with Oracle support if you suspect
>> you’re hitting a bug.
>>
>> It is not expected for the existence of a special index to change the
>> result of a query.
>>
>> If you want to find null values and the selectivity is decent for an
>> index then I would use an index on (column_name,0).
>>
>> Thanks,
>> Andy
>>
>> On Thu, 21 Nov 2019 at 08:18, Vishnu Potukanuma <
>> vishnupotukanuma_at_gmail.com<mailto:vishnupotukanuma_at_gmail.com>> wrote:
>> Hi,
>>
>> If all the columns in the index are null then the entries are not stored
>> in the index. This is a known fact and empty strings are stored as null
>> values. combining both breaks the consistency part where the query returns
>> wrong results.
>>
>> this is really a worst case as no one creates such a index as
>> (column_name,null) or (column_name,'') as this really doesn't make sense
>> but this breaks the consistency aspect of database as the query gives
>> wrong result as the optimizer consider index even when the leading column
>> is null.
>>
>> the scenario is basically as following:
>> Create table students (student_id number, name varchar2(20);
>> insert into students select rownum, dbms_random.string(0,20) form dual
>> connect by level < 1000000;
>> insert into students (name) select dbms_random.string(0,20) from dual
>> connect by level < 100;
>> commit;
>> create index idx on student(student_id,null);
>> exec dbms_stats.gather_Table_stats('VISHNU','students',cascade=>True);
>> select * from students where student_id is null;
>> in this case it goes with the full table scan which is correct since both
>> nulls are not stored.
>>
>> so now we drop the index idx;
>> Here we create the index as this.
>> create index idx on student(student_id,'');
>> exec dbms_stats.gather_Table_stats('VISHNU','students',cascade=>True);
>> select * from students where student_id is null;
>> now Oracle retrives the results as 0, with the index scan.
>>
>> the leaf blocks store the details as following since the empty strings
>> are treated as nulls
>> row#0[8019] flag: -------, lock: 0, len=13
>> col 0; len 2; (2): c1 02
>> col 1; NULL
>> col 2; len 6; (6): 01 c0 21 dc 00 00
>> row#1[8006] flag: -------, lock: 2, len=13
>> col 0; len 2; (2): c1 03
>> col 1; NULL
>> col 2; len 6; (6): 01 c0 21 dc 00 02
>> but things as expected, index doesn't store entries when the leading
>> column is null.
>>
>> here the oracle goes with the index scan and returns zero results. which
>> is inconsistent result.
>> optimizer trace considers that index with (student_id, '');
>>
>> Access Path: TableScan
>> Cost: 21017.655599 Resp: 21017.655599 Degree: 0
>> Cost_io: 20995.000000 Cost_cpu: 807173669
>> Resp_io: 20995.000000 Resp_cpu: 807173669
>> Access Path: index (index (FFS))
>> Index: IDX
>> resc_io: 1039.000000 resc_cpu: 237414695
>> ix_sel: 0.000000 ix_sel_with_filters: 1.000000
>> Access Path: index (FFS)
>> Cost: 1045.663711 Resp: 1045.663711 Degree: 1
>> Cost_io: 1039.000000 Cost_cpu: 237414695
>> Resp_io: 1039.000000 Resp_cpu: 237414695
>> ****** Costing Index IDX
>> SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN
>> SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
>> Access Path: index (IndexOnly)
>> Index: IDX
>> resc_io: 5.000000 resc_cpu: 235407
>> ix_sel: 6.6556e-04 ix_sel_with_filters: 6.6556e-04
>> Cost: 5.006607 Resp: 5.006607 Degree: 1
>> Best:: AccessPath: IndexRange
>> Index: IDX
>> Cost: 5.006607 Degree: 1 Resp: 5.006607 Card: 999.000000
>> Bytes: 0.000000
>>
>> probably we are hitting an undeveloped section of optimizer? producing a
>> wrong plan is ok, but producing wrong results?
>>
>> Thanks,
>> vishnu
>> --
>> http://www.freelists.org/webpage/oracle-l
>>
>>
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Nov 21 2019 - 14:53:25 CET

Original text of this message