Re: Advice on a weird query plan

From: Dion Cho <ukja.dion_at_gmail.com>
Date: Wed, 16 Apr 2008 18:31:40 -0700 (PDT)
Message-ID: <7ac1133a-e35f-4ccb-a6b3-aa3e35ea9158@d1g2000hsg.googlegroups.com>


On 4¿ù17ÀÏ, ¿ÀÀü8½Ã07ºÐ, Pat <pat.ca..._at_service-now.com> wrote:
> On Apr 16, 3:06 pm, Pat <pat.ca..._at_service-now.com> wrote:
>
>
>
> > On Apr 16, 10:35 am, "Jonathan Lewis" <jonat..._at_jlcomp.demon.co.uk>
> > wrote:
>
> > > "Pat" <pat.ca..._at_service-now.com> wrote in message
>
> > >news:8e132d52-45fb-4d8f-8ded-de9583399902_at_m71g2000hse.googlegroups.com...
>
> > > > Oracle 10.2.0.4
> > > > Running in case Insensative mode
> > > > ALTER SESSION SET NLS_COMP=LINGUISTIC;
> > > > ALTER SESSION SET NLS_SORT=BINARY_CI;
>
> > > > I have a table, sysevent, with about 4.5 million rows in it.
> > > > I have a column "state" with a ridiculously squewed data pattern:
> > > > SQL> select "state", count("state") from sysevent group by "state";
>
> > > > state COUNT("STATE")
> > > > ---------------------------------------- --------------
> > > > processed 4492467
> > > > ready 63
>
> > > > Basically, data gets inserted into this table by an application with a
> > > > state of "ready" then changed, programatically, to processed when it's
> > > > done grinding over the data. So at any given point in time, about
> > > > 99.9999% of the table has a state of "processed".
>
> > > > I have an nls index on state:
>
> > > > create index sysevent_state on sysevent(NLSSORT("state",
> > > > 'nls_sort=''BINARY_CI'''));
>
> > > > I'd reasonably expect a query against state="processed" to use a full
> > > > table scan.
> > > > I'd reasonably expect a query against state="ready" to use an index
> > > > range scan.
>
> > > > I've got current stats on that table, but I double checked for
> > > > purposes of this test:
>
> > > > SQL> analyze table sysevent compute statistics for columns "state"
> > > > size 10;
>
> > > > Table analyzed.
>
> > > > SQL> select endpoint_number, endpoint_value, endpoint_actual_value
> > > > from user_tab_histograms where table_name = 'sysevent' and column_name
> > > > = 'state';
>
> > > > ENDPOINT_NUMBER ENDPOINT_VALUE
> > > > --------------- --------------
> > > > ENDPOINT_ACTUAL_VALUE
> > > > ------------------------------------------------------------------------------------------------------------------------------------
> > > > 4494114 5.8386E+35
> > > > processed
>
> > > > 4494216 5.9398E+35
> > > > ready
>
> > > > So now I do two queries:
>
> > > > SQL> explain plan for select * from sysevent where "state" = 'ready';
>
> > > > Explained.
>
> > > > SQL> SELECT * FROM TABLE(dbms_xplan.display);
>
> > > > PLAN_TABLE_OUTPUT
> > > > ------------------------------------------------------------------------------------------------------------------------------------
> > > > Plan hash value: 122562926
>
> > > > ------------------------------------------------------------------------------
> > > > | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
> > > > |
> > > > ------------------------------------------------------------------------------
> > > > | 0 | SELECT STATEMENT | | 100 | 24700 | 26 (89)| 00:00:01
> > > > |
> > > > |* 1 | TABLE ACCESS FULL| SYSEVENT | 100 | 24700 | 26 (89)|
> > > > 00:00:01 |
> > > > ------------------------------------------------------------------------------
>
> > > > Predicate Information (identified by operation id):
> > > > ---------------------------------------------------
>
> > > > PLAN_TABLE_OUTPUT
> > > > ------------------------------------------------------------------------------------------------------------------------------------
>
> > > > 1 -
> > > > filter(NLSSORT("state",'nls_sort=''BINARY_CI''')=HEXTORAW('726561
> > > > 647900') )
>
> > > > 14 rows selected.
>
> > > > SQL> explain plan for select * from sysevent where "state" =
> > > > 'processed';
>
> > > > Explained.
>
> > > > SQL> SELECT * FROM TABLE(dbms_xplan.display);
>
> > > > PLAN_TABLE_OUTPUT
> > > > ------------------------------------------------------------------------------------------------------------------------------------
> > > > Plan hash value: 122562926
>
> > > > ------------------------------------------------------------------------------
> > > > | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
> > > > |
> > > > ------------------------------------------------------------------------------
> > > > | 0 | SELECT STATEMENT | | 100 | 24700 | 26 (89)| 00:00:01
> > > > |
> > > > |* 1 | TABLE ACCESS FULL| SYSEVENT | 100 | 24700 | 26 (89)|
> > > > 00:00:01 |
> > > > ------------------------------------------------------------------------------
>
> > > > Predicate Information (identified by operation id):
> > > > ---------------------------------------------------
>
> > > > PLAN_TABLE_OUTPUT
> > > > ------------------------------------------------------------------------------------------------------------------------------------
>
> > > > 1 -
> > > > filter(NLSSORT("state",'nls_sort=''BINARY_CI''')=HEXTORAW('70726F
> > > > 63657373656400') )
>
> > > > 14 rows selected.
>
> > > > I cannot, for the life of me, figure out why we're table scanning in
> > > > the case of state='ready'. It's a highly, highly selective query and
> > > > the histogram bears that out. It's not bind variable peeking b/c I'm
> > > > using literals in this case.
>
> > > > Does anyone have any idea what I'm doing wrong?
>
> > > You need the histogram to be on the hidden column that defines
> > > the function-based index, not the underlying column.
>
> > > The hidden column will have a name like 'SYS_NC00032$.
>
> > > You should really be using dbms_stats to collect statistics; the following
> > > should be enough to ensure you have stats on the appropriate column:
>
> > > execute dbms_stats.gather_table_stats(user, 'sysevent',method_opt=>'for all
> > > hidden columns');
>
> > > Then as a check:
> > > select column_name, endpoint_Number, endpoint_value
> > > from user_tab_histograms
> > > where table_name = 'SYSEVENT'
> > > and column_name like 'SYS%'
> > > order by column_name, endpoint_Number;
>
> > > --
> > > Regards
>
> > > Jonathan Lewishttp://jonathanlewis.wordpress.com
>
> > > Author: Cost Based Oracle: Fundamentalshttp://www.jlcomp.demon.co.uk/cbo_book/ind_book.html
>
> > > The Co-operative Oracle Users' FAQhttp://www.jlcomp.demon.co.uk/faq/ind_faq.html
>
> > Thanks! I never would have thought of that :).
>
> Just thought I'd let you know your proposal was spot on.
>
> I ran a histogram against the hidden column:
>
> analyze table sysevent compute statistics for columns SYS_NC00022$
> size 10;
>
> After that, I got reasonable execution plans.
>
> SQL> explain plan for select * from sysevent where "state" = 'ready';
>
> Explained.
>
> SQL> SELECT * FROM TABLE(dbms_xplan.display);
>
> PLAN_TABLE_OUTPUT
> ------------------------------------------------------------------------------------------------------------------------------------
> Plan hash value: 2297044318
>
> -----------------------------------------------------------------------------------------------
> | Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
> Time |
> -----------------------------------------------------------------------------------------------
> | 0 | SELECT STATEMENT | | 8 | 2024 | 8 (0)|
> 00:00:01 |
> | 1 | TABLE ACCESS BY INDEX ROWID| SYSEVENT | 8 | 2024
> | 8 (0)| 00:00:01 |
> |* 2 | INDEX RANGE SCAN | SYSEVENT_INDEX1 | 8 | |
> 3 (0)| 00:00:01 |
> -----------------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
>
> PLAN_TABLE_OUTPUT
> ------------------------------------------------------------------------------------------------------------------------------------
> ---------------------------------------------------
>
> 2 -
> access(NLSSORT("state",'nls_sort=''BINARY_CI''')=HEXTORAW('726561647900') )
>
> 14 rows selected.
>
> SQL> explain plan for select * from sysevent where "state" =
> 'processed'; 2
>
> Explained.
>
> SQL> SELECT * FROM TABLE(dbms_xplan.display);
>
> PLAN_TABLE_OUTPUT
> ------------------------------------------------------------------------------------------------------------------------------------
> Plan hash value: 122562926
>
> ------------------------------------------------------------------------------
> | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
> |
> ------------------------------------------------------------------------------
> | 0 | SELECT STATEMENT | | 2115K| 510M| 17949 (3)| 00:03:36
> |
> |* 1 | TABLE ACCESS FULL| SYSEVENT | 2115K| 510M| 17949 (3)|
> 00:03:36 |
> ------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
> PLAN_TABLE_OUTPUT
> ------------------------------------------------------------------------------------------------------------------------------------
>
> 1 -
> filter(NLSSORT("state",'nls_sort=''BINARY_CI''')=HEXTORAW('70726F
> 63657373656400') )
>
> 14 rows selected.

Congratulation for problem being solved. But, why you're not using dbms_stats yet? Just from curiostiy. :-)

Besides, i don't think that state column is appropriate for index. Actually it's the most inefficient index imagineable from the point of view of space abuse.

As an alternative, how about following technique?

create or replace function check_decode(state in varchar2) return varchar2
deterministic
is
begin
 if lower(state) = 'processed' then
  return null;
 else
  return lower(state);
 end if;
end;
/

create index i on t(check_decode(state));

exec dbms_stats.gather_table_stats(..., method_opt=> 'for all columns size auto', ...);

alter session set query_rewrite_enabled = true; alter session set query_rewrite_integrity = trusted;

select * from t where check_decode(state) = 'processed' -- this will scan table
select * from t where check_decode(state) = 'ready' -- this will seek index

Works same, but shows the result of more efficient use of space.

Dion Cho Received on Wed Apr 16 2008 - 20:31:40 CDT

Original text of this message