Re: Advice on a weird query plan

From: Pat <pat.casey_at_service-now.com>
Date: Wed, 16 Apr 2008 16:07:50 -0700 (PDT)
Message-ID: <7109d95b-60d6-4b3b-be46-f44c760f45ca@t54g2000hsg.googlegroups.com>


On Apr 16, 3:06 pm, Pat <pat.ca..._at_service-now.com> wrote:
> On Apr 16, 10:35 am, "Jonathan Lewis" <jonat..._at_jlcomp.demon.co.uk>
> wrote:
>
>
>
> > "Pat" <pat.ca..._at_service-now.com> wrote in message
>
> >news:8e132d52-45fb-4d8f-8ded-de9583399902_at_m71g2000hse.googlegroups.com...
>
> > > Oracle 10.2.0.4
> > > Running in case Insensative mode
> > > ALTER SESSION SET NLS_COMP=LINGUISTIC;
> > > ALTER SESSION SET NLS_SORT=BINARY_CI;
>
> > > I have a table, sysevent, with about 4.5 million rows in it.
> > > I have a column "state" with a ridiculously squewed data pattern:
> > > SQL> select "state", count("state") from sysevent group by "state";
>
> > > state COUNT("STATE")
> > > ---------------------------------------- --------------
> > > processed 4492467
> > > ready 63
>
> > > Basically, data gets inserted into this table by an application with a
> > > state of "ready" then changed, programatically, to processed when it's
> > > done grinding over the data. So at any given point in time, about
> > > 99.9999% of the table has a state of "processed".
>
> > > I have an nls index on state:
>
> > > create index sysevent_state on sysevent(NLSSORT("state",
> > > 'nls_sort=''BINARY_CI'''));
>
> > > I'd reasonably expect a query against state="processed" to use a full
> > > table scan.
> > > I'd reasonably expect a query against state="ready" to use an index
> > > range scan.
>
> > > I've got current stats on that table, but I double checked for
> > > purposes of this test:
>
> > > SQL> analyze table sysevent compute statistics for columns "state"
> > > size 10;
>
> > > Table analyzed.
>
> > > SQL> select endpoint_number, endpoint_value, endpoint_actual_value
> > > from user_tab_histograms where table_name = 'sysevent' and column_name
> > > = 'state';
>
> > > ENDPOINT_NUMBER ENDPOINT_VALUE
> > > --------------- --------------
> > > ENDPOINT_ACTUAL_VALUE
> > > ------------------------------------------------------------------------------------------------------------------------------------
> > > 4494114 5.8386E+35
> > > processed
>
> > > 4494216 5.9398E+35
> > > ready
>
> > > So now I do two queries:
>
> > > SQL> explain plan for select * from sysevent where "state" = 'ready';
>
> > > Explained.
>
> > > SQL> SELECT * FROM TABLE(dbms_xplan.display);
>
> > > PLAN_TABLE_OUTPUT
> > > ------------------------------------------------------------------------------------------------------------------------------------
> > > Plan hash value: 122562926
>
> > > ------------------------------------------------------------------------------
> > > | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
> > > |
> > > ------------------------------------------------------------------------------
> > > | 0 | SELECT STATEMENT | | 100 | 24700 | 26 (89)| 00:00:01
> > > |
> > > |* 1 | TABLE ACCESS FULL| SYSEVENT | 100 | 24700 | 26 (89)|
> > > 00:00:01 |
> > > ------------------------------------------------------------------------------
>
> > > Predicate Information (identified by operation id):
> > > ---------------------------------------------------
>
> > > PLAN_TABLE_OUTPUT
> > > ------------------------------------------------------------------------------------------------------------------------------------
>
> > > 1 -
> > > filter(NLSSORT("state",'nls_sort=''BINARY_CI''')=HEXTORAW('726561
> > > 647900') )
>
> > > 14 rows selected.
>
> > > SQL> explain plan for select * from sysevent where "state" =
> > > 'processed';
>
> > > Explained.
>
> > > SQL> SELECT * FROM TABLE(dbms_xplan.display);
>
> > > PLAN_TABLE_OUTPUT
> > > ------------------------------------------------------------------------------------------------------------------------------------
> > > Plan hash value: 122562926
>
> > > ------------------------------------------------------------------------------
> > > | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
> > > |
> > > ------------------------------------------------------------------------------
> > > | 0 | SELECT STATEMENT | | 100 | 24700 | 26 (89)| 00:00:01
> > > |
> > > |* 1 | TABLE ACCESS FULL| SYSEVENT | 100 | 24700 | 26 (89)|
> > > 00:00:01 |
> > > ------------------------------------------------------------------------------
>
> > > Predicate Information (identified by operation id):
> > > ---------------------------------------------------
>
> > > PLAN_TABLE_OUTPUT
> > > ------------------------------------------------------------------------------------------------------------------------------------
>
> > > 1 -
> > > filter(NLSSORT("state",'nls_sort=''BINARY_CI''')=HEXTORAW('70726F
> > > 63657373656400') )
>
> > > 14 rows selected.
>
> > > I cannot, for the life of me, figure out why we're table scanning in
> > > the case of state='ready'. It's a highly, highly selective query and
> > > the histogram bears that out. It's not bind variable peeking b/c I'm
> > > using literals in this case.
>
> > > Does anyone have any idea what I'm doing wrong?
>
> > You need the histogram to be on the hidden column that defines
> > the function-based index, not the underlying column.
>
> > The hidden column will have a name like 'SYS_NC00032$.
>
> > You should really be using dbms_stats to collect statistics; the following
> > should be enough to ensure you have stats on the appropriate column:
>
> > execute dbms_stats.gather_table_stats(user, 'sysevent',method_opt=>'for all
> > hidden columns');
>
> > Then as a check:
> > select column_name, endpoint_Number, endpoint_value
> > from user_tab_histograms
> > where table_name = 'SYSEVENT'
> > and column_name like 'SYS%'
> > order by column_name, endpoint_Number;
>
> > --
> > Regards
>
> > Jonathan Lewishttp://jonathanlewis.wordpress.com
>
> > Author: Cost Based Oracle: Fundamentalshttp://www.jlcomp.demon.co.uk/cbo_book/ind_book.html
>
> > The Co-operative Oracle Users' FAQhttp://www.jlcomp.demon.co.uk/faq/ind_faq.html
>
> Thanks! I never would have thought of that :).

Just thought I'd let you know your proposal was spot on.

I ran a histogram against the hidden column:

 analyze table sysevent compute statistics for columns SYS_NC00022$ size 10;

After that, I got reasonable execution plans.

SQL> explain plan for select * from sysevent where "state" = 'ready';

Explained.

SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT



Plan hash value: 2297044318
| Id  | Operation		    | Name	      | Rows  | Bytes | Cost (%CPU)|
Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	    |		      |     8 |  2024 |     8	(0)|
00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| SYSEVENT	      |     8 |  2024
|     8	(0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN	    | SYSEVENT_INDEX1 |     8 |       |
3	(0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT



   2 -
access(NLSSORT("state",'nls_sort=''BINARY_CI''')=HEXTORAW('726561647900') )

14 rows selected.

SQL> explain plan for select * from sysevent where "state" = 'processed'; 2

Explained.

SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT



Plan hash value: 122562926

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
|   0 | SELECT STATEMENT  |	     |	2115K|	 510M| 17949   (3)| 00:03:36
|
|*  1 |  TABLE ACCESS FULL| SYSEVENT |	2115K|	 510M| 17949   (3)|
00:03:36 |

Predicate Information (identified by operation id):


PLAN_TABLE_OUTPUT


   1 -
filter(NLSSORT("state",'nls_sort=''BINARY_CI''')=HEXTORAW('70726F

              63657373656400') )

14 rows selected. Received on Wed Apr 16 2008 - 18:07:50 CDT

Original text of this message