Re: Advice on a weird query plan
Date: Wed, 16 Apr 2008 16:07:50 -0700 (PDT)
Message-ID: <7109d95b-60d6-4b3b-be46-f44c760f45ca@t54g2000hsg.googlegroups.com>
On Apr 16, 3:06 pm, Pat <pat.ca..._at_service-now.com> wrote:
> On Apr 16, 10:35 am, "Jonathan Lewis" <jonat..._at_jlcomp.demon.co.uk>
> wrote:
>
>
>
> > "Pat" <pat.ca..._at_service-now.com> wrote in message
>
> >news:8e132d52-45fb-4d8f-8ded-de9583399902_at_m71g2000hse.googlegroups.com...
>
> > > Oracle 10.2.0.4
> > > Running in case Insensative mode
> > > ALTER SESSION SET NLS_COMP=LINGUISTIC;
> > > ALTER SESSION SET NLS_SORT=BINARY_CI;
>
> > > I have a table, sysevent, with about 4.5 million rows in it.
> > > I have a column "state" with a ridiculously squewed data pattern:
> > > SQL> select "state", count("state") from sysevent group by "state";
>
> > > state COUNT("STATE")
> > > ---------------------------------------- --------------
> > > processed 4492467
> > > ready 63
>
> > > Basically, data gets inserted into this table by an application with a
> > > state of "ready" then changed, programatically, to processed when it's
> > > done grinding over the data. So at any given point in time, about
> > > 99.9999% of the table has a state of "processed".
>
> > > I have an nls index on state:
>
> > > create index sysevent_state on sysevent(NLSSORT("state",
> > > 'nls_sort=''BINARY_CI'''));
>
> > > I'd reasonably expect a query against state="processed" to use a full
> > > table scan.
> > > I'd reasonably expect a query against state="ready" to use an index
> > > range scan.
>
> > > I've got current stats on that table, but I double checked for
> > > purposes of this test:
>
> > > SQL> analyze table sysevent compute statistics for columns "state"
> > > size 10;
>
> > > Table analyzed.
>
> > > SQL> select endpoint_number, endpoint_value, endpoint_actual_value
> > > from user_tab_histograms where table_name = 'sysevent' and column_name
> > > = 'state';
>
> > > ENDPOINT_NUMBER ENDPOINT_VALUE
> > > --------------- --------------
> > > ENDPOINT_ACTUAL_VALUE
> > > ------------------------------------------------------------------------------------------------------------------------------------
> > > 4494114 5.8386E+35
> > > processed
>
> > > 4494216 5.9398E+35
> > > ready
>
> > > So now I do two queries:
>
> > > SQL> explain plan for select * from sysevent where "state" = 'ready';
>
> > > Explained.
>
> > > SQL> SELECT * FROM TABLE(dbms_xplan.display);
>
> > > PLAN_TABLE_OUTPUT
> > > ------------------------------------------------------------------------------------------------------------------------------------
> > > Plan hash value: 122562926
>
> > > ------------------------------------------------------------------------------
> > > | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
> > > |
> > > ------------------------------------------------------------------------------
> > > | 0 | SELECT STATEMENT | | 100 | 24700 | 26 (89)| 00:00:01
> > > |
> > > |* 1 | TABLE ACCESS FULL| SYSEVENT | 100 | 24700 | 26 (89)|
> > > 00:00:01 |
> > > ------------------------------------------------------------------------------
>
> > > Predicate Information (identified by operation id):
> > > ---------------------------------------------------
>
> > > PLAN_TABLE_OUTPUT
> > > ------------------------------------------------------------------------------------------------------------------------------------
>
> > > 1 -
> > > filter(NLSSORT("state",'nls_sort=''BINARY_CI''')=HEXTORAW('726561
> > > 647900') )
>
> > > 14 rows selected.
>
> > > SQL> explain plan for select * from sysevent where "state" =
> > > 'processed';
>
> > > Explained.
>
> > > SQL> SELECT * FROM TABLE(dbms_xplan.display);
>
> > > PLAN_TABLE_OUTPUT
> > > ------------------------------------------------------------------------------------------------------------------------------------
> > > Plan hash value: 122562926
>
> > > ------------------------------------------------------------------------------
> > > | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
> > > |
> > > ------------------------------------------------------------------------------
> > > | 0 | SELECT STATEMENT | | 100 | 24700 | 26 (89)| 00:00:01
> > > |
> > > |* 1 | TABLE ACCESS FULL| SYSEVENT | 100 | 24700 | 26 (89)|
> > > 00:00:01 |
> > > ------------------------------------------------------------------------------
>
> > > Predicate Information (identified by operation id):
> > > ---------------------------------------------------
>
> > > PLAN_TABLE_OUTPUT
> > > ------------------------------------------------------------------------------------------------------------------------------------
>
> > > 1 -
> > > filter(NLSSORT("state",'nls_sort=''BINARY_CI''')=HEXTORAW('70726F
> > > 63657373656400') )
>
> > > 14 rows selected.
>
> > > I cannot, for the life of me, figure out why we're table scanning in
> > > the case of state='ready'. It's a highly, highly selective query and
> > > the histogram bears that out. It's not bind variable peeking b/c I'm
> > > using literals in this case.
>
> > > Does anyone have any idea what I'm doing wrong?
>
> > You need the histogram to be on the hidden column that defines
> > the function-based index, not the underlying column.
>
> > The hidden column will have a name like 'SYS_NC00032$.
>
> > You should really be using dbms_stats to collect statistics; the following
> > should be enough to ensure you have stats on the appropriate column:
>
> > execute dbms_stats.gather_table_stats(user, 'sysevent',method_opt=>'for all
> > hidden columns');
>
> > Then as a check:
> > select column_name, endpoint_Number, endpoint_value
> > from user_tab_histograms
> > where table_name = 'SYSEVENT'
> > and column_name like 'SYS%'
> > order by column_name, endpoint_Number;
>
> > --
> > Regards
>
> > Jonathan Lewishttp://jonathanlewis.wordpress.com
>
> > Author: Cost Based Oracle: Fundamentalshttp://www.jlcomp.demon.co.uk/cbo_book/ind_book.html
>
> > The Co-operative Oracle Users' FAQhttp://www.jlcomp.demon.co.uk/faq/ind_faq.html
>
> Thanks! I never would have thought of that :).
Just thought I'd let you know your proposal was spot on.
I ran a histogram against the hidden column:
analyze table sysevent compute statistics for columns SYS_NC00022$ size 10;
After that, I got reasonable execution plans.
SQL> explain plan for select * from sysevent where "state" = 'ready';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
Plan hash value: 2297044318
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 8 | 2024 | 8 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| SYSEVENT | 8 | 2024 | 8 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | SYSEVENT_INDEX1 | 8 | | 3 (0)| 00:00:01 | -----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
2 -
access(NLSSORT("state",'nls_sort=''BINARY_CI''')=HEXTORAW('726561647900') )
14 rows selected.
SQL> explain plan for select * from sysevent where "state" = 'processed'; 2
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
Plan hash value: 122562926
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 2115K| 510M| 17949 (3)| 00:03:36 | |* 1 | TABLE ACCESS FULL| SYSEVENT | 2115K| 510M| 17949 (3)|00:03:36 |
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
1 -
filter(NLSSORT("state",'nls_sort=''BINARY_CI''')=HEXTORAW('70726F
63657373656400') )
14 rows selected. Received on Wed Apr 16 2008 - 18:07:50 CDT