Re: Differences in Range Scan Performance between Binary and NLS Indexes

From: joel garry <joel-garry_at_home.com>
Date: Tue, 6 May 2008 14:26:43 -0700 (PDT)
Message-ID: <f00d4c98-6539-4ac9-a1f0-1809824c1152@q27g2000prf.googlegroups.com>


On May 6, 1:16 pm, Pat <pat.ca..._at_service-now.com> wrote:
> I have a table set with about 340k rows in it. User's execute queries
> against this table for arbitrary search strings which are resolved
> against the name table in the column. The table contains mixed case,
> but the users expect to match their search regardless of case. So, for
> example, both of these should match a search term of "badg":
>
> ARabidBadger
> somebadglowercase
>
> A typical search, for all those entries whose name contains "badg"
> would look like this:
>
> SELECT "sys_id" FROM (SELECT /*+ FIRST_ROWS(250) */ a.*, ROWNUM rnum
> FROM (SELECT cmdb_ci0."sys_id" FROM cmdb_ci cmdb_ci0  WHERE
> lower(cmdb_ci0."name") LIKE '%badge%' ORDER BY lower(cmdb_ci0."name"))
> a where ROWNUM <= 250) WHERE rnum > 0
>
> To make this query efficient, I added a covering index
>
> create index cmdb_ci_lower_name_sys_id on cmdb_ci (lower("name"),
> "sys_id");
>
> And our query plan looks like this and completes in about 120 ms. This
> was worst case since only 1 row was returned. Short search strings
> like "b" will hit their stop key quickly and exit after finding 250
> matches without having to exhaust the index.
>
> Still, worst case, this query does a full index scan in about 120ms;
>
> SQL> SELECT "sys_id" FROM (SELECT /*+ FIRST_ROWS(250) */ a.*, ROWNUM
> rnum FROM (SELECT cmdb_ci0."sys_id" FROM cmdb_ci cmdb_ci0  WHERE
> lower(cmdb_ci0."name") LIKE '%badge%' ORDER BY lower(cmdb_ci0."name"))
> a where ROWNUM <= 250) WHERE rnum > 0;
>
> Elapsed: 00:00:00.12
>
> Execution Plan
> ----------------------------------------------------------
> Plan hash value: 424525705
>
> ---------------------------------------------------------------------------­---------------------
> | Id  | Operation          | Name                      | Rows  | Bytes | Cost (%CPU)|
> Time     |
> ---------------------------------------------------------------------------­---------------------
> |   0 | SELECT STATEMENT   |                           |   250 | 11750 |    46   (0)|
> 00:00:01 |
> |*  1 |  VIEW              |                           |   250 | 11750 |    46   (0)| 00:00:01 |
> |*  2 |   COUNT STOPKEY    |                           |       |       |            |          |
> |   3 |    VIEW            |                           |   251 |  8534 |    46   (0)| 00:00:01
> |
> |*  4 |     INDEX FULL SCAN| CMDB_CI_LOWER_NAME_SYS_ID | 16997 |
> 813K|    46      (0)| 00:00:01 |
> ---------------------------------------------------------------------------­---------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
>    1 - filter("RNUM">0)
>    2 - filter(ROWNUM<=250)
>    4 - filter(LOWER("name") LIKE '%badge%')
>
> Statistics
> ----------------------------------------------------------
>         141  recursive calls
>           0  db block gets
>        2964  consistent gets
>           0  physical reads
>           0  redo size
>         439  bytes sent via SQL*Net to client
>         400  bytes received via SQL*Net from client
>           2  SQL*Net roundtrips to/from client
>           0  sorts (memory)
>           0  sorts (disk)
>           1  rows processed
>
> Recently though, we started storing NLS characters in this column
> (european customers), so we'll have values like:
>
> badger
> bädger <-- note the accented a
>
> The users want both of the above to collate next to each other.
>
> So we, not unnaturally, put the database in linguisitc mode:
>
> alter session set nls_sort=german;
> alter session set nls_comp=linguistic;
>
> Then we issue the same query. The query plan here involves a sort, but
> the query returns a grand total of 1 record, so the sort isn't
> material to the response time.

...

> 1) Am I doing something wrong here? Why does putting the system into
> NLS_SORT and NLS_COMP make this big a difference on performance?
> 2) Is there an index I should add here that would allow some form of
> efficient querying across this set? I know I'm going to have to do a
> full scan here (contains queries being what they are), but why is my
> full scan 10X slower in linguistic mode?
> 3) Is there some other recommended approach to getting linguistic
> collation working efficiently? Shadow columns aren't really practical
> here (and even then I'm not sure what I'd put in the shadow unless I
> wrote my own german to english unaccenter).
>
> Any help would be much appreciated.

See metalink Note:30779.1

"Setting NLS_SORT to anything other than BINARY causes a sort to use a full table scan, regardless of the path chosen by the optimizer. BINARY is the exception because indexes are built according to a binary order of keys. Thus the optimizer can use an index to satisfy the ORDER BY clause when NLS_SORT is set to BINARY. If NLS_SORT is set to any linguistic sort, the optimizer must include a full table scan and a full sort in the execution plan. "

Things may be different in different versions, I wouldn't know.

jg

--
@home.com is bogus.
http://forums.oracle.com/forums/thread.jspa?messageID=2507441&#2507441
Received on Tue May 06 2008 - 16:26:43 CDT

Original text of this message