Differences in Range Scan Performance between Binary and NLS Indexes

From: Pat <pat.casey_at_service-now.com>
Date: Tue, 6 May 2008 13:16:56 -0700 (PDT)
Message-ID: <b0cf7d61-3547-48e0-a6e4-f2cdbfcde83a@27g2000hsf.googlegroups.com>


I have a table set with about 340k rows in it. User's execute queries against this table for arbitrary search strings which are resolved against the name table in the column. The table contains mixed case, but the users expect to match their search regardless of case. So, for example, both of these should match a search term of "badg":

ARabidBadger
somebadglowercase

A typical search, for all those entries whose name contains "badg" would look like this:

SELECT "sys_id" FROM (SELECT /*+ FIRST_ROWS(250) */ a.*, ROWNUM rnum FROM (SELECT cmdb_ci0."sys_id" FROM cmdb_ci cmdb_ci0 WHERE lower(cmdb_ci0."name") LIKE '%badge%' ORDER BY lower(cmdb_ci0."name")) a where ROWNUM <= 250) WHERE rnum > 0

To make this query efficient, I added a covering index

create index cmdb_ci_lower_name_sys_id on cmdb_ci (lower("name"), "sys_id");

And our query plan looks like this and completes in about 120 ms. This was worst case since only 1 row was returned. Short search strings like "b" will hit their stop key quickly and exit after finding 250 matches without having to exhaust the index.

Still, worst case, this query does a full index scan in about 120ms;

SQL> SELECT "sys_id" FROM (SELECT /*+ FIRST_ROWS(250) */ a.*, ROWNUM rnum FROM (SELECT cmdb_ci0."sys_id" FROM cmdb_ci cmdb_ci0 WHERE lower(cmdb_ci0."name") LIKE '%badge%' ORDER BY lower(cmdb_ci0."name")) a where ROWNUM <= 250) WHERE rnum > 0;

Elapsed: 00:00:00.12

Execution Plan



Plan hash value: 424525705

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
Time |

| 0 | SELECT STATEMENT | | 250 | 11750 | 46 (0)|
00:00:01 |
|* 1 | VIEW | | 250 | 11750 | 46 (0)| 00:00:01 |
|* 2 | COUNT STOPKEY | | | | | |
| 3 | VIEW | | 251 | 8534 | 46 (0)| 00:00:01
|
|* 4 | INDEX FULL SCAN| CMDB_CI_LOWER_NAME_SYS_ID | 16997 |
813K| 46 (0)| 00:00:01 |

Predicate Information (identified by operation id):


   1 - filter("RNUM">0)
   2 - filter(ROWNUM<=250)
   4 - filter(LOWER("name") LIKE '%badge%')


Statistics


	141  recursive calls
	  0  db block gets
       2964  consistent gets
	  0  physical reads
	  0  redo size
	439  bytes sent via SQL*Net to client
	400  bytes received via SQL*Net from client
	  2  SQL*Net roundtrips to/from client
	  0  sorts (memory)
	  0  sorts (disk)
	  1  rows processed

Recently though, we started storing NLS characters in this column (european customers), so we'll have values like:

badger
bädger <-- note the accented a

The users want both of the above to collate next to each other.

So we, not unnaturally, put the database in linguisitc mode:

alter session set nls_sort=german;
alter session set nls_comp=linguistic;

Then we issue the same query. The query plan here involves a sort, but the query returns a grand total of 1 record, so the sort isn't material to the response time.

Note that the same query, running the same index full scan against the same index, now takes 1.23 second, roughly 10X as long as it took when running in binary mode.

SQL> SELECT "sys_id" FROM (SELECT /*+ FIRST_ROWS(250) */ a.*, ROWNUM rnum FROM (SELECT cmdb_ci0."sys_id" FROM cmdb_ci cmdb_ci0 WHERE lower(cmdb_ci0."name") LIKE '%badge%' ORDER BY lower(cmdb_ci0."name")) a where ROWNUM <= 250) WHERE rnum > 0;

Elapsed: 00:00:01.23

Execution Plan



Plan hash value: 3626452865

| Id | Operation | Name | Rows | Bytes |TempSpc| Cost
(%CPU)| Time |

| 0 | SELECT STATEMENT | | 250 | 11750 | | 870 (3)|
00:00:11 |
|* 1 | VIEW | | 250 | 11750 | | 870 (3)| 00:00:11
|
|* 2 | COUNT STOPKEY | | | | | | |
| 3 | VIEW | | 16997 | 564K| | 870 (3)|
00:00:11 |
|* 4 | SORT ORDER BY STOPKEY| | 16997 | 813K| 2008K|
870 (3)| 00:00:11 |
|* 5 | INDEX FAST FULL SCAN| CMDB_CI_LOWER_NAME_SYS_ID | 16997
| 813K| | 659 (3)| 00:00:08 |

Predicate Information (identified by operation id):


   1 - filter("RNUM">0)
   2 - filter(ROWNUM<=250)
   4 - filter(ROWNUM<=250)
   5 - filter(LOWER("name") LIKE '%badge%')


Statistics


	134  recursive calls
	  0  db block gets
       2978  consistent gets
       2929  physical reads
	  0  redo size
	439  bytes sent via SQL*Net to client
	400  bytes received via SQL*Net from client
	  2  SQL*Net roundtrips to/from client
	  1  sorts (memory)
	  0  sorts (disk)
	  1  rows processed

If I take off the order by clause so we use exactly the same plan as our original query, we still get:

SQL> SELECT "sys_id" FROM (SELECT /*+ FIRST_ROWS(250) */ a.*, ROWNUM rnum FROM (SELECT cmdb_ci0."sys_id" FROM cmdb_ci cmdb_ci0 WHERE lower(cmdb_ci0."name") LIKE '%badge%' ) a where ROWNUM <= 250) WHERE rnum > 0;

Elapsed: 00:00:01.20

Execution Plan



Plan hash value: 1941712263

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
Time |

| 0 | SELECT STATEMENT | | 250 | 11750 | 11 (0)|
00:00:01 |
|* 1 | VIEW | | 250 | 11750 | 11 (0)| 00:00:01 |
|* 2 | COUNT STOPKEY | | | | | |
|* 3 | INDEX FAST FULL SCAN| CMDB_CI_LOWER_NAME_SYS_ID | 251 |
12299 | 11 (0)| 00:00:01 |

Predicate Information (identified by operation id):


   1 - filter("RNUM">0)
   2 - filter(ROWNUM<=250)
   3 - filter(LOWER("name") LIKE '%badge%')


Statistics


	134  recursive calls
	  0  db block gets
       2979  consistent gets
	  0  physical reads
	  0  redo size
	439  bytes sent via SQL*Net to client
	400  bytes received via SQL*Net from client
	  2  SQL*Net roundtrips to/from client
	  0  sorts (memory)
	  0  sorts (disk)
	  1  rows processed


So my questions are is:

  1. Am I doing something wrong here? Why does putting the system into NLS_SORT and NLS_COMP make this big a difference on performance?
  2. Is there an index I should add here that would allow some form of efficient querying across this set? I know I'm going to have to do a full scan here (contains queries being what they are), but why is my full scan 10X slower in linguistic mode?
  3. Is there some other recommended approach to getting linguistic collation working efficiently? Shadow columns aren't really practical here (and even then I'm not sure what I'd put in the shadow unless I wrote my own german to english unaccenter).

Any help would be much appreciated. Received on Tue May 06 2008 - 15:16:56 CDT

Original text of this message