Re: Numerical vs. Character Indexes

From: Thomas Kyte <tkyte_at_us.oracle.com>
Date: Mon, 24 Jan 2000 06:53:16 -0500
Message-ID: <bueo8s0o2irg1lqf40mkuvl8ekrslj6j2j@4ax.com>

A copy of this was sent to JimR1998 <JimR1998_at_aol.com> (if that email address didn't require changing) On Sun, 23 Jan 2000 21:20:57 -0500, you wrote:

>I'm counting the net wisdom here in CDOS to help settle an argument.
>There is a prevailing sense around my department that indexing character
>keys are a bad thing, and that such indexes are drastically less efficient
>than numerical ones. This is resulting in people inventing all kinds of
>kludgy workarounds to circumvent character based indexes.
>
>Their argument, of course, is that the CPU can natively compare numbers in
>a single instruction, but there is no hardware mechanism for comparing
>strings. I believe this argument falls apart at the RDBMS level, as Oracle
>is doing millions of other operations for that query besides a tiny little
>compare.

and the fact that we never store numbers in a native format, we cannot.

>
>Here's my reasoning:
>
> - Internally, Oracle stores both characters and numbers as bytes.
>Since the Oracle NUMBER datatype isn't necessarily the CPU native integer
>format, Oracle cannot and does not combine the byte values together to
>form a larger integer. (i.e. if you have a NUMBER(4) and a CHAR(4), Oracle
>is doing 4 single byte compares either way).
>

well, not really. A CHAR(4) is 5 bytes long (leading byte length, 4 bytes of character data). A Number(4) is from 1 to 4 bytes in length:

ops$tkyte_at_8i> select vsize(0), vsize(9999), vsize(-9999) from dual;

VSIZE(0) VSIZE(9999) VSIZE(-9999)
---------- ----------- ------------

1 3 4

ops$tkyte_at_8i>

In either case though -- your assumption is correct that we never store numbers in a table in the native format -- all data (7 bytes of a DATE, N bytes of a varchar, N bytes of a char, n bytes of a number) are compared using the equivalent of "memcmp()". Char data, when using different character sets to sort and such can take longer in certain cases but typically NOT when using an index.

> - Performance tests on 6 million row tables have yielded a very
>tiny performance difference, sometimes going either way.
>

exactly. (sort of fixes the misconception in paragraph 1 above)

> - It is the depth of the B-Tree index, the number of leaf blocks,
>and uniqueness that govern performance, not the datatype being indexed.
>

pretty much. disk layout - number of 'holes' in the index and other things may come into play but the above is correct.

>NUMBER(10) vs. VARCHAR2(10) -- both unique indexes -- Should there be a
>difference?
>

not noticable in my experience

>NUMBER(10) vs. DATE -- both non-unique indexes -- Any difference?
>

number 10, varying 1-6 bytes.
DATE fixed length field 7 bytes.

not a noticable difference when indexed.

>Does anyone want to voice their opinions here? I'm in favor of numerical
>keys, but in some places it's just far more convienant to use varchar
>keys. Of course long varchar indexes will be slower, but we're talking
>10 or 20 byte varchar's at most.
>
>Comments anyone?

--
See http://osi.oracle.com/~tkyte/ for my columns 'Digging-in to Oracle8i'... Current article is "Part I of V, Autonomous Transactions" updated June 21'st

Thomas Kyte                   tkyte_at_us.oracle.com
Oracle Service Industries     Reston, VA   USA

Opinions are mine and do not necessarily reflect those of Oracle Corporation Received on Mon Jan 24 2000 - 05:53:16 CST