RE: indexes in big character columns

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Wed, 26 Dec 2012 15:38:11 -0500
Message-ID: <017901cde3a8$ece146d0$c6a3d470$_at_rsiz.com>



Excellent point by Stephane, as usual.

Now if you do have an application where using an index to dramatically reduce the candidate pool and then you paw through the remaining actual full values for an exact match, you *MAY* want to consider a hash (possibly virtual column) on the long text and then index that surrogate. Or implement the similar thing as a function based index.

So your predicate for that bit would end up looking like ...
...
and vhash_of_title = your_hash_function(<title_column_expression>) and title = <title_column_expression>...

That will still not solve the difference in value between a full name and a nick name, but it does potentially dramatically shrink the size of the index "keys" so that many more will fit per page at the cost of having extra full values to sift through on any hash collisions.

Whether this makes sense is a case-by-case measurement of actuals if you have test data accumulated, and a case-by-case prognostication if you do not.

And, a hash might make no sense at all in some applications, so if I see this as a silver bullet or best practice, I'll be very disappointed.

Regards,

mwf

-----Original Message-----
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Stephane Faroult
Sent: Wednesday, December 26, 2012 2:32 AM To: oralrnr_at_gmail.com
Cc: oracle-l_at_freelists.org
Subject: Re: indexes in big character columns

Orlando,

   I believe that there are two sides to your question.

First of all, from a technical point of view, an index is basically a tree plugged over an ordered list of keys and locators (in other words, the values that are indexed and the rowids where to find them). The bigger the keys, the bigger the index, the more bytes you have to shuffle when you insert or delete rows, or update the value. Note though that you can have index compression (it basically removes common prefixes/suffixes from successive key values) which improves the byte count, at the cost of additional table accesses for some queries since you no longer find the full value in the index. And it's a question of actual value size, not column size.

The second side is that index search is based on equality between input and key value. The longer the key, the higher the odds of having divergences. If you take the example of book titles, for instance, if people look for "Don Quixote" (the full title of which is "The Ingenious Gentleman Don Quixote of La Mancha") they will never find it in the index. That's where full text indexing chimes in.

For me, the second reason is the bigger problem.

HTH,

--
Stephane Faroult
RoughSea Ltd <http://www.roughsea.com>
Konagora <http://www.konagora.com>
RoughSea Channel on Youtube <http://www.youtube.com/user/roughsealtd>




On 12/26/2012 06:53 AM, Orlando L wrote:

> All,
> Me and a colleague were having discussion about creating indexes in
> character columns that are big in size. He is of the opinion that big
> character columns should not be indexed. He could not explain why. I
> remember hearing about it, but did not know why that was the case. Does it
> still apply and can someone please explain. Thanks for the help.
>
> Orlando.
>
>
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>
-- http://www.freelists.org/webpage/oracle-l -- http://www.freelists.org/webpage/oracle-l
Received on Wed Dec 26 2012 - 21:38:11 CET

Original text of this message