Re: Indexing a char column

From: Lothar Flatz <l.flatz_at_bluewin.ch>
Date: Wed, 11 Mar 2015 08:54:54 +0100
Message-ID: <54FFF4CE.4040705_at_bluewin.ch>



Hi,

I think the index is inefficient because you do by no means need 80 character to uniquely identify a million rows. We can assume to need 12 digits for reasonable key playing it very safe and the 80-90 column still has an overhead of factor 7.
This is also the reason why I am less concerned about coalitions using a FBI based on ora_hash, checkum, etc.
The size of the index key does matter in many respects. Mainly there will be less entries per index block as compared to a key that is shorter. As a consequence the index is larger, will trigger more block splits, more maintenance overhead. It will have a likely lesser hit rate per block in buffer cache on average, will therefore not cache as well etc. I was once involved in a project where they where mimicking ISAM by padding key values to the max length. It was quite a nightmare how slow that stuff worked.

My gut feeling would still be that you might want to just go ahead and try it, but it is possible by the rate of data growth that will build a maintenance nightmare.
I think a viable plan B might be to index on ORA_HASH(xa,any_prime_number_with_enough_digits) and use VPD to change your query to:

/SELECT DISTINCT x FROM xa WHERE xa IN (SELECT xa FROM xa WHERE x IN ('123456789101112') *and

*/*/ORA_HASH(xa,any_prime_number_with_enough_digits) = 
/**/ORA_HASH(/*/*/'123456789101112'/**,any_prime_number_with_enough_digits) 
*) /;

Thanks

Lothar

On 10.03.2015 22:29, Cee Pee wrote:
> To answer Seth and Lothar, the column is almost unique though no
> unique constraint is declared. There is only one duplicate value among
> the million rows, could be due to testing. I think that would make the
> index highly selective. Can you please explain what you mean by the
> index will not be efficient.
>
> I do not know yet how big the table can grow or what will be the
> delete or insert frequency, we are testing.
>
>
> On Tue, Mar 10, 2015 at 12:16 PM, Lothar Flatz <l.flatz_at_bluewin.ch
> <mailto:l.flatz_at_bluewin.ch>> wrote:
>
> Hi,
>
> I think you could eventually use an Index here, but i am certainly
> not very happy with it considering the update frequency you are
> talking about. I have seen Indexes based on columns that long.
> There are not very efficient.
> I would consider clever ways to decrease the length of the column,
> e.g. use the ORA_HASH function. Certainly the consequence is
> rewriting the query though.
> If the data is not only growing but also gets deleted eventually
> you might even consider hash cluster. I normally do not recommend
> it, but that could be an exceptional case.
> If teh table is constantly growing forget about hash cluster.
>
> On other more conservative way would be using hash partitioning
> and compressing the table. Few columns is supporting compression.
> but your long column looks bad in that respect.
> Is the column unique?
> I have to admit the choice is not easy and I am not very happy
> with either suggestion.
> I am afraid you have to try.
>
> Thanks
>
> Lothar
>
>
>
> On 10.03.2015 17:52, Cee Pee wrote:
>>
>> It is a varchar2 column with max length of 150. I checked and the
>> lengths of strings stored currently vary from 85 to 90 characters
>> distributed evenly across the values in about 1M rows. The sql in
>> modified form with table name and col name changed:
>>
>> /SELECT DISTINCT x FROM xa WHERE xa IN (SELECT xa FROM xa WHERE
>> x IN ('123456789101112') ) /
>>
>> table name and col name in the predicates are same; the subquery
>> selects from the same user.table as the main query.
>> The same SQL is getting executed several hundreds of times during
>> the peak hours.
>>
>>
>>
>> On Tue, Mar 10, 2015 at 7:44 AM, Lothar Flatz <l.flatz_at_bluewin.ch
>> <mailto:l.flatz_at_bluewin.ch>> wrote:
>>
>> Hi CP,
>>
>> I would rather not index a 150 char long column. Is the
>> actual length really 150?
>> what is the comparison operator? I think there is a chance it
>> is not equal, but substr, like or similar.
>>
>> Thanks
>>
>> Lothar
>>
>>
>> On 10.03.2015 12:22, Cee Pee wrote:
>>
>> List
>>
>> I see a table being hit and queried a few thousand times
>> over peak hours. The table has more than million rows and
>> grows while operational. I did not measure the growth
>> fully, but based on monitoring for several minutes, it is
>> adding about 20 to 40 rows in few seconds, when i was
>> checking via sqlplus. I dont see this growth most of the
>> times though. It is a small table with 6 columns and the
>> app in a web application. There is also another sql that
>> is run constantly that accesses the rows based on one of
>> the char columns which is 150 characters, but there is no
>> index on the column which is causing a tablescan. Is this
>> column a bad candidate for indexing. Any rule of thumb
>> length for char columns above which adding index is
>> considered moot? v11.2.
>>
>> CP.
>>
>>
>>
>> --
>>
>>
>>
>>
>>
>> ---
>> Diese E-Mail wurde von Avast Antivirus-Software auf Viren
>> geprüft.
>> http://www.avast.com
>>
>> --
>> http://www.freelists.org/webpage/oracle-l
>>
>>
>>
>
>
> --
>
>
>
>
>
>
> ------------------------------------------------------------------------
> Avast logo <http://www.avast.com/>
>
> Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft.
> www.avast.com <http://www.avast.com/>
>
>
>

-- 






---
Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft.
http://www.avast.com

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Mar 11 2015 - 08:54:54 CET

Original text of this message