Re: Case sensitivity = property of column, table or database?

From: Bob Hairgrove <rhairgroveNoSpam_at_Pleasebigfoot.com>
Date: Mon, 15 Apr 2002 19:05:49 GMT
Message-ID: <3cbb205c.41499002_at_news.ch.kpnqwest.net>


On Mon, 15 Apr 2002 14:25:17 +0100, "Bill Rayer" <william.rayer_at_virgin.net> wrote:

>
>Thanks for the quick response (from you and Anton)!
>
>Reading your replies, and based on other info I've gathered, a
>separate upper-cased column seems to be the best option. This can be
>indexed to ensure quick access. Also it should be unique, since the
>language is case insensitive (ie foo and Foo should never be in the
>symbol table together).

It's a little hard to believe that you want to make tables of only identifiers of your own new language ... if that's the case, I would check the validity of the identifier (case insensitively) BEFORE storing it in the table, then access it using the least expensive means.

Since the number of identifiers of any given programming language typically never exceed a hundred or so, you'd be better off caching all your identifiers in memory and doing FTS (full-table scans) instead of some inefficient index reads which have to be retrieved from disk each time you wish to find an entry.

>> If you restrict your language's database capabilities to users of
>> the English language, you could get away with an assembler function
>> which subtracted 32 (or masked off that bit) from the character's
>> ASCII code...
>
>Surely that is the same as converting all characters to upper case
>before adding them to an index? Is this last suggestion equivalent to
>saying 'create a new column storing the identifier in upper case'?

Not necessarily ... depends on how you implement your sort function.

Storing an additional column is resource-expensive; converting the data at runtime is much less expensive, but only as long as it is only converted once PER SEARCH INSTANCE. ;) (BTW you would need to convert it once, anyway, just to store the data in the extra column ... I am suggesting that you convert the data only when you need to search for it. This should produce minimal overhead, since disk reads are by far the most expensive part of any DBMS search operation).

>Lastly, I am still puzzled why products like SQL server want to know
>during installation whether they should be case sensitive. It seems
>as if case sensitivity is most efficiently dealt with by having a new
>column which can be indexed. So it is really a property of each string
>column, not the entire database.<

Other products, such as Oracle, only want to know the locale's code page ... there are additional options such as "function-based indices". In Oracle, unless otherwise specified, all text search is case-sensitive.

> Maybe I have misunderstood SQL server
>(i've only read about it, never used it!)

May I ask, what RDBMS you have actually worked with?

Have you ever had to deal with databases with millions of rows? Received on Mon Apr 15 2002 - 21:05:49 CEST

Original text of this message