Re: Case sensitivity = property of column, table or database?

From: Bill Rayer <william.rayer_at_virgin.net>
Date: Mon, 15 Apr 2002 14:25:17 +0100
Message-ID: <exAu8.20285$tZ1.6056374_at_news2-win.server.ntlworld.com>


Thanks for the quick response (from you and Anton)!

Reading your replies, and based on other info I've gathered, a separate upper-cased column seems to be the best option. This can be indexed to ensure quick access. Also it should be unique, since the language is case insensitive (ie foo and Foo should never be in the symbol table together).

> >So to get to my question finally: If I store strings in a table
> >using mixed case, and I want to be able to search for them using
> >a case insensitive search, how can I get efficient O(log(N)) access?
>
> Case depends on locale. For example, in Swiss German, the upper case
> of "ä" is not "Ä", as in Germany or Austria, but "Ae". The same
> applies to the other Umlaute. A true case-insensitive search would
> have to take both spellings into account if it did not differentiate
> between locales.
>
> Also, alphabetical sorting varies tremendously between locales. In
> Germany (I believe), "ä" is sorted equivalently to "ae" and can occur
> interspersed in a list among words beginning with "ae". In Austria, I
> believe the sort order is different (or perhaps it is the other way
> around).

The locales add an extra layer of compexity. Although I thought I understood German umlauts, I didn't know Swiss German upper-cased "ä" as "Ae"! However the new computer language uses 7-bit ASCII text only, so I was only concerned with 7-bit ASCII identifiers for the symbol table.

I can see the locales would cause other problems if the database was used for case insensitive access for people with German or Swiss German names.

> Are you sure you really want to do this? <VBG> If you restrict your
> language's database capabilities to users of the English language, you
> could get away with an assembler function which subtracted 32 (or
> masked off that bit) from the character's ASCII code...

Surely that is the same as converting all characters to upper case before adding them to an index? Is this last suggestion equivalent to saying 'create a new column storing the identifier in upper case'?

Lastly, I am still puzzled why products like SQL server want to know during installation whether they should be case sensitive. It seems as if case sensitivity is most efficiently dealt with by having a new column which can be indexed. So it is really a property of each string column, not the entire database. Maybe I have misunderstood SQL server (i've only read about it, never used it!)

Regards
Bill Rayer Received on Mon Apr 15 2002 - 15:25:17 CEST

Original text of this message