| Oracle FAQ | Your Portal to the Oracle Knowledge Grid | |
Home -> Community -> Usenet -> c.d.o.server -> Re: nlssort
Hi Peter!
Thank you for your interest. I'm a collegue of Hegyvari Krisztian and
i'm pleased to share my information about oracle nls sorting with you.
First of all, correct hungarian sort must follow the rules that 'Á' is
after 'A', 'É' is after 'E', etc. The rule that 'GY' follows 'G' is
usally not expected in computer programs, but expected in
dictionaries. Quite bad, that Oracle can not handle proper nls sorting
due to a very stange .. hmm.. bug.
Interesting that oracle can perfectly sort ONE characters using
NLSSORT.
Moreover it handles 'GY' type rules when NLS_SORT=XHUNGARIAN is used,
handling it as a single character.
The problem begins when sorting strings longer than 1 letter.
We've dug in the problem deeply, and found that returned value from
nlssort function weights differences BADLY.
Here are the facts: (nlssort values based on NLS_SORT=HUNGARIAN)
(Oracle 8.1.7 if i'm not wrong)
14000100 - A
14000300 - Á
19000100 - B
1E000100 - C
141E00010100 - AC
Bitmask:
1400**0100 - Modified A
..1E00**0100 - Modified C
141900030100 - ÁB 1400**0300 - Modified Á ..1900**0100 - Modified B
Interpreting the output of nlssort (of a single character) as a
doubleword value, you can see nlssort values (of a string) is
calculated as follows:
each character's nlssort values is modified by inserting an empty byte
between the low and high word value (for each length over 1) (marked
with *) and shifting each character's value with a byte in order
corresponding to the string (marked with .)
Therefore:
1414190001030100 - AÁB
1400****0100 - A
..1400****0300 - Á
....1900****0100 - B
As you can see, here AC is greater than ÁB, but only because the
second character's nlssort mask is more dominant than the difference
of first character beeing A or Á. Therefore the following (correct)
hungarian sort
AB
AD
AF
AH
ÁC
ÁE
ÁG
ÁI
ordered by oracle will look like:
AB
ÁC
AD
ÁE
AF
ÁG
AH
ÁI
which is BAD. Really bad. I'm really curious whether the same anomaly
can be experienced in other middle-europian languages. (Sorting one
character correctly, more than one badly)
Anyway, nlssort based indexes take up more than 3 times the space than
the data itself :(
PS: sorry for my english, I read a lot more than I write ;)
Byez,
Flatline
pgulutzan_at_ocelot.ca (Peter Gulutzan) wrote in message news:<36c478c6.0211151625.7b79a3b7_at_posting.google.com>...
> hegyvari_at_ardents.hu (Hegyvari Krisztian) wrote in message news:<9003d46f.0211100219.120c38a0_at_posting.google.com>...
> > Greetings,
> >
> > I run a 8.1.7. database with the following settings:
> >
> > nls_lang=HUNGARIAN
> > nls_sort=HUNGARIAN
> > nls_comp=ANSI
> >
> > The correct alphabetical order in the Hungarian language is A,Á,B,....
> > and for basic queries Oracle knows it, I can verify it using queries
> > like "select * from dual where 'A'>'Á'" which gives no results and
> > with 'B'>'Á' which results in an X. So far so good. The same query
> > with 'BÁ'>'BA' results in an X which is good. 'BÁ'>'BAC' gives no
> > results, which is bad, since Á is the second character in the string,
> > while C is the third, according to how I order strings BÁ would come
> > later than BAC. If I check nlssort('BÁ','NLS_SORT=HUNGARIAN') and
> > nlssort('BAC','NLS_SORT=HUNGARIAN'), the second is greater, I think
> > that is what causes this misbehaviour.
> >
> > Do I know something wrong or is this a bug which may have been
> > corrected already?
> >
> > Any answers are highly appreciated, please cc it to
> > hegyvari_at_ardents.hu.
> >
> > Regards,
> >
> > Hegyvari Krisztian
>
> I had always understood that the Hungarian exceptional rules are: Á
> may follow A, CS follows C, Ê may follow E, Ö follows O, Ü follows U,
> ZS follows Z, and the digraphs (GY, LY, NY, SZ TY) are separate
> characters but follow (G, L, N, S, T) anyway. The words "may follow"
> would imply that the rule is optional, followed in some
> dictionaries/lists but not all. So I too would be interested in
> knowing which rules Oracle follows.
>
> For several other European languages and an overview of what DB2 and
> Oracle and SQL Server do with them, see my article "SQL Collations" at
> http://dbazine.com/gulutzen1.html.
>
> Peter Gulutzan
> Co-Author of SQL Peformance Tuning (http://www.ocelot.ca/tuning.htm)
Received on Sat Nov 16 2002 - 09:49:04 CST
![]() |
![]() |