Help: Multilanguage selects from one table

From: Peter Skvarenina <psq_at_pobox.sk>
Date: 6 Aug 2001 22:24:40 -0700
Message-ID: <35e10d4c.0108062124.4661273_at_posting.google.com>


Can anyone help me with this problem ? :

  I have table with a string attribute. I need to present rows in this database according to the value of this attribute in the order of selected language. I have fixed number of languages. That attribute is in Unicode.

  I know there are already RDBMSes that are able to perform this task successfully, but I need to do it portable, RDBMS independently, in pure SQL (to be able execute it on Oracle, Interbase, Informix, DB2, SyBase, Velocis, MySQL and PostgreSQL) and operate over large database with millions of rows in reasonable speed (it must be online).

  So ordinary RDBMS can't create indexes according to different languages.

  The primary task is to fetch blocks with about 10's-100's of rows in the correct order.

  I've already had an idea, that could help, but in one case it is incomplete:

  For each suported language a separate "priority" column should be added. When inserting a new row, the priority should be computed from the first n characters of the string that the resulting number will preserve orders between the strings in the particular language.

  Symbolically, the priority computation can be expressed this way:

   Str - string to be "encoded", Str[i] - character at i-th position    Ord - order of character in particular language    NumChar - number of characters in language    

   i = 0; priority = 0;
   while (i < n)
    {

     priority += Ord (Str[i]);
     priority *= NumChar;        // shift to increase significance
    }

  This will of course group records with the same n character prefix.

  The problem of this appears when there are *many* strings with the same prefix. For example, if you have catalogue with electrical transformators and most of the article names begin with 'Electrical transformator' and you've got about 70-80% of all strings beginning like that (out of 2,000,000 rows), that priority system tends to degenerate.

  You can simply adjust how many characters can be taken into account when computing the priority, but it'll slowdown too much, if you would take too many starting characters (typical prefix in database should be about 150 characters long...)

  I hope there exists a solution.

  If you have any remarks, ideas or references (or working algorithms ;-), please reply to this message !

  Thank you very much !

                                           Peter Skvarenina
                                           psq_at_pobox.sk
Received on Tue Aug 07 2001 - 07:24:40 CEST

Original text of this message