Re: database design question

From: Steve Kass <skass_at_drew.edu>
Date: Mon, 30 Dec 2002 01:09:10 -0500
Message-ID: <auonqh$88g$1_at_slb3.atl.mindspring.net>


Dan,

  I don't know anything about mySQL, but it doesn't seem to me that it's worth the time to develop a multilingual application on a platform that doesn't support the international Unicode standard,

  An application like SQL Server, for example, allows you either to store Unicode values in a column of type national character varying under a single collation, or to store the Unicode values with language-specific collations for each value in a column of type sql_variant.

  I don't quite see where the issue of collations is likely to be important for the application you've described, so it should be sufficient to simply store Unicode values, and let the front-end application apply a particular character set to the results so that, for example, the Chinese labels are displayed in traditional characters for users in Hong Kong, and in simplified characters for users in Beijing. There is no overlap in Unicode between characters in different languages, and no difficulty storing information in all the world's languages unambiguously in a single column, as far as I know.

  A quick web search suggests that it is possible to handle multilingual data in mySQL, however, judging from quotes like this (about mySQL):

"I made some experiments with this with good success. I created a database containing some Latin characters, some Cyrillic ones even some Arabic, Indian and a Chinese characters and stored them into simple
text fields.

"Then finally I had a simple JSP software running under Tomcat 4.0X that showed these characters correctly in case the fonts were installed in the
client browser."

  Insofar as this is an implementation question more than a design question, you might see if you can get any advice from mySQL users.

Steve Kass
Drew University

danjourno wrote:

>thank you for that..
>
>you have answered alot of my questions regarding possible bad practices in
>implementing the second option.
>the only problem with the first option is that mysql (btw I am using mysql)
>does not support different character sets within the same column.
>for example, if I have a label in english and another label in chinese for
>that same phrase, then I will be trying to put two labels of different
>character sets in the one collumn. Mysql does support having a diferent
>charset per column within a table with version 4 but not multiple charsets
>in the one column.
>
>Maybe I should use a simmilar design to the first but add more tables for
>each new language?
>
>thanks
>
>dan
>
>"Steve Kass" <skass_at_drew.edu> wrote in message
>news:auitvc$gdh$1_at_slb0.atl.mindspring.net...
>
>
>>Dan,
>>
>> The first way is the only way to go, in my opinion. If you
>>use the second option, you are no longer storing the label's language
>>in the table; you are storing it instead in the name of the table's
>>
>>
>column.
>
>
>>Data goes in tables; data does not go in column names. That's a
>>pretty fundamental principle of database design, and for good reason.
>>A row should represent a single entity, and the column values should
>>represent attributes of that entity. "oui" is not an attribute of "yes",
>>however "oui" is an attribute of "yes in French", the "how it is written"
>>attribute.
>>
>> You should also never have to add columns to a table except when
>>you come to realize that you must keep track of a new attribute not
>>originally in the design. A new language is not a new attribute - you
>>should only need to add a new column if you do something like expand
>>this to a voice-interactive system and decide to store .wav files for
>>each label so that it can be pronounced, not printed. The .wav file is
>>the "how it sounds" attribute of "yes in French", and would deserve a
>>new column. As you have it now, what would you do if you needed
>>the .wav files? Double the number of columns? Add an entirely new
>>table to store something about "Label" that you already have a table
>>for?
>>
>> I'm not trying to put you down - your question is common, and the
>>second structure you suggest is a nice one for people to look at, but
>>it's not a good table design. Save the second format for human-readable
>>output, to be produced by a report-writing package, and stick with the
>>first design. It will provide many benefits - one for example would be if
>>there is a global constant for a client's language, say _at__at_lang. Then you
>>have the same query for every language:
>>
>> select label from labels
>> where index = 'yes'
>> and language_id= _at__at_lang
>>
>>Otherwise you need to write all new queries for every language.
>>
>>Steve Kass
>>Drew University
>>
>>danjourno wrote:
>>
>>
>>
>>>i am having a problem trying to figure out which way to go with adding
>>>multilingual support to my database.
>>>
>>>I am trying to store labels for multiple languages in my DB so I am
>>>
>>>
>creating
>
>
>>>a table called labels
>>>currently I have a table with three columns. One for an index, one for
>>>
>>>
>the
>
>
>>>actual label, and one that links to a language ID.
>>>all index values are in english
>>>
>>>index | label | language_id
>>>---------------------------------------
>>>yes yes EN
>>>yes oui FR
>>>
>>>primary key(index, language_id)
>>>
>>>
>>>
>>>I am thinking that I should actually change this to add more collumns
>>>
>>>
>rather
>
>
>>>that rows.
>>>like below.
>>>
>>>index | label_eng | label_fr | label_ch |
>>>label_etc.......
>>>-----------------------------------------------------------------------
>>>yes yes oui ????
>>>.......
>>>
>>>and then continue to add more columns as I add more languages.
>>>
>>>==================
>>>
>>>my question is..: is this second option a better option? as far as
>>>
>>>
>database
>
>
>>>design goes, taking into account that the size of these label fields may
>>>come to over 150 chars..?
>>>
>>>BTW I am currently using 3.23 but I understand that version 4 supports
>>>multiple charsets.
>>>this is one of my reasons for thinking of this change
>>>
>>>thanks heaps
>>>
>>>dan
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>
>
>
>
Received on Mon Dec 30 2002 - 07:09:10 CET

Original text of this message