Design for more than one language

From: Stefan Bodewig <bodewig_at_bost.de>
Date: 2000/03/01
Message-ID: <m3n1oin5p8.fsf_at_sbodewig.bost.de>#1/1


-----BEGIN PGP SIGNED MESSAGE----- Hi,

I think the problem is quite common. I have data structures that are language independent with language depended details.

Lets say I do have an article with a name a description and several properties (key/value pairs). The naive solution would be something along the line of (omitting NOT NULLs and other constraints for readability)

TABLE ARTICLES (

        id                      INT PRIMARY KEY,
        name_in_german          CHAR(100),
        name_in_english         CHAR(100),
        description_in_german   VARCHAR(1000),
        description_in_english  VARCHAR(1000)

)

TABLE PROPERTIES (

        article                   INT REFERENCES ARTICLES,
        property_key_in_german    VARCHAR(100),
        property_key_in_english   VARCHAR(100),
        property_value_in_german  VARCHAR(100),
        property_value_in_english VARCHAR(100)

)

Of course this is far from elegant, will contain tons of redundant data, makes spelling mistakes hard to track and will get me into trouble if I'll ever want to add another language.

The solution I've been using is something like

TABLE LANGUAGES (

        id                      INT PRIMARY KEY,
        name                    INT

)

TABLE LONGTEXTS (

        id                      INT,
        language                INT REFERENCES LANGUAGES,
        value                   VARCHAR(1000),
        PRIMARY KEY (id, language)

)

TABLE SHORTTEXTS (

        id                      INT,
        language                INT REFERENCES LANGUAGES,
        value                   VARCHAR(100),
        PRIMARY KEY (id, language)

)

TABLE ARTICLES (

        id                      INT PRIMARY KEY,
        name                    INT,
        description             INT,

)

TABLE PROPERTIES (

        article                 INT REFERENCES ARTICLES,
        key                     INT,
        value                   INT,

)

where LANGUAGES.name, ARTICLES.name, PROPERTIES.key and PROPERTIES.value all "reference" SHORTTEXTS.id as well as ARTICLES.description "references" LONGTEXTS.id.

This solution has the following advantages (that's why I use it):

  • Each text needs to be translated once and only once. Either the translations is correct or you will have to change it in exactly one place.
  • Reduced redundancy (how many articles may be there that share properties with the same key).
  • It far easier to add a new language.

But it does have a some drawbacks and I'd like to hear your opinion and maybe discuss better solutions. The drawbacks I see:

  • I cannot express referential integrity on the texts as the id column cannot be unique.

With some vendor's databases I may be able to achieve this with CHECK constraints (something like CHECK name IN (SELECT DISTINCT id FROM SHORTTEXTS for the LANGUAGES table) but there are implementations that won't let me use subselects in CHECK constraints.

  • I have no (easy) way to see where a text is referenced from. So if data gets deleted I might end up with useless rows in the *TEXTS tables.
  • I have no (none I'm aware of) way to ensure that a text gets inserted in all required languages.

I'd really love to see a solution without those disadvantages or a way minimize them.

Stefan
-----BEGIN PGP SIGNATURE-----

Version: 2.6.3i
Charset: latin1
Comment: Processed by Mailcrypt 3.5.5, an Emacs/PGP interface

iQCVAwUBOL0kYa7Kq3f+7KrtAQFQ5QQAj8IJ8JaG4mIJLD/v3yPF8eHgUNXkmK0f hgy/lWQsLgVAnfdXfIlSqHT8HfcuDdKKXSwSW2NuxaXC2U/W8c8lbQZ+yEcmOKKl Z7tKRIUehaVvOpm07Io9qAkr0D2wvhjXqXtdoSRfnyyrvTYpg+Q7Oi3uG1fTsM0g VyI8TrPXg4w=
=KpXY
-----END PGP SIGNATURE----- Received on Wed Mar 01 2000 - 00:00:00 CET

Original text of this message