Design for more than one language
Date: 2000/03/01
Message-ID: <m3n1oin5p8.fsf_at_sbodewig.bost.de>#1/1
-----BEGIN PGP SIGNED MESSAGE----- Hi,
I think the problem is quite common. I have data structures that are language independent with language depended details.
Lets say I do have an article with a name a description and several properties (key/value pairs). The naive solution would be something along the line of (omitting NOT NULLs and other constraints for readability)
TABLE ARTICLES (
id INT PRIMARY KEY, name_in_german CHAR(100), name_in_english CHAR(100), description_in_german VARCHAR(1000), description_in_english VARCHAR(1000)
)
TABLE PROPERTIES (
article INT REFERENCES ARTICLES, property_key_in_german VARCHAR(100), property_key_in_english VARCHAR(100), property_value_in_german VARCHAR(100), property_value_in_english VARCHAR(100)
)
Of course this is far from elegant, will contain tons of redundant data, makes spelling mistakes hard to track and will get me into trouble if I'll ever want to add another language.
The solution I've been using is something like
TABLE LANGUAGES (
id INT PRIMARY KEY, name INT
)
TABLE LONGTEXTS (
id INT, language INT REFERENCES LANGUAGES, value VARCHAR(1000), PRIMARY KEY (id, language)
)
TABLE SHORTTEXTS (
id INT, language INT REFERENCES LANGUAGES, value VARCHAR(100), PRIMARY KEY (id, language)
)
TABLE ARTICLES (
id INT PRIMARY KEY, name INT, description INT,
)
TABLE PROPERTIES (
article INT REFERENCES ARTICLES, key INT, value INT,
)
where LANGUAGES.name, ARTICLES.name, PROPERTIES.key and PROPERTIES.value all "reference" SHORTTEXTS.id as well as ARTICLES.description "references" LONGTEXTS.id.
This solution has the following advantages (that's why I use it):
- Each text needs to be translated once and only once. Either the translations is correct or you will have to change it in exactly one place.
- Reduced redundancy (how many articles may be there that share properties with the same key).
- It far easier to add a new language.
But it does have a some drawbacks and I'd like to hear your opinion and maybe discuss better solutions. The drawbacks I see:
- I cannot express referential integrity on the texts as the id column cannot be unique.
With some vendor's databases I may be able to achieve this with CHECK constraints (something like CHECK name IN (SELECT DISTINCT id FROM SHORTTEXTS for the LANGUAGES table) but there are implementations that won't let me use subselects in CHECK constraints.
- I have no (easy) way to see where a text is referenced from. So if data gets deleted I might end up with useless rows in the *TEXTS tables.
- I have no (none I'm aware of) way to ensure that a text gets inserted in all required languages.
I'd really love to see a solution without those disadvantages or a way minimize them.
Stefan
-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i Charset: latin1 Comment: Processed by Mailcrypt 3.5.5, an Emacs/PGP interface
iQCVAwUBOL0kYa7Kq3f+7KrtAQFQ5QQAj8IJ8JaG4mIJLD/v3yPF8eHgUNXkmK0f
hgy/lWQsLgVAnfdXfIlSqHT8HfcuDdKKXSwSW2NuxaXC2U/W8c8lbQZ+yEcmOKKl
Z7tKRIUehaVvOpm07Io9qAkr0D2wvhjXqXtdoSRfnyyrvTYpg+Q7Oi3uG1fTsM0g
VyI8TrPXg4w=
=KpXY
-----END PGP SIGNATURE-----
Received on Wed Mar 01 2000 - 00:00:00 CET