Re: Is multi-type data in one field normal?

From: --CELKO-- <71062.1056_at_compuserve.com>
Date: 12 Apr 2003 17:22:03 -0700
Message-ID: <c0d87ec0.0304121622.5e65e652_at_posting.google.com>


>> Inspite of various claims, it is kind of normal to store different
unrelated data in a single field. <<

I agree with that statement. That is one of the differences between FIELDS and COLUMNS in a relational database. A column has a scalar value of one and only one type by definition. A file system with variant records, an OO system with typing rules, or a NFNF database might be able to store different unrelated data in a single field; doing in an SQL database involves a kludge to work around the system and the relational model.

>> And don't pay too much attention to Mr. Celko's, his understanding
of data types is probably limited to SQL's view of data types which is very primitive, to say the least. <<

You can read about my view of data types in DATA & DATABASES, but more to the point, this is being done in an SQL database

>> He apparently has a lot of trouble to come to terms with the
boolena data type so asking him to understand the concept of a sum type is perhaps too much. <<

The Boolean/ multi-value logic thread is irrelevent to this thread, but ignoring the "ad hominem", SUM() is a function on a numeric set; it is not a datatype at all.

>> ... you should design the solution the best you can, and then find
the most suitable "translation" that fits the limitations of your particular environment. <<

Agreed.

>> To paraphrase the principle, you should not design your schema in
SQL, but rather design your schema as if you had the perfect DBMS in front of you, and after that translate it to SQL. <<

Wrong. The conceptual model is at one level and that is where the system is designed. This leads to a logical model, and finally I *do* design the schema in SQL at the implementation level. You are advocating kludges that skip over the various levels. (I have to do this "cut & paste"; it is too good not to use!):

Kludge or kluge: n. Slang

  1. A system, especially a computer system, that is constituted of poorly matched elements or of elements originally intended for other applications.
  2. The use of undocumented, unintended, accidental or non-standard features which appear in the software or hardware to solve an immediate problem in a computer system.
  3. A clumsy or inelegant solution to a problem.

Sources:

From the old Scottish word "kludgie" meaning an outside toilet; A Scottish engineering term for anything added in an ad hoc manner; the spelling "kludge" adapted by American enginners in World War II.

"How to Design a Kludge", Jackson Granholme, Datamation, February 1962, pp. 30-31], which defined it as "An ill-assorted collection of poorly matching parts, forming a distressing whole."

>> The fact that SQL's type system is limited should not prevent you
from using a better type system in the abstract (including union type), and then "encode" or translate your design using whatever SQL facilities you see most fit for that purpose. <<

.. as long as those SQL facilities match the relational translation from the conceptual model.

>> There's nothing wrong in principle with having a column that is a
type discriminant (in your case StatisticsTypeId) and a second column (StatisticAmount) who's semantics depends on the type discriminator. <<

Yes, there is. It's called First Normal Form (1NF) and it required scalar, atomic values from one domain. This kind of nightmare where each row is fundamentally different leads to weird SQL queries and falls apart in about a year. Every typo becomes a new datatype, every query has to have a monster CASE expression to read the table, the column is declared with the most general datatype possible (it used to be VARCHAR(255) but now you see VARCHAR(8000) a lot in SQL Server) and has to be CAST() to strings, numerics and temporal types, etc.

>> Although it apparently "looks" bad it is a perfect "translation" of
union type ideas who is perfectly sound and validated both in type theory and in practical languages like ML, Haskell and most advanced programming languages of today support union types. <<

When you are writing code in ML, Haskell, etc. then use a union type. I still remember the problems they caused in "C", but I will azssume that newer, higher level language have the proper type checking and boundary protections. But back in the real world, databases are in SQL. Because a database has to interface to any host language, present, past or future, it needs to have a fairly simple, HIGHLY portable type system. If you think SQL is bad, look at XML! Yet people work around its problems because they realize it is a transport language, not a database. Received on Sun Apr 13 2003 - 02:22:03 CEST

Original text of this message