Re: Is multi-type data in one field normal?

From: Costin Cozianu <ccozianu_at_netzero.net>
Date: Thu, 17 Apr 2003 16:01:47 -0700
Message-ID: <b7nbg9$2omou$1_at_ID-152540.news.dfncis.de>


--CELKO-- wrote:

>>>Inspite of various claims, it is kind of normal to store different

>
> unrelated data in a single field. <<
>
> I agree with that statement. That is one of the differences between
> FIELDS and COLUMNS in a relational database. A column has a scalar
> value of one and only one type by definition. A file system with
> variant records, an OO system with typing rules, or a NFNF database
> might be able to store different unrelated data in a single field;
> doing in an SQL database involves a kludge to work around the system
> and the relational model.
>

Oh, I'm affraid you are in a great mistake here.

First of all you confound SQL with relational model, and of all people it is you that should be aware that SQL and relational model diverge in many essential ways. At best you can claim that SQL was inspired by the relational model.

Second the relational model is orthogonal to the type system used for the domains. Relational algebra and relational calculus deal with calculus over relations (which by the way is not supported in SQL).

Chris Date clarified this subject, both in his books, and in a not so old post on comp.database.theory.

I'm affraid that the kuldge you are referring to belong to the SQL standard itself. It is unacceptable in 2003, after 4 decades of advanced research in type theory, and after 30 years from the implementation of the first advanced type system *provably* sound, to "debate" about basic things like whether to support *boolean* as a data type, or even about a sum types.

Please have a break and at least read the very basic introduction: Robert Constable "Naive Type Theory", do you may get up to speed with what is all about.

>

>>>And don't pay too much attention to Mr. Celko's, his understanding

>
> of data types is probably limited to SQL's view of data types which is
> very primitive, to say the least. <<
>
> You can read about my view of data types in DATA & DATABASES, but more
> to the point, this is being done in an SQL database
>

My collection of books on databases include books by Date, Abiteboul Hull & Vianu, Thalheim and a few others. I think it is enough until database theory advances a bit more.

So more to the point, if this is being done in SQL database, that is absolutely no justification for us to descend to the primitive level of the typical SQL database implementation. SQL is just a mean and not an end.

For example, SQL does not support proper relational model, that does not mean that we have a good excuse not to go by relational model, we can use the SQL DBMS as a lower level interface over which we can implement relational operators.

>

>>>He apparently has a lot of trouble to come to terms with the

>
> boolena data type so asking him to understand the concept of a sum
> type is perhaps too much. <<
>
> The Boolean/ multi-value logic thread is irrelevent to this thread,
> but ignoring the "ad hominem", SUM() is a function on a numeric set;
> it is not a datatype at all.
>

Says, who ? Mr. Celko. Hmm :)

There is the notion of sum types and the term "sum" is not randomly chosen, it has a direct correspondent in category theory where you have the sum category, also it has an analogue in vectorial spaces where you have the sum space and in many other branches of mathematics.

if you want to stay ignorant of all these, be my guest, but my recommendation is for you to read at leats Robert Constable's introductory article, and if you are really curious you should buy the recent book "Type Theory" by Benjamin Pierce.

I do hope, for the sake of your readers (including here on usenet ), that you read more than you write.

>

>>>... you should design the solution the best you can, and then find

>
> the most suitable "translation" that fits the limitations of your
> particular environment. <<
>
> Agreed.
>
>
>>>To paraphrase the principle, you should not design your schema in

>
> SQL, but rather design your schema as if you had the perfect DBMS in
> front of you, and after that translate it to SQL. <<
>
> Wrong. The conceptual model is at one level and that is where the
> system is designed. This leads to a logical model, and finally I *do*
> design the schema in SQL at the implementation level. You are
> advocating kludges that skip over the various levels. (I have to do
> this "cut & paste"; it is too good not to use!):
>
> Kludge or kluge: n. Slang
>
> 1) A system, especially a computer system, that is constituted of
> poorly matched elements or of elements originally intended for other
> applications.
>
> 2) The use of undocumented, unintended, accidental or non-standard
> features which appear in the software or hardware to solve an
> immediate problem in a computer system.
>
> 3) A clumsy or inelegant solution to a problem.
>
> Sources:
>
> From the old Scottish word "kludgie" meaning an outside toilet; A
> Scottish engineering term for anything added in an ad hoc manner; the
> spelling "kludge" adapted by American enginners in World War II.
>
> "How to Design a Kludge", Jackson Granholme, Datamation, February
> 1962, pp. 30-31], which defined it as "An ill-assorted collection of
> poorly matching parts, forming a distressing whole."
>

That's nice but that doesn't hide the fact that the kuldge belong to SQL as a standard and to SQL implementations.

>

>>>The fact that SQL's type system is limited should not prevent you

>
> from using a better type system in the abstract (including union
> type), and then "encode" or translate your design using whatever SQL
> facilities you see most fit for that purpose. <<
>
> .. as long as those SQL facilities match the relational translation
> from the conceptual model.
>

Again, your abuse of the term relational when you're talking abotu SQL, is at best funny.

>

>>>There's nothing wrong in principle with having a column that is a

>
> type discriminant (in your case StatisticsTypeId) and a second column
> (StatisticAmount) who's semantics depends on the type discriminator.
> <<
>
> Yes, there is. It's called First Normal Form (1NF) and it required
> scalar, atomic values from one domain. This kind of nightmare where
> each row is fundamentally different leads to weird SQL queries and
> falls apart in about a year. Every typo becomes a new datatype, every
> query has to have a monster CASE expression to read the table, the
> column is declared with the most general datatype possible (it used to
> be VARCHAR(255) but now you see VARCHAR(8000) a lot in SQL Server) and
> has to be CAST() to strings, numerics and temporal types, etc.
>

In the case we're talking about the first column was all integers. And speking of that case you presented no satisfactory practical solution.

>

>>>Although it apparently "looks" bad it is a perfect "translation" of

>
> union type ideas who is perfectly sound and validated both in type
> theory and in practical languages like ML, Haskell and most advanced
> programming languages of today support union types. <<
>
> When you are writing code in ML, Haskell, etc. then use a union type.
> I still remember the problems they caused in "C", but I will azssume
> that newer, higher level language have the proper type checking and
> boundary protections. But back in the real world, databases are in
> SQL.
Well, in the real world I can use the same SQL databases and still perform relational operations :) Go figure. Sometimes you can trick them   into representing more advanced data types like the sum type that we are talking about.

You can bet modern languages make the sum type not only safe, but an essential instrument into expressing algorithms and data structure in an elegant and efficient way.

type 'a binTree=

         Empty
         | Node of  'a binTree * 'a * 'a binTree

This one is a tree of elments of type 'a (meaning any type can be substitude for `a) composed either from an empty node, or a node that is defined as the triple (left subtree, element in the node, right subtree)

type `a Maybe = NULL | Just of `a

These are just 2 examples of sum types, and they are even more important because they replace the *unsafe* use of NULL with a safe use of special marker.

>
> Because a database has to interface to any host language, present,
> past or future, it needs to have a fairly simple, HIGHLY portable type
> system. If you think SQL is bad, look at XML! Yet people work around
> its problems because they realize it is a transport language, not a
> database.

If that was true why not reduce SQL then to the level of C ? No that's not a good reason. Most of the programming languages in current don't support even the numeric types form SQL standard. Not to mention that inspite being unable to introduce basic "type constructors" like the sum   type (aka union, but it is much more than the C union), they ventured into murky waters by specifying OBJECT features that no sane software engineer should ever want to use.

But with this attitude SQL is ready to become the next generation of COBOL, so I wonder why so much energy wasted in advocacy and counter-advocacy.

Is it worth it to prolong its life, because to evolve it in something better is quasi-impossible given its legacy, and already contorted design ? That might server our interests in the relaively short term as SQL is a useful tool for the time being.

Or else if we don't pull the plug in time on it we'll sooner or later find ourselves into such non-sense like declaring XML Schemas and doing XSL or XQL or who knows. ?

Costin Received on Fri Apr 18 2003 - 01:01:47 CEST

Original text of this message