Re: Relational Databases and Their Guts

From: Marshall Spight <mspight_at_dnai.com>
Date: Sun, 22 Jun 2003 15:02:03 GMT
Message-ID: <LxjJa.1018625$OV.1115113_at_rwcrnsc54>


"Todd Bandrowsky" <anakin_at_unitedsoftworks.com> wrote in message news:af3d9224.0306220431.7baeca67_at_posting.google.com...
> > If one is speaking quite strictly, then yes. One might expect a fair bit
> > of strictness in a .theory group.
>
> Speak strictly, please. After glancing at some of the more .theory
> stuff on the web, I noticed that while AND is intersection, and OR is
> a union, that subtract has been completely left out, and so in my own
> DBMS (not relational), I added subtract. Very very cool.

I'm very unsure of what you mean by this. The Third Manifesto describes an AND and OR that have as special cases, respectively, intersection and union. Is that what you're referring to? Otherwise, the only way I can think of that AND is like intersection is when you use it to join boolean expressions in a where clause.

But SQL has union and intersection directly in select statements. It also has "except". What about "except" is not subtract that you are expecting? Or were you not aware of "except?" (It is somewhat obscure, and I don't believe it's present in SQL Server.)

> > To my mind, the best reason to complain about SQL not being relational
> > is that it allows duplicates. NULLs are pretty wacky, too.
>
> So, strictly speaking, even though it may not be particularly
> efficient to implement, instead of a relation of:
>
> company relation
> companyid pk
> company
> street1
> street2
> city
> state
> zip
>
> and let street2 be nullable, you would?
>
> company relation
> companyid pk
> company
> city
> state
> zip
>
> company street relation
> streetid pk
> companyid pk
> order
>
> street relation
> streetid
> streetname
>
> to allow a company have multiple streets. This avoids the NULL and
> represents 0 - N streets that a company might have.

I'm not clear I understand you example. Assuming I understand your notation (the first line of each group is the table name, the rest are attribute names?) you example is odd. In the real world, a company may have many addresses, but it would not have zero or more streets independent of city, state, etc. If there is typically an optional "addr2" (not "street2") in an address table (not the company table) then it's because some addresses have two lines between company name and city, and some have only one.

But let me take your question to be simple "how does one deal with optional data?" If we imagine a record type with attributes a, b, c, and optionally d, then I see three approaches:

  1. table t has attributes (a, b, c, d) and there is a distinguished value of the type of d that represents "no d". For example, a zero length string for addr2 would indicate "no address line 2." This is distinctly different from null, although it may not seem like an important difference at first.
  2. Declare the column for d to be of a type that consists of all the values of the type of d union a specific value of a specific different type. I believe these are called "sum types" in sum languages (ha ha) such as ML and unfortunately, most popular languages don't have this facility, but this could be compensated for by the call-level interface to the database.
  3. A second table with two columns: the primary-key-foreign-key of the first table, and a d column.

(If anyone is still reading, there is a question I've had about this for a while: couldn't these three logically-equivalent cases be made to be simply syntactic differences? There are details to be worked out, but the idea intrigues me.)

Application logic is about equal for all three cases, but query logic is somewhat complicated by the third.

Note also that this is a model-level description, and so the question of which one is "efficient to implement" is specifically excluded; the implementation may be anything that can represent the model.

> But, how would you, in terms of basic relational operations (select,
> project, etc), create a view st each street is in the same tuple as
> the company city, state, and zip. This is one area where SQL the
> language (let alone any implementation of it), falls flat on its face.

Au contraire; this is where SQL stands tall, by not allowing such a thing.

And what would you expect to get out of such a multivalue dbms? There isn't any data you can store as multivalue that you can't store as relations.

> Then, even if you did that,
> what would you use to represent those attributes that do not have the
> maximum number of streets defined. Does relational theory require
> that each tuple have the same number of relations in a domain? Did I
> say that right?

I don't think so. Relations have tuples, not vice verse.

> Or, in the interests of pure NULL removal, would the
> server return a jagged result?

No, that would make life more complicated.

> Or, should you just have street1,
> street2 and have a value defined within your schema to represent NONE,
> which is not the same as NULL (which means unknown).

Sure, that's my proposal 1) above.

Marshall Received on Sun Jun 22 2003 - 17:02:03 CEST

Original text of this message