Re: what are keys and surrogates?

From: David BL <davidbl_at_iinet.net.au>
Date: Thu, 10 Jan 2008 20:20:32 -0800 (PST)
Message-ID: <3cf4712a-8017-4eef-98f2-a62bd5ad5e88_at_u10g2000prn.googlegroups.com>


On Jan 11, 4:50 am, Marshall <marshall.spi..._at_gmail.com> wrote:
> On Jan 9, 6:23 pm, David BL <davi..._at_iinet.net.au> wrote:
>
> > On Jan 10, 1:22 am, Marshall <marshall.spi..._at_gmail.com> wrote:
> > > On Jan 9, 8:07 am, David BL <davi..._at_iinet.net.au> wrote:
> > > > On Jan 9, 1:25 pm, Marshall <marshall.spi..._at_gmail.com> wrote:
>
> > > > > This issue goes away if we relax 1NF and allow attributes that are
> > > > > lists or relations. This gives us nested structures. (Nested relations
> > > > > are not particularly controversial around here.)
>
> > > > In addition to my previous post, I wish to add another comment
> > > > regarding my suspicion with RVAs. The tuples of a relation are
> > > > supposed to represent facts, but what does it mean when a relation
> > > > merely represents a value?
>
> > > The question is meaningless. The distinction you are drawing
> > > does not exist.
>
> > In what sense do tuples of an RVA represent propositions in *the* UoD?
>
> Propositions and tuples and so forth are abstractions of the real
> world. They all do the same thing, which is try to capture some
> subset of reality. Why is nestedness a problem for you?
>
> Suppose we wish to model what children someone has. (Using
> int ids for the sake of brevity.)
>
> {(parent=1, child=2), (parent=1, child=3)}
>
> Suppose we do it this way:
>
> {(parent=1, child={2, 3})}
>
> Why should either of these two ways raise any philosophical issues?
>
> Suppose we have a predicate
>
> Person x has children y
>
> and the proposition
>
> Person 1 has children {2, 3}
>
> Where's the problem?

None at all! On the contrary, my point has always been that nesting of data structures can be a good thing (i.e. it is important for avoiding the introduction of meaningless identifiers).
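To make the two representations from the quoted example concrete, here is a minimal Python sketch. The function names `nest` and `unnest` are made up for illustration, and tuples are modelled as plain (parent, child) pairs rather than named attributes.

```python
from collections import defaultdict

def nest(flat):
    """Group (parent, child) pairs into (parent, set_of_children) tuples."""
    groups = defaultdict(set)
    for parent, child in flat:
        groups[parent].add(child)
    # frozenset is used so the nested value is hashable and can live in a set
    return {(parent, frozenset(children)) for parent, children in groups.items()}

def unnest(nested):
    """Expand (parent, set_of_children) tuples back into (parent, child) pairs."""
    return {(parent, child) for parent, children in nested for child in children}

flat = {(1, 2), (1, 3)}     # {(parent=1, child=2), (parent=1, child=3)}
nested = nest(flat)         # {(parent=1, child={2, 3})}
```

For this data the round trip `unnest(nest(flat))` gives back `flat`. Note that a nested tuple with an empty child set would have no flat counterpart in this sketch.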

I'm only saying this: I understand the term "relational" to concern the set-based processing of large numbers of propositions that all apply to a UoD. If a database has 99.9% of its information content buried away in nested data structures then I don't think of it as particularly "relational" any more.

> Another angle on the same thing: in ZF set theory, there
> is nothing in the universe *other* than sets. The theory doesn't
> have ur-elements or scalars or whatever. This theory is
> wildly successful. Sets containing sets is utterly unremarkable.
> Likewise relations with attribute values that are relations
> should be considered utterly unremarkable.

Yes, I agree, but I don't see why I can't narrow the meaning of the term "relational".

> > > > Isn't the RM meant to have some close
> > > > association with FOPL?
>
> > > Yes.
>
> > > > It seems to me there is a fundamental difference between
>
> > > > a) a large collection of propositions relevant to a particular UoD;
> > > > and
>
> > > > b) a composite data structure such as an AST which simply
> > > > "is what it is"
>
> > > This is an illusion. There is no difference.
>
> > Hmmm. Unfortunately you didn't respond to my last paragraph
> > which was more tangible.
>
> I beg your pardon.
>
> > I don't believe the distinction is an illusion. I'll have a go at
> > providing an objective measure on a given relational database d...
>
> > Let B(d) equal some measure of the amount of information in d,
> > quantified as the total number of bits required to store all the data
> > (accounting for "compressibility").
>
> > Let P(d) equal the total number of tuples across all (top level)
> > relvars. Do not count tuples in nested relations. This is a measure
> > of the number of propositions on the UoD.
>
> > Now take the ratio bpp(d) = B(d)/P(d) to give the "average bits per
> > proposition".
>
> > An alternative measure could account for the number of attributes to
> > give bpa(d) which is an "average bits per attribute", for the
> > attribute values that appear in the top level propositions on the UoD.
>
> > In a conventional use of the RM, where attributes are "reasonably
> > atomic" bpa(d) will be relatively small. However for an
> > unconventional use of the RM (such as the representation of source
> > code using nested RVAs) bpa(d) will be very large. An extreme
> > example is the representation of a single AST and P(d) = 1.
>
> > Now for the part you won't agree with: I think bpa(d) provides an
> > (inverse) indicator of how "relational" the DB is.
>
> I was given to understand you were going to address your
> "fundamental difference" between a) and b) you described
> earlier, but I don't see how any of this does that at all.

The "fundamental difference" refers to the distinction between recording information using many top level propositions with relatively atomic values, versus recording information with hardly any propositions and using very elaborate composite attribute values. There is no suggestion of one being generally better or worse than the other - the decision depends on the problem at hand. I went to the trouble to quantify it to show that the distinction is not an illusion.
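A rough sketch of how bpp(d) and bpa(d) could be computed, in Python. Everything concrete here is an assumption made for illustration: a database is a dict mapping relvar names to lists of tuples (dicts), B(d) is approximated by the size of the zlib-compressed JSON serialization, and bpa(d) divides B(d) by the number of top-level attribute values. The sample data is synthetic.

```python
import json
import zlib

def compressed_bits(db):
    """Stand-in for B(d): bits in the zlib-compressed JSON form of the database."""
    raw = json.dumps(db, sort_keys=True).encode("utf-8")
    return 8 * len(zlib.compress(raw, 9))

def top_level_counts(db):
    """Return (P, A): number of top-level tuples and of top-level attribute values.

    Tuples inside nested relations are deliberately not counted.
    """
    p = sum(len(tuples) for tuples in db.values())
    a = sum(len(t) for tuples in db.values() for t in tuples)
    return p, a

def bpp(db):
    """Average bits per top-level proposition."""
    p, _ = top_level_counts(db)
    return compressed_bits(db) / p

def bpa(db):
    """Average bits per top-level attribute value."""
    _, a = top_level_counts(db)
    return compressed_bits(db) / a

# Synthetic facts, used twice: once as many top-level tuples,
# once buried inside a single relation-valued attribute.
facts = [{"city": f"c{i}", "river": f"r{i}"} for i in range(50)]
flat_db = {"flows_through": facts}
nested_db = {"listing": [{"body": facts}]}
```

Both databases carry essentially the same information, so `compressed_bits` is similar for each, but `nested_db` has a single top-level tuple with a single attribute, so its `bpa` comes out far larger than that of `flat_db`.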

Think of it this way. I give you two sheets of A4 paper A) and B) with roughly the same amount of text (and information) on them.

  A) a bunch of facts about cities and rivers in some country.
  B) a source code listing.

  A) can be turned into a database with many propositions that apply to a UoD.
  B) can only be mapped to a large number of propositions by introduction of meaningless identifiers in order to "create" a UoD to which the propositions can apply!

My point is that A) and B) are quite different and the distinction can be quantified. We both agree that introduction of meaningless identifiers is a bad idea, and therefore that source code should be represented using heavily nested values.
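The effect of flattening source code can be sketched in Python. This is only an illustration: the AST encoding (nested tuples of the form `(operator, operand, ...)` with literals as leaves) and the `flatten` function are made up for the example. Turning the nested value into flat tuples forces the invention of node ids that mean nothing outside the table.

```python
from itertools import count

def flatten(ast):
    """Turn a nested AST into flat (node_id, parent_id, position, label) tuples."""
    ids = count(1)
    rows = []

    def visit(node, parent_id, position):
        node_id = next(ids)  # invented identifier, meaningless in the source code itself
        if isinstance(node, tuple):
            label, *children = node
            rows.append((node_id, parent_id, position, label))
            for i, child in enumerate(children):
                visit(child, node_id, i)
        else:
            rows.append((node_id, parent_id, position, node))

    visit(ast, None, 0)
    return rows

# The expression (1 + 2) * 3 as a single nested value
ast = ("*", ("+", 1, 2), 3)
rows = flatten(ast)
```

The nested form is one value with no identifiers at all. The flat form needs five tuples, each carrying a `node_id` and `parent_id` that exist only to hold the structure together.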

It seems to me that there are two types of information. One concerns knowledge about things in a UoD, and the other only concerns the specification (or *selection* as Date would say) of a complex value independently of any UoD. The latter seems most closely tied to what one could call "creative information" as distinct from "factual information". Creative artists are interested in selecting nice values! E.g. a nice image, CAD drawing or poem.

Received on Fri Jan 11 2008 - 05:20:32 CET
