Re: Discovering new relationships

From: Walt <wamitty_at_verizon.net>
Date: Fri, 02 Mar 2007 23:54:07 GMT
Message-ID: <zO2Gh.692$Tf.216_at_trndny03>


"paul c" <toledobythesea_at_oohay.ac> wrote in message news:og2Gh.1206633$R63.245949_at_pd7urf1no...
> Walt wrote:
> > It seems to me that, sometimes, relationships in data are discovered by
> > other people than the ones who initially tabulated the data. This is
> > tangentially related to the discussion on constraints and functional
> > dependencies.
> >
> > Let me start with a couple of examples.
> >
> > The first example involves a scenario that I'll call the "case of the
> > missing validation table".
> >
> > We've got a schema with tables about students, and tables about campuses
for
> > a university.
> > Somebody comes up with this query (apologies for using SQL).
> >
> > select s.first_name, s.last_name, c.campus_name
> > from students as s, campuses as c
> > where s.zip_code = c.zip_code
> >
> > I have no idea why someone suddenly decided this query was useful, but
I'm
> > going to suggest that the theta operator between the two zip_code
columns is
> > entirely within the meaning of the data. Intuiting that the zip_code
> > columns are some sort of reference, we look around for a table whose
> > primary key is zip_code, only to find that there isn't one! At least,
there
> > isn't one in this schema.
> >
> > It turns out that our database doesn't distinguish between valid
zipcodes
> > and invalid ones, and doesn't have any constraints in that regard. Of
> > course there's a table of all valid zip_codes, somewhere in the US
postal
> > services databases, but that's outside our universe of discourse.
> >
> > Now, this join doesn't necessarily conform to Marshall's referential
> > integrity constraint. There could be students with no matching
campuses, or
> > campuses with no matching students. Because the realtionship is
> > many-to-many, there could also be multiple result rows with the same
> > student or the same campus. As a side issue, two different students
could
> > have the same first_name and last_name, but I'm not dealing with that.
> >
> > What I'm going to say about this is that both s.zip_code and c.zip_code
are
> > "dangling foreign keys". They are foreign keys to a primary key that
isn't
> > defined in this schema.
>
> In that case, any two attributes with the same name are "dangling" fk's,
> in fact, given a rename operator or equivalent syntax, all attributes
> are "dfk's". What good does it do to introduce a name that applies to
> all attributes?
>

I disagree. I'm going to suggest that "measurements" are not dangling fk's. If we say
that a given sack of rice has mass of 6.7255 kilos, 6.7255 is not a
"dangling reference to a pk".
It's a proportion between our measurement and the standard kilo. I'll concede that "kilo" is a dangling fk. It's not even an exact number. It's subject to measurement error, and also to computer induced roundoff error.

That's at least one kind of data that is not a dangling fk. Agreed?

> > And I'm going to suggest that this kind of join is
> > occasionally legitimate, and that there is no mathematical reason to
> > disallow it.
> > ...
>
> It is as legitimate or as illegitimate as the user's understanding of
> the schema is. There is no RT reason to disallow it either.
>

Good point.
> p
>
Received on Sat Mar 03 2007 - 00:54:07 CET

Original text of this message