Re: Discovering new relationships

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Sat, 03 Mar 2007 00:00:00 GMT
Message-ID: <4U2Gh.4798$PV3.45435_at_ursa-nb00s0.nbnet.nb.ca>


Walt wrote:

> "paul c" <toledobythesea_at_oohay.ac> wrote in message
> news:og2Gh.1206633$R63.245949_at_pd7urf1no...
>

>>Walt wrote:
>>
>>>It seems to me that, sometimes, relationships in data are discovered by
>>>other people than the ones who initially tabulated the data.  This is
>>>tangentially related to the discussion on constraints and functional
>>>dependencies.
>>>
>>>Let me start with a couple of examples.
>>>
>>>The first example involves a scenario that I'll call the "case of the
>>>missing validation table".
>>>
>>>We've got a schema with tables about students, and tables about campuses

>
> for
>
>>>a university.
>>>Somebody comes up with this query  (apologies for using SQL).
>>>
>>>select s.first_name, s.last_name, c.campus_name
>>>from students as s, campuses as c
>>>where s.zip_code = c.zip_code
>>>
>>>I have no idea why someone suddenly decided this query was useful,  but

>
> I'm
>
>>>going to suggest that the theta operator between the two zip_code

>
> columns is
>
>>>entirely within the meaning of the data.  Intuiting that the zip_code
>>>columns are some sort of reference,  we look around for a table whose
>>>primary key is zip_code, only to find that there isn't one!  At least,

>
> there
>
>>>isn't one in this schema.
>>>
>>>It turns out that our database doesn't distinguish between valid

>
> zipcodes
>
>>>and invalid ones, and doesn't have any constraints in that regard.  Of
>>>course there's a table of all valid zip_codes, somewhere in the US

>
> postal
>
>>>services databases, but that's outside our universe of discourse.
>>>
>>>Now,  this join doesn't necessarily conform to Marshall's referential
>>>integrity constraint.  There could be students with no matching

>
> campuses, or
>
>>>campuses with no matching students.  Because the realtionship is
>>>many-to-many,  there could also be multiple result rows with the same
>>>student or the same campus.  As a side issue, two different students

>
> could
>
>>>have the same first_name and last_name,  but I'm not dealing with that.
>>>
>>>What I'm going to say about this is that both s.zip_code and c.zip_code

>
> are
>
>>>"dangling foreign keys".  They are foreign keys to a primary key that

>
> isn't
>
>>>defined in this schema.
>>
>>In that case, any two attributes with the same name are "dangling" fk's,
>>  in fact, given a rename operator or equivalent syntax, all attributes
>>are "dfk's".  What good does it do to introduce a name that applies to
>>all attributes?

>
> I disagree. I'm going to suggest that "measurements" are not dangling fk's.
> If we say
> that a given sack of rice has mass of 6.7255 kilos, 6.7255 is not a
> "dangling reference to a pk".

By your earlier definition, of course it is! Suppose someone wants to match sacks in kilos with skids or containers in tonnes.

> It's a proportion between our measurement and the standard kilo. I'll
> concede that "kilo" is a dangling fk. It's not even an exact number. It's
> subject to measurement error, and also to computer induced roundoff error.
>
> That's at least one kind of data that is not a dangling fk. Agreed?

Not agreed. Dangling fk has no useful meaning.

>>>And I'm going to suggest that this kind of join is
>>>occasionally legitimate,  and that there is no mathematical reason to
>>>disallow it.
>>>...
>>
>>It is as legitimate or as illegitimate as the user's understanding of
>>the schema is.  There is no RT reason to disallow it either.

>
> Good point.
Received on Sat Mar 03 2007 - 01:00:00 CET

Original text of this message