Re: Discovering new relationships
From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Sat, 03 Mar 2007 00:00:00 GMT
Message-ID: <4U2Gh.4798$PV3.45435_at_ursa-nb00s0.nbnet.nb.ca>
>
> for
>
>
> I'm
>
>
> columns is
>
>
> there
>
>
> zipcodes
>
>
> postal
>
>
> campuses, or
>
>
> could
>
>
> are
>
>
> isn't
>
>
> I disagree. I'm going to suggest that "measurements" are not dangling fk's.
> If we say
> that a given sack of rice has mass of 6.7255 kilos, 6.7255 is not a
> "dangling reference to a pk".
>
> Good point.
Received on Sat Mar 03 2007 - 01:00:00 CET
Date: Sat, 03 Mar 2007 00:00:00 GMT
Message-ID: <4U2Gh.4798$PV3.45435_at_ursa-nb00s0.nbnet.nb.ca>
Walt wrote:
> "paul c" <toledobythesea_at_oohay.ac> wrote in message
> news:og2Gh.1206633$R63.245949_at_pd7urf1no...
>
>>Walt wrote: >> >>>It seems to me that, sometimes, relationships in data are discovered by >>>other people than the ones who initially tabulated the data. This is >>>tangentially related to the discussion on constraints and functional >>>dependencies. >>> >>>Let me start with a couple of examples. >>> >>>The first example involves a scenario that I'll call the "case of the >>>missing validation table". >>> >>>We've got a schema with tables about students, and tables about campuses
>
> for
>
>>>a university. >>>Somebody comes up with this query (apologies for using SQL). >>> >>>select s.first_name, s.last_name, c.campus_name >>>from students as s, campuses as c >>>where s.zip_code = c.zip_code >>> >>>I have no idea why someone suddenly decided this query was useful, but
>
> I'm
>
>>>going to suggest that the theta operator between the two zip_code
>
> columns is
>
>>>entirely within the meaning of the data. Intuiting that the zip_code >>>columns are some sort of reference, we look around for a table whose >>>primary key is zip_code, only to find that there isn't one! At least,
>
> there
>
>>>isn't one in this schema. >>> >>>It turns out that our database doesn't distinguish between valid
>
> zipcodes
>
>>>and invalid ones, and doesn't have any constraints in that regard. Of >>>course there's a table of all valid zip_codes, somewhere in the US
>
> postal
>
>>>services databases, but that's outside our universe of discourse. >>> >>>Now, this join doesn't necessarily conform to Marshall's referential >>>integrity constraint. There could be students with no matching
>
> campuses, or
>
>>>campuses with no matching students. Because the realtionship is >>>many-to-many, there could also be multiple result rows with the same >>>student or the same campus. As a side issue, two different students
>
> could
>
>>>have the same first_name and last_name, but I'm not dealing with that. >>> >>>What I'm going to say about this is that both s.zip_code and c.zip_code
>
> are
>
>>>"dangling foreign keys". They are foreign keys to a primary key that
>
> isn't
>
>>>defined in this schema. >> >>In that case, any two attributes with the same name are "dangling" fk's, >> in fact, given a rename operator or equivalent syntax, all attributes >>are "dfk's". What good does it do to introduce a name that applies to >>all attributes?
>
> I disagree. I'm going to suggest that "measurements" are not dangling fk's.
> If we say
> that a given sack of rice has mass of 6.7255 kilos, 6.7255 is not a
> "dangling reference to a pk".
By your earlier definition, of course it is! Suppose someone wants to match sacks in kilos with skids or containers in tonnes.
> It's a proportion between our measurement and the standard kilo. I'll
> concede that "kilo" is a dangling fk. It's not even an exact number. It's
> subject to measurement error, and also to computer induced roundoff error.
>
> That's at least one kind of data that is not a dangling fk. Agreed?
Not agreed. Dangling fk has no useful meaning.
>>>And I'm going to suggest that this kind of join is >>>occasionally legitimate, and that there is no mathematical reason to >>>disallow it. >>>... >> >>It is as legitimate or as illegitimate as the user's understanding of >>the schema is. There is no RT reason to disallow it either.
>
> Good point.
Received on Sat Mar 03 2007 - 01:00:00 CET