Re: Discovering new relationships

From: paul c <toledobythesea_at_oohay.ac>
Date: Sat, 03 Mar 2007 00:14:48 GMT
Message-ID: <Y53Gh.1206687$R63.81214_at_pd7urf1no>


Walt wrote:
> "paul c" <toledobythesea_at_oohay.ac> wrote in message
> news:og2Gh.1206633$R63.245949_at_pd7urf1no...
>

>>Walt wrote:
>>
>>>It seems to me that, sometimes, relationships in data are discovered by
>>>other people than the ones who initially tabulated the data.  This is
>>>tangentially related to the discussion on constraints and functional
>>>dependencies.
>>>
>>>Let me start with a couple of examples.
>>>
>>>The first example involves a scenario that I'll call the "case of the
>>>missing validation table".
>>>
>>>We've got a schema with tables about students, and tables about campuses

>
> for
>
>>>a university.
>>>Somebody comes up with this query  (apologies for using SQL).
>>>
>>>select s.first_name, s.last_name, c.campus_name
>>>from students as s, campuses as c
>>>where s.zip_code = c.zip_code
>>>
>>>I have no idea why someone suddenly decided this query was useful,  but

>
> I'm
>
>>>going to suggest that the theta operator between the two zip_code

>
> columns is
>
>>>entirely within the meaning of the data.  Intuiting that the zip_code
>>>columns are some sort of reference,  we look around for a table whose
>>>primary key is zip_code, only to find that there isn't one!  At least,

>
> there
>
>>>isn't one in this schema.
>>>
>>>It turns out that our database doesn't distinguish between valid

>
> zipcodes
>
>>>and invalid ones, and doesn't have any constraints in that regard.  Of
>>>course there's a table of all valid zip_codes, somewhere in the US

>
> postal
>
>>>services databases, but that's outside our universe of discourse.
>>>
>>>Now,  this join doesn't necessarily conform to Marshall's referential
>>>integrity constraint.  There could be students with no matching

>
> campuses, or
>
>>>campuses with no matching students.  Because the realtionship is
>>>many-to-many,  there could also be multiple result rows with the same
>>>student or the same campus.  As a side issue, two different students

>
> could
>
>>>have the same first_name and last_name,  but I'm not dealing with that.
>>>
>>>What I'm going to say about this is that both s.zip_code and c.zip_code

>
> are
>
>>>"dangling foreign keys".  They are foreign keys to a primary key that

>
> isn't
>
>>>defined in this schema.
>>
>>In that case, any two attributes with the same name are "dangling" fk's,
>>  in fact, given a rename operator or equivalent syntax, all attributes
>>are "dfk's".  What good does it do to introduce a name that applies to
>>all attributes?
>>

>
>
> I disagree. I'm going to suggest that "measurements" are not dangling fk's.
> If we say
> that a given sack of rice has mass of 6.7255 kilos, 6.7255 is not a
> "dangling reference to a pk".
> It's a proportion between our measurement and the standard kilo. I'll
> concede that "kilo" is a dangling fk. It's not even an exact number. It's
> subject to measurement error, and also to computer induced roundoff error.
>
> That's at least one kind of data that is not a dangling fk. Agreed?
> ...

With my (usually) strict, I hope, interpretation of domains and attributes as I think Codd intended them, I don't think an RT implementation can distinguish between a domain that stands for measurements and one that stands for kinds of things such as rice. My attitude is that talking about kinds of data is verging on the mystical (if we are not talking about operators that apply to only certain "kinds" of data).

(That doesn't mean to say that I think RT is frozen in time, just that I don't have variations on it that I can express sensibly.)

p Received on Sat Mar 03 2007 - 01:14:48 CET

Original text of this message