Path: news.f.de.plusline.net!news-fra1.dfn.de!kanaga.switch.ch!switch.ch!newsfeed-00.mathworks.com!NNTP.WPI.EDU!elk.ncren.net!newsflash.concordia.ca!News.Dal.Ca!ursa-nb00s0.nbnet.nb.ca!53ab2750!not-for-mail
From: Bob Badour <bbadour@pei.sympatico.ca>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.2) Gecko/20040804 Netscape/7.2 (ax)
X-Accept-Language: en-us, en
MIME-Version: 1.0
Newsgroups: comp.databases.theory
Subject: Re: Discovering new relationships
References: <Oo%Fh.2$Tf.1@trndny03> <og2Gh.1206633$R63.245949@pd7urf1no> <zO2Gh.692$Tf.216@trndny03>
In-Reply-To: <zO2Gh.692$Tf.216@trndny03>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 110
Message-ID: <4U2Gh.4798$PV3.45435@ursa-nb00s0.nbnet.nb.ca>
Date: Sat, 03 Mar 2007 00:00:00 GMT
NNTP-Posting-Host: 142.176.114.242
X-Complaints-To: abuse@aliant.net
X-Trace: ursa-nb00s0.nbnet.nb.ca 1172880000 142.176.114.242 (Fri, 02 Mar 2007 20:00:00 AST)
NNTP-Posting-Date: Fri, 02 Mar 2007 20:00:00 AST
Organization: Sympatico-Subscriber
Xref: news.f.de.plusline.net comp.databases.theory:42903

Walt wrote:

> "paul c" <toledobythesea@oohay.ac> wrote in message
> news:og2Gh.1206633$R63.245949@pd7urf1no...
> 
>>Walt wrote:
>>
>>>It seems to me that, sometimes, relationships in data are discovered by
>>>other people than the ones who initially tabulated the data.  This is
>>>tangentially related to the discussion on constraints and functional
>>>dependencies.
>>>
>>>Let me start with a couple of examples.
>>>
>>>The first example involves a scenario that I'll call the "case of the
>>>missing validation table".
>>>
>>>We've got a schema with tables about students, and tables about campuses
> 
> for
> 
>>>a university.
>>>Somebody comes up with this query  (apologies for using SQL).
>>>
>>>select s.first_name, s.last_name, c.campus_name
>>>from students as s, campuses as c
>>>where s.zip_code = c.zip_code
>>>
>>>I have no idea why someone suddenly decided this query was useful,  but
> 
> I'm
> 
>>>going to suggest that the theta operator between the two zip_code
> 
> columns is
> 
>>>entirely within the meaning of the data.  Intuiting that the zip_code
>>>columns are some sort of reference,  we look around for a table whose
>>>primary key is zip_code, only to find that there isn't one!  At least,
> 
> there
> 
>>>isn't one in this schema.
>>>
>>>It turns out that our database doesn't distinguish between valid
> 
> zipcodes
> 
>>>and invalid ones, and doesn't have any constraints in that regard.  Of
>>>course there's a table of all valid zip_codes, somewhere in the US
> 
> postal
> 
>>>services databases, but that's outside our universe of discourse.
>>>
>>>Now,  this join doesn't necessarily conform to Marshall's referential
>>>integrity constraint.  There could be students with no matching
> 
> campuses, or
> 
>>>campuses with no matching students.  Because the realtionship is
>>>many-to-many,  there could also be multiple result rows with the same
>>>student or the same campus.  As a side issue, two different students
> 
> could
> 
>>>have the same first_name and last_name,  but I'm not dealing with that.
>>>
>>>What I'm going to say about this is that both s.zip_code and c.zip_code
> 
> are
> 
>>>"dangling foreign keys".  They are foreign keys to a primary key that
> 
> isn't
> 
>>>defined in this schema.
>>
>>In that case, any two attributes with the same name are "dangling" fk's,
>>  in fact, given a rename operator or equivalent syntax, all attributes
>>are "dfk's".  What good does it do to introduce a name that applies to
>>all attributes?
> 
> I disagree.  I'm going to suggest that "measurements" are not dangling fk's.
> If we say
> that a given sack of rice has mass of 6.7255 kilos,  6.7255 is not a
> "dangling reference to a pk".

By your earlier definition, of course it is! Suppose someone wants to 
match sacks in kilos with skids or containers in tonnes.


> It's a proportion between our measurement and the standard kilo.  I'll
> concede that "kilo" is a dangling fk.  It's not even an exact number.  It's
> subject to measurement error,  and also to computer induced roundoff error.
> 
> That's at least one kind of data that is not a dangling fk.  Agreed?

Not agreed. Dangling fk has no useful meaning.


>>>And I'm going to suggest that this kind of join is
>>>occasionally legitimate,  and that there is no mathematical reason to
>>>disallow it.
>>>...
>>
>>It is as legitimate or as illegitimate as the user's understanding of
>>the schema is.  There is no RT reason to disallow it either.
> 
> Good point.
