Re: Discovering new relationships

From: paul c <toledobythesea_at_oohay.ac>
Date: Fri, 02 Mar 2007 23:17:40 GMT
Message-ID: <og2Gh.1206633$R63.245949_at_pd7urf1no>


Walt wrote:
> It seems to me that, sometimes, relationships in data are discovered by
> other people than the ones who initially tabulated the data. This is
> tangentially related to the discussion on constraints and functional
> dependencies.
>
> Let me start with a couple of examples.
>
> The first example involves a scenario that I'll call the "case of the
> missing validation table".
>
> We've got a schema with tables about students, and tables about campuses for
> a university.
> Somebody comes up with this query (apologies for using SQL).
>
> select s.first_name, s.last_name, c.campus_name
> from students as s, campuses as c
> where s.zip_code = c.zip_code
>
> I have no idea why someone suddenly decided this query was useful, but I'm
> going to suggest that the theta operator between the two zip_code columns is
> entirely within the meaning of the data. Intuiting that the zip_code
> columns are some sort of reference, we look around for a table whose
> primary key is zip_code, only to find that there isn't one! At least, there
> isn't one in this schema.
>
> It turns out that our database doesn't distinguish between valid zipcodes
> and invalid ones, and doesn't have any constraints in that regard. Of
> course there's a table of all valid zip_codes, somewhere in the US postal
> services databases, but that's outside our universe of discourse.
>
> Now, this join doesn't necessarily conform to Marshall's referential
> integrity constraint. There could be students with no matching campuses, or
> campuses with no matching students. Because the realtionship is
> many-to-many, there could also be multiple result rows with the same
> student or the same campus. As a side issue, two different students could
> have the same first_name and last_name, but I'm not dealing with that.
>
> What I'm going to say about this is that both s.zip_code and c.zip_code are
> "dangling foreign keys". They are foreign keys to a primary key that isn't
> defined in this schema.

In that case, any two attributes with the same name are "dangling" fk's,   in fact, given a rename operator or equivalent syntax, all attributes are "dfk's". What good does it do to introduce a name that applies to all attributes?

> And I'm going to suggest that this kind of join is
> occasionally legitimate, and that there is no mathematical reason to
> disallow it.
> ...

It is as legitimate or as illegitimate as the user's understanding of the schema is. There is no RT reason to disallow it either.

p Received on Sat Mar 03 2007 - 00:17:40 CET

Original text of this message