Re: Bidirectional Binary Self-Joins

From: JOG <jog_at_cs.nott.ac.uk>
Date: 30 Mar 2007 08:08:44 -0700
Message-ID: <1175267324.038565.309440_at_r56g2000hsd.googlegroups.com>


On Mar 30, 2:40 pm, "David Cressey" <cresse..._at_verizon.net> wrote:
> "JOG" <j..._at_cs.nott.ac.uk> wrote in message
>
> news:1175256984.344299.77790_at_l77g2000hsb.googlegroups.com...
>
>
>
> > On Mar 30, 3:41 am, "Kevin Kirkpatrick" <kvnkrkpt..._at_gmail.com> wrote:
> > >[snip]
> > > Finally, JOG - I toyed around with the notion of:
> > > Game {Date, Team, Score, Team, Score}
> > > suffice to say, that way leads madness....
>
> > I would never propose such a schema, so I hope you are not suggesting
> > I did - with no connection between Score and Team other than attribute
> > order (which is totally against the principles of RM), such a
> > structure is useless. , I always find it easiest to go back to the
> > propositions you want to record. If every proposition was of the
> > nature:
>
> > "There was a game on Date [Dec-12], where the Home team [Hope] scored
> > [59] and the Away Team [Calvin] scored [32]"
>
> > Then a schema of ( Date, Home_team, Home_score, Away_team, Away_score)
> > is sufficient. However if there was no such distinction between home
> > and away teams (perhaps you are recording finals played at neutral
> > grounds), then your proposition is more like:
>
> > "On Date [Dec-12], [Hope] played [Calvin], with [Hope] scoring [59]
> > and [Calvin] scoring [32]"
>
> > The conjunction in that sentence sets alarm bells off in my head,
> > because it means I could normalize this proposition to:
> > "On Date [Dec-12], [Hope] played [Calvin]"
> > "On Date [Dec-12], [Hope] scored [59]"
> > "On Date [Dec-12], [Calvin] scored [32]"
>
> > Then I realize I have two different types of propositions, and hence
> > two different types of relations. This corresponds to your design for
> > 1a, but it highlights the process of how I would have started to get
> > there. It also shows that you do not need some sort of artificial ID
> > key - the date and the team are enough to serve as identity.
>
> > Now, what is bugging me however is that my first relation would have
> > to have attributes team1 and team2, which is entirely unsatisfactory.
> > RM prohibits the repetition of attribute names, even when the
> > attributes are playing identical, equal roles in the relationship. I
> > could solve this by adding a surrogate, to represent the game as a
> > whole, and then normalizing, but this seems worryingly artificial,
> > given in RL a game is identified by the teams and the date. Any
> > illumination from cdt is more than welcome.
>
> This is bugging me, too. In particular, it seems to me that the
> proposition:
>
> "On Date [Dec-12], [Hope] played [Calvin]"
>
> and the proposition:
>
> "On Date [Dec-12], [Calvin] played [Hope]"
>
> Imply each other. But the way one of them is stated doesn't make that
> "obvious", at least not to me.

But they can also be viewed as exactly the same proposition. The proposition may be:
"On Date [Dec-12], there was a game that featured [Calvin] and [Hope]" And obviously the "and" is commutative.

>
> The second thing that bothers me is this: if you had two columns, namely
> team1 and team2, and you wanted to make a list of all the games hope played
> in, you would have to lookup in two columns.

Yes, this is certainly hideous. But then it is also unintuitive and arbitrary too, given there would be no way of determining who was the 'first' team and who was the 'second'.

> This is alarmingly like the
> reason first advanced to me, way back in about 1983, for conforming to 1NF.
> If you have to look in two places to find one fact, there's somethnig
> wrong. In fact, it almost seems as though the two columns "team1" and "team2" are a
> repeating group, cleverly disguised.

Is it a repeating group though? Its not like we have one attribute with two values in it - rather we have two attributes with the same name. My spider-sense (in lieu of any hard reasoning) is telling me that there's some subtle but crucial difference there.

My view of 1NF has gradually evolved over the last year. I now see it as a fundamental to good modelling and as providing a vital mathematical foundation, in that it requires that "one attribute takes one value from one domain". Is having two attributes in a relationship with the same name contrary to this? Once I would have said yes definitely. Now I am less certain. Received on Fri Mar 30 2007 - 17:08:44 CEST

Original text of this message