Re: How to normalize this?

From: <hugokornelis_at_gmail.com>
Date: Thu, 23 May 2013 05:43:46 -0700 (PDT)
Message-ID: <abf54530-4c5e-4a64-b562-e116087744f2_at_googlegroups.com>


Sorry for my long absence, everyone. When there were no new replies after a week, I had stopped checking. And now I come back and see that in the mean time, a long and very interesting discussion has taken place.

Thanks for your additional insights, Jan, Erwin, Philip, and Nicola! (I hope I missed nobody)

On Wednesday, May 1, 2013 11:20:47 AM UTC+2, Jan Hidders wrote:
>
> But if you have a realistic example that would show otherwise,
>
> that would be interesting.

I do indeed have a realistic (well, somewhat) example that illustrates the problem that started this topic. I have already presented it earlier in the discussion, but I'll repeat it. And I will reword it slightly to address the concern, expressed in some messages, that the issue can be an artifact of the initial table design.

As mentioned before - this is not an example I came across for real, I made it up to check for myself if the set of FDs I presented in the first message would even be possible. The example is a bit far-fetched, but I think realistic enough that it could be real.

A charity is organizing a big fundraiser event. Several rock bands and several opera singers will participate. Throughout the country, in different locations, there will be performances where one rock band and one opera singer will sing one song together. This event will last a whole month. On any given date in that month, there may be performances in different locations. At any given location, there may be multiple performances at different dates. A band can play more than one song on a given date, but not with the same opera singer. They might have another performance with the same singer at another date, either at the same or another location, and either with the same or another song. If a band does play with multiple opera singers at the same date, it may be the same or a different song. An opera singer can play more than one song on a given date, but not with the same band. They might have another performance with the same band at another date, either at the same or another location, and either with the same or another song. If an opera singe does play with multiple bands at the same date, it may be the same or a different song.

Translating the above in functional dependencies results in: (Band, Singer, Date) --> Location and (Band, Singer, Date) --> Song.

But there is one additional requirement. If a band plays multiple songs on the same date, they all have to be in the same location. And the same restriction goes for opera singers. Though technically possible to have a band or singer perform in New York and London on the same day, the organization lacks the travel budget for it. So now we have (Band, Date) --> Location and (Singer, Date) --> Location. One of the original FDs, (Band, Singer, Date) --> Location, is now not a full FD anymore, as it is implied by these two new FDs.

Note that the entire description makes no mention at all of tables. It's just the functional requirements, and the resulting FDs. So there is no initial table design that can cause artifacts.

As far as I can see, there are no other FDs implied by the scenario. I apply Bernsteins algorithm to the FDs, and get three relations:

R1: {Band, Singer, Date, Song} with (Band, Singer, Date --> Song)
R2: {Band, Date, Location} with (Band, Date --> Location)
R3: {Singer, Date, Location} with (Singer, Date --> Location)

The inclusion dependencies are totally obvious, but for completeness sake I'll mention them:

1: R1.(Band, Date) --> R2.(Band, Date)
2: R2.(Band, Date) --> R1.(Band, Date) (*)
3: R1.(Singer, Date) --> R3.(Singer, Date)
4: R3.(Singer, Date) --> R1.(Singer, Date) (*)
(*) These two are, unfortunately, not enforcable by most popular RDBMS's. But let's assume for now that we have one that doesn't suffer from this limitation. This restriction on most RDBMS's is sufficiently known that I expect the application code to enforce the dependency.

Now suppose that Placido Domingo and Fun are scheduled to play "Imagine" in London on July 15, 2013. Placido calls and says he can't make it to London, he has to be in New York for other obligations. The organization has three options: cancel this performance, reschedule for another date, or move it to New York.

Choice 1: Cancel. Remove the tuple from R3, and INDs 4 and 1 force you to remove (or do it automatically, if defined as cascading) the performance of Placido and Fun of Imagine, and the appearance of Fun in London. That is okay; this is what you would expect in this scenario if a singer has to cancel.
Choice 2: Reschedule. Change the date in R3 to July 17, and again INDs 4 and 1 force you to change (or do it automatically, if defined as cascading) the date in R1 and R2 as well. So Imagine will now be played by Fun and Placido on May 17, and Fun will be in London at that time. Again, just what we want.
Choice 3: And here's the problem. I can update the location for (Placido, July 15) in R3 to New York, and the RDBMS will accept the location. So unless someone wakes up at the right time, we will have Fun in London and Placido in New York for their joint performance of Imagine. Let's hope someone sets up the sattelite connection in time! ;)


I expect that, based on this realistic example, you'll reaffirm that there is no better way than this, and that I'll just need more code to ensure that singers and bands that are supposed to perform together will, actually, do so at the same location. But if this example does open the door to alternative solutions, I'm all ears!

Cheers,
Hugo
Received on Thu May 23 2013 - 14:43:46 CEST

Original text of this message