Re: How to normalize this?

From: <hugokornelis_at_gmail.com>
Date: Sun, 10 Feb 2013 08:24:40 -0800 (PST)
Message-ID: <e39ccd43-7b17-4b6e-87a7-c3af6f06ee29_at_googlegroups.com>


Op zondag 10 februari 2013 15:45:16 UTC+1 schreef Eric het volgende:
> I think (i) applies, the mistake being not giving the algorithm all the
>
> information. Given _only_ the 3 FDs and nothing else, is updating
>
>
>
> (a1, b1, c1, d1)
>
> (a1, b1, e1)
>
> (b1, c1, e1)
>
>
>
> to
>
>
>
> (a1, b1, c1, d1)
>
> (a1, b1, e1)
>
> (b1, c1, e2)
>
>
>
> producing inconsistent data? I can not see any reason why it would be,
>
> except your "fact" about the original design, which did not feed into
>
> the algorithm. Without that "fact" you have neither an update anomaly
>
> nor a redundancy. If you made the correct decision about the original
>
> FD being wrong then your "fact" is not a fact at all.

Hi Eric,

Yes, you are right. With only the given FDs, that would not be an inconsistency. I had overlooked that in my previous reply; it is only an inconsistency in the original problem (which gave the information that (ABCDE) was a good solution until I found the stricter FDs.

> The other possibility is that the new set of FDs is still wrong, and
>
> there is other information that that has not come to light. You may have
>
> to go back to the real world and make sure you have the complete set of
>
> FDs.

Unfortunately, there is no real world problem to go back to. This problem stems from my thinking about normalization and trying to understand how the rules behave with edge cases.

However, I did at one point start to doubt if, perhaps, I had formulated a set of FDs that could impossible exist in the real world, so I then set myself the task of trying to find a (more or less) realistic scenario with this set of FDs. I managed to find something that, while probably not realistic enough to qualify as "real world" (so please don't comment on that), is believable enough to qualify as a classroom exercise.

A charity is organizing a big fundraiser event. Several rock bands and several opera singers will participate. Throughout the country, in different locations, there will be performances where one rock band and one opera singer will sing one song together. This event will take place over a month. On any given date, there may be performances in different locations. At any given locations, there may be performances at different dates. A band can play more than one song on a given date, but not with the same opera singer. They might have another performance with the same singer at another date, either at the same or another location, and either with the same or another song. If a band does play with multiple opera singers at the same date, it may be the same or a different song. An opera singer can play more than one song on a given date, but not with the same band. They might have another performance with the same band at another date, either at the same or another location, and either with the same or another song. If an opera singe does play with multiple bands at the same date, it may be the same or a different song.

Functional dependencies so far: (Band, Singer, Date) --> Location and (Band, Singer, Date) --> Song.

The additional requirement is: If a band plays multiple songs on the same date, they all have to be in the same location. And the same restriction goes for opera singers. Though technically possible to have a performing artist perform in New York and London on the same day, the organization lacks the travel budget for it. So now we have (Band, Date) --> Location and (Singer, Date) --> Location, both superseding the original (Band, Singer, Date) --> Location.

In this example scenario, if Placido Domingo and Fun are scheduled to play "Imagine" in London on March 15, 2013, and I change the location for Placido Domingo to New York, I have to change it for Fun too - otherwise they won't be performing together!

In the original, single-table version, this is not a problem. It's one single update. But that violates 2NF. After solving that, we now have this issue. I can formulate the requirement that would be violated in terms of the example (if a band and singer perform on the same date, they must be in the same location on that date). Or in abstract terms for the original problem (for any (a1, b1, c1) in R1, there must be a (a1, b1, e1) in R2 and a (b1, c1, e2) in R3, such that e1 = e2 - forgive me if some of the terminology is wrong; I think about this in Dutch and have to translate). But I am not able to transform this requirement into the form of a regular functional dependency.

Cheers,
Hugo Received on Sun Feb 10 2013 - 17:24:40 CET

Original text of this message