Re: Designing DB in accordance with the relational model

From: Hugo Kornelis <hugo_at_perFact.REMOVETHIS.info.INVALID>
Date: Sat, 13 Nov 2010 00:49:57 +0100
Message-ID: <erhrd6lhprmiifj75rkck1n1j18c2goho2_at_4ax.com>


On Thu, 11 Nov 2010 12:29:08 -0800 (PST), Kentauros <joker.vd_at_gmail.com> wrote:

>On Nov 11, 10:32 pm, Hugo Kornelis
><h..._at_perFact.REMOVETHIS.info.INVALID> wrote:
>> But why is that a problem?
>> "Wario has defense 17 in Super Mario World"
>> "Wario has defense 17 in Super Mario World 2"
>> "Wario has defense 17 in Super Mario World 3D"
>> "Wario has defense 17 in Super Mario World - the last part"
>> "Wario has defense 17 in Super Mario World - the sequel to the last
>> part"
>>
>> There is nothing wrong with this extension. The candidate key for this
>> relvar is (Character, Game), and there are no violations of this key,
>> or of any other constraint.
>The redundancy. That is what wrong.

Hi Kentauros,

I see no redundancy. These are five distinct facts. You can't remove one of these facts without losing information.

Don't forget that in the context of databases, "redundancy" refers to representing the same *fact* more than once, not about storing the same *value* more than once.

However, there are alternative ways to represent the same information. Some of them may use less facts than the five I used above; others may use more facts. You should not mistake a difference in number of facts stored for redundancy.

To clarify, let me forst extend my original example a bit, to two characters and two characterstics;

"Wario has defense 17 in Super Mario World"
"Wario has defense 17 in Super Mario World 2"
"Wario has defense 17 in Super Mario World 3D"
"Wario has defense 17 in Super Mario World - the last part"
"Wario has defense 17 in Super Mario World - the sequel to the last
part"
"Peach has jump 5 in Super Mario World"
"Peach has jump 7 in Super Mario World 2"
"Peach has jump 7 in Super Mario World 3D"
"Peach has jump 7 in Super Mario World - the sequel to the last part"

As you see, the jump power of Peach is not the same in all games, and in one game she has none.

These nine facts can also be represented in another way, if we also represent facts about the order of these games. The alternative would be to use these 8 facts:

"Super Mario World 2 is the sequel to Super Mario World".
"Super Mario World 3D is the sequel to Super Mario World 2".
"Super Mario World - the last part is the sequel to Super Mario World
3D".
"Super Mario World - the sequel to the last part is the sequel to Super Mario World - the last part".

"Wario has defense 17 from Super Mario World until Super Mario World - the sequel to the last part".

"Peach has jump 5 in Super Mario World". "Peach has jump 7 from Super Mario World 2 until Super Mario World 3D".
"Peach has jump 7 in Super Mario World - the sequel to the last part"

Is this alternative better? Well, that depends. If you want to minimize the number of facts, it is. If you want to minimize the number of predicates, it isn't. If you want to minimize the amount of bytes required to store the facts - well, I guess that depends on lots of information I don't have. And if you want to minimize the complexity of your queries, then, again, it depends on what kind of queries you expect.

So which one you choose (and not just from these two; there must be more alternatives) is up to you. As long as you stick to your choice. You should not try to combine these alternatives - as that WOULD introduce the redundancy you already thought to see in the first alternative.

>> I'm wondering why you have introduced a character number, and why the
>> CharacterName is not a candidate key. Since this is all about tracking
>> different characteristics of the same character in different games, I
>> would have expected the character name to be the most obvious
>> identifier for characters - or are there indeed any games with a
>> "Sonic" character that is not derived from the original hedgehog?
>Because these CharacterNumbers exist in the games, they're visible to
>players, and they behave just like AUTOINCREMENT. Well, if the game
>characters have such wonderful characterstic, I surely pick it as a
>primary key.

Ah, yes. That makes sense than.

>> >But it seems that the table Characters has to go characters have
>> >different sets of characteristics in different games, and I can't
>> >predict what will (dis)appear or which remained characteristic change
>> >-- except for name.
>>
>> But why would that impact the table Characters? If a character exists
>> in a game, you can represent that fact in this table, regardless of
>> his/her characteristics.
>It's the work for CharacterExistence. The table Characters were
>intended to store characters' characteristics. Maybe
>CharactersCharacteristics would be a better name.

Unless it is possible for a character to exist in a game with no characteristic at all (and you have already told us it isn't), the existence of a character in a game is implied by the character having characteristics in that game; adding a seperate relvar to store its existence would be redundant.

(snip)
>> This is not a good constraint. What will you do if a new game is
>> relased next month (just in time for Christmas), where special is
>> mandatory again? Change the constraints?
>Yes. Just change the constraints. So the constraints will mean
>"Special is mandatory in the first and the (currently) last games, but
>is absent in any other game". It's pretty simple and won't take any
>changing in table.

Most people in the field will tell you that it is not good practice to have constraints that need to change based on foreseeable changes in the world.
They'll also tell you that is is best not to mix data and metadata.

Your constraint violates both advices.

(snip)
>> I'm also wondering about your choice to use seperate tables for each
>> characteristic (attack, defense, special, ...). Not that this is
>> really WRONG, but it might not be the best choice either. Have you
>> considered the predicate "<Character> has <Characteristic> <score> in
>> <Game>" to replace all these?
>Yes. And I don't think it will work. What will the domain of <score>
>be? Alpha? No, thank you very much. Such relation would require an
>enormous number of constraints.

Yes, that is true. Welcome rock, meet hard place. You'll either have a model that's easy to query and has lots of nasty constraints, or you can choose a model with relatively simple constraint that is tough to query. Neither of these choices is automatically better than the other; it's a choice you'll have to make, based on how you expect the system to be used.

-- 
Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Received on Sat Nov 13 2010 - 00:49:57 CET

Original text of this message