Re: The naive test for equality

From: David Cressey <david.cressey_at_earthlink.net>
Date: Wed, 10 Aug 2005 14:27:17 GMT
Message-ID: <99oKe.3381$RZ2.993_at_newsread3.news.atl.earthlink.net>

"Paul" <paul_at_test.com> wrote in message news:42f91fe6$0$17488$ed2e19e4_at_ptn-nntp-reader04.plus.net...

> If I store the value 2/4 in a database, then I retrieve it later, do I
> want it to remember the representation as well as the value? i.e. should
> it return 2/4 or 1/2?

If the value "2/4" being stored is of type "rational number", then it's better to convert it to canonical form (what I've been calling "normalizing") before storing, IMO.

If the value "2/4" being stored is of type "arithmetic expression", it might be better to leave it as is, IMO.

>
> If we always convert to a canonical form before storing then the DBMS
> can do equality comparisons at the physical bit level without recourse
> to the type details.

Yes, but sometimes the information requirements have needs that are obliterated by converting to canonical form.

By the way, I like the term "canonical" better than "normalized". Less confusion.

What follows is another discussion: I'm suggesting that when Codd first mentioned "normalizing" relations in the 1970 paper, what he was suggesting is that there is an equivalence class grouping within sets of systems of relations where, within an equivalence class, all the systems of relations are equivalent in terms of their capacity to meet information storage requirements. Of all of these sets, the canonical or normalized one is the one the Codd was proposing be used.

However, he wasn't, as I read him, denying that the other systems of relations within an equivalence class were also "valid relational models".

This is different from the way CJ Date interprets Codd, I acknowledge. But I think it's what Codd meant.

>
> Another alternative might be for it to be internally stored both as the
> inserted form (2/4) and the canonical form (1/2). Not very good for disk
> space usage but if this isn't an issue...
>

See above.

> An important part of defining a type might be to make sure the optimizer
> knows whether there is a 1-1 mapping between values and representatives
> or not. If there is, it makes equality comparisons easier.
>
> Some databases can compare strings for equality in a case-insensitive
> way. (MSSQL has this option for example.) When you store a string, it
> remembers the case of each letter. But when you test two strings for
> equality, it ignores case. So here's a common example of a type where it
> is 1-many between values and representations.
>

The case insensitive equality test is a very good example. Better than "Rationals" and better than "floating point numbers".

How many times have I had to write "where upcase (x.foobar) = upcase (y.foobar)" in Oracle? And how many times has it done a sort-merge join instead of using the index, because the index was built using the case sensitive form?

And the upcase function neatly illustrates the difference between the "relational engine" and the "type engine". The "upcase" function is clearly part of the character tstring type engine, while the "where" clause is clearly part of the "realtional engine" of Oracle. (I suppose the purists would want to say, "the quasi relational engine of Oracle". I won't quibble.) Received on Wed Aug 10 2005 - 16:27:17 CEST

This message: [ Message body ]
Next message: -CELKO-: "Re: any sources of obscure (and not so obscure) database design strategies?"
Previous message: vc: "Re: The naive test for equality"
Maybe in reply to: Marshall Spight: "Re: The naive test for equality"
In reply to Paul: "Re: The naive test for equality"
Next in thread: Paul: "Re: The naive test for equality"
Reply: Paul: "Re: The naive test for equality"
Reply: Gene Wirchenko: "Re: The naive test for equality"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message