Re: The naive test for equality

From: paul c <toledobythesea_at_oohay.ac>
Date: Sun, 31 Jul 2005 22:58:19 GMT
Message-ID: <fIcHe.75636$s54.42619_at_pd7tw2no>


David Cressey wrote:
> There's been discussion about whether the relational engine does or does not
> understand equality. I'd like to suggest that the relational engine does
> understand the "what" of equality testing, but is sometimes naive about the
> "how". Before I go any further with this, let me first describe a naive
> test for equality.
>
> Here's how it works:
>
> Every value has a type, and is represented by a bit string.
>
> If values A and B are to be tested for equality, the first step is to see
> if the type of A and the type of B are the same. If not, then an error
> flag is raised, and the test returns FALSE.
>
> If A and B are of the same type, then the length of the two bitstrings are
> compared. If the lengths are different, the test returns FALSE.
>
> If the types are the same, and the lengths are the same, the bit strings
> are compared, one bit at a time. If any corresponding bit is different in A
> and B, the test returns FALSE.
>
> Otherwise, the test returns TRUE.
>
>
> Well, what makes me call this test naive? It's naive because it has tested
> the two representations for equality, and not the two values. The naive
> test will always work correctly, provided there are no synonyms and no
> homonyms. There are a lot of datatypes that have these two features, and
> for those datatypes, the naive test works just fine.
>
> I'm going to sidestep the issue of homonyms.
>
> What if we had a type engine with the following interesting feature: if we
> ask it to test "humid" and "moist" for equality, it returns TRUE. I'm
> going to sidestep the question of whether such a type engine is
> mathematically valid, and the question of how it is implemented.
>
> The point is that the naive test fails to deliver the same answer as the
> type engine does, because it doesn't recognize "humid" and "moist" as
> synonyms.
>
> Here's where the relational engine may be able to use the "what" of the
> equality test, but can't grasp the "how".
>
> There's more, but I'll leave here for now.
>
>
>
>
>

yes, i think that test is "naive" and i have said something like what you say, namely that the relational model (RM) doesn't understand equality. what i meant is that it is the domain or type system that understands it. this begs the question: "if the RM doesn't understand equality, how does the RM understand if its own types are equal?".

not sure if i agree that it even knows "what" (as you say) as opposed to "how" unless you mean that "what" the relational engine knows is that the type engine knows what scalar equality is.

i take it as a given that the original RM doesn't preclude a type system from allowing subtypes (proper subsets if you will, aka single inheritance) as well as overlapping subtypes (multiple inheritance if you will), because as far as i know, Codd's 1970 paper didn't disallow them (i'm ignoring the Tasmanian and RM2 versions). this seems to be advantageous for a relational engine, assuming it can invoke some type engine to answer the equality question for scalars (scalars in the sense that D&D use the word). advantageous mainly because it allows user defined types without adding concepts to the RM.

as for a relational engine's own types, relations being the main one, my cut at it is that the engine only knows whether two relations are the same or not the same as well as several flavours of 'not the same', such as one being a subset of the other or two relations having different headings. it only knows "sameness" as a result of what the type system has told it regarding the equality of scalar values. in this i'm omitting anything about other 'types' an engine would need, such as headings.

none of this precludes some type engine telling a relational engine that it is okay to go ahead and do a bitwise comparison for some attributes (ie. the naive comparison). it's just that i think the RM per se doesn't talk about this.

to go a bit further, i think the RM doesn't even require that a type system must always give the same answers for equality tests, although if this were allowed by a relational engine it might have to go to ridiculously elaborate lengths to make sure that permanent contradictions didn't arise within a db, such as between views and references and who knows what else. for that matter, within a single relation physical contradictions could arise, such as duplicates. but the RM isn't physical, so presumably when it presents a relation, an engine is always projecting on all physical attribute values in the first place as well as doing other integrity checks (that's just a joke - i doubt if any of them do actually project for that purpose but logically one would think they should - makes we wonder if the systemR people allowed duplicates in order to save time).

deciding whether this might be a good thing takes more imagination than i've got. for example, if the RE (relational engine) knew that the TE (type engine) had changed such that 'moist' was equal to 'humid' temporarily (by temporarily, i mean without affecting long-term persistence) then maybe it helps make a "what-if" view. but maybe doing this is logically the same as retracting or subtracting the 'moist' value from the domain, in which case we have made a different domain in which case maybe we should change the table headings to reflect different type names, which perhaps is the same thing as making a new db (the predicates would be different!). with a small, malleable engine, it might be fun to try this out. maybe somebody will. actually, i'd guess that lots of people have tried it out, unintentionally!

i don't mean to give the impression that just because the RM doesn't say much about equality that it is giving us permission to do any cockamamie thing we feel like. it's just that i think most of the considerations would be practical ones.

the RM doesn't say much about persistence either. as for databases and dbms's, i'll quote a short and sweet definition i like, by Fabian Pascal (i like it because it is a bit of a Rosetta Stone for me even though i call it Rosita because it makes life sweetah):

--------quote-------------------

§ A database represents a set of axioms.

§ Responses to queries represent theorems.

§ The process of deriving theorems from axioms is a proof, which is made by manipulating symbols according to agreed mathematical rules.

§ The theorems are true if and only if the axioms are true and the proofs are mathematically correct.

§ A (properly designed) RDBMS is a deductive logic system: it derives theorems from axioms.

§ Theorems are guaranteed to be true if and only if the DBMS ensures that (a) axioms are true and (b) the manipulation (proofs) are mathematically correct.

§ It is the function of the integrity component of a DBMS to ensure that the axioms are true.

§ It is the function of the manipulation component of the DBMS to ensure that the theorems are true.

By true we mean, of course, consistent with the integrity constraints in effect, which represent business rules in the database. -------end of quote--------------

in the first two points, Pascal's use of the word 'represent/s' drives home to me the fact that a db model is merely a metaphor and not real, or if you like, just perception. i find it interesting that in the last two points, he doesn't say that axioms and theorems "STAY true", ie. he never says "persistent". but he certainly does demand "integrity".

i won't apologize for this sloppy brainstorm as i recently saw a post by David where he said fun topics were okay. besides, this newsgroup speculates all over the place anyway. i've kept any mention of the 'domain support' being implemented by a separate 'relational engine' out of it so as not to go on forever.

pc Received on Mon Aug 01 2005 - 00:58:19 CEST

Original text of this message