Re: Dawn doesn't like 1NF

From: Marshall Spight <mspight_at_dnai.com>
Date: Sat, 16 Oct 2004 12:23:50 GMT
Message-ID: <qp8cd.374924$mD.201649_at_attbi_s02>


"Laconic2" <laconic2_at_comcast.net> wrote in message news:8s6dnYcmsad6nOzcRVn-sw_at_comcast.com...
>
> "Marshall Spight" <mspight_at_dnai.com> wrote in message
> news:hl0cd.188364$wV.92433_at_attbi_s54...
> >
> > Any feelings people have that [equality] is complicated is
> > an artifact of the fact that all the systems we have *make* this problem
> > complicated. But the problem itself is simple: you compare all the
> > logical subelements, part by part, recursively, until you get to the
> > primitives, which you can compare directly.
>
> I am not sure you got my point. In order to carry out the process you
> outlined, you have to know what the logical subelements ARE. For a UDT,
> only U knows what the logical sublements are.

Why should it be the case that only the user knows what the logical subcomponents are?

[This is the fundamental question at hand; everything else I've written below is just a distraction from this essential question.]

> > > Is {1, 2, 3} equal to {2, 3, 1} ?
> >
> > Yes.
>
> And how to do go about figuring out that they are equal? And for
> two sets, each with a million elements, how do you carry out the
> procedure you outlined above, without sorting? And, if you
> do sort, how long will it take?

The question we are discussing is whether system-generated equality testing is decidable, not whether it is fast. Let's say you want to do a self-join on that set, how do you carry it out without sorting? And if you do sort, how long will it take?

Sure, equality testing on megabytes of data will require at least one access to every byte and perhaps more; that's how long it takes to do the job. Finding the median of one million integers is *conceptually* quite simple, don't you think?

> My above argument regarding UDT doesn't apply here, because we can expect
> the system to "know" sets. But testing two sets for equality is still non
> trivial.

There might be a lot of code accessed to handle equality of a relation over 100 different data types, but I would describe equality testing as *conceptually* quite simple.

> > > Is 3.21E3 equal to 32.1E2 ?
> >

> > Is three thousand two hundred and ten equal to three thousand two
> > hundred and ten? Certainly.
>
> OK, but how about if these two have different internal representations in
> binary? (different exponent, different mantissa, same value). This is
> really the synonym problem, in a different guise.

Fair enough, and let me just add that floating point numbers suck in a lot of different ways.

Another question with the much less (conceptually) sucky rational numbers: is 2/4 equal to 1/2? Yes, but you might have to figure gcd to find that out.

> I never worked with IEEE floating numbers. I worked with floating point
> arithmetic with three processors: DEC PDP-6, DEC PDP-10, and IBM 7090. In
> all of them, there was a process the the floating point arithmetic went
> through after calculating the answer. It was called "normalizing"
> (coincidence, or not?). It would adjust the exponent and mantissa until the
> mantissa was in a certain range or something.

Sure.

> > There are only two areas of complexity in equality: identity and
> > precision. Identity disappears completely if you don't allow pointers
> > in the language. Precision is harder: is the rational number 1/3 equal
> > to the IEEE-754 double precision floating point value 1.0d/3.0d?
> > Sadly no, but this is a simple consequence of having finite precision
> > floating point numbers. Rationals have no such trouble, but floats
> > are implemented in hardware, so we typically must bow down
> > before their awesome performance.
>
> OK, I should have said earlier that I was NOT dealing with the
> "approximately equal to" question.

Hmm. I'm not sure this is the "approximately equal to" question. I think what I did was confuse the issue by bringing up equality of values of different data types; that was probably a mistake on my part. Heh heh: our *three* main areas of complexity are identity, precision, and equality of values of different data types. And an almost fanatical devotion to the Pope.

> BTW, it's not that floats are implemented in hardware that causes the
> problem: it's that floats are implemented in a finite state machine, rather
> than a general Turing machine. You would have the same problem with
> software floats, although you could obviate it in a significant subset of
> cases.

I only mentioned floats being in hardware because I would otherwise exclude them from consideration; they have no value in the application areas I'm used to working in, although I recognize (grudgingly) that they are quite important in certain sci/tech applications.

Marshall Received on Sat Oct 16 2004 - 14:23:50 CEST

Original text of this message