Re: Dawn doesn't like 1NF
Date: Sat, 16 Oct 2004 09:20:44 -0400
Message-ID: <usqdnWvN0NSkvezcRVn-rQ_at_comcast.com>
"Marshall Spight" <mspight_at_dnai.com> wrote in message
news:qp8cd.374924$mD.201649_at_attbi_s02...
> "Laconic2" <laconic2_at_comcast.net> wrote in message
news:8s6dnYcmsad6nOzcRVn-sw_at_comcast.com...
> >
> > "Marshall Spight" <mspight_at_dnai.com> wrote in message
> > news:hl0cd.188364$wV.92433_at_attbi_s54...
> > >
> > > Any feelings people have that [equality] is complicated is
> > > an artifact of the fact that all the systems we have *make* this
problem
> > > complicated. But the problem itself is simple: you compare all the
> > > logical subelements, part by part, recursively, until you get to the
> > > primitives, which you can compare directly.
> >
> > I am not sure you got my point. In order to carry out the process you
> > outlined, you have to know what the logical subelements ARE. For a
UDT,
> > only U knows what the logical sublements are.
>
> Why should it be the case that only the user knows what the logical
> subcomponents are?
>
> [This is the fundamental question at hand; everything else I've
> written below is just a distraction from this essential question.]
Because the user defined the datatype. The meaning of equality is type specific. If the system didn't know the datatype to begin with, then it doesn't understand equality for that type.
>
>
> > > > Is {1, 2, 3} equal to {2, 3, 1} ?
> > >
> > > Yes.
> >
> > And how to do go about figuring out that they are equal? And for
> > two sets, each with a million elements, how do you carry out the
> > procedure you outlined above, without sorting? And, if you
> > do sort, how long will it take?
>
> The question we are discussing is whether system-generated equality
> testing is decidable, not whether it is fast. Let's say you want to
> do a self-join on that set, how do you carry it out without sorting?
> And if you do sort, how long will it take?
>
> Sure, equality testing on megabytes of data will require at least
> one access to every byte and perhaps more; that's how long it
> takes to do the job. Finding the median of one million integers
> is *conceptually* quite simple, don't you think?
>
>
> > My above argument regarding UDT doesn't apply here, because we can
expect
> > the system to "know" sets. But testing two sets for equality is still
non
> > trivial.
>
> There might be a lot of code accessed to handle equality of a relation
> over 100 different data types, but I would describe equality testing
> as *conceptually* quite simple.
>
>
I agree that it's conceptually quite simple. I claim, although I can't demonstrate it, that Codd was quite aware of this conceptual simplicity when he defined "Normal Form" the way he did in 1970. I further claim that his definition of normal form was more restrictive than the mathematics of relations required, but not for logical reasons. Rather, it was a concession to making an RDBMS "feasible". As it was, it was 8 years before the first attempt, and 10 years before the first commercial attempt. Moore's law had to do its magic.
> > > > Is 3.21E3 equal to 32.1E2 ?
> > >
> > > Is three thousand two hundred and ten equal to three thousand two
> > > hundred and ten? Certainly.
> >
> > OK, but how about if these two have different internal representations
in
> > binary? (different exponent, different mantissa, same value). This is
> > really the synonym problem, in a different guise.
>
> Fair enough, and let me just add that floating point numbers suck in
> a lot of different ways.
Yes.
>
> Another question with the much less (conceptually) sucky rational
> numbers: is 2/4 equal to 1/2? Yes, but you might have to figure
> gcd to find that out.
>
Yes.
>
> > I never worked with IEEE floating numbers. I worked with floating
point
> > arithmetic with three processors: DEC PDP-6, DEC PDP-10, and IBM 7090.
In
> > all of them, there was a process the the floating point arithmetic went
> > through after calculating the answer. It was called "normalizing"
> > (coincidence, or not?). It would adjust the exponent and mantissa until
the
> > mantissa was in a certain range or something.
>
> Sure.
>
>
> > > There are only two areas of complexity in equality: identity and
> > > precision. Identity disappears completely if you don't allow pointers
> > > in the language. Precision is harder: is the rational number 1/3 equal
> > > to the IEEE-754 double precision floating point value 1.0d/3.0d?
> > > Sadly no, but this is a simple consequence of having finite precision
> > > floating point numbers. Rationals have no such trouble, but floats
> > > are implemented in hardware, so we typically must bow down
> > > before their awesome performance.
> >
> > OK, I should have said earlier that I was NOT dealing with the
> > "approximately equal to" question.
>
> Hmm. I'm not sure this is the "approximately equal to" question.
> I think what I did was confuse the issue by bringing up equality
> of values of different data types; that was probably a mistake on
> my part. Heh heh: our *three* main areas of complexity are
> identity, precision, and equality of values of different data types.
> And an almost fanatical devotion to the Pope.
>
>
> > BTW, it's not that floats are implemented in hardware that causes the
> > problem: it's that floats are implemented in a finite state machine,
rather
> > than a general Turing machine. You would have the same problem with
> > software floats, although you could obviate it in a significant subset
of
> > cases.
>
> I only mentioned floats being in hardware because I would otherwise
> exclude them from consideration; they have no value in the application
> areas I'm used to working in, although I recognize (grudgingly) that
> they are quite important in certain sci/tech applications.
>
>
> Marshall
>
>
Received on Sat Oct 16 2004 - 15:20:44 CEST