Re: MV Keys

From: David Cressey <dcressey_at_verizon.net>
Date: Tue, 07 Mar 2006 22:53:40 GMT
Message-ID: <U9oPf.6561$452.3946_at_trndny03>

"dawn" wrote:
in message news:1141706092.790613.114350_at_p10g2000cwp.googlegroups.com...
>
> Brian Selzer wrote:

> It is not difficult to ask if two values are equal when they are list
> values. Why do you think it would be?
>

It's not difficult to ask, but it can be difficult to answer. In particular, it can require a sort.

consider this case: The run time engine is given two list variables, called "a" and "b".

It's asked to evaluate "a == b", and return true or false.

a evaluates to a list: (onions, mushrooms) b evaluates to a list: (mushrooms, onions).

Somewhere a data definition has been made available to the engine that says that for the particular type of list that a and b are, the proper evaluation is "TRUE". However, naive inspection of the list returns the answer "false". The programmer implementing the equality tester is not satisfied with the naive answer, because it's wrong. So here's what the programmer does: he sorts both lists alphabetically, then does the naive comparision. If the comparison yields "false" we can now safely return "false" to the poser of the question.

So far so good. We've made the question of order in the list moot, by imposing an arbitrary order (alphabetical) on the list. Either list can be manipulated (insert, remove) using the standard list manipulation primitive operations. And all we have to do is two trivial little sorts, every time we want to test for equality.

But now, what if we scale the lists up to lists of about 10,000 items apiece, and we ask for an equality test. Maybe it's going to take a second to perform two sorts of 10,000 items. Supposing we now have to ask the question 10,000 times in order to complete a given task. Now the process is going to take 10,000 seconds. That's almost three hours. If you have a user waiting three hours at the screen, that's going to inpterrupt somebody's workflow.

I don't think the numbers I've given you are at all unreasonable. Databases that capture mouse clicks are easily processing this amount of data, in this time frame.

OK, so now we require the sort to be done at insert time, rather than at query time. We've just made a decision that we don't support unordered lists. If order doesn't matter, then the list is maintained in alphabetical order. Equality tests run blazingly fast, but inserts take a while.

So now we build a separate data structure, a B-tree, to keep the list alphabetical. Whenever we insert into the list, the list insert is blazingly fast. Then we follow up by inserting the same key into the b-tree.
When we test for equality, we use the two b-trees, and that gives us the answer we want.

Slowly but surely, one feature at a time, we are adding the features that support existing SQL-RDBMSes. I'm not saying that SQL is the only interface, or that a better one cannot be devised. What I am saying is that all database management systems end up solving the some of the same problems concerning sorting and searching, regardless of the data model they are derived from.

> > >> Once you discard First Normal Form, all bets are off.
> > >
> > > What bets? Are you saying a database not in 1NF is logically
> > > unsound/inconsistent? 1NF is orthogonal to the other usual NFs, except
> > > if you arbitrarily define the others to include 1NF.
> > >
> >
> > I can't be sure, and that's enough for me to avoid applying NFNF like
the
> > plague.
>
> There's plenty of emperical data that should permit you to sleep well
> even if using NF2.
>
> > >> > If there can be more than one (as it seems there can be, assuming a
> > >> > person can only have two parents), it is not a good idea to use the
> > >> > name
> > >> > to identify her. In that case, you have a list of names, not a list
of
> > >> > children---and there is only one "Mary Smith" name. But this also
has
> > >> > nothing to do with list attributes.
> > >>
> > >> It certainly has something to do with how list attributes can be
abused.
> > >
> > > But you were trying to demonstrate the inherent redundancy introduced
by
> > > list attributes, not redundancy caused by bad DB design. Anything can
be
> > > abused; list attributes perhaps more so than other things---but they
are
> > > not fundamentally harmful.
> > >
> >
> > I think they are.
>
> Because...? --dawn
>
Received on Tue Mar 07 2006 - 23:53:40 CET

This message: [ Message body ]
Next message: David Cressey: "Re: MV Keys"
Previous message: dawn: "Re: MV Keys"
Maybe in reply to: Bob Hairgrove: "Re: MV Keys"
In reply to dawn: "Re: MV Keys"
Next in thread: dawn: "Re: MV Keys"
Reply: Marshall Spight: "Re: MV Keys"
Reply: dawn: "Re: MV Keys"
Reply: David Cressey: "Re: MV Keys"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message