Re: MV Keys

From: dawn <dawnwolthuis_at_gmail.com>
Date: 8 Mar 2006 05:59:40 -0800
Message-ID: <1141826380.194191.68730_at_z34g2000cwc.googlegroups.com>


David Cressey wrote:
> "dawn" wrote:
> in message news:1141706092.790613.114350_at_p10g2000cwp.googlegroups.com...
> >
> > Brian Selzer wrote:
>
>
> > It is not difficult to ask if two values are equal when they are list
> > values. Why do you think it would be?
> >
>
> It's not difficult to ask, but it can be difficult to answer.

Given that there are products that implement tests for equality of lists and have good performance, I guess we could look to those tools.

> In
> particular, it can require a sort.

But lists are already ordered. If we are comparing two sets, we might want to sort them, but lists are, by definition, ordered. No sort needed. Compare 'em. thisList==thatList even if behind the scenes there is iteration of thisList[i]==thatList[i]

> consider this case: The run time engine is given two list variables,
> called "a" and "b".
>
> It's asked to evaluate "a == b", and return true or false.
>
> a evaluates to a list: (onions, mushrooms)
> b evaluates to a list: (mushrooms, onions).
>
> Somewhere a data definition has been made available to the engine that says
> that for the particular type of list that a and b are, the proper
> evaluation is "TRUE".

Ah, OK. In MV, all lists (all data) are strings. Then list "a" simply isn't the same as list "b". A programmer might decide to check if they represent the same bags or sets even if the lists are not the same. That might require a sort.

> However, naive inspection of the list returns the
> answer "false". The programmer implementing the equality tester

This is a dbms programmer, not an app programmer. Lists (of strings) are part of the dbms.

> is not
> satisfied with the naive answer, because it's wrong. So here's what the
> programmer does: he sorts both lists alphabetically, then does the naive
> comparision. If the comparison yields "false" we can now safely return
> "false" to the poser of the question.
>
> So far so good. We've made the question of order in the list moot, by
> imposing an arbitrary order (alphabetical) on the list. Either list can be
> manipulated (insert, remove) using the standard list manipulation primitive
> operations. And all we have to do is two trivial little sorts, every time
> we want to test for equality.

Whenever I want to test two multivalued fields in Pick/MV for equality, I ask the same question I would ask if I were asking if single-valued fields were equal. No sorting from the app developer perspective as long as they are treated as lists of strings.

> But now, what if we scale the lists up to lists of about 10,000 items
> apiece, and we ask for an equality test. Maybe it's going to take a second
> to perform two sorts of 10,000 items.

Again, no sorts.

> Supposing we now have to ask the
> question 10,000 times in order to complete a given task. Now the process is
> going to take 10,000 seconds. That's almost three hours. If you have a
> user waiting three hours at the screen, that's going to inpterrupt
> somebody's workflow.

Of course, but that isn't what happens in practice. It is fine to theorize about it, and it is definitely the case that some logical data designs will be better than others for performance, but MV solutions are known to have good performance. You are likely providing more features in your scenario than are implemented in existing solutions.

> I don't think the numbers I've given you are at all unreasonable. Databases
> that capture mouse clicks are easily processing this amount of data, in this
> time frame.

If a evaluates to a list: (onions, mushrooms) and b evaluates to a list: (mushrooms, onions)

Then we ask the dbms if (a == b) and get false. Easy.

> OK, so now we require the sort to be done at insert time, rather than at
> query time. We've just made a decision that we don't support unordered
> lists. If order doesn't matter, then the list is maintained in
> alphabetical order. Equality tests run blazingly fast, but inserts take a
> while.

It's a list, "sorted" by the ordinal position in which a value is in the list. The first value goes first, the 4th value goes 4th. If you want the list to be in alpha order, then you sort it for output or you maintain it so that the 4th value would be the 4th in alpha order (not commonly done in my experience with lists). The dbms doesn't care about any order other than the ordinal position.

> So now we build a separate data structure, a B-tree, to keep the list
> alphabetical. Whenever we insert into the list, the list insert is
> blazingly fast. Then we follow up by inserting the same key into the
> b-tree.
> When we test for equality, we use the two b-trees, and that gives us the
> answer we want.
>
> Slowly but surely, one feature at a time, we are adding the features that
> support existing SQL-RDBMSes. I'm not saying that SQL is the only
> interface, or that a better one cannot be devised. What I am saying is
> that all database management systems end up solving the some of the same
> problems concerning sorting and searching, regardless of the data model
> they are derived from.

Sure. Some do more work than others and some make the user do more work than others. I suspect that if you try to think with an RM hat on and make an MV dbms, which is how your solution above reads to me, performance would be poor. I think there is plenty of information available about the hash table implementations for Pick, although I have little knowledge about what is under the covers there except to know that the mark-delimited ordered data might make an RM thinker cringe. --dawn Received on Wed Mar 08 2006 - 14:59:40 CET

Original text of this message