Re: Objects and Relations

From: David BL <davidbl_at_iinet.net.au>
Date: 31 Jan 2007 20:31:10 -0800
Message-ID: <1170304270.712272.310450_at_q2g2000cwa.googlegroups.com>


On Feb 1, 6:49 am, "Kevin Kirkpatrick" <kvnkrkpt..._at_gmail.com> wrote:
> On Jan 31, 9:24 am, Bob Badour <bbad..._at_pei.sympatico.ca> wrote:
> > Marshall wrote:
> > > On Jan 30, 6:32 am, "David BL" <davi..._at_iinet.net.au> wrote:
> > >>On Jan 30, 6:33 pm, "Marshall" <marshall.spi..._at_gmail.com> wrote:

[snip]

> > >>In my C++ code I use a StringRange class that represents an aliased
> > >>range of characters using a pair of pointers, allowing me to write
>
> > >> while(s && *s != ch) ++s;
>
> > >>to scan through characters in range s not equal to ch.
>
> > > I am aware of the StringRange techinque but not familiar with
> > > it. I am not immediately clear what the above code does; in
> > > a C idiom (as opposed to C++) it appears merely to advance
> > > the (uninitialized) s pointer. Even if it's a complete solution, you
> > > omit the code from the StringRange class, which has to be
> > > accounted for to do a fair comparison with my code.
>
> > I wonder how the above loop terminates if ch does not occur in s. Then
> > again, if the increment operation can make s evaluate to false at the
> > end of the string, I suppose it is a fair operation to assume just as
> > join is a fair operation to assume.
>
> Bob makes this point in a later context, but I think it can be made
> even more accutely in the context of indexOf(int, int).
>
> David has implemented indexOf(int,int) as a fixed algorithm... sure,
> his implementation is somewhat simpler than Marshall's declarative
> version, but this metric misses a mammoth disadvantage.
>
> In almost any modern SQL DBMS, if string is stored as a relation
> mapping indexes to characters, and indexOf() is written as Mashall
> defined it, one could slap an index on the string relation and
> immediately have a O(log n) for indexOf(). In fact, for rare
> characters such as delimiters (which are most likely to be searched
> for within an application making routine use of indexOf), they would
> have an implementation approaching O(1). Even better, modern databases
> allow for the kind of dynamic sampling which would allow an optimizer
> to make runtime decisions that choose whether to use a binary search
> for rare characters, or an array scan starting at fromIndex for more
> common characters.

I'm a little surprised that you would directly compare performance to a modern SQL DBMS. AFAIK you should expect terrible performance if you model strings relationally. A DBMS has all sorts of heavyweight overheads such as concurrency control.

In any case I'm not taking this as a point in my favor, because the RM is orthogonal to these heavyweight features of a RDBMS.

The ability to find the position of a given character in O(log n) doesn't seem all that useful to me. Remember that this implies significant overheads for mutative functions. The study of string processing algorithms is quite mature and I would have thought that indexing data structures on strings would have been well considered. AFAIK indexing character positions doesn't offer a great deal. Perhaps there is some merit there, I don't know.

> For David to compete, he should present a procedural version of
> indexOf() code with equivalent behavior characteristics; i.e. in which
> indexOf() will be resolved via a binary search for rare characters and
> an array scan for more common characters.

In C++ you think of a string class as an array of characters. There is no indexing data structure by *definition*.

If you want (ie need) binary search for characters then you define a different concrete class with indexing. As an ADT this can be equivalent in functionality to a normal string.

C++ templates allow you to decouple algorithm from data structure - a valuable technique for decoupling physical and logical. My aim is to only have to write an algorithm once. I'm a great fan of template mixins (ie classes that parameterise on the base class), and use them copiously to achieve code reuse.

IMO in a low level language the code should be scoped top-down and implemented bottom-up. You have to demonstrate a need for some code before bothering to write it.

[snip] Received on Thu Feb 01 2007 - 05:31:10 CET

Original text of this message