Re: Objects and Relations

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Wed, 31 Jan 2007 22:10:47 GMT
Message-ID: <Ht8wh.536$R71.8803_at_ursa-nb00s0.nbnet.nb.ca>


Kevin Kirkpatrick wrote:

> On Jan 31, 9:24 am, Bob Badour <bbad..._at_pei.sympatico.ca> wrote:
> 

>>Marshall wrote:
>>
>>>On Jan 30, 6:32 am, "David BL" <davi..._at_iinet.net.au> wrote:
>>
>>>>On Jan 30, 6:33 pm, "Marshall" <marshall.spi..._at_gmail.com> wrote:
>>
>>>Okay. Going to try to recreate my earlier post. Will not be
>>>as good but hope to get the basic sense across, along
>>>with some actual references to stuff.
>>
>>>>>>Consider the following relation
>>
>>>>>> charsInString(StringId, Index, Char) :- StringId has Char at
>>>>>>Index
>>
>>>>>This isn't right as is, though; you're putting every string in
>>>>>a single table. Coincidental cohesion; grouping things based
>>>>>on a superficial type commonality. Are you going to put all
>>>>>your ints in a single table also? I would expect not.
>>
>>>>I'm glad to see that you don't like the idea of putting all characters
>>>>of all strings in a single table.
>>
>>>>I believe a relational database should treat strings as an ADT to be
>>>>used for domains for attributes.
>>
>>>Yes, I'm aware of that perspective, and some people I really
>>>respect subscribe to it. However I don't; if you encapsulate
>>>a string as an ADT then you lose the ability to use the RA
>>>on the characters, which I believe I've already demonstrated
>>>has some utility.
>>
>>>I also have an argument that encapsulation itself is merely
>>>a crutch to make up for the fact that a language lacks
>>>declarative integrity constraints.
>>
>>>>>Instead, let's imagine we're in a nested relational system,
>>>>>and strings are a relation type {index:int, c:char} I will use
>>>>>the name S to denote the string at hand; sort of the analog
>>>>>of "this" in an OO language.
>>
>>>>Well, I was assuming a flat relational system. If you are going to
>>>>allow nested relations then I agree that things are rather
>>>>different. In fact I believe this is a bit of a cop out. Please
>>>>correct me if I'm wrong.
>>
>>>Not 100% sure I understand, but it sounds like you're saying
>>>nested relations are a cop-out. If so I disagree. Again, this
>>>is a theory group, so we do not have to restrict ourselves
>>>to discussing existing implementations or SQL or whatever.
>>>As a logical model nested relations have a long history of
>>>study, and they do solve some problems. My personal model
>>>includes nested relations.
>>
>>NF^2 nested relations? Or relation-valued attributes?
>>
>>
>>>>At the outermost nesting level you have
>>>>chosen to deal with strings opaquely - as a domain for attributes. In
>>>>other words at that level of abstraction you have chosen not to deal
>>>>with strings relationally.
>>
>>>Yes however I consider this a non-issue. The advantage of
>>>recursive structures is that one can deal with each level
>>>at a time, and not have to address them all at once.
>>
>>And I would expect a String type to permit multiple possible
>>representations. If one can represent a String as an array, one already
>>represents them as a relation given the operations to convert back and
>>forth. So why not have a relation-valued representation?
>>
>>/amplification
>>
>>
>>>>You don't seem to appreciate the fact that the onus of proof often
>>>>depends on the nature of the claim rather that who is making it. It
>>>>would obviously be difficult to construct a formal proof that there
>>>>doesn't exist elegant and useful string processing algorithms using
>>>>RA, whereas it is comparatively easy to show examples if they in fact
>>>>exist. All I said is that I can't think of any and perhaps there
>>>>aren't any. I make no apologies for having rather less experience in
>>>>RM than you do. If you look carefully at my post you will find that I
>>>>never even claim that there are no such examples. I only raise the
>>>>question.
>>
>>>I don't believe I have made any claims as to where the anus of
>>>proof lies.
>>
>>Intentional? ;> LOL
>>
>>
>>>>I note that this is the second time you have expected proofs of non-
>>>>existence of examples (cf our discussions about fine grained mutative
>>>>operations in a previous thread). Do you agree that this is often
>>>>impossible?
>>
>>>I don't believe that I have expected proofs. Let me change
>>>my earlier phrase "prove a point" to "make a point" so that
>>>we avoid any potentially-ambiguous use of the word "prove."
>>
>>>Some of what we're discussing are design questions and
>>>they are not subject to proof. (In fact, surprisingly little *is*
>>>subject to proof.)
>>
>>Indeed. The more we make subject to proof, the better, however.
>>
>>
>>
>>
>>
>>
>>>>>>Won't you almost always
>>>>>>join/select for the simple purpose of looking up a character in a
>>>>>>given string at a given index position?
>>
>>>>>You say that like it's a bad thing. Join is the primary tool
>>>>>of the RA, so yes, we will join. And select. We'll use the
>>>>>algebra, just like an OO programmer would use loops
>>>>>and method calls.
>>
>>>>Of course. What I mean is that the access pattern on the relation is
>>>>almost always the same, and no different to the functionality provided
>>>>by the ADT.
>>
>>>>This contrasts with relations like Employee which join on different
>>>>attributes at different times, revealing the power of RA for adhoc
>>>>query.
>>
>>>Joining on single-attribute relations is a practical and useful
>>>thing to do; it doesn't indicate any problem, any more than
>>>a function that takes only a single argument indicates a
>>>problem.
>>
>>>Join of two one-attribute relations on their single attributes is
>>>set intersection, which I hope you will agree is a meaningful
>>>and useful operator.
>>
>>Or cross-product which I hope everyone will agree is a meaningul and
>>useful operation too.
>>
>>
>>
>>
>>
>>
>>>>>indexOf(int ch, int fromIndex)
>>
>>>>>The source to java.lang.String can be downloaded as part of the JDK;
>>>>>I will refer to it here. (But not excerpt it, much as I would like to
>>>>>for
>>>>>comparison, because it is copyrighted.)
>>
>>>>>Here is the relational implementation in pseudo-SQL:
>>
>>>>> select min(index) from S where c = ch and index >= fromIndex
>>
>>>>>The body of the Java implementation is 37 lines long, and
>>>>>contains 8 if statements and 2 for loops, and can return
>>>>
>>>>>from any one of five places.
>>
>>>>I shall look at this Java code when I get the time. It seems overly
>>>>complicated.
>>
>>>>In my C++ code I use a StringRange class that represents an aliased
>>>>range of characters using a pair of pointers, allowing me to write
>>
>>>> while(s && *s != ch) ++s;
>>
>>>>to scan through characters in range s not equal to ch.
>>
>>>I am aware of the StringRange techinque but not familiar with
>>>it. I am not immediately clear what the above code does; in
>>>a C idiom (as opposed to C++) it appears merely to advance
>>>the (uninitialized) s pointer. Even if it's a complete solution, you
>>>omit the code from the StringRange class, which has to be
>>>accounted for to do a fair comparison with my code.
>>
>>I wonder how the above loop terminates if ch does not occur in s. Then
>>again, if the increment operation can make s evaluate to false at the
>>end of the string, I suppose it is a fair operation to assume just as
>>join is a fair operation to assume.
>>
> 
> 
> Bob makes this point in a later context, but I think it can be made
> even more accutely in the context of indexOf(int, int).
> 
> David has implemented indexOf(int,int) as a fixed algorithm... sure,
> his implementation is somewhat simpler than Marshall's declarative
> version, but this metric misses a mammoth disadvantage.
> 
> In almost any modern SQL DBMS, if string is stored as a relation
> mapping indexes to characters, and indexOf() is written as Mashall
> defined it, one could slap an index on the string relation and
> immediately have a O(log n) for indexOf().  In fact,  for rare
> characters such as delimiters (which are most likely to be searched
> for within an application making routine use of indexOf), they would
> have an implementation approaching O(1). Even better, modern databases
> allow for the kind of dynamic sampling which would allow an optimizer
> to make runtime decisions that choose whether to use a binary search
> for rare characters, or an array scan starting at fromIndex for more
> common characters.
> 
> For David to compete, he should present a procedural version of
> indexOf() code with equivalent behavior characteristics; i.e. in which
> indexOf() will be resolved via a binary search for rare characters and
> an array scan for more common characters.
> 
> I've added the requisite additions for enhancing Marshall's code
> (using standard DBMS-supplied utilities)
> Create table s (i int, c int);
> Create index n (c);  -- new
> GatherStats(s); -- new
> IndexOf(ch int, fromIndex int) : select min(i) from S where c = ch and
> i >= fromIndex

The example given will lead folks to misunderstand that s might be an attribute or a local variable rather than only a relvar.

> David's enhanced code (using standard OO-supplied libraries):
> 
> [David, please present the code here.]

#include <string.hpp>

[snip] Received on Wed Jan 31 2007 - 23:10:47 CET

Original text of this message