Re: Objects and Relations

From: David BL <davidbl_at_iinet.net.au>
Date: 31 Jan 2007 01:29:36 -0800
Message-ID: <1170235776.146470.152020_at_v33g2000cwv.googlegroups.com>


On Jan 31, 4:10 pm, "Marshall" <marshall.spi..._at_gmail.com> wrote:
> On Jan 30, 6:32 am, "David BL" <davi..._at_iinet.net.au> wrote:
>
> > On Jan 30, 6:33 pm, "Marshall" <marshall.spi..._at_gmail.com> wrote:
>
> Okay. Going to try to recreate my earlier post. Will not be
> as good but hope to get the basic sense across, along
> with some actual references to stuff.
>
> > > > Consider the following relation
>
> > > > charsInString(StringId, Index, Char) :- StringId has Char at
> > > > Index
>
> > > This isn't right as is, though; you're putting every string in
> > > a single table. Coincidental cohesion; grouping things based
> > > on a superficial type commonality. Are you going to put all
> > > your ints in a single table also? I would expect not.
>
> > I'm glad to see that you don't like the idea of putting all characters
> > of all strings in a single table.
>
> > I believe a relational database should treat strings as an ADT to be
> > used for domains for attributes.
>
> Yes, I'm aware of that perspective, and some people I really
> respect subscribe to it. However I don't; if you encapsulate
> a string as an ADT then you lose the ability to use the RA
> on the characters, which I believe I've already demonstrated
> has some utility.
>
> I also have an argument that encapsulation itself is merely
> a crutch to make up for the fact that a language lacks
> declarative integrity constraints.

Yes, I can see the sense in which private/public become unnecessary when adequate integrity constraints can be enforced.

> > > Instead, let's imagine we're in a nested relational system,
> > > and strings are a relation type {index:int, c:char} I will use
> > > the name S to denote the string at hand; sort of the analog
> > > of "this" in an OO language.
>
> > Well, I was assuming a flat relational system. If you are going to
> > allow nested relations then I agree that things are rather
> > different. In fact I believe this is a bit of a cop out. Please
> > correct me if I'm wrong.
>
> Not 100% sure I understand, but it sounds like you're saying
> nested relations are a cop-out. If so I disagree. Again, this
> is a theory group, so we do not have to restrict ourselves
> to discussing existing implementations or SQL or whatever.
> As a logical model nested relations have a long history of
> study, and they do solve some problems. My personal model
> includes nested relations.

I'm only saying that at a higher level you are dealing with strings as ADTs.

> > At the outermost nesting level you have
> > chosen to deal with strings opaquely - as a domain for attributes. In
> > other words at that level of abstraction you have chosen not to deal
> > with strings relationally.
>
> Yes however I consider this a non-issue. The advantage of
> recursive structures is that one can deal with each level
> at a time, and not have to address them all at once.

It seems to me that in some sense it's not *completely* relational. Anyway let's not get bogged down in definitions and take that as a positive because it means I have far fewer objections with your way of thinking.

> > You don't seem to appreciate the fact that the onus of proof often
> > depends on the nature of the claim rather that who is making it. It
> > would obviously be difficult to construct a formal proof that there
> > doesn't exist elegant and useful string processing algorithms using
> > RA, whereas it is comparatively easy to show examples if they in fact
> > exist. All I said is that I can't think of any and perhaps there
> > aren't any. I make no apologies for having rather less experience in
> > RM than you do. If you look carefully at my post you will find that I
> > never even claim that there are no such examples. I only raise the
> > question.
>
> I don't believe I have made any claims as to where the anus of
> proof lies.

An interesting typo :)

Yes, looking back at what was said I can see I may have jumped to false conclusions.

> > I note that this is the second time you have expected proofs of non-
> > existence of examples (cf our discussions about fine grained mutative
> > operations in a previous thread). Do you agree that this is often
> > impossible?
>
> I don't believe that I have expected proofs. Let me change
> my earlier phrase "prove a point" to "make a point" so that
> we avoid any potentially-ambiguous use of the word "prove."
>
> Some of what we're discussing are design questions and
> they are not subject to proof. (In fact, surprisingly little *is*
> subject to proof.)
>
>
>
>
>
> > > > Won't you almost always
> > > > join/select for the simple purpose of looking up a character in a
> > > > given string at a given index position?
>
> > > You say that like it's a bad thing. Join is the primary tool
> > > of the RA, so yes, we will join. And select. We'll use the
> > > algebra, just like an OO programmer would use loops
> > > and method calls.
>
> > Of course. What I mean is that the access pattern on the relation is
> > almost always the same, and no different to the functionality provided
> > by the ADT.
>
> > This contrasts with relations like Employee which join on different
> > attributes at different times, revealing the power of RA for adhoc
> > query.
>
> Joining on single-attribute relations is a practical and useful
> thing to do; it doesn't indicate any problem, any more than
> a function that takes only a single argument indicates a
> problem.
>
> Join of two one-attribute relations on their single attributes is
> set intersection, which I hope you will agree is a meaningful
> and useful operator.

I'm not suggesting that there is a problem. I'm trying to say (perhaps badly) that if you model strings relationally you don't suddenly get a whole bunch of simple select-project-join queries across multiple relations that give you useful results. I'm contrasting this with more conventional applications of RM where even simple queries give all sorts of useful information.

I'm treating this as merely an observation. It is only weakly suggestive that RA/RM may have little to offer.

> > > indexOf(int ch, int fromIndex)
>
> > > The source to java.lang.String can be downloaded as part of the JDK;
> > > I will refer to it here. (But not excerpt it, much as I would like to
> > > for
> > > comparison, because it is copyrighted.)
>
> > > Here is the relational implementation in pseudo-SQL:
>
> > > select min(index) from S where c = ch and index >= fromIndex
>
> > > The body of the Java implementation is 37 lines long, and
> > > contains 8 if statements and 2 for loops, and can return
> > > from any one of five places.
>
> > I shall look at this Java code when I get the time. It seems overly
> > complicated.
>
> > In my C++ code I use a StringRange class that represents an aliased
> > range of characters using a pair of pointers, allowing me to write
>
> > while(s && *s != ch) ++s;
>
> > to scan through characters in range s not equal to ch.
>
> I am aware of the StringRange techinque but not familiar with
> it. I am not immediately clear what the above code does; in
> a C idiom (as opposed to C++) it appears merely to advance
> the (uninitialized) s pointer.

To keep things simple I won't parameterise on type. Here's a partial implementation. I'm just typing this in quickly so it may not compile

class StringRange
{
public:

    StringRange(const char* _s1, const char* _s2) :

        s1(_s1), s2(_s2) {}
    operator bool() const { return s1 != s2; }     int size() const { return s2 - s1; }     char operator*() const { return *s1; }     StringRange& operator++() { ++s1; return *this; }     StringRange prefix(int len) const
    {

        return StringRange(s1,s1+len);
    }

private:

    const char* s1;
    const char* s2;
};

inline bool operator==(StringRange s1, StringRange s2) {

    if (s1.size() != s2.size()) return false;     while(s1)

        if (*s1++ != *s2++) return false;     return true;
}

Note that the implicit conversion to bool is full of pitfalls and should be done differently.

> Even if it's a complete solution, you
> omit the code from the StringRange class, which has to be
> accounted for to do a fair comparison with my code.

A fair comparison is difficult and trying to count the total number of lines is not the best measure. The StringRange class is highly reusable and a good basis for writing lots of string processing functions. Its amortized cost is nearly zero!

> > > select min(S1.index) from S S1 where
> > > (select min(S2.c = str.c) from S S2, str
> > > where S2.index-S1.index = str.index) and
> > > S1.index >= fromIndex
>
> [corrected from original]
>
> > > "Select the smallest S.index where
> > > every character in S' offset by S.index equals
> > > every character in str,
> > > that is greater than or equal to fromIndex"
>
> > > The body of the java implementation calls an internal helper function
> > > that is 33 lines, containing 6 ifs, 2 for loops, 1 while loop, and
> > > that
> > > can return from any one of 4 different places.
>
> > Thinking off the top of my head, in C++ I would write something based
> > along the lines of
>
> > while(s && str.size() <= s.size() && s.prefix(str.size()) != str) ++s;
>
> Again, I note that you're not accounting for the size() and prefix()
> methods;

See above. Note that they are exceedingly simple. BTW I think my "solution" can be simplified to

    while(str.size() <= s.size() && s.prefix(str.size()) != str) ++s;

> you have to include them in the total count. My code didn't use any
> function abstraction; what you see is the whole deal. Otherwise
> we both just wrap our implementations in a function and call it a
> day. My implementation used only min(), subtraction, and comparison.

My one liner while statement above contained the essence of the solution. The missing parts are rather trivial.

> > > > If you want to deal with strings declaratively (not procedurally) then
> > > > use a functional programming or logic programming language. RM seems
> > > > a poor fit.
>
> > > Does it still seem so after reading the above?
>
> > Yes!
>
> Okay. What would you find to be a convincing demonstration? If nothing
> else I claim my substring find is quite declarative; as much so as any
> logic program. (In fact, one could make an argument that mine *is*
> a logic program.)

In some respects it is quite elegant, and I'm happy to entertain the idea that with practise it would become natural to think that way. I'm a little sceptical though.

> It is in general nearly impossible to separate out one's familiarity
> with
> a particular computational model from how "intuitive" code in that
> model appears. Again, Haskell on first contact appeared ridiculously
> opaque, but now I find it generally very readable.
>
> The Stroustroup quote:
>
> http://www.research.att.com/~bs/bs_faq.html#compare
>
> The only way to get around the familiarity issue is with hard
> metrics of expressiveness, and these are few and far between.
> The one that stands out is Felleisen's "On the Expressive Power
> of Programming Languages." I have it on my to-do list to
> master Felleisen expressiveness and use it to compare a
> relational language.
>
> Until we do that, or something like it, pretty much all we're
> doing with comparing different computational models is
> expressing how familiar its idioms are to us.

Yes.

> > I found your post very informative and interesting and shows me that
> > RA is more powerful than I imagined - when combined with aggregation.
> > I was expecting that substring testing would require transitive
> > closure.
>
> As an aside, please note that TC is not a showstopper for a relational
> language. If we are speaking of a language that is equivalent in
> computational power to first order logic, (such as basic SQL) then
> we are speaking of a language that is not Turing complete. However
> if we imagine a general purpose programming language that was
> nonetheless relational through and through, we would of course be
> discussing a Turing complete language and it would be able
> to express TC.
>
> I note that recursive functions are Turing complete, and that
> functions
> are a kind of relation, and we can use join on any kind of relation,
> therefore we can employ join with functions. Being able to do
> so expands the computational power of join to the "goes to 11"
> of Turing, at the cost of the loss of the strong normalization
> property
> that basic RA has, which as a very smart guy once said to me
> "is overrated anyway."
>
>
[snipped - run out of time, will read later] Received on Wed Jan 31 2007 - 10:29:36 CET

Original text of this message