Re: Objects and Relations

From: David BL <davidbl_at_iinet.net.au>
Date: 30 Jan 2007 06:32:00 -0800
Message-ID: <1170167520.895722.31710_at_q2g2000cwa.googlegroups.com>


On Jan 30, 6:33 pm, "Marshall" <marshall.spi..._at_gmail.com> wrote:
> On Jan 29, 9:41 pm, "David BL" <davi..._at_iinet.net.au> wrote:
>
> > On Jan 30, 2:01 pm, Gene Wirchenko <g..._at_ocis.net> wrote:
>
> > > "David BL" <davi..._at_iinet.net.au> wrote:
>
> > > [snip]
>
> > > >These are clear disadvantages of RM here. Are there any advantages?
>
> > > Oh, come on! Surely you can troll better than that?
>
> > Do you realise I'm only talking about strings?
>
> > I'm not trolling. I'm quite serious.
>
> I believe that you believe what you're saying, so in some narrow
> sense you're not a troll. But you're still a troll in the broader
> sense,
> in that you're an outsider who doesn't exhibit much mastery of the
> RM, pops in with no prior posting history, and speaks of
> "clear disadvantages" without any signs of having done any actual
> analysis.

I'm not sure why you say I made the claims with no underlying basis. On the contrary I outlined my reasons for making the claim. If my reasoning was wrong through inexperience with RM then fair enough. But saying there was "no analysis" is quite wrong.

Implying I'm arrogant for making claims without sufficient experience or credentials also comes across as a little arrogant on your part. Anyway I'm more interested in engaging in an objective discussion. I'm quite happy to defend (and if necessary retract) my statements through criticism.

> If Gene dismisses you out of hand, I cannot fault him.
>
> You give a couple of small signals that indicate that you might
> not be a complete loser, though; you've supplied an actual table
> header and problem statement, and you mention functional and
> logic programming. That by itself puts you ahead of most of
> the folks who show up here to helpfully explain to us just where
> OO is so much better than RM.

You misrepresent me. I don't think OO is better than RM or vice versa. I think such generalised statements are retarded.

> > Consider the following relation
>
> > charsInString(StringId, Index, Char) :- StringId has Char at
> > Index
>
> This isn't right as is, though; you're putting every string in
> a single table. Coincidental cohesion; grouping things based
> on a superficial type commonality. Are you going to put all
> your ints in a single table also? I would expect not.

I'm glad to see that you don't like the idea of putting all characters of all strings in a single table.

I believe a relational database should treat strings as an ADT to be used for domains for attributes.

> Instead, let's imagine we're in a nested relational system,
> and strings are a relation type {index:int, c:char} I will use
> the name S to denote the string at hand; sort of the analog
> of "this" in an OO language.

Well, I was assuming a flat relational system. If you are going to allow nested relations then I agree that things are rather different. In fact I believe this is a bit of a cop out. Please correct me if I'm wrong. At the outermost nesting level you have chosen to deal with strings opaquely - as a domain for attributes. In other words at that level of abstraction you have chosen not to deal with strings relationally.

> > What benefit does this relation give you (ie on top of an OO
> > implementation of a string).
>
> All the benefits of the RM apply. What would be first on my
> list would be the relational algebra, but others might put
> logical/physical independence first, or any of a variety of
> things.
>
> > Show me some elegant and useful string
> > processing that can be achieved using RA.
>
> Okay. (But please note the "show me" is trollish; you
> are the one who wants to prove a point; why shouldn't
> you be the one to do the analysis?)

You don't seem to appreciate the fact that the onus of proof often depends on the nature of the claim rather that who is making it. It would obviously be difficult to construct a formal proof that there doesn't exist elegant and useful string processing algorithms using RA, whereas it is comparatively easy to show examples if they in fact exist. All I said is that I can't think of any and perhaps there aren't any. I make no apologies for having rather less experience in RM than you do. If you look carefully at my post you will find that I never even claim that there are no such examples. I only raise the question.

I note that this is the second time you have expected proofs of nonexistence  of examples (cf our discussions about fine grained mutative operations in a previous thread). Do you agree that this is often impossible?

> > Won't you almost always
> > join/select for the simple purpose of looking up a character in a
> > given string at a given index position?
>
> You say that like it's a bad thing. Join is the primary tool
> of the RA, so yes, we will join. And select. We'll use the
> algebra, just like an OO programmer would use loops
> and method calls.

Of course. What I mean is that the access pattern on the relation is almost always the same, and no different to the functionality provided by the ADT.

This contrasts with relations like Employee which join on different attributes at different times, revealing the power of RA for adhoc query.

> > I can only think of some "silly" uses like finding frequencies of
> > characters, or finding all the locations of a given character.
>
> Ha! I would have more sympathy for this position if I hadn't
> had an application a few months ago where I had to write
> a bunch of "silly" character frequency analysis in Java.
> It was excruciatingly low level, whereas what I wanted to
> do was just variations on:
>
> select c, count(*) from S group by c
>
> But yes, you've pointed out one area where the relational
> way of doing things will be much easier than the OO way:
> aggregates.
>
> > Something important like searching for a given sub-string is actually
> > quite difficult to specify in the RA.
>
> Using any new set of tools is quite difficult, until you get the hang
> of it. The first time I tried functional and logic programming, it
> was mind-bendingly painful to specify anything. How was it for
> you? Haskell looked really weird to me for a while. The different
> associativity and precedence of function application and
> juxtaposition, let alone currying, made everything harder
> at first. But then I saw that 3 line quicksort in Haskell and in
> a flash I realized that there was a lot of expressive power
> to be had, even if it meant I had to discard familiar imperative
> techniques and learn new functional ones to get it.
>
> Yes, our first hypothesis to explain your difficulty must be lack of
> familiarity. However we can't say with confidence until we've
> done some analysis. Maybe the relational implementation *is*
> more complex. Let's see.
>
> So: substring.
>
> Let's compare two substring operators from java.lang.String to
> a relational specification of same.
>
> indexOf(int ch, int fromIndex)
>
> The source to java.lang.String can be downloaded as part of the JDK;
> I will refer to it here. (But not excerpt it, much as I would like to
> for
> comparison, because it is copyrighted.)
>
> Here is the relational implementation in pseudo-SQL:
>
> select min(index) from S where c = ch and index >= fromIndex
>
> The body of the Java implementation is 37 lines long, and
> contains 8 if statements and 2 for loops, and can return
> from any one of five places.

I shall look at this Java code when I get the time. It seems overly complicated.

In my C++ code I use a StringRange class that represents an aliased range of characters using a pair of pointers, allowing me to write

    while(s && *s != ch) ++s;

to scan through characters in range s not equal to ch.

> But hey, that's maybe "silly" because it only uses one character,
> and *obviously* the relational implementation will win there. So
> let's do the next one:
>
> indexOf(String str, int fromIndex)
>
> Okay, so this finds the first occurrence of str inside "this" that is
> greater
> than or equal to fromIndex. To do this I need an aggregate conjunction
> function; that takes the boolean-and of every boolean in the aggregate
> input. I don't know if this is standard in SQL. In any event, this is
> simply
> the min() function applied to the boolean domain, under the ordering
> false < true. Think of it as "every".
>
> Relationally:
>
> select min(S1.index) from S S1 where
> (select min(S2.c = str.c) from S S2, str
> where S2.index+S1.index = str.index) and
> S1.index >= fromIndex
>
> "Select the smallest S.index where
> every character in S' offset by S.index equals
> every character in str,
> that is greater than or equal to fromIndex"
>
> The body of the java implementation calls an internal helper function
> that is 33 lines, containing 6 ifs, 2 for loops, 1 while loop, and
> that
> can return from any one of 4 different places.

Thinking off the top of my head, in C++ I would write something based along the lines of

    while(s && str.size() <= s.size() && s.prefix(str.size()) != str) + +s;

Of course it is well known that better algorithms exist.

> I could try to write the Haskell version of indexOf, but my Haskell
> sucks.
> I imagine it would require a helper function, and be defined
> inductively.
> I don't believe it would be any more succinct, but I suppose an
> argument
> could be made that it was more comprehensible. But I also believe that
> there will be as many places where the set-theoretic approach will be
> more comprehensible than the inductive approach as the reverse.
>
> > If you want to deal with strings declaratively (not procedurally) then
> > use a functional programming or logic programming language. RM seems
> > a poor fit.
>
> Does it still seem so after reading the above?

Yes!

I found your post very informative and interesting and shows me that RA is more powerful than I imagined - when combined with aggregation. I was expecting that substring testing would require transitive closure. I admit the RM approach is a little foreign to me. However, I have a remaining objection, and perhaps this could be overthrown as well.

The objection is this: I find it difficult to imagine a system being able to turn those select statements into fast efficient implementations on strings represented as arrays of characters. Presumably there is an integrity constraint that shows that a string has characters at all index positions from the start to the end of the string. Somehow the system sees this integrity constraint, realises that strings can be implemented as arrays and then is able to translate RA expressions into efficient algorithms. This all seems rather speculative to me.

Can you answer this question for me? In the following

    select min(index) from S where c = ch and index >= fromIndex

How does the system realise that it can avoid finding all characters satisfying the where clause, and then finding the minimum index? ie instead it should merely scan the array for the left most character matching ch starting at fromIndex. Received on Tue Jan 30 2007 - 15:32:00 CET

Original text of this message