| Oracle FAQ | Your Portal to the Oracle Knowledge Grid | |
Home -> Community -> Usenet -> comp.databases.theory -> Re: Objects and Relations
Marshall wrote:
> On Jan 30, 6:32 am, "David BL" <davi..._at_iinet.net.au> wrote:
>
>>On Jan 30, 6:33 pm, "Marshall" <marshall.spi..._at_gmail.com> wrote: >>
>>>>Consider the following relation >> >>>> charsInString(StringId, Index, Char) :- StringId has Char at >>>>Index >> >>>This isn't right as is, though; you're putting every string in >>>a single table. Coincidental cohesion; grouping things based >>>on a superficial type commonality. Are you going to put all >>>your ints in a single table also? I would expect not. >> >>I'm glad to see that you don't like the idea of putting all characters >>of all strings in a single table. >> >>I believe a relational database should treat strings as an ADT to be >>used for domains for attributes.
>>>Instead, let's imagine we're in a nested relational system,
>>>and strings are a relation type {index:int, c:char} I will use
>>>the name S to denote the string at hand; sort of the analog
>>>of "this" in an OO language.
>>
>>Well, I was assuming a flat relational system. If you are going to
>>allow nested relations then I agree that things are rather
>>different. In fact I believe this is a bit of a cop out. Please
>>correct me if I'm wrong.
NF^2 nested relations? Or relation-valued attributes?
>>At the outermost nesting level you have >>chosen to deal with strings opaquely - as a domain for attributes. In >>other words at that level of abstraction you have chosen not to deal >>with strings relationally.
And I would expect a String type to permit multiple possible representations. If one can represent a String as an array, one already represents them as a relation given the operations to convert back and forth. So why not have a relation-valued representation?
/amplification
>>You don't seem to appreciate the fact that the onus of proof often >>depends on the nature of the claim rather that who is making it. It >>would obviously be difficult to construct a formal proof that there >>doesn't exist elegant and useful string processing algorithms using >>RA, whereas it is comparatively easy to show examples if they in fact >>exist. All I said is that I can't think of any and perhaps there >>aren't any. I make no apologies for having rather less experience in >>RM than you do. If you look carefully at my post you will find that I >>never even claim that there are no such examples. I only raise the >>question.
Intentional? ;> LOL
>>I note that this is the second time you have expected proofs of non- >>existence of examples (cf our discussions about fine grained mutative >>operations in a previous thread). Do you agree that this is often >>impossible?
Indeed. The more we make subject to proof, the better, however.
>>>>Won't you almost always >>>>join/select for the simple purpose of looking up a character in a >>>>given string at a given index position? >> >>>You say that like it's a bad thing. Join is the primary tool >>>of the RA, so yes, we will join. And select. We'll use the >>>algebra, just like an OO programmer would use loops >>>and method calls. >> >>Of course. What I mean is that the access pattern on the relation is >>almost always the same, and no different to the functionality provided >>by the ADT. >> >>This contrasts with relations like Employee which join on different >>attributes at different times, revealing the power of RA for adhoc >>query.
Or cross-product which I hope everyone will agree is a meaningul and useful operation too.
>>>indexOf(int ch, int fromIndex) >> >>>The source to java.lang.String can be downloaded as part of the JDK; >>>I will refer to it here. (But not excerpt it, much as I would like to >>>for >>>comparison, because it is copyrighted.) >> >>>Here is the relational implementation in pseudo-SQL: >> >>> select min(index) from S where c = ch and index >= fromIndex >> >>>The body of the Java implementation is 37 lines long, and >>>contains 8 if statements and 2 for loops, and can return >>>from any one of five places. >> >>I shall look at this Java code when I get the time. It seems overly >>complicated. >> >>In my C++ code I use a StringRange class that represents an aliased >>range of characters using a pair of pointers, allowing me to write >> >> while(s && *s != ch) ++s; >> >>to scan through characters in range s not equal to ch.
I wonder how the above loop terminates if ch does not occur in s. Then again, if the increment operation can make s evaluate to false at the end of the string, I suppose it is a fair operation to assume just as join is a fair operation to assume.
>>> select min(S1.index) from S S1 where >>> (select min(S2.c = str.c) from S S2, str >>> where S2.index-S1.index = str.index) and >>> S1.index >= fromIndex
>>>"Select the smallest S.index where >>> every character in S' offset by S.index equals >>> every character in str, >>> that is greater than or equal to fromIndex" >> >>>The body of the java implementation calls an internal helper function >>>that is 33 lines, containing 6 ifs, 2 for loops, 1 while loop, and >>>that >>>can return from any one of 4 different places. >> >>Thinking off the top of my head, in C++ I would write something based >>along the lines of >> >>while(s && str.size() <= s.size() && s.prefix(str.size()) != str) ++s;
I tend to disagree with that. You do not account for the physical implementation of join or min.
My code didn't use any
> function abstraction; what you see is the whole deal. Otherwise
> we both just wrap our implementations in a function and call it a
> day. My implementation used only min(), subtraction, and comparison.
I don't think it unfair to assume standard library-supplied code in a comparison, but I agree the other extreme of wrapping everything in a function makes no sense either.
I note that David's algorithm is necessarily at least O(n*m) where n is the number of characters in s and m is the number of characters in str--squared if the .size function scans for null-termination.
Your solution is potentially O(n) depending more on the skill and investment of the compiler writers and less on your skill and investment.
(Unless someone can point to a C++ compiler that has implemented algorithmic replacement.)
>>>>If you want to deal with strings declaratively (not procedurally) then >>>>use a functional programming or logic programming language. RM seems >>>>a poor fit. >> >>>Does it still seem so after reading the above? >> >>Yes!
>>I found your post very informative and interesting and shows me that >>RA is more powerful than I imagined - when combined with aggregation. >>I was expecting that substring testing would require transitive >>closure.
>>I admit the RM approach is a little foreign to me. >>However, I have a remaining objection, and perhaps this could be >>overthrown as well. >> >>The objection is this: I find it difficult to imagine a system being >>able to turn those select statements into fast efficient >>implementations on strings represented as arrays of characters. >>Presumably there is an integrity constraint that shows that a string >>has characters at all index positions from the start to the end of the >>string. Somehow the system sees this integrity constraint, realises >>that strings can be implemented as arrays and then is able to >>translate RA expressions into efficient algorithms. This all seems >>rather speculative to me. >> >>Can you answer this question for me? In the following >> >> select min(index) from S where c = ch and index >= fromIndex >> >>How does the system realise that it can avoid finding all characters >>satisfying the where clause, and then finding the minimum index? ie >>instead it should merely scan the array for the left most character >>matching ch starting at fromIndex.
Indeed. Considering the physical representation might omit any representation of the index other than in the relative physical position of the characters, the compiler would absolutely have to understand the relationship between index and physical location.
> The thing that I used to think was harder was figuring out
> early termination of aggregates. That is, if you are aggregating
> multiplication across a set of integers, you can stop doing any
> actual multiplication if you hit a zero. I found the key to this
> in some recently published Fortress documents; you can
> stop any time you hit a fixed point of the folded function.
> (So the system has to know fixed points.)
>
> Actually the Fortress documents are quite interesting in
> how they handle both catamorphisms and how they handle
> implicit parallelization. I note these are two things that
> are *quite* difficult to optimize in C++ because of its
> Mr. Pouty Pants memory model, where you pretty
> much have to assume everything is aliased to everything
> else unless you have an affidavit from the Postmaster General.
> I also note that these kinds of optimizations are a cakewalk
> for a relational language, and that the relative importance
> of implicit parallelization vs. standard C-style optimizations
> is strongly increasing over time with multiple CPU cores.
> Finally, I note that I am really starting to overuse the
> phrase "I note."
>
> Upon rereading the last paragraph it appears my coherence
> level is dropping faster than a frat boy's pants at Mardi Gras,
> so I'd better sign off now. Sorry for any incoherence.
The Dragon Book mentions how algorithmic replacement can achieve far greater optimizations than the sum of all the optimizations it describes and notes that nobody has yet figured out how to do algorithmic replacement. Declarative languages expressing needs at a higher level change algorithmic replacement into a matter of algorithmic choice, which is far far easier to automate.
/amplification Received on Wed Jan 31 2007 - 09:24:23 CST
![]() |
![]() |