Re: In an RDBMS, what does "Data" mean?

From: Dawn M. Wolthuis <dwolt_at_tincat-group.com>
Date: Tue, 29 Jun 2004 19:07:04 -0500
Message-ID: <cbt07g$6b9$1_at_news.netins.net>


"Marshall Spight" <mspight_at_dnai.com> wrote in message news:j6CDc.101024$2i5.34223_at_attbi_s52...
> "mAsterdam" <mAsterdam_at_vrijdag.org> wrote in message
news:40de988a$0$48920$e4fe514c_at_news.xs4all.nl...
> > Marshall Spight wrote:
> > > Anthony W. Youngman wrote:
> > >>Note that it's easy to go from a list to either of the other two. But
in
> > >>order to go back, the set or bag needs to contain extra data (ie the
> > >>order) over the list.
> > >
> > > I don't see how you could consider that data "extra" if it was
> > > there originally.
> > >
> > > Anyway, the list [A, B, C] expressed as a set is { (1, A), (2, B), (3,
C) }
> > > where [] denotes an ordered collection and {} denotes an unordered
> > > collection.
> > >
> > [...]
> > First, the way you chose is not without problems:
>
> For sure!
>
>
> > Let's put a new element 'N' into
> > the list, right after the 'B':
> >
> > The new list is [A, B, N, C].
> > Now what is the set?
> >
> > is it UC1: { (1, A), (2, B), (3, C), (2.5, N) } or
> > is it UC2: { (1, A), (2, B), (4, C), (3, N) } ?
>
> It's UC2.

Yes, there is some such list algebra in the DataBASIC language associate with PICK and this is how such an insert would be handled logically.

>
>
> > In the case of UC2 every insert causes updates to
> > all elements that should be after the new element.

Only if the number is stored rather than implied by the ordering of the data in linked list, for example

> Let us also consider the two other common implementations
> of lists: linked lists and arrays.
>
> In UC2, an insert requires O(N) time: every index above
> the one inserted requires updating. In a linked list, an insert
> requires O(1) time, but locating an item in order to insert in
> front of it is O(N). In an array, an insert requires O(N) time,
> because you have to move all the higher elements up.
>
> The issue here is not performance, it's interface. Using SQL
> to manipulate ordered data *where the ordering is positional*
> and *not by any logical component of the data* is just way
> too hard.

yes, exactly!

> "Lists are very common, and SQL doesn't handle them well at all.
> RM doesn't handle them well either."
>
> Let me expand on what I mean by "handle," and to do so,
> I'll use the "structure, integrity, manipulation" definition of
> a DBMS.
>
> If you have an unchanging list, using (1,A), (2,B), (3,C) etc as a list
> *structure* works just fine. All the query operators you'd expect
> to have, you have: get me item 2, how many items are there, etc.
> You'd also get some not-so-common queries: at what indicies
> does the letter "Q" occur?
>
>
> But let's consider manipulation. Simple insert, using your (3,N) tuple.
>
> Java:
> list.insert(3, N);
>
> SQL:
> begin
> update List set position = position + 1 where position >= 3
> insert into List (position, value) values (3, N)
> commit
> [note if you do them in the wrong order you're screwed.]
>
> Wait, I just remembered: Date says integrity enforcement
> should be at the statement level, not the transaction
> level. So the first update will fail, because it leaves a
> hole in the sequence. So I've gotta figure out how to
> update all those pos values at once, which I think I
> can do with mod and some offsets. I expect it would
> take me a few hours to figure out.
>
> Now: integrity. I have to specify some checks. Uh:
> unique(pos)
> check( pos >= 0 )
> check( select max(pos) from List = select count(pos) from List )
>
> Did that get it? I think it did, but I'm not sure. Also, if I
> saw that in a table declaration, would I say "list" or would
> I say "bunch of integrity constraints."

very good point

> > The problem is we have to make a choice with consequences to get from
> > the list to a unordered collection carrying the same meaning, so we are
> > capable of reconstructing the list.
>
> I don't disagree, but I might say it differently: what matters is what we
> options we have to enforce integrity, and what operators we have
> to perform manipulation. I think RM is a step backwards in ease-of-use
> for each of these where lists are concerned. (Which is not to say
> that I think we should make our decisions on that basis alone, but I
> do think it's significant.)
>
>
> > Thing is we *have* to make a complex choice,
> > because (and only because) we decided that the order was
> > meaningful.
>
> I don't think the choice is intrinsically complex. I think
> the issue is just that SQL (and TTM for that matter) don't
> give you even the most basic list manipulation or integrity
> checks.
>

RM, or at least SQL, focuses on a) scalar values and b) relations. Adding c) lists as a type with its own functions (operators) native to the database and SQL language is a start. Viewing relations and lists both as functions allows us to think about what functions can be applied to which others, in particular, which functions can be applied to functions where the, uh, domain of one of the parameters is the natural numbers (to mix vocabulary). These would be the ordered lists.

> > There are even more worms in this can, I think, but
> > I would appreciate your comments on this one.
> >
> > <snip>
> > > Lists are very common, and SQL doesn't handle them well at all.
> > > RM doesn't handle them well either.
> > >
> > > (I can also believe that MV handles them quite well, although I
> > > have no direct experience to that end.)

sometimes yes, sometimes no.

> > > Does anyone know of a "list calculus" or "list algebra" with a
> > > formal definition? It is just too simple for anyone to have
> > > cared about?
> >
> > Lisp and prolog come to mind - but that is not what you are
> > looking for - at least it is not what I am looking for.
>
> No, if I want a PL with list operators, I can find them anywhere.
> I was more thinking of something like the relational algebra,
> with its minimal set of operators, additional operators defined
> in terms of the minimal ones, and a definition of what it means
> for a set of operators to be complete. Only for lists.
>
> Frighteningly, if I can't find something like that, I may have
> to do it myself.

I look forward to it! --dawn Received on Wed Jun 30 2004 - 02:07:04 CEST

Original text of this message