Re: In an RDBMS, what does "Data" mean?

From: mAsterdam <mAsterdam_at_vrijdag.org>
Date: Sun, 27 Jun 2004 22:38:37 +0200
Message-ID: <40df3048$0$34762$e4fe514c_at_news.xs4all.nl>

Marshall Spight wrote:
> mAsterdam wrote:

>>Marshall Spight wrote:
>>>Anthony W. Youngman wrote:
>>>
>>>>Note that it's easy to go from a list to either of the other two. But in
>>>>order to go back, the set or bag needs to contain extra data (ie the
>>>>order) over the list.
>>>
>>>I don't see how you could consider that data "extra" if it was
>>>there originally.
>>>
>>>Anyway, the list [A, B, C] expressed as a set is { (1, A), (2, B), (3, C) }
>>>where [] denotes an ordered collection and {} denotes an unordered
>>>collection.
>>>
>>[...]
>>First, the way you chose is not without problems:

>
> For sure!

>>Let's put a new element 'N' into
>>the list, right after the 'B':
>>
>>The new list is [A, B, N, C].
>>Now what is the set?
>>
>>is it UC1: { (1, A), (2, B), (3, C), (2.5, N) } or
>>is it UC2: { (1, A), (2, B), (4, C), (3, N) } ?

>
> It's UC2.

Ok. No argument, just wondering: any particular reason to discard UC1?

>>In the case of UC2 every insert causes updates to
>>all elements that should be after the new element.

>
> Let us also consider the two other common implementations
> of lists: linked lists and arrays.

Yes, there are more alternatives, all with pros and cons.

> In UC2, an insert requires O(N) time: every index above
> the one inserted requires updating. In a linked list, an insert
> requires O(1) time, but locating an item in order to insert in
> front of it is O(N). In an array, an insert requires O(N) time,
> because you have to move all the higher elements up.
>
> The issue here is not performance, ...

Indeed.

> it's interface.

Is it really? Just interface? Maybe I just don't understand what you mean by that. I suspect it goes beyond interface. To be more specific: I think it is related to the information principle:

> The entire information content of a relational database
> is represented in one and only one way: namely, as
> attribute values within tuples within relations.

As soon as some order is said to have information content, this principle requires that to have it in a relational database, this content must be represented as attribute values.

So we have to answer the question: which attribute(s') values? The answer seems obvious: list typed attributes. But that's not the way it is done - so what stops that? I know: the lack of the "Spight list algebra"!

BTW: Attribute is still on the glossary's todo list (hint :-)

> Using SQL
> to manipulate ordered data *where the ordering is positional*
> and *not by any logical component of the data* is just way
> too hard.
>
> "Lists are very common, and SQL doesn't handle them well at all.
> RM doesn't handle them well either."
>
> Let me expand on what I mean by "handle," and to do so,
> I'll use the "structure, integrity, manipulation" definition of
> a DBMS.
>
> If you have an unchanging list, using (1,A), (2,B), (3,C) etc as a list
> *structure* works just fine. All the query operators you'd expect
> to have, you have: get me item 2, how many items are there, etc.
> You'd also get some not-so-common queries: at what indicies
> does the letter "Q" occur?
>
> But let's consider manipulation. Simple insert, using your (3,N) tuple.
>
> Java:
> list.insert(3, N);
>
> SQL:
> begin
> update List set position = position + 1 where position >= 3
> insert into List (position, value) values (3, N)
> commit
> [note if you do them in the wrong order you're screwed.]

No problem. Assume a preprocessor and make a macro^H^H^H^H^H shorthand.

> Wait, I just remembered: Date says integrity enforcement
> should be at the statement level, not the transaction
> level. So the first update will fail, because it leaves a
> hole in the sequence. So I've gotta figure out how to
> update all those pos values at once, which I think I
> can do with mod and some offsets. I expect it would
> take me a few hours to figure out.
>
> Now: integrity. I have to specify some checks. Uh:
> unique(pos)
> check( pos >= 0 )
> check( select max(pos) from List = select count(pos) from List )
>
> Did that get it? I think it did, but I'm not sure. Also, if I
> saw that in a table declaration, would I say "list" or would
> I say "bunch of integrity constraints."

:-)

>>The problem is we have to make a choice with consequences to get from
>>the list to a unordered collection carrying the same meaning, so we are
>>capable of reconstructing the list.

>
> I don't disagree, but I might say it differently: what matters is what we
> options we have to enforce integrity, and what operators we have
> to perform manipulation. I think RM is a step backwards in ease-of-use
> for each of these where lists are concerned. (Which is not to say
> that I think we should make our decisions on that basis alone, but I
> do think it's significant.)

>>Thing is we *have* to make a complex choice,
>>because (and only because) we decided that the order was
>>meaningful.

>
> I don't think the choice is intrinsically complex. I think
> the issue is just that SQL (and TTM for that matter) don't
> give you even the most basic list manipulation or integrity
> checks.

The more options we have, the more serious the problem is. Is list [A, B, N, C] ambiguous? Is 'insert Y into L1 after B' amibiguous?

Say we have a pizza-attribute 'topping' of type 'list-of-toppings' Is 'constraint FK topping refers to toppings' ambiguous?

If it is intrinsically simple as you say - what is your explanation why SQL (and TTM) do simply not address meaningful order?

Maybe we are just overlooking something obvious.

> I was more thinking of something like the relational algebra,
> with its minimal set of operators, additional operators defined
> in terms of the minimal ones, and a definition of what it means
> for a set of operators to be complete. Only for lists.
>
> Frighteningly, if I can't find something like that, I may have
> to do it myself.

"Spight list algebra". Sounds good. :-) Received on Sun Jun 27 2004 - 22:38:37 CEST

This message: [ Message body ]
Next message: Bill H: "Re: One Ring to Bind Them"
Previous message: --CELKO--: "Re: expressing one-or-more constraint structurally"
In reply to Marshall Spight: "Re: In an RDBMS, what does "Data" mean?"
Next in thread: --CELKO--: "Re: expressing one-or-more constraint structurally"
Reply: Marshall Spight: "Re: In an RDBMS, what does "Data" mean?"
Reply: Dawn M. Wolthuis: "Re: In an RDBMS, what does "Data" mean?"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message