Re: In an RDBMS, what does "Data" mean?
Date: Sun, 27 Jun 2004 22:38:37 +0200
Message-ID: <40df3048$0$34762$e4fe514c_at_news.xs4all.nl>
Marshall Spight wrote:
> mAsterdam wrote:
>>Marshall Spight wrote: >>>Anthony W. Youngman wrote: >>> >>>>Note that it's easy to go from a list to either of the other two. But in >>>>order to go back, the set or bag needs to contain extra data (ie the >>>>order) over the list. >>> >>>I don't see how you could consider that data "extra" if it was >>>there originally. >>> >>>Anyway, the list [A, B, C] expressed as a set is { (1, A), (2, B), (3, C) } >>>where [] denotes an ordered collection and {} denotes an unordered >>>collection. >>> >>[...] >>First, the way you chose is not without problems:
>
> For sure!
>>Let's put a new element 'N' into >>the list, right after the 'B': >> >>The new list is [A, B, N, C]. >>Now what is the set? >> >>is it UC1: { (1, A), (2, B), (3, C), (2.5, N) } or >>is it UC2: { (1, A), (2, B), (4, C), (3, N) } ?
>
> It's UC2.
Ok. No argument, just wondering: any particular reason to discard UC1?
>>In the case of UC2 every insert causes updates to >>all elements that should be after the new element.
>
> Let us also consider the two other common implementations
> of lists: linked lists and arrays.
Yes, there are more alternatives, all with pros and cons.
> In UC2, an insert requires O(N) time: every index above
> the one inserted requires updating. In a linked list, an insert
> requires O(1) time, but locating an item in order to insert in
> front of it is O(N). In an array, an insert requires O(N) time,
> because you have to move all the higher elements up.
>
> The issue here is not performance, ...
Indeed.
> it's interface.
Is it really? Just interface? Maybe I just don't
understand what you mean by that. I suspect it goes
beyond interface. To be more specific: I think it is related
to the information principle:
> The entire information content of a relational database
As soon as some order is said to have information content,
this principle requires that to have it in a relational database,
> is represented in one and only one way: namely, as
> attribute values within tuples within relations.
BTW: Attribute is still on the glossary's todo list (hint :-)
> Using SQL
> to manipulate ordered data *where the ordering is positional*
> and *not by any logical component of the data* is just way
> too hard.
>
> "Lists are very common, and SQL doesn't handle them well at all.
> RM doesn't handle them well either."
>
> Let me expand on what I mean by "handle," and to do so,
> I'll use the "structure, integrity, manipulation" definition of
> a DBMS.
>
> If you have an unchanging list, using (1,A), (2,B), (3,C) etc as a list
> *structure* works just fine. All the query operators you'd expect
> to have, you have: get me item 2, how many items are there, etc.
> You'd also get some not-so-common queries: at what indicies
> does the letter "Q" occur?
>
> But let's consider manipulation. Simple insert, using your (3,N) tuple.
>
> Java:
> list.insert(3, N);
>
> SQL:
> begin
> update List set position = position + 1 where position >= 3
> insert into List (position, value) values (3, N)
> commit
> [note if you do them in the wrong order you're screwed.]
No problem. Assume a preprocessor and make a macro^H^H^H^H^H shorthand.
> Wait, I just remembered: Date says integrity enforcement
> should be at the statement level, not the transaction
> level. So the first update will fail, because it leaves a
> hole in the sequence. So I've gotta figure out how to
> update all those pos values at once, which I think I
> can do with mod and some offsets. I expect it would
> take me a few hours to figure out.
>
> Now: integrity. I have to specify some checks. Uh:
> unique(pos)
> check( pos >= 0 )
> check( select max(pos) from List = select count(pos) from List )
>
> Did that get it? I think it did, but I'm not sure. Also, if I
> saw that in a table declaration, would I say "list" or would
> I say "bunch of integrity constraints."
:-)
>>The problem is we have to make a choice with consequences to get from >>the list to a unordered collection carrying the same meaning, so we are >>capable of reconstructing the list.
>
> I don't disagree, but I might say it differently: what matters is what we
> options we have to enforce integrity, and what operators we have
> to perform manipulation. I think RM is a step backwards in ease-of-use
> for each of these where lists are concerned. (Which is not to say
> that I think we should make our decisions on that basis alone, but I
> do think it's significant.)
>>Thing is we *have* to make a complex choice, >>because (and only because) we decided that the order was >>meaningful.
>
> I don't think the choice is intrinsically complex. I think
> the issue is just that SQL (and TTM for that matter) don't
> give you even the most basic list manipulation or integrity
> checks.
The more options we have, the more serious the problem is. Is list [A, B, N, C] ambiguous? Is 'insert Y into L1 after B' amibiguous?
Say we have a pizza-attribute 'topping' of type 'list-of-toppings' Is 'constraint FK topping refers to toppings' ambiguous?
> I was more thinking of something like the relational algebra,
> with its minimal set of operators, additional operators defined
> in terms of the minimal ones, and a definition of what it means
> for a set of operators to be complete. Only for lists.
>
> Frighteningly, if I can't find something like that, I may have
> to do it myself.
"Spight list algebra". Sounds good. :-) Received on Sun Jun 27 2004 - 22:38:37 CEST