Re: A Logical Model for Lists as Relations
Date: 9 May 2006 19:08:47 -0700
Message-ID: <1147226927.622359.101430_at_u72g2000cwu.googlegroups.com>
Marshall Spight wrote:
> I am interested in the question of how best to handle lists in a
> relation-oriented world. I have considered various approaches,
> usually oriented around adding a list collection type.
>
> But a list can be described as a relation. Most simply, an infinite
> list is a relation from the natural numbers to the target set,
> and a finite list is a relation from some finite contiguous subset
> [0..n] of the naturals to the target set. Generalizing, we could
> describe an n-ary list as a relation with an index attribute and
> zero or more other attributes.
You could implement the relation as an associative array but then you have the fact that the index values are ordered, as you mention below, so that you also need functions of insert, delete, and append.
> Generalizing still further, we could accomodate multidimensional
> lists as relations with multiple index attributes.
Yes.
> There are various selection operations we might want to do
> on lists. The full power of something like SQL's SELECT
> on a list, where the index was one of the attributes, would
> be useful. This would necessarily produce a relation result.
If the index were part of the selection, yes, otherwise it could be a bag as with any SQL result "set."
> There might be cases where one wanted a list result.
Yes.
> For example, the substring operation. If one had a string
> "abcdefg" (actually { (0,a), (1,b) ... etc }) and wanted
> everything where the index was in [1..3], one would
> want { (0,b), (1,c), (2,d) } rather than { (1,b), (2,c), (3,d) }.
> Perhaps this could be accomplished with a library routine.
> (Otherwise you get into the territory of trying to make list
> a subtype of relation, and worse, wanting contravariant
> return types, ugh.)
>
> There are also two-argument operations on generic lists.
> One, commonly called "zip", is the pairwise putting together
> of list elements. Thus, zip([1,2], [5,6]) is [(1,5), (2,6)].
Yes, "we" do that -- current terminology (in a world far away) might be "associating unassociated lists." There are times when a schema does not specify the association, but an application needs to see multiple lists as "zipped" (I like the term, but already have lingo for it, so I'll stick with "associated multivalues"). [That would be a good case for a "virtual field" that is such a matrix, rather than zipping within an app.]
> If we consider lists as relations, this is simply a join
> on the index attribute.
Yes, when reporting against nested list structures using a database-independent-ish flavor of SQL, such as ODBC/SQL-92, that is precisely what I have done when needing to zip two lists.
> Another binary operation on lists is taking the product.
> In essence, the dimensions of the second list are considered
> dimensions above those of the first list. product([1,2], [5,6])
> is [[(1,5), (2,5)], [(1,6), (2,6)]].
You aren't going to like what I've heard this called (and called it myself in a former life, sorry) -- "normalizing" (even though it is clearly NOT). More commonly called "flattening" (another term to raise someone's ire). If you need to output an SQL(-92) result set and you are starting with multiple multivalues, you need to take the cross product.
{John Doe, {jdoe_at_aol.com, john.doe_at_gmail.com}, {toyota camry, mazda tribute, ford mustang}}
is a single "row" in your relation that includes list-valued attributes
and it becomes 6 rows when "flattened" by way of a cartesian
cross-product. IIRC (and I likely don't), the GROUP and UNGROUP
operators in tutD do this. In another world (IBM U2), it is NEST and
UNNEST.
> This is simply a rename of
> the second lists 0th index to be the 1st index, and a join.
>
> Both zip and product can be generalized to multidimesional
> lists in the obvious way.
Yes.
> What about imperative operations? This is a bit more complicated.
> We have to consider whether we want to enforce the usual
> list semantics, where for example if we delete an element, the
> higher indicies are accordingly reduced.
You likely want both operations - a delete and a ripple delete. A deleted list value without the ripple delete would make that value a "2-valued-logic null". You wouldn't want to have only the ripple delete in case there is an application need to remove a value in one transaction before "replacing" it in another. No need to renumber, only to do it again with an insert.
> Thus, insert/delete
> also renumber. Again this might be best handled with a library
> routine. (Alternative: triggers?)
I'm not a trigger-happy person, so I'd go for the library.
> The more I type, the more this seems like a simple matter
> of some library routines for applying the usual list semantics.
>
> Anyway, comments, suggestions, criticisms welcome. Although
> note I'm not much interested in the question of whether ordered
> data is useful or not.
I hope I refrained from that, but I'm not taking the time to re-read in order to find out.
By the way, what are your goals/requirements for reinventing this wheel -- what are MS, Intersystems, IBM... not doing that you want to do, or are you just trying to roll your own? Just curious. One guess is that you want declarative constraint handling, perhaps? I don't know if anyone has that in combination with list-valued attributes.
Cheers! --dawn Received on Wed May 10 2006 - 04:08:47 CEST