Re: What is Aggregation? Re: grouping in tuple relational calculus

From: Dawn M. Wolthuis <dwolt_at_tincat-group.comREMOVE>
Date: Sat, 19 Feb 2005 22:11:35 -0600
Message-ID: <cv92lq$5iv$1_at_news.netins.net>


"Mikito Harakiri" <mikharakiri_at_iahu.com> wrote in message news:dzuRd.36$o%4.38_at_news.oracle.com...
> "Jan Hidders" <jan.hidders_at_REMOVETHIS.pandora.be> wrote in message
> news:L2uRd.14949$TV7.1119364_at_phobos.telenet-ops.be...
>> Hmmm, yes, but list aggregation is a bit cumbersome in the context of
>> the relational model. There it is easy to identify a bag of domain
>> values, just give me a relation and point to one of its columns. For
>> lists you would probably also need to indicate how you would like to
>> order the tuples in the relation, and that may not be obvious.
>>
>> On the other hand, if you have lists, why require associativity and not
>> simply consider all functions over lists as aggregation functions?
>>
>> It's all a matter of definition, of course, and hence ultimately
>> meaningless.
>
> All of them -- sets, bags, and lists -- have their place in RM. The
> importance is shifted heavily toward the left side: bags are only
> important
> in the context of aggregation, and lists are even less important as the
> "order by" SQL clause is virtually the only usage for them.
>
> Once again, if we loose accociativity, then we have to expand our
> semantics
> spectrum to admit trees
>
> sets, bags, lists, trees
>
> This progression is natural:
> sets -- idempotent, commutative, associative
> bags -- nonidempotent, commutative, associative
> lists -- nonidempotent, noncommutative, associative
> trees -- nonidempotent, noncommutative, nonassociative
>
> The further down to the right you are, the less declarative RA becomes.
> Every algebraic property that is lost contributes to making a query more
> procedural.

I will attempt to explain how I think we can introduce these more complex data structures into the mix without making queries more procedural, but violating other (non-essential, in my opinion) aspects of relational theory.

Let's say we admit sets, bags, lists, and trees, but our declarative query language works only against single logical functions (relations with primary keys). Every such function is just one single (logical) relation, where every "column name" a user can use in the query is defined (somewhat like a View). The definition of each function includes a base relation name and each defined column includes "code" for the function that maps this column name to some derivation from columns in this or other base relations.

What does the code for a column name look like? It looks like the declaration of a function.

Feed in values (all represented as strings) from columns in the base relation plus names and columns from other relations, along with operations (on these string values) and your derived column value is available for the declarative query language.

Simple example:

Base Relation: Person
Header - Column Names: (PersonID, FirstName, Gender) all Strings Tuple values: (111, Curly, M), (123, Larry, M), (234, Moriah, F)

Base Relation: PersonAddresses
Header: (PersonID, AddressType, AddressID) Tuple values: (111, Home, 88723), (123, Work, 76251)

Base Relation: Address
Header: (AddressID,AddressLines,City,StateProvince,PostCode) Tuple values: (88723, [1]c/o George Herbert Walker[2]1177 3rd Ave,Anytown,TX,88888)

[Note: I don't know how lists are typically reflected in a tuple value, so just pretend this notation works for you and feel free to let me know how I should have written it -- the 1 and 2 in brackets above are implied by the ordering of the list in the value]

New logical relation/function for query: People Header: (PersonID, FirstName, Gender,
AddressType,AddressLines,City,StateProvince,PostCode)

We could have multiple unordered AddressID's and each AddressID could have multiple ordered AddressLines

Each column not in the base relation Person could be defined as derived by specifying a function
City, for example, can be defined as
City = Address(PersonAddress(Person.PersonID,"AddressID"),"City")

That is, feed in a PersonID from the base relation, grab the value of the AddressID from the PersonAddress relation for the person with this PersonID, then pull the city from the Address with this AddressID.

You are right -- that is not a "relational database" approach, but it does work and there is no procedural code here, if I am understanding it correctly

Now a query to the relation People might be select PersonID, FirstName, AddressLines, City, StateProvince, PostCode from People;

Whaddaya think? I'm not making this up -- this is how the model I work with functions -- using functions. Each relation is a function; each "vocabulary entry" (aka column, whether derived or not) for a "logical file dictionary" (aka view) is defined by a function; and because values can be of cardinality greater than one and can be ordered, this query can also be seen as a binary function -- mapping the query itself (represented by a string like all other values) to the resulting bag (represented as a string too).

Did I successfully demonstrate how to introduce more complex data structures into the mix without introducing procedural code or fail to do that? --dawn Received on Sun Feb 20 2005 - 05:11:35 CET

Original text of this message