# Relation subset operators

Date: Tue, 2 Jun 2009 13:24:09 -0700 (PDT)

Message-ID: <a8c68252-4821-48d7-bf18-58f2d2135687_at_3g2000yqk.googlegroups.com>

While working on aggregation within groupping operations on the db core I design for relation manipulation, I questionned myself about the opportunity of using new operators to simplify relational division formulation and make it more systematic. For instance, conside the following questions:

suppose CAR_SALE relation represented as

id car salesman price color date 1 Buick Henderson 10000 Red 01/01/1990 2 Buick Wilkinson 10000 Red 02/01/1990 3 Chevrolet Hutchinson 10000 Red 12/01/1990 4 Buick Wilkinson 10000 Blue 13/01/1990 5 Chevrolet Henderson 10000 Red 14/01/1990 6 Buick Henderson 10000 Blue 16/01/1990 7 Buick Henderson 10000 Blue 18/01/1990 8 Chevrolet Parson 10000 Yellow 18/01/1990

Now consider the following questions

> Question 1: What are the total sales of salesmen who have sold AT LEAST (one) Blue car.

returning...

salesman total_sale Wilkinson 20000 Henderson 30000

> Question 2: What are the total sales of salesman who have sold AT LEAST (one) blue car AND (one) red car.

returning...

salesman total_sale Wilkinson 20000 Henderson 30000

> Question 3: What are the total sales of salesman who have sold AT LEAST (one) blue car OR (one) red car.

returning...

salesman total_sale Hutchinson 10000 Wilkinson 20000 Henderson 30000

> Question 4: What are the total sales of salesman who have NOT sold ANY blue cars

returning...

salesman total_sale Hutchinson 10000 Parson 10000

> Question 5: What are the total sales of salesman who have NOT sold NEITHER a blue car NOR a red car

returning...

salesman total_sale Parson 10000

Even though the above examples are easy to express algebrically (at least compared to their SQL expression), they must to be decomposed into a minimum of two to three elementary operations to be solved when using elementary operators (JOIN, UNION, GROUP BY). I thought about creating an specific operator for groupping subsets to simplify the expression of such problems. Such operator would allow to operate an relation involved in a division operation with one attribute drawn from the header subset. I thought about symbolizing the operator as */ * followed by the operator to be implemented within the groupping subset such as */=* */>*, */<*, */<>*, */*. Using such operator allows the expression of the questions above in a simpler way.

**SYMBOLOGY
**

/ : relational division

+: relational union

/=: equality within the group

**DEFINITION
**

For a relation R and un-ary relations p and q part of R header, and V
values drawn from any attribute of R, */=* would be defined such as

R/p WITH q /= V <=> R/p (INNER) JOIN (p WHERE q = V) ON p

Using the above definition, the expression of Question 1 comes to

CARSALE/salesman WITH color /= 'Blue' is a interesting shortcut to (CARSALE/salesman) JOIN (salesman WHERE color = 'Blue') ON salesman

to get Question 2, instead of writing (CARSALE/salesman) JOIN ((salesman WHERE color = 'Blue') JOIN (salesman WHERE color = 'Red') on salesman ) ON salesman, we could simply write...

CARSALE/salesman WITH color /= 'Blue' AND color /= 'Red'

to get Question 3, instead of writing (CARSALE/salesman) JOIN ((salesman WHERE color = 'Blue') UNION ALL (salesman WHERE color = 'Red') on salesman ) ON salesman, we could write...

CARSALE/salesman WITH color /= 'Blue' OR color /= 'Red'

to get Question 4 is even simpler...instead of writing (CARSALE/
salesman) JOIN (salesman MINUS (salesman WHERE color = 'Blue')) ON
salesman

we keep the coherence of expressing

CARSALE/salesman WITH color /<> 'Blue' (/<> would be the opposite of / =)

Finally to get Question 5 expressed as (CARSALE/salesman) JOIN (salesman MINUS (salesman WHERE color = 'Blue') MINUS (salesman WHERE color = 'Red')) ON salesman

we would simply write

CARSALE/salesman WITH color /<> 'Blue' AND color /<> 'Red' .

The need for such simplification is even more obvious when the queries become more complex

We could imagine a Question 6 which would be: What are the total sales of salesmen who either have *not* sold a single Blue OR have *not* sold a single Red car...

Using traditional operators, Question 6 would be expressed as (CARSALE/salesman) JOIN ((salesman MINUS (salesman WHERE color = 'Blue')) UNION ALL (salesman MINUS (salesman WHERE color = 'Blue'))) ON salesman

where they could be written...

CARSALE/salesman WITH color /<> 'Blue' OR color /<> 'Red' .

It seems to me such simplification would be beneficial to express more systematically algebric formulation of relational divisions.

Note: I do not mention SQL into this because there would be too much verbose to write. Using the db core and subsequent language I am working ontoI can write this...

** --DEFINITION
**

R1 = (1, 'A', 0, 10) + (1, 'A', 1, 10) + (1, 'B', 1, 10) + (1,
'B', 1, 10) WITH LEFT TO RIGHT ALIGN (id, groupper, differentiator,
value)

Q1 = R1/salesman WITH (color /= 'Blue') Q2 = R1/salesman WITH (color /= 'Blue' AND color /= 'Red') Q3 = R1/salesman WITH (color /= 'Blue' OR color /= 'Red') Q4 = R1/salesman WITH color /<> 'Blue' Q5 = R1/salesman WITH color /<> 'Blue' AND color /<> 'Red' Q5 = R1/salesman WITH color /<> 'Blue' OR color /<> 'Red'

--PRESENTATION in tabular format

PRESENT2D(salesman, SUM(price)).Q1 PRESENT2D(salesman, SUM(price)).Q2 PRESENT2D(salesman, SUM(price)).Q3 PRESENT2D(salesman, SUM(price)).Q4 PRESENT2D(salesman, SUM(price)).Q5

produces previously mentionned results..

Comments would be welcome to evaluate the usefulness of such operator....

Thank you. Received on Tue Jun 02 2009 - 22:24:09 CEST