# Relation subset operators

From: Cimode <cimode_at_hotmail.com>
Date: Tue, 2 Jun 2009 13:24:09 -0700 (PDT)

suppose CAR_SALE relation represented as

CAR_SALE

```id		car			salesman		price		color		date
1		Buick		Henderson	        10000       Red		01/01/1990
2		Buick		Wilkinson		10000	Red		02/01/1990
3		Chevrolet		Hutchinson	10000	Red		12/01/1990
4		Buick		Wilkinson		10000	Blue		13/01/1990
5		Chevrolet		Henderson 	10000	Red		14/01/1990
6		Buick		Henderson	        10000	Blue		16/01/1990
7		Buick		Henderson	        10000	Blue		18/01/1990
8		Chevrolet		Parson		10000	Yellow	18/01/1990

```

Now consider the following questions

returning...

```salesman			total_sale
Wilkinson			20000
Henderson		30000

```

> Question 2: What are the total sales of salesman who have sold AT LEAST (one) blue car AND (one) red car.

returning...

```salesman			total_sale
Wilkinson			20000
Henderson		30000

```

> Question 3: What are the total sales of salesman who have sold AT LEAST (one) blue car OR (one) red car.

returning...

```salesman			total_sale
Hutchinson		10000
Wilkinson			20000
Henderson		30000

```

> Question 4: What are the total sales of salesman who have NOT sold ANY blue cars

returning...

```salesman			total_sale
Hutchinson		10000
Parson			10000

```

> Question 5: What are the total sales of salesman who have NOT sold NEITHER a blue car NOR a red car

returning...

```salesman			total_sale
Parson			10000

```

Even though the above examples are easy to express algebrically (at least compared to their SQL expression), they must to be decomposed into a minimum of two to three elementary operations to be solved when using elementary operators (JOIN, UNION, GROUP BY). I thought about creating an specific operator for groupping subsets to simplify the expression of such problems. Such operator would allow to operate an relation involved in a division operation with one attribute drawn from the header subset. I thought about symbolizing the operator as */ * followed by the operator to be implemented within the groupping subset such as */=* */>*, */<*, */<>*, */*. Using such operator allows the expression of the questions above in a simpler way.

SYMBOLOGY
/ : relational division
+: relational union
/=: equality within the group

DEFINITION
For a relation R and un-ary relations p and q part of R header, and V values drawn from any attribute of R, */=* would be defined such as

R/p WITH q /= V <=> R/p (INNER) JOIN (p WHERE q = V) ON p

Using the above definition, the expression of Question 1 comes to

CARSALE/salesman WITH color /= 'Blue' is a interesting shortcut to (CARSALE/salesman) JOIN (salesman WHERE color = 'Blue') ON salesman

to get Question 2, instead of writing (CARSALE/salesman) JOIN ((salesman WHERE color = 'Blue') JOIN (salesman WHERE color = 'Red') on salesman ) ON salesman, we could simply write...

CARSALE/salesman WITH color /= 'Blue' AND color /= 'Red'

to get Question 3, instead of writing (CARSALE/salesman) JOIN ((salesman WHERE color = 'Blue') UNION ALL (salesman WHERE color = 'Red') on salesman ) ON salesman, we could write...

CARSALE/salesman WITH color /= 'Blue' OR color /= 'Red'

to get Question 4 is even simpler...instead of writing (CARSALE/ salesman) JOIN (salesman MINUS (salesman WHERE color = 'Blue')) ON salesman
we keep the coherence of expressing

CARSALE/salesman WITH color /<> 'Blue' (/<> would be the opposite of / =)

Finally to get Question 5 expressed as (CARSALE/salesman) JOIN (salesman MINUS (salesman WHERE color = 'Blue') MINUS (salesman WHERE color = 'Red')) ON salesman

we would simply write

CARSALE/salesman WITH color /<> 'Blue' AND color /<> 'Red' .

The need for such simplification is even more obvious when the queries become more complex

We could imagine a Question 6 which would be: What are the total sales of salesmen who either have *not* sold a single Blue OR have *not* sold a single Red car...

Using traditional operators, Question 6 would be expressed as (CARSALE/salesman) JOIN ((salesman MINUS (salesman WHERE color = 'Blue')) UNION ALL (salesman MINUS (salesman WHERE color = 'Blue'))) ON salesman

where they could be written...

CARSALE/salesman WITH color /<> 'Blue' OR color /<> 'Red' .

It seems to me such simplification would be beneficial to express more systematically algebric formulation of relational divisions.

Note: I do not mention SQL into this because there would be too much verbose to write. Using the db core and subsequent language I am working ontoI can write this...

--DEFINITION
R1 = (1, 'A', 0, 10) + (1, 'A', 1, 10) + (1, 'B', 1, 10) + (1, 'B', 1, 10) WITH LEFT TO RIGHT ALIGN (id, groupper, differentiator, value)

```Q1 = R1/salesman WITH (color /= 'Blue')
Q2 = R1/salesman WITH (color /= 'Blue'  AND color /= 'Red')
Q3 = R1/salesman WITH (color /= 'Blue'  OR color /= 'Red')
Q4 = R1/salesman WITH color /<> 'Blue'
Q5 = R1/salesman WITH color /<> 'Blue'  AND color /<> 'Red'
Q5 = R1/salesman WITH color /<> 'Blue'  OR color /<> 'Red'

```

--PRESENTATION in tabular format

```PRESENT2D(salesman, SUM(price)).Q1
PRESENT2D(salesman, SUM(price)).Q2
PRESENT2D(salesman, SUM(price)).Q3
PRESENT2D(salesman, SUM(price)).Q4
PRESENT2D(salesman, SUM(price)).Q5

```

produces previously mentionned results..

Comments would be welcome to evaluate the usefulness of such operator....

Thank you. Received on Tue Jun 02 2009 - 22:24:09 CEST

Original text of this message