Relation subset operators

From: Cimode <cimode_at_hotmail.com>
Date: Tue, 2 Jun 2009 13:24:09 -0700 (PDT)
Message-ID: <a8c68252-4821-48d7-bf18-58f2d2135687_at_3g2000yqk.googlegroups.com>


While working on aggregation within groupping operations on the db core I design for relation manipulation, I questionned myself about the opportunity of using new operators to simplify relational division formulation and make it more systematic. For instance, conside the following questions:

suppose CAR_SALE relation represented as

CAR_SALE

id		car			salesman		price		color		date
1		Buick		Henderson	        10000       Red		01/01/1990
2		Buick		Wilkinson		10000	Red		02/01/1990
3		Chevrolet		Hutchinson	10000	Red		12/01/1990
4		Buick		Wilkinson		10000	Blue		13/01/1990
5		Chevrolet		Henderson 	10000	Red		14/01/1990
6		Buick		Henderson	        10000	Blue		16/01/1990
7		Buick		Henderson	        10000	Blue		18/01/1990
8		Chevrolet		Parson		10000	Yellow	18/01/1990

Now consider the following questions

> Question 1: What are the total sales of salesmen who have sold AT LEAST (one) Blue car.

returning...

salesman			total_sale
Wilkinson			20000
Henderson		30000


> Question 2: What are the total sales of salesman who have sold AT LEAST (one) blue car AND (one) red car.

returning...

salesman			total_sale
Wilkinson			20000
Henderson		30000


> Question 3: What are the total sales of salesman who have sold AT LEAST (one) blue car OR (one) red car.

returning...

salesman			total_sale
Hutchinson		10000
Wilkinson			20000
Henderson		30000

> Question 4: What are the total sales of salesman who have NOT sold ANY blue cars

returning...

salesman			total_sale
Hutchinson		10000
Parson			10000

> Question 5: What are the total sales of salesman who have NOT sold NEITHER a blue car NOR a red car

returning...

salesman			total_sale
Parson			10000


Even though the above examples are easy to express algebrically (at least compared to their SQL expression), they must to be decomposed into a minimum of two to three elementary operations to be solved when using elementary operators (JOIN, UNION, GROUP BY). I thought about creating an specific operator for groupping subsets to simplify the expression of such problems. Such operator would allow to operate an relation involved in a division operation with one attribute drawn from the header subset. I thought about symbolizing the operator as */ * followed by the operator to be implemented within the groupping subset such as */=* */>*, */<*, */<>*, */*. Using such operator allows the expression of the questions above in a simpler way.

SYMBOLOGY
/ : relational division
+: relational union
/=: equality within the group

DEFINITION
For a relation R and un-ary relations p and q part of R header, and V values drawn from any attribute of R, */=* would be defined such as

R/p WITH q /= V <=> R/p (INNER) JOIN (p WHERE q = V) ON p

Using the above definition, the expression of Question 1 comes to

CARSALE/salesman WITH color /= 'Blue' is a interesting shortcut to (CARSALE/salesman) JOIN (salesman WHERE color = 'Blue') ON salesman

to get Question 2, instead of writing (CARSALE/salesman) JOIN ((salesman WHERE color = 'Blue') JOIN (salesman WHERE color = 'Red') on salesman ) ON salesman, we could simply write...

CARSALE/salesman WITH color /= 'Blue' AND color /= 'Red'

to get Question 3, instead of writing (CARSALE/salesman) JOIN ((salesman WHERE color = 'Blue') UNION ALL (salesman WHERE color = 'Red') on salesman ) ON salesman, we could write...

CARSALE/salesman WITH color /= 'Blue' OR color /= 'Red'

to get Question 4 is even simpler...instead of writing (CARSALE/ salesman) JOIN (salesman MINUS (salesman WHERE color = 'Blue')) ON salesman
we keep the coherence of expressing

CARSALE/salesman WITH color /<> 'Blue' (/<> would be the opposite of / =)

Finally to get Question 5 expressed as (CARSALE/salesman) JOIN (salesman MINUS (salesman WHERE color = 'Blue') MINUS (salesman WHERE color = 'Red')) ON salesman

we would simply write

CARSALE/salesman WITH color /<> 'Blue' AND color /<> 'Red' .

The need for such simplification is even more obvious when the queries become more complex

We could imagine a Question 6 which would be: What are the total sales of salesmen who either have *not* sold a single Blue OR have *not* sold a single Red car...

Using traditional operators, Question 6 would be expressed as (CARSALE/salesman) JOIN ((salesman MINUS (salesman WHERE color = 'Blue')) UNION ALL (salesman MINUS (salesman WHERE color = 'Blue'))) ON salesman

where they could be written...

CARSALE/salesman WITH color /<> 'Blue' OR color /<> 'Red' .

It seems to me such simplification would be beneficial to express more systematically algebric formulation of relational divisions.

Note: I do not mention SQL into this because there would be too much verbose to write. Using the db core and subsequent language I am working ontoI can write this...

 --DEFINITION
R1 = (1, 'A', 0, 10) + (1, 'A', 1, 10) + (1, 'B', 1, 10) + (1, 'B', 1, 10) WITH LEFT TO RIGHT ALIGN (id, groupper, differentiator, value)

Q1 = R1/salesman WITH (color /= 'Blue')
Q2 = R1/salesman WITH (color /= 'Blue'  AND color /= 'Red')
Q3 = R1/salesman WITH (color /= 'Blue'  OR color /= 'Red')
Q4 = R1/salesman WITH color /<> 'Blue'
Q5 = R1/salesman WITH color /<> 'Blue'  AND color /<> 'Red'
Q5 = R1/salesman WITH color /<> 'Blue'  OR color /<> 'Red'

--PRESENTATION in tabular format

PRESENT2D(salesman, SUM(price)).Q1
PRESENT2D(salesman, SUM(price)).Q2
PRESENT2D(salesman, SUM(price)).Q3
PRESENT2D(salesman, SUM(price)).Q4
PRESENT2D(salesman, SUM(price)).Q5

produces previously mentionned results..

Comments would be welcome to evaluate the usefulness of such operator....

Thank you. Received on Tue Jun 02 2009 - 22:24:09 CEST

Original text of this message