Re: Relation subset operators

From: <cimode_at_hotmail.com>
Date: Fri, 5 Jun 2009 11:25:07 -0700 (PDT)
Message-ID: <53d6afdb-36b2-4fd9-a12f-3a7fdbc2f636_at_b1g2000vbc.googlegroups.com>


On 5 juin, 18:56, paul c <toledobythe..._at_oohay.ac> wrote:
> cim..._at_hotmail.com wrote:
> > On 3 juin, 16:23, --CELKO-- <jcelko..._at_earthlink.net> wrote:
> >>>> Comments would be welcome to evaluate the usefulness of such an operator. <<
> >> Just for comparison, I re-wrote the skeleton schema in SQL and brought
> >> the data element names up to ISO-11179 standards. The dates also need
> >> to be in ISO-8601 format, but that is not important for this
> >> comparison.
>
> >> CREATE TABLE CarSales
> >> (invoice_nbr INTEGER NOT NULL PRIMARY KEY,
> >>  vehicle_make VARCHAR(10) NOT NULL,
> >>  salesman_name  VARCHAR(10) NOT NULL,
> >>  vehicle_price DECIMAL (8,2) NOT NULL,
> >>  vehicle_color VARCHAR(10) NOT NULL,
> >>  sale_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL);
>
> >>> Question 1: What are the total sales of salesmen who have sold AT LEAST (one) Blue vehicle.
> >> SELECT C1.salesman_name, SUM(C1.vehicle_price) AS sales_tot
> >>   FROM CarSales AS C1
> >>  WHERE C1.salesman_name
> >>    IN  (SELECT C2.salesman_name
> >>    FROM CarSales AS C2
> >>  WHERE C2.vehicle_color = 'Blue') ;
>
> >>> Question 2: What are the total sales of salesmen who have sold AT LEAST (one) blue vehicle AND (one) red vehicle.
> >> SELECT C1.salesman_name, SUM(C1.vehicle_price) AS sales_tot
> >>   FROM CarSales AS C1
> >>  WHERE C1.salesman_name
> >>    IN  (SELECT C2.salesman_name
> >>    FROM CarSales AS C2
> >>  WHERE C2.vehicle_color IN ( 'Blue', 'Red')
> >>   GROUP BY C2.salesman_name
> >> HAVING COUNT(DISTINCT C2.vehicle_color) = 2);
>
> >>> Question 3: What are the total sales of salesmen who have sold AT LEAST (one) blue vehicle OR (one) red vehicle.
> >> SELECT C1.salesman_name, SUM(C1.vehicle_price) AS sales_tot
> >>   FROM CarSales AS C1
> >>  WHERE C1.salesman_name
> >>    IN  (SELECT C2.salesman_name
> >>    FROM CarSales AS C2
> >>  WHERE C2.vehicle_color IN ( 'Blue', 'Red')
> >>   GROUP BY C2.salesman_name
> >> HAVING COUNT(DISTINCT C2.vehicle_color) > 0);
>
> >>> Question 4: What are the total sales of salesmen who have NOT sold ANY blue vehicles
> >> SELECT C1.salesman_name, SUM(C1.vehicle_price) AS sales_tot
> >>   FROM CarSales AS C1
> >>  WHERE C1.salesman_name
> >>    NOT IN  (SELECT C2.salesman_name
> >>    FROM CarSales AS C2
> >>  WHERE C2.vehicle_color = 'Blue');
>
> >>> Question 5: What are the total sales of salesmen who have NOT sold NEITHER a blue vehicle NOR a red vehicle
> >> SELECT C1.salesman_name, SUM(C1.vehicle_price) AS sales_tot
> >>   FROM CarSales AS C1
> >>  WHERE C1.salesman_name
> >>    NOT IN  (SELECT C2.salesman_name
> >>    FROM CarSales AS C2
> >>  WHERE C2.vehicle_color IN ( 'Blue', 'Red')
> >>   GROUP BY C2.salesman_name
> >> HAVING COUNT(DISTINCT C2.vehicle_color) > 0);
>
> >> I did not test these queries and I used the IN() predicate rather than
> >> EXISTS() to keep them closer to relational algebra and set notation.
>
> >> However, if the specs are changed a bit to show a zero total for the
> >> disqualified salesmen, the queries could have been written with a CASE
> >> expressions:  Here is Q1:
>
> >> SELECT C1.salesman_name,
> >>        SUM( CASE WHEN C1.vehicle_color = 'Blue'
> >>                            THEN C1.vehicle_price ELSE 0.00 END)
> >>          AS blue_sales_tot
> >>   FROM CarSales AS C1
> >>  GROUP BY C1.salesman_name;
>
> >> This is very simple SQL and you could nest it inside another query to
> >> meet the original specs:
>
> >> SELECT salesman_name, blue_sales_tot
> >> FROM ( SELECT C1.salesman_name,
> >>        SUM( CASE WHEN C1.vehicle_color = 'Blue'
> >>                            THEN C1.vehicle_price ELSE 0.00 END)
> >>          AS blue_sales_tot
> >>   FROM CarSales AS C1
> >>  GROUP BY C1.salesman_name) AS B(salesman_name, blue_sales_tot)
> >> WHERE blue_sales_tot >  0;
>
> > OK I will grant you that you have made an effort to formulate a
> > possible SQL, I voluntarily skipped.  It is however unfortunate missed
> > the point behind the need to create a new operator.  Try to formulate
> > for instance the following question someone brought up....
>
> > Question 7: What are the total sales of ALL salesmen who have sold ALL
> > possible colors...The SQL can become quite cumbersome.  OTOH, using
> > the proposed operator it can simply be written as:
>
> > CARSALE/salesman WITH color /= color
>
> Not that I admire people who admire SQL (I just admire people who can
> cope with it, assuming they were forced into that position) but I
> thought the SQL queries were useful, even though I didn't try them and
> none of them involved any of the flavours of relational division I've
> seen (nor did the original examples, which also confused me given that
> division was mentioned in the original post).  Perhaps you were looking
> for a way to establish division as a generalization of simple
> restriction, I'm not sure.

> The syntax also confuses me, I'm not sure whether '/' is somehow,
> sometimes, operating on an attribute (projection?) instead of a
> relation, eg., the "color /= color" part.  
The */* expresses a division between CARSALES and the attribute while */=* is a new operator that would indicate a restriction applied to the division within each single group. Detecting potential confusions is the precise reason I posted this examples...

> Probably I wouldn't be able
> to see the point without more SQL comparisons or perhaps algebra
> expressions that included a "SUM" operator, then I/we could see where
> GROUP, DIVIDE and the aggregates come into play.

Unfortunately for Celko's, his SQL formulation of the problem (even though correct as far as result as concerned) uses an algebric *hack* (thanks to SQL commitee's) that takes any algebrical analysis interest out of the equation (I am taking about the HAVING COUNT operator). Which is precisely why I did not mention it in the first place....But for the sake of clarity...

A *sound* (sound is a risky word for SQL) equivalent to Question1 : What are the total sales of salesmen who sold at least one Blue car? would be...

select salesman, sum(price)
from CARSALE C1 inner join

                       (
                         select distinct salesman
                         from CARSALE
                         where color = 'Blue'
                        )  C2

on C1. salesman = C2.salesman
group by C1.salesman

I propose to simplify the expression of the projection of C2 over C1 in the context of C1/salesman as a single operation performed by the operator */=* ...The new formulation to get the same result than above would be expressed as

CARSALE/salesman WITH color /= Blue

Question 2: What are the total sales of salesmen who sold at least one Blue car AND one Red car would be expressed in SQL as

select salesman, sum(price)
from CARSALE C1 inner join

                       (
                        select b1.salesman from
                         (
                          select distinct salesman
                          from CARSALE
                          where color = 'Blue'
                          ) b1
                                     inner join
                                               (
                                               select distinct
salesman
                                               from CARSALE
                                              where color = 'Red'
                                               ) b2
                                     on b1.salesman = b2.salesman
                        )  C2

on C1. salesman = C2.salesman
group by C1.salesman

I propose to express the above as

CARSALE/salesman WITH color /= Blue AND color /= Red and so forth for other questions brought...

Note that the */=* operates 2 relations, Blue and Red in fact are not operated as values anymore but a single tuple subset of color attribute.

Actually the operator would be */=* to operate 2 un-ary relations *within* the context of a grouping operation. To be more explicit , if we consider relation R1 with attributes A1, B1 and B2 being single tuple subtype of B1

B1 /= B2 in the context of R1/(A WITH B1/=B2) seems an important simplification of formulation of divisions. The premise of such simplification has a broad range of pratical uses among which algebric formulation of frequent operations as well as a practical way to declare specialization by constraints. See previous examples...

> Another example to compare against division might be "total sales for
> each salesman who sold all purple cars"!  
There are many more complex queries, I did not mention, that were taken into considération before creating the operator, queries such as:

Question 7: What are the total sales of salesmen who sold ALL colors sold ?

that would simply be expressed as

CARSALE/salesman WITH color /= color

Question 8 (the one you mentionned): What are the total sales of salesmen who sold ALL purple cars is a particular case of Question1 since all salesmen who sold ALL purple cars are the same than ALL salesmen who have sold AT LEAST one purple car. Therefore the expression would also be similar to Question1

CARSALE/salesman WITH color /= Purple

>I don't yet know whether
> optimizing that is easy or hard, but I think one lateral use for
> shorthands, beyond reducing typos', is to make common optimizations
> easier to recognize (though the longhand versions are still needed to
> 'prove' the optimizations)..
I am sorry I do not understand exactly what you are getting at.

Hope this clarified... Received on Fri Jun 05 2009 - 20:25:07 CEST

Original text of this message