Re: Database introduction

From: Erwin <e.smout_at_myonline.be>
Date: Wed, 29 Jan 2014 11:28:59 -0800 (PST)
Message-ID: <70e0b46c-643d-42a0-9da6-6d08126a435f_at_googlegroups.com>


Op woensdag 29 januari 2014 10:50:08 UTC+1 schreef Nicola:
> In article ,
>
> "James K. Lowden" wrote:
>
>
>
> > On Mon, 27 Jan 2014 13:56:04 +0100
>
> > Max wrote:
>
> >
>
> > > Unfortunately I can't grasp division in a relational database..
>
> >
>
> > Division is harder than multiplication. We learn it later in school,
>
> > and it takes more cycles for a CPU. It's the basis for cryptography,
>
> > and after 30 years SQL DBMSs still can't do it.
>
> >
>
> > I find the easiest way to think about it is as what it is: the inverse
>
> > of multiplication. Given any product
>
> >
>
> > insert into C select * from A cross join B
>
> >
>
> > division produces
>
> >
>
> > A = C ÷ B
>
> > and
>
> > B = C ÷ A
>
> >
>
> > It doesn't come up a lot in my experience but, like any tool, when you
>
> > need it you really need it.
>
> >
>
> > The examples I remember involve whether a canonical set has been met.
>
> > Say you have a type of task that consists of steps A, B, and C.
>
> >
>
> > insert into task_types
>
> > values ( ( 1, 'A' ), (2, 'B'), (3, 'C') )
>
> >
>
> > As each step is done for each actual task, you add a row for the task
>
> > and the step
>
> >
>
> > insert into status values (task_id, 'A')
>
> >
>
> > Then eventually you want a list of completed tasks. That is, you want
>
> >
>
> > task_id = status ÷ task_types.step
>
> >
>
> > SQL version left as exercise for the reader!
>
>
>
> One possibility is to apply the algebraic definition of division in a
>
> straightforward way. Division can be defined (under suitable assumptions on the
>
> form of R and S) as:
>
>
>
> R ÷ S = proj_K(R) \ proj_K( (proj_K(R) x S) \ R) )
>
>
>
> where proj_K is the projection of its argument on K, and K = attr(R) \ attr(S)
>
> (the backslash represents set difference).
>
>
>
> Hence, task_id = status ÷ task_types.step can be translated as follows:
>
>
>
> with all_pairs_not_occurring_in_status(task_id, task_name) as (
>
> select task_id, step from status, task_types
>
> except
>
> select task_id, task_name from status
>
> )
>
> select task_id from status
>
> except
>
> select task_id from all_pairs_not_occurring_in_status;
>
>
>
> But, since division is essentially the algebraic counterpart of a universal
>
> quantification, and since SQL is supposed to be akin to a logical language, in
>
> principle the proper way to deal with the problem would be to formulate the
>
> query in a more logical fashion. You want to get the statuses (identified by
>
> their task_id) whose steps have all been completed. That is, you want to get all
>
> the statuses S such that there is no task type that does not appear in S. This
>
> last sentence can be translated in SQL more or less directly:
>
>
>
> select distinct S1.task_id from status S1
>
> where not exists (
>
> select * from task_types T
>
> where not exists (
>
> select * from status S2
>
> where S2.task_id = S1.task_id
>
> and S2.task_name = T.step
>
> )
>
> );
>
>
>
> I suspect, though, that in this example many people would rather write something
>
> along these lines:
>
>
>
> select task_id from status
>
> group by task_id
>
> having count(*) = (select count(*) from task_types);
>
>
>
> Division represents a case where the arguments about the advantages of
>
> "declarative" SQL vs "operational" relational algebra can be easily challenged.
>
>
>
> --- news://freenews.netfront.net/ ---

I'm sorry, Nicola, but do you really call that "straightforward" ? Pls allow me to not agree.

"Give me all the customers for which there does not exist any insurance policy type such that that customer still does not have an insurance policy of that type."

That is __anything but__ straightforward and that's putting it mildly.

Yet it is precisely the subject of the OP's question.

"Give me all the customers that already have all possible types of insurance policy".

That's a simpler version and on face value it looks like the very same thing, but this formulation hides a very nasty problem : the problem of whether we're supposed to rely on set equality or set superset/subset to check this "already has all possible types" condition. Existing customers can have an old policy of a type that is no longer commercialized and therefore is no longer part of the set of "possible types". Making his set of insurance policies a superset of the set of "[currently] possible types". And making your results incorrect if you apply set equality to test the condition.

I concur with James. There is a reason for us learning it later, there is a reason for CPU's needing more cycles to compute it, and there is a reason why after 30 years SQL still doesn't do it [in shorthand]. There is even a reason why Darwen once introduced the 4-operand version of his "great divide". Received on Wed Jan 29 2014 - 20:28:59 CET

Original text of this message