Re: Database introduction

From: Nicola <nvitacolonna_at_gmail.com>
Date: Wed, 29 Jan 2014 10:50:08 +0100
Message-ID: <nvitacolonna-FEAE15.10500729012014_at_freenews.netfront.net>


In article <20140128202637.5227c00b.jklowden_at_speakeasy.net>,  "James K. Lowden" <jklowden_at_speakeasy.net> wrote:

> On Mon, 27 Jan 2014 13:56:04 +0100
> Max <maxfalsa_at_despammed.com> wrote:
>
> > Unfortunately I can't grasp division in a relational database..
>
> Division is harder than multiplication. We learn it later in school,
> and it takes more cycles for a CPU. It's the basis for cryptography,
> and after 30 years SQL DBMSs still can't do it.
>
> I find the easiest way to think about it is as what it is: the inverse
> of multiplication. Given any product
>
> insert into C select * from A cross join B
>
> division produces
>
> A = C ÷ B
> and
> B = C ÷ A
>
> It doesn't come up a lot in my experience but, like any tool, when you
> need it you really need it.
>
> The examples I remember involve whether a canonical set has been met.
> Say you have a type of task that consists of steps A, B, and C.
>
> insert into task_types
> values ( ( 1, 'A' ), (2, 'B'), (3, 'C') )
>
> As each step is done for each actual task, you add a row for the task
> and the step
>
> insert into status values (task_id, 'A')
>
> Then eventually you want a list of completed tasks. That is, you want
>
> task_id = status ÷ task_types.step
>
> SQL version left as exercise for the reader!

One possibility is to apply the algebraic definition of division in a straightforward way. Division can be defined (under suitable assumptions on the form of R and S) as:

R ÷ S = proj_K(R) \ proj_K( (proj_K(R) x S) \ R) )

where proj_K is the projection of its argument on K, and K = attr(R) \ attr(S) (the backslash represents set difference).

Hence, task_id = status ÷ task_types.step can be translated as follows:

with all_pairs_not_occurring_in_status(task_id, task_name) as (   select task_id, step from status, task_types   except
  select task_id, task_name from status )
select task_id from status
except
select task_id from all_pairs_not_occurring_in_status;

But, since division is essentially the algebraic counterpart of a universal quantification, and since SQL is supposed to be akin to a logical language, in principle the proper way to deal with the problem would be to formulate the query in a more logical fashion. You want to get the statuses (identified by their task_id) whose steps have all been completed. That is, you want to get all the statuses S such that there is no task type that does not appear in S. This last sentence can be translated in SQL more or less directly:

select distinct S1.task_id from status S1 where not exists (
  select * from task_types T
  where not exists (
    select * from status S2
    where S2.task_id = S1.task_id
      and S2.task_name = T.step
  )
);

I suspect, though, that in this example many people would rather write something along these lines:

select task_id from status
group by task_id
having count(*) = (select count(*) from task_types);

Division represents a case where the arguments about the advantages of “declarative” SQL vs “operational” relational algebra can be easily challenged.

Received on Wed Jan 29 2014 - 10:50:08 CET

Original text of this message