Re: Why is "group by" obligatory in SQL?

From: Cimode <cimode_at_hotmail.com>
Date: Fri, 24 Jul 2009 08:47:09 -0700 (PDT)
Message-ID: <236f2cad-a94d-417e-abfa-f938890826a1_at_e27g2000yqm.googlegroups.com>


On 24 juil, 17:00, paul c <toledobythe..._at_oohay.ac> wrote:
> Cimode wrote:
> > Snipped.
> >> Who can know how the original designers arrived at their decision?
> > Probably because they believed in the myth of natural programming
> > languages.  SQL is no exception.
>
> > Dijskra clearly described the fundamental problem below...
>
> > The virtue of formal texts is that their manipulations, in order to be
> > legitimate, need to satisfy only a few simple rules; they are, when
> > you come to think of it, an amazingly effective tool for ruling out
> > all sorts of nonsense that, when we use our native tongues, are almost
> > impossible to avoid.
>
> > (http://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667.html)
>
> >> There were certainly other languages from the early SQL era (such as
> >> QUEL) that would allow a similar formulation to what you propose,
> >> meaning what you intend.  (And indeed QUEL would allow other sensible
> >> things with aggregates that SQL still doesn't easily allow.)
>
> >> Personally I imagine the logic of SQL aggregation got defined in haste,
> >> was "good enough", and they just moved swiftly on to the next challenge.
> >> A small team, internal IBM rivalries, publication demands, etc. finished
> >> the job.
> > When looking at the grammar of the initial BS12, it seems obvious that
> > we slowly shifted from a declarative oriented language to a verbose
> > driven language.
>
> > In BS12 the expresssion of aggregation was by the assignment of
> > variables seprating the aggregates from the operation such as
>
> > T1 = SUMMARY(EMP, GROUP(DEPTNUM), EMPS=COUNT, SALSUM=SUM(SALARY))
> > T2 = JOIN(T1, DEPT)
> > T3 = SELECT(T2, SALSUM > BUDGET)
>
> > comparing with SQL
>
> > SELECT d.Deptnum, Count(*) as Emps,
> >    Sum(e.Salary) as Salsum, Budget
> >  FROM Emp as e
> >  JOIN Dept as d ON e.Deptnum = d.Deptnum
> >  GROUP BY d.Deptnum, Budget
> >  HAVING Sum(e.Salary) > Budget
>
> > It is reasonnable to assume that the BS12 approach consisting of
> > differentiating operations and presentation, while more abrupt, forced
> > the programmer to understand better the underlying concepts.  In other
> > words, the introduction of *aesthetics* is one of the factors that
> > killed SQL's chances to ever become a relational operation language.
>
> Great post, Cimode, Djikstra as profound as ever, this stuff bears
> repeating every so often.  Too bad so little remains of the history of
> db language development from the 1970's, I think there are still lessons
> to be learned from the scanty archives.  The motivations of those days
> must be murky to anybody who didn't grow up with assembler languages, in
> fact I'd say they remaiin murky for most of the people who were there.
> Many assembler people I knew seemed to have a constant question in the
> back of their minds, but one they never verbalized - "what is the
> interface?".  I think they probably never even conceived it, it was just
> an intermittent doubt in their subconcious mind.   I don't think I heard
> anybody ask that question until the 1980's.  The programming majority in
> the mainframe world then used Cobol and I'd say the language trend was
> mostly trying to avoid background details because those were somehow
> associated with the obscurity of machine language techniques (btw, the
> term 'exit strategy' was in fairly common use by the system programming
> world then, long before Pentagon historians dissected the military
> problems of Vietnam.)  In the commercial world of that time there was a
> lot of emphasis on so-called 'structured programming techniques', which
> we know today is just a form of language mysticism, at least insofar as
> it disguises the real programmer's interface.  Long before I even knew
> that SQL assumed duplicate rows, I could never get why 'project' was
> called 'select'.
>
> The BS12 example emphasizes to me that the basic programming interface
> is the relation and it's not hard to imagine it leading to a functional
> language.  I have only ever seen one database book that mentioned BS12
> and that was a fairly brief section.
BS12 was developped after IBM started having second thoughts about the massacre of System R massacre and *after* Oracle was on the way to build an empire. The context explained below

http://www.mcjones.org/System_R/bs12.html

gives very interesting clues about the context of that second attempt...

> Another quote from the McJones site is from Don Chamberlin:
>
> "I had a conversion experience, and I still
> remember this. Ted Codd came to visit Yorktown, I think it
> might have been at this symposium that Irv alluded to. He
> gave a seminar and a lot of us went to listen to him. This
> was as I say a revelation for me because Codd had a bunch
> of queries that were fairly complicated queries and since I’d
> been studying CODASYL, I could imagine how those
> queries would have been represented in CODASYL by
> programs that were five pages long that would navigate
> through this labyrinth of pointers and stuff. Codd would sort
> of write them down as one-liners. These would be queries
> like, “Find the employees who earn more than their
> managers.” [laughter] He just whacked them out and you
> could sort of read them, and they weren’t complicated at all,
> and I said, “Wow.” This was kind of a conversion
> experience for me, that I understood what the relational
> thing was about after that."
The relational model implementation failed before it even started in the seventies. There is no chance that a lack of specialized computing model could have led to any other conclusion.

The lower level of abstraction of a computing model is the only way a programmer, no matter how capable, could make something out of relational model. But if much effort was spent on higher abstraction, relatively very little was spent defining a relational framework for implementation and I am not simply talking about what logical operators are to be implemented to build a complete Turing machine, but also about the fundamental limitations of a binary system to represent a relation. Until this work is done and complete, I am afraid a TRDBMS will remain the Holy Graal of computing science. Received on Fri Jul 24 2009 - 17:47:09 CEST

Original text of this message