Re: Why using "Group By"

From: --CELKO-- <71062.1056_at_compuserve.com>
Date: 17 Mar 2003 16:28:55 -0800
Message-ID: <c0d87ec0.0303171628.509a2c73_at_posting.google.com>


>> This is a theoretical question. Why do I need to add the "Group By"
in the following SQL:  

 SELECT customer_city, COUNT(*)
   FROM Customers;  

Isn't it clear that I want to get the number of rows per city, so why is it necessary to add: "GROUP BY customer_city"? <<

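To keep the syntax consistent. Without a GROUP BY, the entire table is treated as one group, and a bare column like customer_city is not a characteristic of that group. That is also why you can write: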

 SELECT COUNT(*) FROM Customers;

That form works if you only need the aggregates for the table as a whole. (I have some examples of that trick in SQL FOR SMARTIES, and you can look at an UPDATE problem posted by a guy named Myron on the SQL Server newsgroup.)
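
As a rough illustration of the difference, here is a small sketch. The table layout and the sample rows are made up for the example; only the Customers table and the customer_city column come from the question.

```sql
-- Hypothetical sample data (customer_id and the rows are assumptions).
CREATE TABLE Customers
(customer_id INTEGER NOT NULL PRIMARY KEY,
 customer_city VARCHAR(20) NOT NULL);

INSERT INTO Customers VALUES (1, 'Atlanta');
INSERT INTO Customers VALUES (2, 'Atlanta');
INSERT INTO Customers VALUES (3, 'Austin');

-- No GROUP BY: the whole table is one group, so exactly one row comes back.
SELECT COUNT(*) AS customer_cnt
  FROM Customers;            -- 3

-- With GROUP BY: one row per city.
SELECT customer_city, COUNT(*) AS customer_cnt
  FROM Customers
 GROUP BY customer_city;     -- ('Atlanta', 2) and ('Austin', 1), in no guaranteed order
```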

Or you can write nested code that plays with levels of aggregation.

 SELECT split, COUNT(*)
   FROM (SELECT MOD(customer_age, 2) FROM Customers) AS X(split)
  GROUP BY split;

Here is how a SELECT works in SQL ... at least in theory. Real products will optimize things when they can.

  1. Start in the FROM clause and build a working table from all of the joins, unions, intersections, and whatever other table constructors are there. The <table expression> AS <correlation name> option allows you to give a name to this working table, which you then have to use for the rest of the containing query.
  2. Go to the WHERE clause and remove rows that do not pass criteria; that is, that do not test to TRUE (reject UNKNOWN and FALSE). The WHERE clause is applied to the working table built in the FROM clause.
  3. Go to the optional GROUP BY clause, make groups and reduce each group to a single row, replacing the original working table with the new grouped table. The rows of a grouped table may contain only group characteristics: (1) a grouping column, (2) a statistic about the group (i.e. aggregate functions), (3) a function, or (4) an expression made up of those three items.
  4. Go to the optional HAVING clause and apply it against the grouped working table; if there was no GROUP BY clause, treat the entire table as one group.
  5. Go to the SELECT clause and construct the expressions in the list. This means that the scalar subqueries, function calls and expressions in the SELECT are done after all the other clauses are done. The AS operator can give a name to expressions in the SELECT list, too. These new names come into existence all at once, but after the WHERE clause, GROUP BY clause and HAVING clause have been executed; you cannot use them in the SELECT list or the WHERE clause for that reason.
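
To see what step 5 means in practice, here is a minimal sketch. The table layout, the sample rows and the names city_cnt and X are assumptions made for the example. Some products accept a SELECT list name in HAVING as an extension, so the rejected form is shown only as a comment.

```sql
-- Hypothetical sample data (customer_id and the rows are assumptions).
CREATE TABLE Customers
(customer_id INTEGER NOT NULL PRIMARY KEY,
 customer_city VARCHAR(20) NOT NULL);

INSERT INTO Customers VALUES (1, 'Atlanta');
INSERT INTO Customers VALUES (2, 'Atlanta');
INSERT INTO Customers VALUES (3, 'Austin');

-- Not valid in the model described above: HAVING is done before the
-- SELECT list, so the name city_cnt does not exist yet.
--   SELECT customer_city, COUNT(*) AS city_cnt
--     FROM Customers
--    GROUP BY customer_city
--   HAVING city_cnt > 1;

-- Valid: repeat the expression instead of the name.
SELECT customer_city, COUNT(*) AS city_cnt
  FROM Customers
 GROUP BY customer_city
HAVING COUNT(*) > 1;         -- ('Atlanta', 2)

-- Also valid: build the name in the FROM clause (step 1), so it already
-- exists when the outer WHERE clause (step 2) is applied.
SELECT customer_city, city_cnt
  FROM (SELECT customer_city, COUNT(*) AS city_cnt
          FROM Customers
         GROUP BY customer_city) AS X
 WHERE city_cnt > 1;         -- ('Atlanta', 2)
```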

If there is a SELECT DISTINCT, then redundant duplicate rows are removed. For purposes of defining a duplicate row, NULLs are treated as matching (just like in the GROUP BY).
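
A small sketch of that NULL behavior, with a contrast against the WHERE clause rule from step 2. The table layout and sample rows are made up, and customer_city is declared nullable here purely for the sake of the example.

```sql
-- Hypothetical sample data; customer_city is nullable in this sketch only.
CREATE TABLE Customers
(customer_id INTEGER NOT NULL PRIMARY KEY,
 customer_city VARCHAR(20));

INSERT INTO Customers VALUES (1, NULL);
INSERT INTO Customers VALUES (2, NULL);
INSERT INTO Customers VALUES (3, 'Austin');

-- DISTINCT treats the two NULLs as duplicates of each other.
SELECT DISTINCT customer_city
  FROM Customers;                           -- two rows: NULL and 'Austin'

-- GROUP BY puts the NULLs into a single group in the same way.
SELECT customer_city, COUNT(*) AS customer_cnt
  FROM Customers
 GROUP BY customer_city;                    -- (NULL, 2) and ('Austin', 1)

-- In a WHERE clause, NULL = NULL tests to UNKNOWN, so those rows are rejected.
SELECT COUNT(*) AS matching_cnt
  FROM Customers
 WHERE customer_city = customer_city;       -- 1
```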

Finally, nested query expressions follow the usual scoping rules you would expect from a block-structured language like C, Pascal, Algol, etc. Namely, the innermost queries can reference columns and tables in the queries in which they are contained.

Received on Tue Mar 18 2003 - 01:28:55 CET
