Re: Why doesn't this work?

From: Thomas Kyte <tkyte_at_us.oracle.com>
Date: Sun, 13 Dec 1998 21:27:13 GMT
Message-ID: <368529ed.22408181@192.86.155.100>

A copy of this was sent to "KeyStroke (Jack L. Swayze Sr.)" <KeyStrk_at_Feist.Com> (if that email address didn't require changing) On Sun, 13 Dec 1998 14:37:30 -0600, you wrote:

My last posting in this particular thread... but anyway here we go.

[snip]

>> No you cannot make group by act like order by (what does a primary key have to
>> do with anything anyway? If I don't put the primary key into the view?). You
>> gave a general purpose solution that purported that group by orders, so all you
>> need to do is group by. Your example used derived columns and one column from
>> the table (primary key not included).
>
>If I always put the primary key in the group by then I will not 'loose' rows.
>See related posts under subject 'calling all sql gurus'. This line of posting
>was a follow-on to that one.
>

but you will loose ORDER in some cases (and the generalization you made to the poster in this particular thread was "group by = order by". You said

<quote>
2) you can accomplish the same thing as an ORDER BY by using GROUP BY.
</quote>

and did not qualify it at all (not to mention that it is a provably false statement). I pointed out 2 issues with that statement (don't really care that you did qualify the rows issue in another thread -- it had nothing to do with this thread and a primary key did not apply in this case).

group by will lose rows in some cases -- better watch out.
group by A, B does not mean the result set will be sorted by A, B. NOT AT ALL. Not in real life, not reliably.

The real problem is that an OPTIMIZER mode switch (DBA decides to start using CBO instead of RBO or vice versa OR the optimizer, cause the size and shape of the tables have changed over time -- changes the PLAN), additional indexes, partitions, parallel query, and so on will change the sort order on you.

You see the basic problem with using group by to sort is that there are 2 plans available to the optimizer when grouping.

GROUP BY
GROUP BY NOSORT the group nosort will kill you when using group by to sort... Here are 2 simple examples that show this (there are others, many many others. they are easy to come up with and happen in 'real life'. the BIGGEST problem is that they may not manifest themselves on day 1 -- they manifest themselves later, after you have moved on perhaps or after you have forgotten that the group by was doing an implicit sort (or so you thought) and now its not and that is why your software is broken)

example 1 -------------------------------------- SQL> create table the_table ( pk1 int, 2 pk2 int, 3 nk1 int not null, 4 nk2 int not null, 5 primary key(pk1,pk2) ) 6 / Table created.

SQL> create index the_table_idx on the_table(pk1,nk1,pk2,nk2) 2 /
Index created.

SQL> insert into the_table values (1, 1, 2, 2 ); 1 row created.

SQL> insert into the_table values (2, 1, 1, 1 ); 1 row created.

SQL> analyze table the_table compute statistics; Table analyzed.

SQL> set autotrace on explain

SQL> select nk1, nk2, pk1, pk2, count(*) a_count   2 from the_table
  3 group by nk1, nk2, pk1, pk2
  4 ;

NK1 NK2 PK1 PK2 A_COUNT ---------- ---------- ---------- ---------- ----------

         2          2          1          1          1
         1          1          2          1          1

Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=2 Bytes=30) 1 0 SORT (GROUP BY NOSORT) (Cost=1 Card=2 Bytes=30)

   2    1     INDEX (FULL SCAN) OF 'THE_TABLE_IDX' (NON-UNIQUE) (Cost=
          1 Card=2 Bytes=30)

end result -- data not sorted by NK1, NK2.
- example 2 ------------------------------------ SQL> create table the_table ( pk1 int, 2 pk2 int, 3 nk1 int, 4 nk2 int, 5 primary key(pk1,pk2) ) 6 organization index 7 /

Table created.

SQL>
SQL> insert into the_table values (1, 1, 2, 2 );

1 row created.

SQL> insert into the_table values (2, 1, 1, 1 );

1 row created.

SQL> 
SQL> set autotrace on explain
SQL> 
SQL> select nk1, nk2, pk1, pk2, count(*) a_count
  2    from     the_table

3 group by nk1, nk2, pk1, pk2
4 ;

NK1 NK2 PK1 PK2 A_COUNT ---------- ---------- ---------- ---------- ----------

         2          2          1          1          1
         1          1          2          1          1

Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=28 Card=1 Bytes=104) 1 0 SORT (GROUP BY NOSORT) (Cost=28 Card=1 Bytes=104)

   2    1     INDEX (FULL SCAN) OF 'SYS_IOT_TOP_80023' (UNIQUE) (Cost=
          26 Card=1 Bytes=104)

end result -- data is not sorted by NK1, NK2

>
>> Its 100% bad advice to give as the performance implications are terrible as
>> well.
>
>I still haven't seen your solution to this persons problem. You act like the

I told him

you cannot order by in a view.
you must order by in a select statement.

QED. there are no other reliable, performant solutions. That is the solution -- use ORDER BY

>doctor when his patient comes to him and says 'Doc, it hurts when I do this' and
>the Doc says 'well, don't do that'. This is what I mean by arrogance. Maybe my

you've just sprained your wrist.

You: doc my wrist hurts when I take the bandage off and move it Doc: don't do that.

wow, what an arrogant doctor :)

>solution wont work all the time. I will have to experiment with your example

and the real problem is that it MIGHT work today. It might stop working TOMORROW. it makes use of the ASSUMPTION that group by SORTS. That ASSUMPTION is incorrect.

It is dangerous advice. When I see it in action (this particular 'trick') -- I get people to take it out of the code. There is no way to be certain that group by will order by. (and usually, upon seeing an example or having it explained to them they say "wow -- didn't know that, we gotta fix that".)

>problem when I can get to work and try it out. However, I didn't leave him in a
>lurch and not reply. Nor did I (as others have done) say that it cant be done,
>therefore it shouldn't be done.
>

you are not getting it. Sometimes it will work and then all of a sudden, something OUTSIDE of your control will make it cease to work. Watch this example:

SQL> create table the_table ( pk1 int, pk2 int, nk1 int, nk2 int, primary key(pk1,pk2) );
Table created.

SQL> create index the_table_idx on the_table(pk1,nk1,pk2,nk2); Index created.

SQL> insert into the_table values (1, 1, 2, 2 ); 1 row created.
SQL> insert into the_table values (2, 1, 1, 1 ); 1 row created.

SQL> set autotrace on explain

SQL> select nk1, nk2, pk1, pk2, count(*) a_count_1   2 from the_table
  3 group by nk1, nk2, pk1, pk2
  4 ;

NK1 NK2 PK1 PK2 A_COUNT_1 ---------- ---------- ---------- ---------- ----------

         1          1          2          1          1
         2          2          1          1          1

Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE 1 0 SORT (GROUP BY)
2 1 TABLE ACCESS (FULL) OF 'THE_TABLE' Well so far so good. It all works. Lets go home, program finished. But wait -- the DBA decides to run the update statistics stuff or the optimizer changes its access plan based on the size and shape of the data:

SQL> analyze table the_table compute statistics; Table analyzed.

SQL> select nk1, nk2, pk1, pk2, count(*) a_count_2   2 from the_table
  3 group by nk1, nk2, pk1, pk2
  4 ;

NK1 NK2 PK1 PK2 A_COUNT_2 ---------- ---------- ---------- ---------- ----------

         2          2          1          1          1
         1          1          2          1          1

Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=2 Bytes=30) 1 0 SORT (GROUP BY NOSORT) (Cost=1 Card=2 Bytes=30)

   2    1     INDEX (FULL SCAN) OF 'THE_TABLE_IDX' (NON-UNIQUE) (Cost=
          1 Card=2 Bytes=30)

Bummer, our application breaks. Can't be the group by -- it sorts (but no, it doesn't).

>I did try out my solution to his original post 'calling all SQL gurus' on my
>machine at work, and it did work just as I said.
>

and under different circumstances it *MIGHT* not. That, in a nutshell is why I am following up on this. It appears to work alot of times -- it will suddenly STOP working that way perhaps (maybe, maybe not -- maybe the world will go to $#@!$ in the year 2000, maybe not). Fortunately, right now, we have a chance to make sure it DOESN'T go wrong and that is by using an ORDER BY.

>
>> When I see something wrong -- I follow up. You gave an answer (as a followup to
>> my answer) that was plain BAD ADVICE. Sorry you disagree but this is what the
>> groups are about.
>
>When PMG first posted his problem (under the subject of 'Calling all SQL gurus'
>I notice that you didn't follow your supposedly cherished principle of trying to
>help, instead of saying it cant be done.

1.) I make no promise to answer any questions, let along ALL questions.

2.) What does 'calling all sql gurus' have to do with this? That was asking a question along the lines of

<quote>
I am having a little trouble coming up with a nice little SQL statement that does the following:group on a column, come up with a count of each occurence of the group, sort descending on the count,
calculate the running percentage of the total count for each group, and return the top N groups.
ex: suppose I have a table of EMPLOYEES, and I would like to select the first names which comprise the top 80% of all first names in the table. Is this possible to do this as a single query? </quote>

While this is possible as a single query (I've done them before) it is certainly not efficient, I would not recommend it (a cursor being a much better way), and I had better things to do then try to come up with the interesting query from $#$#@ to answer it.

3.) I didn't make any posting in "calling all sql gurus" postive or negative.

Thomas Kyte
tkyte_at_us.oracle.com
Oracle Service Industries
Reston, VA USA

--
http://govt.us.oracle.com/ -- downloadable utilities

Opinions are mine and do not necessarily reflect those of Oracle Corporation

Anti-Anti Spam Msg: if you want an answer emailed to you, you have to make it easy to get email to you. Any bounced email will be treated the same way i treat SPAM-- I delete it. Received on Sun Dec 13 1998 - 15:27:13 CST