Getting the FIRST and LAST records in a group efficiently in SQL?

From: dean <deanbrown3d_at_yahoo.com>
Date: 9 Dec 2005 07:37:44 -0800
Message-ID: <1134142664.862162.54780@f14g2000cwb.googlegroups.com>

Hello all,

I have a table TRAIN_ROUTE which lists the route locations (field LOCATION_I) in order of field ROUTE_ORDER_I, for a series of trains grouped by (TRAIN_I and P_VERSION_I).
There is a primary key on TRAIN_I, P_VERSION_I, ROUTE_ORDER_I.

I can find the first and last location for each train easily enough with these two methods:

select  TRAIN_I, P_VERSION_I, LOCATION_I, ROUTE_ORDER_I
from    TRAIN_ROUTE
where  (TRAIN_I, P_VERSION_I, ROUTE_ORDER_I) in

(

  select   TRAIN_I, P_VERSION_I, MIN(ROUTE_ORDER_I)
  from     TRAIN_ROUTE
  group by TRAIN_I, P_VERSION_I

);

8374 rows selected.

Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=420 Card=1 Bytes=59) 1 0 HASH JOIN (SEMI) (Cost=420 Card=1 Bytes=59)

   2    1     TABLE ACCESS (FULL) OF 'TRAIN_ROUTE' (Cost=128 Card=6576
          7 Bytes=1315340)

   3    1     VIEW OF 'VW_NSO_1' (Cost=208 Card=65767 Bytes=2564913)
   4    3       SORT (GROUP BY) (Cost=208 Card=65767 Bytes=854971)
   5    4         INDEX (FULL SCAN) OF 'IX_TRAIN_ROUTE_ORDER_01' (UNIQ
          UE) (Cost=208 Card=65767 Bytes=854971)

select TR1.TRAIN_I, TR1.P_VERSION_I, TR1.LOCATION_I, TR1.ROUTE_ORDER_I

from TRAIN_ROUTE TR1
inner join
(

  select   TRAIN_I, P_VERSION_I, MIN(ROUTE_ORDER_I) MIN_ROUTE_ORDER_I
  from     TRAIN_ROUTE
  group by TRAIN_I, P_VERSION_I
) TR2
on  TR1.TRAIN_I       = TR2.TRAIN_I

and TR1.P_VERSION_I = TR2.P_VERSION_I
and TR1.ROUTE_ORDER_I = TR2.MIN_ROUTE_ORDER_I;

8374 rows selected.

Execution Plan

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=420 Card=65767 Bytes
          =3880253)

   1    0   HASH JOIN (Cost=420 Card=65767 Bytes=3880253)
   2    1     TABLE ACCESS (FULL) OF 'TRAIN_ROUTE' (Cost=128 Card=6576
          7 Bytes=1315340)

   3    1     VIEW (Cost=208 Card=65767 Bytes=2564913)
   4    3       SORT (GROUP BY) (Cost=208 Card=65767 Bytes=854971)
   5    4         INDEX (FULL SCAN) OF 'IX_TRAIN_ROUTE_ORDER_01' (UNIQ
          UE) (Cost=208 Card=65767 Bytes=854971)

They both trace to the same cost. However, I wonder if there is a better way than finding the MIN value of ROUTE_ORDER_I, but I am not sure. Do these methods here calculate the MIN(ROUTE_ORDER_I) value or use the index to do this?

My next question was how to use FIRST_VALUE using PARTITION BY (or some other method) to return only the origin (first) location of every train:

set autotrace on EXPLAIN;

select distinct TRAIN_I, P_VERSION_I,

FIRST_VALUE(LOCATION_I) over (partition by TRAIN_I, P_VERSION_I order by ROUTE_ORDER_I) FIRST_LOC

from train_route
set autotrace off EXPLAIN;

8374 rows selected.

Execution Plan

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=666 Card=65767 Bytes
          =1315340)

   1    0   SORT (UNIQUE) (Cost=666 Card=65767 Bytes=1315340)
   2    1     WINDOW (SORT) (Cost=666 Card=65767 Bytes=1315340)
   3    2       TABLE ACCESS (FULL) OF 'TRAIN_ROUTE' (Cost=128 Card=65
          767 Bytes=1315340)

Again this seems to be more expensive cost-wise than the others, which makes sense because of the distinct function running on a larger result set.

What I really want to do is this:

select TRAIN_I, P_VERSION_I,

FIRST_VALUE(LOCATION_I) over (partition by TRAIN_I, P_VERSION_I order by ROUTE_ORDER_I) FIRST_LOC

from train_route
group by TRAIN_I, P_VERSION_I;

but this does not compile:

FIRST_VALUE(LOCATION_I) over (partition by TRAIN_I, P_VERSION_I order by ROUTE_ORDER_I) FIRST_LOC

ERROR at line 2:
ORA-00979: not a GROUP BY expression

Thanks for any help.

Dean Received on Fri Dec 09 2005 - 09:37:44 CST