Re: Difficulty to select distinct rows from view.

From: <fitzjarrell_at_cox.net>
Date: 28 Jun 2005 09:25:15 -0700
Message-ID: <1119975915.032985.223370@o13g2000cwo.googlegroups.com>

Dmitry E. Loginov wrote:
> <andkovacs_at_yahoo.com> wrote in message
> news:1118163086.681162.23640_at_g44g2000cwa.googlegroups.com...
> > I have a view base on three columns( ID, historical_date, value)
> > The view is built like this:
> > select * from hourly_data
> > Union
> > select * from daily_data
> > Union
> > select * from weekly_data;
> >
> > I do the following select on this view:
> >
> > select ID, historic_date, value from hist_data;
> >
> > And it returns
> >
> > 10, 2005/05/04, 10.56
> > 10, 2005/05/04, 10.53
> >
> > I only want the first row occur in the result set. Is there a way to do
> > this ?
> > How can I use DISTINCT option on columns (ID and historical_date ) ?
> >
> You should not use DISTINCT, just use GROUP BY with appropriate aggregate
> function.
> Something like
> select ID, historical_date, max(value) from (
> select * from hourly_data
> Union All
> select * from daily_data
> Union All
> select * from weekly_data;
> )
> group by ID, historical_date
>
> Dmitry.

And, presuming historical_date is a DATE field, it also includes the time, which is likely different for records from the same day of the month. As such, the unqualified use of GROUP BY won't work, either, as the date values are not the same (due to the time differences) and again two records will be displayed:

SQL> create table emptest as select * From emp where 0 = 1;

Table created.

SQL> alter table emptest add process_dt date;

Table altered.

SQL> declare

  2     cursor get_emp_data is
  3     select empno, ename, job, mgr, hiredate, sal, comm, deptno,
ttl_sal
  4     from emp;
  5  begin
  6     for erec in get_emp_data loop
  7             insert into emptest
  8             (empno, ename, job, mgr, hiredate, sal, comm, deptno,
ttl_sal, process_dt)
  9             values
 10             (erec.empno, erec.ename, erec.job, erec.mgr,

erec.hiredate, erec.sal, erec.comm, erec.deptno, erec.ttl_sal, sysdate);
11 dbms_lock.sleep(1); -- Ensure a time difference between records
12 end loop;
13 end;
14 /

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

SQL> select sal, process_dt
2 from emptest
3 group by sal, process_dt;

SAL PROCESS_D

---------- ---------
       800 28-JUN-05
      1100 28-JUN-05
      1200 28-JUN-05
      1300 28-JUN-05
      1500 28-JUN-05
      1500 28-JUN-05
      1600 28-JUN-05
      3000 28-JUN-05
      3000 28-JUN-05
      5000 28-JUN-05
       950 28-JUN-05

       SAL PROCESS_D
---------- ---------
      1250 28-JUN-05
      2450 28-JUN-05
      2850 28-JUN-05
      2975 28-JUN-05

15 rows selected.

SQL> Notice how the dates appear to be the same, yet all 15 records from the emptest table are returned, even though there are 'duplicate' records displayed. Changing the date format and running the query again reveals the reason:

SQL> alter session set nls_date_format = 'DD-MON-YY HH24:MI:SS';

Session altered.

SQL> / SAL PROCESS_DT

---------- ------------------
       800 28-JUN-05 14:51:43
      1100 28-JUN-05 14:51:53
      1200 28-JUN-05 14:51:58
      1300 28-JUN-05 14:51:57
      1500 28-JUN-05 14:51:47
      1500 28-JUN-05 14:51:52
      1600 28-JUN-05 14:51:44
      3000 28-JUN-05 14:51:50
      3000 28-JUN-05 14:51:56
      5000 28-JUN-05 14:51:51
       950 28-JUN-05 14:51:55

       SAL PROCESS_DT
---------- ------------------
      1250 28-JUN-05 14:51:45
      2450 28-JUN-05 14:51:49
      2850 28-JUN-05 14:51:48
      2975 28-JUN-05 14:51:46

15 rows selected.

SQL> There should be 13 rows returned based on apparent duplicates in the output. Using trunc() on the date column returns the desired results:

SQL> select sal, trunc(process_dt) process_dt   2 from emptest
  3 group by sal, trunc(process_dt)
  4 /

SAL PROCESS_DT

---------- ------------------
       800 28-JUN-05 00:00:00
      1100 28-JUN-05 00:00:00
      1200 28-JUN-05 00:00:00
      1300 28-JUN-05 00:00:00
      1500 28-JUN-05 00:00:00
      1600 28-JUN-05 00:00:00
      3000 28-JUN-05 00:00:00
      5000 28-JUN-05 00:00:00
       950 28-JUN-05 00:00:00
      1250 28-JUN-05 00:00:00
      2450 28-JUN-05 00:00:00

       SAL PROCESS_DT
---------- ------------------
      2850 28-JUN-05 00:00:00
      2975 28-JUN-05 00:00:00

13 rows selected.

SQL> Note the records now have a truncated time component, and, as such, eliminate the duplicate entries for the 1500 and 3000 sal values. Now a group by with an aggregate will return the expected results:

SQL> select trunc(process_dt) process_dt, max(sal) 2 from emptest
3 group by trunc(process_dt);

PROCESS_DT MAX(SAL)

------------------ ----------
28-JUN-05 00:00:00       5000

SQL> Possibly this will help the OP.

David Fitzjarrell Received on Tue Jun 28 2005 - 11:25:15 CDT