Home » SQL & PL/SQL » SQL & PL/SQL » summary count question
summary count question [message #189237] Wed, 23 August 2006 15:18 Go to next message
cumin
Messages: 82
Registered: August 2005
Member
I am trying to summarize some data with COUNT, but I am a little unsure how to go about this as I am wanting to summarize over more than one field. Each name in my table ('Ford','Chevy','VW') is associated with two values (color and length). I am trying to retrieve a summary showing, for each name, the number of unique colors (unshared by any other name), total distinct colors, unique (unshared) lengths, and total distinct lengths. For this test data:
drop table t1;
create table t1(
   name varchar2(10),
   color varchar2(10),
   length varchar2(10)
);
insert into t1 values('Ford','red','long');
insert into t1 values('Ford','blue','short');
insert into t1 values('Ford','green','medium');
insert into t1 values('Chevy','red','medium');
insert into t1 values('Chevy','yellow','huge');
insert into t1 values('VW','red','medium');
insert into t1 values('VW','green','big');

select * from t1;

NAME       COLOR      LENGTH
---------- ---------- ----------
Ford       red        long
Ford       blue       short
Ford       green      medium
Chevy      red        medium
Chevy      yellow     huge
VW         red        medium
VW         green      big

I would like to see this result:
NAME       UNQ_COLOR    TOT_COLOR   UNQ_LENGTH   TOT_LENGTH
---------- ---------- ---------   ----------   ----------
Ford          1            3         2             3
Chevy         1            2         1             1
VW            0            2         1             2

Re: summary count question [message #189240 is a reply to message #189237] Wed, 23 August 2006 15:33 Go to previous messageGo to next message
Todd Barry
Messages: 4819
Registered: August 2001
Senior Member
One way (note the tot_length = 2, medium/huge, for Chevy which is different than your expected result):

sql>select name,
  2         (select count(distinct color)
  3            from t1 b
  4           where b.name = a.name
  5             and b.color in (select color
  6                               from t1 c
  7                              group by color
  8                              having count(*) = 1)) unq_color,
  9         count(distinct color) tot_color,
 10         (select count(distinct length)
 11            from t1 b
 12           where b.name = a.name
 13             and b.length in (select length
 14                                from t1 c
 15                               group by length
 16                               having count(*) = 1)) unq_length,
 17         count(distinct length) tot_length
 18    from t1 a
 19   group by name;

NAME       UNQ_COLOR TOT_COLOR UNQ_LENGTH TOT_LENGTH
---------- --------- --------- ---------- ----------
Chevy              1         2          1          2
Ford               1         3          2          3
VW                 0         2          1          2

3 rows selected.
Re: summary count question [message #189243 is a reply to message #189240] Wed, 23 August 2006 16:00 Go to previous messageGo to next message
cumin
Messages: 82
Registered: August 2005
Member
Thank you, and thanks for spotting the error in my expected results.

In my real data, I have about 120 "names", maybe two million rows; with each "name" having a few thousand to many thousand uniques and totals. It looks like I need as many table aliases as attributes or characteristics of the names ("color" and "length" in this case). There are just two things I am counting in my real tables too.

The join across the three table aliases (assuming I have indexes on the appropriate fields) seems like it would be relatively efficient; I can't think of any other way to do it faster. My other thought was to loop over the names in PL-SQL, but I'm not sure that buys me anything, and I would have to do some review of PL-SQL.

Please comment if anything I said above is incorrect or leading me down the wrong path. I am cautious about setting the query loose before being pretty sure it is relatively efficient.

Thank you again for your help.
Re: summary count question [message #189245 is a reply to message #189243] Wed, 23 August 2006 16:09 Go to previous messageGo to next message
Todd Barry
Messages: 4819
Registered: August 2001
Senior Member
With that data size, I would recommend subquery factoring to materialize those list of color and lengths that appear only once because you don't want to build that list over and over:

with 
  single_color as
  (select color
     from t1
    group by color
   having count(*) = 1),
  single_length as
  (select length
     from t1
    group by length
    having count(*) = 1) 
select name,
       (select count(distinct color)
          from t1 b
         where b.name = a.name
           and b.color in (select * from single_color)) unq_color,
       count(distinct color) tot_color,
       (select count(distinct length)
          from t1 b
         where b.name = a.name
           and b.length in (select * from single_length)) unq_length,
       count(distinct length) tot_length
  from t1 a
 group by name;
Re: summary count question [message #189463 is a reply to message #189237] Thu, 24 August 2006 13:00 Go to previous message
cumin
Messages: 82
Registered: August 2005
Member
Thank you; I think that will work.

edit: Thanks a lot.

[Updated on: Thu, 24 August 2006 13:14]

Report message to a moderator

Previous Topic: Question on checking for duplicate values in a PL/SQL procedure
Next Topic: Call a Procedure from a query
Goto Forum:
  


Current Time: Sat Dec 10 08:52:09 CST 2016

Total time taken to generate the page: 0.07944 seconds