Home » SQL & PL/SQL » SQL & PL/SQL » Top-N query
Top-N query [message #299631] Tue, 12 February 2008 15:20 Go to next message
TLegend33
Messages: 203
Registered: March 2005
Senior Member
I have the table id_by_zip:
SQL> desc id_by_zip
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                                 NUMBER
 ZIPCODE                                            VARCHAR2(50)
 COUNT                                              NUMBER

Below is a sample set:
SQL> select * from id_by_zip where rownum < 10;

        ID ZIPCODE                                                 COUNT
---------- -------------------------------------------------- ----------
         1 00001                                                      27
         1 00002                                                       4
         1 00004                                                       1
         1 00005                                                       3
         1 00007                                                       3
         1 00009                                                       3
         1 00010                                                       3
         1 00011                                                       3
         1 00012                                                       4


There are multiple id's from 1 to 400. The count column displays how many times that id was used by each zip code. I'd like to find the max(count) of each id by zip code.
I can execute the below query to find the id and it's corresponding max(count):
select id, max(count) from id_by_zip
group by id
order by 2;

....
        ID MAX(COUNT)
---------- ----------
      1604       1172
      1033       1261
       520       1369
      1555       1689
       855       1701
      1453       2301
       179       2972
      1281       3159
       186       3215
      1402       3500
         1       4440

        ID MAX(COUNT)
---------- ----------
      1275       4987
      1349       6959
      1262      13240
       906      22246

400 rows selected.


However, I can't seem to get the appropriate results. Below is a sample of what the result set should look like:
        ID ZIPCODE                                                 COUNT
---------- -------------------------------------------------- ----------
       906 12345                                                   22246
      1262 12345                                                   13240
      1414 90210                                                     149


Up to 400 rows. How can I query the top zip codes by id?
Thanks.
icon2.gif  Re: Top-N query [message #299637 is a reply to message #299631] Tue, 12 February 2008 16:06 Go to previous messageGo to next message
coleing
Messages: 213
Registered: February 2008
Senior Member
This code gives you the top 10 by id and zip code, descending.

select *
from
(
  select id, zip_code, max(count) 
  from id_by_zip
  group by id, zip_code
  order by 3 desc
)
where rownum <= 10
Re: Top-N query [message #299639 is a reply to message #299631] Tue, 12 February 2008 16:13 Go to previous messageGo to next message
TLegend33
Messages: 203
Registered: March 2005
Senior Member
Unfortunately, I attempted this format and it did not produce the desired result. As you can see, the id of 1 is repeated. I only want a single row for each id that has the highest count.

SQL> select *
from
(
  select id, zipcode, max(count) 
  from id_by_zip
  group by id, zipcode
  order by 3 desc
)
where rownum <= 10;  2    3    4    5    6    7    8    9  

        ID ZIPCODE                                            MAX(COUNT)
---------- -------------------------------------------------- ----------
       906 12345                                                   22246
      1262 12345                                                   13240
      1349 12345                                                    6959
      1275 12345                                                    4987
      1262 90210                                                    4713
         1 12345                                                    4440
         1 90210                                                    4343
      1402 12345                                                    3500
       186 12345                                                    3215
      1281 12345                                                    3159

10 rows selected.
Re: Top-N query [message #299647 is a reply to message #299631] Tue, 12 February 2008 18:20 Go to previous messageGo to next message
moshea
Messages: 51
Registered: February 2008
Location: Dublin, Ireland
Member
I think IN might be your friend here.

I'm not sure the examples there will help you make the entire leap, so consider this snippet as well ...

SELECT a,
  b,
  c
FROM atable
WHERE(a,   b) IN
  (SELECT x,
     y
   FROM anothertable)


I'm rubbish at this hints game Sad

Michael
Re: Top-N query [message #299728 is a reply to message #299639] Wed, 13 February 2008 00:52 Go to previous messageGo to next message
Michel Cadot
Messages: 64132
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Quote:
As you can see, the id of 1 is repeated. I only want a single row for each id that has the highest count.

So enclose the inner query inside a query that take the max of max(count) by id.

Regards
Michel
Re: Top-N query [message #299904 is a reply to message #299631] Wed, 13 February 2008 10:30 Go to previous messageGo to next message
TLegend33
Messages: 203
Registered: March 2005
Senior Member
I don't understand your suggestion. Below is how I interpreted it:

SQL> select *
from
(select id, zipcode, max(count)
from
(
  select id, zipcode, max(count) 
  from id_by_zip
  group by id, zipcode
  order by 3 desc
))
where rownum <= 10;  
(select id, zipcode, max(count)
                         *
ERROR at line 3:
ORA-00904: "COUNT": invalid identifier

I'm not sure how I can create a query that takes only the max(count) by id.
Re: Top-N query [message #299907 is a reply to message #299904] Wed, 13 February 2008 10:38 Go to previous messageGo to next message
Michel Cadot
Messages: 64132
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
I gave you the way, I didn't give you the syntax.

There are 2 levels of aggregation, first by id and zipcode then by id alone.

Regards
Michel

[Updated on: Wed, 13 February 2008 10:39]

Report message to a moderator

Re: Top-N query [message #299938 is a reply to message #299907] Wed, 13 February 2008 15:41 Go to previous messageGo to next message
TLegend33
Messages: 203
Registered: March 2005
Senior Member
How do I achieve these 2 levels of aggregation? I've tried a variety of in-clauses, unions, cubes & rollups all to no avail. Is there an example out that I can review or further documentation that I may peruse?
This really has me spinning in circles.


Thanks for your help.
Re: Top-N query [message #300018 is a reply to message #299938] Thu, 14 February 2008 00:52 Go to previous messageGo to next message
Michel Cadot
Messages: 64132
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
The first level is an inner query, a row source, for the second level.

Regards
Michel

[Updated on: Thu, 14 February 2008 00:52]

Report message to a moderator

Re: Top-N query [message #300044 is a reply to message #299631] Thu, 14 February 2008 01:38 Go to previous messageGo to next message
coleing
Messages: 213
Registered: February 2008
Senior Member
select * 
from
(
  select id, zip_code, knt, count(1) as id_knt
  from 
  (
    select id, zip_code, max(count) as knt
    from id_by_zip
    group by id, zip_code
  )
  group by is, zip_code, knt
  order by 3 desc
)
where rownum <= 10;
Re: Top-N query [message #300183 is a reply to message #300044] Thu, 14 February 2008 08:58 Go to previous messageGo to next message
TLegend33
Messages: 203
Registered: March 2005
Senior Member
Unfortunately, this query still produces the same result. I believe the issue is with the inner query in that it does not produce single rows for each id. Rather, it produces the max(count) for each id & zipcode. I need the zipcode and max(count) for each id. I really appreciate everyone's help as I work through this.

SQL> select * 
from
(
  select id, zipcode, knt, count(1) as id_knt
  from 
  (
    select id, zipcode, max(count) as knt
    from id_by_zip
    group by id, zipcode
  )
  group by id, zipcode, knt
  order by 3 desc
)
where rownum <= 10;

        ID ZIPCODE                                                   KNT     ID_KNT
---------- -------------------------------------------------- ---------- ----------
       906 12345                                                   22246          1
      1262 12345                                                   13240          1
      1349 12345                                                    6959          1
      1275 12345                                                    4987          1
      1262 90210                                                    4713          1
         1 12345                                                    4440          1
         1 90210                                                    4343          1
      1402 12345                                                    3500          1
       186 12345                                                    3215          1
      1281 12345                                                    3159          1

10 rows selected.
Re: Top-N query [message #300185 is a reply to message #300183] Thu, 14 February 2008 09:02 Go to previous messageGo to next message
Michel Cadot
Messages: 64132
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
What is the meaning of this "count(1)"?
Don't try to use other queries, think by yourself.
You are closed.

See your 2 "group by" clauses, what did I say about them?

Regards
Michel

Re: Top-N query [message #300200 is a reply to message #300185] Thu, 14 February 2008 10:29 Go to previous messageGo to next message
TLegend33
Messages: 203
Registered: March 2005
Senior Member
Well, I was responding to coeling to demonstrate that unfortunately his suggestion did not produce the desired results. Since he was making syntax suggestions, I assumed he'd like to know the results for future reference.

Here are your quotes:
"So enclose the inner query inside a query that take the max of max(count) by id."

"I gave you the way, I didn't give you the syntax.
There are 2 levels of aggregation, first by id and zip code then by id alone."

"The first level is an inner query, a row source, for the second level."


It appears I do not understand your hints. Below is an attempt as to what your recommendations are:
select * 
from
(
  select max(id) id, zipcode, count
  from 
  (
    select id, zipcode, max(count) count
    from id_by_zip
    group by id, zipcode
  )
  group by zipcode, count
  order by 3 desc
)
where rownum <= 10;


        ID ZIPCODE                                                 COUNT
---------- -------------------------------------------------- ----------
       906 12345                                                   22246
      1262 12345                                                   13240
      1349 12345                                                    6959
      1275 12345                                                    4987
      1262 90210                                                    4713
         1 12345                                                    4440
         1 90210                                                    4343
      1402 12345                                                    3500
       186 12345                                                    3215
      1281 12345                                                    3159

10 rows selected.


I created an outer query that takes the max(id) from the inner query. Maybe I'm getting stuck on your suggestion of "max of max(count) by id"?
According to your aggregation recommendations, I've first grouped by id and zip code, but how can I group by id alone and still also include the zip code and count in the outer query?

Thanks.
Re: Top-N query [message #300201 is a reply to message #300200] Thu, 14 February 2008 10:33 Go to previous messageGo to next message
Michel Cadot
Messages: 64132
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Don't call "count" a column/field, it is a reserved word.

As far I as I understood your question, this is not the max id you want but the max "maxcount" for each id.

Regards
Michel
Re: Top-N query [message #300212 is a reply to message #300201] Thu, 14 February 2008 10:54 Go to previous messageGo to next message
TLegend33
Messages: 203
Registered: March 2005
Senior Member
Yeah, I changed the 'count' column name to 'zipcount' halfway through the post. I just retained the name 'count' for continuity's sake. I didn't want to cause additional confusion.

What I want is the top zip codes by id. Since I have 400 distinct id's, all I need returned is those 400 id's and which zipcode has the highest 'zipcount'.
Re: Top-N query [message #300221 is a reply to message #300212] Thu, 14 February 2008 11:19 Go to previous messageGo to next message
Michel Cadot
Messages: 64132
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
So this what I thought and so you can follow what I said.

Regards
Michel
Re: Top-N query [message #300229 is a reply to message #300221] Thu, 14 February 2008 11:43 Go to previous messageGo to next message
TLegend33
Messages: 203
Registered: March 2005
Senior Member
Well, I guess I just don't understand your suggestions. On the query that I posted, am I even close? Is it evident what point you're making that I'm not picking up on?
I appreciate your effort at letting people figure things out and I appreciate your assistance with this confusion that I'm experiencing.
SQL> select * 
from
(
  select id, zipcode, max(zipcount) zipcount
  from 
  (
    select id, zipcode, max(zipcount) zipcount
    from id_by_zip
    group by id, zipcode
  )
  group by id, zipcode
  order by 3 desc
)
where rownum <= 10;  

        ID ZIPCODE                                              ZIPCOUNT
---------- -------------------------------------------------- ----------
       906 12345                                                   22246
      1262 12345                                                   13240
      1349 12345                                                    6959
      1275 12345                                                    4987
      1262 90210                                                    4713
         1 12345                                                    4440
         1 90210                                                    4343
      1402 12345                                                    3500
       186 12345                                                    3215
      1281 12345                                                    3159

10 rows selected.
Re: Top-N query [message #300232 is a reply to message #300229] Thu, 14 February 2008 11:53 Go to previous messageGo to next message
Michel Cadot
Messages: 64132
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
The outer group by must not have the same columns than the inner one, otherwise you get the same result.

You said it yourself:
Quote:
What I want is the top zip codes by id.


Regards
Michel
Re: Top-N query [message #300266 is a reply to message #299631] Thu, 14 February 2008 15:47 Go to previous messageGo to next message
coleing
Messages: 213
Registered: February 2008
Senior Member
My query was way out there, not sure what I was thinking at the time of writing to be honest. so best to ignore it.

Ok, the principal is that you need 2 steps to get to the answer.

    select id, zipcode, max(zipcount) zipcount
    from id_by_zip
    group by id, zipcode


Then from this set, you need 1 entry for each id, having the max(zipcount), and the corresponding zipcode that relates to. We can add a partitioned row number on ID, and get 1 row per id, having the max_zipcount at the start of the rowcount.


  select id, 
         zip_code, 
         max_zipcount, 
         row_number() over (partition by id, order by max_zipcount desc) as rn
  from
    (
      select id, zipcode, max(zipcount) max_zipcount
      from id_by_zip
      group by id, zipcode
    )


Finally, you can restrict the rowset to only rn = 1 (ie the first row in each ID partition). And order the full resultset by max_zipcount desc.

select * from
(
  select id, 
         zip_code, 
         max_zipcount, 
         row_number() over (partition by id, order by max_zipcount desc) as rn
  from
    (
      select id, zipcode, max(zipcount) max_zipcount
      from id_by_zip
      group by id, zipcode
    )
)
where rn = 1
order by 3 desc
Re: Top-N query [message #300300 is a reply to message #300266] Thu, 14 February 2008 23:13 Go to previous messageGo to next message
Michel Cadot
Messages: 64132
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
@coleing

STOP posting solution, follow OraFAQ Forum Guide, "Responding to Posts" section:
Quote:
When responding to questions, if it is obviously a student with a homework assignment or someone else just learning, especially in the homework and newbies forums, it is usually best to provide hints or clues, perhaps links to relevant portions of the documentation, or a similar example, to point them in the right direction so that they will research and experiment on their own and learn, rather than providing complete solutions to problems. In cases where someone has a real-life complex work problem, or at least it may seem complex to them, it may be best to provide a complete demo and explanation.


Regards
Michel
Re: Top-N query [message #300438 is a reply to message #300300] Fri, 15 February 2008 07:17 Go to previous messageGo to next message
TLegend33
Messages: 203
Registered: March 2005
Senior Member
That's interesting that you think this is a homework problem. This is actually an issue I'm dealing with at work. I'm still not really sure what you were driving towards with your hints. However, I certainly do appreciate everyone's assistance with this problem I've been dealing with.
Thanks.
Re: Top-N query [message #300443 is a reply to message #300266] Fri, 15 February 2008 07:29 Go to previous message
TLegend33
Messages: 203
Registered: March 2005
Senior Member
coeling,
Thanks very much for your assistance. I've been banging my head on my desk attempting to get my mind wrapped around this issue.
I appreciate it.
Thanks.
Previous Topic: help to search text in package
Next Topic: string function or utility help
Goto Forum:
  


Current Time: Wed Dec 07 16:17:23 CST 2016

Total time taken to generate the page: 0.06850 seconds