Home » SQL & PL/SQL » SQL & PL/SQL » rownum > 5 and count(*) (oracle 11g)
rownum > 5 and count(*) [message #569805] 
Thu, 01 November 2012 17:17 
gentleman777us
Messages: 117 Registered: April 2005

Senior Member 


Hi,
I need a query to display the office and count(*)
where rownum > 6 i.e
here is the data
office

hq
hq
hq
hq
hq
hq1
hq1
hq1
hq1
hq2
hq2
hq2
hq3
hq3
hq4
hq5
hq6
hq7
hq8
hq8
hq9
hq9
hq10
hq10
I need a query to display the top 3 counts and the 4 row should be sum of count of other offices.
my result should look like this
office count(*)
 
hq 5
hq1 4
hq2 3
other 10
Thanks





Re: rownum > 5 and count(*) [message #569823 is a reply to message #569810] 
Fri, 02 November 2012 01:35 
rajesh4851
Messages: 85 Registered: January 2007

Member 


Hi,
Here is your query :
create table office (v1 varchar2(100));
INSERT INTO office values ('hq');
INSERT INTO office values ('hq');
INSERT INTO office values ('hq');
INSERT INTO office values ('hq');
INSERT INTO office values ('hq');
INSERT INTO office values ('hq1');
INSERT INTO office values ('hq1');
INSERT INTO office values ('hq1');
INSERT INTO office values ('hq1');
INSERT INTO office values ('hq2');
INSERT INTO office values ('hq2');
INSERT INTO office values ('hq2');
INSERT INTO office values ('hq3');
INSERT INTO office values ('hq3');
INSERT INTO office values ('hq4');
INSERT INTO office values ('hq5');
INSERT INTO office values ('hq6');
INSERT INTO office values ('hq7');
INSERT INTO office values ('hq8');
INSERT INTO office values ('hq8');
INSERT INTO office values ('hq9');
INSERT INTO office values ('hq9');
INSERT INTO office values ('hq10');
INSERT INTO office values ('hq10');
commit;
select v1, cnt from
(select rownum rn, v1, cnt from (
select distinct v1, count(v1) over (partition by v1) cnt
from office
order by 2 desc) )
where rn <= 3
union
select 'others', sum(cnt) from
(select rownum rn, v1, cnt from (
select distinct v1, count(v1) over (partition by v1) cnt
from office
order by 2 desc) )
where rn > 3;
I think, you may need to optimize it incase of huge data.
Regards,
Rajesh



Re: rownum > 5 and count(*) [message #569858 is a reply to message #569823] 
Fri, 02 November 2012 09:30 

Kevin Meade
Messages: 1967 Registered: December 1999 Location: Connecticut USA

Senior Member 


OK, well I was trying to avoid just giving an answer. OP, please pay attention. Rajesh has done your job for you. By this I do not refer to him giving an answer, but rather that he has provided a test case so that others can work on the problem. This simple table and inserts is what BlackSwan was referring to when he noted you have not been following guidelines. Next time please provide a working case so that others do not have to.
Since we are giving answers now, here is another one.
with

 add up the data into a simple set of groups so we can get start with the basic row counts

simple_grouping as (
select v1,count(*) rowcount
from office
group by v1
)

 use analytics to order the rows based on their row counts this giving them a position in an ordered list

, determine_ordered_position as (
select v1,rowcount,row_number () over (order by rowcount desc) ordered_position
from simple_grouping
)

 since we want a CATCH ALL bucket after some certain number of top items
 use logic to determine which rows go into this OTHER bucket and tag them as such

, determine_final_buckets as (
select case when ordered_position <= 3 then v1 else 'Other' end v1
, rowcount
, ordered_position
, case when ordered_position <= 3 then ordered_position else 4 end final_bucket
from determine_ordered_position
)

 do the final grouping to group rows that belong in OTHER into an actual OTHER bucket

, final_grouping as (
select v1,sum(rowcount) rowcount
from determine_final_buckets
group by v1
)
select *
from final_grouping
order by rowcount desc
/
There are a couple of things I want to point out.
1) notice I used CODE HTML formatting in my post so that it is easy to read. This is another thing we expect of people when they are asking us questions.
2) I have used the WITH clause to show the logic of data transformations to your solution. With it you can follow along inside the query very easily. Oracle will rewrite the query anyway so we might as well make the query as easy for us to read as possible and the WITH clause goes a long way to doing this.
3) notice my use of liberal commentary.
4) one of the best things about WITH is that you can run the pieces to see how data is transformed by each step.
The following queries will yield the following data transforms:
with

 add up the data into a simple set of groups so we can get start with the basic row counts

simple_grouping as (
select v1,count(*) rowcount
from office
group by v1
)
select * from simple_grouping
/
with

 add up the data into a simple set of groups so we can get start with the basic row counts

simple_grouping as (
select v1,count(*) rowcount
from office
group by v1
)

 use analytics to order the rows based on their row counts this giving them a position in an ordered list

, determine_ordered_position as (
select v1,rowcount,row_number () over (order by rowcount desc) ordered_position
from simple_grouping
)
select * from determine_ordered_position
/
with

 add up the data into a simple set of groups so we can get start with the basic row counts

simple_grouping as (
select v1,count(*) rowcount
from office
group by v1
)

 use analytics to order the rows based on their row counts this giving them a position in an ordered list

, determine_ordered_position as (
select v1,rowcount,row_number () over (order by rowcount desc) ordered_position
from simple_grouping
)

 since we want a CATCH ALL bucket after some certain number of top items
 use logic to determine which rows go into this OTHER bucket and tag them as such

, determine_final_buckets as (
select case when ordered_position <= 3 then v1 else 'Other' end v1
, rowcount
, ordered_position
, case when ordered_position <= 3 then ordered_position else 4 end final_bucket
from determine_ordered_position
)
select * from determine_final_buckets
/
with

 add up the data into a simple set of groups so we can get start with the basic row counts

simple_grouping as (
select v1,count(*) rowcount
from office
group by v1
)

 use analytics to order the rows based on their row counts this giving them a position in an ordered list

, determine_ordered_position as (
select v1,rowcount,row_number () over (order by rowcount desc) ordered_position
from simple_grouping
)

 since we want a CATCH ALL bucket after some certain number of top items
 use logic to determine which rows go into this OTHER bucket and tag them as such

, determine_final_buckets as (
select case when ordered_position <= 3 then v1 else 'Other' end v1
, rowcount
, ordered_position
, case when ordered_position <= 3 then ordered_position else 4 end final_bucket
from determine_ordered_position
)

 do the final grouping to group rows that belong in OTHER into an actual OTHER bucket

, final_grouping as (
select v1,sum(rowcount) rowcount
from determine_final_buckets
group by v1
)
select *
from final_grouping
order by rowcount desc
/
V1 ROWCOUNT
 
hq5 1
hq8 2
hq4 1
hq7 1
hq2 3
hq9 2
hq 5
hq6 1
hq3 2
hq1 4
hq10 2
11 rows selected.
V1 ROWCOUNT ORDERED_POSITION
  
hq 5 1
hq1 4 2
hq2 3 3
hq8 2 4
hq9 2 5
hq3 2 6
hq10 2 7
hq6 1 8
hq4 1 9
hq5 1 10
hq7 1 11
11 rows selected.
V1 ROWCOUNT ORDERED_POSITION FINAL_BUCKET
   
hq 5 1 1
hq1 4 2 2
hq2 3 3 3
Other 2 4 4
Other 2 5 4
Other 2 6 4
Other 2 7 4
Other 1 8 4
Other 1 9 4
Other 1 10 4
Other 1 11 4
11 rows selected.
V1 ROWCOUNT
 
Other 12
hq 5
hq1 4
hq2 3
4 rows selected.
5) lastly I would also point out that this question like many topn questions is a bit ambiguous. For example, the data provided is rather neat in that there are few overlaps in the final result set. But what happens when the data changes and there are in fact multiple final rows with the same rowcount. For example, let us add a few more rows to the data and see how the answer changes. Adding these two rows and rerunning the query yields this answer:
INSERT INTO office values ('hq10');
INSERT INTO office values ('hq10');
V1 ROWCOUNT
 
Other 13
hq 5
hq10 4
hq1 4
4 rows selected.
This answer is not wrong. But it does beg the question what happened to hq2 which has a count of three. After all, the highest three counts are 5/4/3. Didn't we want to see the highest three counts? Maybe or Maybe not. It is all in how the question is interpreted.
Did you want all rows with the highest three counts?
Or just some random set of rows that are from all rows with the highest three counts?
Or did you want all rows that fit into the highest set of grouped rows that yields at least three rows?
Or do you want a random set of rows from all rows that fit into the highest set of grouped rows that yields three rows?
If these questions sound a bit confusing that is because they are. So let me offer a second set of data for you to consider as a way to introduce what I am getting at. Given you want the TOP 3... what is the answer for this seemingly silly but valid data set?
delete from office;
INSERT INTO office values ('hq1');
INSERT INTO office values ('hq2');
INSERT INTO office values ('hq3');
INSERT INTO office values ('hq4');
commit;
For this data, I can think of at least nine (9) different answers that are all valid interpretations of the question TOP3. They are:
hq1 1
hq2 1
hq3 1
hq1 1
hq2 1
hq4 1
hq1 1
hq3 1
hq4 1
hq2 1
hq3 1
hq4 1
hq1 1
hq2 1
hq3 1
Other 1
hq1 1
hq2 1
hq4 1
Other 1
hq1 1
hq3 1
hq4 1
Other 1
hq2 1
hq3 1
hq4 1
Other 1
hq1 1
hq2 1
hq3 1
hq4 1
Of these nine, what is your expected answer? I don't really want you to answer that question for us. I just want you to realize how ambiguous TOPN queries can be without additional clarifications provided. This is why they often make end users mad. They ask for a TOPN report and then when they get it they eventually find it is has some quirk in it they did not expect.
As you can see, the queries provided as "answers" to your question make assumptions about what they think your question is. Indeed, they are not looking so good either in light of the possibilities. When I ran my super duper WITH query against these four rows, it spit out this answer (#7 or the nine above):
V1 ROWCOUNT
 
hq4 1
hq1 1
hq3 1
Other 1
4 rows selected.
But if you are following along you see how this is a somewhat RANDOM result.
To add more sauce, I can imagine how the answer might actually change with the same data and the same query given various factors.
Here is one example to consider. Suppose the data we are faced with is very large; large enough that Oracle decides to use PARALLEL QUERY. In the face of PARALLEL ADAPTIVE MULTIUSER or the more recent AUTODOP technologies of Oracle, the database might decide to use different DOP given varrying workloads on the box. That would in turn present different row sorting and partitioning happening during query execution and consequently maybe a different final answer. Chew on that. Same data, Same query, different answer at different times of the day... hehe.
Other possbilities? Maybe this is part of a batch system and you trunc/load work tables with this data and in the process the same data gets inserted in a different order. Maybe your DBA converts the table to an IOT? Maybe the DBA turns off PARALLEL QUERY for your instance because a study showed that it was counterproductive. Or maybe the DBA turned it on? Mabye the DBA allocated more memory for hash and sort operations to increase the number of OPTIMUM and ONEPASS operations? Lots of things can happen and given the nature of results that depend or row order (yikes).
Here is one:
drop table office;
create table office (v1 varchar2(10)) partition by hash (v1) PARTITIONS 3;
INSERT INTO office values ('hq4');
INSERT INTO office values ('hq3');
INSERT INTO office values ('hq2');
INSERT INTO office values ('hq1');
commit;
V1 ROWCOUNT
 
hq4 1
hq1 1
Other 1
hq2 1
4 rows selected.
WoohOo! Same data, same query, different answer. Got to love it.
Good luck man. Kevin
[Updated on: Fri, 02 November 2012 09:33] Report message to a moderator



Goto Forum:
Current Time: Wed Dec 17 17:21:24 CST 2014
Total time taken to generate the page: 0.08482 seconds
