Home » SQL & PL/SQL » SQL & PL/SQL » rownum > 5 and count(*) (oracle 11g)
rownum > 5 and count(*) [message #569805] Thu, 01 November 2012 17:17 Go to next message
gentleman777us
Messages: 117
Registered: April 2005
Senior Member
Hi,


I need a query to display the office and count(*)
where rownum > 6 i.e

here is the data

office
------
hq
hq
hq
hq
hq
hq1
hq1
hq1
hq1
hq2
hq2
hq2
hq3
hq3
hq4
hq5
hq6
hq7
hq8
hq8
hq9
hq9
hq10
hq10

I need a query to display the top 3 counts and the 4 row should be sum of count of other offices.

my result should look like this

office count(*)
--------- --------
hq 5
hq1 4
hq2 3
other 10

Thanks
Re: rownum > 5 and count(*) [message #569806 is a reply to message #569805] Thu, 01 November 2012 17:42 Go to previous messageGo to next message
BlackSwan
Messages: 22495
Registered: January 2009
Senior Member
It appears that you REFUSE to comply Posting Guidelines; so as far as I am concerned You're On Your Own (YOYO)!

BlackSwan wrote on Fri, 04 May 2012 12:22
Please read and follow the forum guidelines, to enable us to help you:

http://www.orafaq.com/forum/t/88153/0/

BlackSwan wrote on Thu, 19 July 2012 13:56
>

Please read and follow the forum guidelines, to enable us to help you:

http://www.orafaq.com/forum/t/88153/0/

BlackSwan wrote on Fri, 20 April 2012 15:04
Please read and follow the forum guidelines, to enable us to help you:

http://www.orafaq.com/forum/t/88153/0/

Re: rownum > 5 and count(*) [message #569810 is a reply to message #569805] Thu, 01 November 2012 20:36 Go to previous messageGo to next message
Kevin Meade
Messages: 1925
Registered: December 1999
Location: Connecticut USA
Senior Member
read up on Oracle Analytics. You can easily solve this problem with them.

Kevin
Re: rownum > 5 and count(*) [message #569823 is a reply to message #569810] Fri, 02 November 2012 01:35 Go to previous messageGo to next message
rajesh4851
Messages: 85
Registered: January 2007
Member
Hi,
Here is your query :

create table office (v1 varchar2(100));

INSERT INTO office values ('hq');
INSERT INTO office values ('hq');
INSERT INTO office values ('hq');
INSERT INTO office values ('hq');
INSERT INTO office values ('hq');
INSERT INTO office values ('hq1');
INSERT INTO office values ('hq1');
INSERT INTO office values ('hq1');
INSERT INTO office values ('hq1');
INSERT INTO office values ('hq2');
INSERT INTO office values ('hq2');
INSERT INTO office values ('hq2');
INSERT INTO office values ('hq3');
INSERT INTO office values ('hq3');
INSERT INTO office values ('hq4');
INSERT INTO office values ('hq5');
INSERT INTO office values ('hq6');
INSERT INTO office values ('hq7');
INSERT INTO office values ('hq8');
INSERT INTO office values ('hq8');
INSERT INTO office values ('hq9');
INSERT INTO office values ('hq9');
INSERT INTO office values ('hq10');
INSERT INTO office values ('hq10');

commit;

select v1, cnt from
(select rownum rn, v1, cnt from (
select  distinct v1, count(v1) over (partition by v1) cnt
from office
order by 2 desc) )
where rn <= 3
union
select 'others', sum(cnt)  from
(select rownum rn, v1, cnt from (
select  distinct v1, count(v1) over (partition by v1) cnt
from office
order by 2 desc) )
where rn > 3;


I think, you may need to optimize it incase of huge data.

Regards,
Rajesh
Re: rownum > 5 and count(*) [message #569858 is a reply to message #569823] Fri, 02 November 2012 09:30 Go to previous message
Kevin Meade
Messages: 1925
Registered: December 1999
Location: Connecticut USA
Senior Member
OK, well I was trying to avoid just giving an answer. OP, please pay attention. Rajesh has done your job for you. By this I do not refer to him giving an answer, but rather that he has provided a test case so that others can work on the problem. This simple table and inserts is what BlackSwan was referring to when he noted you have not been following guidelines. Next time please provide a working case so that others do not have to.

Since we are giving answers now, here is another one.

with
--
-- add up the data into a simple set of groups so we can get start with the basic row counts
--
      simple_grouping as (
                           select v1,count(*) rowcount
                           from office
                           group by v1
                         )
--
-- use analytics to order the rows based on their row counts this giving them a position in an ordered list
--
    , determine_ordered_position as (
                                      select v1,rowcount,row_number () over (order by rowcount desc) ordered_position
                                      from simple_grouping
                                    )
--
-- since we want a CATCH ALL bucket after some certain number of top items
-- use logic to determine which rows go into this OTHER bucket and tag them as such
--
    , determine_final_buckets as (
                                   select  case when ordered_position <= 3 then v1 else 'Other' end v1
                                         , rowcount
                                         , ordered_position
                                         , case when ordered_position <= 3 then ordered_position else 4 end final_bucket
                                   from determine_ordered_position
                                 )
--
-- do the final grouping to group rows that belong in OTHER into an actual OTHER bucket
--
    , final_grouping as (
                          select v1,sum(rowcount) rowcount
                          from determine_final_buckets
                          group by v1
                        )
select *
from final_grouping
order by rowcount desc
/


There are a couple of things I want to point out.

1) notice I used CODE HTML formatting in my post so that it is easy to read. This is another thing we expect of people when they are asking us questions.

2) I have used the WITH clause to show the logic of data transformations to your solution. With it you can follow along inside the query very easily. Oracle will rewrite the query anyway so we might as well make the query as easy for us to read as possible and the WITH clause goes a long way to doing this.

3) notice my use of liberal commentary.

4) one of the best things about WITH is that you can run the pieces to see how data is transformed by each step.

The following queries will yield the following data transforms:

with
--
-- add up the data into a simple set of groups so we can get start with the basic row counts
--
      simple_grouping as (
                           select v1,count(*) rowcount
                           from office
                           group by v1
                         )
select * from simple_grouping
/

with
--
-- add up the data into a simple set of groups so we can get start with the basic row counts
--
      simple_grouping as (
                           select v1,count(*) rowcount
                           from office
                           group by v1
                         )
--
-- use analytics to order the rows based on their row counts this giving them a position in an ordered list
--
    , determine_ordered_position as (
                                      select v1,rowcount,row_number () over (order by rowcount desc) ordered_position
                                      from simple_grouping
                                    )
select * from determine_ordered_position 
/

with
--
-- add up the data into a simple set of groups so we can get start with the basic row counts
--
      simple_grouping as (
                           select v1,count(*) rowcount
                           from office
                           group by v1
                         )
--
-- use analytics to order the rows based on their row counts this giving them a position in an ordered list
--
    , determine_ordered_position as (
                                      select v1,rowcount,row_number () over (order by rowcount desc) ordered_position
                                      from simple_grouping
                                    )
--
-- since we want a CATCH ALL bucket after some certain number of top items
-- use logic to determine which rows go into this OTHER bucket and tag them as such
--
    , determine_final_buckets as (
                                   select  case when ordered_position <= 3 then v1 else 'Other' end v1
                                         , rowcount
                                         , ordered_position
                                         , case when ordered_position <= 3 then ordered_position else 4 end final_bucket
                                   from determine_ordered_position
                                 )
select * from determine_final_buckets 
/

with
--
-- add up the data into a simple set of groups so we can get start with the basic row counts
--
      simple_grouping as (
                           select v1,count(*) rowcount
                           from office
                           group by v1
                         )
--
-- use analytics to order the rows based on their row counts this giving them a position in an ordered list
--
    , determine_ordered_position as (
                                      select v1,rowcount,row_number () over (order by rowcount desc) ordered_position
                                      from simple_grouping
                                    )
--
-- since we want a CATCH ALL bucket after some certain number of top items
-- use logic to determine which rows go into this OTHER bucket and tag them as such
--
    , determine_final_buckets as (
                                   select  case when ordered_position <= 3 then v1 else 'Other' end v1
                                         , rowcount
                                         , ordered_position
                                         , case when ordered_position <= 3 then ordered_position else 4 end final_bucket
                                   from determine_ordered_position
                                 )
--
-- do the final grouping to group rows that belong in OTHER into an actual OTHER bucket
--
    , final_grouping as (
                          select v1,sum(rowcount) rowcount
                          from determine_final_buckets
                          group by v1
                        )
select *
from final_grouping
order by rowcount desc
/

V1           ROWCOUNT
---------- ----------
hq5                 1
hq8                 2
hq4                 1
hq7                 1
hq2                 3
hq9                 2
hq                  5
hq6                 1
hq3                 2
hq1                 4
hq10                2

11 rows selected.

V1           ROWCOUNT ORDERED_POSITION
---------- ---------- ----------------
hq                  5                1
hq1                 4                2
hq2                 3                3
hq8                 2                4
hq9                 2                5
hq3                 2                6
hq10                2                7
hq6                 1                8
hq4                 1                9
hq5                 1               10
hq7                 1               11

11 rows selected.

V1           ROWCOUNT ORDERED_POSITION FINAL_BUCKET
---------- ---------- ---------------- ------------
hq                  5                1            1
hq1                 4                2            2
hq2                 3                3            3
Other               2                4            4
Other               2                5            4
Other               2                6            4
Other               2                7            4
Other               1                8            4
Other               1                9            4
Other               1               10            4
Other               1               11            4

11 rows selected.

V1           ROWCOUNT
---------- ----------
Other              12
hq                  5
hq1                 4
hq2                 3

4 rows selected.


5) lastly I would also point out that this question like many top-n questions is a bit ambiguous. For example, the data provided is rather neat in that there are few overlaps in the final result set. But what happens when the data changes and there are in fact multiple final rows with the same rowcount. For example, let us add a few more rows to the data and see how the answer changes. Adding these two rows and rerunning the query yields this answer:

INSERT INTO office values ('hq10');

INSERT INTO office values ('hq10');

V1           ROWCOUNT
---------- ----------
Other              13
hq                  5
hq10                4
hq1                 4

4 rows selected.

This answer is not wrong. But it does beg the question what happened to hq2 which has a count of three. After all, the highest three counts are 5/4/3. Didn't we want to see the highest three counts? Maybe or Maybe not. It is all in how the question is interpreted.

Did you want all rows with the highest three counts?
Or just some random set of rows that are from all rows with the highest three counts?
Or did you want all rows that fit into the highest set of grouped rows that yields at least three rows?
Or do you want a random set of rows from all rows that fit into the highest set of grouped rows that yields three rows?

If these questions sound a bit confusing that is because they are. So let me offer a second set of data for you to consider as a way to introduce what I am getting at. Given you want the TOP 3... what is the answer for this seemingly silly but valid data set?

delete from office;
INSERT INTO office values ('hq1');
INSERT INTO office values ('hq2');
INSERT INTO office values ('hq3');
INSERT INTO office values ('hq4');
commit;

For this data, I can think of at least nine (9) different answers that are all valid interpretations of the question TOP-3. They are:

hq1   1
hq2   1
hq3   1

hq1   1
hq2   1
hq4   1

hq1   1
hq3   1
hq4   1

hq2   1
hq3   1
hq4   1


hq1   1
hq2   1
hq3   1
Other 1

hq1   1
hq2   1
hq4   1
Other 1

hq1   1
hq3   1
hq4   1
Other 1

hq2   1
hq3   1
hq4   1
Other 1

hq1   1
hq2   1
hq3   1
hq4   1

Of these nine, what is your expected answer? I don't really want you to answer that question for us. I just want you to realize how ambiguous TOP-N queries can be without additional clarifications provided. This is why they often make end users mad. They ask for a TOP-N report and then when they get it they eventually find it is has some quirk in it they did not expect.

As you can see, the queries provided as "answers" to your question make assumptions about what they think your question is. Indeed, they are not looking so good either in light of the possibilities. When I ran my super duper WITH query against these four rows, it spit out this answer (#7 or the nine above):

V1           ROWCOUNT
---------- ----------
hq4                 1
hq1                 1
hq3                 1
Other               1

4 rows selected.

But if you are following along you see how this is a somewhat RANDOM result.

To add more sauce, I can imagine how the answer might actually change with the same data and the same query given various factors.

Here is one example to consider. Suppose the data we are faced with is very large; large enough that Oracle decides to use PARALLEL QUERY. In the face of PARALLEL ADAPTIVE MULTI-USER or the more recent AUTO-DOP technologies of Oracle, the database might decide to use different DOP given varrying workloads on the box. That would in turn present different row sorting and partitioning happening during query execution and consequently maybe a different final answer. Chew on that. Same data, Same query, different answer at different times of the day... hehe.

Other possbilities? Maybe this is part of a batch system and you trunc/load work tables with this data and in the process the same data gets inserted in a different order. Maybe your DBA converts the table to an IOT? Maybe the DBA turns off PARALLEL QUERY for your instance because a study showed that it was counterproductive. Or maybe the DBA turned it on? Mabye the DBA allocated more memory for hash and sort operations to increase the number of OPTIMUM and ONE-PASS operations? Lots of things can happen and given the nature of results that depend or row order (yikes).

Here is one:

drop table office;

create table office (v1 varchar2(10)) partition by hash (v1) PARTITIONS 3;

INSERT INTO office values ('hq4');
INSERT INTO office values ('hq3');
INSERT INTO office values ('hq2');
INSERT INTO office values ('hq1');
commit;


V1           ROWCOUNT
---------- ----------
hq4                 1
hq1                 1
Other               1
hq2                 1

4 rows selected.


WoohOo! Same data, same query, different answer. Got to love it.

Good luck man. Kevin

[Updated on: Fri, 02 November 2012 09:33]

Report message to a moderator

Previous Topic: Query
Next Topic: Query to find the user who did not login from past 3 years
Goto Forum:
  


Current Time: Fri Jul 25 04:05:29 CDT 2014

Total time taken to generate the page: 0.17464 seconds