Home » SQL & PL/SQL » SQL & PL/SQL » computing the cumulative sum on a column
computing the cumulative sum on a column [message #354682] Mon, 20 October 2008 17:54 Go to next message
riteshu
Messages: 5
Registered: May 2005
Junior Member
I have a table COUNTRY with the following columns.

id, state, population
1, Alabama, 200
2, Alaska, 50
3, Arizona, 150
4, Arkansas, 175
5, California, 300
6, Colorado, 200
7, Connecticut, 225

I want a query which should sum the population of all but the top 5 states.

id, state, population, population_Count
1, Alabama, 200, 200
2, Alaska, 50, 50
3, Arizona, 150, 150
4, Arkansas, 175, 175
5, California, 300, 300
6, Others,, 425

Thankx
Rick
Re: computing the cumulative sum on a column [message #354686 is a reply to message #354682] Mon, 20 October 2008 18:37 Go to previous messageGo to next message
ebrian
Messages: 2794
Registered: April 2006
Senior Member
What's considered TOP 5 states ? Is it based strictly on alphabetical order ?

What have you tried so far?
Re: computing the cumulative sum on a column [message #354691 is a reply to message #354682] Mon, 20 October 2008 20:11 Go to previous messageGo to next message
Kevin Meade
Messages: 2103
Registered: December 1999
Location: Connecticut USA
Senior Member
This is an interesting question to ask. On the surface there may be several alernative ways to do the query. The first things that come to mind are that you will use some combination of the following tools/techniques:

1) analytics
2) top-n query
3) grouping sets (as opposed to ROLLUP/CUBE)
4) case/decode

Next let me chastize you. One thing I hate and others here do too, is having to create test data for an OP who failed to post it them selves. You should have posted a CREATE TABLE and INSERT statements to facilitate those willing to help you find a solution.

Fortunately for you I am listening to AC/DC ThunderStruck and am having a good time annoying my wife and girls by bouncing in my office chair to the music and singing off key, whilst also ignoring them as they try to get my attention to stop (hehe what fun!).

Quote:
We met some girls
Some dancers who gave a good time
Broke all the rules, played all the fools
Yeah, yeah, they, they, they blew our minds

So...

id, state, population
1, Alabama, 200
2, Alaska, 50
3, Arizona, 150
4, Arkansas, 175
5, California, 300
6, Colorado, 200
7, Connecticut, 225


create table some_states
(
   state_id number
 , state_name varchar2(30)
 , population number
)
/

insert into some_states values (1,'Alabama',200);
insert into some_states values (2,'Alaska',50);
insert into some_states values (3,'Arizona',150);
insert into some_states values (4,'Arkansas',175);
insert into some_states values (5,'California',300);
insert into some_states values (6,'Colorado',200);
insert into some_states values (7,'Connecticut',225);

commit
/


Next I will point you to additional reading. Even though I am about to break an unspoken rule here on ORAFAQ and give you one possible solution, I fully expect you to visit and read these web pages. I took the time to find them for you, least you can do is read them. These links will introduce you to the advanced kind of SQL that awaits you if you apply yourself. It is actually fun if you are a geek and lets face it, would we be here if we were not at least a bit geeky...

Tom Kyte Rank, Dense_Rank, Row_Number

Top 5 Salaries by Department (valuable top-n variation)

Just some Damn Cool Tom Kyte reading everyone should do

getting rows N through M of a result set (more basic top-n query)

Kevin Meade's blog

A Simple Example of Oracle Analytics: Running Totals

OK, there is a lot of links there. But if you are really interested in this stuff, you will have no problem reading them over the comming week. You will enjoy the new stuff.

Now, since I figure it will take you quite a while to get the solution on your own, or even reading, and I figure you won't read anything above if you are worried about getting your solution first, here is one coding solution to your problem. Once again thank AC/DC (NOW PLAYING... MoneyTalks).

Quote:
Tailored suits, chauffered cars
Fine hotels and big cigars
Up for grabs, up for a price
Where the red hot girls keep on dancing through the night

select *
from some_states
order by state_id
/

select some_states.*
      ,row_number() over (order by state_id) state_order
from some_states
/

select some_states.*
      ,case
         when row_number() over (order by state_id) <= 5 then state_name
         else 'Other'
       end group_name
from some_states
/

with
   analyzed_states as (
                        select some_states.*
                              ,case
                                 when row_number() over (order by state_id) <= 5 then state_name
                                 else 'Other'
                               end group_name
                        from some_states
                      )
select sum(population) summary_population
      ,group_name
from analyzed_states
group by group_name
order by group_name
/


Create the table, load it, and run the above quereis. Then apply your mind to what each successive variation is doing. Pay particular attention to the ROW_NUMBER() analytic expression and how the OVER clause defines the order you use to decide what are your TOP-N rows.

Note further that by using the TOP-N query construct, we can change the definition of TOP-N for various scenarios easily. Same basic query structure, just a differen OVER clause.

with
   analyzed_states as (
                        select some_states.*
                              ,case
                                 when row_number() over (order by state_name desc) <= 5 then state_name
                                 else 'Other'
                               end group_name
                        from some_states
                      )
select sum(population) summary_population
      ,group_name
from analyzed_states
group by group_name
order by group_name
/

with
   analyzed_states as (
                        select some_states.*
                              ,case
                                 when row_number() over (order by population desc) <= 5 then state_name
                                 else 'Other'
                               end group_name
                        from some_states
                      )
select sum(population) summary_population
      ,group_name
from analyzed_states
group by group_name
order by sum(population) desc
/


To eveyrone else, Tom Kyte wrote an article about ROW_NUMBER, RANK, and DENSE_RANK which is better than the link I posted here, but I could not find it. If someone knows where it is, please post it in a reply. Thanks.

Good luck, better read some of those links. Kevin

I occurrs to me that not everyone will have ready access to sqlplus so here is the output of all the sql for those readers who cannot run the queries immediately.

SQL> 
SQL> select *
  2  from some_states
  3  order by state_id
  4  /

  STATE_ID STATE_NAME                     POPULATION
---------- ------------------------------ ----------
         1 Alabama                               200
         2 Alaska                                 50
         3 Arizona                               150
         4 Arkansas                              175
         5 California                            300
         6 Colorado                              200
         7 Connecticut                           225

7 rows selected.

SQL> 
SQL> select some_states.*
  2        ,row_number() over (order by state_id) state_order
  3  from some_states
  4  /

  STATE_ID STATE_NAME                     POPULATION STATE_ORDER
---------- ------------------------------ ---------- -----------
         1 Alabama                               200           1
         2 Alaska                                 50           2
         3 Arizona                               150           3
         4 Arkansas                              175           4
         5 California                            300           5
         6 Colorado                              200           6
         7 Connecticut                           225           7

7 rows selected.

SQL> 
SQL> select some_states.*
  2        ,case
  3           when row_number() over (order by state_id) <= 5 then state_name
  4           else 'Other'
  5         end group_name
  6  from some_states
  7  /

  STATE_ID STATE_NAME                     POPULATION GROUP_NAME
---------- ------------------------------ ---------- ------------------------------
         1 Alabama                               200 Alabama
         2 Alaska                                 50 Alaska
         3 Arizona                               150 Arizona
         4 Arkansas                              175 Arkansas
         5 California                            300 California
         6 Colorado                              200 Other
         7 Connecticut                           225 Other

7 rows selected.

SQL> 
SQL> with
  2     analyzed_states as (
  3                          select some_states.*
  4                                ,case
  5                                   when row_number() over (order by state_id) <= 5 then state_name
  6                                   else 'Other'
  7                                 end group_name
  8                          from some_states
  9                        )
 10  select sum(population) summary_population
 11        ,group_name
 12  from analyzed_states
 13  group by group_name
 14  order by group_name
 15  /

SUMMARY_POPULATION GROUP_NAME
------------------ ------------------------------
               200 Alabama
                50 Alaska
               150 Arizona
               175 Arkansas
               300 California
               425 Other

6 rows selected.

SQL> 
SQL> with
  2     analyzed_states as (
  3                          select some_states.*
  4                                ,case
  5                                   when row_number() over (order by state_name desc) <= 5 then state_name
  6                                   else 'Other'
  7                                 end group_name
  8                          from some_states
  9                        )
 10  select sum(population) summary_population
 11        ,group_name
 12  from analyzed_states
 13  group by group_name
 14  order by group_name
 15  /

SUMMARY_POPULATION GROUP_NAME
------------------ ------------------------------
               150 Arizona
               175 Arkansas
               300 California
               200 Colorado
               225 Connecticut
               250 Other

6 rows selected.

SQL> 
SQL> with
  2     analyzed_states as (
  3                          select some_states.*
  4                                ,case
  5                                   when row_number() over (order by population desc) <= 5 then state_name
  6                                   else 'Other'
  7                                 end group_name
  8                          from some_states
  9                        )
 10  select sum(population) summary_population
 11        ,group_name
 12  from analyzed_states
 13  group by group_name
 14  order by sum(population) desc
 15  /

SUMMARY_POPULATION GROUP_NAME
------------------ ------------------------------
               300 California
               225 Connecticut
               200 Alabama
               200 Colorado
               200 Other
               175 Arkansas

6 rows selected.

SQL> 

[Updated on: Mon, 20 October 2008 20:17]

Report message to a moderator

Re: computing the cumulative sum on a column [message #354695 is a reply to message #354691] Mon, 20 October 2008 21:02 Go to previous messageGo to next message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Nice post. But not even Angus can sustain a lead break that long. Wink
Re: computing the cumulative sum on a column [message #354726 is a reply to message #354691] Tue, 21 October 2008 00:49 Go to previous messageGo to next message
riteshu
Messages: 5
Registered: May 2005
Junior Member
Thanks Kevin, for the solution, possibilities and links to all the information, or maybe i should thank AC/DC Smile

And sorry for not posting the scripts for the table and seed data. And I am primarily a java specialist and know only the basics of SQL and never had the need to delve deeper into it, until today that is!

But thank you for the links and hopefully I'll get sometime to educate myself more on SQL

icon14.gif  Re: computing the cumulative sum on a column [message #355293 is a reply to message #354726] Thu, 23 October 2008 06:29 Go to previous message
atozvs
Messages: 12
Registered: May 2008
Location: Hyderabad
Junior Member
Great Job Kevin,
thanks a lot for the query. I really learned few additional things in SQL.

Thanks again
Previous Topic: How to extract excel file into oracle database
Next Topic: Built in Package for encryption and decryption process.
Goto Forum:
  


Current Time: Sat Nov 09 16:00:58 CST 2024