Home » SQL & PL/SQL » SQL & PL/SQL » computing the cumulative sum on a column
computing the cumulative sum on a column [message #354682] |
Mon, 20 October 2008 17:54 |
riteshu
Messages: 5 Registered: May 2005
|
Junior Member |
|
|
I have a table COUNTRY with the following columns.
id, state, population
1, Alabama, 200
2, Alaska, 50
3, Arizona, 150
4, Arkansas, 175
5, California, 300
6, Colorado, 200
7, Connecticut, 225
I want a query which should sum the population of all but the top 5 states.
id, state, population, population_Count
1, Alabama, 200, 200
2, Alaska, 50, 50
3, Arizona, 150, 150
4, Arkansas, 175, 175
5, California, 300, 300
6, Others,, 425
Thankx
Rick
|
|
|
|
Re: computing the cumulative sum on a column [message #354691 is a reply to message #354682] |
Mon, 20 October 2008 20:11 |
|
Kevin Meade
Messages: 2103 Registered: December 1999 Location: Connecticut USA
|
Senior Member |
|
|
This is an interesting question to ask. On the surface there may be several alernative ways to do the query. The first things that come to mind are that you will use some combination of the following tools/techniques:
1) analytics
2) top-n query
3) grouping sets (as opposed to ROLLUP/CUBE)
4) case/decode
Next let me chastize you. One thing I hate and others here do too, is having to create test data for an OP who failed to post it them selves. You should have posted a CREATE TABLE and INSERT statements to facilitate those willing to help you find a solution.
Fortunately for you I am listening to AC/DC ThunderStruck and am having a good time annoying my wife and girls by bouncing in my office chair to the music and singing off key, whilst also ignoring them as they try to get my attention to stop (hehe what fun!).
Quote: | We met some girls
Some dancers who gave a good time
Broke all the rules, played all the fools
Yeah, yeah, they, they, they blew our minds
|
So...
id, state, population
1, Alabama, 200
2, Alaska, 50
3, Arizona, 150
4, Arkansas, 175
5, California, 300
6, Colorado, 200
7, Connecticut, 225
create table some_states
(
state_id number
, state_name varchar2(30)
, population number
)
/
insert into some_states values (1,'Alabama',200);
insert into some_states values (2,'Alaska',50);
insert into some_states values (3,'Arizona',150);
insert into some_states values (4,'Arkansas',175);
insert into some_states values (5,'California',300);
insert into some_states values (6,'Colorado',200);
insert into some_states values (7,'Connecticut',225);
commit
/
Next I will point you to additional reading. Even though I am about to break an unspoken rule here on ORAFAQ and give you one possible solution, I fully expect you to visit and read these web pages. I took the time to find them for you, least you can do is read them. These links will introduce you to the advanced kind of SQL that awaits you if you apply yourself. It is actually fun if you are a geek and lets face it, would we be here if we were not at least a bit geeky...
Tom Kyte Rank, Dense_Rank, Row_Number
Top 5 Salaries by Department (valuable top-n variation)
Just some Damn Cool Tom Kyte reading everyone should do
getting rows N through M of a result set (more basic top-n query)
Kevin Meade's blog
A Simple Example of Oracle Analytics: Running Totals
OK, there is a lot of links there. But if you are really interested in this stuff, you will have no problem reading them over the comming week. You will enjoy the new stuff.
Now, since I figure it will take you quite a while to get the solution on your own, or even reading, and I figure you won't read anything above if you are worried about getting your solution first, here is one coding solution to your problem. Once again thank AC/DC (NOW PLAYING... MoneyTalks).
Quote: | Tailored suits, chauffered cars
Fine hotels and big cigars
Up for grabs, up for a price
Where the red hot girls keep on dancing through the night
|
select *
from some_states
order by state_id
/
select some_states.*
,row_number() over (order by state_id) state_order
from some_states
/
select some_states.*
,case
when row_number() over (order by state_id) <= 5 then state_name
else 'Other'
end group_name
from some_states
/
with
analyzed_states as (
select some_states.*
,case
when row_number() over (order by state_id) <= 5 then state_name
else 'Other'
end group_name
from some_states
)
select sum(population) summary_population
,group_name
from analyzed_states
group by group_name
order by group_name
/
Create the table, load it, and run the above quereis. Then apply your mind to what each successive variation is doing. Pay particular attention to the ROW_NUMBER() analytic expression and how the OVER clause defines the order you use to decide what are your TOP-N rows.
Note further that by using the TOP-N query construct, we can change the definition of TOP-N for various scenarios easily. Same basic query structure, just a differen OVER clause.
with
analyzed_states as (
select some_states.*
,case
when row_number() over (order by state_name desc) <= 5 then state_name
else 'Other'
end group_name
from some_states
)
select sum(population) summary_population
,group_name
from analyzed_states
group by group_name
order by group_name
/
with
analyzed_states as (
select some_states.*
,case
when row_number() over (order by population desc) <= 5 then state_name
else 'Other'
end group_name
from some_states
)
select sum(population) summary_population
,group_name
from analyzed_states
group by group_name
order by sum(population) desc
/
To eveyrone else, Tom Kyte wrote an article about ROW_NUMBER, RANK, and DENSE_RANK which is better than the link I posted here, but I could not find it. If someone knows where it is, please post it in a reply. Thanks.
Good luck, better read some of those links. Kevin
I occurrs to me that not everyone will have ready access to sqlplus so here is the output of all the sql for those readers who cannot run the queries immediately.
SQL>
SQL> select *
2 from some_states
3 order by state_id
4 /
STATE_ID STATE_NAME POPULATION
---------- ------------------------------ ----------
1 Alabama 200
2 Alaska 50
3 Arizona 150
4 Arkansas 175
5 California 300
6 Colorado 200
7 Connecticut 225
7 rows selected.
SQL>
SQL> select some_states.*
2 ,row_number() over (order by state_id) state_order
3 from some_states
4 /
STATE_ID STATE_NAME POPULATION STATE_ORDER
---------- ------------------------------ ---------- -----------
1 Alabama 200 1
2 Alaska 50 2
3 Arizona 150 3
4 Arkansas 175 4
5 California 300 5
6 Colorado 200 6
7 Connecticut 225 7
7 rows selected.
SQL>
SQL> select some_states.*
2 ,case
3 when row_number() over (order by state_id) <= 5 then state_name
4 else 'Other'
5 end group_name
6 from some_states
7 /
STATE_ID STATE_NAME POPULATION GROUP_NAME
---------- ------------------------------ ---------- ------------------------------
1 Alabama 200 Alabama
2 Alaska 50 Alaska
3 Arizona 150 Arizona
4 Arkansas 175 Arkansas
5 California 300 California
6 Colorado 200 Other
7 Connecticut 225 Other
7 rows selected.
SQL>
SQL> with
2 analyzed_states as (
3 select some_states.*
4 ,case
5 when row_number() over (order by state_id) <= 5 then state_name
6 else 'Other'
7 end group_name
8 from some_states
9 )
10 select sum(population) summary_population
11 ,group_name
12 from analyzed_states
13 group by group_name
14 order by group_name
15 /
SUMMARY_POPULATION GROUP_NAME
------------------ ------------------------------
200 Alabama
50 Alaska
150 Arizona
175 Arkansas
300 California
425 Other
6 rows selected.
SQL>
SQL> with
2 analyzed_states as (
3 select some_states.*
4 ,case
5 when row_number() over (order by state_name desc) <= 5 then state_name
6 else 'Other'
7 end group_name
8 from some_states
9 )
10 select sum(population) summary_population
11 ,group_name
12 from analyzed_states
13 group by group_name
14 order by group_name
15 /
SUMMARY_POPULATION GROUP_NAME
------------------ ------------------------------
150 Arizona
175 Arkansas
300 California
200 Colorado
225 Connecticut
250 Other
6 rows selected.
SQL>
SQL> with
2 analyzed_states as (
3 select some_states.*
4 ,case
5 when row_number() over (order by population desc) <= 5 then state_name
6 else 'Other'
7 end group_name
8 from some_states
9 )
10 select sum(population) summary_population
11 ,group_name
12 from analyzed_states
13 group by group_name
14 order by sum(population) desc
15 /
SUMMARY_POPULATION GROUP_NAME
------------------ ------------------------------
300 California
225 Connecticut
200 Alabama
200 Colorado
200 Other
175 Arkansas
6 rows selected.
SQL>
[Updated on: Mon, 20 October 2008 20:17] Report message to a moderator
|
|
|
|
|
|
Goto Forum:
Current Time: Sat Nov 09 16:00:58 CST 2024
|