Home » SQL & PL/SQL » SQL & PL/SQL » Using analytical functions (Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production)
Using analytical functions [message #353072] Fri, 10 October 2008 22:15 Go to next message
jgreenleaf
Messages: 9
Registered: September 2008
Junior Member
Hello everyone,

I am trying to understand analytical functions but I'm having some trouble. A problem I have been posed with is to "determine the top three zip codes where most of the students live using an analytical function." I believe I need to utilize DENSE_RANK() but I'm not sure. Any suggestions or examples would be appreciated! I have attached the student table.

http://www.greenleafpowered.com/testing/student.jpeg
Re: Using analytical functions [message #353073 is a reply to message #353072] Fri, 10 October 2008 23:25 Go to previous messageGo to next message
Kevin Meade
Messages: 2103
Registered: December 1999
Location: Connecticut USA
Senior Member
Try some of this reading:

Kevin Meade's blog

A Simple Example of Oracle Analytics: Running Totals

Also search OraFAQ for more.

Additionally, jump over to ASKTOMHOME. Tom Kyte has produced lots of material on analytics, including some articles that specifically address use of rank and dense_rank, and answering exactly your kind of question.

Kevin
Re: Using analytical functions [message #353077 is a reply to message #353072] Sat, 11 October 2008 00:49 Go to previous messageGo to next message
Michel Cadot
Messages: 68718
Registered: March 2007
Location: Saint-Maur, France, https...
Senior Member
Account Moderator
Generally speaking:
* We don't do your homework
* Post what you already tried.
* Post a test case: create table and insert statements along with the result you want with these data
* Before read OraFAQ Forum Guide, especially "How to format your post?" section.

Regards
Michel

Re: Using analytical functions [message #353132 is a reply to message #353072] Sat, 11 October 2008 14:34 Go to previous messageGo to next message
jgreenleaf
Messages: 9
Registered: September 2008
Junior Member
Thank you for the references, sorry for being unprofessional.
Re: Using analytical functions [message #353145 is a reply to message #353072] Sat, 11 October 2008 23:36 Go to previous messageGo to next message
Kevin Meade
Messages: 2103
Registered: December 1999
Location: Connecticut USA
Senior Member
On the contrary, your initial post was pretty good. You took the time to actually post an image which meant you had to look around for how to do that. Most posters don't take that time.

You did well for one of your first posts here. Michel was only pointing out additional materials you can supply. We have to remind newbies often about how to post questions and reference material so that we can attend to their needs with success. We just find that many people post figuring they will get code written for them and we prefer to teach people to fish as it 'twere. That is why we offerred up the references rather than posting SQL for you.

Good luck; if after a few more attempts you still don't have it, come back. Michel and I will get automatic emails from this thread if you post here again.

Kevin
Re: Using analytical functions [message #353188 is a reply to message #353072] Sun, 12 October 2008 12:14 Go to previous messageGo to next message
jgreenleaf
Messages: 9
Registered: September 2008
Junior Member
Thanks Kevin!

Let me show you what I've done so far although incorrect maybe you can point me in the right direction.

What I am trying to accomplish is to find the top 3 zip codes where most students live. The output is supposed to yield 10 rows. There are 268 students in the table. I have verified that I am supposed to be using a dense_rank() function but something about my grouping is not right. Here are some queries I have tried:

SELECT *
FROM   (SELECT Zip,
               Student_Id,
               Dense_rank() OVER(PARTITION BY Zip ORDER BY Student_Id) AS Rank
        FROM   Student)
WHERE  Rank <= 3;

ZIP   STUDENT_ID       RANK
----- ---------- ----------
11561        357          1
11565        257          1
11565        259          2
11565        275          3
11576        170          1
11717        315          1
11743        345          1
11746        234          1
11746        279          2
11746        380          3
11758        359          1
...
-- It goes on for 237 rows.


I have also tried:

SELECT *
FROM   (SELECT Zip,
               Dense_rank() OVER(ORDER BY Zip) AS Rank
        FROM   Student)
WHERE  Rank <= 3;

ZIP         RANK
----- ----------
01247          1
02124          2
02155          3


What I would really like to accomplish is something like the following example:

SELECT deptno,
       ename,
       sal,
       RANK() OVER (PARTITION BY deptno ORDER BY sal) "rank"
FROM   emp
WHERE  deptno = 30;

    DEPTNO ENAME             SAL       rank
---------- ---------- ---------- ----------
        30 JAMES             950          1
        30 WARD             1250          2
        30 MARTIN           1250          2
        30 TURNER           1500          4
        30 ALLEN            1600          5
        30 BLAKE            2850          6


It seems like when I partition based on zip and then try to order it by something else it doesn't come out right. I also tried to partition and order by zip and it just ranks everything as 1. Any pointers would be greatly appreciated!

Thank you.
Re: Using analytical functions [message #353195 is a reply to message #353188] Sun, 12 October 2008 13:01 Go to previous messageGo to next message
Michel Cadot
Messages: 68718
Registered: March 2007
Location: Saint-Maur, France, https...
Senior Member
Account Moderator
Quote:
What I am trying to accomplish is to find the top 3 zip codes where most students live. The output is supposed to yield 10 rows.

2 points:
- Why 10 rows? Top 3 seems to return 3 rows (unless you want those of same "rank" but this is not the same top 3)
- "most students" seems to mean you have to first count the number of students in each zip code before knowing which ones are in the top 3

Regards
Michel

[Updated on: Sun, 12 October 2008 13:04]

Report message to a moderator

Re: Using analytical functions [message #353199 is a reply to message #353072] Sun, 12 October 2008 13:35 Go to previous messageGo to next message
jgreenleaf
Messages: 9
Registered: September 2008
Junior Member
Ya I know that 10 rows sounds weird but you are right Michel they are expecting a listing of zipcodes with the same rank which is why I've been trying to use the dense_rank function as it will assign duplicate values to the same rank.

I also think you are correct with your second point about counting the number of students with the same zipcode. I've been thinking about that a lot, trying to think of a way to group them. Each student_id is unique so it seems I can't group by that. It seems like I should be grouping by zipcode (zip) but haven't really figured out how to do that. Rules of GROUP BY are different when using analytical functions right?

Thanks again!
Re: Using analytical functions [message #353201 is a reply to message #353199] Sun, 12 October 2008 14:29 Go to previous messageGo to next message
Michel Cadot
Messages: 68718
Registered: March 2007
Location: Saint-Maur, France, https...
Senior Member
Account Moderator
As I said there are 2 steps, you group then you rank.

Regards
Michel
Re: Using analytical functions [message #353220 is a reply to message #353072] Sun, 12 October 2008 23:56 Go to previous message
Kevin Meade
Messages: 2103
Registered: December 1999
Location: Connecticut USA
Senior Member
First lets us define what kind of query we are talking about. You want to do what is known as a TOP-N query. The problem with a top-n query is that they are often ambiguous. Consider the following question:

Quote:
show me the top three paid employees

The following scenarios show how this question is ambiguous:

create table employee (empno number,ename varchar2(10),salary number(9,2));
insert into employee values (1,'joe',1000);
insert into employee values (2,'jon',1000);
insert into employee values (3,'jan',1000);
insert into employee values (4,'jim',1000);

What is the answer for the above data? Is it:

Quote:
1) joe,jon,jan (or some other combination of three)
2) joe,jon,jan,jim (because all are in the top three salarys)

Consider this scenario

delete from employee;
insert into employee values (1,'joe',1000);
insert into employee values (2,'jon',1000);
insert into employee values (3,'jan',1000);
insert into employee values (4,'jim',1000);

insert intp employee values (11,'sue',2000);
insert into employee values (21,'sal',2000);
insert into employee values (31,'set',2000);
insert into employee values (41,'sow',2000);

What is the answer for the data above?

Quote:
1) sow,set,sal (or some other combination)
2) why not joe,jan,jan (or some combination) (these emps all have salaries in the top three)
3) joe,jon,jan,jim,sue,sal,set,sow (same reason as #2)

Also most interesting is this scenario:

delete from employee;
insert into employee values (1,'joe',1000);
insert into employee values (2,'jon',2000);
insert into employee values (3,'jan',2000);
insert into employee values (4,'jim',2000);

Who are your top 3 paid employess?:

Quote:
1) jim,jan,jon
2) jim,jan,jon,joe

So, to do a top-n query, you have to know exactly what you are looking for because interpretations abound and all are valid.

Try going to ASKTOMHOME and search for:

Quote:
dense_rank
topn
top n

Kevin
Previous Topic: % from Value
Next Topic: join two tables with ip
Goto Forum:
  


Current Time: Wed Dec 04 18:37:07 CST 2024