Home » SQL & PL/SQL » SQL & PL/SQL » RANK: Selecting the sum of the top n highest values, only n values max
RANK: Selecting the sum of the top n highest values, only n values max [message #155597] Tue, 17 January 2006 10:22 Go to next message
arnematt
Messages: 24
Registered: November 2005
Location: Netherlands
Junior Member
Assume the following data in table T

A| B
=====
A 20
B 21
C 22
D 19
E 19
F 19

I want to have a sum of the 4 highest values in the table. If there are more values that rank 1,2,3 or 4, just take the the sum of the 4 highest but no more than 4 in total.

So I would want to have a result like: 19+20+21+22 = 82
and not: 19+19+19+20+21+22 = ....

Would the data look like this:
A 20
B 21
C 21
D 22
E 19
F 19

Then I would want it to be 20+21+21+22 = 84.

both rank and dense rank do not rule out the possibility that more than 4 values end up with a rank <5 (e.g. 1,2,3,3,5)

How to do this... something with distinct?

Thanks...

PS I did a search on 'RANK' in this forum and havent found anything about this...

[Updated on: Tue, 17 January 2006 10:41]

Report message to a moderator

Re: RANK: Selecting the sum of the top n highest values, only n values max [message #155600 is a reply to message #155597] Tue, 17 January 2006 10:50 Go to previous messageGo to next message
arnematt
Messages: 24
Registered: November 2005
Location: Netherlands
Junior Member
Hm..

I just found something, in a thread here:

select * from (select * from emp order by sal desc) where rownum <=5

however, is this efficient? It's going to sort the whole table...
plus, what is 'rownum'? Does that work?

There should be a more efficient way shouldnt there... a full table scan with an array of 5, something like that, should be done by the optimizer. Not a full sort of the table and then take only 5...

very inefficient isnt it.

thanks.
Re: RANK: Selecting the sum of the top n highest values, only n values max [message #155604 is a reply to message #155600] Tue, 17 January 2006 11:13 Go to previous messageGo to next message
Art Metzer
Messages: 2480
Registered: December 2002
Senior Member
It won't "sort the whole table" if the optimizer chooses to use the stopkey optimization. For more information, please read
Re: RANK: Selecting the sum of the top n highest values, only n values max [message #155607 is a reply to message #155597] Tue, 17 January 2006 11:30 Go to previous messageGo to next message
arnematt
Messages: 24
Registered: November 2005
Location: Netherlands
Junior Member
Thanks (again) Art,

Can I summarize that you are indicating that to get ONLY the top 5 of values, also returning only FIVE if there would be, say, 9 values all identical, and highest, the best way to go about doing this would be:

select * from (select * from T order by B desc) where rownum <=5

?

Thanks thanks
Re: RANK: Selecting the sum of the top n highest values, only n values max [message #155614 is a reply to message #155607] Tue, 17 January 2006 11:58 Go to previous messageGo to next message
Art Metzer
Messages: 2480
Registered: December 2002
Senior Member
It all depends on your requirements.

If you have a nine-way tie, do you absolutely want to show all nine tying rows? Or the top five rows by some additional ordering criterion? Or can you only accept a maximum of five rows period, in which case the requirement is to just grab five rows at random?
Re: RANK: Selecting the sum of the top n highest values, only n values max [message #155626 is a reply to message #155597] Tue, 17 January 2006 12:57 Go to previous messageGo to next message
arnematt
Messages: 24
Registered: November 2005
Location: Netherlands
Junior Member
Yes I just need to sum the values themselves so it can come from any row whatsoever. Just have to make sure that there are no other values higher than the 5 that I am grabbing. Has to be 5 all the time except when there are only 4 or less, then only sum those...

thanks
Re: RANK: Selecting the sum of the top n highest values, only n values max [message #155628 is a reply to message #155626] Tue, 17 January 2006 13:00 Go to previous messageGo to next message
Art Metzer
Messages: 2480
Registered: December 2002
Senior Member
Then your ROWNUM solution should work fine.
select * from (select * from T order by B desc) where rownum <=5
Re: RANK: Selecting the sum of the top n highest values, only n values max [message #155631 is a reply to message #155597] Tue, 17 January 2006 13:43 Go to previous messageGo to next message
arnematt
Messages: 24
Registered: November 2005
Location: Netherlands
Junior Member
but this select from, order by desc.... i mean, it is going to be browsing through about 500,000,000 rows... (although partitioned by date). And what is worse, I am going to have to calculate it for EACH individual article_number in that table, over a certain period. There are about 200000 articles... Locations * articles * weeks could be up to 1,000,000 calculations; for each article number I have to find the top 5 selling locations...

I hope you have some time to look at my alternative solution, involving an index-by table and a cursor:


Suppose the problem concerns a sales table with four fields:

SALES
----
ARTICLE (NUMBER) --> up to 200000 different articles
LOCATION (NUMBER) --> about 200 locations
SOLD_AMT (NUMBER)
WEEK_NO (NUMBER)

Instead I was thinking of the following approach:
run a cursor for loop over the table to give all location/articles for the relevant period(s). Order this by LOCATION. So Articles are passed in random order in the loop.

Fill an index-by table using the article number as an index.
On the index location I put a record. This record contains (amongst lots of other things) a VARRAY(5) with top-selling locations.

e.g.: sales(article).top5(1) will contain the highest amount for that article. Location that was involved is not relevant, just its total sales amt for this article.

Since the set is ordered by location, I keep adding to the total (for each article there is one index location) until the location changes. Then, if the location has changed, I do a check to see if the total_sales for that location is in the top 5 of that article. Remember this goes for each article.

Would that be faster? If you need more info, I'd be happy to share...

thanks again

[Updated on: Tue, 17 January 2006 13:48]

Report message to a moderator

Re: RANK: Selecting the sum of the top n highest values, only n values max [message #155639 is a reply to message #155631] Tue, 17 January 2006 14:44 Go to previous messageGo to next message
Art Metzer
Messages: 2480
Registered: December 2002
Senior Member
Have you looked into the ROW_NUMBER() analytic function? In that call, you would partition by your article_number.

My general advice to you would be, don't reinvent the wheel. "If you can do it in SQL, do it." Only resort to PL/SQL if you absolutely have to (which I don't think you do).

Make sure your tables statistics have been recently analyzed.

Then benchmark the SQL solution, and the PL/SQL solution, if you so choose.

For more on the ROW_NUMBER analytic function, search AskTom.
Re: RANK: Selecting the sum of the top n highest values, only n values max [message #155640 is a reply to message #155639] Tue, 17 January 2006 14:52 Go to previous messageGo to next message
arnematt
Messages: 24
Registered: November 2005
Location: Netherlands
Junior Member
Art Metzer wrote on Tue, 17 January 2006 21:44

Have you looked into the ROW_NUMBER() analytic function? In that call, you would partition by your article_number.



I dont totally see your solution... partition by art_number? The table has been partitioned by date but I suppose with a subset I could do anything else. Could you please explain a bit?

Thanks.

[Updated on: Tue, 17 January 2006 14:53]

Report message to a moderator

Re: RANK: Selecting the sum of the top n highest values, only n values max [message #155643 is a reply to message #155640] Tue, 17 January 2006 15:40 Go to previous messageGo to next message
Art Metzer
Messages: 2480
Registered: December 2002
Senior Member
An example might be best.

Here's a query that shows the top five locations (one of nine possible two-letter codes) for each of twenty-six articles (single letter of the alphabet):
SQL> CREATE TABLE t (
  2      article_number  VARCHAR2(1)
  3  ,   location        VARCHAR2(2)
  4  ,   dt              DATE
  5  ,   sales           NUMBER
  6  )
  7  /
 
Table created.
 
SQL> INSERT INTO t
  2  SELECT CHR(TRUNC(DBMS_RANDOM.VALUE(0,26)) + 65)
  3  ,      CHR(TRUNC(DBMS_RANDOM.VALUE(0,3)) + 65)
  4         || CHR(TRUNC(DBMS_RANDOM.VALUE(0,3)) + 65)
  5  ,      TO_DATE('197901','YYYYMM') + DBMS_RANDOM.VALUE(0,10000)
  6  ,      TRUNC(DBMS_RANDOM.VALUE(0,10000)) * 1E-2
  7  FROM   sys.all_objects
  8  WHERE  ROWNUM <= 5000
  9  /
 
5000 rows created.
 
SQL> SELECT   b.article_number
  2  ,        MAX(DECODE(b.r,1,b.location || ':' || LPAD(TO_CHAR(b.total_sales,'fm9990.00'),10))) "#1_LOCATION"
  3  ,        MAX(DECODE(b.r,2,b.location || ':' || LPAD(TO_CHAR(b.total_sales,'fm9990.00'),10))) "#2_LOCATION"
  4  ,        MAX(DECODE(b.r,3,b.location || ':' || LPAD(TO_CHAR(b.total_sales,'fm9990.00'),10))) "#3_LOCATION"
  5  ,        MAX(DECODE(b.r,4,b.location || ':' || LPAD(TO_CHAR(b.total_sales,'fm9990.00'),10))) "#4_LOCATION"
  6  ,        MAX(DECODE(b.r,5,b.location || ':' || LPAD(TO_CHAR(b.total_sales,'fm9990.00'),10))) "#5_LOCATION"
  7  FROM    (SELECT a.article_number
  8           ,      a.location
  9           ,      a.total_sales
 10           ,      ROW_NUMBER()
 11                  OVER (PARTITION BY a.article_number
 12                        ORDER BY     a.total_sales DESC) r
 13           FROM  (SELECT   t.article_number
 14                  ,        t.location
 15                  ,        SUM(t.sales)     total_sales
 16                  FROM     t
 17                  GROUP BY t.article_number
 18                  ,        t.location) a) b
 19  GROUP BY b.article_number
 20  ORDER BY b.article_number
 21  /
 
A #1_LOCATION   #2_LOCATION   #3_LOCATION   #4_LOCATION   #5_LOCATION
- ------------- ------------- ------------- ------------- -------------
A CB:   1243.01 AB:   1220.12 CC:   1136.65 BB:   1092.00 AC:    873.07
B BC:   1315.07 CB:   1210.55 AB:   1096.69 AC:    992.31 CC:    913.12
C BB:   1526.50 BA:   1463.79 BC:   1254.00 AA:   1221.40 CB:   1202.82
D CB:   1774.60 BB:   1713.99 AA:   1592.01 AB:   1323.78 CA:    977.92
E AC:   1240.18 BA:   1163.41 AA:   1019.80 CA:    939.67 BB:    856.59
F AB:   1337.53 AA:   1325.00 CC:   1199.50 BB:   1124.83 BC:   1064.63
G AA:   1838.99 BA:   1383.37 CC:   1214.45 BC:   1183.18 AB:   1161.77
H AB:   1790.31 BB:   1493.82 CB:   1387.30 BA:   1332.45 BC:   1320.80
I AA:   1844.60 AC:   1186.10 BA:   1122.40 AB:    976.72 CA:    925.43
J AA:   1149.16 CB:   1108.97 BB:    957.67 AB:    902.75 AC:    899.79
K BC:   1568.66 CC:   1358.95 BA:   1335.33 CB:   1215.88 AB:   1091.22
L BB:   1238.54 BC:   1217.45 BA:   1119.37 CA:   1088.63 CC:   1079.85
M BC:   1273.45 CA:   1163.99 AC:   1112.90 BA:   1102.88 CB:   1063.84
N BB:   1724.76 CB:   1360.67 CA:   1287.25 AB:   1267.32 BA:   1238.88
O CA:   1422.41 CB:   1386.23 AB:   1380.47 AA:   1372.09 BA:   1312.06
P AB:   1451.14 AA:   1333.88 BC:   1131.36 BA:   1062.11 CC:   1008.58
Q BA:   1355.69 CA:   1276.48 CB:   1266.73 BB:   1131.05 AC:   1021.95
R BA:   1400.46 AC:   1296.94 CB:   1113.09 AB:   1052.78 CC:   1011.71
S AA:   1446.47 BB:   1381.35 CA:   1265.83 CC:   1248.37 AB:   1172.45
T CA:   1681.45 CB:   1407.47 AC:   1280.56 AA:   1219.67 BC:   1164.08
U CB:   1346.14 AA:   1209.67 AC:   1209.38 BB:   1133.90 CC:   1037.00
V AC:   1251.18 CC:   1249.93 BC:   1148.45 BA:   1138.22 AA:   1085.44
W CA:   1268.80 BA:   1208.33 BB:   1191.23 CB:   1112.65 AB:   1026.37
X CB:   1404.09 AA:   1281.85 BC:   1278.58 AB:   1084.74 AC:    961.25
Y BA:   1345.74 BB:   1279.04 BC:   1265.53 AB:   1145.38 CC:   1134.65
Z AA:   1585.47 CA:   1308.87 BA:   1223.27 BB:   1128.17 BC:   1104.22
 
26 rows selected.
 
SQL>
Re: RANK: Selecting the sum of the top n highest values, only n values max [message #155646 is a reply to message #155597] Tue, 17 January 2006 15:52 Go to previous messageGo to next message
arnematt
Messages: 24
Registered: November 2005
Location: Netherlands
Junior Member
THANKS!!!! Cool
icon1.gif  Re: RANK: Selecting the sum of the top n highest values, only n values max [message #155760 is a reply to message #155597] Wed, 18 January 2006 08:09 Go to previous message
arnematt
Messages: 24
Registered: November 2005
Location: Netherlands
Junior Member
However, however...

I am facing the problem that I would need to place a major WHERE clause on two locations in that select statement...

Will that still perform???


I will show you the basic select statement (in my solution a cursor that is used to fill the index-by table).

Also at the end of this post I have included my solution, well part of it... not asking 'what do you think of my code', more like an explanation of the approach if the query becomes too big.



==============================
SELECT STATEMENT
This statement consists of the INNER JOIN of several tables. This is to deal with restrictions that apply to the data in several tabels; e.g. for a particular article from the sales table (in reality the E_KASSABON table), the restriction applies that its should have status VALID in another table (here: e_art_hist.art_status; where e_art_hist is linked to e_kassabon through a substitute key in the e_art_geldig table).

Without wanting to go into too much detail here, the problem I now face is that the partition by needs only to be done over ONLY the articles in this result set; I suppose this implies the whole inner join bunch should go into BOTH the GROUP BY sections of your example as far as I can see??

Lot of processing time, and lot of text???

Hope you have some thoughts on this...

-------------------------------
SELECT e_vst.vst_nr, cls_nr, e_cls_art.art_nr
FROM e_art_hist
INNER JOIN e_art_geldig USING (art_key)
INNER JOIN e_cls_art ON e_art_geldig.art_nr = e_cls_art.art_nr
INNER JOIN e_cls_vst USING (cls_nr)
INNER JOIN e_vst ON e_vst.vst_nr = e_cls_vst.vst_nr
INNER JOIN e_kassabon ON e_kassabon.vst_nr = e_vst.vst_nr
AND e_kassabon.art_nr = e_cls_art.art_nr
WHERE e_kassabon.tract_datum BETWEEN (r1.peildatum AND r1.peildatum +6)
AND IND_KLANT = 'J'
AND IND_BOEKING = 'I'
AND e_art_geldig.beg_datum <= r1.peildatum
AND e_art_geldig.eind_datum >= r1.peildatum
AND e_art_hist.art_status = 'A'
AND e_cls_art.beg_datum_bron <= r1.peildatum
AND e_cls_art.eind_datum_bron >= r1.peildatum
AND e_cls_art.beg_datum <= r1.peildatum
AND e_cls_art.eind_datum > r1.peildatum
AND e_cls_vst.beg_datum_bron <= r1.peildatum
AND e_cls_vst.eind_datum_bron >= r1.peildatum
AND e_cls_vst.beg_datum <= r1.peildatum
AND e_cls_vst.eind_datum > r1.peildatum
ORDER BY
VST_NR,
BON_ID,
TRACT_DATUM ASC

---------------------
My solution (hope you can look through fieldname details etc):

one cursor to determine time period (peildatum: a date in a week). Second cursor loops within this one over mainly the E_KASSABON table, but with restrictions from many other tables (E_KASSABON is the big one).

VARRAY is used to store and update the top 5 of locations, while a certain location is being processed ('VST_NR' = 'LOCATION_NR'), for each article number (ART_NR) such a VARRAY is inserted in the index-by table on the index of the article number and updated each time the article number is encountered (loop is through ordered VST_NRs, not ordered ART_NRs. This is because I need ordering on BON_ID per VST_NR. (BON_ID = 'sale-id', where one sale can be more articles.)).

                                                           
CREATE OR REPLACE PROCEDURE OMZETDOORGIFTE (v_result IN OUT NUMBER) AS

CURSOR c1_peildata IS
SELECT DISTINCT(TO_CHAR(verw_datum, 'YYYYIW')
FROM MONITOR
WHERE proceskode IN ('LOAD', 'TLOD')
AND db_datum >
    SELECT (MAX(db_datum) FROM MONITOR
    WHERE proceskode = 'IKBAGG'
    AND result_kode < 2;


CURSOR c2_vstdag IS
SELECT
e_vst.vst_nr,
cls_nr,
e_cls_art.art_nr,
e_kassabon.bon_id,
e_kassabon. nogwat,
e_kassabon.aantal!!!
FROM e_art_hist
INNER JOIN e_art_geldig USING (art_key)
INNER JOIN e_cls_art ON e_art_geldig.art_nr = e_cls_art.art_nr
INNER JOIN e_cls_vst USING (cls_nr)
INNER JOIN e_vst ON e_vst.vst_nr = e_cls_vst.vst_nr
INNER JOIN e_kassabon ON e_kassabon.vst_nr = e_vst.vst_nr
AND e_kassabon.art_nr = e_cls_art.art_nr
WHERE e_kassabon.tract_datum BETWEEN (r1.peildatum AND r1.peildatum +6)
AND IND_KLANT = 'J'
AND IND_BOEKING = 'I'
AND e_art_geldig.beg_datum <= r1.peildatum
AND e_art_geldig.eind_datum >= r1.peildatum
AND e_art_hist.art_status = 'A'
AND e_cls_art.beg_datum_bron <= r1.peildatum
AND e_cls_art.eind_datum_bron >= r1.peildatum
AND e_cls_art.beg_datum <= r1.peildatum
AND e_cls_art.eind_datum > r1.peildatum
AND e_cls_vst.beg_datum_bron <= r1.peildatum
AND e_cls_vst.eind_datum_bron >= r1.peildatum
AND e_cls_vst.beg_datum <= r1.peildatum
AND e_cls_vst.eind_datum > r1.peildatum
ORDER BY
VST_NR,
BON_ID,
TRACT_DATUM ASC;


TYPE TOPVSTOMZ IS VARRAY(5) OF NUMBER;
);

TYPE Kasbonrec IS RECORD ( 
    vst_nr  E_KASSABON.VST_NR%TYPE, -- huidige vestigingsnummer
    cls_nr  E_KASSABON.CLS_NR%TYPE, -- alleen het meest recente cls_nr blijft bewaard.
    art_nr  E_KASSABON.ART_NR%TYPE, -- nodig voor sortering.
    bon_id  E_KASSABON.BON_ID%TYPE, --om bon id vast te houden; als dit wijzigt dan aantal klanten + 1
    aantal_stuks  E_KASSABON.AANTAL%TYPE, -- totaal aantal verkopen. Opgehoogd met aantal verkopen in huidig record.
    aantal_bonnen NUMBER, --aantal klanten voor dit artikel. Opgehoogd met 1 zodra bonid vorig en huidig ongelijk zijn.
    aantal_vestigingen NUMBER ,-- totaal aantal vestigingen dat dit artikel voert
    vst_totaal NUMBER, --totale verkoopbedrag voor dit artikel voor de huidige vestiging; als vst_nr wijzigt dan reset
    vst_top_5_Arr TOPVSTOMZ%TYPE -- array om voor dit artikel de vestigingstop 5 bij te houden


TYPE ArrA_VST_ART
IS TABLE OF Kasbonrec%ROWTYPE 
INDEX BY PLS_INTEGER;

PROCEDURE VERWERK IS

top5Counter1 NUMBER;
top5Counter2 NUMBER;
arrIndex NUMBER;

BEGIN
OR r1_peildatum IN c1_peildata LOOP

    FOR r2_vstdag IN c2_vstdag LOOP
        arrIndex : = r2_vstdag.ART_NR;
        aantalVestigingen := 0;
        aantalBonnen := 0;
        aantalStuks := 0;
    --etc...

        IF ArrA_VST_ART(arrIndex).VST_NR <> r2_vstdag.VST_NR 
        THEN

        FOR top5Counter1 IN REVERSE 1..5 -- Telt af van 5 naar 1.
        LOOP
            IF huidige waarde >= Arr(top5Counter1)
            THEN 
                FOR top5Counter2 IN 1..top5Counter1-1
                LOOP
                    topVstOmz(top5Counter2) := topVstOmz(top5Counter2+1);
                END LOOP;
                topVstOmz(top5Counter1):= ArrA_VST_ART(r2_vstdag.ART_NR).VST_TOTAAL;
                EXIT;
            END IF;
        END LOOP;

            ArrA(arrIndex).VST_NR := r2_vstdag.VST_NR;
            ArrA(arrIndex).VST_TOTAAL := 0;

            aantalVestigingen := aantalVestigingen + 1;  <<--"vestiging" stands for "location"
        END IF;
        
        IF ArrA_VST_ART(arrIndex).BON_ID <> r2_vstdag.BON_ID
        THEN


        END IF;

        ArrA(arrIndex).AANTAL_VERKOCHT := ArrA(arrIndex).AANTAL_VERKOCHT + r2_vstdag.AANTAL;
        ArrA(arrIndex).NETTO_OMZET_WAARDE := ArrA(arrIndex).NETTO_OMZET_WAARDE + r2_vstdag.BONREG_BEDRAG;
        ArrA(arrIndex).BGW := ArrA(arrIndex).BGW + r2_vstdag.BGW;

        IF r2_vstdag.PROM_ID IN (lijst met waarden voor promotie)
        THEN
          ArrA(arrIndex).PROM_OMZET_WAARDE := ArrA(arrIndex).PROM_OMZET_WAARDE + r2_vstdag.BONREG_BEDRAG;
        END IF;

        IF 
          ArrA(r2_vstdag.ART_NR).BON_ID <> r2_vstdag.BON_ID
        THEN
            BEGIN
              ArrA(arrIndex).AANTAL_KLANTEN = ArrA(arrIndex).AANTAL_KLANTEN +1;
              ArrA(arrIndex).BON_ID = r2_vstdag.BON_ID;
            END;
        END IF;
    END LOOP;
END LOOP;
:
:
...

[Updated on: Wed, 18 January 2006 10:11]

Report message to a moderator

Previous Topic: Natural Join
Next Topic: programmer
Goto Forum:
  


Current Time: Fri Apr 26 11:23:06 CDT 2024