reset cumulative sum

From: <murali.lakshman_at_gmail.com>
Date: 26 Dec 2004 09:03:32 -0800
Message-ID: <1104080612.417167.244200@z14g2000cwz.googlegroups.com>

I have a problem where I want to reset cumulative percentage on a table if it becomes greater than 1. I have solved this problem for now using collections but the solution is not very good in terms of performance. I would really prefer the solution to be more in SQL than pl/sql to help performance. Having said that, I think it is impossible to do this in one sql statement unless you use "MODEL BY" clause in 10g. I am currently using 9i.

I know there can be clever solutions created using "Pipelined" functions and I want to see if some folks have ideas on this.

I have a table txn as follows:
-- example 1: incoming data

seq order_id order_dt agent_id Percent group_seq

1  A-1       1/1/04   123       0.2
2  A-1       1/1/04   123       0.4
3  A-1       1/1/04   123       0.3
4  A-1       1/1/04   123       0.1
5  A-1       1/1/04   456       0.3
6  A-1       1/1/04   456       0.3
7  A-1       1/1/04   456       0.5
8  A-1       1/1/04   456       0.5

The goal is to be able to group transactions that have percentages that add up to 100% for a order_id, order_dt and agent_id. The transactions come in ordered with a seq in the table.

So in this case for A-1, 1/1/04 and agent 123, txns 1 thro 4 add up to 100% and we can group them as group "1". We continue with grouping for transaction seq's 5 and above for agent 456. Txn 5 and 6 add upto to less than 100 % and so we add txn 7 to it and the sum is now greater than 100%. Therefore, we drop txn 5 and add txn 8 to 6 and 7 and the result is the same. We now drop txn 6 and add 7 and 8 and we have a new group that adds to 100 %. we call this say group "2" and so on..

The result of our process should produce the following:

example 1: after group sequences have been created

seq order_id order_dt agent_id Percent group_seq

1  A-1       1/1/04   123        0.2     1
2  A-1       1/1/04   123        0.4     1
3  A-1       1/1/04   123        0.3     1
4  A-1       1/1/04   123        0.1     1
5  A-1       1/1/04   456        0.3
6  A-1       1/1/04   456        0.3
7  A-1       1/1/04   456        0.5     2
8  A-1       1/1/04   456        0.5     2

The general idea is as follows:

Create sub-groups within a group of transactions that have the same order_id, order_dt and agent_id. In the above example it would be A-1 and 1/1/04. As we go thro txns with the same order_id and order_dt, we keep a cumulative sum of percentage (per agent_id). If the cumulative % is less than 100 % we keep traversing, if greater than 100 % we disregard all the traversed records, if cumulative sum = 100 % then assign a group seq to it. As we keep traversing the txns, if the agent_id changes we again disregard the traversed txns if they dont add up to 100 %. The transactions come in ordered with a seq.

Here is some sample data:

create table txn (
seq number,

order_id   varchar2(10),
order_dt   date,
agent_id   varchar2(10),

percent number,
group_seq number
);

data for ex1 above insert into txn values (1, 'A-1', to_date('1/1/2004', 'mm/dd/yyyy'),
'123', 0.2, null);
insert into txn values (2, 'A-1', to_date('1/1/2004', 'mm/dd/yyyy'),
'123', 0.4, null);
insert into txn values (3, 'A-1', to_date('1/1/2004', 'mm/dd/yyyy'),
'123', 0.3, null);
insert into txn values (4, 'A-1', to_date('1/1/2004', 'mm/dd/yyyy'),
'123', 0.1, null);
insert into txn values (5, 'A-1', to_date('1/1/2004', 'mm/dd/yyyy'),
'456', 0.3, null);
insert into txn values (6, 'A-1', to_date('1/1/2004', 'mm/dd/yyyy'),
'456', 0.3, null);
insert into txn values (7, 'A-1', to_date('1/1/2004', 'mm/dd/yyyy'),
'456', 0.5, null);
insert into txn values (8, 'A-1', to_date('1/1/2004', 'mm/dd/yyyy'),
'456', 0.5, null);
example 2: incoming data seq order_id order_dt agent_id Percent group_seq

9   A-2        1/1/04   345       0.75
10  A-2       1/1/04   345        0.75
11  A-2       1/1/04   345        0.25
12  A-2       1/1/04   345        0.75
13  A-2       1/1/04   345        0.25
14  A-2       1/1/04   345        0.25
15  A-2       1/1/04   345        0.75
16  A-2       1/1/04   345        0.25

after creation of group sequences

seq order_id order_dt agent_id Percent group_seq

9   A-2        1/1/04   345       0.75
10  A-2       1/1/04   345        0.75    1
11  A-2       1/1/04   345        0.25    1
12  A-2       1/1/04   345        0.75    2
13  A-2       1/1/04   345        0.25    2
14  A-2       1/1/04   345        0.25    3
15  A-2       1/1/04   345        0.75    3
16  A-2       1/1/04   345        0.25

data for example 2 insert into txn values (9, 'A-2', to_date('1/1/2004', 'mm/dd/yyyy'),
'345', 0.75, null);
insert into txn values (10, 'A-2', to_date('1/1/2004', 'mm/dd/yyyy'),
'345', 0.75, null);
insert into txn values (11, 'A-2', to_date('1/1/2004', 'mm/dd/yyyy'),
'345', 0.25, null);
insert into txn values (12, 'A-2', to_date('1/1/2004', 'mm/dd/yyyy'),
'345', 0.75, null);
insert into txn values (13, 'A-2', to_date('1/1/2004', 'mm/dd/yyyy'),
'345', 0.25, null);
insert into txn values (14, 'A-2', to_date('1/1/2004', 'mm/dd/yyyy'),
'345', 0.25, null);
insert into txn values (15, 'A-2', to_date('1/1/2004', 'mm/dd/yyyy'),
'345', 0.75, null);
insert into txn values (16, 'A-2', to_date('1/1/2004', 'mm/dd/yyyy'),
'345', 0.25, null);

Please let me know if I need to clarify this more. Thanks for your help. Received on Sun Dec 26 2004 - 11:03:32 CST