Re: difficult select- question

From: Andreas Mosmann <mosmann_at_expires-30-11-2009.news-group.org>
Date: Mon, 16 Nov 2009 22:35:20 +0100
Message-ID: <1258407320.92_at_user.newsoffice.de>

Andreas Mosmann schrieb am 02.11.2009 in <1257181408.57_at_user.newsoffice.de>:

> Maxim Demenko schrieb am 27.10.2009 in <4AE6A3AF.4020506_at_gmail.com>:

>> Andreas Mosmann wrote:

>>> Hi ng,
>>> 
>>> I hope, there is someone with a good idea concerning the following 
>>> question:
>>> 
>>> There is a sorted Table (or view) like the following:
>>> F1|F2|F3|F4
>>> AA|BB|10|20
>>> AA|BB|20|30
>>> AA|BB|40|50
>>> AA|BB|50|60
>>> AA|BB|60|70
>>> AA|CC|70|80
>>> AA|CC|80|90
>>> 
>>> As you can see there are some rows, that are connected by F3=prior F4, 
>>> but there are leaks too. I need a result as the following:
>>> 
>>> AA|BB|10|30|2 parts
>>> AA|BB|40|70|3 parts
>>> AA|CC|70|90|2 parts
>>> 
>>> I know it is easy to solve by programming.
>>> I think it is possible to do this with inline view and hierarchical query.
>>> I can imagine a way by select min(F3), max(F4) ... where exists (enough 
>>> connecting data in the middle).
>>> 
>>> But what is the best (SQL-)way to solve this? (There actually are about 
>>> 15000 lines, the estimated result probably should not have more than 100 
>>> lines)
>>> I could do it by PL/SQL, but I am interested in SQL, if exists
>>> 
>>> Andreas Mosmann
>>> 
>>> 
>>> 
>>> 
>>>

>> Another alternative could be use of analytical functions,
>> 1) the classical interval calculation with *start of group* method
>> SQL> with t as (
>> 2 select 'AA' F1,'BB' F2,10 F3,20 F4 from dual union all
>> 3 select 'AA','BB',20,30 from dual union all
>> 4 select 'AA','BB',40,50 from dual union all
>> 5 select 'AA','BB',50,60 from dual union all
>> 6 select 'AA','BB',60,70 from dual union all
>> 7 select 'AA','BB',90,100 from dual union all
>> 8 select 'AA','BB',100,110 from dual union all
>> 9 select 'AA','BB',110,120 from dual union all
>> 10 select 'AA','BB',120,130 from dual union all
>> 11 select 'AA','CC',10,20 from dual union all
>> 12 select 'AA','CC',20,30 from dual union all
>> 13 select 'AA','CC',30,40 from dual union all
>> 14 select 'AA','CC',50,60 from dual union all
>> 15 select 'AA','CC',60,70 from dual union all
>> 16 select 'AA','CC',70,80 from dual union all
>> 17 select 'AA','CC',80,90 from dual union all
>> 18 select 'AA','CC',90,100 from dual union all
>> 19 select 'AA','CC',100,110 from dual union all
>> 20 select 'AA','CC',170,180 from dual union all
>> 21 select 'AA','CC',180,190 from dual
>> 22 )
>> 23 -- End test data
>> 24 select f1,f2,min(f3) f3,max(f4) f4,count(*) cnt
>> 25 from (
>> 26 select f1,
>> 27 f2,
>> 28 f3,
>> 29 f4,
>> 30 sum(start_of_group) over(partition by f1, f2 order by f3)
>> group_no
>> 31 from (select f1,
>> 32 f2,
>> 33 f3,
>> 34 f4,
>> 35 decode(lag(f4) over(partition by f1, f2 order by f3),
>> 36 f3,
>> 37 0,
>> 38 1) start_of_group
>> 39 from t)
>> 40 )
>> 41 group by f1,f2,group_no
>> 42 order by f1,f2,f3
>> 43 ;

>> F1 F2 F3 F4 CNT
>> -- -- ---------- ---------- ----------
>> AA BB 10 30 2
>> AA BB 40 70 3
>> AA BB 90 130 4
>> AA CC 10 40 3
>> AA CC 50 110 6
>> AA CC 170 190 2

>> 2) Slightly different approach with analytics

>> SQL> with t as (
>> 2 select 'AA' F1,'BB' F2,10 F3,20 F4 from dual union all
>> 3 select 'AA','BB',20,30 from dual union all
>> 4 select 'AA','BB',40,50 from dual union all
>> 5 select 'AA','BB',50,60 from dual union all
>> 6 select 'AA','BB',60,70 from dual union all
>> 7 select 'AA','BB',90,100 from dual union all
>> 8 select 'AA','BB',100,110 from dual union all
>> 9 select 'AA','BB',110,120 from dual union all
>> 10 select 'AA','BB',120,130 from dual union all
>> 11 select 'AA','CC',10,20 from dual union all
>> 12 select 'AA','CC',20,30 from dual union all
>> 13 select 'AA','CC',30,40 from dual union all
>> 14 select 'AA','CC',50,60 from dual union all
>> 15 select 'AA','CC',60,70 from dual union all
>> 16 select 'AA','CC',70,80 from dual union all
>> 17 select 'AA','CC',80,90 from dual union all
>> 18 select 'AA','CC',90,100 from dual union all
>> 19 select 'AA','CC',100,110 from dual union all
>> 20 select 'AA','CC',170,180 from dual union all
>> 21 select 'AA','CC',180,190 from dual
>> 22 )
>> 23 -- End test data
>> 24 select f1,
>> 25 f2,
>> 26 f3,
>> 27 nvl(lead(prev_f4) over(partition by f1, f2 order by f3),
>> max_f4) f4,
>> 28 nvl(lead(prev_cnt_f4)
>> 29 over(partition by f1, f2 order by f3) - prev_cnt_f4,
>> 30 cnt_f4 - prev_cnt_f4) cnt
>> 31 from (select t.*,
>> 32 count(*) over(partition by f1, f2) cnt_f4,
>> 33 count(*) over(partition by f1, f2 order by f3
>> 34 rows between unbounded preceding and 1 preceding)
>> prev_cnt_f4,
>> 35 max(f4) over(partition by f1, f2) max_f4,
>> 36 max(f4) over(partition by f1, f2 order by f3
>> 37 rows between unbounded preceding and 1 preceding)
>> prev_f4,
>> 38 decode(lag(f4) over(partition by f1, f2 order by f3),
>> 39 f3,
>> 40 0,
>> 41 1) g
>> 42 from t) t1
>> 43 where g = 1
>> 44 order by f1,f2,f3
>> 45 ;

>> F1 F2 F3 F4 CNT
>> -- -- ---------- ---------- ----------
>> AA BB 10 30 2
>> AA BB 40 70 3
>> AA BB 90 130 4
>> AA CC 10 40 3
>> AA CC 50 110 6
>> AA CC 170 190 2

>> Best regards

>> Maxim

> I did not understand everything by now, but it works fine.
> - Innermost select seems to give first row of a sub-group defined by 
> connection between F3->F4 the number 1, all the others get number 0,
> - second step is a numbering by sub-groups, each sub-group gets the same 
> number
> - third stage is to find out the min/max and count of each sub-group

> I never heard about this functions, but they work very well and super > fast.

> Many thanks
> Andreas