Re: difficult select- question

From: Andreas Mosmann <mosmann_at_expires-30-11-2009.news-group.org>
Date: Mon, 02 Nov 2009 18:03:28 +0100
Message-ID: <1257181408.57_at_user.newsoffice.de>



Maxim Demenko schrieb am 27.10.2009 in <4AE6A3AF.4020506_at_gmail.com>:

> Andreas Mosmann wrote:

>> Hi ng,
>> 
>> I hope, there is someone with a good idea concerning the following 
>> question:
>> 
>> There is a sorted Table (or view) like the following:
>> F1|F2|F3|F4
>> AA|BB|10|20
>> AA|BB|20|30
>> AA|BB|40|50
>> AA|BB|50|60
>> AA|BB|60|70
>> AA|CC|70|80
>> AA|CC|80|90
>> 
>> As you can see there are some rows, that are connected by F3=prior F4, 
>> but there are leaks too. I need a result as the following:
>> 
>> AA|BB|10|30|2 parts
>> AA|BB|40|70|3 parts
>> AA|CC|70|90|2 parts
>> 
>> I know it is easy to solve by programming.
>> I think it is possible to do this with inline view and hierarchical query.
>> I can imagine a way by select min(F3), max(F4) ... where exists (enough 
>> connecting data in the middle).
>> 
>> But what is the best (SQL-)way to solve this? (There actually are about 
>> 15000 lines, the estimated result probably should not have more than 100 
>> lines)
>> I could do it by PL/SQL, but I am interested in SQL, if exists
>> 
>> Andreas Mosmann
>> 
>> 
>> 
>> 
>> 

> Another alternative could be use of analytical functions,
> 1) the classical interval calculation with *start of group* method
> SQL> with t as (
> 2 select 'AA' F1,'BB' F2,10 F3,20 F4 from dual union all
> 3 select 'AA','BB',20,30 from dual union all
> 4 select 'AA','BB',40,50 from dual union all
> 5 select 'AA','BB',50,60 from dual union all
> 6 select 'AA','BB',60,70 from dual union all
> 7 select 'AA','BB',90,100 from dual union all
> 8 select 'AA','BB',100,110 from dual union all
> 9 select 'AA','BB',110,120 from dual union all
> 10 select 'AA','BB',120,130 from dual union all
> 11 select 'AA','CC',10,20 from dual union all
> 12 select 'AA','CC',20,30 from dual union all
> 13 select 'AA','CC',30,40 from dual union all
> 14 select 'AA','CC',50,60 from dual union all
> 15 select 'AA','CC',60,70 from dual union all
> 16 select 'AA','CC',70,80 from dual union all
> 17 select 'AA','CC',80,90 from dual union all
> 18 select 'AA','CC',90,100 from dual union all
> 19 select 'AA','CC',100,110 from dual union all
> 20 select 'AA','CC',170,180 from dual union all
> 21 select 'AA','CC',180,190 from dual
> 22 )
> 23 -- End test data
> 24 select f1,f2,min(f3) f3,max(f4) f4,count(*) cnt
> 25 from (
> 26 select f1,
> 27 f2,
> 28 f3,
> 29 f4,
> 30 sum(start_of_group) over(partition by f1, f2 order by f3)
> group_no
> 31 from (select f1,
> 32 f2,
> 33 f3,
> 34 f4,
> 35 decode(lag(f4) over(partition by f1, f2 order by f3),
> 36 f3,
> 37 0,
> 38 1) start_of_group
> 39 from t)
> 40 )
> 41 group by f1,f2,group_no
> 42 order by f1,f2,f3
> 43 ;

> F1 F2 F3 F4 CNT
> -- -- ---------- ---------- ----------
> AA BB 10 30 2
> AA BB 40 70 3
> AA BB 90 130 4
> AA CC 10 40 3
> AA CC 50 110 6
> AA CC 170 190 2

> 2) Slightly different approach with analytics

> SQL> with t as (
> 2 select 'AA' F1,'BB' F2,10 F3,20 F4 from dual union all
> 3 select 'AA','BB',20,30 from dual union all
> 4 select 'AA','BB',40,50 from dual union all
> 5 select 'AA','BB',50,60 from dual union all
> 6 select 'AA','BB',60,70 from dual union all
> 7 select 'AA','BB',90,100 from dual union all
> 8 select 'AA','BB',100,110 from dual union all
> 9 select 'AA','BB',110,120 from dual union all
> 10 select 'AA','BB',120,130 from dual union all
> 11 select 'AA','CC',10,20 from dual union all
> 12 select 'AA','CC',20,30 from dual union all
> 13 select 'AA','CC',30,40 from dual union all
> 14 select 'AA','CC',50,60 from dual union all
> 15 select 'AA','CC',60,70 from dual union all
> 16 select 'AA','CC',70,80 from dual union all
> 17 select 'AA','CC',80,90 from dual union all
> 18 select 'AA','CC',90,100 from dual union all
> 19 select 'AA','CC',100,110 from dual union all
> 20 select 'AA','CC',170,180 from dual union all
> 21 select 'AA','CC',180,190 from dual
> 22 )
> 23 -- End test data
> 24 select f1,
> 25 f2,
> 26 f3,
> 27 nvl(lead(prev_f4) over(partition by f1, f2 order by f3),
> max_f4) f4,
> 28 nvl(lead(prev_cnt_f4)
> 29 over(partition by f1, f2 order by f3) - prev_cnt_f4,
> 30 cnt_f4 - prev_cnt_f4) cnt
> 31 from (select t.*,
> 32 count(*) over(partition by f1, f2) cnt_f4,
> 33 count(*) over(partition by f1, f2 order by f3
> 34 rows between unbounded preceding and 1 preceding)
> prev_cnt_f4,
> 35 max(f4) over(partition by f1, f2) max_f4,
> 36 max(f4) over(partition by f1, f2 order by f3
> 37 rows between unbounded preceding and 1 preceding)
> prev_f4,
> 38 decode(lag(f4) over(partition by f1, f2 order by f3),
> 39 f3,
> 40 0,
> 41 1) g
> 42 from t) t1
> 43 where g = 1
> 44 order by f1,f2,f3
> 45 ;

> F1 F2 F3 F4 CNT
> -- -- ---------- ---------- ----------
> AA BB 10 30 2
> AA BB 40 70 3
> AA BB 90 130 4
> AA CC 10 40 3
> AA CC 50 110 6
> AA CC 170 190 2

> Best regards

> Maxim

I did not understand everything by now, but it works fine. - Innermost select seems to give first row of a sub-group defined by connection between F3->F4 the number 1, all the others get number 0, - second step is a numbering by sub-groups, each sub-group gets the same number
- third stage is to find out the min/max and count of each sub-group

I never heard about this functions, but they work very well and super fast.

Many thanks
Andreas

-- 
wenn email, dann AndreasMosmann <bei> web <punkt> de
Received on Mon Nov 02 2009 - 11:03:28 CST

Original text of this message