Re: Analytic function Help

From: Lennart Jonsson <erik.lennart.jonsson_at_gmail.com>
Date: Tue, 01 Feb 2011 21:44:00 +0100
Message-ID: <ii9rah$538$1_at_news.eternal-september.org>



On 2011-02-01 19:31, Andreas Mosmann wrote:
> Andreas Mosmann schrieb am 31.01.2011 in
> <1296500556.65_at_user.newsoffice.de>:
>
>> Hi, there is a view like

>
>
>> ID   SORT   FK
>> 01     -5   N
>> 01     -4   N
>> 01     -3   Y
>> 01     -2   Y
>> 01     -1   N
>> 01      0   Y
>> 01      1   N
>> 01      2   N
>> 02     -2   N
>> 02     -1   Y
>> 02      0   Y
>> 02      1   Y
>> 03     -5   Y
>> 03     -4   N
>> 03     -3   Y
>> 03     -2   N
>> 03     -1   Y
>> 03      0   N
>> 03      1   Y

>
>> I need the number of seperate groups of 'Y'
>> that means in this case:

>
>> ID  COUNT
>> 01      2
>> 02      1
>> 03      4

>
>> ID 01: SORT (-2,-3) and (0) are a group ->2
>> ID 02: SORT (-1 .. +1) is a group ->1
>> ID 03: SORT (-5), (-3), (-1), (1) are groups ->4

>
>> I sometimes use lead and lag, but in this case I do not have an idea, 
>> but I feel it should be possible, isn't it?

>
>> Thank you
>> Andreas

>
> I solved the question above with your help. Now I have another question
> to the same subject.
>
> Imagine the result a little more complex. Instead of
>
>> ID  COUNT
>> 01      2
>> 02      1
>> 03      4

>
> there is maybe:
> ID START STOP COUNT
> 01 0 100 2
> 01 100 150 2
> 01 150 375 4
> 01 375 500 1
> 01 500 700 1
> 01 700 900 2
> 01 900 1015 2
>
> I now want to melt records with the same count, but only that, that
> follow each other.
>
> wished result
> ID START STOP COUNT
> 01 0 150 2
> 01 150 375 4
> 01 375 700 1
> 01 700 1015 2
>
> I former got an answer to a similar question by Mr. Maxim Demenko and I
> tried to change some hours, but I could not resolve my new problem. It
> is, that f.e. COUNT=2 is twice in the result. So I can not use MAX-
> Function to find out end of the block.
>
> Maxims idea (that fitted for the other question very fine) was
> Inner query:
> select
> ID, START,
> --to find out the first
> nvl(decode(lag(START) over (partition by ID, COUNT order by
> VST),NULL,1,0) FIRST_LINE,
> --to find out "Stop"- Value
> max(STOP) over (partition by ID, COUNT),
> COUNT
> from MyResult
>
> Complete query (simplified):
> select
> ID,START,MAX_STOP,COUNT
> from (
> select
> ID, START,
> --to find out the first
> nvl(decode(lag(START) over (partition by ID, COUNT order by
> VST),NULL,1,0) IS_FIRST_LINE,
> --to find out "Stop"- Value
> max(STOP) over (partition by ID, COUNT) MAX_STOP,
> COUNT
> from MyResult
> ) WHERE IS_FIRST_LINE=1
>
> It works for COUNT 1 and 4, but of course not for COUNT 2.
>
> Any ideas?

Perhaps best to start without using any analytic functions etc. Here's a sketch (at the moment I don't have an Oracle dbms to try with, so you might have to do some adjustments):

create table T (

    id char(2) not null,
    start int not null,
    stop int not null,
    cnt int not null,

    primary key (id, start)
);

insert into T (id, start, stop, cnt)
values

    ('01',0,100,2),
    ('01',100,150,2),
    ('01',150,375,4),
    ('01',375,500,1),
    ('01',500,700,1),
    ('01',700,900,2),
    ('01',900,1000,2);

For every tuple we would like to find the max stop such that no other cnt lies between our start and that stop:

select 	id, start, cnt, (select max(stop) as stop
			 from T T2 where (T1.id, T1.cnt)
				       = (T2.id, T2.cnt)
				     and not exists (
					select 1 from T T3
					where T2.id = T3.id
					  and T2.cnt <> T3.cnt
					  and T3.start between T1.stop
							   and T2.start
				     )
			)

from T T1

ID START CNT STOP
-- ----------- ----------- -----------

01           0           2         150
01         100           2         150
01         150           4         375
01         375           1         700
01         500           1         700
01         700           2        1000
01         900           2        1000

What remain is to find the smallest start per id, cnt and stop:

select id, min(start) as start, stop, cnt from (

	select id, start, cnt,
		(select max(stop) as stop
		 from T T2
		 where (T1.id, T1.cnt) = (T2.id, T2.cnt)
		   and not exists (
			select 1 from T T3
			where T2.id = T3.id
			  and T2.cnt <> T3.cnt
			  and T3.start between T1.stop and T2.start
		   )
		)
	from T T1

) group by id, stop, cnt order by stop

ID START STOP CNT
-- ----------- ----------- -----------

01           0         150           2
01         150         375           4
01         375         700           1
01         700        1000           2


Using a similar technique as in my previous example:

select id, start, stop, cnt,

       row_number() over (partition by id order by start) -
       row_number() over (partition by id, cnt order by start) as diff
from T
order by start

ID START STOP CNT DIFF

-- ----------- ----------- ----------- --------------------
01           0         100           2                    0
01         100         150           2                    0
01         150         375           4                    2
01         375         500           1                    3
01         500         700           1                    3
01         700         900           2                    3
01         900        1000           2                    3

min(start) and max(stop) per id, diff gives:

select id, min(start) as start, max(stop) as stop, cnt, diff from (

	select id, start, stop, cnt,
		row_number() over (partition by id order by start) -
		row_number() over (partition by id, cnt
				   order by start) as diff
	from T

)
group by id, cnt, diff
order by start

ID START STOP CNT DIFF

-- ----------- ----------- ----------- --------------------
01           0         150           2                    0
01         150         375           4                    2
01         375         700           1                    3
01         700        1000           2                    3


I'll have to stop here, if I come up with some other way I might post that later.

/Lennart Received on Tue Feb 01 2011 - 14:44:00 CST

Original text of this message