Home » SQL & PL/SQL » SQL & PL/SQL » hlp with analytical function to calculate missing data
hlp with analytical function to calculate missing data [message #240115] |
Thu, 24 May 2007 03:03 |
dusoo
Messages: 41 Registered: March 2007
|
Member |
|
|
Hi everyone, is there a way to fill some missing numbers based on what have come before and after that missing number by starttime and based on how many missing data are between? ... by "simple" select? I know how to do that just theoreticly with simple math commands, but is there a way to apply them in sql (analytical functions)?
Thanks in advance for any ideas !
The missing number on line 17 could be calculated as 339+(1/2)*(356-339) = 347,5
The missing number on line 23 could be calculated as 355+(1/3)*(292-355) = 334
The missing number on line 24 could be calculated as 355+(2/3)*(292-355) = 313
temp_table (rownumber,starttime,data_column)
15 23.5.2007 16:15 , 258
16 23.5.2007 16:30 , 339
17 23.5.2007 16:45 ,
18 23.5.2007 17:00 , 356
19 23.5.2007 17:15 , 373
20 23.5.2007 17:30 , 355
21 23.5.2007 17:45 , 363
22 23.5.2007 18:00 , 355
23 23.5.2007 18:15 ,
24 23.5.2007 18:30 ,
25 23.5.2007 19:00 , 292
26 23.5.2007 19:15 , 295
thanks
|
|
|
Re: hlp with analytical function to calculate missing data [message #240183 is a reply to message #240115] |
Thu, 24 May 2007 05:56 |
Cthulhu
Messages: 381 Registered: September 2006 Location: UK
|
Senior Member |
|
|
It's a bit convoluted, but here goes:
drop table timings;
create table timings (seq_no number, time_of_event date, figure number);
insert into timings (seq_no, time_of_event, figure)
values (15, TO_DATE('23.5.2007 16:15', 'dd.mm.yyyy hh24:mi') , 258);
insert into timings (seq_no, time_of_event, figure)
values (16, TO_DATE('23.5.2007 16:30', 'dd.mm.yyyy hh24:mi') , 339);
insert into timings (seq_no, time_of_event, figure)
values (17, TO_DATE('23.5.2007 16:45', 'dd.mm.yyyy hh24:mi') , null);
insert into timings (seq_no, time_of_event, figure)
values (18, TO_DATE('23.5.2007 17:00', 'dd.mm.yyyy hh24:mi') , 356);
insert into timings (seq_no, time_of_event, figure)
values (19, TO_DATE('23.5.2007 17:15', 'dd.mm.yyyy hh24:mi') , 373);
insert into timings (seq_no, time_of_event, figure)
values (20, TO_DATE('23.5.2007 17:30', 'dd.mm.yyyy hh24:mi') , 355);
insert into timings (seq_no, time_of_event, figure)
values (21, TO_DATE('23.5.2007 17:45', 'dd.mm.yyyy hh24:mi') , 363);
insert into timings (seq_no, time_of_event, figure)
values (22, TO_DATE('23.5.2007 18:00', 'dd.mm.yyyy hh24:mi') , 355);
insert into timings (seq_no, time_of_event, figure)
values (23, TO_DATE('23.5.2007 18:15', 'dd.mm.yyyy hh24:mi') ,null);
insert into timings (seq_no, time_of_event, figure)
values (24, TO_DATE('23.5.2007 18:30', 'dd.mm.yyyy hh24:mi') ,null);
insert into timings (seq_no, time_of_event, figure)
values (25, TO_DATE('23.5.2007 19:00', 'dd.mm.yyyy hh24:mi') , 292);
insert into timings (seq_no, time_of_event, figure)
values (26, TO_DATE('23.5.2007 19:15', 'dd.mm.yyyy hh24:mi') , 295);
select seq_no, time_of_event, next_non_null_fig,
prev_non_null_fig + (((tot_rows-null_pos)/tot_rows)*(next_non_null_fig - prev_non_null_fig)) as computed
from
(select
t.seq_no, t.figure, t.time_of_event, pt.time_of_event as prev_non_null_time, nt.time_of_event as next_non_null_time,
pt.figure as prev_non_null_fig, nt.figure as next_non_null_fig,
(select count(*)-1 from timings t1 where t1.seq_no between t.seq_no and t.next_non_null_seq_no) as null_pos,
(select count(*)-1 from timings t1 where t1.seq_no between t.prev_non_null_seq_no and t.next_non_null_seq_no) as tot_rows
from
(select seq_no, figure,
time_of_event,
(select max(t2.seq_no) from timings t2 where t2.time_of_event < t1.time_of_event and t2.figure is not null) as prev_non_null_seq_no ,
(select min(t2.seq_no) from timings t2 where t2.time_of_event > t1.time_of_event and t2.figure is not null) as next_non_null_seq_no
from timings t1) t, timings nt, timings pt
where nt.seq_no = t.next_non_null_seq_no
and pt.seq_no = t.prev_non_null_seq_no
and t.figure is null)
order by seq_no
|
|
|
|
|
|
|
|
Re: hlp with analytical function to calculate missing data [message #240253 is a reply to message #240115] |
Thu, 24 May 2007 08:21 |
Cthulhu
Messages: 381 Registered: September 2006 Location: UK
|
Senior Member |
|
|
I considered analytic functions. Lead/lag would be the obvious choice and they would work fine for calculating 17 (where there is a single gap), but I can't see how to get it to calculate 23 and 24 very easliy. If you had a "lead/lag ignore nulls" function, which gave you the last non-null value, it might work, but I don't think you can do that.
The other obvious approach would be to use PL/SQL. It would be fairly easy just to write a cursor to go through the rows in seq_no order. When you hit a null value, you could store the entries in a small pl/sql table until you find the next non-null value. You could then compute the entries in the PL/SQL table and apply them as updates to the table.
|
|
|
Re: hlp with analytical function to calculate missing data [message #240258 is a reply to message #240253] |
Thu, 24 May 2007 08:30 |
dusoo
Messages: 41 Registered: March 2007
|
Member |
|
|
hi, well, plsql is not a problem at all to solve this, bu this way i would have to run a job every let's say a hour and update missing data...But i guess having ora9 and unable to use "ignore nulls" need me to move for plsql choice,...but still, i'll try to find sql online way to do that ...thanks. And yes, for one missing value it worked perfectly...
|
|
|
Re: hlp with analytical function to calculate missing data [message #240288 is a reply to message #240258] |
Thu, 24 May 2007 09:49 |
JRowbottom
Messages: 5933 Registered: June 2006 Location: Sunny North Yorkshire, ho...
|
Senior Member |
|
|
Right - it can be done, but it's made my brain hurt, and seriously impacted the number of hours I can bill for today, so be grateful!!!!
drop table temp_table;
create table temp_table (starttime date,data_column number);
insert into temp_table values ( to_date('23.5.2007 16:15','dd.mm.yyyy hh24:mi') , 258);
insert into temp_table values ( to_date('23.5.2007 16:30','dd.mm.yyyy hh24:mi') , 339);
insert into temp_table values ( to_date('23.5.2007 16:45','dd.mm.yyyy hh24:mi') , null);
insert into temp_table values ( to_date('23.5.2007 17:00','dd.mm.yyyy hh24:mi') , 356);
insert into temp_table values ( to_date('23.5.2007 17:15','dd.mm.yyyy hh24:mi') , 373);
insert into temp_table values ( to_date('23.5.2007 17:30','dd.mm.yyyy hh24:mi') , 355);
insert into temp_table values ( to_date('23.5.2007 17:45','dd.mm.yyyy hh24:mi') , 363);
insert into temp_table values ( to_date('23.5.2007 18:00','dd.mm.yyyy hh24:mi') , 355);
insert into temp_table values ( to_date('23.5.2007 18:15','dd.mm.yyyy hh24:mi') , null);
insert into temp_table values ( to_date('23.5.2007 18:30','dd.mm.yyyy hh24:mi') , null);
insert into temp_table values ( to_date('23.5.2007 19:00','dd.mm.yyyy hh24:mi') , 292);
insert into temp_table values ( to_date('23.5.2007 19:15','dd.mm.yyyy hh24:mi') , 295);
select starttime
,data_column orig_data_col
,nvl(data_column ,lnn + (nnn-lnn)*(null_numerator/null_denominator)) data_column
from (
select starttime
,data_column
,rnum
,last_non_null,next_non_null
,first_value(data_column) over (partition by last_non_null order by data_column nulls last) lnn
,first_value(data_column) over (partition by next_non_null order by data_column nulls last) nnn
,count(1) over (partition by last_non_null order by starttime rows between unbounded preceding and current row)-1 null_numerator
,sum(1) over (partition by next_non_null+last_non_null)+1 null_denominator
from (
select starttime
,data_column
,row_number() over (order by starttime) rnum
,row_number() over (order by starttime) -count(nvl2(data_column,null,'A')) over (order by starttime) last_non_null
,row_number() over (order by starttime) +count(nvl2(data_column,null,'A')) over (order by starttime desc) next_non_null
from temp_table)
)
STARTTIME ORIG_DATA_COL DATA_COLUMN
-------------- ------------- -----------
23/05/07 16:15 258 258
23/05/07 16:30 339 339
23/05/07 16:45 347.5
23/05/07 17:00 356 356
23/05/07 17:15 373 373
23/05/07 17:30 355 355
23/05/07 17:45 363 363
23/05/07 18:00 355 355
23/05/07 18:15 334
23/05/07 18:30 313
23/05/07 19:00 292 292
23/05/07 19:15 295 295
The very innermost query uses a count of the total number of nulls so far, combined with the total number of rows so far to produce 2 different columns (Last_non_null and Next_Non_Null), each of which group the null values with either the trailing or the following non null value.
Then you just have to get these values into the same row as the Null records (as LNN and NNN), and just do some arithmetic.
As Tom Kyte says - Analytics Rock, Analytics Roll.
Now if you'll excuse me, I'm going to lie down.
|
|
|
|
Re: hlp with analytical function to calculate missing data [message #240532 is a reply to message #240115] |
Fri, 25 May 2007 03:10 |
Cthulhu
Messages: 381 Registered: September 2006 Location: UK
|
Senior Member |
|
|
Quote: | hi, well, plsql is not a problem at all to solve this, bu this way i would have to run a job every let's say a hour and update missing data
|
If neither of the solutions suggested so far gives sufficient performance and you don't want to do updates, another option would be to write the stored procedure as I suggested but turn it into a pipeline procedure. That would enable you to have a select statement which returns the data you want.
|
|
|
|
Re: hlp with analytical function to calculate missing data [message #240573 is a reply to message #240569] |
Fri, 25 May 2007 04:17 |
JRowbottom
Messages: 5933 Registered: June 2006 Location: Sunny North Yorkshire, ho...
|
Senior Member |
|
|
You could probably improve the performace slightly by replacing the innermost query with this:select starttime
,data_column
,rnum
,rnum -count(nvl2(data_column,null,'A')) over (order by starttime) last_non_null
,rnum +count(nvl2(data_column,null,'A')) over (order by starttime desc) next_non_null
from (
select starttime
,data_column
,row_number() over (order by starttime) rnum
from temp_table) as this only needs to calculate one Row_number, rather than three.
|
|
|
Goto Forum:
Current Time: Tue Dec 03 08:14:49 CST 2024
|