Home » SQL & PL/SQL » SQL & PL/SQL » hlp with analytical function to calculate missing data
hlp with analytical function to calculate missing data Thu, 24 May 2007 03:03
 dusoo Messages: 41Registered: March 2007 Member
Hi everyone, is there a way to fill some missing numbers based on what have come before and after that missing number by starttime and based on how many missing data are between? ... by "simple" select? I know how to do that just theoreticly with simple math commands, but is there a way to apply them in sql (analytical functions)?
Thanks in advance for any ideas !

The missing number on line 17 could be calculated as 339+(1/2)*(356-339) = 347,5

The missing number on line 23 could be calculated as 355+(1/3)*(292-355) = 334
The missing number on line 24 could be calculated as 355+(2/3)*(292-355) = 313

temp_table (rownumber,starttime,data_column)
15 23.5.2007 16:15 , 258
16 23.5.2007 16:30 , 339
17 23.5.2007 16:45 ,
18 23.5.2007 17:00 , 356
19 23.5.2007 17:15 , 373
20 23.5.2007 17:30 , 355
21 23.5.2007 17:45 , 363
22 23.5.2007 18:00 , 355
23 23.5.2007 18:15 ,
24 23.5.2007 18:30 ,
25 23.5.2007 19:00 , 292
26 23.5.2007 19:15 , 295

thanks
Re: hlp with analytical function to calculate missing data [message #240183 is a reply to message #240115] Thu, 24 May 2007 05:56
 Cthulhu Messages: 381Registered: September 2006 Location: UK Senior Member
It's a bit convoluted, but here goes:

```drop table timings;

create table timings (seq_no number, time_of_event date, figure number);

insert into timings (seq_no, time_of_event, figure)
values (15, TO_DATE('23.5.2007 16:15', 'dd.mm.yyyy hh24:mi') , 258);

insert into timings (seq_no, time_of_event, figure)
values (16, TO_DATE('23.5.2007 16:30', 'dd.mm.yyyy hh24:mi') , 339);

insert into timings (seq_no, time_of_event, figure)
values (17, TO_DATE('23.5.2007 16:45', 'dd.mm.yyyy hh24:mi') , null);

insert into timings (seq_no, time_of_event, figure)
values (18, TO_DATE('23.5.2007 17:00', 'dd.mm.yyyy hh24:mi') , 356);

insert into timings (seq_no, time_of_event, figure)
values (19, TO_DATE('23.5.2007 17:15', 'dd.mm.yyyy hh24:mi') , 373);

insert into timings (seq_no, time_of_event, figure)
values (20, TO_DATE('23.5.2007 17:30', 'dd.mm.yyyy hh24:mi') , 355);

insert into timings (seq_no, time_of_event, figure)
values (21, TO_DATE('23.5.2007 17:45', 'dd.mm.yyyy hh24:mi') , 363);

insert into timings (seq_no, time_of_event, figure)
values (22, TO_DATE('23.5.2007 18:00', 'dd.mm.yyyy hh24:mi') , 355);

insert into timings (seq_no, time_of_event, figure)
values (23, TO_DATE('23.5.2007 18:15', 'dd.mm.yyyy hh24:mi') ,null);

insert into timings (seq_no, time_of_event, figure)
values (24, TO_DATE('23.5.2007 18:30', 'dd.mm.yyyy hh24:mi') ,null);

insert into timings (seq_no, time_of_event, figure)
values (25, TO_DATE('23.5.2007 19:00', 'dd.mm.yyyy hh24:mi') , 292);

insert into timings (seq_no, time_of_event, figure)
values (26, TO_DATE('23.5.2007 19:15', 'dd.mm.yyyy hh24:mi') , 295);

select seq_no, time_of_event, next_non_null_fig,
prev_non_null_fig + (((tot_rows-null_pos)/tot_rows)*(next_non_null_fig - prev_non_null_fig)) as computed
from
(select
t.seq_no, t.figure, t.time_of_event, pt.time_of_event as prev_non_null_time, nt.time_of_event as next_non_null_time,
pt.figure as prev_non_null_fig, nt.figure as next_non_null_fig,
(select count(*)-1 from timings t1 where t1.seq_no between t.seq_no and t.next_non_null_seq_no) as null_pos,
(select count(*)-1 from timings t1 where t1.seq_no between t.prev_non_null_seq_no and t.next_non_null_seq_no) as tot_rows
from
(select seq_no, figure,
time_of_event,
(select max(t2.seq_no) from timings t2 where t2.time_of_event < t1.time_of_event and t2.figure is not null) as prev_non_null_seq_no ,
(select min(t2.seq_no) from timings t2 where t2.time_of_event > t1.time_of_event and t2.figure is not null) as next_non_null_seq_no
from timings t1) t, timings nt, timings pt
where nt.seq_no   = t.next_non_null_seq_no
and   pt.seq_no   = t.prev_non_null_seq_no
and   t.figure is null)
order by seq_no
```
Re: hlp with analytical function to calculate missing data [message #240243 is a reply to message #240183] Thu, 24 May 2007 07:52
 Maaher Messages: 7062Registered: December 2001 Senior Member
This seems to be ideal for a MODEL clause query. Unfortunately, I've only got a 9i right now. Any takers?

MHE
Re: hlp with analytical function to calculate missing data [message #240247 is a reply to message #240115] Thu, 24 May 2007 08:01
 Cthulhu Messages: 381Registered: September 2006 Location: UK Senior Member
MHE,

Could you enlighten me on the meaning of that message. I've no idea what you are talking about.
Re: hlp with analytical function to calculate missing data [message #240249 is a reply to message #240247] Thu, 24 May 2007 08:07
 dusoo Messages: 41Registered: March 2007 Member
hi! well i've heard about model function or whatever is that - but i also have only ORA 9 at work...so i need to find solution for that ...thanks for any ideas
Re: hlp with analytical function to calculate missing data [message #240250 is a reply to message #240183] Thu, 24 May 2007 08:10
 dusoo Messages: 41Registered: March 2007 Member
hi, this one looks logical, but im worried how it would act once i run it on much much more data ... i guess analytical functions are good choice, but not sure how to use em to calculate that in ora 9
Re: hlp with analytical function to calculate missing data [message #240251 is a reply to message #240115] Thu, 24 May 2007 08:11
 Cthulhu Messages: 381Registered: September 2006 Location: UK Senior Member
I only have 9i as well, so can't try out MODEL.
Re: hlp with analytical function to calculate missing data [message #240253 is a reply to message #240115] Thu, 24 May 2007 08:21
 Cthulhu Messages: 381Registered: September 2006 Location: UK Senior Member
I considered analytic functions. Lead/lag would be the obvious choice and they would work fine for calculating 17 (where there is a single gap), but I can't see how to get it to calculate 23 and 24 very easliy. If you had a "lead/lag ignore nulls" function, which gave you the last non-null value, it might work, but I don't think you can do that.

The other obvious approach would be to use PL/SQL. It would be fairly easy just to write a cursor to go through the rows in seq_no order. When you hit a null value, you could store the entries in a small pl/sql table until you find the next non-null value. You could then compute the entries in the PL/SQL table and apply them as updates to the table.
Re: hlp with analytical function to calculate missing data [message #240258 is a reply to message #240253] Thu, 24 May 2007 08:30
 dusoo Messages: 41Registered: March 2007 Member
hi, well, plsql is not a problem at all to solve this, bu this way i would have to run a job every let's say a hour and update missing data...But i guess having ora9 and unable to use "ignore nulls" need me to move for plsql choice,...but still, i'll try to find sql online way to do that ...thanks. And yes, for one missing value it worked perfectly...
Re: hlp with analytical function to calculate missing data [message #240288 is a reply to message #240258] Thu, 24 May 2007 09:49
 JRowbottom Messages: 5933Registered: June 2006 Location: Sunny North Yorkshire, ho... Senior Member
Right - it can be done, but it's made my brain hurt, and seriously impacted the number of hours I can bill for today, so be grateful!!!!

```drop table temp_table;

create table temp_table (starttime date,data_column number);

insert into temp_table values ( to_date('23.5.2007 16:15','dd.mm.yyyy hh24:mi') , 258);
insert into temp_table values ( to_date('23.5.2007 16:30','dd.mm.yyyy hh24:mi') , 339);
insert into temp_table values ( to_date('23.5.2007 16:45','dd.mm.yyyy hh24:mi') , null);
insert into temp_table values ( to_date('23.5.2007 17:00','dd.mm.yyyy hh24:mi') , 356);
insert into temp_table values ( to_date('23.5.2007 17:15','dd.mm.yyyy hh24:mi') , 373);
insert into temp_table values ( to_date('23.5.2007 17:30','dd.mm.yyyy hh24:mi') , 355);
insert into temp_table values ( to_date('23.5.2007 17:45','dd.mm.yyyy hh24:mi') , 363);
insert into temp_table values ( to_date('23.5.2007 18:00','dd.mm.yyyy hh24:mi') , 355);
insert into temp_table values ( to_date('23.5.2007 18:15','dd.mm.yyyy hh24:mi') , null);
insert into temp_table values ( to_date('23.5.2007 18:30','dd.mm.yyyy hh24:mi') , null);
insert into temp_table values ( to_date('23.5.2007 19:00','dd.mm.yyyy hh24:mi') , 292);
insert into temp_table values ( to_date('23.5.2007 19:15','dd.mm.yyyy hh24:mi') , 295);

select starttime
,data_column    orig_data_col
,nvl(data_column ,lnn + (nnn-lnn)*(null_numerator/null_denominator)) data_column
from  (
select starttime
,data_column
,rnum
,last_non_null,next_non_null
,first_value(data_column) over (partition by last_non_null order by data_column nulls last) lnn
,first_value(data_column) over (partition by next_non_null order by data_column nulls last) nnn
,count(1) over (partition by last_non_null order by starttime rows between unbounded preceding and current row)-1 null_numerator
,sum(1) over (partition by next_non_null+last_non_null)+1 null_denominator
from (
select starttime
,data_column
,row_number() over (order by starttime) rnum
,row_number() over (order by starttime) -count(nvl2(data_column,null,'A')) over (order by starttime) last_non_null
,row_number() over (order by starttime)  +count(nvl2(data_column,null,'A')) over (order by starttime desc) next_non_null
from   temp_table)
)

STARTTIME      ORIG_DATA_COL DATA_COLUMN
-------------- ------------- -----------
23/05/07 16:15           258         258
23/05/07 16:30           339         339
23/05/07 16:45                     347.5
23/05/07 17:00           356         356
23/05/07 17:15           373         373
23/05/07 17:30           355         355
23/05/07 17:45           363         363
23/05/07 18:00           355         355
23/05/07 18:15                       334
23/05/07 18:30                       313
23/05/07 19:00           292         292
23/05/07 19:15           295         295```

The very innermost query uses a count of the total number of nulls so far, combined with the total number of rows so far to produce 2 different columns (Last_non_null and Next_Non_Null), each of which group the null values with either the trailing or the following non null value.
Then you just have to get these values into the same row as the Null records (as LNN and NNN), and just do some arithmetic.

As Tom Kyte says - Analytics Rock, Analytics Roll.

Now if you'll excuse me, I'm going to lie down.
Re: hlp with analytical function to calculate missing data [message #240311 is a reply to message #240288] Thu, 24 May 2007 10:37
 Michel Cadot Messages: 65143Registered: March 2007 Location: Nanterre, France, http://... Senior MemberAccount Moderator
Well done!
Unfortunatly I didn't have time to play with it.

Regards
Michel
Re: hlp with analytical function to calculate missing data [message #240532 is a reply to message #240115] Fri, 25 May 2007 03:10
 Cthulhu Messages: 381Registered: September 2006 Location: UK Senior Member
 Quote: hi, well, plsql is not a problem at all to solve this, bu this way i would have to run a job every let's say a hour and update missing data

If neither of the solutions suggested so far gives sufficient performance and you don't want to do updates, another option would be to write the stored procedure as I suggested but turn it into a pipeline procedure. That would enable you to have a select statement which returns the data you want.
Re: hlp with analytical function to calculate missing data [message #240569 is a reply to message #240115] Fri, 25 May 2007 04:04
 dusoo Messages: 41Registered: March 2007 Member
hi everyone, that solution worked just perfect on that amount of data i have ... So thanks !!!, now i just have to learn those "few" an.function u have used there

thanks

[Updated on: Fri, 25 May 2007 04:04]

Report message to a moderator

Re: hlp with analytical function to calculate missing data [message #240573 is a reply to message #240569] Fri, 25 May 2007 04:17
 JRowbottom Messages: 5933Registered: June 2006 Location: Sunny North Yorkshire, ho... Senior Member
You could probably improve the performace slightly by replacing the innermost query with this:
```select starttime
,data_column
,rnum
,rnum -count(nvl2(data_column,null,'A')) over (order by starttime) last_non_null
,rnum +count(nvl2(data_column,null,'A')) over (order by starttime desc) next_non_null
from   (
select starttime
,data_column
,row_number() over (order by starttime) rnum
from  temp_table)```
as this only needs to calculate one Row_number, rather than three.
 Previous Topic: Please hepl me on this QUERY Next Topic: DBMS_OUTPUT.PUT_LINE
Goto Forum:

Current Time: Sat Aug 19 08:09:04 CDT 2017

Total time taken to generate the page: 0.02666 seconds