Home » SQL & PL/SQL » SQL & PL/SQL » hlp with analytical function to calculate missing data
hlp with analytical function to calculate missing data [message #240115] Thu, 24 May 2007 03:03 Go to next message
dusoo
Messages: 41
Registered: March 2007
Member
Hi everyone, is there a way to fill some missing numbers based on what have come before and after that missing number by starttime and based on how many missing data are between? ... by "simple" select? I know how to do that just theoreticly with simple math commands, but is there a way to apply them in sql (analytical functions)?
Thanks in advance for any ideas !

The missing number on line 17 could be calculated as 339+(1/2)*(356-339) = 347,5

The missing number on line 23 could be calculated as 355+(1/3)*(292-355) = 334
The missing number on line 24 could be calculated as 355+(2/3)*(292-355) = 313

temp_table (rownumber,starttime,data_column)
15 23.5.2007 16:15 , 258
16 23.5.2007 16:30 , 339
17 23.5.2007 16:45 ,
18 23.5.2007 17:00 , 356
19 23.5.2007 17:15 , 373
20 23.5.2007 17:30 , 355
21 23.5.2007 17:45 , 363
22 23.5.2007 18:00 , 355
23 23.5.2007 18:15 ,
24 23.5.2007 18:30 ,
25 23.5.2007 19:00 , 292
26 23.5.2007 19:15 , 295

thanks
Re: hlp with analytical function to calculate missing data [message #240183 is a reply to message #240115] Thu, 24 May 2007 05:56 Go to previous messageGo to next message
Cthulhu
Messages: 381
Registered: September 2006
Location: UK
Senior Member
It's a bit convoluted, but here goes:

drop table timings;

create table timings (seq_no number, time_of_event date, figure number);

insert into timings (seq_no, time_of_event, figure)
values (15, TO_DATE('23.5.2007 16:15', 'dd.mm.yyyy hh24:mi') , 258);

insert into timings (seq_no, time_of_event, figure)
values (16, TO_DATE('23.5.2007 16:30', 'dd.mm.yyyy hh24:mi') , 339);

insert into timings (seq_no, time_of_event, figure)
values (17, TO_DATE('23.5.2007 16:45', 'dd.mm.yyyy hh24:mi') , null);

insert into timings (seq_no, time_of_event, figure)
values (18, TO_DATE('23.5.2007 17:00', 'dd.mm.yyyy hh24:mi') , 356);

insert into timings (seq_no, time_of_event, figure)
values (19, TO_DATE('23.5.2007 17:15', 'dd.mm.yyyy hh24:mi') , 373);

insert into timings (seq_no, time_of_event, figure)
values (20, TO_DATE('23.5.2007 17:30', 'dd.mm.yyyy hh24:mi') , 355);

insert into timings (seq_no, time_of_event, figure)
values (21, TO_DATE('23.5.2007 17:45', 'dd.mm.yyyy hh24:mi') , 363);

insert into timings (seq_no, time_of_event, figure)
values (22, TO_DATE('23.5.2007 18:00', 'dd.mm.yyyy hh24:mi') , 355);

insert into timings (seq_no, time_of_event, figure)
values (23, TO_DATE('23.5.2007 18:15', 'dd.mm.yyyy hh24:mi') ,null);

insert into timings (seq_no, time_of_event, figure)
values (24, TO_DATE('23.5.2007 18:30', 'dd.mm.yyyy hh24:mi') ,null);

insert into timings (seq_no, time_of_event, figure)
values (25, TO_DATE('23.5.2007 19:00', 'dd.mm.yyyy hh24:mi') , 292);

insert into timings (seq_no, time_of_event, figure)
values (26, TO_DATE('23.5.2007 19:15', 'dd.mm.yyyy hh24:mi') , 295);


select seq_no, time_of_event, next_non_null_fig,
prev_non_null_fig + (((tot_rows-null_pos)/tot_rows)*(next_non_null_fig - prev_non_null_fig)) as computed 
from
(select 
t.seq_no, t.figure, t.time_of_event, pt.time_of_event as prev_non_null_time, nt.time_of_event as next_non_null_time,
pt.figure as prev_non_null_fig, nt.figure as next_non_null_fig,
(select count(*)-1 from timings t1 where t1.seq_no between t.seq_no and t.next_non_null_seq_no) as null_pos,
(select count(*)-1 from timings t1 where t1.seq_no between t.prev_non_null_seq_no and t.next_non_null_seq_no) as tot_rows
from
(select seq_no, figure, 
time_of_event,
(select max(t2.seq_no) from timings t2 where t2.time_of_event < t1.time_of_event and t2.figure is not null) as prev_non_null_seq_no ,
(select min(t2.seq_no) from timings t2 where t2.time_of_event > t1.time_of_event and t2.figure is not null) as next_non_null_seq_no
from timings t1) t, timings nt, timings pt
where nt.seq_no   = t.next_non_null_seq_no 
and   pt.seq_no   = t.prev_non_null_seq_no
and   t.figure is null)
order by seq_no
Re: hlp with analytical function to calculate missing data [message #240243 is a reply to message #240183] Thu, 24 May 2007 07:52 Go to previous messageGo to next message
Maaher
Messages: 7062
Registered: December 2001
Senior Member
This seems to be ideal for a MODEL clause query. Unfortunately, I've only got a 9i right now. Any takers? Very Happy

MHE
Re: hlp with analytical function to calculate missing data [message #240247 is a reply to message #240115] Thu, 24 May 2007 08:01 Go to previous messageGo to next message
Cthulhu
Messages: 381
Registered: September 2006
Location: UK
Senior Member
MHE,

Could you enlighten me on the meaning of that message. I've no idea what you are talking about.
Re: hlp with analytical function to calculate missing data [message #240249 is a reply to message #240247] Thu, 24 May 2007 08:07 Go to previous messageGo to next message
dusoo
Messages: 41
Registered: March 2007
Member
hi! well i've heard about model function or whatever is that - but i also have only ORA 9 at work...so i need to find solution for that Sad ...thanks for any ideas
Re: hlp with analytical function to calculate missing data [message #240250 is a reply to message #240183] Thu, 24 May 2007 08:10 Go to previous messageGo to next message
dusoo
Messages: 41
Registered: March 2007
Member
hi, this one looks logical, but im worried how it would act once i run it on much much more data ... Sad i guess analytical functions are good choice, but not sure how to use em to calculate that in ora 9
Re: hlp with analytical function to calculate missing data [message #240251 is a reply to message #240115] Thu, 24 May 2007 08:11 Go to previous messageGo to next message
Cthulhu
Messages: 381
Registered: September 2006
Location: UK
Senior Member
I only have 9i as well, so can't try out MODEL.
Re: hlp with analytical function to calculate missing data [message #240253 is a reply to message #240115] Thu, 24 May 2007 08:21 Go to previous messageGo to next message
Cthulhu
Messages: 381
Registered: September 2006
Location: UK
Senior Member
I considered analytic functions. Lead/lag would be the obvious choice and they would work fine for calculating 17 (where there is a single gap), but I can't see how to get it to calculate 23 and 24 very easliy. If you had a "lead/lag ignore nulls" function, which gave you the last non-null value, it might work, but I don't think you can do that.

The other obvious approach would be to use PL/SQL. It would be fairly easy just to write a cursor to go through the rows in seq_no order. When you hit a null value, you could store the entries in a small pl/sql table until you find the next non-null value. You could then compute the entries in the PL/SQL table and apply them as updates to the table.
Re: hlp with analytical function to calculate missing data [message #240258 is a reply to message #240253] Thu, 24 May 2007 08:30 Go to previous messageGo to next message
dusoo
Messages: 41
Registered: March 2007
Member
hi, well, plsql is not a problem at all to solve this, bu this way i would have to run a job every let's say a hour and update missing data...But i guess having ora9 and unable to use "ignore nulls" need me to move for plsql choice,...but still, i'll try to find sql online way to do that ...thanks. And yes, for one missing value it worked perfectly...
Re: hlp with analytical function to calculate missing data [message #240288 is a reply to message #240258] Thu, 24 May 2007 09:49 Go to previous messageGo to next message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
Right - it can be done, but it's made my brain hurt, and seriously impacted the number of hours I can bill for today, so be grateful!!!!

drop table temp_table;

create table temp_table (starttime date,data_column number);

insert into temp_table values ( to_date('23.5.2007 16:15','dd.mm.yyyy hh24:mi') , 258);
insert into temp_table values ( to_date('23.5.2007 16:30','dd.mm.yyyy hh24:mi') , 339);
insert into temp_table values ( to_date('23.5.2007 16:45','dd.mm.yyyy hh24:mi') , null);
insert into temp_table values ( to_date('23.5.2007 17:00','dd.mm.yyyy hh24:mi') , 356);
insert into temp_table values ( to_date('23.5.2007 17:15','dd.mm.yyyy hh24:mi') , 373);
insert into temp_table values ( to_date('23.5.2007 17:30','dd.mm.yyyy hh24:mi') , 355);
insert into temp_table values ( to_date('23.5.2007 17:45','dd.mm.yyyy hh24:mi') , 363);
insert into temp_table values ( to_date('23.5.2007 18:00','dd.mm.yyyy hh24:mi') , 355);
insert into temp_table values ( to_date('23.5.2007 18:15','dd.mm.yyyy hh24:mi') , null);
insert into temp_table values ( to_date('23.5.2007 18:30','dd.mm.yyyy hh24:mi') , null);
insert into temp_table values ( to_date('23.5.2007 19:00','dd.mm.yyyy hh24:mi') , 292);
insert into temp_table values ( to_date('23.5.2007 19:15','dd.mm.yyyy hh24:mi') , 295);

select starttime
      ,data_column    orig_data_col
      ,nvl(data_column ,lnn + (nnn-lnn)*(null_numerator/null_denominator)) data_column
from  (      
select starttime
      ,data_column
      ,rnum
      ,last_non_null,next_non_null
      ,first_value(data_column) over (partition by last_non_null order by data_column nulls last) lnn
      ,first_value(data_column) over (partition by next_non_null order by data_column nulls last) nnn      
      ,count(1) over (partition by last_non_null order by starttime rows between unbounded preceding and current row)-1 null_numerator
      ,sum(1) over (partition by next_non_null+last_non_null)+1 null_denominator
from (
select starttime
      ,data_column
      ,row_number() over (order by starttime) rnum
      ,row_number() over (order by starttime) -count(nvl2(data_column,null,'A')) over (order by starttime) last_non_null
      ,row_number() over (order by starttime)  +count(nvl2(data_column,null,'A')) over (order by starttime desc) next_non_null
from   temp_table)
)

STARTTIME      ORIG_DATA_COL DATA_COLUMN
-------------- ------------- -----------
23/05/07 16:15           258         258
23/05/07 16:30           339         339
23/05/07 16:45                     347.5
23/05/07 17:00           356         356
23/05/07 17:15           373         373
23/05/07 17:30           355         355
23/05/07 17:45           363         363
23/05/07 18:00           355         355
23/05/07 18:15                       334
23/05/07 18:30                       313
23/05/07 19:00           292         292
23/05/07 19:15           295         295


The very innermost query uses a count of the total number of nulls so far, combined with the total number of rows so far to produce 2 different columns (Last_non_null and Next_Non_Null), each of which group the null values with either the trailing or the following non null value.
Then you just have to get these values into the same row as the Null records (as LNN and NNN), and just do some arithmetic.

As Tom Kyte says - Analytics Rock, Analytics Roll.

Now if you'll excuse me, I'm going to lie down.
Re: hlp with analytical function to calculate missing data [message #240311 is a reply to message #240288] Thu, 24 May 2007 10:37 Go to previous messageGo to next message
Michel Cadot
Messages: 64151
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Well done! ./fa/2115/0/
Unfortunatly I didn't have time to play with it. ./fa/1637/0/

Regards
Michel
Re: hlp with analytical function to calculate missing data [message #240532 is a reply to message #240115] Fri, 25 May 2007 03:10 Go to previous messageGo to next message
Cthulhu
Messages: 381
Registered: September 2006
Location: UK
Senior Member
Quote:
hi, well, plsql is not a problem at all to solve this, bu this way i would have to run a job every let's say a hour and update missing data


If neither of the solutions suggested so far gives sufficient performance and you don't want to do updates, another option would be to write the stored procedure as I suggested but turn it into a pipeline procedure. That would enable you to have a select statement which returns the data you want.
Re: hlp with analytical function to calculate missing data [message #240569 is a reply to message #240115] Fri, 25 May 2007 04:04 Go to previous messageGo to next message
dusoo
Messages: 41
Registered: March 2007
Member
hi everyone, that solution worked just perfect on that amount of data i have ... So thanks !!!, now i just have to learn those "few" an.function u have used there Smile

thanks

[Updated on: Fri, 25 May 2007 04:04]

Report message to a moderator

Re: hlp with analytical function to calculate missing data [message #240573 is a reply to message #240569] Fri, 25 May 2007 04:17 Go to previous message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
You could probably improve the performace slightly by replacing the innermost query with this:
select starttime
      ,data_column
      ,rnum
      ,rnum -count(nvl2(data_column,null,'A')) over (order by starttime) last_non_null
      ,rnum +count(nvl2(data_column,null,'A')) over (order by starttime desc) next_non_null
from   (
select starttime
      ,data_column
      ,row_number() over (order by starttime) rnum
from  temp_table)
as this only needs to calculate one Row_number, rather than three.
Previous Topic: script help needed
Next Topic: Flowchart / sequential list of calling objects.
Goto Forum:
  


Current Time: Fri Dec 09 23:08:36 CST 2016

Total time taken to generate the page: 0.15198 seconds