Is it a good idea to use REF_CURSOR to select subset of data? [message #334417]
Wed, 16 July 2008 07:27
wakula
Messages: 150  Registered: February 2008  Location: Poland
Senior Member
Hello,
I'm supposed to rewrite some existing code and I am not sure whether it is correct to split it into modules in the way I am planning:
(the example given is a simplified form)
Many tables are going to be created with columns:
T_n (EventDate, CustomerId, SomeDetailAboutTheEvent)
from a single table with columns:
T_input (EventDate, EventType, CustomerId, SomeDetailAboutTheEvent)
Then we are supposed to process the data in several steps using
SELECT * FROM T_input WHERE EventDate BETWEEN (...)
and the logic will be different for every value of CustomerId.
Is it a good idea to select the rows into a ref cursor and then pass it between the procedures/functions that process the data? How should I implement it?
Example:
PROCEDURE P
  (CustomerId IN NUMBER
  ,DataSet    IN type_ref_cursor
  ) AS
BEGIN
  -- pseudocode: CREATE TABLE ... AS SELECT ... FROM DataSet WHERE (complicated dynamic logic);
  NULL;
END;
PROCEDURE P_get_data
(EventDateStart IN DATE
,EventDateEnd IN DATE
,DataSet OUT type_ref_cursor
) AS
BEGIN
OPEN DataSet FOR (SELECT * FROM T_input WHERE EventDate BETWEEN EventDateStart AND EventDateEnd);
END;
DECLARE
TYPE type_ref_cursor IS REF CURSOR;
my_data_set type_ref_cursor;
BEGIN
P_get_data(TO_DATE('01-JAN-1995','DD-MON-YYYY'), TO_DATE('05-FEB-1995','DD-MON-YYYY'), my_data_set);
P(1, my_data_set);
P(3, my_data_set);
END;
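One caveat with the block above: a ref cursor is a one-shot, forward-only handle, so after P(1, my_data_set) has fetched all rows, P(3, my_data_set) receives an exhausted cursor and gets nothing. A minimal sketch of how a consuming procedure fetches from it (the row type assumes the T_input table from the example):

```sql
PROCEDURE P
  (CustomerId IN NUMBER
  ,DataSet    IN type_ref_cursor
  ) AS
  l_row T_input%ROWTYPE;
BEGIN
  LOOP
    FETCH DataSet INTO l_row;
    EXIT WHEN DataSet%NOTFOUND;
    -- per-customer logic here; note the rows are gone once fetched,
    -- so a second call with the same cursor variable finds nothing
  END LOOP;
END;
```

To process the same row set for several customers you would either reopen the cursor per call, or materialize the rows once (collection, temporary table) and pass that instead.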
P.S.
Assume that the calls to P(...) are determined at run time.
I am going to have multiple packages with procedures like the above.
EDIT:
According to my current knowledge, this is probably not the best way to do it. A better option seems to be a materialized view; however, the data are then frozen: whenever T_input is updated, the materialized view is not refreshed.
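For what it's worth, a materialized view does not have to stay frozen; Oracle can refresh it on commit if a materialized view log exists on the base table. A minimal sketch, reusing the simplified T_input columns from above (names are illustrative):

```sql
-- The log must exist first; it is what makes fast refresh possible.
CREATE MATERIALIZED VIEW LOG ON T_input
  WITH ROWID (EventDate, EventType, CustomerId, SomeDetailAboutTheEvent)
  INCLUDING NEW VALUES;

-- Refreshed automatically whenever a transaction on T_input commits.
CREATE MATERIALIZED VIEW mv_events
  REFRESH FAST ON COMMIT
AS
SELECT EventDate, EventType, CustomerId, SomeDetailAboutTheEvent
  FROM T_input;
```

Whether ON COMMIT refresh is acceptable depends on the write load on T_input; ON DEMAND refresh is the usual alternative.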
[Updated on: Wed, 16 July 2008 08:09]
Re: Is it a good idea to use REF_CURSOR to select subset of data? [message #334448 is a reply to message #334439]
Wed, 16 July 2008 10:01
wakula
Thanks for your reply Kevin,
1) I'll try to use static SQL; however, I do not know what the query is going to look like at compile time.
Previously multiple tables were created using CTAS (Create Table As Select).
Details:
The input is data (several tables where one column contains a "type" object with a variable number of elements) plus configuration. The currently existing query works fine, but has limited functionality. It is about 3000 characters long.
2) I am going to create a subset of the existing data (as I wrote, the input table can be very large) as a materialized view, then process it in several steps and remove it afterwards.
All this is currently under re-design.
The existing code is long and complicated (many procedures with >2000 lines each).
Maybe I could add some longer example:
CREATE TYPE SOME_PARENT_TYPE AS OBJECT
  ( height NUMBER ) NOT FINAL;
CREATE TYPE SOME_CHILD_TYPE1 UNDER SOME_PARENT_TYPE
  ( width NUMBER, length NUMBER );
CREATE TYPE SOME_CHILD_TYPE2 UNDER SOME_PARENT_TYPE
  ( weight NUMBER );
CREATE TABLE input_data
  (event_date   DATE
  ,event_type   NUMBER
  ,event_detail VARCHAR2(1000)
  ,event_data   SOME_PARENT_TYPE
  );
CREATE TABLE events_of_type1
  (event_date        DATE
  ,event_duration    NUMBER
  ,event_importance  NUMBER
  ,event_description VARCHAR2(1000)
  );
CREATE TABLE events_of_type2
  (event_date        DATE
  ,event_duration    NUMBER
  ,event_importance  NUMBER
  ,event_description VARCHAR2(1000)
  );
-- Create part of the report for client 1 - he wants sorting by duration
SELECT * FROM events_of_type1 WHERE CustomerId = 1 ORDER BY event_duration;
-- Create part of the report for client 2 - he wants sorting by importance
SELECT * FROM events_of_type1 WHERE CustomerId = 2 ORDER BY event_importance;
Now the tricky part: let's say that customer 1 wants the cumulative number of events per day while customer 2 wants the average number of events per hour. events_of_type* should contain data for both, so the row pattern/content, and the way it is prepared, will be different for every customer.
Assume that first we want to filter (1) the event list using some abstract rules.
Then we want to convert (2) the data (say the input data could be money: euros, dollars, yen, etc., and we want one common currency).
Then we want to apply some logic (3) to the data.
Layers (1), (2) and (3) are going to access the input data (in my example, updating it as well).
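The two per-customer aggregations mentioned above could look roughly like this (a sketch only; it assumes event_date is a DATE with second-level precision and uses the events_of_type1 table from the example):

```sql
-- Customer 1: cumulative number of events per day
SELECT TRUNC(event_date)                                AS event_day,
       COUNT(*)                                         AS events_that_day,
       SUM(COUNT(*)) OVER (ORDER BY TRUNC(event_date))  AS cumulative_events
  FROM events_of_type1
 GROUP BY TRUNC(event_date)
 ORDER BY event_day;

-- Customer 2: average number of events per hour, computed per day
SELECT TRUNC(event_date) AS event_day,
       COUNT(*) / 24     AS avg_events_per_hour
  FROM events_of_type1
 GROUP BY TRUNC(event_date)
 ORDER BY event_day;
```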
Performance aspects:
We are going to have about 30,000,000 data rows weekly to analyze.
There are no hard performance constraints at the moment, but we should think about it in advance.
Re: Is it a good idea to use REF_CURSOR to select subset of data? [message #334456 is a reply to message #334451]
Wed, 16 July 2008 10:40
wakula
anacedent wrote on Wed, 16 July 2008 17:15:
> Does the following conform to Third Normal Form?
> > Previously multiple tables were created using CTAS (Create Table As Select).
> > Input is data (several tables where one column contain "type" object with variable number of elements)
> Violating Third Normal Form results in "complex" processing logic.
> If you enjoy this complexity, then keep digging this hole deeper instead of normalizing the data.
Could you explain this more?
The input data (in tables) came from some CSV files, loaded using SQL*Loader.
The output tables are created from the input tables in several steps.
The front-end only presents the tables; no calculation is performed there.
Note that this is an obsolete design that should now be updated.
If you are referring to the results tables (column content depends on other columns that are not key columns), then you are right; however, this is what was requested, and it is only used for presentation.
Re: Is it a good idea to use REF_CURSOR to select subset of data? [message #334578 is a reply to message #334461]
Thu, 17 July 2008 03:46
wakula
anacedent wrote on Wed, 16 July 2008 18:00:
> ...data is not duplicated...
In fact the "input data" are removed periodically, so after some time we should have only the "output data".
And I recall some talk of keeping the "input data" for a long time so that the "output data" could be re-generated if needed. In that case you are right that some information would be duplicated, but this is required, as the data processing is done in advance to save time when displaying the results.
anacedent wrote on Wed, 16 July 2008 18:00:
> Another is that a column/field only ever holds a single value.
> Use of "TYPE" variables violates this principle.
Right. But the alternative is to keep a table with thousands of columns and leave some of them empty (some older implementations did this, and they exceeded the maximum number of columns per table allowed by Oracle), or to have multiple input tables (which would cause extra problems; for example, "How do we check whether any data was loaded for a particular day? Should we execute several hundred queries, one per table?" Using types turns this into a single query).
anacedent wrote on Wed, 16 July 2008 18:00:
> Typically it is a bad design which stores derived values (number of events per day) as a separate field. This bad practice can lead to data inconsistencies or reporting discrepancies.
I agree. But since our input data (CSV files) can already contain something like "number of events per hour", defined as the number of events between X:00:00 and X:59:59 where the time is always specified with a granularity of 1 second, we can do this.
anacedent wrote on Wed, 16 July 2008 18:00:
> Short term expediencies frequently result in design decisions which can haunt an application for years to come.
Yes... This is about the 3rd version of this application within 2 years; at least I know about the previous 2 in the last 1.5 years (possibly the application has existed for longer than 2 years already).
anacedent wrote on Wed, 16 July 2008 18:00:
> The choices are yours & the consequences are yours, too!
I would say "Mistakes made in the design phase impact you ever after, even in the maintenance phase" and "You can never make a perfect design".
Thanks for your comments!
Re: Is it a good idea to use REF_CURSOR to select subset of data? [message #334579 is a reply to message #334466]
Thu, 17 July 2008 03:54
wakula
Kevin Meade wrote on Wed, 16 July 2008 18:24:
> 1) GLOBAL TEMPORARY TABLES. Your use of materialized views that you will be dropping after use suggests they are temporary for the duration of your process. You might find GTTs a better choice from a performance and management perspective. Read about them, then decide if they fit your needs.
This was on my mind from the beginning of this design.
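For readers following along, a minimal sketch of the GTT approach Kevin describes (table and bind names are illustrative, columns follow the simplified example earlier in the thread):

```sql
-- Rows are visible only to the creating session and vanish at session end
-- (or at commit, with ON COMMIT DELETE ROWS); no drop/cleanup DDL is needed.
CREATE GLOBAL TEMPORARY TABLE gtt_events
  (event_date   DATE
  ,customer_id  NUMBER
  ,event_detail VARCHAR2(1000)
  ) ON COMMIT PRESERVE ROWS;

-- Stage the working subset once, then run the per-customer steps against it.
INSERT INTO gtt_events (event_date, customer_id, event_detail)
SELECT EventDate, CustomerId, SomeDetailAboutTheEvent
  FROM T_input
 WHERE EventDate BETWEEN :start_date AND :end_date;
```

Unlike a materialized view that is created and dropped per run, the GTT is created once; only its contents are per-session.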
Kevin Meade wrote on Wed, 16 July 2008 18:24:
> 2) There is an article here on ORAFAQ I am pretty sure, also on ASKTOMHOME, and of course you could just Google for: ORACLE DYNAMIC ORDER BY. People describe how to create an order by clause that can sort on different columns and in different directions depending upon what you want. It is actually pretty easy. In particular I believe Tom at asktomhome shows how to use the case statement in the order by to get pretty fancy.
Although I don't understand (yet) how I could use it, I'll try to read about it. Currently we do a lot of "order by", "group by", etc. (the front-end displays the data as-is; no sorting is done on the front-end or in the DB at runtime, so we need to sort the data in advance. I hope to change this behavior soon).
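The CASE-in-ORDER-BY trick Kevin mentions looks roughly like this (a sketch; :sort_choice is a hypothetical bind variable, the table is from the example earlier in the thread):

```sql
-- One static statement sorts by different columns depending on the bind value,
-- so no dynamic SQL is needed for run-time sort selection.
SELECT *
  FROM events_of_type1
 ORDER BY CASE :sort_choice
            WHEN 'DURATION'   THEN event_duration
            WHEN 'IMPORTANCE' THEN event_importance
          END;
```

One caveat: because the CASE branches must share a datatype, mixing, say, DATE and NUMBER sort keys needs separate CASE expressions, one per type.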
Kevin Meade wrote on Wed, 16 July 2008 18:24:
> 3) I think you are talking about using types which are different in different situations. You might want to consider using SYS.ANYDATA as a datatype and take a look at how the conversion functions are used. I wrote an article here on ORAFAQ for this. Here is a URL to my OraFAQ blog. SYS.ANYDATA won't get around the need to actually understand and code for different types so it is not "dynamic sql" in that sense, but it does allow you to create "generic" table structures that can store pretty much any data (gee sounds like a good name for it), excepting LONG/CLOB stuff. Happy reading.
Thanks! I'll read it for sure.
Our types have one parent type (usually with 1 or 2 elements) and the child types usually have from 1 to 10 columns, so in a single "variable" column we insert many sub-columns. If anyone is reading this: note that this has a significant negative impact on performance.
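For readers unfamiliar with SYS.ANYDATA, a minimal sketch of the "generic column" idea Kevin refers to (table name is illustrative):

```sql
-- A column that can hold a differently-typed value in every row.
CREATE TABLE generic_events
  (event_date DATE
  ,payload    SYS.ANYDATA
  );

INSERT INTO generic_events VALUES (SYSDATE, SYS.ANYDATA.ConvertNumber(42));
INSERT INTO generic_events VALUES (SYSDATE, SYS.ANYDATA.ConvertVarchar2('free text'));

-- Reading back means inspecting the stored type before extracting the value.
SELECT event_date,
       SYS.ANYDATA.GetTypeName(payload) AS stored_type
  FROM generic_events;
```

As Kevin notes, the consuming code still has to branch on the stored type (via GetTypeName and the Get* accessors), so the flexibility is in storage, not in the processing logic.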