Home » SQL & PL/SQL » SQL & PL/SQL » Is it a good idea to use REF_CURSOR to select subset of data? (Oracle 10)
Is it a good idea to use REF_CURSOR to select subset of data? [message #334417] Wed, 16 July 2008 07:27 Go to next message
wakula
Messages: 150
Registered: February 2008
Location: Poland
Senior Member
Hello,
I'm supposed to re-write some existing code and I am not sure if it is correct to split it into modules in the way I am going to do:
(given example is a simplified form)

Many tables are going to be created with columns:
T_n EventDate, CustomerId, SomeDetailAboutTheEvent

From a single table with columns:
T_input EventDate, EventType, CustomerId, SomeDetailAboutTheEvent

Then we are supposed to process the data in several steps using
SELECT * FROM T_input WHERE EventDate BETWEEN (...)

And the logic will be different for every value of CustomerId.

Is this a good idea to select the rows into a ref_cursor and then to pass it between procedures/functions that are processing the data? How should I implement it then?

Example:
PROCEDURE P
(CustomerId IN NUMBER
,DataSet    IN type_ref_cursor
) AS
  BEGIN
  (CREATE TABLE AS SELECT FROM DataSet WHERE (complicated dynamic logic));
END;

PROCEDURE P_get_data
(EventDateStart IN     DATE
,EventDateEnd   IN     DATE
,DataSet           OUT type_ref_cursor
) AS
  BEGIN
  OPEN DataSet FOR (SELECT * FROM T_input WHERE EventDate BETWEEN EventDateStart AND EventDateEnd);
END;

DECLARE
TYPE type_ref_cursor IS REF CURSOR;
my_data_set type_ref_cursor;
  BEGIN
  P_get_data('1-JAN-95','5-FEB-95',my_data_set);
  P(1, my_data_set);
  P(3, my_data_set);
END;


P.S.
Assume that calls P(...) are determined on the run time.
I am going to have multiple packages with the procedures like above.

EDIT:
According to my current knowledge - this is probably not the best way to do it. A better way seems to be a materialized view - however data are frozen and whenever the T_input is updated then materialized view is not going to be updated.

[Updated on: Wed, 16 July 2008 08:09]

Report message to a moderator

Re: Is it a good idea to use REF_CURSOR to select subset of data? [message #334439 is a reply to message #334417] Wed, 16 July 2008 08:42 Go to previous messageGo to next message
Kevin Meade
Messages: 2101
Registered: December 1999
Location: Connecticut USA
Senior Member
The description of your need above seems lacking. I am having a hard time picturing what it is you are trying to do overall. Let me ask some questions and make some comments however.

1) using cursors in general is an old way of doing things. It is still sometimes necessary but it implies row by row processing which should be frowned upon as a general practice.

2) creating tables on the fly is also an old strategy. And one often employed by people making a shift from sqlserver or other non-oracle systems to oracle. Again as a general rule, having application code create tables on the fly is not the way to go.

So I would ask you these questions:

1) why do you feel you need to create tables on the fly to get what you want? Is your requirement truely that dynamic? There are of course situations that are but I often find many developers rely on dynamic sql variants as a crutch. Dynamic sql is in general very expensive so I don't do it unless I really have to.

2) if your requirements are truely dynamic than how were you planning on using materialized views? Mviews are not a dynamic sql mechanism. I might note also that MViews can be created to update on-commit. This may satisfy your requirement well enough.

Lastly, it is quite confusing in your description with respect to what all the different tables are for. DATASET? T_INPUT, and some as yet not-named table that you create dynamically.

You would have considerable more luck in this forum if you could post an actual working piece of code and ask for comments on the design and performance aspects of it.

Good luck, Kevin
Re: Is it a good idea to use REF_CURSOR to select subset of data? [message #334448 is a reply to message #334439] Wed, 16 July 2008 10:01 Go to previous messageGo to next message
wakula
Messages: 150
Registered: February 2008
Location: Poland
Senior Member
Thanks for your reply Kevin,

1) I'll try to use static SQL however I do not know what is the query going to look like at the compile time.
Previously multiple tables were created using CTAS (Create Table As Select).
Details:
Input is data (several tables where one column contain "type" object with variable number of elements) and configuration. Currently existing query works fine - but has limited functionality. It is about 3000 characters long.


2) I am going to create a subset of existing data (like I wrot - input table can be very large) as a materialized view, then process it in several steps and remove it afterwards.

All this is currently under re-design.
Existing code is long and complicated (many procedures with >2000 lines each).

Maybe I could add some longer example:
CREATE TABLE input_data
(event_date   DATE,
,event_type   NUMBER
,event_detail VARCHAR2(1000)
,event_data   SOME_PARENT_TYPE
);

CREATE TYPE SOME_PARENT_TYPE AS OBJECT
( height number) not final;
CREATE TYPE SOME_CHILD_TYPE1 UNDER SOME_PARENT_TYPE
( width number, length number );
CREATE TYPE SOME_CHILD_TYPE2 UNDER SOME_PARENT_TYPE
( weight number );

CREATE TABLE events_of_type1
(event_date        DATE
,event_duration    NUMBER
,event_importance  NUMBER
,event_description VARCHAR2(1000)
);
CREATE TABLE events_of_type2
(event_date        DATE
,event_duration    NUMBER
,event_importance  NUMBER
,event_description VARCHAR2(1000)
);

-- Create part of the report for client 1 - he wants sorting by duration
SELECT FROM events_of_type1 WHERE CustomerId=1 SORT BY event_duration;

-- Create part of the report for client 2 - he wants sorting by importance
SELECT FROM events_of_type1 WHERE CustomerId=2 SORT BY event_importance;


Now the tricky part: Let say that customer 1 wants to get cummulative number of events per day while customer 2 wants average number of events per hour. events_of_type* should contain data for both - so row pattern/content will be different for every customer. And the way it is getting prepared.

Assume that first we want to filter the events (1) list using some abstract rules.
Then we want to convert (2) the data (let say that input data could be money: euro, dollars, yens, etc. and we want to have one common currency).
Then we want to do some logic (3) on the data.

Layers (1), (2) and (3) are going to access the input data (in my example with updating them).

Performance aspects:
We are going to have about 30000000 data rows weekly to analyze.
There are no real performance constrains at the moment - but we should think about it in advance.
Re: Is it a good idea to use REF_CURSOR to select subset of data? [message #334451 is a reply to message #334417] Wed, 16 July 2008 10:15 Go to previous messageGo to next message
BlackSwan
Messages: 25039
Registered: January 2009
Location: SoCal
Senior Member
Does the following conform to Third Normal Form?
>Previously multiple tables were created using CTAS (Create Table As Select).
>Input is data (several tables where one column contain "type" object with variable number of elements)

Violating Third Normal Form results in "complex" processing logic.
If you enjoy this complexity, then keep digging this hole deeper instead of normalizing the data.
Re: Is it a good idea to use REF_CURSOR to select subset of data? [message #334456 is a reply to message #334451] Wed, 16 July 2008 10:40 Go to previous messageGo to next message
wakula
Messages: 150
Registered: February 2008
Location: Poland
Senior Member
anacedent wrote on Wed, 16 July 2008 17:15
Does the following conform to Third Normal Form?
>Previously multiple tables were created using CTAS (Create Table As Select).
>Input is data (several tables where one column contain "type" object with variable number of elements)

Violating Third Normal Form results in "complex" processing logic.
If you enjoy this complexity, then keep digging this hole deeper instead of normalizing the data.


Could you explain this more?
Input data (in tables) came from some csv files - loaded using SQL*LDR.
Output tables are created from the input tables in several steps.
Front-end is presenting the tables only - and no calculation is performed there.
Note that this is obsolete design that now should be updated
If you are referring to the results tables (columns content depend on some other columns that are not key columns) then you are right - however this is what is requested and it is only to be used for presentation.
Re: Is it a good idea to use REF_CURSOR to select subset of data? [message #334461 is a reply to message #334417] Wed, 16 July 2008 11:00 Go to previous messageGo to next message
BlackSwan
Messages: 25039
Registered: January 2009
Location: SoCal
Senior Member
A couple of basic tenets of Third Normal Form is that data is not duplicated. A data element resides in a single table.
On the surface CTAS violates this principal.

Another is that a column/field only ever holds a single value.
Use of "TYPE" variables violated this principal.

Typically it is a bad design which stores derived values (number of events per day) as a separate field. This bad practice can leads to data inconsistencies or reporting discrepancies.

Short term expediencies frequently result design decisions which can haunt an application for years to come.

The choices are yours & the consequences are yours, too!
Re: Is it a good idea to use REF_CURSOR to select subset of data? [message #334466 is a reply to message #334417] Wed, 16 July 2008 11:24 Go to previous messageGo to next message
Kevin Meade
Messages: 2101
Registered: December 1999
Location: Connecticut USA
Senior Member
OK, so there are some things you can read up on that may help you simplify your problem. let me suggest you take a look at the following and consider why these might be of value to you:

1) GLOBAL TEMPORARY TABLES. Your use of materialized views that you will be dropping after use suggests they are temporary for the duration of your process. You might find GTTs a better choice from a performance and management perspective. Your read about them, then decide if they fit your needs.

2) There is an article here on ORAFAQ I am pretty sure, also on ASKTOMHOME, and of course you could just Google for: ORACLE DYNAMIC ORDER BY. People describe how to create an order by clause that can sort on different columns and in different directions depending upon what you want. It is actually pretty easy. In particular I believe Tom at asktomhome shows how to use the case statement in the order by to get pretty fancy.

3) I think you are talking about using types which are different in different situations. You might want to consider using SYS.ANYDATA as a datatype and take a look at how the conversion functions are used. I wrote an article here on ORAFAQ for this. Here is a URL to my OraFAQ blog. SYS.ANYDATA won't get around the need to actually understand and code for different types so it is not "dynamic sql" in that sense, but it does allow you to create "generic" table structures that can store pretty much any data (gee sounds like a good name for it), excepting LONG/CLOB stuff. Happy reading.

Kevin Meade's OraFAQ Blog

Kevin
Re: Is it a good idea to use REF_CURSOR to select subset of data? [message #334468 is a reply to message #334417] Wed, 16 July 2008 11:31 Go to previous messageGo to next message
BlackSwan
Messages: 25039
Registered: January 2009
Location: SoCal
Senior Member
It really easy to make a bad design.

http://tonyandrews.blogspot.com/2004/10/otlt-and-eav-two-big-design-mistakes.html
Re: Is it a good idea to use REF_CURSOR to select subset of data? [message #334578 is a reply to message #334461] Thu, 17 July 2008 03:46 Go to previous messageGo to next message
wakula
Messages: 150
Registered: February 2008
Location: Poland
Senior Member
anacedent wrote on Wed, 16 July 2008 18:00
...data is not duplicated...

In fact the "input data" are removed periodically. So after some time we should get only the "output data".
And I remind some talks about keeping the "input data" for long time so that "output data" could be re-generated if needed. In this case you are right that some information would be duplicated - but this is required as the data processing is done in advance to save some time when displaying the results.

anacedent wrote on Wed, 16 July 2008 18:00
Another is that a column/field only ever holds a single value.
Use of "TYPE" variables violated this principal.

Right. But an alternative is to keep a table with thousands of columns and have some of them empty (some older implementation used this - and they exceeded maximum number of columns per table allowed by Oracle) or to have multiple input tables (and this would cause some extra problems - example would be "How to check if we have any data loaded for particular day? Should we execute several hundrets of queries one per every table? Use of types changes this to a single query").

anacedent wrote on Wed, 16 July 2008 18:00
Typically it is a bad design which stores derived values (number of events per day) as a separate field. This bad practice can leads to data inconsistencies or reporting discrepancies.

I agree. But as our input data (csv files) could be something like "number of events per hour" defined as number of events between X:00:00 and X:59:59 where the time is always specified with granularity of 1 second - we could do this.

anacedent wrote on Wed, 16 July 2008 18:00
Short term expediencies frequently result design decisions which can haunt an application for years to come.

Yes... This is about 3rd version of this application within 2 years - at least I know about previous 2 in last 1.5 years (possibly this application could exist for longer than 2 years already).

anacedent wrote on Wed, 16 July 2008 18:00
The choices are yours & the consequences are yours, too!

I would say "Mistakes made in the design phase would impact you ever after - even in the maintenance phase" and "You can never make a perfect design".

Thanks for your comments!
Re: Is it a good idea to use REF_CURSOR to select subset of data? [message #334579 is a reply to message #334466] Thu, 17 July 2008 03:54 Go to previous message
wakula
Messages: 150
Registered: February 2008
Location: Poland
Senior Member
Kevin Meade wrote on Wed, 16 July 2008 18:24
1) GLOBAL TEMPORARY TABLES. Your use of materialized views that you will be dropping after use suggests they are temporary for the duration of your process. You might find GTTs a better choice from a performance and management perspective. Your read about them, then decide if they fit your needs.

This was on my mind since the beginning of this design.

Kevin Meade wrote on Wed, 16 July 2008 18:24
2) There is an article here on ORAFAQ I am pretty sure, also on ASKTOMHOME, and of course you could just Google for: ORACLE DYNAMIC ORDER BY. People describe how to create an order by clause that can sort on different columns and in different directions depending upon what you want. It is actually pretty easy. In particular I believe Tom at asktomhome shows how to use the case statement in the order by to get pretty fancy.

Although I don't understand (yet) how I could use it - I'll try to read about it. Currently we do a lot of "order by" and "group by" and etc. (currently the front-end is displaying the data as-is, that is no sorting is done on the front-end nor the DB during runtime and we need to sort them in advance - I hope to change this behavior soon).

Kevin Meade wrote on Wed, 16 July 2008 18:24
3) I think you are talking about using types which are different in different situations. You might want to consider using SYS.ANYDATA as a datatype and take a look at how the conversion functions are used. I wrote an article here on ORAFAQ for this. Here is a URL to my OraFAQ blog. SYS.ANYDATA won't get around the need to actually understand and code for different types so it is not "dynamic sql" in that sense, but it does allow you to create "generic" table structures that can store pretty much any data (gee sounds like a good name for it), excepting LONG/CLOB stuff. Happy reading.

Thanks! I'll read it for sure.
Our types have one parent type (having usually 1 or 2 elements) and children types have usually from 1 to 10 columns - so in a single "variable" column we insert many sub-columns. If anyone is reading this - note that this has significant negative impact on the performance.
Previous Topic: List of jobs created thru DBMS_SCHEDULER
Next Topic: CHR(30)
Goto Forum:
  


Current Time: Tue Dec 06 02:55:05 CST 2016

Total time taken to generate the page: 0.13450 seconds