Bar Solutions

The beginning of knowledge is the discovery of something we do not understand. [Frank Herbert]

Polymorphic Table Functions – Part 2

Sun, 2018-12-09 00:04

In my previous post I wrote about a possible use-case for Polymorphic Table Functions. I also wrote that I had some ideas to implement extra functionality to make the function more generic.
After attending DOAG Konferenz und Ausstellung 2018 and UKOUG Tech18 I finally got around to publishing my new version.
Let's start off with a table to hold my data; I chose to use the NATO phonetic alphabet for this:

create table t_alphabet
(thecount number
,alphabet varchar2(4000)
)
/
begin
  insert into t_alphabet(alphabet, thecount) values ('Alfa;Bravo;Charlie;Delta;Echo;Foxtrot;Golf',7);
  insert into t_alphabet(alphabet, thecount) values ('Hotel;India;Juliett;Kilo;Lima;Mike;November',7);
  insert into t_alphabet(alphabet, thecount) values ('Oscar;Papa;Quebec;Romeo;Sierra;Tango;Uniform',7);
  insert into t_alphabet(alphabet, thecount) values ('Victor;Whiskey;;X-ray;Yankee;;Zulu',5);
  insert into t_alphabet(alphabet, thecount) values ('Alfa.Bravo.Charlie.Delta.Echo.Foxtrot.Golf',7);
  insert into t_alphabet(alphabet, thecount) values ('Hotel.India.Juliett.Kilo.Lima.Mike.November',7);
  insert into t_alphabet(alphabet, thecount) values ('Oscar.Papa.Quebec.Romeo.Sierra.Tango.Uniform',7);
  insert into t_alphabet(alphabet, thecount) values ('Victor.Whiskey..X-ray.Yankee..Zulu',5);
  commit;
end;
/

First of all, I learned from attending the presentations by Andrej Pashchenko (Polymorphic Table Functions in 18c: Einführung und Beispiele) and Keith Laker (Patterns and Use Cases For Polymorphic Tables) that there is no need for global (package) variables to get access to the supplied parameters. The more I can rely on Oracle to take care of the values of variables, the better I like it.
I won't bore you with all the intermediate versions of the code; let's jump straight to the 'final' result.
The package:

create or replace package separated_ptf is
  function describe(tab        in out dbms_tf.table_t
                   ,cols       in dbms_tf.columns_t default null
                   ,coltosplit in varchar2 default null
                   ,separator  in varchar2 default ';') return dbms_tf.describe_t;

  procedure fetch_rows(coltosplit in varchar2 default null
                      ,separator  in varchar2 default ';');
end separated_ptf;
/

The package body:

create or replace package body separated_ptf as
  function describe(tab        in out dbms_tf.table_t
                   ,cols       in dbms_tf.columns_t default null
                   ,coltosplit in varchar2 default null
                   ,separator  in varchar2 default ';') return dbms_tf.describe_t as
    -- metadata for column to add
    l_new_col dbms_tf.column_metadata_t;
    -- table of columns to add
    l_new_cols dbms_tf.columns_new_t;
    -- make sure the column to split is in the correct format (uppercase, with double quotes)
    l_coltosplit dbms_quoted_id := dbms_assert.enquote_name(str => coltosplit, capitalize => true);
  begin
    -- if the coltosplit parameter is null then
    if coltosplit is null then
      -- Mark the first column ReadOnly and don't display it anymore
      tab.column(1).for_read := true;
      tab.column(1).pass_through := false;
    else
      -- if the coltosplit parameter is not null then
      -- check every column from the source table
      for indx in tab.column.first .. tab.column.last loop
        -- if this is the column we want to split then
        if tab.column(indx).description.name = l_coltosplit then
          -- Mark this column ReadOnly and don't display it anymore
          tab.column(indx).for_read := true;
          tab.column(indx).pass_through := false;
        end if;
      end loop;
    end if;
    -- Add the new columns, as specified in the cols parameter
    for indx in 1 .. cols.count loop
      -- define metadata for column named cols(indx)
      -- that will default to a datatype of varchar2 with
      -- a length of 4000
      l_new_col := dbms_tf.column_metadata_t(name => cols(indx));
      -- add the new column to the list of new columns
      l_new_cols(l_new_cols.count + 1) := l_new_col;
    end loop;
    -- Instead of returning NULL we will RETURN a specific
    -- DESCRIBE_T that adds new columns
    return dbms_tf.describe_t(new_columns => l_new_cols);
  end;

  procedure fetch_rows(coltosplit in varchar2 default null
                      ,separator  in varchar2 default ';') is
    -- define a table type of varchar2 tables
    type colset is table of dbms_tf.tab_varchar2_t index by pls_integer;
    -- variable to hold the rowset as retrieved
    l_rowset dbms_tf.row_set_t;
    -- variable to hold the number of rows as retrieved
    l_rowcount pls_integer;
    -- variable to hold the number of put columns
    l_putcolcount pls_integer := dbms_tf.get_env().put_columns.count;
    -- variable to hold the new values
    l_newcolset colset;
    -- get the name of the column to be split from the get columns
    l_coltosplit dbms_quoted_id := trim('"' from dbms_tf.get_env().get_columns(1).name);
  begin
    --    dbms_tf.Trace(dbms_tf.Get_Env);
    -- fetch rows into a local rowset
    -- at this point the rows will have columns
    -- from the table/view/query passed in
    dbms_tf.get_row_set(l_rowset, l_rowcount);
    -- for every row in the rowset...
    for rowindx in 1 .. l_rowcount loop
      -- for every column
      for colindx in 1 .. l_putcolcount loop
        -- split the row into separate values
        --  FUNCTION Row_To_Char(rowset Row_Set_t,
        --                       rid    PLS_INTEGER,
        --                       format PLS_INTEGER default FORMAT_JSON)
        --           return VARCHAR2;
        -- splitting the regexp way: http://nuijten.blogspot.com/2009/07/splitting-comma-delimited-string-regexp.html
        l_newcolset(colindx)(rowindx) := trim(separator from regexp_substr(json_value(dbms_tf.row_to_char(l_rowset, rowindx), '$.' || l_coltosplit)
                                                           ,'[^' || separator || ']*' || separator || '{0,1}'
                                                           ,1
                                                           ,colindx));
      end loop; -- every column
    end loop; -- every row in the rowset
    -- add the newly populated columns to the rowset
    for indx in 1 .. l_putcolcount loop
      dbms_tf.put_col(columnid => indx, collection => l_newcolset(indx));
    end loop;
  end;
end separated_ptf;
/

The wrapper function:

create or replace function separated_fnc(p_tbl      in table
                                        ,cols       columns default null
                                        ,coltosplit in varchar2 default null
                                        ,separator  in varchar2 default ';') return table
  pipelined row polymorphic using separated_ptf;
/

Back to the improvements I suggested earlier.
Supporting duplicate separators:
The regular expression '[^;]+' didn't make this possible, because a double separator (;;) was treated as a single one. So it had to be changed into '[^;]*;{0,1}'. This expression says (in my words): find zero or more characters which are not a ;, followed by 0 or 1 ;. Since this results in a string with a trailing ;, I had to wrap the whole thing in the trim function.
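The difference can be demonstrated on a fragment of the test data (a standalone sketch; the second expression uses the '[^;]*;{0,1}' pattern as it appears in the package body):

```sql
-- old pattern: the empty element between the double ;; is skipped,
-- so the third match is already 'X-ray'
select regexp_substr('Victor;Whiskey;;X-ray', '[^;]+', 1, 3) as old_way
-- new pattern: the third match is the empty element (just ';'),
-- which trim then turns into null, preserving the empty position
      ,trim(';' from regexp_substr('Victor;Whiskey;;X-ray', '[^;]*;{0,1}', 1, 3)) as new_way
  from dual
/
```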
Making the column to be split a parameter:
To find the column to be split, I need to make the value look the same as the tab.column(indx).description.name value. I could of course add the quotes around the parameter myself, but I think it is better to use the built-in sys.dbms_assert.enquote_name function for this. If the value is not supplied or is null, I just (try to) split the first column.
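A quick sketch of what enquote_name does (the input value is just an example):

```sql
select dbms_assert.enquote_name(str => 'alphabet', capitalize => true) as quoted
  from dual
/
-- returns "ALPHABET" (uppercased and enclosed in double quotes)
```

This matches the uppercase, double-quoted form of tab.column(indx).description.name, so the comparison in the describe function works for identifiers passed in any case.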
Making the separator character a parameter:
This seemed like an easy task: just replace every ; in my code by a variable. But when running a couple of tests I received an error which put me on the wrong path.
Calling the function using positional parameters works like a charm, just as you would expect:
select *
  from separated_fnc(t_alphabet, columns(first, second, third, fourth, fifth, sixth, seventh), 'alphabet', '.')
/

  THECOUNT FIRST                SECOND               THIRD                FOURTH               FIFTH                SIXTH                SEVENTH
---------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- --------------------
         7 Alfa;Bravo;Charlie;D                                                                                                          
           elta;Echo;Foxtrot;Go                                                                                                          
           lf                                                                                                                            
         7 Hotel;India;Juliett;                                                                                                          
           Kilo;Lima;Mike;Novem                                                                                                          
           ber                                                                                                                           
         7 Oscar;Papa;Quebec;Ro                                                                                                          
           meo;Sierra;Tango;Uni                                                                                                          
           form                                                                                                                          
         5 Victor;Whiskey;X-ray                                                                                                          
           ;Yankee;Zulu                                                                                                                  
         7 Alfa                 Bravo                Charlie              Delta                Echo                 Foxtrot              Golf
         7 Hotel                India                Juliett              Kilo                 Lima                 Mike                 November
         7 Oscar                Papa                 Quebec               Romeo                Sierra               Tango                Uniform
         5 Victor               Whiskey                                   X-ray                Yankee                                    Zulu

8 rows selected

But when I tried to use named parameters like this:

select *
  from separated_fnc(t_alphabet, columns(first, second, third, fourth, fifth, sixth, seventh),coltosplit => 'alphabet',separator => '.')
/

it resulted in the following error:

ORA-62573: new column (FIRST) is not allowed with describe only polymorphic table function

After the DOAG conference I looked at this together with Chris Saxon, but we couldn't figure out what was going on. So we contacted Keith Laker, who told me it was a bug in the database version I am using (18.3.0.0.0) and that it should be fixed in an upcoming (patch) release.
As long as you know about this behavior, I think it is quite a useful function (and with the upcoming fixes it will become even more useful).

If you have any suggestions or maybe ideas for other use cases for Polymorphic Table Functions, please don’t hesitate to use the comments.

Polymorphic Table Functions

Sun, 2018-11-18 00:04

I have been working on a presentation on Polymorphic Table Functions. During this time I was looking for a real use case for them. I came up with an example which is not very useful in real life, but very useful for explaining the technique.
At my current job I came across a piece of code that I had to copy and adjust to fit the needs of each specific case. The idea was always the same: I get a table with semicolon-separated values in one column that have to be split into the correct number of columns before comparing the data to the current data in a specific table.
I thought: 'Maybe I can solve this copy-paste-adjust process by using a Polymorphic Table Function.'

Let’s first set the current scene.
We have two tables. The well known EMP and DEPT tables.

create table emp
(empno    number(4) not null
,ename    varchar2(10)
,job      varchar2(9)
,mgr      number(4)
,hiredate date
,sal      number(7, 2)
,comm     number(7, 2)
,deptno   number(2)
)
/
create table dept
(deptno number(2)
,dname  varchar2(14)
,loc    varchar2(14)
)
/        

And we add the well known data:

insert into emp values (7369, 'SMITH',  'CLERK',     7902, to_date('17-DEC-1980', 'DD-MON-YYYY'),  800, null, 20);
insert into emp values (7499, 'ALLEN',  'SALESMAN',  7698, to_date('20-FEB-1981', 'DD-MON-YYYY'), 1600,  300, 30);
insert into emp values (7521, 'WARD',   'SALESMAN',  7698, to_date('22-FEB-1981', 'DD-MON-YYYY'), 1250,  500, 30);
insert into emp values (7566, 'JONES',  'MANAGER',   7839, to_date('2-APR-1981', 'DD-MON-YYYY'),  2975, null, 20);
insert into emp values (7654, 'MARTIN', 'SALESMAN',  7698, to_date('28-SEP-1981', 'DD-MON-YYYY'), 1250, 1400, 30);
insert into emp values (7698, 'BLAKE',  'MANAGER',   7839, to_date('1-MAY-1981', 'DD-MON-YYYY'),  2850, null, 30);
insert into emp values (7782, 'CLARK',  'MANAGER',   7839, to_date('9-JUN-1981', 'DD-MON-YYYY'),  2450, null, 10);
insert into emp values (7788, 'SCOTT',  'ANALYST',   7566, to_date('09-DEC-1982', 'DD-MON-YYYY'), 3000, null, 20);
insert into emp values (7839, 'KING',   'PRESIDENT', null, to_date('17-NOV-1981', 'DD-MON-YYYY'), 5000, null, 10);
insert into emp values (7844, 'TURNER', 'SALESMAN',  7698, to_date('8-SEP-1981', 'DD-MON-YYYY'),  1500,    0, 30);
insert into emp values (7876, 'ADAMS',  'CLERK',     7788, to_date('12-JAN-1983', 'DD-MON-YYYY'), 1100, null, 20);
insert into emp values (7900, 'JAMES',  'CLERK',     7698, to_date('3-DEC-1981', 'DD-MON-YYYY'),   950, null, 30);
insert into emp values (7902, 'FORD',   'ANALYST',   7566, to_date('3-DEC-1981', 'DD-MON-YYYY'),  3000, null, 20);
insert into emp values (7934, 'MILLER', 'CLERK',     7782, to_date('23-JAN-1982', 'DD-MON-YYYY'), 1300, null, 10);
insert into dept values (10, 'ACCOUNTING', 'NEW YORK');
insert into dept values (20, 'RESEARCH',   'DALLAS');
insert into dept values (30, 'SALES',      'CHICAGO');
insert into dept values (40, 'OPERATIONS', 'BOSTON');

We get data from a different system, which is in a semi-colon separated format. So we load that in a couple of staging tables:

create table empstg
(line varchar2(4000)
)
/
create table deptstg
(line varchar2(4000)
)
/

And then we add some data:

insert into empstg values ('7369;SMITH;CLERK;7902;17121980;800; ;20');
insert into empstg values ('7499;ALLEN;SALESMAN;7698;20021981;1600; 300;30');
insert into empstg values ('7521;WARD;SALESMAN;7698;22021981;1250; 500;30');
insert into empstg values ('7566;JONES;MANAGER;7839;02041981; 2975; ;20');
insert into empstg values ('7654;MARTIN;SALESMAN;7698;28091981;1250;1400;30');
insert into empstg values ('7698;BLAKE;MANAGER;7839;01051981; 2850; ;30');
insert into empstg values ('7782;CLARK;MANAGER;7839;09061981; 2450; ;10');
insert into empstg values ('7788;SCOTT;ANALYST;7566;09121982;3000; ;20');
insert into empstg values ('7839;KING;PRESIDENT; ;17111981;5000; ;10');
insert into empstg values ('7844;TURNER;SALESMAN;7698;08091981; 1500;0;30');
insert into empstg values ('7876;ADAMS;CLERK;7788;12011983;1100; ;20');
insert into empstg values ('7900;JAMES;CLERK;7698;03121981;  950; ;30');
insert into empstg values ('7902;FORD;ANALYST;7566;03121981; 3000; ;20');
insert into empstg values ('7934;MILLER;CLERK;7782;23011982;1300; ;10');
insert into empstg values ('2912;BAREL;DEVELOPER;7839;29122017;4000; ;50');

insert into deptstg values ('10;ACCOUNTING;NEW YORK');
insert into deptstg values ('20;RESEARCH;DALLAS');
insert into deptstg values ('30;SALES;NORTH CAROLINA');
insert into deptstg values ('40;OPERATIONS;BOSTON');
insert into deptstg values ('50;DEVELOPMENT;SAN FRANCISCO');

To process the data and merge it into the main tables we use a package. We could have used a merge statement, but that updates every matching row, changed or not, which results in a lot of journaling data being written by triggers. Using the EMP and DEPT tables this wouldn't be too much of a problem, but we are talking about 250k+ rows each time (at least once a day).
So we want a little more control and only insert/update when it's really necessary.
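For comparison, the merge we decided against would look something like this for DEPT (a sketch only; note that every matched row is updated whether its values changed or not, which is exactly what fires the journaling triggers):

```sql
merge into dept d
using (select trim(regexp_substr(line, '[^;]+', 1, 1)) deptno
             ,trim(regexp_substr(line, '[^;]+', 1, 2)) dname
             ,trim(regexp_substr(line, '[^;]+', 1, 3)) loc
         from deptstg) i
on (d.deptno = i.deptno)
when matched then
  update set d.dname = i.dname
           , d.loc   = i.loc
when not matched then
  insert (deptno, dname, loc)
  values (i.deptno, i.dname, i.loc)
/
```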

create or replace package process_stg is
  procedure dept;
  
  procedure emp;
end process_stg;
/
create or replace package body process_stg is
  failure_in_forall exception;
  pragma exception_init(failure_in_forall, -24381);
  c_limit     constant number := 10;
  procedure dept is
    cursor c_inserts is
      with import as
       (select trim(regexp_substr(line, '[^;]+', 1, 1)) deptno
              ,trim(regexp_substr(line, '[^;]+', 1, 2)) dname
              ,trim(regexp_substr(line, '[^;]+', 1, 3)) loc
          from deptstg stg)
      select i.deptno
            ,i.dname
            ,i.loc
        from import i
        left outer join dept d on (d.deptno = i.deptno)
       where 1 = 1
         and d.deptno is null;
    cursor c_updates is
      with import as
       (select trim(regexp_substr(line, '[^;]+', 1, 1)) deptno
              ,trim(regexp_substr(line, '[^;]+', 1, 2)) dname
              ,trim(regexp_substr(line, '[^;]+', 1, 3)) loc
          from deptstg stg)
      select i.deptno
            ,i.dname
            ,i.loc
        from import i
        join dept d on (d.deptno = i.deptno)
       where 1 = 1
         and (   coalesce(d.dname, '-NULL-') <> coalesce(i.dname, '-NULL-')
              or coalesce(d.loc,   '-NULL-') <> coalesce(i.loc,   '-NULL-')
             );
    type data_t is table of c_inserts%rowtype index by pls_integer;
    l_data data_t;
  begin
    open c_inserts;
    loop
      fetch c_inserts bulk collect into l_data limit c_limit;
      if l_data.count > 0 then
        begin
          forall indx in l_data.first .. l_data.last save exceptions
            insert into dept(deptno, dname, loc)
            values (l_data(indx).deptno, l_data(indx).dname, l_data(indx).loc);
        exception
          when failure_in_forall then null;
        end;
      end if;
      exit when l_data.count < c_limit;
    end loop;
    close c_inserts;
--
    open c_updates;
    loop
      fetch c_updates bulk collect into l_data limit c_limit;
      if l_data.count > 0 then
        begin
          forall indx in l_data.first .. l_data.last save exceptions
            update dept
               set dname = l_data(indx).dname
                 , loc = l_data(indx).loc
             where 1=1
               and deptno = l_data(indx).deptno;
        exception
          when failure_in_forall then null;
        end;
      end if;
      exit when l_data.count < c_limit;
    end loop;
    close c_updates;
  end dept;

  procedure emp is
    cursor c_inserts is
      with import as
       (select trim(regexp_substr(line, '[^;]+', 1, 1)) empno   
              ,trim(regexp_substr(line, '[^;]+', 1, 2)) ename   
              ,trim(regexp_substr(line, '[^;]+', 1, 3)) job     
              ,trim(regexp_substr(line, '[^;]+', 1, 4)) mgr     
              ,trim(regexp_substr(line, '[^;]+', 1, 5)) hiredate
              ,trim(regexp_substr(line, '[^;]+', 1, 6)) sal     
              ,trim(regexp_substr(line, '[^;]+', 1, 7)) comm    
              ,trim(regexp_substr(line, '[^;]+', 1, 8)) deptno  
          from empstg stg)
      select i.empno   
            ,i.ename   
            ,i.job     
            ,i.mgr     
            ,i.hiredate
            ,i.sal     
            ,i.comm    
            ,i.deptno  
        from import i
        left outer join emp e on (e.empno = i.empno)
       where 1 = 1
         and e.empno is null;
    cursor c_updates is
      with import as
       (select trim(regexp_substr(line, '[^;]+', 1, 1)) empno   
              ,trim(regexp_substr(line, '[^;]+', 1, 2)) ename   
              ,trim(regexp_substr(line, '[^;]+', 1, 3)) job     
              ,trim(regexp_substr(line, '[^;]+', 1, 4)) mgr     
              ,trim(regexp_substr(line, '[^;]+', 1, 5)) hiredate
              ,trim(regexp_substr(line, '[^;]+', 1, 6)) sal     
              ,trim(regexp_substr(line, '[^;]+', 1, 7)) comm    
              ,trim(regexp_substr(line, '[^;]+', 1, 8)) deptno  
          from empstg stg)
      select i.empno   
            ,i.ename   
            ,i.job     
            ,i.mgr     
            ,i.hiredate
            ,i.sal     
            ,i.comm    
            ,i.deptno  
        from import i
        left outer join emp e on (e.empno = i.empno)
       where 1 = 1
         and (   coalesce(e.ename,    '-NULL') <> coalesce(i.ename,    '-NULL')
              or coalesce(e.job,      '-NULL') <> coalesce(i.job,      '-NULL')
              or          e.mgr                <>          i.mgr               
              or coalesce(to_char(e.hiredate,'DDMMYYYY'), '-NULL') <> coalesce(i.hiredate, '-NULL')
              or          e.sal                <>          i.sal               
              or          e.comm               <>          i.comm               
              or          e.deptno             <>          i.deptno            
             );
    type data_t is table of c_inserts%rowtype index by pls_integer;
    l_data data_t;
  begin
    open c_inserts;
    loop
      fetch c_inserts bulk collect into l_data limit c_limit;
      if l_data.count > 0 then
        begin
          forall indx in l_data.first .. l_data.last save exceptions
            insert into emp(empno, ename, job, mgr, hiredate, sal, comm, deptno)
            values (l_data(indx).empno, l_data(indx).ename, l_data(indx).job, l_data(indx).mgr, to_date(l_data(indx).hiredate, 'DDMMYYYY'), l_data(indx).sal, trim(l_data(indx).comm), l_data(indx).deptno);
        exception
          when failure_in_forall then null;
        end;
      end if;
      exit when l_data.count < c_limit;
    end loop;
    close c_inserts;
--
    open c_updates;
    loop
      fetch c_updates bulk collect into l_data limit c_limit;
      if l_data.count > 0 then
        begin
          forall indx in l_data.first .. l_data.last save exceptions
            update emp
              set ename       = l_data(indx).ename    
                , job         = l_data(indx).job      
                , mgr         = l_data(indx).mgr      
                , hiredate    = to_date(l_data(indx).hiredate, 'DDMMYYYY') 
                , sal         = l_data(indx).sal      
                , comm        = l_data(indx).comm     
                , deptno      = l_data(indx).deptno   
            where 1=1
              and empno       = l_data(indx).empno;
        exception
          when failure_in_forall then null;
        end;
      end if;
      exit when l_data.count < c_limit;
    end loop;
    close c_updates;
  end emp;
end process_stg;
/

As you can see, the code, especially for the cursors, is pretty much the same. The only difference is the number of columns that are generated from the semicolon-separated line.
I really don't like doing the same thing over and over again, especially when the only difference is the number of columns and their names. But since this is exactly what changes between the tables, I think there is no way of making this generic in 12c or earlier. Then 18c came into play, providing us with Polymorphic Table Functions.

This is what the documentation says (summary):
Polymorphic Table Functions
Polymorphic Table Functions (PTF) are user-defined functions that can be invoked in the FROM clause.
They are capable of processing tables whose row type is not declared at definition time and producing a
result table whose row type may or may not be declared at definition time. Polymorphic Table Functions
allow application developers to leverage the long-defined dynamic SQL capabilities to create powerful
and complex custom functions.

In my own words: call a function, supplying a table, and get a set of columns back. You can supply the names (and the number) of columns as a parameter. These columns don't even have to exist in the table; you can create them on the fly. That is exactly what I need: I have different tables with pretty much the same layout, but the results I need are completely different.
So I came up with the following Polymorphic Table Function to do what I want. First there is the package specification. What I need is the DESCRIBE function (which is mandatory) and a FETCH_ROWS procedure, in which I can alter the results.

create or replace package separated_ptf is
  function describe(tab  in out dbms_tf.table_t
                   ,cols in dbms_tf.columns_t default null) return dbms_tf.describe_t;

  procedure fetch_rows;
end separated_ptf;
/

Then there is the implementation of the package:

create or replace package body separated_ptf as
  g_colcount pls_integer; -- save the number of columns requested
  g_colname  varchar2(128); -- save the name of the first column
  function describe(tab  in out dbms_tf.table_t
                   ,cols in dbms_tf.columns_t default null) return dbms_tf.describe_t as
    -- metadata for column to add
    l_new_col dbms_tf.column_metadata_t;
    -- table of columns to add
    l_new_cols dbms_tf.columns_new_t; -- := DBMS_TF.COLUMNS_NEW_T();
  begin
    -- Mark the first column ReadOnly and don't display it anymore
    tab.column(1).for_read := true;
    tab.column(1).pass_through := false;
    -- Save the name of the first column for use in the fetch_rows procedure
    g_colname := tab.column(1).description.name;
    -- Save the number of columns for use in the fetch_rows procedure
    g_colcount := cols.count;
    -- Add the new columns, as specified in the cols parameter
    for indx in 1 .. cols.count loop
      -- define metadata for column named cols(indx)
      -- that will default to a datatype of varchar2 with
      -- a length of 4000
      l_new_col := dbms_tf.column_metadata_t(name => cols(indx));
      -- add the new column to the list of new columns
      l_new_cols(l_new_cols.count + 1) := l_new_col;
    end loop;
    -- Instead of returning NULL we will RETURN a specific
    -- DESCRIBE_T that adds new columns
    return dbms_tf.describe_t(new_columns => l_new_cols);
  end;

  procedure fetch_rows is
    -- define a table type of varchar2 tables
    type colset is table of dbms_tf.tab_varchar2_t index by pls_integer;
    -- variable to hold the rowset as retrieved
    l_rowset   dbms_tf.row_set_t;
    -- variable to hold the number of rows as retrieved
    l_rowcount pls_integer;
    -- variable to hold the new values
    l_newcolset colset;
  begin
    -- fetch rows into a local rowset
    -- at this point the rows will have columns
    -- from the table/view/query passed in
    dbms_tf.get_row_set(l_rowset, l_rowcount);
    -- for every row in the rowset...
    for rowindx in 1 .. l_rowcount loop
      -- for every column
      for colindx in 1 .. g_colcount loop
        -- split the row into separate values
        -- splitting the regexp way: http://nuijten.blogspot.com/2009/07/splitting-comma-delimited-string-regexp.html
        l_newcolset(colindx)(rowindx) := trim(regexp_substr(json_value(dbms_tf.row_to_char(l_rowset, rowindx), '$.' || g_colname)
                                                      ,'[^;]+'
                                                      ,1
                                                      ,colindx));
      end loop; -- every column
    end loop; -- every row in the rowset
    -- add the newly populated columns to the rowset
    for indx in 1 .. g_colcount loop
      dbms_tf.put_col(columnid => indx, collection => l_newcolset(indx));
    end loop;
  end;
end separated_ptf;
/

After creating this Polymorphic Table Function we need an interface to use it in a SQL statement:

-- create a 'wrapper' function for the polymorphic table function
create or replace function separated_fnc(p_tbl in table
                                        ,cols  columns default null) return table
  pipelined row polymorphic using separated_ptf;
/

Now, with this wrapper function in place we can start using it:

select *
  from separated_fnc(deptstg, columns(deptno, dname, loc))
/

Which is a lot easier than:

select trim(regexp_substr(line, '[^;]+', 1, 1)) deptno
      ,trim(regexp_substr(line, '[^;]+', 1, 2)) dname
      ,trim(regexp_substr(line, '[^;]+', 1, 3)) loc
  from deptstg stg
/

And similarly we can use the same code to retrieve data from the other table:

select *
  from separated_fnc(empstg, columns(empno, ename, job, mgr, hiredate, sal, comm, deptno))
/

That looks pretty much the same as the other one, but it is definitely a lot simpler than:

select trim(regexp_substr(line, '[^;]+', 1, 1)) empno
      ,trim(regexp_substr(line, '[^;]+', 1, 2)) ename
      ,trim(regexp_substr(line, '[^;]+', 1, 3)) job
      ,trim(regexp_substr(line, '[^;]+', 1, 4)) mgr
      ,trim(regexp_substr(line, '[^;]+', 1, 5)) hiredate
      ,trim(regexp_substr(line, '[^;]+', 1, 6)) sal
      ,trim(regexp_substr(line, '[^;]+', 1, 7)) comm
      ,trim(regexp_substr(line, '[^;]+', 1, 8)) deptno
  from empstg stg
/

The new implementation of my package is now like this:

create or replace package body process_stg is
  failure_in_forall exception;
  pragma exception_init(failure_in_forall, -24381);
  c_limit     constant number := 10;
  procedure dept is
    cursor c_inserts is
      with import as
       (select *
          from separated_fnc(deptstg, columns(deptno, dname, loc))
       )
      select i.deptno
            ,i.dname
            ,i.loc
        from import i
        left outer join dept d on (d.deptno = i.deptno)
       where 1 = 1
         and d.deptno is null;
    cursor c_updates is
      with import as
       (select *
          from separated_fnc(deptstg, columns(deptno, dname, loc))
       )
      select i.deptno
            ,i.dname
            ,i.loc
        from import i
        join dept d on (d.deptno = i.deptno)
       where 1 = 1
         and (   coalesce(d.dname, '-NULL-') <> coalesce(i.dname, '-NULL-')
              or coalesce(d.loc,   '-NULL-') <> coalesce(i.loc,   '-NULL-')
             );
    type data_t is table of c_inserts%rowtype index by pls_integer;
    l_data data_t;
  begin
    open c_inserts;
    loop
      fetch c_inserts bulk collect into l_data limit c_limit;
      if l_data.count > 0 then
        begin
          forall indx in l_data.first .. l_data.last save exceptions
            insert into dept(deptno, dname, loc)
            values (l_data(indx).deptno, l_data(indx).dname, l_data(indx).loc);
        exception
          when failure_in_forall then null;
        end;
      end if;
      exit when l_data.count < c_limit;
    end loop;
    close c_inserts;
--
    open c_updates;
    loop
      fetch c_updates bulk collect into l_data limit c_limit;
      if l_data.count > 0 then
        begin
          forall indx in l_data.first .. l_data.last save exceptions
            update dept
               set dname = l_data(indx).dname
                 , loc = l_data(indx).loc
             where 1=1
               and deptno = l_data(indx).deptno;
        exception
          when failure_in_forall then null;
        end;
      end if;
      exit when l_data.count < c_limit;
    end loop;
    close c_updates;
  end dept;

  procedure emp is
    cursor c_inserts is
      with import as
       (select *
          from separated_fnc(empstg, columns(empno,ename,job,mgr,hiredate,sal,comm,deptno))
       )
      select i.empno   
            ,i.ename   
            ,i.job     
            ,i.mgr     
            ,i.hiredate
            ,i.sal     
            ,i.comm    
            ,i.deptno  
        from import i
        left outer join emp e on (e.empno = i.empno)
       where 1 = 1
         and e.empno is null;
    cursor c_updates is
      with import as
       (select *
          from separated_fnc(empstg, columns(empno,ename,job,mgr,hiredate,sal,comm,deptno))
       )
      select i.empno   
            ,i.ename   
            ,i.job     
            ,i.mgr     
            ,i.hiredate
            ,i.sal     
            ,i.comm    
            ,i.deptno  
        from import i
        left outer join emp e on (e.empno = i.empno)
       where 1 = 1
         and (   -- NULL-safe comparisons; the sentinel values are assumed
                 -- not to occur in the real data
                 coalesce(e.ename, '-NULL-') <> coalesce(i.ename, '-NULL-')
              or coalesce(e.job,   '-NULL-') <> coalesce(i.job,   '-NULL-')
              or coalesce(e.mgr,    -1)      <> coalesce(to_number(i.mgr),    -1)
              or coalesce(to_char(e.hiredate, 'DDMMYYYY'), '-NULL-') <> coalesce(i.hiredate, '-NULL-')
              or coalesce(e.sal,    -1)      <> coalesce(to_number(i.sal),    -1)
              or coalesce(e.comm,   -1)      <> coalesce(to_number(i.comm),   -1)
              or coalesce(e.deptno, -1)      <> coalesce(to_number(i.deptno), -1)
             );
    type data_t is table of c_inserts%rowtype index by pls_integer;
    l_data data_t;
  begin
    open c_inserts;
    loop
      fetch c_inserts bulk collect into l_data limit c_limit;
      if l_data.count > 0 then
        begin
          forall indx in l_data.first .. l_data.last save exceptions
            insert into emp(empno, ename, job, mgr, hiredate, sal, comm, deptno)
            values (l_data(indx).empno, l_data(indx).ename, l_data(indx).job, l_data(indx).mgr, to_date(l_data(indx).hiredate, 'DDMMYYYY'), l_data(indx).sal, trim(l_data(indx).comm), l_data(indx).deptno);
        exception
          when failure_in_forall then
            dbms_output.put_line(q'[error]');
            dbms_output.put_line(sqlcode);
            dbms_output.put_line(sqlerrm);
            for indx in 1 .. sql%bulk_exceptions.count loop
              dbms_output.put_line('Error ' || indx || ' occurred on index ' || sql%bulk_exceptions(indx).error_index);
              dbms_output.put_line('Oracle error is ' ||
                                   sqlerrm(-1 * sql%bulk_exceptions(indx).error_code));
            end loop;
        end;
      end if;
      exit when l_data.count < c_limit;
    end loop;
    close c_inserts;
--
    open c_updates;
    loop
      fetch c_updates bulk collect into l_data limit c_limit;
      if l_data.count > 0 then
        begin
          forall indx in l_data.first .. l_data.last save exceptions
            update emp
              set ename       = l_data(indx).ename    
                , job         = l_data(indx).job      
                , mgr         = l_data(indx).mgr      
                , hiredate    = to_date(l_data(indx).hiredate, 'DDMMYYYY')
                , sal         = l_data(indx).sal      
                , comm        = l_data(indx).comm     
                , deptno      = l_data(indx).deptno   
            where 1=1
              and empno       = l_data(indx).empno;
        exception
          when failure_in_forall then null;
        end;
      end if;
      exit when l_data.count < c_limit;
    end loop;
    close c_updates;
  end emp;
end process_stg;
/

There is certainly room for improvement in the current implementation, like supporting duplicate separators, making the column to be split a parameter, and making the separator character a parameter as well. But that is a nice project for a later time.

I hope it all makes a bit of sense. If you have any improvements, don’t hesitate to comment.

Comparing queries…

Sun, 2017-10-22 02:25

How do you compare a rewritten query to its original version? Most of the time I just run a MINUS operation on the original and the new query. Actually, I execute two: the old query (A) MINUS the new query (B), and vice versa, (B) MINUS (A). Both should return no rows. That way, I thought, I had proven that the result sets of both queries are equal.
But there is a flaw in this assumption.

What if there are duplicate rows in one of the result sets? The MINUS operator removes a row from the result set if it exists in both collections. But if one of the collections contains the row twice and the other contains it once, the row is removed completely.
Let’s say we have two queries:

-- first query
select 'alpha' nato_spelling from dual union all
select 'bravo' nato_spelling from dual union all
select 'bravo' nato_spelling from dual union all
select 'charlie' nato_spelling from dual
NATO_SPELLING
-------------
alpha
bravo
bravo
charlie

and

-- second query
select 'alpha' nato_spelling from dual union all
select 'bravo' nato_spelling from dual union all
select 'charlie' nato_spelling from dual
NATO_SPELLING
-------------
alpha
bravo
charlie

As you can see just by eyeballing the queries, the result sets are different.
But when you execute the MINUS operator on the queries, you get the impression that the result sets are the same:

select *
from
((-- first query
select 'alpha' nato_spelling from dual union all
select 'bravo' nato_spelling from dual union all
select 'bravo' nato_spelling from dual union all
select 'charlie' nato_spelling from dual
 )
 MINUS
 (-- second query
select 'alpha' nato_spelling from dual union all
select 'bravo' nato_spelling from dual union all
select 'charlie' nato_spelling from dual
 )
)

NATO_SPELLING
----------------------------
select *
from
((-- second query
select 'alpha' nato_spelling from dual union all
select 'bravo' nato_spelling from dual union all
select 'charlie' nato_spelling from dual
 )
 MINUS
 (-- first query
select 'alpha' nato_spelling from dual union all
select 'bravo' nato_spelling from dual union all
select 'bravo' nato_spelling from dual union all
select 'charlie' nato_spelling from dual
 )
)
NATO_SPELLING
----------------------------

It gets worse when you compare queries that include a UNION or a UNION ALL operator:

select *
from
(
 (-- first query
  select 'alpha' nato_spelling from dual union all
  select 'bravo' nato_spelling from dual union all
  select 'bravo' nato_spelling from dual union all
  select 'charlie' nato_spelling from dual
 )
 UNION
 (-- third query
  select 'x-ray' nato_spelling from dual union all
  select 'yankee' nato_spelling from dual union all
  select 'zulu' nato_spelling from dual
 )
)
NATO_SPELLING
-------------
alpha
bravo
charlie
x-ray
yankee
zulu

6 rows selected

and

select *
from
(
 (-- first query
  select 'alpha' nato_spelling from dual union all
  select 'bravo' nato_spelling from dual union all
  select 'bravo' nato_spelling from dual union all
  select 'charlie' nato_spelling from dual
 )
 UNION ALL
 (-- third query
  select 'x-ray' nato_spelling from dual union all
  select 'yankee' nato_spelling from dual union all
  select 'zulu' nato_spelling from dual
 )
)
NATO_SPELLING
-------------
alpha
bravo
bravo
charlie
x-ray
yankee
zulu

7 rows selected

Clearly the results are different. This is because of how the UNION operator works: it removes duplicates from the result set. But when I check it using the MINUS operator (both ways):

select *
  from 
  (
   (select *
      from
      (
       (-- first query
        select 'alpha' nato_spelling from dual union all
        select 'bravo' nato_spelling from dual union all
        select 'bravo' nato_spelling from dual union all
        select 'charlie' nato_spelling from dual
       )
       UNION
       (-- third query
        select 'x-ray' nato_spelling from dual union all
        select 'yankee' nato_spelling from dual union all
        select 'zulu' nato_spelling from dual
       )
      )
   )
   MINUS
   (select *
      from
      (
       (-- first query
        select 'alpha' nato_spelling from dual union all
        select 'bravo' nato_spelling from dual union all
        select 'bravo' nato_spelling from dual union all
        select 'charlie' nato_spelling from dual
       )
       UNION ALL
       (-- third query
        select 'x-ray' nato_spelling from dual union all
        select 'yankee' nato_spelling from dual union all
        select 'zulu' nato_spelling from dual
       )
      )
   )
  )
NATO_SPELLING
-------------

and

select *
  from
  (
   (select *
      from
      (
       (-- first query
        select 'alpha' nato_spelling from dual union all
        select 'bravo' nato_spelling from dual union all
        select 'bravo' nato_spelling from dual union all
        select 'charlie' nato_spelling from dual
       )
       UNION ALL
       (-- third query
        select 'x-ray' nato_spelling from dual union all
        select 'yankee' nato_spelling from dual union all
        select 'zulu' nato_spelling from dual
       )
      )
   )
   MINUS
   (select *
      from
      (
       (-- first query
        select 'alpha' nato_spelling from dual union all
        select 'bravo' nato_spelling from dual union all
        select 'bravo' nato_spelling from dual union all
        select 'charlie' nato_spelling from dual
       )
       UNION
       (-- third query
        select 'x-ray' nato_spelling from dual union all
        select 'yankee' nato_spelling from dual union all
        select 'zulu' nato_spelling from dual
       )
      )
   )
  )
NATO_SPELLING
-------------

These results tell me the queries have the same result sets, when in fact they do not.
A simple extra check is to verify that both queries return the same number of rows.
In my opinion, the result sets are equal only when both MINUS queries return no rows AND the row counts of the individual queries are the same.
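A comparison that respects duplicate counts can also be done in a single statement: tag each source with a counter, stack them with UNION ALL, and group on the selected columns. Any row returned indicates a difference, including a difference in duplicate counts. A sketch using the example queries above:

```sql
select nato_spelling
      ,sum(in_first)  cnt_in_first
      ,sum(in_second) cnt_in_second
  from (-- first query, tagged
        select nato_spelling, 1 in_first, 0 in_second
          from (select 'alpha' nato_spelling from dual union all
                select 'bravo' nato_spelling from dual union all
                select 'bravo' nato_spelling from dual union all
                select 'charlie' nato_spelling from dual)
        union all
        -- second query, tagged
        select nato_spelling, 0, 1
          from (select 'alpha' nato_spelling from dual union all
                select 'bravo' nato_spelling from dual union all
                select 'charlie' nato_spelling from dual))
 group by nato_spelling
having sum(in_first) <> sum(in_second)
/
```

For the example data this returns one row, 'bravo' with counts 2 and 1, which is exactly the difference the double MINUS missed.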
Is there something I am still missing here?
I hope not.

Change UUIDs in VirtualBox

Sun, 2017-03-12 00:29

If you are anything like me, you will have multiple VirtualBox machines running on your system. Sometimes you might want to run a copy of a VM for a different purpose, like running an Oracle 11g DevDays instance as a test environment while also running the same VM for customer testing. If you copy the VM and try to run it in the manager, you will be presented with an error that a hard disk with the same UUID already exists. Here is how I solved it.

First of all, make a backup copy of the VirtualBox machine you want to change. While the backup is running, you can download the PortableApps UUID-GUID generator or, if you are not running Windows, a similar program. You can also use an online GUID generator.

After the backup has completed you can start changing the UUIDs for the VirtualBox. Open the <virtualboxname>.vbox file in a text editor. There are a couple of UUIDs that need to be changed:

First look for the <Machine> tag (the second tag in the XML file). One of its attributes is uuid={some_uuid}. This is where the generator comes in: generate a new UUID and paste it here.

Next you need to change the UUIDs for the hard disks. This is a little more tricky. Find the <HardDisk> tag and look for the uuid attribute. This UUID is used multiple times in the XML file, also in the StorageControllers section. The easiest way to keep these in sync is to do a search-and-replace over the entire file: search for the current UUID and replace it with a freshly generated one. Before you move on to the next one, you also need to change the UUID in the hard disk file itself. You do this using the command-line utility VBoxManage.
The command is like this:
<path_to_virtualbox>VBoxManage internalcommands sethduuid <filepath> <uuid>

Repeat this process for all the harddisks that are defined. This way you can have multiple instances of the same VirtualBox in your VirtualBox Manager.

You may want to change other settings like MAC Addresses for your network cards, but you can do this using the VBox interface.

Connecting PL/SQL Developer

Fri, 2016-10-28 10:28

In SQL Developer you have a lot of options when connecting to the database. You can use the TNS entries defined but you can also give the hostname, port and SID or Servicename for instance. PL/SQL Developer doesn’t supply these options, but you can still use them…

In the logon screen you choose the database by picking your TNS entry. But did you know you can put in the complete text of a tnsnames.ora entry here?
So, you can type in:

username: demo
password: <<password>>
database: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=LOCALHOST)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=DEMO)))

But it can be done even simpler:

username: demo
password: <<password>>
database: LOCALHOST:1521/DEMO

Maybe you already knew this, and I have known it for a while, but I thought I would write it down anyway. I hope this helps some of you.

How can we use Oracle to deduplicate our data

Thu, 2016-10-13 05:13

Dear Patrick,

We have gone through a merger at our company where we are trying to merge the databases. The problem now is that we have duplicate records in our tables. We can of course go through all the records by hand and check if they exist twice. Another option is to build an application to do this. But using the Oracle Database there must be a better way to do this. Any ideas?

Ramesh Cunar

Dear Ramesh,

Going through all these records by hand seems like quite a time-consuming process. I don't know the number of records in your tables, but I would definitely not do this by hand. Writing an application to do this makes it possible to run it multiple times, so you can refine the rules and improve the result with every run. But writing an application can be time consuming, and running it can be as well.

I think your best shot would be to try and find a solution in either PL/SQL or SQL. Depending on the complexity and size of your data you could try any of the following solutions.

You can write a procedure that loops through your data and saves the records in an associative array. If the record already exists in the array, or at least its key fields are equal, it should not be added.

CREATE OR REPLACE PROCEDURE ot_deduplicate_aa IS
  TYPE t_ot_emp_aa IS TABLE OF ot_emp%ROWTYPE INDEX BY PLS_INTEGER;
  CURSOR c_ot_emp IS
    SELECT e.empno
          ,e.ename
          ,e.job
          ,e.mgr
          ,e.hiredate
          ,e.sal
          ,e.comm
          ,e.deptno
      FROM ot_emp e;
  r_ot_emp         ot_emp%ROWTYPE;
  l_unique_ot_emps t_ot_emp_aa;
  FUNCTION record_already_exists(record_in        IN ot_emp%ROWTYPE
                                ,collection_inout IN OUT t_ot_emp_aa) RETURN BOOLEAN IS
    l_indx        PLS_INTEGER := collection_inout.first;
    l_returnvalue BOOLEAN := FALSE;
  BEGIN
    IF l_indx IS NOT NULL THEN
      LOOP
        l_returnvalue := l_returnvalue OR ((record_in.ename = collection_inout(l_indx).ename) AND
                         (record_in.job = collection_inout(l_indx).job));
        l_indx := collection_inout.next(l_indx);
        EXIT WHEN l_returnvalue OR(l_indx IS NULL);
      END LOOP;
    END IF;
    RETURN l_returnvalue;
  END record_already_exists;

BEGIN
  OPEN c_ot_emp;
  LOOP
    FETCH c_ot_emp
      INTO r_ot_emp;
    EXIT WHEN c_ot_emp%NOTFOUND;
    -- check if this record already exists
    IF NOT (record_already_exists(record_in => r_ot_emp, collection_inout => l_unique_ot_emps)) THEN
      l_unique_ot_emps(l_unique_ot_emps.count + 1) := r_ot_emp;
    END IF;
  END LOOP;
  FOR indx IN l_unique_ot_emps.first .. l_unique_ot_emps.last LOOP
    INSERT INTO ot_emp_deduplicated
      (empno
      ,ename
      ,job
      ,mgr
      ,hiredate
      ,sal
      ,comm
      ,deptno)
    VALUES
      (l_unique_ot_emps(indx).empno
      ,l_unique_ot_emps(indx).ename
      ,l_unique_ot_emps(indx).job
      ,l_unique_ot_emps(indx).mgr
      ,l_unique_ot_emps(indx).hiredate
      ,l_unique_ot_emps(indx).sal
      ,l_unique_ot_emps(indx).comm
      ,l_unique_ot_emps(indx).deptno);
  END LOOP;
END ot_deduplicate_aa;
/

You can speed up this process by using bulk processing for both the retrieval and the storing of the data. Watch out that you don't blow up your memory by retrieving too much data at once; you can use the LIMIT clause to prevent this.
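A bulk-processing version of the fetch loop might look like the sketch below. The limit of 100 is an arbitrary choice, and the duplicate check itself goes where the comment is:

```sql
DECLARE
  CURSOR c_ot_emp IS
    SELECT e.* FROM ot_emp e;
  TYPE t_ot_emp_aa IS TABLE OF ot_emp%ROWTYPE INDEX BY PLS_INTEGER;
  l_batch          t_ot_emp_aa;
  l_unique_ot_emps t_ot_emp_aa;
  c_limit CONSTANT PLS_INTEGER := 100;
BEGIN
  OPEN c_ot_emp;
  LOOP
    -- fetch at most c_limit rows per round trip to cap memory use
    FETCH c_ot_emp BULK COLLECT INTO l_batch LIMIT c_limit;
    FOR indx IN 1 .. l_batch.count LOOP
      -- apply the record_already_exists check here and add the
      -- record to l_unique_ot_emps when it is not a duplicate
      NULL;
    END LOOP;
    EXIT WHEN l_batch.count < c_limit;
  END LOOP;
  CLOSE c_ot_emp;
END;
/
```

The inserts into ot_emp_deduplicated can then be done with a FORALL statement over l_unique_ot_emps instead of the row-by-row loop.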

Another way to approach this is to use Oracle's analytic functions. The idea is to rank all the records within their own partition. This rank can be rather arbitrary, so we can use ROW_NUMBER as the ranking mechanism. In the PARTITION BY clause we add all the columns we need to check for duplicates on, and to define which record to keep we add the (obligatory) ORDER BY clause. Then, using this new column, we state that we are only interested in the records where the ROW_NUMBER equals 1.

INSERT INTO ot_emp_deduplicated(empno, ename, job, mgr, hiredate, sal, comm, deptno)
WITH emp_all_rows AS
 (SELECT e.empno, e.ename, e.job, e.mgr, e.hiredate, e.sal, e.comm, e.deptno
        ,row_number() OVER (PARTITION BY e.ename, e.job ORDER BY e.empno ASC) rn
    FROM ot_emp e)
SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
  FROM emp_all_rows
 WHERE rn = 1

You can of course use the same technique to delete the duplicate rows directly from the source table, but a DELETE statement cannot start with a subquery factoring (WITH) clause, so an inline view is used instead.

DELETE FROM ot_emp e
 WHERE e.rowid IN 
(SELECT ear.rowid
   FROM (SELECT e.rowid
                ,row_number() over(PARTITION BY e.ename, e.job ORDER BY e.empno ASC) rn
           FROM ot_emp e) ear --<-- EmpAllRows
  WHERE rn > 1)
/

It is probably faster to insert the de-duplicated rows into a new table, then drop the original table and rename the new table. Be aware of any triggers, constraints, indexes etc. that may exist and need to be recreated.
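That route could look like the sketch below. The names are illustrative, and as noted above you have to recreate any grants, indexes, triggers and constraints that existed on the original table:

```sql
-- Build a de-duplicated copy in one pass, then swap it in.
CREATE TABLE ot_emp_deduplicated AS
SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
  FROM (SELECT e.*
              ,row_number() OVER (PARTITION BY e.ename, e.job
                                  ORDER BY e.empno ASC) rn
          FROM ot_emp e)
 WHERE rn = 1
/
DROP TABLE ot_emp
/
RENAME ot_emp_deduplicated TO ot_emp
/
```

The CREATE TABLE AS SELECT is a direct-path operation, which is typically much faster than deleting a large fraction of the rows in place.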

You can check out the full demo code at
https://livesql.oracle.com/apex/livesql/s/ckqg3dkqmvj5omqttapzmrhtg
Or you can send me an email and I will send you the demo script.

Hope this helps in your efforts to de-duplicate your data.

Happy Oracle’ing,
Patrick Barel

If you have any comments on this subject or you have a question you want answered, please send an email to patrick[at]bar-solutions[dot]com. If I know the answer, or can find it for you, maybe I can help.

This question has been published in OTech Magazine of Winter 2015.

OTN Appreciation Day : Prebuilt Developer VMs

Tue, 2016-10-11 05:11

I learnt a lot from the entire Oracle Technology Network and I still do. One of the best features of OTN these days, IMHO, is the ability to download prebuilt Virtual Machines. Since I am a developer and not an administrator, I don’t like to be bothered with stuff like how much disk space do I need, how many cores should I use etc. I can just download a Virtual Box image, import it and start experimenting with the technology I am interested in. For instance, the multi tenant features of the Oracle 12c database. The best thing in using a virtual machine is that when you screw up really, really bad, you can just throw away the virtual machine, import the original version again and try again.
Thanks, OTN, for making it possible for me to learn new technologies without having to learn all the administrator stuff.
Oh, and if I need some extra information or find out what an error means and what I can do about it, there is almost always an answer to be found at OTN.

#ThanksOTN

How can we add custom code to existing triggers?

Thu, 2016-10-06 05:06

Dear Patrick,

We have bought an application that runs on an Oracle database. There are triggers defined on the tables, but we want to add our own code to these triggers, but we don’t have access to the source code. What is the approach we should follow to accomplish this?

Collin Bratforth

Dear Collin,

There are two types of DML triggers: statement level triggers and row level triggers. For each of these types there are two triggering moments, before and after, so you get four trigger moments for each DML operation. Statement level triggers fire once for each statement; row level triggers fire once for each row. This gives you enough moments to add your code. If you want to add code to a triggering moment already used by the application, you can simply add another trigger that fires at the same moment as the existing one.
This should give you plenty of possibilities to add your own code. Since Oracle 8i you can define multiple triggers on the same firing event, but there is no way to tell which of them will fire first. You might think it is based on the creation order or on the alphabetical order of the object names, but that is not the case. If the code executed in the triggers is not interdependent, there is no problem. But suppose one trigger gets the next value of a sequence and the other trigger uses the current value: if the second trigger fires first, the current value of the sequence is either not yet defined or still holds an old value (defined earlier in the session), which is incorrect either way.

CREATE TABLE trigger_demo
( ID NUMBER
, VALUE VARCHAR2(30)
, first_trigger_value VARCHAR2(30)
, second_trigger_value VARCHAR2(30)
)
/
CREATE SEQUENCE trigger_demo_seq START WITH 1 NOCACHE
/
CREATE OR REPLACE TRIGGER first_trigger
  BEFORE INSERT ON trigger_demo
  FOR EACH ROW
BEGIN
  :new.id                  := trigger_demo_seq.nextval;
  :new.first_trigger_value := 'First Trigger ' || to_char(trigger_demo_seq.currval);
END;
/
CREATE OR REPLACE TRIGGER second_trigger
  BEFORE INSERT ON trigger_demo
  FOR EACH ROW
BEGIN
  :new.second_trigger_value := 'Second Trigger ' || to_char(trigger_demo_seq.currval);
END;
/
INSERT INTO trigger_demo
  (VALUE)
VALUES
  ('Patrick')
/
SELECT *
  FROM trigger_demo
/

If you run this code, chances are about fifty-fifty that you will run into this error:

ORA-08002: sequence TRIGGER_DEMO_SEQ.CURRVAL is not yet defined in this session
ORA-06512: at "SCOTT.SECOND_TRIGGER", line 2
ORA-04088: error during execution of trigger 

The reason is that SCOTT.SECOND_TRIGGER fires before SCOTT.FIRST_TRIGGER fires and trigger_demo_seq.currval is only defined after trigger_demo_seq.nextval has been called which only happens in SCOTT.FIRST_TRIGGER.
Since Oracle 11g you have the possibility to tell the trigger to fire after another trigger. You can do this by adding the FOLLOWS clause to the trigger. This way you can make sure your trigger gets fired after the other one.

CREATE OR REPLACE TRIGGER second_trigger
  BEFORE INSERT ON trigger_demo
  FOR EACH ROW
  FOLLOWS first_trigger
BEGIN
  :new.second_trigger_value := 'Second Trigger ' || to_char(trigger_demo_seq.currval);
END;
/
INSERT INTO trigger_demo
  (VALUE)
VALUES
  ('Patrick')
/
SELECT *
  FROM trigger_demo
/

This way second_trigger gets fired after first_trigger. Unfortunately you cannot make your trigger fire before the original code this way; there is a PRECEDES clause, but it is reserved for cross-edition triggers, which are part of Edition Based Redefinition, and that is a whole different subject.
I hope this answers your question.

Happy Oracle’ing,
Patrick Barel

If you have any comments on this subject or you have a question you want answered, please send an email to patrick[at]bar-solutions[dot]com. If I know the answer, or can find it for you, maybe I can help.

This question has been published in OTech Magazine of Fall 2015.

What is overloading and how and when do I use it

Thu, 2016-09-29 04:29

Dear Patrick,

Recently I heard someone talk about overloading in Java. What is it, is it possible in PL/SQL and if so, how would I use it?

Ramesh Cumar

Dear Ramesh,

Overloading is a technique of creating multiple programs with the same name that can be called with different sets of parameters. It is definitely possible to apply this technique in PL/SQL; in fact, Oracle does this a lot in its own built-in packages. If you take a look at the SYS.STANDARD package you will find a lot of functions called TO_CHAR, but with different parameter sets. You probably never wondered how Oracle can use the same function name for completely different tasks. It's just as easy to write
TO_CHAR(9), which results in '9', as it is to write TO_CHAR(SYSDATE), which results in the current date in the format specified by the NLS_DATE_FORMAT parameter, for example 29-12-15 if the format is 'DD-MM-RR'. If you want this value in a different format, you can simply write TO_CHAR(SYSDATE, 'Month, DDth YYYY') to get 'December, 29th 2015'. As you can see, these are all calls to a function with the same name, but with completely different sets of parameters.

This behavior cannot be achieved by making all the parameters optional, like this:

FUNCTION TO_CHAR (num_in      IN NUMBER   DEFAULT NULL
                 ,date_in     IN DATE     DEFAULT NULL
                 ,datemask_in IN VARCHAR2 DEFAULT NULL) RETURN VARCHAR2;

Because if you wanted to call this function without named parameters, the call TO_CHAR(SYSDATE) would not work, since SYSDATE returns a DATE and the function expects a NUMBER as its first parameter. Maybe it would work because of implicit typecasts, but you get the idea.
Instead, multiple functions with the same name but different parameter sets are defined in the package.

One of the packages you can take a look at, because its implementation is readable (i.e. not wrapped), is the HTP package, which you can use to generate HTML output, for instance in an APEX application. Take a look at the PRINT procedure, for example. In the package specification you can see there are three implementations available for this procedure:

procedure print (cbuf in varchar2 character set any_cs DEFAULT NULL);
procedure print (dbuf in date);
procedure print (nbuf in number);

The parameters of these procedures differ not only in name but also in data type, which points at the requirement for overloading:
the data type and/or the number and/or the names of the parameters must differ.
The compiler will not complain if you don't completely comply with this rule, but at runtime you will not be able to use either one of them.

Consider the following package with its implementation

[PATRICK]SQL>CREATE OR REPLACE PACKAGE ol IS
              PROCEDURE p (param_in IN VARCHAR2);
              PROCEDURE p (param_in IN CHAR);
             END ol;
             /
[PATRICK]SQL>CREATE OR REPLACE PACKAGE BODY ol IS
               PROCEDURE p (param_in IN VARCHAR2)
               IS
               BEGIN
                 dbms_output.put_line(param_in);
               END p;
               PROCEDURE p (param_in IN CHAR)
               IS
               BEGIN
                 dbms_output.put_line(param_in);
               END p;
             END ol;
             /

If you want to call the procedure there is no way Oracle can decide which one to use.

[PATRICK]SQL>BEGIN
               ol.p('Hello World');
             END;
             /
  ol.p('Hello World');
  *
ERROR at line 2:
ORA-06550: Line 2, column 3:
PLS-00307: too many declarations of 'P' match this call.
ORA-06550: Line 2, column 3:
PL/SQL: Statement ignored.

Even if you were using named parameters you would get the same error. What we have here is so called ‘ambiguous overloading’. You can read more about this subject at http://www.stevenfeuerstein.com/learn/building-code-analysis.
So there is definitely a use for overloading, but you have to be careful with the parameters, especially when they have default values. If you run into a situation of ambiguous overloading, you now know why the compiler didn't complain while the runtime engine does.
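For contrast, an overload set whose parameter types cannot be confused resolves without problems. A minimal sketch (the package name ol2 is made up for this example):

```sql
CREATE OR REPLACE PACKAGE ol2 IS
  PROCEDURE p (param_in IN NUMBER);
  PROCEDURE p (param_in IN DATE);
END ol2;
/
CREATE OR REPLACE PACKAGE BODY ol2 IS
  PROCEDURE p (param_in IN NUMBER) IS
  BEGIN
    dbms_output.put_line('number version: ' || to_char(param_in));
  END p;
  PROCEDURE p (param_in IN DATE) IS
  BEGIN
    dbms_output.put_line('date version: ' || to_char(param_in, 'DD-MM-RR'));
  END p;
END ol2;
/
BEGIN
  ol2.p(42);      -- resolves to the NUMBER overload
  ol2.p(SYSDATE); -- resolves to the DATE overload
END;
/
```

Because a NUMBER literal and a DATE expression cannot match the same overload, the compiler can pick one implementation unambiguously.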

Happy Oracle’ing,
Patrick Barel

If you have any comments on this subject or you have a question you want answered, please send an email to patrick[at]bar-solutions[dot]com. If I know the answer, or can find it for you, maybe I can help.

This question has been published in OTech Magazine of Summer 2015.

When would you use a normal table function?

Thu, 2016-09-22 04:22

Dear Patrick,

Last year I did a presentation on table functions at KScope. One of the questions I got was: ‘If pipelined table functions provide their results faster, why would you want to use a normal table function?’ I couldn’t come up with the answer then, maybe you can help?

Erik van Roon

Dear Erik,

Let's start by explaining a bit about what table functions are. Table functions are functions that return a collection of data and can be called in a normal SQL statement by using the TABLE() operator. Let's create a simple function. Please note this is a demonstration only, not something you would normally solve using table functions (or PL/SQL at all).
The first step is to create an object type in the database:

[PATRICK]SQL>CREATE TYPE emp_rt AS OBJECT
             ( empno NUMBER(4)
             , ename VARCHAR2(10)
             , mgr   NUMBER(4)
             )
             /

Then you need to create a table type in the database:

[PATRICK]SQL>CREATE TYPE emp_tt AS TABLE OF emp_rt
             /

Then it is time for the simple function. The DBMS_LOCK.SLEEP call is in there to show the difference between Table Functions and Pipelined Table Functions.

[PATRICK]SQL>CREATE OR REPLACE FUNCTION tf_emp RETURN emp_tt
             AS
               l_returnvalue emp_tt;
             BEGIN
               SELECT emp_rt(e.empno, e.ename, e.mgr)
                 BULK COLLECT INTO l_returnvalue
                 FROM emp e
                ORDER BY e.deptno;
               FOR indx IN l_returnvalue.first ..
                           l_returnvalue.last LOOP
                 l_returnvalue(indx).ename :=
                   INITCAP(l_returnvalue(indx).ename);
                 dbms_lock.sleep(.25); -- for demo purposes only
               END LOOP;
               RETURN l_returnvalue;
             END;
             /

Now you can call the function in the FROM clause of your SQL statement as if it were a relational table:

[PATRICK]SQL>SELECT *
               FROM TABLE(tf_emp)
             /

Notice that the result is displayed after all records have been processed, i.e. after 3.5 seconds (due to the DBMS_LOCK.SLEEP statement).
Now let’s create a PIPELINED table function, which produces the same result but in a different manner:

[PATRICK]SQL>CREATE OR REPLACE FUNCTION ptf_emp RETURN emp_tt
                               PIPELINED
             AS
               l_emps emp_tt;
               l_returnvalue emp_rt;
             BEGIN
               SELECT emp_rt(e.empno, e.ename, e.mgr)
                 BULK COLLECT INTO l_emps
                 FROM emp e
                ORDER BY e.deptno;
               FOR indx IN l_emps.first ..
                           l_emps.last LOOP
                 l_returnvalue :=
                   emp_rt(empno => l_emps(indx).empno
                         ,ename => INITCAP(l_emps(indx).ename)
                         ,mgr => l_emps(indx).mgr);
                 PIPE ROW (l_returnvalue);
                 dbms_lock.sleep(.25); -- for demo purposes only
               END LOOP;
               RETURN;
             END;
             /

If you set the arraysize of your SQL*Plus session (or your command window in PL/SQL Developer) you can see how the results are being returned as they are produced, i.e. 0.25 seconds apart.

[PATRICK]SQL>SET ARRAYSIZE 1
[PATRICK]SQL>SELECT *
               FROM TABLE(ptf_emp)
             /

Now you can see the difference between a table function and a pipelined table function. Pipelined table functions are best used when you are executing different stages of transformation of your data, for example reading from an OLTP system and writing to a data warehouse system. If you PARALLEL_ENABLE your functions AND your source table can be read in parallel, you can really see some performance benefits.
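As a sketch of what that could look like: the ref-cursor parameter and the PARTITION BY ANY clause below are my own additions for illustration, not part of the original example:

```sql
CREATE OR REPLACE FUNCTION ptf_emp_par(p_src IN SYS_REFCURSOR)
  RETURN emp_tt PIPELINED
  PARALLEL_ENABLE (PARTITION p_src BY ANY)
AS
  l_empno emp.empno%TYPE;
  l_ename emp.ename%TYPE;
  l_mgr   emp.mgr%TYPE;
BEGIN
  LOOP
    -- each parallel execution server processes its own slice of the cursor
    FETCH p_src INTO l_empno, l_ename, l_mgr;
    EXIT WHEN p_src%NOTFOUND;
    PIPE ROW (emp_rt(l_empno, INITCAP(l_ename), l_mgr));
  END LOOP;
  RETURN;
END;
/
-- Called with a cursor expression, so the database can distribute the rows:
-- SELECT * FROM TABLE(ptf_emp_par(CURSOR(SELECT empno, ename, mgr FROM emp)));
```

PARTITION BY ANY lets the database distribute the cursor rows over the parallel execution servers in any way it sees fit, which is appropriate here because the rows are transformed independently.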
But none of this explains why you should NOT use pipelining in a table function. The only reason I can think of is when you want to be able to call the function from plain PL/SQL. PL/SQL does one call to a function and expects one result from it, not a result spread out over many 'callbacks'.
If you create a procedure like this:

[PATRICK]SQL>CREATE OR REPLACE PROCEDURE call_tf AS
               l_emps emp_tt;
             BEGIN
               l_emps := tf_emp;
               FOR indx IN l_emps.first .. l_emps.last LOOP
                 dbms_output.put_line(l_emps(indx).empno ||
                                     ' ' || l_emps(indx).ename ||
                                     ' ' || l_emps(indx).mgr);
               END LOOP;
             END;
             /

And call this procedure, then everything works fine. It takes about 3.5 seconds for the procedure to complete, due to the DBMS_LOCK.SLEEP call in the tf_emp function.
If you try the same direct call with the pipelined table function, you are stopped at compile time: a direct PL/SQL call to a pipelined table function is not allowed.

[PATRICK]SQL>CREATE OR REPLACE PROCEDURE call_ptf AS
               l_emps emp_tt;
             BEGIN
               l_emps := ptf_emp;
               FOR indx IN l_emps.first .. l_emps.last LOOP
                 dbms_output.put_line(l_emps(indx).empno ||
                                     ' ' || l_emps(indx).ename ||
                                     ' ' || l_emps(indx).mgr);
               END LOOP;
             END;
             /
Warning: Procedure created with compilation errors.
[PATRICK]SQL>sho err
Errors for PROCEDURE CALL_PTF:
LINE/COL ERROR
-------- ---------------------------------------------------------------------
1/10     PLS-00653: aggregate/table functions are not allowed in PL/SQL scope

You can of course wrap the call to the pipelined table function in a cursor like this:

[PATRICK]SQL>CREATE OR REPLACE PROCEDURE call_ptf2 AS
               CURSOR c_emps IS
                 SELECT emp_rt(t.empno, t.ename, t.mgr)
                   FROM TABLE(ptf_emp) t;
               l_emps emp_tt;
             BEGIN
               OPEN c_emps;
               FETCH c_emps BULK COLLECT
                 INTO l_emps;
               FOR indx IN l_emps.first .. l_emps.last LOOP
                 dbms_output.put_line(l_emps(indx).empno ||
                                     ' ' || l_emps(indx).ename ||
                                     ' ' || l_emps(indx).mgr);
               END LOOP;
             END;
             /

Procedure created.

But when you call this procedure you will see that it takes about 3.5 seconds to fetch all the records, effectively using the pipelined table function as a normal table function. This can be your escape route when you want to use pipelined table functions in a SQL environment and still use the same function from PL/SQL.
‘But you said pipelined table functions are best used when executing different stages of transformation. That includes multiple PL/SQL functions.’ you might wonder. That is correct. A pipelined table function may call another pipelined table function. It can, for instance, use a collection as its input like this:

[PATRICK]SQL>CREATE OR REPLACE FUNCTION ptf_emp3
                                        (emps_in IN emp_tt)
                                 RETURN emp_tt PIPELINED
             AS
               l_emp emp_rt;
             BEGIN
               FOR indx IN emps_in.first .. emps_in.last LOOP
                 l_emp := emps_in(indx);
                 l_emp.ename := UPPER(l_emp.ename);
                 PIPE ROW (l_emp);
                 dbms_lock.sleep(.25); -- for demo purposes only
               END LOOP;
               RETURN;
             END;
             /

When you call this function using this:

[PATRICK]SQL>SET ARRAYSIZE 1
[PATRICK]SQL>SELECT *
               FROM TABLE(ptf_emp3(CAST(MULTISET (SELECT *
                                             FROM TABLE(ptf_emp))
                                                     AS emp_tt)))
             /

You will notice the output does not start displaying until after about 3.5 seconds, which tells me the call to ptf_emp must complete before ptf_emp3 can start doing its work.
If you change the parameter to accept a cursor (and of course change the processing as well) like this:

[PATRICK]SQL>CREATE OR REPLACE FUNCTION ptf_emp4
                               (cursor_in IN SYS_REFCURSOR)
                                 RETURN emp_tt PIPELINED
             AS
               l_emp emp_rt;
             BEGIN
               LOOP
                 FETCH cursor_in
                   INTO l_emp;
                 EXIT WHEN cursor_in%NOTFOUND;
                 l_emp.ename := upper(l_emp.ename);
                 PIPE ROW(l_emp);
                 dbms_lock.sleep(.25); -- for demo purposes only
               END LOOP;
               RETURN;
             END;
             /

And call this function using this statement:

[PATRICK]SQL>SET arraysize 1
[PATRICK]SQL>SELECT *
               FROM TABLE(ptf_emp4(CURSOR (SELECT emp_rt(empno
                                                        ,ename
                                                        ,mgr)
                                       FROM TABLE(ptf_emp))))
             /

You will notice the output starting to display after about half a second. The total time needed for this function to complete is roughly the same as the previous, but this function starts processing sooner, which might be exactly what you need.

I hope this sheds a bit of light on the subject. I think the bottom line is to always use pipelined table functions instead of normal table functions, except when the function is only meant for PL/SQL or client consumption. You can always work around the implementation restrictions if you need to use the same codebase in both SQL and PL/SQL.

Happy Oracle’ing,
Patrick Barel

If you have any comments on this subject or you have a question you want answered, please send an email to patrick[at]bar-solutions[dot]com. If I know the answer, or can find it for you, maybe I can help.

This question has been published in OTech Magazine of Spring 2015.

What’s the difference between SEMI-JOIN and ANTI-JOIN?

Thu, 2016-09-15 04:15

Dear Patrick,

What is an ANTI-JOIN? And what is the difference between the SEMI-JOIN and the ANTI-JOIN?

Lillian Sturdey

Dear Lillian,

First of all, SEMI-JOIN and ANTI-JOIN are not part of the SQL syntax; they are more of a pattern. You might expect to be able to write something like:

[PATRICK]SQL>SELECT d.deptno, d.dname, d.loc
               FROM dept d
               SEMI JOIN emp e ON (e.deptno = d.deptno)
             /

to get all the departments that have at least one employee.
Or:

[PATRICK]SQL>SELECT d.deptno, d.dname, d.loc
               FROM dept d
               ANTI JOIN emp e ON (e.deptno = d.deptno)
             /

to get the departments with no employees. But all you get is an error saying your command is not properly ended, which can be read as a syntax error.

ERROR at line 3:
ORA-00933: SQL command not properly ended

Maybe your first idea would be to use a normal join to get all the departments with at least one employee:

[PATRICK]SQL>SELECT d.deptno, d.dname, d.loc
               FROM dept d
               JOIN emp e ON (e.deptno = d.deptno)
             /

But this results in a row for every record in the EMP table, and we only wanted each department once.

    DEPTNO DNAME          LOC
---------- -------------- -------------
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO
        30 SALES          CHICAGO
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO
        30 SALES          CHICAGO
        10 ACCOUNTING     NEW YORK
        20 RESEARCH       DALLAS
        10 ACCOUNTING     NEW YORK
        30 SALES          CHICAGO
        20 RESEARCH       DALLAS

    DEPTNO DNAME          LOC
---------- -------------- -------------
        30 SALES          CHICAGO
        20 RESEARCH       DALLAS
        10 ACCOUNTING     NEW YORK

14 rows selected.

Well, that’s easy enough, you think, just add a DISTINCT to the statement:

[PATRICK]SQL>SELECT DISTINCT d.deptno, d.dname, d.loc
               FROM dept d
               JOIN emp e ON (e.deptno = d.deptno) 
             /

Exactly the result we are looking for:

    DEPTNO DNAME          LOC
---------- -------------- -------------
        20 RESEARCH       DALLAS
        10 ACCOUNTING     NEW YORK
        30 SALES          CHICAGO

But what if the EMP table contains hundreds, thousands or maybe millions of rows? That would mean the database has to do a lot of work to filter out the distinct values.
A different, and probably better, approach would be to use the SEMI-JOIN pattern. You can use the IN operator like this:

[PATRICK]SQL>SELECT d.deptno, d.dname, d.loc
              FROM dept d
              WHERE d.deptno IN (SELECT e.deptno
                                   FROM emp e)
             /
    DEPTNO DNAME          LOC
---------- -------------- -------------
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO
        10 ACCOUNTING     NEW YORK

This is exactly what we want to see but for big tables this is not the correct way to go. For every record in the dept table all the records in the EMP table are checked. Again, if we have a lot of employees, this means a lot of work for the database.
A better SEMI-JOIN to use is the EXISTS operator:

[PATRICK]SQL>SELECT d.deptno, d.dname, d.loc
               FROM dept d
              WHERE EXISTS (SELECT 1
                              FROM emp e
                             WHERE e.deptno = d.deptno)
             /

Please note that with the current optimizer in the database Oracle will rewrite your query to use the best approach for the task. If the inner table (in our example EMP) is rather small, then the IN approach might be the best, in other cases it might be better to use the EXISTS approach. Where in earlier versions you had to think about which way to go (IN is better for small tables, EXISTS is better for big ones), you can now rely on the optimizer to make the correct choice.
If you want to see exactly the opposite of this query, i.e. all departments with no employees, you use an ANTI-JOIN pattern, which is pretty much the same, but in this case you use NOT IN or NOT EXISTS. A different approach, which I think is pretty nice, is to use an OUTER JOIN and check for the absence of values in a column of the outer-joined table.

[PATRICK]SQL>SELECT d.deptno, d.dname, d.loc
               FROM dept d
               LEFT OUTER JOIN emp e ON (e.deptno = d.deptno)
              WHERE e.empno IS NULL
             /
    DEPTNO DNAME          LOC
---------- -------------- -------------
        40 OPERATIONS     BOSTON

Hope this gives you a bit more insight into this subject and a better understanding of the wonders of the SQL language. Notice there are many ways to reach the same result, but one approach might be more economical than the other.


Hope this answers your question.
Happy Oracle’ing,
Patrick Barel

If you have any comments on this subject or you have a question you want answered, please send an email to patrick[at]bar-solutions[dot]com. If I know the answer, or can find it for you, maybe I can help.

This question has been published in OTech Magazine of Winter 2014.

What is the difference between NVL and COALESCE?

Thu, 2016-09-08 04:08

Dear Patrick,

Could you tell me what the difference is between NVL and COALESCE?

Kindest regards,
Paul McCurdey

Dear Paul,

NVL returns the value of the first argument if it doesn’t evaluate to NULL, otherwise it will return the value of the second argument.
COALESCE returns the first argument that doesn’t evaluate to NULL. That can be any one of the arguments.
So they are definitely similar – but there are significant differences.
First of all, COALESCE is a function that is part of the ANSI-92 standard, whereas NVL was created in the 80s, when there were no standards. Since COALESCE is the newer of the two functions (available since 9i), it is better equipped for multiple values and it does less work, therefore it’s the greener option (doing less work means using fewer resources, like power).
How is doing less work better, you might ask? Well, would you want to do the work for which you know the result is never going to be used? I know I wouldn’t. That is one thing COALESCE does for you. It does not evaluate an argument if its result is not needed.
A big advantage of using the COALESCE function is the short-circuit evaluation. Where NVL evaluates both arguments, whether the second argument should be used or not, COALESCE only evaluates the arguments if they are needed.
For example:
If you run the following statement:

[PATRICK]SQL>SELECT NVL(1, 1/0) FROM dual
             /

you will see the

ORA-01476: division by zero

error.
In this statement the first argument (1) is NOT NULL, so the second argument (1/0) should never be returned, but since NVL always evaluates both arguments, this statement results in the exception.
The COALESCE function only evaluates the arguments if they are needed.
If you run the following statement:

[PATRICK]SQL>SELECT COALESCE(1, 1/0) FROM dual
             /

you will not get an error, since the second argument is never evaluated. In other words, COALESCE evaluates exprN only if expr(N-1) evaluates to NULL.
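The same short-circuit behaviour extends to any number of arguments; evaluation stops at the first non-NULL one:

```sql
-- The first two arguments are NULL and the third is 3, so the
-- division by zero in the fourth argument is never attempted.
SELECT COALESCE(NULL, NULL, 3, 1/0) FROM dual
/
```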

A simple test shows the difference again:
First we create simple package which holds a variable and a function to increase that variable:

[PATRICK]SQL>CREATE OR REPLACE PACKAGE nvl_test
             IS
               g_value NUMBER := 0;
               FUNCTION increase_value RETURN NUMBER;
             END nvl_test;
             /
[PATRICK]SQL>CREATE OR REPLACE PACKAGE BODY nvl_test
             IS
               FUNCTION increase_value RETURN NUMBER
               IS
                l_returnvalue NUMBER;
               BEGIN
                 dbms_output.put_line('nvl_test.increase_value');
                 nvl_test.g_value := nvl_test.g_value + 1;
                 l_returnvalue := nvl_test.g_value;
                 RETURN l_returnvalue;
               END increase_value;
             END nvl_test;
             /

Then a script to demonstrate what happens. First it displays the value of the variable. Then it calls the NVL function with NULL as the first argument. As you can see, the function in the package is called, hence the variable is increased. Then another call to the NVL function, this time with a non-NULL first argument. The function in the package is still called, even though its result is not used.
Then we reset the variable and run the same tests, but this time using the COALESCE function. As you can see, the function is only called if the preceding argument(s) evaluate to NULL.

[PATRICK]SQL>DECLARE
               l_dummy NUMBER;
               l_foo NUMBER;
             BEGIN
               dbms_output.put_line('====reset package====');
               nvl_test.g_value := 0;
               l_dummy := nvl_test.g_value;
               dbms_output.put_line(l_dummy);
               l_foo := NVL(NULL,nvl_test.increase_value);
               dbms_output.put_line(l_foo);
               l_dummy := nvl_test.g_value;
               dbms_output.put_line(l_dummy);
               l_foo := NVL(2912,nvl_test.increase_value);
               dbms_output.put_line(l_foo);
               l_dummy := nvl_test.g_value;
               dbms_output.put_line(l_dummy);
               dbms_output.put_line('====reset package====');
               nvl_test.g_value := 0;
               l_dummy := nvl_test.g_value;
               dbms_output.put_line(l_dummy);
               l_foo := coalesce(NULL,nvl_test.increase_value);
               dbms_output.put_line(l_foo);
               l_dummy := nvl_test.g_value;
               dbms_output.put_line(l_dummy);
               l_foo := coalesce(2912,nvl_test.increase_value);
               dbms_output.put_line(l_foo);
               l_dummy := nvl_test.g_value;
               dbms_output.put_line(l_dummy);
             END;
             /

====reset package====
0
nvl_test.increase_value
1
1
nvl_test.increase_value
2912
2
====reset package====
0
nvl_test.increase_value
1
1
2912
1

If you run the anonymous block in an IDE where you can step through the code, you can see when the code is executed and when it is bypassed.
So, if you don’t need or want the code executed when the value of a variable or result of a function is not NULL, then you should use COALESCE to prevent this from happening. But there might be a use case in which you always want a piece of code executed whether the first argument is NULL or not. Then you should use (or stick to) NVL.
I think the rule should be: Use COALESCE unless…

Hope this answers your question.
Happy Oracle’ing,
Patrick Barel

If you have a question you want answered, please send an email to patrick[at]bar-solutions[dot]com. If I know the answer, or can find it for you, maybe I can help.

This question has been published in OTech Magazine of Fall 2014

How do I get my query results paginated?

Thu, 2016-09-01 04:01

Dear Patrick,

I have got a website with a search form. I want to display a limited number of results to the user and have him/her navigate through different pages. Is this possible using plain SQL?

Kindest regards,

Mitchell Ian

Dear Mitchell,

Of course this is possible. It might take some thinking, but that has never hurt anyone (yet). First we need a table with some randomly sorted data in it. In this example I am just using 20 records, but you can use this approach on bigger tables of course.

[PATRICK]SQL>CREATE TABLE t AS 
             SELECT LEVEL val#, to_char(LEVEL, '9999') value_in_text 
               FROM dual 
             CONNECT BY LEVEL < 21
              ORDER BY dbms_random.random 
             /  

Table created.

The ORDER BY dbms_random.random is there to ensure the data is inserted in random order. If you just select from this new table, your data will be unordered.

Now we select the first ‘page’ from this table. Our page size is 5 records. So the query will be:

[PATRICK]SQL>SELECT *
              FROM t
              WHERE ROWNUM <= 5
            /
VAL#       VALUE
---------- -----
10         10
20         20
16         16
1          1
17         17

This results in the first 5 rows from the table. If we want the next 5 rows, ROWNUMs 6 through 10, you might try something like this:

[PATRICK]SQL>SELECT *
               FROM t
               WHERE ROWNUM > 5 AND ROWNUM <= 10
             /

no rows selected

Unfortunately this doesn’t work. It appears this query can never return a row with a number between 6 and 10: ROWNUM is assigned as rows are produced, so the first candidate row gets ROWNUM 1, fails the ROWNUM > 5 predicate, is discarded, and the next candidate row gets ROWNUM 1 again. The solution to this issue is the use of a subquery:

[PATRICK]SQL>SELECT val#, value_in_text
               FROM (SELECT t.val#, t.value_in_text, ROWNUM rn
                       FROM t)
              WHERE rn > 5 AND rn <= 10
             /

VAL#       VALUE
---------- -----
13         13
4          4
5          5
3          3
14         14

In this query we first select all the rows we might need for the pages and using this resultset we just select the rows we are interested in for our page.

If your table is rather big you may want to include the maximum rownum in the inline view.

[PATRICK]SQL>SELECT val#, value_in_text
               FROM (SELECT t.val#, t.value_in_text, ROWNUM rn
                       FROM t
                      WHERE ROWNUM <= 10)
              WHERE rn > 5 AND rn <= 10
             /

VAL#       VALUE
---------- -----
13         13
4          4
5          5
3          3
14         14

As you are probably aware, there is no guarantee on how the rows are returned unless you specify an ORDER BY clause. But what happens if you just include this ORDER BY in your query? Let’s see what happens when you include it in the first query, for the first page:

[PATRICK]SQL>SELECT *
               FROM t
              WHERE ROWNUM <= 5
             ORDER BY t.val#
             /

VAL#       VALUE
---------- -----
12         12
13         13
15         15
17         17
19         19

The rows returned are in order, but they are definitely not the first 5 values currently in the table. That is how the SQL engine works: it first gets 5 rows to honor the predicate in the query and only then sorts that result before returning it to the caller.

What we should do to get the correct behavior is use a subquery to get the results in order and apply the ROWNUM predicate to that result.

[PATRICK]SQL>SELECT *
               FROM (SELECT *
                       FROM t
                     ORDER BY t.val#)
               WHERE ROWNUM <= 5
             /

VAL#       VALUE
---------- -----
1          1
2          2
3          3
4          4
5          5

We can now use this to build a query to get the next page of results:

[PATRICK]SQL>SELECT val#, value_in_text
               FROM (SELECT val#, value_in_text, ROWNUM rn
                       FROM (SELECT *
                               FROM t
                             ORDER BY t.val#)
                     ORDER BY rn)
              WHERE rn > 5 AND rn <= 10
             /

VAL#       VALUE
---------- -----
6          6
7          7
8          8
9          9
10         10

When you have access to an Oracle 12c database it is a lot easier. To get the first page of the ordered results, you can issue this statement:

[PATRICK]SQL>SELECT *
               FROM t
            ORDER BY t.val#
            FETCH FIRST 5 ROWS ONLY
            /

To get another page you can provide the query with an offset, i.e. how many rows to skip:

[PATRICK]SQL>SELECT *
               FROM t
             ORDER BY t.val#
             OFFSET 5 ROWS FETCH NEXT 5 ROWS ONLY
             /

Under the covers Oracle still issues similar queries as the ones we built earlier, but it is a lot easier to write these.
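A close cousin of the ROWNUM construct above, and close to what Oracle generates for the OFFSET/FETCH syntax, is the ROW_NUMBER analytic function. A sketch of the same second page:

```sql
-- ROW_NUMBER numbers the rows in the order of the ORDER BY in the
-- OVER clause, so the outer query can slice out rows 6 through 10.
SELECT val#, value_in_text
  FROM (SELECT t.val#, t.value_in_text,
               ROW_NUMBER() OVER (ORDER BY t.val#) rn
          FROM t)
 WHERE rn > 5 AND rn <= 10
 ORDER BY rn
/
```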

Hope this sheds a bit of light on your issue.

Happy Oracle’ing,

Patrick Barel

If you have a question you want answered, please send an email to patrick[at]bar-solutions[dot]com. If I know the answer, or can find it for you, maybe I can help.

This question has been published in OTech Magazine of Summer 2014

Virtual Private Database…

Sun, 2016-07-31 01:39

Some time ago I was asked to assist in fixing or at least finding the cause of a performance problem. The application ran fine until the Virtual Private Database (VPD) policy was applied. Oracle claims there should be near zero impact on your application when you implement VPD, then how is this possible?

First of all, the policy applied was a rather complex one. A complex query had to be executed to determine whether the current user has access to a record. Let’s say this query takes a second; then I would expect my query to run about a second slower, maybe two. But the query took several minutes to complete with the VPD policy applied. This didn’t make sense to me, so I decided to find out what was really happening.
To do this, I opened up my sandbox database to try and recreate this situation.
First I need to create two new users:

create user vpd1 identified by vpd1
/
grant connect, resource to vpd1
/
create user vpd2 identified by vpd2
/
grant connect, resource to vpd2
/

Then I created a simple table to hold the data that should be protected by the VPD policy:

drop table emp purge
/
create table emp
       (empno number(4) not null, 
        ename varchar2(10), 
        job varchar2(9), 
        mgr number(4), 
        hiredate date, 
        sal number(7, 2), 
        comm number(7, 2), 
        deptno number(2))
;
insert into emp values (7369, 'SMITH',  'CLERK',     7902, to_date('17-12-1980', 'DD-MM-YYYY'),  800, null, 20);
insert into emp values (7499, 'ALLEN',  'SALESMAN',  7698, to_date('20-02-1981', 'DD-MM-YYYY'), 1600,  300, 30);
insert into emp values (7521, 'WARD',   'SALESMAN',  7698, to_date('22-02-1981', 'DD-MM-YYYY'), 1250,  500, 30);
insert into emp values (7566, 'JONES',  'MANAGER',   7839, to_date('02-04-1981', 'DD-MM-YYYY'),  2975, null, 20);
insert into emp values (7654, 'MARTIN', 'SALESMAN',  7698, to_date('28-09-1981', 'DD-MM-YYYY'), 1250, 1400, 30);
insert into emp values (7698, 'BLAKE',  'MANAGER',   7839, to_date('01-05-1981', 'DD-MM-YYYY'),  2850, null, 30);
insert into emp values (7782, 'CLARK',  'MANAGER',   7839, to_date('09-06-1981', 'DD-MM-YYYY'),  2450, null, 10);
insert into emp values (7788, 'SCOTT',  'ANALYST',   7566, to_date('09-12-1982', 'DD-MM-YYYY'), 3000, null, 20);
insert into emp values (7839, 'KING',   'PRESIDENT', null, to_date('17-11-1981', 'DD-MM-YYYY'), 5000, null, 10);
insert into emp values (7844, 'TURNER', 'SALESMAN',  7698, to_date('08-09-1981', 'DD-MM-YYYY'),  1500,    0, 30);
insert into emp values (7876, 'ADAMS',  'CLERK',     7788, to_date('12-01-1983', 'DD-MM-YYYY'), 1100, null, 20);
insert into emp values (7900, 'JAMES',  'CLERK',     7698, to_date('03-12-1981', 'DD-MM-YYYY'),   950, null, 30);
insert into emp values (7902, 'FORD',   'ANALYST',   7566, to_date('03-12-1981', 'DD-MM-YYYY'),  3000, null, 20);
insert into emp values (7934, 'MILLER', 'CLERK',     7782, to_date('23-01-1982', 'DD-MM-YYYY'), 1300, null, 10);
commit
/
drop table emp_vpd purge
/
create table emp_vpd as select * from emp
/
commit
/

And of course I need to grant access to this table to the newly created users:

grant all on emp_vpd to vpd1
/
grant all on emp_vpd to vpd2
/

On the table I need to create a policy function, so I create a package (which mimics the customer’s package, just simpler) to do this:

create or replace package emp_vpd_policy as
  function first_policy(owner_in   in varchar2
                       ,objname_in in varchar2) return varchar2;
  function allowed(empno_in  in number
                  ,deptno_in in number) return number;
end emp_vpd_policy;
/
sho err
create or replace package body emp_vpd_policy as
  function first_policy(owner_in   in varchar2
                       ,objname_in in varchar2) return varchar2 is
  begin
    dbms_output.put_line('first policy');
    if (user = 'VPD1') then
      return 'emp_vpd_policy.allowed(emp_vpd.empno, emp_vpd.deptno)=10';
    elsif user = 'VPD2' then
      return 'emp_vpd_policy.allowed(emp_vpd.empno, emp_vpd.deptno)=20';
    else
      return '1=1';
    end if;
  end first_policy;
  function allowed(empno_in  in number
                  ,deptno_in in number) return number is
  begin
    dbms_output.put_line('emp_vpd_policy.allowed(' || empno_in || ',' || deptno_in || ')');
    return deptno_in;
  end allowed;
end emp_vpd_policy;
/
sho err

and then protect the EMP_VPD table using a policy:

begin
  sys.dbms_rls.add_policy(object_schema => 'DEMO'
                         ,object_name => 'EMP_VPD'
                         ,policy_name => 'EMP_VPD_SEL'
                         ,function_schema => '&myuser'
                         ,policy_function => 'EMP_VPD_POLICY.FIRST_POLICY'
                         ,statement_types => 'SELECT');
end;
/

The package will show what will happen when I perform a select on the table:

conn vpd1/vpd1
set serveroutput on size unlimited
select * from demo.emp_vpd
/
EMPNO ENAME      JOB         MGR HIREDATE          SAL      COMM DEPTNO
----- ---------- --------- ----- ----------- --------- --------- ------
 7782 CLARK      MANAGER    7839 6/9/1981      2450.00               10
 7839 KING       PRESIDENT       11/17/1981    5000.00               10
 7934 MILLER     CLERK      7782 1/23/1982     1300.00               10
first policy
first policy
emp_vpd_policy.allowed(7369,20)
emp_vpd_policy.allowed(7499,30)
emp_vpd_policy.allowed(7521,30)
emp_vpd_policy.allowed(7566,20)
emp_vpd_policy.allowed(7654,30)
emp_vpd_policy.allowed(7698,30)
emp_vpd_policy.allowed(7782,10)
emp_vpd_policy.allowed(7788,20)
emp_vpd_policy.allowed(7839,10)
emp_vpd_policy.allowed(7844,30)
emp_vpd_policy.allowed(7876,20)
emp_vpd_policy.allowed(7900,30)
emp_vpd_policy.allowed(7902,20)
emp_vpd_policy.allowed(7934,10)

In my case this is done rather quickly, there’s almost no difference in timing for the query with or without the policy applied. But as you can see, the policy is executed for each and every record that is being checked. Well, not really the policy itself, but the function that is defined in the policy. So if this function takes a lot of time and your table has a lot of records then the query will run for a very long time. There has got to be a better way to do this.
Let’s analyze what happens: the actual policy function is executed twice. What if we use this architecture to our benefit? In the first pass we can set up some in-memory data structure to hold whatever we need, which might take some time, and then in the second pass we can use this data in the actual check.
First we drop the policy so we can create a new one:

begin
  sys.dbms_rls.drop_policy(object_schema => '&myuser'
                          ,object_name => 'EMP_VPD'
                          ,policy_name => 'EMP_VPD_SEL');
end;
/

For our implementation we need a Nested Table type to be created in the database:

create or replace type empnos_tt is table of number(4) 
/

Then we create a new package to hold the policy function.

create or replace package emp_vpd_pp as
  function sel( owner_in   in varchar2
              , objname_in in varchar2
              ) return varchar2;
  function read_g_empnos return empnos_tt;
end emp_vpd_pp;
/ 
sho err

The function SEL will be used in the policy. The function READ_G_EMPNOS is needed to retrieve the data in the package variable. Then the actual implementation of the package:

create or replace package body emp_vpd_pp as
  g_empnos empnos_tt;
  beenhere boolean := false;
  function sel( owner_in   in varchar2
              , objname_in in varchar2
              ) return varchar2 is
  begin
    if not(beenhere) then
      if user = 'VPD1' then
        begin
          select emp.empno
            bulk collect into g_empnos
            from emp
           where emp.deptno = 10;
         exception
           when others then
           dbms_output.put_line(sqlerrm);
         end;
      elsif user = 'VPD2' then
        begin
          select emp.empno
            bulk collect into g_empnos
            from emp
           where emp.deptno = 20;
         exception
           when others then
           dbms_output.put_line(sqlerrm);
         end;
      end if;
    end if;
    beenhere := not(beenhere);
    if ((user = 'VPD1') or (user = 'VPD2')) then
      return 'emp_vpd.empno in (select column_value
                                  from table(emp_vpd_pp.read_g_empnos))';
    else
      return '1=1';
    end if;
  end sel;
  function read_g_empnos return empnos_tt
    is
    begin
      return (g_empnos);
    end;
begin
  beenhere := false; 
end emp_vpd_pp;
/
sho err

In the initialization section of the package we initialize the Boolean variable. Then, when the policy function is executed for the first time (per query) we select the column values we need and save that into the package variable. The second time we execute the policy function we use the values saved in the predicate that is being added.

begin
  sys.dbms_rls.add_policy(object_schema => 'DEMO'
                         ,object_name => 'EMP_VPD'
                         ,policy_name => 'EMP_VPD_SEL'
                         ,function_schema => 'DEMO'
                         ,policy_function => 'EMP_VPD_PP.SEL'
                         ,statement_types => 'SELECT');
end;
/

Notice that the predicate using the Nested Table is always applied, but the Nested Table is only filled during the first execution of the policy function. Using this policy function gives exactly the same result, but the execution improved dramatically: the database only has to execute the expensive query once per statement instead of once per row, and its result can be reused at almost no cost.

This post is also available at the AMIS blog

Dot Qualify Everything?

Sun, 2016-07-24 00:35

There is talk about dot qualifying everything in your PL/SQL code. But what are the pros and cons of this approach?

Let’s say we have a rather simple function which returns a string with alternatively upper and lower case characters. Normally I would write it as follows:

create or replace function wavey(string_in in varchar2) return varchar2 is
  l_returnvalue varchar2(30);
begin
  for indx in 1 .. length(string_in) loop
    l_returnvalue := l_returnvalue || case mod(indx, 2)
                       when 0 then
                        upper(substr(string_in, indx, 1))
                       else
                        lower(substr(string_in, indx, 1))
                     end;
  end loop;
  dbms_output.put_line(l_returnvalue);
  return l_returnvalue;
end;

The output for this function using the ENAME column of the EMP table is like this:

sMiTh
aLlEn
wArD
jOnEs
mArTiN
bLaKe
cLaRk
sCoTt
kInG
tUrNeR
aDaMs
jAmEs
fOrD
mIlLeR

But what if there was a malicious user who created a package named DBMS_OUTPUT, which included all the programs of the original DBMS_OUTPUT package, but with some code added? Let’s create a simple package like this:

create or replace package dbms_output is
  procedure put_line(a in varchar2);
end;
/
create or replace package body dbms_output is
  procedure put_line(a in varchar2)
  is
  begin
    sys.dbms_output.put('changed :');
    sys.dbms_output.put_line(a);
  end;
end;
/

Notice I just included the PUT_LINE procedure here, and I am not doing anything really malicious. The output of my function would now be:

changed :sMiTh
changed :aLlEn
changed :wArD
changed :jOnEs
changed :mArTiN
changed :bLaKe
changed :cLaRk
changed :sCoTt
changed :kInG
changed :tUrNeR
changed :aDaMs
changed :jAmEs
changed :fOrD
changed :mIlLeR

Not exactly the way I would want. How can I prevent this from happening? The answer is actually really simple. Qualify the call to DBMS_OUTPUT with the name of the schema where the package resides:

create or replace function wavey(string_in in varchar2) return varchar2 is
  l_returnvalue varchar2(30);
begin
  for indx in 1 .. length(string_in) loop
    l_returnvalue := l_returnvalue || case mod(indx, 2)
                       when 0 then
                        upper(substr(string_in, indx, 1))
                       else
                        lower(substr(string_in, indx, 1))
                     end;
  end loop;
  sys.dbms_output.put_line(l_returnvalue);
  return l_returnvalue;
end;

The output is back to what we expected:

sMiTh
aLlEn
wArD
jOnEs
mArTiN
bLaKe
cLaRk
sCoTt
kInG
tUrNeR
aDaMs
jAmEs
fOrD
mIlLeR

But should you Dot Qualify everything in your code? Like this?

create or replace function demo.wavey(string_in in sys.standard.varchar2) return sys.standard.varchar2 is
  l_returnvalue sys.standard.varchar2(30);
begin
  <<theloop>>
  for indx in 1 .. length(wavey.string_in) loop
    wavey.l_returnvalue := wavey.l_returnvalue || case mod(theloop.indx, 2)
                             when 0 then
                              sys.standard.upper(standard.substr(wavey.string_in, theloop.indx, 1))
                             else
                              sys.standard.lower(standard.substr(wavey.string_in, theloop.indx, 1))
                           end;
  end loop;
  sys.dbms_output.put_line(wavey.l_returnvalue);
  return wavey.l_returnvalue;
end;

I don’t think this adds to the readability of the code, but it sure makes your code safer against malicious users who want to inject code that gets executed by your programs. Please add your thoughts on this subject.

Use the database to do the heavy lifting

Wed, 2016-06-15 03:25

Suppose you have data in your PL/SQL program stored in a collection. I am using the EMP table (14 records), but you should imagine you have many, many more records in your collection. If somewhere in your code you want to order (sort) your collection differently, you can of course write your own sorting routine, but the Oracle database has been doing this for ages and probably does it a lot smarter (after all these years) than you ever could.

If you are running a pre-12c Oracle database you can use the SQL engine when you have your data in a Nested Table. I have created this little test script to try this. Again, the EMP table has only 14 records, but it is enough to drive the point home.

CREATE OR REPLACE TYPE emp_t AS OBJECT
( empno number(4)
, ename varchar2(10)
, job varchar2(9)
, mgr number(4)
, hiredate DATE
, sal number(7,2)
, comm number(7,2)
, deptno number(2)
)
/
CREATE OR REPLACE TYPE emp_ntt AS TABLE OF emp_t
/
CREATE OR REPLACE PACKAGE emp_test IS
  FUNCTION get_emp RETURN emp_ntt;
  PROCEDURE emp_prc;
END;
/
CREATE OR REPLACE PACKAGE BODY emp_test IS
  PROCEDURE separator
  IS
  BEGIN
    dbms_output.put_line('-=-=-=-=-');
  END;
--
  FUNCTION get_emp RETURN emp_ntt
  IS
    l_returnvalue emp_ntt;
    CURSOR c_emp IS
      SELECT emp_t(empno, ename, job, mgr, hiredate, sal, comm, deptno)
        FROM emp;
  BEGIN
    OPEN c_emp;
    FETCH c_emp BULK COLLECT INTO l_returnvalue;
    CLOSE c_emp;
    RETURN l_returnvalue;
  END;
--
  PROCEDURE emp_prc IS
    l_emps emp_ntt;
    l_emps_sorted emp_ntt;
    CURSOR c_emp_sort(emps_in emp_ntt)
    IS
    SELECT emp_t(empno, ename, job, mgr, hiredate, sal, comm, deptno)
      FROM table(emps_in)
     ORDER BY ename;
  BEGIN
    l_emps := get_emp;
    FOR rec IN (SELECT ename FROM table(l_emps)) LOOP
      dbms_output.put_line(rec.ename);
    END LOOP;
--
    separator;
--
    OPEN c_emp_sort(l_emps);
    FETCH c_emp_sort BULK COLLECT INTO l_emps_sorted;
    CLOSE c_emp_sort;
    FOR indx IN l_emps_sorted.first .. l_emps_sorted.last LOOP
      dbms_output.put_line(l_emps_sorted(indx).ename);
    END LOOP;
  END;
END;
/

If I run this code

EXEC emp_test.emp_prc

the output looks like this:

SMITH
ALLEN 
WARD 
JONES 
MARTIN 
BLAKE 
CLARK 
SCOTT 
KING 
TURNER 
ADAMS 
JAMES 
FORD 
MILLER 
-=-=-=-=- 
ADAMS 
ALLEN 
BLAKE 
CLARK 
FORD 
JAMES 
JONES 
KING 
MARTIN 
MILLER 
SCOTT 
SMITH 
TURNER 
WARD

This is a great way of using the SQL engine to do the work you need done.

But what if you don’t have your data in a Nested Table, but in an Associative Array, which is PL/SQL only? If you are using Database 12c you’re in luck, because you can now use your Associative Array in the SQL statement.

I have created a script similar to the one before, except that I cannot define the TYPE in the SQL layer; it has to be defined in a package specification.

CREATE OR REPLACE PACKAGE emp_test IS
  TYPE emp_aat IS TABLE OF emp%rowtype INDEX BY PLS_INTEGER;
  FUNCTION get_emp RETURN emp_aat;
  PROCEDURE emp_prc;
END;
/
CREATE OR REPLACE PACKAGE BODY emp_test IS
  PROCEDURE separator
  IS
  BEGIN
    dbms_output.put_line('-=-=-=-=-');
  END;
--
  FUNCTION get_emp RETURN emp_aat
  IS
    l_returnvalue emp_aat;
    CURSOR c_emp IS
      SELECT *
        FROM emp;
  BEGIN
    OPEN c_emp;
    FETCH c_emp BULK COLLECT INTO l_returnvalue;
    CLOSE c_emp;
    RETURN l_returnvalue;
  END;
--
  PROCEDURE emp_prc IS
    l_emps emp_aat;
    l_emps_sorted emp_aat;
    CURSOR c_emp_sort(emps_in emp_aat)
    IS
    SELECT *
      FROM table(emps_in)
     ORDER BY ename;
  BEGIN
    l_emps := get_emp;
    FOR rec IN (select ename FROM TABLE(l_emps)) LOOP
      dbms_output.put_line(rec.ename);
    END LOOP;
--
    separator;
--
    OPEN c_emp_sort(l_emps);
    FETCH c_emp_sort BULK COLLECT INTO l_emps_sorted;
    CLOSE c_emp_sort;
    FOR indx IN l_emps_sorted.first .. l_emps_sorted.last LOOP
      dbms_output.put_line(l_emps_sorted(indx).ename);
    END LOOP;
  END;
END;
/

If I run this code on my 11gR2 database it results in the following errors:

No errors for PACKAGE DEMO.EMP_TEST 
Warning: Package body created with compilation errors 
Errors for PACKAGE BODY DEMO.EMP_TEST: 
LINE/COL ERROR 
-------- --------------------------------------------------------------------------- 
27/5 PL/SQL: SQL Statement ignored 
28/12 PL/SQL: ORA-22905: cannot access rows from a non-nested table item 
28/18 PLS-00382: expression is of wrong type 
33/17 PL/SQL: SQL Statement ignored 
33/35 PL/SQL: ORA-22905: cannot access rows from a non-nested table item 
33/41 PLS-00382: expression is of wrong type 
34/7 PL/SQL: Statement ignored 
34/28 PLS-00364: loop index variable 'REC' use is invalid

As mentioned before, you cannot access an Associative Array in the SQL layer. To use the SQL layer anyway you could of course create the necessary objects (a record type and a nested table type), copy all the data from the Associative Array to a Nested Table, run the SQL statement and copy the results back into an Associative Array. That means a lot of coding and a real risk of errors. But if I run the same code on my 12cR1 database, it completes successfully and displays the following result:

SMITH 
ALLEN 
WARD 
JONES 
MARTIN 
BLAKE 
CLARK 
SCOTT 
KING 
TURNER 
ADAMS 
JAMES 
FORD 
MILLER 
-=-=-=-=- 
ADAMS 
ALLEN 
BLAKE 
CLARK 
FORD 
JAMES 
JONES 
KING 
MARTIN 
MILLER 
SCOTT 
SMITH 
TURNER 
WARD
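As an aside, the pre-12c workaround mentioned above (copy the Associative Array into a Nested Table, then let SQL do the sorting) could look like the sketch below. It assumes the EMP_T and EMP_NTT SQL types from the first script are still in place:

```sql
DECLARE
  TYPE emp_aat IS TABLE OF emp%rowtype INDEX BY PLS_INTEGER;
  l_emps        emp_aat;
  l_emps_nt     emp_ntt := emp_ntt();  -- SQL type, so TABLE() works pre-12c
  l_emps_sorted emp_ntt;
BEGIN
  SELECT * BULK COLLECT INTO l_emps FROM emp;
  -- copy: Associative Array -> Nested Table
  FOR indx IN 1 .. l_emps.count LOOP
    l_emps_nt.extend;
    l_emps_nt(l_emps_nt.last) :=
      emp_t(l_emps(indx).empno, l_emps(indx).ename, l_emps(indx).job
           ,l_emps(indx).mgr, l_emps(indx).hiredate, l_emps(indx).sal
           ,l_emps(indx).comm, l_emps(indx).deptno);
  END LOOP;
  -- now the SQL engine can do the heavy lifting
  SELECT emp_t(empno, ename, job, mgr, hiredate, sal, comm, deptno)
    BULK COLLECT INTO l_emps_sorted
    FROM table(l_emps_nt)
   ORDER BY ename;
  FOR indx IN 1 .. l_emps_sorted.count LOOP
    dbms_output.put_line(l_emps_sorted(indx).ename);
  END LOOP;
END;
/
```

It works, but it is exactly the kind of boilerplate the 12c enhancement makes unnecessary.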

I think Oracle does a good job extending the possibilities of the PL/SQL language, especially integrating SQL features into the PL/SQL layer. Although this might look like a small enhancement, it can have a big impact on the performance of your program.

How an Oracle error can send you the wrong way…

Wed, 2016-01-06 08:53

At my current assignment I needed to create an update statement to copy data from one table to another. Quite a simple task, I would say, but an error (actually a constraint violation) sent me the wrong way in finding my solution.

Suppose I have two tables:

CREATE TABLE a
( mykey NUMBER
, thisvalue VARCHAR2(20) NOT NULL
)
/
CREATE TABLE b
( mykey NUMBER
, thatvalue VARCHAR2(20) NOT NULL
)
/

and some data in them:

INSERT INTO a (mykey,thisvalue) VALUES (1,'Larry');
INSERT INTO a (mykey,thisvalue) VALUES (2,'Bryn');
INSERT INTO a (mykey,thisvalue) VALUES (3,'Steven');
INSERT INTO a (mykey,thisvalue) VALUES (4,'Patrick');
INSERT INTO b (mykey,thatvalue) VALUES (1,'Larry Ellison');
INSERT INTO b (mykey,thatvalue) VALUES (2,'Bryn Llewellyn');
INSERT INTO b (mykey,thatvalue) VALUES (3,'Steven Feuerstein');
COMMIT
/ 

Now I want to update the values in table a with the values of table b. My first idea was to write a statement like this:

UPDATE a
   SET a.thisvalue = (select b.thatvalue
                        FROM b
                       WHERE b.mykey = a.mykey)
/

but this statement led to the following error:

ORA-01407: cannot update ("DEMO"."A"."THISVALUE") to NULL 


No problem, I thought, if the new value is somehow NULL, then just use the old value:

UPDATE a
   SET a.thisvalue = (select NVL(b.thatvalue, a.thisvalue)
                        FROM b
                       WHERE b.mykey = a.mykey)
/

but this still resulted in the same error:

ORA-01407: cannot update ("DEMO"."A"."THISVALUE") to NULL 

Then it dawned on me: for mykey = 4 there is no match in table B, so the subquery returns no row, hence a NULL value.

The error given is absolutely correct, but it sent me the wrong way in finding a solution. Had I gotten a NO_DATA_FOUND error, I would have known right away what the problem was.
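The effect is easy to see with a scalar subquery in the SELECT list (a quick check against the sample tables above): a subquery that matches no row quietly yields NULL instead of raising an error.

```sql
SELECT a.mykey
     , a.thisvalue
     , (SELECT b.thatvalue
          FROM b
         WHERE b.mykey = a.mykey) AS newvalue
  FROM a
 ORDER BY a.mykey
/
-- mykey 4 (Patrick) comes back with NEWVALUE = NULL: no matching row in b, no error
```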

The solution was pretty easy, just update the rows that have a matching row in table B:

UPDATE a
   SET a.thisvalue = (select b.thatvalue
                        FROM b
                       WHERE b.mykey = a.mykey)
 WHERE EXISTS (select b.thatvalue
                 FROM b
                WHERE b.mykey = a.mykey)
/

Another solution might be using the MERGE statement:

MERGE INTO a
USING (select b.mykey, b.thatvalue
         FROM b) b
ON (a.mykey = b.mykey)
WHEN MATCHED THEN
  UPDATE
     SET a.thisvalue = b.thatvalue
/

If the subselect returns more than one row you get the equivalent of the TOO_MANY_ROWS exception, but if the subselect returns no rows you don’t get a NO_DATA_FOUND (or equivalent) exception; the unmatched row is simply skipped.
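A quick sanity check against the sample data above: after either the WHERE EXISTS update or the MERGE, the row without a match in table b should be left untouched.

```sql
SELECT mykey
     , thisvalue
  FROM a
 ORDER BY mykey
/
-- 1  Larry Ellison
-- 2  Bryn Llewellyn
-- 3  Steven Feuerstein
-- 4  Patrick             <- no match in b, so this row was not updated
```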

Update multiple columns

Wed, 2015-12-23 05:03


This is something I knew somewhere in the back of my head, but had forgotten about until now.

When you want to update multiple columns in a single SQL statement based on a subquery, you can of course duplicate that query for every column you want to update. But this violates the SPOD (Single Point Of Definition) ‘rule’.

As an example I have the following requirement:

Add two columns to the EMP table containing the name and job of the manager and fill these columns with the right values.

First of all I need to add the columns to the table, which is easy:

alter table emp add (mgr_name varchar2(10)
                    ,mgr_job varchar2(9)
                    )
/

Then comes the ‘tricky’ part. I could of course fill these columns in separate statements, like this:

update emp e
   set e.mgr_name = (select m.ename
                       from emp m
                      where 1=1
                        and m.empno = e.mgr)
 where 1=1
   and e.mgr is not null
/
update emp e
   set e.mgr_job = (select m.job
                       from emp m
                      where 1=1
                        and m.empno = e.mgr)
 where 1=1
   and e.mgr is not null
/

But this implies two round trips from the client to the database. The two statements can be combined into a single one:

update emp e
   set e.mgr_name = (select m.ename
                       from emp m
                      where 1=1
                        and m.empno = e.mgr)
     , e.mgr_job = (select m.job
                       from emp m
                      where 1=1
                        and m.empno = e.mgr)
 where 1=1
   and e.mgr is not null
/

But there is an easier, and more elegant way to do this:

update emp e
   set (e.mgr_name, e.mgr_job) = (select m.ename, m.job
                                   from emp m
                                  where 1=1
                                    and m.empno = e.mgr)
 where 1=1
   and e.mgr is not null
/

This is of course a pretty simple example, but you can imagine what would happen if you wanted to update more columns, create a complex subquery or, worse, make modifications to the predicates. You are more than likely going to forget one or more of the subqueries, giving you an undesired result.

update December 25th 2015: Find a demonstration script on LiveSQL

Autonomous transaction to the rescue

Tue, 2015-08-25 11:10

Today, at my current project, I came across an issue where autonomous transactions came in handy.

The situation: I need to create a query to perform an export. A couple of the fields to be selected come from a global temporary table; nothing fancy so far, except that this global temporary table is filled by a (rather complex) procedure. Another problem is that the table is emptied for every row, i.e. it contains only one row at a time. ‘Just build a wrapper table function around this procedure and have that function call it’ was my first idea.

I created a script that shows the situation

CREATE GLOBAL TEMPORARY TABLE empdate
(
  empno NUMBER(4)
, hiredate DATE
)
ON COMMIT DELETE ROWS
/
CREATE OR REPLACE PROCEDURE getthehiredate(empno_in IN NUMBER) IS
BEGIN
  DELETE FROM empdate;
  INSERT INTO empdate
    (empno
    ,hiredate)
    (SELECT empno
           ,hiredate
       FROM emp
      WHERE empno = empno_in);
END getthehiredate;
/

Then I set out to build a pipelined table function that accepts a cursor as one of its parameters. This function loops over all the values in the cursor, calls the procedure, reads the data from the global temporary table and pipes out the resulting record. Nothing really fancy so far.

CREATE TYPE empdate_t AS OBJECT
(
  empno    NUMBER(4),
  hiredate DATE
)
/
CREATE TYPE empdate_tab IS TABLE OF empdate_t
/
CREATE OR REPLACE FUNCTION getallhiredates(empnos_in IN SYS_REFCURSOR) RETURN empdate_tab
  PIPELINED IS
  l_empno       NUMBER(4);
  l_returnvalue empdate_t;
BEGIN
  FETCH empnos_in
    INTO l_empno;
  WHILE empnos_in%FOUND LOOP
    getthehiredate(empno_in => l_empno);
    SELECT empdate_t(ed.empno, ed.hiredate)
      INTO l_returnvalue
      FROM empdate ed
     WHERE 1 = 1
       AND ed.empno = l_empno;
    PIPE ROW(l_returnvalue);
    FETCH empnos_in
      INTO l_empno;
  END LOOP;
  RETURN;
END getallhiredates;
/

But when I ran a query against this function:

SELECT *
  FROM table(getallhiredates(CURSOR (SELECT empno
                                       FROM emp)))
/

I ran into an error:

ORA-14551: cannot perform a DML operation inside a query 

So, all the work I had done so far had been for nothing? Time wasted? I don’t think so. If there is anything I have learned over the years, it is that Oracle may stop you from doing certain things, but at the same time supplies you with the tools to create a work-around.

An autonomous transaction might help me in this case, so I changed the code of the function a bit:

CREATE OR REPLACE FUNCTION getallhiredates(empnos_in IN SYS_REFCURSOR) RETURN empdate_tab
  PIPELINED IS
  PRAGMA AUTONOMOUS_TRANSACTION;
  l_empno       NUMBER(4);
  l_returnvalue empdate_t;
BEGIN
  FETCH empnos_in
    INTO l_empno;
  WHILE empnos_in%FOUND LOOP
    getthehiredate(empno_in => l_empno);
    SELECT empdate_t(ed.empno, ed.hiredate)
      INTO l_returnvalue
      FROM empdate ed
     WHERE 1 = 1
       AND ed.empno = l_empno;
    PIPE ROW(l_returnvalue);
    FETCH empnos_in
      INTO l_empno;
  END LOOP;
  COMMIT;
  RETURN;
END getallhiredates;
/

But when I ran the query:

SELECT *
  FROM table(getallhiredates(CURSOR (SELECT empno
                                       FROM emp)))
/

I ran into a different error:

ORA-06519: active autonomous transaction detected and rolled back

So this doesn’t work. Or does it? A pipelined table function ‘exits’ the function multiple times: every time a row is piped out. So, I tried to put the COMMIT just before the PIPE ROW command:

CREATE OR REPLACE FUNCTION getallhiredates(empnos_in IN SYS_REFCURSOR) RETURN empdate_tab
  PIPELINED IS
  PRAGMA AUTONOMOUS_TRANSACTION;
  l_empno       NUMBER(4);
  l_returnvalue empdate_t;
BEGIN
  FETCH empnos_in
    INTO l_empno;
  WHILE empnos_in%FOUND LOOP
    getthehiredate(empno_in => l_empno);
    SELECT empdate_t(ed.empno, ed.hiredate)
      INTO l_returnvalue
      FROM empdate ed
     WHERE 1 = 1
       AND ed.empno = l_empno;
    COMMIT;
    PIPE ROW(l_returnvalue);
    FETCH empnos_in
      INTO l_empno;
  END LOOP;
  RETURN;
END getallhiredates;
/

And when I ran my statement again:

SELECT *
  FROM table(getallhiredates(CURSOR (SELECT empno
                                       FROM emp)))
/

It worked as I had hoped.

As you can see I have tried to mimic the situation using the EMP table. I think this is a nice little trick, but it should be used with caution. It is not for nothing that Oracle prevents you from running DML inside a query, but in this case I could bypass that restriction.


What’s this ‘WHERE 1=1’?

Sun, 2015-03-22 02:30

For some time now I have been adding WHERE 1=1 to all my queries.
I get queries like this:

SELECT *
  FROM emp e
 WHERE 1=1
   AND e.ename LIKE 'A%'
   AND e.deptno = 20

Lots of people ask me: what’s the use of this WHERE 1=1?

You know I like to type as little as possible, yet here I am typing a lot of extra characters. Still, it makes my development life a lot easier.

If my query has a lot of predicates and I want to see what happens without some of them, I usually comment those predicates out using -- (two dashes); I use my own CommentLine plug-in for this. That is easy for the second and subsequent predicates, but if I want to comment out the first predicate it gets a bit harder. Well, not harder, but more work.

If I didn’t use the WHERE 1=1 and I wanted to comment out the ename predicate then I would have to do something like this:

SELECT *
  FROM emp e
 WHERE /*e.ename LIKE 'A%'
   AND */e.deptno = 20

I agree, it’s not hard to do, but it is a lot more work than just adding -- (two dashes) in front of a line:

SELECT *
  FROM emp e
 WHERE 1 = 1
--   AND e.ename LIKE 'A%'
   AND e.deptno = 20

And, as I don’t like typing, or at least want to make it as easy for myself as possible, I use another one of my plug-ins, Template, where I defined a template w1 that expands to

WHERE 1 = 1
   AND 


making it easy for me to write the queries.

I think adding this extra predicate has no (or hardly any) influence on the execution time of the query; I think the optimizer removes this predicate completely.
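If you want to check this for yourself, a quick sketch (output details vary per version and plan): look at the Predicate Information section of the plan for a query that includes the extra conjunct.

```sql
EXPLAIN PLAN FOR
SELECT *
  FROM emp e
 WHERE 1 = 1
   AND e.deptno = 20
/
SELECT *
  FROM table(dbms_xplan.display)
/
```

In my experience the filter line only shows the DEPTNO condition; the 1 = 1 conjunct is removed during query transformation, so it never reaches the execution plan.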

I hope this explains a bit why I write my queries like this.

Pages