The Anti-Kyte

Oracle - for when it was like that when you got there

(Almost) Everything you ever wanted to know about SQLDeveloper Unit Testing but were afraid to ask

Sat, 2016-08-20 15:46

The Political fallout from Brexit continues unabated.
In the immediate aftermath of the vote, David Cameron resigned as Prime Minister, triggering a power-struggle to succeed him.
Secret pacts, double-crosses and political “assassination” followed in short order.
It was like watching an episode of Game of Thrones although, mercifully, without the full-frontal nudity.
As for the Labour Party, they have captured the Zeitgeist…by descending into the sort of internal conflict for which the word “internecine” was invented.

Following the current trend that promises are made for breaking, this post has arrived a little bit later than advertised.
I write most of this stuff on the train to and from work, and, as they have been unusually punctual of late, my Sprint 1 velocity is somewhat lower than anticipated.
So, with apologies for my tardiness…

A Quick Overview

When it comes to Unit Testing Tools/Frameworks for PL/SQL, SQLDeveloper Unit Testing can probably be considered the “official” Oracle offering.
After all, SQLDeveloper is shipped with the Oracle Database.

Therefore, this seems like a fitting place to continue my PL/SQL Unit Testing odyssey.

Useful Links and Resources

There is a very good tutorial covering setting up Tests on a Stored Procedure right there in the SQLDeveloper Help, as well as other useful info about SQLDeveloper Unit Testing :

sut_help

This post from Jeff Smith is a useful introduction to SQLDeveloper Unit Testing, in which he takes a different approach to some of the tasks that I’ve covered here.

For a completely different take on things, this article by Wouter Groeneveld is so good that I will be plagiarising bits of it later on.

Before I go any further, SQLDeveloper Unit Testing is a bit of a mouthful. Additionally I’m never sure if I should be calling it a Tool or a Framework. Therefore, I’m going to refer to it as SUT for the remainder of this post.

Testing Approach

The approach I’ve taken to evaluating SUT (and the other frameworks that I will look at in later posts) is to use the application I introduced last time and to see how I can test some of the User Stories I have for it.
These Stories cover functionality that you would commonly find in a PL/SQL application ( DML operations, ETL data load and Ref Cursor population).
Additionally, I’ve taken the view that the Application Code should not be changed in any way to accommodate any vagaries of SUT.

As it's the Testing Tool that's my primary concern here, the Application Code I've included is pretty much the minimum required to get the tests to pass.
It’s not what you’d call Production ready.

Also, as this is my first foray into SUT, some of the Tests may not have been written in the optimum fashion.
The examples below are more a case of me saying “this is how I managed to test this feature in SUT” rather than “this is the recommended way to test this feature”.

Test Environment

These tests have been put together and executed in the current Production version of SQLDeveloper ( 4.1.3).
The database used is Oracle 11g Express Edition.

Toolbars and Buttons

The SUT toolbar looks like this :

sut_toolbar

From left-to-right, the buttons are :

  • Freeze View
  • Refresh
  • Debug – provides feedback on the runnable bits of your test without actually running it
  • Run

Additionally, I’ll be mentioning the “Plus” button quite a bit. That would be this :

big_green_plus

Right, I think we’re just about ready to go.
Before we start writing any tests, we need to do a bit of groundwork…

SQLDeveloper Unit Testing – Configuration

First things first, according to the SQLDeveloper Help :
“The Unit Test Repository is a set of tables, views, indexes and other schema objects…”

Look, don’t panic. Despite this apparently database-centric architecture, you do not have to deploy your test code along with your application code base to execute your tests. Remember, as with all the mainstream Oracle IDEs, SQLDeveloper allows you to be connected to different databases simultaneously.
Added to this, it would appear that a significant element of the SUT architecture involves XML configuration files. This would also explain the lack of any requirement to mess about with database links to get all this stuff to work.

Repository Pre-Setup Tasks

The first design decision I made was that I wanted to keep any testing infrastructure entirely separate from my Application Code.
Therefore, I’ve created a separate database schema – TEST_REPO – specifically to hold the Repository.

Whilst it’s perfectly possible to have SQLDeveloper guide you through all of the setup work, this does require that you connect as a highly privileged user (SQLDeveloper tends to ask for the SYS password).
That’s not a problem if, like me, you’re just messing about on your own copy of Oracle Express Edition. If you’re in a more formal environment however, you may well need to provide your DBA with a script.

This should do the job (runnable in SQL*Plus) …

--
-- Script to create the roles required for a SQLDeveloper Unit Testing Repository
-- and a schema to host the repository
--
set verify off
define repo_owner = test_repo
define repo_default_ts = users
define repo_temp_ts = temp
accept passwd prompt 'Enter a password for the new schema [] : ' hide

-- Setup Roles
create role ut_repo_administrator;
grant create public synonym,drop public synonym to ut_repo_administrator;
grant select on dba_role_privs to ut_repo_administrator;
grant select on dba_roles to ut_repo_administrator;
grant select on dba_tab_privs to ut_repo_administrator;
grant execute on dbms_lock to ut_repo_administrator;

create role ut_repo_user;
grant select on dba_role_privs to ut_repo_user;
grant select on dba_roles to ut_repo_user;
grant select on dba_tab_privs to ut_repo_user;
grant execute on dbms_lock to ut_repo_user;

-- Create Schema to host Repository
create user &repo_owner identified by &passwd;
alter user &repo_owner default tablespace &repo_default_ts;
alter user &repo_owner temporary tablespace &repo_temp_ts;
alter user &repo_owner quota unlimited on &repo_default_ts;

-- System Privs
grant create session, connect, resource, create view to &repo_owner;

-- Role Privs
grant ut_repo_administrator to &repo_owner with admin option;
grant ut_repo_user to &repo_owner with admin option;

-- Object Priv
grant select on dba_roles to &repo_owner;
grant select on dba_role_privs to &repo_owner;

Note that you may want to change the Repository Owner Schema and tablespace variables to values more suited to your environment.
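
If you do end up handing the script over, it just needs running from SQL*Plus as a suitably privileged user (SYS, or a DBA account that can grant access to the DBA_ views and DBMS_LOCK). For example – assuming you've saved it as setup_sut_repo.sql, a filename I've made up purely for illustration :

sqlplus / as sysdba

SQL> @setup_sut_repo.sql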

Now we’ve created the new schema, we need to create a connection for it in SQLDeveloper.

In the Connections Navigator, click the Plus button …

create_conn1

Input the appropriate details in the New/Select Database Connection Window…

create_conn2

If you hit the Test button, you should see the Status set to “Success” in the bottom left-hand corner of the window.

Once it’s all working, hit the Save button, to retain the connection, then hit Connect to logon as TEST_REPO.

Now that’s all done we’re ready for the next step…

Creating the Repository

From the Main Menu, select View/Unit Test

A new, but currently quite boring navigator window will open up under the Connections Navigator :

no_repo

From the Main Menu, select Tools/Unit Test/Select Current Repository…

In the Repository Connection window, select the test_repo connection we’ve just created…

repo_conn

…and hit OK. This should give you :

create_repo

Just say Yes.

After a brief interlude, during which SQLDeveloper does an impression of a Cylon from the original Battlestar Galactica…

create_repo_progress

By your command !

you will get…

repo_created

If you now have a look in the Object Browser for the test_repo user, you’ll see that SQLDeveloper has been quite busy…

repo_tree

Next up…

Granting Access to the Repository

From the Tools menu, select Unit Test then Manage Users…

When prompted for the connection to manage users, I’ve chosen TEST_REPO as it’s the only one that currently has admin rights on the repository.

The Manage Users Window that appears has two tabs, one for Users and one for Administrators :

manage_users1

I've added the owner of the application code that I'll be testing as a user, to keep things simple.

The result is that, if I now connect as FOOTIE ( the application owner), I should be able to start adding some tests.

Naming Conventions

The SQLDeveloper Help has something to say on naming tests, which you may want to consider.
I’ve simply gone down the route of using the format package_name.procedure_name_n.
I want to group the unit tests at the same level as the programs for which they are written, so the Test Suites I’ll be creating are grouped by (and named for) the packages that the tests run against.

One aspect of object naming that I haven’t formalised in this post is that of items that I’ve added to the Library. This is an area to which you may well want to give some consideration.

Testing Insert Statements

The first set of tests I need to write centre around inserting data into tables. To cover some of the scenarios we might commonly encounter when performing this sort of operation, I’ll be looking at adding records to Tables related to each other by means of Referential Integrity constraints.

My First Test – Inserting a Record into a table with a Primary Key

I’ve got a User Story about adding Competitions to the application. The Story has two Acceptance Criteria :

  1. A new competition can be added
  2. A Competition cannot be added more than once

Here’s a quick reminder of what the COMPETITIONS table looks like :

create table competitions
(
    comp_code varchar2(5) constraint comp_pk primary key,
    comp_name varchar2(50) not null,
    description varchar2(4000)
)
/

As we’re trying to follow the approach of Test Driven Development and write our tests first, we just have a stub PL/SQL procedure to run the initial tests against :

create or replace package body manage_competitions
as

    procedure add_competition
    (
        i_code competitions.comp_code%type,
        i_name competitions.comp_name%type,
        i_desc competitions.description%type default null
    )
    is
    begin
        null;
    end add_competition;
end manage_competitions;
/
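
For the stub body to compile, there also needs to be a matching package specification. It isn't shown here, but a minimal sketch – assuming it simply mirrors the procedure signature above – would be :

create or replace package manage_competitions
as
    procedure add_competition
    (
        i_code competitions.comp_code%type,
        i_name competitions.comp_name%type,
        i_desc competitions.description%type default null
    );
end manage_competitions;
/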

Time to meet…

The Create Unit Test Wizard

As this is my first test, I’m going to go through it step-by-step here. After that, the novelty will quickly wear off and I’ll just reference the steps or the Test Process Stages as appropriate.

In the Unit Test Navigator, right-click the Tests node and select Create Test…

create_test1

This will bring up the Create Unit Test Wizard

create_test_wiz1a

Select the Connection for the Application Owner (FOOTIE, in this case) from the Connections drop-down…

create_test_wiz1b

…and navigate the tree to the procedure we want to test – i.e. manage_competitions.add_competition…

create_test_wiz1c

And now click Next

This brings us to step 2 of the Wizard, specifying the Test Name. In line with my standard, I’m calling this one MANAGE_COMPETITIONS.ADD_COMPETITION_1.
Leave the Radio Group selection as Create with single Dummy implementation

create_test_wiz2

Click Next

I’m not going to create a Setup Process in this test for the moment. If you did want to, then you’d click the Plus button and…well, we’ll return to that later.
Anyway, step 3 looks like this :

create_test_wiz3

Click Next

Now we need to specify the input parameter values that we’ll be using in the test.
The values I’m going to use are :

  • I_CODE – UT1
  • I_NAME – Test1
  • I_DESC – Test competition

Leave the Expected Result as Success

create_test_wiz4

Click Next

…which takes us to…

create_test_wiz5a

Click the Plus to add a Process Validation and a drop-down will appear.
Scroll through this and select Query returning row(s) :

create_test_wiz5b

…and you’ll be presented with…

create_test_wiz5c

We now replace the default SQL statement with one that checks that we’ve inserted our record successfully.
Rather than hard-coding the COMP_CODE value we’ve input into the procedure, we can use SUT’s substitution syntax, thus inheriting the value of the I_CODE input parameter we specified back in Step 4. The code we’re going to add is :

select null
from competitions
where comp_code = '{I_CODE}'

Note that the parameter names appear to be case sensitive. If a parameter name is entered in lowercase, SUT will complain at runtime.
The end result looks like this :

create_test_wiz5d

Click OK to dismiss the Process Validation Window.

Back on the Wizard page, click Next

Finally, we need to specify the Teardown Process – i.e. code to return the application to the state it was in prior to the test being run.

create_test_wiz6a

Hit the Plus button again and you will get a slightly different drop-down. This time, we want User PL/Sql Code

create_test_wiz6b

You’ll now get the Teardown Process Window. As we’ve only inserted a single row as part of this test, we can simply rollback the transaction to put things back as they were. This being a PL/SQL block, the code is :

begin
    rollback;
end;

create_test_wiz6c

Click OK to dismiss the Window.
Now click Next to get to the Summary Page, which should look something like…

create_test_wiz7

Once you’re happy with the Summary, click Finish.

You should now see the new test in the Unit Test Navigator :

first_test

Click on the new test and you will see :

test_win_details

We can now run the test by clicking the Run Button. This will cause the Test Pane to switch to the details tab where the results of our first execution will be displayed :

test_win_results

As you’d expect, because the program we’re calling doesn’t do anything, the test validation fails.

Before we go any further, there are a couple of things we probably want to do with this test.
First of all, we really should have code that ensures that we won’t be trying to insert a value that already exists in this table.
Secondly, it’s quite likely that we’ll want to reuse some of the test code we’ve written in future tests.

OK, first of all then …

Dynamic Value Queries

In my previous attempt to write Unit Tests in anonymous PL/SQL blocks, it was relatively simple to set an appropriate value for the Primary Key of the record we wanted to insert programmatically.
Obviously, this is not the case with the Unit Test we’ve just written.

Fortunately, SUT allows you to populate the call parameters with values from what it calls a Dynamic Value Query.

In the SQLDeveloper help, the method demonstrated for setting up a Dynamic Value Query involves creating a table to hold the values and then querying that at runtime.
In this instance, we don’t need a table to determine the parameter values we need to use.

In the Details tab of the test, we need to click the Dynamic Value Query pencil icon :

dyn_qry_edit

This will bring up the Dynamic Value Query window.

This code should do the job for now…

--
-- Make (reasonably) certain that the comp_code value we're trying to create
-- does not already exist in the table.
with suffix as
(
    select max( to_number( substr( comp_code, regexp_instr( comp_code, '[[:digit:]]')))) + 1 as numeral
    from competitions
    where comp_code like 'UT%'
    and regexp_instr( substr( comp_code, -1, 1), '[[:digit:]]') = 1 -- only want codes with a numeric suffix
    union -- required if there are no records in the table...
    select 1 from dual
)
select 'UT'||max(numeral) as I_CODE,
    'Test1' as I_NAME,
    'A test' as I_DESC
from suffix
where numeral is not null

…so I’ve dropped it into the Dynamic Value Query…

dyn_qry_final

Click OK and we’re ready to re-run the test to check that our changes have worked.

Press the Run button and SQLDeveloper will ask if you want to save the changes you’ve just made.

unsaved_changes

We do.

The test will still fail ( we’re still running against a stub, remember), however, we can see from the test results that the Dynamic Values Query has generated the input values we were expecting :

dvq_run

NOTE – whilst the Dynamic Values Query we have written here will (eventually) work as expected, there is rather more to them than meets the eye. This is mainly because they are executed before any other step in a test. This can be problematic when a Dynamic Values Query needs to rely on values that are set in a Test’s Startup Process. There is a solution to this however, which I’ll cover later on.

Adding Code to the Library

We’ve written three distinct bits of code for our first test which we’ll probably want to use again.

SUT provides the Library as a means of storing such code for re-use in other tests.

Looking at our test in SQLDeveloper, the first code we see is the Teardown Process.

To add this to the library, click the Pencil icon to bring up the Teardown Process code window.

In the field at the bottom, we can now give this code a name and then hit the Publish button to send it to the library :

add_to_lib

You should now see that the label we’ve given this code (rollback_transaction, in this case) appears in the Library drop-down at the top of the Window :

add_to_lib2

Now click the OK button to dismiss the window.

We can repeat this process for our Dynamic Values Query ( labelled add_new_competition) and our Validation ( competition_row_added).

If we now expand the Library node in the Unit Test pane, we can see that our code appears in the Library :

lib_tree.png

This should mean that creating our second test requires a little less typing…

Testing for an Exception

Our second test is to check that the Application does not allow us to add a duplicate COMPETITIONS record.

Opening the Wizard again, we select the same package and procedure as for our first test.

Following my naming convention, this test is called MANAGE_COMPETITIONS.ADD_COMPETITION_2.
This time, we do need a Startup Process to make sure that the record we attempt to insert already exists.
So this time, hit the Plus button on the Startup Process step and select User PL/Sql Code from the drop-down.

In the Startup Process Code Window, we’re going to add the following PL/SQL code :

begin
    merge into competitions
    using dual
    on (comp_code = 'UT1')
    when not matched then
        insert ( comp_code, comp_name, description)
        values( 'UT1', 'Test', null);
end;

Essentially, we want to begin by making sure that a competition with a COMP_CODE value of ‘UT1’ exists in the table.

We’ll probably need to do this again somewhere down the line, so I’ve added it to the library as setup_competition_UT1.

For this test, in the Specify Parameters step, we can type in the parameters manually. Our Setup Process ensures that a record exists and our test execution is to try to insert this record again. The parameter values we enter are :

  • I_CODE = UT1
  • I_NAME = Test
  • I_DESC – leave as null

This time, the expected result should be changed from Success to Exception.
We’re expecting this call to fail with “ORA-00001: unique constraint violated”.
Therefore, we need to set the Expected Error number to 1.

test2_params.png

Our test will fail unless it encounters the exception we’ve specified when we call the procedure.
Given this, we don’t need to add a separate validation step.

We will be needing a Teardown Process however.
As before, we’re going to choose User PL/Sql Code from the drop-down.
This time however, we won’t have to type anything in to the window.
Instead, click on the library drop-down at the top of the code window :

import_td1

… select rollback_transaction from the drop-down, and then check Subscribe

import_td2

By subscribing to this code snippet, we “inherit” it from the library. Therefore, we cannot edit it directly in the test.
If we did want to “tweak” it for the purposes of this specific test, we could hit the Copy button instead of subscribing.
This would copy the code block into the current test where it would then be editable for that test.

The test Summary looks like this :

test2_summary

As expected, when we execute this, it fails because the expected exception was not raised :

test2_first_run

Now we have both of our tests written, we need to write some application code to get them to pass.

create or replace package body manage_competitions
as
	procedure add_competition
	(
		i_code competitions.comp_code%type,
		i_name competitions.comp_name%type,
		i_desc competitions.description%type default null
	)
	is
	begin
        insert into competitions( comp_code, comp_name, description)
        values( i_code, i_name, i_desc);
	end add_competition;
end manage_competitions;
/

Now when we re-execute our tests, we can see that they pass :

test1_success.png

…and…

test2_success.png

Note that the latest test runs appear at the top of the Results listing.

Removing Old Test Results

If you want to remove an older result you can do this by right-clicking on it in the Results Tab and selecting Delete Result….

You can also clear down all results for a test by right-clicking in the Unit Test pane and selecting Purge Test Results…

purge_test1

This will bring up the Purge Test Results dialogue which gives you the option to remove all results for the test, or just those from before a given time and date.

purge_test2

NOTE – if you have a Test or a Suite with its Results tab showing when you purge, then they may not disappear immediately.
If this is the case, just click the Refresh button on the Test toolbar.

Creating a Test Suite

The development process for an application such as ours revolves around the PL/SQL package as the atomic unit of code.
Even though packages are containers for individual procedures, functions and so on, it's at the package level that the code is deployed to the database and, therefore, at that level that it's stored in Source Control.
So, when a developer needs to make a change to a package, we want them to be able to checkout and run all of the tests for the package.
SUT allows us to group our tests by means of a Test Suite.

So, we’re going to create a Suite for the MANAGE_COMPETITIONS package so that we can group the tests we’ve just created, and add more tests to it later on.

In the Unit Test tree, right-click on Suites and select Add Suite…

add_suite1

In the Add Suite dialogue that appears, enter a name for the Suite.

add_suite2

The new suite now appears in the Unit Test tree.
Now we need to add our tests to it.

Expand the tree under the new test suite, right-click on Tests and select Add Test…

add_suite_test1

Now select the tests you want to add from the list that appears in the Add to Suite Window :

add_suite_test2

…and click OK.

Notice that, although the tests have been added to the Suite, they still appear under the Test Node in the tree.
This node is going to get fairly crowded as more tests are added. This is one reason that a sensible naming convention is quite useful.

If we now run our new suite, we can see that all tests in the suite will run :

suite_results

There is more to Test Suites than simply grouping tests together, but more on that later.

Thus far, we’ve covered similar ground to the Tutorial exercise in the SQLDeveloper Help, although it’s fair to say we’ve taken a slightly different route.

We’re now going back to a time when Scotland ruled the world in order to test…

Adding data to a “child” table

The Acceptance Criteria for the Add a Tournament Story are :

  1. A tournament can be added
  2. A tournament cannot be added for a non-existent competition
  3. The competition that this tournament is for must be specified
  4. The same tournament cannot be added for a competition more than once
  5. If specified, the year the tournament begins cannot be greater than the year that it ends

The stub of the code we’ll be testing is…

create or replace package manage_tournaments
as
	procedure add_tournament
	(
		i_code tournaments.comp_code%type,
		i_year_end tournaments.year_completed%type,
		i_teams tournaments.number_of_teams%type,
		i_host tournaments.host_nation%type default null,
		i_year_start tournaments.year_started%type default null
	);
end manage_tournaments;
/

create or replace package body manage_tournaments
as
	procedure add_tournament
	(
		i_code tournaments.comp_code%type,
		i_year_end tournaments.year_completed%type,
		i_teams tournaments.number_of_teams%type,
		i_host tournaments.host_nation%type default null,
		i_year_start tournaments.year_started%type default null
	)
	is
	begin
		null;
	end add_tournament;
end manage_tournaments;
/

…and the table DDL is…

create table tournaments
(
    id number constraint tourn_pk primary key,
    comp_code varchar2(5),
    year_completed number(4) not null,
    host_nation varchar2(100),
    year_started number(4),
    number_of_teams number(3) not null,

    constraint tourn_uk unique( comp_code, year_completed, host_nation)
)
/

alter table tournaments
    add constraint tourn_comp_fk foreign key
        (comp_code) references competitions(comp_code)
/

Hmmm, does something there look odd to you ? We may come back to it in a while.
First though, let’s write the tests for…

Variable Substitutions using binds

Our first test is wittily and originally entitled MANAGE_TOURNAMENTS.ADD_TOURNAMENT_1.
It’s simply testing that we can legitimately add a TOURNAMENT record to our application.

For the Test Startup, we need to make sure that we have a COMPETITION record to assign the tournament to, so I've subscribed to the setup_competition_UT1 snippet that we added to the Library earlier.

As for the call to the package, the parameters I’m using are from the very first International Football Tournament – the British Home Championships ( won by Scotland ) :

  • I_CODE = UT1
  • I_YEAR_END = 1884
  • I_TEAMS = 4
  • I_HOST = null
  • I_YEAR_START = null

The Validation Process is a Boolean function. Now, I had a few issues with the I_HOST replacement as a string (possibly because I passed in a parameter value of null in the test).
Fortunately, you can reference parameter values as bind variables…

declare
    l_count pls_integer;
    l_host tournaments.host_nation%type := :I_HOST;
begin
    select count(*)
    into l_count
    from tournaments
    where comp_code = '{I_CODE}'
    and year_completed = {I_YEAR_END}
    and nvl(host_nation, 'MIKE') = nvl(l_host, 'MIKE');

    return l_count = 1;
end;

I know I’ll need to use this again to verify the test for adding a duplicate tournament so I’ve saved it to the Library as single_tournament_record_exists.

The teardown process is also taken from the library (a simple rollback once again).

We’ve got a database and we’re not afraid to use it !

The test to ensure that you can’t add a TOURNAMENT for a non-existent COMPETITION is more about the Data Model than the code. What we’re actually testing is that the Foreign Key from TOURNAMENTS to COMPETITIONS is in place and working as expected.

It follows a similar pattern to the very first test we created for adding a competition.
Indeed, the Dynamic Values Query looks rather familiar :

with suffix as
(
    select max( to_number( substr( comp_code, regexp_instr( comp_code, '[[:digit:]]')))) + 1 as numeral
    from competitions
    where comp_code like 'UT%'
    and regexp_instr( substr( comp_code, -1, 1), '[[:digit:]]') = 1 -- only want codes with a numeric suffix
    union -- required if there are no records in the table...
    select 1 from dual
)
select 'UT'||max(numeral) as I_CODE,
    1884 as I_YEAR_END,
    4 as I_TEAMS,
    null as I_HOST,
    null as I_YEAR_START
from suffix
where numeral is not null

Whilst the hard-coded names and parameter values reflect the fact that we’re calling a different procedure, the code to derive the I_CODE parameter value is identical.

Whilst we could add this query to the Library then copy it where we needed to and make necessary changes in each test, there is an alternative method of reuse we might consider. We could create a database function in the Repository to return the desired value for I_CODE.

NOTE – there are several considerations when determining whether or not this is a route that you wish to go down in terms of your own projects. However, in this instance, my Test Repository is being used for a single application, and by doing this, I’m ensuring that this code only needs to be written once.

We’re going to create this function in the TEST_REPO schema.
Before we do that though, we need to grant access to the COMPETITIONS table to TEST_REPO. So, connected as FOOTIE :

grant select on competitions to test_repo
/

Then, as TEST_REPO :

create function get_new_comp_code
    return varchar2
is
    l_comp_code varchar2(5);
begin
    with suffix as
    (
        select max( to_number( substr( comp_code, regexp_instr( comp_code, '[[:digit:]]')))) + 1 as numeral
        from footie.competitions
        where comp_code like 'UT%'
        and regexp_instr( substr( comp_code, -1, 1), '[[:digit:]]') = 1 -- only want codes with a numeric suffix
        union -- required if there are no records in the table...
        select 1 from dual
    )
    select 'UT'||max(numeral) into l_comp_code
    from suffix
    where numeral is not null;

    return l_comp_code;
end get_new_comp_code;
/

Yes, in the Real World you may very well do this as part of a package rather than as a stand-alone function.

Next, we need to grant privileges on the function to the Repository Role :

grant execute on get_new_comp_code to ut_repo_user
/

The Dynamic Values Query can now be written to use the new function :

select test_repo.get_new_comp_code as I_CODE,
    1884 as I_YEAR_END,
    4 as I_TEAMS,
    null as I_HOST,
    null as I_YEAR_START
from dual

We expect the call to fail with ORA-2291 – Integrity Constraint violated – so the Expected Result for this test is Exception, with the Expected Error number set to 2291.

Multiple Processes for a single Testing Stage

To test that the application will not allow the addition of a duplicate TOURNAMENT we need to make sure that records exist in two tables rather than one – i.e. a “parent” COMPETITIONS record and a “child” TOURNAMENTS record, which we’ll be attempting to duplicate.

Now, we could do this in a single Startup Process. However, if we do it as two separate steps then we can save both to the Library and get a greater degree of re-use out of them. So, resisting the distant memory of Forms 2.3 Step Triggers and the feeling that you're coding like it's 1989…

On the Specify Startup Step of the Wizard, click the Plus button and add the first Startup Process (i.e. ensure that the parent COMPETITIONS record exists).
Once you’re done with that, hit the Plus button again :

another_startup1

Select User PL/Sql code from the drop-down and then enter the following :

begin
    merge into tournaments
    using dual
    on
    (
        comp_code = 'UT1'
        and year_completed = 1916
        and host_nation = 'ARGENTINA'
    )
    when not matched then
        insert
        (
            id, comp_code, year_completed,
            host_nation, year_started, number_of_teams
        )
        values
        (
            tourn_id_seq.nextval, 'UT1', 1916,
            'ARGENTINA', null, 4
        );
end;

Incidentally we’ve moved on a bit in terms of test data and are now using details from the first Continental Tournament – the Copa America.
I’ve added this to the library as setup_tournament_for_UT1.
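
Incidentally, this MERGE (and the application code further down) assumes that the synthetic key on TOURNAMENTS is populated from a sequence. That sequence isn't shown in the post; if you're building the schema as you go along, a minimal sketch – using the tourn_id_seq name that the code here references – would be :

create sequence tourn_id_seq
/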

After all that, you should end up with something like …

multple_startup

The remaining tests don’t present anything new so we come to the point where we need to get the code to pass.
At this point you might be confident that this will do the job…

create or replace package body manage_tournaments
as
	procedure add_tournament
	(
		i_code tournaments.comp_code%type,
		i_year_end tournaments.year_completed%type,
		i_teams tournaments.number_of_teams%type,
		i_host tournaments.host_nation%type default null,
		i_year_start tournaments.year_started%type default null
	)
	is
	begin
        if i_year_start is not null then
            if nvl(i_year_end, i_year_start) < i_year_start then
                raise_application_error( -20000, q'[A tournament cannot end before it has begun...unless you're England !]');
            end if;
        end if;
		insert into tournaments
		(
			id, comp_code, year_completed,
			host_nation, year_started, number_of_teams
		)
		values
		(
			tourn_id_seq.nextval, i_code, i_year_end,
			i_host, i_year_start, i_teams
		);
	end add_tournament;
end manage_tournaments;
/

Right, let’s test it shall we ?

When Data Models go bad…

I've added all five of the tests I've written to the MANAGE_TOURNAMENTS Test Suite, so I'm going to use it to execute my tests…

In the Unit Test Pane, expand the Suites node and click on the Suite we want to run. This will bring up the details of the Suite :

suite_run1

Now press the Run button and…not everything went as expected.

By collapsing the result tree, we can see which Test (or Tests) failed :

suite_results1

I think you noticed the problem a bit earlier. Someone has been a bit remiss when doing the data modelling : the COMP_CODE column is missing a NOT NULL constraint.

The Data Modeller claims that this was a deliberate oversight so we can see the value of testing the Data Model and not simply the code. Yeah, right.

Once we address this :

alter table tournaments modify comp_code not null
/

… and re-run the offending test on its own to check that it passes, we can then re-run the entire suite as a regression test.
Looks like the change hasn’t broken anything else ( the failure at the bottom is just the previous run result):

suite_results2

Summary of the Insert Tests

So far, we’ve found out how SUT is put together as a Testing Framework.

The tests follow the expected four-stage pattern – Setup, Execute, Validate, Teardown – but of these, only the Execute stage is mandatory.
Multiple Processes are permissible in any stage apart from Execute.
The facility to group tests into Test Suites is present, as you would expect.

We’ve also started to explore some of the specifics of SUT.

Pleasingly, it handles expected Exceptions with a minimum of input, right down to the level of the expected error code.

The Dynamic Values Query provides a means of generating conditional test input criteria.
The Variable Substitution syntax makes validation of tests executed with such generated values somewhat more straightforward.

If and when limitations of SUT are encountered, you always have the option of extending its capabilities using custom PL/SQL Stored Program Units.
I can see that you're still a bit dubious about that last point. OK, I'm sure we'll come back to that later (mainly because I've already written that bit !)

In the meantime, let’s see what we can learn from the tests for the second DML activity we’re looking at…

Testing Deletes

As with any DML action, how well an application handles Deletion of records is crucial in terms of how well it ensures data integrity.
With that statement of the obvious out of the way, let’s take a look at how SUT can help us with testing this aspect of our Application’s functionality.

Some of the tests we need to write require records to be present in the TOURNAMENT_TEAMS table, which looks like this :

create table tournament_teams
(
    tourn_id number,
    team_name varchar2(100),
    group_stage varchar2(2),

    constraint team_pk primary key (tourn_id, team_name)
)
/

alter table tournament_teams
    add constraint team_tourn_fk foreign key
        (tourn_id) references tournaments(id)
/

The thing about Dynamic Value Queries

The Acceptance Criteria we want to test for in the User Story for deleting a tournament are :

  1. A Tournament can be deleted
  2. A Tournament cannot be deleted if there are Teams assigned to the Tournament

I’ve created a stub procedure in the MANAGE_TOURNAMENTS package.
The appropriate signature has been added to the package header and the procedure in the body currently looks like this :

...
	procedure remove_tournament( i_id tournaments.id%type)
	is
	begin
        null;
	end remove_tournament;
...

Yes, using the Natural Key for a TOURNAMENT record as input into this procedure would have made life easier in terms of writing the test. However, I feel quite strongly that we should not be making compromises in the Application Code to accommodate any Testing Tools we may need to use.

To ensure that our test is completely standalone, we may need to create both a COMPETITIONS and a TOURNAMENTS record in the Test Startup.
We then need to find out what the ID value is of the TOURNAMENT record we want to delete.
No problem there. We can re-use the routines we already have in the Library to create the UT1 Competition and associated tournament.
Then we just need to use a Dynamic Values Query to retrieve the ID – something like :

select id as I_ID
from tournaments
where comp_code = 'UT1'
and year_completed = 1916
and host_nation = 'ARGENTINA'

Nothing to see here, right ? Well, if you had taken a look at the SQLDeveloper Help on Dynamic Value Queries you would have noticed that…

“A dynamic value query is executed before the execution of all implementations in a test, including any startup action for the test. If you must populate a table before a dynamic value query is evaluated, you can do this in the startup action for a suite that includes the test.”

Fortunately, if you’re not entirely sure about how your test is going to work, you can click the Debug button and get details of the execution steps of the test or, in this case…

debug_dqv

So, it would appear that the solution to this particular problem is to wrap this test in its own Test Suite.
This way, we can run the Startup Processes at the Suite level instead of the Test level to ensure that they execute before the Dynamic Values Query.

So, I'm going to create a suite called manage_tournaments.remove_tournament_ws – for Wrapper Suite.
As well as the Startup Processes, I’ll also move the Teardown Process from the test to this Suite.
Then, I’ll allocate the test to the Suite.

Creating the Wrapper Suite

This is the same process as for creating any other Test Suite – i.e. go to the Unit Test Navigator, right-click on the Suites node and select Add Suite… from the menu.

dvq_ws1

If we now bring up our new Suite, we can add the Startup and Teardown processes in the same way as we do for a test (i.e. hit the Plus button and away you go).
Once that’s done, we need to assign a test to the Suite.
Once again, hit the Plus button, this time in the Test or Suite Section, to bring up the Add to Suite dialogue :

add_to_suite1

Select the test we want to add – in our case, MANAGE_TOURNAMENTS.REMOVE_TOURNAMENT_1 – and make sure that the Run Test Startups and Run Test Teardowns boxes are unchecked :

add_to_suite2

Click OK and…

dvq_ws2

Even though we’ve specified that the Startup and Teardown Processes should not be executed when the test is run within the Suite, it’s probably a good idea to go back and remove them, if only to save much confusion later on.

Anyhow, when we now execute the suite we can see that the results are what we’re expecting and that, reassuringly, the generated synthetic key value ( I_ID) is being passed in :

successful_fail

Running Startup Processes in the Suite and the Test

To make sure that the Foreign Key from TOURNAMENT_TEAMS to TOURNAMENTS is working, we need to insert a TOURNAMENT_TEAMS record for the appropriate TOURNAMENT.ID as part of the Setup Process.

As with the previous tests, we’re going to need to include this in the Wrapper Suite we’ve just created so that the Dynamic Values Query to get the ID value works.

Hang on, let’s consider that decision for a moment.

It is true that the second test will require the same Startup Processes that we have in our existing Wrapper Suite for the first test. It will also need these Startup Processes to be executed in a Wrapper Suite as it needs to have access to the TOURNAMENTS.ID value in a Dynamic Values Query.

To a programmer, it's logical therefore that the second test should be allocated to the same Test Suite as the first, since the code has already been written there and there's absolutely no need to go duplicating effort (even if it is mainly just importing stuff from the Library).

Of course, we will need to “move things about a bit” to make sure that both tests can run properly within the same suite. For example, we need to perform the “successful” delete test last as the test for an Exception is relying on the record to be deleted…er…not being deleted when it runs.

To a tester, things may appear a little different. One of the principles of Unit Testing is to make sure that, as far as possible, tests can be run independently of each other.

It is for this reason that you should give serious consideration to creating a separate Wrapper Suite for our second test.

The alternative, as I’m about to demonstrate, gets a bit messy…

So for our new test, as well as the Setup Processes in the Suite, we’ve also included one in the test for the addition of the TOURNAMENT_TEAMS record.
The creation of the TOURNAMENT_TEAMS record needs to remain in the Test Startup Process rather than in the Suite as it's only relevant to this test and not to all tests in the Suite. However, as the TOURNAMENT record we're looking for will definitely have been created by the Wrapper Suite Startup Processes before the Test Startup Process fires, this should not be a problem.

So, the main differences between this test – MANAGE_TOURNAMENTS.REMOVE_TOURNAMENT_2 – and its predecessor are simply that we are expecting this test to error with ORA-2292 – “Integrity constraint violated” – and that we now include the following Startup Process code to create the TOURNAMENT_TEAMS record :

declare
    l_id tournaments.id%type;
begin
    select id
    into l_id
    from tournaments
    where comp_code = 'UT1'
    and year_completed = 1916
    and host_nation = 'ARGENTINA';

    merge into tournament_teams
    using dual
    on
    (
        tourn_id = l_id
        and team_name = 'URUGUAY'
    )
    when not matched then
    insert( tourn_id, team_name, group_stage)
    values(l_id, 'URUGUAY', null);
end;

Now things start to get a bit complicated. In order to make sure that the test for a legitimate delete does not fail, we need to “teardown” the child record in TOURNAMENT_TEAMS that we created in our Startup Process. Well, no problem…except that SUT does not appear to allow Variable Substitution syntax to be used in a Teardown Process.
Therefore, we need to indulge in a little light hacking and put the following code in a Validation Action in our test :

declare
    l_id tournament_teams.tourn_id%type := :I_ID;
begin
    delete from tournament_teams
    where tourn_id = l_id
    and team_name = 'URUGUAY';
end;

This time, when we add the test to the Wrapper Suite, we make sure that the Test Startups are run :

add_to_suite_test2

Finally, we need to make sure that our new test runs first in the suite. In the Test Suite listing, click on the Test name then click the blue up arrow…

reorder_tests1

…until our test is at the top of the pile…

reorder_tests2

Once we’re happy with our tests, we can then fix the application code :

...
procedure remove_tournament( i_id tournaments.id%type)
is
begin
    delete from tournaments
    where id = i_id;
end remove_tournament;
...

…and run our wrapper suite to make sure everything passes…

ws_final_run

Finally, we need to add the manage_tournaments.remove_tournament_ws suite to the Suite we have for all the tests for the MANAGE_TOURNAMENTS package.
To do this, go to the Unit Test Navigator and expand the MANAGE_TOURNAMENTS suite.
Then, right-click the Suites node and select Add Suite…

suite_to_suite1

Now select the manage_tournaments.remove_tournament_ws suite from the list …

suite_to_suite2

…and click OK.

Near enough is good enough ? Finding a key value generated during a Startup Process

The other record deletion story we have concerns COMPETITIONS records.

The procedure we’ll be testing is in the MANAGE_COMPETITIONS package :

...
procedure remove_competition( i_code competitions.comp_code%type) is
begin
    delete from competitions
    where comp_code = i_code;
end remove_competition;
...

The acceptance criteria, and indeed, the functionality, that we’re testing here is very similar to our previous User Story.
The Acceptance Criteria are :

  • A competition can be deleted
  • A competition cannot be deleted if a tournament exists for it

To make sure that the test to Delete a COMPETITION is self-contained, we need to make sure that the record we are deleting has no child records.
The easiest way to do this is to create the record as part of the Startup Process.
Obviously, this will need to be referenced by a Dynamic Values Query and therefore this code will need to run in another Wrapper Suite.

Once again, I’m using the GET_NEW_COMP_CODE function I created earlier. Yes, that one that you weren’t sure about. The one that you’re probably still not sure about. The Startup Process in my Wrapper Suite will be User PL/Sql Code :

declare
    l_code competitions.comp_code%type;
begin
    l_code := test_repo.get_new_comp_code;
    insert into competitions(comp_code, comp_name, description)
    values( l_code, 'Test', null);
end;

The next step may well be a bit tricky – in the Dynamic Values Query we use to determine the parameters to pass to the procedure, we need to find the COMP_CODE created in the Startup Process.
Now, we can do something like this…

select 'UT'
    || to_char(substr(test_repo.get_new_comp_code, regexp_instr( test_repo.get_new_comp_code, '[[:digit:]]')) -1) as I_CODE
from dual;

…but if the table has a change made to it in another session in the interval between our Startup Process and our Dynamic Values Query executing then we may well end up using an incorrect COMP_CODE value.

Let’s stop and think for a moment.
What we are writing here is not application code that may be executed concurrently by multiple users. We are writing Unit Tests.
Therefore, whilst this potential inexactitude would be problematic within the core Application Codebase, it’s not so much of an issue for a Unit Test.
Remember, the Unit tests will probably be run on a Development Environment with few users ( i.e. Developers) connected. They may also be run on a Continuous Integration Environment, in which case they are likely to be the only thing running on the database.
OK, so I could do something clever with the COMP_CODE value used in the Startup Process being assigned to a package variable/temporary table/whatever for reference by later testing steps in the same session, but I’m really not sure I need to go to all that effort right now.
You may well disagree with this approach, but as I’m the one at the keyboard right now, we’re pressing on…
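
For what it's worth, the “something clever” could be as simple as a small package in the application (FOOTIE) schema that the Startup Process writes to and the Dynamic Values Query reads back via a function call. This is an untested sketch only – the ut_test_state name is mine, it isn't used anywhere in the tests in this post, and it assumes that SUT runs the Suite Startup Process and the Dynamic Values Query in the same database session :

create or replace package ut_test_state
as
    -- Holds a value generated in a Startup Process so that later
    -- steps in the same session can read it back.
    g_comp_code competitions.comp_code%type;

    function get_comp_code return varchar2;
end ut_test_state;
/

create or replace package body ut_test_state
as
    function get_comp_code return varchar2
    is
    begin
        return g_comp_code;
    end get_comp_code;
end ut_test_state;
/

The Suite-level Startup Process would then assign its generated code to ut_test_state.g_comp_code, and the Dynamic Values Query would simply select ut_test_state.get_comp_code from dual.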

The validation code for this test will be a Query returning no row(s) :

select null
from competitions
where comp_code = '{I_CODE}'

The final Test (inside its wrapper suite) looks like this :

del_comp1

By contrast, making sure that we can’t delete a COMPETITIONS record which has TOURNAMENTS records associated with it is pretty straightforward.
We simply use the MERGE statements we’ve already added to the library to make sure we have a COMP_CODE UT1 and test against that.
As we know the value that we want to pass in ahead of time, we don’t even need a Dynamic Values Query. Therefore, we don’t need another Wrapper Suite.

The test ends up looking like this :

del_comp2

Having checked that they work as expected, I’ve added the second test and the wrapper suite for the first test to the MANAGE_COMPETITIONS suite.
Together with our earlier tests for this package, the Suite as a whole now looks like this :

manage_comps_suite_contents

Deletion Tests Summary

By using the ability to define Startup (and Teardown) Processes at Suite level, we can work around some of the limitations of Dynamic Values Queries.
Additionally, this property of Suites offers some extra flexibility within SUT.
This does mean that some test code may end up in parts of the test structure where you would not normally expect to find it.

Updates

There doesn’t seem to be anything too different about the way SUT lets you test Update operations, except for the facility to have multiple implementations of a single test. There’s an example of this in the Testing Tutorial in the SQLDeveloper Help. Alternatively…

Multiple Implementations of the same test

We’ve got a Story about updating TOURNAMENTS records.

The Acceptance Criteria are :

  • Change the number of teams taking part in a tournament
  • Change the year a tournament started
  • The year a tournament started cannot be after the year a tournament finished

The procedure that we’re testing is in the MANAGE_TOURNAMENTS package :

procedure edit_tournament
(
    i_id tournaments.id%type,
    i_teams tournaments.number_of_teams%type default null,
    i_year_start tournaments.year_started%type default null
)
is
begin
    update tournaments
    set number_of_teams = nvl(i_teams, number_of_teams),
        year_started = nvl(i_year_start, year_started)
    where id = i_id;
end edit_tournament;

We’ve also added a check constraint to the table :

alter table tournaments add constraint chk_end_after_start
    check(nvl(year_started, year_completed) <= year_completed)
/

Once again, as the procedure we’re testing requires an ID value that we may or may not be creating at runtime, we’ll be needing a Wrapper Suite to feed a Dynamic Values Query to generate the TOURNAMENTS.ID value we pass into the procedure we’re testing.

Once we've got our Suite – which I've called manage_tournaments.edit_tournament_ws – we can start looking at the Test.

The Dynamic Value Query for the first test is :

select id as I_ID,
    5 as I_TEAMS,
    null as I_YEAR_START
from tournaments
where comp_code = 'UT1'
and year_completed = 1916
and host_nation = 'ARGENTINA'

I’ve published this to the Library as edit_tourn_params_ut1 as we’re going to need variations of it shortly.
The Expected Result is SUCCESS.

The Process Validation is a Boolean Function, which I’m adding to the library as verify_edit_tournament :

declare
    l_id tournaments.id%type := :I_ID;
    l_teams tournaments.number_of_teams%type := :I_TEAMS;
    l_year_start tournaments.year_started%type := :I_YEAR_START;
    l_count pls_integer;
begin
    select 1
    into l_count
    from tournaments
    where id = l_id
    and number_of_teams = nvl(l_teams, number_of_teams)
    and nvl(year_started, year_completed) = coalesce(l_year_start, year_started, year_completed);

    return l_count = 1;
end;

If we expand our new test in the Unit Test Navigator we can see that we have something called Test Implementation 1 under it.

utn_impl

Each test can have multiple implementations, a feature we’re going to make the most of for the User Acceptance Criteria we’re dealing with now.
First thing to do then, is to rename Test Implementation 1 to something a bit more meaningful.

To do this, right-click on the Implementation and select Rename Implementation…

impl_rc_rename

Then enter the new name and hit OK

impl_rc_rename2

Now we can create a second implementation by right-clicking the Test itself and selecting Add Implementation…

test_rc_add_impl

This time, I’ve called the implementation update_year_started.

We can now see that the new Implementation is in focus in the test, but that the Execution and Validation Processes have not been populated :

impl_in_focus

I’ve copied in the Dynamic Values Query from the Library and made the necessary changes for this implementation…

impl2_dvq

…and subscribed to the verify_edit_tournament Boolean Function we created in the first implementation.

The third Implementation is called update_start_after_end and is the same as the second, except I’m passing in a year later than the current YEAR_STARTED value for the TOURNAMENTS record.
The Expected Result is Exception, with an ORA-2290 Check Constraint violated error, so there's no need to include the validation function.

One point to note here is that the Implementations seem to execute alphabetically by name and I don’t see any way of changing this manually.
This is not an issue in this case, when each test is reasonably independent, but it’s worth bearing in mind.

Once all of the application code is in place, the Test Suite result looks like this :

impl_results

Testing an In/Out SYS_REFCURSOR

Yes, we’re at the point that you may well have been dreading.
The Ref Cursor has been such a wonderfully useful addition to the PL/SQL language. It makes passing data between PL/SQL and programs written in other languages so much easier.
It is ironic, therefore, that getting stuff out of a Ref Cursor is often quite painful when using a PL/SQL client.

Given this, one option we might consider when testing Ref Cursors could be to use whatever test framework is being employed to test the client code calling our PL/SQL API. However, that would be to pass up the opportunity to use Wouter’s clever little trick.

So, with nothing up my sleeves…

Is this a Ref Cursor that I see before me ?

The Acceptance Criterion for our User Story is that the application lists all tournaments in the system for the specified competition.

The procedure we need to test is in the MANAGE_TOURNAMENTS package and looks like this :

...
procedure list_tournaments
(
    i_comp_code tournaments.comp_code%type,
    io_tourn_list in out SYS_REFCURSOR
)
is
begin
    open io_tourn_list for
        select id, comp_code, year_completed, host_nation, year_started, number_of_teams
        from tournaments
        where comp_code = i_comp_code;
end list_tournaments;
...

The first clue to the fact that this test will be a bit different from normal comes when you select this procedure right at the start of the Create Test Wizard.
Immediately you will get :

dvq_warning

In a way this is reassuring. SUT recognizes that we need to handle a REF CURSOR. However, the template Query it provides for the Dynamic Values Query we need to use appears to pose more questions than answers…

select ? as I_COMP_CODE,
    ? as IO_TOURN_LIST,
    ? as IO_TOURN_LIST$
from ?
where ?

Now, there may well be a way of getting this to work as intended, but I’ve not been able to find out what it is.
What we can do instead is a bit of light cheating…

Wouter’s Method

To start with, we need to create a procedure in the TEST_REPO schema. This will act as a dummy Test Execution so we can do the real testing in the Validation Process.

Still don’t like me creating my own objects in the Repository ? Well, fair enough, but in this case, I can’t see any other option.
The procedure then is :

create or replace procedure this_is_not_a_cursor
as
--
-- Been listening to a bit of Public Image Ltd, hence the name of this proc...
--
begin
    null;
end;
/

…look, you can put it in a package if that’ll make you feel any better about it.

Anyway, we need to grant execute permissions to the SUT roles :

grant execute on this_is_not_a_cursor to ut_repo_user
/
grant execute on this_is_not_a_cursor to ut_repo_administrator
/

Now, let’s try creating our test again. We’re using the FOOTIE connection as usual. However, this time, we’ll be selecting a program from the Other Users node…

cursor_wiz1

The test name still follows our naming convention – i.e. MANAGE_TOURNAMENTS.LIST_TOURNAMENTS_1.

The Startup Process makes sure that we have records to query :

declare
    procedure ins( i_year tournaments.year_completed%type,
        i_host tournaments.host_nation%type,
        i_teams tournaments.number_of_teams%type)
    is
    begin
        merge into tournaments
        using dual
        on
        (
            comp_code = 'WC'
            and year_completed = i_year
            and host_nation = i_host
        )
        when not matched then
            insert ( id, comp_code, year_completed,
                host_nation, year_started, number_of_teams)
            values( tourn_id_seq.nextval, 'WC', i_year,
                i_host, null, i_teams);
    end ins;
begin
    merge into competitions
    using dual
        on ( comp_code = 'WC')
        when not matched then
        insert( comp_code, comp_name, description)
        values('WC', 'World Cup', 'FIFA World Cup');

    ins(1930, 'URUGUAY', 13);
    ins(1934, 'ITALY', 16);
    ins(1938, 'FRANCE', 16);
    ins(1950, 'BRAZIL', 13);

end;

We don’t need to specify any input parameters for the Execution Step of our test so we can skip straight on to Process Validation. Here we define some User PL/Sql Code, where all the fun happens…

declare
    l_rc sys_refcursor;
    rec_tourn tournaments%rowtype;
    l_count pls_integer := 0;
    l_exp_count pls_integer;
begin
    -- The "real" test...
    manage_tournaments.list_tournaments('WC', l_rc);
    loop
        fetch l_rc into rec_tourn;
        exit when l_rc%notfound;
        l_count := l_count + 1;
    end loop;
    close l_rc;
    -- confirm that the correct number of records have been retrieved
    select count(*) into l_exp_count
    from tournaments
    where comp_code = 'WC';

    if l_count != l_exp_count then
        raise_application_error(-20900, 'Number of records in ref cursor '||l_count||' does not match expected count of '||l_exp_count);
    end if;
end;

So, as we can use a PL/SQL block in Process Validation, we can define our SYS_REFCURSOR variable and execute the call to our procedure here.

Having added our “standard” Teardown, we’re ready to test.

The result….

ref_cur_magic

The main drawback with this approach is that you now have a “customised” repository and will have to cope with the extra administration and deployment effort that such objects require. On the plus side, you can test Ref Cursor stuff.

Startup/Teardown using Table/Row Copy

Sooner or later you will encounter a testing scenario where a simple rollback just won’t do.
The next User Story is just such an example…

Bulk Upload Competitions – Using Table Copy for testing and rollback

This Story is intended to replicate the sort of ETL process that is quite common, especially in a Data Warehouse environment.
The scenario here is that you receive a delimited file containing data that needs to be loaded into your application.
The load needs to be permissive – i.e. you don’t want to fail the entire load if only a few records error.
The file format is validated by loading the file into an external table.
The load process then uses LOG ERRORS to upload all the records it possibly can, whilst keeping track of those records that failed by dumping them into an Error table.
The thing about LOG ERRORS is that it runs an Autonomous Transaction in the background.
Therefore, even if you issue a rollback after the load, any records written to the error table will be persisted.
In light of this, we’re going to need to use something else for our Teardown process.
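
If you want to see this behaviour for yourself, a quick illustration (using the COMPETITIONS_XT and ERR$_COMPETITIONS tables described below) would look something like this :

-- attempt the load, letting LOG ERRORS divert any failing rows...
insert into competitions( comp_code, comp_name, description)
    select comp_code, comp_name, description
    from competitions_xt
    log errors reject limit unlimited;

-- ...throw the whole transaction away...
rollback;

-- ...and the error rows are still there, courtesy of that Autonomous Transaction
select count(*)
from err$_competitions;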

The Data Model

Just to quickly recap, we already have an external table :

create table competitions_xt
(
    comp_code varchar2(5),
	comp_name varchar2(50),
	description varchar2(4000)
)
    organization external
    (
        type oracle_loader
        default directory my_files
        access parameters
        (
            records delimited by newline
            badfile 'competitions.bad'
            logfile 'competitions.log'
            skip 1
            fields terminated by ','
            (
                comp_code char(5),
                comp_name char(50),
                description char(4000)
            )
        )
        location('competitions.csv')
    )
    reject limit unlimited
/

We also have a csv file – competitions.csv with the data to load (including a duplicate record) :

comp_code,comp_name,description
HIC,Home International Championship, British Home International Championship
CA,Copa America,Copa America (South American Championship until 1975)
OLY,Olympic Football Tournament,The Olympics
WC,World Cup,The FIFA World Cup
CEIC,Central European International Cup,Central European International Cup - a forerunner to the European Championships
EURO,European Championship,UEFA European Championship
HIC,Home International Championship, British Home International Championship

We have an error table – ERR$_COMPETITIONS – that’s been created by :

begin
    dbms_errlog.create_error_log('COMPETITIONS');
end;
/

…and we have a stub we’ll be using to test the load (in the MANAGE_COMPETITIONS package) :

    procedure upload_competitions
    is
    begin
        null;
    end;

When we create the test, the Startup Processes need to backup both the COMPETITIONS table and the ERR$_COMPETITIONS table.
Creating the first Startup Process, we select Table or Row Copy from the drop-down :

backup_tab1

In the Window that pops up, the Source Table is the table we want to copy.
The Target Table is the temporary table that SQLDeveloper is going to create as a copy of the Source Table.
Note that the Target Table defaults to the same name irrespective of how many Startup Processes we specify.
For our first Startup Process, things look like this :

startup1

Notice that the Generated Query field updates as you enter the name of the source table. If you want to make sure that this Query is going to work at runtime, you can hit the Check button and (hopefully) be reassured with the message :

startup1a

So, we’ve got a backup for the COMPETITIONS table, now we need one for the error table.
This is pretty similar to the first Startup Process except that this time we rename the Temporary Table to TMP$MANAGE_COMPETITIONS.UPLERR :

startup2

As with the startup, there are two validation processes required. Actually, you could probably do it all in one but that would be to pass up the opportunity to demonstrate both at work.

Both of these will be User PL/SQL Code blocks. First off, check that we’ve loaded the correct number of rows :

declare
    l_count pls_integer;
    wrong_count exception;
begin
    select count(comp.comp_code)
    into l_count
    from competitions comp
    inner join competitions_xt cxt
    on comp.comp_code = cxt.comp_code;

    if l_count != 7 then
        raise wrong_count;
    end if;
end;

…and then make sure that we have the correct number of error records ….

declare
    l_count pls_integer;
    wrong_count exception;
begin
    select count(*)
    into l_count
    from err$_competitions;

    if l_count != 1 then
        raise wrong_count;
    end if;
end;

Finally, for the Teardown, we need to restore our tables to the state prior to the test execution.

This time, the process type from the drop-down is Table or Row Restore

Note that the check-box to drop the temporary table comes pre-checked…

restore1

For the second Teardown process, to restore the Error table to its former state, we need to do a bit more typing.
This is because SQLDeveloper defaults to the same values for each Teardown Process.
So, we need to specify that our Target Table is ERR$_COMPETITIONS and our Source Table is “TMP$MANAGE_COMPETITIONS.UPLERR” :

restore2

After all that, we can see that we have two Startup Processes, two Validation Processes, and two Teardown processes in our new test :

test7_summary

After confirming that the test fails as expected…

test7_fail

…we update the application code…

...
procedure upload_competitions
is
begin
    insert into competitions( comp_code, comp_name, description)
        select comp_code, comp_name, description
        from competitions_xt
        log errors reject limit unlimited;
end upload_competitions;
...

…and re-run the test…

test7_summary

Sharing your Suites – version control and code promotion for Tests

If you plan to use your SUT tests in anything other than a single repository, chances are that you’ll want to be able to :

  • transfer them between environments
  • put them under some form of source control

Well, you’re in luck. Not only can you export Tests or Suites to an xml file on disk, but the export will also automatically include any Subscribed objects from the Library.
To demonstrate, right-click on the Suite we want to export :

rc_export

…and select the file location…

exp_file

…and SUT will anticipate your every need…exp_conf

…almost.

Obviously, you’ll need to make sure you deploy any custom Stored Program Units upon which the exported objects are dependent.

To import a file into a repository, you can use the Main Tools/Unit Test menu :imp_menu

…which allows you to choose the file to import, as well as the option of whether or not to overwrite an object of the same name that already exists in the repository :imp_file_select

Summary

Overall, SQLDeveloper Unit Testing provides a number of helpful features to reduce the burden of writing and maintaining tests.
Notable plus points are :

  • being able to save code in a Library to promote re-use
  • table backup and restore functionality for test Startups and Teardowns
  • the seamless way that exceptions can be tested down to the level of the error code
  • the fact that test results are retained and collated in the tool

Being declarative in nature, SUT provides a common structure for Unit Tests. That same declarative nature, however, does impose some limitations.
It is possible to overcome some of these limitations by adding custom database objects to the Repository. Some consideration needs to be given as to what extent you want to do this.

I will be comparing SUT with other PL/SQL testing frameworks in a future post. Before that, I need to evaluate some other frameworks.
The next one on my list is utPLSQL…


Filed under: PL/SQL, SQLDeveloper Tagged: Create Unit Test Wizard, Dynamic Value Queries, exceptions, Export, Import, Library, manage users, Purge Results, Repository, SQLDeveloper Unit Testing, Startup Process, sys_refcursor, Table or Row Copy, Teardown Process, Test Implementations, Test Suites, Testing in/out ref cursors, ut_repo_administrator role, ut_repo_user role, Variable Substitution, wrapper suites

Test Driven Development and PL/SQL – The Odyssey Begins

Sun, 2016-07-31 14:58

In the aftermath of the Brexit vote, I’m probably not alone in being a little confused.
Political discourse in the UK has focused on exactly who it was who voted to Leave.
The Youth Spokesperson you get on a lot of political programs right now will talk accusingly of older voters “ruining” their future by opting to Leave.
Other shocked Remainers will put it down to people without a University Degree.
I’m not sure where that leaves me as someone who is ever so slightly over the age of 30, does not have a degree…and voted to Remain. I’m pretty sure I’m not Scottish…unless there’s some dark family secret my parents haven’t let me in on.
I suppose I must be a member of the “Metropolitan Elite” the Leave side was always muttering darkly about.
After all, I do pay a great deal of money to be driven from my country residence to London to work every day…although I do have to share the train with the odd one or two fellow elitists who’ve made the same extravagant choice.
This does of course assume that Milton Keynes qualifies as being “in the country” and that my living there is a matter of choice rather than a question of being able to afford living any closer to London.

With all the excrement…er…excitement of the Referendum Campaign and its aftermath, I somehow never got around to writing my application to track the progress of the Euros (or the Copa America for that matter).
Whenever a major football tournament comes around, I always resolve to do this, if only to evoke memories of my youth when a large part of my bedroom wall was taken up with a World Cup Wallchart where you could fill in the results as they happened. That’s without mentioning the months leading up to the tournament and trying to complete the Panini collection – the only time you’d ever hear a conversation such as “OK, I’ll let you have Zico in exchange for Mick Mills”.

In order to prevent this happening again, I’ve resolved to write an application capable of holding details of any major international football tournament.
In the course of writing this application, I’d like to take the opportunity to have a look at an aspect of PL/SQL development that maybe isn’t as commonly used as it should be – Unit Testing.

Over the next few weeks, I plan to take a look at some of the Testing Frameworks available for PL/SQL and see how they compare.
The objective here is not so much to find which framework is the best/most suitable, but to perform an objective comparison between them using the same set of tests which implement fairly commonly encountered functionality.

If you’re looking for recommendations for a framework, then this article by Jacek Gebal is probably a good place to start.

In this post, I’ll be outlining the functionality that I’ll be testing in the form of User Stories, together with the application data model (or at least, the bit of it I need to execute the tests).
I’ll also have a look at the common pattern that tests written in these frameworks tend to follow.
Just to highlight why using a Test Framework might be useful, I’ll also script a couple of simple tests in SQL to see just how much code you have to write to implement tests without using a framework.

Unit Testing Approach

Taking the purist approach to Test-Driven Development, we’d need to :

  1. Write the test first and ensure that it fails
  2. Write the minimum amount of code required for the test to pass
  3. Run the test to make sure it passes

Additionally, we’d need to make sure that the tests were independent of each other – that the execution of one test is not dependent on the successful execution of a prior test.

Following this approach to the letter would cause one or two issues.
Firstly, if the procedure you’re testing does not exist, your test will not run and fail. It will error.
As the point of this step is, essentially, to ensure that your test code is sound (i.e. it won’t pass irrespective of the code it runs against), this is not what we’re after.
The second issue is specific to PL/SQL.
When defining PL/SQL procedures that interact with database tables, it’s usually a good idea to use anchored declarations where appropriate.
Even if we write a stub procedure, if the tables it will interact with do not exist, we’d have to use native types for our parameters and update the signature of the procedure once the tables had been created.
There is always the danger that this additional step would be missed.
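
To illustrate the point – the package name here is made up, although we’ll meet a real COMPETITIONS table shortly :

-- a minimal sketch : anchoring the parameter to the table column means the
-- signature tracks any future change to COMPETITIONS.COMP_CODE automatically.
-- A native type ( e.g. i_code varchar2) would need updating by hand instead.
create or replace package competitions_api as
    procedure add_competition( i_code competitions.comp_code%type);
end competitions_api;
/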

So, in terms of PL/SQL then, I’d suggest that the pre-requisites for writing our test are :

  • The data model components (tables, views, RI constraints) that the procedure will interact with
  • A stub procedure with correctly typed parameters

Many testing frameworks seem to adopt four basic steps for each test, some of which are optional. They are :

  1. Setup – put the application into a known state from which the test can run
  2. Execute – run the code that you want to test
  3. Verify – check that what you expected to happen did happen
  4. Teardown – return the system to the state it was in before this test was run

This is the general pattern that I’ll be following for my tests.

The Stories to Test

My story finding is complete and the backlog has been populated.
The stories selected for Sprint 1 are on the board. OK, they’ve been chosen so that they cover some of the more common scenarios that we might need to test.
The stories are :

  1. Add a competition – tests insert into a table with a Primary Key
  2. Add a Tournament – insert into a table with a Foreign Key constraint
  3. Remove Tournament – delete from a table
  4. Remove Competitions – delete from a Parent table
  5. Edit a Tournament – update of a record in a table
  6. View Tournaments by Competition – select multiple records using an in/out ref cursor
  7. Bulk Upload Competitions – insert records in bulk using LOG ERRORS
  8. Add a Team to a Tournament – insert a record using a synthetic foreign key

Each of these stories has multiple Acceptance Criteria.
It’s worth noting that, as some of the functionality (i.e. the data integrity) is implemented in the data model ( Primary Keys, Foreign Keys etc), the Acceptance Criteria for these stories needs to cover this as well as the functionality implemented in the PL/SQL code itself.

The Data Model

I’ve taken the approach that, for a story to be Sprint Ready, the Data Model to support it must already be in place.
Currently, the data model looks like this :

sprint1_data_model

The DDL to create the application owner is :

create user footie identified by password
/

alter user footie default tablespace USERS
/

grant create session, create table, create procedure, create sequence to footie
/

alter user footie quota unlimited on users
/

grant read, write on directory my_files to footie
/

…where password is the password you want to give this user.
Note that a directory called MY_FILES already exists in my database.
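
If you need to create one, it’s simply a case of running something like this as a suitably privileged user (the path here is purely illustrative – use one appropriate to your own server) :

create or replace directory my_files as '/u01/app/oracle/my_files'
/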

The DDL to create the Data Model includes the COMPETITIONS table…

create table competitions
(
    comp_code varchar2(5) constraint comp_pk primary key,
    comp_name varchar2(50) not null,
    description varchar2(4000)
)
/

comment on table competitions is
    'International Football competitions for which tournament data can be added'
/

comment on column competitions.comp_code is
    'Internal code to uniquely identify this competition'
/

comment on column competitions.comp_name is
    'The name of the competition'
/

comment on column competitions.description is
    'A description of the competition'
/

…an external table to facilitate the bulk upload of COMPETITIONS…

create table competitions_xt
(
    comp_code varchar2(5),
    comp_name varchar2(50),
    description varchar2(4000)
)
    organization external
    (
        type oracle_loader
        default directory my_files
        access parameters
        (
            records delimited by newline
            skip 1
            fields terminated by ','
            badfile 'competitions.bad'
            logfile 'competitions.log'
            (
                comp_code char(5),
                comp_name char(50),
                description char(4000)
            )
        )
            location('competitions.csv')
    )
reject limit unlimited
/

….the TOURNAMENTS table…

create table tournaments
(
    id number constraint tourn_pk primary key,
    comp_code varchar2(5),
    year_completed number(4) not null,
    host_nation varchar2(100),
    year_started number(4),
    number_of_teams number(3) not null,

    constraint tourn_uk unique( comp_code, year_completed, host_nation)
)
/

comment on table tournaments is
    'A (finals) tournament of an International Football competition. Table alias is tourn'
/

comment on column tournaments.id is
    'Synthetic PK for the table as the Natural Key includes host_nation, which may be null. Values taken from sequence tourn_id_seq'
/
comment on column tournaments.comp_code is
    'The Competition that this tournament was part of (e.g. World Cup). Part of the Unique Key. FK to COMPETITIONS(comp_code)'
/

comment on column tournaments.year_completed is
    'The year in which the last match of this tournament took place. Mandatory. Part of Unique Key'
/

comment on column tournaments.host_nation is
    'The nation where the tournament was played (if a finals tournament). Part of Unique Key but is optional'
/
comment on column tournaments.year_started is
    'The year in which the first match was played ( if applicable). Cannot be later than the value in YEAR_COMPLETED'
/

comment on column tournaments.number_of_teams is
    'The number of teams taking part in the tournament'
/

…the TOURNAMENT_TEAMS table…

create table tournament_teams
(
    tourn_id number,
    team_name varchar2(100),
    group_stage varchar2(2),

    constraint team_pk primary key (tourn_id, team_name)
)
/

comment on table tournament_teams is
    'Teams participating in the Tournament. Alias is TEAM'
/

comment on column tournament_teams.tourn_id is
    'The ID of the tournament the team is participating in. Foreign Key to TOURNAMENTS(id).'
/

comment on column tournament_teams.team_name is
    'The name of the participating team'
/

comment on column tournament_teams.group_stage is
    'If the tournament has an initial group stage, the group identifier that the team is drawn in'
/

…this being 11g, a sequence to provide the values for the TOURNAMENTS.ID synthetic key (in 12c you can define this as part of the table ddl)…

create sequence tourn_id_seq
/
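
(For reference, the 12c approach would look something like this – the table name is just for illustration, and it won’t run on the 11g database used here :

create table identity_demo
(
    -- 12c and later : no separate sequence required
    id number generated always as identity constraint idemo_pk primary key,
    some_text varchar2(10)
)
/

…but I digress.)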

… a Foreign Key constraint from TOURNAMENTS to COMPETITIONS….

alter table tournaments
    add constraint tourn_comp_fk foreign key
        (comp_code) references competitions(comp_code)
/

…and a Foreign Key from TOURNAMENT_TEAMS to TOURNAMENTS…

alter table tournament_teams
    add constraint team_tourn_fk foreign key
        (tourn_id) references tournaments(id)
/

I’ve kept the Foreign Keys in separate files to make the initial deployment of the application simpler. By doing this, I can create the tables in any order without worrying about RI constraints. I can then add these as a separate step after all of the tables have been created.
The tables’ non-RI constraints (Primary, Unique Keys, Not Null constraints etc.) are created along with the table.

One other point to note is that I know there are one or two issues with the first-cut of the DDL above. This is so that I can see how well the tests I write highlight these issues.
As we know, before we begin writing a test, we’ll need to have a stub procedure for it to compile against.

The first of these is :

create or replace package footie.manage_competitions
as
    procedure add_competition
    (
        i_code footie.competitions.comp_code%type,
        i_name footie.competitions.comp_name%type,
        i_desc footie.competitions.description%type
    );
end manage_competitions;
/

create or replace package body footie.manage_competitions
as
    procedure add_competition
    (
        i_code footie.competitions.comp_code%type,
        i_name footie.competitions.comp_name%type,
        i_desc footie.competitions.description%type
    )
    is
    begin
        null;
    end add_competition;
end manage_competitions;
/

Scripting the Tests

I want to write a test script for my first story – Add a Competition.
There are two Acceptance Criteria that I need to test :

  • A new competition is added
  • A competition cannot be added more than once

That’s pretty simple, so the test should be fairly straight-forward. Using a SQL script, the first test would probably look something like this :

set serveroutput on size unlimited
declare
    l_result varchar2(4);
    l_err_msg varchar2(4000) := null;

    l_rec_count pls_integer;
    l_code footie.competitions.comp_code%type;
    l_name footie.competitions.comp_name%type := 'UT1 Comp';
    l_desc footie.competitions.description%type := 'Test';

    l_counter pls_integer := 1;

begin
    -- Setup - make sure that the competition we're using for the test does not already exist
    select 'UT'||numeral
    into l_code
    from
    (
        select max( to_number( substr( comp_code, regexp_instr( comp_code, '[[:digit:]]')))) + 1 as numeral
        from footie.competitions
        where comp_code like 'UT%'
        and regexp_instr( substr( comp_code, -1, 1), '[[:digit:]]') = 1
        union
        select 1 from dual
    )
    where numeral is not null;
    -- execute - nested block so that we can handle/report any exceptions
    begin
        footie.manage_competitions.add_competition(l_code, l_name, l_desc);
        l_result := 'PASS';
    exception
        when others then
            l_err_msg := dbms_utility.format_error_backtrace;
            l_result := 'FAIL';
    end; -- execute block
    -- validate
    if l_result = 'PASS' then
        select count(*) into l_rec_count
        from footie.competitions
        where comp_code = l_code
        and comp_name = l_name
        and description = l_desc;

        if l_rec_count != 1 then
            l_result := 'FAIL';
            l_err_msg := 'Record not added';
        end if;
    end if;
    -- teardown
    rollback;
    -- Display Result
    dbms_output.put_line('Add Competition : '||l_result);
    if l_result = 'FAIL' then
        dbms_output.put_line(l_err_msg);
    end if;
end;
/

If we run this, we’d expect it to fail, as things stand :

PL/SQL procedure successfully completed.

Add Competition : FAIL
Record not added

Our second test will probably look like this :

set serveroutput on size unlimited
declare
    l_result varchar2(4);
    l_err_msg varchar2(4000) := null;
    l_err_code number;

    l_rec_count pls_integer;
    l_code footie.competitions.comp_code%type := 'UT2';
    l_name footie.competitions.comp_name%type := 'UT2 Comp';
    l_desc footie.competitions.description%type := null;

    l_counter pls_integer := 1;

begin
    -- Setup - make sure that the competition we're using for the test exists
    merge into footie.competitions
    using dual
    on (comp_code = l_code)
    when not matched then
    insert( comp_code, comp_name, description)
    values(l_code, l_name, l_desc);

    -- execute - nested block so that we can handle/report any exceptions
    begin
        footie.manage_competitions.add_competition(l_code, l_name, l_desc);
        l_result := 'FAIL';
    exception
        when others then
            l_err_msg := dbms_utility.format_error_backtrace;
            l_result := 'PASS';
    end; -- execute block
    -- validate
    if l_result = 'PASS' then
        select count(*) into l_rec_count
        from footie.competitions
        where comp_code = l_code
        and comp_name = l_name
        and nvl(description, 'X') = nvl(l_desc, 'X');

        if l_rec_count > 1 then
            l_result := 'FAIL';
            l_err_msg := 'Duplicate record has been added';
        end if;
    end if;
    -- teardown
    rollback;
    -- Display Result
    dbms_output.put_line('Add Competition : '||l_result);
    if l_result = 'FAIL' then
        dbms_output.put_line(l_err_msg);
    end if;
end;
/

That looks rather similar to our first test. Furthermore, the two test scripts combined add up to quite a lot of code.
At least when we run it, it fails as expected…

PL/SQL procedure successfully completed.

Add Competition : FAIL

The next step is to write the code to pass the tests…

create or replace package body manage_competitions
as
	procedure add_competition
	(
		i_code competitions.comp_code%type,
		i_name competitions.comp_name%type,
		i_desc competitions.description%type
	)
	is
	begin
        insert into competitions( comp_code, comp_name, description)
        values( i_code, i_name, i_desc);
	end add_competition;
end manage_competitions;
/

Now when we run our tests we get …

PL/SQL procedure successfully completed.

Add Competition : PASS

PL/SQL procedure successfully completed.

Add Competition : PASS

Now, I could carry on writing SQL scripts for Acceptance Criteria for all of the other stories in the sprint, but I think you get the idea.
Writing tests this way requires a large amount of code, much of which is being replicated in separate tests.

Of course, you could go down the route of moving some of your repeating test routines into PL/SQL packages and deploying your test code along with your application code to non-production environments.
Before going to all of that extra effort though, it’s probably worth checking to see if there’s a framework out there that can help reduce the testing burden.
Fortunately, I’ve got a couple of years until the next major international tournament so, I’ll be taking some time to do just that.
So, tune in next week ( or sometime soon after) when I’ll be taking a look at the first of these frameworks…SQLDeveloper Unit Testing


Filed under: Oracle, PL/SQL, SQL Tagged: external table, foreign key, Test Driven Development, unique key

The Oracle Data Dictionary – Keeping an eye on your application in uncertain times

Sun, 2016-07-17 13:15

I’ve got to say that it’s no surprise that we’re leaving Europe. It’s just that we expected it to be on penalties, probably to Germany.
Obviously, that “we” in the last gag is England. Wales and Northern Ireland have shown no sense of decorum and continued to antagonise our European Partners by beating them at football.
Currently, the national mood seems to be that of a naughty child who stuck their fingers in the light socket to see what would happen, and were awfully surprised when it did.

In the midst of all this uncertainty, I’ve decided to seek comfort in the reassuringly familiar.
Step forward the Oracle Data Dictionary – Oracle’s implementation of the Database Catalog.

However closely you follow the Thick Database Paradigm, the Data Dictionary will serve as the Swiss Army Knife in your toolkit for ensuring Maintainability.
I’ll start of with a quick (re)introduction of the Data Dictionary and how to search it using the DICTIONARY view.
Then I’ll cover just some of the ways in which the Data Dictionary can help you to get stones out of horses’ hooves… er, keep your application healthy.

Right then….

What’s in the Data Dictionary ?

The answer is, essentially, metadata about any objects you have in your database down to and including source code for any stored program units.
Data Dictionary views tend to come in three flavours :

  • USER_ – anything owned by the currently connected user
  • ALL_ – anything in USER_ plus anything the current user has access to
  • DBA_ – anything in the current database

The Data Dictionary has quite a lot of stuff in it, as you can tell by running this query :

select count(*)
from dictionary
/

You can sift through this mountain of information by having a look at the comments available in DICTIONARY (DICT to its friends) for each of the Views listed.
For example…

select comments
from dict
where table_name = 'USER_TABLES'
/

COMMENTS
--------------------------------------------------
Description of the user's own relational tables

You can see a graphical representation of these USER_ views in whatever Oracle IDE you happen to be using.
For example, in SQLDeveloper…

sqldev_tree

This graphical tree view corresponds roughly to the following Data Dictionary views :

  • USER_TABLES – Description of the user’s own relational tables
  • USER_VIEWS – Description of the user’s own views
  • USER_EDITIONING_VIEWS – Descriptions of the user’s own Editioning Views
  • USER_INDEXES – Description of the user’s own indexes
  • USER_OBJECTS – Objects owned by the user (this includes functions, packages, procedures etc)
  • USER_QUEUES – All queues owned by the user
  • ALL_QUEUE_TABLES – All queue tables accessible to the user
  • USER_TRIGGERS – Triggers having FOLLOWS or PRECEDES ordering owned by the user (includes Cross Edition Triggers)
  • USER_TYPES – Description of the user’s own types
  • USER_MVIEW_LOGS – All materialized view logs owned by the user
  • USER_SEQUENCES – Description of the user’s own SEQUENCEs
  • USER_SYNONYMS – The user’s private synonyms
  • ALL_SYNONYMS – All synonyms for base objects accessible to the user and session (includes PUBLIC synonyms)
  • USER_DB_LINKS – Database links owned by the user
  • ALL_DB_LINKS – Database links accessible to the user
  • ALL_DIRECTORIES – Description of all directories accessible to the user
  • ALL_EDITIONS – Describes all editions in the database
  • USER_XML_SCHEMAS – Description of XML Schemas registered by the user
  • USER_SCHEDULER_JOBS – All scheduler jobs in the database
  • RESOURCE_VIEW – Whilst not part of the DICTIONARY per se, you can see details of XML DB Schema in this view
  • USER_RECYCLEBIN – User view of his recyclebin
  • ALL_USERS – Information about all users of the database

As all of this metadata is available in views, it can be interrogated programmatically via SQL, as we’ll discover shortly. Before that though, let’s introduce…

The Brexit Schema

To add an element of topicality, the following examples will be based on this schema.

The user creation script looks like this :

grant connect, create table, create procedure, create sequence
    to brexit identified by ceul8r
/

alter user brexit default tablespace users
/
alter user brexit quota unlimited on users
/

You’ll probably want to choose your own (weak) pun-based password.

The tables in this schema are ( initially at least)…

create table countries
(
    iso_code varchar2(3),
    coun_name varchar2(100) not null,
    curr_code varchar2(3) not null,
    is_eu_flag varchar2(1)
)
/

create table currencies
(
    iso_code varchar2(3) constraint curr_pk primary key,
    curr_name varchar2(100)
)
/

For reasons which will become apparent, we’ll also include this procedure, complete with “typo” to ensure it doesn’t compile…

create or replace procedure add_currency
(
	i_iso_code currencies.iso_code%type,
	i_curr_name currencies.curr_name%type
)
as

begin
	-- Deliberate Mistake...
	brick it for brexit !
	insert into currencies( iso_code, curr_name)
	values( i_iso_code, i_curr_name);
end add_currency;
/

The examples that follow are based on the assumption that you are connected as the BREXIT user.

First up….

Spotting tables with No Primary Keys

Say that we want to establish whether a Primary Key has been defined for each table in the schema.
Specifically, we want to check permanent tables which comprise the core application tables. We’re less interested in checking on Global Temporary Tables or External Tables.
Rather than wading through the relevant DDL scripts, we can get the Data Dictionary to do the work for us :

select table_name
from user_tables
where temporary = 'N' -- exclude GTTs
and table_name not in
(
    -- exclude External Tables ...
    select table_name
    from user_external_tables
)
and table_name not in
(
    -- see if table has a Primary Key
    select table_name
    from user_constraints
    where constraint_type = 'P'
)
/

TABLE_NAME
------------------------------
COUNTRIES

It looks like someone forgot to add constraints to the countries table. I blame the shock of Brexit. Anyway, we’d better fix that…

alter table countries add constraint
	coun_pk primary key (iso_code)
/

…and add an RI constraint whilst we’re at it…

alter table countries add constraint
	coun_curr_fk foreign key (curr_code) references currencies( iso_code)
/

…so that I’ve got some data with which to test…

Foreign Keys with No Indexes

In OLTP applications especially, it’s often a good idea to index any columns that are subject to a Foreign Key constraint in order to improve performance.
To see if there are any FK columns in our application that may benefit from an index…

with cons_cols as
(
    select cons.table_name,  cons.constraint_name,
        listagg(cols.column_name, ',') within group (order by cols.position) as columns
    from user_cons_columns cols
    inner join user_constraints cons
		on cols.constraint_name = cons.constraint_name
	where cons.constraint_type = 'R'
    group by cons.table_name, cons.constraint_name
),
ind_cols as
(
select ind.table_name, ind.index_name,
    listagg(ind.column_name, ',') within group( order by ind.column_position) as columns
from user_ind_columns  ind
group by ind.table_name, ind.index_name
)
select cons_cols.table_name, cons_cols.constraint_name, cons_cols.columns
from cons_cols
where cons_cols.table_name not in
(
    select ind_cols.table_name
    from ind_cols
    where ind_cols.table_name = cons_cols.table_name
    and ind_cols.columns like cons_cols.columns||'%'
)
/

Sure enough, when we run this as BREXIT we get…

TABLE_NAME		       CONSTRAINT_NAME	    COLUMNS
------------------------------ -------------------- ------------------------------
COUNTRIES		       COUN_CURR_FK	    CURR_CODE
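
If we decide that an index is in order, the fix is simple enough (the index name here is just my choice) :

create index coun_curr_fk_idx on countries( curr_code)
/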

Post Deployment Checks

It’s not just the Data Model that you can keep track of.
If you imagine a situation where we’ve just released the BREXIT code to an environment, we’ll want to check that everything has worked as expected. To do this, we may well recompile any PL/SQL objects in the schema to ensure that everything is valid….

exec dbms_utility.compile_schema(user)

…but once we’ve done this we want to make sure. So…

select object_name, object_type
from user_objects
where status = 'INVALID'
union
select constraint_name, 'CONSTRAINT'
from user_constraints
where status = 'DISABLED'
/

OBJECT_NAME		       OBJECT_TYPE
------------------------------ -------------------
ADD_CURRENCY		       PROCEDURE

Hmmm, I think we’d better fix that, but how do we find out what the error is without recompiling ? hmmm…

select line, position, text
from user_errors
where name = 'ADD_CURRENCY'
and type = 'PROCEDURE'
order by sequence
/

LINE POSITION TEXT
---- -------- --------------------------------------------------------------------------------
  10        8 PLS-00103: Encountered the symbol "IT" when expecting one of the following:     

                 := . ( @ % ;
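
Armed with that information, the fix is just a matter of removing the offending line and re-creating the procedure – something like :

create or replace procedure add_currency
(
    i_iso_code currencies.iso_code%type,
    i_curr_name currencies.curr_name%type
)
as
begin
    insert into currencies( iso_code, curr_name)
    values( i_iso_code, i_curr_name);
end add_currency;
/

…after which our query against USER_OBJECTS and USER_CONSTRAINTS should come back with no rows.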

Impact Analysis

Inevitably, at some point during the life of your application, you will need to make a change to it. This may well be a change to a table structure, or even to some reference data you previously thought was immutable.
In such circumstances, you really want to get a reasonable idea of what impact the change is going to have in terms of changes to your application code.
For example, if we need to make a change to the CURRENCIES table…

select name, type
from user_dependencies
where referenced_owner = user
and referenced_name = 'CURRENCIES'
and referenced_type = 'TABLE'
union all
select child.table_name, 'TABLE'
from user_constraints child
inner join user_constraints parent
	on child.r_constraint_name = parent.constraint_name
where child.constraint_type = 'R'
and parent.table_name = 'CURRENCIES'
/

NAME                           TYPE
------------------------------ ------------------
ADD_CURRENCY                   PROCEDURE
COUNTRIES                      TABLE             

Now we know the objects that are potentially affected by this proposed change, we have the scope of our Impact Analysis, at least in terms of objects in the database.

Conclusion

As always, there’s far more to the Data Dictionary than what we’ve covered here.
Steven Feuerstein has written a more PL/SQL focused article on this topic.
That about wraps it up for now, so time for Mexit.


Filed under: Oracle, PL/SQL, SQL Tagged: Data Dictionary, dbms_utility.compile_schema, dict, dictionary, listagg, thick database paradigm, user_constraints, user_cons_columns, USER_DEPENDENCIES, user_errors, user_ind_columns, user_objects, user_tables

Oracle – Pinning table data in the Buffer Cache

Thu, 2016-06-23 15:13

As I write, Euro 2016 is in full swing.
England have managed to get out of the Group Stage this time, finishing second to the mighty…er…Wales.
Fortunately Deb hasn’t mentioned this…much.

In order to escape the Welsh nationalism that is currently rampant in our house, let’s try something completely unrelated – a tale of Gothic Horror set in an Oracle Database…

It was a dark stormy night. Well, it was dark and there was a persistent drizzle. It was Britain in summertime.
Sitting at his desk, listening to The Sisters of Mercy ( required to compensate for the lack of a thunderstorm and to maintain the Gothic quotient) Frank N Stein was struck by a sudden inspiration.
“I know”, he thought, “I’ll write some code to cache my Reference Data Tables in a PL/SQL array. I’ll declare the array as a package header variable so that the data is available for the entire session. That should cut down on the amount of Physical IO my application needs to do !”

Quite a lot of code later, Frank’s creation lurched off toward Production.
The outcome wasn’t quite what Frank had anticipated. The code that he had created was quite complex and hard to maintain. It was also not particularly quick.
In short, Frank’s caching framework was a bit of a monster.

In case you’re wondering, no, this is not in any way autobiographical. I am not Frank (although I will confess to owning a Sisters of Mercy album).
I am, in fact, one of the unfortunates who had to support this application several years later.

OK, it’s almost certain that none of the developers who spawned this particular delight were named after a fictional mad doctor…although maybe they should have been.

In order to prevent others from suffering from a similar misapplication of creative genius, what I’m going to look at here is :

  • How Oracle caches table data in Memory
  • How to work out what tables are in the cache
  • Ways in which you can “pin” tables in the cache (if you really need to)

Fortunately, Oracle memory management is fairly robust so there will be no mention of leeks.

Data Caching in Action

Let’s start with a simple illustration of data caching in Oracle.

To begin with, I’m going to make sure that there’s nothing in the cache by running …

alter system flush buffer_cache
/

…which, provided you have DBA privileges should come back with :

System FLUSH altered.

Now, with the aid of autotrace, we can have a look at the difference between retrieving cached and uncached data.
To start with, in SQL*Plus :

set autotrace on
set timing on

…and then run our query :

select *
from hr.departments
/

The first time we execute this query, the timing and statistics output will be something like :

...
27 rows selected.

Elapsed: 00:00:00.08
...

Statistics
----------------------------------------------------------
	106  recursive calls
	  0  db block gets
	104  consistent gets
	 29  physical reads
	  0  redo size
       1670  bytes sent via SQL*Net to client
	530  bytes received via SQL*Net from client
	  3  SQL*Net roundtrips to/from client
	  7  sorts (memory)
	  0  sorts (disk)
	 27  rows processed

If we now run the same query again, we can see that things have changed a bit…

...
27 rows selected.

Elapsed: 00:00:00.01
...

Statistics
----------------------------------------------------------
	  0  recursive calls
	  0  db block gets
	  8  consistent gets
	  0  physical reads
	  0  redo size
       1670  bytes sent via SQL*Net to client
	530  bytes received via SQL*Net from client
	  3  SQL*Net roundtrips to/from client
	  0  sorts (memory)
	  0  sorts (disk)
	 27  rows processed

The second run was a fair bit faster. This is mainly because the data required to resolve the query was cached after the first run.
Therefore, the second execution required no Physical I/O to retrieve the result set.

So, exactly how does this caching malarkey work in Oracle ?

The Buffer Cache and the LRU Algorithm

The Buffer Cache is part of the System Global Area (SGA) – an area of RAM used by Oracle to cache various things that are generally available to any sessions running on the Instance.
The allocation of blocks into and out of the Buffer Cache is achieved by means of a Least Recently Used (LRU) algorithm.

You can see details of this in the Oracle documentation but, in very simple terms, we can visualise the workings of the Buffer Cache like this :

lru_algorithm

When a data block is first read from disk, it’s loaded into the middle of the Buffer Cache.
If it’s then “touched” frequently, it will work its way towards the hot end of the cache.
Otherwise it will move to the cold end and ultimately be discarded to make room for other data blocks that are being read.
Sort of…

The Small Table Threshold

In fact, blocks that are retrieved as the result of a Full Table Scan will only be loaded into the mid-point of the cache if the size of the table in question does not exceed the Small Table Threshold.
The usual definition of this ( unless you’ve been playing around with the hidden initialization parameter _small_table_threshold) is a table that is no bigger than 2% of the buffer cache.
As we’re using the default Automated Memory Management here, it can be a little difficult to pin down exactly what this is.
Fortunately, we can find out (provided we have SYS access to the database) by running the following query :

select cv.ksppstvl value,
    pi.ksppdesc description
from x$ksppi pi
inner join x$ksppcv cv
on cv.indx = pi.indx
and cv.inst_id = pi.inst_id
where pi.inst_id = userenv('Instance')
and pi.ksppinm = '_small_table_threshold'
/

VALUE      DESCRIPTION
---------- ------------------------------------------------------------
589        lower threshold level of table size for direct reads

The current size of the Buffer Cache can be found by running :

select component, current_size
from v$memory_dynamic_components
where component = 'DEFAULT buffer cache'
/

COMPONENT                                                        CURRENT_SIZE
---------------------------------------------------------------- ------------
DEFAULT buffer cache                                                251658240

Now I’m not entirely sure about this but I believe that the Small Table Threshold is reported in database blocks.
The Buffer Cache size from the query above is definitely in bytes.
The database we’re running on has a uniform block size of 8k.
Therefore, the Buffer Cache is around 30,720 blocks.
This would make 2% of it around 614 blocks, which is slightly more than the 589 being reported as the Small Table Threshold.
If you want to explore further down this particular rabbit hole, have a look at this article by Jonathan Lewis.
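
If you’d rather not do the mental arithmetic, something like this (using V$MEMORY_DYNAMIC_COMPONENTS again, together with the DB_BLOCK_SIZE parameter) will do the sums for you :

select round( mem.current_size / to_number( par.value)) as cache_blocks,
    round( ( mem.current_size / to_number( par.value)) * 0.02) as two_percent_of_cache
from v$memory_dynamic_components mem
    cross join v$parameter par
where mem.component = 'DEFAULT buffer cache'
and par.name = 'db_block_size'
/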

This all sounds pretty good in theory, but how do we know for definite that our table is in the Buffer Cache ?

What’s in the Buffer Cache ?

In order to answer this question, we need to have a look at the V$BH view. The following query should prove adequate for now :

select obj.owner, obj.object_name, obj.object_type,
    count(buf.block#) as cached_blocks
from v$bh buf
inner join dba_objects obj
    on buf.objd = obj.data_object_id
where buf.class# = 1 -- data blocks
and buf.status != 'free'
and obj.owner = 'HR'
and obj.object_name = 'DEPARTMENTS'
and obj.object_type = 'TABLE'
group by obj.owner, obj.object_name, obj.object_type
/

OWNER                OBJECT_NAME          OBJECT_TYPE          CACHED_BLOCKS
-------------------- -------------------- -------------------- -------------
HR                   DEPARTMENTS          TABLE                            5

Some things to note about this query :

  • the OBJD column in v$bh joins to data_object_id in DBA_OBJECTS and not object_id
  • we’re excluding any blocks with a status of free because they are, in effect, empty and available for re-use
  • the class# value needs to be set to 1 – data blocks

So far we know that there are data blocks from our table in the cache. But we need to know whether all of the table is in the cache.

Time for another example…

We need to know how many data blocks the table actually has. Provided the statistics on the table are up to date we can get this from the DBA_TABLES view.

First of all then, let’s gather stats on the table…

exec dbms_stats.gather_table_stats('HR', 'DEPARTMENTS')

… and then check in DBA_TABLES…

select blocks
from dba_tables
where owner = 'HR'
and table_name = 'DEPARTMENTS'
/

    BLOCKS
----------
	 5

Now, let’s flush the cache….

alter system flush buffer_cache
/

…and try a slightly different query…


select *
from hr.departments
where department_id = 60
/
DEPARTMENT_ID DEPARTMENT_NAME		     MANAGER_ID LOCATION_ID
------------- ------------------------------ ---------- -----------
	   60 IT				    103        1400

We can now use the block total in DBA_TABLES to tell how much of the HR.DEPARTMENTS table is in the cache …

select obj.owner, obj.object_name, obj.object_type,
    count(buf.block#) as cached_blocks,
    tab.blocks as total_blocks
from v$bh buf
inner join dba_objects obj
    on buf.objd = obj.data_object_id
inner join dba_tables tab
    on tab.owner = obj.owner
    and tab.table_name = obj.object_name
    and obj.object_type = 'TABLE'
where buf.class# = 1
and buf.status != 'free'
and obj.owner = 'HR'
and obj.object_name = 'DEPARTMENTS'
and obj.object_type = 'TABLE'
group by obj.owner, obj.object_name, obj.object_type, tab.blocks
/

OWNER	   OBJECT_NAME	   OBJECT_TYP CACHED_BLOCKS TOTAL_BLOCKS
---------- --------------- ---------- ------------- ------------
HR	   DEPARTMENTS	   TABLE		  1	       5

As you’d expect the data blocks for the table will only be cached as they are required.
With a small, frequently used reference data table, you can probably expect it to be fully cached fairly soon after the application is started.
Once it is cached, the way the LRU algorithm works should ensure that the data blocks are constantly in the hot end of the cache.

In the vast majority of applications, this will be the case. So, do you really need to do anything ?

If your application is not currently conforming to this sweeping generalisation then you probably want to ask a number of questions before taking any precipitous action.
For a start, is the small, frequently accessed table you expect to see in the cache really frequently accessed ? Is your application really doing what you think it does ?
Whilst we’re on the subject, are there any rogue queries running more regularly than you might expect, causing blocks to be aged out of the cache prematurely ?
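
One way of getting a feel for this (provided you have access to the V$ views) is to see which statements are doing the most physical I/O. For example, for our HR application :

select sql_id, executions, disk_reads, buffer_gets,
    substr( sql_text, 1, 60) as sql_text
from v$sql
where parsing_schema_name = 'HR'
order by disk_reads desc
/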

Once you’re satisfied that the problem does not lie with your application, or your understanding of how it operates, the next question will probably be, has sufficient memory been allocated for the SGA ?
There are many ways you can look into this. If you’re fortunate enough to have the Tuning and Diagnostic Packs licensed there are various advisors that can help.
Even if you don’t, you can always take a look at V$SGA_TARGET_ADVICE.
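
A quick look at that view can be as simple as :

select sga_size, sga_size_factor, estd_db_time, estd_physical_reads
from v$sga_target_advice
order by sga_size
/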

If, after all of that, you’re stuck with the same problem, there are a few options available to you, starting with…

The Table CACHE option

This table property can be set so that a table’s data blocks are loaded into the hot end of the LRU as soon as they are read into the Buffer Cache, rather than the mid-point, which is the default behaviour.

Once again, using HR.DEPARTMENTS as our example, we can check the current setting on this table simply by running …

select cache
from dba_tables
where owner = 'HR'
and table_name = 'DEPARTMENTS'
/

CACHE
-----
    N

At the moment then, this table is set to be cached in the usual way.

To change this….

alter table hr.departments cache
/

Table HR.DEPARTMENTS altered.

When we check again, we can see that the CACHE property has been set on the table…

select cache
from dba_tables
where owner = 'HR'
and table_name = 'DEPARTMENTS'
/

CACHE
-----
    Y

This change does have one other side effect that is worth bearing in mind.
It causes the LRU algorithm to ignore the Small Table Threshold and dump all of the selected blocks into the hot end of the cache.
Therefore, if you do this on a larger table, you do run the risk of flushing other frequently accessed blocks from the cache, thus causing performance degradation elsewhere in your application.

The KEEP Cache

Normally you’ll have a single Buffer Cache for an instance. If you have multiple block sizes defined in your database then you will have a Buffer Cache for each block size. However, you can define additional Buffer Caches and assign segments to them.

The idea behind the Keep Cache is that it will hold frequently accessed blocks without ageing them out.
It’s important to note that the population of the KEEP CACHE uses the identical algorithm to that of the Buffer Cache. The difference here is that you select which tables use this cache…

In order to take advantage of this, we first need to create a KEEP Cache :

alter system set db_keep_cache_size = 8m scope=both
/

System altered.

Note that, on my XE 11gR2 instance at least, the minimum size for the Keep Cache appears to be 8 MB ( or 1024 8k blocks).
We can now see that we do indeed have a Keep Cache…

select component, current_size
from v$memory_dynamic_components
where component = 'KEEP buffer cache'
/

COMPONENT               CURRENT_SIZE
----------------------  ------------
KEEP buffer cache       8388608

Now we can assign our table to this cache….

alter table hr.departments
    storage( buffer_pool keep)
/

Table altered.

We can see that this change has had an immediate effect :

select buffer_pool
from dba_tables
where owner = 'HR'
and table_name = 'DEPARTMENTS'
/

BUFFER_POOL
---------------
KEEP

If we run the following…

alter system flush buffer_cache
/

select * from hr.departments
/

select * from hr.employees
/

…we can see which cache is being used for each table, by amending our Buffer Cache query…

select obj.owner, obj.object_name, obj.object_type,
    count(buf.block#) as cached_blocks,
    tab.blocks as total_blocks,
    tab.buffer_pool as Cache
from v$bh buf
inner join dba_objects obj
    on buf.objd = obj.data_object_id
inner join dba_tables tab
    on tab.owner = obj.owner
    and tab.table_name = obj.object_name
    and obj.object_type = 'TABLE'
where buf.class# = 1
and buf.status != 'free'
and obj.owner = 'HR'
and obj.object_type = 'TABLE'
group by obj.owner, obj.object_name, obj.object_type,
    tab.blocks, tab.buffer_pool
/   

OWNER      OBJECT_NAME          OBJECT_TYPE     CACHED_BLOCKS TOTAL_BLOCKS CACHE
---------- -------------------- --------------- ------------- ------------ -------
HR         EMPLOYEES            TABLE                       5            5 DEFAULT
HR         DEPARTMENTS          TABLE                       5            5 KEEP

Once again, this approach seems rather straightforward. You have total control over what goes in the Keep Cache so why not use it ?
On closer inspection, it becomes apparent that there may be some drawbacks.

For a start, the KEEP and RECYCLE caches are not automatically managed by Oracle. So, unlike the Default Buffer Cache, if the KEEP Cache finds it needs a bit more space then it’s stuck, it can’t “borrow” some from other caches in the SGA. The reverse is also true, Oracle won’t allocate spare memory from the KEEP Cache to other SGA components.
You also need to keep track of which tables you have assigned to the KEEP Cache. If the number of blocks in those tables is greater than the size of the cache, then you’re going to run the risk of blocks being aged out, with the potential performance degradation that that entails.
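
Keeping track is simple enough with a bit of help from the Data Dictionary. For example, this query (assuming a uniform 8k block size, as on my database) compares the space required by the tables assigned to the KEEP pool with the size of the cache itself :

select sum( tab.blocks) as keep_table_blocks,
    round( max( mem.current_size) / 8192) as keep_cache_blocks
from dba_tables tab
    cross join v$memory_dynamic_components mem
where tab.buffer_pool = 'KEEP'
and mem.component = 'KEEP buffer cache'
/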

Conclusion

Oracle is pretty good at caching frequently used data blocks and thus minimizing the amount of physical I/O required to retrieve data from small, frequently used, reference tables.
If you find yourself in a position where you just have to persuade Oracle to keep data in the cache then the table CACHE property is probably your least worst option.
Creating a KEEP Cache does have the advantage of affording greater manual control over what is cached. The downside here is that it also requires some maintenance effort to ensure that you don’t assign too much data to it.
The other downside is that you are ring-fencing RAM that could otherwise be used for other SGA memory components.
Having said that, the options I’ve outlined here are all better than sticking a bolt through the neck of your application and writing your own database caching in PL/SQL.


Filed under: Oracle, SQL Tagged: alter system flush buffer_cache, autotrace, buffer cache, dba_objects, dbms_stats.gather_table_stats, Default Buffer cache, how to find the current small table threshold, Keep Cache, lru algorithm, small table threshold, Table cache property, v$bh, v$memory_dynamic_components, what tables are in the buffer cache, x$ksppcv, x$ksppi

What’s Special About Oracle ? Relational Databases and the Thick Database Paradigm

Fri, 2016-06-03 09:46

A wise man (or woman – the quote is unattributed) once said that assumption is the mother of all cock-ups.
This is especially true in the wonderful world of databases.
The term NoSQL covers databases as different from each other as they are from the traditional Relational Database Management Systems (RDBMS).
The assumption implicit in that last sentence is that Relational Databases are broadly the same.

The problems with this assumption begin to manifest themselves when a team is assembled to write a new application running on an Oracle RDBMS.

Non-Oracle developers may have been used to treating databases as merely a persistence layer. Their previous applications may well have been written to be Database Agnostic.
This is a term which is likely to cause consternation among Oracle Developers, or at least, Oracle Developers who have ever tried to implement and support a Database Agnostic application running on Oracle. They may well think of this approach as the “Big Skip” anti-pattern where the database is treated as a dumping ground for any old rubbish the application feels like storing.

As a consequence, they will strongly favour the application being “Front-End Agnostic”. In other words, they will lean toward the Thick Database Paradigm as a template for application architecture.
With all of this Agnosticism about it’s amazing how religious things can get as the relative merits of these opposing views are debated.

These diametrically opposing views on the optimum architecture for a database centric application all stem from that one assumption about Relational Databases.
To make things even more interesting, both sides in this debate share this assumption.
The fact of the matter is that Oracle is very different from other RDBMSs. Oracle Developers need to appreciate this so that they can accept that the Database Agnostic Architecture is a legitimate choice for some RDBMSs and is not simply the result of non-Oracle Developers not knowing anything about databases.
The other point to note is that Oracle is very different from other RDBMS – OK, it’s technically the same point, but it’s such an important one, it’s worth mentioning twice.
Non-Oracle Developers need to understand this so that they can accept that the Thick Database Paradigm is a legitimate choice for the Oracle RDBMS and not simply the result of technological parochialism on the part of Oracle Developers.

Whatever kind of developer you are, you’re probably wondering just what I’m banging on about right now and where this is going.
Well, the purpose of this post is to take several steps back from the normal starting point for the debate over the optimal application architecture for a Database Centric Application on Oracle and set out :

  • Why Relational Databases are different from each other
  • Why the Thick Database Approach can be particularly suited to Oracle
  • Under what circumstances this may not be the case

Hopefully, by the end I’ll have demonstrated to any non-Oracle Developers reading this that the Thick Database Paradigm is at least worth considering when developing this type of application when Oracle is the RDBMS.
I will also have reminded any Oracle Developers that Oracle is a bit different to other RDBMS and that this needs to be pointed out to their non-Oracle colleagues when the subject of application architecture is being discussed.
I will attempt to keep the discussion at a reasonably high-level, but there is the odd coding example.
Where I’ve included code, I’ve used the standard Oracle demo tables from the HR application.
There are several good articles that do dive into the technical nitty-gritty of the Thick Database Paradigm on Oracle and I have included links to some of them at the end of this post.

I can already hear some sniggering when the term Thick Database gets used. Yes, you there in the “Web Developers Do It Online” t-shirt.
In some ways it would be better to think of this as the Intelligent Database Paradigm, if only to cater to those with a more basic sense of humour.

Assumptions

Before I go too much further, I should really be clear about the assumptions all of this is based on.

Application Requirements

To keep things simple, I’m going to assume that our theoretical application implements some form of On-Line Transaction Processing (OLTP) functionality.
Of course, I’m going to assume that Oracle is the chosen database platform (or at least, the one you’re stuck with).
Most importantly, I’m going to assume that the fundamental non-functional requirements of the application are :

  • Accuracy
  • Performance
  • Security
  • Maintainability
Terminology

On a not entirely unrelated topic, I should also mention that some terms, when used in the context of the Oracle RDBMS, have slightly different meanings to those you might expect…

  • database – normally a term used to describe the database objects in an application – in Oracle we’d call this a schema. This is because database objects in Oracle must be owned by a database user or schema.
  • stored procedure – it’s common practice in PL/SQL to collect procedures and functions into Packages – so you’ll often hear the term Packaged Procedures, Packages, or Stored Program Units to cover this
  • database object – this is simply any discrete object held in the database – tables, views, packages etc
  • transaction – by default, Oracle implements the ANSI SQL behaviour whereby a transaction consists of one or more SQL statements. A transaction is normally terminated explicitly by issuing a COMMIT or a ROLLBACK command.
The HR Schema

This is normally pre-installed with every Oracle database, although your DBA may have removed it as part of the installation.
If you want to follow along and it’s not installed, you can find the build script for it in :

$ORACLE_HOME/demo/schema/human_resources/hr_main.sql

Note that the script requires you to provide the SYS password for the database.

I’ve created copies of two of the tables from this application, the EMPLOYEES and DEPARTMENTS tables, for use in the examples below.

Database Agnostic and Thick Database – definitions

To keep things simple, we can explain each in the context of the Model-View-Controller(MVC) design pattern.

In MVC, the application components are divided into three categories :

  • The View – the GUI
  • The Controller – where all of the application logic exists. This layer sits in the middle between the view and the…
  • Model – the persistence layer – traditionally a Relational Database implementing a Physical Data Model

The Database Agnostic Approach is to treat the Model simply as a persistence layer. Implementation of Referential Integrity in the database is minimal and the implementation of any business logic is done entirely in the Controller layer, little or none of which impinges upon the RDBMS in the Model.
The main idea behind this approach is that it is trivial to migrate the application from one RDBMS to another.

The Thick Database Paradigm takes a very different approach.
The Referential Integrity is rigorously applied in the RDBMS, and the Data Model is implemented in some approximation of Third Normal Form.
The Controller layer is in fact implemented as two physical layers.
The code outside of the database – the Data Access Layer (DAL) – accesses the model by means of a Transactional API (XAPI), which is held in Stored Procedures inside the RDBMS engine itself.
We’re going to explore the advantages of this approach in the context of the Oracle RDBMS.

There we are then, something for everyone to object to.
The thing is, both of these approaches have their place. The trick is to know the circumstances under which one is more appropriate.
It may help at this point then, if we can return to the question of…

Why RDBMSs are different from each other

Maybe that heading should read “Are RDBMSs different from each other ?” Superficially at least, they do seem to have a fair bit in common.
To start with, they all implement the relational model to some degree. This means that data is arranged in tables and that (in the main) it is possible to define relationships between these tables.
For circumstances where a Business Transaction may require multiple DML statements, the RDBMS will allow the creation of Stored Procedures so that such transactions can be performed in a single call to the database.
The most obvious similarity is, of course, that any retrieval of or amendment to data stored in the database is ultimately done by means of a Structured Query Language (SQL) statement.

A fundamental characteristic of SQL is that it is a Declarative Language. You use it to tell the database what data you want to access. It is the Database Engine that then has to figure out how to do this.

Whilst the implementation of SQL is (more-or-less) standard across RDBMSs, the underlying Database Engines behave very differently.

One example of the differences between Database Engines can be seen when you need to execute a query that contains many table joins.
If you were running such a query on MSSQL, it may well be more efficient to do this in multiple steps. This would be done by writing a query to populate a temporary table with a result set and then joining from that table to return the final results.
This contrasts with Oracle, where the optimal approach is usually to do this with a single SQL statement.
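
By way of illustration, the sort of thing that would typically be handed to the CBO as a single statement on Oracle might look like the sketch below. Note that it borrows the LOCATIONS table from the standard HR schema (it's not one of the tables copied for this post), so treat it as illustrative rather than definitive :

select emp.first_name, emp.last_name, dept.department_name, loc.city
from employees emp
inner join departments dept
    on emp.department_id = dept.department_id
inner join hr.locations loc
    on dept.location_id = loc.location_id
where loc.city = 'Seattle'
/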

For the moment, I’ll assume that the above has been sufficient to persuade you that Relational Databases are in fact different from each other in a fairly fundamental way.
Feel free to put this to the test yourself. Go ahead, I’ll wait….

OK. We’re all agreed on that then.
The question you’re now asking is this –

If RDBMSs are different from each other, is the Database Agnostic approach the best architecture for all of them ?

The next thing we need to understand is….

Why Oracle is Special

“Because it’s so expensive !” may well be your first thought. Remember that we’re assuming that Oracle is the RDBMS platform that you have chosen ( or been lumbered with) for your application. This being the case, we come back to the question of why the Thick Database Paradigm is worthy of consideration for your application architecture.

Returning to our list of non-functional application requirements, can you guess which of the application components is likely to have the biggest impact on performance of an Oracle Database Application ? Clue : It’s also the most expensive thing to change after go-live as it’s the card at the bottom of the house of cards that is your Application….

The Physical Data Model

This aspect of the Thick Database Paradigm is often overlooked. However, it is by far the most important aspect in maximizing the success of the implementation of this architectural approach.

Oh, that’s your sceptical face, isn’t it. You’re really not entirely sure about this. You’re probably not alone, even some Oracle Developers will be giving me that same look about now. I hope this next bit is convincing because my flame-proof underpants are currently in the wash.

OK, as I said a little while ago ( and I think you pretty much agreed at the time), any interaction with data stored in an RDBMS will ultimately require the execution of an SQL statement by the Database Engine.
The particular bit of the Oracle Kernel that works out the how is probably called KX$ something. Friends however, tend to refer to it as the Cost Based Optimizer (CBO).

The CBO is pretty sophisticated. The more information you can provide Oracle about your data model, the better the execution plans the CBO will generate.
The upshot is that the better the data model, the faster statements against it will run.

For example, the CBO understands RI constraints and can account for them in its execution plans as I will now demonstrate…

I’ve copied the EMPLOYEES and DEPARTMENTS tables, including data, from the standard Oracle demo – the HR Application.

The DEPARTMENTS table looks like this :

create table departments
(
    department_id number(4) not null,
    department_name varchar2(30) not null,
    manager_id number(6),
    location_id number(4)
)
/

alter table departments
    add constraint departments_pk primary key( department_id)
/  

…and the EMPLOYEES like this :

create table employees
(
    employee_id number(6) not null,
    first_name varchar2(20),
    last_name varchar2(25) not null,
    email varchar2(25) not null,
    phone_number varchar2(20),
    hire_date date not null,
    job_id varchar2(10) not null,
    salary number(8,2),
    commission_pct number(2,2),
    manager_id number(6),
    department_id number(4)
)
/

alter table employees 
    add constraint employees_pk  primary key (employee_id)
/

Note that whilst DEPARTMENT_ID is listed in both tables I’ve not implemented any RI constraints at this point.

Now consider the following query

select emp.first_name, emp.last_name, dept.department_id
from employees emp
inner join departments dept
    on emp.department_id = dept.department_id
where emp.department_id = 60
/

If we ask the CBO for an execution plan for this query…

explain plan for
select emp.first_name, emp.last_name, dept.department_id
from employees emp
inner join departments dept
    on emp.department_id = dept.department_id
where emp.department_id = 60
/

… it will come back with something like this :

select *                  
from table(dbms_xplan.display)
/

Plan hash value: 2016977165

-------------------------------------------------------------------------------------
| Id  | Operation	   | Name	    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |		    |	  5 |	110 |	  3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS	   |		    |	  5 |	110 |	  3   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN| DEPARTMENTS_PK |	  1 |	  4 |	  0   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| EMPLOYEES	    |	  5 |	 90 |	  3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DEPT"."DEPARTMENT_ID"=60)
   3 - filter("EMP"."DEPARTMENT_ID"=60)

16 rows selected.

If we now add a constraint to ensure that a DEPARTMENT_ID in the EMPLOYEES table must already exist in the DEPARTMENTS table…

alter table employees 
    add constraint emp_dept_fk foreign key (department_id) references departments(department_id)
/   

…and then get the execution plan…

explain plan for
select emp.first_name, emp.last_name, dept.department_id
from employees emp
inner join departments dept
    on emp.department_id = dept.department_id
where emp.department_id = 60
/

select *                  
from table(dbms_xplan.display)
/

Plan hash value: 1445457117

-------------------------------------------------------------------------------
| Id  | Operation	  | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |	      |     5 |    90 |     3	(0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMPLOYEES |     5 |    90 |     3	(0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("EMP"."DEPARTMENT_ID"=60)

13 rows selected.

…we can see that the CBO is smart enough to know that the RI constraint eliminates the need to read the DEPARTMENTS table at all for this query.

A sensible data model has some other key benefits.

For example…

insert into hr.employees
(
    employee_id, 
    first_name, 
    last_name,
    email, 
    hire_date, 
    job_id,
    department_id
)
values
(
    207,
    'MIKE',
    'S',
    'mikes',
    sysdate,
    'IT_PROG',
    999 -- department_id does not exist in the DEPARTMENTS table
)
/

…results in …

SQL Error: ORA-02291: integrity constraint (HR.EMP_DEPT_FK) violated - parent key not found

Simply by typing a one-line statement to add this constraint, we’ve prevented the possibility of orphaned records being added to our application.
Better still, this rule will be enforced however the data is added – not just for records added via the application.

About now, non-Oracle developers may well be making the point that this logic needs to be implemented in the application code anyway. By adding it to the data model, aren’t we effectively coding the same functionality twice ?
Well, as we can see from the example above, the code required to create an RI constraint is minimal. Also, once it’s created, it exists in Oracle, there is no need to explicitly invoke it every time you need to use it.
Additionally, if you fully adopt the Thick Database approach, you don’t necessarily have to write code to re-implement rules enforced by constraints.
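
For example, rather than re-coding the "department must exist" rule, the PL/SQL layer can let the constraint do the work and simply translate the error. A minimal sketch (the anonymous block and the error message are purely illustrative) :

declare
    -- map ORA-02291 (parent key not found) to a named exception
    e_no_such_dept exception;
    pragma exception_init( e_no_such_dept, -2291);
begin
    insert into employees( employee_id, first_name, last_name, email, hire_date, job_id, department_id)
    values( 208, 'MIKE', 'S', 'mikes2', sysdate, 'IT_PROG', 999);
exception
    when e_no_such_dept then
        -- the constraint has already enforced the rule; just report it nicely
        raise_application_error( -20001, 'That department does not exist');
end;
/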

One other point that may well come up is the fact that most of the world now uses some variation of the Agile Development Methodology. Producing a complete data model in Sprint 1 is going to be a bit of a tall order for an application of even moderate complexity.
This is true. However, by implementing the Data Access Layer (DAL) pattern and separating the application code from the underlying data model, it’s possible to create stubs in place of parts of the data model that haven’t been developed. This does make it possible to fit Data Modelling into the structure required by these methodologies.

The key point here is that, even if this is the only bit of the Thick Database Paradigm you implement, your application will be far more maintainable.
The tuning tools at your disposal will be far more effective and useful if your application is based on a well defined, relational data model.

Whilst we’re on the subject of Application Code, it’s probably worth asking….

What are Stored Procedures Good For ?

In order to understand this, we need to look at the concept of a Database Transaction.
The ANSI Standard for SQL mandates that a transaction consists of one or more DML statements. In general terms, if the transaction is committed then all of the changes made by each statement in the transaction are saved. Otherwise, none of them are.
By default, many RDBMSs implement a transaction as a single SQL statement. In Oracle, the default behaviour conforms to the ANSI Standard.
In circumstances where a Business Transaction requires multiple DML statements, things can get a bit tricky without a Stored Procedure.
The application needs to issue multiple individual statements and commit each one in turn.
If a second or subsequent statement fails for any reason then you find that your data is left in an inconsistent state.
Stored Procedures solve this problem by bundling these statements up into a single transaction.
We’ll have a look at a specific example of this approach in Oracle using…

PL/SQL

The typical approach taken by vendors to implement Stored Procedures in an RDBMS involves providing some extensions to SQL to make it Turing Complete.
These extensions ( variable declaration, conditional statements, looping) are normally fairly minimal.
Oracle took a rather different approach with PL/SQL.
They took the Ada programming language and provided it with SQL extensions.
From the start then, PL/SQL was rather more fully featured than your average Stored Procedure language.
In the almost 30 years of its existence, PL/SQL has been further integrated into the RDBMS engine. Also, the addition of thousands of Oracle-supplied libraries (packages) has extended its functionality to the point where it can be used for tasks as diverse as inter-session communication, backup and recovery, and sending e-mail.
Being a fully-fledged 3GL embedded into the heart of the database engine, PL/SQL is the fastest language for processing data in Oracle.
This is partly due to the fact that the code is co-located with the data, so network latency and bandwidth are not really an issue.
Yes, and you thought the idea of co-locating code and data was invented when those whizzy NoSQL databases came along, didn’t you ?
PL/SQL allows the developer to take a set-based approach to working with data. You can pretty much drop a SQL DML statement straight into a PL/SQL program without (necessarily) having to build it as a string first.
Furthermore, remember that transactions can encompass multiple database changes. By implementing these in PL/SQL, the entire transaction can be completed with a single database call, something that is not necessarily the case when the Controller code is outside of the database.
Implementing Business Transactions in PL/SQL is commonly done using the Transactional API (XAPI) pattern.

There is one particular aspect of Ada which has become central to the way that PL/SQL applications are written and that is the Package.
Rather than having lots of standalone procedures and functions, it is common practice to group these “stored procedures” into PL/SQL packages.
This approach has several advantages.
Grouping related functionality into packages reduces the number of individual programs you need to keep track of.
PL/SQL packages are stored in Oracle’s Database Catalogue ( the Data Dictionary) as two distinct objects – a Package Header or Specification – essentially the signature of all of the functions and procedures in the package ( package members) – and a Package Body – the actual code.
The Package Header is the object that is called to invoke a member procedure.
Provided you are not changing the signature of a public package member, you can amend the code in the package body without having to re-compile the header.
This means that you can make changes to the transactional code “under the hood” without necessarily requiring any re-coding in the caller to a packaged procedure.

Right, it’s time for an example.

Say we want to change the Manager of a Department. In this case, the current IT Department Manager – Alexander Hunold has decided that he’s far too busy to attend all of those planning meetings. I mean he took this job so he didn’t have to speak to anybody. You can tell he’s not really a people person, I mean just look at that T-shirt.
Diana Lorentz on the other hand, whilst also having the required technical background has a much better way with people.
So, in order to change the manager in the IT Department from Alexander to Diana we need to :

  1. Update the record in the DEPARTMENTS table with the ID of the new manager
  2. Update the EMPLOYEES records for members of that department so that they now report to the new manager
  3. Update the EMPLOYEES record for the new manager so that she now reports to the Department’s overall boss

Among other things, we’ll need to know which DEPARTMENT_ID we need to make these changes for. This would normally be selected from a drop-down list in the Application’s UI, with the name of the Department being displayed to the user but the ID being passed to our procedure.
Whilst the list of Departments is static/reference data and may well be cached on the mid-tier of our application to save repeated database calls, we’ll still need a means of getting this data out of the database initially.
Therefore, we may well have a package that contains two members :

  • a function to return the department information
  • a procedure to assign the new manager

Such a package will probably look something like this. First the Package Header…

create or replace package manage_departments
is
    --
    -- This is the package header or specification.
    -- It gives the signature of all public package members (functions and procedures)
    --
    function get_department_list return sys_refcursor;
    procedure change_manager
    (
        i_department_id departments.department_id%type,
        i_old_manager_id employees.employee_id%type,
        i_new_manager_id departments.manager_id%type
    );
end manage_departments;
/

… and now the body…

create or replace package body manage_departments
is
    --
    -- This is the package body.
    -- It contains the actual code for the functions and procedures in the package
    --
    function get_department_list return sys_refcursor
    is
        l_rc sys_refcursor;
    begin
        open l_rc for
            select department_name, department_id
            from departments;
        return l_rc;
    end get_department_list;
    
    procedure change_manager
    (
        i_department_id departments.department_id%type,
        i_old_manager_id employees.employee_id%type,
        i_new_manager_id departments.manager_id%type
    )
    is
        l_dept_head_manager employees.manager_id%type;
    begin
        --
        -- First update the department record with the new manager
        --
        update departments
        set manager_id = i_new_manager_id
        where department_id = i_department_id;

        -- Now find the Manager of the existing department head
        -- we'll need this to assign to the new department head
        --
        select manager_id 
        into l_dept_head_manager
        from employees
        where employee_id = i_old_manager_id;        
        --
        -- Now update all of the employees in that department to
        -- report to the new manager...apart from the new manager themselves
        -- who reports to the department head.
        update employees
        set manager_id = 
            case when employee_id != i_new_manager_id 
                then i_new_manager_id
                else l_dept_head_manager
            end
        where department_id = i_department_id;        
        --
        -- Note - for the purposes of simplicity I have not included any
        -- error handling.
        -- Additionally, best practice is normally to allow transaction control
        -- to be determined by the caller of a procedure so an explicit commit
        -- or rollback needs to take place there.
        --
    end change_manager;
end manage_departments;
/

Using the Oracle CLI, SQL*Plus, as the caller, we can see how the function works :

set autoprint on
set pages 0
var depts refcursor
exec :depts := manage_departments.get_department_list

PL/SQL procedure successfully completed.

Administration				  10
Marketing				  20
Purchasing				  30
Human Resources 			  40
Shipping				  50
IT					  60
Public Relations			  70
Sales					  80
Executive				  90
Finance 				 100
Accounting				 110
Treasury				 120
Corporate Tax				 130
Control And Credit			 140
Shareholder Services			 150
Benefits				 160
Manufacturing				 170
Construction				 180
Contracting				 190
Operations				 200
IT Support				 210
NOC					 220
IT Helpdesk				 230
Government Sales			 240
Retail Sales				 250
Recruiting				 260
Payroll 				 270

27 rows selected.

Now we need to call the procedure to change the manager. In order to keep things simple, I’ve cheated a bit here and not included the code to lookup the EMPLOYEE_IDs of Alexander (103) and Diana ( 107).

So, using SQL*Plus once again :

exec manage_departments.change_manager(60, 103, 107)
commit;

NOTE – it is also possible (and often preferred) to pass parameters using named notation when calling PL/SQL. So, the following call would work equally well ( and possibly be a bit more readable) :

exec manage_departments.change_manager( i_department_id => 60, i_old_manager_id => 103, i_new_manager_id => 107);
commit;

We can now see that both of the DML changes have been applied :

select emp.first_name||' '||emp.last_name, dept.manager_id
from departments dept
inner join employees emp
    on dept.manager_id = emp.employee_id
where dept.department_id = 60
/ 

EMP.FIRST_NAME||''||EMP.LAST_NAME	       MANAGER_ID
---------------------------------------------- ----------
Diana Lorentz					      107


select first_name, last_name, manager_id
from employees
where department_id = 60
/

FIRST_NAME	     LAST_NAME		       MANAGER_ID
-------------------- ------------------------- ----------
Alexander	     Hunold			      107
Bruce		     Ernst			      107
David		     Austin			      107
Valli		     Pataballa			      107
Diana		     Lorentz			      102

The fact that Packages are stored in the Data Dictionary means that Oracle automatically keeps track of the dependencies that they have on other database objects.
This makes impact analysis much easier. For example, if we were going to make a change to the DEPARTMENTS table, we could see what other database objects might be impacted by running the following query on the Data Dictionary :

select name, type
from user_dependencies
where referenced_name = 'DEPARTMENTS'
and referenced_type = 'TABLE'
/

NAME			       TYPE
------------------------------ ------------------
MANAGE_DEPARTMENTS	       PACKAGE
MANAGE_DEPARTMENTS	       PACKAGE BODY

One more significant benefit of using PL/SQL is that any parameters passed into a stored procedure – whether part of a package or standalone – are automatically bound.
Bind variables are advantageous for two reasons.
Firstly, use of them enables Oracle to re-execute frequently invoked statements from memory, without having to re-validate them each time. This is known as a soft parse. This offers significant performance benefits.
The second, and perhaps more important advantage is that bind variables tend not to be susceptible to SQL Injection strings.
Effectively, calling a PL/SQL stored program unit is the equivalent of making a Prepared Statement call.
Whilst this automatic binding does not render PL/SQL completely immune from SQL Injection, it does greatly reduce the attack surface for this kind of exploit.
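
You can see the prepared statement analogy from the client side as well. A quick SQL*Plus sketch, functionally equivalent to the earlier call, using explicit bind variables :

var dept_id number
var old_mgr number
var new_mgr number

exec :dept_id := 60
exec :old_mgr := 103
exec :new_mgr := 107

-- the values are passed as binds rather than being concatenated into the statement
exec manage_departments.change_manager( :dept_id, :old_mgr, :new_mgr)
commit;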

In-Memory Processing

In-Memory processing is big at the moment. It’s one of those things like Big Data in that there is lots of enthusiasm around something which, to be frank, has already been happening for many years.
Oracle has some rather sophisticated memory management out of the box.
As already mentioned, SQL and PL/SQL code that is frequently executed, together with the meta-data required to parse it, is cached in memory.
The same is true for frequently used data blocks. In other words, if you have data that is frequently accessed, Oracle will look to store this in memory, thus reducing the amount of physical I/O it needs to do.
This has nothing to do with Oracle’s newfangled “In-memory” option. It’s a core part of the product.
Generally speaking, the more of your application code runs inside the RDBMS, the more it can benefit from this built-in caching.

Benefits of the Thick Database Paradigm

When measured against the non-functional requirements for our application, the Thick Database approach ticks all of the boxes.

Accuracy

Referential Integrity in the Data Model means that we can prevent incorrect data from being stored.
The flexibility of PL/SQL and its close coupling with SQL means that we can easily implement business rules to ensure system accuracy.
By implementing an XAPI layer in PL/SQL, we ensure that there is a single point of entry into the application. Because business transactions always execute the same code, we can ensure that the results of those transactions are repeatable, and accurate.

Performance

As we have seen, a well-defined Data Model allows the CBO to choose the optimum execution plan for each query.
The use of bind variables ensures that frequently executed statements are cached in memory.
The fact that most of the processing happens inside the database engine means that network latency is minimized as a performance overhead.
By its very nature, any application that manipulates and stores data will increase the amount of data it handles over time.
This increase in data volumes will start to affect performance.
Oracle is designed and optimized to handle data stored in relational structures. Having a properly defined data model will enable you to maximise the effectiveness of the tuning tools at your disposal.

Maintainability

Having your application code in a single location ( i.e. the PL/SQL XAPI layer) means that code is not replicated across multiple application layers.
As PL/SQL is tightly coupled with SQL, it also means that you tend to need fewer lines of code to implement application functionality.
Having the application code in the database means that dependency tracking comes “for free” by means of the Data Dictionary.
This is especially handy when doing Impact Analysis on any application changes that may be required down the line.

Security

PL/SQL parameters are bound auto-magically. Unless you’re being careless with some dynamic SQL inside of the PL/SQL code itself, these parameters are pretty much immune to SQL Injection.

Still feeling sceptical after reading that ? Good. Whilst I have provided some evidence to support these assertions, it’s not what you’d call incontrovertible.
But I’m getting ahead of myself. Before summarising, I did say that there may be some circumstances where this approach may not be suitable…

When the Thick Database Paradigm may not be appropriate

By its very nature the Thick Database approach on Oracle RDBMS puts an Application smack in the middle of an Oracle “walled garden”.
If you ever want to migrate to another RDBMS, the task is unlikely to be straightforward.
Yes, PostgreSQL’s procedural language (PL/pgSQL) is similar in nature to PL/SQL. As I’ve never attempted a migration from Oracle to Postgres, I can’t comment on whether this lessens the effort required.

So, if you’re in a situation where you know that your application will need to move to another RDBMS in the short term, the pain of sub-optimal performance on Oracle may be worth the gain when you come to do the migration.
A word of warning here – I have personal experience of applications that were only supposed to be on Oracle for six months after Go-live…and were still in Production several years later.

Alternatively, you may be a software vendor who needs to support your application across multiple database platforms.
The benefit of having a single code base for all supported platforms may outweigh the overhead of the additional effort required to address the issues that will almost certainly arise when running a Database Agnostic application on an Oracle RDBMS.
If you do find yourself in this situation then you may consider recommending a database other than Oracle to your clients.

It is worth pointing out however, that in either case, a well-designed physical data model where Referential Integrity is enforced by means of constraints will provide substantial mitigation to some of the performance issues you may encounter.

This is certainly not going to help with an application using the Entity-Attribute-Value (EAV) model.
I would suggest that if EAV is absolutely essential to your solution then a Relational Database almost certainly isn’t.

Summary and Further Reading

If you’ve made it this far, I hope that you have at least been persuaded that the Thick Database Paradigm is not a completely bonkers way of writing an application against an Oracle database.
That’s not to say that you’re sold on the idea by any means. As I’ve said already, what I’ve attempted to do here is provide some illustrations as to why this approach is preferred among Oracle Developers. It’s not cast-iron proof that this is the case with your specific application.
What you’ll probably want to do now is read up a bit more on this approach following which, you may well want to do some testing to see if all of these claims stack up.

So, if you want some proper, in-depth technical discussions on the Thick Database Paradigm, these links may be of some use :

If and when you do come to do some testing, it’s important to remember that the benefits of the Thick Database approach – certainly in performance terms – become more apparent the greater the volume of data and transactions the application needs to handle.
Running performance tests against the tiny HR application that I’ve used here is probably not going to tell you too much.


Filed under: Oracle, PL/SQL, SQL Tagged: CBO, database agnostice, dbms_xplan, Ref cursors, Referential Integrity, thick database paradigm, USER_DEPENDENCIES

Null is Odd…or Things I Used to Know about SQL Aggregate Functions

Mon, 2016-05-16 15:03

Brendan McCullum recently played his final Test for New Zealand.
That’s something of an understatement. In his last game he made a century in a mere 54 balls, a feat unmatched in 139 years of test cricket.
From the outside looking in, it seemed that McCullum had come to realise something he’d always known. Playing cricket is supposed to be fun.
What’s more, you can consider yourself quite fortunate if you get paid for doing something you enjoy, especially when that something is hitting a ball with a stick.

With the help of Mr McCullum, what follows will serve to remind me of something I’ve always known but may forget from time to time.
In my case, it’s the fact that NULL is odd. This is especially true when it comes to basic SQL aggregation functions.

Some test data

We’ve got a simple table that holds the number of runs scored by McCullum in each of his Test innings together with a nullable value to indicate whether or not he was dismissed in that innings.

This is relevant because one of the things we’re going to do is calculate his batting average.

In Cricket, the formula for this is :

Runs Scored / (Innings Batted – Times Not Out)

Anyway, here’s the table :

create table mccullum_inns
(
    score number not null,
    not_out number
)
/

…and the data…

insert into mccullum_inns( score,not_out) values (57, null);
insert into mccullum_inns( score,not_out) values (19, 1);
insert into mccullum_inns( score,not_out) values (13, null);
insert into mccullum_inns( score,not_out) values (55, null);
insert into mccullum_inns( score,not_out) values (3, null);
insert into mccullum_inns( score,not_out) values (5, null);
insert into mccullum_inns( score,not_out) values (96, null);
insert into mccullum_inns( score,not_out) values (54, null);
insert into mccullum_inns( score,not_out) values (20, null);
insert into mccullum_inns( score,not_out) values (21, null);
insert into mccullum_inns( score,not_out) values (4, null);
insert into mccullum_inns( score,not_out) values (143, null);
insert into mccullum_inns( score,not_out) values (17, 1);
insert into mccullum_inns( score,not_out) values (10, null);
insert into mccullum_inns( score,not_out) values (8, null);
insert into mccullum_inns( score,not_out) values (10, null);
insert into mccullum_inns( score,not_out) values (36, null);
insert into mccullum_inns( score,not_out) values (29, null);
insert into mccullum_inns( score,not_out) values (24, null);
insert into mccullum_inns( score,not_out) values (3, null);
insert into mccullum_inns( score,not_out) values (25, null);
insert into mccullum_inns( score,not_out) values (0, null);
insert into mccullum_inns( score,not_out) values (99, null);
insert into mccullum_inns( score,not_out) values (7, null);
insert into mccullum_inns( score,not_out) values (0, null);
insert into mccullum_inns( score,not_out) values (111, null);
insert into mccullum_inns( score,not_out) values (24, null);
insert into mccullum_inns( score,not_out) values (19, null);
insert into mccullum_inns( score,not_out) values (74, null);
insert into mccullum_inns( score,not_out) values (23, null);
insert into mccullum_inns( score,not_out) values (31, null);
insert into mccullum_inns( score,not_out) values (33, null);
insert into mccullum_inns( score,not_out) values (5, null);
insert into mccullum_inns( score,not_out) values (0, null);
insert into mccullum_inns( score,not_out) values (5, null);
insert into mccullum_inns( score,not_out) values (0, null);
insert into mccullum_inns( score,not_out) values (14, 1);
insert into mccullum_inns( score,not_out) values (43, null);
insert into mccullum_inns( score,not_out) values (17, null);
insert into mccullum_inns( score,not_out) values (9, null);
insert into mccullum_inns( score,not_out) values (26, null);
insert into mccullum_inns( score,not_out) values (13, null);
insert into mccullum_inns( score,not_out) values (21, null);
insert into mccullum_inns( score,not_out) values (7, null);
insert into mccullum_inns( score,not_out) values (40, null);
insert into mccullum_inns( score,not_out) values (51, null);
insert into mccullum_inns( score,not_out) values (0, null);
insert into mccullum_inns( score,not_out) values (25, null);
insert into mccullum_inns( score,not_out) values (85, null);
insert into mccullum_inns( score,not_out) values (9, null);
insert into mccullum_inns( score,not_out) values (42, null);
insert into mccullum_inns( score,not_out) values (97, null);
insert into mccullum_inns( score,not_out) values (24, null);
insert into mccullum_inns( score,not_out) values (11, null);
insert into mccullum_inns( score,not_out) values (0, null);
insert into mccullum_inns( score,not_out) values (9, null);
insert into mccullum_inns( score,not_out) values (71, null);
insert into mccullum_inns( score,not_out) values (25, null);
insert into mccullum_inns( score,not_out) values (2, null);
insert into mccullum_inns( score,not_out) values (66, null);
insert into mccullum_inns( score,not_out) values (8, null);
insert into mccullum_inns( score,not_out) values (3, null);
insert into mccullum_inns( score,not_out) values (30, null);
insert into mccullum_inns( score,not_out) values (84, 1);
insert into mccullum_inns( score,not_out) values (25, null);
insert into mccullum_inns( score,not_out) values (31, null);
insert into mccullum_inns( score,not_out) values (19, null);
insert into mccullum_inns( score,not_out) values (3, null);
insert into mccullum_inns( score,not_out) values (84, null);
insert into mccullum_inns( score,not_out) values (115, null);
insert into mccullum_inns( score,not_out) values (24, null);
insert into mccullum_inns( score,not_out) values (6, null);
insert into mccullum_inns( score,not_out) values (1, null);
insert into mccullum_inns( score,not_out) values (29, null);
insert into mccullum_inns( score,not_out) values (18, null);
insert into mccullum_inns( score,not_out) values (13, null);
insert into mccullum_inns( score,not_out) values (78, null);
insert into mccullum_inns( score,not_out) values (0, null);
insert into mccullum_inns( score,not_out) values (0, null);
insert into mccullum_inns( score,not_out) values (24, null);
insert into mccullum_inns( score,not_out) values (89, null);
insert into mccullum_inns( score,not_out) values (185, null);
insert into mccullum_inns( score,not_out) values (19, 1);
insert into mccullum_inns( score,not_out) values (24, null);
insert into mccullum_inns( score,not_out) values (104, null);
insert into mccullum_inns( score,not_out) values (5, null);
insert into mccullum_inns( score,not_out) values (51, null);
insert into mccullum_inns( score,not_out) values (65, null);
insert into mccullum_inns( score,not_out) values (11, 1);
insert into mccullum_inns( score,not_out) values (4, null);
insert into mccullum_inns( score,not_out) values (225, null);
insert into mccullum_inns( score,not_out) values (40, null);
insert into mccullum_inns( score,not_out) values (25, null);
insert into mccullum_inns( score,not_out) values (56, null);
insert into mccullum_inns( score,not_out) values (35, null);
insert into mccullum_inns( score,not_out) values (2, null);
insert into mccullum_inns( score,not_out) values (64, null);
insert into mccullum_inns( score,not_out) values (14, null);
insert into mccullum_inns( score,not_out) values (11, null);
insert into mccullum_inns( score,not_out) values (34, null);
insert into mccullum_inns( score,not_out) values (1, null);
insert into mccullum_inns( score,not_out) values (16, null);
insert into mccullum_inns( score,not_out) values (12, null);
insert into mccullum_inns( score,not_out) values (83, null);
insert into mccullum_inns( score,not_out) values (48, null);
insert into mccullum_inns( score,not_out) values (58, 1);
insert into mccullum_inns( score,not_out) values (61, null);
insert into mccullum_inns( score,not_out) values (5, null);
insert into mccullum_inns( score,not_out) values (31, null);
insert into mccullum_inns( score,not_out) values (0, null);
insert into mccullum_inns( score,not_out) values (25, null);
insert into mccullum_inns( score,not_out) values (84, null);
insert into mccullum_inns( score,not_out) values (0, null);
insert into mccullum_inns( score,not_out) values (19, null);
insert into mccullum_inns( score,not_out) values (22, null);
insert into mccullum_inns( score,not_out) values (42, null);
insert into mccullum_inns( score,not_out) values (0, null);
insert into mccullum_inns( score,not_out) values (23, null);
insert into mccullum_inns( score,not_out) values (68, null);
insert into mccullum_inns( score,not_out) values (13, null);
insert into mccullum_inns( score,not_out) values (4, null);
insert into mccullum_inns( score,not_out) values (35, null);
insert into mccullum_inns( score,not_out) values (7, null);
insert into mccullum_inns( score,not_out) values (51, null);
insert into mccullum_inns( score,not_out) values (13, null);
insert into mccullum_inns( score,not_out) values (11, null);
insert into mccullum_inns( score,not_out) values (74, null);
insert into mccullum_inns( score,not_out) values (69, null);
insert into mccullum_inns( score,not_out) values (38, null);
insert into mccullum_inns( score,not_out) values (67, 1);
insert into mccullum_inns( score,not_out) values (2, null);
insert into mccullum_inns( score,not_out) values (8, null);
insert into mccullum_inns( score,not_out) values (20, null);
insert into mccullum_inns( score,not_out) values (1, null);
insert into mccullum_inns( score,not_out) values (21, null);
insert into mccullum_inns( score,not_out) values (22, null);
insert into mccullum_inns( score,not_out) values (11, null);
insert into mccullum_inns( score,not_out) values (113, null);
insert into mccullum_inns( score,not_out) values (9, null);
insert into mccullum_inns( score,not_out) values (37, null);
insert into mccullum_inns( score,not_out) values (12, null);
insert into mccullum_inns( score,not_out) values (224, null);
insert into mccullum_inns( score,not_out) values (1, null);
insert into mccullum_inns( score,not_out) values (8, null);
insert into mccullum_inns( score,not_out) values (302, null);
insert into mccullum_inns( score,not_out) values (7, null);
insert into mccullum_inns( score,not_out) values (17, null);
insert into mccullum_inns( score,not_out) values (4, null);
insert into mccullum_inns( score,not_out) values (3, null);
insert into mccullum_inns( score,not_out) values (31, null);
insert into mccullum_inns( score,not_out) values (25, null);
insert into mccullum_inns( score,not_out) values (18, null);
insert into mccullum_inns( score,not_out) values (39, null);
insert into mccullum_inns( score,not_out) values (43, null);
insert into mccullum_inns( score,not_out) values (45, null);
insert into mccullum_inns( score,not_out) values (202, null);
insert into mccullum_inns( score,not_out) values (195, null);
insert into mccullum_inns( score,not_out) values (0, null);
insert into mccullum_inns( score,not_out) values (22, null);
insert into mccullum_inns( score,not_out) values (42, null);
insert into mccullum_inns( score,not_out) values (0, null);
insert into mccullum_inns( score,not_out) values (41, null);
insert into mccullum_inns( score,not_out) values (55, null);
insert into mccullum_inns( score,not_out) values (6, null);
insert into mccullum_inns( score,not_out) values (80, null);
insert into mccullum_inns( score,not_out) values (27, null);
insert into mccullum_inns( score,not_out) values (4, null);
insert into mccullum_inns( score,not_out) values (20, null);
insert into mccullum_inns( score,not_out) values (75, null);
insert into mccullum_inns( score,not_out) values (17, 1);
insert into mccullum_inns( score,not_out) values (18, null);
insert into mccullum_inns( score,not_out) values (18, null);
insert into mccullum_inns( score,not_out) values (0, null);
insert into mccullum_inns( score,not_out) values (10, null);
insert into mccullum_inns( score,not_out) values (145, null);
insert into mccullum_inns( score,not_out) values (25, null);
commit;

I’ve loaded this into my Oracle 11g Express Edition (XE) database.

Don’t count on COUNT()

Let’s just check the number of rows in the table :

select count(*), count(score), count(not_out)
from mccullum_inns
/

  COUNT(*) COUNT(SCORE) COUNT(NOT_OUT) 
---------- ------------ --------------
       176          176             9
       

Hmmm, that’s interesting. Whilst there are 176 rows in the table, a count of the NOT_OUT column only returns 9, which is the number of rows with a non-null value in this column.

The fact is that COUNT(*) behaves a bit differently from COUNT(some_column)…

with stick as
(
    select 1 as ball from dual
    union all select 2 from dual
    union all select null from dual
)    
select count(*), count(ball)
from stick
/

COUNT(*)                             COUNT(BALL)
---------- ---------------------------------------
         3                                       2

Tanel Poder provides the explanation as to why this happens here.
Due to this difference in behaviour, you may well consider that COUNT(*) is a completely different function to COUNT(column), at least where NULLS are concerned.

When all else fails, Read the Manual

From very early on, database developers learn to be wary of columns that may contain null values and code accordingly, making frequent use of the NVL function.
However, aggregate functions can prove to be something of a blind spot. This can lead to some interesting results.
Whilst we know ( and can prove) that NULL + anything equals NULL…

select 3 + 1 + 4 + 1 + null as ball
from dual
/

     BALL
----------
         

…if we use an aggregate function…

with stick as 
(
    select 3 as ball from dual
    union all select 1 from dual
    union all select 4 from dual
    union all select 1 from dual
    union all select null from dual
)
select sum(ball) 
from stick
/

SUM(BALL)
----------
         9

…so, calculating an average may well lead to some confusion…

with stick as 
(
    select 3 as ball from dual
    union all select 1 from dual
    union all select 4 from dual
    union all select 1 from dual
    union all select null from dual
)
select avg(ball)
from stick
/

AVG(BALL)
----------
      2.25

…which is not what we would expect given :

with stick as 
(
    select 3 as ball from dual
    union all select 1 from dual
    union all select 4 from dual
    union all select 1 from dual
    union all select null from dual
)
select sum(ball)/count(*) as Average
from stick
/

   AVERAGE
----------
       1.8 

You can see similar behaviour with the MAX and MIN functions :

with stick as 
(
    select 3 as ball from dual
    union all select 1 from dual
    union all select 4 from dual
    union all select 1 from dual
    union all select null from dual
)
select max(ball), min(ball)
from stick
/

 MAX(BALL)  MIN(BALL)
---------- ----------
         4          1

Looking at the documentation, we can see that :

“All aggregate functions except COUNT(*), GROUPING, and GROUPING_ID ignore nulls. You can use the NVL function in the argument to an aggregate function to substitute a value for a null. COUNT and REGR_COUNT never return null, but return either a number or zero. For all the remaining aggregate functions, if the data set contains no rows, or contains only rows with nulls as arguments to the aggregate function, then the function returns null.”

So, if we want our aggregate functions to behave themselves, or at least, behave as we might expect, we need to account for situations where the column on which they are operating may be null.
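
For example, if we want the average calculated over all five deliveries rather than just the non-null ones, we can apply NVL inside the aggregate. A quick sketch using the same test data :

with stick as 
(
    select 3 as ball from dual
    union all select 1 from dual
    union all select 4 from dual
    union all select 1 from dual
    union all select null from dual
)
select avg(nvl(ball, 0)) as average
from stick
/

   AVERAGE
----------
       1.8
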
Returning to COUNT…

select count(nvl(not_out, 0)) 
from mccullum_inns
/

                 COUNT(NVL(NOT_OUT,0))
---------------------------------------
                                    176
                    

Going back to our original task, i.e. finding McCullum’s final batting average, we could do this :

select count(*) as Inns, 
    sum(score) as Runs,
    sum(nvl(not_out,0)) as "Not Outs",
    round(sum(score)/(count(*) - count(not_out)),2) as Average
from mccullum_inns
/

INNS  RUNS   Not Outs AVERAGE
----- ----- ---------- -------
  176  6453          9   38.64
  

However, now we’ve re-learned how nulls are treated by aggregate functions, we could save ourselves a bit of typing…

select count(*) as Inns, 
    sum(score) as Runs,
    count(not_out) as "Not Outs",
    round(sum(score)/(count(*) - count(not_out)),2) as Average
from mccullum_inns
/

INNS  RUNS   Not Outs AVERAGE
----- ----- ---------- -------
  176  6453          9   38.64

Time to draw stumps.


Filed under: Oracle, SQL Tagged: avg, count(*), max, min, null and aggregate functions, NVL, sum

Standard SQL ? – Oracle REGEXP_LIKE

Thu, 2016-04-28 06:55

Is there any such thing as ANSI Standard SQL ?
Lots of databases claim to conform to this standard. Recent experience tends to make me wonder whether it’s more just a basis for negotiation.
This view is partly the result of having to juggle SQL between three different SQL parsers in the Cloudera Hadoop infrastructure, each with their own “quirks”.
It’s worth remembering however, that SQL differs across established Relational Databases as well, as a recent question from Simon (Teradata virtuoso and Luton Town Season Ticket Holder) demonstrates :

Is there an Oracle equivalent of the Teradata LIKE ANY operator when you want to match against a list of patterns, for example :

like any ('%a%', '%b%')

In other words, can you do a string comparison, including wildcards, within a single predicate in Oracle SQL ?

The short answer is yes, but the syntax is a bit different….

The test table

We’ve already established that we’re not comparing apples with apples, but I’m on a bit of a health kick at the moment, so…

create table fruits as
    select 'apple' as fruit from dual
    union all
    select 'banana' from dual
    union all
    select 'orange' from dual
    union all
    select 'lemon' from dual
/
The multiple predicate approach

Traditionally the search statement would look something like :

select fruit
from fruits
where fruit like '%a%'
or fruit like '%b%'
/

FRUIT 
------
apple 
banana
orange

REGEXP_LIKE

Using REGEXP_LIKE takes a bit less typing and – unusually for a regular expression – fewer non-alphanumeric characters …

select fruit
from fruits
where regexp_like(fruit, '(a)|(b)')
/

FRUIT 
------
apple 
banana
orange

We can also search for multiple substrings in the same way :

select fruit
from fruits
where regexp_like(fruit, '(an)|(on)')
/

FRUIT 
------
banana
orange
lemon 

I know, it doesn’t feel like a proper regular expression unless we’re using the top row of the keyboard.

Alright then, if we just want to get records that start with ‘a’ or ‘b’ :

select fruit
from fruits
where regexp_like(fruit, '(^a)|(^b)')
/

FRUIT 
------
apple 
banana

If instead, we want to match the end of the string…

select fruit
from fruits
where regexp_like(fruit, '(ge$)|(on$)')
/

FRUIT
------
orange
lemon

…and if you want to combine searching for patterns at the start, end or anywhere in a string, in this case searching for records that

  • start with ‘o’
  • or contain the string ‘ana’
  • or end with the string ‘on’

select fruit
from fruits
where regexp_like(fruit, '(^o)|(ana)|(on$)')
/

FRUIT
------
banana
orange
lemon

Finally on this whistle-stop tour of REGEXP_LIKE, for a case insensitive search…

select fruit
from fruits
where regexp_like(fruit, '(^O)|(ANA)|(ON$)', 'i')
/

FRUIT
------
banana
orange
lemon

There’s quite a bit more to regular expressions in Oracle SQL.
For a start, here’s an example of using REGEXP_LIKE to validate a UK Post Code.
There’s also a comprehensive guide here on the PSOUG site.
Now I’ve gone through all that fruit I feel healthy enough for a quick jog… to the nearest pub.
I wonder if that piece of lime they put in the top of a bottle of beer counts as one of my five a day ?


Filed under: Oracle, SQL Tagged: regexp_like

Getting the current SQL statement from SYS_CONTEXT using Fine Grained Auditing

Sun, 2016-04-17 14:44

The stand-off between Apple and the FBI has moved on. In essence both sides have taken it in turns to refuse to tell each other how to hack an iPhone.

Something else that tends to tell little or nothing in the face of repeated interrogation is SYS_CONTEXT(‘userenv’, ‘current_sql’).
If you’re fortunate enough to be running on Enterprise Edition however, a Fine Grained Auditing Policy will loosen its tongue.

Consider the following scenario.
You’ve recently got a job as a database specialist with Spectre.
They’ve been expanding their IT department recently as the result of their “Global Surveillance Initiative”.

There’s not much of a view from your desk as there are no windows in the hollowed out volcano that serves as the Company’s HQ.
The company is using Oracle 12c Enterprise Edition.

Everything seems to be going along nicely until you suddenly get a “request” from the Head of Audit, a Mr Goldfinger.
The requirement is that any changes to employee data in the HR system are recorded, together with the statement executed to change each record.
Reading between the lines, you suspect that Mr White – head of HR – is not entirely trusted by the hierarchy.

Whilst journalling triggers are common enough, capturing the actual SQL used to make DML changes is a bit more of a challenge.
Explaining this to Mr Goldfinger is unlikely to be a career-enhancing move. You’re going to have to be a bit creative if you want to avoid the dreaded “Exit Interview” (followed by a visit to the Piranha tank).

First of all though….

Fine Grained Auditing Configuration

You need to do a quick check to make sure that Fine Grained Auditing is available and configured in the way you would expect.

Access to Fine Grained Auditing

FGA is a feature of Oracle Enterprise Edition.
If you were working on any other edition of the database, Oracle would tell you that FGA is not enabled. For example, running the following on Oracle Express Edition 11g…

begin
    dbms_fga.add_policy
    (
        object_schema => 'HR',
        object_name => 'DEPARTMENTS',
        policy_name => 'WATCHING YOU',
        audit_condition => null,
        statement_types => 'INSERT, UPDATE, DELETE'
    );
end;
/

… will result in the Oracle Database telling you that you’re fish-food …

ERROR at line 1:
ORA-00439: feature not enabled: Fine-grained Auditing
ORA-06512: at "SYS.DBMS_FGA", line 20
ORA-06512: at line 2

You can avoid this embarrassment simply by checking what edition of Oracle you’re running :

select banner
from v$version
/

In the case of Oracle 12c, you’ll get :

BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
PL/SQL Release 12.1.0.2.0 - Production
CORE	12.1.0.2.0	Production
TNS for Linux: Version 12.1.0.2.0 - Production
NLSRTL Version 12.1.0.2.0 - Production

If you don’t happen to work for a worldwide crime syndicate and/or don’t have access to an Enterprise Edition database, you can still have a play around by means of a Developer Day Virtual Box image.

Unified Auditing

The other thing you need to check is just where audit records are going to be written to. This is not so much a requirement for the solution being implemented here, but it is relevant to some of the examples that follow.

By default, unified auditing is not implemented in 12c and you can confirm this by running :

select value
from v$option
where parameter = 'Unified Auditing'
/

If the query returns FALSE, then Unified Auditing has not been enabled.
Otherwise, it’s probably worth taking a look at the documentation to see how this affects auditing behaviour in the database.

Initialization Parameters

Assuming Unified Auditing has not been configured, the location of the audit records will be dictated by the AUDIT_TRAIL initialization parameter. You can check this value as follows :

select value
from v$parameter
where name = 'audit_trail'
/

If the value is set to DB or DB, EXTENDED, then any FGA policies should write to the tables mentioned below.
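
Should you need to change it, bear in mind that AUDIT_TRAIL is a static parameter, so any change will only take effect after a restart. The command would be something along these lines (check the documentation for your version before taking my word for it) :

alter system set audit_trail = db, extended scope = spfile;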

Now to take a closer look at FGA…

How long before SYS_CONTEXT cracks ?

To test exactly when you will be able to retrieve the DML statement you’re interested in, you can knock up a quick test.

First, you need a table to audit against for testing purposes :

create table trigger_messages
(
    message varchar2(4000)
)
/

Next, a simple procedure to insert a record :

create or replace procedure add_message( i_msg in trigger_messages.message%type)
is
begin
    insert into trigger_messages(message) values( i_msg);
end;
/

Now for a DML trigger on the table :

create or replace trigger trg_msg
    for insert or update or delete 
    on trigger_messages
    compound trigger
    
    l_action varchar2(10);
    before statement is
    begin
        l_action := case when inserting then 'INSERT' when updating then 'UPDATE' else 'DELETE' end;
        dbms_output.put_line('Before Statement '||l_action);
        dbms_output.put_line( nvl( sys_context('userenv', 'current_sql'), 'My lips are sealed'));
    end before statement;
    
    before each row is
    begin
        dbms_output.put_line('Before Row '||l_action);
        dbms_output.put_line( nvl( sys_context('userenv', 'current_sql'), 'My lips are sealed'));
    end before each row;
    
    after each row is
    begin
        dbms_output.put_line('After Row '||l_action);
        dbms_output.put_line( nvl( sys_context('userenv', 'current_sql'), 'My lips are sealed'));
    end after each row;
    
    after statement is
    begin
        dbms_output.put_line('After Statement '||l_action);
        dbms_output.put_line( nvl( sys_context('userenv', 'current_sql'), 'My lips are sealed'));
    end after statement;
end trg_msg;
/

Next up, you need a procedure to serve as a handler for a Fine Grained Auditing event. The reason for this will become apparent when we run the test. Note that the signature for an FGA handler procedure is mandated :

create or replace procedure trg_msg_fga
(
    object_schema varchar2,
    object_name varchar2,
    policy_name varchar2
)
is
begin
    dbms_output.put_line('FGA Policy');
    dbms_output.put_line(sys_context('userenv', 'current_sql'));
    dbms_output.put_line(sys_context('userenv', 'current_bind'));
    dbms_output.put_line(sys_context('userenv', 'current_sql_length'));
end;
/

Now all that’s left to do is to create an FGA policy on the table :

begin
    dbms_fga.add_policy
    (
        object_schema => 'MIKE',
        object_name => 'TRIGGER_MESSAGES',
        policy_name => 'FIRING_ORDER',
        statement_types => 'INSERT, UPDATE, DELETE',
        handler_schema => 'MIKE',
        handler_module => 'TRG_MSG_FGA'
    );
end;
/

You can confirm that the policy has been created successfully and is enabled by querying DBA_AUDIT_POLICIES…

select object_schema, object_name, enabled,
    sel, ins, upd, del
from dba_audit_policies
where policy_owner = user
and policy_name = 'FIRING_ORDER'
/

OBJECT_SCHEMA	OBJECT_NAME	     ENABLED	SEL   INS   UPD   DEL
--------------- -------------------- ---------- ----- ----- ----- -----
MIKE		TRIGGER_MESSAGES     YES	NO    YES   YES   YES

Now you’re ready to test…

set serveroutput on size unlimited

begin 
    add_message('Spectre - at the cutting-edge of laser technology');
end;
/

update trigger_messages set message = 'Spectre - coming to a browser near you'
/

delete from trigger_messages
/

The results are quite interesting…

Before Statement INSERT
My lips are sealed
Before Row INSERT
My lips are sealed
FGA Policy
INSERT INTO TRIGGER_MESSAGES(MESSAGE) VALUES( :B1 )
#1(49):Spectre - at the cutting-edge of laser technology
51
After Row INSERT
My lips are sealed
After Statement INSERT
My lips are sealed

PL/SQL procedure successfully completed.

Before Statement UPDATE
My lips are sealed
Before Row UPDATE
My lips are sealed
FGA Policy
update trigger_messages set message = 'Spectre - coming to a browser near you'
78
After Row UPDATE
My lips are sealed
After Statement UPDATE
My lips are sealed

1 row updated.

Before Statement DELETE
My lips are sealed
Before Row DELETE
My lips are sealed
After Row DELETE
My lips are sealed
FGA Policy
delete from trigger_messages
28
After Statement DELETE
My lips are sealed

1 row deleted.

From this you conclude that :

  • sys_context is only populated with the current statement inside the fga handler procedure
  • the handler procedure is invoked prior to the after row event for inserts and updates, but not for deletes

At this point, you consider that it might just be simpler to interrogate the DBA_FGA_AUDIT_TRAIL view, which has also captured the DML statements we’ve just run :

select sql_text
from dba_fga_audit_trail
where policy_name = 'FIRING_ORDER'
order by timestamp
/  

SQL_TEXT
----------------------------------------------------------------------------------------------------------------------------------
INSERT INTO TRIGGER_MESSAGES(MESSAGE) VALUES( :B1 )
update trigger_messages set message = 'Spectre - coming to a browser near you'
delete from trigger_messages

Note – the bind values for the procedure call can be found in the SQL_BIND column of this view.
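
For example, extending the query above to include the binds would give us something like :

select sql_text, sql_bind
from dba_fga_audit_trail
where policy_name = 'FIRING_ORDER'
order by timestamp
/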

However, it’s worth noting that we haven’t actually committed the test transaction, yet these records are already present.
They will remain there, even if the transaction is rolled back.
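
If you want to see this for yourself, you can simply rollback in the same session and re-run the query – the rows should still be there :

rollback;

select count(*)
from dba_fga_audit_trail
where policy_name = 'FIRING_ORDER'
/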

In the end, you decide that the best approach is a journalling trigger…

The Unnecessarily Slow Dipping Mechanism – the DML trigger

Due to the nature of the organization, Spectre doesn’t have employees. It has associates. This is reflected in the table that you need to audit :

create table associates
(
    emp_id number,
    emp_name varchar2(100),
    job_title varchar2(30)
)
/

The table to hold the audit trail will probably look something like this :

create table assoc_audit
(
    action varchar2(6),
    changed_by varchar2(30),
    change_ts timestamp,
    emp_id number,
    emp_name varchar2(100),
    job_title varchar2(30),
    statement varchar2(4000),
    binds varchar2(4000)
)
/

It’s worth pausing at this point to note that SYS_CONTEXT can report up to 32k of a statement.
It does this by splitting the statement into eight 4k chunks, available in the USERENV context variables CURRENT_SQL, CURRENT_SQL1…CURRENT_SQL7.
It also provides the length of the statement it currently holds in the CURRENT_SQL_LENGTH variable.
Therefore, you may consider having a 32k varchar statement column in the audit table (if this is enabled on your database), or even a column for the contents of each of these variables.
For the sake of simplicity, plus the fact that none of the examples here are very large, you decide to stick with just the one 4k varchar column to hold the statement.
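
As an illustration of how you might stitch a longer statement back together, a handler could do something along these lines. This is just a sketch (the function name is mine) and, as we saw above, these context values are only populated whilst the FGA handler is executing :

create or replace function get_full_statement
    return clob
is
    l_stmt clob;
begin
    -- CURRENT_SQL holds the first 4k of the statement...
    l_stmt := sys_context('userenv', 'current_sql');
    -- ...and CURRENT_SQL1 to CURRENT_SQL7 hold any subsequent 4k chunks
    for i in 1..7 loop
        l_stmt := l_stmt || sys_context('userenv', 'current_sql'||i);
    end loop;
    return l_stmt;
end get_full_statement;
/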

There’s a procedure for adding new records to the table :

create or replace procedure add_associate
(
    i_emp_id in associates.emp_id%type,
    i_name in associates.emp_name%type,
    i_job_title in associates.job_title%type
)
is
begin
    insert into associates( emp_id, emp_name, job_title)
    values( i_emp_id, i_name, i_job_title);
end;
/
    

In the real world this would probably be in a package, but hey, you’re working for Spectre.

Now we need a handler for the FGA policy that we’re going to implement. In order for the context values that are captured to be accessible to the trigger, this handler is going to be part of a package which includes a couple of package variables :

create or replace package assoc_fga_handler
as

    g_statement varchar2(4000);
    g_binds varchar2(4000);
    
    -- The procedure to be invoked by the FGA policy.
    -- Note that the signature for this procedure is mandatory
    procedure set_statement
    (
        object_schema varchar2,
        object_name varchar2,
        policy_name varchar2
    );
end assoc_fga_handler;
/

create or replace package body assoc_fga_handler
as

    procedure set_statement
    (
        object_schema varchar2,
        object_name varchar2,
        policy_name varchar2
    )
    is
    begin
        g_statement := sys_context('userenv', 'current_sql');
        g_binds := sys_context('userenv', 'current_bind');
    end set_statement;
end assoc_fga_handler;
/

Now for the trigger. You may notice some compromises here …

create or replace trigger assoc_aud
    for insert or update or delete on associates
    compound trigger

    type typ_audit is table of assoc_audit%rowtype index by pls_integer;
    tbl_audit typ_audit;
    l_idx pls_integer := 0;
    
    after each row is
    begin
        l_idx := tbl_audit.count + 1;
        tbl_audit(l_idx).action := case when inserting then 'INSERT' when updating then 'UPDATE' else 'DELETE' end;
        tbl_audit(l_idx).changed_by := user;
        tbl_audit(l_idx).change_ts := systimestamp;
        tbl_audit(l_idx).emp_id := case when inserting then :new.emp_id else :old.emp_id end;
        tbl_audit(l_idx).emp_name := case when inserting then :new.emp_name else :old.emp_name end;
        tbl_audit(l_idx).job_title := case when inserting then :new.job_title else :old.job_title end;
    end after each row;
    
    after statement is
    begin
        for i in 1..tbl_audit.count loop
            tbl_audit(i).statement := assoc_fga_handler.g_statement;
            tbl_audit(i).binds := assoc_fga_handler.g_binds;
        end loop;
        forall j in 1..tbl_audit.count
            insert into assoc_audit values tbl_audit(j);
        -- cleardown the array
        tbl_audit.delete;    
    end after statement;
end assoc_aud;
/

Due to the fact that the FGA policy is not fired until after an AFTER ROW trigger for a DELETE, we are only guaranteed to capture the CURRENT_SQL value in an AFTER STATEMENT trigger.
The upshot is that we’re left with a PL/SQL array which is not constrained by a LIMIT clause. In these circumstances it’s not too much of an issue – Spectre has quite a small number of employees…er…associates, so you’re not likely to end up with an array large enough to cause memory issues.
On a potentially larger volume of records you may well consider splitting the INSERT and UPDATE portions of the trigger so that you can limit the size of the arrays generated by these operations. For DELETEs however, it appears that we may well be stuck with this approach.
On a not entirely unrelated subject, Jeff Kemp has an interesting method of speeding up Journalling Triggers.

All that remains is to create the FGA policy…

begin
    dbms_fga.add_policy
    (
        object_schema => 'MIKE',
        object_name => 'ASSOCIATES',
        policy_name => 'ASSOCIATES_DML',
        statement_types => 'INSERT, UPDATE, DELETE',
        handler_schema => 'MIKE',
        handler_module => 'ASSOC_FGA_HANDLER.SET_STATEMENT'
    );
end;
/

…and now you can test…

set serveroutput on size unlimited
--
-- Cleardown the tables before running the test
--
truncate table assoc_audit
/

truncate table associates
/

begin
    add_associate(1, 'Odd Job', 'HENCHMAN');
    add_associate(2, 'Jaws', 'HENCHMAN');
    add_associate(3, 'Mayday', 'HENCHWOMAN');
    add_associate(4, 'Ernst Stavro Blofeld', 'CRIMINAL MASTERMIND');
    add_associate(5, 'Emilio Largo', 'Deputy Evil Genius');
    
end;
/

insert into associates( emp_id, emp_name, job_title)
values(6, 'Hans', 'Bodyguard and Piranha keeper')
/

commit;

update associates
set job_title = 'VALET'
where emp_id = 1
/
commit;


delete from associates
where emp_id = 1
/

commit;

-- Spectre is an Equal Opportunities Employer...and I need a statement
-- affecting multiple rows to test so...
update associates
set job_title = 'HENCHPERSON'
where job_title in ('HENCHMAN', 'HENCHWOMAN')
/

commit;

It is with a sense of relief that, when you check the audit table after running this, you find …

select action, emp_name, 
    statement, binds
from assoc_audit
order by change_ts
/

ACTION EMP_NAME 	    STATEMENT							 BINDS
------ -------------------- ------------------------------------------------------------ ----------------------------------------
INSERT Odd Job		    INSERT INTO ASSOCIATES( EMP_ID, EMP_NAME, JOB_TITLE) VALUES(  #1(1):1 #2(7):Odd Job #3(8):HENCHMAN
			     :B3 , :B2 , :B1 )

INSERT Jaws		    INSERT INTO ASSOCIATES( EMP_ID, EMP_NAME, JOB_TITLE) VALUES(  #1(1):2 #2(4):Jaws #3(8):HENCHMAN
			     :B3 , :B2 , :B1 )

INSERT Mayday		    INSERT INTO ASSOCIATES( EMP_ID, EMP_NAME, JOB_TITLE) VALUES(  #1(1):3 #2(6):Mayday #3(10):HENCHWOMAN
			     :B3 , :B2 , :B1 )

INSERT Ernst Stavro Blofeld INSERT INTO ASSOCIATES( EMP_ID, EMP_NAME, JOB_TITLE) VALUES(  #1(1):4 #2(20):Ernst Stavro Blofeld #3(
			     :B3 , :B2 , :B1 )						 19):CRIMINAL MASTERMIND

INSERT Emilio Largo	    INSERT INTO ASSOCIATES( EMP_ID, EMP_NAME, JOB_TITLE) VALUES(  #1(1):5 #2(12):Emilio Largo #3(18):Depu
			     :B3 , :B2 , :B1 )						 ty Evil Genius

INSERT Hans		    insert into associates( emp_id, emp_name, job_title)
			    values(6, 'Hans', 'Bodyguard and Piranha keeper')

UPDATE Odd Job		    update associates
			    set job_title = 'VALET'
			    where emp_id = 1

DELETE Odd Job		    delete from associates
			    where emp_id = 1

UPDATE Jaws		    update associates
			    set job_title = 'HENCHPERSON'
			    where job_title in ('HENCHMAN', 'HENCHWOMAN')

UPDATE Mayday		    update associates
			    set job_title = 'HENCHPERSON'
			    where job_title in ('HENCHMAN', 'HENCHWOMAN')


10 rows selected.

Looks like the Piranhas will be going hungry…for now !


Filed under: Oracle, PL/SQL, SQL Tagged: audit_trail initialization parameter, compound trigger, dba_fga_audit_trail, dbms_fga.add_policy, SYS_CONTEXT, sys_context current_bind, sys_context current_sql, sys_context current_sql_length, v$option, v$parameter, v$version

Adrift in a Data Lake – an Oracle Developer’s first steps in Hadoop

Mon, 2016-03-28 11:49

We live in interesting times. As I write, the political life of a great nation is currently in thrall to a wealthy right-wing populist with a rather distinctive hairstyle.
But enough about Boris Johnson.

For someone used to the way things are done in Oracle, Hadoop can be something of a culture shock.
My initial introduction to Hadoop and its “vibrant ecosystem” was some internally written documentation.
As with most technical documentation, it was written with the assumption of a certain level of prior knowledge on the part of the reader. For this particular reader, it proved to be an erroneous assumption.

After half an hour of struggling through this text, I was left wondering what you would use a Khafka Banana Oozie for.

Maybe it’s for killing zombie processes ? Let’s face it, with all that Java running on your system there are bound to be a few knocking around.
I’m a little hazy on my Zombie Lore, so I’m not entirely clear as to why a rapid application of fresh fruit would pose an existential threat to your average zombie. Maybe they’re potassium intolerant ?

There are a bewildering array of tools associated with Hadoop, many of which appear to provide almost identical functionality.
For example, a relational database traditionally requires only one SQL Engine…and I’ll be taking a look at two of them.

Moving from Oracle to Hadoop can feel rather like trading your car in for a box of Lego.
If the box does contain any instructions they seem to have been translated from Java into English…by someone who doesn’t speak either.
Note to reader : please substitute your own language as appropriate.

Fortunately, there are Hadoop distributions available which bundle the core tools required to get up and running. We’ll gloss over the fact that these distributions – Cloudera and Hortonworks – don’t contain the exact same tools.

In my search to find a coherent explanation of how to use Hadoop, I’ve found that the VM provided by Cloudera, together with the introductory tutorial, is a valuable aid to familiarising myself with the basics.

To start with then, I’ll cover getting hold of the Cloudera VM and setting it up in VirtualBox.
Then I’ll go through some of the tools available and what they do.
I’ll do this from the perspective of an Oracle developer (mainly because I don’t have much option) and will point out the driftwood of familiarity that you might be able to cling to in order to stay afloat in your Data Lake.

What I’ll cover is :

  • The core components of Hadoop
  • HDFS commands
  • Transferring data between a relational database and hadoop using SQOOP
  • Querying structured data using Hive and Impala
  • Uploading data using Kite

Ready to Dive in ?

Cloudera VM setup for Virtualbox

Before we head off to get the VM, there are a couple of points worth mentioning.

First of all, the VM is running on a 64-bit version of Centos 6.4. Therefore you need to make sure that your host operating system ( i.e. the one that you’ll be running Virtual Box on) is also 64-bit.

Secondly, the minimum recommended memory allocation for the VM is 4GB.

Finally, I’m using the Cloudera Quickstart 5.5 VM. The behaviour of some of the tools covered here differs between 5.4 and 5.5.
The Cloudera VM can be downloaded from here.

You’ll need to fill in some details about yourself before moving on.

Once you’re at the downloads page, select VirtualBox from the Platform drop-down list.
VMs for other platforms are also available.

Once you hit the “Download Now” button you’ll be asked for some further details. However, these do not appear to be validated.

The download is quite chunky, just over 5GB.

Once it’s completed you should have a zip file called :

cloudera-quickstart-vm-5.5.0-0-virtualbox.zip

You can use a standard unzip utility to extract this file (e.g. Winzip on Windows, plain old unzip on Linux).
Once extracted, you’ll see a directory called cloudera-quickstart-vm-5.5.0-0-virtualbox which contains two files :

cloudera-quickstart-vm-5.5.0-0-virtualbox-disk1.vmdk
cloudera-quickstart-vm-5.5.0-0-virtualbox.ovf

The .ovf file is the VirtualBox image and the .vmdk is the VM’s hard-drive image.

To set up the VM in Virtualbox, you’ll need to import the .ovf file as an Appliance.
The steps to do this are the same as those for importing an Oracle Developer Day 12c Image.

When you first start the VM it does take a while to come up. When it does, you should see

cloudera_vm_start

Tweaking the VM settings

If you don’t happen to live in Boston and/or you don’t have a US keyboard, you may want to make a couple of configuration changes to the VM.

To change the Location ( and consequently the Time Zone) :

  1. Click on the Date and Time on the Status Bar. It’s in the top right-hand corner of the screen.
  2. Hit the Edit button next to Locations
  3. Add your location by typing it in the Location Name dialog box (a drop-down list will appear as you start typing).
  4. Now highlight Boston and click the Remove button. Your new location should now show at the bottom of the Time window
  5. Move your mouse over your location and an Edit button should appear. Click this and the Date and Time should now reflect your new location

set_location

You will be prompted for a password to confirm this change. You need to enter cloudera

To change the keyboard layout :

  1. Go to the System Menu and select Preferences and then Keyboard
  2. Navigate to the Layouts tab and click Add
  3. Select your preferred layout from the drop-down list
  4. Once your new layout is shown, click the Default radio button next to it

set_keyboard

These changes should take effect without the need to re-start the VM.

OK, so let’s get started….

The Core components of Hadoop

Hadoop itself comprises three main components :

  • HDFS – a distributed file system
  • The MapReduce framework
  • Yarn – a Job scheduler / resource manager for the parallel execution of MapReduce jobs

MapReduce

The MapReduce framework consists of a Mapper and a Reducer.
In SQL terms, the Mapper program applies a query predicate against the data you are querying – essentially, it does the filtering.
The Reducer then does any aggregation on the result set produced from the Mapper process.
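
As a very loose analogy, using the retail_db tables that we’ll be importing later on, in a query like the one below the WHERE clause is the kind of work the Mapper does, whilst the GROUP BY and COUNT are the Reducer’s territory :

select order_item_order_id,   -- the Reducer aggregates the surviving rows by key
    count(*) as items
from order_items
where order_item_quantity > 1 -- the Mapper does the filtering
group by order_item_order_id;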

Yarn

Yarn is the default job scheduler and resource manager. It facilitates parallel execution of MapReduce jobs.

HDFS

HDFS – the Hadoop Distributed File System – is, as the name suggests, a distributed file system. The idea is that datafiles are replicated across multiple nodes (physical servers) in a cluster. Essentially, any program can run on any node. By replicating the data to each node, network latency is minimised for these programs.
It comes with its own set of commands which you can use interactively.
These appear to be largely a subset of those you’d find on a Linux OS.

The format of these commands is :

hadoop fs -command [some arguments]

However, the way that the VM is configured, we need to run these commands as the hdfs user.
Therefore, in the examples that follow, the commands will take the format…

sudo -u hdfs hadoop fs -command [some arguments]

To start with, let’s see what we’d need to do to upload a csv file to HDFS.

The file in question is called tools.csv, which I’ve created in the cloudera user’s home directory on the VM.
It contains the following :

tools.csv

To start with, let’s see what’s currently in hdfs :

sudo -u hdfs hadoop fs -ls /

Found 5 items
drwxrwxrwx   - hdfs  supergroup          0 2015-11-18 10:57 /benchmarks
drwxr-xr-x   - hbase supergroup          0 2016-03-23 12:08 /hbase
drwxrwxrwt   - hdfs  supergroup          0 2016-03-22 12:23 /tmp
drwxr-xr-x   - hdfs  supergroup          0 2015-11-18 11:01 /user
drwxr-xr-x   - hdfs  supergroup          0 2015-11-18 11:00 /var

This shows us the top-level directories. Let’s take a look at what’s in /user :

sudo -u hdfs hadoop fs -ls /user

Found 8 items
drwxr-xr-x   - cloudera cloudera            0 2015-11-18 10:56 /user/cloudera
drwxr-xr-x   - mapred   hadoop              0 2015-11-18 10:57 /user/history
drwxrwxrwx   - hive     supergroup          0 2015-11-18 11:01 /user/hive
drwxrwxrwx   - hue      supergroup          0 2015-11-18 10:58 /user/hue
drwxrwxrwx   - jenkins  supergroup          0 2015-11-18 10:58 /user/jenkins
drwxrwxrwx   - oozie    supergroup          0 2015-11-18 10:59 /user/oozie
drwxrwxrwx   - root     supergroup          0 2015-11-18 10:58 /user/root
drwxr-xr-x   - hdfs     supergroup          0 2015-11-18 11:01 /user/spark

For the purposes of this test, I’ll create a directory under the /user/cloudera directory, and then check that it’s been created as expected :

sudo -u hdfs hadoop fs -mkdir /user/cloudera/test
sudo -u hdfs hadoop fs -ls /user/cloudera
Found 1 items
drwxr-xr-x   - hdfs cloudera          0 2016-03-23 14:25 /user/cloudera/test

Notice that only the directory owner has write permissions on the directory.
As I’m feeling reckless, I want to grant write permissions to everyone.
This can be done as follows :

sudo -u hdfs hadoop fs -chmod a+w /user/cloudera/test
sudo -u hdfs hadoop fs -ls /user/cloudera
Found 1 items
drwxrwxrwx   - hdfs cloudera          0 2016-03-23 14:25 /user/cloudera/test

In HDFS, the chmod command seems to accept the same arguments as its Linux counterpart.

To check that we can now see the directory :

sudo -u hdfs hadoop fs -ls /user/cloudera/test

The simplest way to load our csv is to use the put command :

sudo -u hdfs hadoop fs -put tools.csv /user/cloudera/test/put_tools.csv
sudo -u hdfs hadoop fs -ls /user/cloudera/test
Found 1 items
-rw-r--r--   1 hdfs cloudera        301 2016-03-23 14:49 /user/cloudera/test/put_tools.csv

There is another way to do this :

sudo -u hdfs hadoop fs -copyFromLocal tools.csv /user/cloudera/test/tools.csv
sudo -u hdfs hadoop fs -ls /user/cloudera/test
Found 2 items
-rw-r--r--   1 hdfs cloudera        301 2016-03-23 14:49 /user/cloudera/test/put_tools.csv
-rw-r--r--   1 hdfs cloudera        301 2016-03-23 17:04 /user/cloudera/test/tools.csv

If we want to delete a file then :

sudo -u hdfs hadoop fs -rm /user/cloudera/test/put_tools.csv
16/03/23 17:06:51 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/cloudera/test/put_tools.csv
sudo -u hdfs hadoop fs -ls /user/cloudera/test
Found 1 items
-rw-r--r--   1 hdfs cloudera        301 2016-03-23 17:04 /user/cloudera/test/tools.csv

You can display the contents of a file in hdfs by using the cat command :

sudo -u hdfs hadoop fs -cat /user/cloudera/test/tools.csv
tool,description
hue,Web-based UI for Hadoop
sqoop,Transfer structured data between an RDBMS and Hadoop
flume,stream a file into Hadoop
impala,a query engine
hive,another query engine
spark,a query engine that is not hive or impala
khafka,a scheduler
banana,a web UI framework
oozie,another scheduler

In order to demonstrate copying a file from hdfs to the local filesystem in the VM, we’ll need to create a directory that the hdfs user has access to :

mkdir test
chmod a+rw test
cd test
ls -ld
drwxrwxrwx 2 cloudera cloudera 4096 Mar 23 17:13 .

Now, as the hdfs user, we can retrieve our file from hdfs onto the local file system using copyToLocal :

sudo -u hdfs hadoop fs -copyToLocal /user/cloudera/test/tools.csv /home/cloudera/test/welcome_back.csv
ls -l welcome_back.csv
-rw-r--r-- 1 hdfs hdfs 301 Mar 23 17:18 welcome_back.csv

Another method of doing this is using get :

sudo -u hdfs hadoop fs -get /user/cloudera/test/tools.csv /home/cloudera/test/got_it.csv
ls -l got_it.csv
-rw-r--r-- 1 hdfs hdfs 301 Mar 23 17:21 got_it.csv

One final hdfs command that may come in useful is du, which shows the amount of space used by a directory or file on hdfs :

sudo -u hdfs hadoop fs -du /user/cloudera
301  301  /user/cloudera/test

SQOOP

The VM comes with a MySQL database from which data is loaded into Hadoop via SQOOP.
SQOOP is a tool for transferring structured data between an RDBMS and Hadoop.

The documentation does say that SQOOP is capable of importing data from an Oracle database using a command of the following format (run from the Local File System) :

sqoop import --connect jdbc:oracle:thin:@//db_name --table table_name

However, said documentation says that it’s been tested with Oracle 10.2 Express Edition, so you may want to have a play around with it before using it in anger.

The tutorial directs us to use SQOOP to ingest all of the data from the MySQL database by running the following command :

sqoop import-all-tables \
    -m 1 \
    --connect jdbc:mysql://quickstart:3306/retail_db \
    --username=retail_dba \
    --password=cloudera \
    --compression-codec=snappy \
    --as-parquetfile \
    --warehouse-dir=/user/hive/warehouse \
    --hive-import

There’s a fair amount going on here : we’re connecting to MySQL, then outputting the data as a compressed file onto hdfs in the /user/hive/warehouse directory.
The compression library being used is Snappy.

It’s instructive to see the output when we run this command as it shows both MapReduce and Yarn in action. You’ll probably see lines like :

...
16/03/23 17:36:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1458734644938_0001
16/03/23 17:36:08 INFO impl.YarnClientImpl: Submitted application application_1458734644938_0001
16/03/23 17:36:08 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1458734644938_0001/
16/03/23 17:36:08 INFO mapreduce.Job: Running job: job_1458734644938_0001
16/03/23 17:37:28 INFO mapreduce.Job: Job job_1458734644938_0001 running in uber mode : false
16/03/23 17:37:28 INFO mapreduce.Job:  map 0% reduce 0%
16/03/23 17:38:29 INFO mapreduce.Job:  map 100% reduce 0%
16/03/23 17:38:34 INFO mapreduce.Job: Job job_1458734644938_0001 completed successfully
...

After a fair amount of time, the command should end with :

...
16/03/23 17:53:53 INFO mapreduce.ImportJobBase: Transferred 46.1318 KB in 157.9222 seconds (299.1283 bytes/sec)
16/03/23 17:53:53 INFO mapreduce.ImportJobBase: Retrieved 1345 records.

If we now check, we can see that a directory has been created for each table :

hadoop fs -ls /user/hive/warehouse
Found 6 items
drwxrwxrwx   - cloudera supergroup          0 2016-03-23 17:38 /user/hive/warehouse/categories
drwxrwxrwx   - cloudera supergroup          0 2016-03-23 17:41 /user/hive/warehouse/customers
drwxrwxrwx   - cloudera supergroup          0 2016-03-23 17:44 /user/hive/warehouse/departments
drwxrwxrwx   - cloudera supergroup          0 2016-03-23 17:48 /user/hive/warehouse/order_items
drwxrwxrwx   - cloudera supergroup          0 2016-03-23 17:51 /user/hive/warehouse/orders
drwxrwxrwx   - cloudera supergroup          0 2016-03-23 17:53 /user/hive/warehouse/products

Digging further into the categories directory and its children, we find that the table structure has also been transferred across :

hadoop fs -cat /user/hive/warehouse/categories/.metadata/schemas/1.avsc
{
  "type" : "record",
  "name" : "categories",
  "doc" : "Sqoop import of categories",
  "fields" : [ {
    "name" : "category_id",
    "type" : [ "null", "int" ],
    "default" : null,
    "columnName" : "category_id",
    "sqlType" : "4"
  }, {
    "name" : "category_department_id",
    "type" : [ "null", "int" ],
    "default" : null,
    "columnName" : "category_department_id",
    "sqlType" : "4"
  }, {
    "name" : "category_name",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "category_name",
    "sqlType" : "12"
  } ],
  "tableName" : "categories"
}

The data is stored in a .parquet file :

hadoop fs -ls /user/hive/warehouse/categories
Found 3 items
drwxr-xr-x   - cloudera supergroup          0 2016-03-23 17:35 /user/hive/warehouse/categories/.metadata
drwxr-xr-x   - cloudera supergroup          0 2016-03-23 17:38 /user/hive/warehouse/categories/.signals
-rw-r--r--   1 cloudera supergroup       1956 2016-03-23 17:38 /user/hive/warehouse/categories/a3db2e78-2861-4906-a769-e8035c03d7d2.parquet

There are a number of file formats you can use with Hadoop, each with their own pros and cons.
You can see a discussion of some of these formats here.

NOTE – the first time I ran this, I shut down the VM after I got to here. When I started it again, Impala (see below) refused to see any databases. I had to trash and re-create the VM to get it to work. I’m not sure why this happened (I did have a hunt around) but just thought I’d mention it in case you’re thinking of doing the same.

Anyway, now we have data in Hadoop, it would be good if we could interrogate it…

Not Quite SQL – Hive and Impala

Cloudera comes with a Web based UI for Hadoop in the form of Hue.
Note that Hortonworks seem to be standardising on a different UI tool – Apache Ambari.

In the Cloudera VM, there is a link to Hue on the Bookmarks bar in Firefox.
Click on this link and then connect as cloudera (password cloudera).

The favoured query engine for interactive queries in Cloudera is called Impala.
I believe that, at the time of writing, Hortonworks are sticking with Hive.

As far as I can tell, it seems that Hive has been retained by Cloudera to handle what, in RDBMS terms, would be called the catalog.
In Oracle terms this would be the Data Dictionary.

This is essentially the metadata for the tables in the database.
This metadata seems to be read by all of the SQL Engines irrespective of where it is updated from.

Once you’re connected via Hue you can select either of these tools from the Query Editors drop-down.

Initially, the tutorial directs you to Impala.

The Data Dictionary

The first thing to note about Impala is that it doesn’t bother reading the table metadata unless you tell it to. Therefore, if you make any DML or DDL changes, you will probably need to tell Impala to re-read the metadata before those changes are picked up in query results.
For example, we know that we’ve got six tables in our database which we’ve created via SQOOP. However, if you ask Impala about it :

show tables;

…you get the not entirely helpful :

The operation has no results

By contrast, if you try this in Hive (Query Editors/Hive), the tables are all present and correct.

To persuade Impala to see these changes you need to run :

invalidate metadata;

Note that you can also run this command for individual tables should the need arise, e.g. :

invalidate metadata categories;

Anyway, now we can see the tables in Impala, we can run some queries against them.
Whilst we’re at it, we can do a simple comparison between Impala and Hive in terms of how they process the same query.

Comparative performance

The query in question (taken from the Getting Started Tutorial provided with the VM) is :

-- Most popular product categories
select c.category_name, count(order_item_quantity) as count
from order_items oi
inner join products p on oi.order_item_product_id = p.product_id
inner join categories c on c.category_id = p.product_category_id
group by c.category_name
order by count desc
limit 10;

The syntax looks reassuringly familiar.

In Impala this ran in around 35 seconds.
I then ran the same query in Hive, which took about 7 minutes.

Once the query is run, Hue shows an Explain button. Click on this and you can see the execution plan for the query.

In Hive the plan looks like this :

STAGE DEPENDENCIES:
  Stage-9 is a root stage
  Stage-3 depends on stages: Stage-9
  Stage-4 depends on stages: Stage-3
  Stage-0 depends on stages: Stage-4

STAGE PLANS:
  Stage: Stage-9
    Map Reduce Local Work
      Alias -> Map Local Tables:
        c
          Fetch Operator
            limit: -1
        p
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        c
          TableScan
            alias: c
            Statistics: Num rows: 24 Data size: 2550 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: category_id is not null (type: boolean)
              Statistics: Num rows: 12 Data size: 1275 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 _col10 (type: int)
                  1 category_id (type: int)
        p
          TableScan
            alias: p
            Statistics: Num rows: 5737 Data size: 45896 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (product_id is not null and product_category_id is not null) (type: boolean)
              Statistics: Num rows: 1435 Data size: 11479 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 order_item_product_id (type: int)
                  1 product_id (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: oi
            Statistics: Num rows: 206028 Data size: 1648231 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: order_item_product_id is not null (type: boolean)
              Statistics: Num rows: 103014 Data size: 824115 Basic stats: COMPLETE Column stats: NONE
              Map Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 order_item_product_id (type: int)
                  1 product_id (type: int)
                outputColumnNames: _col3, _col10
                Statistics: Num rows: 113315 Data size: 906526 Basic stats: COMPLETE Column stats: NONE
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  keys:
                    0 _col10 (type: int)
                    1 category_id (type: int)
                  outputColumnNames: _col3, _col20
                  Statistics: Num rows: 124646 Data size: 997178 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: _col20 (type: string), _col3 (type: int)
                    outputColumnNames: _col20, _col3
                    Statistics: Num rows: 124646 Data size: 997178 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: count(_col3)
                      keys: _col20 (type: string)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 124646 Data size: 997178 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        Statistics: Num rows: 124646 Data size: 997178 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: bigint)
      Local Work:
        Map Reduce Local Work
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          keys: KEY._col0 (type: string)
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 62323 Data size: 498589 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-4
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              key expressions: _col1 (type: bigint)
              sort order: -
              Statistics: Num rows: 62323 Data size: 498589 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col0 (type: string)
      Reduce Operator Tree:
        Select Operator
          expressions: VALUE._col0 (type: string), KEY.reducesinkkey0 (type: bigint)
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 62323 Data size: 498589 Basic stats: COMPLETE Column stats: NONE
          Limit
            Number of rows: 10
            Statistics: Num rows: 10 Data size: 80 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 10 Data size: 80 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: 10
      Processor Tree:
        ListSink

The Impala plan looks a bit different :

Estimated Per-Host Requirements: Memory=4.16GB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
default.categories, default.order_items, default.products

11:MERGING-EXCHANGE [UNPARTITIONED]
|  order by: count(order_item_quantity) DESC
|  limit: 10
|
06:TOP-N [LIMIT=10]
|  order by: count(order_item_quantity) DESC
|
10:AGGREGATE [FINALIZE]
|  output: count:merge(order_item_quantity)
|  group by: c.category_name
|
09:EXCHANGE [HASH(c.category_name)]
|
05:AGGREGATE
|  output: count(order_item_quantity)
|  group by: c.category_name
|
04:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: p.product_category_id = c.category_id
|
|--08:EXCHANGE [BROADCAST]
|  |
|  02:SCAN HDFS [default.categories c]
|     partitions=1/1 files=1 size=1.91KB
|
03:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: oi.order_item_product_id = p.product_id
|
|--07:EXCHANGE [BROADCAST]
|  |
|  01:SCAN HDFS [default.products p]
|     partitions=1/1 files=1 size=43.80KB
|
00:SCAN HDFS [default.order_items oi]
   partitions=1/1 files=1 size=1.57MB

As well as being somewhat shorter, the Impala plan will look rather more familiar to anyone used to reading an Oracle Query Plan.

To check for the possible effect of caching, I then ran these queries again.
For Impala, the runtime dropped to around 13 seconds.
For Hive, the runtime was the same as for the first run.

The Impala plan was unchanged, despite the fact that the runtime was drastically reduced.
From this behaviour I would infer that there is some caching effect for Impala although, unlike Oracle’s Buffer Cache, whatever cache is involved is clearly not shared between the two Query Engines.

The explanation offered in the tutorial is that Hive compiles SQL queries into MapReduce jobs whereas Impala was designed as a SQL engine.
It would appear that Impala is better for small-scale interactive queries whereas Hive is more suited to large-scale ETL.

There’s a benchmarking comparison between Hive, Impala, and Spark that may be of interest here.

Optimizer Statistics

You’ll notice that, at the start of the Impala query plan, there’s a warning about missing statistics.

If you roll the mouse over one of the tables on the left-hand side of the Hue Query Editor Window, you’ll see an icon appear which enables you to explore the table’s metadata.

Looking at one of these tables we can see that there are no stats present :

no_stats

We can fix this easily enough in Impala by gathering stats for each of our tables. For example :

compute stats categories;

If we now check the metadata for the CATEGORIES table, we can see that stats are present :

with_stats

If we now re-run the original query, the plan will no longer show the warning. However, the rest of the plan remains unchanged.
Given that these are quite small tables, this is probably not surprising.

There’s more information about stats gathering in Impala here.
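
Incidentally, if you’d rather not go mousing around in Hue, Impala will also report the stats it is holding from the Query Editor (or impala-shell) – something like :

show table stats categories;
show column stats categories;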

External Tables

One thing that Hive is apparently good for is creating External Tables.

In the tutorial, some unstructured data (a log file) is loaded and external tables are then created in Hive using the following code :


CREATE EXTERNAL TABLE intermediate_access_logs (
    ip STRING,
    date STRING,
    method STRING,
    url STRING,
    http_version STRING,
    code1 STRING,
    code2 STRING,
    dash STRING,
    user_agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
    'input.regex' = '([^ ]*) - - \\[([^\\]]*)\\] "([^\ ]*) ([^\ ]*) ([^\ ]*)" (\\d*) (\\d*) "([^"]*)" "([^"]*)"',
    'output.format.string' = '%1$$s %2$$s %3$$s %4$$s %5$$s %6$$s %7$$s %8$$s %9$$s')
LOCATION '/user/hive/warehouse/original_access_logs';

CREATE EXTERNAL TABLE tokenized_access_logs (
    ip STRING,
    date STRING,
    method STRING,
    url STRING,
    http_version STRING,
    code1 STRING,
    code2 STRING,
    dash STRING,
    user_agent STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/warehouse/tokenized_access_logs';

ADD JAR /usr/lib/hive/lib/hive-contrib.jar;

INSERT OVERWRITE TABLE tokenized_access_logs SELECT * FROM intermediate_access_logs;

Once again, it looks familiar for the most part.
One point of interest is the INSERT OVERWRITE command used to insert data into the table.
This has the effect of removing any pre-existing data in the table before inserting the new data.

String functions and other SQL Stuff

We can query the new external table in Impala :

invalidate metadata tokenized_access_logs;
select * from tokenized_access_logs;

The values in the url column contain %20 characters instead of spaces.
Let’s see what Impala can do in terms of the standard SQL string manipulation functions.

Unlike Oracle, there’s no REPLACE function. There is, however, a REGEXP_REPLACE…

select regexp_replace(url, '%20', ' ')
from tokenized_access_logs;

When we run this, we can see that the REGEXP_REPLACE has done the job :

regexp_replace

The logs we’re really interested in are where a specific product has been viewed.
If we can get the name of the product from the url, then maybe that will help when relating this data back to the most popular items in terms of sales…

select substr(regexp_replace( url, '%20', ' '), instr( regexp_replace(url, '%20', ' '), '\/product\/') + 9)
from tokenized_access_logs
where url like '%\/product\/%';

When we run this, we can see that the INSTR function also works as expected…to a point. Note that the ‘/’ characters need to be escaped with a ‘\’.
The result looks like this :

products_url

Now, there are a couple of records with the product name and “/add_to_cart” appended. With INSTR in Oracle, you can search for a pattern starting at the end of a string – e.g. :

select instr('/product/Pelican Sunstream 100 Kayak/add_to_cart', '/',-1,1)
from dual;

INSTR('/PRODUCT/PELICANSUNSTREAM100KAYAK/ADD_TO_CART','/',-1,1)
---------------------------------------------------------------
							     37

There is no such option in the Impala equivalent. It simply searches the string from the start and reports the first occurrence.
Fortunately, we want to strip out these results for the stuff we’re going to do in a minute. Therefore, I’ve just amended the query to be :

select substr(regexp_replace( url, '%20', ' '), instr( regexp_replace(url, '%20', ' '), '\/product\/') + 9)
from tokenized_access_logs
where url like '%\/product\/%'
and url not like '%\/add_to_cart%';
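
As an aside, if you did want to keep those rows rather than discard them, one option (which I haven’t tested exhaustively) would be to strip the trailing “/add_to_cart” with another REGEXP_REPLACE before doing the SUBSTR/INSTR wrangling :

select regexp_replace( regexp_replace( url, '/add_to_cart$', ''), '%20', ' ') as cleaned_url
from tokenized_access_logs
where url like '%\/product\/%';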

We can use fairly standard SQL to get a listing of the products by number of views :

select count(*), substr(regexp_replace( url, '%20', ' '), instr( regexp_replace(url, '%20', ' '), '\/product\/') + 9)
from tokenized_access_logs
where url like '%\/product\/%'
and url not like '%\/add_to_cart%'
group by url
order by 1 desc;

In the tutorial, it mentions that there is one product that has a lot of views but no sales. I wonder if we can find this just using SQL ?

with viewed_products as
(
select count(*) as times_viewed,
substr(regexp_replace( url, '%20', ' '),
       instr( regexp_replace(url, '%20', ' '), '\/product\/') + 9) as product_name
from tokenized_access_logs
where url like '%\/product\/%'
and url not like '%\/add_to_cart%'
group by url
)
select v.times_viewed, v.product_name
from viewed_products v
where upper(v.product_name) not in
(
  select upper(p.product_name)
  from products p
  inner join
  (
    select oi.order_item_product_id as product_id,
        count(oi.order_item_id) as units_sold
    from order_items oi
    inner join orders o
        on oi.order_item_order_id = o.order_id
    where o.order_status not in ('CANCELED', 'SUSPECTED_FRAUD')
    group by order_item_product_id
  ) s
    on p.product_id = s.product_id
);

OK, it’s not the most elegant SQL I’ve ever written but it does demonstrate that :

  • You can define in-line views using the WITH clause
  • You can use a NOT IN subquery

…better than that, it gives the correct answer :

adidas Kids' RG III Mid Football Cleat

In the tutorial, the reason for the lack of sales is a coding error in the product page. You can ask your own snarky question about Unit Testing practices here.

Going back to the csv files we were playing around with on hdfs earlier, how would we get those into our database ?

Go fly a Kite

Yes, it’s another tool. This one is called Kite. Its purpose, in this case at least, is to help us create a table based on the tools.csv file and allow us to then make use of it in Impala.

First of all, I’ve created the tools.csv file on the local file system :

ls -l tools.csv
-rw-rw-r-- 1 cloudera cloudera 303 Mar 26 17:21 tools.csv

Now to get kite to create a “table” definition based on the csv :

kite-dataset csv-schema tools.csv --class Tools -o tools.avsc
ls -l tools.*
-rw-rw-r-- 1 cloudera cloudera 373 Mar 26 17:22 tools.avsc
-rw-rw-r-- 1 cloudera cloudera 303 Mar 26 17:21 tools.csv

If we have a look at the new avro file that kite has generated, it looks similar in terms of format to the one that SQOOP generated earlier :

{
  "type" : "record",
  "name" : "Tools",
  "doc" : "Schema generated by Kite",
  "fields" : [ {
    "name" : "tool",
    "type" : [ "null", "string" ],
    "doc" : "Type inferred from 'hue'",
    "default" : null
  }, {
    "name" : "description",
    "type" : [ "null", "string" ],
    "doc" : "Type inferred from 'Web-based UI for Hadoop'",
    "default" : null
  } ]
}

The next step is to create the table metadata (in kite terminology this is a dataset)…

kite-dataset create tools -s tools.avsc

…and finally add the data itself to our table…

kite-dataset csv-import tools.csv tools
The url to track the job: http://localhost:8080/
Added 9 records to "tools"

To check in Impala, we can head over to Hue, select Impala from the Query Editors drop-down and run :

invalidate metadata tools;
select * from tools;

The result should look like this :

tools_query

Back on the command line, we can see that kite has added files to hdfs :

hadoop fs -ls /user/hive/warehouse/tools
Found 3 items
drwxr-xr-x   - cloudera supergroup          0 2016-03-26 17:32 /user/hive/warehouse/tools/.metadata
drwxr-xr-x   - cloudera supergroup          0 2016-03-26 17:35 /user/hive/warehouse/tools/.signals
-rw-r--r--   1 cloudera supergroup        622 2016-03-26 17:35 /user/hive/warehouse/tools/8baf8440-41b0-4f29-8750-16aeb3aec9b0.avro

The .avro file appears to have been compressed using the Snappy compression tool also used by SQOOP earlier.
You can however read the file by running :

hadoop fs -text /user/hive/warehouse/tools/8baf8440-41b0-4f29-8750-16aeb3aec9b0.avro
{"tool":{"string":"banana"},"description":{"string":"a web UI framework"}}
{"tool":{"string":"flume"},"description":{"string":"stream a file into Hadoop"}}
{"tool":{"string":"hive"},"description":{"string":"another query engine"}}
{"tool":{"string":"hue"},"description":{"string":"Web-based UI for Hadoop"}}
{"tool":{"string":"impala"},"description":{"string":"a query engine"}}
{"tool":{"string":"khafka"},"description":{"string":"a scheduler"}}
{"tool":{"string":"oozie"},"description":{"string":"another scheduler"}}
{"tool":{"string":"spark"},"description":{"string":"a query engine that is not hive or impala ?"}}
{"tool":{"string":"sqoop"},"description":{"string":"Transfer structured data between an RDBMS and Hadoop"}}

Summary

The Getting Started Tutorial goes on to cover various other tools available in the Cloudera distribution for doing data manipulation and analysis.
Additionally, you may find this presentation on Hadoop internals for Oracle Devs by Tanel Poder useful.

For now though, I have enough to keep my head above water.


Filed under: Oracle, SQL Tagged: Hadoop, hdfs, Hive, Impala, Kite, SQOOP

Live to Win – Motorhead Covers and Pythonic Irrigation

Tue, 2016-03-08 15:31

The recent passing of Lemmy has caused me to reflect on the career of one of the bands who made my growing up (and grown-up) years that much…well…louder.

Yes, I know that serious Python documentation should employ a sprinkling of Monty Python references but, let’s face it, what follows is more of a quick trawl through some basic Python constructs that I’ve found quite useful recently.
If I put them all here, at least I’ll know where to look when I need them again.

In any case, Michael Palin made a guest appearance on the album Rock ‘n’ Roll so that’s probably enough of a link to satisfy the Monty Python criteria.

I find Python a really good language to code in…especially when the alternative is writing a Windows Batch Script. However, there is a “but”.
Python 3 is not backward compatible with Python 2. This can make life rather interesting on occasion.

It is possible to write code that is compatible with both versions of the language and there’s a useful article here on that topic.

The code I’ve written here has been tested on both Python 2 (2.7.6) and Python 3 (3.4.3).

One of the great things about Python is that there are a number of modules supplied as standard, which greatly simplify some common programming tasks.
What I’m going to run through here is :

  • Getting information about the environment
  • Handling runtime arguments with the argparse module
  • Reading config files with configparser
  • Writing information to log files with the logging module

Existential Questions

There are a number of questions that you’ll want to answer programmatically, sooner rather than later…

Who am I

There are a couple of ways to find out the user you’re connected as from inside Python.
You could simply use the os.getlogin() function…

import os
print( os.getlogin())

…but according to the official documentation [link] this is probably a better option…

import os
import pwd
print( pwd.getpwuid(os.getuid())[0])

Additionally, we may want to know the name of the Python program we’re currently in. The following script – called road_crew.py – should do the job :

import os
print(os.path.basename(__file__))

Running this we get :

road_crew.py

Where am I

Step forward the platform module, as seen here in this code (saved as hammersmith.py) :

import platform

def main() :
    # Get the name of the host machine
    machine_name = platform.node()
    # Get the OS and architecture
    os_type = platform.system()
    if platform.machine() == 'x86_64' :
        os_arch = '64-bit'
    else :
        os_arch = '32-bit'

    print('Running on '+machine_name+' which is running '+ os_type + ' ' + os_arch)

    # Now get more detailed OS information using the appropriate function...
    if os_type == 'Linux' :
        print(platform.linux_distribution())
    elif os_type == 'Windows' :
        print(platform.win32_ver())
    elif os_type == 'Mac' :
        #NOTE - I don't have a Mac handy so have no way of testing this statement
        print(platform.mac_ver())
    else :
        print("Sky high and 6000 miles away!")

if __name__ == '__main__' :
    main()

Running this on my Linux Mint machine produces :

Running on mike-TravelMate-B116-M which is running Linux 64-bit
('LinuxMint', '17.3', 'rosa')

As mentioned previously, you also may be quite keen to know the version of Python that your program is running on….

import sys

major = sys.version_info[0]
minor = sys.version_info[1]
micro = sys.version_info[2]

if major == 3 :
    print('Ace of Spades !')
else :
    print('Bomber !')

print('You are running Python ' + str(major) + '.' + str(minor) + '.' + str(micro))

On Python 3, this outputs…

Ace of Spades !
You are running Python 3.4.3

…whilst on Python 2…

Bomber !
You are running Python 2.7.6

When Am I

As for the current date and time, allow me to introduce another_perfect_day.py…

import time

today = time.strftime("%a %d %B %Y")
now = time.strftime("%H:%M:%S")

print("Today's date is " + today);
print("The time is now " + now);

…which gives us …

Today's date is Sun 06 March 2016
The time is now 19:15:19

Argument parsing

The argparse module makes handling arguments passed to the program fairly straightforward.
It allows you to provide a short or long switch for the argument, specify a default value, and even write some help text.
The program is called no_remorse.py and looks like this :

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-a", "--age", default = 40, help = "How old are you ? (defaults to 40 - nothing personal)")
args = vars(parser.parse_args())
age = args['age']
if int(age) > 39 :
    print('I remember when Motorhead had a number 1 album !')
else :
    print('Who are Motorhead ?')

The argparse module gives us a couple of things. First of all, if we want to know more about the parameters the program accepts, we can simply invoke the help :

python no_remorse.py -h
usage: no_remorse.py [-h] [-a AGE]

optional arguments:
  -h, --help         show this help message and exit
  -a AGE, --age AGE  How old are you ? (defaults to 40 - nothing personal)

If we run it without specifying a value for age, it will pick up the default….

python no_remorse.py
I remember when Motorhead had a number 1 album !

…and if I’m tempted to lie about my age (explicitly, as opposed to by omission in the previous example)…

python no_remorse.py -a 39
Who are Motorhead ?

As well as using the single-letter switch for the parameter, we can use the long version …

python no_remorse.py --age 48
I remember when Motorhead had a number 1 album !

One other point to note, the program will not accept arguments passed by position – either the long or short switch for the argument must be specified. Either that, or Python comes with its own outrageous lie detector…

python no_remorse.py 25
usage: no_remorse.py [-h] [-a AGE]
no_remorse.py: error: unrecognized arguments: 25
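
For completeness, argparse will happily handle positional arguments too – you simply leave off the leading dashes when adding the argument. A minimal sketch (the script name and wording are just made up for illustration) :

import argparse

parser = argparse.ArgumentParser()
# No leading "-" makes this a positional (and therefore mandatory) argument
parser.add_argument("age", type = int, help = "How old are you ?")
args = parser.parse_args()

if args.age > 39 :
    print('I remember when Motorhead had a number 1 album !')
else :
    print('Who are Motorhead ?')
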
Reading a config file

There are times when you need a program to run on multiple environments, each with slightly different details ( machine name, directory paths etc).
Rather than having to pass these details in each time you run the program, you can dump them all into a file for your program to read at runtime.
Usually, you’ll pass in an argument to point the program at the appropriate section of your config file. A config file will look something like this :

[DEV]
db_name = dev01

[TEST]
db_name = test01

[PROD]
db_name = prod

In this example, your program will probably accept an argument specifying which environment it needs to run against and then read the appropriate section of the config file to set variables to the appropriate values.

My working example is slightly different and is based on cover versions that Motorhead have done of other artists’ tracks, together with a couple of my favourite covers of Motorhead songs by other bands :

[MOTORHEAD]
Tammy Wynette = Stand By Your Man
The Kingsmen = Louie Louie

[METALLICA]
Motorhead = Overkill

[CORDUROY]
Motorhead = Motorhead

Now, you could spend a fair amount of time trying to figure out how to read this file and get the appropriate values…or you could just use the configparser module…

Conditional Import – making sure you find Configparser

The configparser module was renamed in Python 3 (it was ConfigParser in Python 2), so the import statement for it is different depending on which version of Python you're using.
Fortunately, Python offers the ability to conditionally import modules as well as allowing you to alias them.
Therefore, this should solve your problem…

try:
    import configparser
except ImportError :
    import ConfigParser as configparser

So, if we’re running Python 3 the first import statement succeeds.
If we’re running Python2 we’ll get an ImportError, in which case we import the version 2 ConfigParser and alias it as configparser.
The alias means that we can refer to the module in the same way throughout the rest of the program without having to check which version we’ve actually imported.
As a result, our code should now run on either Python version :

try:
    import configparser
except ImportError :
    import ConfigParser as configparser

config = configparser.ConfigParser()
config.read('covers.cfg')

#Get a single value from the [CORDUROY] section of the config file
cover_artist = 'CORDUROY'
#Find the track they covered, originally recorded by Motorhead
# Pass the config section and the original artist ( the entry on the left-hand side of the "="
# in the config file)
track = config.get(cover_artist, 'Motorhead')
# cover_artist and track are string objects so we can use the title method to initcap the output
print(cover_artist.title() + ' covered ' + track.title() + ' by Motorhead')

# Loop through all of the entries in the [MOTORHEAD] section of the config file
for original_artist in config.options('MOTORHEAD') :
    print('Motorhead covered ' + config.get('MOTORHEAD', original_artist) + ' by ' + original_artist.upper())

Run this and we get…

Corduroy covered Motorhead by Motorhead
Motorhead covered Stand By Your Man by TAMMY WYNETTE
Motorhead covered Louie Louie by THE KINGSMEN
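
One more thing worth knowing – configparser isn't read-only. You can add sections and entries and then write the whole lot back out to the file. A minimal sketch (the new section and entry are just an example) :

try:
    import configparser
except ImportError :
    import ConfigParser as configparser

config = configparser.ConfigParser()
config.read('covers.cfg')

# Add a new section and a new entry...
config.add_section('GIRLSCHOOL')
config.set('GIRLSCHOOL', 'Motorhead', 'Bomber')

# ...then write the file back out
with open('covers.cfg', 'w') as cfg_file :
    config.write(cfg_file)
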
Dead Men Tell No Tales

…but fortunately the Python logging module will let your programs sing like a canary.

As with the configparser, there’s no need to write lots of code to open and write to a file.
There are five levels of logging message supported :

  • DEBUG
  • INFO
  • WARNING – the default
  • ERROR
  • CRITICAL

There is a separate call to write each message type. The message itself can be formatted to include information such as a timestamp and the program from which the message was written. There’s a detailed how-to on logging here.

For now though, we want a simple program (logger.py) to write messages to a file wittily and originally titled logger.log…

import logging

logging.basicConfig(
    filename='logger.log',
    level=logging.INFO,
    format='%(asctime)s:%(filename)s:%(levelname)s:%(message)s'
)

logging.debug('No Remorse')
logging.info('Overnight Sensation')
logging.warn('March or Die')
logging.error('Bad Magic')

There’s no output to the screen when we run this program but if we check, there should now be a file called logger.log in the same directory which contains :

2016-03-06 19:19:59,375:logger.py:INFO:Overnight Sensation
2016-03-06 19:19:59,375:logger.py:WARNING:March or Die
2016-03-06 19:19:59,375:logger.py:ERROR:Bad Magic

As you can see, the type of message in the log depends on the logging member invoked to write the message. Note that the DEBUG message hasn't made it into the file; we set the level to INFO in basicConfig, so anything below that threshold gets filtered out.
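
If you do want DEBUG messages to make it into the file, you simply need to lower the threshold when configuring the logger. A minimal sketch :

import logging

# Setting the level to DEBUG means that nothing gets filtered out
logging.basicConfig(
    filename='logger.log',
    level=logging.DEBUG,
    format='%(asctime)s:%(filename)s:%(levelname)s:%(message)s'
)

logging.debug('No Remorse')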

If you want a more comprehensive/authoritative/coherent explanation of the features I’ve covered here, then have a look at the official Python documentation.
On the other hand, if you want to check out a rather unusual version of one of Motorhead’s signature tracks, this is definitely worth a look.


Filed under: python Tagged: argparse, configparser, logging, os.getlogin, os.path.basename, platform.node, platform.system, pwd.getpwuid, sys.version_info, time.strftime

Resolving Hardware Issues with a Kernel Upgrade in Linux Mint

Sun, 2016-02-07 11:40

One evening recently, whilst climbing the wooden hills with netbook in hand, I encountered a cat who had decided that halfway up the stairs was a perfect place to catch forty winks.
One startled moggy later, I had become the owner of what I can only describe as…an ex-netbook.

Now, finally, I’ve managed to get a replacement (netbook, not cat).

As usual when I get a new machine, the first thing I did was to replace Windows with Linux Mint…with the immediate result being that the wireless card stopped working.

The solution ? Don’t (kernel) panic, kernel upgrade !

Support for most of the hardware out there is included in the Linux Kernel. The kernel is enhanced and released every few months. However, distributions, such as Mint, tend to stick on one kernel version for a while in order to provide a stable base on which to develop.
This means that, if Linux is not playing nicely with your Wireless card/web-cam/any other aspect of your machine’s hardware, a kernel upgrade may resolve your problem.
Obviously it’s always good to do a bit of checking to see if this might be the case.
It’s also good to have a way of putting things back as they were should the change we’re making not have the desired effect.

What I’m going to cover here is the specific issue I encountered with my new Netbook and the steps I took to figure out what kernel version might fix the problem.
I’ll then detail the kernel upgrade itself.

Machine details

The machine in question is an Acer TravelMate-B116.
It has an 11.6 inch screen, 4GB RAM and a 500GB HDD.
For the purposes of the steps that follow, I was able to connect to the internet via a wired connection to my router. Well, up until I got the wireless working.
The Linux OS I’m using is Linux Mint 17.3 Cinnamon.
Note that I have disabled UEFI and am booting the machine in Legacy mode.

Standard Warning – have a backup handy !

In my particular circumstances, I was trying to configure a new machine. If it all went wrong, I could simply re-install Mint and be back where I started.
If you have stuff on your machine that you don’t want to lose, it’s probably a good idea to back it up onto separate media ( e.g. a USB stick).
Additionally, if you are not presented with a grub menu when you boot your machine, you may consider running the boot-repair tool.
This will ensure that you have the option of which kernel to use if you have more than one to choose from ( which will be the case once you’ve done the kernel upgrade).

It is possible that upgrading the kernel may cause issues with some of the hardware that is working fine with the kernel you currently have installed, so it’s probably wise to be prepared.

Identifying the card

The first step then, is to identify exactly which wireless network card is in the machine.
From a terminal window …

lspci

00:00.0 Host bridge: Intel Corporation Device 2280 (rev 21)
00:02.0 VGA compatible controller: Intel Corporation Device 22b1 (rev 21)
00:0b.0 Signal processing controller: Intel Corporation Device 22dc (rev 21)
00:13.0 SATA controller: Intel Corporation Device 22a3 (rev 21)
00:14.0 USB controller: Intel Corporation Device 22b5 (rev 21)
00:1a.0 Encryption controller: Intel Corporation Device 2298 (rev 21)
00:1b.0 Audio device: Intel Corporation Device 2284 (rev 21)
00:1c.0 PCI bridge: Intel Corporation Device 22c8 (rev 21)
00:1c.2 PCI bridge: Intel Corporation Device 22cc (rev 21)
00:1c.3 PCI bridge: Intel Corporation Device 22ce (rev 21)
00:1f.0 ISA bridge: Intel Corporation Device 229c (rev 21)
00:1f.3 SMBus: Intel Corporation Device 2292 (rev 21)
02:00.0 Network controller: Intel Corporation Device 3165 (rev 81)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)

It looks like the penultimate entry is our wireless card.
It is possible to get details of the card you have by using “Intel Corporation Device 3165” as a search term. However, we may be able to get the name of the card by running ….

lspci -vq |grep -i wireless -B 1 -A 4

In my case, this returns :

02:00.0 Network controller: Intel Corporation Wireless 3165 (rev 81)
	Subsystem: Intel Corporation Dual Band Wireless AC 3165
	Flags: bus master, fast devsel, latency 0, IRQ 200
	Memory at 91100000 (64-bit, non-prefetchable) [size=8K]
	Capabilities: <access denied>

Further digging around reveals that, according to Intel, this card is supported in Linux from kernel version 4.2 onwards.

Now, which version of the Kernel are we actually running ?

Identifying the current kernel version and packages

This is relatively simple. In the Terminal just type :

uname -r

On Mint 17.3, the output is :

3.19.0-32-generic
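
As an aside, if you ever need to pick this up from inside a Python script rather than at the command line, the platform module reports the same kernel release string. A minimal sketch :

import platform

# platform.release() returns the same string as uname -r
print(platform.release())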

At this point, we now know that an upgrade to the kernel may well solve our wireless problem. The question now is, which packages do we need to install to effect the upgrade ?

If you look in the repositories, there appear to be at least two distinct versions of kernel packages, the generic and something called low-latency.
In order to be confident of which packages we want to get, it’s probably a good idea to work out what we have now.
This can be achieved by searching the installed packages for the version number of the current kernel.
We can do this in the terminal :

dpkg --list |grep 3.19.0-32 |awk '{print $2}'

In my case, this returned :

linux-headers-3.19.0-32
linux-headers-3.19.0-32-generic
linux-image-3.19.0-32-generic
linux-image-extra-3.19.0-32-generic
linux-kernel-generic

As an alternative, you could use the graphical Synaptic Package Manager.
You can start this from the menu ( Administration/Synaptic Package Manager).

synaptic1

Now we know what we’ve got, the next step is to find the kernel version that we need…

Getting the new kernel packages

It may well be the case that the kernel version you’re after has already been added to the distro’s repository.
To see if this is the case, use Synaptic Package Manager to search as follows :

Start Synaptic Package Manager from the System Menu.
You will be prompted for your password.

Click the Status button and select Not Installed

synaptic_search1

In the Quick filter bar, enter the text : linux-headers-4.2*-generic

synaptic_search2

This should give you a list of any kernel 4.2 versions available in the repository.

If, as I did, you find the version you’re looking for, you need to select the packages that are equivalent to the ones you already have installed on your system.
Incidentally, there are a number of 4.2 kernel versions available, so I decided to go for the latest.
In my case then, I want to install :

  • linux-headers-4.2.0-25
  • linux-headers-4.2.0-25-generic
  • linux-image-4.2.0-25-generic
  • linux-image-extra-4.2.0-25-generic

NOTE – If you don’t find the kernel version you are looking for, you can always download the packages directly using these instructions.

Assuming we have found the version we want, we now need to search for the relevant packages.
In the Quick filter field in Synaptic, change the search string to : linux-*4.2.0-25

To Mark the packages for installation, right-click each one in turn and select Mark for Installation

synaptic_select

Once you’ve selected them all, hit the Apply button.

Once the installation is completed, you need to re-start your computer.

On re-start, you should find that the Grub menu has an entry for Advanced Options.
If you select this, you’ll see that you have a list of kernels to choose to boot into.
This comes in handy if you want to go back to running the previous kernel version.

For now though, we’ll boot into the kernel we’ve just installed.
We can confirm that the installation has been successful, once the machine starts, by opening a Terminal and running :

uname -r

If all has gone to plan, we should now see…

4.2.0-25-generic

Even better in my case, my wireless card has now been recognised.
Opening the systray icon, I can enable wireless and connect to my router.

Backing out of the Kernel Upgrade

If you find that the effects of the kernel upgrade are undesirable, you can always go back to the kernel you started with.
If at all possible, I’d recommend starting Mint using the old kernel before doing this.

If you’re running on the kernel for which you are deleting the packages, you may get some alarming warnings. However, once you re-start, you should be back to your original kernel version.

The command then, is :

sudo apt-get remove linux-headers-4.2* linux-image-4.2*

…where 4.2 is the version of the kernel you want to remove.
Run this and the output looks like this…

The following packages will be REMOVED
  linux-headers-4.2.0-25 linux-headers-4.2.0-25-generic
  linux-image-4.2.0-25-generic linux-image-extra-4.2.0-25-generic
  linux-signed-image-4.2.0-25-generic
0 to upgrade, 0 to newly install, 5 to remove and 7 not to upgrade.
After this operation, 294 MB disk space will be freed.
Do you want to continue? [Y/n]

Once the packages have been removed, the old kernel will be in use on the next re-boot.
After re-starting, you can check this with :

uname -r

Thankfully, these steps proved unnecessary in my case and the kernel upgrade has saved me from hardware cat-astrophe.


Filed under: Linux, Mint Tagged: Acer TravelMate-B116, apt-get remove, dpkg, Intel Corporation Dual Band Wireless AC 3165, kernel upgrade, lspci, synaptic package manager, uname -r

Making Datapump Import Stat-tastically faster

Wed, 2016-01-20 14:29

I’m determined to adopt a positive mental attitude this year.
When the train company explains delays by saying we have the wrong kind of sunshine, I prefer to marvel at the fact that the sun is shining at all in the depths of an English Winter. Let’s face it, it’s a rare enough phenomenon in the summer.
The slow-running of the train caused by this rare natural phenomenon also gives me more time to write this post.
There’s more “good” news – Datapump Import tends to be rather slow when it comes to applying optimizer statistics.
This is because it insists on doing it one row at a time.
All of which provides us with an opportunity to optimize our import job… by not bothering to import the stats.
“Hang on”, you’re thinking, “won’t that mean you have to re-gather stats after the import, which probably won’t be that quick either ?”

Not necessarily. You just need to think positive…

What I’m going to cover here is :

  • How to save stats to a table
  • Export without the stats
  • Import without stats
  • Applying stats from a table

I’m using 11gR2 Express Edition in the examples that follow.
We’ll start by exporting the HR schema and then import the tables into the HR_DEV schema.

As there are overhead-line problems in the Watford Junction area, we’ve also got time to choose between running the datapump export and import on the command line or via the DBMS_DATAPUMP package.

Saving Stats to a Table

Let’s start by making sure that we have some optimizer stats on the tables in the HR schema :

select table_name, last_analyzed, num_rows
from dba_tab_statistics
where owner = 'HR'
order by table_name
/

TABLE_NAME                     LAST_ANALYZED        NUM_ROWS
------------------------------ ------------------ ----------
COUNTRIES                      13-JAN-16                  25
DEPARTMENTS                    13-JAN-16                  27
EMPLOYEES                      13-JAN-16                 107
JOBS                           13-JAN-16                  19
JOB_HISTORY                    13-JAN-16                  10
LOCATIONS                      13-JAN-16                  23
REGIONS                        13-JAN-16                   4

7 rows selected.

I can see that all of the tables in the schema have stats, which is good enough for my purposes here.
If you find that the LAST_ANALYZED value is null for the tables in your database, or if you just decide that you want to take a less cavalier approach to the relevance of your Optimizer stats, you can update them by running :

begin
    dbms_stats.gather_schema_stats('HR');
end;
/

Now we know we’ve got some stats, we need to save them to a table. This process is made fairly straightforward by DBMS_STATS. To create an appropriately structured table in the HR schema, we simply need to run :

begin
    dbms_stats.create_stat_table( ownname => 'HR', stattab => 'exp_stats');
end;
/

The CREATE_STAT_TABLE procedure creates the table specified in the stattab parameter, in the schema specified in the ownname parameter.

So, we now have a table in HR called EXP_STATS, which looks like this…

desc hr.exp_stats

 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 STATID                                             VARCHAR2(30)
 TYPE                                               CHAR(1)
 VERSION                                            NUMBER
 FLAGS                                              NUMBER
 C1                                                 VARCHAR2(30)
 C2                                                 VARCHAR2(30)
 C3                                                 VARCHAR2(30)
 C4                                                 VARCHAR2(30)
 C5                                                 VARCHAR2(30)
 N1                                                 NUMBER
 N2                                                 NUMBER
 N3                                                 NUMBER
 N4                                                 NUMBER
 N5                                                 NUMBER
 N6                                                 NUMBER
 N7                                                 NUMBER
 N8                                                 NUMBER
 N9                                                 NUMBER
 N10                                                NUMBER
 N11                                                NUMBER
 N12                                                NUMBER
 D1                                                 DATE
 R1                                                 RAW(32)
 R2                                                 RAW(32)
 CH1                                                VARCHAR2(1000)
 CL1                                                CLOB


Now we need to populate this table. Once again, we need to use DBMS_STATS…

begin
    dbms_stats.export_schema_stats( ownname => 'HR', stattab => 'exp_stats');
end;
/

…and we can see that we now have some data in the table…

select count(*)
from exp_stats
/

  COUNT(*)
----------
        62

The Export

When it comes to datapump exports, you may reasonably take the view that the best policy is to export everything and then pick and choose what you want from the resultant dump file when importing.

Speaking of the dump file, if you want to find it on the OS, you’ll need to know the location pointed to by the DATA_PUMP_DIR directory object. To find this :

select directory_path
from dba_directories
where directory_name = 'DATA_PUMP_DIR'
/

If you’re running the datapump utility from the command line…

expdp system/pwd@XE directory=data_pump_dir dumpfile=hr_full_exp.dmp schemas=HR

…where pwd is the password for SYSTEM.

Alternatively, you can use the PL/SQL API as implemented through the DBMS_DATAPUMP package :

declare
    l_dph number;
    l_state varchar2(30) := 'NONE';
    l_status ku$_status;
begin
    l_dph := dbms_datapump.open
    (
        operation => 'EXPORT',
        job_mode => 'SCHEMA',
        job_name => 'HR_FULL_EXP'
    );
    
    -- Just the HR schema...
    dbms_datapump.metadata_filter
    (
        handle => l_dph,
        name => 'SCHEMA_EXPR',
        value => q'[ IN ('HR') ]'
    );
    
    dbms_datapump.add_file
    (
        handle => l_dph,
        filename => 'hr_full_exp.dmp',
        directory => 'DATA_PUMP_DIR',
        filetype => dbms_datapump.ku$_file_type_dump_file,
        reusefile => 1
    );
    
    dbms_datapump.add_file
    (
        handle => l_dph,
        filename => 'hr_full_exp.log',
        directory => 'DATA_PUMP_DIR',
        filetype => dbms_datapump.ku$_file_type_log_file,
        reusefile => 1
    );
    
    dbms_datapump.log_entry
    (
        handle => l_dph,
        message => 'Job starting at '||to_char(sysdate, 'HH24:MI:SS')
    );
    
    dbms_datapump.start_job( handle => l_dph);

    --
    -- Wait for the job to finish...
    --
    while l_state not in ('COMPLETED', 'STOPPED')
    loop
        dbms_datapump.get_status
        (
            handle => l_dph,
            mask => dbms_datapump.ku$_status_job_error +
                dbms_datapump.ku$_status_job_status +
                dbms_datapump.ku$_status_wip,
            timeout => -1,
            job_state => l_state,
            status => l_status
        );
    end loop;
    dbms_datapump.detach( l_dph);
end;
/

After we’ve run this, we can confirm that the EXP_STATS table has been included in the export by checking the hr_full_exp.log file that gets created in the DATA_PUMP_DIR directory…

...
. . exported "HR"."EXP_STATS"                            20.03 KB      62 rows
. . exported "HR"."COUNTRIES"                            6.367 KB      25 rows
. . exported "HR"."DEPARTMENTS"                          7.007 KB      27 rows
. . exported "HR"."EMPLOYEES"                            16.80 KB     107 rows
. . exported "HR"."JOBS"                                 6.992 KB      19 rows
. . exported "HR"."JOB_HISTORY"                          7.054 KB      10 rows
. . exported "HR"."LOCATIONS"                            8.273 KB      23 rows
. . exported "HR"."REGIONS"                              5.476 KB       4 rows
...
Importing without Applying stats

To import the HR tables into the HR_DEV schema, whilst ensuring that datapump doesn’t apply stats…

If you’re using the import command-line utility …

impdp system/pwd@XE directory=data_pump_dir dumpfile=hr_full_exp.dmp remap_schema=HR:HR_DEV exclude=STATISTICS

Alternatively, using DBMS_DATAPUMP…

declare
    l_dph number;
    l_state varchar2(30) := 'NONE';
    l_status ku$_status;
        
begin

    l_dph := dbms_datapump.open
    (
        operation => 'IMPORT',
        job_mode => 'SCHEMA',
        job_name => 'HR_IMP_NO_STATS'
    );

    --
    -- Import HR objects from the export file into the HR_DEV schema
    --    
    dbms_datapump.metadata_remap
    (
        handle => l_dph,
        name => 'REMAP_SCHEMA',
        old_value => 'HR',
        value => 'HR_DEV'
    );
    
    -- Don't import any stats...
    dbms_datapump.metadata_filter
    (
        handle => l_dph,
        name => 'EXCLUDE_PATH_EXPR',
        value => q'[ = 'STATISTICS']'
    );
    
    dbms_datapump.set_parameter
    (
        handle => l_dph,
        name => 'TABLE_EXISTS_ACTION',
        value => 'REPLACE'
    );
    
   dbms_datapump.add_file
    (
        handle => l_dph,
        filename => 'hr_full_exp.dmp',
        directory => 'DATA_PUMP_DIR',
        filetype => dbms_datapump.ku$_file_type_dump_file,
        reusefile => 1
    );
    
    dbms_datapump.add_file
    (
        handle => l_dph,
        filename => 'hr_full_imp.log',
        directory => 'DATA_PUMP_DIR',
        filetype => dbms_datapump.ku$_file_type_log_file,
        reusefile => 1
    );

    dbms_datapump.log_entry
    (
        handle => l_dph,
        message => 'Job starting at '||to_char(sysdate, 'HH24:MI:SS')
    );
    
    dbms_datapump.start_job( handle => l_dph);
 
    -- Wait for the job to finish...
 
    while l_state not in ('COMPLETED', 'STOPPED')
    loop
        dbms_datapump.get_status
        (
            handle => l_dph,
            mask => dbms_datapump.ku$_status_job_error +
                dbms_datapump.ku$_status_job_status +
                dbms_datapump.ku$_status_wip,
            timeout => -1,
            job_state => l_state,
            status => l_status
        );
    end loop;
    dbms_datapump.detach( l_dph);
end;
/    

If we now check, we can confirm that there are indeed, no stats on the tables we’ve just imported…

select table_name, last_analyzed, num_rows
from dba_tab_statistics
where owner = 'HR_DEV'
order by table_name
/

TABLE_NAME                     LAST_ANALYZED        NUM_ROWS
------------------------------ ------------------ ----------
COUNTRIES
DEPARTMENTS
EMPLOYEES
EXP_STATS
JOBS
JOB_HISTORY
LOCATIONS
REGIONS

8 rows selected.


Now for the final touch, apply the stats that we have in the EXP_STATS table. Should be easy enough…

Applying stats from a table

If we were importing into a schema of the same name as the one we saved stats for, this would be straightforward.
However, in this case, we’re importing into a different schema – HR_DEV.
Therefore, if we want to avoid “leaves-on-the-line”, we need to do a little light hacking.

To make things a bit clearer, let’s have a look at the contents of the C5 column of our EXP_STATS table…

select distinct(c5)
from exp_stats
/

C5
------------------------------
HR


Yes, the table owner (for that is what the C5 column contains) is set to HR. This is reasonable enough as it was the stats for this schema which we saved to the table in the first place. However, this means that the stats will not be applied to the tables in the HR_DEV schema unless we do this…

update exp_stats
set c5 = 'HR_DEV'
where c5 = 'HR'
/

62 rows updated.

commit;

Commit complete.

Now that’s done, we can apply the stats with a call to DBMS_STATS.IMPORT_SCHEMA_STATS…

begin
    dbms_stats.import_schema_stats(ownname => 'HR_DEV', stattab => 'exp_stats');
end;
/

Check again, and the stats are now on the tables :

select table_name, last_analyzed, num_rows
from dba_tab_statistics
where owner = 'HR_DEV'
order by table_name
/

TABLE_NAME                     LAST_ANALYZED        NUM_ROWS
------------------------------ ------------------ ----------
COUNTRIES                      13-JAN-16                  25
DEPARTMENTS                    13-JAN-16                  27
EMPLOYEES                      13-JAN-16                 107
EXP_STATS
JOBS                           13-JAN-16                  19
JOB_HISTORY                    13-JAN-16                  10
LOCATIONS                      13-JAN-16                  23
REGIONS                        13-JAN-16                   4

8 rows selected.


Whilst importing stats separately does entail a few more steps, it does mean that there is rather less hanging around for datapump import to do its impression of a train trying to get through “the wrong kind of snow”.


Filed under: Oracle, PL/SQL Tagged: DataPump, dba_tab_statistics, dbms_datapump, dbms_datapump.metadata_filter, dbms_stats, dbms_stats.create_stat_table, dbms_stats.export_schema_stats, dbms_stats.import_schema_stats, EXCLUDE_PATH_EXPR, expdp, impdp, importing stats into a different schema using dbms_stats, remap_schema

DBMS_METADATA and SELECT_CATALOG_ROLE – Cat Herding in Oracle

Sun, 2016-01-10 11:28

Last year we got a kitten. Little Cleo was full of the joys of…well…being a cat. Then, one day, she just disappeared.
Several months later, having given up hope of ever seeing her again, we adopted Nutmeg.
Then, just before Christmas, Cleo suddenly re-appeared.
It’s a complete mystery as to where she had been for the last year and she has not condescended to comment on the matter.
The end result is that we are now a two cat family.
This brings with it certain complications.
When they aren’t studiously ignoring each other, the cats sit there giving each other hard stares for hours on end.
I think there may be some tension over exactly just who owns that fluffy ball.
To ensure that our sleep is not disturbed by these two fighting like cats in a sack, it’s necessary to ensure that they are in separate rooms before we retire for the evening.
As a result we’ve become rather expert at the art of Cat Herding, which largely consists of bribery with cat-nip, among other things.

Whilst acquiring a reputation as a “dealer” among the feline population of Milton Keynes, I have had cause to reflect on the similarity of our new hobby with the trials and tribulations of persuading DBMS_METADATA.GET_DDL that you do actually have permissions to see the source code you’ve asked for.

This is frequently a fairly tortuous process. In what follows I will be examining just why SELECT_CATALOG_ROLE is DBMS_METADATA cat-nip…and why SELECT ANY DICTIONARY isn’t.
I’ll also look at how you can stop chasing your tail and get this function to behave itself when invoked from within a stored program unit…

SELECT ANY DICTIONARY

According to the Oracle Documentation, the SELECT ANY DICTIONARY privilege allows you to :

“Query any data dictionary object in the SYS schema. This privilege lets you selectively override the default FALSE setting of the O7_DICTIONARY_ACCESSIBILITY initialization parameter.”

Some explanation is required here. Prior to Oracle 8, a user with the SELECT ANY TABLE privilege had access to any table in the database. From 8 onwards, this privilege was restricted to exclude the Data Dictionary. This is where SELECT ANY DICTIONARY came in.

Before we go any further, let’s just make sure that the O7_DICTIONARY_ACCESSIBILITY parameter is not set to TRUE…

select value
from v$parameter
where name = 'O7_DICTIONARY_ACCESSIBILITY'
/

VALUE
--------------------------------------------------------------------------------
FALSE

Now, to see the effect of this privilege on DBMS_METADATA.GET_DDL, let’s create a user :

grant create session, select any dictionary
    to cleo identified by password_you_can_type_with_paws
/

If we now connect to the database as cleo, we can see that she has permissions to query the DBA_SOURCE view, among other things, and therefore to retrieve the source for a procedure in the HR schema :

set heading off
set pages 500
set lines 130
select text
from dba_source
where owner = 'HR'
and name = 'ADD_JOB_HISTORY'
and type = 'PROCEDURE'
order by line
/

Sure enough, we get the expected output :

PROCEDURE add_job_history
  (  p_emp_id          job_history.employee_id%type
   , p_start_date      job_history.start_date%type
   , p_end_date        job_history.end_date%type
   , p_job_id          job_history.job_id%type
   , p_department_id   job_history.department_id%type
   )
IS
BEGIN
  INSERT INTO job_history (employee_id, start_date, end_date,
                           job_id, department_id)
    VALUES(p_emp_id, p_start_date, p_end_date, p_job_id, p_department_id);
END add_job_history;

13 rows selected.

SQL>

By default, EXECUTE on the DBMS_METADATA package is granted to PUBLIC. So, invoking the GET_DDL function for the same program unit should return the DDL statement required to re-create it. We already know we have access to the source so this should be no problem, right ?

set long 5000
select dbms_metadata.get_ddl( schema => 'HR', name => 'ADD_JOB_HISTORY', object_type => 'PROCEDURE')
from dual
/
ERROR:
ORA-31603: object "ADD_JOB_HISTORY" of type PROCEDURE not found in schema "HR"
ORA-06512: at "SYS.DBMS_METADATA", line 4018
ORA-06512: at "SYS.DBMS_METADATA", line 5843
ORA-06512: at line 1

no rows selected

SQL>

Has Oracle forgotten about this procedure ? Maybe it’s rolled under the sofa ?
Either way, further investigation would seem to be in order…

No SELECT_CATALOG_ROLE, no comment

Like a cat, DBMS_METADATA.GET_DDL can appear to be, to put it delicately, rather independently minded. It certainly doesn’t always do what it’s told.

To try to determine what’s happening, we could do some tracing and look through the recursive statements to see precisely which one is causing the error. However, there is a quicker way.
Let’s start by looking at the comments in the DBMS_METADATA package header :

-- SECURITY
-- This package is owned by SYS with execute access granted to PUBLIC.
-- It runs with invokers rights, i.e., with the security profile of
-- the caller.  It calls DBMS_METADATA_INT to perform privileged
-- functions.
-- The object views defined in catmeta.sql implement the package's security
-- policy via the WHERE clause on the public views which include syntax to
-- control user access to metadata: if the current user is SYS or has
-- SELECT_CATALOG_ROLE, then all objects are visible; otherwise, only
-- objects in the schema of the current user are visible.

This points us in the direction of the file…

$ORACLE_HOME/rdbms/admin/catmeta.sql

Sure enough, when we get to line 10209, things become a bit clearer…

-- base view for procedures, functions, packages and package bodies

create or replace force view ku$_base_proc_view of ku$_proc_t
  with object identifier (obj_num) as
  select '1','1',
         oo.obj#,
         oo.type#,
         value(o),
         sys.dbms_metadata_util.get_source_lines(oo.name,oo.obj#,oo.type#)
  from  sys.ku$_edition_schemaobj_view o, sys.ku$_edition_obj_view oo
  where (oo.type# = 7 or oo.type# = 8 or oo.type# = 9 or oo.type# = 11)
    and oo.obj#  = o.obj_num and oo.linkname is NULL
         AND (SYS_CONTEXT('USERENV','CURRENT_USERID') IN (o.owner_num, 0) OR
              EXISTS ( SELECT * FROM sys.session_roles
                       WHERE role='SELECT_CATALOG_ROLE' ))
/

Unless you are either the owner of the stored program unit you’re trying to retrieve, or connected as USER_ID 0 ( i.e. SYS), the only way that you’re going to get anything back from a query on this view is if you have been granted a role called SELECT_CATALOG_ROLE.

To verify this, let’s create another user…

grant create session
    to nutmeg identified by must_have_catnip
/

grant select_catalog_role
    to nutmeg
/

Once again, we have access to the DBA_SOURCE view…

set heading off
set pages 500
set lines 130
select text
from dba_source
where owner = 'HR'
and name = 'ADD_JOB_HISTORY'
and type = 'PROCEDURE'
order by line
/

PROCEDURE add_job_history
  (  p_emp_id          job_history.employee_id%type
   , p_start_date      job_history.start_date%type
   , p_end_date        job_history.end_date%type
   , p_job_id          job_history.job_id%type
   , p_department_id   job_history.department_id%type
   )
IS
BEGIN
  INSERT INTO job_history (employee_id, start_date, end_date,
                           job_id, department_id)
    VALUES(p_emp_id, p_start_date, p_end_date, p_job_id, p_department_id);
END add_job_history;

13 rows selected.

SQL>

…however, we can now also use DBMS_METADATA.GET_DDL …

set long 5000
select dbms_metadata.get_ddl( schema => 'HR', name => 'ADD_JOB_HISTORY', object_type => 'PROCEDURE')
from dual
/
  CREATE OR REPLACE PROCEDURE "HR"."ADD_JOB_HISTORY"
  (  p_emp_id          job_history.employee_id%type
   , p_start_date      job_history.start_date%type
   , p_end_date        job_history.end_date%type
   , p_job_id          job_history.job_id%type
   , p_department_id   job_history.department_id%type
   )
IS
BEGIN
  INSERT INTO job_history (employee_id, start_date, end_date,
                           job_id, department_id)
    VALUES(p_emp_id, p_start_date, p_end_date,
 p_job_id, p_department_id);
END add_job_history;

SQL>

It seems that DBMS_METADATA has retrieved its fluffy ball from under the sofa.

DBMS_METADATA.GET_DDL in a Stored Program Unit

The fact that SELECT_CATALOG_ROLE is a role rather than a system privilege does tend to make life interesting if you put calls to DBMS_METADATA.GET_DDL into a stored program unit.

To demonstrate, let’s create a function that does just that (in the nutmeg schema – i.e. a schema that does not have the role granted to it) :

alter session set current_schema = nutmeg
/

create or replace function get_ddl( i_owner in varchar2, i_name in varchar2, i_type in varchar2)
    return clob
is
begin
    dbms_output.put_line('Listing active session roles...');
    for r_role in (select role from sys.session_roles) loop
        dbms_output.put_line('Role : '||r_role.role);
    end loop;
    dbms_output.put_line('Roles listed');
    return dbms_metadata.get_ddl( schema => i_owner, name => i_name, object_type => i_type);
end;
/

This should work fine for a user with the role granted, shouldn’t it ?
Let’s test it (once again, connected as cleo)…

set serveroutput on size unlimited
select get_ddl('HR', 'ADD_JOB_HISTORY', 'PROCEDURE')
from dual
/
ERROR:
ORA-31603: object "ADD_JOB_HISTORY" of type PROCEDURE not found in schema "HR"
ORA-06512: at "SYS.DBMS_METADATA", line 4018
ORA-06512: at "SYS.DBMS_METADATA", line 5843
ORA-06512: at "NUTMEG.GET_DDL", line 10

no rows selected

Listing active session roles...
Roles listed
SQL>

Because we’re calling the DBMS_METADATA.GET_DDL function from inside a stored program unit, the role is disabled.
In order for this to work we need to make the stored program unit invoker’s rights…

alter session set current_schema = nutmeg
/

create or replace function get_ddl( i_owner in varchar2, i_name in varchar2, i_type in varchar2)
    return clob
    authid current_user
is
begin
    dbms_output.put_line('Listing active session roles...');
    for r_role in (select role from sys.session_roles) loop
        dbms_output.put_line('Role : '||r_role.role);
    end loop;
    dbms_output.put_line('Roles listed');
    return dbms_metadata.get_ddl( schema => i_owner, name => i_name, object_type => i_type);
end;
/

Now, when we invoke this function (as cleo once again), the role is still applicable…

set serveroutput on size unlimited
set long 5000
set heading off
select get_ddl('HR', 'ADD_JOB_HISTORY', 'PROCEDURE')
from dual
/

  CREATE OR REPLACE PROCEDURE "HR"."ADD_JOB_HISTORY"
  (  p_emp_id          job_history.employee_id%type
   , p_start_date      job_history.start_date%type
   , p_end_date        job_history.end_date%type
   , p_job_id          job_history.job_id%type
   , p_department_id   job_history.department_id%type
   )
IS
BEGIN

  INSERT INTO job_history (employee_id, start_date, end_date,
                           job_id, department_id)
    VALUES(p_emp_id, p_start_date, p_end_date, p_job_id, p_department_id);
END add_job_history;

Listing active session roles...
Role : SELECT_CATALOG_ROLE
Role : HS_ADMIN_SELECT_ROLE
Roles listed
SQL>

It would seem then, that only a user granted SELECT_CATALOG_ROLE can usefully use DBMS_METADATA.GET_DDL in a stored program unit to retrieve DDL for objects not owned by them.

Something you might want to consider at this point, is that SELECT_CATALOG_ROLE is itself granted the HS_ADMIN_SELECT_ROLE role. Between them, these roles have SELECT access on quite a large number of SYS objects :

select count(*)
from dba_tab_privs
where grantee in ('SELECT_CATALOG_ROLE', 'HS_ADMIN_SELECT_ROLE')
/

  COUNT(*)
----------
      2207

SQL>

At this point, you may well ask if there is any way for users to utilise our function without having this role granted.
After all, a common approach to application security is to bestow execute access to users on a stored program unit without them having any visibility of the underlying tables and views.

Well, there is…

Doing without the SELECT_CATALOG_ROLE

Before we go any further I think I should point out that there are several issues with creating objects in the SYS schema.

The objects in the schema are effectively part of the Oracle software. The schema can be seen as being analogous to root on a *nix system.

There are various admonitions against performing DDL in the SYS schema.

As with most “golden rules” however, there is at least one exception – in this case, the Password Verify Function springs to mind.

I suppose the best reason for avoiding this sort of thing is that it would only take one moment of inattention to cause potentially irreparable damage to your Oracle installation. Even with a backup re-installing Oracle is no trivial matter. Therefore, I strongly suggest that you consider carefully whether the benefits of the approach I’m about to take here outweigh the issues of granting SELECT_CATALOG_ROLE.

So then, as SYS…

create function get_ddl( i_owner in varchar2, i_name in varchar2, i_type in varchar2)
    return clob
is
begin
    dbms_output.put_line('Listing active session roles...');
    for r_role in (select role from sys.session_roles) loop
        dbms_output.put_line('Role : '||r_role.role);
    end loop;
    dbms_output.put_line('Roles listed');
    dbms_output.put_line('Userid is : '||sys_context('userenv', 'current_userid'));
    return dbms_metadata.get_ddl( schema => i_owner, name => i_name, object_type => i_type);
end;
/

grant execute on get_ddl to cleo
/

Note that we’re using CREATE rather than CREATE OR REPLACE to ensure that we don’t accidentally overwrite anything.

Now, when we call this function as cleo…

set serveroutput on
set heading off
set lines 130
set long 5000
select sys.get_ddl('HR', 'ADD_JOB_HISTORY', 'PROCEDURE')
from dual
/

  CREATE OR REPLACE PROCEDURE "HR"."ADD_JOB_HISTORY"
  (  p_emp_id          job_history.employee_id%type
   , p_start_date      job_history.start_date%type
   , p_end_date        job_history.end_date%type
   , p_job_id          job_history.job_id%type
   , p_department_id   job_history.department_id%type
   )
IS
BEGIN

  INSERT INTO job_history (employee_id, start_date, end_date,
                           job_id, department_id)
    VALUES(p_emp_id, p_start_date, p_end_date,
 p_job_id, p_department_id);
END add_job_history;

Listing active session roles...
Roles listed
Userid is : 0
SQL>

The fact that we don’t have the role becomes irrelevant because the function is running with definer’s rights (i.e. as SYS, which has a CURRENT_USERID of 0).

Conclusions

The security model implemented for DBMS_METADATA is quite unusual for Oracle supplied PL/SQL packages.
As we have seen, this package, and especially its GET_DDL function, often behaves in unexpected (and possibly entertaining) ways when invoked.


Filed under: Oracle, PL/SQL Tagged: 07_dictionary_accessibility, catmeta.sql, current_userid, DBMS_METADATA, DBMS_METADATA.GET_DDL, definer's rights, ku$_edition_schemaobj_view, ORA-31603, select any dictionary, SELECT_CATALOG_ROLE, sys.session_roles, SYS_CONTEXT

Getting one of your Five-a-Day – connecting Remotely to a Raspberry Pi from Linux Mint

Sat, 2015-12-26 12:42

It’s Christmas. To mark the occasion, my son bought me a top-of-the-range computer…

pi_board

Christmas has come early ! Er, hang, on…

Yes, a Raspberry Pi 2 b-spec, complete with 900 MHz Quad-core ARM processor and 1 GB RAM.

Getting it up and running was a bit more of a challenge than I had anticipated.
The Pi uses HDMI for Video output and my ageing monitor is not equipped for HDMI…

tv

The best program on TV – NOOBS doing it’s thing.

In the end, I had to “borrow” the TV.
This arrangement was, of necessity, extremely temporary. The TV had to be back in its usual place ready for The Strictly-TOWIE-Dancing-Get-Me-Out-Of-Here Christmas Special, on pain of pain.
Therefore, my first Pi project was to connect to it remotely from another machine, namely, my Linux Mint Laptop.
This will enable me to run the Pi headless (i.e. without a monitor/keyboard/mouse attached to it).

I’m going to cover two different methods of connecting to the Pi.
The first is using ssh to connect to the command line.
The second is to connect remotely to the Raspbian desktop itself.

Just to avoid any confusion, I will be referring to the Raspberry Pi as “the Pi” and the machine I’m connecting from as “Mint”.

About the Environment

The Pi

The Pi I’m using for this is running the Raspbian Jessie OS.
It is set to start the desktop on boot.
The Pi is up and the desktop is running.
The Pi is connected to the router via a network cable.

The Mint Machine

The version of Mint I’m running on is 17.2.

The Network

I’m running on a standard home network with all devices connecting to a router.
The router assigns IP addresses to each connected machine dynamically via DHCP.
Apart from the router itself, no device on the network has a fixed IP address.

SSH – Setup

We need to perform these steps on the Pi.

First of all, we need to know what the machine name of the Pi is.

As the devices connecting to the network are dynamically allocated an IP address, it’s simpler to address a specific machine by name.

So, on the Pi, open a terminal and type :

uname -n

This returns the name of the computer. In my case :

raspberrypi

The next thing we need to do is to make sure that the Pi will accept connections via SSH.

On the Raspbian Desktop, click on the Menu and select Preferences/Raspberry Pi Configuration

prefs_menu

Let’s face it, this is the closest I’m going to get to fresh fruit and veg for the next couple of weeks.

Next click on the Interfaces tab and make sure that ssh is enabled.

config_window

Once the steps have been completed, we’re ready to test…

SSH – from the remote machine

We want to connect to the Pi from a remote machine as a user that exists on the Pi.
Note that this user does not need to exist on the remote machine.

When we run the ssh command, we need to specify the user we’re connecting as, and the name of the machine we’re connecting to (i.e. the Raspberry Pi itself).

I’m going to connect as the user pi. We’ve already found out that the name of the Raspberry Pi is “raspberrypi”.

So, I just need to open a terminal window on my remote machine and type :

ssh pi@raspberrypi

The first time you ssh to another computer on your network, you’ll get this warning :

The authenticity of host 'raspberrypi (192.168.1.144)' can't be established.
ECDSA key fingerprint is 03:72:d9:84:58:c8:a6:cc:37:bc:c3:47:8f:1c:90:e0.
Are you sure you want to continue connecting (yes/no)? 

Type “yes” and Hit Return…

Warning: Permanently added 'raspberrypi,192.168.1.144' (ECDSA) to the list of known hosts.
pi@raspberrypi's password:

Enter the password and…

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.

To confirm that we’re now connected to the pi…

uname -a
Linux raspberrypi 4.1.13-v7+ #826 SMP PREEMPT Fri Nov 13 20:19:03 GMT 2015 armv7l GNU/Linux
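
If you ever find yourself wanting to script this sort of thing from Python, the third-party paramiko module can run commands over ssh for you. This is just a minimal sketch – paramiko isn’t installed by default (pip install paramiko should sort that out) and the password here is the Raspbian default used later in this post :

import paramiko

# Connect to the Pi and run a single command over ssh
client = paramiko.SSHClient()
# Accept the Pi's host key the first time we see it
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect('raspberrypi', username='pi', password='raspberry')

stdin, stdout, stderr = client.exec_command('uname -a')
print(stdout.read().decode())

client.close()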

Whilst ssh is handy if you just need command line access to the Pi, if you want access to the Desktop, you’ll need to try something a bit different.

Using RDP to run the Pi Desktop Remotely

Now, there is more than one way to do this. I’ve chosen to use RDP as it’s quite simple to setup.

Installing xrdp on the Pi

To start with, we need to install the xrdp package on the Pi. At this point, you can either do this on the machine itself ( by opening a Terminal window), or connect via ssh.
Either way, the command you need to enter is :

sudo apt-get install xrdp

You will be prompted for your password and should then get some output that looks like this :

Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
  vnc4server x11-apps x11-session-utils xbase-clients xbitmaps xfonts-base
Suggested packages:
  vnc-java mesa-utils x11-xfs-utils
The following NEW packages will be installed:
  vnc4server x11-apps x11-session-utils xbase-clients xbitmaps xfonts-base xrdp
0 upgraded, 7 newly installed, 0 to remove and 0 not upgraded.
Need to get 8,468 kB of archives.
After this operation, 17.1 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://mirrordirector.raspbian.org/raspbian/ jessie/main x11-apps armhf 7.7+4 [529 kB]
Get:2 http://mirrordirector.raspbian.org/raspbian/ jessie/main x11-session-utils armhf 7.7+1 [60.1 kB]
Get:3 http://mirrordirector.raspbian.org/raspbian/ jessie/main xbase-clients all 1:7.7+7 [36.7 kB]
Get:4 http://mirrordirector.raspbian.org/raspbian/ jessie/main vnc4server armhf 4.1.1+X4.3.0-37.6 [1,434 kB]
Get:5 http://mirrordirector.raspbian.org/raspbian/ jessie/main xbitmaps all 1.1.1-2 [32.1 kB]
Get:6 http://mirrordirector.raspbian.org/raspbian/ jessie/main xfonts-base all 1:1.0.3 [6,181 kB]
Get:7 http://mirrordirector.raspbian.org/raspbian/ jessie/main xrdp armhf 0.6.1-2 [195 kB]
Fetched 8,468 kB in 35s (236 kB/s)
Selecting previously unselected package x11-apps.
(Reading database ... 123536 files and directories currently installed.)
Preparing to unpack .../x11-apps_7.7+4_armhf.deb ...
Unpacking x11-apps (7.7+4) ...
Selecting previously unselected package x11-session-utils.
Preparing to unpack .../x11-session-utils_7.7+1_armhf.deb ...
Unpacking x11-session-utils (7.7+1) ...
Selecting previously unselected package xbase-clients.
Preparing to unpack .../xbase-clients_1%3a7.7+7_all.deb ...
Unpacking xbase-clients (1:7.7+7) ...
Selecting previously unselected package vnc4server.
Preparing to unpack .../vnc4server_4.1.1+X4.3.0-37.6_armhf.deb ...
Unpacking vnc4server (4.1.1+X4.3.0-37.6) ...
Selecting previously unselected package xbitmaps.
Preparing to unpack .../xbitmaps_1.1.1-2_all.deb ...
Unpacking xbitmaps (1.1.1-2) ...
Selecting previously unselected package xfonts-base.
Preparing to unpack .../xfonts-base_1%3a1.0.3_all.deb ...
Unpacking xfonts-base (1:1.0.3) ...
Selecting previously unselected package xrdp.
Preparing to unpack .../xrdp_0.6.1-2_armhf.deb ...
Unpacking xrdp (0.6.1-2) ...
Processing triggers for man-db (2.7.0.2-5) ...
Processing triggers for fontconfig (2.11.0-6.3) ...
Processing triggers for systemd (215-17+deb8u2) ...
Setting up x11-apps (7.7+4) ...
Setting up x11-session-utils (7.7+1) ...
Setting up xbase-clients (1:7.7+7) ...
Setting up vnc4server (4.1.1+X4.3.0-37.6) ...
update-alternatives: using /usr/bin/vnc4server to provide /usr/bin/vncserver (vncserver) in auto mode
update-alternatives: using /usr/bin/Xvnc4 to provide /usr/bin/Xvnc (Xvnc) in auto mode
update-alternatives: using /usr/bin/x0vnc4server to provide /usr/bin/x0vncserver (x0vncserver) in auto mode
update-alternatives: using /usr/bin/vnc4passwd to provide /usr/bin/vncpasswd (vncpasswd) in auto mode
update-alternatives: using /usr/bin/vnc4config to provide /usr/bin/vncconfig (vncconfig) in auto mode
Setting up xbitmaps (1.1.1-2) ...
Setting up xfonts-base (1:1.0.3) ...
Setting up xrdp (0.6.1-2) ...
Processing triggers for systemd (215-17+deb8u2) ...

Once that little lot has scrolled up your screen, you can exit the session ( just type “exit”).
There are some guides which suggest that you need to re-boot the Pi at this point. I found that this was not necessary. However, if things don’t quite work as described from this point on, it may be worth doing this. After all, “Have you tried turning it off and on again ?” is a cliche for a reason !

It’s probably worth mentioning that, at this point, you should be able to connect from any Windows ( Windows 7 or above) remote machine using the built-in Remote Desktop app.
That’s not what we’re after though. Oh no. We want to be able to do this from Mint…

Installing rdesktop on Linux Mint

Back on Mint, open a Terminal window and…

sudo apt-get install rdesktop

Once again you should be prompted for your password ( remember this is for your user on the Mint machine, not on the Pi). You should then see something like …

[sudo] password for mike:
Reading package lists... Done
Building dependency tree
Reading state information... Done
Suggested packages:
  pcscd
The following NEW packages will be installed
  rdesktop
0 to upgrade, 1 to newly install, 0 to remove and 83 not to upgrade.
Need to get 139 kB of archives.
After this operation, 427 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu/ trusty/universe rdesktop amd64 1.7.1-1ubuntu2 [139 kB]
Fetched 139 kB in 10s (12.6 kB/s)
Selecting previously unselected package rdesktop.
(Reading database ... 192214 files and directories currently installed.)
Preparing to unpack .../rdesktop_1.7.1-1ubuntu2_amd64.deb ...
Unpacking rdesktop (1.7.1-1ubuntu2) ...
Processing triggers for man-db (2.6.7.1-1ubuntu1) ...
Setting up rdesktop (1.7.1-1ubuntu2) ...

Again, I found that no re-boot was required on the Mint machine. Once again, it might be different for you.

Either way, we should now be able to prove that the Pi will run happily without all of those wires plugged into it…

Connecting via rdesktop

…but I’d recommend keeping it all plugged in until you’ve tested first.

To connect from Mint, open a Terminal window and type :

rdesktop raspberrypi -u username_on_pi -p password -g 90%

…where username_on_pi is the name of the user on the Pi and password is the password for that user.

The -g switch tells rdesktop what size to create the window ( in our case, 90% of the screen size on our main machine)

You should now see (after a short interlude whilst the desktop is rendered)….

rdp_on_pi

Alternatively, if you want to run in a full screen, you can use the -f switch for rdesktop instead – i.e. :

rdesktop raspberrypi -u pi -p raspberry -f

Once you’re in full screen mode, you can toggle between full screen and window mode at any point by pressing CTRL+ALT+ENTER.
It’s worth noting that if you do this and then minimize the rdesktop window, when you maximize it again the desktop can appear to be blank and/or the re-drawing might not be complete. I’m not sure why this is, or what the fix is.

One other point to note, it’s more secure to run rdesktop without specifying the password. In this way, you’ll be prompted for it when you connect.
So, if you run…

rdesktop raspberrypi -u pi -g 90%

…you will be presented with

pi_login

A Python Program to automate connection

To save us typing in the rdesktop command each time we want to connect to the Pi, we could write a simple bash script to automate our rdesktop command. However, in the circumstances, Python seems a more appropriate medium…

#!/usr/bin/env python

import sys, subprocess

def main():
    # Make sure that we've got a single argument passed in
    # NOTE - the first argument is the name of this program.
    if len(sys.argv) != 2 :
        sys.exit('Usage : %s screen_size as percentage between 25 and 100' % sys.argv[0])

    if is_valid_screen_size(sys.argv[1]) :
        # Pass in the screen size to the function to build the rdesktop command
        command = build_rdp_command(sys.argv[1])
    else :
        sys.exit('Usage : %s screen_size as percentage between 25 and 100' % sys.argv[0])

    try :
        # Run the command...
        status = subprocess.call(command, shell=True)
    except OSError as e :
        print >> sys.stderr, 'Error : ', e
        # Make sure we still have something to return if the call itself failed
        status = 1
    sys.exit( status)

def is_valid_screen_size( input_size) :
    # Validate the screen size.
    # Return True if it's a valid value

    # Make sure requested size is an integer
    try :
        int( input_size)
    except ValueError :
        return False
    # Now make sure it's not ridiculously small...or over 100
    if int( input_size) < 25 or int(input_size) > 100 :
        return False
    else :
        return True

def build_rdp_command(screen_size):
    # Return the appropriate rdesktop command

    # Initialize "constants" to use in the rdesktop command.
    PI_NAME = 'raspberrypi'
    PI_USER = 'pi'

    command_str = "rdesktop " + PI_NAME + " -u " + PI_USER
    if int(screen_size) == 100 :
        # Full screen
        command_str = command_str + " -f "
    else :
        # Specify the percentage
        command_str = command_str + " -g " + screen_size +"%"
    return command_str    

if __name__ == "__main__" :
    main()

The program is saved as rdp_to_pi.py.
At the time of writing, the default Python version on Mint (17.2) is 2.7.6 ( although you can invoke a Python 3.4.3 interpreter by typing python3 at the prompt).
Therefore, this program is written for Python v2.

The first line of the program tells Linux to use the Python interpreter when this program is executed.

The program then does the following :

  • validates that it’s been passed a sensible argument value on the command line for the screen size percentage
  • builds the appropriate rdesktop command line using “constant” values for the machine name of the Pi and the name of the user to connect as
  • executes the command

To run the program, you first need to set the appropriate file permissions…

chmod u+x rdp_to_pi.py

…and then run it by prefixing the script name with “./” (just as you would any other executable script)…

./rdp_to_pi.py 90
Keyboard Mapping Issues

There are some other issues as well, most notably, the Pi seems to have forgotten where it is and has somehow adopted settings for a US keyboard.
If you want to test this and happen to have a UK keyboard, try opening a text editor and typing any of the following :

  • ” – will print @
  • @ – will print “
  • £ – will print #
  • | – will print ???
  • ~ – will print |

I’ve tried various things to fix this, but to no avail.
Despite the fact that both the Pi and the remote machine are configured with a UK keyboard, rdesktop seems to ignore this and insist on using US keyboard mappings.
I suspect that this is something to do with an X-Server configuration setting somewhere but I just can’t figure out where.

You may have more luck using this link as a starting point.

If anyone does have a solution to this, please let me know.

For now though, I’ve achieved my project goals :

  • get the telly back in place before there’s trouble
  • allow me to play with my Raspberry Pi whilst the festive extravaganza plays out on said TV

That should keep me quiet for a bit.


Filed under: Linux, Mint Tagged: python, python main function, python subprocess module, python sys module, python ValueError exception, Raspberry Pi, SSH, xrdp