Development

Historical Perspective of ORM and Alternatives

Kenneth Downs - Wed, 2010-12-15 20:55

A couple of years ago I broke my basic rule of sticking to practical how-to and general programming philosophy and wrote Why I Do Not Use ORM. It sure got a lot of hits, and is read every day by people searching such things as "orm bad" or "why use orm". But I have never been satisfied with that post, and so I decided to take another stab from another angle. There are legitimate problems that led to ORM, and those problems need to be looked at even if we cannot quite agree on what they are or if ORM is the answer.

UPDATE: In response to comments below and on reddit.com, I have a new post that gives a detailed analysis of an algorithm implemented as a sproc, in app code with embedded SQL, and in ORM.

Here then, is one man's short history of commercial database application programming, from long before the ORM system, right up to the present.

This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.

The Way Back Machine

When I began my career the world was a different place. No Web, no Java, and Object Orientation had not yet entered the mainstream. My first application was written on a timeshare system (a microVAX) and writing LAN applications made me a good living for awhile before I graduated to client/server.

In those days there were three things a programmer (We were not "software engineers" yet, just programmers) had to know. Every programmer I knew wanted to master all of these skills. They were:

  • How to design a database schema for correctness and efficiency.
  • How to code an application that could process data from the database, correctly and efficiently.
  • How to make a good UI, which came down to hotkeys and stuffing the screen with as much info as possible.

In this essay we are going to look at those first two.

My own experience may be somewhat peculiar in that I have never worked on a team where the programmers were separated from the database. (OK, one exception, in my current assignment there is an iron curtain between the two, but happily it is not my problem from where I sit). Coders made tables, and "tablers" wrote code. So this focus on being a good developer by developing both skills may be rare, enjoyed by those who have the same ecumenical background that I enjoyed.

Some Changes That Did Not Matter

Things changed rapidly, but most of those changes did not really affect application development.

When Windows 95 came out, being "almost as good as a Mac", we recoded our DOS apps into Windows apps without too much trouble and life went on as before.

Laser printers replaced dot-matrix for most office use, CPUs kept getting faster (and Windows kept getting slower), each year there were more colors on the screen, disks got bigger and RAM got cheaper.

Only the internet and the new stateless programming required any real adjustment, but it was easy for a database guy because good practice had always been to keep your transactions as short as possible. The stateless thing just kind of tuned that to a sharp edge.

Finally, with the internet, the RDBMS finally lost its place as sole king of the datastore realm, but those new datastores will have to wait for another day, lest we get bogged down.

Enter Object Orientation

Arguably nothing changed programming more than Object Orientation. Certainly not Windows 95, faster graphics or any of those other Moore's Law consequences. I would go so far as to say that even the explosion of the web just produced more programming, and of different kinds of apps, and even that did not come close to the impact of Object Orientation. Disagree if you like, but as it came in, it was new, it was strange, it was beautiful, and we were in love.

Now here is something you may not believe. The biggest question for those of us already successfully developing large applications was: What is it good for? What does it give me that I do not already have? Sure its beautiful, but what does it do?

User interfaces were for me the easiest first place to see the benefits. When the widgets became classes and objects, and we empolyed encapsulation, inheritance and composition, the world changed and I don't know anybody who ever looked back.

OOP, Data, and Data Structures

But in the matter of processing data, things were not so clear cut. The biggest reason may have been that all languages back then had specialized data structures that were highly tuned to handling relational data. These worked so well that nobody at first envisioned anything like ActiveRecord because we just did not need it.

With these structures you could write applications that ran processes involving dozens of tables, lasting hours, and never wonder, "Gosh, how do I map this data to my language of choice?" You chose the language you were using precisely because it knew how to handle data!

I would like to throw in just one example to show how OOP was not relevant to getting work done back then. I was once asked to optimize something called "ERP Allocation" that ran once/day, but was taking 26 hours at the largest customer site, obviously a big problem. It turned out there was a call to the database inside of a tightly nested loop, and when I moved the query outside of the loop the results were dramatic. The programmers got the idea and they took over from there. The main point of course is that it was all about how to efficiently use a database. The language was OOP, and the code was in a class, but that had nothing to do with the problem or the solution. Going further, coding a process so data intensive as this one using ActiveRecord was prima facia absurd to anybody who knew about data and code.

Java and the Languages of The Internet

But the web had another impact that was far more important than just switching to stateless programming. This was the introduction of an entirely new family of languages that took over the application space, listed here in no particular order: Perl, PHP, Python, Ruby, and the king of them all: Java.

All of these languages have one thing in common that positively jumps out at a veteran: a complete lack of data structures specialized for handling relational data. So as these languages exploded in popularity with their dismal offerings in data handling, the need to provide something better in that area became rapidly clear.

Java has a special role to play because it was pure OOP from the ground up. Even the whitespace is an object! The impact of Java is very important here because Object Orientation was now the One True Faith, and languages with a more flexible approach were gradually demoted to mere 'scripting' languages. ( Of course proponents will quickly point out that 1/12 of the world's population is now using a single application written in one of those 'scripting' languages).

So the explosion of languages without decent data handling abilities, coupled with a rise in OOP-uber-alles thinking led us quite naturally to:

The First Premise of ORM: The Design Mismatch

The first premise of ORM is that there is a design mismatch between OOP and Relational, which must resolved before any meaningful work can be done.

This view is easy to sympathize with, even if you disagree, when you consider the points raised in the above sections, that the languages in play lack any real specialized data structures, and that a certain exclusive truthiness to OOP has arisen that is blind to entire classes of solutions.

So we must grant the ORM crowd their first premise, in modified form. It is not that there is a design mismatch, it is that there is something missing, something that was in older systems that is just not there in the newer languages. Granting that this missing feature is an actual mismatch requires a belief in the Exclusive Truth of OOP, which I do not grant. OOP is like the computer itself, of which Commander Spock said, "Computers make excellent servants, but I have no wish to be servant to a computer."

But anyway, getting back to the story, the race was on to replace what had been lost, and to do it in an OOPy way.

The Second Premise of ORM: Persistence

Fast forward and we soon have an entire family of tools known as Object-Relational-Mappers, or ORM. With them came an old idea: persistence.

The idea has always been around that databases exist to persist the work of the programmer. I thought that myself when I was, oh, about 25 or so. I learned fast that my view of reality was, *cough*, lacking, and that in fact there are two things that are truly real for a developer:

  • The users, who create the paycheck, and
  • The data, which those users seemed to think was supposed to be correct 100% of the time.

From this perspective, the application code suddenly becomes a go-between, the necessary appliance that gets data from the db to the user (who creates the paycheck), and takes instructions back from the user and puts them in the database (correctly, thank you, and don't make the user wait). No matter how beautiful the code was, the user would only ever see the screen (or page nowadays) and you only heard about it if it was wrong. Nobody cares about my code, nobody cares about yours.

However, in the ORM world the idea of a database as the persistence layer now sits on a throne reserved for axiomatic truth. Those who disagree with me on this may say that I have the mistaken perspective of an outsider, to which I could say only that it is this very idea that keeps me an outsider.

But we should not paint the world with a broad brush. Chris Wong writes an excellent blog where he occassionally details how to respect the database while using Hibernate, in this post and this post.

An Alternative World View

There are plenty of alternatives to ORM, but I would contend that they begin with a different world view. Good business recognizes the infinite value of the users as the generators of the Almighty Paycheck, and the database as the permanent record of a job well done.

This worldview forces us into a humble position with respect to our own application code, which is that it is little more than a waiter, carrying orders to the kitchen and food back to the patrons. When we see it this way, the goal becomes to write code that can efficiently get data back and forth. A small handful of library routines can trap SQL injection, validate types, and ship data off to the database. Another set can generate HTML, or, can simply pass JSON data up to those nifty browser client libraries like ExtJS (now "Sencha" for some reason).

This covers a huge amount of what an application does, if you do not have much in the way of business logic.

But How Do You Handle Business Logic?

I have an entire essay on this about half-written, but in short, it comes down to understanding what business logic really is. Update: This post is now available

The tables themselves are the bottom layer of business logic. The table design itself implements the foundation for all of the business rules. This is why it is so important to get it right. The tables are organized using normalization to have a place for everything and everything in its place, and after that the application code mostly writes itself.

The application code then falls into two areas: value-add and no value-add. There is no value-add when the application simply ships data off to the user or executes a user request to update the database. Those kinds of things should be handled with the lightest possible library that gets the job done.

But the value-add stuff is different, where a user's request requires lookups, possibly computations and so forth. The problem here is that a naive analysis of requirements (particulary the transliteration error (Scroll down to "The Customer Does Not Design Tables) will tend to generate many cases of perceived need for value-add where a simpler design can reduce these cases to no value-add. But even when the database has been simplified to pristine perfection, there are jobs that require loops, multiple passes and so forth, which must be made idempotent and robust, which will always require some extra coding. But if you know what you are doing, these always turn out to be the ERP Allocation example given above: they are a lot more about the data than the classes.

Another huge factor is where you come down on the normalization debate, particularly on the inclusion of derived values. If you keep derived values out of the database, which is technically correct from a limited perspective, then suddenly the value-add code is much more important because without it your data is incomplete. If you elect to put derived values into your database than value-add code is only required when writing to the database, so huge abstractions meant to handle any read/write situation are unnecessary. (And of course, it is extremely important to Keep denormalized values correct ).

And the Rest of It

This essay hardly covers the entirety of making code and data work together. You still have to synchronize schema changes to code, and I still think a data dictionary is the best D-R-Y way to do that.

I hope this essay shows something of why many programmers are so down on ORM, but much more importantly that there are coherent philosophies out there that begin with a different worldview and deliver what we were all doing before ORM and what we will all still be doing after ORM: delivering data back and forth between user and database.

Categories: Development

The Cost of Round Trips To The Server

Kenneth Downs - Sat, 2010-12-11 13:33

A database is not much without the applications that connect to it, and one of the most important factors that affects the application's performance is how it retrieves data from queries. In this essay we are going to see the effect of round trips on application performance.

This blog has two tables of contents, the Complete Table of Contents and the list of Database Skills.

Pulling 15,000 Rows

The test will pull 15,000 rows from a table. We do it three different ways and see which is faster and by how much.

Getting a Lot of Rows

The script below creates a table and puts 1 million rows into it. We want far more rows in the table than we will actually pull so that we can pull fresh rows on every pass through the test. It is deliberately crafted to spread out the adjacent values of the integer primary key. This is because, inasmuch as can control what is going on, we want every single row to be on a different page, so that in all tests the cost of retrieving the row is roughly the same and we are measuring only the effect of our retrieval methods.

The script can be run without modification in pgAdmin3, and with slight mods on MS SQL Server.

create table test000 (
    intpk int primary key
   ,filler char(40)
)


--  BLOCK 1, first 5000 rows
--  pgAdmin3: run as pgScript
--  All others: modify as required  
--
declare @x,@y;
set @x = 1;
set @y = string(40,40,1);
while @x <= 5000 begin
    insert into test000 (intpk,filler)
    values ((@x-1)*200 +1,'@y');

    set @x = @x + 1;
end

-- BLOCK 2, put 5000 rows aside 
--
select  * into test000_temp from test000

-- BLOCK 3, Insert the 5000 rows 199 more
--          times to get 1million altogether
--  pgAdmin3: run as pgScript
--  All others: modify as required  
--  
declare @x;
set @x = 1;
while @x <= 199 begin
    insert into test000 (intpk,filler)
    select intpk+@x,filler from test000_temp;

    set @x = @x + 1;
end
Test 1: The Naive Code

The simplest code is a straight loop that pulls 15,000 consecutive rows by sending an explicit query for each one.

# Make a database connection
$dbConn = pg_connect("dbname=roundTrips user=postgres");

# Program 1, Individual explicit fetches
$x1 = rand(0,199)*5000 + 1;
$x2 = $x1 + 14999;
echo "\nTest 1, using $x1 to $x2";
$timeBegin = microtime(true);
while ($x1++ <= $x2) {
    $dbResult = pg_exec("select * from test000 where intpk=$x1");
    $row = pg_fetch_array($dbResult);
}
$elapsed = microtime(true)-$timeBegin;
echo "\nTest 1, elapsed time: ".$elapsed;
echo "\n";
Test 2: Prepared Statements

The next command asks the server to prepare a statement, but it still makes 15,000 round trips, executing the prepared statement with a new parameter each time. The code looks like this:

# Make a database connection
$dbConn = pg_connect("dbname=roundTrips user=postgres");

# Program 2, Individual fetches with prepared statements
$x1 = rand(0,199)*5000 + 1;
$x2 = $x1 + 14999;
echo "\nTest 2, using $x1 to $x2";
$timeBegin = microtime(true);
$dbResult = pg_prepare("test000","select * from test000 where intpk=$1");
while ($x1++ <= $x2) {
    $pqResult = pg_execute("test000",array($x1));
    $row = pg_fetch_all($pqResult);
}
$elapsed = microtime(true)-$timeBegin;
echo "\nTest 2, elapsed time: ".$elapsed;
echo "\n";
Test 3: A single round trip

This time we issue a single command to retrieve 15,000 rows, then we pull them all down in one shot.

# Make a database connection
$dbConn = pg_connect("dbname=roundTrips user=postgres");

# Program 3, One fetch, pull all rows
$timeBegin = microtime(true);
$x1 = rand(0,199)*5000 + 1;
$x2 = $x1 + 14999;
echo "\nTest 3, using $x1 to $x2";
$dbResult = pg_exec(
    "select * from test000 where intpk between $x1 and $x2"
);
$allRows = pg_fetch_all($dbResult);
$elapsed = microtime(true)-$timeBegin;
echo "\nTest 3, elapsed time: ".$elapsed;
echo "\n";
Results

I ran this five times in a row, and this is what I got:

Naive 15,000 Prepared 15,000 One Round Trip ~1.800 seconds ~1.150 seconds ~0.045 seconds

Compared to the naive example, the set-oriented fetch of al 15,000 rows in a single shot ran 40 times faster. This is what set-oriented code does for an application.

While the prepared statement option ran faster than the naive option, the set oriented example still ran 25 times faster than the repeated prepared statements.

I also re-arranged the order of the tests, and the results were the same.

Does Server or Language Matter?

So this test was done using PHP against PostgreSQL, will other servers and client languages get different results? Given the same hardware, a different client language or server is going to have a different spread but the shape will be the same. Fetching all rows in a single shot beats the living frack out of round trips inside of loops in any client language against any server.

Putting It Into Use

The most obvious conclusion is that any query returning more than 1 row should return all rows as a set. The advantage is so stark with large row counts that it is worthwhile making this the default for our applications, unless we can find a very good reason not to. So what would the objections be?

One objection might go something like, "Ken, I see the numbers, but I know my app very well and we never pull more than 10-20 rows in a pop. I cannot imagine how it would matter at 10-20 rows, and I do not want to recode." This makes sense so I ran a few more tests with 20 and 100 rows, and found that, on my hardware, you need about 100 rows to see a difference. At 20 rows all three are neck-in-neck and at 100 the set is pulling 4 times faster than the prepared statement and 6 times faster than the naive statement. So the conclusion is not an absolute after all, some judgment is in order.

Another thing to consider is how many simultaneous reads and writes might be going on at any given time. If your system is known to have simultaneous transactions running regularly, then the complete fetch may be a good idea even if you do some tests for best-guess row count and the tests are inconclusive. The reason is that the test is a single user case, but multiple simultaneous users put a strain on the database, even when they are not accessing the same tables. In this case we want the application to play the "good citizen" and get in and out as quickly as possible to reduce strain on the server, which will improve the performance of the entire application, not just the portions optimized for complete fetches.

Another objection might be, "Well, my code needs to pull from multiple tables, so I cannot really do this. When we do -PROCESS-X- we go row by row and need to pull from multiple tables for each row." In this case you *definitely* need to go set oriented and pull all associated quantities down in a query with a JOIN or two. Consider this, if on your particular hardware the ratio of naive row-by-row to single fetch is 10, and you must pull from 2 other tables for each row, that means you are really running 30 times slower (ratio is 10 x 3 reads) than you could be.

A Final Note About PHP, Data Structures, and Frameworks

Back when dinosaurs ruled the Earth and there was no internet (outside of Universities, etc), the languages we used had specialized data structures that were tuned to database use. Compared to those older systems the newer languages born on the internet are more or less starving for such a data structure.

PHP gets by fairly well because its associative array can be used as a passive (non object-oriented) data structure that comes pretty close to what we had before.

I bring this up because the choice of a language and its support for a "fetch all" operation obviously impacts how well the conclusions of this post can be implemented. If your mapping tool has an iterator that absolves you of all knowledge of what is going on under the hood, it may be worthwhile to see if it is doing a complete fetch or a row-by-row.

Categories: Development

Submit Analysis Request to the Database Programmer

Kenneth Downs - Wed, 2010-12-08 17:50

I generally do not reveal too many details about systems I design for customers or employers. This leaves me sometimes in a bind for example material. I either have to simplify it beyond what I would like, or make something up that I have not actually put into Production.

On top of that, one of the key themes of this blog is that table design is a crucial skill, and if the examples I give do not match what you are doing, they may be hard to make use of.

So I would like invite analysis requests. Go over to the Contact the Author page and drop me an email and tell me about the system you are trying to design or optimize.

There are no rules on the type of system.

The most interesting mini-projects would be those where advice you have been given elsewhere (or here for that matter) does not seem to fit.

I will do my best to reply, even if I have to say no, so that nobody is left wondering.

Remember this blog is one of those hobby/professional things, good for all of us but nobody is getting paid, so if you are in a terrible hurry this might not be the best thing.

Categories: Development

A Case When Table Design is Easy and Predictable

Kenneth Downs - Thu, 2010-12-02 20:36

Good table design is a great foundation for a successful application stack. Table design patterns basically resolve into master tables and transaction tables. When we know a thing or two about the master tables (or entities if you prefer), we can infer a great deal about the transactions.

This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.

A Time Billing System

Imagine we have been asked to recode the company's time-billing system. Because this is for the company we work for, we have some inside knowledge about how things work. We know that:

  • There are, of course, customers.
  • ....and employees who record time
  • Each hour we record goes against a Work Order
  • There are different kinds of jobs, like project management, programming, programming management, and others.

Knowing only this, is it possible to anticipate what the system will look like? A safe answer is "no", on the claim that we will undoubtedly learn more, but this safe answer happens to be wrong. We can in fact anticipate the overall shape of the system, and new information will shift details, but it will not change the shape.

We can anticipate the nature of the transactions if we determine the upper bound of complexity and the combinatorial completeness of the system.

The Upper Bound of Complexity

We can safely assume that the big number to get right is going to be the billing rate. Our employer assumes we will get everything else right, but the billing rate is going to have them chewing their fingernails until they know we understand it and have coded it correctly.

The cool thing is that we already have enough information to establish an upper bound on the complexity of the system by looking at the master tables, where a master table is generally one that lists details about real things like people, places, things, or activities. So far we know (or think we know) about three master tables:

  • Customers
  • Employees
  • Services

Now we define the upper bound of complexity as:

The upper bound of complexity occurs when the billing rate is determined by all three master entities.

In plain English, calculating a billing rate can be as complicated as looking up a rate specific to a customer for a service for an employee but cannot be more complex than that because there are no other entities with which to work.

Combinatorially Complete

We can also anticipate all possible calculations for the billing rate by working through the complete set of combinations of master entities. This would look like the list below. Note that we are not trying to figure out right now which of these is likely to occur, we just want to get them listed out:

  • Each service has a default rate
  • Each customer has a negotiated rate
  • Each employee bills out at a default rate
  • The combination customer-service may have a rate
  • The combination customer-employee may have a rate
  • The combination customer-service-employee may have a rate (this is the upper bound of complexity, all three master entities determine the rate).

Unless we live in a super-simple world where only the first item in the list is present, we will end up dealing with several if not all of the combinations listed above.

Each of these combinations then becomes a table, and we know the billing rate will be determined by a resolution. New Information

Now comes the big day and we interview with somebody we'll call "The Explainer" who is going to officially explain the billing system. Can he break what we already know? No. At most he can:

  • Make us aware of new master entities, perhaps there are "projects" and "contracts" that get their own billing arrangements.
  • Dispel our notions about some of the combinations by saying, "Oh we never give a customer a default rate, the default rates come out of the services."
Going in Cold

What about the case where we know absolutely nothing about an assignment when we go in to begin the interviews? We can do a good job of thinking on our feet if we draw "The Explainer" towards the master entities. As we gain confidence that we know what the master entities are, we can ask questions to probe Combinatorial Completeness and the Upper Bound of Complexity.

One caveat: This method works for transactions between master entities. When "The Explainer" starts describing something that cannot be recognized as an interaction between master entities, do not try to stuff the problem into this box, it may not fit.

What About the Application?

At this point, we can also anticipate a lot of what the application will look like. We will need maintenance screens for all of the master entities, and a really slick UI will allow for very easy editing of those various cross-reference combination tables. As long as that much is done, we are almost finished, but not yet.

There will be some billing process that pulls the time entries, finds the correct billing rate for each one, and permanently records the invoices. If we use a resolution this task is child's play to code, debug, and maintain.

Then of course there is the presentation, the actual bill. Depending on the company, these may be delivered as hardcopy or in email. That will of course have to be coded up.

Conclusion

There are two conclusions. First, as originally stated, many transactions can be anticipated when you know what the master entities are.

But secondly, and every bit as important, once the table design is sound, the application pretty much writes itself. On a personal note, this is probably why I do not find application coding as exciting as I once did. Once I realized that the real challenge and satisfaction was in working out the tables, the coding of the app became a bit of a drudge, it requires no judgment as far as business rules are concerned.

Categories: Development

Happy Holidays everyone!

Java 2 Go! - Thu, 2010-12-02 15:57


This is a summary only. Please, visit the blog for full content and more.
Categories: Development

Creating a centered page layout using ADF Faces (only)

Java 2 Go! - Wed, 2010-12-01 14:07
by Eduardo Rodrigues It’s been a long time since I last posted here about my old friend: ADF. But now that I’m working in ADF BC’s awesome development team, I was already feeling the self-pressure...

This is a summary only. Please, visit the blog for full content and more.
Categories: Development

The Really Cool NTILE() Window Function

Kenneth Downs - Tue, 2010-11-30 21:29

If you regularly code queries and have never been introduced to the windowing functions, then you are in for a treat. I've been meaning to write about these for over a year, and now it's time to get down to it.

Support in Major Servers

SQL Server calls these functions Ranking Functions.

PostgreSQL supports a wider range of functions than MS SQL Server, having put them in at 8.4, and PostgreSQL and calls them Window Functions.

Oracle's support is broader (by a reading of the docs) than SQL Server or PostgreSQL, and they call them Analytic Functions.

I try to stay away from MySQL, but I did a quick Google on all three terms and came up with a few forum posts asking when and if they will be supported.

The NTILE() Function

In this post we are going to look at NTILE, a cool function that allows you to segment query results into groups and put numbers onto them. The name is easy to remember because it can create any -tile, a percentile, a decile, or anything else. In short, an n-tile. But it is much easier to understand with an example, so let's go right to it.

Finding percentiles

Consider a table of completed sales, perhaps on an eCommerce site. The Sales Manager would like them divided up into quartiles, four equally divided groups, and she wants the average and maximum sale in each quartile. Let's say the company is not exactly hopping, and there are only twelve sales, which is good because we can list them all for the example. If we already had the quartiles provided then the query would be easy, so if we were lucky enough to be starting with this:

 CUSTTYPE | AMOUNT  | QUARTILE
----------+---------+----------
 RETAIL   |   78.00 |   1
 RETAIL   |  234.00 |   1
 DEALER   |  249.00 |   1
 DEALER   |  278.00 |   2
 RETAIL   |  392.00 |   2
 RETAIL   |  498.00 |   2
 DEALER   |  500.00 |   3
 RETAIL   |  738.00 |   3
 DEALER   | 1250.00 |   3
 RETAIL   | 2029.00 |   4
 RETAIL   | 2393.00 |   4
 RETAIL   | 3933.00 |   4

The query would be child's play if we already had the quartile:

Select quartile
     , avg(amount) as avgAmount
     , max(amount) as maxAmount
  FROM ORDERS
 GROUP BY quartile
 ORDER BY quartile
The Problem is We Do Not Have Quartile

The problem of course is that we do not usually have handy columns like QUARTILE provided, but we can generate the QUARTILE column during the query by using NTILE.

Select quartile
     , avg(amount) as avgAmount
     , max(amount) as maxAmount
  FROM (
        -- The subquery is necessary
        -- to process all rows and add the quartile column
        SELECT amount
             , ntile(4) over (order by amount) as quartile
          FROM ORDERS
       ) x
 GROUP BY quartile
 ORDER BY quartile

This query will give us what the Sales Manager wants.

Dissecting the Function and The OVER Clause

The NTILE() function takes a single argument, which tells the server how many groups to divide the data into. If there are not an exact number of rows in each group, the server decides which groups will be missing one row. So in an exact case all of your groups have the same count of rows, but when it does not divide evenly, one or more of them will be one row short.

If you pass 100 to NTILE(), you get a percentile. If you pass 10, you get a decile, and so forth.

The magic is in the OVER() function. This supports two clauses, and the example shows one, the ORDER BY. Quite simply, the ORDER BY clause tells the server how to line up the rows when adding the NTILE values. The clause is very flexible, and has nothing to do with your query's overall ORDER BY clause. The Second Clause: PARTITION

Now we will pretend the Sales Manager is not satisfied, and wants separate numbers for the two Customer Types. We could do this if the NTILE() function would create two sets of quartiles, one for each Customer Type, like so:

 CUSTTYPE | AMOUNT  | QUARTILE
----------+---------+----------
 DEALER   |  249.00 |   1
 DEALER   |  278.00 |   2
 DEALER   |  500.00 |   3
 DEALER   | 1250.00 |   4
 RETAIL   |   78.00 |   1
 RETAIL   |  234.00 |   1
 RETAIL   |  392.00 |   2
 RETAIL   |  498.00 |   2
 RETAIL   |  738.00 |   3
 RETAIL   | 2029.00 |   3
 RETAIL   | 2393.00 |   4
 RETAIL   | 3933.00 |   4

We can do this by using the PARTITION BY clause, which tells the server to break the rows into groups and apply the NTILE() numbering separately within each group. The new query would be this:

Select custtype
     , quartile
     , avg(amount) as avgAmount
     , max(amount) as maxAmount
  FROM (
        -- The subquery is necessary
        -- to process all rows and add the quartile column
        SELECT amount
             , ntile(4) over (partition by custtype
                                 order by amount) as quartile
          FROM ORDERS
       ) x
 GROUP BY custtype,quartile
 ORDER BY custtype,quartile
Bonus Points: The Median

Now once again the Sales Manager, who is never satisified, comes down and says that the average is no good, she needs the max and the median sale value within each quartile. To keep it simple, she does not need this broken out by customer type, it can be applied to the entire set.

This is a case where we can use NTILE() twice. The first time we will break all sales up into four groups, to get the quartiles, and then we will break up each quartile into two groups to get the median. The code looks like this:

Select quartile
     , max(case when bitile=1 then amount else 0 end) as medAmount
     , max(amount) as maxAmount
  FROM (
        -- The second pass adds the
        -- 2-tile value we will use to find medians
        SELECT quartile
             , amount
             , ntile(2) over (partition by quartile
                                  order by amount) as bitile
          FROM (
                -- The subquery is necessary
                -- to process all rows and add the quartile column
                SELECT amount
                     , ntile(4) over (order by amount) as quartile
                  FROM ORDERS
               ) x1
       ) x2
 GROUP BY quartile
 ORDER BY quartile

The magic here is that we know we've divided the data evenly into four sets, so the median will be the maximum value half way through each set. In other words, it will be the maximum value when the value of bitile=1 for each quartile.

One More Note About Oracle

Once you get down the basics of the OVER clause, Oracle looks really good, because they support the clause over the largest range of functions, at least going by the respective doc pages for each platform.

Categories: Development

Loops Without Cursors

Kenneth Downs - Mon, 2010-11-29 21:19
Looping Without Cursors

Sometimes you need to process a table row-by-row, and the established approach is to use cursors, which are verbose, slow, and painful to code and use.

The Cursor Example

Here is the basic minimum syntax required to loop through a table and get something done. The SQL flavor is MS SQL Server, but its not much better in any other flavor.

-- I coded this off the top of my head, there
-- may be a minor syntax error or two

-- Most of this is pseudo-code, but take
-- note that it is ordered on column1
declare someCursorName cursor for
 select column1, column2, column3 
   from anyTable
  ORDER BY column1

-- Have to do this now
open someCursorName

-- Now you need to declare some variables
-- For the example I'm just making everything int
declare @column1 int
      , @column2 int
      , @column3 int

-- Gosh, we're actually about to start the loop!  Finally!
fetch next from someCursorName into @column1,@column2,@column3
while @@fetch_status = 0 begin

   --  If you still remember what you actually wanted
   --  to do inside the loop, code it here:

-- Repeat this line from the top here again:
fetch next from someCursorName into @column1,@column2,@column3
end

-- Not done yet, these two lines are crucial
close someCursorName
deallocate someCursorName

Call me petty, but what I hate about that code is that I have to refer to specific columns of interest 3 times (not counting the declarations). You refer to them in the cursor declaration and in the two FETCH commands. With a little clever coding, we can vastly simplify this and do it only once.

Using An Ordered Column

We can execute the same loop without the cursor if one of the columns is ordered and unique. Let us say that column1 is the primary key, and is an auto-incremented integer. So it is ordered and unique. The code now collapses down to:

-- I coded this off the top of my head, there
-- may be a minor syntax error or two

-- We can't get around declaring the vars, so do that
declare @column1 int
      , @column2 int
      , @column3 int

-- If you know a safe value for initialization, you
-- can use the code below.  If this is not 100% 
-- safe, you must query for the value or it must
-- be supplied from some other source
set @column1 = -1

-- BONUS POINTS: Can this become an infinite loop?
while 1 = 1 begin

-- Now we code the query and exit condition
 select TOP 1
        @column1 = column1
      , @column2 = column2
      , @column3 = column3 
   from anyTable
  WHERE column1 > @column1  -- this is what advances the loop
  ORDER BY column1

if @@rowcount = 0 begin
    break
end

        -- Put the actions here        

end
Final Notes

The only requirement for this approach is that you have a unique ordered column. This usually means a unique key or primary key. If "column1" is not unique, the loop will skip all but the first value in each group.

Also, it is very nice if you know a safe value to use as an initializer. Without that, you must query for the minimum value that matches the condition and then decrement it by one.

Finally, can this loop become infinite? No. Well, if, in the extremely unlikely situation that rows are being added to the base table faster than you are processing them, then yes, it could go on for a very long time. But if that were happening I'd say there was a separate problem to look at.

It should probably go without saying, but if the particular loop is going to happen very often, the table should be indexed on your unique ordered column. If it is a primary key or you already have a unique constraint it is not necessary to create an index explicitly because there will be one as part of the key or constraint.

Categories: Development

Revisiting Normalization and Denormalization

Kenneth Downs - Sat, 2010-11-27 12:43

In this blog I have done at many articles on Normalization and Denormalization, but I have never put all of the arguments together in one place, so that is what I would like to do today.

There are links to related essays on normalization and denormalization at the bottom of this post.

This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.

The What and Why of Normalization

Normalization is the process of designing tables so that each fact is stored in exactly one place. A "fact" in this case is any detail that we have to keep track of, such as a product's description, a product's price, an employee's social security number, and so forth.

The process is all about figuring out what tables you need and what columns each table will have. If we are talking about an employee's social security number, then we can guess right from the start that will have a table of EMPLOYEES, and that one of the columns will be SSN. As we get more details, we add more tables and columns.

The advantage of normalization comes when your application writes data to the database. In the simplest terms, when the application needs to store some fact, it only has to go to one place to do it. Writing this kind of code is very easy. Easy to write, easy to debug, easy to maintain and improve.

When the database is not normalized, you end up spending more time writing more complicated application code that is harder to debug. The chances of bad data in your production database go way up. When a shop first experiences bad data in production, it starts to become tempting to "lock down" access to the database, either by forcing updates to go through stored procedures or by trying to enforce access to certain tables through certain codepaths. Both of these strategies: stored procedures and code paths, are the actually the same strategy implemented in different tiers, they both try to prevent bugs by routing access through some bit of code that "knows what to do." But if the database is normalized, you do not need any magic code that "knows what to do."

So that, in brief, is what normalization is and why we do it. Let's move on now to denormalization.

Denormalization is Harder to Talk About

Normalization is easy to explain because there is a clearly stated end-goal: correct data. Moreover, there are well-defined methods for reaching the goal, which we call the normal forms, First Normal Form, Second Normal Form, and higher forms. By contrast, denormalization is much harder to talk about because there is no agreed-upon end goal. To make matters worse, denormalization violates the original theory of Relational Databases, so you still have plenty of people screaming not to do it all, making things even more confusing. What we have now in our industry is different shops denormalizing in different ways for different reasons.

The arguments that I have heard in my career boil down to two basic groups. The first set of arguments centers around calculated or derived values, and the second set centers around programmer convenience.

Arguments for Derived Values

My own experience comes down heavily in favor of denormalizing by storing derived values directly into the tables, with the extremely signficant caveat that you must have a way to ensure that they are always correct. In this paradigm you maintain strict normalization for facts supplied from the outside, and then layer on additional facts that are calculated during write operations and saved permanently.

Here is a very simple example. A strictly normalized database happens to be missing data that many programmers would automatically assume should be stored. Believe it or not, a simple value in a shopping cart like EXTENDED_PRICE is forbidden by 3rd normal form because it is a non-key dependency, or, in plain English, since it can be derived from other values (QTY * PRICE), then it is redundant, and we no longer have each fact stored in exactly one place. The value of EXTENDED_PRICE is only correct if it always equals QTY * PRICE, and so there is now a "fact" that is spread across three locations. If you store EXTENDED_PRICE, but do not have a way to ensure that it will always 100% of the time equal QTY * PRICE, then you will get bad data.

So, given the risk of bad data, what is to be gained by putting EXTENDED_PRICE into the cart? The answer is that it adds value to the database and actually simplifies application code. To see why, imagine a simple eCommerce shopping cart that does not store any derived values. Every single display of the cart to the user must go all over the place to gather lots of details and recalculate everything. This means re-calculating not just the EXTENDED_PRICE, but adding in item level discounts, taking account of possible tax exemptions for different items, rolling the totals to the cart, adding in tax, shipping, perhaps a customer discount, a coupon, and who knows what else. All of this just to display the cart, every time, no matter what the purpose.

This situation leads to three problems. A pitifully slow application (too many disk reads and lots of cycles calculating the values), maddening bugs when an application update has subtle changes to the calculations so the customer's order no longer displays the same numbers as it did yesterday, and the frustrating requirement that the simplest of reports must route through application code to calculate these values instead of simply reading them off the disk, which leads to reporting systems that are orders of magnitude slower than they could be and horribly more complicated than they need to be because they can't just read straight from the tables.

Now let's look at how that same shopping cart would be used if all of those calculated values were generated and saved when the order is written. Building on your foundation of normalized values (price, qty), you need only one body of code that has to perform calculations. This magic body of code takes the user-supplied values, adds in the calculations, and commits the changes. All other subsequent operations need only to read and display the data, making them faster, simpler, and more robust.

So the obvious question is how to make sure the derived values are correct. If they are correct, we gain the benefits with no down side. If there is the smallest chance of bad data, we will quickly pay back any benefit we gained by chasing down the mistakes.

From a technical standpoint, what we really need is some technology that will make sure the calculations cannot be subverted, it cannot be possible for a stray bit of program code or SQL Statement to put the wrong value in for EXTENDED_PRICE. There are a few generally accepted ways to do this:

  • Require all writes to go through a certain codepath. The only PRO here is that you keep the logic in the application code, and since most shops have more programmers than database people, this makes sense. The only CON is that it never works. One programmer working alone can maintain discipline, but a team cannot. All it takes is one programmer who did not know about the required codepath to screw it all up. Also, it makes your system inflexible, as it is no longer safe to write to the database except through a single application.
  • Require all writes to go through stored procedures. This is nominally better than the codepath solution because it is not subvertible, and you can allow different side apps and utilities to safely write to the database. But it makes a lot of work and tends to be very inflexible.
  • Putting triggers onto tables that perform the calculations and throw errors if a SQL statement attempts to explicitly write to a derived column. This makes the values completely non-subvertible, ensures they will always be correct, and allows access from any application or utility. The downside is that the triggers cannot be coded by hand except at extreme cost, and so must be generated from a data dictionary, which is fairly easy to do but tends to involve extreme psychological barriers. In these days of ORM many programmers mistakenly believe their class files define reality, but this is not true. Reality is defined by the users who one way or another create the paychecks, and by the database, which is the permanent record of facts. But a programmer who thinks his classes define reality simply cannot see this and will reject the trigger solution for any number of invalid reasons.

So denormalizing by putting in derived values can make a database much more valuable, but it does require a clear systematic approach to generating the derived values. There is no technical problem associated with ensuring the values are correct because of course the application has to do that somehow somewhere anyway, the real barriers tend to be the psychological and political.

Arguments For Programmer Convenience

The second set of arguments for denormalization tend to be rather weak, and come down to something like this (you have to picture the programmer whining like a child when he says this), "I don't like my data scattered around so many tables, can't we play some other game instead?"

Many programmers, when they first learn about normalization and build a normalized database, discover that the data they need to build a screen is "scattered" about in many tables, and that it is tedious and troublesome to get it all together for presentation to the user. A simple example might be a contacts list. The main table is CONTACTS, and it contains not much more than first and last name. A second table is a list of PHONES for each contact, and a third table is a list of various mailing addresses. A fourth table of EMAILS stores their email addresses. This makes four tables just to store a simple contact! We programmers look at this and something inside of us says, "That's just way too complicated, can't I do something else instead?"

This is a case of programmer convenience clashing with correctness of data. Nobody argues (at least not that I've heard) that they do not want the data to be correct, they just wonder if it is possible to simplify the tables so that they do not have to go out to so many places to get what they need.

In this case, programmers argue that denormalization will make for simpler code if they deliberately skip one or more steps in the normalizing process. (Technically I like to call the result a "non-normalized" database instead of denormalized, but most people call it denormalized, so we will go with that.)

The argument goes something like this: I know for a fact that nobody in the contacts list will have more than 3 emails, so I'm going to skip the EMAILS table and just put columns EMAIL1, EMAIL2, and EMAIL3 into the main CONTACTS table. In this case, the programmer has decided to skip 1st Normal Form and put a repeating group into the CONTACTS table. This he argues makes for simpler database retrieval and easier coding.

The result is painfully predictable. The simplification the programmer sought at one stage becomes a raft of complications later on. Here is an example that will appear trivial but really gets to the heart of the matter. How do you count how many emails a user has? A simple SELECT COUNT(*)...GROUP BY CONTACT that would have worked before now requires more complicated SQL. But isn't this trivial? Is it really that bad? Well, if all you are coding is a CONTACTS list probably not, but if you are doing a real application with hundreds of tables and this "convenience" has been put out there in dozens of cases, than it becomes a detail that programmers need to know on a table-by-table basis, it is an exception to how things ought to be that has to be accounted for by anybody who touches the table. In any shop with more than 5 programmers, whatever convenience the original programmer gained is lost quickly in the need to document and communicate these exceptions. And this is only a single trivial example.

Other examples come when it turns out you need more than three slots for phone. In the normalized case this never comes up. Any user can have any number of phones, and the code to display the phones is running through a loop, so it does not need to be modified for the case of 1 phone, 2 phones, etc. But in the "convenient" denormalized case you now must modify the table structure and the code that displays the contacts, making it quite inconvenient.

Then you have the case of how to define unused slots. If the user has only one email, do we make EMAIL2 and EMAIL3 empty or NULL? This may also seem like a silly point until you've sat through a flamewar at the whiteboard and discovered just how passionate some people are about NULL values. Avoiding that argument can save your shop a lot of wasted time.

In short, programmer convenience should never lead to a shortcut in skipping normalization steps because it introduces far more complications than it can ever pay for.

Related Essays

This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.

The normalization essays on this blog are:

Categories: Development

Our second official book review in on its way...

Java 2 Go! - Thu, 2010-11-25 03:34
by Fábio Souza Hi everyone, are you doing well? I'm fine, thanks for asking. Well, I was wondering these days : "It would be a good idea to change the topic for a little while and talk about a...

This is a summary only. Please, visit the blog for full content and more.
Categories: Development

Hidden gems in JHeadstart runtime

JHeadstart - Wed, 2010-11-24 00:15

JHeadstart is well known for its powerful generator and its rich collection of templates. What a lot of people don't know is that JHeadstart also has an extensive runtime library that can be very useful, even if you don't use the generator.

A great example is the DatabaseProcedure class. This class allows you to call a database procedure or function from your ADF application with only one or two lines of code! For the source code and more information on this class read Calling an Oracle database procedure from Java with only ONE line of code and Calling a database procedure with table input and output parameters from Java.

Categories: Development

Prepare Now For Possible Future Head Transplant

Kenneth Downs - Fri, 2010-11-19 22:06

This is the Database Programmer blog, for anybody who wants practical advice on database use.

There are links to other essays at the bottom of this post.

This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.

Planning For The Unlikely

We programmers love to plan for things that will hardly ever happen, like coding the system's upgrade engine to handle spontaneous human combustion, making sure the SQL scrubbing layer can also launch a rocket into space, and, well, trying to work out ahead of time what to do if we ever need a head transplant.

The boss comes over and says, "can you toss a simple plot onto the sales' staff home page that shows sales by day? Use that 'jquicky' or whatever you call it. Should take a couple of hours, right?" And three days later we're working on the world's greatest plotting system that can report everything except what the boss actually asked for because we haven't gotten around to that part of it yet. (Really, can he expect me to just bang this out without the required Seven Holy Layers of Abstraction and Five Ritual Forms of Parameterization, and the Just and Wholesome Mobile Support, or the features Not Yet Required but visible to the Far Seeing Wise and Learned Men?)

Abstraction Contraptions

So what I am getting at is that programmers of all stripes are addicted to abstraction, it gives us goosebumps and makes us feel warm and tingly, and so we do it even when we do not need to. We build abstraction contraptions.

When it comes to designing a database, this unhealthy proclivity can seriously slow you down, because of what I call:

Ken's Law

Everybody wants to be remembered for something. If I could write my own epitaph, it might be:

Table-based datastores are optimally abstract

This law is not about database access, it is about database design. It can be expressed informally as:

People Understand Tables Just Fine

Or more rigorously as:

Table-based datastores are optimally abstract; they require no additional abstraction when requirements are converted to desgin; they cannot be reduced to a less abstract form.

Structured Atomic Values

I should point out that this essay deals with structured atomic values, who live in the Kingdom of The Relational Database. The concepts discussed here do not apply to free-text documents or images, or sound files, or any other media.

No Additional Abstraction Required

My basic claim here is that you cannot create an abstraction of data schemas that will pay for itself. At best you will create a description of a database where everything has been given a different name, where tables have been designated 'jingdabs' and columns have been designated 'floopies' and in the end all of your jingdab floopies will become columns in tables. Oh, and I suppose I should mention the Kwengars will be foreign keys and the Slerzies will be primary keys.

After that it goes downhill, because if we generate an abstraction that is not a simple one-to-one mapping, we actually obscure the end goal. Consider an example so simple as to border on the trivial or absured. Why would we ever use the terms "One-to-Many" or "Many-to-Many" when the more precise terms "child table" and "cross-reference table" convey the same idea without the noise? I said above that this would sound trivial, and you can accuse me of nit-picking, but this is one of those camel's nose things, or perhaps a slippery slope. When technical folk get together to design a system, we should call things what they are, and not make up new words to make ourselves sound smarter.

No de-Abstraction is Possible

The second half of Ken's law says that you cannot de-Abstract a table schema into some simpler form. This should be very easy to understand, because relational databases deal with atomic values, that is, values which cannot themselves be decomposed. If you cannot decompose something, then it cannot be an abstraction of something more specific.

Going further, if the schema has been normalized, then every fact is stored in exactly one place, so no further simplification is possible. If you cannot simplify it or resolve it into something more specific, then it is not an abstraction of something else.

But Does it Work?

I originally began to suspect the existence of what I call in all humility "Ken's Law" when I was sitting at a large conference table with my boss, her boss, a couple of peers, and 3 or 4 reps from a Fortune 5 company. My job was basically to be C3PO, human-cyborg relations. Some people at the table protested loudly to being non-technical, while others had technical roles. But everybody at the table spent all day discussing table design.

Later on, when at a different position, the programmers received their instructions from Project Managers. The best Project Managers worked with their customers to figure out what they were trying to keep track of, and handed us specs that were basically table layouts. The customers loved these guys because they felt they could "understand what the project manager was talking about", and the project managers, who swore they were not technical, were respected because they handed us requirements we could actually understand and implement.

Since that time I have further learned that it is very easy for anybody who deals with non-technical people to bring them directly to table design without telling them you are doing it. All you have to do is listen to what words they use and adopt your conversation accordingly. If they say things like "I need a screen that shows me orders by customer types" they have told you there will be a table of customer types. Talk to them in terms of screens. If they say, "Our catalog has 3 different price list and four discount schemes" then you know that there will be a PRICELIST table, a DISCOUNTS table, and likely some cross-references and parent-child relationships going on here.

So How Does ORM Come Into This?

One of the greatest abstraction contraptions of this century (so far), is ORM, or Object-Relational Mapping, which I do not use precisely because it is an abstraction contraption.

To be clear, the mistake that ORM makes is not at the design phase, but at the access phase. The ORM meme complex instructs its victims that it is ok to put structured atomic values into a Relational Database, but when it comes time to access and use that data we will pretend we did not put it into tables and we will pretend that the data is in objects. In this sense the term Object- Relational Mapping is a complete misnomer, because the point is not to map data to objects but to create the illusion that the tables do not even exist.

Perhaps ORM should stand for Obscuring Reality Machine.

Getting Back to That Head Transplant

So what about that weird title involving head transplants? Obviously a head transplant is impossible, making it also very unlikely, besides being silly and non-sensical. It came to mind as a kind of aggregrate of all of the bizarre and unrealistic ideas about abstracting data designs that I have heard over the years.

One of these ideas is that it is possible and beneficial to create a design that is abstract so that it can be implemented in any model: relational, hierarchical, or network. I'm not saying such a thing is impossible, it is likely just a small matter of programming, but for heaven's sake, what's the point?

Conclusion

So don't waste time creating abstractions that add steps, possibly obscure the goal, and add no value. Don't plan for things that are not likely to happen, and avoid abstraction contraptions.

Related Essays

This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.

Other philosophy essays are:

Categories: Development

Database Skills

Kenneth Downs - Sat, 2010-11-13 11:14

It seems strange to me that I've been working on this blog for 3 years or so (with one very long break) and somehow never got around to writing a simple list of skills that all database experts need. So here it is!

Various Job Tiles for Database People

There are three common job titles in the database area, which are Database Administrator (DBA), Database Programmer, and Database Architect. These titles tend to be somewhat variable from shop-to-shop, but generally the "Architect" term indicates the highest level of skill combined with considerable management responsibilities. The "Programmer" term is somewhere below that, but the "DBA" is extremely variable. I have seen shops where a person was called a DBA and filled a relatively constrained role closer to IT or operations (routine tasks, no real programming) and other shops where a person with the DBA title was basically the Architect.

Because of this high variability in what titles mean, I am not going to waste time categorizing skills as belonging to one job title or another, I am simply going to list them all out.

The various levels of skills are these:

  • Before Hello World!: The basics of tables, columns, rows
  • The Hello World! Level: SQL Select
  • Just after Hello World!: Writing Data
  • Commands to create, modify and drop tables, or Data Definition Language (DDL)
  • Knowing how to use a Query Analyzer or optimization tool
  • Understanding Normalization
  • Understanding Denormalization
  • Understanding Primary Keys, Foreign Keys and Constraints
  • Understanding Transactions
  • Understanding ACID
  • Understanding Indexes as optimization tool
  • Views
  • Database Security
  • Upgrades and Installs
  • Efficient access of database from application
  • Bulk operations: loading or exporting large amounts of data
  • Understanding of Contexts and how they lead to different sets of Best Practices
  • Preventing performance degradation through various maintenance tasks
  • Deployment strategies: partitioning, tablespaces
  • Deployment strategies, failure protection, from simple backup to hot standbys
  • Server side coding: stored procedures and functions
  • Server side coding: triggers
  • Temporary tables

As long as that list is, it only covers those of us who use database systems. There is an entire set of skills for those who actually create and maintain these systems, but that is not something that will be treated in this blog.

Before Hello World!: Tables and Columns

If you have never so much as typed a single SQL command, or seen a table diagram, or anything like that, then it is worth a few minutes to go through the basics of what a database does, which is to organize atomic values into tables.

I am going to write an essay on this soon, even though it may seem so obvious as to be completely unnecessary. But I will do it because the most popular essay on this blog is about using GROUP BY, which tells me newer programmers are starving for useful tutorials at the beginner level. So it seems to me, why not put something out there at the very beginning of the beginning?

The Hello World! Level: SQL Select

If you are starting completely from scratch and want to know about database programming, you want to start with the SQL SELECT command. This is the (almost) only command used to extract data from a database, and all of the possible ways to combine, filter and extract data are expressed in the many clauses of this command.

Just after Hello World!: Writing Data

When it comes time to change the data in a database there are three commands, listed below. These commands are based on the tables-and-rows nature of databases, and allow to add a row (or rows), change a row (or rows) and delete a row (or rows).

  • The INSERT command
  • The UPDATE command
  • The DELETE command
Commands to create, modify and drop tables, or Data Definition Language (DDL)

The term "DDL" stands for "Data Definition Language" and includes all of the commands use to build the tables that will hold the data for the INSERT, UPDATE, DELETE and SELECT statements. The basic list of commands to be familiar with is:

  • Understanding Data Types (databases are strongly typed)
  • CREATE TABLE and ALTER TABLE
  • Commands to add and drop primary keys
  • Commands to add and drop foreign keys
  • Commands to add and drop constraints
  • Commands to add and drop indexes

There are also numerous commands that are specific to different products. Those will not be listed here today, but who knows what the future may bring.

Knowing how to use a Query Analyzer or optimization tool

Database programmers, once they get started with the skills listed above, tend to become more and more obsessed with performance. Every major database has some type of tool that lets you examine how the server is going to process a SQL SELECT, and database programmers depend on these tools to discover where they might alter tables or indexes or the SELECT itself to make the queries go faster.

Understanding Normalization

The term "normalization" refers to the process of analyzing the data that your system is required to store, and organizing it so that every fact is stored in exactly one place. Understanding how to normalize data is an absolute requirement for the database programmer who wants to design databases.

We speak of normalization in "forms" as in "first normal form", "second normal form", and so on. It is a good idea to understand The argument for normalization and then to pursue at very least:

Normalization is a a fascinating topic to study, and it extends all they way up to "Domain-key Normal Form" which is considered the most complete normalization for a database.

Understanding Denormalization

Every database programmer, after fully understanding normalization, realizes that there are severe practical problems with a fully normalized database, such a database solves many problems but generates problems of its own. This has led programmer after programmer down the path of denormalization, the deliberate re-intoduction of redundant values to improve the usability of the database.

There is a surprising lack of material available on the web regarding denormalization strategies. Most of what you find is arguments and flame wars about whether or not to do it, with little to nothing on how to actually do it. For this reason, I provide my own essays on this blog on the strategies and methods I have worked out over the years:

After reviewing The Argument For Denormalization it is worthwhile to follow up with:

The arguments for and against denormalization are heavily affected by the Pay me now or pay me later design tradeoff.

Understanding Primary Keys, Foreign Keys and Constraints

One might argue that this list of skills belongs much higher up the list, up there with the CREATE TABLE command. However, I have it useful to distinguish between simply knowing the commands to make a primary key and actually understanding the tremendous power of keys.

In this author's opinion it is not truly possible to understand how powerful and beneficial Primary keys and Foreign Keys are for an entire application stack until you have learned the commands, built some databases, and worked through the concepts of normalization and denormalization. Only then can you revisit these humble tools and realize how powerful they are.

Understanding Transactions

The word "transaction" has two meanings in common day-to-day database talk. One meaning is very loose and refers to some individual command or set of commands. You might hear somebody using the term loosely when they say, "We're seeing about 10 transactions per second this week."

The more rigorous use of the term refers to a statement or set of statements that must be guaranteed to either complete in their entirety or fail in their entirety. This is a profoundly important concept once you get beyond simply making tables with keys and get into real-world heavy multi-user activity. And this leads us to the next topic...

Understanding ACID

Modern relational databases expect multiple simultaneous users to be writing and reading data all of the time. The term "ACID Compliance" refers to both the philosophy of how to handle this and the actual methods that implement that philosophy. The term ACID refers to:

  • The Atomic nature of each transaction
  • The Consistentcy of the database during and after simultaneous overlapping transactions
  • The Isolation of each transaction
  • The Durability of the results
Understanding Indexes as optimization tool

An index is a special tool databases use to provide very rapid access to large amounts of data. Just like keys, it is not enough to know the commands, it is necessary to understand the subtle power of indexes when used with some craftsmanship. The basic uses of indexes are:

  • A simple index on a column to provide rapid search on that column
  • A "covering index" that includes extra columns that can further speed up certain common access patterns
  • Clustered indexes (MS SQL Server) and what they give and what they take away
  • The cost of indexes on write operations
Views

A view looks like a table to the SQL SELECT command. The view itself is a stored SQL SELECT command that encodes some query that is either used very often or is very compex. In all cases, views are used to present the database data to the application in some simplified convenient or secure form. The two major uses of views are:

  • To simplify the application programmer's job
  • To provide a read-only interface for some applications
Upgrades and Installs

If you are a single programmer or hobbyist working with a database, it is all well and good to just add and drop tables as you wish. But as soon as you get into development with quality control stages and multiple programmers, it becomes evident that you need a strategy for handling the schema changes that come with with new versions of the system. There are multiple essays available on this blog, covering:

Database Security

Databases provide incredible security provisions that are just about completely ignored by modern web developers. Sometimes there is good reason for this, but overall anybody who wants to become a truly accomplished Database Programmer or Database Architect must have a thorough understanding of database security and how it can simplify the entire system stack.

Database security comes down to specifying who is allowed to perform the 4 basic operations of INSERT, UPDATE, DELETE and SELECT against which tables:

My basic introduction to security is here.

  • Understanding roles (we used to say users and groups)
  • Simple table-level security
  • Column-level security (not widely supported)
  • Row-level security (not widely supported)
Efficient access of database from application

Imagine you have the perfectly designed database, with every nuance and subtlety excellently crafted in the ares of keys, indexes, normalization, denormalization and security. At this point your job branches out into several new areas, but one of the most important is knowing how to write application code that efficiently accesses the database.

Bulk operations: loading or exporting large amounts of data

Some database applications involve a large number of small transactions, where each trip to the database writes only a single row or reads only a dozen or so rows.

But in many cases you need to bulk load large amounts of data in one shot, thousands or even millions of rows. In these cases the techniques that govern small transactions are useless and counter-productive, and you need to learn some new commands and strategies to handle the bulk loads.

Understanding Contexts and how they lead to different sets of Best Practices

Not all databases are created for the same purpose. If you have a very large operations then it will likely have multiple independent databases that fill the classical roles, while in a smaller shop the roles may be combined in one database. I like to refer to these roles as "contexts" because they determine how the tables will be designed and how acess to the tables will be governed. The major contexts are:

  • OLTP or Online Transaction Processing, characterized by simultaneous reads and writes, generally assumes little or no periods of inactivity, and generally assumes that the individual transactions are very small. The apps we were all writing in the 80s and 90s to do accounting, ERP, MRP, Job control, payroll, airline reservations and many others fall into this context.
  • Data Warehouse context, characterized by periodic bulk loads of new information with most activity being reads. The Data Warehouse context is largely associated with the "Star Schema" table design. Data in a Warehouse is historical, it never changes after it is loaded.
  • CMS or Content Management System, also characterized by very few writes compared to reads, but more likely to have a normalized structure. Unlike a Data Warehouse, the data is subject to change, just not that often.
  • Any other Read Only Context. I include this category because I spent some time working on Direct Marketing databases, which are like a Data Warehouse in that they are updated periodically and the data does not change, but the Star Schema is completely inappropriate for them.

If you consider a huge online shopping system, you can see that within that application there are at least two contexts. The product catalog is likely to see vastly fewer writes than reads, but the shopping cart tables will be in a constant state of reads and writes.

Preventing performance degradation through various maintenance tasks

Once the database and its application stack is up and running, and the reads and writes and coming through, the laws of thermodynamics come into play and system performance can begin to degrade even if the database stays the same size and the load on the system is steady.

Different vendors have different tools for combatting this, but they tend to come down to reclaiming temporary space and occassionally rebuilding indexes. There are also log files that have to be purged, regular backups to be made, and other operations along those lines.

Deployment strategies: partitioning, tablespaces

When systems become sufficiently large, it is no longer possible to just attach some disks to a box and run a database server. The Database Architect must consider breaking different tables out onto different sets of spindles, which is usually done with "tablespaces", and moving older data onto slower cheaper spindles, which is often done with Partitioning.

Deployment strategies, failure protection, from simple backup to hot standbys

Because a database typically experiences simultaneous reads and writes from multiple sources, and may be expected to be up and running 24/7 indefinitely, the concept of making a backup and recovering from a failure becomes more complicated than simply copying a few files to a safe location.

In the most demanding case, you will need to provide a second complete box that can become fully live within seconds of a disastrous failing of the main box. This is called various things, but Postgres calls it a "hot standby" in version 9 and some MS SQL Server shops call it a "failover cluster."

The ability to come up live on a second box when the first one fails is made possible by the way databases handle ACID compliance, and the fact that they produce something called a Write-Ahead-Log (WAL) that can be fed into a second box that "replays" the log so that its copy of the database is getting the same changes as the master copy.

Server side coding: stored procedures and functions

I really could not figure out where to put this entry in the list, so I just punted and put it near the end. It could really go anywhere.

Stored procedures or functions are procedural routines (not object oriented) that are on the database server and can be invoked directly from an application or embedded inside of SQL commands. Generally speaking they provide various flow-control statements and rudimentary variable support so that you can code multi-step processes on the server itself instead of putting them in application code.

Server side coding: Triggers

Triggers are quite possibly the most elegant and beautiful technology that servers support, absolutely the least understood, and definitely the most reviled by the ignorant. You will find virtually no web content today that will explain why and how to use triggers and what they are good for.

Except of course for my own essay on triggers that discusses them in terms of encapsulation. Temporary tables

Temporary tables are like Stored Procedures inasmuch as I had no idea where to put them in the list, so they just ended up at the end.

As the name implies, a temporary table is a table that you can create on-the-fly, and which usually disappears when your transaction is complete. They are most often found in Stored Procedures. They can impact performance for the worst in many ways, but can be extremely useful when you are doing multi-staged analsysis of data in a Data Warehouse (that's where I use them the most).

Categories: Development

Recursive Queries with Common Table Expressions

Kenneth Downs - Sat, 2010-11-06 12:20

This week The Database Programmer returns after almost 18 months with an entry on using Common Table Expressions (CTEs) to do recursive queries. Relational databases were plagued from their inception with a lack of meaningful treatment for recursive operations, and CTEs have finally plugged that hole.

Common Table Expressions appeared in SQL Server 2005, and in PostgreSQL 8.4, and are also available in Oracle. As for mySQL, since I don't use it, I did a quick Google search and looked at the Docs for 5.5, and couldn't really find anything. I generally tend to assume mySQL cannot do the tough stuff.

But First, A Rant

There have always been plenty of people who claimed SQL was a bloated and clumsy language to work with. Most of the time I tend to agree, but I find the advantages of relational/SQL system to be so large that I'm willing to pay that price.

But with Commom Table Expressions (CTEs) I just can't help drifting into conspiracy theories involving the enemies of SQL infiltrating the committees and deliberately suggesting the most twisted, bloated, complicated way they could think of to do what is really a very basic operation. In other words, I am profoundly unimpressed with the syntax of CTEs, but as long as they are here and they work, we'll go along.

The Basic Example

Your basic recursive table contains a foreign key to itself, so that some rows in the table are children of some other row in the table. This recursion can nest to any depth, and the chart below shows a very simple example:

Primary_key   |   Parent_Key  |  Notes  
--------------+---------------+---------------------------
     A        |     null      |   top level row, no parent
     B        |      A        |   first level child of A
     C        |      B        |   child of B, grandchild
              |               |   of A
     D        |      C        |   child of C, grandchild 
              |               |   of B, great-grandchild 
              |               |   of A
     X        |     null      |   top level row, no parent
     Y        |      X        |   child of X
     Z        |      Y        |   child of Y, grandchild 
              |               |   of X

What we want is a query that can return a given row and all of its children out to any level, including helpful information about the structure of the recursion, something like this:

Primary_key | Level | Top_Key | Immediate_Parent_key 
------------+-------+---------+-----------------------
     A      |   1   |  A      | null
     B      |   2   |  A      | A    
     C      |   3   |  A      | B   
     D      |   4   |  A      | C
     X      |   1   |  X      | null
     Y      |   2   |  X      | X   
     Z      |   3   |  X      | Y   
And Another Rant

At this point the mind boggles at how long this blog entry needs to be to explain this simple operation. But lets get going anyway.

The First Step and Last Step

A Common Table Expression begins with the "WITH" clause and ends with a standard SQL Select:

;WITH myCTEName (primary_key,level,top_key,immediate_parent_key)
as (
  ....we'll get to this below....
)
select * from myCTEName

The basic idea is that we are going to define a CTE with a name and a list of columns, and then SELECT out of it. Without that final SELECT statement the CTE does not actually do anything. The SELECT can also be arbitrarily complex, doing aggregrations, WHERE clauses and anything else you might need.

The first thing to notice is the leading semi-colon. This is a trick adopted by MS SQL Server users. SQL Server does not require statements to be terminated with a semi-colon, but a SQL Server CTE requires the previous statement to have been terminated with a semi-colon (nice huh?). So SQL Server programmers adopted the strategy of starting the CTE with a semi-colon, which keeps the syntactical requirement with the CTE, where it belongs.

A given CTE sort of has a name. That is, you have to name it something, but think of it as a table alias in a SQL SELECT, such as "Select * from myTable a JOIN otherTable b...", it exists only during the execution of the statement.

The columns listed in the parantheses can have any names (at least in SQL Server). But these column names are what you will refer to in the final SQL SELECT statement.

Coding The Inside of the CTE, Step 1

Now we code the inside of the CTE in two steps. The first step is called the "anchor", and it is a straightforward query to find the top-level rows:

;WITH myCTEName (primary_key,level,top_key,immediate_parent_key)
as (
    select primary_key   as primary_key
         , 1             as level
         , primary_key   as top_key
         , null          as immediate_parent_key
      from myRecursiveTable
     where Parent_key is null
)
select * from myCTEName

This should be self-explanatory, we are querying only for rows that have no parent (WHERE Parent_key is null) and we are hardcoding the "level" column to 1, and we are also hardcoding the "immediate_parent_key" column to null.

This query alone would return two of the rows from our desired output:

Primary_key | Level | Top_Key | Immediate_Parent_key 
------------+-------+---------+-----------------------
     A      |   1   |  A      | null
     X      |   1   |  X      | null
Coding The Inside of the CTE, Step 2

Now we are going to add the actual recursion. When I first learned CTEs this was the hardest part to figure out, because it turned out my hard-won set-oriented thinking was actually slowing me down, I had to think like a procedural programmer when defining the second half of the query.

;WITH myCTEName (primary_key,level,top_key,immediate_parent_key)
as (
    select primary_key,1,primary_key,null
      from myRecursiveTable
     where Parent_key is null
    UNION ALL
    select chd.primary_key,par.level+1,par.top_key,chd.parent_key
      FROM myCTEName        par
      JOIN myRecursiveTable chd ON chd.parent_key = par.primary_key
)
select * from myCTEName

Thinking step-wise, here is what is going on under the hood:

  1. The server executes the "anchor" query, generating a result set called "myCTEName" containing just the top level rows.
  2. The server then executes the second query. At this point the result set "myCTEName" exists and can be referenced, so that you can link children to their parents. (That's why you see the JOIN)
  3. Step 2 is repeated recursively, adding grand-children, great-grand-children, and so on, until no more rows are being added, at which point it stops, and...
  4. The final result set is passed to the trailing SELECT, which pulls results out of "myCTEName" as if it were a table or view.

So when we code the 2nd part of the inside of the CTE, the part after the UNION ALL, act as if the first query has already run and produced a table/view called "myCTEName" that can be referenced. Once you understand that, the query is pretty easy to understand:

  • The "From myCTEName par" clause tells us we are pulling from the previously generated set. I like to use the alias "par" for "parent" to remind myself that the prior result is the parent row.
  • We then join to the original source table and use the alias "chd" to remind ourselves we are pulling child rows from there. The "ON chd.parent_key = par.primary_key" defines how children are joined to parents.
  • Our first column, "chd.primary_key", is the unique key for the results.
  • Our second column, "par.level+1" gives us a nifty automatically incremented "level" column.
  • Our third column, "par.top_key" ensures that all rows contain a reference to their top-most parent.
  • Our final column, "chd.parent_key", makes sure each row contains a reference to its immediate parent.
Finding Various Statistics

Once you have the inside of the CTE coded, the fun part moves to the final SELECT, which is operating on the complete set of results. You do not necessarily have to pull the complete list. For instance, you may want to find out the maximum nesting level for each parent, or the count of children for each parent:

;WITH myCTEName (primary_key,level,top_key,immediate_parent_key)
as (
    select primary_key,1,primary_key,null
      from myRecursiveTable
     where Parent_key is null
    UNION ALL
    select chd.primary_key,par.level+1,par.top_key,chd.parent_key
      FROM myCTEName        par
      JOIN myRecursiveTable chd ON chd.parent_key = par.primary_key
)
select top_key
     , max(level) as maxNestingLevel
     , count(*)   as countRows
     , count(*)-1 as countChildren
  from myCTEName
Conclusion

Common Table Expressions give SQL-based databases the (very) long-needed ability to execute recursive queries, albeit with a rather complex syntax. Once you grasp the basics of how to code them, there are many possible uses that go far beyond the simple example shown here.

Categories: Development

ruby-plsql-spec upgraded to use RSpec 2.0

Raimonds Simanovskis - Thu, 2010-10-21 16:00

Initial version of ruby-plsql-spec gem was using RSpec version 1.3. But recently RSpec 2.0 was released which API is not compatible with previous RSpec 1.x API and as a result plsql-spec utility was failing if just RSpec was upgraded to version 2.0.

Therefore I updated also ruby-plsql-spec to use latest RSpec 2.0 gem and released ruby-plsql-spec gem version 0.2.1. You can install the latest version with

gem install ruby-plsql-spec
Upgrade from previous version

If you previously already installed initial ruby-plsql-spec version 0.1.0 then you need to update your spec/spec_helper.rb file to use RSpec 2.0. You can do it by running one more time

plsql-spec init

which will check which current files are different from the latest templates. You need to update just spec_helper.rb file. When you will be prompted to overwrite spec_helper.rb file then at first you can enter d to see differences between current file and new template. If you have not changed original spec_helper.rb file then you will see just one difference

- Spec::Runner.configure do |config|
+ RSpec.configure do |config|

You can then answer y and this file will be updated. When you will be prompted to overwrite other files then you can review the changes in the same way and decide if you want them to be overwritten or not (e.g. do not overwrite database.yml file as it has your specific database connection settings).

HTML output option

In addition plsql-spec utility now has --html option which will generate test results report as HTML report. It might be useful for usage in text editors where you can define which command line utility to run when pressing some shortcut key and then display generated HTML output report. If you will execute

plsql-spec run --html

then it will generate HTML report in test-results.html file. You can override this file name as well using --html output_file_name.html option.

Questions or suggestions

If you have any other feature suggestions or questions about ruby-plsql-spec then please post comments here or report any bugs at GitHub issues page.

Categories: Development

How to Programmatically Disclose a Row in ADF Faces Table

JHeadstart - Tue, 2010-10-12 23:53

If you have defined the detailStamp facet on an ADF Faces Table, you can disclose the content of this facet using the expand/collapse icon rendered at the beginning of the row.
Sometimes, you want to programmatically disclose this content, for example when the user adds a new row, it is convenient that alle inout fields in the detailStamp facet are displayed right away.

In ADF 10.1.3, the following code used to work:

RowKeySet rks = table.getDisclosureState();
rks.getKeySet().add(row.getKey());

In ADF 11, implementation has slightly changed. Since a tree binding is now also used to render plain tables, we need to pass in a list of keys, the key path. Unlike a tree or treeTable, the path is always one level deep, so we can create a list with just the key of the row we want to disclose:

RowKeySet rks = table.getDisclosedRowKeys();
List keyList = Collections.singletonList(row.getKey());
rks.add(keyList);

Note that when you actually use this code when adding a new row, the primary key must be pre-populated by the model. This is required anyway with the current ADF 11.1.3 release, as is documented in the JDeveloper release notes (JDeveloper 6894412).

Categories: Development

ADF Faces: How to Prevent Closing of Dialog When Pressing Enter Key

JHeadstart - Tue, 2010-10-05 00:05

The af:dialog element supports a number of built in buttons that can be displayed at the bottom of the dialog. The type property governs which buttons are displayed, like OK/Cancel or Yes/No. If the type property is not set, an OK and Cancel button is displayed by default.
When using the built in buttons, the first button is the default button. If you press the Enter key the dialog will close because the default button action is executed.

If your dialog window contains an af:subform with a default command button, and the cursor is in an input component inside the af:subform, clicking enter closes the dialog, it does not execute the default command associated with the subform.

To solve this, you can set the type property to "none" and use the buttonBar facet to include custom OK and Cancel buttons.
Here is an example:

<af:popup id="p0" contentDelivery="lazyUncached" eventContext="launcher">
  <af:dialog modal="true" title="My Dialog" type="none" id="dialog2"
             resize="on" stretchChildren="first">
    <af:region value="#{bindings.myTaskflowWithSubForm.regionModel}" id="region2"/>
    <f:facet name="buttonBar">
      <af:group id="g1">
        <af:commandButton text="OK" actionListener="#{myDialogBean.handleOK}" id="cb1"/>
        <af:commandButton text="Cancel" actionListener="#{myDialogBean.handleCancel}" id="cb2"/>
      </af:group>
    </f:facet>
  </af:dialog>
</af:popup>

JHeadstart users who use JHeadstart-generated List of Values, will see this enhanced behavior in the upcoming 11.1.1.3 release, which means you can use the Enter key to perform a quick search in the LOV dialog window.

Categories: Development

ruby-plsql-spec gem and code coverage reporting

Raimonds Simanovskis - Mon, 2010-10-04 16:00

During recent Oracle OpenWorld conference I presented session PL/SQL unit testing can be fun! where I demonstrated how to do PL/SQL unit testing with Ruby:

Audience was quite interested and had a lot of questions and therefore it motivated me to do some more improvements to ruby-plsql-spec to make it easier for newcomers.

ruby-plsql-spec gem and plsql-spec command line utility

Initially ruby-plsql-spec was just repository of sample tests and if you wanted to start to use it in your project you had to manually pick necessary files and copy them to your project directory.

Now ruby-plsql-spec is released as a gem which includes all necessary dependencies (except ruby-oci8 which you should install if using MRI Ruby implementation) and you can install it with

gem install ruby-plsql-spec

See more information about installation in README file or see specific installation instructions on Windows.

When you have installed ruby-plsql-spec gem and want to start to use it in your existing project then go to your project directory and from command line execute

plsql-spec init

It will create spec subdirectory in current directory where all initial supporting files will be created. The main configuration file which should be updated is spec/database.yml where you should specify username, password and database connection string that should be used when running tests:

default:
  username: hr
  password: hr
  database: orcl

If you specify just database: name then it will be used as TNS connection string (and TNS_ADMIN environment variable should point to directory where tnsnames.ora file is located) or you can also provide hostname: and if necessary also port: parameters and then you can connect to database without tnsnames.ora file.

Now you can start to create your tests in spec directory and your tests file names should end with _spec.rb. You can see some examples at ruby-plsql-spec examples directory

To validate your installation you can try to create simple dummy test in spec/dummy_spec.rb:

require "spec_helper"

describe "test installation" do
  it "should get SYSDATE" do
    plsql.sysdate.should_not == NULL
  end
end

And now from command line you can try to run your test with:

plsql-spec run

If everything is fine you should see something similar like this:

Running all specs from spec/
.

Finished in 0.033782 seconds

1 example, 0 failures
Code coverage reporting

During my Oracle OpenWorld presentation I also showed how to get PL/SQL code coverage report (which shows which PL/SQL code lines were executed during tests run). It might be useful when you want to identify which existing PL/SQL code is not yet covered by unit tests.

Now code coverage reporting is even easier with new ruby-plsql-spec gem. It uses Oracle database DBMS_PROFILER package to collect code coverage information and I took rcov reports HTML and CSS files to present results (so that they would be very similar to Ruby code coverage reports).

To try code coverage reporting let’s create simple PL/SQL function:

CREATE OR REPLACE FUNCTION test_profiler RETURN VARCHAR2 IS
BEGIN
  RETURN 'test_profiler';
EXCEPTION
  WHEN OTHERS THEN
    RETURN 'others';
END;

and simple test to verify code coverage reporting:

require "spec_helper"

describe "test code coverage" do
  it "should get result" do
    plsql.test_profiler.should == 'test_profiler'
  end
end

And now you can run tests with --coverage option which will produce code coverage report:

plsql-spec run --coverage

As a result code coverage reports are created in coverage/ subdirectory. Open coverage/index.html in your browser and click on TEST_PROFILER function and you should see something similar like this report:

You can see that RETURN 'test_profiler'; line (with green background) was executed by test but RETURN 'others'; line (with red background) was not. Lines with light background are ignored by DBMS_PROFILER and I do not take them into account when calculating code coverage percentage (but they are taken into account when calculating total coverage percentage).

Questions or feedback

If you have any other questions about using ruby-plsql-spec for PL/SQL unit testing then please post comments here or if you find any issues when using ruby-plsql-spec then please report them at GitHub issues page.

Categories: Development

ADF Faces: Avoid Use of &lt;jsp:include&gt; Tag!

JHeadstart - Wed, 2010-09-29 00:50

When reviewing ADF applications, I frequently see ADF Faces pages that use the <jsp:include/> tag to include another ADF Faces page fragment.

This is really old-school JSP programming and should be avoided:

  • Component id's in the included page fragment might conflict with component id's in the base page.
  • You cannot use it as a partial source or partial target in partial page refresh
  • You cannot use JSF expression language in the <jsp:param/&gt since it was designed for JSP, not JSF
  • You cannot customize this tag using MDS

It is much better to use the native ADF Faces <af:declarativeComponent> tag, which is a first-class UI Component:
  • It is a naming container with its own id property, you will never have id conflicts with the page using the component
  • It has a partialTriggers property and can be specified as partial target component by other UI components
  • You can pass parameters using custom properties (see below) or the <f:attribute> tag, and you can use JSF expression language to specify parameter values.
  • You can customize the component using MDS

So, replace code like this:

<jsp:include page="/includes/myIncludeFragment.jsff">
  <jsp:param name="myParam" value="myValue"/>
 </jsp:include>

with this:

<af:declarativeComponent id="dc0" viewId="/includes/myIncludeFragment.jsff" myParam="myValue"/>

Note the use of custom property "myParam" inside the tag, this property should be defined as attribute in the declarative component definition as shown below:

<?xml version='1.0' encoding='UTF-8'?>
<jsp:root xmlns:jsp="http://java.sun.com/JSP/Page" xmlns:f="http://java.sun.com/jsf/core"
          xmlns:af="http://xmlns.oracle.com/adf/faces/rich" version="2.1">
    <af:componentDef var="attrs">
        <af:xmlContent>
            <component xmlns="http://xmlns.oracle.com/adf/faces/rich/component">
                <description>my fragment</description>
                <attribute>
                    <attribute-name>myParam</attribute-name>
                    <attribute-class>java.lang.String</attribute-class>
                    <required>true</required>
                </attribute>
            </component>
        </af:xmlContent>
        <!-- Actual content goes here-->
        <af:outputText value="#{attrs.myParam}"/>
    </af:componentDef>
</jsp:root>

Categories: Development

UIShell with Dynamic Tabs: Marking the Current Tab Dirty

JHeadstart - Mon, 2010-09-27 20:26

THIS POST IS OBSOLETE, THE TECHNIQUE DESCRIBED BELOW HAS SOME ISSUES. A NEW AND BETTER IMPLEMENTATION IS DISCUSSED HERE:  http://blogs.oracle.com/jheadstart/entry/core_adf11_uishell_with_dynamic

The Dynamic Tabs UI Shell Template includes an API to mark the current tab as dirty or clean.
By marking a tab as dirty, the tab label will appear in italics, and when closing the tab, the end user will get an alert that pending changes will be lost. Nice functionality, but it is left to the ADF developer to write custom code to call this API.

In the upcoming 11.1.1.3 release of JHeadstart, we added support to automatically call this API by checking the data control state of the current tab. This is done using a custom phase listener, that contains the following code:

package oracle.jheadstart.controller.jsf.listener;

import javax.faces.event.PhaseEvent;
import javax.faces.event.PhaseId;
import javax.faces.event.PhaseListener;

import oracle.adf.controller.internal.binding.DCTaskFlowBinding;
import oracle.adf.model.BindingContext;
import oracle.adf.model.binding.DCBindingContainer;
import oracle.adf.model.binding.DCDataControl;

import oracle.adf.view.rich.context.AdfFacesContext;

import oracle.binding.BindingContainer;

import oracle.ui.pattern.dynamicShell.Tab;
import oracle.ui.pattern.dynamicShell.TabContext;

import org.apache.log4j.Logger;

/**
* JHeadstart JSF PhaseListener.
* When using Dynamic tabs template, the current tab state is set to dirty or clean,
* depending on the state of the taskflow data control of the current tab
*/
public class JhsPhaseListener
implements PhaseListener
{

private static Logger sLog = Logger.getLogger(JhsPhaseListener.class);

public static final String PAGE_TEMPLATE_BINDING = "pageTemplateBinding";

@Override
/**
* While refreshing the curent tab seems more effective to do only just before render response
* it turns out that when closing a tab, and then returning to a dirty tab, the data control
* of the dirty tab is no longer seen as dirty, so we do it after the three phases that can
* change the state of the current tab: apply request values, update model values and invoke application
*/
public void afterPhase(PhaseEvent event)
{
if (event.getPhaseId() == PhaseId.APPLY_REQUEST_VALUES
|| event.getPhaseId() == PhaseId.UPDATE_MODEL_VALUES
|| event.getPhaseId() == PhaseId.INVOKE_APPLICATION)
{
checkCurrentTabDirtyState();
}
}

@Override
public void beforePhase(PhaseEvent event)
{
}

@Override
public PhaseId getPhaseId()
{
return PhaseId.ANY_PHASE;
}

public void checkCurrentTabDirtyState()
{
TabContext tabContext = TabContext.getCurrentInstance();
if (tabContext == null || tabContext.getSelectedTabIndex() < 0)
{
return;
}
BindingContainer bc =BindingContext.getCurrent().getCurrentBindingsEntry();
if (bc==null)
{
return;
}
DCBindingContainer pageTemplateBc =
(DCBindingContainer) bc.get(PAGE_TEMPLATE_BINDING);
if (pageTemplateBc == null)
{
return;
}
DCTaskFlowBinding tfb =
(DCTaskFlowBinding) pageTemplateBc.get("r" + tabContext.getSelectedTabIndex());
if (tfb == null || tfb.getExecutableBindings() == null ||
tfb.getExecutableBindings().size() == 0)
{
return;
}
DCBindingContainer taskFlowBc =
(DCBindingContainer) tfb.getExecutableBindings().get(0);
DCDataControl dc = taskFlowBc.getDataControl();
if (dc==null)
{
// no data control, we cannot detect pending changes
return;
}
boolean isDirty = dc != null && (dc.isTransactionDirty() || dc.isTransactionModified());
// calling covenience method markCurrentTabDirty adds content area as partial target,
// causing any popups currently displayed to be hidden again.
// Therefore retrieve current tab instance and call setDirty directly
// tabContext.markCurrentTabDirty(isDirty);
Tab tab = tabContext.getTabs().get(tabContext.getSelectedTabIndex());
if (tab.isDirty() != isDirty)
{
sLog.debug("Setting dirty state of dynamic tab with index " +
tab.getIndex() + " to " + isDirty);
tab.setDirty(isDirty);
AdfFacesContext.getCurrentInstance().addPartialTarget(tabContext.getTabsNavigationPane());
}
}
}

Just copy and past this code to create your own phase listener class, and register the phase listener in faces-config.xml like this (substitute with your own class name):


<lifecycle>
<phase-listener>oracle.jheadstart.controller.jsf.listener.JhsPhaseListener</phase-listener>
</lifecycle>

And make sure that the taskflows that are accessed through the dynamic tabs have the data-control-scope element set to shared.
That's all you need to do to enjoy this nice functionality.

Categories: Development

Pages

Subscribe to Oracle FAQ aggregator - Development