A better view
Accessing the database from the outside world basically comes down to two options - direct querying or executing stored procedures. Procedural access is often chosen for the wrong reasons - making maintenance significantly harder.
Using exclusively or mostly stored procedures seems to be an established practice in SQL Server community, due to various limitations of the optimiser and security model in earlier versions of that server. However, even Oracle developers are not immune to procedural overdose. It seems to me that people coming from Forms background prefer procedures to relational access, and I occasionally come across a data access layer consisting completely of stored procedures, designed and implemented by seasoned Oracle professionals. Most popular explanations for that design approach are:
- Direct table access is a crime, so procedures must be built around them to restrict, control and audit access
- Procedures are required because some columns must be calculated on the fly
- Procedures provide clients with a way to request only 10 (50 or 100) records, and not retrieve everything
- Procedures are needed so that client programmers do not have to understand all the details of the underlying table model
To (most of) that, I say "Rubbish". If you are reading this line and becoming furious, before scrolling down to write an angry comment, I'd like to offer you a (different point of) view:
- Direct table access really is a crime, but procedures are not the only solution. Any restrictions you can impose on procedures, you can also implement with views.
- Additional data can be calculated on the fly with PL/SQL functions and incorporated into views to look just like normal columns. Analytical functions can perform even very exotic calculations like ranking and finding local maximums and minimums.
- Limiting the number of rows transferred from database to client code can be solved easily in client access drivers, using features like server-side cursors and paging. Even if the clients are using a driver without such support, it can trivially be implemented by adding "WHERE rownum<" or analytical ranking.
- Procedures are not really good for hiding the complexity of the underlying data model - views are much better as they are a lot easier to maintain and use.
Imagine a software company - let's call them Effective Oracle Inc. Effective Oracle gets a contract from Clint (named after the world famous spaghetti western star) to develop his Web bookshop, and they give that task to their two star developers, Deborah and Webber. Deborah is a database specialist, and Webber is programming the front end.
Deborah and Webber are working on the offer listing page, and Deborah knows that she cannot let Webber read directly from the book table, as that would get her fired from any self-respecting software company (and Effective Oracle is self-respecting). She might give Webber a stored procedure to call. A month later, Clint asks for another page, similar to the offer listing, but also displaying sales and web browsing statistics. Just to make things more interesting, those statistics must be calculated using estimated forex rates, not just read from tables. Now Deborah is faced with a choice - either create an additional procedure, very similar to the old one, or add a few columns into the cursor returned by the old procedure. If she chooses the first option, then she is duplicating code and making it harder to maintain in the future. If she takes the second, then Oracle will calculate sales and popularity even for the first report, though they will just be discarded. If she originally used a view, than she could just add two columns to it, and Webber could choose which columns to read. On the old page, he will not retrieve additional columns, and Oracle will not execute the calculations.
Here is another scenario: Clint's marketing people ask for some weird report, in order to hunt down customers which have not spent enough money in the last few months. Deborah and Webber do it again as a combination of database stored procedure and a web page. Everything is OK for a while, but then Clint wants another similar report, including only clients registered this year. Deborah is again faced with a tough choice - either do a new procedure (duplicating code), or add a 'filter' parameter to the old one, with some default value that signals 'no filtering'. If she starts adding parameters, Deborah knows that very soon she will have to check whether to use the first filter, second filter, none or both. That procedure would quickly turn into a hydra, and she would be scared to look at it, let alone modify it. Making Webber fetch unnecessary rows and filter them in the Web code is not really a choice. Had she used a view for the report, Webber could just add a condition into the WHERE clause.
The situation gets even more complicated when that report has to be combined with some other data (customers' account information, for example). If the report was produced by a procedure, Deborah could not just join it with the Account table - she would have to develop a stored procedure that would duplicate all the code. I've actually seen one more solution to this problem, with both cursors retrieved and then joined in Web code1 - but I would not suggest doing that, since you would soon find if your employer is self respecting.
By giving Webber a view, Deborah is letting him filter, combine and order data using the database engine. Oracle will be able to optimise queries better, less data will travel from database to web servers, and Webber does not have to reinvent the wheel and implement relational model functionality. I would trust guys from Oracle much more for this task than Web programmers - no matter how good those Web programmers are. Deborah does not have to maintain duplicated code. If Clint wants to change the ordering or filtering - Webber can do it by modifying the query. Deborah does not even have to do anything. If she is currently busy, Webber does not have to wait for her, and they will actually finish the job faster. Deborah can, of course, re-use the views and join them with other views and tables easily, again not duplicating code.
It gets even better. Half a year later, in the middle of a busy trading day, system starts running slow and a DBA finds that one of Deborah's queries is the bottle-neck. 'Just add a RULE', they say, 'and it will be fine'. But if the query is in PL/SQL code, changing that during busy hours is not always an option. Any other procedures and packages that depend on that procedure will be invalidated, connected clients will get an exception first time when they try to execute changed procedures, and if any jobs depend on those procedures (or packages), they might need to restart. So, a simple optimiser hint now becomes a major problem, requiring downtime, which might not be possible during busy hours. If Deborah used a view instead of the procedure, then the DBA would be able to insert the hint without much trouble.
Read-only procedures are not the only ones that get abused, though they are the most obvious example. I consider "Update" procedures generally a bad idea, since they quickly grow into monsters. Imagine a customer service application with three important use cases - in one use case, just the name and address should be updated in the customer table; another use case updates all customer data, and the third just deletes the address, leaving other customer data intact. All cases can be implemented with three different update procedures, or a single procedure encapsulating three update statements, selecting which one to use based on parameters. When another developer takes over that little monster pet, he will wonder what to do if the address parameter is null, so this code is very likely to attract errors.
Procedures are often used as a wrapper over inserts, in order to simplify the API or define default values. Personally, I don't like using procedures to restrict inserts or simplify them by defining column values, and think that triggers and default value specifications are much better for that. If some other developer decides to update the tables directly, triggers and defaults will still execute, unlike the values I defined using a procedure.
Views can also be updateable, so in most cases writing an 'update' procedure as a simple wrapper around a single SQL statement is just a waste of time. If the clients already access the data using a view, then let them update it - you will not have to write separate procedures for different parameter combinations, and they don't have to worry about calling the appropriate procedure. What ever data they want to modify, they should just update it.
And again, if they want to update or insert a number of records, you don't have to write a new procedure and they don't have to execute database calls in a loop. Sure, not all views are updateable, but instead-of triggers are a good workaround for that problem.
Procedures are often used to implement column-level security - to let a user update only certain columns, without the ability to modify others. Even in those cases, implementing a view that hides columns is much easier than writing a stored procedure wrapper over Update statements. Separating access by privileges is not really an excuse to force updates through procedures.
Any logging or auditing should not be done in an update stored procedure anyway, but as a trigger, since it should catch updates even if somebody runs them directly on the table.
Functions are excellent for encapsulating business rules (like "should discount apply" or "estimated forex rate" mentioned above), and I suggest using functions for business rules for the same reason I suggested views as read-write access layers: code will be easier to maintain. When the discount rate rule changes (and it will one day, no doubt), I will have to change just one function, and will not have to search through dozens of views and modify them.
Procedures are most useful for capturing business processes and automating operations, such as multi-step updates or creating a hierarchy of elements. Also, having different combinations of "default" values is really hard to implement without procedures, so they are good as "wizards" for creating records in complex cases. Since execution order of triggers cannot be guaranteed, procedures are excellent as wrappers around several trigger bodies that would need to be executed in a specific order.
I can even name a few cases when it's better to use a procedure for read-only access - typically involving several tables merged and filtered all by date ranges, when data could not be properly filtered with views. Though in most of those cases the report can be expressed as a query, and then filtered with an external Where clause, the execution plan would be significantly less optimal then if the query was parametrised from inside. In that case, I would suggest extracting the query into a parametrised cursor, and then (if the clients cannot access the parametrised cursor directly), writing an additional stored procedure that would just open the cursor and return it.
Procedures have a big advantage in familiarity with other programming environments, so they are easier to understand at first - but try to resist them. Using procedures just as dumb wrappers over SQL code does not pay off - and only brings maintenance problems. Stored procedures and functions are not really good for simplifying the API, hiding the underlying data model or implementing security restrictions - views are much better for that. However, this does not mean that procedures and functions don't have a place in the client access layer - but I strongly suggest preferring the relational approach when possible. You should do that because of a very selfish reason - your code will be much easier to maintain, so you will be able to spend time doing something smarter.
Gojko Adzic is an IT consultant specialised in designing and building high-throughput transaction processing systems and enterprise system integrations. His story so far includes equity and energy trading, mobile content delivery, e-commerce, online betting and complex configuration management. In his free time, Gojko maintains a blog about programming on gojko.net.
1 I've actually seen this once - profiler found that piece of code and politely warned us that the merging loop was consuming 90% of the processor - it took us quite some time to figure out what the author really wrote, but on the end, it turned out to be another fresh invention of the Cartesian product.