Feed aggregator

Working with Oracle 11g

Jeff Hunter - Fri, 2010-12-03 05:11
As I continue to work on Oracle 11g, I am constantly reminded of a quote from Bertrand Russell: "In all affairs it's a healthy thing now and then to hang a question mark on the things you have long taken for granted."

Live Webcast: Eliminate Silent Data Corruption with Oracle Linux

Sergio's Blog - Fri, 2010-12-03 03:49

On Thursday 16 December at 9:00 am Pacific, Martin Petersen, Linux Kernel Developer at Oracle, and I are hosting a live webcast covering:

  • The impact of data corruption on the business environment
  • How Oracle's Unbreakable Enterprise Kernel reduces the potential for incorrect data to be written to disk
  • Data integrity features that decrease application and database errors and system downtime
Sign up here.

Categories: DBA Blogs

A Case When Table Design is Easy and Predictable

Kenneth Downs - Thu, 2010-12-02 20:36

Good table design is a great foundation for a successful application stack. Table design patterns basically resolve into master tables and transaction tables. When we know a thing or two about the master tables (or entities if you prefer), we can infer a great deal about the transactions.

This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.

A Time Billing System

Imagine we have been asked to recode the company's time-billing system. Because this is for the company we work for, we have some inside knowledge about how things work. We know that:

  • There are, of course, customers.
  • ...and employees who record time
  • Each hour we record goes against a Work Order
  • There are different kinds of jobs, like project management, programming, programming management, and others.

Knowing only this, is it possible to anticipate what the system will look like? A safe answer is "no", on the claim that we will undoubtedly learn more, but this safe answer happens to be wrong. We can in fact anticipate the overall shape of the system, and new information will shift details, but it will not change the shape.

We can anticipate the nature of the transactions if we determine the upper bound of complexity and the combinatorial completeness of the system.

The Upper Bound of Complexity

We can safely assume that the big number to get right is going to be the billing rate. Our employer assumes we will get everything else right, but the billing rate is going to have them chewing their fingernails until they know we understand it and have coded it correctly.

The cool thing is that we already have enough information to establish an upper bound on the complexity of the system by looking at the master tables, where a master table is generally one that lists details about real things like people, places, things, or activities. So far we know (or think we know) about three master tables:

  • Customers
  • Employees
  • Services

Now we define the upper bound of complexity as:

The upper bound of complexity occurs when the billing rate is determined by all three master entities.

In plain English, calculating a billing rate can be as complicated as looking up a rate specific to a customer for a service for an employee but cannot be more complex than that because there are no other entities with which to work.

Combinatorially Complete

We can also anticipate all possible calculations for the billing rate by working through the complete set of combinations of master entities. This would look like the list below. Note that we are not trying to figure out right now which of these is likely to occur, we just want to get them listed out:

  • Each service has a default rate
  • Each customer has a negotiated rate
  • Each employee bills out at a default rate
  • The combination customer-service may have a rate
  • The combination customer-employee may have a rate
  • The combination customer-service-employee may have a rate (this is the upper bound of complexity, all three master entities determine the rate).

Unless we live in a super-simple world where only the first item in the list is present, we will end up dealing with several if not all of the combinations listed above.

Each of these combinations then becomes a table, and we know the billing rate will be determined by a resolution.
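
To make that concrete, here is a quick sketch of what those tables might look like. The table names, column names, and data types are illustrative assumptions on my part, not something taken from the original requirements:

-- Master tables, each carrying its own default/negotiated rate
CREATE TABLE services (
    service   VARCHAR(20)   PRIMARY KEY,
    rate      DECIMAL(10,2)              -- default rate for the service
);

CREATE TABLE customers (
    customer  VARCHAR(20)   PRIMARY KEY,
    rate      DECIMAL(10,2)              -- negotiated customer rate, if any
);

CREATE TABLE employees (
    employee  VARCHAR(20)   PRIMARY KEY,
    rate      DECIMAL(10,2)              -- default billing rate for the employee
);

-- One cross-reference table per combination of master entities
CREATE TABLE customer_service_rates (
    customer  VARCHAR(20) REFERENCES customers (customer),
    service   VARCHAR(20) REFERENCES services (service),
    rate      DECIMAL(10,2),
    PRIMARY KEY (customer, service)
);

CREATE TABLE customer_employee_rates (
    customer  VARCHAR(20) REFERENCES customers (customer),
    employee  VARCHAR(20) REFERENCES employees (employee),
    rate      DECIMAL(10,2),
    PRIMARY KEY (customer, employee)
);

CREATE TABLE customer_service_employee_rates (
    customer  VARCHAR(20) REFERENCES customers (customer),
    service   VARCHAR(20) REFERENCES services (service),
    employee  VARCHAR(20) REFERENCES employees (employee),
    rate      DECIMAL(10,2),
    PRIMARY KEY (customer, service, employee)
);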

New Information

Now comes the big day and we interview with somebody we'll call "The Explainer" who is going to officially explain the billing system. Can he break what we already know? No. At most he can:

  • Make us aware of new master entities; perhaps there are "projects" and "contracts" that get their own billing arrangements.
  • Dispel our notions about some of the combinations by saying, "Oh, we never give a customer a default rate, the default rates come out of the services."

Going in Cold

What about the case where we know absolutely nothing about an assignment when we go in to begin the interviews? We can do a good job of thinking on our feet if we draw "The Explainer" towards the master entities. As we gain confidence that we know what the master entities are, we can ask questions to probe Combinatorial Completeness and the Upper Bound of Complexity.

One caveat: This method works for transactions between master entities. When "The Explainer" starts describing something that cannot be recognized as an interaction between master entities, do not try to stuff the problem into this box, it may not fit.

What About the Application?

At this point, we can also anticipate a lot of what the application will look like. We will need maintenance screens for all of the master entities, and a really slick UI will allow for very easy editing of those various cross-reference combination tables. As long as that much is done, we are almost finished, but not yet.

There will be some billing process that pulls the time entries, finds the correct billing rate for each one, and permanently records the invoices. If we use a resolution this task is child's play to code, debug, and maintain.
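
As a sketch of what that resolution might look like in SQL, the billing run boils down to one query. This assumes the illustrative tables above plus a TIME_ENTRIES table holding customer, employee, service, and hours; the precedence order inside the COALESCE is also an assumption, not a rule from the original post:

-- Resolve the billing rate for each time entry, most specific
-- combination first, falling back to the single-entity defaults.
SELECT t.customer
     , t.employee
     , t.service
     , t.hours
     , COALESCE(cse.rate      -- customer + service + employee
              , cs.rate       -- customer + service
              , ce.rate       -- customer + employee
              , c.rate        -- customer default
              , e.rate        -- employee default
              , s.rate        -- service default
       ) AS bill_rate
  FROM time_entries t
  JOIN customers c ON c.customer = t.customer
  JOIN employees e ON e.employee = t.employee
  JOIN services  s ON s.service  = t.service
  LEFT JOIN customer_service_rates cs
         ON cs.customer = t.customer AND cs.service = t.service
  LEFT JOIN customer_employee_rates ce
         ON ce.customer = t.customer AND ce.employee = t.employee
  LEFT JOIN customer_service_employee_rates cse
         ON cse.customer = t.customer
        AND cse.service  = t.service
        AND cse.employee = t.employee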

Then of course there is the presentation, the actual bill. Depending on the company, these may be delivered as hardcopy or in email. That will of course have to be coded up.

Conclusion

There are two conclusions. First, as originally stated, many transactions can be anticipated when you know what the master entities are.

But secondly, and every bit as important, once the table design is sound, the application pretty much writes itself. On a personal note, this is probably why I do not find application coding as exciting as I once did. Once I realized that the real challenge and satisfaction was in working out the tables, the coding of the app became a bit of a drudge; it requires no judgment as far as business rules are concerned.

Categories: Development

Happy Holidays everyone!

Eduardo Rodrigues - Thu, 2010-12-02 15:57


This is a summary only. Please, visit the blog for full content and more.

Happy Holidays everyone!

Java 2 Go! - Thu, 2010-12-02 15:57


This is a summary only. Please, visit the blog for full content and more.
Categories: Development

Simba previews Cognos8 Analysis Studio accessing Oracle Database OLAP Option cubes

Keith Laker - Thu, 2010-12-02 06:26
Hot on the heels of support for BusinessObjects Voyager, and in addition to the native Excel 2003/2007/2010 pivot table access, Simba are previewing the same connectivity for Cognos8 Analysis Studio - the dimensionally aware UI in the Cognos BI suite.

Together with the unique SQL access to the same multidimensional data & calculations in Oracle Database OLAP cubes (meaning that *any* tool or application capable of connecting to Oracle and issuing simple SQL can leverage the power of Database OLAP - like Oracle Application Express, for example), plus the existing support for Oracle's own BI tools, and the big functionality and performance improvements in 11g, there is now every reason to move to Oracle Database 11gR2 and to fully exploit the OLAP Option - whatever your choice of front-end tool(s).

For Cognos fans, there is a video on YouTube.

For more information, see the Simba website: http://www.simba.com/MDX-Provider-for-Oracle-OLAP.htm
Categories: BI & Warehousing

Creating a centered page layout using ADF Faces (only)

Eduardo Rodrigues - Wed, 2010-12-01 14:07
by Eduardo Rodrigues It’s been a long time since I last posted here about my old friend: ADF. But now that I’m working in ADF BC’s awesome development team, I was already feeling the self-pressure...

This is a summary only. Please, visit the blog for full content and more.

Creating a centered page layout using ADF Faces (only)

Java 2 Go! - Wed, 2010-12-01 14:07
by Eduardo Rodrigues It’s been a long time since I last posted here about my old friend: ADF. But now that I’m working in ADF BC’s awesome development team, I was already feeling the self-pressure...

This is a summary only. Please, visit the blog for full content and more.
Categories: Development

Applying Grid Infrastructure Patch Set 1 (GI PSU1) to non-RAC installations

Yasin Baskan - Wed, 2010-12-01 07:17
If you are trying to patch an existing 11.2.0.1 installation on a non-RAC GI (also called single instance HA) system, be aware that some steps will be different from the patch readme or the documentation.

Here is what I experienced.

The documentation for Grid Infrastructure 11.2 states that if you want to upgrade from 11.2.0.1 to 11.2.0.2 you need to apply some patches first.

"To upgrade existing 11.2.0.1 Oracle Grid Infrastructure installations to Oracle Grid Infrastructure 11.2.0.2, you must first do at least one of the following:
  • Patch the release 11.2.0.1 Oracle Grid Infrastructure home with the 9413827 and 9706490 patches.
  • Install Oracle Grid Infrastructure Patch Set 1 (GI PSU1) or Oracle Grid Infrastructure Patch Set 2 (GI PSU2)."
If you choose to apply GI PSU1 and follow the README there is a section that tells you to prepare the GI home.

"2.2.5 Prepare GI home for patch

      As the owner of the Oracle GI home software, run the command:
      %<GI_HOME>/bin/srvctl stop home -o <GI_HOME> -s <status file> -n <node name>"

When I ran this on a single instance system I got:

[oracle@oel5 patches]$ srvctl stop home -o /u01/app/product/11.2.0/grid -s /tmp/stat.txt
PRCH-1002 : Failed  to stop resources running from  crs home /u01/app/product/11.2.0/grid
PRCH-1030 : one or more resources failed to stop PRCH-1026 : Failed to stop ASM
PRCD-1027 : Failed to retrieve database orcl
PRCD-1035 : orcl is the database unique name of a single instance database, not a cluster database

This is a problem with srvctl, and this step must be skipped for non-RAC GI installations as the next step will stop GI. After this, the README tells you to unlock the grid home:

"You must invoke this script as root user to open protection on GI software files for patch application
      #<GI_HOME>/crs/install/rootcrs.pl -unlock"

rootcrs.pl is the script used for RAC environments. If you have non-RAC GI, the script you need to use is roothas.pl. After replacing rootcrs.pl with roothas.pl and unlocking the grid home, you can continue with applying the patch.

MOS note 1089476.1 explains the steps to patch a non-RAC GI home. Keep this note in mind when you are working on such a system.

Most Ridiculous Password

Jeff Hunter - Wed, 2010-12-01 05:11
I encountered a website yesterday that required a complex password.  Its password rules were:

  • at least 9 characters
  • must include a capital letter
  • must include three digits
  • must not repeat a letter, but can repeat a number
  • must include one "special" character above a number [!@#$%^&*()]
  • can not include "special" characters that are not above a number
  • can not start with a capital letter
  • can not

The Really Cool NTILE() Window Function

Kenneth Downs - Tue, 2010-11-30 21:29

If you regularly code queries and have never been introduced to the windowing functions, then you are in for a treat. I've been meaning to write about these for over a year, and now it's time to get down to it.

Support in Major Servers

SQL Server calls these functions Ranking Functions.

PostgreSQL supports a wider range of functions than MS SQL Server, having put them in at 8.4, and calls them Window Functions.

Oracle's support is broader (by a reading of the docs) than SQL Server or PostgreSQL, and they call them Analytic Functions.

I try to stay away from MySQL, but I did a quick Google on all three terms and came up with a few forum posts asking when and if they will be supported.

The NTILE() Function

In this post we are going to look at NTILE, a cool function that allows you to segment query results into groups and put numbers onto them. The name is easy to remember because it can create any "-tile": a percentile, a decile, or anything else. In short, an n-tile. But it is much easier to understand with an example, so let's go right to it.

Finding percentiles

Consider a table of completed sales, perhaps on an eCommerce site. The Sales Manager would like them divided up into quartiles, four equally divided groups, and she wants the average and maximum sale in each quartile. Let's say the company is not exactly hopping, and there are only twelve sales, which is good because we can list them all for the example. If we already had the quartiles provided then the query would be easy, so if we were lucky enough to be starting with this:

 CUSTTYPE | AMOUNT  | QUARTILE
----------+---------+----------
 RETAIL   |   78.00 |   1
 RETAIL   |  234.00 |   1
 DEALER   |  249.00 |   1
 DEALER   |  278.00 |   2
 RETAIL   |  392.00 |   2
 RETAIL   |  498.00 |   2
 DEALER   |  500.00 |   3
 RETAIL   |  738.00 |   3
 DEALER   | 1250.00 |   3
 RETAIL   | 2029.00 |   4
 RETAIL   | 2393.00 |   4
 RETAIL   | 3933.00 |   4

The query would be child's play if we already had the quartile:

Select quartile
     , avg(amount) as avgAmount
     , max(amount) as maxAmount
  FROM ORDERS
 GROUP BY quartile
 ORDER BY quartile

The Problem is We Do Not Have Quartile

The problem of course is that we do not usually have handy columns like QUARTILE provided, but we can generate the QUARTILE column during the query by using NTILE.

Select quartile
     , avg(amount) as avgAmount
     , max(amount) as maxAmount
  FROM (
        -- The subquery is necessary
        -- to process all rows and add the quartile column
        SELECT amount
             , ntile(4) over (order by amount) as quartile
          FROM ORDERS
       ) x
 GROUP BY quartile
 ORDER BY quartile

This query will give us what the Sales Manager wants.

Dissecting the Function and The OVER Clause

The NTILE() function takes a single argument, which tells the server how many groups to divide the data into. If the rows do not divide evenly into that many groups, the server decides which groups will be missing one row. So when the division is exact, all of your groups have the same count of rows, but when it does not divide evenly, one or more of them will be one row short.

If you pass 100 to NTILE(), you get a percentile. If you pass 10, you get a decile, and so forth.
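
The uneven case is easy to see by counting the rows in each group. The query below is my own sketch, not part of the original example; if ORDERS held ten rows instead of twelve, the counts would come back as 3, 3, 2, 2:

Select quartile
     , count(*) as rowsInGroup
  FROM (
        SELECT amount
             , ntile(4) over (order by amount) as quartile
          FROM ORDERS
       ) x
 GROUP BY quartile
 ORDER BY quartile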

The magic is in the OVER() clause. It supports two sub-clauses, and the example shows one, the ORDER BY. Quite simply, the ORDER BY clause tells the server how to line up the rows when adding the NTILE values. The clause is very flexible, and has nothing to do with your query's overall ORDER BY clause.

The Second Clause: PARTITION

Now we will pretend the Sales Manager is not satisfied, and wants separate numbers for the two Customer Types. We could do this if the NTILE() function would create two sets of quartiles, one for each Customer Type, like so:

 CUSTTYPE | AMOUNT  | QUARTILE
----------+---------+----------
 DEALER   |  249.00 |   1
 DEALER   |  278.00 |   2
 DEALER   |  500.00 |   3
 DEALER   | 1250.00 |   4
 RETAIL   |   78.00 |   1
 RETAIL   |  234.00 |   1
 RETAIL   |  392.00 |   2
 RETAIL   |  498.00 |   2
 RETAIL   |  738.00 |   3
 RETAIL   | 2029.00 |   3
 RETAIL   | 2393.00 |   4
 RETAIL   | 3933.00 |   4

We can do this by using the PARTITION BY clause, which tells the server to break the rows into groups and apply the NTILE() numbering separately within each group. The new query would be this:

Select custtype
     , quartile
     , avg(amount) as avgAmount
     , max(amount) as maxAmount
  FROM (
        -- The subquery is necessary
        -- to process all rows and add the quartile column
        SELECT amount
             , ntile(4) over (partition by custtype
                                 order by amount) as quartile
          FROM ORDERS
       ) x
 GROUP BY custtype,quartile
 ORDER BY custtype,quartile

Bonus Points: The Median

Now once again the Sales Manager, who is never satisfied, comes down and says that the average is no good; she needs the max and the median sale value within each quartile. To keep it simple, she does not need this broken out by customer type; it can be applied to the entire set.

This is a case where we can use NTILE() twice. The first time we will break all sales up into four groups, to get the quartiles, and then we will break up each quartile into two groups to get the median. The code looks like this:

Select quartile
     , max(case when bitile=1 then amount else 0 end) as medAmount
     , max(amount) as maxAmount
  FROM (
        -- The second pass adds the
        -- 2-tile value we will use to find medians
        SELECT quartile
             , amount
             , ntile(2) over (partition by quartile
                                  order by amount) as bitile
          FROM (
                -- The subquery is necessary
                -- to process all rows and add the quartile column
                SELECT amount
                     , ntile(4) over (order by amount) as quartile
                  FROM ORDERS
               ) x1
       ) x2
 GROUP BY quartile
 ORDER BY quartile

The magic here is that we know we've divided the data evenly into four sets, so the median will be the maximum value half way through each set. In other words, it will be the maximum value when the value of bitile=1 for each quartile.

One More Note About Oracle

Once you get down the basics of the OVER clause, Oracle looks really good, because they support the clause over the largest range of functions, at least going by the respective doc pages for each platform.

Categories: Development

Oracle Enterprise Linux 5.5 Installation on HP DV6 Quad Core with 8Gb RAM Notebook

Madhu Thatamsetty - Tue, 2010-11-30 09:03
Oracle Enterprise Linux 5 Update 5 installation on an HP DV6 (i7 700 series) quad core with 8GB RAM panics with "Kernel panic - not syncing". Solution: at the boot prompt use "linux acpi=off"; this will help you overcome the kernel panic error mentioned above and puts one more in front. In the next screen, it will prompt you to select the media location. If you are installing using CD it will prompt

Revisited: Search Engine Plugins for Firefox to get directly to specific Oracle Patches, MOS Notes, Bugs

Gareth Roberts - Tue, 2010-11-30 02:03

Revisited: Following the upgrade from Metalink to My Oracle Support (MOS) I've updated the Note and Bug search engines (files oranote.xml and orabug.xml) per my prior post.

Revisited again 30-NOV-2010: Following the ARU change I've updated the Patch search engine (file orapatch.xml) per my prior post.

Navigate directly to a specific Oracle Patch, MOS/Metalink Note or Bug, speeding things up & sidestep that Flash! You gotta know the Patch/Note/Bug number you wanna get to:

Catch ya!
Gareth
This is a post from Gareth's blog at http://garethroberts.blogspot.com

 


Search Engine Plugins for Firefox to get directly to specific Oracle Patches, Metalink Notes, Bugs

Gareth Roberts - Tue, 2010-11-30 02:02

Revisited again 30-NOV-2010: Following the ARU change I've updated the Patch search engine (file orapatch.xml)

Update: The Note and Bug search engines (files oranote.xml and orabug.xml) have been updated post upgrade to My Oracle Support (MOS).

Navigating directly to a specific Oracle Patch, Metalink Note or Bug is a bit of a chore. Not to mention Metalink / My Oracle Support (MOS) could do with a mobile interface to speed things up & sidestep that Flash! Cut'n'pasting from my text file with the URL templates was getting tedious. So with inspiration from Eddie Awad's posts, I've put together three custom Firefox Search Engines, well, not really Search Engines, but "I'm Feeling Lucky" engines. You gotta know the Patch/Note/Bug number you wanna get to:

Once you've installed them, hit Control-K, choose the Patch, Note or Bug "search engine" (Control-Down Arrow), enter or paste the exact Patch, Note or Bug number, hit enter and voila, you're there ... if you're logged into the target site!

Give it a try: e.g. Patch 5612820, Note 444524.1, Bug 6074498.

If you get the XML files, put them in your C:\Program Files\Mozilla Firefox\searchplugins folder (or similar), restart your browser and you'll be up and running!

If you need a generic Metalink search engine in the same vein, look here.

PS. You will need your Metalink (MOS) username/password to get to the target pages.

PPS. Hoping Oracle doesn't change the URL structures 2 minutes after I post this ;-) Let me know if I don't notice when that happens!

References

    Eddie's posts:


Loops Without Cursors

Kenneth Downs - Mon, 2010-11-29 21:19
Looping Without Cursors

Sometimes you need to process a table row-by-row, and the established approach is to use cursors, which are verbose, slow, and painful to code and use.

The Cursor Example

Here is the basic minimum syntax required to loop through a table and get something done. The SQL flavor is MS SQL Server, but it's not much better in any other flavor.

-- I coded this off the top of my head, there
-- may be a minor syntax error or two

-- Most of this is pseudo-code, but take
-- note that it is ordered on column1
declare someCursorName cursor for
 select column1, column2, column3 
   from anyTable
  ORDER BY column1

-- Have to do this now
open someCursorName

-- Now you need to declare some variables
-- For the example I'm just making everything int
declare @column1 int
      , @column2 int
      , @column3 int

-- Gosh, we're actually about to start the loop!  Finally!
fetch next from someCursorName into @column1,@column2,@column3
while @@fetch_status = 0 begin

   --  If you still remember what you actually wanted
   --  to do inside the loop, code it here:

-- Repeat this line from the top here again:
fetch next from someCursorName into @column1,@column2,@column3
end

-- Not done yet, these two lines are crucial
close someCursorName
deallocate someCursorName

Call me petty, but what I hate about that code is that I have to refer to specific columns of interest 3 times (not counting the declarations). You refer to them in the cursor declaration and in the two FETCH commands. With a little clever coding, we can vastly simplify this and do it only once.

Using An Ordered Column

We can execute the same loop without the cursor if one of the columns is ordered and unique. Let us say that column1 is the primary key, and is an auto-incremented integer. So it is ordered and unique. The code now collapses down to:

-- I coded this off the top of my head, there
-- may be a minor syntax error or two

-- We can't get around declaring the vars, so do that
declare @column1 int
      , @column2 int
      , @column3 int

-- If you know a safe value for initialization, you
-- can use the code below.  If this is not 100% 
-- safe, you must query for the value or it must
-- be supplied from some other source
set @column1 = -1

-- BONUS POINTS: Can this become an infinite loop?
while 1 = 1 begin

-- Now we code the query and exit condition
 select TOP 1
        @column1 = column1
      , @column2 = column2
      , @column3 = column3 
   from anyTable
  WHERE column1 > @column1  -- this is what advances the loop
  ORDER BY column1

if @@rowcount = 0 begin
    break
end

        -- Put the actions here        

end

Final Notes

The only requirement for this approach is that you have a unique ordered column. This usually means a unique key or primary key. If "column1" is not unique, the loop will skip all but the first value in each group.

Also, it is very nice if you know a safe value to use as an initializer. Without that, you must query for the minimum value that matches the condition and then decrement it by one.
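
If you do have to derive the starting value, it is a one-line query (a sketch in the same off-the-top-of-my-head style as the code above):

-- Derive the initializer instead of hard-coding -1.  Subtract 1 so
-- the first pass of the loop picks up the minimum row.  If the table
-- is empty, @column1 stays NULL and the loop exits on the first pass.
select @column1 = min(column1) - 1
  from anyTable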

Finally, can this loop become infinite? No. Well, in the extremely unlikely situation that rows are being added to the base table faster than you are processing them, then yes, it could go on for a very long time. But if that were happening I'd say there was a separate problem to look at.

It should probably go without saying, but if the particular loop is going to happen very often, the table should be indexed on your unique ordered column. If it is a primary key or you already have a unique constraint it is not necessary to create an index explicitly because there will be one as part of the key or constraint.

Categories: Development

Project Shaphan Updated

Digital Eagle - Sat, 2010-11-27 19:13
This page has moved.  Please update your links:
http://psst0101.digitaleagle.net/2010/11/27/project-shaphan-updated/

I just finished releasing a new version of my Project Shaphan to the Google Code site.  My goal with the project is to create a database query tool.  Oracle has SQL Developer, which is a great tool, but it is geared more for database development.  Most PeopleSoft database users will never see a line of PL/SQL or a single stored procedure.  What we need is a tool that understands PeopleSoft databases better and offers tools for querying it.

New Feature: Navigation Search

It isn’t very pretty, but here is a tool where you can enter a Component Name, and the tool will display all of the navigations for that component:


Meta SQL

I only have %CurrentDateIn working right now, but I plan to get more:


Getting Your Copy

To try it out, you can download the Jar file from the right side of the main page.  Make sure you have Java 6 installed.  Then, place the Jar file in a directory by itself, and double click on it.



You can always learn something new.

Jared Still - Sat, 2010-11-27 13:57
It’s high time for this blog to come off hiatus.  I really don’t know why I let it go so long; just pre-occupied with work and extracurricular activities, I guess.

One of those activities was to contribute two chapters to a new book from Apress, Pro Oracle SQL.  Though it was only two chapters, it did consume a significant amount of time.  Some folks seem to be able to bang out well written prose and code with seemingly little effort.  It seems that I labor over it more than most, at least it feels that way at times.

On to something new.  Not really new, but it was new to me the other day.  Or if it was not new, I had completely forgotten about it.

It has to do with the innocuous date formats used with to_date().  I ran into some unexpected behavior from to_date() while running one of the scripts used for the aforementioned book.
When logging into a database, part of my normal login includes setting the nls_date_format for my session:

 alter session set nls_date_format='mm/dd/yyyy hh24:mi:ss'  

The main purpose of doing so is so that DBA scripts that include dates will display in my preferred format without the need to use to_date(to_char()) to display the preferred format while preserving the data type.

When writing scripts that may be used by others, or any circumstance where I cannot rely on a setting for nls_date_format, I will use to_char() and to_date() with format masks to ensure the script will run without error.

When developing scripts for use in published writing, I normally do not set nls_date_format for my sessions, but this time I had forgot to disable it.

So, when double checking the scripts to be included with the book chapters, I was rather surprised to see that one of them did not work.

 SQL> l  
1 select
2 to_date('01/01/2011 12:00:00','mm/dd/yyyy hh24:mi:ss')
3 , to_date('01/01/2011')
4* from dual;
SQL>
, to_date('01/01/2011')
*
ERROR at line 3:
ORA-01843: not a valid month

The SQL session I was checking it from was connected to a completely new and different database, set up just for the purpose of verifying that the scripts all worked as I expected, but one script failed on the to_date().  At first I thought it was just due to not having a format mask specified in the second to_date(), but then immediately wondered why the script had always worked previously. You can probably guess why, though at first I did not understand what was occurring.

The new environment was not setting nls_date_format upon login.  I had inadvertently set up my initial test environment, where the scripts were developed, with nls_date_format=’mm/dd/yyyy hh24:mi:ss’.

What surprised me was that to_date(‘01/01/2011’) had worked properly without a specific date format mask, and a date format that did not match the nls_date_format.

The “new” bit is that as long as the date format corresponds to part of the session nls_date_format setting, the conversion will work.

So, with nls_date_format set to ‘mm/dd/yyyy hh24:mi:ss’, we should expect to_date(‘01/01/2011’) to succeed.
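
For example, here is a quick check of my own (not one of the book scripts) under that setting:

 alter session set nls_date_format='mm/dd/yyyy hh24:mi:ss';
 -- No format mask, but '01/01/2011' matches the leading portion of
 -- nls_date_format, so this succeeds and returns 01/01/2011 00:00:00
 select to_date('01/01/2011') from dual;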

This can easily be tested by setting a more restrictive nls_date_format, and then attempting to use to_date() without a format mask.

 SQL> alter session set nls_date_format = 'mm/dd/yyyy';  
Session altered.
SQL> select to_date('01/01/2011 12:00') from dual;
select to_date('01/01/2011 12:00') from dual
*
ERROR at line 1:
ORA-01830: date format picture ends before converting entire input string

When I saw that error message, I then understood what was happening. to_date() could be used without a format mask, as long as the date corresponded to a portion of the nls_date_format.  When the specified date exceeded what could be specified with the nls_date_format, an ORA-1830 error would be raised.

In this sense it is much like number formats.  I was a little surprised that I didn’t already know this, or had forgotten it so completely.

But, here’s the real surprise.  The following to_date calls will also be correctly translated by nls_date_format.

 SQL> select to_date('Jan-01 2011') from dual;  
TO_DATE('JAN-012011
-------------------
01/01/2011 00:00:00
1 row selected.

SQL> select to_date('Jan-01 2011 12:00:00') from dual;
TO_DATE('JAN-012011
-------------------
01/01/2011 12:00:00
1 row selected.

This was quite unexpected.  It also is not new.  I tested it on various Oracle versions going back to 9.2.0.8, and it worked the same way on all.

There’s always something to learn when working with complex pieces of software such as Oracle, even something as seemingly simple as formatting dates.
Categories: DBA Blogs

Revisiting Normalization and Denormalization

Kenneth Downs - Sat, 2010-11-27 12:43

In this blog I have done many articles on Normalization and Denormalization, but I have never put all of the arguments together in one place, so that is what I would like to do today.

There are links to related essays on normalization and denormalization at the bottom of this post.

This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.

The What and Why of Normalization

Normalization is the process of designing tables so that each fact is stored in exactly one place. A "fact" in this case is any detail that we have to keep track of, such as a product's description, a product's price, an employee's social security number, and so forth.

The process is all about figuring out what tables you need and what columns each table will have. If we are talking about an employee's social security number, then we can guess right from the start that will have a table of EMPLOYEES, and that one of the columns will be SSN. As we get more details, we add more tables and columns.

The advantage of normalization comes when your application writes data to the database. In the simplest terms, when the application needs to store some fact, it only has to go to one place to do it. Writing this kind of code is very easy. Easy to write, easy to debug, easy to maintain and improve.

When the database is not normalized, you end up spending more time writing more complicated application code that is harder to debug. The chances of bad data in your production database go way up. When a shop first experiences bad data in production, it starts to become tempting to "lock down" access to the database, either by forcing updates to go through stored procedures or by trying to enforce access to certain tables through certain codepaths. Both of these strategies, stored procedures and code paths, are actually the same strategy implemented in different tiers; they both try to prevent bugs by routing access through some bit of code that "knows what to do." But if the database is normalized, you do not need any magic code that "knows what to do."

So that, in brief, is what normalization is and why we do it. Let's move on now to denormalization.

Denormalization is Harder to Talk About

Normalization is easy to explain because there is a clearly stated end-goal: correct data. Moreover, there are well-defined methods for reaching the goal, which we call the normal forms: First Normal Form, Second Normal Form, and higher forms. By contrast, denormalization is much harder to talk about because there is no agreed-upon end goal. To make matters worse, denormalization violates the original theory of Relational Databases, so you still have plenty of people screaming not to do it at all, making things even more confusing. What we have now in our industry is different shops denormalizing in different ways for different reasons.

The arguments that I have heard in my career boil down to two basic groups. The first set of arguments centers around calculated or derived values, and the second set centers around programmer convenience.

Arguments for Derived Values

My own experience comes down heavily in favor of denormalizing by storing derived values directly into the tables, with the extremely significant caveat that you must have a way to ensure that they are always correct. In this paradigm you maintain strict normalization for facts supplied from the outside, and then layer on additional facts that are calculated during write operations and saved permanently.

Here is a very simple example. A strictly normalized database happens to be missing data that many programmers would automatically assume should be stored. Believe it or not, a simple value in a shopping cart like EXTENDED_PRICE is forbidden by 3rd normal form because it is a non-key dependency, or, in plain English, since it can be derived from other values (QTY * PRICE), then it is redundant, and we no longer have each fact stored in exactly one place. The value of EXTENDED_PRICE is only correct if it always equals QTY * PRICE, and so there is now a "fact" that is spread across three locations. If you store EXTENDED_PRICE, but do not have a way to ensure that it will always 100% of the time equal QTY * PRICE, then you will get bad data.

So, given the risk of bad data, what is to be gained by putting EXTENDED_PRICE into the cart? The answer is that it adds value to the database and actually simplifies application code. To see why, imagine a simple eCommerce shopping cart that does not store any derived values. Every single display of the cart to the user must go all over the place to gather lots of details and recalculate everything. This means re-calculating not just the EXTENDED_PRICE, but adding in item level discounts, taking account of possible tax exemptions for different items, rolling the totals to the cart, adding in tax, shipping, perhaps a customer discount, a coupon, and who knows what else. All of this just to display the cart, every time, no matter what the purpose.

This situation leads to three problems. A pitifully slow application (too many disk reads and lots of cycles calculating the values), maddening bugs when an application update has subtle changes to the calculations so the customer's order no longer displays the same numbers as it did yesterday, and the frustrating requirement that the simplest of reports must route through application code to calculate these values instead of simply reading them off the disk, which leads to reporting systems that are orders of magnitude slower than they could be and horribly more complicated than they need to be because they can't just read straight from the tables.

Now let's look at how that same shopping cart would be used if all of those calculated values were generated and saved when the order is written. Building on your foundation of normalized values (price, qty), you need only one body of code that has to perform calculations. This magic body of code takes the user-supplied values, adds in the calculations, and commits the changes. All other subsequent operations need only to read and display the data, making them faster, simpler, and more robust.

So the obvious question is how to make sure the derived values are correct. If they are correct, we gain the benefits with no down side. If there is the smallest chance of bad data, we will quickly pay back any benefit we gained by chasing down the mistakes.

From a technical standpoint, what we really need is some technology that will make sure the calculations cannot be subverted, it cannot be possible for a stray bit of program code or SQL Statement to put the wrong value in for EXTENDED_PRICE. There are a few generally accepted ways to do this:

  • Require all writes to go through a certain codepath. The only PRO here is that you keep the logic in the application code, and since most shops have more programmers than database people, this makes sense. The only CON is that it never works. One programmer working alone can maintain discipline, but a team cannot. All it takes is one programmer who did not know about the required codepath to screw it all up. Also, it makes your system inflexible, as it is no longer safe to write to the database except through a single application.
  • Require all writes to go through stored procedures. This is nominally better than the codepath solution because it is not subvertible, and you can allow different side apps and utilities to safely write to the database. But it makes a lot of work and tends to be very inflexible.
  • Putting triggers onto tables that perform the calculations and throw errors if a SQL statement attempts to explicitly write to a derived column. This makes the values completely non-subvertible, ensures they will always be correct, and allows access from any application or utility. The downside is that the triggers cannot be coded by hand except at extreme cost, and so must be generated from a data dictionary, which is fairly easy to do but tends to involve extreme psychological barriers. In these days of ORM many programmers mistakenly believe their class files define reality, but this is not true. Reality is defined by the users who one way or another create the paychecks, and by the database, which is the permanent record of facts. But a programmer who thinks his classes define reality simply cannot see this and will reject the trigger solution for any number of invalid reasons.

So denormalizing by putting in derived values can make a database much more valuable, but it does require a clear systematic approach to generating the derived values. There is no technical problem associated with ensuring the values are correct because of course the application has to do that somehow somewhere anyway, the real barriers tend to be the psychological and political.
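
To make the trigger option concrete, here is a minimal sketch in SQL Server syntax. The table and column names are my own illustration, and a real generated trigger would also reject any attempt to write EXTENDED_PRICE directly:

CREATE TRIGGER cart_lines_extended_price
ON cart_lines
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Recompute the derived column for the rows just written, so it
    -- can never disagree with QTY * PRICE no matter which program or
    -- SQL statement performed the write.
    UPDATE cl
       SET extended_price = cl.qty * cl.price
      FROM cart_lines cl
      JOIN inserted i ON i.cart_line_id = cl.cart_line_id;
END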

Arguments For Programmer Convenience

The second set of arguments for denormalization tend to be rather weak, and come down to something like this (you have to picture the programmer whining like a child when he says this), "I don't like my data scattered around so many tables, can't we play some other game instead?"

Many programmers, when they first learn about normalization and build a normalized database, discover that the data they need to build a screen is "scattered" about in many tables, and that it is tedious and troublesome to get it all together for presentation to the user. A simple example might be a contacts list. The main table is CONTACTS, and it contains not much more than first and last name. A second table is a list of PHONES for each contact, and a third table is a list of various mailing addresses. A fourth table of EMAILS stores their email addresses. This makes four tables just to store a simple contact! We programmers look at this and something inside of us says, "That's just way too complicated, can't I do something else instead?"

This is a case of programmer convenience clashing with correctness of data. Nobody argues (at least not that I've heard) that they do not want the data to be correct, they just wonder if it is possible to simplify the tables so that they do not have to go out to so many places to get what they need.

In this case, programmers argue that denormalization will make for simpler code if they deliberately skip one or more steps in the normalizing process. (Technically I like to call the result a "non-normalized" database instead of denormalized, but most people call it denormalized, so we will go with that.)

The argument goes something like this: I know for a fact that nobody in the contacts list will have more than 3 emails, so I'm going to skip the EMAILS table and just put columns EMAIL1, EMAIL2, and EMAIL3 into the main CONTACTS table. In this case, the programmer has decided to skip 1st Normal Form and put a repeating group into the CONTACTS table. This he argues makes for simpler database retrieval and easier coding.

The result is painfully predictable. The simplification the programmer sought at one stage becomes a raft of complications later on. Here is an example that will appear trivial but really gets to the heart of the matter. How do you count how many emails a user has? A simple SELECT COUNT(*)...GROUP BY CONTACT that would have worked before now requires more complicated SQL. But isn't this trivial? Is it really that bad? Well, if all you are coding is a CONTACTS list, probably not, but if you are doing a real application with hundreds of tables and this "convenience" has been put out there in dozens of cases, then it becomes a detail that programmers need to know on a table-by-table basis, an exception to how things ought to be that has to be accounted for by anybody who touches the table. In any shop with more than 5 programmers, whatever convenience the original programmer gained is lost quickly in the need to document and communicate these exceptions. And this is only a single trivial example.
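
To put numbers on that trivial example, here is roughly what the two counts look like; the column names are assumptions of mine, not from the original post:

-- Normalized: one row per email, so the count is a one-liner.
SELECT contact_id
     , COUNT(*) AS email_count
  FROM emails
 GROUP BY contact_id

-- Denormalized: the repeating group means every query that needs
-- "how many emails" has to spell the exception out by hand.
SELECT contact_id
     , CASE WHEN email1 IS NULL THEN 0 ELSE 1 END
     + CASE WHEN email2 IS NULL THEN 0 ELSE 1 END
     + CASE WHEN email3 IS NULL THEN 0 ELSE 1 END AS email_count
  FROM contacts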

Other examples come when it turns out you need more than three slots for phone. In the normalized case this never comes up. Any user can have any number of phones, and the code to display the phones is running through a loop, so it does not need to be modified for the case of 1 phone, 2 phones, etc. But in the "convenient" denormalized case you now must modify the table structure and the code that displays the contacts, making it quite inconvenient.

Then you have the case of how to define unused slots. If the user has only one email, do we make EMAIL2 and EMAIL3 empty or NULL? This may also seem like a silly point until you've sat through a flamewar at the whiteboard and discovered just how passionate some people are about NULL values. Avoiding that argument can save your shop a lot of wasted time.

In short, programmer convenience should never lead to a shortcut in skipping normalization steps because it introduces far more complications than it can ever pay for.

Related Essays

This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.

The normalization essays on this blog are:

Categories: Development

OT: Good value first class fare?

Nigel Thomas - Fri, 2010-11-26 05:43
From the NationalRail website:




One million squids for a day return?

Speaker Awards and a Pantomime.....

Lisa Dobson - Thu, 2010-11-25 15:42
Before you think this is something completely random (which it probably is), yes, I am indeed talking about this year’s UKOUG TEBS Conference. The Inspiring Presentation Awards (IPA’s) will be handed out for the first time at this year’s UKOUG Technology and E-Business Suite Conference. The event will be taking place in the Exhibition Hall on Tuesday evening from 18:30 with the awards and the


Subscribe to Oracle FAQ aggregator