Re: normalization

From: <rjsearle_at_gmail.com>
Date: Wed, 2 Nov 2005 09:33:30 +1000
Message-ID: <392977e50511011533g4ba2f351k63871963215ee7cf@mail.gmail.com>


Hi Chris,
 Is everyone clear on the objectives of this database? Normalisation is not the holy grail; it is one path to the holy grail! The primary benefit of normalisation (IMHO) is to avoid update anomalies, so that is the goal. Therefore, when deciding whether to normalise, or in this case whether to store derivable values, you must consider the risk of update anomalies resulting in a loss of data integrity.

If your environment is an OLTP system, the data integrity risk is higher. If you proceed with storing derivable values in an OLTP system, I believe you must have additional mechanisms to enforce data integrity, such as background processes that verify the accuracy of the derived columns, triggers, MVs, function-based indexes, etc. So it may still make sense to store derivable values, but remember that there is always a price to pay; it's just a question of when the price is paid.

Conversely, if the environment is a DW, the risks are far lower, possibly zero if the app only inserts and never updates. In that case I advocate storing derivable data, because it is stable and therefore there are no update anomalies ('cause there are no updates!).

My guess is that this is an OLTP environment with stringent performance criteria. In that situation, my preferred options are (in order):

1. Ensure that the existing hardware is being used efficiently. Try to design the database for performance BEFORE jumping to denormalisation, which is a lazy approach with many hidden costs. Look at function-based indexes, MVs, physical storage design, etc. That is, start at the existing hardware and work up from there; there are many, many stages of tuning before a change to the data model is required. This is possibly the cheapest option.

2. Reconsider the hardware platform. It is easier and cheaper to upgrade the hardware to achieve performance than to have a team of people design, build, test, implement and support additional software to enforce data integrity.

3. Once the first two options are fully exhausted, carefully assess the costs and benefits of denormalising and decide whether there is any real value. If denormalisation is truly the only option, remember that you are compromising the primary goal above by introducing the risk of update anomalies! You must have measures both to avoid such problems and to detect and correct them should they arise (and they will!). First, use constraints on the column, triggers on inserts and updates, etc.; these are some of the problem-avoidance measures. Second, you must detect the problems, so background verification processes may be necessary (see the sketch after this list). These measures will cost to produce, support and employ: there is a performance hit as well as extra design, development, test, implementation and support costs. After all of this there is still a risk of inaccuracies, so remember to factor into your cost model the cost to the business of inaccurate data. This data will be used to underpin commercial decisions, so I'd recommend avoiding introducing flaws into the decision process. That process already has enough flaws without compounding it with dodgy data!
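
For illustration, here is a minimal sketch of the avoidance and detection measures from option 3, plus the function-based index alternative from option 1. The table and column names (order_lines, qty, unit_price, line_total) are hypothetical:

    -- Hypothetical table with a stored derivable column.
    CREATE TABLE order_lines (
      line_id    NUMBER PRIMARY KEY,
      qty        NUMBER NOT NULL,
      unit_price NUMBER NOT NULL,
      line_total NUMBER NOT NULL,
      -- Avoidance: reject any row whose stored value disagrees with
      -- the values it is derived from.
      CONSTRAINT line_total_chk CHECK (line_total = qty * unit_price)
    );

    -- Detection: a background process can re-verify the whole table.
    -- With the constraint enabled this should return no rows; it earns
    -- its keep if the constraint ever has to be disabled, e.g. for
    -- bulk loads.
    SELECT line_id
      FROM order_lines
     WHERE line_total <> qty * unit_price;

    -- Option 1 alternative: don't store the value at all. Index the
    -- expression so queries filtering on it can still use an index.
    CREATE INDEX order_lines_total_fbi ON order_lines (qty * unit_price);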

   Where do I leave the $0.02?
 Russell

 On 11/2/05, stephen booth <stephenbooth.uk_at_gmail.com> wrote:
>
> On 01/11/05, Powell, Mark D <mark.powell_at_eds.com> wrote:
> >
> > Jared >> Calculated fields are for reporting, e.g. data marts and/or
> > data warehouses. <<
> >
> > For a simple colA + ColB * Col7 = calc_col value the cost should not
> > be that high. But if you have to extract 50,000,000 rows in one shot,
> > the calculation time adds up quickly. So it will depend somewhat on
> > business requirements. Storing calculated columns is something to
> > avoid doing if at all possible, but doing so will generally not cause
> > the DML anomalies that failing to normalize regular columns can.
> >
>
> In contemporary versions of Oracle you could possibly use a
> Materialized View instead of calculated fields in the table itself.
> Obviously there's a whole bunch of other issues that come along with
> MVs but they're probably worth looking at as an option.
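
As an illustration of the MV option, a minimal sketch reusing the hypothetical order_lines table from my example above:

    -- Derive the value in a materialized view instead of storing it in
    -- the base table. A complete refresh on demand is the simplest
    -- form; fast refresh on commit is also possible but needs
    -- materialized view logs on the base table (one of the "other
    -- issues" that come along with MVs).
    CREATE MATERIALIZED VIEW order_line_totals
      REFRESH COMPLETE ON DEMAND
      AS SELECT line_id, qty * unit_price AS line_total
           FROM order_lines;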
>
> Calculated fields are a breach of 3NF. But then, like most things in
> life, business applications don't always fit into nice mathematical
> models, so every rule has to have an addendum, something like
> "...unless there's a really good reason to do otherwise." I think it's
> important to be aware that you are breaching a rule and to spend some
> time thinking about the implications.
>
> e.g. You might need a before update trigger to make sure that when any
> field that is used in a calculation is changed, the fields that are
> calculated from it are recalculated. Actually, you might be best to
> put the calculation in a trigger and make it a before insert or update
> if you can.
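
Again for illustration, a sketch of the trigger approach, with the same hypothetical names:

    -- Before-row trigger that recalculates the stored column whenever
    -- the columns it is derived from are inserted or updated, so the
    -- application never has to compute line_total itself.
    CREATE OR REPLACE TRIGGER order_lines_calc_trg
      BEFORE INSERT OR UPDATE OF qty, unit_price ON order_lines
      FOR EACH ROW
    BEGIN
      :NEW.line_total := :NEW.qty * :NEW.unit_price;
    END;
    /

Note that this trigger only fires on changes to qty and unit_price; a direct update of line_total alone would still be caught by the check constraint from the earlier sketch.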
>
> Stephen
>
> --
> It's better to ask a silly question than to make a silly assumption.
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Nov 01 2005 - 17:35:36 CST
