multilevel modelling and constraints

From: Sampo Syreeni <decoy_at_iki.fi>
Date: Fri, 23 Oct 2009 20:40:22 -0700 (PDT)
Message-ID: <8eccfc59-9fb2-49ab-a392-61a010c27028_at_p36g2000vbn.googlegroups.com>



I just realized that, for a number of years already, I've continuously been bumping into something that could perhaps be generalized into a bona fide data modelling (problem?) pattern. Bear with me...

As an example, think about a generalization hierarchy where we have a) a bunch of organizations at a higher level, and b) also the disaggregated mother companies and the physical business locations which the parent company controls, the two of which naturally can differ rather wildly in the detailed data we have on them, plus c) all of the extra metadata that can mesh between companies and business locations, while staying fully clear of the aggregated level data.

That's all the data we have so far, it's been neatly normalized and all of the necessary integrity constraints are in place. Us DBAs are living the good life, pretty much having already completed the job when we put up all of them neat productivity enhancing constraints. Now we're basically dreaming about our happy place, with complimentary bunnies and stuff.

Except that then comes down the latest business requirement: we suddenly know for a fact that Albania(tm) does not allow companies to own other companies. Ever. So, since we're conscientious DBAs, we immediately add an assertion which effectively says: "relationships from companies in Albania(tm) involved in a relationship with the type of "owned-by", to another company which also lives in Albania(tm), ist verboten". Everything is fine from there on, and the DBA can once again go to sleep.

But do notice what happened. At the higher level of companies, there would have been a numbre of integrity constraints. They couldn't have catched the case of our very special Albania(tm), though. Instead we had to build a second line of constraints, essentially a special case, at the lower level of abstraction, to deal with that.

If we had flattened the generalization hierarchy by using trickery like nulls and the like, and dealt with all of the special cases by hand, nothing like that would have been necessary. But then, we couldn't have managed the complexity. I mean, even now I'm dealing with an instance of this precise logic which would take manual, expert assessment of the validity of some 20K+ assertions to make it work at the lowest level. Using general rules and exceptions, I think the problem can be reduced to far under 500 human decisions.

Thus I think constraints should often be set at multiple levels of generalization, simultaneously. After proper normalization this implies that under an RDBMS, such constraints/assertions might spread wide and far, and be a bit more complicated than most are used to. Still, this is an *extremely* common pattern in everyday database work, so I think shying away from the implications woudl be a bit foolish. Received on Sat Oct 24 2009 - 05:40:22 CEST

Original text of this message