Re: Generalised approach to storing address details

From: Sampo Syreeni <decoy@iki.fi>
Date: Fri, 15 Dec 2006 11:26:16 +0200
Message-ID: <Pine.SOL.4.62.0612151057580.24773@kruuna.helsinki.fi>


On 2006-12-14, Marshall wrote:

> I've never been exactly clear on what the term "network model" means;
> I've assumed it's quite comparable to an object graph, with which I am
> quite familiar.

It is.

> Join is not supported. No joins means denormalization is the rule;
> various offline processes manage the maintenance of the denormalized
> data. [...] When you ask people why they like it, well, "it's so
> FAST." (I will admit that it does scale like crazy.)

What you're describing is essentially the OLTP/OLAP tradeoff. Denormalized data can be much faster when you're dealing with a read-mostly environment with well-understood read patterns, because you're basically precomputing, or materializing, all the joins needed by your query mixture. This amounts to amortizing the join cost over multiple read transactions.
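
To put that in concrete terms, here is a minimal SQL sketch; the orders/customers schema is made up for illustration:

    -- Normalized form: every read pays for the join.
    SELECT o.order_id, o.total, c.name, c.region
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id;

    -- Denormalized form: the join is computed once and stored, so reads
    -- become single-table scans; the cost moves to keeping this copy
    -- consistent on every write to orders or customers.
    CREATE TABLE orders_denorm AS
    SELECT o.order_id, o.total, c.name, c.region
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id;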

The problem is, this sort of thing really ought to be a user-transparent performance optimization. Whatever you precompute really ought to become a mixture of physical access paths and materialized views, added on the fly by the DBA and used only via the query rewrite and physical optimization machinery. That's because there's no guarantee that the same data wouldn't one day be used with a heavier write load, in which case precomputation no longer scales the way it used to.

If you've hardcoded the physical optimizations into your application, then you're in for a major rewrite. Adding a second image of the DB with a different physical organization -- the typical data warehousing solution -- is not going to be any easier, because you can only share code if it's written against the logical organization, and not a specific data layout. Not to mention all the integrity trouble that comes with maintaining the master copy of the database in denormalized form, especially given that neither ad hoc solutions nor current DBMSs have the formal machinery to declare and enforce all the arbitrary dependencies that can exist among your denormalized data items.
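
In Oracle's dialect, for instance, that division of labour looks roughly like this -- a sketch only, with a made-up view name; the point is that the application never names the materialized view:

    -- The DBA precomputes the hot join behind the scenes.
    CREATE MATERIALIZED VIEW order_summary
      ENABLE QUERY REWRITE AS
    SELECT o.order_id, o.total, c.name, c.region
    FROM orders o, customers c
    WHERE c.customer_id = o.customer_id;

    -- Applications keep querying the normalized base tables; the
    -- optimizer rewrites matching queries to read order_summary instead.
    SELECT o.order_id, c.name
    FROM orders o, customers c
    WHERE c.customer_id = o.customer_id;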

The only reasons I can think of to do it the way you described are that today's DBMSs do not support radical enough layout changes transparently, that query rewrite capabilities might be limited, and that sometimes we don't want the dynamic overheads associated with this sort of processing. But if somebody uses that to argue for object, hierarchical, or ad hoc organizations, I think they have it the wrong way around. What should be fixed is the DBMS, not the data model, because there's no reason why an RDBMS couldn't efficiently support read-mostly, or navigational, or array-oriented, or recursive, or whatnot access. All you need is a little bit of query caching, explicit parallelism, physical layout capability, language extension and optimizer intelligence.
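
Some of that physical layout capability is already there today. In PostgreSQL-flavoured SQL, for example, you can retune the layout underneath an unchanged logical schema; the index name here is made up:

    -- Add an access path; application SQL stays the same.
    CREATE INDEX orders_customer_idx ON orders (customer_id);

    -- Physically reorder the table along the hot read path, so the
    -- "navigational" access pattern turns into sequential I/O.
    CLUSTER orders USING orders_customer_idx;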

-- 
Sampo Syreeni, aka decoy - mailto:decoy@iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2