Re: Mapping arbitrary number of attributes to DB

From: Sampo Syreeni <decoy_at_iki.fi>
Date: Wed, 25 Oct 2006 20:01:24 +0300
Message-ID: <Pine.SOL.4.62.0610251851110.23598_at_kruuna.helsinki.fi>


On 2006-10-25, Roy Hann wrote:

> To repeat what I already said in this thread, even the crappiest
> so-called SQL DBMSs on the market already solve the problem extremely
> well for practical purposes.

I don't think this is true, so we probably disagree about what the problem to be solved really is. You seem to think this is a nonissue because even the crappiest SQL database allows for schema evolution. Fair enough, but somebody skilled in data modeling has to issue those DDL statements, that somebody is going to have to be paid for the effort, and so there has to be a tangible return on the modeling investment before the data can be stored in bona fide relations.

Often the ROI just isn't there eventhough the data is. Instead we only know that we have some data, some proper subset of it will be useful later on, and so we want to store it, to be sorted out later. It doesn't make sense to invest into real data modeling now because later on we'll know better which subset of the data is going to be used. That both enables us to cut down on the eventual total modeling cost and to get the first application out without delay, while allowing today's data to be built upon at a later date. If the DBMS doesn't support inclusion of such data in some semistructured format, then the data is going to get stored as a blob, and if even this is disallowed, the DBMS will be ditched for flat file storage in an ad hoc format, like XML. This doesn't make a whole lot of sense because then we'll risk losing some of the structure that already exists, and we'll either have to reimplement or do without all of the useful, schema independent features of the DBMS, like recovery guarantees and access control.

A second typical example is data that is useful right now, but that is part of an application that not supported well enough by the existing DBMSs. (Some examples would be free text, objects used in hard realtime systems and large scientific datasets.) In this case we still want to leverage some parts of the system, like the recovery mechanisms, but also want to implement the core data model somewhere else (say, because of performance issues). The end result is the same as above; making fully developed schemas a requirement only causes the DBMS to be traded in for more primitive tools, and the data to be locked into formats with no clear evolutionary path towards a proper data model.

The way I see it, DBMS's primarily exist to make the total cost of ownership of data lower. Since all of the above happens on cost grounds, allowing limited use of semistructured data within a DBMS is just good engineering, provided it isn't overdone.

> We aren't going to move one inch further forward until we acknowledge
> where the problem really is now, and it sure ain't in the one part
> that's already solved it.

So what is the problem? In your earlier post you didn't really elaborate on that.

> This is an application development problem. Let's look at what needs
> to change there instead of trashing the one success we can already
> point to.

I don't think I'm trashing much of anything, because I do advocate proper modeling and relational storage as soon as we know what to model and the RDBMS adequately supports the application.

-- 
Sampo Syreeni, aka decoy - mailto:decoy_at_iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
Received on Wed Oct 25 2006 - 19:01:24 CEST

Original text of this message