Re: Attribute-values in separate table

From: Roy Hann <specially_at_processed.almost.meat>
Date: Sat, 27 Oct 2007 15:44:02 +0100
Message-ID: <hpednbXnZKAi077anZ2dnUVZ8qminZ2d_at_pipex.net>


"Brian Selzer" <brian_at_selzer-software.com> wrote in message news:yoBUi.7699$Pv2.3001_at_newssvr23.news.prodigy.net...
>
> "Authorised User" <bg_at_microsoft.com> wrote in message
> news:6t0Ui.4613$CN4.2731_at_news-server.bigpond.net.au...

[snip]

>> So, having lots of tables is good from a data-purity point of view... but
>> spare a thought for the poor programmer who has to code for each table.
>> One general table is far from perfect, and we will lose a great many
>> specific constraints that could help reduce data integrity problems...
>> BUT
>> we think we can run with it. In this particular case.
>
> It is a truly poor programmer indeed who would trade off data integrity
> for a faulty expectation of reduced work. If there are eighty types of
> atomic facts, then whether you're coding for one table or for eighty, you
> still have to code for eighty types of atomic facts. Is it really less
> work to lump it all together?

Unfortunately, given the piss-poor tools that programmers are content to use--for a whole lot of reasons that I could rage about at length some other time--the answer to your question isn't as clear as it should be. I will give you one example from my own recent experience that illustrates the point.

The client, the taxation authority of a small European country, had a requirement to provide an application to its helpdesk users who handle calls from the public about their personal tax returns. The application needed to present the helpdesk users with a myriad of details about each of the tax allowances and credits each taxpayer might claim. At the time there were about 120 different allowances for everything from mortgage interest relief on investment properties to vet bills for guide dogs for the blind, and everything in between. The descriptions of the benefits were wildly different (as you might guess), and some required a variable number of diverse supplementary facts. (In terms of complexity this subsystem dwarfed the rest of the tax system by an order of magnitude.)

To make this truly wild, the system also had to be able to respond to annual changes in tax legislation, within a legally defined six week period. To fail to implement an announced change within six weeks would require going back to the legislature and getting them to literally change the law of the land. You can see where this is going.

Now normally I only get involved with performance issues at this site, but having demonstrated how the database design can strongly influence the performance of the applications, they invited me to review the proposed solution, which was absolutely classic EAV. To cut a long story short I told them, in terms, that they were embarking on a "massive blunder", and had an absolutely bruising meeting with several customer representatives and also the prime contractor. It was 10 against 1, and my poor choice of words ensured I had no allies in the room at all. But what sealed my fate was the fact that my recommended solution (i.e. as many tables as it takes) just wouldn't work in their environment. The DBMS could handle the requirement in its sleep, and it probably would not have been any more difficult to code a sufficiently versatile front-end application (though more about that below). But what killed it dead was the middleware, which was some extremely over-mature version of Tuxedo that couldn't be replaced for a whole lot of very good (expensive) reasons. It turned out that this rev of Tuxedo could support only 128 distinct services, of which something like 100 were already in use. (I may not have the details right, but the point is Tuxedo imposed severe constraints.) Within their existing architecture my solution would have required n-hundred additional services, where n would grow each year.

My proposed solution to that problem was a single dynamic service. The database would have its hundreds of tables, and there would be a single service that would use dynamic SQL to query/update them. It would convert the data to EAV for transit, but it would be done automatically using the fully-documented, standard meta-data from the DBMS, and the process could be reversed at the client end to recover the original tables. Elegant, I thought.
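
To make that concrete, here is roughly the shape of the thing. This is only an illustrative sketch, not what was proposed verbatim: Python and SQLite stand in for the real environment, the table names and key column are invented, and a real service would read the SQL-standard INFORMATION_SCHEMA rather than SQLite's PRAGMA.

    # A minimal sketch of the "one dynamic service" idea. SQLite's
    # PRAGMA table_info() stands in here for the standard catalogue
    # (INFORMATION_SCHEMA.COLUMNS) that a real DBMS provides.
    import sqlite3

    def table_to_eav(conn, table, key_col):
        """Flatten one ordinary table into (table, entity, attribute, value)
        tuples for transit, driven entirely by the DBMS's own meta-data."""
        cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        sql = f'SELECT * FROM "{table}"'   # built dynamically; no per-table code
        for row in conn.execute(sql):
            entity = row[cols.index(key_col)]
            for col, val in zip(cols, row):
                if col != key_col and val is not None:
                    yield (table, entity, col, val)

    # Two made-up allowance tables, purely for demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mortgage_relief (taxpayer_id, property_ref, interest_paid)")
    conn.execute("CREATE TABLE guide_dog_vet_bills (taxpayer_id, dog_name, amount)")
    conn.execute("INSERT INTO mortgage_relief VALUES (1, 'P-77', 1200.00)")
    conn.execute("INSERT INTO guide_dog_vet_bills VALUES (2, 'Rex', 85.50)")

    for fact in table_to_eav(conn, "mortgage_relief", "taxpayer_id"):
        print(fact)

The point is that the per-table drudgery the programmers dreaded simply isn't there: the catalogue already knows the tables and columns, so one service covers all of them, this year's allowances and next year's alike.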

But no. This was unfamiliar territory for the programmers, who necessarily always outnumber the database wonks (precisely because they are so unproductive and therefore you need lots of them). There was just no persuading them that dynamic SQL is respectable and enshrined in the ANSI/ISO standards. To them it was not a basic skill that they ought to have mastered in school, nor even a skill to be acquired within a one-day on-the-job training course. It was impossibly esoteric, risky, obscure, and just not going to happen. So not only would they not build a generic, dynamic Tuxedo service driven by the DBMS's own meta-data that they get for free, they would not build a client application that used the same meta-data to recover the tables from the EAV stream. To them, it was less risky to re-invent that meta-data their own way, store all the data in a single bloated EAV table in the database, and build a single generic service that way.
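
For what it's worth, the client-side half they balked at is not much harder. Again a sketch only, with invented names, continuing the illustration above rather than describing their actual environment:

    # Recover ordinary rows from the EAV stream at the client end,
    # using nothing but the attribute names carried in the tuples.
    from collections import defaultdict

    def eav_to_rows(facts, key_col="taxpayer_id"):
        """Pivot (table, entity, attribute, value) tuples back into one
        dict per row per table, undoing the flattening done for transit."""
        tables = defaultdict(lambda: defaultdict(dict))
        for table, entity, attr, value in facts:
            tables[table][entity][attr] = value
        return {t: [{key_col: e, **attrs} for e, attrs in rows.items()]
                for t, rows in tables.items()}

    facts = [("mortgage_relief", 1, "property_ref", "P-77"),
             ("mortgage_relief", 1, "interest_paid", 1200.00)]
    print(eav_to_rows(facts))
    # {'mortgage_relief': [{'taxpayer_id': 1, 'property_ref': 'P-77',
    #                       'interest_paid': 1200.0}]}

Once the rows are back, the front-end is written against real attributes of real tables, not against a soup of name/value pairs of its own invention.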

In case anyone is wondering how it turned out, well, today it stumbles along. It has so far appeared to meet the need for rapid change, at least in the sense that no one can tell if it doesn't work. What is not so clear is whether anyone has checked how reliable it is, and what is certain is that even simple queries are insanely complex; there are no integrity constraints of any kind, and no way to verify that every application is treating the meta-data consistently. Unfortunately, although this is the most egregious waste of expensive DBMS technology in the system, it is also fairly typical. Expectations are very low both for run-time performance and for programmer productivity. This subsystem did not stand out as being particularly bad by local standards.

So, if programmers are content to re-invent the wheel, and if they regard dynamic SQL as being too esoteric (or worse, they simply don't know about it and don't know the DBMS already provides more and better meta-data than they can devise--plus the logic to support it), then to them it could well look like EAV is less work. They are imagining they have to code 200 distinct services by hand to support 200 distinct tables. To them, the math shows EAV is cheaper.

(Note that I've written about dynamic SQL here. There are other tools available, and one could imagine even better ones being invented. I am not suggesting dynamic SQL is the only solution. I am just pointing out that at least one pretty good solution exists, and offering one possible explanation of why no one uses it.)

Roy
