Re: Mapping arbitrary number of attributes to DB

From: Sampo Syreeni <decoy_at_iki.fi>
Date: Thu, 26 Oct 2006 20:16:40 +0300
Message-ID: <Pine.SOL.4.62.0610261915190.15263_at_kruuna.helsinki.fi>


On 2006-10-26, Bob Badour wrote:

> You invented an up front modelling cost that doesn't necessarily even
> exist and then alerted us to this 'problem'.

No. I started with a proposed, EAV-like model. People around these parts don't exactly like EAV, yet to me it seems like a reasonable first-pass design when your understanding of the data is as limited as the proposer's. So I have to come up with something that can be used to support my position, or alternatively be refuted. I have to probe into why one would want to use EAV even though the relational model exists. This is simple abduction. After that, we have a deductive, noncircular argument backwards from the cost to why one might sometimes want to consider EAV.

> You assumed a non-existent large up-front 'modelling cost' to prove a
> large up-front 'modelling cost'.

No, to prove that EAV has at least one valid application. But yes, at this point the cost is a factual hypothesis, not a tried and true fact.

You claim that the cost is low. I, on the other hand, claim that the cost of training and employing relational experts is great, and that with current DBMSs this cost is largely incurred up front, because there is no incremental path from totally unstructured, opaque blobs stored within the system to a fully relational, normalized, tightly constrained and well-annotated design. Your proposed solution is partial at best, because under it the kinds of analysis that are often needed to generate and validate subsequent design decisions reach over huge numbers of relations. Generating such queries is cumbersome, current DBMSs aren't really prepared to handle them, and tools that would help organize all this intermediate data are not readily available, whereas for at least some EAV-like representations they are.
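To make the "analysis reaching over huge numbers of relations" point concrete, here is a minimal sketch using SQLite through Python's sqlite3 module, assuming each imported file was dumped into its own relation. The table names (file_a, file_b, file_c), their columns, the sample rows and the helper functions are all hypothetical. Even a question as simple as "how many non-null phone values exist anywhere?" has to be answered by reading the catalog and generating SQL text procedurally, since a plain SQL query cannot range over relation names.

```python
import sqlite3

# Hypothetical setup: one relation per imported file, columns mirroring each file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file_a (id INTEGER, name TEXT, email TEXT);
    CREATE TABLE file_b (id INTEGER, email TEXT, phone TEXT);
    CREATE TABLE file_c (id INTEGER, name TEXT, phone TEXT);
    INSERT INTO file_a VALUES (1, 'Ann', 'ann@example.org');
    INSERT INTO file_b VALUES (2, 'bob@example.org', '555-0100');
    INSERT INTO file_c VALUES (3, 'Cy', NULL);
""")

def tables_with_column(conn, column):
    """Return the user tables that have a column named `column`.

    SQLite exposes its catalog via sqlite_master and PRAGMA table_info;
    other DBMSs use different catalogs, so this part is not portable.
    """
    found = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for t in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        cols = [r[1] for r in conn.execute(f'PRAGMA table_info("{t}")')]
        if column in cols:
            found.append(t)
    return found

def count_values(conn, column):
    """Count non-null values of `column` across every table that has it.

    The query text is generated from catalog data, one SELECT per table.
    """
    parts = [f'SELECT COUNT("{column}") AS n FROM "{t}"'
             for t in tables_with_column(conn, column)]
    if not parts:
        return 0
    sql = "SELECT SUM(n) FROM (" + " UNION ALL ".join(parts) + ")"
    return conn.execute(sql).fetchone()[0]
```

With three tables this is merely awkward; with hundreds of file-shaped relations, every exploratory question turns into a small code generator of this kind.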

>>> A dbms will allow one to dump each file into a relation that directly
>>> reflects the structure of the file. That doesn't cost a whole lot in terms
>>> of designing, and it does accomplish exactly what you describe above.
>>
>> Correct, but how precisely is that better than EAV?
>
> In every way imaginable.

Then surely you must be able to list at least a few. My list starts with the ease, in EAV, of querying for the presence of certain attributes across all of the data, because this sort of thing is rather natural when you're looking for potential dependencies and unifiable types in your data. In your representation this would translate into complicated SQL against a nonstandard catalog; in EAV it's a matter of self-joins on A. Queries involving both A and V in EAV are also easy enough, but in your representation they would already involve quantification over relvars, which existing DBMSs fail to support unless you resort to custom procedural code.
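For contrast, a minimal sketch of the EAV side, again in SQLite via Python's sqlite3. The single table eav(e, a, v), its sample triples and the query names are hypothetical. Both kinds of query mentioned above are ordinary self-joins of one table (joined on the entity, filtered or grouped by attribute), with no catalog access and no generated SQL.

```python
import sqlite3

# Hypothetical EAV table: one row per (entity, attribute, value) triple.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eav (e INTEGER, a TEXT, v TEXT)")
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    (1, "name", "Ann"), (1, "email", "ann@example.org"),
    (2, "email", "bob@example.org"), (2, "phone", "555-0100"),
    (3, "name", "Cy"), (3, "phone", "555-0101"), (3, "email", "cy@example.org"),
])

# Attribute co-occurrence: how many entities carry each pair of attributes.
# Pairs that always appear together hint at a candidate relation or dependency.
CO_OCCURRENCE = """
    SELECT x.a AS a1, y.a AS a2, COUNT(*) AS n
    FROM eav AS x JOIN eav AS y ON x.e = y.e AND x.a < y.a
    GROUP BY x.a, y.a
    ORDER BY x.a, y.a
"""

# A query over both A and V: entities with a given name that also have an email.
BY_ATTR_AND_VALUE = """
    SELECT DISTINCT x.e
    FROM eav AS x JOIN eav AS y ON x.e = y.e
    WHERE x.a = 'name' AND x.v = ? AND y.a = 'email'
    ORDER BY x.e
"""

pairs = conn.execute(CO_OCCURRENCE).fetchall()
matches = [r[0] for r in conn.execute(BY_ATTR_AND_VALUE, ("Cy",))]
```

Here pairs comes out as [("email", "name", 2), ("email", "phone", 2), ("name", "phone", 1)] and matches as [3]. The same two statements keep working unchanged however many distinct attributes the data grows to contain, which is the property being argued for.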

-- 
Sampo Syreeni, aka decoy - mailto:decoy_at_iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
Received on Thu Oct 26 2006 - 19:16:40 CEST
