Re: Modelling objects with variable number of properties in an RDBMS
Date: 2 Nov 2005 08:31:40 -0800
Roy Hann wrote:
> "vc" <boston103_at_hotmail.com> wrote in message
> > I gave you a reason why someone might be forced into using EAV to which
> > you gave a smart-alecky response.
> I had intended it to be a mildly witty way of expressing an important point.
> I didn't realize it might seem to lack proper gravity.
> > That was tried and rejected primarily for performance reasons. The EAV
> > approach turned out to be faster and various SQL queries turned out to
> > be simpler in comparison to trying to combine results from multiple
> > tables representing a single entity.
> We weren't discussing performance, but I'll let that pass.
> I find both claims quite astonishing. That is entirely contrary to what I
> would expect. Evidently it was contrary to what they expected too since you
> say they tried one of my suggestions. Since it seems we agree that wasn't
> the expected outcome that makes it an extraordinary claim.
OK, a SQL Server row can be 8K long maximum. Say, a drug formula is a relation like (some_stable_info_about 2K long, ~5000-6000_lab_data_results). The SQL Server row can be 8K long maximum which leaves room for 750 double precision columns for storing trial results. In a multi-table, approach it would amount to 8 tables partitioned horizontally, by attribute subsets. A simple request like select formulas for a given date with drug trial parameters in some given ranges can be expressed as a single SQL statement with an EAV table and would require 8 unions, or thereabout, with multiple tables. The optimizer could not build an access plan with acceptable performance with the multiple table approach. Joins complicated the problem even more.
Additionally, on a daily basis, about a dozen or more drug trial parameters were added or dropped from consideration which would mean potentially modifying all the 9 tables with DDL statements. Clearly, it creates a severe maintenance headache.
> An extraordinary claim always requires extraordinary proof and I think it is
> quite reasonable for me to remain skeptical about this until the evidence
> makes it perverse of me to go on denying it. I can be persuaded by
> evidence. Can you provide full details?
Unfortunately, I cannot provide more detail than above since the design discussion I mentioned happened about two years ago. However, the stuff I reproduced to the best of my recollection should show that the matter is not so black and white as you've tried to paint it.
> Since it was unexpected, I would
> need to be convinced that no ordinary explanation could account for why the
> EAV design was (a) faster, and (b) "various SQL queries turned out to be
> simpler". Examples of the latter would be extremely compelling.
If you'd carefully read my other contributions to the thread, you'd have been able to figure out that, in fact, I am strongly opposed to the EAV approach. Besides, we are talking about a couple of entities implemented in an EAV manner, the rest of the data model was quite traditionally relational.
> Is that the claim you you make, or are you just saying that in
> this one time and place it worked better than several other things they
The latter. The circumstances I've described are not so rare in
pharmaceutical industry and bioinformatics.
> > There is a burden on you because you did not offer a sensible
> > alternative after having rejected EAV in toto.
> That is not the way rational discourse proceeds. The burden is never on the
> party that takes the established position (that's what established means).
The implication of your original statement was that EAV is bad in all the cases. My objection was that the generalization is not fair and I gave an example when it made sense to represent a table (or a couple of table) as an EAV set. You've offered first a saying and then a multitable approach that was not acceptable for the reasons I tried to describe above. Re: "rational discourse" and the "established position" see below.
> The EAV design has been widely and repeatedly criticised, has numerous clear
> weaknesses with respect to the established approach, and there are plausible
> reasons to doubt it should be intrinsically easier to code for. It is
> contrary to the established position. There is no burden on me to defend
> the established position.
The "established position" means nothing in a logical discourse. Every situation in a database design world has to be argued on its own merits -- it's not a perpetuum mobile problem that has been settled once and for all.
Received on Wed Nov 02 2005 - 17:31:40 CET