Re: Modelling objects with variable number of properties in an RDBMS

From: VC <boston103_at_hotmail.com>
Date: Wed, 2 Nov 2005 18:27:52 -0500
Message-ID: <cOWdnUFVM4rt1vTenZ2dnUVZ_smdnZ2d_at_comcast.com>

"Roy Hann" <specially_at_processed.almost.meat> wrote in message news:V9idnUh2xK2tjvTeRVnyjQ_at_pipex.net...
> "vc" <boston103_at_hotmail.com> wrote in message
> news:1130949100.933681.55190_at_z14g2000cwz.googlegroups.com...
>> > I find both claims quite astonishing. That is entirely contrary to
>> > what
> I
>> > would expect. Evidently it was contrary to what they expected too
>> > since
> you
>> > say they tried one of my suggestions. Since it seems we agree that
> wasn't
>> > the expected outcome that makes it an extraordinary claim.
>>
>> OK, a SQL Server row can be 8K long maximum. Say, a drug formula is a
>> relation like (some_stable_info_about 2K long,
>> ~5000-6000_lab_data_results). The SQL Server row can be 8K long
>> maximum which leaves room for 750 double precision columns for storing
>> trial results. In a multi-table, approach it would amount to 8 tables
>> partitioned horizontally, by attribute subsets.
>
> I assume you mean "vertically" not horizontally.

Right, sorry about that.

>
>> A simple request like
>> select formulas for a given date with drug trial parameters in some
>> given ranges can be expressed as a single SQL statement with an EAV
>> table and would require 8 unions, or thereabout, with multiple tables.
>
> I guess if I wanted to make the effort I might come up with the sort of
> queries you are thinking about here, but it's not my argument and I can't
> be
> bothered.

What's your agument then ? That EVA is "bad" ? But, we already know that. What alternative do you suggest other than using multitables?

>You are re-asserting they were tough queries, but you already
> said that.

The queries were not substantially hard. They did not perform well with multitable access due to various reasons. With "unions" unindexed multiple table access was comparable to a single table access, but indexed access was substantially slower due to the fact that multiple indexes had to be traversed from the root index node down to index leaves (similarly to locally partitioned indexes in Oracle). With "joins" they experienced performance hit both for indexed access and unindexed table queries (more for the latter). A trivial example in Oracle :

single table create table t select object_id,owner, object_type, created, last_ddl_time, status, timestamp from all_objects;
two tables representing the vertically partitioned table 't' with object_id being the primary key create table t1 as select object_id,owner, object_type, created from all_objects; create table t2 as select object_id, last_ddl_time, status, timestamp from all_objects;
two queries with stats SQL> select * from t where object_id between 200 and 400 and object_type='TABLE' 2 /

Statistics

         39  consistent gets
          0  sorts (memory)

SQL> select * from t1 join t2 on t1.object_id=t2.object_id where t1.object_id between 200 and 400 and object_type='TABLE' 2 Statistics

         97  consistent gets
          2  sorts (memory)

So when we try to glue together what'd better be a single table, we get two times more get operations and two sort operations. They experienced approximately 5-6 times performance hit for 7-8 way joins (which is quite understandable).

>
> Also, I am starting to wonder if the tables you are describing are really
> EAV tables (like what the OP came up with). I wonder if your tables might
> not be a perfectly legitimate design--something along the lines of
>
> create table result
> (
> experiment ...,
> datetime ...,
> analyte ...,
> concentration ...
> )
>

As I said before, their entities were: (chemical_substance_id, some_permanent_set_of_columns, date, trial#, 6000_measurements_per_trial)

>> The optimizer could not build an access plan with acceptable
>> performance with the multiple table approach. Joins complicated the
>> problem even more.
>
> No doubt there are very hard queries that tax an optimizer. You are
> asserting this is one of those cases but I choose not to take your word
> for
> it. Only examples will do.

Just to be fair to the optimizer. As is obvious from the simple example above, the optimizer, however smart it might be, simply cannot do any better because there will alway be substantial overhead caused by joining (or multiple index traversals) compared to the single table access.

>
>> Additionally, on a daily basis, about a dozen or more drug trial
>> parameters were added or dropped from consideration which would mean
>> potentially modifying all the 9 tables with DDL statements. Clearly,
>> it creates a severe maintenance headache.
>
> Maybe, maybe not. Would it have been so hard to create a tool to do it?
> I
> assume a tool was created to make sure the EAV tables were well-behaved,
> that no one entered spurious rows, nor left rows out. But perhaps not?

The EAV table modifications did not require much substantial effort. New entity_id-attribute_name-value triplets are just added to the table by ordinary insert statements with some integrity enforcement via the entity_id+attribute_name "foreign key". The no longer reqiured attributes are just as easily dropped by using a delete statement.

>
>> > An extraordinary claim always requires extraordinary proof and I think
> it is
>> > quite reasonable for me to remain skeptical about this until the
> evidence
>> > makes it perverse of me to go on denying it. I can be persuaded by
>> > evidence. Can you provide full details?
>>
>> Unfortunately, I cannot provide more detail than above since the design
>> discussion I mentioned happened about two years ago.
>
> Pity. I remain skeptical then.

So be it.

>
> I can think of lots of reasons why
> those folks might not have got their first (and therefore presumably
> preferred) approach to work.

Such as ?

>
>> You've offered first a saying
>
> In fairness it was a saying that summed up an important point. Do you
> disagree that one should only reluctantly discard a successful approach?

As I said many times, they decided to use the EAV appraoch very reluctantly, after going through multiple iterations with vertically sliced tables and a home-grown "user-defined type" where they concatenated multiple metrics into semicolon/comma delimited strings (attribute_name1:value, attribute_name2:value,...). Needless to say, the concatenation did not work out either.

With the multitable approach, they simply could not process experimental results in a timely fashion because of performance problems caused by excessive joins and unions.

> I have no doubt that you passionately believe in your position.

What's that supposed to mean ? Are we talking religion here, or simple easily interpretable experimental facts ?

>> Every
>> situation in a database design world has to be argued on its own merits
>> -- it's not a perpetuum mobile problem that has been settled once and
>> for all.
>
> That is not the implication of what I said. Of course the established
> orthodoxy might be toppled. I am just saying that the burden of toppling
> it
> is not on me if I take the orthodox position. I do.

Cool with me ;)

>
> Roy
>
>
Received on Thu Nov 03 2005 - 00:27:52 CET

This message: [ Message body ]
Next message: Roy Hann: "Re: Modelling objects with variable number of properties in an RDBMS"
Previous message: VC: "Re: Modelling objects with variable number of properties in an RDBMS"
Maybe in reply to: Roy Hann: "Re: Modelling objects with variable number of properties in an RDBMS"
In reply to Roy Hann: "Re: Modelling objects with variable number of properties in an RDBMS"
Next in thread: Roy Hann: "Re: Modelling objects with variable number of properties in an RDBMS"
Reply: x: "Re: Modelling objects with variable number of properties in an RDBMS"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message