Re: Modelling objects with variable number of properties in an RDBMS

From: VC <boston103_at_hotmail.com>
Date: Fri, 4 Nov 2005 07:00:03 -0500
Message-ID: <naOdnUMTycmk0PbeRVn-tw_at_comcast.com>


"Frank Hamersley" <terabitemightbe_at_bigpond.com> wrote in message news:2eyaf.8082$Hj2.1929_at_news-server.bigpond.net.au...

> VC wrote:

>> "Bernard Peek" <bap_at_shrdlu.com> wrote in message
>> news:xGctaeGgBWaDFwgS_at_shrdlu.com...
>>
>>>In message <xpednZGpW5rOofTeRVn-pQ_at_comcast.com>, VC
>>><boston103_at_hotmail.com> writes
>>>
>>>>>Should these tests really be considered attributes? Wouldn't the act
>>>>>of presenting them as a two-tuple relation <test#, test result> simple
>>>>>be an wise step of normalization - not an example of an EAV
>>>>>decomposition? Am I totally missing the boat on this?
>>>>
>>>>It's a single test with ~6000 measurements.
>>>
>>>That doesn't sound like something that requires multiple attributes. If
>>>it's a single test then the results are presumably in one domain so a
>>>single entity should work.
>>
>> I am not intimately familiar with the testing process. According to the
>> person who created the model, it's a drug discovery chemical compound
>> testing process which runs daily. The test result for a given compound
>> is represented by many thousands of numbers, hence 600 attributes per
>> entity. They generated close to half a terabyte of experimental data each
>> day.
>>
>>>What distinguishes one set of test results from another? Is it the time
>>>the test was done, the temperature?
>
> It seems to me that this sounds like a situation where practical concerns 
> to cope with the volume of data has induced an attempt to achieve 
> compression (by reducing the repeated occurence of key information were 
> the schema to be normalised per RM theory.

"Compression" or disk space saving was not a consideration. As I said several time before, bad performance due to excessive unions/joins was. They simply could not process the data o time.

> If the first model produces 1/2 Tb then the second will surely occupy much > more space.

See above.

>don't think I could ever sell out the RM and go with option 1.  It is much 
>better IMO to find ways to overcome the problems of the second option by 
>choosing the right DBMS and configuring it appropriately, rather than (to 
>coin a metaphor) to "use an SUV (or even a fleet of SUV's) to move 20 
>tonnes of freight from coast to coast"!

As I said before, replacing SQL Server with another db was not an option due to non-technical reasons. Received on Fri Nov 04 2005 - 13:00:03 CET

Original text of this message