Re: Object databases beat joins (was: Re: ODMG Website?)

From: Bob Badour <bbadour_at_golden.net>
Date: Thu, 15 May 2003 20:45:54 -0400
Message-ID: <ZIYwa.88$226.18740260_at_mantis.golden.net>


"Mikito Harakiri" <mikharakiri_at_ywho.com> wrote in message news:ewVwa.12$MU1.103_at_news.oracle.com...
> "Bob Badour" <bbadour_at_golden.net> wrote in message
> news:RfVwa.58$oC4.13359172_at_mantis.golden.net...
> > Clustering will further reduce the single random IO to zero.
>
> I'm skeptical about the clustering idea. Physical locality is a goal that is
> difficult to achieve. When implemented, it is so complex that it requires
> extraordinary DBA effort to maintain. I was never convinced that
> mastering Oracle Clusters, for example, is worth the effort.

I can only assume the way Oracle implements clustering is more complex than it needs to be.

> In general, we can't be certain how many layers of indirection are between
> the data stored on disk and query output. We might think that blocks x and
> y are collocated, but the filer has striping. Also, storing records with the
> same join key value in the same block might be good for that particular join
> order, but may adversely affect other queries.

Absolutely. And creating an index means there is redundant information that must be maintained. And adding physical pointers means there are redundant structural artifacts to maintain. And so on.

Every physical structure biases performance in favour of some uses and against others. That's just the nature of the beast no matter the logical data model.

Your comment does not invalidate the point that a join need not have any performance cost.
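
To make the point concrete, here is a minimal sketch in Python that counts block reads for a one-key join under two layouts. Everything in it is an assumption made for illustration: the `BlockStore` class, the `customer`/`order` relations, and the block numbers are hypothetical, and it models no real DBMS. It shows only that when the matching rows share a block, the join costs no read beyond the one already needed to fetch the first row.

```python
class BlockStore:
    """Toy block device (hypothetical): counts how many blocks get read."""

    def __init__(self):
        self.blocks = {}
        self.reads = 0

    def write(self, block_id, record):
        self.blocks.setdefault(block_id, []).append(record)

    def read(self, block_id):
        self.reads += 1
        return self.blocks.get(block_id, [])


def join_on_key(store, customer_block, order_block, key):
    """Join customer and order rows for one key; return (rows, blocks read)."""
    store.reads = 0
    # Read each distinct block once; a shared block is fetched a single time.
    rows = [r for b in sorted({customer_block, order_block}) for r in store.read(b)]
    customers = [r for r in rows if r[0] == "customer" and r[1] == key]
    orders = [r for r in rows if r[0] == "order" and r[1] == key]
    joined = [(c[1], c[2], o[2]) for c in customers for o in orders]
    return joined, store.reads


# Layout A: the two relations live in different blocks.
separate = BlockStore()
separate.write(0, ("customer", 7, "Ann"))
separate.write(1, ("order", 7, "book"))
separate.write(1, ("order", 7, "pen"))
separate_rows, separate_reads = join_on_key(separate, 0, 1, 7)

# Layout B: rows with the same key are clustered into one block.
clustered = BlockStore()
clustered.write(0, ("customer", 7, "Ann"))
clustered.write(0, ("order", 7, "book"))
clustered.write(0, ("order", 7, "pen"))
clustered_rows, clustered_reads = join_on_key(clustered, 0, 0, 7)
```

Both layouts return the same joined rows; only the number of blocks touched differs. The sketch ignores what the clustering costs other access paths, which is the bias discussed above.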

> Then, query optimization is
> so overwhelmed with problems

Perhaps your optimizer is, but I am not convinced it has to be.

> that it simply couldn't devote sufficient
> attention to developing a convincing cost estimation model in the
> clustering case.

Tell me, what is the physical difference between clustering 25 binary relations on a common key vs. storing a 26-ary relation with all non-key columns nullable? It seems to me they should have identical cost estimation models.
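
As a rough illustration of that question, the Python sketch below builds the per-key record both ways, using three binary relations instead of 25. The relation names, sample values, and the helpers `clustered_layout` and `wide_layout` are all hypothetical, and a dict of tuples stands in for the physical record; no real storage engine is modelled.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[Optional[str], ...]


def clustered_layout(binary: Dict[str, Dict[int, str]]) -> Dict[int, Record]:
    """Co-locate, per key, the value from each binary relation (None if absent)."""
    columns = sorted(binary)
    keys = sorted({k for rel in binary.values() for k in rel})
    return {k: tuple(binary[c].get(k) for c in columns) for k in keys}


def wide_layout(columns: List[str], rows: List[dict]) -> Dict[int, Record]:
    """Store one wide relation whose non-key columns are all nullable."""
    ordered = sorted(columns)
    return {row["id"]: tuple(row.get(c) for c in ordered) for row in rows}


# Three binary relations on a common key (stand-ins for the 25 in the text).
binary = {
    "name": {1: "Ann", 2: "Raj"},
    "city": {1: "Oslo"},
    "dept": {2: "Ops", 3: "R&D"},
}

# The same facts as one 4-ary relation; a missing column means NULL.
rows = [
    {"id": 1, "name": "Ann", "city": "Oslo"},
    {"id": 2, "name": "Raj", "dept": "Ops"},
    {"id": 3, "dept": "R&D"},
]

same = clustered_layout(binary) == wide_layout(list(binary), rows)
```

In this toy model the two layouts come out identical per key. One difference the sketch glosses over: the wide relation can hold a key whose other columns are all null, while the clustered layout has no record for a key that appears in none of the binary relations.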

> Finally, clustering is only important for sequential-read devices (aka
> disks), and would progressively become less relevant as soon as
> random-access persistent storage (solid state disks, etc.) becomes more
> common.

It has potential benefits for any block-read device that operates with significant latency or at a significantly slower speed than the CPU.

Received on Fri May 16 2003 - 02:45:54 CEST
