Re: Announcing the "Instant Data Warehouse" Product

From: Venky Harinarayan <venky_at_cs.stanford.edu>
Date: 1996/02/15
Message-ID: <3123E85C.7D55_at_cs.stanford.edu>#1/1


Neil Raden wrote:
>
> In <31222AAA.D4_at_strategy.com> "Michael J. Saylor" <saylor_at_strategy.com>
> writes:
> >
> >Regarding:
> >" Deliverable ROLAP capacity may be less limited
> >by theoretical database size than by performance and the hassles of
> >maintaining hundreds or thousands of summary tables, so their
> >deliverable capacity is not necessarily larger than MDBs, and the
> >cost of their scalability is high in people and hardware costs."
> >
> >Nigel,
> >Consider the Woolworths data warehousing project again, in light of
> >the above comments, <snipped>
>
> Nigel and Michael,
> <deleted>
>
> After all, it's the first aggregation that
> summarizes the most detailed data to its next level; that gives you the
> bang for the buck. And if you visualize the intersection of the 1/x and
> ln(1+x) curves (where x is the number of aggregates taken), you see
> that the most dramatic drop in query cost (1/x) and the smallest number
> of additional records to the database (ln(1+x)) occur very early. This
> implies that a very few aggregates can deliver dramatic improvement.
>

Neil,  

I agree.
It appears that the benefit of creating more aggregates follows a law of diminishing returns. Our experiment with a small subset of the TPC-D benchmark database showed that the best 5 aggregates (out of 12) gave almost all the performance improvement.

These best 5 aggregates had a total of slightly more than 6 million rows (in fact, the "core" of the cube, which we had to create since we did not want to access the atomic data, had 6 million rows). The worst 7 aggregates had a total of more than 16 million rows and gave almost no benefit.

Also, any general strategy that decides what to aggregate without taking into account the size of the aggregate tables can perform very poorly. By the same token, any analysis like the one you give above has to take the sizes of the aggregates into account.
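
To make the point about sizes concrete, here is a minimal sketch of a size-aware greedy selection. It is only an illustration, not the method from the paper or the experiment described above: the part/supplier/customer dimensions, the row counts in SIZES, the "each group-by is queried once" workload, and the names query_cost, total_cost and greedy_select are all assumptions made up for this example. The cost of a query is taken to be the number of rows in the smallest materialized aggregate that can answer it.

```python
# Illustrative only: a toy part/supplier/customer cube with made-up row counts.
DIMS = ("part", "supplier", "customer")
TOP = frozenset(DIMS)  # the finest group-by, assumed to be always materialized

# Hypothetical sizes (rows) of every possible group-by; not measured data.
SIZES = {
    TOP: 6_000_000,
    frozenset({"part", "customer"}): 6_000_000,
    frozenset({"supplier", "customer"}): 6_000_000,
    frozenset({"part", "supplier"}): 800_000,
    frozenset({"part"}): 200_000,
    frozenset({"supplier"}): 10_000,
    frozenset({"customer"}): 100_000,
    frozenset(): 1,
}


def query_cost(view, materialized):
    """Rows scanned to answer `view` from its smallest materialized ancestor."""
    return min(SIZES[m] for m in materialized if view <= m)


def total_cost(materialized):
    """Total rows scanned if every group-by is queried exactly once."""
    return sum(query_cost(v, materialized) for v in SIZES)


def greedy_select(k):
    """Pick up to k extra aggregates, each time taking the largest cost drop."""
    chosen, gains = [TOP], []
    for _ in range(k):
        base = total_cost(chosen)
        candidates = [v for v in SIZES if v not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda v: base - total_cost(chosen + [v]))
        gain = base - total_cost(chosen + [best])
        if gain <= 0:
            break  # nothing left that reduces query cost
        chosen.append(best)
        gains.append(gain)
    return chosen, gains


if __name__ == "__main__":
    chosen, gains = greedy_select(3)
    for view, gain in zip(chosen[1:], gains):
        print(sorted(view), gain)
```

With these made-up sizes, the three picks save 20.8 million, 6.6 million and 0.88 million rows respectively, so the gains fall off quickly. The two 6-million-row aggregates (part/customer and supplier/customer) are never picked: they are as large as the table they would be computed from, so materializing them saves nothing. A strategy that ignores size could just as easily choose those.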

We have a paper that is to appear in this year's SIGMOD conference that looks at this issue in detail. A PostScript copy hangs off my home page if you're interested.

Venky

-- 
-------------------------------------------------------------
venky_at_cs.stanford.edu
http://db.stanford.edu/~venky
-------------------------------------------------------------
Received on Thu Feb 15 1996 - 00:00:00 CET
