Re: Announcing the "Instant Data Warehouse" Product

From: Stan Zanarotti <srz_at_dimins.com>
Date: 1996/02/17
Message-ID: <4g5mit$m1d_at_sundog.tiac.net>#1/1


In article <31222AAA.D4_at_strategy.com>, Michael J. Saylor writes:
>The vendors which rely primarily on proprietary data structures for their
>analysis (Pilot, SAS, Dimensional Insight) did not know what their largest
>atomic database was (or refused to offer this information). All noted that
>their average installation relied upon approximately 300-700 MEGABYTES of
>atomic data. Let's keep in mind that Arbor's famous 40 gigabyte benchmark
>database had approximately 50 megabytes of atomic information. Based upon
>the underlying mathematics and our own industry sources, I suspect that there
>are few instances of MDDB applications which crack the 5 gigabyte atomicity
>barrier.

In our case it was a matter of not having the information at hand. After all, which metric does the market think is important today? I've always thought that measuring the output size of a database is pretty bogus, because that penalizes vendors who find ways of saving space. Also, I think number of rows or atomic items is a better measure than megabytes, because it factors out compression from duplicate strings and the like.

At the conference, Fred did mention that one of our customers uses our software to go against a DB2 parts database that eats up most of a mainframe. It turns out it's 110 million rows of data, taking up 80 gigabytes. The user selects his query, and DB2 does some of the selection and/or summarization that it needs to do before feeding it into our multidimensional builder.

Now you may call that ROLAP, since the base data is stored in a relational database, but this means the MDDB vs ROLAP split in the market is not as clean as people are claiming. After all, there's nothing magical about relational databases from a performance standpoint. If you have to access 100 million rows of information, you have to access 100 million pieces of information, whether it's inside a relational database or a flat file. Indexes help you gain selectivity, and preaggregation controls when the calculations get done, but the name of the game of decision analysis is extracting/compressing the atomic rows of data into the numbers that the users want, and there are multiple ways of doing this depending on what you know ahead of time and what the users bring to the show. What Woolworths did was throw hardware at the problem to do everything on the fly. It's impressive, but not something that every customer can or should do.
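
To make the point about when the calculations get done concrete, here is a rough sketch in Python. The rows, column names, and function names are all made up for illustration and have nothing to do with any vendor's actual product. Both approaches read the same atomic rows; the only difference is whether the pass over them happens at query time or once at build time.

```python
from collections import defaultdict

# Hypothetical atomic rows: (region, part, quantity). Illustrative data only.
ATOMIC_ROWS = [
    ("east", "bolt", 10),
    ("east", "nut", 5),
    ("west", "bolt", 7),
    ("west", "bolt", 3),
    ("east", "bolt", 2),
]


def total_on_the_fly(rows, region, part):
    """Scan every atomic row at query time and sum the matching quantities."""
    return sum(q for r, p, q in rows if r == region and p == part)


def build_rollup(rows):
    """Preaggregate: one pass over the atomic rows, done ahead of time."""
    rollup = defaultdict(int)
    for r, p, q in rows:
        rollup[(r, p)] += q
    return dict(rollup)


def total_preaggregated(rollup, region, part):
    """Answer the same question with a single lookup in the rollup."""
    return rollup.get((region, part), 0)


# Built once, before any user query arrives.
ROLLUP = build_rollup(ATOMIC_ROWS)

if __name__ == "__main__":
    print(total_on_the_fly(ATOMIC_ROWS, "east", "bolt"))
    print(total_preaggregated(ROLLUP, "east", "bolt"))
```

Either way the atomic rows get touched; the rollup just moves that cost to build time, and it only pays off if you knew ahead of time that users would ask by region and part.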

        -stan


Stan Zanarotti, V.P. Research & Development	Dimensional Insight, Inc.
srz_at_dimins.com					(617) 229-9111
Received on Sat Feb 17 1996 - 00:00:00 CET
