Re: On the subject of Data Warehouses, Data Cubes & OLAP....

From: Bob Badour <bbadour_at_golden.net>
Date: Thu, 23 Oct 2003 12:48:12 -0400
Message-ID: <yv2cncJdW_MemwWiU-KYiQ_at_golden.net>


"DataMan" <dataman_at_ev1.net> wrote in message news:vpfms541cpv8ec_at_corp.supernews.com...
>
> "Bob Badour" <bbadour_at_golden.net> wrote:
> >"DataMan" <dataman_at_ev1.net> wrote in message
> >news:vpedvrc3kbf74_at_corp.supernews.com...
> >>
> >> "Bob Badour" <bbadour_at_golden.net> wrote:
> >> >"DataMan" <dataman_at_ev1.net> wrote in message
> >> >news:vpdppi7ccdn0f1_at_corp.supernews.com...
> >> >>
> >> >> "Bob Badour" <bbadour_at_golden.net> wrote:
> >> >> >"DataMan" <dataman_at_ev1.net> wrote in message
> >> >> >news:vpdad0mimma52e_at_corp.supernews.com...
> >> >> >>
> >> >> >> Most OLTP systems do support the information requested. It's
 just
 very
 difficult
> >> >> >> to retrieve.
> >> >> >
> >> >> >That is a remarkable and very interesting assertion. What can you
 offer
 to
> >> >> >support such a remarkable assertion?
> >> >> >
> >> >> The support information or difficult to retrieve statement?
> >> >
> >> >I do not find the "support information" assertion remarkable, and I
 doubt
> >> >anyone would. What can you offer to support your remarkable assertion
 that
> >> >information is difficult to retrieve?
> >> >
> >> I have no idea what you find remarkable.
> >
> >I already told you what I found remarkable. Can you offer any support for
> >your remarkable assertion?
> >
> >
> >> The world is full of remarkable
> >> people all finding different things remarkable. I find it remarkable
 that
> >> you find no value in dimensional data warehouses.
> >
> >Did I say that?
> >
>
> No, but since you make assumptions, I thought I would too.

I suppose we all make assumptions, and I try to make mine explicit. To which assumptions do you refer?

> >> As your opinion on this
> >> matter is already formed, and many books on the subject have
 unsuccessfully
> >> shown you the value of data warehouses, I don't believe I can sway your
 opinion
> >> on the subject.
> >
> >Did I say that?
> >
>
> ibid.

Again, to which assumptions do you refer?

> >> If you would like some recommendations on books that cover
> >> the material in depth, I would be happy to provide some.
> >
> >That seems a very long way of saying you have nothing to offer to support
> >your remarkable assertion. Apparently, you simply assumed it was true
> >because you heard it somewhere, and now you are out of your depth.
> >
>
> That's quite possible. But this is fun, so I continue. Please reference
> the attached data model. This is the standard movies data model supplied
> by CA when purchasing ERwin. If I wanted a report that provided the
 amount
> of rentals customer's are making using electronic payments by store, the
> resulting sql would be more complex than a dimensional model built around
> a transaction fact table with associated dimensions for customer, store,
> etc.

I don't see any attachments. Regardless, how does the requirement exceed the capabilities and features of a snapshot?

> >> >> >> Additionally when the business requires enterprise level
 reporting
> >> >> >> across the various functional applications, a warehouse becomes
 a
 necessity.
> >> >> >> You know the scene, large company with 10 different order entry
 systems
> >> >> >> for each functional area of the company (i.e. cost center). One
 for
 small
> >> >> >> biz, one for consumer, one for z product, one for y product, etc.
 And
 of
> >> >> >> course there was never any upfront effort by a data management
 organization
> >> >> >> to ensure consistent representation of enterprise level entities.
 So
 you
> >> >> >> end up with a hodge podge set of data that must be abstracted to
 make
 consistent
> >> >> >> within a DW environment. And since you're already there go ahead
 and
 put
> >> >> >> it in a dimensional format so somebody can understand it and
 easily
 pull
> >> >> >> data/reports.
> >> >> >
> >> >> >What exactly is a dimensional format? How does it differ from
 relational
> >> >> representation?
> >> >> They are both relational. Assuming both the application database
 and
 the
> >> >> data warehouse are both stored in relational databases. Was your
 question
> >> >> directed to differences between a data warehouse database (let's say
 OLAP
> >> >> for convenience) and an application database (let's say OLTP for
 convenience)?
> >> >> The primary difference is purpose. OLTP must support consistent
 real
 time
> >> >> data. OLAP only need support historical reporting.
> >> >
> >> >So then, OLAP is just another word for "snapshot" as in "a derived
 stored
> >> >relation" ?
> >> >
> >> I'm sorry, I don't understand your meaning in "a derived stored
 relation".
> >
> >What part do you not understand? Do you know what a relation is? Do you
 know
> >the difference between something that is stored versus something that is
 not
> >stored? Do you know what it means to derive something?
> >
>
> The derived stored relation part. Yes. Yes. Yes.

If you know what a relation is and you know what a derived relation is and you know what a stored relation is, what causes you trouble with understanding a derived stored relation? It all seems quite clear to me.

> >> I was trying to illustrate one of the basic differences between an
 application
> >> database and a dimensional data warehouse.
> >
> >You appear to have regurgitated meaningless pap. If you were illustrating
> >anything, you could offer support for your assertions.
>
> I thought it was meaningful and remarkable. Oh well. I guess in a
 nutshell,
> the only support I can offer is empirical. Most of the business community
> I have dealt with have difficulty navigating and understanding what is
 shown
> in complex, highly normalized data models.

Why would anyone navigate something that does not require navigation? The idea seems somewhat absurd to me. Where can I find the raw data for your empirical measurements? Can you email them to me?

> In fact, most business folks
> don't care.

I agree. Most of them accept meaningless pap without objection.

> They just want the application to work and they want their reports.

Is there supposed to be some meaningful point in the above observation?

> Those same folks who previously needed the IT shop to develop and write
> reports for them are now using BI tools to do their own data mining, data
> queries against dimensional models. Why or how they can do one and not
 the
> other is obviously a matter for debate. I see the success of the DW
 industry
> and it's prevalence in corporate world and decide that people see value in
> the technology.

What technology is that? Be specific: We are all engineers here.

> >> And that difference was that
> >> the application database is constantly changing over time as the
 business
> >> data changes. OTOH, in the data warehouse the data is never updated.
> >
> >Really? Now that is a truly remarkable statement! Do you realise this
>
> Sweet. Now I'm making progress!
>
> >fundamentally alters every cosmogeny I have ever heard: "In the beginning
> >was the data warehouse..." One immediately wants to ask whence this
 eternal
> >data came, but Ockham already took care of that one.
> >
>
> Great statement! But I had to look up Ockham. You are a scholar! The
 jury's
> still out on the gentleman part.

Definitely not a gentleman. The jury came back on that verdict ages ago.

> >> So
> >> a report run in a data warehouse today will be the tomorrow and the
 month
> >> after that.
> >
> >What would be the point in running the same report on unchanging data?
> >
>
> You want two copies?

Okay. But wouldn't a photocopier or Kinkos be cheaper?

> End of year summary?

Since nothing gets updated, what would be the point?

> Year to date?

Since nothing changes, what would the year to date show? Why wouldn't one just report on the data at the beginning of the year?

> >> >> >How or why can it improve understanding? How have you measured
 this
 improved
> >> >> understanding?
> >> >> I didn't say that it does.
> >> >
> >> >"And since you're already there go ahead and put it in a dimensional
 format
> >> >so somebody can understand it and easily pull data/reports."
> >> >
> >> >It sounds to me like you did. If you did not say that, what the hell
 did
 you
> >> >intend to say?
> >> >
> >> Hmmm, I guess I did say that. Sorry I was busy at work and responded
 too
> >> quickly to the post. A fully normalized model for a large application
 can
> >> have 100's of entities. A dimensional model is built to support
 specific
> >> reporting and consists of a fact table and its related dimension
 tables.
> >
> >Yes, potentially hundreds of dimension tables... Does a data warehouse
> >comprising multiple data marts have only a single fact table or is it not
 a
> >dimensional model?
> >
>
> I guess it's possible for a single fact table to have hundreds of
 dimensions.
> I've not seen one. I usually see 5 to 15 dimensions. A dimensional
 model
> supports only specific reporting. As such you would need to create a fact
> for each type of reporting required.

I see. We have lots of dimensions and lots of fact tables. How does one know which dimensions and facts to use?

> >> Occasionally, a dimension will also have a relationship to another
 table
> >> resulting in the snowflake term.
> >
> >So, if we rearrange our entities into a snowflake pattern, we get a data
> >warehouse?!?
> >
>
> I'm not quite sure how you came to that conclusion. There is the
 application
> database(s) and there's the data warehouse. "Snowflaking" entities in an
> application database doesn't make a DW.

Then what does make a DW? Received on Thu Oct 23 2003 - 18:48:12 CEST

Original text of this message