Re: On the subject of Data Warehouses, Data Cubes & OLAP....

From: DataMan <dataman_at_ev1.net>
Date: Thu, 23 Oct 2003 21:16:54 -0000
Message-ID: <vpgha67etvtd6f_at_corp.supernews.com>


"Bob Badour" <bbadour_at_golden.net> wrote:
>"DataMan" <dataman_at_ev1.net> wrote in message
>news:vpfms541cpv8ec_at_corp.supernews.com...
>>
>> "Bob Badour" <bbadour_at_golden.net> wrote:
>> >"DataMan" <dataman_at_ev1.net> wrote in message
>> >news:vpedvrc3kbf74_at_corp.supernews.com...
>> >>
>> >> "Bob Badour" <bbadour_at_golden.net> wrote:
>> >> >"DataMan" <dataman_at_ev1.net> wrote in message
>> >> >news:vpdppi7ccdn0f1_at_corp.supernews.com...
>> >> >>
>> >> >> "Bob Badour" <bbadour_at_golden.net> wrote:
>> >> >> >"DataMan" <dataman_at_ev1.net> wrote in message
>> >> >> >news:vpdad0mimma52e_at_corp.supernews.com...
>> >> >> >>
>> >> >> >> Most OLTP systems do support the information requested. It's
 just
 very
 difficult
>> >> >> >> to retrieve.
>> >> >> >
>> >> >> >That is a remarkable and very interesting assertion. What can you
 offer
 to
>> >> >> >support such a remarkable assertion?
>> >> >> >
>> >> >> The support information or difficult to retrieve statement?
>> >> >
>> >> >I do not find the "support information" assertion remarkable, and
 I
 doubt
>> >> >anyone would. What can you offer to support your remarkable assertion
 that
>> >> >information is difficult to retrieve?
>> >> >
>> >> I have no idea what you find remarkable.
>> >
>> >I already told you what I found remarkable. Can you offer any support
 for
>> >your remarkable assertion?
>> >
>> >
>> >> The world is full of remarkable
>> >> people all finding different things remarkable. I find it remarkable
 that
>> >> you find no value in dimensional data warehouses.
>> >
>> >Did I say that?
>> >
>>
>> No, but since you make assumptions, I thought I would too.
>
>I suppose we all make assumptions, and I try to make mine explicit. To which
>assumptions do you refer?
>
>
>> >> As your opinion on this
>> >> matter is already formed, and many books on the subject have
 unsuccessfully
>> >> shown you the value of data warehouses, I don't believe I can sway
 your
 opinion
>> >> on the subject.
>> >
>> >Did I say that?
>> >
>>
>> ibid.
>
>Again, to which assumptions do you refer?
>
>
>> >> If you would like some recommendations on books that cover
>> >> the material in depth, I would be happy to provide some.
>> >
>> >That seems a very long way of saying you have nothing to offer to support
>> >your remarkable assertion. Apparently, you simply assumed it was true
>> >because you heard it somewhere, and now you are out of your depth.
>> >
>>
>> That's quite possible. But this is fun, so I continue. Please reference
>> the attached data model. This is the standard movies data model supplied
>> by CA when purchasing ERwin. If I wanted a report that provided the
 amount
>> of rentals customer's are making using electronic payments by store, the
>> resulting sql would be more complex than a dimensional model built around
>> a transaction fact table with associated dimensions for customer, store,
>> etc.
>
>I don't see any attachments. Regardless, how does the requirement exceed
 the
>capabilities and features of a snapshot?
>

What does this question have to do with the my previous point? Either the query is simpler or it's not. The board prevents uploading binary data. If you're interested, I can send it to you. You'll need ERwin to read it.

>
>> >> >> >> Additionally when the business requires enterprise level
 reporting
>> >> >> >> across the various functional applications, a warehouse becomes
 a
 necessity.
>> >> >> >> You know the scene, large company with 10 different order entry
 systems
>> >> >> >> for each functional area of the company (i.e. cost center).
 One
 for
 small
>> >> >> >> biz, one for consumer, one for z product, one for y product,
 etc.
 And
 of
>> >> >> >> course there was never any upfront effort by a data management
 organization
>> >> >> >> to ensure consistent representation of enterprise level entities.
 So
 you
>> >> >> >> end up with a hodge podge set of data that must be abstracted
 to
 make
 consistent
>> >> >> >> within a DW environment. And since you're already there go ahead
 and
 put
>> >> >> >> it in a dimensional format so somebody can understand it and
 easily
 pull
>> >> >> >> data/reports.
>> >> >> >
>> >> >> >What exactly is a dimensional format? How does it differ from
 relational
>> >> >> representation?
>> >> >> They are both relational. Assuming both the application database
 and
 the
>> >> >> data warehouse are both stored in relational databases. Was your
 question
>> >> >> directed to differences between a data warehouse database (let's
 say
 OLAP
>> >> >> for convenience) and an application database (let's say OLTP for
 convenience)?
>> >> >> The primary difference is purpose. OLTP must support consistent
 real
 time
>> >> >> data. OLAP only need support historical reporting.
>> >> >
>> >> >So then, OLAP is just another word for "snapshot" as in "a derived
 stored
>> >> >relation" ?
>> >> >
>> >> I'm sorry, I don't understand your meaning in "a derived stored
 relation".
>> >
>> >What part do you not understand? Do you know what a relation is? Do you
 know
>> >the difference between something that is stored versus something that
 is
 not
>> >stored? Do you know what it means to derive something?
>> >
>>
>> The derived stored relation part. Yes. Yes. Yes.
>
>If you know what a relation is and you know what a derived relation is and
>you know what a stored relation is, what causes you trouble with
>understanding a derived stored relation? It all seems quite clear to me.
>

Great. What's your point then?

>
>> >> I was trying to illustrate one of the basic differences between an
 application
>> >> database and a dimensional data warehouse.
>> >
>> >You appear to have regurgitated meaningless pap. If you were illustrating
>> >anything, you could offer support for your assertions.
>>
>> I thought it was meaningful and remarkable. Oh well. I guess in a
 nutshell,
>> the only support I can offer is empirical. Most of the business community
>> I have dealt with have difficulty navigating and understanding what is
 shown
>> in complex, highly normalized data models.
>
>Why would anyone navigate something that does not require >navigation? The
idea seems somewhat absurd to me.

And to alot of other people. Some people figured out how to get around all the navigation.

>Where can I find the raw data for your
>empirical measurements? Can you email them to me?
>

Isn't that an oxymoron?

>
>> In fact, most business folks
>> don't care.
>
>I agree. Most of them accept meaningless pap without objection.
>
>
>> They just want the application to work and they want their reports.
>
>Is there supposed to be some meaningful point in the above observation?
>
>
>> Those same folks who previously needed the IT shop to develop and write
>> reports for them are now using BI tools to do their own data mining, data
>> queries against dimensional models. Why or how they can do one and not
 the
>> other is obviously a matter for debate. I see the success of the DW
 industry
>> and it's prevalence in corporate world and decide that people see value
 in
>> the technology.
>
>What technology is that? Be specific: We are all engineers here.
>

Ummm, data warehousing?

>
>> >> And that difference was that
>> >> the application database is constantly changing over time as the
 business
>> >> data changes. OTOH, in the data warehouse the data is never updated.
>> >
>> >Really? Now that is a truly remarkable statement! Do you realise this
>>
>> Sweet. Now I'm making progress!
>>
>> >fundamentally alters every cosmogeny I have ever heard: "In the beginning
>> >was the data warehouse..." One immediately wants to ask whence this
 eternal
>> >data came, but Ockham already took care of that one.
>> >
>>
>> Great statement! But I had to look up Ockham. You are a scholar! The
 jury's
>> still out on the gentleman part.
>
>Definitely not a gentleman. The jury came back on that verdict ages ago.
>
>
>> >> So
>> >> a report run in a data warehouse today will be the tomorrow and the
 month
>> >> after that.
>> >
>> >What would be the point in running the same report on unchanging data?
>> >
>>
>> You want two copies?
>
>Okay. But wouldn't a photocopier or Kinkos be cheaper?
>
>
>> End of year summary?
>
>Since nothing gets updated, what would be the point?
>

All 12 months in one report. Don't you get one of those from you CC company?  Do you keep it or throw it away?

>
>> Year to date?
>
>Since nothing changes, what would the year to date show? Why wouldn't one
>just report on the data at the beginning of the year?
>

How about trending? Don't folks at your company want to see how sales of a certain product are trending on a monthly, quarterly basis? This is real easy to do with a dimensional model BTW. Try it in a normalized data model (if they keep history at all).

>
>> >> >> >How or why can it improve understanding? How have you measured
 this
 improved
>> >> >> understanding?
>> >> >> I didn't say that it does.
>> >> >
>> >> >"And since you're already there go ahead and put it in a dimensional
 format
>> >> >so somebody can understand it and easily pull data/reports."
>> >> >
>> >> >It sounds to me like you did. If you did not say that, what the hell
 did
 you
>> >> >intend to say?
>> >> >
>> >> Hmmm, I guess I did say that. Sorry I was busy at work and responded
 too
>> >> quickly to the post. A fully normalized model for a large application
 can
>> >> have 100's of entities. A dimensional model is built to support
 specific
>> >> reporting and consists of a fact table and its related dimension
 tables.
>> >
>> >Yes, potentially hundreds of dimension tables... Does a data warehouse
>> >comprising multiple data marts have only a single fact table or is it
 not
 a
>> >dimensional model?
>> >
>>
>> I guess it's possible for a single fact table to have hundreds of
 dimensions.
>> I've not seen one. I usually see 5 to 15 dimensions. A dimensional
 model
>> supports only specific reporting. As such you would need to create a
 fact
>> for each type of reporting required.
>
>I see. We have lots of dimensions and lots of fact tables. How does one
 know
>which dimensions and facts to use?
>

The one that you need. Is this a trick question?

>
>> >> Occasionally, a dimension will also have a relationship to another
 table
>> >> resulting in the snowflake term.
>> >
>> >So, if we rearrange our entities into a snowflake pattern, we get a data
>> >warehouse?!?
>> >
>>
>> I'm not quite sure how you came to that conclusion. There is the
 application
>> database(s) and there's the data warehouse. "Snowflaking" entities in
 an
>> application database doesn't make a DW.
>
>Then what does make a DW?
>
>

That's really a good question. Because there are lots of people who do data warehousing and probably really are not. Let me think about this for awhile and see if I can come up with the characteristics of a data warehouse versus an application database. What are the characteristics of your DW? Received on Thu Oct 23 2003 - 23:16:54 CEST

Original text of this message