Re: On the subject of Data Warehouses, Data Cubes & OLAP....

From: Bob Badour <bbadour_at_golden.net>
Date: Fri, 24 Oct 2003 00:13:50 -0400
Message-ID: <a8OdnW15a5KlOgWiU-KYvA_at_golden.net>


"DataMan" <dataman_at_ev1.net> wrote in message news:vph4uu60id4e3c_at_corp.supernews.com...
>
> "Bob Badour" <bbadour_at_golden.net> wrote:
> >"DataMan" <dataman_at_ev1.net> wrote in message
> >news:vpgha67etvtd6f_at_corp.supernews.com...
> >>
> >> "Bob Badour" <bbadour_at_golden.net> wrote:
> >> >"DataMan" <dataman_at_ev1.net> wrote in message
> >> >news:vpfms541cpv8ec_at_corp.supernews.com...
> >> >>
> >> >> "Bob Badour" <bbadour_at_golden.net> wrote:
> >> >> >"DataMan" <dataman_at_ev1.net> wrote in message
> >> >> >news:vpedvrc3kbf74_at_corp.supernews.com...
> >> >> >>
> >> >> >> "Bob Badour" <bbadour_at_golden.net> wrote:
> >> >> >> >"DataMan" <dataman_at_ev1.net> wrote in message
> >> >> >> >news:vpdppi7ccdn0f1_at_corp.supernews.com...
> >> >> >> >>
> >> >> >> >> "Bob Badour" <bbadour_at_golden.net> wrote:
> >> >> >> >> >"DataMan" <dataman_at_ev1.net> wrote in message
> >> >> >> >> >news:vpdad0mimma52e_at_corp.supernews.com...
> >> >> >> >> >>
> >> >> >> >> >> Most OLTP systems do support the information requested.
 It's
 just
 very
 difficult
> >> >> >> >> >> to retrieve.
> >> >> >> >> >
> >> >> >> >> >That is a remarkable and very interesting assertion. What can
 you
 offer
 to
> >> >> >> >> >support such a remarkable assertion?
> >> >> >> >> >
> >> >> >> >> The support information or difficult to retrieve statement?
> >> >> >> >
> >> >> >> >I do not find the "support information" assertion remarkable,
 and
 I
 doubt
> >> >> >> >anyone would. What can you offer to support your remarkable
 assertion
 that
> >> >> >> >information is difficult to retrieve?
> >> >> >> >
> >> >> >> I have no idea what you find remarkable.
> >> >> >
> >> >> >I already told you what I found remarkable. Can you offer any
 support
 for
> >> >> >your remarkable assertion?
> >> >> >
> >> >> >
> >> >> >> The world is full of remarkable
> >> >> >> people all finding different things remarkable. I find it
 remarkable
 that
> >> >> >> you find no value in dimensional data warehouses.
> >> >> >
> >> >> >Did I say that?
> >> >> >
> >> >>
> >> >> No, but since you make assumptions, I thought I would too.
> >> >
> >> >I suppose we all make assumptions, and I try to make mine explicit. To
 which
> >> >assumptions do you refer?
> >> >
> >> >
> >> >> >> As your opinion on this
> >> >> >> matter is already formed, and many books on the subject have
 unsuccessfully
> >> >> >> shown you the value of data warehouses, I don't believe I can
 sway
 your
 opinion
> >> >> >> on the subject.
> >> >> >
> >> >> >Did I say that?
> >> >> >
> >> >>
> >> >> ibid.
> >> >
> >> >Again, to which assumptions do you refer?
> >> >
> >> >
> >> >> >> If you would like some recommendations on books that cover
> >> >> >> the material in depth, I would be happy to provide some.
> >> >> >
> >> >> >That seems a very long way of saying you have nothing to offer to
 support
> >> >> >your remarkable assertion. Apparently, you simply assumed it was
 true
> >> >> >because you heard it somewhere, and now you are out of your depth.
> >> >> >
> >> >>
> >> >> That's quite possible. But this is fun, so I continue. Please
 reference
> >> >> the attached data model. This is the standard movies data model
 supplied
> >> >> by CA when purchasing ERwin. If I wanted a report that provided the
 amount
> >> >> of rentals customer's are making using electronic payments by store,
 the
> >> >> resulting sql would be more complex than a dimensional model built
 around
> >> >> a transaction fact table with associated dimensions for customer,
 store,
> >> >> etc.
> >> >
> >> >I don't see any attachments. Regardless, how does the requirement
 exceed
 the
> >> >capabilities and features of a snapshot?
> >> >
> >>
> >> What does this question have to do with the my previous point?
> >
> >If a simple snapshot meets the reporting requirement, why would anyone
 need
> >or want anything else? What drives the need for a different model or
 stars
> >or snowflakes or dimension tables or fact tables or anything else for
 that
> >matter? The relevance seems very clear to me.
> >
>
> Perhaps if that were you're original point. But it wasn't.

Regardless of my original point, the question relates to your previous point as requested. Now, can you answer it?

> You asked for
> examples of how it would be more difficult to retrieve data out of a
 normalized
> database compared to a dimensional model. I offered an example to which
> you replied "If a simple snapshot meets the reporting requirement, why
 would
> anyone need
> or want anything else? What drives the need for a different model or
 stars
> or snowflakes or dimension tables or fact tables or anything else for that
> matter?"

Exactly. If a snapshot meets the requirement, your example requirement offers no support for your original contention. The example does not demonstrate what was requested.

> Which just reinforces my original assumption on this whole thread.

Which assumption is that?

> But to address your second question, a simple snapshot doesn't meet the
> requirement all the time. In most cases your sourcing data from many
 different
> applications all with their own unique rules and constraints.

The rules and constraints have nothing to do with whether a snapshot meets the requirement. If one can derive the relation, one can derive the relation. Period. When exactly will a snapshot not meet the requirement? ie. In order to support your contention, you must demonstrate how it is impossible to derive a suitable relation in one or more circumstances.

> >> >> >> >> >> Additionally when the business requires enterprise level
 reporting
> >> >> >> >> >> across the various functional applications, a warehouse
 becomes
 a
 necessity.
> >> >> >> >> >> You know the scene, large company with 10 different order
 entry
 systems
> >> >> >> >> >> for each functional area of the company (i.e. cost center).
 One
 for
 small
> >> >> >> >> >> biz, one for consumer, one for z product, one for y
 product,
 etc.
 And
 of
> >> >> >> >> >> course there was never any upfront effort by a data
 management
 organization
> >> >> >> >> >> to ensure consistent representation of enterprise level
 entities.
 So
 you
> >> >> >> >> >> end up with a hodge podge set of data that must be
 abstracted
 to
 make
 consistent
> >> >> >> >> >> within a DW environment. And since you're already there go
 ahead
 and
 put
> >> >> >> >> >> it in a dimensional format so somebody can understand it
 and
 easily
 pull
> >> >> >> >> >> data/reports.
> >> >> >> >> >
> >> >> >> >> >What exactly is a dimensional format? How does it differ from
 relational
> >> >> >> >> representation?
> >> >> >> >> They are both relational. Assuming both the application
 database
 and
 the
> >> >> >> >> data warehouse are both stored in relational databases. Was
 your
 question
> >> >> >> >> directed to differences between a data warehouse database
 (let's
 say
 OLAP
> >> >> >> >> for convenience) and an application database (let's say OLTP
 for
 convenience)?
> >> >> >> >> The primary difference is purpose. OLTP must support
 consistent
 real
 time
> >> >> >> >> data. OLAP only need support historical reporting.
> >> >> >> >
> >> >> >> >So then, OLAP is just another word for "snapshot" as in "a
 derived
 stored
> >> >> >> >relation" ?
> >> >> >> >
> >> >> >> I'm sorry, I don't understand your meaning in "a derived stored
 relation".
> >> >> >
> >> >> >What part do you not understand? Do you know what a relation is? Do
 you
 know
> >> >> >the difference between something that is stored versus something
 that
 is
 not
> >> >> >stored? Do you know what it means to derive something?
> >> >> >
> >> >>
> >> >> The derived stored relation part. Yes. Yes. Yes.
> >> >
> >> >If you know what a relation is and you know what a derived relation is
 and
> >> >you know what a stored relation is, what causes you trouble with
> >> >understanding a derived stored relation? It all seems quite clear to
 me.
> >> >
> >>
> >> Great. What's your point then?
> >
> >See my original question: Does that mean OLAP is just another word
 meaning
> >"snapshot" ?
> >
>
> No. Facts can be events not just transactional data.

An event is transactional data.

[invalid argument based on false assumption snipped]

> >> >> >> I was trying to illustrate one of the basic differences between
 an
 application
> >> >> >> database and a dimensional data warehouse.
> >> >> >
> >> >> >You appear to have regurgitated meaningless pap. If you were
 illustrating
> >> >> >anything, you could offer support for your assertions.
> >> >>
> >> >> I thought it was meaningful and remarkable. Oh well. I guess in a
 nutshell,
> >> >> the only support I can offer is empirical. Most of the business
 community
> >> >> I have dealt with have difficulty navigating and understanding what
 is
 shown
> >> >> in complex, highly normalized data models.
> >> >
> >> >Why would anyone navigate something that does not require >navigation?
 The
> >> idea seems somewhat absurd to me.
> >>
> >> And to alot of other people. Some people figured out how to get around
 all
> >> the navigation.
> >
> >What navigation?
> >
> >
> >> >Where can I find the raw data for your
> >> >empirical measurements? Can you email them to me?
> >>
> >> Isn't that an oxymoron?
> >
> >No. Why would it be? Valid empiricism requires measurement. I suggest you
> >read _How we know what isn't so_ by Thomas Gilovich.
> >
>
> Thanks, I'll look it up. My empiricism is invalid.
>
> >
> >> >> In fact, most business folks
> >> >> don't care.
> >> >
> >> >I agree. Most of them accept meaningless pap without objection.
> >> >
> >> >
> >> >> They just want the application to work and they want their reports.
> >> >
> >> >Is there supposed to be some meaningful point in the above
 observation?
> >> >
> >> >
> >> >> Those same folks who previously needed the IT shop to develop and
 write
> >> >> reports for them are now using BI tools to do their own data mining,
 data
> >> >> queries against dimensional models. Why or how they can do one and
 not
 the
> >> >> other is obviously a matter for debate. I see the success of the DW
 industry
> >> >> and it's prevalence in corporate world and decide that people see
 value
 in
> >> >> the technology.
> >> >
> >> >What technology is that? Be specific: We are all engineers here.
> >>
> >> Ummm, data warehousing?
> >
> >I am afraid you are using circular logic. You have yet to describe what
> >exactly data warehousing is in any comprehensible manner as a technology.
 It
> >appears you continue to repeat meaningless pap without any real
> >comprehension yourself and you continue to find yourself beyond your
 depth.
> >
> That's classic.
>
> >
> >> >> >> And that difference was that
> >> >> >> the application database is constantly changing over time as the
 business
> >> >> >> data changes. OTOH, in the data warehouse the data is never
 updated.
> >> >> >
> >> >> >Really? Now that is a truly remarkable statement! Do you realise
 this
> >> >>
> >> >> Sweet. Now I'm making progress!
> >> >>
> >> >> >fundamentally alters every cosmogeny I have ever heard: "In the
 beginning
> >> >> >was the data warehouse..." One immediately wants to ask whence this
 eternal
> >> >> >data came, but Ockham already took care of that one.
> >> >> >
> >> >>
> >> >> Great statement! But I had to look up Ockham. You are a scholar!
 The
 jury's
> >> >> still out on the gentleman part.
> >> >
> >> >Definitely not a gentleman. The jury came back on that verdict ages
 ago.
> >> >
> >> >
> >> >> >> So
> >> >> >> a report run in a data warehouse today will be the tomorrow and
 the
 month
> >> >> >> after that.
> >> >> >
> >> >> >What would be the point in running the same report on unchanging
 data?
> >> >> >
> >> >>
> >> >> You want two copies?
> >> >
> >> >Okay. But wouldn't a photocopier or Kinkos be cheaper?
> >> >
> >> >
> >> >> End of year summary?
> >> >
> >> >Since nothing gets updated, what would be the point?
> >> >
> >>
> >> All 12 months in one report. Don't you get one of those from you CC
 company?
> >> Do you keep it or throw it away?
> >
> >If the data is not going to change, why would I wait until next month for
> >next month's report?
> >
> >
> >> >> Year to date?
> >> >
> >> >Since nothing changes, what would the year to date show? Why wouldn't
 one
> >> >just report on the data at the beginning of the year?
> >> >
> >>
> >> How about trending? Don't folks at your company want to see how sales
 of
> >> a certain product are trending on a monthly, quarterly basis?
> >
> >Trends imply change over time. Since according to you the data in the
 data
> >warehouse never changes, what meaning would a trend have?
> >
>
> I'm not sure why you're having such difficulty with this concept. Take a
> look at the stock market and rolling averages.

The stock market changes constantly. If the data warehouse is never updated, the data warehouse cannot possibly reflect the changes in the stock market.

> >> This is real
> >> easy to do with a dimensional model BTW. Try it in a normalized data
 model
> >> (if they keep history at all).
> >
> >Other than stating that the data in the data warehouse never changes, you
> >have still not described the difference between dimensional and
 normalized
> >data models. Actually, you have posted a number of mutually contradictory
> >statements so it is impossible to discern what the hell you are talking
> >about.
> >
>
> You forgot a few:
> Their requirements to model are different
> And as a result their structure is different
> And supports easier retrival

It would seem to me the requirement is to model data in both cases. You contradicted yourself regarding the data structure claiming both that the data structure is the same (ie. the relation) and that the data structure is different.
I found your comment regarding easier retrieval very remarkable, and so far you have refused to provide any support for the assertion.

As I observed, your mutually contradictory statements and refusal to back anything up make it impossible to discern what the hell you are talking about. You have convinced me you do not know what you are talking about in the first place.

> >> >> >> >> >How or why can it improve understanding? How have you
 measured
 this
 improved
> >> >> >> >> understanding?
> >> >> >> >> I didn't say that it does.
> >> >> >> >
> >> >> >> >"And since you're already there go ahead and put it in a
 dimensional
 format
> >> >> >> >so somebody can understand it and easily pull data/reports."
> >> >> >> >
> >> >> >> >It sounds to me like you did. If you did not say that, what the
 hell
 did
 you
> >> >> >> >intend to say?
> >> >> >> >
> >> >> >> Hmmm, I guess I did say that. Sorry I was busy at work and
 responded
 too
> >> >> >> quickly to the post. A fully normalized model for a large
 application
 can
> >> >> >> have 100's of entities. A dimensional model is built to support
 specific
> >> >> >> reporting and consists of a fact table and its related dimension
 tables.
> >> >> >
> >> >> >Yes, potentially hundreds of dimension tables... Does a data
 warehouse
> >> >> >comprising multiple data marts have only a single fact table or is
 it
 not
 a
> >> >> >dimensional model?
> >> >> >
> >> >>
> >> >> I guess it's possible for a single fact table to have hundreds of
 dimensions.
> >> >> I've not seen one. I usually see 5 to 15 dimensions. A
 dimensional
 model
> >> >> supports only specific reporting. As such you would need to create
 a
 fact
> >> >> for each type of reporting required.
> >> >
> >> >I see. We have lots of dimensions and lots of fact tables. How does
 one
 know
> >> >which dimensions and facts to use?
> >> >
> >>
> >> The one that you need. Is this a trick question?
> >
> >No. I guess the user would just use the entity the user needs. >What
 purpose
 does the data warehouse serve? How does it improve understanding? How have
> >you measured this improved understanding? You have yet to answer any
> >question with a meaningful answer.
>
> I disagree. But I knew we would.

With all due respect, if the people you converse with cannot discern anything meaningful from what you say, you have failed to communicate anything meaningful.

> >> >> >> Occasionally, a dimension will also have a relationship to
 another
 table
> >> >> >> resulting in the snowflake term.
> >> >> >
> >> >> >So, if we rearrange our entities into a snowflake pattern, we get
 a
 data
> >> >> >warehouse?!?
> >> >> >
> >> >>
> >> >> I'm not quite sure how you came to that conclusion. There is the
 application
> >> >> database(s) and there's the data warehouse. "Snowflaking" entities
 in
 an
> >> >> application database doesn't make a DW.
> >> >
> >> >Then what does make a DW?
> >> >
> >> >
> >> That's really a good question. Because there are lots of people who do
 data
> >> warehousing and probably really are not. Let me think about this for
 awhile
> >> and see if I can come up with the characteristics of a data warehouse
 versus
> >> an application database. What are the characteristics of your DW?
> >
> >My DW is neither more nor less than a snapshot. As such, it is neither
 more
> >nor less than a derived stored relation. I generally ignore the
 meaningless
> >pap.

Since you don't have a clue what you are talking about and are only repeating meaningless pap, I am adding you to my twit filter. Toodles. Received on Fri Oct 24 2003 - 06:13:50 CEST

Original text of this message