Re: On what OLAP can and what OLAP can't - A little problem

From: <pamelafluente_at_libero.it>
Date: 13 Sep 2006 06:43:48 -0700
Message-ID: <1158155028.310463.179620_at_i42g2000cwa.googlegroups.com>


Hi Alexandr ,

That's fine and correct. But I need the general logic (that's is what I meant for "algorithm") to deal with *any* arrangement of tables and relationship. Again my perspective is a programmer's one. One or more istances of a problem are good for discussions and counterexamples, but I need the general logic to deal with any set of "tables" (can be views or whatever actually) and relationships.

The information I assume to be known is:

Then the "algorithms" (a systematic method to perform this task) must be able to make automatically the partioning and union of subqueries, according to some logic, which is the object of my question.

PS
> LINE_A CLIENT LINE_B
> \ / \ /
> TRANS_A TRANS_B

  1. I am not sure to understand this notation. What do you mean by "reference". You mean is in "relationship with" ?

Intuitively I see it as (I am "inventing" a representation)

                                           m / 1    LINE_A
                1 / m    TRANS_A
  CLIENT
                1 \ m    TRANS_B    m \ 1
                                                        LINE_B

1/m means the kind of relationship (one to many).

2. Not familiar with the term "dimension" in the way you use it (my fault). How do you count to 6?

-P

Alexandr Savinov ha scritto:

> You can order your tables as follows:
>
> LINE_A CLIENT LINE_B
> \ / \ /
> TRANS_A TRANS_B
>
> If a table references another table then it is positioned below. It is a
> 6-dimensinoal model (2 numeric dimensions for amounts are not shown). If
> you formally add bottom table then it will have the following structure:
>
> ItemA | Name | Name | ItemB | AmountItemA | AmountItemB
>
> You see that two tables TRANS_A and TRANS_B are formally independent
> because they do not have a common subtable. So in order to get the
> desired result one can perform two steps:
>
> 1. Produce two aggregate queries for TRANS_A and TRANS_B by grouping
> records and summing up the amount.
>
> 2. Combine them into one result set using the same client name as a
> criterion. (However, I seriously doubt if such a result will be meaningful.)
>
> --
> http://conceptoriented.com
>
> pamelafluente_at_libero.it schrieb:
> > Hi Jan. Thanks for the input. "semijoin" ? hmm I have to take a look at
> > that ...
> >
> > I make a first example for discussion. Let take 5 simple tables. Let's
> > avoid talking about name, key issue or pathological design
> > issue. Let's just focus on the abstract reporting problem.
> >
> > CLIENT
> > Name
> > U1
> > U2
> > U3
> >
> > LINE_A
> > ItemA
> > A_1
> > A_2
> > A_3
> > A_4
> >
> > LINE_B
> > ItemB
> > B_1
> > B_2
> > B_3
> >
> > TRANS_A
> > Name ItemA Amount
> > U1 A_1 10
> > U1 A_3 20
> > U1 A_1 20
> > U1 A_3 10
> > U1 A_1 20
> > U3 A_2 10
> >
> > TRANS_B
> > Name ItemB Amount
> > U1 B_1 20
> > U1 B_1 10
> > U1 B_3 10
> > U1 B_1 20
> > U1 B_2 10
> > U3 B_1 20
> >
> >
> > Assume one wants this simple report (given the above):
> >
> > Name | ItemA | ItemB | TotalAmountItemA | TotalAmountItemB
> >
> > One first question is what is the best way to create this kind of
> > result. And
> > for a general scheme which might incorporate similar problems what is
> > the best strategy to build a query that provides the expected info.
> > Other examples might include functions on "dimension" tables.
> >
> > -P
> >
> >
> > Jan Hidders ha scritto:
> >
> >> pamelafluente_at_libero.it schreef:
> >>
> >>> Make it simple. Assume you have a few tables. Assume that there are a
> >>> few relationships and that for instance if you have 2 table A, B in
> >>> 1-N relationship, you
> >>> have on table A defined some function that is, say, "replication
> >>> sensitive", such as count (not count distinct) or sum. When whe make
> >>> some join of these tables and compute such functions we obtain an
> >>> incorrect calculation of the functions due to record replication.
> >>> Leader softwares such as Business Objects are able to devise some union
> >>> of subquery to avoid that replication problem. Experiment show that
> >>> such software are usually able to deal with that if functions are on
> >>> "fact" table (say on the N side of the relationship), but they seem to
> >>> have big problems when the functions are applied on the dimension table
> >>> (say on side 1 of relationship).
> >> What do you mean by "algorithm"? Do you want the SQL expression or a
> >> program that would implement that expression?
> >>
> >> For now, if I understand your question correctly, it seems the magic
> >> word you are looking for is "semijoin". There are several algorithms
> >> for that operator and the choice depends on several factors such as the
> >> presence of indexes, selectivity of join conditions,sizes of buffers,
> >> sizes of the involved tables, et cetera.
> >>
> >> -- Jan Hidders
> >
Received on Wed Sep 13 2006 - 15:43:48 CEST

Original text of this message