Re: On what OLAP can and what OLAP can't - A little problem

From: Alexandr Savinov <spam_at_conceptoriented.com>
Date: Wed, 13 Sep 2006 15:01:18 +0200
Message-ID: <ee8vf0$fg8$1_at_f1node01.rhrz.uni-bonn.de>


You can order your tables as follows:

LINE_A CLIENT LINE_B

      \  /      \   /
     TRANS_A    TRANS_B

If a table references another table then it is positioned below. It is a 6-dimensinoal model (2 numeric dimensions for amounts are not shown). If you formally add bottom table then it will have the following structure:

ItemA | Name | Name | ItemB | AmountItemA | AmountItemB

You see that two tables TRANS_A and TRANS_B are formally independent because they do not have a common subtable. So in order to get the desired result one can perform two steps:

  1. Produce two aggregate queries for TRANS_A and TRANS_B by grouping records and summing up the amount.
  2. Combine them into one result set using the same client name as a criterion. (However, I seriously doubt if such a result will be meaningful.)
--
http://conceptoriented.com

pamelafluente_at_libero.it schrieb:

> Hi Jan. Thanks for the input. "semijoin" ? hmm I have to take a look at
> that ...
>
> I make a first example for discussion. Let take 5 simple tables. Let's
> avoid talking about name, key issue or pathological design
> issue. Let's just focus on the abstract reporting problem.
>
> CLIENT
> Name
> U1
> U2
> U3
>
> LINE_A
> ItemA
> A_1
> A_2
> A_3
> A_4
>
> LINE_B
> ItemB
> B_1
> B_2
> B_3
>
> TRANS_A
> Name ItemA Amount
> U1 A_1 10
> U1 A_3 20
> U1 A_1 20
> U1 A_3 10
> U1 A_1 20
> U3 A_2 10
>
> TRANS_B
> Name ItemB Amount
> U1 B_1 20
> U1 B_1 10
> U1 B_3 10
> U1 B_1 20
> U1 B_2 10
> U3 B_1 20
>
>
> Assume one wants this simple report (given the above):
>
> Name | ItemA | ItemB | TotalAmountItemA | TotalAmountItemB
>
> One first question is what is the best way to create this kind of
> result. And
> for a general scheme which might incorporate similar problems what is
> the best strategy to build a query that provides the expected info.
> Other examples might include functions on "dimension" tables.
>
> -P
>
>
> Jan Hidders ha scritto:
>
>> pamelafluente_at_libero.it schreef: >> >>> Make it simple. Assume you have a few tables. Assume that there are a >>> few relationships and that for instance if you have 2 table A, B in >>> 1-N relationship, you >>> have on table A defined some function that is, say, "replication >>> sensitive", such as count (not count distinct) or sum. When whe make >>> some join of these tables and compute such functions we obtain an >>> incorrect calculation of the functions due to record replication. >>> Leader softwares such as Business Objects are able to devise some union >>> of subquery to avoid that replication problem. Experiment show that >>> such software are usually able to deal with that if functions are on >>> "fact" table (say on the N side of the relationship), but they seem to >>> have big problems when the functions are applied on the dimension table >>> (say on side 1 of relationship). >> What do you mean by "algorithm"? Do you want the SQL expression or a >> program that would implement that expression? >> >> For now, if I understand your question correctly, it seems the magic >> word you are looking for is "semijoin". There are several algorithms >> for that operator and the choice depends on several factors such as the >> presence of indexes, selectivity of join conditions,sizes of buffers, >> sizes of the involved tables, et cetera. >> >> -- Jan Hidders
>
Received on Wed Sep 13 2006 - 15:01:18 CEST

Original text of this message