Pizza, anyone? Was:Real world issue: How to "split" queries in presence of record replication and replication sensitive aggregate functions ?

From: David Cressey <dcressey_at_verizon.net>
Date: Fri, 15 Sep 2006 18:56:31 GMT
Message-ID: <zHCOg.14$zs6.4_at_trndny07>


<pamelafluente_at_libero.it> wrote in message news:1158224084.007892.62800_at_d34g2000cwd.googlegroups.com...
> I 'd like to repropose my original problem which got lost among vain
> fights
> and pollution.
>
> I am infact trying to write as an exercise a "reporting" program. Or
> more
> precisely a program which is capable to generate queries useful to make
> reports.
>
> This can be useful in OLAP systems as well as for classical dbms's.
>
>
> I'd like to get 2 pieces of information:
>
> 1. First what is the relevant literature and how this problem is
> called in the theory
> 2. Most importantly, I need the actual "algorithm", that is a
> systematic way to produce
> these queries
>
>
> I report here my part of my original posting and some answer.
>
> ---
> Assume you have a few tables. Assume that there are a
> few relationships and that for instance if you have 2 table A, B in
> 1-N relationship, you have on table A defined some function that is,
> say, "replication
> sensitive", such as count (not count distinct) or sum. When whe make
> some join of these tables and compute such functions we obtain an
> incorrect calculation of the functions due to record replication.
> Many softwares such as Business Objects are able to devise some union
> of subquery to avoid that replication problem. Experiment show that
> such software are usually able to deal with that if functions are on
> "fact" table (say on the N side of the relationship), but they seem to
> have big problems when the functions are applied on the dimension table
>
> (say on side 1 of relationship).
>
>
> Would you be able to suggest an optimal split/union algorithm for this
> situation. I do have some "empirical" ideas on how to do that in
> general,
> but I would like to confront it with you.
>
> ---
>
> I make a first example to start discussion. Let take 5 simple tables.
> Let's
> avoid talking about name, key issue or pathological design
> issue. Let's just focus on the abstract reporting problem.
>
>
> CLIENT
> Name
> U1
> U2
> U3
>
>
> LINE_A
> ItemA
> A_1
> A_2
> A_3
> A_4
>
>
> LINE_B
> ItemB
> B_1
> B_2
> B_3
>
>
> TRANS_A
> Name ItemA Amount
> U1 A_1 10
> U1 A_3 20
> U1 A_1 20
> U1 A_3 10
> U1 A_1 20
> U3 A_2 10
>
>
> TRANS_B
> Name ItemB Amount
> U1 B_1 20
> U1 B_1 10
> U1 B_3 10
> U1 B_1 20
> U1 B_2 10
> U3 B_1 20
>
>
> Assume one wants this simple report (given the above):
>
>
> Name | ItemA | ItemB | TotalAmountItemA | TotalAmountItemB
>
>
> One first question is what is the best way to create this kind of
> result. And
> for a general scheme which might incorporate similar problems what is
> the best strategy to build a query that provides the expected info.
> Other examples might include functions on "dimension" tables.
>
> -----
>
>
> Alexandr Savinov's reply
>
> You can order your tables as follows:
>
> LINE_A CLIENT LINE_B
> \ / \ /
> TRANS_A TRANS_B
>
>
> If a table references another table then it is positioned below. It is
> a
> 6-dimensinoal model (2 numeric dimensions for amounts are not shown).
> If
> you formally add bottom table then it will have the following
> structure:
>
>
> ItemA | Name | Name | ItemB | AmountItemA | AmountItemB
>
>
> You see that two tables TRANS_A and TRANS_B are formally independent
> because they do not have a common subtable. So in order to get the
> desired result one can perform two steps:
>
>
> 1. Produce two aggregate queries for TRANS_A and TRANS_B by grouping
> records and summing up the amount.
>
>
> 2. Combine them into one result set using the same client name as a
> criterion. (However, I seriously doubt if such a result will be
> meaningful.)
>
> --------
>
>
> Hi Alexandr ,
>
> That's fine and correct. But I need the general logic (that's is what I
>
> meant for "algorithm") to deal with *any* arrangement of tables and
> relationship. Again my perspective is a programmer's one. One or more
> istances of a problem are good for discussions and counterexamples, but
>
> I need the general logic to deal with any set of "tables" (can be views
>
> or whatever actually) and relationships.
>
>
> The information I assume to be known is:
>
>
> - table structure for each table
> - relationships
> - functions applied to each field and their properties(row function/
> aggregate function / replication sensitive or not)
>
>
> Then the "algorithms" (a systematic method to perform this task) must
> be able to make automatically the partioning and union of subqueries,
> according to some logic, which is the object of my question.
>
>
> PS
>
>
>
> > LINE_A CLIENT LINE_B
> > \ / \ /
> > TRANS_A TRANS_B
>
>
> 1. I am not sure to understand this notation. What do you mean by
> "reference". You mean is in "relationship with" ?
>
> Intuitively I see it as (I am "inventing" a representation)
>
>
> m / 1 LINE_A
> 1 / m TRANS_A
> CLIENT
> 1 \ m TRANS_B m \ 1
> LINE_B
>
>
> 1/m means the kind of relationship (one to many).
>
>
> On the observation that this may not be very easy to do in general,
> I noticed that:
>
> All decent reporting programs are able to deal with any database.
> Since I was thinking to create as an exercise a reporting tool, I was
> trying
> to understand how to do that. Programs must be able to deal with
> anything.
>
>
> A relatively naive reporting program would probably spit out something
>
> like:
>
>
> SELECT
> C.[Name] AS [Name],
> sum(T1.Amount) AS [Tot_A],
> sum(T.Amount) AS [Tot_B],
> T.ItemB AS [ItemB],
> T1.ItemA AS [ItemA]
> FROM
> Trans_A T1,
> Client C,
> Trans_B T
> WHERE
> C.[Name] = T1.[Name] AND
> C.[Name] = T.[Name]
> GROUP BY
> C.[Name],
> T.ItemB,
> T1.ItemA;
>
>
> where instead, if I understand your suggestion, it would be better to
> give:
>
>
> SELECT
> C.[Name] AS [Name],
> sum(T1.Amount) AS [Tot_A],
> NULL AS [Tot_B],
> NULL AS [ItemB],
> T1.ItemA AS [ItemA]
> FROM
> Trans_A T1,
> Client C
> WHERE
> C.[Name] = T1.[Name]
> GROUP BY
> C.[Name],
> T1.ItemA;
> UNION
>
>
> SELECT
> C.[Name] AS [Name],
> NULL AS [Tot_A],
> sum(T.Amount) AS [Tot_B],
> T.ItemB AS [ItemB],
> NULL AS [ItemA]
> FROM
> Trans_B T,
> Client C
> WHERE
> C.[Name] = T.[Name]
> GROUP BY
> C.[Name],
> T.ItemB;
>
>
> or perhaps the possibility to execute the 2 subqueries separately
>
>
> I need a systematic method to split set of relationships and make the
> subqueries.
>
> -P
>

I'm going to suggest that you take a look in a topic discussed in this newsgroup a year or two ago, concerning how to report on pizza orders, where each pizza ordered can have a set of cheeses (like mozarella, romano, etc.) and an independent set of toppings (like pepperoni, mushroom, etc.)

There is a remarkable similarity between the pizza order scenario and the example you provided us here to start the discussion.

Where you have a table named "Clients" the pizza example had a table named "orders". Your "LINE A" and "LINE B" are like "CHEESES" and "TOPPINGS".

And your "TRANS A" and "TRANS B" Correspond to two multivalued fields in the order record, that each provide a list of cheeses and toppings for every order.

The only difference is the two fields called "amount", which didn't exist in the pizza example. But that's a relatively trivial difference.

Dawn was kind enough to show us all how a simple multivalued file was able to pull up a report very much like the one you outlined, with none of the difficulties attendant on the use of the relational model and relational joins. In particular, by avoiding joins, the MV model would almost surely avoid the replication that your totals are sensitive to.

I am sure you and Dawn could carry on a spirited discussion on the subject of MV as compared with RM. You seem to have a great deal in common.

You both have a sharp eye for keeping things simple for the programmer, and on avoiding the inflexibilities that go with structure. That seems to be the thing that us poor benighted data architects and theoreticians always lose sight of, to hear programmers tell it.

I look forward to a lively and entertaining dialogue. Received on Fri Sep 15 2006 - 20:56:31 CEST

Original text of this message