Re: Real world issue: How to "split" queries in presence of record replication and replication sensitive aggregate functions ?

From: <pamelafluente_at_libero.it>
Date: 17 Sep 2006 01:26:42 -0700
Message-ID: <1158481602.227224.242010_at_k70g2000cwa.googlegroups.com>


<>

Right Marshall , I replied to you instead of the group. Was using another computer and got mixed up.

To sum up, the problem makes sense when you have real world joins with several tables and fields involved. There you can have duplication of record because 1-N relationships caused by the choice (for the report) of fields that belong to "distant" tables. So in other words, this kind of "problem" might occur and one (certainly not a very expert users) may not be really aware of it. Smart tools which make queries for reports, where these aggregate function which are affected by value duplication are involved, usually do not allow this kind of duplication, or in any case give the possibility to choose. What they do, when possible, in an automatic splitting (if appropriate and possible) of the original query in a union of subqueries (can be several), in order to make sure that the computation of these function is not wrong.

My original question was about a systematic method (i was calling it "algorithm") to create this union of subqueries (I was thinking that there is an underlying partitioning of tables and relationships, as in the recursive logic I tried to sketch above). Systematic method means to me that it would be capable to formally analyze and possibly split (if necessary) any query (records are not concerned: only fields, relationships and functions).

here is the reply I accidentally emailed:

On 9/16/06, pamelafluente_at_libero.it <pamelafluente_at_libero.it> wrote:
>
> Marshall 작성:
>
> > A minimal example to demonstrate this, with two tables and four
> > columns and three rows:
>

 I am glad we are finally tuned.

>
> > Whether this is wrong or not depends on what you want. If you want
> > an aggregate for the unjoined values of T.z, then
>

That is obvious.

> > > Smart software is perfectly capable to split this query in
> > > subqueries, as also Alexander pointed out in the previous post.
> >
> > If you want to split S join T into subqueries, the first subquery is
> > "select * from S" or simply "S" and the second subquery is "T".
>

 Ok but the programmer's perspective is that of dealing with dozen or hundreds tables and hundreds or thousands field . For instance SAP  has over half million field in it. For simple cases there is no necessity of reporting software. In real world value duplication in big queries can occur without the user can see it. In such cases you  may get wrong reports, and you might get fired. I agree that this can  also be due due to poor design, but it is always helpful to have a  powerful automatic tool that helps you out.

 Modern reporting software is capable to detect that and present the  user all possible choices. It is able to split the whole set of tables

 and relationship in subsets and build all the various possible unions  of subqueries.

 Then the user can see what is going on and in case even redesign his  report or the table schema.

> > > My original question was:
> > >
> > > 1. how is this problem called in the theory (are there
> > > references to
> > > study?)
> >
> > This is not a problem. This is what the operator does. It's what it
> > is supposed to do.
>

  The problem is to make aware the user, by presenting him all the  possible way to split & partition the query.

> > > 2. what is the algorithm to do that (or is it some kind of
> > > industry secret that only few leader softwares implements?)
> >
> > Which "that" are you asking about? Join?
>

  The systematic method to build the subqueries which avoid to have  wrong result due to data replication, as for instance Business Objects

 does [http://www.businessobjects.com/] and as you can see from their  revenue they are not exactly the last in the field.

> >
> > > > What is wrong with "don't do that" as the answer?
> > >
> > > That's not allowed :)) (It's allowed to have an algorithm that
> > > warn the user about some possible problem in the design)
> >
> > The language can't read the user's mind. If the user asks a
> > perfectly valid and well-formed question of the system, the system
> > has no way of knowing that the user "really meant" something else.
>

 No, but may make aware and present alternatives.

> >
> > Marshall
>

 Thanks, time for me to go sleep. See you guys tomorrow.

 /P Received on Sun Sep 17 2006 - 10:26:42 CEST

Original text of this message