Re: Real world issue: How to "split" queries in presence of record replication and replication sensitive aggregate functions ?

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Sun, 17 Sep 2006 17:20:55 GMT
Message-ID: <XtfPg.21718$9u.254600_at_ursa-nb00s0.nbnet.nb.ca>


Marshall wrote:

> pamelafluente_at_libero.it wrote:
>

>>[records are unimportant. In any case, in practice there can be
>>transactions where one buys the same item and spend the same money.

>
>
> In practice, those two invoices will have different invoice ids.
>
>
>
>>Here it "seems" record are equal because field key for trans are
>>missing. But that's irrelevant to our purpose. I wanted to avoid
>>discussion about contents because really immaterial here]

>
>
> It's okay to leave some stuff out, but it's not okay to leave
> essential things out. The fact that sets do not have duplicates
> is essential, from both a practical and theoretical perspective.
>
>
>
>>>So your question is how to handle the repetition of
>>>values from TRANS when aggregating over a column
>>>in TRANS with a non-idempotent aggregate?
>>
>>The duplication I am talking about is the one caused by joining tables
>>in 1-N relationship.

>
>
> A minimal example to demonstrate this, with two tables and four columns
> and three rows:
>
> S(x,y)
> -----
> 1, a
> 2, a
>
> T(y, z)
> -----
> a, b
>
> S join T(x, y, z)
> ----------------
> 1, a, b
> 2, a, b
>
> b appears twice in S join T but it only appears once in T.
>
> (This can also be illustrated with two tables, one column each,
> one row each.)
>
>
>
>>When you make the join ONE-TRANS-SUBTRANS the
>>values in TRANS are replicated due to the relationship 1-N with
>>SUBTRANS . So (non idempotent) functions computed on TRANS consider the
>>same values more times they should (duplicated values).

>
>
> Yes. The result of count(z) from T will not be equal to count(z) from
> (S join T).
>
>
>
>>This yeld a wrong result in the report for these functions.

>
>
> Whether this is wrong or not depends on what you want. If
> you want an aggregate for the unjoined values of T.z, then
> do the aggregate. Don't take the aggregate over some other
> set of numbers.
>
>
>
>>Smart software is perfectly capable to split this query in subqueries,
>>as also Alexander pointed out in the previous post.

>
>
> If you want to split S join T into subqueries, the first subquery
> is "select * from S" or simply "S" and the second subquery is "T".
>
>
>
>>My original question was:
>>
>>  1. how is this problem called in the theory (are there references to
>>study?)

>
>
> This is not a problem. This is what the operator does. It's
> what it is supposed to do.
>
>
>
>>  2. what is the algorithm to do that (or is it some kind of industry
>>secret that only few leader softwares implements?)

>
>
> Which "that" are you asking about? Join?
>
>
>
>>>What is wrong with "don't do that" as the answer?
>>
>> That's not allowed :)) (It's allowed to have an algorithm that warn
>>the user about some possible problem in the design)

>
>
> The language can't read the user's mind. If the user asks
> a perfectly valid and well-formed question of the system,
> the system has no way of knowing that the user "really
> meant" something else.

In other words, no matter how proud Pamela is of her ignorance, the primary role of a dbms is not to compensate for ignorance. The primary role of a dbms is to manage data as instructed.

It's obvious to any reasonably intelligent person that if one doesn't want the effects of a join, one doesn't do a join. Likewise, if one doesn't want the effects of a restrict, one doesn't do a restrict. etc. I fact, I would expect it to seem rather obvious even to people of rather modest intelligence.

Does anyone remember the "I hate it when I do that" sketches from SNL? Received on Sun Sep 17 2006 - 19:20:55 CEST

Original text of this message