Re: Real world issue: How to "split" queries in presence of record replication and replication sensitive aggregate functions ?

From: Marshall <marshall.spight_at_gmail.com>
Date: 16 Sep 2006 15:40:37 -0700
Message-ID: <1158446437.422510.209930_at_d34g2000cwd.googlegroups.com>


pamelafluente_at_libero.it wrote:
>
> [records are unimportant. In any case, in practice there can be
> transactions where one buys the same item and spend the same money.

In practice, those two invoices will have different invoice ids.

> Here it "seems" record are equal because field key for trans are
> missing. But that's irrelevant to our purpose. I wanted to avoid
> discussion about contents because really immaterial here]

It's okay to leave some stuff out, but it's not okay to leave essential things out. The fact that sets do not have duplicates is essential, from both a practical and theoretical perspective.

> > So your question is how to handle the repetition of
> > values from TRANS when aggregating over a column
> > in TRANS with a non-idempotent aggregate?
>
> The duplication I am talking about is the one caused by joining tables
> in 1-N relationship.

A minimal example to demonstrate this, with two tables and four columns and three rows:

S(x,y)



1, a
2, a

T(y, z)



a, b

S join T(x, y, z)



1, a, b
2, a, b

b appears twice in S join T but it only appears once in T.

(This can also be illustrated with two tables, one column each, one row each.)

> When you make the join ONE-TRANS-SUBTRANS the
> values in TRANS are replicated due to the relationship 1-N with
> SUBTRANS . So (non idempotent) functions computed on TRANS consider the
> same values more times they should (duplicated values).

Yes. The result of count(z) from T will not be equal to count(z) from (S join T).

> This yeld a wrong result in the report for these functions.

Whether this is wrong or not depends on what you want. If you want an aggregate for the unjoined values of T.z, then do the aggregate. Don't take the aggregate over some other set of numbers.

> Smart software is perfectly capable to split this query in subqueries,
> as also Alexander pointed out in the previous post.

If you want to split S join T into subqueries, the first subquery is "select * from S" or simply "S" and the second subquery is "T".

> My original question was:
>
> 1. how is this problem called in the theory (are there references to
> study?)

This is not a problem. This is what the operator does. It's what it is supposed to do.

> 2. what is the algorithm to do that (or is it some kind of industry
> secret that only few leader softwares implements?)

Which "that" are you asking about? Join?

> > What is wrong with "don't do that" as the answer?
>
> That's not allowed :)) (It's allowed to have an algorithm that warn
> the user about some possible problem in the design)

The language can't read the user's mind. If the user asks a perfectly valid and well-formed question of the system, the system has no way of knowing that the user "really meant" something else.  

Marshall Received on Sun Sep 17 2006 - 00:40:37 CEST

Original text of this message