Re: Real world issue: How to "split" queries in presence of record replication and replication sensitive aggregate functions ?

From: <pamelafluente_at_libero.it>
Date: 15 Sep 2006 03:00:05 -0700
Message-ID: <1158314405.126558.316070_at_b28g2000cwb.googlegroups.com>


Alexandr Savinov ha scritto:

> pamelafluente_at_libero.it schrieb:
This is why to have a meaningful use case is so important.

Thanks Alexandr,

  I like your suggestion and I want to work on this direction.

[About the meaningful part I still believe that is irrelevant. My purpose is to implement something that gives the best output given a certain input. It's up then to the user to judge if this makes sense or not. But the program must be anyway work. It cannot be involved in semantics.]

>
> > The input of the problem are:
> >
> > - Tables (in abstract sense, as a list of field) and a list of their
> > field including datatype
> > - Relationships
> > - Structure (list of fields) of the report with properties of involved
> > functions:
> > - Function is aggregate: Yes/No
> > - Function is sensitive to replication (eg. sum): Yes/no
> >
> > That's it. Simple to state.
>
> I like your problem formulation. But in this form it hardly can be
> solved. And the main reason is that providing "a list of fields" as a
> report specification is in most cases ambiguous, i.e., it cannot be
> translated into one query unambiguously.
>
> A solution to your problem could be as follows:
>
> 1. Choose one *primary* table with entities listed in the report.
>
> Notice 1.1 No fields are chosen yet from this and other tables.
>
> Notice 1.2 If this table does not exist then it has to be somehow
> defined, e.g., as a view or via query.

I have begun doing some attempts. I think the first step is to attain a "partioning" of the relationships so that we are able to identify the tables which will go in each subqueries. The final subqueries clearly might have tables in common.

I was thinking to proceed as follows:

  1. Identify all the tables which have at least one field in the report which has applied on it a replication sensitive aggregate function and that are, say, on side 1 of a 1-N relationship
  2. For each ot the above tables Start from that table and "navigate" recursively through relationships 1-N until you find on side N a table which has a field (a pure dimension) in the report. When you find it stop, cutting out that table.
  3. At this point you should have a partion of tables and relationships. By construction, within this partition there cannot be issues due to record replication: because a 1-N situation with an used field on side N is not possible.
  4. Now to build the subqueries, for each partition for each table in the partition which have at least one field in the report which has applied on it a replication sensitive aggregate function navigate recursively through relationship always in direction N-1 until you cannot go farther (for instance you find 1-N) *allowing to exit from partition*. Make a union all of these subqueries.

But what I cannot believe is that this problem has not been systematically settled up by someone and we must be reinventing this wheel. As it seems to me that is the most basic issue one finds out when extracting data from a database. Isn't this a known problem in theory with some name attached to it ? There must be literature references somewhere ...

-P

>
> 2. Define properties of the primary entities by indicating fields from
> other tables.
>
> 2.1 Properties may have 3 major types: from supertables via dimension
> (column) names, from subtables via inverse dimensions (columns taken in
> the opposite direction), zigzag properties which combine the previous
> two types.
>
> Notice 2.2 Sometimes specifying a field is not enough and it is
> necessary to provide a hint for choosing an appropriate access path.
>
> Notice 2.3 An aggregation function can be applied to properties
> returning a collection (see 2.1.2 and 2.1.3)
>
> This approach can be more or less easily implemented but here I would
> like to warn you that the main problem is not here. It appears when you
> start imposing restrictions in your report (say, via WHERE and HAVING
> clause in SQL). You may easily get cyclic dependencies which need to be
> somehow resolved. But it is already another story...
>
> >> -P
> >
>
> --
> http://conceptoriented.com
Received on Fri Sep 15 2006 - 12:00:05 CEST

Original text of this message