Re: Real world issue: How to "split" queries in presence of record replication and replication sensitive aggregate functions ?

From: Alexandr Savinov <spam_at_conceptoriented.com>
Date: Sun, 17 Sep 2006 12:43:13 +0200
Message-ID: <eej8s8$117s$1_at_f1node01.rhrz.uni-bonn.de>


pamelafluente_at_libero.it schrieb:
> Here one small example I can think of.
> The minimum number of table is probably 3 because:
> 1 table: it's impossible to duplicate values
> 2 tables in 1-N relationship: placing a function on the dimension table
> would probably be too artificious an example
>
> 3 tables
> --------------
>
> ONE
> |
> TRANS
> |
> SUBTRANS
>
> Table: ONE
> Name
> David
> Marshall
>
> Table: TRANS
> Name Item Fact
> David IT2 1
> Marshall IT1 1
> Marshall IT1 1
>
> Table: SUBTRANS
> Item SubFact
> IT1 1
> IT2 1
> IT2 1
> IT2 1
>
> Report:
> One.[Name] AS [Name],
> count(Trans.Fact) AS [FactCount],
> count(SubTrans.SubFact) AS [SubFactCount]
>
>
> David has 1 fact and Marshall 2 facts.
>
> "questionable" select:
>
> SELECT
> One.[Name] AS [Name],
> count(Trans.Fact) AS [FactCount],
> count(SubTrans.SubFact) AS [SubFactCount]
> FROM
> One,
> Trans,
> SubTrans
> WHERE
> One.[Name] = Trans.[Name] AND
> Trans.Item = SubTrans.Item
> GROUP BY
> One.[Name];

You actually have at least two problems:

  1. One is that you need to write a query that produces a correct result set. Repeated or replicated records is not a problem as well as aggregate functions which are sensitive or not sensitive. So the systematic method is just to write a correct query which returns precisely what you expect. In your examples one approach consists in using *nested* queries. An attempt to use a "concise" form like this one

One.[Name] = Trans.[Name] AND Trans.Item = SubTrans.Item

is naive and will not produce what you need. Unfortunately, SQL is not very suitable for this task. I find it easier and more "systematic" if you formulate one concrete task:

  • how to get all records from a direct or indirect subtable (SUBTRANS in your example) referencing a selected subset of records from the main table (ONE in your example)

It is one of the main procedures needed for analytical reports and if you have it (if you can implement it) then everything else is more or less easy. You simply write:

REPORT (primary table: ONE)

Field 1: Name
Field 2: count( deproject("TRANS->Name","Fact") )
Field 3: count( deproject("TRANS->Item->Name","Fact") )

Here function deproject takes two arguments: - a string starting with the name of some subtable followed by a sequence of column names leading to the main table. - column name from the subtable

2. The second problem is about translating a user-friendly specification of the report to its formal specification. In particular, you wanted to translate a list of arbitrary fields taken from any table to a formal specification of the expected (meaningful) result. (How to express this formal specification in SQL or any other language is already another problem.) This problem already can be characterized as a theoretical and requiring a systematic approach. In contrast to the first problem above it is more difficult and cannot be solved by simply encoding something in SQL.

> -P
>

--
http://conceptoriented.com
Received on Sun Sep 17 2006 - 12:43:13 CEST

Original text of this message