Path: dp-news.maxwell.syr.edu!spool.maxwell.syr.edu!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!news-han1.dfn.de!news-koe1.dfn.de!news.uni-bonn.de!not-for-mail
From: Alexandr Savinov <spam@conceptoriented.com>
Newsgroups: comp.databases.theory
Subject: Re: Real world issue: How to "split" queries in presence of record
 replication and replication sensitive aggregate functions ?
Date: Sun, 17 Sep 2006 12:43:13 +0200
Organization: HRZ - University of Bonn (Germany)
Lines: 107
Message-ID: <eej8s8$117s$1@f1node01.rhrz.uni-bonn.de>
References: <1158224084.007892.62800@d34g2000cwd.googlegroups.com>   <45094ae2.98.41@news2>   <1158238742.439337.121720@e3g2000cwe.googlegroups.com>   <1158245952.200358.109740@b28g2000cwb.googlegroups.com>   <eedppj$pkq$1@f1node01.rhrz.uni-bonn.de>   <1158314405.126558.316070@b28g2000cwb.googlegroups.com>   <eedv3d$l3u$1@f1node01.rhrz.uni-bonn.de>   <1158323706.211856.230060@i42g2000cwa.googlegroups.com>   <1158369429.438007.258820@b28g2000cwb.googlegroups.com>   <1158401826.131193.54000@e3g2000cwe.googlegroups.com>   <1158419725.399846.65610@k70g2000cwa.googlegroups.com>   <1158420999.000849.10290@d34g2000cwd.googlegroups.com>   <1158424664.022587.300380@m73g2000cwd.googlegroups.com> <1158436853.002537.22720@d34g2000cwd.googlegroups.com>
NNTP-Posting-Host: ip010184.vpn.iai.uni-bonn.de
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: f1node01.rhrz.uni-bonn.de 1158489800 34044 131.220.10.184 (17 Sep 2006 10:43:20 GMT)
X-Complaints-To: abuse@uni-bonn.de
NNTP-Posting-Date: 17 Sep 2006 10:43:20 GMT
User-Agent: Thunderbird 1.5.0.7 (Windows/20060909)
In-Reply-To: <1158436853.002537.22720@d34g2000cwd.googlegroups.com>
Xref: dp-news.maxwell.syr.edu comp.databases.theory:44955

pamelafluente@libero.it schrieb:
> Here one small example I can think of.
> The minimum number of table is probably 3 because:
> 1 table: it's impossible to duplicate values
> 2 tables in 1-N relationship: placing a function on the dimension table
>   would probably be too artificious an example
> 
> 3 tables
> --------------
> 
> ONE
>   |
> TRANS
>    |
> SUBTRANS
> 
> Table: ONE
> Name
> David
> Marshall
> 
> Table: TRANS
> Name        Item  Fact
> David	  IT2   1
> Marshall      IT1   1
> Marshall      IT1   1
> 
> Table: SUBTRANS
> Item   SubFact
> IT1        1
> IT2        1
> IT2        1
> IT2        1
> 
> Report:
>        One.[Name] AS [Name],
>        count(Trans.Fact) AS [FactCount],
>        count(SubTrans.SubFact) AS [SubFactCount]
> 
> 
> David has 1 fact and Marshall 2 facts.
> 
> "questionable" select:
> 
> SELECT
>        One.[Name] AS [Name],
>        count(Trans.Fact) AS [FactCount],
>        count(SubTrans.SubFact) AS [SubFactCount]
> FROM
>        One,
>        Trans,
>        SubTrans
> WHERE
>        One.[Name] = Trans.[Name] AND
>        Trans.Item = SubTrans.Item
> GROUP BY
>        One.[Name];

You actually have at least two problems:

1. One is that you need to write a query that produces a correct result 
set. Repeated or replicated records is not a problem as well as 
aggregate functions which are sensitive or not sensitive. So the 
systematic method is just to write a correct query which returns 
precisely what you expect. In your examples one approach consists in 
using *nested* queries. An attempt to use a "concise" form like this one

One.[Name] = Trans.[Name] AND Trans.Item = SubTrans.Item

is naive and will not produce what you need. Unfortunately, SQL is not 
very suitable for this task. I find it easier and more "systematic" if 
you formulate one concrete task:

- how to get all records from a direct or indirect subtable (SUBTRANS in 
your example) referencing a selected subset of records from the main 
table (ONE in your example)

It is one of the main procedures needed for analytical reports and if 
you have it (if you can implement it) then everything else is more or 
less easy. You simply write:

REPORT (primary table: ONE)
Field 1: Name
Field 2: count( deproject("TRANS->Name","Fact") )
Field 3: count( deproject("TRANS->Item->Name","Fact") )

Here function deproject takes two arguments:
- a string starting with the name of some subtable followed by a 
sequence of column names leading to the main table.
- column name from the subtable


2. The second problem is about translating a user-friendly specification 
of the report to its formal specification. In particular, you wanted to 
translate a list of arbitrary fields taken from any table to a formal 
specification of the expected (meaningful) result. (How to express this 
formal specification in SQL or any other language is already another 
problem.) This problem already can be characterized as a theoretical and 
requiring a systematic approach. In contrast to the first problem above 
it is more difficult and cannot be solved by simply encoding something 
in SQL.

> -P
> 

--
http://conceptoriented.com