Re: question about queries

From: James S. Wolfe 913-4379 <jim_wolfe_at_sra.com>
Date: 1996/11/25
Message-ID: <57cfbc$bom_at_inetserv2.fs.lmco.com>#1/1


In article <Pine.SOL.3.95.961118170129.24926A-100000_at_saltmine.radix.net>, Jason Grace <jgrace_at_saltmine.radix.net> writes:
|> Lets say you have 3 tables, table A table B and table C.
|>
|> Say table A has info for different id's such as salary, name etc, on a
|> one line per person basis. say for each line, there corresponds
|> multiple lines in table B for that id, and similarly for table c.
|>
|> Now say tables A B and C have values valA, valB, and valC respectively
|> as fields (for instance salary, payments), and I want to sum up these
|> values for a given month.
|>
|> What is the fastest most efficient way to do this correctly? Say for
|> id '01' John Doe, valA=500 and there are 3 rows in table B with valB=500,
|> and 10 rows in table C with valC =500 for this same guy. My result should
|> give :
|> sum(valA)=500, sum(valB)=1500, sum(valC)=5000
|> but the select statement:
|> SELECT SUM(VALA), SUM(VALB), SUM(VALC) FROM A, B, C
|> WHERE A.ID=B.ID AND B.ID=C.ID AND MONTH=..
|> would give the result of:
|> 30*500=5000, 30*500=5000, 30*500=5000 for valA, valB and valC
|> respectively.
|>
|> Summing each one individually or in subloops would take much longer than
|> one query, so what's a neat way to do it really quick? Thanks.
|>
|> Jason, jgrace_at_radix.net, http://www.wavecon.com
|>

I believe you have hit on one of the limitation of the relational model vs. other models, e.g., hierarchical. What you describe is a database design with two independent, multi-valued dependencies: A ->> B and A ->> C. The way that the relational model represents the independence of two values is to provide a cross-product of their values. Since B and C are independent of one another, a join of A, B, and C would produce results with a cross product of rows from B and C. One way you might address this is to construct a union of two queries, one for the join from A to B and one for the join from A to C:

select <columns from A>, <columns from B>, <nulls for columns from C> from A, B
where <join between A and B>
union
select <columns from A>, <nulls for columns from B>, <columns from C> from A, C
where <join between A and C

The other problem I see is what I call the hierarchical agregation problem. In your case, you have a numeric value in A, valA, that you only want included one for each row from A. You might try modifying the above as follows:

select <A columns except for Aval = null>, <B columns>, <C nulls> from A, B
where <join A to B>
union
select <A columns except for Aval = null>, <B nulls>, <C columns> from A, C
where <join A to C>
union
select <A.key, A.Aval>, <nulls for everything else>

The problem with the above query is that detail data for a given key value for A will be broken out, some rows return the B values, some return the C values and another will return the numeric A values. However, aggregate functions on those values should return the correct values. Hope this helps.
Jim Wolfe, SRA, Intl.
(visit web site http://www.sra.com for job opportunities) Received on Mon Nov 25 1996 - 00:00:00 CET

Original text of this message