Re: Ambiguous column names in subselects

From: Malcolm Dew-Jones <yf110_at_vtn1.victoria.tc.ca>
Date: 17 Sep 2007 18:55:42 -0800
Message-ID: <46ef301e$1@news.victoria.tc.ca>

fitzjarrell_at_cox.net (fitzjarrell_at_cox.net) wrote:

: I wouldn't be including 'counter' in my result set since you're using
: it as a limiting condition on the number of rows returned, [...]

: SELECT a, b, c
: FROM (
: SELECT ta.a, tb.b, ta.c, ROWNUM as counter
: FROM A ta, B tb
: WHERE tb.c = ta.c -- proper join condition here, else cartesian
: product
: ORDER BY ta.a
: )
: WHERE counter <= 25;

As a general rule, I have to disagree with that.

Columns that are used for filtering should be left in the result set unless there is a need to remove them. This makes is easier for the original developer and subsequent maintainers to confirm the query is selecting the anticipated data (i.e. it is easier to confirm the query is correct).

Leaving the limiting data in the result set also makes it easier for a maintenance programmer to examine the data when creating ways to test any changes that are required.

Consider a query that has an over all structure like

	select columns,including,limiters
	from table(s)
	where 
		join conditions
	and major category conditions
	and specific conditions such as ID=:the_id

This query can be run with three different levels of specificity by cut and paste with no edits.

1 Cut and paste including just the join condition and you see all available data to confirm the join condition is correct (null data? outer joins? etc etc).

2 Cut and paste including the category test and you see all the data that would be useful for running tests.

3 Finally, run the entire query to ensure the data seen in step two is correctly filtered by the final conditions..

I'm not dogmatic about this, but I find that is a useful habit. A compromise when there are loads of columns and you really want very few, is to leave them in as comments. On a few occasions I have even found it useful to select everything and then wrap that in an outer query that targets the one column you want. Something like

	select max(the_date) into my_date from
	(
	select lots of columns used during development
	from tables
	where complicated calculation.
	-- order by the_date
	)

As a real example, in one application, on various occasions after an upgrade the customer questioned the result of a particular complicated data calculation. The date query is written in the form shown above. When the customer questioned the results it was trivial to run the inner portion of the query, with no changes required, and they could see all the involved data leading up to the final (maximum) date. That allowed them to easily confirm the calculation was indeed correct. I did not structure the query that way for them, but it was a useful side effect of my rule of thumb from above.

$0.10 Received on Mon Sep 17 2007 - 21:55:42 CDT

Re: Ambiguous column names in subselects - how resolve?