Re: SQL Bashing - the Sport of Champions!
Date: Wed, 18 Feb 2015 05:15:03 -0800 (PST)
Message-ID: <18295932-0c88-4038-b725-10bd95ceb339_at_googlegroups.com>
> On Wednesday, 18 February 2015 20:56:00 UTC+11, Derek Asirvadem wrote:
Dieter's correction has prompted me to examine this again.
> > Why SELECT DISTINCT but UNION ALL?
The database you are operating on is supposed to be Relational. Relational prohibits duplicate rows (note, rows, not records). SQL cannot force you to implement unique rows, same as awk cannot force you to implement Relational Keys that are not duplicated (awk is easier, though). Therefore the assumption is that the rows in your tables are unique. Therefore SELECT always shows exactly the rows you selected, including dupes, otherwise you would not see them, and DISTINCT is an option to tell it to suppress dupes.
The one and only method of eliminating dupes in the base relations (tables), to achieving that Relational demand, is to implement an unique index. It is not SQLs job to do that for you, just as it is not awks. Something that the Dates and Darwens and Abitebouls of the world simply do not understand, they need a wet nurse.
SELECT rows is not an aggregation.
For exactly the same reasoning (p1) ... but the context for UNION is different ... since UNION is an aggregation of sets, and you are supposed to have the SELECTs that feed it in good working order, etc, you don't want dupes. If you do have dupes, it is not a gross error, such as dupes in base relations (tables), but either (a) dupes in the SELECTS, ie. derived relations, xor (b) dupes across the SELECTs, which are much harder to control and eliminate. Aggregation and other operations would return the incorrect results if dupes were included. So the default is to exclude those dupes, and ALL is an option to show them.
Cheers