Re: Are redundant fields ever appropriate?
Date: 21 Nov 2001 10:25:51 -0800
Message-ID: <c0d87ec0.0111211025.26b440c0_at_posting.google.com>
>> The join is implicit in the foreign key reference,.. <<
I think our problem is that you are talking about one particular product -- MS SQL Server -- and I am concerned about SQL the language and good design.
In some products, the joins between primary and foreign keys are actually pre-joined. You have an index that is made up of nodes that look like this:
(index key value,
pointer to primary key row,
pointer to foreign key row in table #1,
...
pointer to foreign key row in table #n)
In Standard SQL, there is no such thing as an index, much less a clustered index.
>> Here's an example simplified from one I used recently with MS SQL
Server. It records Organisation Units as SDOU
(Site/Domain/OrganisationUnit) using X.500 DNs, then records Computers
belonging to those SDOUs, then records Occurrences
of known Software objects on those computers. The catch is that the
most important query is to summarise over a single SDOU. The query is
easy to write without the non-normal SDOUID in SoftwareOccurrences,
and fairly efficient to perform, but clustering the
SoftwareOccurrences records by SDOU makes it more efficient.
Clustering by ComputerID isn't as big a win. Any thoughts on this? <<
>> Apologies for using SQL-Server extensions in the form of IDENTITY
fields - this reduces the keys to a manageable size. What equivalent
support is there in other DMBSs (Oracle?) or in the SQL standard for
this sort of thing (apart from allocating unique IDs manually)? <<
Real SQL has nothing like the IDENTITY column. It is not only proprietary, it is non-relational, not verifiable aganst the reality being modeled, and usually the result of a lazy database designer who did not want to deal with thinking about how to model his situation. I have a really long rant about the evils of IDENTITY I might post later.
There are some other matter in your DDL, besides the proprietary NULL
constraints, etc. Tables are sets of things, so their names should be
plural.
You have a OrgUnit_id and a Computer_OrgUnit_id; please explain what
the LOGICAL difference between these attributes is. I do not think
that there is one, so they should have same name. NEVER name an
attribute after its PHYSICAL location in the schema.
Let me try to put this into a normalized schema. But you need some rules first. My guess is that:
I came up with a consistent format, shortened the names that seemed to be too long. I tired to imply that I had looked for industry standard codes.
CREATE TABLE SiteDomainOrgUnits
(org_unit_id INTEGER NOT NULL PRIMARY KEY,
org_unit_name CHAR(80) NOT NULL,
- other columns);
CREATE TABLE Computers
(computer_serial_nbr INTEGER NOT NULL PRIMARY KEY,
- other columns);
CREATE TABLE SoftwarePackages
(software_id INTEGER NOT NULL, -- is there a standard industry code??
software_name VARCHAR (80) NOT NULL,
version VARCHAR(10) NOT NULL
- other columns ...
CREATE TABLE Configurations
(org_unit_id INTEGER NOT NULL
REFERENCES SiteDomainOrgUnits (org_unit_id) ON UPDATE CASCADE ON DELETE CASCADE, computer_serial_nbr INTEGER NOT NULL REFERENCES Computers (computer_serial_nbr) ON UPDATE CASCADE ON DELETE CASCADE, software_id INTEGER NOT NULL UNIQUE REFERENCES SoftwarePackages (software_id) ON UPDATE CASCADE ON DELETE CASCADE,
PRIMARY KEY (org_unit_id, computer_serial_nbr, software_id));
>> ... most important query is to summarise over a single
SiteDomainOrgUnit. <<
SELECT org_unit_id,
COUNT(DISTINCT computer_serial_nbr) AS computer_tally,
COUNT(software_id) AS package_tally
FROM Configurations AS C1
GROUP BY C1.org_unit_id;
Then I would worry about clustered versus unclustered, etc. Received on Wed Nov 21 2001 - 19:25:51 CET