Re: Are redundant fields ever appropriate?

From: --CELKO-- <71062.1056_at_compuserve.com>
Date: 21 Nov 2001 10:25:51 -0800
Message-ID: <c0d87ec0.0111211025.26b440c0_at_posting.google.com>


>> The join is implicit in the foreign key reference,.. <<

I think our problem is that you are talking about one particular product -- MS SQL Server -- and I am concerned about SQL the language and good design.

In some products, the joins between primary and foreign keys are actually pre-joined. You have an index that is made up of nodes that look like this:

 (index key value,
  pointer to primary key row,
  pointer to foreign key row in table #1,   ...
  pointer to foreign key row in table #n)

In Standard SQL, there is no such thing as an index, much less a clustered index.

>> Here's an example simplified from one I used recently with MS SQL
Server. It records Organisation Units as SDOU (Site/Domain/OrganisationUnit) using X.500 DNs, then records Computers belonging to those SDOUs, then records Occurrences of known Software objects on those computers. The catch is that the most important query is to summarise over a single SDOU. The query is easy to write without the non-normal SDOUID in SoftwareOccurrences, and fairly efficient to perform, but clustering the SoftwareOccurrences records by SDOU makes it more efficient. Clustering by ComputerID isn't as big a win. Any thoughts on this? <<  

>> Apologies for using SQL-Server extensions in the form of IDENTITY
fields - this reduces the keys to a manageable size. What equivalent support is there in other DMBSs (Oracle?) or in the SQL standard for this sort of thing (apart from allocating unique IDs manually)? <<

Real SQL has nothing like the IDENTITY column. It is not only proprietary, it is non-relational, not verifiable aganst the reality being modeled, and usually the result of a lazy database designer who did not want to deal with thinking about how to model his situation. I have a really long rant about the evils of IDENTITY I might post later.

There are some other matter in your DDL, besides the proprietary NULL constraints, etc. Tables are sets of things, so their names should be plural.
You have a OrgUnit_id and a Computer_OrgUnit_id; please explain what the LOGICAL difference between these attributes is. I do not think that there is one, so they should have same name. NEVER name an attribute after its PHYSICAL location in the schema.

Let me try to put this into a normalized schema. But you need some rules first. My guess is that:

  1. A computer is in one and only one organizational unit.
  2. A software package residences on one and only one computer.
  3. An Organizational unit can have multiple computers.
  4. A computer has one or more software packages.

I came up with a consistent format, shortened the names that seemed to be too long. I tired to imply that I had looked for industry standard codes.

CREATE TABLE SiteDomainOrgUnits
(org_unit_id INTEGER NOT NULL PRIMARY KEY,  org_unit_name CHAR(80) NOT NULL,

  • other columns);

CREATE TABLE Computers
(computer_serial_nbr INTEGER NOT NULL PRIMARY KEY,

  • other columns);

CREATE TABLE SoftwarePackages
(software_id INTEGER NOT NULL, -- is there a standard industry code??  software_name VARCHAR (80) NOT NULL,
 version VARCHAR(10) NOT NULL

  • other columns ...

CREATE TABLE Configurations
(org_unit_id INTEGER NOT NULL

          REFERENCES SiteDomainOrgUnits (org_unit_id)
          ON UPDATE CASCADE
          ON DELETE CASCADE,
 computer_serial_nbr INTEGER NOT NULL
                     REFERENCES Computers (computer_serial_nbr)
                     ON UPDATE CASCADE
                     ON DELETE CASCADE,
 software_id INTEGER NOT NULL UNIQUE
             REFERENCES SoftwarePackages (software_id)
             ON UPDATE CASCADE
             ON DELETE CASCADE,

 PRIMARY KEY (org_unit_id, computer_serial_nbr, software_id));

Notice that use of UNIQUE constraint on the software packages in the Configurations table, and the way the table takes care of itself with referential actions.

>> ... most important query is to summarise over a single
SiteDomainOrgUnit. <<

 SELECT org_unit_id,

        COUNT(DISTINCT computer_serial_nbr) AS computer_tally,
        COUNT(software_id) AS package_tally 
   FROM Configurations AS C1
  GROUP BY C1.org_unit_id;

Then I would worry about clustered versus unclustered, etc. Received on Wed Nov 21 2001 - 19:25:51 CET

Original text of this message