Re: Design question regarding data typing

From: Bob Hairgrove <invalid_at_bigfoot.com>
Date: Fri, 24 Feb 2006 22:24:41 +0100
Message-ID: <1dtuv1hfn48t6hmf7ss2in2vl7187kivbk_at_4ax.com>


On 24 Feb 2006 11:55:47 -0800, "Jacob JKW" <jacobcdf_at_yahoo.com> wrote:

>Bob Hairgrove wrote:
>> On 24 Feb 2006 09:39:43 -0800, "Jacob JKW" <jacobcdf_at_yahoo.com> wrote:
>>
>> >Sure. I was trying to explain that one column of my Insitutions table
>> >was the foreign key InstitutionTypeId which indexes the name (and
>> >description) of the type of institution (which in this simplification
>> >can be either "school" or "bank"). I have this in addition to separate
>> >Banks and Schools tables. The reason for using the InstitutionType
>> >descriptor (or as I gather it's called from Roy's post above,
>> >"discriminant") is so that if I want summary instituition data from
>> >Insitutions I can easily figure out the institution type without having
>> >to scan for each InstutionId foreign key of interest in Banks Schools.
>> >And that's the kludge that's causing me such consternation. :-( It just
>> >reeks of bad form.
>>
>> Yes, InstitutionTypeId is redundant data if you already have separate
>> tables as described. I don't see the need for it unless for some
>> reason you want to avoid the join or subquery which would be necessary
>> without it should you want to group or sort by schools or banks. A
>> simple natural join of two tables on their primary key columns is
>> going to perform very fast in every RDBMS that I know of ... you'll
>> never have to do a full table scan for that.
>OK. But I don't actually see how I could accomplish this with a "simple
>natural join". To me it seems I'd have to use a union. Let's say I've
>got this:
>
>CREATE TABLE buildings (
> building_id VARCHAR(100) NOT NULL,
> building_color VARCHAR(100) NOT NULL,
> building_type VARCHAR(100) NOT NULL,
> -- other columns...
> CONSTRAINT pk_buildings PRIMARY KEY (building_id));
>
>CREATE TABLE schools (
> building_id VARCHAR(100) NOT NULL,
> school_name VARCHAR(100) NOT NULL,
> -- other columns...
> CONSTRAINT pk_schools PRIMARY KEY (building_id),
> CONSTRAINT uk_schools UNIQUE (school_name),
> CONSTRAINT fk_schools FOREIGN KEY (building_id)
> REFERENCES buildings (building_id));
>
>CREATE TABLE banks (
> building_id VARCHAR(100) NOT NULL,
> BIC VARCHAR(10) NOT NULL,
> -- other columns...
> CONSTRAINT pk_banks PRIMARY KEY (building_id),
> CONSTRAINT uk_banks UNIQUE (BIC),
> CONSTRAINT fk_banks FOREIGN KEY (building_id)
> REFERENCES buildings (building_id));
>
>Now let's say all I'm looking for is a table of building_id's,
>building_color's, and building_type's. That's just:
> SELECT building_id, building_color, building_type FROM buildings;

What does selecting the building_type column buy you here? Are you going to do a "group by" on it? Filter it as in "where building_type='school'"?

>But let's say that I get rid of the troublesome building_type column:
> ALTER TABLE buildings DROP COLUMN building_type;
>
>Then the simplest way I can think to accomplish the same thing would be
>with the union:
> (
> SELECT buildings.building_id, building_color, "school" AS
>building_type
> FROM buildings INNER JOIN schools
> ON buildings.building_id=schools.building_id
> ) UNION (
> SELECT buildings.building_id, building_color, "bank" AS
>building_type
> FROM buildings INNER JOIN banks
> ON buildings.building_id=banks.building_id
> );
>
>Which gets even more absurd if I have a large number of building_types.
>
>I guess what I'd *like* to have would be a building_types table which
>along with a short text description of the specific building type
>contained a pointer to the appropriate subtype table (banks, or
>schools, or hospitals, or whatever). But I have no idea of this is even
>possible in standard SQL.

The union poses no real problems. I still think we need to know what purpose having the label "school" or "bank" will serve. Because as soon as you need to have just one additional column from the specialized table (i.e. schools or banks) in addition to the building data, you will have to write a join -- and that makes your building_type column instantly redundant.

Besides, there are several other problems with having such a type_id column:

(1) You cannot enforce the fact that a new building will get the appropriate type information inserted without using triggers. There are no built-in constraints that let you enforce something like: "IF EXISTS (record in schools WHERE new.building_id=schools.building_id) THEN (new.building_type := 'school') ELSE -- etc." Once it is in the table, you have to constantly watch over that column to make sure that no one is mischievous and updates it with the wrong value. Of course, you could use a before update trigger to accomplish that, or forbid all direct DML statements and provide an API with stored proceduresto do it. Sometimes, for certain situations, there is no other way. But it's a big hassle, hard to get right, and ... still, it's redundant data, regardless of how you look at it.

(2) If there is a value in the building_type column, but no corresponding record in the appropriate table, you will get the wrong results if you rely solely on building_type! Using a join you will get consistent data. This is probably the strongest argument against having it.

(3) When a bank is sold, what happens to the building? Let's say the school district buys the building and it becomes a school. How much easier is it to keep things in order with the building_type column as opposed to without it?

I'm sure there must be more reasons...but I'm done for now.

--
Bob Hairgrove
NoSpamPlease_at_Home.com
Received on Fri Feb 24 2006 - 22:24:41 CET

Original text of this message