Re: Design question regarding data typing

From: Jacob JKW <jacobcdf_at_yahoo.com>
Date: 24 Feb 2006 14:50:48 -0800
Message-ID: <1140821448.590154.109070_at_v46g2000cwv.googlegroups.com>

Bob Hairgrove wrote:
> On 24 Feb 2006 11:55:47 -0800, "Jacob JKW" <jacobcdf_at_yahoo.com> wrote:
>
> >Bob Hairgrove wrote:
> >> On 24 Feb 2006 09:39:43 -0800, "Jacob JKW" <jacobcdf_at_yahoo.com> wrote:
> >>
> >> >Sure. I was trying to explain that one column of my Insitutions table
> >> >was the foreign key InstitutionTypeId which indexes the name (and
> >> >description) of the type of institution (which in this simplification
> >> >can be either "school" or "bank"). I have this in addition to separate
> >> >Banks and Schools tables. The reason for using the InstitutionType
> >> >descriptor (or as I gather it's called from Roy's post above,
> >> >"discriminant") is so that if I want summary instituition data from
> >> >Insitutions I can easily figure out the institution type without having
> >> >to scan for each InstutionId foreign key of interest in Banks Schools.
> >> >And that's the kludge that's causing me such consternation. :-( It just
> >> >reeks of bad form.
> >>
> >> Yes, InstitutionTypeId is redundant data if you already have separate
> >> tables as described. I don't see the need for it unless for some
> >> reason you want to avoid the join or subquery which would be necessary
> >> without it should you want to group or sort by schools or banks. A
> >> simple natural join of two tables on their primary key columns is
> >> going to perform very fast in every RDBMS that I know of ... you'll
> >> never have to do a full table scan for that.
> >OK. But I don't actually see how I could accomplish this with a "simple
> >natural join". To me it seems I'd have to use a union. Let's say I've
> >got this:
> >
> >CREATE TABLE buildings (
> > building_id VARCHAR(100) NOT NULL,
> > building_color VARCHAR(100) NOT NULL,
> > building_type VARCHAR(100) NOT NULL,
> > -- other columns...
> > CONSTRAINT pk_buildings PRIMARY KEY (building_id));
> >
> >CREATE TABLE schools (
> > building_id VARCHAR(100) NOT NULL,
> > school_name VARCHAR(100) NOT NULL,
> > -- other columns...
> > CONSTRAINT pk_schools PRIMARY KEY (building_id),
> > CONSTRAINT uk_schools UNIQUE (school_name),
> > CONSTRAINT fk_schools FOREIGN KEY (building_id)
> > REFERENCES buildings (building_id));
> >
> >CREATE TABLE banks (
> > building_id VARCHAR(100) NOT NULL,
> > BIC VARCHAR(10) NOT NULL,
> > -- other columns...
> > CONSTRAINT pk_banks PRIMARY KEY (building_id),
> > CONSTRAINT uk_banks UNIQUE (BIC),
> > CONSTRAINT fk_banks FOREIGN KEY (building_id)
> > REFERENCES buildings (building_id));
> >
> >Now let's say all I'm looking for is a table of building_id's,
> >building_color's, and building_type's. That's just:
> > SELECT building_id, building_color, building_type FROM buildings;
>
> What does selecting the building_type column buy you here? Are you
> going to do a "group by" on it? Filter it as in "where
> building_type='school'"?
>
> >But let's say that I get rid of the troublesome building_type column:
> > ALTER TABLE buildings DROP COLUMN building_type;
> >
> >Then the simplest way I can think to accomplish the same thing would be
> >with the union:
> > (
> > SELECT buildings.building_id, building_color, "school" AS
> >building_type
> > FROM buildings INNER JOIN schools
> > ON buildings.building_id=schools.building_id
> > ) UNION (
> > SELECT buildings.building_id, building_color, "bank" AS
> >building_type
> > FROM buildings INNER JOIN banks
> > ON buildings.building_id=banks.building_id
> > );
> >
> >Which gets even more absurd if I have a large number of building_types.
> >
> >I guess what I'd *like* to have would be a building_types table which
> >along with a short text description of the specific building type
> >contained a pointer to the appropriate subtype table (banks, or
> >schools, or hospitals, or whatever). But I have no idea of this is even
> >possible in standard SQL.
>
> The union poses no real problems. I still think we need to know what
> purpose having the label "school" or "bank" will serve.
Let's say I'm looking for building-level attributes of some subset of buildings types (and understand that in reality there are more than just two different building types).

> Because as
> soon as you need to have just one additional column from the
> specialized table (i.e. schools or banks) in addition to the building
> data, you will have to write a join -- and that makes your
> building_type column instantly redundant.
Your point is very well taken here, but the business-level fact is that there would be no reason to simulatenously require "specialized table" attributes from more than one "specialized table". (Of course I might well need to determine the same building-level attributes of multiple building types.)

I guess the point I'm clumsily trying to make is that the building_type attribuite isn't there to avoid the JOIN when I'm trying to get specialized data on a single building type, it's there to avoid the UNIONs when I'm trying to get building-level data on multiple building types.

Anyway, this is why I think of the building_type attribute as a "good" thing.

> Besides, there are several other problems with having such a type_id
> column:
>
> (1) You cannot enforce the fact that a new building will get the
> appropriate type information inserted without using triggers. There
> are no built-in constraints that let you enforce something like: "IF
> EXISTS (record in schools WHERE new.building_id=schools.building_id)
> THEN (new.building_type := 'school') ELSE -- etc." Once it is in the
> table, you have to constantly watch over that column to make sure that
> no one is mischievous and updates it with the wrong value. Of course,
> you could use a before update trigger to accomplish that, or forbid
> all direct DML statements and provide an API with stored proceduresto
> do it. Sometimes, for certain situations, there is no other way. But
> it's a big hassle, hard to get right, and ... still, it's redundant
> data, regardless of how you look at it.
>
> (2) If there is a value in the building_type column, but no
> corresponding record in the appropriate table, you will get the wrong
> results if you rely solely on building_type! Using a join you will get
> consistent data. This is probably the strongest argument against
> having it.
You've definitely hit the nail on the head here. These two points sum up my issues with the building_type more accurately and coherently than I ever could. Those reasons are exactly why I'm trying to *avoid* the building_type attribute.

I'm just trying to come up with a way to address all these complications.

And thanks much your comments, Bob. Received on Fri Feb 24 2006 - 23:50:48 CET

This message: [ Message body ]
Next message: mAsterdam: "Re: Database design"
Previous message: Mark Johnson: "Re: cdt glossary 0.1.0 [Data]"
In reply to Bob Hairgrove: "Re: Design question regarding data typing"
Next in thread: Bob Hairgrove: "Re: Design question regarding data typing"
Reply: Bob Hairgrove: "Re: Design question regarding data typing"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message