Table design question - variant record[sic], etc.

From: Martin Maney <maney_at_pobox.com>
Date: Mon, 17 Jan 2005 21:42:17 +0000 (UTC)
Message-ID: <cshbfp$1ev$1_at_wheel2.two14.net>



Digging through the last couple weeks' traffic from this group has given me at least some strong clues about the answers, but I'd like to run the former design past you. This is the "events" table from a scheduling system. It really does use quantized time - there's no one mapping from class period X into start and end times, as they change both from year to year and within the year when there's a half-day, assembly, etc. And it's not uncommon for the schedule for a given day to change after events have been scheduled (at least they can't seem to tell us which days the assembly will be on in time to avoid that).

This is implemented in Postgres, and as you can see some of this predates my getting it sorted out which data types (or names for types) were standard and which not (and I thought I'd banished all the int2s before now). "serial" is shorthand that both configures an auto-increment counter and makes it the default value for the column.

CREATE TABLE "events" (

"event_id" serial PRIMARY KEY,
"event_state_id" varchar(4) NOT NULL REFERENCES event_states,
"calendar_date" char(10) NOT NULL REFERENCES days,
"period" varchar(4) NOT NULL,
"facility_id" varchar(8) NOT NULL REFERENCES facilities,
"location_id" varchar(8) DEFAULT '',
"headcount" int2 REFERENCES headcounts,
"requested" timestamp DEFAULT CURRENT_TIMESTAMP,
"last_changed" timestamp DEFAULT CURRENT_TIMESTAMP,
"responsible_person" text NOT NULL,
);

[did I grab an obsolete copy of the schema? I'm sure responsible_person should be varchar, not text! Another item for my checklist...]

There may be a natural key, but at least once I've been forced by real-world usage to revise my belief about what it would be comprised of. I was surprised to find that there *can* be two classes scheduled for the same lab at the same time. It makes for a hectic period, but they will insist on doing that occasionally, so calendar_date, period, facility_id isn't enough; responsible_person would have to be added to it to make what might turn out to be, at least for as long as the scheduling system is in use, sufficently unique. So I think the serial-number as PK might be practical here - one pragmatic advantage it has proven to have is that it's a compact way to identify the set of events being edited (in the context of a RESTful web app).

Oh, and besides, the requests are sometimes made in one name and it later turns out that he just listed all the sections for the three teachers that teach sections of that course. For some reason this is mostly a problem with science classes... :-/

event_state_id is a text tag; the table it references serves both to define its domain as well as more verbose descriptions for display. facility_id, location_is, and, surprisingly, headcount are mechanically similar.

There are, it turns out, two different sorts of facilties we shall wish to track: what I call "fixed" (labs, sections of seating in the LLC) and "movable" (laptops that travel by the cart-load, LCD projectors, etc.) The design largely grew up when only "fixed" facilities were planned for, and the messy bit that follows arose out of trying to extend the model. I'm not sure which approach is less bad here...

As above is a sort of variant record (yes, actually variant row, and that's why I'm deeply suspicious about this); depending on the type of facility, either location_id (used w/movable) or headcount (probably only counted in labs, maybe in other fixed facilities) is meaningless.

The alternative is to have two tables - fixed_events and movable_events. The problem is that for a number of purposes we don't want them to be handled visbly differently. The canonical example is a teacher looking to schedule a class access for, say, internet access. They may have some preference between a lab versus getting the laptop cart to their room, but nearly allof them would want to see what was available based on "facilities that permit internet access", cutting across the fixed/movable line. So having them in separate event_* tables is at least a nuisance - applications have to deal with both tables and paper over the differences...

So I guess the two main issues are event_id vs. an awkward, compound key that may prove to be not quite unique in real world use, and the issue of a mildly variant row structure versus dealing with two almost identical tables (that will still have NULLs, thogh not quite so many overall).

-- 
To read a book, to think it over, and to write out notes
is a useful exercise; a book which will not repay some hard thought 
is not worth publishing.  -- Maria Mitchell
Received on Mon Jan 17 2005 - 22:42:17 CET

Original text of this message