Re: Dreaming About Redesigning SQL

From: Marshall Spight <mspight_at_dnai.com>
Date: Sat, 25 Oct 2003 04:14:08 GMT
Message-ID: <kMmmb.21671$Fm2.11400_at_attbi_s04>


"Mike Preece" <michael_at_preece.net> wrote in message news:1b0b566c.0310240634.24c30c30_at_posting.google.com...

>

> Can you tell me what exactly is the function of the central integrity
> enforcement in the DBMS for a single table with no parent and no
> child? What is required to be done in order to maintain integrity in
> this case?

Sure. Note that the complexity of the schema is independent of the need to enforce integrity centrally. (Although it is likely that the more complex the schema, the more integrity rules there will be.)

Also note that you've left it up to me to come up with a specific example, so I will.

Let's say you've got a hotel room booking table that participates in no relationships with any other table. (This example is somewhat artificial, but I'll try to make it as real as possible.)

Uh, attributes, let's see:

create table bookings
(
RoomNumber int, -- the room number at the hotel CreditCardNumber varchar(16), -- the credit card number to reserve the room Name varchar(200), -- the customer's name StartDate datetime,
EndDate datetime,
Occupants int, -- together with StartDate, determines price AmountCharged float -- float is a terrible choice for currency in the real world )

Integrity constraints are relevant everywhere we have an application that *updates* the database. If we just want to read, no big deal.

So let's say we have a big Java swing application for the front desk, a PHP web application for remote booking by our two booking agents, and a Python biller. (I am not necessarily endorsing any of these choices, but they do happen in the real world.)

Okay, so the most obvious kind of integrity constraint is the domain constraint for the columns. We want to ensure that the only room numbers that get used are actually ints. We wouldn't want to have a row with "hello world" as the room number, would we? That would certainly count as data corruption. Etc. etc. for the other columns: "hello world" is also problematic as a StartDate (although maybe it could be a legal Name. :-)

Other things we want to verify that are also column constraints:

Legal room numbers, which are in the range [100-130] or [200-220]. So you have a constraint

CHECK (RoomNumber >= 100 AND RoomNumber <= 130) OR (RoomNumber >= 200 AND RoomNumber <= 220)

You also want to be sure you have a valid credit card number. There are specific algorithms for this you can get from the credit card companies if they trust you.

CHECK ValidCreditCardNumber(CreditCardNumber)

and for good measure

CHECK !ReportedStolen(CreditCardNumber)

You want to be sure the name is filled in. (Omitted.)

You want to be sure the booking is sensible. Now we move into table constraints, because more than one column is mentioned. Just roughly:

CHECK EndDate > StartDate
CHECK EndDate > Now()

Also,

CHECK Occupants > 0
CHECK (
  RoomNumber >= 100
    AND
  RoomNumber <= 130
    AND
  Occupants < 4)
OR (
  RoomNumber >= 200
    AND
  RoomNumber <= 220
    AND
  Occupants < 6)
-- upstairs rooms are bigger

Note that this most recent constraint is redundant with the well-formed room constraint; that's okay.

CHECK AmountCharged >= 0. -- or whatever.

So, these declarative constraints are pretty nice. We just say what we want to make sure of, and behold! They are enforced.

If we don't have these enforced centrally, then we have to write that code in Java, again in PHP for the web app, and again in Python for the biller. You can't reuse the code, because each app is written in a different language, and each app has different update usage. (The booking app inserts; the biller updates, etc.)

If the constraints change, we have to remember where each of them is enforced, and make sure they are all always consistent, or else ouch.

Also, if we write them procedurally, we have to be sure to put the code for the check in the right place, each time. If we are sleepy and put the check immediately after the insert instead of immediately before, then ouch again. If we miss an update, then ouch again. Any of these mistakes is a hole though which data corruption can enter.

I hope this example has been very easy
to follow; I tried to make it quite simple and yet at least have some obvious
degree of utility.

Comments welcome.

Marshall Received on Sat Oct 25 2003 - 06:14:08 CEST

Original text of this message