Re: Normalization, Natural Keys, Surrogate Keys

From: Galen Boyer <galenboyer_at_hotpop.com>
Date: 17 May 2002 08:49:20 -0500
Message-ID: <ud6vulwdb.fsf_at_grossprofit.com>


On Fri, 17 May 2002, comedyharris_at_hotmail.com wrote:
>
> "Galen Boyer" <galendboyer_at_yahoo.com> wrote in message
> news:uit5n9t61.fsf_at_rcn.com...

>> On Fri, 17 May 2002, comedyharris_at_hotmail.com wrote:
>>
>> > 1. By making the parent id a foreign key, it seems like
>> > you're losing information in the design. More specifically,
>> > a child is related to a parent, but not dependent on it for
>> > it's existance. So here, the business requirements are not
>> > well comunicated through the design?
>>
>> I'm don't understand how this is different with natural keys?

>
> The example that Pablo gave was looking at replacing a natural
> composite key with a single surrogate key (whilst still keeping
> the composite key columns in, but with a unique constraint
> rather than a primary key constraint). I'm just saying that the
> composite key is useful for indicating that there are
> dependencies between entities. For example:
>
> Customer
> ======
> (PK) customer_id
>
> Order
> ====
> (PK) customer_id
> (PK) order_no
>
> rather than
>
> Order
> ====
>
> (PK) order_no
> (FK) customer_id
>
>
> Here we can see that an order doesn't just have to be related
> to a customer, it can't exist without the customer existing
> since the customer_id is part of it's primary key. I agree that
> this approach is questionable though.

I don't consider it questionable, at all. I just find that if I need a table to be a parent of another and the PK is multiple columns, then the actual keys to reference get unwieldy.

> If the order_no is unique, then this could be made the PK, with
> a unique constraint put on customer_id and order_no.

Well, in an order table, I would assume one customer, so the order_no would be the unique key, but I would probably put a NOT NULL constraint on the foriegn key cust_id. Therefore, that relationship is then guaranteed.

> One approach is to opt for a surrogate key if there are more
> than one combination of columns that could be a composite key.
>

>> I've found that most developers, me included, know nothing of
>> the natural keys (at least when we start our projects).  The
>> people that know the natural keys are the client.  If I start
>> off the project with the design philosophy that each table has
>> a single surrogate key and its name is very similar to the
>> table, then as we move through the project, when we talk
>> entities we can immediately envision and code the SQL.  We
>> need to join customer_table to address_table?

>
> I don't know. The votes still out with me, but I'm not sure I
> like this this 'automatic' use of surrogate keys.

Yeah, some do, some don't.

> I suppose the main reason for me is that I associate it with
> sloppyness, which might be a little unfair.

Hm... I could certainly see how it could be a way for someone to be sloppy and get away with it. I started using it because I felt it was the easiest way for me to talk with the OO guys. I had an entity and it had an id. As it started being used in development and it seemed easier to talk and think about, in my opinion. I'm developing against a natural key'd database now, and I always have to revisit the keys to make my joins. I'd rather define how the uniqueness gets maintained by the app, make sure that is solid, but when just putting tables together, especially when I'm just trying to pass through to another table,... You say that once the SQL is coded ..., I say once the uniqueness has been taken care of by the app ...

> I've seen people create autonumber keys everywhere because they
> can't be bothered to do any analysis, and they just need a key
> on the column so they can write their next line of
> code. However, I *can* see that there are clear benefits to
> this practice, but providing you do it as a design decision
> rather than a lazy kludge!
>

>> We then keep the natural keys around for defining the actual
>> unigueness and for display in the app so the customer can
>> understand what she is seeing.

>
> This sounds reasonable, do you put unique constraints on your
> 'candidate' natural key?

This has to be a requirement, or it is just a lazy kludge. I also sprinkle in NOT NULL as much as possible, of course when it is appropriate.

The app also has to take care of these keys. They are how the client views the world. This has got to be bullet-proof.

>> But, all of this surrogate talk would mean nothing if we
>> couldn't come up with the natural key when asked.  The end
>> user really matters.  I just like to try and make it as easy
>> as possible to get the end user a solid app in the time frames
>> given.  Lots of keys required to make a join happen is just
>> another roadblock in the way of achieving that goal.

>
> Hmmm. It becomes a little inpractical. I always think 'how many
> times am I going to write this SQL?' Usually, once it's done,
> it's done.

Not in development. I'm constantly joining tables to find out why something is behaving the way it is. It is this constant day-to-day on-the-spot SQL authoring that I'm trying to speed up. I can't tell you have many times I've rewritten my daily on-the-spot queries against these natural key'd tables I'm working against on my current project. All I want to know is some field's distribution in some table, and I need to join a few tables to get it because I'm starting with information at some other level. With natural keys, this researching process just takes longer. I have to include the correct natural keys in the join and I sometimes get that wrong. When I just have one key, I never get it wrong. Multiply it by everybody researching why the app is behaving the way it is, or some proc doing such and such. They also usually have to wait on SQL. These incorrect queries usually are doing some cartesian product, so the wait is longer. The person then has to analyze the output and realize he left something out. By the time all that is done, he might have seen 20 minutes or more go by.

> If you use views, you may only have to write your awquard joins
> once for selects.

Yeah, but I only code views for the app. I've never worked where we had views just for development. I've tried to make that work, but I end up always going after the underlying tables anyways. If something isn't working, I usually don't trust the view either.

> Incidentally, I've also use the surrogate key approach
> sometimes, but I'm just getting opinions as to whether it's the
> best way.

-- 
Galen Boyer
Received on Fri May 17 2002 - 15:49:20 CEST

Original text of this message