Re: Surrogate primary key plus unique constraint vs. natural primary key: data integrity?

From: Jan Hidders <hidders_at_gmail.com>
Date: Fri, 15 Mar 2013 01:03:43 +0100
Message-ID: <5142655f$0$599$e4fe514c_at_dreader34.news.xs4all.nl>


On 2013-03-14 10:55:42 +0000, robur.6_at_gmail.com said:

> On Thursday, March 14, 2013 1:38:36 AM UTC+2, Jan Hidders wrote:

>> Sure. But that does not mean that all arguments in favor of surrogate> 
>> identifiers are therefore complete nonsense, even if most of the 
>> people> that favor them tend to be crappy data modelers with a weak 
>> grasp of> the importance of having proper integrity constraints. :-) 
>> I'd like to> see a bit more precise analysis on when and why they are a 
>> problem,> rathern then a naive sweeping generalization that declarese 
>> them as> always evil.
>> 
>> -- Jan Hidders

>
>
> I summarized my pros and cons on surrogates:
> Pros:
> - Stability
> - "Convenience" to use
> - Performance (?)

Doubtful in my experience. Programmers tend to think that since the join condition becomes simpeler the joins become faster, but that is usually not the bottle neck. There can be a observable gain if the size of many records decreases, which might typically happen for cross-reference tables representing relationships. But significant gains are rare in my experience.

> Cons:
> - Difficult to synchronize with external systems.

Not so much if there is also a natural key.

> - You need more joins (performance?)

Really? Why you think that? One typically would have the same number of tables in the schema, so why would we need more joins?

> - You lose the simplicity and power of overlapping composite foreign
> keys constraints (discussed in this thread).

Using surrogate keys does not imply that you cannot use composite foreign keys.

> - In some cases you lose the ability to manipulate data as a set (see bellow*).
>
>
> * If you need the newly allocated value you have to process data one
> row at time and use some system functions such as "get_next" and
> "get_current". Things get complicated if you need the value twice,
> because you have to use "get_next" first time and "get_current" further
> on. Since you can’t control the order of evaluation of expression terms
> you have to rely on some procedural language.
>
> If you have an alternate natural key, you can manipulate data as a set,
> but have to do it in two steps, first insert, then retrieve the
> surrogate values by using the natural key. In practice you may also
> need a temporary table to avoid constraint violations.

Fair enough, although set-operations are rare at insertion-time in my experience. But indeed, if you are inserting two linked records you have to do some extra work. Could still be done in just two INSERTs, though. :-)

> As can be seen, two "pros" are disputable, and two "cons" are very
> subtle (and many users may never encounter in practice). So ultimately
> is a matter of tradeoff.

Indeed it is.

  • Jan Hidders
Received on Fri Mar 15 2013 - 01:03:43 CET

Original text of this message