Choosing a surrogate key vs. a composite key - "holy war" issue or clear best practice?

From: <dananrg_at_yahoo.com>
Date: Sat, 28 Jun 2008 04:36:45 -0700 (PDT)
Message-ID: <9c004dc3-e8b9-4a4b-b4d9-78c2719225cd@m73g2000hsh.googlegroups.com>


Early in my career, I was accustomed to seeing simple, sequence- generated surrogate keys. Seemed clean and logical. Later in my career, I saw seemingly monstrous (to me) composite keys of up to 5 fields. Found them initially repulsive because they made queries difficult to read and write. I'm wondering if there is a clear best practice on choosing a surrogate key over a composite key--or if this is a "holy war" issue no one (collectively) currently agrees on.

Some pros to Composite primary keys:


  • They enforce uniqueness; meaning you could have a sequence-generated surrogate key, yet redundant rows may still creep into your table. They will be uniquely identifiable in that, yes, the surrogate key values are unique; but all other column values in > 1 rows could be identical.
  • Simpler in the aspect of not having to create, manage, and have programmers reference a surrogate key sequence
  • Can't think of any other Pros to composite keys. Anyone?

Some cons to Composite primary keys:


  • Composite PKs with a large # of columns make queries difficult to read and write
  • Composite PKs with a large # of columns makes referencing tables larger; e.g. a foreign key in a child table is a copy of a composite PK in a parent table. This means more data storage is required in referencing child tables and the burdens that presents.
  • Some applications, like ESRI's ArcGIS Desktop, seem not to be composite key aware, e.g. to those who know the product, you seemingly can't do a "relate" to any existing external tables that use composite keys; this makes database integration with legacy relational and current relational databases, difficult. This may be the case generally in OO programming and approaches and not at all unique to ESRI--the presumption that all primary keys are only ever composed of a single-column.

Some pros to surrogate primary keys:


  • Aside from the obvious ones, given the cons I've listed for composite keys, it seems you can have a surrogate key yet still enforce more than nominal row uniqueness by creating a unique constraint/index on the fields you'd otherwise.
  • Sometimes there's no good combination of keys for making a composite primary key

What have I missed in this debate?

I'm open minded. And though it seems I'm biased toward surrogate keys, I've found myself creating composite keys recently where either a maximum of two or three fields make a suitable composite key (there's a certain pleasure in not adding more columns than necessary; shaving Chris Dates' opponents' beards with Occam's Razor?) and/or the table I'm creating the composite key on is a "leaf" table--no other tables are likely to reference it as a parent (saving people the difficulty of reading/writing queries involving a composite pk).

If anyone knows of any good articles or book chapters fairly weighing the pros and cons of surrogate vs. composite keys, I'm all eyes. Would be interesting to see someone make the best case for one, then another author make the best case for the other.

Thanks.

Dana Received on Sat Jun 28 2008 - 06:36:45 CDT

Original text of this message