Re: data modeling confessions, part 2
Date: Tue, 20 Oct 2009 08:56:57 -0700 (PDT)
A long time ago when I thought I already knew something about data but hadn't yet worked with databases -- telling in itself -- I had the opportunity to spread my views on file formats to a bunch of nerds. Of course I already knew that no such discussion would be complete without at least mentioning normalization. So I decided to illustrate it in passing by an example. Since obviously the goal of normalization is to remove redundancy, the first example naturally was pushing strings into a lookup table and linking to them via their index in it. Now, perhaps I could have gotten away with that had a gone on to actually factor something into 3NF, but no. That was the only example. I don't think Hail Maries quite cut that one, eventhough in a very limited sense the part about redundancy is true.
Half a decade later I was designing one of my first schemas. Now I actually knew what normal forms and integrity were about. But it seems old habits die hard. The result was a beautifully constrained and normalized base...with every single entity keyed by surrogate. I was happy and went my merry way.
A couple of years ago I decided to see how the base was doing nowadays. It was humming along nicely. The surrogates bothered me a bit, but since they did simplify a couple of practical things and indeed compressed the base (to its then whopping 20MB), I thought everything was fine. Until I realized two minor details. First, I hadn't been stern enough with the web developers, so they'd of course put my surrogates on the user visible reports, and now they'd leaked into the wild. Ooops. And second, it suddenly dawned on me why the rest of the stuff seemed to be in order: during development I had indeed been strict enough in some things. There was not a single null in the schema...nor a single table auto generating surrogates which wasn't also constrained by a unique key other than the surrogate.
There are no free lunches, but there is such a thing as karmic debt. I think that goes a long way towards explaining why I now work with the internals of a Siebel instance. In case you haven't heard, that's the product whose trainer asks on the top of the introductory course whether anybody present is a DBA. In case one is found, he's told he's not going to like what he's about to see. One of those things is that everything, and I mean *everything* is keyed by surrogates, and in one particular table not a single field besides that surrogate is declared NOT NULL. Of course, my debt being what it is, we're not quite done yet. You see, our setup is the central hub for a number of independent Siebel installations. We distribute, deduplicate and publish data between them. That means that each of the systems has its own, identically formatted, and overlapping in range surrogate for each of the objects. Since our system is supposed to have something to do with master data, it of course additionally generates its own, authoritative, master ID for each new object. Which isn't to say that some of the affiliate systems wouldn't have their own, autogenerated, user visible keys in addition to what Siebel does under the covers. The result is that the maximum number of separate autogen keys for a single object I've seen thus far has been in the vicinity of ten.
My title is global data administrator. It basically means I'm responsible for the quality and uniqueness of the data.
-- SampoReceived on Tue Oct 20 2009 - 17:56:57 CEST