Re: teaching relational basics to people, questions
Date: Fri, 4 Dec 2009 11:35:27 -0800 (PST)
On Nov 21, 6:24 pm, Roy Hann <specia..._at_processed.almost.meat> wrote:
> The most persistent (and most common) reason people incorporate nullable
> columns into designs is because they have a misplaced desire to minimize
> the number of tables in the design, and think that conflating multiple fact
> types in one table is clever, efficient, and harmless.
(Apologies for the delay.) That is absolutely correct. That is the general mistake, and my particular example (validity time spans) was just one, if recurring, instance of the more general class. Here, too, we refuse to consider relations with and without an end date two separate types of objects, and then conflate them into one relation with nulls allowed in the end date.
I think there are two aspects to this problem. The first and obvious one is the psychological: people do have immense problems getting a hold on a data model with lots of separate entities. E.g. think of SAP or the foundation schemata Teradata peddles, with their thousands of relation types. Trying to sort through something like that, e.g. to update it, is a real pain and you can never be quite certain you've actually taken everything necessary into consideration.
But then there's the second, more subtle aspect: it is quite possible that the problem is in sparse metadata and/or in the substandard tools/ interfaces used to access it, so that it could in fact be solved with the right set of tools and methodology/process. I mean, envision the metadata heaven where we have all of the formal semantics of our data model neatly encoded within the database catalog, and further, we have tools which automatically, graphically, and in a procedure oriented way guide our way through all of the possible dependencies, remembering for us which ones we've already covered. In that case there is no possibility that you could have forgotten something once you're done. The hassle in learning the necessary semantic information, and *only* the necessary information, is minimal, because all of the formal aspects are recorded within the database and your tools show it to you on the fly, plus any linkages to the outside, semantic world are also documented in a human readable and easily accessible form.
In this ideal world, when you're about to handle one of those thousand relations, you just click a column in the database client and out pops a representation of the full generalization hierarchy of this particular type. Want a semantic description of your entity? One click more. Want the constraints? Another click, and voilà, it gives you every single one of them in a nice chart alongside the top and bottom parts of the type lattice, so that with one click more, you can get the human readable version which maps the constraint to the external model/business process equivalent that was used to derive the formal constraint in the first place. Want to see where the particular domain you're handling is used? One click. Want to see the domain definition itself? Click.
Want to find a suitable, existing domain for your new relation, beginning with a blank slate? Your database/client of course easily handles a mixture of both formal and free form queries, at the level of specificity that rivals Google, so that when you put in something like "person company (cardinality>10) (last access within two years) (list all relations with conflicting keys)", in a couple of seconds you can find every single existing relation in the system which might have something to do with your problem domain, in a well-ordered query result form. Plus of course you should be able to dig in from the perspective of a humanly composed ontology, or three, of everything that is in the system, ODP style.
That sort of thing is of course a pipe dream at the moment, but then quite a lot could already be done to organize database schemata using existing technology. I mean, even a freely manipulable selection list of "my relations" which retains a history of every relation you've touched, with metadata on what you have done to them, would be a real asset. Or when we talk about this typical case of relations related by contraction of nulls, a UI side tool which lets us work with views of otherwise identical relations, some of which contain "null, value not applicable" values while updating multiple underlying tables/views, would take 95% out of the steam to not work with more than one relation at the same time.
So I think this thing is mostly about UI design and formal conceptual organization of the data model, and not about any intrinsic cognitive limitation (which do exist as well: I for one don't think there is a single person who could reliably spot an irreducible, ten component decomposition into PJ/NF).
-- SampoReceived on Fri Dec 04 2009 - 20:35:27 CET