Re: foundations of relational theory?
Date: 20 Oct 2003 21:27:55 -0700
Message-ID: <6db906b2.0310202027.58324c36_at_posting.google.com>
"Paul Vernon" <paul.vernon_at_ukk.ibmm.comm> wrote in message news:<bn15ap$g1a$1_at_gazette.almaden.ibm.com>...
> "Dawn M. Wolthuis" <dwolt_at_iserv.net> wrote in message
> news:6db906b2.0310191644.13b47642_at_posting.google.com...
> [snip]
> > While this is not an academic statement of why PICK works nicely, it
> > might give some hints on why folks who have put their dollars into the
> > SQL-based RDBMS world can become born-again when they see the
> > difference in dollars needed for a comparable PICK system.
>
> Mind if I take it as an academic statement?, and use it to give hints at why
> an MV system will be less useful than a relational system everything else
> being equal.
Also, since IBM has the top-selling MV databases, I hope there are some good IBM discussions on the relative merits of the various data persistence approaches that IBM now sells. I would LOVE to see a breakdown of cost and revenue for IBM on each of its databases. Where would DB2 be compared to Universe, for example? How about IMS? Informix? Is the better bang for the buck for IBM in the relational model or in the non-RDBMS?
>
> > Again, it is not that PICK is flawless (by any stretch), but as a
> > basis for moving forward, I'd sure rather start with this big bang for
> > the buck implementation than any SQL-based RDBMS I've seen. And, yes,
> > I know this is a theory forum and not necessarily for opinions based
> > on practice, so I'll go back to the claim, even though not fleshed
> > out, that persisting data based on language -- such as modeling entire
> > propositions together rather than piecing them apart to the extent
> > done in an RDBMS -- makes sense because we are not trying to persist
> > mathematical relations ultimately -- we are trying to persist
> > propositions.
>
> Guess what, "trying to persit propostions" is *exactly* what the relational
> model is all about. A tuple (row) is a fact - a proposition.
>
Yes -- did you think I pulled that phrase out of my never-studied-this-subject-before-spouting-off-on-it head? smiles. The tuples in 1NF rows, however, are typically pieces of a proposition that has been split apart in order to normalize it.
> > For example, "Jane Doe has three kids -- John, George, & Paul -- and
> > also three cars -- a 1967 Mustang Fastback and a 1968 VW Bug, but the
> > car she usually drives is her other car -- a 2002 Ford Thunderbird".
> > A lousy sentence, but easy to image on a form. This sentence/form is
> > about a single person -- Jane Doe
>
> Any why, prey, is that sentence not also about the person John (or George or
> Paul)? Or equally it is not about 2002 Ford Thunderbirds?
I'll answer with a question (as obnoxious as that can be) -- why is the form that Jane Doe filled out filed under Jane Doe in a paper system -- just because we don't want to use a scissors to cut up the form and file different pieces in different places? Sure, there could be questions that would cause us to have to walk through each form in each folder, but with computers, we can actually cross-reference all this -- indexing the children if useful, for example. So, yup, there are queries that might take a different tact on the data than the way we captured the data, but for the most part, filing the form that Jane Doe filled out in the Jane doe folder, in the PEOPLE filing cabinet would be a very logical thing to do, don't you think? It seems it would be more logical to the average human being (and, yup, that's me) than sticking the car information in another filing cabinet and writing "see car mnop in the auto filing cabinet" on Jane's form. There might be times when this makes the most sense, but doing this with every bloomin' piece of data that Jane filled in if the cardinality of the data COULD be greater than 1 -- that is surely not intuitive, is it?
>
> The point of the relational model is that it is *democratic* - i.e. all data
> is treated equally. Cars have people, people have cars. We do not bias
> ourselves one way or another. Children are no more (or less) important that
> Parents.
Yup, that is definitely a good point -- the relational model does not retain the same value for the various pieces of propositions that the person stating the proposition did. It elevates less important information to the same status as the subject of a proposition, for example. If Jane Doe fills out a form, then "Jane" simply is more important data than the make of her third car. Relational databases have a skewed perspective on political correctness. Jane's children are important as they relate to Jane, but we don't have any reason to believe at this point that they have importance (to our org) on their own. If or when we do, then they can have their own forms filled out, but until then, keep the original proposition together.
>
> We would say that your single sentence above is in need of normalisation
> precisely because it favours some data over some others.
> Now if some data is truly and always more important than some other data, so
> that say you are only ever interested in the *set of kids* that a person
> has, then sure model them as a set valued attribute, but otherwise go the
> extra mile and make kids (and cars) first class citizens.
why? Just 'cause it sounds politically correct? Because if we service customers today, we might also think of their car on an even plane with them when we start servicing cars in the future? The idea might be to protect us from any future changes to applications by treating all data as equal, but then we find that we still need to make changes to the database -- we simply cannot guess every possible change we might have, so making it easy to change (providing an agile database management environment) is a better strategy than trying to fix all data as if there is some pure understanding of that data that we can capture, outside of any context.
Also, in a 1NF data structure, attributes whose values could be many are treated VERY differently than those that we think will remain single-valued. Why such discrimination in a democratic data society?
> > and would all be filed in a single
> > "folder" in PICK except, possibly, for code files used for validation
> > -- there could be a code file for makes & models of cars.
> > Depending on the application, it might make sense to have 4 "forms" in
> > this folder -- one for each person: Jane and each of her kids or we
> > might decide it isn't important to treat the kids as separate "filed"
> > persons in our system at this point. If they are filed as separate
> > people, then the Jane Doe record (form/document/proposition) would
> > have a multivalued foreign key (pointer) to each of the child
> > (literally!) records, else it would store the actual data, such as
> > first names of each child.
> >
> > Playing the game with language to parse it out, split it into many
> > fragments (often based on the nouns) for the purpose of storing it,
> > only to need to retrieve it again as a whole, doesn't gain us
> > anything, on the face of it.
>
> It gains your data a flat playing field. It allows you to formulate queries
> about (in you example) people who happen to be kids as easily formulating
> queries about people who happen to have kids.
> The relational model encourages this flat, democratic, playing field.This
> ideal has sometimes caused people to say that non-flat data (to speak very
> loosely) has no place in the relational model. I.e. that 'multi-valued (e.g.
> relation/set valued) attributes are not allowed.
> Nowadays we (and I think I speak for most relational advocates here) allow
> relation valued attributes in the relational model. SQL implementations
> however, I hardly need say, have (mostly) not caught up.
I'm glad to see you are one of the new relational proponents who recognize relations as possible attributes. SQL-99 has provisions for that, but it seems that very few of those who have SQL-92 capatibility care about upgrading. Now, take your extended relational model (with relations as possible attributes) and add in a language that that works with it and toss out the standard that thinks attributes should be fixed in length. You might want to also make the relations more like mathematical relations by giving each attribute a location within the "row". Then you are getting very close. But when I see XML docs persisted in Oracle as BLOBS, it almost makes me weep (that might be a hormone issue too, but ...). It is definitely the case that today the relational model is evolving faster than the MV model (and both need to), so maybe eventually the RDBMS will be where it needs to be, but until then, I'll still be frugal (aka stewardly) and put my money on the bigger bang for the buck approaches to data persistence.
>
> However, we would still generally caution against an over reliance on using
> 'multi-values' in a relation database. If used 'badly', we begin to loose
> the flat playing field. Relation valued attributes are OK. Short hands for
> 'multivalued foreign key' constraints would make me nervous, and
> (infinitely) recursive relation valued attributes seem to be particularly
> troublesome.
The ability to make changes when needed, minimizing costs & risks in the process, is important. My experience with RDBMS's (primarily Oracle, but brief encounters with others) is that such changes tend to be a big deal. Average cost of an MV shop to prepare for Y2K = ? Average cost of an Oracle shop = ??
> I would urge those who wish to understand more of these issues to read Chris
> Date's paper "What First Normal Form Really Means" (It will cost you
> though )
>
> http://www.dbdebunk.com/page/page/629796.htm
>
I would read it, but I've paid for several Date books and decided that instead perhaps he would like to exchange it for the opportunity to read my paper on "Why it Isn't Important to Persist Data in First Normal Form" (you're right -- it is still just in my brain)
>
> One thing the relational guys might learn from the experiences of MV
> systems, is a better idea of when relational valued attributes are OK and
> when they are not. I'm sure you guys know when MV attributes get misused for
> example.
>
Yes, I have seen some good dialog related to this topic. I would
guess that XML folks might have had such discussions as well.
However, I think that putting a relational valued attribute in an
RDBMS has risks and costs that are different from persisting relations
within records in an MV system. So, some tips should translate (such
as just how deep do you want to go -- designing with relations within
a record/row is OK and then refactoring when a change is necessary to
permit relations within relations within a row is acceptable, but
going deeper than that just doesn't tend to pan out well) while others
would not (make everything single-valued until you know for certain
that some of your data will require multiple values in that attribute
and then just switch the cardinality with no changes to anything else,
or with only a change to an input screen to change a single value to a
scrollable window of values, depending on how your data entry is
configured).
>
>
> > [I understand the purpose is to be able to provide a simple language
> > that enforces various data constraints, but I think that purpose is
> > somewhat flawed too. Protecting data is a noble goal -- controlling
> > data and programmers with fixed, global constraints isn't necessarily
> > the best way to lend such protection in my opinion, but that's a
> > tangent to this post, so later on that.]
>
> If a constraint is not global, then it is not, in fact, a constraint.
But what I have seen when going from one use of the data to another is that constraints seen as global because the first application saw them that way, can become local -- they are still constraints for the fisrt application, but not for the second. If all constraints are globally encoded at the database level, then local constraints need to be encoded in the application programming language. Why have two separate languages for constraints?
>
> E.g. these are global constaints.
>
> "WHEN (In Rome) DO (As the Romans)"
> AND
> "WHEN (Not in Rome) DO (As you like)"
>
> The 'local' constraints
>
> "DO (As the Romans)"
> AND
> "DO (As you like)"
>
> mean nothing (i.e. they always evaluate to TRUE, they constrain nothing).
When selling products in product line XYZ, then all sales must be to people in the US so assume that is the country and validate the postal code against US zip codes (and, good deal, our database permits us to identify the table against which to edit this attribute), but when selling new product line ABC, we can take in a country code too and the postal code should now not be constrained to just the US, although in the first application it should be. Now just how should we change the database and how should we change the first application? We KNOW that this type of scenario occurs over and over again. If all of the constraints about the data are in a programming language used by applications, then the first application can be left in tact and the new application can use different code than the first to handle the fact that what was once thought to be a global constraint is now known to be a local one.
>
> Regards
> Paul Vernon
> Business Intelligence, IBM Global Services
Thanks, Paul -- I enjoyed the discourse. --dawn Received on Tue Oct 21 2003 - 06:27:55 CEST