Re: Dreaming About Redesigning SQL

From: Mike Preece <michael_at_preece.net>
Date: 4 Nov 2003 17:53:37 -0800
Message-ID: <1b0b566c.0311041753.1b809532_at_posting.google.com>


andrewst <member14183_at_dbforums.com> wrote in message news:<3556922.1067944594_at_dbforums.com>...
> Originally posted by Mike Preece
> > > > Do you agree that a list of a person's phone numbers can and should be
> > > > considered an attribute of the person - just as a single number is -
> > > > provided an acceptable means exists to access all "rows" containing a
> > > > given phone number?
> > > No, I don't. Well, perhaps I am just being contrary! But if you have a
> > > list of phone numbers for a person, you probably need further attributes
> > > to differentiate them, e.g. an indication of whether the phone is home,
> > > work, mobile; or whether one phone is best for day time and another for
> > > evenings. So if this is a single attribute, it is actually a relation-
> > > valued attribute with columns phone_no, phone_type etc. However, I
> > > would prefer to normalise this data out into a separate table as it
> > > actually makes it simpler to query in a relational database(!)
> > OK. This is how it could be done in Pick:
> > We start with:
> > File: People
> > ID: Person
> > 001: First Name(s)
> > 002: Family Name
> > 003: Date of Birth
> > 004: Phone number
> > We can use exactly the same database design and begin to add second,
> > third... any number of phone numbers to each person. I should perhaps
> > have pointed out that there is no need to reserve nulls for everyone
> > else if just one person has more than one phone - as there would be in
> > a relational table.
> > Now then, suppose we do want to add "an indication of whether the
> > phone is home, work, mobile; or whether one phone is best for day time
> > and another for evenings". We could simply expand the existing file as
> > follows:
> > File: People
> > ID: Person
> > 001: First Name(s)
> > 002: Family Name
> > 003: Date of Birth
> > 004: Phone number(s) (multivalued; controlling)
> > 005: Phone type(s) ('H'ome, 'W'ork, 'M'obile) (multivalued; dependent
> > on Phone number)
> > 006: Preferred usage period(s) ('D'aytime, 'E'vening) (multivalued;
> > dependent on Phone number)
> > There is still no need to go to a separate table for this data. It
> > might be worth reminding you that we *could* store this additional
> > data in a separate file. With the Pick model you have more flexibility
> > - more options. You can maintain a many:many relationships as a
> > separate table, as a list of phones for a person or as a list of
> > people against a phone number. You can also store data associated with
> > a person's phone against that person - as in this example. It very
> > much depends on the requirements as to which option is the best fit.
> > Relevance - the real world requirement - is important, as is the
> > relatedness. Disregarding relevance and on concentrating simply on
> > relatedness is *not* the best way to go. It is perhaps also worth
> > pointing out that, by storing this data as variable in both length and
> > number of occurances for each individual, far less disk space is
> > required on the physical level. Also, being able to access all of the
> > data relating to a person's phone numbers, type of phone & preferred
> > usage in one 'hit' results in optimum performance.
> > Do you see why I say the relational approach is a "she'll be right"
> > approach in comparison - and definitely not the best tool for the job?
> No, I totally disagree with you on your last statement.

You know - I kinda thought you would.

> You are
> completely and utterly wrong there, sorry! You have several
> misunderstandings of the relational model within the argument above.

OK. I don't think I'm wrong on the important points I made but I'll admit that I might have some flaws in my understanding of the relational model. Let's talk about them. I am convinced Pick is a better model for the reasons I gave. Now I expect good arguments from you to show how and why you think relational is better. That's the point of this exchange after all.

> First of all, as Bob pointed out yesterday, the relational model does
> allow for relation-valued attributes, so that all the phone numbers for
> a person could be stored within the single person record. However, that
> would not be the "traditional" relational approach to this.

Why not? Is there something in relational theory that says that would be wrong in some way?  

Could or would you also store data (type & usage) associated with each person's phone number in a relation-valued attribute within the Person's record?  

> The "traditional" relational approach is, of course, to have a separate
> table for the phone numbers, like this (NB I will use SQL syntax; SQL is
> not trully relational, but never mind):
> CREATE TABLE person_phones
> ( person_id INT
> , phone_no VARCHAR2(20)
> , phone_type VARCHAR2(1) CHECK (phone_type IN ('H','W','M')
> , preferred_usage VARCHAR2(1) CHECK (preferred_usage IN ('D','E')
> );

Yes. I understand that. I don't think it is always the best way to do things though.

> Now, if a person has 1 phone, we will insert 1 record into this table;
> if she has 2 we will insert 2; if none, we will insert none. There is
> no need, as you say there is above, to "reserve nulls for everyone else
> if just one person has more than one phone". That would only apply if
> we had used a separate COLUMN in the person table for each phone type
> (home_phone_no, work_phone_no, mobile_phone_no) - which presumably would
> be the same in Pick if you chose that approach.

OK. I guess I should have said column instead of row. I'm not used to thinking in terms of columns and rows - especially as what would be a column in a table in relational terms equates more to an attribute in an item in Pick, and we see attributes as rows across the screen when we look at an item. It can get confusing.

It does hold true though that if we've only got one phone recorded against each Person and we add a second phone to one person, or record a type against one person's phone number, or anything like that, we don't have to change a single thing about any of the other items (or rows) in the Persons file (or table).

> In any case, those "reserved nulls" may well occupy NO space, e.g.
> in Oracle any trailing NULL values in a record are simply not
> stored at all.

That makes sense. I didn't know Oracle did this. See - I am learning. Thanks. Some flavours of Pick remove trailing null attributes and some don't. My generic write subroutine removes them - but that's beside the point. We're not here to discuss my own development framework unfortunately.

> (Aside: some people will splutter at me mentioning Oracle (or any other
> SQL DBMS) in a discussion about the relational model. Fair enough.
> The pure relational model is better than SQL, and SQL is better than
> the rest.)

And I just have to say that, from my admittedly limited understanding, they all seem to focus on relatedness and turn a blind eye to relevance.

> So your point about saving disk space is debatable at least.

Removing trailing nulls does not provide grounds for the above statement. I have yet to see any evidence or explanation to show how a relational database could possibly occupy less space than an equivalent amount of data in Pick - even if there was an abundance of numeric data. My experience tells me the reverse is in fact the case. How could it be otherwise when all data in Pick is variable length? If you have 35 characters for a surname field to store "Andrews" then in Pick it will take up 7 bytes (plus one for a system delimiter). When you think about all the extra tables and all the fixed length fields... I'm going to take a lot of convincing I'm afraid.

> In
> practice, I suspect Pick will indeed use less disk space than a DBMS
> like Oracle, probably because it dates from an era when disk space
> conservation was the holy grail.

Granted. The number of bytes is not so important now. It is still a factor though - not only for disk space but for other things like memory and transmission.

> However, if so it is not the
> relational model that requires more disk space, it is merely the
> physical implementation.

The difference in the logical models - if there is any at all - concerns the ability to nest tables without "references" of any kind to the nested data. We can, as I have said, implement a database design according to the relational model on a Pick system. We simply wouldn't use multivalues or subvalues. That would take up more space and be less efficient than it would need to be though, because of the extra tables. I've worked on an implementation like this and I found it a bit like being told to swim a length fully clothed.

> If minimising disk space is an issue, a DBMS
> like Oracle offers strategies for doing so, such as clustering
> (physically storing the phone numbers with the person record - without
> changing the logical 2 table design). Clustering also achieves the
> "person together with all phone data in one hit" requirement.

Hmmm... I think I'd like to understand the distinction here a little better. Perhaps if it's good enough for Oracle then maybe we could say that maintaining the many:many relationship and data associated with each phone in the Persons file using multivalues also retains the "logical two table design". Doesn't seem quite kosher to me though.

> As far as performance is concerned, a DBMS like Oracle has far more
> tricks up its sleeve than merely storing related data physically
> together. There is caching of recently retrieved data in memory;

Also in Pick.

> there
> are "index-organized tables" for which person_phones is a likely
> candidate;

Without knowing exactly what they are I would guess they're similar to Pick indexes based on correlatives.

> there are hash join algorithms used when the person and phone
> data for many people is required e.g. by a report or batch process.

I don't think we'd need this so much in Pick.

> At the end of the day, we are doomed to disagree.

Not necessarily. There are, I'm sure, many things we can agree about. You never know - maybe at the end of this I'll become a relationalist.

> You don't know how
> relational works, and I don't know how Pick works (beyond what I've
> learned from you).

So we 'talk'.

> But the idea that the relational approach is a
> "she'll be all right" approach is utterly absurd! I urge you to educate
> yourself about it and understand why. Unlike Pick, the relational model
> is not a product, it is a theory on which products can be based (wholly
> or partially); a theory whose purpose is to find the BEST way to store,
> manipulate and retrieve data.

How about if I tone it down and say instead it's an "any colour so long as it's black" approach? I have to say again that I understand that relational theory does an excellent job at determining the optimum database design for *related* data - *BUT* it ignores *relevance*. I think it is absurd to do so.

On reflection, we seem to have drifted off the "logical" differences into "physical implementation" issues. Can we return to the differences in the logical models. My argument is that Pick is more flexible in adapting to real world requirements - *and* that it is more efficient as a result.

Cheers
Mike. Received on Wed Nov 05 2003 - 02:53:37 CET

Original text of this message