Re: Dreaming About Redesigning SQL

From: Mike Preece <michael_at_preece.net>
Date: 7 Nov 2003 19:46:17 -0800
Message-ID: <1b0b566c.0311071946.479f0135_at_posting.google.com>


andrewst <member14183_at_dbforums.com> wrote in message news:<3571843.1068213473_at_dbforums.com>...
> Originally posted by Mike Preece
> > > > Why were RVAs intoduced (or accepted) into relational theory in the
> > > > firs..., I mean, 10 years ago?
> > > If you really want to understand this,
> > Not really, to be honest. From what I've read I'm pretty sure we've
> > had a much better way of doing this in Pick for more than 30 years -
> > with multivalued attributes. I wonder where Date & Darwen got the
> > idea.
> > > read The Third Manifesto by C J
> > > Date and H Darwen. I haven't got it with me and to be honest, I can't
> > > remember why. I think it is more a case of accepting it as not
> > > subverting the relational model rather than advocating doing it.
> > I wonder how many more things they and others will come to accept in
> > time to come. I guess it might just help someone to come up with the
> > first "true" relational database. "It's relational Jim - just not as
> > we know it".
> As I have said before, the fact that RVAs are acceptable doesn't mean
> they should be used to model many:many relationships. I would only use
> an RVA if I considered the RVA data to be some kind of "blob" of data I
> had no more intention of interrogating in the database than I would a
> bitmap image. And Date and Darwen were responding to the "demand" for
> complex data types from the OO camp, not trying to replicate Pick. They
> say RVAs are "legal", they don't say they should be used instead of
> normalisation.

Why do they say they're "legal" now when they weren't considered to be 10 years ago?

How *could* they be "used instead of normalisation" and still be "legal"?

[snip]
>But what is "a0"? Why not "Name"?

a0 is a system default for the "id" (primary key) for a file (table). The Name is an attribute of an item in the Person file. For instance, "Mike" might be the item-id (a0) for an item where the Name attribute is actually "Michael".

[snip]
> > > So storing the data assymetrically has no impact on performance then,
> > > good or bad?
> > It's logical to store phone numbers against a person. It's also more
> > efficient.
> .. in Pick. It is not more efficient in an RDBMS. But anyway my point
> was that you are saying that the query by phone would also be "extremely
> efficient". But presumably not AS efficient as the query by name, since
> you say that is "more efficient".

If we have phone number(s) as an attribute of a Person and we want to get the Person with that attribute we have to use an index to get the item-id. It's still very efficient, but not as efficient as it would be if we have the item-id to begin with.

> So, you have:
> Query By Efficiency
> Name X
> Phone Y
> where X > Y
> Whereas in relational we have:
> Query By Efficiency
> Name X
> Phone Y
> and we can choose whether X = Y, X > Y or X < Y according to decisions
> made at the PHYSICAL level, which do not require any change at the
> LOGICAL level, i.e. the SQL for a given query remains unchanged.

As I've said before, we can also implement a PersonsPhones file in Pick. In that case we would index the file on the Person and also on the Phone number. It would be equally as efficient to get the Phone number for a Person as it would be to get the Person for a Phone number. If we think about this logically, it makes more sense to organise phone numbers into sets according to the person they relate to than it would to organise persons into sets according to their shared phone numbers. Have a look at a telephone directory. How is it organised? That's a physical representation of relationships between people and phone numbers - but it's also the most logical.

> You of course will contend that the Pick X is better than the RDBMS X,
> and I will disagree, but we can't really test that here can we?

I'd be happy to have a go. It wouldn't have to be anything grand. Maybe someone has an old 8Mhz PC-AT that we could set up to run some tests with 17 concurrent users updating and querying the DBMS. That way if there are any differences they would probably show up more.

> However, I will say that in Oracle (e.g.) it is perfectly possible that
> the query to list "Mike" and all his phone numbers can be achieved with
> 1 disk read, provided the data is clustered and fits into 1 block
> (typically 4K or 8K); or 0 disk reads if we read the same block earlier
> and it is still in the cache.
> > > > Let's look at the evidence:
> > > > We agree that the person is more relevant in the persons to phones
> > > > relationship.
> > > But we disagree over whether that is necessarily relevant to logical
> > > database design.
> > I thought we agreed that when we were concentrating only on logical
> > design.
> No, unless you mean you agreed that it is irrelevant.

Huh? I agreed *what* is irrelevant? Logically, it makes sense to organise phone numbers into sets for each person. Do you want to go back over the ground we covered in the lead-up to the "point of agreement"?

> All I have
> agreed, for the sake of argument, is that one relationship may be more
> relevant than the other to the USER.

Yes. I take that to be a logical argument.

> (I hope) I have never said that it
> needs to be taken into account during logical design. It does not.

Huh? Aren't you making a distinction between what is logical and what fits in with relational theory? The difference seems to me to be that to fit it into the relational model you have to discount relevance. Not entirely logical.

> > > We agree that it may be relevant to physical database
> > > design, but you don't seem to understand or accept that logical and
> > > physical design are different issues.
> > It just so happens that the physical implementation is a "natural" one
> > in Pick. If you decide it's logical to record phone numbers against a
> > person - then that's what you do in Pick and that's how it's stored
> > physically. I'm sorry if you folks have a hard time accepting that
> > that makes sense. Maybe it's because it's nigh on impossible to do in
> > relational without using RVAs.
> It just so happens that Pick has no concept of logical/physical
> distinction. That is Pick's handicap, not relational's. Sorry if you
> have a hard time understanding that. You have NO CHOICE over logical
> design: it HAS to correspond to the physical design you select for
> performance reasons; we have choice: our logical design is optimised
> for querying, our physical design is optimised for performance.
> Forget RVAs.

That's kind of "key" isn't it - that bit about the distinction between logical design being optimised for querying and physical design being optimised for performance?

By "querying" do you mean specifically SQL?

Suppose you had a more flexible way to query your data. If your applications could directly access your physical data without using SQL (or something similar), but could directly read, write and delete data, what would determine your logical design then? How would your logical design differ from your physical design? Are there some physical optimisation techniques that could be made redundant?

With Pick we can take any database design you care to come up with based on relational theory and implement it. We could also use SQL.

It makes sense (ie. is logical) to organise data into related sets. It makes even more sense to organise THOSE sets into sets according to their relevance. If you look in your phone book you will often see more than one phone number against a name. You will also see other, relevant, data alongside each name - things like address and an indication of which number is a fax and which is a cellular phone. The name *could* have been printed against every number - but that would be silly wouldn't it?

What choice is available in relational theory that is not available using Pick?

Let's talk about "handicaps". We would probably not choose to tie one hand behind our backs or swim a length fully clothed. That would be silly. We *can* use multivalues and subvalues but we don't have to - just as you can use RVAs but don't have to use them. It so happens that multivalues and subvalues are seen as a distinct advantage in Pick and a PITA in an RDBMS. Design in Pick transcends the boundaries imposed by the table/column/row schema - Pick adds a couple of levels and does so in a way that is entirely logical.

[snip]
> > List Persons "Mike" Name PhoneNumber PhoneType PhoneUsage
[snip]
> > What is the equivalent SQL statement to get the same information?
> SQL:
> select Name, PhoneNumber, PhoneType, PhoneUsage
> from person_phones
> where Name = "Mike";
> Slightly longer, I'll grant you.
> And no, we can't beat your single read of 2k from disk, but we can match
> it (see above).
> > Would the statement be identical whether the data is coming from the
> > PersonsPhones table or from RVAs in the Persons file?
> No, because those are 2 different LOGICAL designs.

I though they were two physically different files.

> But the statement
> would be identical regardless of any physical optimisations we might
> make such as:
> 1) physically cluster the phone data with the person (like your Persons
> file with the phone MV attribute)
> 2) physically cluster the person data with the phone
> 3) store the phone data and person data independently

Could you give me the SQL statement to query the Persons table using RVAs? Received on Sat Nov 08 2003 - 04:46:17 CET

Original text of this message