Re: Dreaming About Redesigning SQL

From: Steve Lancour <stevel_at_lancour.com>
Date: Fri, 31 Oct 2003 10:38:03 -0500
Message-ID: <BoSdncG2r-XbHD-iRVn-sQ_at_comcast.com>


Bob Badour wrote:

> "Steve Lancour" <stevel_at_lancour.com> wrote in message
> news:mKudnYI38-U5KTyiRVn-gw_at_comcast.com...
>
>>Bob Badour wrote:
>>
>>>"Steve Lancour" <stevel_at_lancour.com> wrote in message
>>>news:M_GdnSiEV8OzRAKiRVn-ug_at_comcast.com...
>>>
>>>>Bob Badour wrote:
>>>>
>>>>>Steve Lancour <stevel_at_lancour.com> wrote in message
>>>>>news:<KcWdnaY9u-hd1wCiRVn-vw_at_comcast.com>...
>>>>>
>>>>>>andrewst wrote:
>>>>>>
>>>>>>>Originally posted by Steve Lancour
>>>>>>>
>>>>>>>>In Pick I'd do it with two files (tables):
>>>>>>>>
>>>>>>>>Persons Table:
>>>>>>>>Marshall
>>>>>>>>Mike
>>>>>>>>
>>>>>>>>Phones Table:
>>>>>>>>123-4567
>>>>>>>>234-5678
>>>>>>>>222-2222
>>>>>>>>
>>>>>>>>So far, we've used the same logical structure.  Where in the relational
>>>>>>>>model you'd use a third table to represent the intersections of Persons
>>>>>>>>and Phones I would define a multi-valued attribute (column) in Persons
>>>>>>>>for Phones and in Phones for Persons:
>>>>>>>>
>>>>>>>>Phones attribute (column) in Persons records:
>>>>>>>>Marshall  123-4567]234-5678
>>>>>>>>Mike      234-5678]222-2222
>>>>>>>>
>>>>>>>>Persons attribute (column) in Phones records:
>>>>>>>>123-4567  Marshall
>>>>>>>>234-5678  Marshall]Mike
>>>>>>>>222-2222  Mike
>>>>>>>>
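[Aside for SQL readers: here is a minimal sketch of the three-table relational design being contrasted with above, using SQLite purely as illustration. All table and column names are my own, not from the thread.]

```python
import sqlite3

# Relational equivalent of the Pick example: the many-to-many link between
# Persons and Phones lives in a third table instead of a multi-valued column.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Persons (name TEXT PRIMARY KEY);
    CREATE TABLE Phones  (number TEXT PRIMARY KEY);
    CREATE TABLE PersonPhones (
        name   TEXT NOT NULL REFERENCES Persons(name),
        number TEXT NOT NULL REFERENCES Phones(number),
        PRIMARY KEY (name, number)
    );
""")
con.executemany("INSERT INTO Persons VALUES (?)", [("Marshall",), ("Mike",)])
con.executemany("INSERT INTO Phones VALUES (?)",
                [("123-4567",), ("234-5678",), ("222-2222",)])
con.executemany("INSERT INTO PersonPhones VALUES (?, ?)",
                [("Marshall", "123-4567"), ("Marshall", "234-5678"),
                 ("Mike", "234-5678"), ("Mike", "222-2222")])

# Marshall's phone numbers -- each fact stored once, queryable from either side.
rows = con.execute("SELECT number FROM PersonPhones WHERE name = 'Marshall'"
                   " ORDER BY number").fetchall()
print([r[0] for r in rows])   # -> ['123-4567', '234-5678']
```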
>>>>>>>>Referential integrity is the responsibility of the developer.  In a
>>>>>>>>poorly-designed system there may be any number of programs updating
>>>>>>>>these records.  Each such program must enforce the rules and any rule
>>>>>>>>changes must be made separately in each.  The opportunity for mistakes
>>>>>>>>leading to corruption is undeniable.
>>>>>>>>
>>>>>>>>In a well-designed system, only one Pick-Basic subroutine handles
>>>>>>>>updates and deletions to the Person/Phone relations for *all* input
>>>>>>>>sources (other Pick programs, web interface, batch load, etc.).  This
>>>>>>>>subroutine may be implemented as a trigger or as part of a business
>>>>>>>>rules layer "talking to" a variety of "front-end" applications.  If
>>>>>>>>desired, the two record updates may be enclosed in a transaction to
>>>>>>>>guarantee integrity.
>>>>>>>>
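[Aside: the "two record updates enclosed in a transaction" idea can be sketched like this, again with SQLite standing in for the two denormalized Pick files. The table layout and function name are illustrative assumptions, not Pick code.]

```python
import sqlite3

# Two denormalized "files": the person->phones and phone->persons records are
# stored twice (as in the Pick example), so both must change together or not at all.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE PersonRec (name TEXT PRIMARY KEY, phones TEXT);
    CREATE TABLE PhoneRec  (number TEXT PRIMARY KEY, persons TEXT);
    INSERT INTO PersonRec VALUES ('Mike', '234-5678');
    INSERT INTO PhoneRec  VALUES ('222-2222', '');
""")

def link(con, name, number):
    """Add a person/phone link; both record updates commit or roll back together."""
    with con:   # sqlite3 connection as context manager = one transaction;
                # an exception inside rolls back both updates
        con.execute("UPDATE PersonRec SET phones = phones || ']' || ?"
                    " WHERE name = ?", (number, name))
        con.execute("UPDATE PhoneRec SET persons = ? WHERE number = ?",
                    (name, number))

link(con, "Mike", "222-2222")
print(con.execute("SELECT phones FROM PersonRec WHERE name='Mike'").fetchone())
# -> ('234-5678]222-2222',)
```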
>>>>>>>>In this example, the query to get Marshall's phone numbers would be:
>>>>>>>>
>>>>>>>>select Persons "Marshall" Phones
>>>>>>>>
>>>>>>>>To query for Persons using 234-5678:
>>>>>>>>
>>>>>>>>select Phones "234-5678" Persons
>>>>>>>>
>>>>>>>>In both cases, Pick's record-key hashing allows a "direct hit" on the
>>>>>>>>records requested without the need for an index or a table scan.
>>>>>>>>
>>>>>>>>Steve Lancour
>>>>>>>
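[Aside: a rough model of the hashed "direct hit" access pattern. A Python dict plays the role of a hashed file here; this is a sketch of the idea only, not Pick's actual hashing algorithm.]

```python
# The record key hashes straight to its bucket, so a keyed read is a single
# probe -- no secondary index, no scan of the file. Data from the example above.
persons = {
    "Marshall": ["123-4567", "234-5678"],
    "Mike":     ["234-5678", "222-2222"],
}
phones = {
    "123-4567": ["Marshall"],
    "234-5678": ["Marshall", "Mike"],
    "222-2222": ["Mike"],
}

# select Persons "Marshall" Phones  -- one hash probe
print(persons["Marshall"])          # -> ['123-4567', '234-5678']
# select Phones "234-5678" Persons  -- one hash probe the other way
print(phones["234-5678"])           # -> ['Marshall', 'Mike']
```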
>>>>>>>Right, well I'm not in the business of Pick-bashing, really.  But we
>>>>>>>could achieve the same "direct hit" performance in Oracle (the SQL
>>>>>>>DBMS I happen to be familiar with) using 2 hash clusters if we wanted
>>>>>>>to - provided we store the person/phone intersection data twice as you
>>>>>>>would in Pick.
>>>>>>>
>>>>>>>We wouldn't actually do that, of course, because 99.999% of the time
>>>>>>>ensuring data integrity is more important than shaving a CPU millisecond
>>>>>>>off the access time.  And as you admit, storing the same data twice
>>>>>>>separately runs an increased risk of corruption.
>>>>>>>
>>>>>>>So yes, we will add indexes (unless we are sure our tables will always
>>>>>>>be very small), and a typical keyed query may involve a few logical
>>>>>>>reads rather than 1.  If the query is performed a lot, the index
>>>>>>>entries are likely to be cached in the server memory, so that the
>>>>>>>number of physical reads may be fewer (zero if we are really lucky);
>>>>>>>if it isn't performed a lot, we probably won't notice the additional
>>>>>>>physical reads anyway.
>>>>>>>
>>>>>>>--
>>>>>>>Posted via http://dbforums.com
>>>>>>
>>>>>>In my example, if I bracket the two updates inside a transaction I
>>>>>>eliminate the possibility of corruption during the update.
>>>>>
>>>>>How do you eliminate the possibility of corruption due to flaws in
>>>>>your programming logic or another programmer's logic?
>>>>
>>>>I can't *eliminate* the possibility of corruption in Pick any more than
>>>>I could on any other system if someone with full DBA privileges makes a
>>>>mistake.
>>>
>>>A DBA's mistake cannot harm any data protected by a declared integrity
>>>constraint. You postulate two mistakes: one mistake to remove the constraint
>>>and a second mistake that corrupts the data. That seems rather less likely
>>>than an application programming error.
>>>
>>No, I postulate only one mistake.  If the DBA makes a mistake in setting
>>up the constraint the applications may be perfect and yet still corrupt
>>data.
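[Aside: the declared-constraint point above can be sketched in SQL. Once the DBMS knows the rule, a flawed application update is rejected outright; SQLite and the made-up table names are my illustration, not anything from the thread.]

```python
import sqlite3

# Declare the referential rule once; no application code can then violate it.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only if asked
con.executescript("""
    CREATE TABLE Persons (name TEXT PRIMARY KEY);
    CREATE TABLE PersonPhones (
        name   TEXT NOT NULL REFERENCES Persons(name),
        number TEXT NOT NULL
    );
    INSERT INTO Persons VALUES ('Mike');
""")

try:
    # A "flawed program" tries to attach a phone to a misspelled person;
    # the declared constraint stops the corruption at the DBMS.
    con.execute("INSERT INTO PersonPhones VALUES ('Mkie', '222-2222')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)   # -> rejected: FOREIGN KEY constraint failed
```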

>
> No, they will not, because they are perfect after all.
>
>>Same as if there were a mistake in the example Pick code I posted.
>
> You are wrong. I suggest you think about the issue a little longer.
>
No, I'm not. You're being obtuse. Probably intentionally so, as your posts lead me to think you're smart enough not to have so misunderstood my point.

Received on Fri Oct 31 2003 - 16:38:03 CET
