Re: Dreaming About Redesigning SQL

From: Mike Preece <michael_at_preece.net>
Date: 27 Oct 2003 21:38:13 -0800
Message-ID: <1b0b566c.0310272138.6e8fdf96_at_posting.google.com>


andrewst <member14183_at_dbforums.com> wrote in message news:<3526485.1067262233_at_dbforums.com>...
> Originally posted by Mike Preece
> > "Marshall Spight" <mspight_at_dnai.com> wrote in message
> > news:<UZ3nb.30451$ao4.58659_at_attbi_s51>...
> > > I would have a persons table, with a primary key, and a
> > > phone numbers table with a primary key (possibly the phone
> > > number itself) and another table with two foreign keys:
> > > person id and phone id.
>
> > > Marshall
>
> > Oh. You mean like an index? We could just have a single file:
> >
> > File Phones
> > ID: Phone Number
> > 001: Phone type ('C'ell, 'H'ome, 'F'ax, 'W'ork, 'D'irect or
> > whatever)
> > 002: PersonalIDs (multivalued)
> > ..and any other data relating primarily to the phone
> >
> > and index it on PersonalID.
> >
> > I think it's more or less the same - or are b-tree indexes not
> > allowed?
> >
> > Mike.
>
> No, Marshall's point has nothing whatever to do with indexes, it is all
> about data. Suppose you have the following "rules":
>
> 1) A person may be associated many phone numbers
> 2) A phone number may be associated with many persons
>
> For example:
> a) Marshall is associated with phone numbers: 123-4567, 234-5678
> b) Mike is associated with phone numbers: 222-2222, 234-5678
>
> i.e. you both have 2 phone numbers, one of which you share.
>
> The relational model for this would have the 3 tables Marshall
> described, with data like this:
>
> Persons table:
> Marshall
> Mike
>
> Phones table:
> 123-4567
> 234-5678
> 222-2222
>
> Person_phones table:
> Marshall, 123-4567
> Marshall, 234-5678
> Mike, 234-5678
> Mike, 222-2222
>
> Then to find out Marshall's phone numbers you would query
> something like:
>
> select phone_no from person_phones
> where person_name = 'Marshall';
>
> And to find out who is on 234-5678 you would query something like:
>
> select person_name from person_phones
> where phone_no = '234-5678';
>
> See? The person/phone data is not stored "with" the person, nor "with"
> the phone, and certainly not recorded twice independently. And the
> answers to both queries are guaranteed to be consistent, because they
> query the same data.
>
> There may be indexes defined on these tables to aid performance, but
> these are irrelevant to the point, and in any case are maintained by
> the DBMS so there is no possibility of the index becoming out of sync
> with the data.

Interesting. I come from a more practical background that one based on theory, so I hope you'll bear with me on this...

I consider indexes to be, if not exactly data, then at least a valid means of obtaining data.

Phones don't have people, people have phones. In most real-world situations we'd have a Persons file, and one of the attributes would be a multivalued list of (>=0) phone numbers, possibly with an associated type, possibly with a defined delimited string pattern (as in +##-0###-###-####). We probably wouldn't have a person's phone number anywhere other than as a value on the item in the Person's file. If two people share the same number - then that phone number would be recorded twice, once against each person. If we have a need to get a list of possible callers from a particular phone number we would most probably index the file on phone number. Because we're more likely to want information about people than we are about their phones, we have direct hashed access to the Persons file to get, among other things, a list of their phone details, but have to go through an index to get a list of people with a particular phone number. File: People
ID: Person
001: Phone number (multivalued)
...etc, with an index on attribute 1.

If the requirements are different and primarily concern identifying the location and/or list of people related to a phone number, with caller-id for instance, that's different. Then we'd most likely have a Phones file with a Location ID for each phone. We'd probably not have the people's names or other details on each Phone item - it's more likely that we'd reference a Location first, and then reference the People recorded against the Location for more details. If we need the phone numbers for a location or person we would use an index or two (sound familiar).
File: Phones
ID: Phone number
001: LocationID
...etc., with an index on attribute 1.
File: Locations
ID: LocationID
001: PersonalID (multivalued)
...etc., with an index on attribute 1.
File: People
ID: PersonalID
001: Name
...etc.

In both scenarios we have a many:many relation - but they're not equivalent. Can you describe a scenario in which it would be equally as important to get one part of a many:many relation as the other?

In Pick it's quite rare to have a file containing only null items. This would usually only occur if there is a very large number of possible keys - or "item id"s. It's more common to use a single item with an attribute list or, even more commonly, a multivalued list in a single attribute - so that the full list can be read into memory in a single read. In reality, there would probably be additional data with a one:one relationship to Persons which would 'justify' each Person having its own item.

That aside, we could also have similar files, like:

File: Persons
ID: PersonalID
001: (null/nada/zip/nothing at all/zero bytes)

File: Phones
ID: PhoneNumber
001: Phone type ('C'ell, 'H'ome, 'F'ax, 'W'ork, 'D'irect or whatever)

and a third file:

File: PersonsPhones
ID: PersonalID : (optional delimiter) : PhoneNumber

This is quite similar to the example you gave - it does have a single occurance for each relationship and not a multivalue in sight.

It does not, however, provide direct hashed read access to data in the PersonsPhone file for either PersonalID or Phone number, because we simply don't have the 'key' (itemid). We would have to create an index on each of PersonalID & PhoneNumber and use that to obtain the full item.id. This is very easily done and the DBMS would automatically maintain the index. To ensure full referential integrity we could put an update trigger on each of the Persons & Phones files.

Thanks guys. A good discussion. Well - I think so anyway.

Regards,
Mike. Received on Tue Oct 28 2003 - 06:38:13 CET

Original text of this message