Re: Extending my question. Was: The relational model and relational

From: Steve Kass <skass_at_drew.edu>
Date: Wed, 19 Feb 2003 03:25:34 -0500
Message-ID: <b2vevo$fsi$1_at_slb5.atl.mindspring.net>


Bob Badour wrote:

>"Steve Kass" <skass_at_drew.edu> wrote in message
>news:b2ukru$g79$1_at_slb2.atl.mindspring.net...
>
>
>>Bernard,
>>
>> Then perhaps we agree more than we disagree. I went back and read
>>http://www.dbdebunk.com/cjddtdt.htm, which was referred to earlier
>>in this thread, and two comments on it might be useful, if not specifically
>>in answer to this post of yours:
>>
>> Date characterizes the bag-advocate as saying "But I don't need to
>>distinguish among the duplicates--all I want to do is be able to count
>>them." I think that's an unfair characterization, and I would say "But
>>I don't need to distinguish among the duplicates--all I want is to know
>>how many there are."
>>
>>
>
>Oh dear. Then you haven't any duplicates or any multisets. You have a
>relation with a count indicating how many there are.
>
>
That's an absolutely reasonable way to look at it. I never said multisets could represent real-world models that sets are incapable of representing. It's quite literally "six of one, a half dozen of the other."
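To make the "six of one" point concrete, here is a minimal sketch in Python (an illustration only, not something from Date's article; the item names and the helper functions bag_to_relation and relation_to_bag are assumptions made up for this example). It shows that a bag and a relation of <item, multiplicity> tuples carry the same information, since each can be rebuilt from the other:

```python
from collections import Counter

# Hypothetical purchase: three indistinguishable cans of cat food, one bag of flour.
bag = Counter(["cat food", "cat food", "cat food", "flour"])

def bag_to_relation(b):
    """Turn a multiset into a set of (item, multiplicity) tuples."""
    return {(item, n) for item, n in b.items() if n > 0}

def relation_to_bag(rel):
    """Rebuild the multiset from a set of (item, multiplicity) tuples."""
    return Counter(dict(rel))

relation = bag_to_relation(bag)       # a true set: no duplicate tuples
round_trip = relation_to_bag(relation)

# Either form answers "how many cans of cat food are there?"
cans_from_bag = bag["cat food"]
cans_from_relation = dict(relation)["cat food"]
```

Nothing is lost in either direction, which is all the claim of equivalence amounts to.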

>
>
>
>>I would go further and ask Date, to whom it is
>>important to be able to count (requiring distinguishability) three cans
>>of cat food, whether it is equally important for him to distinguish and
>>count five pounds of flour, or better yet, $1000 in a bank account.
>>
>>
>
>Chris Date is a very intelligent man. You need to start by assuming he would
>make very intelligent decisions. He wouldn't record the money in a bank
>account as 100,000 distinct pennies. You are constructing a straw man.
>
>Date, like any sensible person, would require a logically identifiable
>account tuple with a balance attribute.
>
I'll apologize for my rhetoric, and say that I'm quite sure he would be happy
if his dollars were indistinguishable. Why then does he require cans of cat food
to be distinguishable? He claims they can't be counted otherwise, yet isn't the
requirement of being able to count only a way of requiring that a cardinality be
determinable? We can determine how many dollars or how many cans there are with
either a multiset or set logical model, I believe.

>
>
>
>
>>I think that for some purposes only one fact is needed to
>>record the purchase of three cans of cat food,
>>
>>
>
>True. A single tuple with a quantity suffices. I strongly suggest the use of
>one.
>
>
I'm not sure Date does. He claims that "Individual objects must be identifiable (that is, distinguishable from all other objects)." Cans of cat food are individual objects, aren't they? Are dollars, also? The only reason I can think of to shoot down the multiset logical model is that you don't like the specific physical implementation with duplicate rows. The logical model doesn't care whether the three cans of cat food are one thing, like $1000 is one thing, or three things, because it's only a logical model. Its only burden is to provide the correct answers
to questions, and as long as it allows there to be three cans of cat food, or two, or seventeen, what's the problem whether I use <item, multiplicity> or <item, item, item> as my mental crutch that helps me picture the logical model?

>
>
>
>>and many stores reflect
>>this conceptualization by representing this fact with a single line on a
>>receipt.
>>
>>
>
>Receipts are external physical representations and are no longer managed
>data.
>
>
True, but as I said in another answer, I think it's fair to expect a logical model to go hand in hand with the real world. If not, why would I want to see every report a business generates before designing a logical model for that business?

>
>
>
>>Does Date, or anyone, think that a withdrawal of ten dollars from
>>a bank account requires the deletion of ten identical facts from a
>>table?
>>
>>
>
>No. Sensible folks would expect an update statement to adjust the balance
>attribute within a relation variable. What does this have to do with
>multisets?
>
It's an attempt to point out the inconsistency between demanding separate rows for cans of cat food, but not for dollars. The multiset model recognizes that each can be represented as an integer, and that an integer can be represented equally well with positional notation or with set cardinality.
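As a rough sketch of that last sentence (Python, with a made-up account identifier "A-1" and made-up variable names, all assumptions for illustration), the same ten-dollar withdrawal can be written against a balance attribute or against the multiplicity of identical "dollar" facts, and the resulting cardinality is the same:

```python
from collections import Counter

# Representation A: the balance is a single integer attribute
# (positional notation), adjusted by an update.
account_a = {"account": "A-1", "balance": 1000}
account_a["balance"] -= 10

# Representation B: the balance is the multiplicity of identical
# ("A-1", "dollar") facts in a bag, reduced by removing ten of them.
account_b = Counter({("A-1", "dollar"): 1000})
account_b.subtract({("A-1", "dollar"): 10})

balance_a = account_a["balance"]
balance_b = account_b[("A-1", "dollar")]
```

Neither representation requires anyone to tell one dollar from another.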

>
>
>
>
>>Later in the article, Date provides two bags, of parts and of
>>suppliers, and asks the question (query) "list part numbers for
>>parts that either are screws, or supplied by supplier S1, or both."
>>Then he proposes twelve queries that might answer the question,
>>and uses the fact that nine different results appear to support his
>>argument against bags.
>>
>>Unfortunately when Date asks a stupid question, he gets many
>>stupid answers.
>>
>>
>
>How are casual users to know which of the twelve seemingly reasonable
>solutions will yield the answer they want? Are you suggesting that every
>user of a dbms must have expert level knowledge of dbms internals?
>
>
>
Forget casual users. I don't know what the question is, so how can I possibly be expected to know what query will answer it?

>
>
>>The problem is that in his sample bag-database,
>>"part number" is not an entity.
>>
>>
>
>'Entity' is not well defined. 'Domains' are well defined. Part number is a
>domain.
>
>
My slip - thanks for the correction.

>
>
>
>> So even if the query were "list THE
>>part numbers ...", which might mean list each part number in the
>>part numbers bag ..., doesn't fly, since there is no part numbers bag.
>>
>>
>
>You are making less and less sense as time goes on.
>
>
>
Hm. I'm squinting, but I still can't tell the first "less" from the second "less." Maybe it would be better to say the amount of sense I'm making has decreased by two.

>
>
>>The question he asks is not well-defined.
>>
>>
>
>Sure it is. Table 'P' states which parts exist (i.e., Part P# has name
>PNAME). Table 'SP' states which suppliers supply which parts (i.e., Supplier
>S# supplies part P#).
>
>
You've lost me now. For Date to successfully argue against multisets, he must start with meaningful data. The four rows of P are meaningful if table P describes a bag containing three screws of type P1 and one screw of type P2. The three rows of SP, however, either do not represent any real-world relationship between suppliers and parts, or they represent such a relationship incompletely (attributes being missing), or they are a good representation of something that Date fails to point out. A table of Parts should allow duplicate parts (if we're using multisets), but a table stating who supplies which parts should not be permitted to store the same fact twice, as his example does, or it should be providing some other information.

There's nothing wrong with allowing one table to be a bag but constraining another
table to be a set. Table P's three P1 rows unambiguously represent the existence
of three P1 screws (just as multisets don't permit the distinguishability of duplicates,
there's no requirement, as one might impose for sets, that three separate facts be
identified - the point is that the implementation works). The presence of only
two such rows would represent a different situation, and so not only do the three
rows unambiguously describe the existence of three P1 screws, there is no other
way to represent three P1 screws with a different collection of rows in table P.

But what do the two identical rows in SP represent that is distinct from the situation that one fewer such row would represent? You've provided two possible answers below, and the meaning of the query hinges on what physical state is represented by the data in the tables.

I think Date's query question is open to interpretation, assuming one allows a
query result to contain duplicates. It could be a request for a subset of the
part names domain, hence a true set, or it could be a request for a collection
of part names in one-to-one correspondence with a specific collection of
individual parts, in which case it need not be a set. The only way to resolve
the ambiguity is to disallow any query from returning a multiset, but then the
ambiguity could be resurrected by asking for "list part numbers and the number
of times they appear for parts that are ...". It's not the multiset model
that causes the ambiguity.
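A small sketch of the two readings, in Python. The contents of P follow the description above (three P1 screws and one P2 screw). The contents of SP are an assumption: three rows for supplier S1, two of them identical. The helper qualifies is likewise made up for the illustration.

```python
from collections import Counter

# Bag P: three P1 screws and one P2 screw.
P = Counter({("P1", "screw"): 3, ("P2", "screw"): 1})
# Bag SP: assumed contents (three rows, two of them identical).
SP = Counter({("S1", "P1"): 2, ("S1", "P2"): 1})

def qualifies(pno, pname):
    """True if the part is a screw, or is supplied by S1, or both."""
    supplied_by_s1 = SP[("S1", pno)] > 0
    return pname == "screw" or supplied_by_s1

# Reading 1: a subset of the domain, hence a true set.
as_set = {pno for (pno, pname) in P if qualifies(pno, pname)}

# Reading 2: one part number per individual part in P, hence a bag.
as_bag = Counter()
for (pno, pname), n in P.items():
    if qualifies(pno, pname):
        as_bag[pno] += n
```

Both results are well defined once the reading is fixed. Note that in the second reading SP is consulted only for existence, so its duplicate row has no effect on the answer.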

>
>
>
>>For it to be a valid query,
>>it must specify the source of the items to be listed. I would interpret
>>it to mean "For each part in the bag of Parts, list the part's part number
>>if the part is a screw or if the part is supplied by supplier S1, or both.
>>
>>
>
>All you have done is replace the name 'P' with the name 'Parts'. However,
>Date actually shows bag 'P' and you have not shown bag 'Parts'. I would say
>that makes Date's original example just a tad better defined.
>

You lost me again. I'm happy with the bag P as Date presents it. I might be happy with SP if I knew what it meant. I'm not happy with the absence of a specified domain of part numbers - is the domain of part numbers a set or a multiset? In the multiset model, either might be the case, and just as domains must be specified in the relational model, so must they be in the multiset model. Since Date's query asks for part numbers, the domain should be provided.

>
>
>
>
>>This question has an unambiguous answer, despite the fact that Date
>>has intentionally thrown a wrench into the works as well, by providing
>>a table SP (suppliers-parts?) that contains the same fact twice - a
>>suppliers-parts table should be subject to a constraint that the
>>multiplicity of any supplier-part fact be 1
>>
>>
>
>Why? If multisets are useful, why are they not useful in an abstract example
>such as this one? Perhaps they mean 'stock on hand'. Or perhaps each
>duplicate represents an anonymous warehouse that supplies the part. If
>duplicates have a use anywhere, certainly they have a use here.
>

Good points - they're only useful if you tell someone what they mean, though. "Careful! Don't accidentally rub that number off the hammer. It's very important, even though I don't know what it means." The duplication of a row in SP could mean any number of things, and expecting me to know just because I'm a fan of duplication is a cheap shot and a cheap shot.

>
>
>

>
>
>>since there is no real-world meaning to
>>the two identical rows he lists, in contrast to the clear real-world meaning
>>of the three P1-screws in the parts table.
>>
>>
>
>Please provide a similar example using bags where the duplicates have
>real-world meaning, then. And propose a similar query.
>
>
>
I did - right about where I started making less and less sense to you.

>
>
>>His example is entirely
>>unconvincing.
>>
>>
>
>It convinces me.
>
>
>
That's fine, and since the relational model is sufficient, no harm is done. The hardest part of teaching mathematics is explaining how a proof can be wrong if the theorem is true.

>
>
>>SK
>>
>>Bernard Peek wrote:
>>
>>
>>
>>>In message <b2uat1$f0t$1_at_slb9.atl.mindspring.net>, Steve Kass
>>><skass_at_drew.edu> writes
>>>
>>>
>>>
>>>>Bernard,
>>>>
>>>>This isn't a matter of opinion. There is one determinant: "there are
>>>>two employees named John Smith". There are many consequential
>>>>truths, such as "there is at least one employee", "there is an employee
>>>>whose first name is not Nancy", "there are at least two employees
>>>>whose first and last names share a common letter of the alphabet.",
>>>>and so on.
>>>>
>>>>I don't deny that it can be important to distinguish between two
>>>>John Smiths.
>>>>
>>>>
>>>That's not my argument. My argument is that there may be no need to
>>>distinguish between two real-world objects, each of which is
>>>referenced by a single record in a database. The relational model (and
>>>databases based on it) require that a distinguishing key be created
>>>even if there is none in the logical data structure.
>>>
>>>I don't dispute that there are real pragmatic reasons for accepting
>>>that deviation from the logical structure of the data. But as this is
>>>a theory newsgroup I wanted to point out that this is a (minor)
>>>failing in the relational model.
>>>
>>>[...]
>>>
>>>
>>>
>>>>I'm not redefining any words, but we have a fundamental
>>>>difference in understanding logical vs. physical models. You are
>>>>saying that the real-world scenario of books in a library must be
>>>>represented by a logical model that does not keep track of an
>>>>actual attribute of "book" (acquisition number), and then you
>>>>blame the model for not being able to distinguish two identical
>>>>books,
>>>>
>>>>
>>>No, that's not my objection. My objection is that the relational model
>>>declares that there must be a distinction, when there is no such
>>>requirement in the real world.
>>>
>>>It does make the maths easier, and it makes the implementation much
>>>easier.
>>>
>>>
>>>
>>>
>>>
>
>
>
>
Received on Wed Feb 19 2003 - 09:25:34 CET
