Re: Extending my question. Was: The relational model and relational

From: Bob Badour <>
Date: Wed, 19 Feb 2003 22:42:39 -0500
Message-ID: <kwY4a.118$>

"Steve Kass" <> wrote in message news:b2vevo$fsi$
> Bob Badour wrote:
> >"Steve Kass" <> wrote in message
> >news:b2ukru$g79$
> >
> >
> >>Bernard,
> >>
> >> Then perhaps we agree more than we disagree. I went back and read
> >>, which was referred to earlier
> >>in this thread, and two comments on it might be useful, if not
> >>specifically in answer to this post of yours:
> >>
> >> Date characterizes the bag-advocate as saying "But I don't need to
> >>distinguish among the duplicates--all I want to do is be able to count
> >>them." I think that's an unfair characterization, and I would say "But
> >>I don't need to distinguish among the duplicates--all I want is to know
> >>how many there are."
> >>
> >>
> >
> >Oh dear. Then you haven't any duplicates or any multisets. You have a
> >relation with a count indicating how many there are.
> >
> >
> That's an absolutely reasonable way to look at it. I never said multisets
> could represent real-world models that sets are incapable of representing.
> It's quite literally "six of one, a half dozen of the other."

Actually, you claimed the above set is a multiset when it is not.

> >>I would go further and ask Date, to whom it is
> >>important to be able to count (requiring distinguishability) three cans
> >>of cat food, whether it is equally important for him to distinguish and
> >>count five pounds of flour, or better yet, $1000 in a bank account.
> >
> >Chris Date is a very intelligent man. You need to start by assuming he
> >makes very intelligent decisions. He wouldn't record the money in a bank
> >account as 100,000 distinct pennies. You are constructing a straw man.
> >
> >Date, like any sensible person, would require a logically identifiable
> >account tuple with a balance attribute.
> >
> I'll apologize for my rhetoric, and say that I'm quite sure he would be
> happy
> if his dollars were indistinguishable. Why then does he require cans of
> cat food
> to be distinguishable?

Cans on the store shelf already are identifiable. Whether he requires it there is moot.

He doesn't require it in the dbms. Business requirements determine the data managed in the dbms. Depending on those business requirements, he seems quite satisfied with a single tuple for all cans of cat food. If one needs to make a statement about cat food without distinguishing cans, a single tuple suffices. If the tuple has a quantity attribute, the dbms has no need to count cans.

Note the difference between the dbms and the store shelf. Staff taking inventory need to identify the individual cans on the shelf to count them. The requirement for identifiability on the shelf is not a problem because the staff can simply use location in space.

Obviously, a database only needs to make statements about things of interest. If individual cans are not of interest, it need not make statements about them. Neither do we need to model everything down to the subatomic particle.

> He claims they can't be counted otherwise, yet
> isn't the
> requirement of being able to count only a way of requiring that a
> cardinality be
> determinable?

You are confused--determining cardinality requires counting. In the example above, the staff taking inventory determine cardinality by counting.

> We can determine how many dollars or how many cans there are
> with either a multiset or set logical model, I believe.

If you use a multiset model, you must be able to distinguish the multiple statements of the same fact; otherwise, you won't be able to determine the cardinality.

If you don't make multiple statements of the same fact, you are using a set model.
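
To illustrate the point about counts, here is a minimal Python sketch. The names `bag`, `counts`, `relation` and `rebuilt` and the sample purchase are illustrative assumptions, not anything taken from Date's article. It suggests that a bag whose duplicates are only ever counted carries the same information as a set of (item, quantity) tuples:

```python
from collections import Counter

# Hypothetical purchase: three indistinguishable cans of cat food and one bag of flour.
bag = ["cat food", "cat food", "cat food", "flour"]

# Counting the duplicates collapses the bag into item -> quantity.
counts = Counter(bag)

# The (item, quantity) pairs form an ordinary set: every tuple is distinct.
relation = set(counts.items())

# Expanding the relation again recovers the original bag, so nothing was lost.
rebuilt = sorted(item for item, qty in relation for _ in range(qty))
```

Under these assumptions, `relation` is a plain set with one tuple per item, and `rebuilt` matches the sorted original bag.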

> >>I think that for some purposes only one fact is needed to
> >>record the purchase of three cans of cat food,
> >>
> >>
> >
> >True. A single tuple with a quantity suffices. I strongly suggest the use
> >of one.
> >
> >
> I'm not sure Date does.

Well, he does. Given constraints such that only one fact is needed to record the purchase of three cans of cat food, I have absolute certainty that Date would accept a single tuple with a quantity attribute as a valid statement of the fact.

> He claims that "Individual objects must
> be identifiable (that is, distinguishable from all other objects).

In the case of the single tuple above, the individual object is a particular brand of cat food or SKU. For the staff taking inventory, the individual object is a can on the shelf.

> Cans
> of cat food are individual objects, aren't they?

For the staff taking inventory, they are. In a given dbms? Maybe they are and maybe they aren't.

> Are dollars, also?

In my pocket, they are. In my bank account, they are not.

> The only reason I can think of to shoot down the multiset logical model
> is because you don't like the specific physical implementation with
> duplicate rows.

I shoot it down because it's not a good model. The fact that one must pollute the logical model with physical details suffices for me to require a MAJOR benefit to compensate for the lack of logical identity and for the loss of physical independence. Since no benefit exists--let alone a major benefit--I find it inferior to the relational model in all ways.

> The logical model doesn't care whether the three cans
> of cat food are one thing, like $1000 is one thing, or three things,
> it's only a logical model.

That may be the case. It might be that a SKU is one thing to the dbms. (Quite likely actually.) Sets of SKUs are still sets.

> Its only burden is to provide the correct
> answers
> to questions,

I disagree. There are many principles of data management by which we may determine the 'goodness' of a particular logical data model. For instance, the less a logical data model burdens users before it provides the correct answer, the better the data model.

> and as long as it allows there to be three cans of cat food,
> or two, or seventeen, what's the problem whether I use <item,
> or <item, item, item> as my mental crutch that helps me picture the
> model?

How you picture the situation is a matter for the conceptual level of discourse and as such has no bearing on a discussion of logical data models. Your mental crutch belongs in the sphere of information--not data.

> >>and many stores reflect
> >>this conceptualization by representing this fact with a single line on a
> >>receipt.
> >>
> >>
> >
> >Receipts are external physical representations and are no longer managed
> >data.
> >
> >
> True, but as I said in another answer, I think it's fair to
> expect a logical model to go hand in hand with the real world.

A logical model is necessarily abstract. That's part of the definition of logical model.

> If not, why would I want to see every report a business generates
> before designing a logical model for that business?

Um... Because you want to create the correct abstract model of the business needs?

> >>Does Date, or anyone, think that a withdrawal of ten dollars from
> >>a bank account requires the deletion of ten identical facts from a
> >>table?
> >
> >No. Sensible folks would expect an update statement to adjust the balance
> >attribute within a relation variable. What does this have to do with
> >multisets?
> >
> It's an attempt to point out the inconsistency between demanding
> separate rows for cans of cat food, but not for dollars.

You are beating up a straw man. Date doesn't require separate tuples for cans of cat food. Proponents of multisets do.
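
As a rough sketch of the single-tuple approach, the following Python snippet uses the standard `sqlite3` module. The table name `account`, its columns, and the account number `A-1` are hypothetical choices made for illustration only. A withdrawal of ten dollars becomes one update of the balance attribute rather than the deletion of ten identical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One logically identifiable tuple per account; the key is account_no (hypothetical schema).
conn.execute(
    "CREATE TABLE account (account_no TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.execute("INSERT INTO account VALUES ('A-1', 1000)")

# Withdraw ten dollars: a single update adjusts the balance attribute.
conn.execute("UPDATE account SET balance = balance - 10 WHERE account_no = 'A-1'")

balance = conn.execute(
    "SELECT balance FROM account WHERE account_no = 'A-1'"
).fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```

The account is still represented by exactly one row after the withdrawal; only the value of `balance` has changed.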

> The multiset
> model recognizes that each can be represented as an integer, and that
> an integer can be represented equally well with positional notation or
> with set cardinality.

Huh? You are no longer making any sense at all.

> >>Later in the article, Date provides two bags, of parts and of
> >>suppliers, and asks the question (query) "list part numbers for
> >>parts that either are screws, or supplied by supplier S1, or both."
> >>Then he proposes twelve queries that might answer the question,
> >>and uses the fact that nine different results appear to support his
> >>argument against bags.
> >>
> >>Unfortunately when Date asks a stupid question, he gets many
> >>stupid answers.
> >>
> >>
> >
> >How are casual users to know which of the twelve seemingly reasonable
> >solutions will yield the answer they want? Are you suggesting that every
> >user of a dbms must have expert level knowledge of dbms internals?
> >
> Forget casual users.

Absolutely not. The users' needs are the dbms' raison d'être.

> I don't know what the question is, so
> how can I possibly be expected to know what query will
> answer it?

If you read the description of the problem and still don't know what the question was, the ultimate problem lies in your lack of comprehension of written English.

> >>The question he asks is not well-defined.
> >>
> >>
> >
> >Sure it is. Table 'P' states which parts exist. ( ie. Part P# has name
> >PNAME) Table 'SP' states which suppliers supply which parts. ( ie.
> >S# supplies part P#)
> >
> >
> You've lost me now. For Date to successfully argue against multisets,
> he must start with meaningful data.

He did. As I noted above, the apparent problem lies in your incapacity to read and comprehend written English.

> The four rows of P are meaningful
> if table P describes a bag containing three screws of type P1 and one
> screw of type P2. The three rows of SP, however, either do not represent
> any real-world relationship between suppliers and parts, or they represent
> such a relationship incompletely (attributes being missing), or they are
> a good
> representation of something that Date fails to point out.

Hey, you are the one arguing for multisets. Arguing that they are nonsense is my position.

'Stock on hand' is as good a meaning as any for the multiplicity in SP.

> A table of
> Parts should allow duplicate parts (if we're using multisets), but a
> table stating
> who supplies which parts should not be permitted to store the same fact
> twice,

What if the supplier has two of the part on hand? What if the supplier supplies the part from two locations?

If duplicate parts rows have meaning, duplicate anything has meaning.

> There's nothing wrong with allowing one table to be a bag but
> constraining another
> table to be a set.

If you are a proponent of multisets, then there shouldn't be anything wrong with allowing both to be bags either.

> Table P's three P1 rows unambiguously represent the
> existence
> of three P1 screws (just as multisets don't permit the
> distinguishability of duplicates,
> there's no requirement, as one might impose for sets, that three
> separate facts be
> identified - the point is that the implementation works). The presence
> of only
> two such rows would represent a different situation, and so not only do
> the three
> rows unambiguously describe the existence of three P1 screws, there is
> no other
> way to represent three P1 screws with a different collection of rows in
> table P.
> But what do the two identical rows in SP represent that is distinct from
> the situation that one fewer such row would represent?

I have already explained it could mean 'stock on hand' or that the supplier supplies the part from multiple locations. Take your pick.

> You've provided two
> possible answers below, and the meaning of the query hinges on what
> state is represented by the data in the tables.

I could also suggest that the duplicate rows in P represent the number of customers who purchase the part. In this case, we know that three customers purchase P1. It's an equally valid use of multiplicity. Saying that duplicates can mean one and only one thing is just dumb. Either multisets are generally useful or they are not.

> I think Date's query question is open to interpretation, assuming one
> allows a
> query result to contain duplicates.

If that is your position, then it means users must have expert level knowledge of the dbms internals to know which of the queries represents their given interpretation.

What interpretation would apply to each of the twelve queries? Or to make things easier, what interpretation would apply to each of the nine answers? Or to make things easier still, of the two interpretations I gave previously for SP, which of the nine answers is correct, which of the twelve queries is correct, and what process will users follow to compose a correct query?

> It could be a request for a subset
> of the
> part names domain,

All of the requests are for submultisets of the part number domain. That's quite explicit in the stated problem.

Since you are arguing in favour of multisets, you should be able to explain the nine different results and how users should know how to express the correct query for their specific needs.
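
The problem can be sketched in Python with the standard `sqlite3` module. The rows below are reconstructed from the description in this thread (four rows in P, three of them P1 screws; three rows in SP, two of them identical), and the column names `pno`, `pname` and `sno` are assumptions. The three queries are illustrative formulations of "parts that are screws, or supplied by S1, or both"; they are not Date's twelve.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- No keys are declared, so both tables are bags (duplicate rows allowed).
    CREATE TABLE P  (pno TEXT, pname TEXT);
    CREATE TABLE SP (sno TEXT, pno TEXT);
    INSERT INTO P  VALUES ('P1','Screw'), ('P1','Screw'), ('P1','Screw'), ('P2','Screw');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P1'), ('S1','P2');
""")

# Three seemingly reasonable ways to ask the same question.
queries = {
    "or_in": """SELECT pno FROM P
                WHERE pname = 'Screw'
                   OR pno IN (SELECT pno FROM SP WHERE sno = 'S1')""",
    "union_all": """SELECT pno FROM P WHERE pname = 'Screw'
                    UNION ALL
                    SELECT pno FROM SP WHERE sno = 'S1'""",
    "union": """SELECT pno FROM P WHERE pname = 'Screw'
                UNION
                SELECT pno FROM SP WHERE sno = 'S1'""",
}

# Tally how many times each part number appears in each result.
results = {
    name: Counter(row[0] for row in conn.execute(sql))
    for name, sql in queries.items()
}
```

With this data the three queries return P1 with multiplicities 3, 5 and 1 respectively, so the user has to know which multiplicity each formulation produces before choosing one.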

> >>For it to be a valid query,
> >>it must specify the source of the items to be listed. I would interpret
> >>it to mean "For each part in the bag of Parts, list the part's part number
> >>if the part is a screw or if the part is supplied by supplier S1, or
> >
> >All you have done is replace the name 'P' with the name 'Parts'. However,
> >Date actually shows bag 'P' and you have not shown bag 'Parts'. I would
> >say that makes Date's original example just a tad better defined.
> You lost me again. I'm happy with the bag P as Date presents it. I
> might be happy with SP if I knew what it meant.

With all due respect, you know as much about SP as you do about P.

> I'm not happy with
> the absence of a specified domain of part numbers - is the domain of
> part numbers a set or a multiset?

I suggest you learn what a domain is. Before doing that, you haven't earned any credibility at all.

> >>This question has an unambiguous answer, despite the fact that Date
> >>has intentionally thrown a wrench into the works as well, by providing
> >>a table SP (suppliers-parts?) that contains the same fact twice - a
> >>suppliers-parts table should be subject to a constraint that the
> >>multiplicity
> >>of any supplier-part fact be 1
> >>
> >>
> >
> >Why? If multisets are useful, why are they not useful in an abstract
> >example such as this one? Perhaps they mean 'stock on hand'. Or perhaps each
> >duplicate represents an anonymous warehouse that supplies the part. If
> >duplicates have a use anywhere, certainly they have a use here.
> >
> Good points - they're only useful if you tell someone what they
> mean, though.

For both of the meanings given above, please explain which of the nine results is correct and describe how users will compose the correct query for the result they need.

> >>since there is no real-world meaning to
> >>the two identical rows he lists, in contrast to the clear real-world
> >>meaning of the three P1-screws in the parts table.
> >>
> >>
> >
> >Please provide a similar example using bags where the duplicates have
> >real-world meaning, then. And propose a similar query.
> >
> >
> >
> I did - right about where I started making less and less sense to you.

I must disagree.

> >>His example is entirely
> >>unconvincing.
> >
> >It convinces me.
> >
> That's fine, and since the relational model is sufficient,
> no harm is done. The hardest part of teaching mathematics
> is explaining how a proof can be wrong if the theorem is
> true.

You haven't demonstrated the ability to read and fully comprehend written English. You do not know what a domain is. You haven't demonstrated a firm grasp of the difference between sets and multisets. You haven't demonstrated any understanding of the difference between conceptual, logical and physical levels of discourse. Your posts offer nothing more than handwaving.

Now you want us to accept you as an authority on mathematics??? Steve, get real.

> >>SK
> >>
> >>Bernard Peek wrote:
> >>
> >>
> >>
> >>>In message <b2uat1$f0t$>, Steve Kass
> >>><> writes
> >>>
> >>>
> >>>
> >>>>Bernard,
> >>>>
> >>>>This isn't a matter of opinion. There is one determinant: "there are
> >>>>two
> >>>>employees named John Smith". There are many consequential
> >>>>truths, such as "there is at least one employee", "there is an
> >>>>employee whose first name is not Nancy", "there are at least two employees
> >>>>whose first and last names share a common letter of the alphabet.",
> >>>>and so on.
> >>>>
> >>>>I don't deny that it can be important to distinguish between two
> >>>>John Smiths.
> >>>>
> >>>>
> >>>That's not my argument. My argument is that there may be no need to
> >>>distinguish between two real-world objects, each of which is
> >>>referenced by a single record in a database. The relational model (and
> >>>databases based on it) require that a distinguishing key be created
> >>>even if there is none in the logical data structure.
> >>>
> >>>I don't dispute that there are real pragmatic reasons for accepting
> >>>that deviation from the logical structure of the data. But as this is
> >>>a theory newsgroup I wanted to point out that this is a (minor)
> >>>failing in the relational model.
> >>>
> >>>[...]
> >>>
> >>>
> >>>
> >>>>I'm not redefining any words, but we have a fundamental
> >>>>difference in understanding logical vs. physical models. You are
> >>>>saying that the real-world scenario of books in a library must be
> >>>>represented by a logical model that does not keep track of an
> >>>>actual attribute of "book" (acquisition number), and then you
> >>>>blame the model for not being able to distinguish two identical
> >>>>books,
> >>>>
> >>>>
> >>>No, that's not my objection. My objection is that the relational model
> >>>declares that there must be a distinction, when there is no such
> >>>requirement in the real world.
> >>>
> >>>It does make the maths easier, and it makes the implementation much
> >>>easier.
> >>>
> >>>
> >>>
> >>>
> >>>
> >
> >
> >
> >
Received on Thu Feb 20 2003 - 04:42:39 CET
