Re: A pk is *both* a physical and a logical object.

From: Brian Selzer <brian_at_selzer-software.com>
Date: Wed, 15 Aug 2007 05:45:15 GMT
Message-ID: <Lpwwi.33356$2v1.17872_at_newssvr14.news.prodigy.net>


"JOG" <jog_at_cs.nott.ac.uk> wrote in message news:1187104634.660617.322210_at_o61g2000hsh.googlegroups.com...
> On Aug 13, 7:27 pm, "Brian Selzer" <br..._at_selzer-software.com> wrote:

>> "JOG" <j..._at_cs.nott.ac.uk> wrote in message
>>
>> news:1187005635.391467.51520_at_l70g2000hse.googlegroups.com...
>>
>>
>>
>> > On Aug 13, 6:56 am, "Brian Selzer" <br..._at_selzer-software.com> wrote:
>> >> "JOG" <j..._at_cs.nott.ac.uk> wrote in message
>>
>> >>news:1186967829.283726.289850_at_o61g2000hsh.googlegroups.com...
>>
>> >> > On Aug 5, 3:26 pm, "Brian Selzer" <br..._at_selzer-software.com> wrote:
>> >> >> "JOG" <j..._at_cs.nott.ac.uk> wrote in message
>>
>> >> >>news:1185445415.561100.98380_at_o61g2000hsh.googlegroups.com...
>>
>> >> >> > Just as another example of what i'm on about with this construct
>> >> >> > m'larkey: Imagine the library has two copies of "harry potter and
>> >> >> > the deathly hallows". Are they the same book?
>>
>> >> >> > 1) If your construct is the one that uses the barcode on the
>> >> >> > sleeve as an identifier, then no, different books.
>> >> >> > 2) If your construct is the one that uses the ISBN number as an
>> >> >> > identifier, then yes, same book.
>>
>> >> >> > There's no correct answer, and which you pick just depends on the
>> >> >> > application. A Loans database could use Barcodes; A library
>> >> >> > listings database could use ISBN.
>>
>> >> >> A very thought-provoking example.  Are they the same book?  From
>> >> >> the information given, no, they're not the same book.  They are two
>> >> >> different physical manifestations of the same abstract individual.
>> >> >> Abstract individuals are incomplete in the sense that they cannot
>> >> >> exist apart from their physical manifestations, for to exist is to
>> >> >> be spatiotemporally located.
>> >> >> As a consequence, the identity relation fails just in case there
>> >> >> are no physical manifestations; therefore, it must be assumed that
>> >> >> there exist physical manifestations.  So if each tuple in a relation
>> >> >> describes a specific abstract individual, then that relation must be
>> >> >> a projection of another--even if it isn't defined in the schema.
>> >> >> Since the abstract individual exemplifies all of its physical
>> >> >> manifestations and cannot exist apart from those physical
>> >> >> manifestations, the existence of a tuple in a relation that uses
>> >> >> ISBNs as key values implies the existence of at least one tuple in a
>> >> >> relation that uses barcodes as key values--even if the barcode
>> >> >> relation is not defined in the schema.  If at some point in the
>> >> >> future the loans and library listings databases were combined, there
>> >> >> would clearly be a cyclical relationship between the set of abstract
>> >> >> individuals denoted by ISBNs and the set of concrete individuals
>> >> >> denoted by barcodes.
>>
>> >> > I'm glad you thought it was an interesting example. I personally see
>> >> > no distinction between your "abstract" and "physical
>> >> > manifestations". To illustrate this all i'm asking is that you just
>> >> > extend the example to use more constructs - maybe I now have five
>> >> > books, the two harry potters from before,  another that's got
>> >> > illustrations, one translated into mandarin and a digital version.
>> >> > We now have an almighty conundrum if someone asks us "which of these
>> >> > are the same book". How do you split up "physical" and "abstract"
>> >> > now? It would be an absolute spaghetti to try to hazard an answer!
>>
>> >> It is simple.  An abstract individual cannot be spatiotemporally
>> >> located.  The one thing that the five individuals above have in common
>> >> is the abstract individual: they are all physical manifestations of
>> >> it.  Neither the addition of illustrations, the translation into
>> >> mandarin nor the encoding into digital form changes the fact that the
>> >> abstract individual exemplifies each of those five tangible instances.
>>
>> > Nope, you've missed the point. There are now several possible
>> > 'abstract' individuals. There are now also about a dozen ways of
>> > answering the question "which of these books are the same". Have a
>> > look at the different possible answers.
>>
>> I may be dense, but you're right, I've missed your point.  There is only
>> one abstract individual that exemplifies all of the concrete instances.
>> There may be additional abstract individuals, such as the set of
>> illustrations, or the translation.  Is that your point?
>

> Yes, pretty much. Lots of possible constructs. Which of the books are
> the same?
>

> 1) All - all harry potter and the deathly hallows (identifying
> attribute for a "book" - title)
> 2) None - all the copies are different (identifying attribute for a
> "book" - barcode)
> 3) The two paperback versions (identifying attribute for a "book" -
> isbn)
> 4) All the english versions (identifying attribute for a "book" - its
> content)
> 5) All the english versions without illustrations (identifying
> attribute for a "book" - its text)
> 6) etc, etc...
>

> All are valid answers. No context to the question - no suitable
> answer. Pick the wrong one for the context you need, broken schema.
> Here we are comparing different items, but we could just as easily be
> comparing the things at different points in time. Something is only
> the same entity if /for the context we chose/ its identifying
> attribute is the same - all of its other properties may change, but if
> the identifying attribute changes then it is a different thing as far
> as that context is concerned.
>
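
The dependence on the chosen identifying attribute can be sketched in a few lines. This is only an illustration of the list above, not anyone's actual schema: the catalogue, the barcodes B1..B5, the ISBN placeholders and the function name same_book_groups are all made up, and the translated copy is assumed to be catalogued under the same title.

```python
from collections import defaultdict

# Hypothetical catalogue of the five copies; every value here is invented.
TITLE = "Harry Potter and the Deathly Hallows"
BOOKS = [
    {"barcode": "B1", "isbn": "ISBN-PAPERBACK", "title": TITLE},
    {"barcode": "B2", "isbn": "ISBN-PAPERBACK", "title": TITLE},
    {"barcode": "B3", "isbn": "ISBN-ILLUSTRATED", "title": TITLE},
    {"barcode": "B4", "isbn": "ISBN-MANDARIN", "title": TITLE},
    {"barcode": "B5", "isbn": "ISBN-DIGITAL", "title": TITLE},
]


def same_book_groups(books, identifying_attrs):
    """Group copies that count as 'the same book' under the chosen attributes."""
    groups = defaultdict(list)
    for book in books:
        # Two copies are 'the same' exactly when these values agree.
        key = tuple(book[attr] for attr in identifying_attrs)
        groups[key].append(book["barcode"])
    return sorted(groups.values())


if __name__ == "__main__":
    for attrs in (["title"], ["isbn"], ["barcode"]):
        print(attrs, same_book_groups(BOOKS, attrs))
```

Under these assumptions the title puts all five copies in one group, the ISBN pairs only the two paperbacks, and the barcode leaves every copy on its own.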

> Again let me emphasize that this is all at the conceptual level. But
> it is only when one has that level sorted that one can move down to
> the logical encoding.
>
>>
>>
>>
>> >> Try a simpler example: the number 5 is an abstract individual.  You
>> >> have 5 senses; you have 5 digits on each of your hands; there are 5
>> >> points on each star of each American flag; most cars leave the factory
>> >> with 5 wheels (including the spare).  There are uncountable physical
>> >> manifestations of the number 5, but the abstract concept, the number
>> >> 5, cannot exist apart from them.
>>
>> >> > There just isn't a correct response, without knowing the correct
>> >> > context over the lifetime of an application.
>>
>> >> The lifetime of a database often exceeds the lifetime of the
>> >> applications for which it was originally designed.  I have clients
>> >> that are still using databases that were designed in the early '90s.
>> >> The applications that were built to use the database have evolved or
>> >> have been replaced over the years.
>>
>> >> > And then hopefully it's only a short jump to see that if "Mrs Smith"
>> >> > gets married and our database breaks because we chose surnames as an
>> >> > identifier, it was our mistake when we were doing the conceptual
>> >> > modeling and no problem with the theory. Her name didn't identify
>> >> > over her whole time at a company, and /that/ was our context we
>> >> > should have considered.
>>
>> >> It would be a giant leap backward to assume that any update that
>> >> affects a key necessarily selects a different individual.
>>
>> > Au contraire, that is exactly what is happening as far as the database
>> > is concerned. There is no 'individual' outside how we designed our
>> > conceptual model. It would be a "giant leap forward" to realise this,
>> > as one wouldn't need to make kludge fixes to a broken schema after
>> > the event.
>>
>> How then do you account for the case when a relation has more than one
>> key, but only one differs?
>> The individuals are obviously the same because they
>> have the same key value.
>> But wait, they must not be the same because they
>> don't have the same key value.
>> How do you reconcile this apparent contradiction?
>

> Perhaps you could clarify the contradiction you see with an example,
> Brian?
>

A simple example: Suppose that you built several identical computers. Each has one motherboard, one DIMM, one hard drive, one video board, and one case. Each component is serialized, so the serial number for each component would be a candidate key value in a relation describing the composition of each computer. You're having trouble with one of the computers, but you're not sure which component is failing, or even if it is a hardware problem, so you swap the hard drives from two of the computers to see if the problem moves. For either of the two computers affected by the swap, the motherboard is the same, the DIMM is the same, the video board is the same, and the case is the same. So obviously, since those serial numbers are the same before and after the swap, it's the same computer, right? But wait, the hard drive is different; therefore, it must not be the same computer because the serial numbers for the hard drives are different.
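
The swap can be written out as a small sketch. Everything in it is illustrative: the serial numbers, the attribute names and the helper functions swap_hard_drives and match_by_key are assumptions made for the sake of the example, with each dict standing in for one tuple of the relation and every attribute treated as a candidate key.

```python
# Hypothetical relation describing two of the computers before the swap.
before = [
    {"motherboard": "MB1", "dimm": "D1", "hard_drive": "HD1",
     "video": "V1", "case": "C1"},
    {"motherboard": "MB2", "dimm": "D2", "hard_drive": "HD2",
     "video": "V2", "case": "C2"},
]


def swap_hard_drives(relation, i, j):
    """Return a new relation value in which tuples i and j trade hard drives."""
    after = [dict(t) for t in relation]  # copy so `relation` stays untouched
    after[i]["hard_drive"] = relation[j]["hard_drive"]
    after[j]["hard_drive"] = relation[i]["hard_drive"]
    return after


def match_by_key(old, new, key):
    """For each old tuple, find the position of the new tuple with the same key value."""
    position = {t[key]: n for n, t in enumerate(new)}
    return [position[t[key]] for t in old]


if __name__ == "__main__":
    after = swap_hard_drives(before, 0, 1)
    # Matching on the motherboard key: each computer is still itself.
    print(match_by_key(before, after, "motherboard"))  # [0, 1]
    # Matching on the hard drive key: each computer has become the other one.
    print(match_by_key(before, after, "hard_drive"))   # [1, 0]
```

Two candidate keys of the same relation give opposite answers to "which computer is this?", which is the contradiction described above.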

>>
>>
>>
>> >>  It is definitely not the case for relation schemata with more than
>> >> one key, and it reduces by half the ways that individuals can be
>> >> identified in queries.  Where is the logic in the assumption?  Since
>> >> it is clearly not the case when there is more than one key, how can it
>> >> possibly stand when there is only one key?  Certainly a relation that
>> >> has more than one key can be decomposed into an equivalent set of
>> >> relations where one has only one key--a key that may be the target of
>> >> an update.  Would it always be the case then that a new individual is
>> >> selected?  Certainly not.
>>
>> > You seem to be confusing entities which have identifying attributes,
>> > and propositions which have keys. These are completely distinct, and
>> > we don't need to go anywhere near keys in this discussion. Similarly I
>> > worry that you still consider an update as something other than an
>> > operation which deletes one proposition and inserts another. It's
>> > /just/ a shortcut, and promoting it to having some primary status is a
>> > mistake.
>>
>> No.  I'm not.  The individuals referenced in a proposition are identified
>> by sets of properties, and those properties are represented in each tuple
>> as values for prime attributes.  Keys are critical to the discussion.  I
>> surely hope you can see that!
>

> I'm really not. By the time one gets down to keys the hard work should
> already have been done - solved at the conceptual level.
>
>> In addition, identification is not identity!
>

> Aha! That's where we differ then. That is /exactly/ what identity is
> in my opinion. Identification is stating that if I know one attribute
> (or set of attributes) I can functionally determine the rest. Perhaps
> we should discuss that and then the rest of the arguments might fall
> into place? Let me start the ball rolling, with a catfood example for
> the new century ;)
>

An individual's identification is a set of properties that distinguishes the individual from all others in the context of a picture of the universe; an individual's identity is that set of properties that defines the individual. These are two different things.

> ------------------------------------------
> I am shown a can of catfood from an identical batch of three. Its
> only, single identifying feature is a number on it. I read it, and the
> can is taken away. I am then shown a new can. Is it the same can? Does
> it have the same identity? I read the number on it, which is
> different. I conclude therefore, quite sensibly, it is a different can
> to the first.

>

> Unbeknownst to me someone had shuffled the can numbers up at random
> after I'd read the first one. Even this mischievous soul himself has
> no idea if the original can I was given ended up with the same number
> on it as before (the shuffling was done blind). In fact /no one/ in the
> world now knows.
>

> Where does identity stand now?
> ------------------------------------------
>

Identity stands as it always did. What is different is identification. This is a perfect example of why update is primitive and assignment isn't. An assignment replaces the current relation value with a new one, blindly, but an update specifies which tuples are different and how each differs, which has the same effect as observing each can of cat food throughout the interval from the first reading to the second. Obviously, if you were able to simultaneously observe each can, then there would be no doubt as to whether the new can is the original can.
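
A rough sketch of that difference, under stated assumptions: the relation is modelled as a frozenset of one-attribute tuples holding the number on each can, and assign and update are hypothetical helpers written for this example, not the operators of any particular DBMS or language.

```python
def assign(relation, new_value):
    """Assignment: the old value is simply replaced; nothing links old tuples to new."""
    return frozenset(new_value)


def update(relation, changes):
    """Update: `changes` says which tuple becomes which, so the link is preserved.

    Returns the new relation value together with the old-to-new correspondence.
    """
    new_value = frozenset(changes.get(t, t) for t in relation)
    return new_value, dict(changes)


if __name__ == "__main__":
    cans = frozenset({(1,), (2,), (3,)})

    # The blind shuffle seen as an assignment: same three numbers as before,
    # so comparing old and new values reveals nothing about individual cans.
    shuffled = assign(cans, {(1,), (2,), (3,)})
    print(shuffled == cans)  # True

    # The same shuffle expressed as an update (this particular permutation is
    # invented): the can numbered 1 is now numbered 2, and so on.
    shuffled, trace = update(cans, {(1,): (2,), (2,): (3,), (3,): (1,)})
    print(shuffled == cans)  # True
    print(trace[(1,)])       # (2,)
```

Both operations end with the same relation value; only the update carries the correspondence that answers whether the second can shown is the first one.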

>> Identification is used by one individual to pick another out of a crowd,
>> whereas identity is what one individual is.  It may be that much of the
>> confusion is caused by misinterpreting this simple distinction.
>> Identification is the noun form of the verb "to identify."
>>
>> Update is a primitive operation.  It is not a shortcut--it cannot be a
>> shortcut, because not all key values permanently identify individuals.
>
Received on Wed Aug 15 2007 - 07:45:15 CEST
