Re: Another view on analysis and ER

From: JOG <jog_at_cs.nott.ac.uk>
Date: Sat, 15 Dec 2007 05:48:37 -0800 (PST)
Message-ID: <eb2e158c-91be-4e38-8f98-3ee6157e87f2_at_s8g2000prg.googlegroups.com>


On Dec 15, 10:57 am, Jan Hidders <hidd..._at_gmail.com> wrote:
> On 14 dec, 13:00, JOG <j..._at_cs.nott.ac.uk> wrote:
>
>
>
> > On Dec 13, 12:26 am, Jan Hidders <hidd..._at_gmail.com> wrote:
>
> > > On 11 dec, 12:37, JOG <j..._at_cs.nott.ac.uk> wrote:
>
> > > > On Dec 10, 6:33 pm, Jan Hidders <hidd..._at_gmail.com> wrote:
>
> > > > > On 9 dec, 22:10, JOG <j..._at_cs.nott.ac.uk> wrote:
>
> > > > > > On Dec 9, 5:20 pm, Jan Hidders <hidd..._at_gmail.com> wrote:
>
> > > > > > > On 9 dec, 04:04, JOG <j..._at_cs.nott.ac.uk> wrote:
>
> > > > > > > > Now in ontology, it is generally accepted that an
> > > > > > > > object, or entity, is nothing more than a compressence of a collection
> > > > > > > > of properties - i.e. (attribute, value) pairs.
>
> > > > > > > [....]
>
> > > > > > > I'm also not comfortable with the usage of "is" here. I'd agree that
> > > > > > > this is how entities can be described, but saying that they "are"
> > > > > > > these descriptions seems wrong to me.
>
> > > > > > Why are you uncomfortable with that. An entity is nothing more and
> > > > > > nothing less than the 'compressence' of its _full_ set of all its
> > > > > > attributes.
>
> > > > > > > After all, different descriptions may describe the same entity.
>
> > > > > > Well, I haven't talked about describing entities, rather we're
> > > > > > defining them. This is an entity as our model sees it, not how it is
> > > > > > seen in the real world (obviously there are concessions, given the set
> > > > > > of possible attributes is probably infinite).
>
> > > > > But that is what I'm saying, isn't it? These sets of properties are
> > > > > part of your model of a piece of reality and as such *represent*
> > > > > entities that are part of that reality, Saying that they *are* these
> > > > > entities is sloppy use of language and confuses the map with the
> > > > > territory. If I didn't know any better I'd almost think you could be
> > > > > accused of muddled thinking. :-)
>
> > > > Ha, I'll have you know that it would only be a case of muddled writing
> > > > not muddled thinking sir! In my defence I'd refer you back to some
> > > > posts I made a while back in another thread where I was promoting a
> > > > distinction between a "construct" and an "entity" to try and avoid the
> > > > very ambiguity that you are talking about. I hold little hope of
> > > > changing anyones terminology though, however worthwhile I think that
> > > > would be ;)
>
> > > Just our own terminology for the duration of this discussion seems
> > > ambitious enough. :-) At least it seems we're on the same page here,
> > > so that's nice. Btw. what is the difference between your internal
> > > entity / construct and a tuple with named fields?
>
> > The construct/entity might well be encoded as a tuple, but there may
> > be a host of other valid encodings. I would not want the concept be
> > seen as tied to an RM encoding, nor constrain it to being viewed as a
> > finite partial function. I would rather see it in a more general
> > fashion as a mathematical relation between attributes (a name and a
> > domain) and values (objects/entities/whatever), over which one might
> > apply all the facilities that set theory can accord.
>
> Two thing are puzzling for me here. Why are you now suddenly including
> a domain in the definition? That is certainly not usual in ontology,
> and it looks to me like an echo of a certain rather clumsy
> formalization of the relational model. Why not simply a binary
> relation over attribute names and entities?

That would be fine for simplicity. I mention domains only because, for an attribute of an item, such as colour, there is a constrained set of acceptable entities (blue, red, etc.). Why do you view this as clumsy?
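To make the point about domains concrete, here is a minimal sketch in Python. The names DOMAINS and valid_pair, and the particular colours listed, are illustrative assumptions only, not part of any formal definition.

```python
# Hypothetical mapping from attribute name to its domain,
# i.e. the constrained set of values acceptable for that attribute.
DOMAINS = {
    "colour": frozenset({"blue", "red", "green"}),
}

def valid_pair(attribute, value):
    """Return True when value lies in the domain declared for attribute."""
    # An attribute with no declared domain is treated as unconstrained here;
    # that is a choice made for this sketch only.
    domain = DOMAINS.get(attribute)
    return domain is None or value in domain
```

Under these assumptions valid_pair("colour", "blue") holds while valid_pair("colour", "seven") does not.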

> And why do you leave out
> the functionality requirement, i.e., for each attribute name there is
> at most one associated entity?

Because I do not see on what grounds you would posit that an entity can only have one attribute with a certain role-name. My name is James, but it is equally Jim or Jamie. They are all used as a first_name attribute for me. As a completely different example consider a friendship entity - it has two components which play exactly the same roles as "friend" attributes (unless you would prefer "friend1" and "friend2"!). I see no argument to be bound to a partial function, rather than using the more general mathematical relation, of which partial functions are a special case.
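A rough sketch of the distinction, in Python, treating a construct as a plain set of (attribute name, value) pairs. The helper names values_of and is_functional, and the friends "Alice" and "Bob", are made up for illustration.

```python
# A construct as a binary relation: a set of (attribute_name, value) pairs.
# The same attribute name may appear with several values.
jog = {
    ("first_name", "James"),
    ("first_name", "Jim"),
    ("first_name", "Jamie"),
}

# A friendship with two components playing the same "friend" role
# (the names are hypothetical).
friendship = {("friend", "Alice"), ("friend", "Bob")}

def values_of(construct, attribute):
    """All values related to the given attribute name."""
    return {value for (name, value) in construct if name == attribute}

def is_functional(construct):
    """True when each attribute name has at most one value,
    i.e. the relation happens to be a partial function."""
    names = [name for (name, _) in construct]
    return len(names) == len(set(names))
```

Here is_functional(jog) and is_functional(friendship) are both False, whereas a construct such as {("age", 30)} passes the check; the functional case is simply the special case in which no attribute name repeats.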

> Other than that I see no difference
> with tuple, except that you allow it to be infinite. Correct? In that
> case I think I would prefer the terminology of "infinite tuple".

I have found that the differing definitions of database tuples and mathematical tuples confuse many people learning db theory. Additionally, using the term 'tuple' might blur the separation of the conceptual and logical layers, so I think I'd prefer the term "attribute sets" for the sake of clarity.

>
> > > > I would say though that the internal entity (henceforth referred to as
> > > > a construct by myself) and the external entity, /must/ share the same
> > > > identifiers for them to be consistent with each other. Its a simple
> > > > rule, but without it one ends up in a artificial quagmire of hidden
> > > > surrogates or OID's (which have no correspondence whatsoever with data
> > > > as observed out in the wild), or worse still, broken databases.
>
> > > That is something that you still have to show. To me it is very clear
> > > what OIDs correspond to: they correspond the entities we want to
> > > represent.
>
> > Well, I have never suggested that anyone doesn't understand what an
> > OID corresponds to.
>
> Well. "Has no correspondence whatsoever with data as observed" might
> be construed as such. :-)

The key was "as observed", in that we do not have OIDs outside of the database, on which I am sure we would all agree. But I see where the confusion lay.

>
> > The concern is the fact they are superfluous and
> > facilitate results which have no correspondence to the real world with
> > which we are modelling - they add nothing that cannot be achieved with
> > content-based addressing. But then this is all well documented by
> > date, pascal, darwen, etc.
>
> > Ought I infer that you don't agree with their perspective?
>
> That's putting it very mildly.
>
> > That
> > somehow all of an entity's properties can change and yet, because it
> > has an OID, it is magically the same thing? No theory of identity that
> > I have ever read would accede to such a view (even substance
> > theorists), and yet it perpetuates in computer science due to the
> > familiarity we all have with memory allocation.
>
> It is basically a correct view. There is no law that says that you
> necessarily have to have all the direct properties in your UoD that
> are needed for identfication, or that the properties that identify you
> are immutable.

Look, to identify an external entity, some attribute /must/ be immutable for us to recognize it as the same thing (in fact for it to be the same thing full stop), so let me exemplify what I think is the problem in your reasoning:

  1. Say the construct in our UoD representing the entity E does not use its immutable identifier, X.
  2. In the outside world imagine that, unbeknownst to us, everything about the external entity E changes apart from X.
  3. We are presented with E, but cannot find /one single/ attribute that matches with any of the constructs in our UoD.
  4. We hence deem it to be a new construct, and add it. The original construct is now garbage, and its continued existence in the db will generate serious querying errors.
  5. Broken database.
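The five steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: constructs are frozensets of (attribute, value) pairs, "matching" means sharing at least one pair, and every name and value below (find_match, X, the colour and size attributes) is hypothetical.

```python
def find_match(db, observed):
    """Return the stored constructs sharing at least one
    (attribute, value) pair with the observed entity."""
    return [construct for construct in db if construct & observed]

# Hypothetical immutable identifier X of the external entity E.
X = ("x", "immutable-id")

# Step 1: the stored construct leaves X out.
stored_without_x = frozenset({("colour", "red"), ("size", 9)})
# The alternative schema choice: the stored construct includes X.
stored_with_x = stored_without_x | {X}

# Step 2: everything about E changes apart from X.
observed_later = frozenset({X, ("colour", "blue"), ("size", 10)})

db_bad = [stored_without_x]
db_good = [stored_with_x]

# Step 3: with X omitted, not one attribute matches.
no_match = find_match(db_bad, observed_later)
# With X stored, E is recognised as the construct we already hold.
match = find_match(db_good, observed_later)

# Step 4: finding no match, we add E as a "new" construct,
# leaving the stale original behind in the db.
if not no_match:
    db_bad.append(observed_later)
```

After this runs, db_bad holds two constructs for the one entity E (step 5), while db_good still holds one.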

Incorrect schema choices (not picking X for the internal construct) are a serious design error that will generate this problem. However, OIDs positively facilitate the mistake, promoting the concept that E has an identity outside of its attributes. They don't even require you to take a stab at picking the correct identifier, which is the very step that would avoid the whole mess.

These are serious practical issues for data modelling imo. External entities must correspond to your internal constructs, and if your UoD cannot do this (I concede your point that this can very easily happen), then that is when we require a surrogate to be invented to provide the correspondence.

> So it is certainly not at odds with the theories of
> identity that you mention, and the presence of weak entities tells us
> that this is a natural and frequent phenomenon. Of course, the way
> around that is to broaden the definition of property such that you
> also include the ones that the weak entity inherits from the strong
> entity. I wouldn't say that this is necessarily a bad thing or
> unintuitive, but just that use of a concept similar to OIDs allows you
> to make a distinction between direct properties and inherited
> properties that seems natural. Moreover, it makes it very easy to
> understand and deal with updates of identifying properties.
>
> > And let me preempt the
> > argument for OID's and hidden surrogates that follows that suggestion
> > that are necessary due to the distinction between modelled entities
> > and their external counterparts - it does not stand due to the
> > requirement of /identifiable/ correspondence between the two.
>
> I'm not claiming they are necessary, just very convenient.
>
> > > These, by definition, can be observed in the wild, at least
> > > in the sense that is relevant here.
>
> > Entities or OID's can be observed in the wild? OID's I disagree 100%.
> > And in the sense that the world around us has no innate partitioning,
> > entities are constructed rather than observed. And then to
> > subsequently recognize them again we observe identifying properties.
>
> And these properties are often also themselves rather constructed than
> directly observed. I cannot directly observe your age, or your shoe
> size, or your name.

Lordy, you've been a bit literal with the word 'observation' there! I can happily tell you my age, shoe size or name. It's about being able to obtain the info in the outside world, not about it being labelled on my forehead! ;)

> All I can directly observe is the images made up
> of photons hitting my retina, molecules hitting my eardrum, pressure
> on my skin, et cetera. Or no, it's not the photons that I see, but
> just the signal they generate, et cetera. It's turtles all the way
> down. :-) So are you suggestion all our entities in the database must
> be described in those terms? Of course not. So where then do you draw
> the line? At constructs made up of direct observations? Or constructs
> of those constructs? Or constructs of constructs of ..., or where?
>
> > > So as far as I am concerned you
> > > are just introducing an ad hoc rule for no other reason than that it
> > > seems to lead to the conclusion you were trying to prove, namely that
> > > your construct is the best way of representing entities in a database.
>
> > Well, I certainly hope not. I have started from base principles of how
> > we identify items, looked to a valid representation in set theory, and
> > continued from there. I see no ad-hoccity, but if you do, then great,
> > lets stamp on it ;)
>
> Let me get my heavy boots. :-)

You'll need some heavier ones at the moment mate ;) I am finding the conversation very interesting though.

>
> Cheers,
>
> -- Jan Hidders
Received on Sat Dec 15 2007 - 14:48:37 CET
