Re: Another view on analysis and ER

From: Jan Hidders <>
Date: Sat, 15 Dec 2007 02:57:40 -0800 (PST)
Message-ID: <>

On 14 dec, 13:00, JOG <> wrote:
> On Dec 13, 12:26 am, Jan Hidders <> wrote:
> > On 11 dec, 12:37, JOG <> wrote:
> > > On Dec 10, 6:33 pm, Jan Hidders <> wrote:
> > > > On 9 dec, 22:10, JOG <> wrote:
> > > > > On Dec 9, 5:20 pm, Jan Hidders <> wrote:
> > > > > > On 9 dec, 04:04, JOG <> wrote:
> > > > > > > Now in ontology, it is generally accepted that an
> > > > > > > object, or entity, is nothing more than a compressence of a collection
> > > > > > > of properties - i.e. (attribute, value) pairs.
> > > > > > [....]
> > > > > > I'm also not comfortable with the usage of "is" here. I'd agree that
> > > > > > this is how entities can be described, but saying that they "are"
> > > > > > these descriptions seems wrong to me.
> > > > > > Why are you uncomfortable with that? An entity is nothing more and
> > > > > nothing less than the 'compressence' of its _full_ set of all its
> > > > > attributes.
> > > > > > After all, different descriptions may describe the same entity.
> > > > > Well, I haven't talked about describing entities, rather we're
> > > > > defining them. This is an entity as our model sees it, not how it is
> > > > > seen in the real world (obviously there are concessions, given the set
> > > > > of possible attributes is probably infinite).
> > > > But that is what I'm saying, isn't it? These sets of properties are
> > > > part of your model of a piece of reality and as such *represent*
> > > > entities that are part of that reality. Saying that they *are* these
> > > > entities is sloppy use of language and confuses the map with the
> > > > territory. If I didn't know any better I'd almost think you could be
> > > > accused of muddled thinking. :-)
> > > Ha, I'll have you know that it would only be a case of muddled writing
> > > not muddled thinking sir! In my defence I'd refer you back to some
> > > posts I made a while back in another thread where I was promoting a
> > > distinction between a "construct" and an "entity" to try and avoid the
> > > very ambiguity that you are talking about. I hold little hope of
> > > changing anyone's terminology though, however worthwhile I think that
> > > would be ;)
> > Just our own terminology for the duration of this discussion seems
> > ambitious enough. :-) At least it seems we're on the same page here,
> > so that's nice. Btw. what is the difference between your internal
> > entity / construct and a tuple with named fields?
> The construct/entity might well be encoded as a tuple, but there may
> be a host of other valid encodings. I would not want the concept to be
> seen as tied to an RM encoding, nor constrain it to being viewed as a
> finite partial function. I would rather see it in a more general
> fashion as a mathematical relation between attributes (a name and a
> domain) and values (objects/entities/whatever), over which one might
> apply all the facilities that set theory can accord.

Two things puzzle me here. Why are you now suddenly including a domain in the definition? That is certainly not usual in ontology, and it looks to me like an echo of a certain rather clumsy formalization of the relational model. Why not simply a binary relation over attribute names and entities? And why do you leave out the functionality requirement, i.e., that for each attribute name there is at most one associated entity? Other than that I see no difference with a tuple, except that you allow it to be infinite. Correct? In that case I think I would prefer the term "infinite tuple".
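To make the functionality requirement concrete, here is a minimal sketch for the finite case only. It assumes a construct is modelled as a set of (attribute name, value) pairs; the names `is_functional`, `as_tuple`, `person` and `ambiguous` are made up for illustration and come from nobody's formalization.

```python
from typing import Dict, Hashable, Set, Tuple

# Assumption: a construct is a finite binary relation over
# attribute names and values, stored as a set of pairs.
Pair = Tuple[str, Hashable]


def is_functional(relation: Set[Pair]) -> bool:
    """Return True if each attribute name has at most one associated value."""
    seen: Dict[str, Hashable] = {}
    for name, value in relation:
        if name in seen and seen[name] != value:
            return False
        seen[name] = value
    return True


def as_tuple(relation: Set[Pair]) -> Dict[str, Hashable]:
    """View a functional relation as a tuple with named fields."""
    if not is_functional(relation):
        raise ValueError("some attribute name has more than one value")
    return dict(relation)


# A functional relation: it can be read as a tuple with named fields.
person = {("name", "Ada"), ("shoe_size", 38)}

# Dropping the requirement admits two values for the same attribute name.
ambiguous = person | {("name", "Augusta")}
```

With the requirement in place the relation and the tuple carry the same information; without it, `ambiguous` is still a perfectly good binary relation but no longer corresponds to any tuple.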

> > > I would say though that the internal entity (henceforth referred to as
> > > a construct by myself) and the external entity, /must/ share the same
> > > identifiers for them to be consistent with each other. It's a simple
> > > rule, but without it one ends up in an artificial quagmire of hidden
> > > surrogates or OID's (which have no correspondence whatsoever with data
> > > as observed out in the wild), or worse still, broken databases.
> > That is something that you still have to show. To me it is very clear
> > what OIDs correspond to: they correspond to the entities we want to
> > represent.
> Well, I have never suggested that anyone doesn't understand what an
> OID corresponds to.

Well, "has no correspondence whatsoever with data as observed" might be construed as such. :-)

> The concern is the fact they are superfluous and
> facilitate results which have no correspondence to the real world with
> which we are modelling - they add nothing that cannot be achieved with
> content-based addressing. But then this is all well documented by
> Date, Pascal, Darwen, etc.
> Ought I infer that you don't agree with their perspective?

That's putting it very mildly.

> That
> somehow all of an entity's properties can change and yet, because it
> has an OID, it is magically the same thing? No theory of identity that
> I have ever read would accede to such a view (even substance
> theorists), and yet it perpetuates in computer science due to the
> familiarity we all have with memory allocation.

It is basically a correct view. There is no law that says your UoD must contain all the direct properties needed for identification, or that the properties that identify you are immutable. So it is certainly not at odds with the theories of identity that you mention, and the presence of weak entities tells us that this is a natural and frequent phenomenon. Of course, the way around that is to broaden the definition of property so that it also includes the properties that the weak entity inherits from the strong entity. I wouldn't say that this is necessarily a bad thing or unintuitive, just that using a concept similar to OIDs lets you distinguish between direct properties and inherited properties in a way that seems natural. Moreover, it makes it very easy to understand and deal with updates of identifying properties.
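A rough sketch of that last point, under heavy assumptions: plain dictionaries stand in for a database, and the customer/order data and all function names are hypothetical. It only illustrates why an update of an identifying property is a one-place change when references go through an OID-like key, and has to be propagated when references are content-based.

```python
# Variant 1 (assumed): entities keyed by an OID-like surrogate.
customers_by_oid = {1: {"name": "Ada", "city": "London"}}
orders_by_oid = [{"customer": 1, "item": "engine"}]


def rename_with_oid(oid: int, new_name: str) -> None:
    """Change the identifying property; references via the OID are untouched."""
    customers_by_oid[oid]["name"] = new_name


# Variant 2 (assumed): entities addressed by their identifying property.
customers_by_name = {"Ada": {"city": "London"}}
orders_by_name = [{"customer": "Ada", "item": "engine"}]


def rename_content_based(old_name: str, new_name: str) -> None:
    """Change the identifying property and propagate it to every reference."""
    customers_by_name[new_name] = customers_by_name.pop(old_name)
    for order in orders_by_name:
        if order["customer"] == old_name:
            order["customer"] = new_name


rename_with_oid(1, "Augusta")
rename_content_based("Ada", "Augusta")
```

Both variants end up consistent, so this is an argument about convenience, not necessity: the second one simply has to treat the rename as a cascading update.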

> And let me preempt the
> argument for OID's and hidden surrogates that follows that suggestion
> that are necessary due to the distinction between modelled entities
> and their external counterparts - it does not stand due to the
> requirement of /identifiable/ correspondence between the two.

I'm not claiming they are necessary, just very convenient.

> > These, by definition, can be observed in the wild, at least
> > in the sense that is relevant here.
> Entities or OID's can be observed in the wild? OID's I disagree 100%.
> And in the sense that the world around us has no innate partitioning,
> entities are constructed rather than observed. And then to
> subsequently recognize them again we observe identifying properties.

And these properties are often themselves constructed rather than directly observed. I cannot directly observe your age, or your shoe size, or your name. All I can directly observe are the images made up of photons hitting my retina, molecules hitting my eardrum, pressure on my skin, et cetera. Or no, it's not the photons that I see, but just the signal they generate, et cetera. It's turtles all the way down. :-) So are you suggesting that all our entities in the database must be described in those terms? Of course not. So where then do you draw the line? At constructs made up of direct observations? Or constructs of those constructs? Or constructs of constructs of ..., or where?

> > So as far as I am concerned you
> > are just introducing an ad hoc rule for no other reason than that it
> > seems to lead to the conclusion you were trying to prove, namely that
> > your construct is the best way of representing entities in a database.
> Well, I certainly hope not. I have started from base principles of how
> we identify items, looked to a valid representation in set theory, and
> continued from there. I see no ad-hoccity, but if you do, then great,
> let's stamp on it ;)

Let me get my heavy boots. :-)


-- Jan Hidders
Received on Sat Dec 15 2007 - 11:57:40 CET
