Re: Entity and Identity
Date: Mon, 28 Sep 2009 19:52:07 +1000
Walter Mitty wrote:
> My initial motive in starting the thread was precisely to get some kind of
> rational discussion going between people who find value in object models and
> people who find value in relational models.
I believe that it's possible to unite them, and I've set myself that task.
It does require a change to the programming language paradigm; my API has no new/delete, only assert/deny within a constellation of facts. Most of the rest of the O-O paraphernalia remains intact, and doesn't interfere with the relational and transactional nature of the system.
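To make the assert/deny idea concrete, here is a minimal sketch in Python. It is not the actual API; the names `Constellation`, `assert_fact`, `deny_fact` and `holds` are made up for illustration. The point is only that a fact is asserted into or denied from a set of facts, rather than an object being constructed and destroyed.

```python
class Constellation:
    """Hypothetical illustration: a set of facts that can be asserted or denied."""

    def __init__(self):
        self._facts = set()

    def assert_fact(self, predicate, *roles):
        # Asserting a fact that is already present changes nothing,
        # so there is no notion of creating a second "copy" of it.
        fact = (predicate, roles)
        self._facts.add(fact)
        return fact

    def deny_fact(self, predicate, *roles):
        # Denying a fact that is not present is likewise harmless.
        self._facts.discard((predicate, roles))

    def holds(self, predicate, *roles):
        return (predicate, roles) in self._facts

    def __len__(self):
        return len(self._facts)


c = Constellation()
c.assert_fact("Person directs Company", "Alice", "Acme")
c.assert_fact("Person directs Company", "Alice", "Acme")  # same fact, asserted again
c.deny_fact("Person directs Company", "Bob", "Acme")      # was never asserted
```

After these calls the constellation holds exactly one fact, because a fact is identified by its content rather than by an object reference.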
> When we take on the task of building an information system that represents
> some part of the universe (read "real world"), we accept as given the
> division of that universe into identifiable component parts (what I call
> "entities"). This division is inherent in the problem statement for the
> information system we are to build.
Right; that's why we call it "modeling", and why it's viewed as a design activity rather than a descriptive one (see Simsion's PhD thesis for an exploration of this position).
> If my practical experience is any guide, the community of stakeholders
> often has a hazy idea about just what those entities are, and very haphazard
> notions about how to identify them. Different parts of the community will
> use different identifiers to identify the same entity, and even the same
> identifier to identify different (although closely related) entities.
To my mind this is at the root of perhaps the preeminent problem in delivering information systems, namely the problem of specification. It's for that reason that the Constellation Query Language is plain text, so as not to exclude anyone (capable of understanding the domain) from participating. The text can be synchronised with diagrams, but no special training is required to read and critique a CQL model.
> Sorting that confusion out is (part of) data analysis (in the case of
> building a database), and it has to be done regardless of whether one
> intends to take an O-O or an RM view of the data.
Sadly it's often not done in O-O projects, and those projects suffer from the lack of it. I think *that* issue is the main reason for complaining about O-O... but really, the problem is in the training of the teams, and to an extent in the languages.
In CQL, it's not possible to define an entity without defining its identification pattern; in fact it's not possible to include more in the initial definition of an entity than what is required to identify the entity. Even the syntax enshrines the need for identification. There are four kinds of object type:
- Value types (also known as lexical types). Because instances of value types are identified by a lexical (written) form, the CQL syntax encodes that, for example, "Name is written as String(20);".
- Entity types (non-lexical types), of three forms:
  - Subtypes, defined as, for example, "Employee is a kind of Person". Here the identification is inherited from the first supertype.
  - Normal entity types, defined using the keyphrase "is identified by". For example, "Company is identified by CompanyName where <fact type>;" or "Company is identified by its Name;". This format can be mixed with the supertype syntax, for example "Employee is a kind of Person identified by its Nr;".
  - Objectified Fact Types, for example "Directorship is where Person directs Company;". It's also possible to mix "is identified by" into this pattern, where the identification of an objectified fact type is external.
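The rule that an entity cannot be defined without its identification can be sketched outside CQL as well. The Python below is only an illustration under stated assumptions: the class and function names are hypothetical, a subtype is modelled with a single supertype, and an objectified fact type is assumed to be identified by its own roles unless an external identification is given.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ValueType:
    """A lexical type, identified by its written form."""
    name: str
    written_as: str  # e.g. "String(20)"


@dataclass(frozen=True)
class EntityType:
    """A non-lexical type; it must be identified directly or via a supertype."""
    name: str
    identified_by: Tuple[str, ...] = ()
    supertype: Optional["EntityType"] = None

    def __post_init__(self):
        # Refuse any definition that carries no identification at all.
        if not self.identified_by and self.supertype is None:
            raise ValueError(f"{self.name} has no identification")

    def identification(self) -> Tuple[str, ...]:
        # An explicit identification wins; otherwise inherit from the supertype.
        if self.identified_by:
            return self.identified_by
        return self.supertype.identification()


def objectified_fact_type(name, roles, identified_by=None):
    # Assumption: identification defaults to the roles of the fact type.
    return EntityType(name, identified_by=tuple(identified_by or roles))


name = ValueType("Name", "String(20)")
person = EntityType("Person", identified_by=("Name",))
employee = EntityType("Employee", supertype=person)  # inherits Person's identification
numbered = EntityType("Employee", identified_by=("Nr",), supertype=person)
directorship = objectified_fact_type("Directorship", ("Person", "Company"))
```

With these definitions, `EntityType("Company")` raises `ValueError`, which is the Python analogue of CQL refusing to parse an entity definition that has no identification clause.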
I haven't published much of the actual query syntax yet, but from what I've done, it's easy to see how SQL can be completely hidden under a truly relational definition and query language. Making CQL a language for expressing updates is not a current goal, though it is a possible future direction. In the meantime the API will suffice.
-- Clifford Heath, Data Constellation, http://dataconstellation.com
Agile Information Management and Design
Received on Mon Sep 28 2009 - 04:52:07 CDT