Hierarchy as 'UP' constraint

From: mAsterdam <mAsterdam_at_vrijdag.org>
Date: Fri, 25 Nov 2005 00:34:42 +0100
Message-ID: <43864d6a$0$11072$e4fe514c_at_news.xs4all.nl>



[[[Hierarchy as 'UP' constraint]]]

Many articles (in c.d.t. and elsewhere) and even a book (Joe Celko's “Trees and hierarchies in SQL for smarties“) have been written about hierarchies and relational databases, mostly on how to implement hierarchies in tables. Let's not immediately do that and still have the desired effect - which is to manage data less clumsily when there are hierarchies involved. Let's first deal with meaning and usage.

A big selling point of the relational model is to separate the model from the implementation – every minute spent less on the how gives us more time to spend on the what – in yet other words: separate the logical from the physical. When we deal with hierarchies the distinction seems blurred – especially when both the specification and the implementation are in SQL.

While trying to keep the distinction I'll describe an example that intuitively makes sense – Ill make some choices which may or may not turn out to be arbitrary.

I'll start with expanding on just one of several adjacency list (note 1) possibilities mentioned in J.C.'s book here in a slightly different approach:

     Separate the data and the hierarchy, see the hierarchy
     strictly as a table constraint.


[[Example]]

Example in imaginary syntax (note 2)
  CREATE TABLE TREENODES (

     NR      NUMBER            PRIMARY KEY,
     NAME    VARCHAR2(50),
     PNR     up);

Expansion
'up' in the example is (for now – it will be simplified later)

   short for these rules:
(0) (note 3). PNR is of the same type as the primary key of the

     same table (for simplicity: assume it must be single column):
     in this case NUMBER

(1). PNR has a special foreign key-like constraint, referencing
the primary key of the same table. So it is not very foreign, it is more like domestic: we could call a self referencing (same table, different column, not necessarily same row) FK a domestic key constraint, somehow implementing rule (0) and (1)).
(2). When a row X is deleted, all rows Y referencing the deleted
row X are updated to copy the deleted rows X.PNR (instead of X.NR).

(3). No cycles.

What should the PNR of the root be - is this an arbitrary choice?

[[Semantics]]
The data-trees I know depict organization and/or containment.

What should the PNR of the root be? Part one.

Let's see:
It can be the same as NR, it can be NULL, it can be an unique value denoting it as root.

These alternative representations of root have different interpretations.

PNR = NR says:
'I am my boss', (organizational) or , 'I am self-contained' – looking at the rows as containers: things containing other things – nested set-like.

semantics++.

PNR = NULL says:
'I don't know who my boss is, don't even know if I have one', or - my boss, existing or not, is absent. 'Am I contained? '

semantics--.

PNR = unique_value(root) says: 'I am the boss', or 'I contain everything else'.

semantics++, but slightly different from PNR = NR, more specific.

I wonder if there are more frequently occurring interpretations for data-trees, beside organization and containment.

[[Pragmatics]]
What should the PNR of the root be? Part two.

PNR = NR
What to do when a row X having X.PNR = X.NR gets deleted while there are other rows Y having Y.PNR = X.NR? Interpreted: “What to do when 'I am my boss' X having subordinates Y gets fired”,
or “when an 'I am self-contained' container containing other thing gets thrown away”.

  'up' - rule (2) would make Y violate the 'domestic key contra int.

  Luckily we can still amend rule (2) to read:
(2). When a row X is deleted, all rows Y that referenced the
deleted row X are updated to copy the deleted rows X.PNR
(instead of its NR: Y.PNR := X.PNR) unless the deleted row
had X.NR = X.PNR, in that case Y.PNR := Y.NR.

That makes sense both in the organizational interpretation and in the container interpretation:
When a boss row is deleted, the immediate subordinates become their own boss. Professor Celko calls this “send the orphans to grandmother”. When a container is deleted, it's contents becomes visible: unwrap the package.
When you want to delete the container and destroy its contents you'll have to be explicit about it.

A consequence is that the table can hold many unrelated hierarchies – unless there is a rule (4) (see below).

There seems to be another problem. Rule (3) says: No Cycles. PNR = NR surely looks like one.
So:
(3). No Cycles. That is, no cycles over more than one row.
A cycle over one row (PNR = NR) has a specific, fixed interpretation: this row is (a) root.

Yet another consequence is that PNR = NULL is free to mean the same as in a non-hierarchical context – whatever that may be (as you can tell I never fully understood NULL).

PNR = NULL
On Rule (2) PNR = NULL works out the same as PNR = NR. It doesn't need any qualification in the 'No cycles' rule (3). PNR = special
PNR = special_value(root) suggests a rule (4): 'There can be only one' (rumbling noise in background) or 'ONE ROOT'. Then again, such a constraint could be implemented along with the first two root representations just as hard/easily.

PNR = NR
No need for special values, no need for NULLs – the only drawback is the necessary qualification of the “No cycles” rule. Let's proceed with the alternative where PNR = NR says: 'I am my boss'. The rules to interpret 'up' in   CREATE TABLE TREENODES (

	NR      NUMBER            PRIMARY KEY,
      NAME    VARCHAR2(50),
      PNR     up);

are now:

'up' is short for:
(1) (note 4). PNR has a 'domestic' key constraint (note 5).
(2). When a row X is deleted, all rows Y (subordinate or contained)
that referenced the deleted row X are updated to copy the deleted rows X.PNR (instead of what is was before the delete, its X.NR). We can think of this as 'ON DELETE PROMOTE' or 'ON DELETE EXPOSE'
(3). No cycles over more than one row.

Obviously the 'ON DELETE EXPOSE' and 'ONE ROOT' constraint qualifiers combined make it impossible to delete the root-row if it has more than one subordinate row.

[[Implementation]]
It looks very much like TREENODES with PNR = NR for self-contained containers is implementation: it suggests an adjacency list, and without anything else, it /is/.

With the 'up' constraint however we can treat the whole as an implementation neutral specification – logical instead of physical. I don't want to hold my breath waiting for an SQL standard with 'up' in the sense just discussed. We'll have to check whether we can also generate other than adjacency list implementations from this. The imaginary syntax should have room for a choice of implementation strategy of course.

[[notes]]
1. Giving away the pointe: Whether it really is, is an implementation issue: 'up' may very well be implemented with the 'nested intervals' approach. One could imagine this being steered by syntactical additions in an 'up' constraint sub-clause. However, the adjacency list is what will be visible anyway.

2. I might have tried a Tutorial D example, but I'm not familiar enough with Tutorial D yet. I don't think the example suffers from the weaknesses of SQL to which Tutorial D has an answer. So for now: SQL.

3. It will become clear why I start with (0) instead of (1) later on.
4. See? See note 3
5. Domestic key: A self referencing 'foreign' key: same table, different column, possibly, but not necessarily same row. Not really necessary, foreign keys will work just fine – 'foreign' just doesn't sound right. Received on Fri Nov 25 2005 - 00:34:42 CET

Original text of this message