Hierarchy as 'UP' constraint
Date: Fri, 25 Nov 2005 00:34:42 +0100
Message-ID: <43864d6a$0$11072$e4fe514c_at_news.xs4all.nl>
[[[Hierarchy as 'UP' constraint]]]
Many articles (in c.d.t. and elsewhere) and even a book (Joe Celko's “Trees and hierarchies in SQL for smarties“) have been written about hierarchies and relational databases, mostly on how to implement hierarchies in tables. Let's not immediately do that and still have the desired effect - which is to manage data less clumsily when there are hierarchies involved. Let's first deal with meaning and usage.
A big selling point of the relational model is to separate the model from the implementation – every minute spent less on the how gives us more time to spend on the what – in yet other words: separate the logical from the physical. When we deal with hierarchies the distinction seems blurred – especially when both the specification and the implementation are in SQL.
While trying to keep the distinction I'll describe an example that intuitively makes sense – Ill make some choices which may or may not turn out to be arbitrary.
I'll start with expanding on just one of several adjacency list (note 1) possibilities mentioned in J.C.'s book here in a slightly different approach:
Separate the data and the hierarchy, see the hierarchy
strictly as a table constraint.
[[Example]]
Example in imaginary syntax (note 2)
CREATE TABLE TREENODES (
NR NUMBER PRIMARY KEY,
NAME VARCHAR2(50),
PNR up);
Expansion
'up' in the example is (for now – it will be simplified later)
short for these rules:
(0) (note 3). PNR is of the same type as the primary key of the
same table (for simplicity: assume it must be single column):
in this case NUMBER
(1). PNR has a special foreign key-like constraint, referencing
the primary key of the same table. So it is not very foreign,
it is more like domestic: we could call a self referencing
(same table, different column, not necessarily same row) FK a
domestic key constraint, somehow implementing rule (0) and (1)).
(2). When a row X is deleted, all rows Y referencing the deleted
row X are updated to copy the deleted rows X.PNR (instead of X.NR).
(3). No cycles.
What should the PNR of the root be - is this an arbitrary choice?
[[Semantics]]
The data-trees I know depict organization and/or containment.
What should the PNR of the root be? Part one.
Let's see:
It can be the same as NR, it can be NULL, it can be an unique
value denoting it as root.
These alternative representations of root have different interpretations.
PNR = NR says:
'I am my boss', (organizational) or , 'I am self-contained'
– looking at the rows as containers: things containing other
things – nested set-like.
semantics++.
PNR = NULL says:
'I don't know who my boss is, don't even know
if I have one', or - my boss, existing or not,
is absent. 'Am I contained? '
semantics--.
PNR = unique_value(root) says: 'I am the boss', or 'I contain everything else'.
semantics++, but slightly different from PNR = NR, more specific.
I wonder if there are more frequently occurring interpretations for data-trees, beside organization and containment.
[[Pragmatics]]
What should the PNR of the root be? Part two.
PNR = NR
What to do when a row X having X.PNR = X.NR gets
deleted while there are other rows Y having Y.PNR = X.NR?
Interpreted: “What to do when 'I am my boss' X having
subordinates Y gets fired”,
or “when an 'I am self-contained' container containing
other thing gets thrown away”.
'up' - rule (2) would make Y violate the 'domestic key contra int.
Luckily we can still amend rule (2) to read:
(2). When a row X is deleted, all rows Y that referenced the
deleted row X are updated to copy the deleted rows X.PNR
(instead of its NR: Y.PNR := X.PNR) unless the deleted row
had X.NR = X.PNR, in that case Y.PNR := Y.NR.
That makes sense both in the organizational interpretation
and in the container interpretation:
When a boss row is deleted, the immediate subordinates become
their own boss. Professor Celko calls this “send the orphans
to grandmother”. When a container is deleted, it's contents
becomes visible: unwrap the package.
When you want to delete the container and destroy its contents
you'll have to be explicit about it.
A consequence is that the table can hold many unrelated hierarchies – unless there is a rule (4) (see below).
There seems to be another problem. Rule (3) says: No Cycles.
PNR = NR surely looks like one.
So:
(3). No Cycles. That is, no cycles over more than one row.
A cycle over one row (PNR = NR) has a specific, fixed
interpretation: this row is (a) root.
Yet another consequence is that PNR = NULL is free to mean the same as in a non-hierarchical context – whatever that may be (as you can tell I never fully understood NULL).
PNR = NULL
On Rule (2) PNR = NULL works out the same as PNR = NR.
It doesn't need any qualification in the 'No cycles' rule (3).
PNR = special
PNR = special_value(root) suggests a rule (4): 'There can
be only one' (rumbling noise in background) or 'ONE ROOT'.
Then again, such a constraint could be implemented along
with the first two root representations just as hard/easily.
PNR = NR
No need for special values, no need for NULLs – the only
drawback is the necessary qualification of the “No cycles” rule.
Let's proceed with the alternative where PNR = NR says:
'I am my boss'. The rules to interpret 'up' in
CREATE TABLE TREENODES (
NR NUMBER PRIMARY KEY,
NAME VARCHAR2(50),
PNR up);
are now:
'up' is short for:
(1) (note 4). PNR has a 'domestic' key constraint (note 5).
(2). When a row X is deleted, all rows Y (subordinate or contained)
that referenced the deleted row X are updated to copy the deleted
rows X.PNR (instead of what is was before the delete, its X.NR).
We can think of this as 'ON DELETE PROMOTE' or 'ON DELETE EXPOSE'
(3). No cycles over more than one row.
Obviously the 'ON DELETE EXPOSE' and 'ONE ROOT' constraint qualifiers combined make it impossible to delete the root-row if it has more than one subordinate row.
[[Implementation]]
It looks very much like TREENODES with PNR = NR for self-contained
containers is implementation: it suggests an adjacency list, and
without anything else, it /is/.
With the 'up' constraint however we can treat the whole as an implementation neutral specification – logical instead of physical. I don't want to hold my breath waiting for an SQL standard with 'up' in the sense just discussed. We'll have to check whether we can also generate other than adjacency list implementations from this. The imaginary syntax should have room for a choice of implementation strategy of course.
[[notes]]
1. Giving away the pointe: Whether it really is, is an
implementation issue: 'up' may very well be implemented
with the 'nested intervals' approach. One could imagine
this being steered by syntactical additions in an
'up' constraint sub-clause. However, the adjacency list
is what will be visible anyway.
2. I might have tried a Tutorial D example, but I'm not familiar enough with Tutorial D yet. I don't think the example suffers from the weaknesses of SQL to which Tutorial D has an answer. So for now: SQL.
3. It will become clear why I start with (0) instead
of (1) later on.
4. See? See note 3
5. Domestic key: A self referencing 'foreign' key:
same table, different column, possibly, but not necessarily
same row. Not really necessary, foreign keys will work just
fine – 'foreign' just doesn't sound right.
Received on Fri Nov 25 2005 - 00:34:42 CET
