Hierarchical queries and indexes

From: Troels Arvin <troels_at_arvin.dk>
Date: Tue, 05 Oct 2004 13:46:48 +0200
Message-ID: <pan.2004.10.05.11.46.48.266310_at_arvin.dk>



Hello,

When storing hierarchically related rows in a RDBMS, a number of different _encoding_ models exist to help speed up various hierarchical queries
(nested intervals model, nested set model, materialized path model). The
subject is certainly interesting and it's nice to see an increasing number of articles (and even a book) about the subject, but I can't help wondering:

Should such encondings really belong on explicit table columns? Aren't the different encodings actually specialized indexes which ought to exist as such?

My thought is that it should be possible to create a table like

CREATE TABLE ancestor (
  fullname VARCHAR(100) PRIMARY KEY,
  ancestor_fullname VARCHAR(100)
);

-and then add an index:

CREATE INDEX anc_hier
  ON ancestor(ancestor_fullname CONNECT BY fullname);

The SQL standard (and DB2) offers a general, and powerful
(expression-wise) syntax for recursive queries: WITH RECURSIVE. That
syntax is probably too general to build indexes for, but a query type like Oracle's CONNECT BY should be able to make use of a specialized index for hierarchical queries.

The creation of the special CONNECT BY index could be implemented through a suitable encoding scheme like nested intervals or nested set, but the encodings should not be user visible, in the same way that the internals of b-trees in normal indexes aren't user visible.

I wonder why Oracle doesn't offer a (set of) suitable, specialized index type(s) to assist in hierarchical queries, now that they have the CONNECT BY feature.

Celko - if you are reading this:
You normally argue against explicit tuple numbering (auto-generated surrogate keys), and I agree on the principle. However, aren't the lft/rgt attributes in the nested set model actually instances of (specialized) tuple numbering?

I realize that hypothetical thoughts on how things could/should be done in RDBMSes sometimes conflict with potential implementation difficulties, backwards compatiblity issues, etc. - But since this is a theory-group, I hope someone is interested in discussing why hierarchical encoding models have to exist.

-- 
Greetings from Troels Arvin, Copenhagen, Denmark
Received on Tue Oct 05 2004 - 13:46:48 CEST

Original text of this message