Re: Database design

From: Mark Johnson <102334.12_at_compuserve.com>
Date: Sat, 25 Feb 2006 04:48:51 -0800
Message-ID: <5oi0021jrtddqlqrohdi1b4dcud62k3tdl_at_4ax.com>


mAsterdam <mAsterdam_at_vrijdag.org> wrote:

>Mark Johnson wrote:
> > But there are important considerations
> > addressed by database builders having to
> > do with integrity (floating, loose records), and simply the rate at
> > which queries can be performed. That was the reason for many looking
> > to an explict/full/materialized path for such tables which otherwise
> > violate the basics of the RM. The idea was that tables derived from
> > the RM would not be ordered. But these hierarchical tables are
> > explicitly ordered and by self-referential links. It's even worse than
> > mere 'storage order'. And if one were to suggest - but wait, it's
> > basically a bunch of normalized tables all overlayed on the same
> > logical table for convenience, and because it lends itself easily to
> > automated programmatic control - they'd probably be having none of
> > that, particularly because one is not constrained to view any
> > particular list, but could pick any tree, or the table as a whole,
> > intentionally or by mistake.

>I guess that if I spend some time on this, I may extract some sensible
>statements out of this.

Are you saying that you find the above, confusing? Let me break it down:

"there are important considerations addressed by database builders having to do with integrity (floating, loose records), and simply the rate at which queries can be performed."

It is recommended that SQL be used to perform the basic operations on the dataset. Some object model coding, instead, might be used with these hierarchical, self-referential table schemes, which was the subject, here. That could introduce error. One sort of error might have to do with integrity, which typically refers to a loss of reference among particular elements; something called 'dangling records', or such. And more importantly, in order to reconstruct the structure, an access path must be 'chased' which can take time, depending on the speed of the routine which performs what would be this frequent, routine function. Oracle, for example, provides such an extension called, Connect By, and related extensions.

"That was the reason for many looking to an explict/full/materialized path for such tables which otherwise violate the basics of the RM."

In order to avoid the use of such 'chasing' functions, many have suggested using an explicit path. One way to preserve storage is to minimally represent that path as the concatenation of sort values. If a list is a super-tree, a tree above or superior to another, contained four items in order, with an attribute to record that ordering, then the first position on the path would be that number, taken from that attribute. And that path itself would be stored in another attribute, which could also serve as a primary key. A node on its subtree, also in a particular order, would take the number just found for the

superior node, add a dot or slash or whatever, some separator, and
then it's own sort value, for its primary key, to continue describing
the explict path. As for the basics of the RM, the very idea of
sorting the relation, or relationship, could well not correspond to either the notion of a relation or relationship. For while the row might in some sense correspond to an ordered tuple, and as represented in a grid it would be placed in one column of the same domain, and not another, the tuples or elements themselves are not supposed to be in any particular order. That's the point of contention.

> > Proponents of the RM against such schemes would certainly argue that
> > they only have in mind the reduction of confusion, of error, and an
> > increase in the speed and simplicity of retrieval of the data by
> > whatever ordering. And if Connect By is burdensome or slow, even just
> > in that they have a point, never mind that it is proprietary. Perhaps
> > if it were as easy to transparently create tables without overhead or
> > explicit design, as it is to add subtrees and leaves to a hierarchies,
> > it might be less of an issue. But even in a well-known organization,
> > with few anticipated changes in structure and order, surely some
> > things come up, if just forced on them by changes made by large
> > clients or vendors. I think that's why Oracle does have a Connect By,
> > in the first place. And I'm sure many companies get a lot of use out
> > of it.

>Idem.

You don't understand? Let me break it down for you:

"Proponents of the RM against such schemes would certainly argue that they only have in mind the reduction of confusion, of error, and an increase in the speed and simplicity of retrieval of the data by whatever ordering."

If that's not self-explanatory, I don't know what to add.

"And if Connect By is burdensome or slow, even just in that they have a point, never mind that it is proprietary."

Again, this is found as a proprietary feature of Oracle. A) if it proved to be slow in certain cases, in chasing down those paths, then if another design proved faster, that would generally be considered an advantage - a good thing. B) it is proprietary, and some consider that unnecessarily limiting and objectionable.

"Perhaps if it were as easy to transparently create tables without overhead or explicit design, as it is to add subtrees and leaves to a hierarchies, it might be less of an issue."

That is, there is a design effort, a rework effort, required to modify an existing scheme of tables and foreign keys as the structure may need to be rearranged. But to rearrange one of these self-referencing tables, built on a database supposedly inspired by the RM, there exists the row, uniquely identified, but in a structure easily changed by simply adding, deleting, relocating or resorting on a tree display, and which would intuitively likely correspond closely to the organization or structure of the data being retained.

"But even in a well-known organization, with few anticipated changes in structure and order, surely some things come up, if just forced on them by changes made by large clients or vendors. I think that's why Oracle does have a Connect By, in the first place. And I'm sure many companies get a lot of use out of it."

That is, as a practical matter, as with whether Connect By is slow, or suitable, many things cannot be anticipated. Ideally, they could be foreseen. But as a practical matter, not. The vendors may completely revamp their catalog. The client may require a restructuring of data, depending on what you are doing for them. And perhaps that is why some would turn to the easier, but problematic, self-referencing tables, at some point, or just in part. That is, there's a reason Oracle included this. And I stated my opinion that I suspect it gets a lot of use, in companies and projects of all sorts. Don't read into it that I think it's a good thing or a bad thing. It just is.

Look, if you think this isn't helpful, then fine. I tried to make suggestions for your glossary, as well. If you think it isn't helpful, then so be it. But I don't really see how you were that confused by what I had written above. It just wasn't that complicated. Nonetheless, there you have it. Received on Sat Feb 25 2006 - 13:48:51 CET

Original text of this message