Re: Database design

From: mAsterdam <mAsterdam_at_vrijdag.org>
Date: Sat, 25 Feb 2006 14:41:21 +0100
Message-ID: <44005e37$0$11062$e4fe514c_at_news.xs4all.nl>


Mark Johnson wrote:
> mAsterdam wrote:
>
>

>>Mark Johnson wrote:
>>
>>>But there are important considerations
>>>addressed by database builders having to
>>>do with integrity (floating, loose records), and simply the rate at
>>>which queries can be performed. That was the reason for many looking
>>>to an explict/full/materialized path for such tables which otherwise
>>>violate the basics of the RM. The idea was that tables derived from
>>>the RM would not be ordered. But these hierarchical tables are
>>>explicitly ordered and by self-referential links. It's even worse than
>>>mere 'storage order'. And if one were to suggest - but wait, it's
>>>basically a bunch of normalized tables all overlayed on the same
>>>logical table for convenience, and because it lends itself easily to
>>>automated programmatic control - they'd probably be having none of
>>>that, particularly because one is not constrained to view any
>>>particular list, but could pick any tree, or the table as a whole,
>>>intentionally or by mistake.

>
>

>>I guess that if I spend some time on this, I may extract some sensible >>statements out of this.

In the labour division between one writer and many readers it be better that the writer do the extraction work.

> Are you saying that you find the above, confusing?

No, confused. As I said (you deleted it, so I'll quote):

>> In the labour division between one writer and >> many readers it be better that the writer do the extraction work.

> Let me break it down:
>
> "there are important considerations addressed by database builders
> having to do with integrity (floating, loose records), and simply the
> rate at which queries can be performed."
>
> It is recommended that SQL be used to perform the basic operations on
> the dataset. Some object model coding, instead, might be used with
> these hierarchical, self-referential table schemes, which was the
> subject, here. That could introduce error. One sort of error might
> have to do with integrity, which typically refers to a loss of
> reference among particular elements; something called 'dangling
> records', or such. And more importantly, in order to reconstruct the
> structure, an access path must be 'chased' which can take time,
> depending on the speed of the routine which performs what would be
> this frequent, routine function. Oracle, for example, provides such an
> extension called, Connect By, and related extensions.

Yep. Are you saying that to deal with hierarchies in SQL databases you have to put some effort into it? Some of the contributors to this newsgroup wrote articles and even books about it.

> "That was the reason for many looking to an explict/full/materialized
> path for such tables which otherwise violate the basics of the RM."
>
> In order to avoid the use of such 'chasing' functions, many have
> suggested using an explicit path. One way to preserve storage is to
> minimally represent that path as the concatenation of sort values. If
> a list is a super-tree, a tree above or superior to another, contained
> four items in order, with an attribute to record that ordering, then
> the first position on the path would be that number, taken from that
> attribute. And that path itself would be stored in another attribute,
> which could also serve as a primary key. A node on its subtree, also
> in a particular order, would take the number just found for the
> superior node, add a dot or slash or whatever, some separator, and
> then it's own sort value, for its primary key, to continue describing
> the explict path. As for the basics of the RM, the very idea of
> sorting the relation, or relationship, could well not correspond to
> either the notion of a relation or relationship. For while the row
> might in some sense correspond to an ordered tuple, and as represented
> in a grid it would be placed in one column of the same domain, and not
> another, the tuples or elements themselves are not supposed to be in
> any particular order. That's the point of contention.

The subject is interesting, but this is not the way to discuss it. I'm getting bored of having to guess what you mean.

>>>Proponents of the RM against such schemes would certainly argue that
>>>they only have in mind the reduction of confusion, of error, and an
>>>increase in the speed and simplicity of retrieval of the data by
>>>whatever ordering. And if Connect By is burdensome or slow, even just
>>>in that they have a point, never mind that it is proprietary. Perhaps
>>>if it were as easy to transparently create tables without overhead or
>>>explicit design, as it is to add subtrees and leaves to a hierarchies,
>>>it might be less of an issue. But even in a well-known organization,
>>>with few anticipated changes in structure and order, surely some
>>>things come up, if just forced on them by changes made by large
>>>clients or vendors. I think that's why Oracle does have a Connect By,
>>>in the first place. And I'm sure many companies get a lot of use out
>>>of it.

>
>
>>Idem.

>
>

 > You don't understand?

I may, I may not. I might extract some sense out of it if I tried. >> In the labour division between one writer and many readers >> it be better that the writer do the extraction work.

> Let me break it down for you:
>
> "Proponents of the RM against such schemes would certainly argue that
> they only have in mind the reduction of confusion, of error, and an
> increase in the speed and simplicity of retrieval of the data by
> whatever ordering."
>
> If that's not self-explanatory, I don't know what to add.

You might be more specific as to which schemes you are referring to than just "such schemes".
You might add some quotes by proponents of the RM against such schemes.

> "And if Connect By is burdensome or slow, even just in that they have
> a point, never mind that it is proprietary."
>
> Again, this is found as a proprietary feature of Oracle. A) if it
> proved to be slow in certain cases, in chasing down those paths, then
> if another design proved faster, that would generally be considered an
> advantage - a good thing. B) it is proprietary, and some consider that
> unnecessarily limiting and objectionable.

You might want to search this newsgroup for "Hierarchy as up constraint".

> "Perhaps if it were as easy to transparently create tables without
> overhead or explicit design, as it is to add subtrees and leaves to a
> hierarchies, it might be less of an issue."
>
> That is, there is a design effort, a rework effort, required to modify
> an existing scheme of tables and foreign keys as the structure may
> need to be rearranged. But to rearrange one of these self-referencing
> tables, built on a database supposedly inspired by the RM, there
> exists the row, uniquely identified, but in a structure easily changed
> by simply adding, deleting, relocating or resorting on a tree display,
> and which would intuitively likely correspond closely to the
> organization or structure of the data being retained.

Again, are you saying that to deal with hierarchies in SQL databases you have to put some effort into it? Some of the contributors to this newsgroup wrote articles and even books about it.

> "But even in a well-known organization, with few anticipated changes
> in structure and order, surely some things come up, if just forced on
> them by changes made by large clients or vendors. I think that's why
> Oracle does have a Connect By, in the first place. And I'm sure many
> companies get a lot of use out of it."
>
> That is, as a practical matter, as with whether Connect By is slow, or
> suitable, many things cannot be anticipated. Ideally, they could be
> foreseen. But as a practical matter, not. The vendors may completely
> revamp their catalog. The client may require a restructuring of data,
> depending on what you are doing for them. And perhaps that is why some
> would turn to the easier, but problematic, self-referencing tables, at
> some point, or just in part. That is, there's a reason Oracle included
> this. And I stated my opinion that I suspect it gets a lot of use, in
> companies and projects of all sorts. Don't read into it that I think
> it's a good thing or a bad thing. It just is.

I don't know how widespread the use of the Oracle's 'connect by' construct is. In my personal experience it is not widely used. I know a few hundred oracle instances, and there are just a few teams aware of it. I did see some experiments, but non of them led to use of it in production systems.

> Look, if you think this isn't helpful, then fine.

Yes, it is. Part of it, at least. You might notice that when I /do/ understand what you are saying, that I do try to provide a reasonable answer.

> I tried to make
> suggestions for your glossary, as well.

It is not my glossary. I have tried several times to have the maintenance done by somebody else. However, because several nice contributors reassured me that they appreciate having it around, by lack of another volonteer, I am still doing it.

> If you think it isn't helpful,
> then so be it. But I don't really see how you were that confused by
> what I had written above. It just wasn't that complicated.
> Nonetheless, there you have it.

Thanks for the effort.
I am pretty sure that I am not the only one who finds it difficult to get what you mean from what you write. Received on Sat Feb 25 2006 - 14:41:21 CET

Original text of this message