Re: TRM - Morbidity has set in, or not?

From: J M Davitt <jdavitt_at_aeneas.net>
Date: Mon, 15 May 2006 22:16:00 GMT
Message-ID: <A47ag.25500$YI5.17664_at_tornado.ohiordc.rr.com>


Paul Mansour wrote:
> J M Davitt wrote regarding inverted indexes:
>
>

>>This has huge significance: all values in a date domain covering
>>hundreds of years require fewer than 100,000 values.  Time-of-day
>>precise to a second requires only 86,000 values.  Given these domains,
>>adding records to a system would require no new values -- the domains
>>can be established before the first data arrive

>
>
> But wouldn't adding a row require an index for each (indexed) column?
> In the case of a date domain, the index value will in all likelihood be
> the same size as the date value it points to. Correct? Or am I missing
> something? How, in the case of a date or time domain, does this have
> "huge significance"?
>

In typical physical implementations, adding a value to an indexed column means storing the value in whatever represents the row -- or column -- and also adding an entry to an index. As an example, take Date of Birth for driver's license holders in New York State. There are probably, what, 10 million current licenses? I'd guess that there are fewer than 35,000 distinct Dates of Birth in that population. In that case, there are more than 9,000,000 "extra" copies of the same date values in the row representations.
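The arithmetic behind that estimate, as a quick sketch. The counts are the guesses from the paragraph above, not measured figures, and `redundant_copies` is a name made up for illustration.

```python
def redundant_copies(row_count, distinct_values):
    """Stored copies beyond the first occurrence of each distinct value."""
    return row_count - distinct_values


# Guesses taken from the post, not real New York State statistics.
licenses = 10_000_000
distinct_birth_dates = 35_000  # the post says "fewer than" this

extra = redundant_copies(licenses, distinct_birth_dates)
print(extra)  # 9965000
```

With fewer than 35,000 distinct dates the count of repeats only goes up, which is why "more than 9,000,000" is a safe bound.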

If dates are maintained in a domain, that domain may well cover Date of Issue and Expiration Date -- maybe Date Suspended, too. If these were all indexed columns and each index was maintained "in order" -- which would certainly be the case if the index were to be at all useful -- the typical implementation's physical storage would be littered with the same values in several places. (What are we up to: 27 million observations of some 36,000 values?)

TRM, on the other hand, would maintain exactly one ordered set of values for the domain, and everything referencing the same date would refer to the same value. Indices aren't really needed. Index maintenance -- the dreaded B-tree "rotate the root" operation -- would never occur. Sure, as birth dates are corrected and licenses are renewed, the value a given record refers to would change -- but the values themselves remain undisturbed and there's no need for index maintenance.
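To make the idea concrete, here is a minimal sketch of a shared, pre-built date domain with columns that store only positions into it. This is plain dictionary-style encoding written for illustration, not TRM's actual storage scheme; the domain bounds, the `ref` helper, and the sample records are all assumptions.

```python
from bisect import bisect_left
from datetime import date, timedelta

# Assumed domain bounds; the thread only says "hundreds of years".
DOMAIN_START = date(1900, 1, 1)
DOMAIN_END = date(2099, 12, 31)

# One ordered set of values, built before any rows arrive.
DOMAIN = [DOMAIN_START + timedelta(days=n)
          for n in range((DOMAIN_END - DOMAIN_START).days + 1)]


def ref(d):
    """Return the position of date d in DOMAIN; rows store this, not the date."""
    i = bisect_left(DOMAIN, d)
    if i == len(DOMAIN) or DOMAIN[i] != d:
        raise ValueError(f"{d} is outside the domain")
    return i


# Hypothetical license records: each column holds references into DOMAIN.
birth = [ref(date(1970, 5, 15)), ref(date(1982, 1, 3)), ref(date(1970, 5, 15))]
expires = [ref(date(2006, 5, 15)), ref(date(2008, 1, 3)), ref(date(2007, 5, 15))]

domain_size_before = len(DOMAIN)

# Renewing license 0: only the reference changes, DOMAIN is untouched.
expires[0] = ref(date(2014, 5, 15))

# DOMAIN is ordered, so comparing references is the same as comparing dates.
# This is a plain linear scan; how a real system locates rows is not shown.
cutoff = ref(date(2008, 1, 1))
expiring_before_2008 = [row for row, r in enumerate(expires) if r < cutoff]
```

Licenses 0 and 2 share a birth date and therefore hold the same reference, and the renewal leaves `len(DOMAIN)` unchanged. The sketch says nothing about how rows are reassembled from the columns.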

There is, of course, a trade-off: the record reconstruction table has to be maintained. That's significant work and the techniques for doing it efficiently are, AFAIK, a closely-held secret. (Not everything's covered by the patent, you know. When you apply for a patent, you have to tell the world how you did it. Some of the most profitable industrial secrets are not patented.)

Received on Tue May 16 2006 - 00:16:00 CEST
