Re: TRM - Morbidity has set in, or not?

From: J M Davitt <jdavitt_at_aeneas.net>
Date: Sat, 13 May 2006 21:58:12 GMT
Message-ID: <UDs9g.25112$YI5.23858_at_tornado.ohiordc.rr.com>


Marshall Spight wrote:
> J M Davitt wrote:
>

>>Marshall Spight wrote:
>>
>>
>>>  I can't find
>>>anything to suggest that it's anything besides a traditional column
>>>store.
>>>Various parties, including FP himself, have on occasion said, "oh no,
>>>it's much more than that" but they don't back it up at all, so their
>>>claims are unevaluable.
>>
>>It *is* much more than a column store storage scheme.  I don't know
>>whether you've read a description of TRM, but it features (a) a
>>not-so-surprising ordered collection of observed values, (b) a mildly
>>clever permutation and inverse permutation index, and (c) a very clever
>>"record reconstruction table."

>
>
> Your paragraph above seems to me to be a pretty good description
> of a column store with a fully inverted index. My understanding
> is that these techniques are decades old.

Like the "inverted hierarchy"? Yes, that is old enough to be well
known. The TRM difference is that a value in a column is stored exactly
once -- no matter how many times it appears in the representation.

A further point not yet made is that each value need appear only once
in a domain. In other words, if there are many columns holding date
values, with the same value appearing in not just one but many columns,
the value needs to be stored only once in TRM. This has huge
significance: a date domain covering hundreds of years requires fewer
than 100,000 values. Time-of-day precise to a second requires only
86,400 values. Given these domains, adding records to a system would
require no new values -- the domains can be established before the
first data arrive.

Social security numbers? There are far fewer than 10^9 possible.
License plate numbers? What, 36^6 or 36^8 -- times 50? That's not a big
gulp. Names? Far fewer than one might think. If domains such as these
are enumerated before the system requiring the database is turned on,
it could conceivably operate for years without seeing a "new" value.

The benefit in the physical layer -- which is where all commercial
products now have trouble when "big data" come to the party -- is that
the space required for storing values becomes a tiny fraction of what
modern systems require. 1/1,000,000 is not an unreasonable expectation.
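
To make the arithmetic and the "store each value once per domain" idea
concrete, here is a rough Python sketch. It is only an illustration
under my own assumptions, not TRM's actual storage layout: the Domain
class, its intern method, the 1900-2100 date range, and the hire_date
and ship_date columns are all hypothetical names invented for the
example.

```python
from datetime import date

# Domain sizes mentioned above (plain arithmetic; the 1900-2100 range
# is an assumed example of "hundreds of years").
days_in_range = (date(2100, 1, 1) - date(1900, 1, 1)).days
seconds_per_day = 24 * 60 * 60
ssn_upper_bound = 10 ** 9
plates_6 = 36 ** 6 * 50   # 6 alphanumeric characters, 50 states
plates_8 = 36 ** 8 * 50   # 8 alphanumeric characters, 50 states


class Domain:
    """Hypothetical shared value pool: each distinct value is stored once."""

    def __init__(self):
        self._values = []   # distinct values, in first-seen order
        self._index = {}    # value -> position in _values

    def intern(self, value):
        """Return the position of value, storing it only if it is new."""
        pos = self._index.get(value)
        if pos is None:
            pos = len(self._values)
            self._values.append(value)
            self._index[value] = pos
        return pos

    def value(self, pos):
        """Look a stored value up by its position."""
        return self._values[pos]

    def __len__(self):
        return len(self._values)


dates = Domain()
# Two columns drawing on the same date domain hold positions, not values.
hire_date = [dates.intern(d) for d in
             (date(2006, 5, 13), date(2006, 5, 13), date(2001, 1, 1))]
ship_date = [dates.intern(d) for d in
             (date(2001, 1, 1), date(2006, 5, 13))]

print(days_in_range, seconds_per_day, len(dates))
```

Five column entries end up referring to only two stored dates. If the
domain were filled in ahead of time, intern would never have to append
anything.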

> Now, it is possible that "transrelational" is something more than
> that, or not; I have no way of evaluating any given statement
> about it. So far.
>
>
> Marshall
>
Received on Sat May 13 2006 - 23:58:12 CEST
