Re: TRM - Morbidity has set in, or not?

From: x <x_at_not-exists.org>
Date: Mon, 15 May 2006 15:16:57 +0300
Message-ID: <e49rbk$31t$1_at_emma.aioe.org>


"J M Davitt" <jdavitt_at_aeneas.net> wrote in message news:UDs9g.25112$YI5.23858_at_tornado.ohiordc.rr.com...
> Marshall Spight wrote:
> > J M Davitt wrote:
> >
> >>Marshall Spight wrote:
> >>
> >>
> >>> I can't find
> >>>anything to suggest that it's anything besides a traditional column
> >>>store.
> >>>Various parties, including FP himself, have on occasion said, "oh no,
> >>>it's much more than that" but they don't back it up at all, so their
> >>>claims are unevaluable.
> >>
> >>It *is* much more than a column store storage scheme. I don't know
> >>whether you've read a description of TRM, but it features (a) a
> >>not-so-surprising ordered collection of observed values, (b) a mildly
> >>clever permutation and inverse permutation index, and (c) a very clever
> >>"record reconstruction table."
> >
> >
> > Your paragraph above seems to me to be a pretty good description
> > of a column store with a fully inverted index. My understanding
> > is that these techniques are decades old.
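To make the "record reconstruction table" concrete, here is a minimal Python sketch of the layout as TRM has been described publicly (notably by C. J. Date). Each column of the field values table (FVT) is sorted independently of the others. Each cell of the record reconstruction table (RRT) points to the row in the *next* column that belongs to the same record, and the last column wraps around to the first. The names `build_trm` and `reconstruct` are hypothetical, the condensed (one-entry-per-distinct-value) form of the columns is left out, and this is an illustration under those assumptions, not Tarin's implementation. The contrast with a column store plus inverted index is that in this sketch no column keeps the original record order. A record exists only as a chain of positions through the RRT.

```python
from typing import Any, List, Sequence, Tuple

Record = Tuple[Any, ...]


def build_trm(records: Sequence[Record]) -> Tuple[List[List[Any]], List[List[int]]]:
    """Build a field values table (FVT) and record reconstruction table (RRT).

    Both are stored column-major here: fvt[c][pos] and rrt[c][pos].
    """
    n_rows, n_cols = len(records), len(records[0])
    # Permutation table: record numbers of column c, in sorted-value order.
    perm = [sorted(range(n_rows), key=lambda r: records[r][c]) for c in range(n_cols)]
    # Inverse permutation: sorted position of each record within column c.
    inv = [[0] * n_rows for _ in range(n_cols)]
    for c in range(n_cols):
        for pos, r in enumerate(perm[c]):
            inv[c][r] = pos
    # FVT: every column sorted independently of the others.
    fvt = [[records[r][c] for r in perm[c]] for c in range(n_cols)]
    # RRT: for cell (c, pos), the position of the same record in column c+1
    # (the last column wraps around to the first).
    rrt = [[inv[(c + 1) % n_cols][perm[c][pos]] for pos in range(n_rows)]
           for c in range(n_cols)]
    return fvt, rrt


def reconstruct(fvt: List[List[Any]], rrt: List[List[int]], pos: int) -> Record:
    """Rebuild the record whose first-column value sits at sorted position pos."""
    values = []
    for c in range(len(fvt)):
        values.append(fvt[c][pos])
        pos = rrt[c][pos]  # "zigzag" to the record's row in the next column
    return tuple(values)


records = [("Smith", "London", 20), ("Jones", "Paris", 10), ("Blake", "Paris", 30)]
fvt, rrt = build_trm(records)
print(fvt)                       # [['Blake', 'Jones', 'Smith'], ['London', 'Paris', 'Paris'], [10, 20, 30]]
print(reconstruct(fvt, rrt, 0))  # ('Blake', 'Paris', 30)
```

Starting from any row of the first column and following the RRT pointers yields one whole record. The permutation and inverse permutation tables are needed only while building the RRT.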

> Like the "inverted hierarchy?" Yes, that is old enough to be well
> known. The TRM difference is that a value in a column appears
> exactly once -- no matter how many times it appears in the
> representation. A further point not made is that each value need
> appear only once in a domain. In other words, if there are many
> columns holding date values, with the same value appearing in not
> only one but many columns, the value needs to be stored only once in
> TRM.
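The "once per column, once per domain" idea can be sketched as follows. This assumes a simple shared dictionary (a hypothetical `Domain` class) that hands out integer codes, plus a hypothetical `condense` helper that collapses a sorted column into one entry per distinct value with its row range. This is ordinary dictionary encoding and run-length condensing, shown for illustration only, not as TRM's actual storage format.

```python
from itertools import groupby
from typing import Dict, List, Tuple


class Domain:
    """Hypothetical shared value pool: each distinct value is stored once
    and referenced everywhere by a small integer code."""

    def __init__(self) -> None:
        self.values: List[str] = []       # code -> value
        self.codes: Dict[str, int] = {}   # value -> code

    def encode(self, value: str) -> int:
        # Reuse the existing code if the value is already in the domain.
        code = self.codes.get(value)
        if code is None:
            code = len(self.values)
            self.values.append(value)
            self.codes[value] = code
        return code

    def decode(self, code: int) -> str:
        return self.values[code]


def condense(sorted_codes: List[int]) -> List[Tuple[int, int, int]]:
    """Collapse a sorted column into (code, first_row, last_row) entries,
    so each distinct value appears once per column."""
    out = []
    row = 0
    for code, group in groupby(sorted_codes):
        count = sum(1 for _ in group)
        out.append((code, row, row + count - 1))
        row += count
    return out


# Two hypothetical date columns drawing on one shared date domain.
dates = Domain()
order_col = [dates.encode(d) for d in ["2006-05-15", "2006-05-15", "2006-05-16"]]
ship_col = [dates.encode(d) for d in ["2006-05-16", "2006-05-17", "2006-05-17"]]

print(dates.values)                # ['2006-05-15', '2006-05-16', '2006-05-17']
print(order_col, ship_col)         # [0, 0, 1] [1, 2, 2]
print(condense(sorted(ship_col)))  # [(1, 0, 0), (2, 1, 2)]
```

The three distinct dates are stored once for both columns. However, each column still holds one code per row, so that part of the storage grows with the number of records whatever the domain size.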

> This has huge significance: all values in a date domain covering
> hundreds of years require fewer than 100,000 values. Time-of-day
> precise to a second requires only 86,400 values. Given these domains,
> adding records to a system would require no new values -- the domains
> can be established before the first data arrive. Social security
> numbers? There are far fewer than 10^9 possible. License plate
> numbers? What, 36^6 or 36^8 -- times 50? That's not a big gulp.
> Names? Far fewer than one might think. If domains such as these
> are enumerated before the system requiring the database is turned on,
> it could conceivably operate for years without seeing a "new" value.
> The benefit in the physical layer -- which is where all commercial
> products now have trouble when "big data" come to the party -- is
> that the space required for storing values becomes a tiny fraction
> of what modern systems require. 1/1,000,000 is not an unreasonable
> expectation.
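The domain sizes mentioned above are easy to check. This short sketch uses a hypothetical `code_bits` helper to compute how many values each domain holds and how wide an integer code must be to address every value. The sizes simply restate the figures in the post (100,000 days, seconds in a day, 10^9 social security numbers, and 36^6 or 36^8 plates times 50). They are upper bounds, not measured data.

```python
# Domain sizes restating the figures quoted above (upper bounds, not measurements).
DOMAINS = {
    "dates (100,000 days, about 273 years)": 100_000,
    "time of day to the second": 24 * 60 * 60,
    "social security numbers": 10**9,
    "license plates, 36^6 x 50": 36**6 * 50,
    "license plates, 36^8 x 50": 36**8 * 50,
}


def code_bits(domain_size: int) -> int:
    """Smallest number of bits whose codes 0..domain_size-1 cover the domain."""
    return max(1, (domain_size - 1).bit_length())


for name, size in DOMAINS.items():
    print(f"{name}: {size:,} values, {code_bits(size)}-bit codes")
```

This gives 17-bit codes for both dates and time of day, 30 bits for social security numbers, and 37 or 48 bits for the two license plate cases. Every stored reference to a value still costs a code of that width. The 8-character plate domain, if enumerated in advance, would itself hold about 1.4 x 10^14 entries.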

What about all the combinations of the domain values? Isn't this called compression (to ignition :)?

Received on Mon May 15 2006 - 14:16:57 CEST
