Re: cyclical redundancy checksum algorithm(s)?

From: David Portas <REMOVE_BEFORE_REPLYING_dportas_at_acm.org>
Date: 27 Sep 2006 15:31:10 -0700
Message-ID: <1159396270.608584.312560_at_h48g2000cwc.googlegroups.com>


Karen Hill wrote:
> Tom Lane wrote:
> > "Karen Hill" <karen_hill22_at_yahoo.com> writes:
> > > Ralph Kimball states that this is a way to check for changes. You just
> > > have an extra column for the crc checksum. When you go to update data,
> > > generate a crc checksum and compare it to the one in the crc column.
> > > If they are same, your data has not changed.
> >
> > You sure that's actually what he said? A change in CRC proves the data
> > changed, but lack of a change does not prove it didn't.
>
>
> On page 100 in the book, "The Data Warehouse Toolkit" Second Edition,
> Ralph Kimball writes the following:
>
> "Rather than checking each field to see if something has changed, we
> instead compute a checksum for the entire row all at once. A cyclic
> redundancy checksum (CRC) algorithm helps us quickly recognize that a
> wide messy row has changed without looking at each of its constituent
> fields."
>
> On page 360 he writes:
>
> "To quickly determine if rows have changed, we rely on a cyclic
> redundancy checksum (CRC) algorithm. If the CRC is identical for the
> extracted record and the most recent row in the master table, then we
> ignore the extracted record. We don't need to check every column to be
> certain that the two rows match exactly."
>
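
To make Tom Lane's objection concrete, here is a minimal Python sketch of the row-checksum test Kimball describes. It is not code from the book. The helpers `row_crc`, `changed` and `find_crc32_collision` are hypothetical. The serialization (repr() of each field joined by a 0x1F separator) is an assumption; a real ETL job would need its own canonical encoding. Only `zlib.crc32` and `random.Random` come from the standard library. An unequal CRC proves the row changed; an equal CRC does not prove it is unchanged. A seeded birthday search typically finds two different 8-byte inputs with the same CRC-32 within roughly 10^5 attempts.

```python
import random
import zlib


def row_crc(row):
    """CRC-32 of a whole row (hypothetical helper, not from the book).

    Assumption: each field is serialized with repr() and joined by a
    unit-separator character, so None and "None", or 42 and "42",
    produce different payloads.
    """
    payload = "\x1f".join(repr(value) for value in row)
    return zlib.crc32(payload.encode("utf-8"))


def changed(extracted_row, stored_crc):
    """Kimball-style shortcut: compare checksums instead of columns.

    True is reliable: a different CRC proves the row differs.
    False only means "probably unchanged": distinct rows can share a CRC.
    """
    return row_crc(extracted_row) != stored_crc


def find_crc32_collision(seed=2006, limit=1 << 20):
    """Birthday search for two different byte strings with the same CRC-32."""
    rng = random.Random(seed)  # seeded so the search is repeatable
    seen = {}  # CRC value -> first input that produced it
    for _ in range(limit):
        data = rng.getrandbits(64).to_bytes(8, "big")
        crc = zlib.crc32(data)
        if crc in seen and seen[crc] != data:
            return seen[crc], data
        seen[crc] = data
    return None


# Made-up rows, for illustration only.
stored = row_crc(("Smith", "Boston", 42))
print(changed(("Smith", "Boston", 42), stored))  # False: identical row
print(changed(("Smith", "Boston", 43), stored))  # True: one-bit difference
a, b = find_crc32_collision()
print(a != b, zlib.crc32(a) == zlib.crc32(b))  # True True
```

The 42 to 43 change flips a single bit of the payload, which CRC-32 always detects. A CRC has only 2^32 possible values, though, so the collision search shows that "same CRC, different row" is an ordinary event and not a theoretical one. A table with millions of rows and many loads will eventually hit it.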

Be careful with Kimball. Read him to get the industry argot but treat his ideas on design and implementation with some healthy scepticism. His bases are often shaky or obscure and sometimes just plain wrong.

-- 
David Portas
Received on Thu Sep 28 2006 - 00:31:10 CEST
