Re: Timestamp dimensions in a data warehouse

From: Greg Rahn <greg_at_structureddata.org>
Date: Wed, 20 Apr 2011 17:31:55 -0700
Message-ID: <BANLkTimA8RtRHeG0_NxXStDwKfftuZmN0Q_at_mail.gmail.com>



I think the usage of the data (the questions/queries you need to answer) should influence the design here more than a textbook, but there is truth to Kimball's logic and design. The challenge I see here is the level of granularity. If you have intervals of less than one minute, will you "round down" to whole seconds? If so, consider how much data you will have and how large that time dimension will be: at one row per second that is roughly 31.5 million rows per year (86,400 seconds x 365 days), which makes for a pretty large time dimension.
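To make the scale concrete, here is a minimal sketch of building such a second-grain time dimension for a single non-leap year; the table name time_dim and its columns are made up for illustration:

  -- Hypothetical second-grain time dimension: one row per second of 2011.
  -- 86,400 seconds/day x 365 days = 31,536,000 rows for the year.
  CREATE TABLE time_dim (
    time_key  NUMBER  NOT NULL PRIMARY KEY,
    ts        DATE    NOT NULL
  );

  INSERT INTO time_dim (time_key, ts)
  SELECT LEVEL,
         DATE '2011-01-01' + (LEVEL - 1) / 86400
  FROM   dual
  CONNECT BY LEVEL <= 86400 * 365;

Even as a skinny two-column table, that is a lot of dimension to join through on every query.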

In this case, using an index isn't the modern, big-data way; partition elimination is. Your fact table should be partitioned at an appropriate granule size on the time key column, whether that be an integer date key or a timestamp type.
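As a rough illustration (not a prescription), assuming Oracle 11g interval partitioning and a made-up fact table named metrics_fact, daily range partitions on the time column let the optimizer prune everything outside the queried window:

  -- Hypothetical fact table, range-partitioned by day on the time column.
  CREATE TABLE metrics_fact (
    sample_ts  DATE    NOT NULL,  -- time key; could also be an integer date key
    metric_id  NUMBER  NOT NULL,
    value      NUMBER
  )
  PARTITION BY RANGE (sample_ts)
  INTERVAL (NUMTODSINTERVAL(1, 'DAY'))
  (PARTITION p_before_2011 VALUES LESS THAN (DATE '2011-01-01'));

  -- A time-bounded query reads only the partitions in range:
  SELECT AVG(value)
  FROM   metrics_fact
  WHERE  sample_ts >= DATE '2011-04-20'
  AND    sample_ts <  DATE '2011-04-21';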

On Wed, Apr 20, 2011 at 11:35 AM, Schauss, Peter (IT Solutions) <peter.schauss_at_ngc.com> wrote:
> I am in the early stages of designing a data warehouse intended to store what I will call metrics and diagnostic data.  The granularity is, most likely, going to be at intervals of less than a minute.  If I follow what I understand to be the classical data warehouse approach, I should have a dimension table with one row for each time interval instead of simply adding a timestamp column to the fact table.  In Ralph Kimball's "The Data Warehouse Toolkit" the author justifies this approach by asserting that "most databases don't index SQL date calculations, so queries constraining on an SQL-calculated field wouldn't take advantage of an index".
>
> Is this statement still valid?
>
> Would there be other reasons for putting the date in a dimension table?
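For what it's worth, that specific indexing claim is less true than it once was: Oracle, for one, has long supported function-based indexes on date expressions. A minimal sketch, reusing the same hypothetical metrics_fact table and column names from above:

  -- A function-based index on a date calculation; a predicate written
  -- against the same expression can use the index.
  CREATE INDEX metrics_fact_min_ix
    ON metrics_fact (TRUNC(sample_ts, 'MI'));

  SELECT COUNT(*)
  FROM   metrics_fact
  WHERE  TRUNC(sample_ts, 'MI') =
         TO_DATE('2011-04-20 11:35', 'YYYY-MM-DD HH24:MI');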

-- 
Regards,
Greg Rahn
http://structureddata.org
--
http://www.freelists.org/webpage/oracle-l
Received on Wed Apr 20 2011 - 19:31:55 CDT
