RE: Timestamp dimensions in a data warehouse

From: Mark W. Farnham <>
Date: Thu, 21 Apr 2011 15:23:01 -0400
Message-ID: <0a1101cc0059$88d4f990$9a7eecb0$>

Point well made. Jared's as well in the slightly earlier post.

There remain things too small to justify buying partitioning. There are things too big to do reasonably without partitioning. And there can be a need for both partition pruning AND variable time granularity within a reasonable partitioning granularity. The most productive route for a given situation may be indexing alone, partitioning alone, or both partitioning and indexing together. You can probably model a reasonable tradeoff among those three possibilities that will hold good for a few years.

But there is no magic size evaluation you can do on today's hardware that will stand the test of time (yes, that is an intentional pun). Don't worry too much about that, though: you'll have to reload your data into quantum DNA multidimensional nanotechnology media real soon now anyway.


-----Original Message-----
From: [] On Behalf Of Greg Rahn
Sent: Wednesday, April 20, 2011 8:32 PM
Subject: Re: Timestamp dimensions in a data warehouse

I think the usage of the data (questions/queries) should influence the design here more than a textbook, but there is truth to Kimball's logic and design. The challenge I see here is the granularity level. If you have intervals of less than 1 minute, will you "round down" to whole seconds? If so, consider how much data you will have and how large that time dimension will be - 31,536,000 rows per year (365 days x 86,400 seconds per day) - that makes a pretty large time dimension.
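As a back-of-the-envelope check (my sketch, not from the thread; it assumes a plain 365-day year and ignores leap years), the sizing works out like this - and it also shows why Kimball's usual alternative, a separate time-of-day dimension, stays small:

```python
# Rough sizing of a time dimension at one-row-per-second granularity.
# Assumption: a plain 365-day calendar year (leap years/seconds ignored).
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400
rows_per_year = 365 * SECONDS_PER_DAY   # 31,536,000 rows per year

# A common alternative design is a time-of-day dimension with one row per
# second of a single day, used alongside an ordinary date dimension; its
# size is fixed regardless of how many years of facts you keep.
time_of_day_rows = SECONDS_PER_DAY      # 86,400 rows, total

print(rows_per_year, time_of_day_rows)
```

At second grain the single combined dimension grows by ~31.5 million rows every year, while the split design caps the time-of-day table at 86,400 rows forever.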

In this case, using an index isn't the modern, big-data way; partition elimination is. So your fact table should be partitioned at an appropriate granule size on the time key column, whether that be an integer date key or a timestamp type.
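To make the partition-elimination point concrete, here is a toy model of pruning (my illustration, not Oracle internals; daily range partitions and all names are assumptions): a query bounded on the partitioning key only has to touch the partitions whose ranges overlap the predicate.

```python
from datetime import date, timedelta

def partitions_to_scan(query_start: date, query_end: date,
                       table_start: date, table_end: date) -> list:
    """Return the daily partitions a pruned scan would have to read.

    Hypothetical helper: assumes the fact table is range-partitioned
    one partition per day over [table_start, table_end].
    """
    days = []
    d = max(query_start, table_start)
    stop = min(query_end, table_end)
    while d <= stop:
        days.append(d)
        d += timedelta(days=1)
    return days

# A one-week query against a year of daily partitions reads 7 of 365.
hit = partitions_to_scan(date(2011, 4, 14), date(2011, 4, 20),
                         date(2011, 1, 1), date(2011, 12, 31))
print(len(hit))  # 7
```

The win is that the pruning cost is independent of fact-table row count - the optimizer skips whole partitions rather than walking an index over billions of rows.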

On Wed, Apr 20, 2011 at 11:35 AM, Schauss, Peter (IT Solutions) <> wrote:
> I am in the early stages of designing a data warehouse intended to store
> what I will call metrics and diagnostic data.  The granularity is, most
> likely, going to be at intervals of less than a minute.  If I follow what
> I understand to be the classical data warehouse approach, I should have a
> dimension table with one row for each time interval instead of simply
> adding a timestamp column to the fact table.  In Ralph Kimball's "The Data
> Warehouse Toolkit" the author justifies this approach by asserting that
> "most databases don't index SQL date calculations, so queries constraining
> on an SQL-calculated field wouldn't take advantage of an index".
> Is this statement still valid?
> Would there be other reasons for putting the date in a dimension table?
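Kimball's objection is to filtering on a value calculated from the timestamp at query time, which typically defeats a plain B-tree index. One common workaround - sketched below with illustrative names of my own, not from the thread - is to derive an integer date key once at ETL/load time, so queries constrain on a stored, indexable (or partitionable) column instead:

```python
from datetime import datetime

def load_row(event_ts: datetime, value: float) -> dict:
    """Build a fact row with a precomputed surrogate date key.

    Hypothetical ETL helper: the YYYYMMDD integer key is derived once
    at load time, so queries filter on the plain stored column
    (fact["date_key"]) rather than wrapping event_ts in a date
    calculation, which a B-tree index could not serve.
    """
    return {
        "event_ts": event_ts,
        "date_key": int(event_ts.strftime("%Y%m%d")),
        "value": value,
    }

row = load_row(datetime(2011, 4, 20, 11, 35), 42.0)
print(row["date_key"])  # 20110420
```

The same stored key is what a date dimension would join on, which is one of the other practical reasons (beyond indexing) for keeping calendar attributes in a dimension table rather than recomputing them per query.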

Greg Rahn

Received on Thu Apr 21 2011 - 14:23:01 CDT
