Re: Timestamp dimensions in a data warehouse

From: Jared Still <jkstill_at_gmail.com>
Date: Fri, 22 Apr 2011 09:45:17 -0700
Message-ID: <BANLkTi=PiPEFNyFgY_+o4__0sxiCCd-s_g_at_mail.gmail.com>



In addition, I don't really think that performance is the sole consideration in deciding to use a timestamp dimension.

The timestamp dimension makes it easy to write queries that would otherwise be difficult to express.
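
For example, attributes carried on a date/time dimension (things like day_of_week, holiday_flag, or fiscal_period) turn date arithmetic into simple equality predicates. A minimal sketch, with made-up table and column names:

  SELECT d.fiscal_period,
         SUM(f.metric_value) AS total_value
  FROM   metric_fact f
         JOIN date_dim d
           ON d.date_key = f.date_key
  WHERE  d.day_of_week  = 'MONDAY'
  AND    d.holiday_flag = 'N'
  AND    d.fiscal_year  = 2011
  GROUP  BY d.fiscal_period
  ORDER  BY d.fiscal_period;

Without the dimension, the same question means decoding the timestamp (day of week, holiday calendar, fiscal calendar) inside every query.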

Jared Still
Certifiable Oracle DBA and Part Time Perl Evangelist
Oracle Blog: http://jkstill.blogspot.com
Home Page: http://jaredstill.com

On Thu, Apr 21, 2011 at 12:23 PM, Mark W. Farnham <mwf_at_rsiz.com> wrote:

> Point well made. Jared's as well in the slightly earlier post.
>
> There remain things too small to justify buying partitioning. There are
> things too big to do reasonably without partitioning. And there can be a
> need for both partition pruning AND variable time granularity within a
> reasonable partitioning granularity. The most productive route for a given
> situation may be indexing alone, partitioning alone, or both partitioning
> and indexing together. You can probably model a reasonable tradeoff guess
> amongst those three possibilities that will be good for a few years.
>
> But there is no magic size evaluation you can do on today's hardware that
> will stand the test of time (yes, that is an intentional pun). But don't
> worry too much about that: You'll have to reload your data into quantum DNA
> multidimensional nanotechnology media real soon now anyway.
>
> mwf
>
> -----Original Message-----
> From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org]
> On Behalf Of Greg Rahn
> Sent: Wednesday, April 20, 2011 8:32 PM
> To: peter.schauss_at_ngc.com
> Cc: oracle-l_at_freelists.org
> Subject: Re: Timestamp dimensions in a data warehouse
>
> I think the usage of the data (questions/queries) should influence the
> design here more than a textbook, but there is truth to Kimball's logic
> and design. The challenge I see here is the granularity level.
> If you have intervals of less than 1 minute, will you "round down" to whole
> seconds? If so, consider how much data you will have and how large that
> time dimension will be - 31,556,926 rows per year as that is how many
> seconds there are - that makes a pretty large time dimension.
>
> In this case, using an index isn't the modern, big-data way; partition
> elimination is. So your fact table should be partitioned at an appropriate
> granule size on the time key column, whether that be an integer date key or
> a timestamp type.
>
> On Wed, Apr 20, 2011 at 11:35 AM, Schauss, Peter (IT Solutions)
> <peter.schauss_at_ngc.com> wrote:
> > I am in the early stages of designing a data warehouse intended to store
> > what I will call metrics and diagnostic data. The granularity is, most
> > likely, going to be at intervals of less than a minute. If I follow what I
> > understand to be the classical data warehouse approach, I should have a
> > dimension table with one row for each time interval instead of simply
> > adding a timestamp column to the fact table. In Ralph Kimball's <The Data
> > Warehouse Toolkit> the author justifies this approach by asserting that
> > "most databases don't index SQL date calculations, so queries constraining
> > on an SQL-calculated field wouldn't take advantage of an index".
> >
> > Is this statement still valid?
> >
> > Would there be other reasons for putting the date in a dimension table?
>
> --
> Regards,
> Greg Rahn
> http://structureddata.org
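
To illustrate Greg's point about partition elimination: a rough sketch (Oracle syntax, with hypothetical table and column names; the right granule size depends on data volume) of a fact table range-partitioned by day on its timestamp column, where a predicate on that column lets the optimizer prune to the relevant partitions instead of scanning the whole table:

  CREATE TABLE metric_fact (
    event_ts     TIMESTAMP NOT NULL,
    sensor_id    NUMBER    NOT NULL,
    metric_value NUMBER
  )
  PARTITION BY RANGE (event_ts) (
    PARTITION p20110420 VALUES LESS THAN (TIMESTAMP '2011-04-21 00:00:00'),
    PARTITION p20110421 VALUES LESS THAN (TIMESTAMP '2011-04-22 00:00:00'),
    PARTITION pmax      VALUES LESS THAN (MAXVALUE)
  );

  -- Only the 2011-04-20 partition is scanned for this query.
  SELECT sensor_id, AVG(metric_value)
  FROM   metric_fact
  WHERE  event_ts >= TIMESTAMP '2011-04-20 00:00:00'
  AND    event_ts <  TIMESTAMP '2011-04-21 00:00:00'
  GROUP  BY sensor_id;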

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Apr 22 2011 - 11:45:17 CDT
