Oracle FAQ Your Portal to the Oracle Knowledge Grid
 


Re: How to store 50 Terabytes per day?

From: Tim Gorman <tim_at_sagelogix.com>
Date: Thu, 09 Sep 2004 19:51:06 -0600
Message-ID: <BD6664AA.1BFF3%tim@sagelogix.com>


Jaromir,

> Hi Tom,
>

Some call me.....Tim? Are you looking for A GRR-R-R-R-RAIL?

>> ETL would have to consist of inserts only once, no updates or deletes EVER
>> Current-image data would have to be implemented as either views or
>> materialized views, if desired...

>
> I completely agree with this strategy but in my opinion there are two
> problems to be solved:
>
> a) dealing with the rolling window -- e.g. the data are held for 7 years, but
> the "current image" should contain complete data. This is not a trivial
> task; for example, using materialized views, a drop of an old partition
> should have no influence on the materialized view (maybe not so difficult)
> and a complete refresh of the materialized view must remain possible (maybe
> more difficult).

The most valuable feature of MVs is query rewrite. In fact, I would argue that there is no other reason to use MVs (please let's not argue, though -- obviously there are other cool uses, like DBMS_REDEFINITION). Thus, other features such as fast refresh are merely "nice-to-haves" that help you achieve query rewrite. Query rewrite is the "sizzle" in this "steak"...
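For illustration, a minimal sketch of an MV whose whole job is to enable query rewrite (the table and column names -- sales_fact, sale_date, amount -- are invented for the example):

```sql
-- Hypothetical summary MV: the point is ENABLE QUERY REWRITE, not refresh.
CREATE MATERIALIZED VIEW sales_daily_mv
  BUILD IMMEDIATE
  REFRESH FORCE ON DEMAND       -- the refresh mechanism is secondary...
  ENABLE QUERY REWRITE          -- ...this clause is the "sizzle"
AS
SELECT sale_date,
       SUM(amount) AS total_amount,
       COUNT(*)    AS cnt        -- COUNT(*) helps rewrite eligibility
FROM   sales_fact
GROUP  BY sale_date;
```

With QUERY_REWRITE_ENABLED set to TRUE, a query that aggregates sales_fact by sale_date can be transparently redirected to the much smaller MV.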

I have no hesitation in not using any of the MV refresh mechanisms provided by Oracle. If they work, fine. If they do not, then hello to custom-built fast refreshes...
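A hedged sketch of what a "custom-built fast refresh" can look like, using a prebuilt container table so query rewrite survives while you maintain the data yourself (all object names hypothetical):

```sql
-- Hypothetical: you own the summary table; the MV definition merely
-- registers it for query rewrite ("ON PREBUILT TABLE").
CREATE TABLE sales_daily AS
SELECT sale_date, SUM(amount) AS total_amount, COUNT(*) AS cnt
FROM   sales_fact
GROUP  BY sale_date;

CREATE MATERIALIZED VIEW sales_daily
  ON PREBUILT TABLE
  ENABLE QUERY REWRITE
AS
SELECT sale_date, SUM(amount) AS total_amount, COUNT(*) AS cnt
FROM   sales_fact
GROUP  BY sale_date;

-- Custom refresh cycle: drop the MV definition (the prebuilt table
-- survives), apply just the newly loaded day, then re-register.
DROP MATERIALIZED VIEW sales_daily;   -- table sales_daily remains

INSERT /*+ APPEND */ INTO sales_daily
SELECT sale_date, SUM(amount), COUNT(*)
FROM   sales_fact
WHERE  sale_date = TRUNC(SYSDATE)
GROUP  BY sale_date;
COMMIT;

CREATE MATERIALIZED VIEW sales_daily
  ON PREBUILT TABLE ENABLE QUERY REWRITE
AS
SELECT sale_date, SUM(amount) AS total_amount, COUNT(*) AS cnt
FROM   sales_fact
GROUP  BY sale_date;
```

The design choice: you give up Oracle's refresh machinery but gain a refresh path whose I/O cost you control completely -- exactly the "if they do not work, build your own" position above.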

>
> b) validation and cleaning make the "insert only once" more complicated.
> I mean the enforcement of business rules beyond primary and foreign keys.
> Those checks can be moved into current-image views, but this is surely not
> the desired solution (validation would be repeated for each query).
>

Depends on how much validation and cleaning you have to do. It also depends on how you do it. Some ETL tools and methodologies are quite wasteful from an I/O perspective; some are less so.

Such a high rate of insertion (50 TB/day implies an average of roughly 600 MB/sec) presses hard against the throughput limits of most storage units available today. Top-of-the-line 15,000-rpm hard drives usually provide sustained external throughput of about 100 MB/sec, so where does that leave us when we are talking about averaging 600 MB/sec, all day, every day? Just for writes, never mind reads?
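The arithmetic behind those numbers, as a quick sanity check anyone can run against DUAL (write-side only -- it ignores redo, undo, indexing, and reads):

```sql
-- 50 TB/day in MB/s: 50 * 1024^2 MB spread over 86,400 seconds,
-- and the minimum spindle count at ~100 MB/s sustained per drive.
SELECT ROUND(50 * POWER(1024, 2) / 86400)            AS avg_mb_per_sec,   -- ~607
       CEIL((50 * POWER(1024, 2) / 86400) / 100)     AS min_drives_needed -- 7, writes only
FROM   dual;
```

Seven drives sounds cheap until you remember that the same spindles must also absorb index maintenance, redo, mirroring, and every end-user query -- hence the striping discussion below.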

So, one insert for each row only, one pass at indexing, and one chance to "analyze". Massive RAID-0 striping to maximize throughput across devices. RAID-1 for redundancy -- no foolish talk about RAID-"F" or "caches"...

Even with the data inserted only once, we're already past the throughput capacity of most storage arrays, never mind additional I/O such as undo and redo, so direct-path and NOLOGGING would be key. And you really can't keep direct-path loading a live, queried table without using EXCHANGE PARTITION. Set the tablespaces to READ ONLY as soon as possible after the load and get 'em backed up.
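The load cycle described above can be sketched as follows for one day's partition (all table, index, partition, and tablespace names are invented for the example):

```sql
-- 1. Direct-path, NOLOGGING load into a standalone staging table.
ALTER TABLE stage_fact NOLOGGING;
INSERT /*+ APPEND */ INTO stage_fact
SELECT * FROM ext_fact_feed;      -- e.g. an external table over the feed files
COMMIT;

-- 2. One pass at indexing, one chance to gather stats.
CREATE INDEX stage_fact_ix ON stage_fact (sale_date) NOLOGGING;
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'STAGE_FACT');

-- 3. Publish instantly: swap the loaded segment into the partitioned table.
ALTER TABLE sales_fact
  EXCHANGE PARTITION p20040909 WITH TABLE stage_fact
  INCLUDING INDEXES WITHOUT VALIDATION;

-- 4. Freeze the day's tablespace and back it up once.
ALTER TABLESPACE ts_20040909 READ ONLY;
```

The EXCHANGE PARTITION is a data-dictionary operation, so end-users querying sales_fact never see the load in progress, and the READ ONLY tablespace needs only a single backup, ever.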

Whatever throughput is left over must be made available for queries by end-users...

Hope this helps...

-Tim

--
To unsubscribe - mailto:oracle-l-request_at_freelists.org&subject=unsubscribe 
To search the archives - http://www.freelists.org/archives/oracle-l/
Received on Thu Sep 09 2004 - 23:16:50 CDT
