Re: 500 million rows every (8 hour) day

From: hpuxrac <johnbhurley_at_sbcglobal.net>
Date: Tue, 24 Nov 2009 16:31:40 -0800 (PST)
Message-ID: <a9e2e355-e8ba-473f-9176-5f99b12cae91_at_d21g2000yqn.googlegroups.com>



On Nov 24, 10:25 am, Mark D Powell <Mark.Powe..._at_hp.com> wrote:

snip

> Richard, I think the answer depends on what the data will be used
> for and how.  That is, how will the data be queried?  Will it
> permanently reside in the database or is the database just a holding
> point till the data is filtered and transferred to its permanent
> home?  Is the initial insert done to the data's final store within
> the database or is it moved within the database?
>
> I am not a big fan of using sqlldr direct path loads in a production
> environment.  If a direct path load job fails then the indexes become
> unusable and require rebuilding.  Having to rebuild large indexes on a
> massive single point of insertion table pretty much brings the system
> to a halt.  If there will be multiple, concurrent sources of data
> input I would insist on using conventional loads or using programs
> written to use bulk inserts or the older array insert feature of the
> Pro* languages.

What the OP seemed to describe is a daily load cycle. If I understood it correctly I would seriously think about using some kind of daily staging table that accumulates the input. It possibly could have no real useful indexes on it ... just have the data dumped into it ( depending on what validation is needed and where ) ... then near the end of the cycle use it to move the data elsewhere.
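
Something along these lines ( again all names invented, a sketch rather than a recipe ):

    create table stage_daily
    ( src_id   number,
      payload  varchar2(4000),
      load_ts  date )
    nologging;                        -- no indexes ... cheap to dump into all day

    -- near the end of the cycle move the validated rows on
    insert /*+ append */ into perm_detail
    select src_id, payload, load_ts
    from   stage_daily
    where  payload is not null;       -- stand-in for whatever validation applies

    commit;

    truncate table stage_daily;       -- reset for the next day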

Partitioning might be a possibility. Inserting 500 million rows over 8 hours is pretty trivial even on modest servers.
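
Daily range partitions ( names and dates invented here too ) would also make ageing the data out a metadata operation instead of a monster delete:

    create table perm_detail
    ( src_id   number,
      payload  varchar2(4000),
      load_dt  date )
    partition by range ( load_dt )
    ( partition p20091124 values less than ( date '2009-11-25' ),
      partition p20091125 values less than ( date '2009-11-26' ) );

    -- six months later ditching the oldest day is just
    alter table perm_detail drop partition p20091124;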

Properly designing the storage structures to handle 6 months of this stuff, then ( perhaps ) ditching it in some fashion or moving it out elsewhere, sounds to me like where the business rules would really need to be understood and clarified.

Received on Tue Nov 24 2009 - 18:31:40 CST
