Oracle FAQ Your Portal to the Oracle Knowledge Grid

Re: 1 Billion 11 Byte Words... Need to Check Uniqueness UsingOracle

From: Keith Boulton <kboulton_at_ntlworld.com>
Date: Sun, 10 Feb 2002 10:21:46 -0000
Message-ID: <6Mr98.30742$H37.3683670@news2-win.server.ntlworld.com>


I must admit, it's many years since I did this, but a dedicated sort program on a mainframe handled this sort of task more than 100x faster than Oracle.

Bryan W. Taylor <bryan_w_taylor_at_yahoo.com> wrote in message news:11d78c87.0202091151.3256e0f_at_posting.google.com...
> "Keith Boulton" <kboulton_at_ntlworld.com> wrote in message news:<YH498.7607$YA2.1485257_at_news11-gui.server.ntli.net>...
>
> > > Surely some sort of file based sort program would be a lot cheaper if
> > > you don't.
> > >
> > And very, very much faster!
>
> Done correctly it would be faster, but not by as much as you think.
> You are underestimating the capabilities of Oracle's multithreaded IO.
> It would not be a trivial programming task if you want to be
> competitive.
>
> The key is to do no more disk IO during the sort than is essential.
> Since you likely have more data than memory, you have to write it to
> disk and eventually read it back. This will be by far the slowest
> operation. IO management on a multi-disk SMP machine is not trivial.
> Oracle has multithreaded IO built in; you'll have to write your own.
> If your program isn't making multiple disks read and write
> simultaneously, you'll lose.
>
> The method would essentially parallel the one I outlined for Oracle
> to use: split the data into separate files of manageable size based on
> a partial-ordering hash, then sort each file in memory and scan it for
> repeats.
>
> The only advantages you'll have over Oracle are 1) you avoid Oracle
> putting the data into blocks, which expands it and creates more IO, and
> 2) your in-memory executable code will probably be smaller, leaving more
> memory for the sort and potentially letting you use fewer pieces.
> Along these lines, if the data is printable characters, compressing
> the 11-byte words will probably help substantially.
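The IO argument in the quoted post can be made concrete with a little arithmetic. Below is a minimal Python sketch, assuming 1 billion fixed-width 11-byte records and a single partition pass (each record written to scratch once and read back once). The function names and the 2 GiB memory budget are hypothetical, and the piece count assumes equal-sized pieces with no per-record sort overhead, so it is an optimistic lower bound.

```python
def scratch_io_bytes(n_words: int, word_len: int) -> int:
    """Bytes moved through scratch disk in one partition pass:
    every record is written out once and read back once."""
    return 2 * n_words * word_len


def partitions_needed(n_words: int, word_len: int, mem_bytes: int) -> int:
    """Lower bound on the number of pieces, assuming equal-sized pieces and
    no per-record overhead in the in-memory sort (both optimistic)."""
    total = n_words * word_len
    return (total + mem_bytes - 1) // mem_bytes  # integer ceiling division


if __name__ == "__main__":
    n, width = 1_000_000_000, 11
    print(scratch_io_bytes(n, width))  # 22000000000
    print(partitions_needed(n, width, 2 * 2**30))  # hypothetical 2 GiB budget: 6
```

Even in this best case, about 22 GB crosses the disk interface, which is why spreading that traffic over several disks at once matters so much.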
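The split/sort/scan method described in the quoted post might look roughly like the following single-process Python sketch. It is not the poster's code. Bucketing by the leading byte is one assumed choice of "partial-ordering hash": equal words always land in the same bucket, and buckets stay in key order. `bucket_of` and `find_duplicates` are hypothetical names, each bucket is assumed to fit in memory, and the sketch does none of the parallel multi-disk IO the post says a competitive program needs.

```python
import os
import tempfile
from typing import Iterable, List

WORD_LEN = 11  # fixed record width from the thread's problem statement


def bucket_of(word: bytes, n_buckets: int) -> int:
    # Assumed partial-ordering "hash": map the first byte (0..255) onto
    # 0..n_buckets-1 monotonically, so buckets are ordered by key prefix.
    return word[0] * n_buckets // 256


def find_duplicates(words: Iterable[bytes], n_buckets: int = 16) -> List[bytes]:
    """Return each word that occurs more than once, in sorted order."""
    dups: List[bytes] = []
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"bucket{i:03d}.dat") for i in range(n_buckets)]
        files = [open(p, "wb") for p in paths]
        try:
            # Pass 1: stream every fixed-width record to its bucket file.
            for w in words:
                if len(w) != WORD_LEN:
                    raise ValueError(f"expected {WORD_LEN}-byte word, got {w!r}")
                files[bucket_of(w, n_buckets)].write(w)
        finally:
            for f in files:
                f.close()
        # Pass 2: sort each bucket in memory; repeats end up adjacent.
        for p in paths:
            with open(p, "rb") as f:
                data = f.read()
            recs = sorted(data[i:i + WORD_LEN] for i in range(0, len(data), WORD_LEN))
            for prev, cur in zip(recs, recs[1:]):
                # Report each repeated word once, however many copies exist.
                if prev == cur and (not dups or dups[-1] != cur):
                    dups.append(cur)
    return dups
```

Because the buckets are processed in prefix order, the duplicates come out globally sorted, and a file-based program could stop at the first hit if it only needs a yes/no answer on uniqueness.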
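The compression suggestion at the end of the quoted post can be illustrated with an order-preserving packing. This is a minimal Python sketch that assumes, hypothetically, that the words use only the 36 characters `0-9A-Z`. Since 36^11 is less than 2^64, each word fits in 8 bytes. Because the alphabet is listed in ASCII order and the packed number is stored big-endian, the packed values sort the same way as the original words, so they can go straight into the split/sort/scan process. `pack`, `unpack`, and `ALPHABET` are illustrative names.

```python
ALPHABET = b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed character set, in ASCII order
BASE = len(ALPHABET)  # 36
WORD_LEN = 11
PACKED_LEN = 8
_INDEX = {c: i for i, c in enumerate(ALPHABET)}  # byte value -> digit value

# The largest word, 36**11 - 1, must fit in PACKED_LEN bytes.
assert BASE ** WORD_LEN <= 256 ** PACKED_LEN


def pack(word: bytes) -> bytes:
    """Encode an 11-character word as an 8-byte big-endian base-36 number.
    Since ALPHABET is in ASCII order, packed values sort like the words."""
    if len(word) != WORD_LEN:
        raise ValueError(f"expected {WORD_LEN} bytes, got {len(word)}")
    n = 0
    for c in word:
        i = _INDEX.get(c)
        if i is None:
            raise ValueError(f"byte {c!r} is outside the assumed alphabet")
        n = n * BASE + i
    return n.to_bytes(PACKED_LEN, "big")


def unpack(packed: bytes) -> bytes:
    """Inverse of pack()."""
    n = int.from_bytes(packed, "big")
    out = bytearray()
    for _ in range(WORD_LEN):
        n, r = divmod(n, BASE)
        out.append(ALPHABET[r])
    return bytes(reversed(out))
```

Under this assumption, each record shrinks from 11 to 8 bytes, cutting scratch IO by about 27%. A general 95-symbol printable-ASCII alphabet would not fit in 8 bytes, because 95^11 exceeds 2^64. It would need 10 bytes, so the savings depend heavily on how restricted the real character set is.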
Received on Sun Feb 10 2002 - 04:21:46 CST
