Re: 1 Billion 11 Byte Words... Need to Check Uniqueness Using Oracle
I must admit, it's many years since I did this, but a dedicated sort program
on a mainframe handled this sort of task more than 100x faster than Oracle.
Bryan W. Taylor <bryan_w_taylor_at_yahoo.com> wrote in message
news:11d78c87.0202091151.3256e0f_at_posting.google.com...
> "Keith Boulton" <kboulton_at_ntlworld.com> wrote in message
news:<YH498.7607$YA2.1485257_at_news11-gui.server.ntli.net>...
>
> > > Surely some sort of file based sort program would be a lot cheaper if you
> > > don't.
> > >
> > And very, very much faster!
>
> Done correctly it would be faster, but not by as much as you think.
> You are underestimating the capabilities of Oracle's multithreaded IO.
> It would not be a trivial programming task if you want to be
> competitive.
>
> The key is to do no more disk IO than is essential during
> the sort. Since you likely have more data than memory, you have to
> store it to disk and eventually read it back. This will be by far the
> slowest operation. IO management on a multi-disk SMP machine is not
> trivial. Oracle has multithreaded IO built in - you'll have to write
> your own. If your program isn't making multiple disks read and write
> simultaneously, you'll lose.
>
> The method would essentially parallel the method I outlined for Oracle
> to use: split into separate files of manageable size based on a partial
> ordering hash. Then sort each file in memory and scan it for repeats.
>
> The only advantages you'll have over Oracle are 1) Oracle will put the
> data into blocks, which makes it expand and creates more IO and 2)
> your in-memory executable code will probably be smaller, thereby
> allowing more memory for the sort and potentially allowing you to use
> fewer pieces. Along these lines, if the data is printable characters,
> compression of the 11-byte words will probably help substantially.
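
For anyone following along, the split / sort / scan method quoted above might look roughly like the sketch below. This is only an illustration, not code from anyone in the thread: the function name find_duplicates, the bucket count and the choice of the leading byte as the "partial ordering hash" are all my assumptions, and it is single-threaded, so it sidesteps the multi-disk IO point entirely.

```python
import os
import tempfile

WORD_LEN = 11  # fixed record size, taken from the thread subject


def find_duplicates(words, num_buckets=256, workdir=None):
    """Return the set of 11-byte words that occur more than once.

    Hypothetical helper: 'words' is any iterable of bytes objects.
    num_buckets must be between 1 and 256 because the partition key
    here is just the leading byte (an assumption, not from the thread).
    """
    if not 1 <= num_buckets <= 256:
        raise ValueError("num_buckets must be in 1..256")

    with tempfile.TemporaryDirectory(dir=workdir) as tmp:
        paths = [os.path.join(tmp, f"bucket_{i:04d}.bin")
                 for i in range(num_buckets)]

        # Pass 1: split the input into bucket files. The key preserves
        # order between buckets, so equal words always share a bucket.
        files = [open(p, "wb") for p in paths]
        try:
            for w in words:
                if len(w) != WORD_LEN:
                    raise ValueError("expected an 11-byte word")
                files[w[0] * num_buckets // 256].write(w)
        finally:
            for f in files:
                f.close()

        # Pass 2: sort each bucket in memory and scan neighbours.
        dups = set()
        for p in paths:
            with open(p, "rb") as f:
                data = f.read()
            records = sorted(data[i:i + WORD_LEN]
                             for i in range(0, len(data), WORD_LEN))
            for prev, cur in zip(records, records[1:]):
                if prev == cur:
                    dups.add(cur)
        return dups
```

With printable data the leading byte only takes a fraction of the 256 possible values, so the buckets would be badly skewed; a real program would want a key built from more of the word, and would size the buckets so each one fits in memory.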
Received on Sun Feb 10 2002 - 04:21:46 CST