Re: 1 Billion 11 Byte Words... Need to Check Uniqueness UsingOracle

From: Connor McDonald <connor_mcdonald_at_yahoo.com>
Date: Tue, 12 Feb 2002 22:49:41 +0000
Message-ID: <3C699C05.1A5D@yahoo.com>

Bryan W. Taylor wrote:
>
> nsouto_at_optushome.com.au.nospam (Nuno Souto) wrote in message news:<3c67996a.5023940_at_news-vip.optusnet.com.au>...
> > Bryan W. Taylor doodled thusly:
> >
> > >> I'm also not convinced that the Oracle sort algorithm is very efficient.
> > >
> > >What are you basing this on? Sorting is a core competency for a
> > >database application, so I'd be astounded if they don't do this pretty
> > >well. I'm assuming that the data in question here is randomly chosen
> >
> > Well, you see: that was never a major concern with RDBMSs until sizes
> > started getting very, very large.
>
> Sorting a new found concern -- Huh? Sorting occurs and has always
> occured during "group by", index creation, "select distinct" and
> sort-merge joins, each of which has been core RDBMS functionality
> since the absolute beginning. To say the leading enterprise vendor
> doesn't have sorting as a "major concern" is asking me to believe that
> they don't care about the performance of their basic SQL operations.
> Sorry, I don't buy it.
>
> > So large that the uber argument of "hardware is cheap, just buy more"
> > to solve performance design issues didn't apply any longer.
> >
> > Many years ago in non RDBMS days, relative sort performance was a BIG
> > issue. It's coming back again. About time.
>
> I think it pretty much disappeared because widespread knowledge of
> algorithms like quicksort and heapsort made it trivial to get optimal
> sort algorithm performance. Why do you think is "coming back".
>
> > I believe there is something in 9i to address that. As in: you can
> > plug-in your own super-duper sort code if you want.
>
> I'm not familiar with this. Could you be more specific?
>
> Sorting is covered extensively in undergraduate algorithms classes.
> There are many well known O(n log n) sorting algorithms, and I'm sure
> Oracle uses them. Some algorithms have favorable properties when the
> data is not randomly distributed. Perhaps having a custom sort routine
> would be of a benefit in certain circumstances where the stock sorting
> doesn't exploit the particular pattern in the data.
>
> For the problem at hand, I'm assuming that the data is randomly
> ordered, in which case I seriously doubt that you will improve on
> Oracle's default sorting algorithm.
>
> If somebody want to keep arguing this, please provide references.
>
> > I also think that while the max speed for serial sort algorithms is
> > well known, for parallel sorts things are a little different. Maybe
> > that's what Keith was referring to with the 10X factor?
>
> Once again, let me repeat: the most important element in the process
> is the IO. In-memory sorting does not matter much relative to the time
> spent doing IO. Even if your sort algorithm does the impossible and
> beats oracles by 10X but your IO is 10% slower, you will probably
> lose. This whole sort issue is a red herring.
>
> Don't take my word for it, take that of Steve Adams of Ixora, who is
> one of the most respected Oracle tuning experts and the author of the
> Oracle Internals O'Reilly book: "The CPU cost of sorting the sort runs
> and merging them is almost insignificant by comparison with the I/O
> cost of merging." see "Sort Performance" section at
> http://www.ixora.com.au/q+a/sorts.htm#end .

Not that I'm trying to buy into the importance or otherwise of the sort phase, but there are a number of products out there that can outperform Oracle in purely a sort operation. SyncSort is a popular one - it sorts on flat files and will (substantially) outperform the equivalent Oracle sort on a table. Of course, it doesn't have to worry about read consistency, row overhead, block overhead, blah blah..

hth
connor

-- 
==============================
Connor McDonald

http://www.oracledba.co.uk

"Some days you're the pigeon, some days you're the statue..."

Received on Tue Feb 12 2002 - 16:49:41 CST