Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: 1 Billion 11 Byte Words... Need to Check Uniqueness UsingOracle

Re: 1 Billion 11 Byte Words... Need to Check Uniqueness UsingOracle

From: Bryan W. Taylor <bryan_w_taylor_at_yahoo.com>
Date: 12 Feb 2002 14:22:44 -0800
Message-ID: <11d78c87.0202121422.7b65c7a8@posting.google.com>


nsouto_at_optushome.com.au.nospam (Nuno Souto) wrote in message news:<3c67996a.5023940_at_news-vip.optusnet.com.au>...
> Bryan W. Taylor doodled thusly:
>
> >> I'm also not convinced that the Oracle sort algorithm is very efficient.
> >
> >What are you basing this on? Sorting is a core competency for a
> >database application, so I'd be astounded if they don't do this pretty
> >well. I'm assuming that the data in question here is randomly chosen
>
> Well, you see: that was never a major concern with RDBMSs until sizes
> started getting very, very large.

Sorting a new found concern -- Huh? Sorting occurs and has always occured during "group by", index creation, "select distinct" and sort-merge joins, each of which has been core RDBMS functionality since the absolute beginning. To say the leading enterprise vendor doesn't have sorting as a "major concern" is asking me to believe that they don't care about the performance of their basic SQL operations. Sorry, I don't buy it.  

> So large that the uber argument of "hardware is cheap, just buy more"
> to solve performance design issues didn't apply any longer.
>
> Many years ago in non RDBMS days, relative sort performance was a BIG
> issue. It's coming back again. About time.

I think it pretty much disappeared because widespread knowledge of algorithms like quicksort and heapsort made it trivial to get optimal sort algorithm performance. Why do you think is "coming back".  

> I believe there is something in 9i to address that. As in: you can
> plug-in your own super-duper sort code if you want.

I'm not familiar with this. Could you be more specific?

Sorting is covered extensively in undergraduate algorithms classes. There are many well known O(n log n) sorting algorithms, and I'm sure Oracle uses them. Some algorithms have favorable properties when the data is not randomly distributed. Perhaps having a custom sort routine would be of a benefit in certain circumstances where the stock sorting doesn't exploit the particular pattern in the data.

For the problem at hand, I'm assuming that the data is randomly ordered, in which case I seriously doubt that you will improve on Oracle's default sorting algorithm.

If somebody want to keep arguing this, please provide references.

> I also think that while the max speed for serial sort algorithms is
> well known, for parallel sorts things are a little different. Maybe
> that's what Keith was referring to with the 10X factor?

Once again, let me repeat: the most important element in the process is the IO. In-memory sorting does not matter much relative to the time spent doing IO. Even if your sort algorithm does the impossible and beats oracles by 10X but your IO is 10% slower, you will probably lose. This whole sort issue is a red herring.

Don't take my word for it, take that of Steve Adams of Ixora, who is one of the most respected Oracle tuning experts and the author of the Oracle Internals O'Reilly book: "The CPU cost of sorting the sort runs and merging them is almost insignificant by comparison with the I/O cost of merging." see "Sort Performance" section at http://www.ixora.com.au/q+a/sorts.htm#end . Received on Tue Feb 12 2002 - 16:22:44 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US