
RE: Table Joins

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Thu, 25 May 2006 05:59:35 -0400
Message-ID: <KNEIIDHFLNJDHOOCFCDKEEPOIBAA.mwf@rsiz.com>


That was a pretty slick way to benchmark the comparison filter costs.

For the original poster I'm wondering whether they are looking for the solution to a particular case or for a development guideline.

For a particular case, I'll echo the "measure it" sense of the thread.

For a development guideline, my prediction is that, over a wide range of joins of a wide range of objects, numbers will indeed win.

It appears from the results below that numbers have a little less than a 1 percent advantage on the comparison, and your mileage may vary. For example, if you used rownum + a quadrillion, then the character comparisons will probably take longer by a bigger margin. (Gints, if you ran that on exactly your system for comparison, that might be useful to the thread.)
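To make the "rownum + a quadrillion" idea concrete, here is a minimal sketch in the spirit of Gints's test below (the table and column names are just illustrative, not anything from the thread): the only point is that the character keys become roughly 16 digits wide, so the byte-by-byte comparison has more to chew on.

create table big_wide as
select rownum + 1000000000000000 rn1,            -- still a plain NUMBER
       rownum + 1000000000000000 rn2,
       to_char(rownum + 1000000000000000) rn3,   -- ~16-character keys
       to_char(rownum + 1000000000000000) rn4
from dba_source;

select count(*) from big_wide where rn1 = rn2;   -- number = number
select count(*) from big_wide where rn3 = rn4;   -- varchar2 = varchar2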

Cross type comparisons, as Gints succinctly demonstrated, are significantly more expensive.
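The extra cost comes from the per-row datatype conversion Oracle has to do when a NUMBER meets a VARCHAR2. A rough sketch against Gints's table below (this is my reading of the conversion rule, not something measured in this thread):

-- implicit conversion: Oracle effectively evaluates rn1 = to_number(rn3)
-- on every row, so the conversion cost is paid millions of times
select count(*) from big where rn1 = rn3;

-- writing the conversion explicitly keeps the cost, but makes it visible
-- and lets you choose which side gets converted
select count(*) from big where to_char(rn1) = rn3;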

Over the broad case of pre-designed queries and joins, you'll probably have indexes of proper types designed to support the joins, and the number of "keys" per block and row to block selectivity will probably dominate. Numbers are not always shorter than character fields, but usually they are in cases of large numbers of rows and reasonable row selectivity. If two indexes of roughly equal row to block cohesion have different depths due to size at the steady state number of rows you expect to have in the table, that will make a difference unless the join is traversing the leafs (leaves if you prefer).

So whatever produces a smaller overall key tends to be good.

Whatever tends to have correspondence to the order new rows are inserted into the database tends to be good, because that tends to result in good row to block cohesion. Generating a need for later periodic maintenance of row to block cohesion with respect to a join key at design time is a bad idea. (And in some specific expensive cases such periodic maintenance is one of the few remaining valid justifications for rebuild maintenance, meaning bad working hours for DBAs, which in turn is bad for your data since sleep deprivation is a leading cause of operator error.) Sequence generated artificial numeric keys, aside from any bugs in any versions or data accidents that may generate tree skewing that is curable in any case, have historically had excellent row to block cohesion.
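As a purely illustrative sketch of such a sequence-generated artificial key (the sequence and table names are made up for this example, not anything from the thread): new rows get monotonically increasing key values, so rows inserted together tend to land in the same blocks and the index grows on its right-hand edge.

create sequence order_seq cache 1000;

create table orders (
  order_id   number        not null,    -- artificial, sequence-populated key
  cust_name  varchar2(80)  not null,
  order_date date          default sysdate,
  constraint orders_pk primary key (order_id)
);

insert into orders (order_id, cust_name)
values (order_seq.nextval, 'ACME');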

Mark Bobak's advice will probably tend to get you good cohesiveness between row selection and block selection as well. Following the natural logic of a situation tends to do that, especially with dates. Also notice that if generating a numeric key as a proxy for a name means the query will be forced to do an extra lookup and pull an extra table into the join, that is not likely to be a win for an individual query where you measure joins, but it might well be an overall benefit to your relational model.

Jared's advice about using the tool probably tends to get you a small key, but I'm at a loss to predict whether it will tend to produce row to block cohesion. It certainly might, and in any case you don't always have a reason to suspect that one key or another will dominate queries in future use, and that tool is unlikely to be worse than neutral.

If you decide to benchmark entire designs, be extra careful about accidents of sample size and row to block cohesion. If your data and keys may be frequently updated in your steady state system, the best row to block cohesion may also have the side effect of tending to generate hot blocks. Since the general cure for that is reducing the number of rows per block, which tends to mean more blocks to handle per query, that is not to be ignored.

All my musings about tendencies, of course, pale beside testing actuals to determine what is best when you have actuals to test.

Regards,

mwf

-----Original Message-----

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Gints Plivna
Sent: Thursday, May 25, 2006 2:42 AM
To: genegurevich_at_discoverfinancial.com
Cc: oracle-l
Subject: Re: Table Joins

I'd say that for such a comparison you probably don't even need anything to join.
In my very, very simple test below you can see the difference: comparing two identical number columns across 5,877,112 rows mostly needed 01.02 sec, while the same comparison on the same values stored in varchar2 columns needed 01.03 sec.

BUT, as you can imagine, the worst case is to compare a number to a varchar2, because it takes about twice as long: 02.04 sec.

So that's the real comparison that should be avoided.

Of course you can use Tom Kyte's nice runstats tool to get more precise numbers for more indicators and get a better picture of all that.
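For reference, a rough sketch of how that runstats harness is usually driven, assuming you have installed Tom Kyte's runstats_pkg (procedure names as he published them; adapt to your installation):

SQL> exec runstats_pkg.rs_start;
SQL> select count(*) from big where rn1 = rn2;   -- run 1: number = number
SQL> exec runstats_pkg.rs_middle;
SQL> select count(*) from big where rn1 = rn3;   -- run 2: number = varchar2
SQL> exec runstats_pkg.rs_stop(1000);            -- report stats/latches differing by more than 1000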

Gints Plivna

SQL> create table big as select rownum rn1, rownum rn2, to_char(rownum) rn3, to_char(rownum) rn4
  2  from dba_source;

Table created.

Elapsed: 00:00:06.04
SQL> insert into big select * from big;

734639 rows created.

Elapsed: 00:00:04.09
SQL> /

1469278 rows created.

Elapsed: 00:00:09.06
SQL> /

2938556 rows created.

Elapsed: 00:00:19.04
SQL> select count(*) from big where rn1 = rn2;

  COUNT(*)
----------
   5877112

Elapsed: 00:00:02.01
SQL> /

  COUNT(*)
----------
   5877112

Elapsed: 00:00:01.02
SQL> /

  COUNT(*)
----------
   5877112

Elapsed: 00:00:01.02
SQL> /

  COUNT(*)
----------
   5877112

Elapsed: 00:00:01.02
SQL> select count(*) from big where rn3 = rn4;

  COUNT(*)
----------
   5877112

Elapsed: 00:00:01.03
SQL> /

  COUNT(*)
----------
   5877112

Elapsed: 00:00:01.03
SQL> /

  COUNT(*)
----------
   5877112

Elapsed: 00:00:01.03
SQL> /

  COUNT(*)
----------
   5877112

Elapsed: 00:00:01.03
SQL> select count(*) from big where rn1 = rn3;

  COUNT(*)
----------
   5877112

Elapsed: 00:00:02.05
SQL> /

  COUNT(*)
----------
   5877112

Elapsed: 00:00:02.04
SQL> /

  COUNT(*)
----------
   5877112

Elapsed: 00:00:02.05
SQL> /

  COUNT(*)
----------
   5877112

Elapsed: 00:00:02.04

2006/5/25, genegurevich_at_discoverfinancial.com <genegurevich_at_discoverfinancial.com>:

> Hi all:
>
> One of my developers insists that joins by numerical fields result in
> better performance than the joins by character fields. I don't remember
> reading much on this point. Is there any truth in it?
>
> thank you
>
> Gene Gurevich

--

http://www.freelists.org/webpage/oracle-l

Received on Thu May 25 2006 - 04:59:35 CDT
